Switch Statement

033: The Design of Everyday Things Ch 5: It's Not Your Fault

April 28, 2023 · Matthew Keller · Season 3, Episode 8
Transcript
Matt:

Hello everyone, and welcome to the Switch Statement podcast. It's a podcast for investigations into miscellaneous tech topics.

Jon:

This is episode eight in our series on The Design of Everyday Things by Don Norman.

Matt:

Hey Jon, how are you doing?

Jon:

Hey Matt, I'm doing really well. How are you doing?

Matt:

I'm not doing great. So...

Jon:

Oh, no.

Matt:

This morning I got an alert on my phone at 7:15 that we had this recording session. And I was at my girlfriend's place, which is about a 30-minute drive away.

Jon:

Oh, dude. And you're clearly at your place right now, so...

Matt:

I am at my place now. So I ran... no, but it was a major error, I might say.

Jon:

You did it on purpose to provide material for this session.

Matt:

I wish that were the case. I wish that were true. And now this raises a whole other philosophical question: is it possible to make a mistake on purpose?

Jon:

That's what I was just about to say, because it seems like the author would argue that mistakes are impossible. It's design problems.

Matt:

There's only design problems. He does touch on this, on whether people are ever truly at fault.

Jon:

Oh yeah, yeah. But I was gonna ask you, was there a design problem that contributed to your mistake? Like maybe Calendar could have told you last night, hey, I think you're about to forget.

Matt:

Yeah, my mental framework does not allow me to assign blame to anyone but myself for anything.

Jon:

You must be from Michigan.

Matt:

I think I secretly am from Michigan. I've never been... well, I've been to Michigan once, but there's some Michigan in my blood. Just to take a step back, as you may have gathered, this whole chapter is all about errors: why they happen, often as a result of design, and how to minimize them. So...

Jon:

Yeah, the title. Is this the title of the chapter? "Human Error? No, Bad Design"? Is that

Matt:

Yes, that is the title of the chapter.

Jon:

in my notes. I didn't know if that was just me editorializing. The first thing that struck me in this chapter, and I know I'm an engineer, I am biased, but the author is very biased in the direction of user error being less common than we think it is, which I think is a fair stance to take. But I would also push back and argue the contrarian side, which is: if you are designing something that is even remotely non-trivial, you're always going to reach a point where you have to make design compromises, and a lot of those compromises will affect the user experience. And it's usually because of the common trade-offs that we're all used to. You only have four engineers to work on this, so they can only produce so much, so you just have to decide, oh, we're gonna cut that really amazing signifier that might prevent some mistake. So anyway, I don't wanna beat a dead horse, we've discussed this exact thing before, but I just wanted to preface this entire chapter with my engineering opinion.

Matt:

Well, just to put numbers to his opinion: he says he thinks roughly 1 to 5% of errors are actually human errors.

Jon:

Yeah,

Matt:

That just strikes me as low.

Jon:

Agreed.

Matt:

He has this other quote, which is that the estimates range between 75 and 95%, and that sounds way too high to me. Anything less than 20%, I think, sounds a little low for human error.

Jon:

We're just not gonna be able to design completely perfect objects and software. That just can't be reality. So...

Matt:

There... it's always about trade-offs. It's all about trade-offs, and it's...

Jon:

yeah, you're right.

Matt:

For one example, if you're putting in guardrails so that people don't do something dumb... maybe that's dismissive. But if you're putting in guardrails so people don't delete System32 on their machine, that might be annoying to a super user who's like, I need to delete System32 right now, I don't know for what reason. And this is something macOS does a ton: they have all of these guardrails in place, and I think Don Norman would approve of these things. For non-super-tech-savvy users, they reduce the likelihood of making a mistake. But then for people who want to do something very specific or precise with the machine, it actually makes the user experience worse, I think.

Jon:

Yeah, actually, and I feel like this is something that Don Norman doesn't talk about: operator expertise reaching the point where all of this cushioning is just annoying. And this is something I discuss with my peers all the time at my job, because a lot of what I do is talk about new processes that can potentially be added to our entire team, which is 150 people or so. And we are very hesitant to add any new process. Even if we think there could be some benefit to it, we are super hesitant to introduce it, because it's just this additional hurdle that every single person is gonna have to jump over.

Matt:

Oh

Jon:

So it's just trade-offs. Like you say, every single decision you make has trade-offs.

Matt:

One of the things that I like that he does is he keeps coming back to this action framework, where there are layers of processing. We've talked about this in a previous chapter, where there's this planning layer. I should make sure I'm using the right terms here.

Jon:

This is the diagram where there's a planning layer, and then there are three boxes, which are basically like, formulate a plan, I can't remember what the exact words are. And then there's an execution phase, and then there are the three boxes going up, where you perceive what you just did

Matt:

Yeah, so,

Jon:

and check that you did the right thing.

Matt:

So the layers are: you've got the plan layer, the specify layer, and the perform layer when you're performing an action. And he broadly breaks errors down into two subcategories: there are mistakes, and there are slips.

Jon:

So a slip would be that you made the right plan, but you did the wrong thing. Like, you said, I'm gonna make a coffee this morning, but you accidentally put your coffee cup back in the fridge instead of putting the milk back in the fridge. That's a slip. Whereas a mistake, on the other hand, is you make the wrong plan. You might perform the actions correctly, but your plan was wrong to begin with, so you wind up ultimately achieving the wrong outcome.

Matt:

This one feels a lot fuzzier to me, because let's say you have a misunderstanding of the state that the system is in. If you make a plan that's based on incorrect information, is that the plan being wrong?

Jon:

Yeah, I guess it would be. So that would be a mistake.

Matt:

Right. And so basically, having an incorrect world model that you use to construct a plan is kind of... because I guess my point is, let's assume that world model is true. Maybe the plan is great, you know what I mean? But the information on which the plan is based is incorrect. So maybe I'm splitting hairs here, but...

Jon:

No, no, it's an interesting distinction. I don't think he talks about it, because I feel like in the book it's suggested that you made the wrong plan, and he doesn't really give the reasoning for why you made the wrong plan.

Matt:

Right.

Jon:

If you misunderstand the state of the world and make the wrong plan, versus the state of the world that's perceivable to you leads you to making the wrong plan, those would both be mistakes. They're subtly different, but both mistakes. And I feel like in the postmortem, the latter looks better for you, where you can say, oh, this tooling showed this value, therefore the tooling was providing me the incorrect result, which caused me to create the wrong plan. You know, that's just slightly...

Matt:

I almost don't think people would ever talk about that first kind, where basically it's like, well, everything Joe saw was correct...

Jon:

Yeah.

Matt:

He just decided to do something absolutely bonkers. You're never gonna see that in a postmortem. It's gonna be like, Joe did X, Y, and Z, and that caused this other thing. We're not gonna be like, it was insane that Joe decided to do X, Y, and Z.

Jon:

Which is important. Like we've talked about before, blameless culture is really, really significant in our industry. And it's also really important to this whole exercise, because maybe Joe did do something insane, but there might be problems you can fix. And this almost goes back to the whole guardrail thing, where maybe there's a guardrail you can add. Maybe there's a modal that you should pop up that says, you're about to delete production, do you want to continue? And you have to click yes or no. And yeah, that might annoy a power user, but it might prevent another postmortem. So yeah, it's all trade-offs.
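
A minimal sketch of the kind of confirmation guardrail Jon describes here, in Python; the `delete_production` operation and the exact wording are hypothetical, purely for illustration:

```python
def confirm_destructive_action(description: str) -> bool:
    """Make the operator type an explicit confirmation before proceeding."""
    answer = input(f"WARNING: {description}. Type 'yes' to continue: ")
    return answer.strip().lower() == "yes"


def delete_production() -> None:
    # Hypothetical destructive operation, named only for this example.
    print("...deleting production...")


if confirm_destructive_action("you are about to delete production"):
    delete_production()
else:
    print("Aborted.")
```

Requiring a typed word instead of a single click is one way to make the guardrail deliberate without making it impossible for a power user to get past.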

Matt:

It's funny, because I do have this cognitive dissonance where I do believe in those checks, but I think they can exist at different levels. I think a good example of this is the rename-a-file-extension warning in macOS. I just don't believe that I would ever accidentally touch the file extension unless I really wanted to. And also, I feel like the worst-case scenario there for me is not that bad. Even if I do get that wrong, I have faith in myself that I can get myself out of it. So that's the kind of warning that is an irritant to me. But I do believe in things like The Checklist Manifesto, where...

Jon:

Oh, love checklists.

Matt:

It's so funny, because there's this tiny little bit of structure at our company, which is the default pull request description. If you're not familiar: if you use GitHub for your code, whenever you're making a change you create a pull request, and there's this default description that gets put in there. And in that default description there's a section that says Tests, with one default bullet point that gets put in there, like, oh, did you check this thing? First off, just the fact that there's a section that says Tests has saved me so many times. Even when you have a very short checklist, I feel like it can be very useful.
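
As an aside, GitHub reads a repository's default pull request description from a template file such as `.github/PULL_REQUEST_TEMPLATE.md`. The sections below are a hypothetical reconstruction of the kind of template Matt describes, not his company's actual one:

```markdown
## Description

<!-- What does this change do, and why? -->

## Tests

- [ ] Verified this change with a test (new, updated, or manual; say which)
```

Even a single checkbox under a Tests heading acts as the short checklist Matt is crediting here: it forces the question before the change ships.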

Jon:

Very useful, very powerful. Yeah, even for processes where you know what you're doing, because if there's a process that's more than three steps, I will forget a step. There's just no doubt in my mind that I'll miss a step. And that's why I use checklists all the time. I love them. Like, we do these things called production reviews, and we wrote up this whole document on how to do a production review. There are all these things you need to look at, like monitoring and alerting and all those things. And I literally created this checklist that people can just copy into a doc or something and go through, and I think it makes the whole process way easier. I wanted to mention something that you brought up earlier, which is all these cushions that we add to software, you know, guardrails to prevent people from accidentally doing the wrong thing. He mentions this thing in the book called deliberate violations.

Matt:

Hmm.

Jon:

Which I just wanted to talk about because I wanted to get your take on it. I feel like I see this all the time, where engineers design some amazing, let's say, permission system, or user and group system. I would say Linux and Unix are an example of this, where it's an extremely powerful system: you can set fine-grained permissions on all these facets of your software, and if you understand it, it's very powerful. But what I see a lot of engineers do is just run sudo all over the place, not setting proper groups on things and running commands as the group that should be doing them. Just running sudo all over the place. And, you know, there are times where you do need to run sudo. But I think it's a case where a guardrail was designed in such a way that it's hard to understand, so people are just constantly doing deliberate violations.

Matt:

Man, you've just awoken this dormant memory of mine from my first internship at Google. I have no idea what I was trying to do, but I was being restricted from doing it. And then I think I did something like sudo su, which is this next level of, okay, just give me everything. I want it all. Because sudo su gives you a root shell, so you can get yourself into all sorts of hilarious shenanigans. But I told my nearby wise elder, like, oh yeah, I solved this problem, I just did sudo su and whatever. And he's like, are you kidding? You just walked up and went, all right, boom. But all of my experience was on my own machine up until that point. And I think this exactly gets back to your point, where you haven't even internalized it as something you're not supposed to do, because everyone was doing it all of the time. In university you're just constantly going sudo blah, sudo blah, sudo blah. And I think that is a failure of design in the system, because if you constantly have to run sudo to do things, something is weird. That should be a very rare thing. I mean, I'm not using Linux much these days, but I don't feel like I have to run it as much. Maybe that's just because we've learned how to set things up so that you just have the permissions to do what you're supposed to be able to do.
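
A sketch of the setup Matt lands on at the end, in shell, with hypothetical group and directory names: instead of reflexively escalating (or grabbing a full root shell with `sudo su`), grant users standing permission for the task they actually need to do.

```sh
# The reflex: escalate to root for everything.
sudo su                        # opens a root shell; every mistake now runs with root powers

# The designed alternative: permission the task once, up front.
sudo groupadd deploy           # create a group for the people who deploy the app
sudo usermod -aG deploy alice  # add alice to it (takes effect at her next login)
sudo chgrp -R deploy /srv/app  # hand the app directory to the group
sudo chmod -R g+rwX /srv/app   # group members can read, write, and traverse it
```

After that, day-to-day work in /srv/app needs no sudo at all, which is the "you just have the permissions to do what you're supposed to" state Matt describes.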

Jon:

Yeah, I've seen this be true for other things too. This goes back to the whole trade-offs discussion. If you're designing a guardrail, you need to realize that if you design it poorly, people are gonna start skipping that guardrail. And they'll reach a point where it's kind of like the boy who cried wolf: they'll just completely ignore the guardrail. And in cases where that guardrail is there for a good reason, like someone's about to delete production, they'll just ignore it and they will delete production. So yeah, it's funny, because you might think the right idea is to just put all these guardrails in place, but that can backfire in pretty sinister ways.

Matt:

Well, right, and it's funny that you say sinister, because I was gonna say there's actually an even more sinister implementation of that. It's almost like deliberate deliberate violations, I guess, meta deliberate violations, where management is like, no one should be doing that, people should not be doing that, but we kind of need them to do it. So what we're gonna do is put a policy in place that you're not allowed to do that, and then put this weakest guardrail up in front of it that is just so easy to bypass.

Jon:

Yeah.

Matt:

So that there's this kind of facade of legal protection if something bad goes wrong. And then they can be like, all right, Jim Stevens bypassed the guardrail, and we're gonna fire him. You know? And he gets into this, where when you do these root-cause investigations, they're intentionally looking for someone to blame.

Jon:

Yeah, no, that's the thing. Management puts these things in place, which is basically a CYA for them, so that they can just blame the employee, which is sad. Well, what do you say we stop there? There's still a lot to talk about for this chapter, like the classification of slips and mistakes, but maybe we can cover that next time. Yeah. To be continued.

Matt:

Yeah. Tune in next fortnight for the exciting conclusion.

Jon:

Yeah, for the types of slips and types of mistakes.

Matt:

All right, well, I'll see you there, Jon.

Jon:

See you there.