Imagine your doctor just informed you it’s time to get screened for colon cancer. She recommends a colonoscopy – a test to look inside your colon using a flexible tube with a miniature camera on the end. She explains that about one in every sixteen people develops colon cancer but most survive when it is caught early. Few survive if the cancer is detected late. She says colonoscopy is the best test to screen for cancer. Even if you do not have colon cancer, you might still have a colon polyp – this could be a forerunner of cancer, and colonoscopy will allow removal of the polyp before it becomes cancer.
You agree to get the test, even knowing it is inconvenient, expensive, requires you to consume a liquid diet for the better part of a day, and mandates ingesting a nasty “prep” medicine to purge your body from within. But you decide it’s worth it, so you call to schedule the test.
Consider these questions: Does it matter what time of day you get the test? Should you fight for the first appointment of the day? Or just take what they give you?
Maybe these are loaded questions. But what does time of day have to do with it? Isn’t a bad colonoscopist bad all day, and a good colonoscopist good all day?
We know that good truckers aren’t good all day. When truckers are wired up with nod detectors, cameras in the cab, and rumble detectors indicating an off-road adventure, we learn that they sometimes get sleepy and crash. Or, if they don’t crash, it’s often because they awakened just before they would have. It’s disquieting to watch footage of a trucker staving off sleep while wrestling a loaded 80,000-pound vehicle at seventy-five miles per hour. The problem is so serious that researchers performed a randomized controlled trial of caffeinated beverages versus no caffeine for truckers. The caffeine helped.
In the airline industry, most cockpit errors and air traffic control mishaps occur at the end of a work shift – a fact addressed by the Federal Aviation Administration through extensive work shift regulations. In anesthesia and surgery, operating room complications increase the longer the surgical team is at work. And doctors in training make more errors the longer they stay in the hospital. All of these professionals run the risk of making mistakes the longer they’re at work. The smartest, most adept, most skilled professionals can err. As the Institute of Medicine wrote in its seminal report about medical mistakes: “To err is human.”
Even parole judges – a presumably objective and stalwart group – yield strikingly disparate conclusions as the day goes on. In an extraordinary study conducted in Israel, investigators tracked parole rates as a function of time of day. The judges evidently felt generous after lunchtime and snacks, granting 65% of parole requests in the period following their nutrition breaks. But as the judges continued to work, the parole rate dropped; it eventually bottomed out at zero for the cases adjudicated towards the end of a shift.
Cognitive errors accrue during prolonged and repetitive activities. My colleagues and I figured this might happen during colonoscopy. There were several reasons to suspect this. For one, doctors usually perform colonoscopies repetitively during a prolonged work shift; this could lull them into a complacent trance. Colonoscopists are also pressured to complete as many procedures as possible in as little time as possible; this pays the bills, but might also lead to rushing. Also, distracting activities burgeon in the operating room as the day progresses, like incoming phone calls, nursing shift changes, and an accumulating “to do” list. In short, there are plenty of reasons to suspect that performance might drop as the day progresses.
We conducted a study to compare time of day and polyp detection. It turned out that polyps were common, but they became mysteriously less common as the day progressed. Early-morning procedures found more polyps than procedures conducted later in the day. Hour by hour, the doctors detected fewer and fewer polyps. This seemed like a case of pervasive under-calls.
We sent in our results for publication. The editors didn’t like it. They said we must have made an error in reporting all these errors, like failing to account for another key factor. For example, the quality of bowel preparation tends to be better early in the morning, so maybe that alone explained it. Or, people with a family history of cancer might get nervous and schedule themselves for early appointments; they are more likely to have polyps, so we might have detected more polyps in the morning only because anxious, high-risk people scheduled themselves for the morning. Or maybe the most experienced doctors worked in the morning and the junior doctors worked later in the day.
But the same thing happened no matter how we analyzed the data: later in the day meant fewer polyps. We accounted for the patients’ history of cancer and polyps, their family history, the prep medicine they received, the quality of their bowel preparation, the seniority of the doctors, and even the length of the procedure (we sat there and timed every procedure with a stopwatch). Nothing changed. No matter if we examined pre-cancerous polyps or completely benign polyps, the doctors found fewer polyps as the day progressed. It seemed incontrovertible that time of day mattered. The editors gave in and published our study.
The New York Times ran a story about the article. Before long, I started receiving facetious “thank you” emails from colleagues informing me their patients all wanted to be first in line. One doctor said: “I can only schedule one person first every day. Somebody has to be second!”
Other investigators repeated our study in different settings. They found the same thing. Whether in Ohio, Minnesota, or Wisconsin, colonoscopists detected fewer and fewer polyps as the day progressed.
We figured that if the effect were real then we could try to reverse it. One idea was to hold colonoscopists accountable to quality control standards. This would involve auditing every doctor to monitor performance across different times of day. We could also incentivize doctors with performance pay. We might even incentivize the nurses who watch the screen right alongside the doctors.
But these were onerous, expensive, and controversial solutions. A more reasonable option was to provide a simple reminder to be vigilant, like a well-crafted poster on the wall to shake doctors out of complacency. The vaunted Notre Dame football team, for example, has a sign in the locker room that says: Play Like a Champion Today. The players slap the sign as they spill out of the locker room and onto the field. Great stuff.
Beyond helping to score more touchdowns, informational posters also encourage people to take the stairs instead of escalators, promote more frequent hand washing in intensive care units, and increase use of influenza vaccinations in primary care clinics. Did we need to test a Play Like a Champion poster for colonoscopists? We created a simple poster and hung copies of it around the work areas (but just out of sight of the patients…). Here’s what the posters looked like:
What do you suppose happened when we hung up the posters?
Nothing. Nothing happened. The posters did nothing to reverse the trend; if anything, things got even worse. It was crazy. Here we had a group of accomplished doctors working in an academic unit and acutely aware of the problem – yet unable to do anything about it.
We sent our results in for publication. The editors didn’t like it. They said we must have made an error. Again. Maybe the poster wasn’t big enough. Maybe there weren’t enough of them. Maybe they were the wrong colors. Maybe we hung them on the wall crooked. Or maybe these were just terrible doctors.
The last concern was the most specious. Were the thirty-three participating doctors just no good? Were they just incompetent hacks? Not really. As a group, these doctors exceeded quality standards across every benchmark. Compared to national means, they detected more polyps, took more time per case, and found more cancer. They did their job. But time of day still mattered.
The editors eventually gave in and published our findings. The results were not sexy. Editors normally like positive results – like X Causes Y! Or, If You Take X, You Won’t Get Y! Positive titles sell copies. Our title was a big downer: Colonoscopy Yields Fewer Polyps as the Day Progresses Despite Using Social Influence Theory to Reverse the Trend.
So the posters didn’t work. There was something more fundamental going on – something intractable, or at least something resistant to posters. We inspected the data more closely. This time we monitored individual doctors, and the story became even more interesting. Although most doctors detected fewer polyps as the day progressed, some didn’t. Some detected more and more polyps as the day went on. Some detected the same amount regardless of the time of day. Even among those who detected fewer, there was a gradient – some fell off quickly, some steadily, and some just barely.
Taken as a whole, the group of doctors detected fewer polyps as the day progressed. Considered as individuals, however, some were immune to the effect of start time while others were strongly influenced by start time. Some colonoscopists are better threat detectors than others. But how could a doctor under-call so many polyps? How could this happen in the first place? To understand this quirk of intuition, we turn our attention to spotting (and missing) gorillas.
Looking for Gorillas (and Polyps)
Is your colonoscopist the type who detects fewer polyps as the day goes on? Is your colonoscopist able to even see the polyps in the first place? Does he have the patience, aptitude, knowledge, and perceptive abilities to detect and remove polyps – all of them, all day long?
Does he have the intuition, even? How do the best colonoscopists get the spider sense that a pre-cancerous polyp is nearby? How do they detect the threat when others miss it completely? If you have a polyp, you want an accurate colonoscopist who detects and removes the threat. You want a warranty on your colon.
Good colonoscopists combine experience with intuition to make the right call. I’ve trained scores of colonoscopists, so I have witnessed first-hand that they are not created equal. Some have an uncanny ability to detect threats. Others don’t. Some stay locked in, focused, and constantly observing. Others are looking at the screen, but have a distracted mind. Some know to inspect a fold of colon tissue because it seems oddly misshapen and then find a cancer lurking around the corner. Others drive right past the same fold in the colon, completely oblivious to the tumor lying in wait. Some notice a subtle discoloration or blotch while others are colorblind but don’t even know it. Some spend an extra moment to clean out a segment of bowel to reveal a suspicious lesion. Others are hungry, need to get lunch, and rush the procedure without washing out the area.
Don’t get me wrong. Most doctors are highly competent. But even among competent doctors, there are still subtle variations in aptitude. There are differences. I see these differences first-hand as an instructor. The time-of-day studies show that colonoscopists are imperfect and some are prone to missing polyps. A perfectly automated colonoscopy robot would be unfazed by time of day; it would find and remove every threat, every time.
But a perfectly automated threat detector would not be human. Humans make errors. For colonoscopists, the major risk is inattentional blindness: the failure to perceive what the eyes are literally taking in, simply because the brain doesn’t process the visual field. A colonoscopist might stare directly at a polyp, but just not see it.
How is this possible? It’s easily possible. The most famous example of inattentional blindness is the “invisible gorilla,” an ingenious experiment developed by Christopher Chabris and Daniel Simons at Harvard University. In the experiment, you watch a video of six people passing a basketball back and forth. Three of the players are wearing white shirts and three are wearing black shirts. The task is simple: count how many times the white-shirted players pass the ball. The answer is fifteen times. Most people get that right. But about half the viewers completely miss the tall figure in a gorilla suit who traipses into the middle of the scene, vigorously chest beats, and nonchalantly walks off. It’s an astounding demonstration of our inability to see objects right before our eyes, especially when engaged in complex tasks. Our conscious self becomes easily distracted and misses things – even big, hairy, chest-beating gorillas.
You might feel cheated out of the video now that you’ve heard the punch line (assuming you haven’t already seen this widely-publicized experiment). If so, watch this video instead. Despite knowing the experiment is trying to fool you, you may still get fooled. I was.
The invisible gorilla experiment isn’t really fair. After all, we’re not prepared to see a gorilla, we weren’t asked to see a gorilla, and we would never expect to see a gorilla. Furthermore, we’re not necessarily trained to sharpen our attention to track dynamic visual information in the face of a complex task. In contrast, some professionals, like radiologists, are trained to inspect visual data carefully, find intricate patterns, and make life-or-death decisions on the basis of their acuity. Would radiologists miss the gorilla?
Another team at Harvard decided to test radiologists. They took a photo of a man in a gorilla suit, shrunk it down to the size of a matchbook, and then spliced the mini ape into real computerized tomography (CT) scans being reviewed by radiologists. Moreover, the researchers instructed the man in the gorilla suit to “shake his hand angrily” for the photo. This was not just a mini gorilla – it was an angry mini gorilla. Can you see the gorilla in this CT scan?
Alveolar Apes: Do you see the gorilla?
Surely, trained radiologists – professional lookers – wouldn’t miss a fierce hairy creature stowed between the heart and rib cage, suspended in a black wedge of lung tissue off to the right (do you see it yet? If not, a clue will come later). Or would they miss it?
Right. They would. In fact, 83% of the radiologists completely missed the gorilla hiding in plain sight. Eighty-three percent! Surely the radiologists saw the gorilla (just as you are seeing it now, even if you can’t “see” it yet). Their eyes met the photons emanating from the angry gorilla man – but their brains did not register the information reaching their eyes. There was an utter disconnect. The radiologists’ brains were prepared to see what they were prepared to see, and not much else. That’s scary stuff.
Like radiologists, colonoscopists are trained to look. These doctors spend three years in advanced training learning how to look at the colon. They become very competent looking for polyps, cancers, and other lumps and bumps. Despite that training, the time-of-day studies reveal that colonoscopists “see” less as the day progresses. These doctors are not purposefully leaving behind polyps – that would be unprofessional and unethical. More likely, they just don’t see the polyps, despite seeing them. As a result, there are all sorts of new devices, cameras, lenses, and even dye sprays to help colonoscopists see what is before their eyes. And even these technologies often fail to help.
Some colonoscopists see more than others, yet they all have a pair of working eyes.
The Illusion of Confidence
In a perfect world, we would have a scorecard for distinguishing among colonoscopists. Not only would an accurate scorecard provide the first step to help low-performing colonoscopists, but it would also help inform consumers about how to select among doctors.
To figure this out, we might audit each colonoscopist by posing test cases. Some cases would have pre-cancerous polyps, some benign polyps, and some no polyps at all. We would know the truth in advance. Perhaps a top expert would perform a colonoscopy first and then the test doctor would come in second, blinded to the results of the expert. Or we could use a virtual colon with a computer simulation (this technology actually exists). Whatever the technique, each doctor would perform the test and then judge the likelihood of a pre-cancerous polyp. If we repeated this hundreds of times we would learn the true accuracy of each colonoscopist at finding polyps.
But nobody has performed this study; probably nobody will ever perform this study. Most doctors will not subject themselves to hundreds of test cases. Many will altogether dismiss the notion of a scorecard. After all, doctors didn’t spend years in school and postgraduate training to have some research team assign them a grade. Grades are for school, not for colonoscopists. And most doctors think they’re great. Period. Just ask your doctor: “How good are you?” That’s considered an insult. And if you get an answer, it will probably be: “Really good.” Or, “I’ve been doing this for years, don’t worry.”
This overconfidence is pervasive not only among doctors, but among people generally. Decades of research show that we’re hardwired to believe we know more than we actually do and that our abilities are greater than they actually are. It feels good to be certain. It looks good to be certain. Certainty sells. But we tend to overestimate our certainty – we constantly blow it.
The invisible gorilla experiment is a telling example: not only do half the observers miss the gorilla, but also most cannot believe they missed the gorilla in the first place. People are stupefied when they review the tape in slow motion and see the hairy beast meander into the scene. As Daniel Kahneman writes in Thinking, Fast and Slow: “The gorilla study illustrates two important facts about our minds: we can be blind to the obvious, and we are also blind to our blindness.” The former is a curious fact of cognitive psychology; the latter is a blow to our collective self-perception and a perilous trait undermining accurate threat detection. There are hundreds of other experiments showing the same thing – that we are overconfident… way overconfident. This is bad news for our gut feelings.
It turns out, however, that the top experts in every field are exactly those who avoid overconfidence. A telling example comes from the world of competitive chess, where a standard scorecard – the Elo scale – explicitly rates players. The Elo scale is a precise reflection of a player’s skill, so it allows us to rank-order competitors from most to least talented. The same researchers who brought us the invisible gorilla also asked chess players to self-evaluate their abilities. They found an inverse relationship between objectively rated and self-rated skill. The lower the Elo score, the higher the player’s self-assessment of skill. And vice versa: the most skilled players were the most humble about their skills. In fact, the highest rated players tended to dismiss their Elo score, instead claiming the score overrated their abilities.
There are many other examples of this so-called illusion of confidence. In case after case, when researchers apply an objective scorecard of ability, they observe an inverse relationship between actual and self-perceived performance. In a review of studies exploring this phenomenon, aptly titled “Unskilled and unaware of it: How difficulties in recognizing one’s own incompetence lead to inflated self-assessments,” researchers described how people scoring in the bottom quartile on tests of humor, grammar, and logic wildly overestimate their actual performance. For example, those in the twelfth percentile on various cognitive or intellectual scorecards rate themselves, on average, to be in the sixty-second percentile of ability. As true ability rises, self-assessments become more accurate.
What about in medicine? In a novel study performed at the University of Toronto, researchers asked family practice residents to interview an actor posing as a standardized patient. A panel of experienced judges evaluated each doctor and assigned a performance score that measured knowledge, rapport with the patient, emotional control, flexibility in conducting the interview, and questioning skills. Then the researchers asked the doctors to self-assess their performance for comparison. Just as with the chess players, the doctors in the upper tier of performance tended to underestimate their skill, whereas those in the lower tier overestimated their skill. Even more interesting, after the doctors provided their self-assessment, the research team showed a set of videos of other doctors interacting with the same standardized patient. Some of the videos showed very adept interviewers, while others showed nearly incompetent interviewers. Then the investigators asked the residents to re-assess their performance after viewing the videos. The residents in the highest performance group adjusted their self-assessment upwards to be in line with the expert judges after watching the benchmark videos. In contrast, the residents in the lowest group were inconsistent: some adjusted their self-inflated scores even higher after watching the videos. In short, the worst doctors didn’t seem to know they were the worst. The best were modest about their performance.
The Toronto study jibes with my own experience that physicians who are most humble about their ability tend to be the most experienced (there are certainly exceptions in both directions). One of my most respected senior colleagues, Hartley Cohen, when faced with a confusing patient, is famous for exclaiming in his regal South African accent: “I have absolutely no idea, whatsoever, what is going on here!” Students initially question his ability, but quickly realize he knows more than anyone, has seen more than anyone, and is in the best position to raise the white flag of defeat with confidence. Admitting defeat and acknowledging ignorance can be signs of bravery, knowledge, and skill.
But experience aside, we are all overconfident in one form or another, even if there are gradations of overconfidence. Our intuition makes errors of commission and omission. It leads us astray yet convinces us we’re on track. It tells stories that sound coherent and plausible, but are fanciful and fictitious. And it makes us feel confident when, in fact, we should be profoundly shaken by our lack of knowledge and insight.
– Commentary by Dr. Brennan Spiegel