An earlier version of this piece appeared on the Timmerman Report.
Note added 23Feb2016: I also realized that I was highly influenced by Regina Nuzzo’s piece in Nature on biases in scientific research (and solutions), which has been nicely translated into comic form here.
Some people believe biology is facing a “Reproducibility Crisis.” Reports out of industry and academia have pointed to difficulty in replicating published experiments, and scholars of science have even suggested that a majority of published findings may not be true. Even if you don’t think the lack of study replication has risen to the crisis point, what is clear is that lots of experiments and analyses in the literature are hard or sometimes impossible to repeat. I tend to take the view that in general people try their best and that biology is just inherently messy, with lots of variables we can’t control for because we don’t even know they exist. Or we perform experiments that have been so carefully calibrated for a specific environment that they’re successful only in that time and place, and sometimes only with that set of hands. On top of that, there are possible holes in how we train scientists, external pressures to publish or perish, and ever-changing technology.
Still, to keep biomedical research pushing ahead, we need to think about how to bring greater experimental consistency and rigor to the scientific enterprise. A number of people have made thoughtful proposals. Some have called for a clearer and much more rewarding pathway for reporting negative results. Others have created replication consortia to attempt confirmation of key experiments in an orderly and efficient way. I’m impressed by the folks at Retraction Watch and PubPeer who, respectively, call attention to retracted work and provide a forum for commenting on published work, encouraging rigorous, continual review of the published literature. The idea that publication doesn’t immunize research from further scrutiny appeals to me. Still others have called for teaching scientists how to use statistics with greater skill, appropriateness, and nuance. To paraphrase Inigo Montoya in The Princess Bride, “You keep using a p-value cutoff of 0.05. I do not think it means what you think it means.”
To these ideas, I’d like to throw out another thought, rooted in behavioral economics and our growing understanding of cognitive biases: would it help if basic research took a lesson from clinical trials and introduced blinding into our experiments?
In the biopharma industry, where incorrect results and interpretations can have a great financial impact, this seems like it could be a useful approach, and the incentives are aligned to make it happen. Let me be clear up front that I’m talking about key hypothesis-testing experiments. Coming from a background in transcriptomics (now and forever referred to as “fishing expeditions”) I know that many kinds of experiments can’t have predetermined decision-making criteria. Discovery research is hypothesis-generating, and in that class of experiments the emphasis is on making sure the experiment itself was performed correctly.
I’d also exclude other laboratory work that I would describe as “tool-building.” Making recombinant DNA constructs, learning a protocol, extracting DNA—these are all important components of lab work, and necessary for the practice of science, but are not themselves experiments.
In the case of hypothesis-testing experiments, however, and especially experiments that will be used in decision-making, blinding could help us avoid some of the pitfalls of reasoning that social sciences and economics researchers have identified as they’ve developed the field of behavioral economics—essentially, the study of why and how humans are predictably irrational in certain ways. Two very damaging and challenging pitfalls of thinking are motivated reasoning and confirmation bias.
Motivated reasoning is simply how people tend to argue from a non-neutral position based on their personal motivations, often without conscious intent. Anyone who has watched year-end decision making at a company or division or department whose bonuses rely on achieving milestones will know what I’m talking about. A project that was unlikely at best back in March suddenly becomes grudgingly acceptable once it’s clear the only way to meet a goal is by letting that project advance. This problem can, I believe, be a particularly big risk in biopharma because the people involved are very, very smart. As a project manager friend once told me, “I can go to the literature for any project in our pipeline and I can find evidence for why that project should absolutely not move forward, and I can also find evidence for why that project is the best thing since Viagra®.”
And that brings me to the other problem: confirmation bias. This is our tendency to listen to and believe evidence that agrees with our already established notions. I think recent debates about global warming have provided a terrific example of this. It bears repeating that the Republican Party is the only major political party in the world that, as a matter of doctrine, does not believe climate change exists. The most striking thing about the arguments deniers put forward for why global warming isn’t happening is how they ignore or dismiss the vast majority of data because it doesn’t agree with their belief that global warming is a hoax and in fact is a world-wide plot by climate scientists and political opponents.
To which parenthetically I’ll just say: 1) as a member of the global scientific illuminati I have yet to receive my check in the mail, and 2) seriously, a global coalition of scientists who agree to work in solidarity, conformity and secrecy for years? These people really don’t know scientists.
That’s not to say scientists are immune to this kind of thinking! In scientific research, confirmation bias and motivated reasoning shoulder their way into the room and can mess with our interpretation of experiments and results. Get a result that doesn’t fit the expectation and many times something in the experimental setup gets blamed. The cells didn’t look right. The reagents might have gotten contaminated. That donor sometimes doesn’t work. This is motivated reasoning that often leads to the experiment being repeated. And then if the experimental result is what’s expected the next time around, great! We can move on. Confirmation of what we expected/hoped for.
I was once on a project team in which we were trying to confirm the biological activity of a protein that had been reported to have therapeutic activity in certain disease conditions. An activity assay was run a half-dozen times and repeatedly came up negative. And yet the experiment kept being repeated. Sometimes questions were raised about the experimental conditions. The department was hoping for good progress: the publications looked extremely convincing, and getting a good result would be a feather in everyone’s cap. In other words, there was a lot of motivation and incentive to get a good result. At last a positive result was obtained, and to the credit of our team leader, that result was actually viewed with some skepticism. However, there was also pressure to move onward.1
Blinding would help us avoid these pitfalls by forcing us to look at data impartially. The process could be something as simple as having a person whose job is to take racks of tubes full of reagents (experimental samples and controls) and plates of cells or cages of animals, and blinding them so that the people performing the experiment don’t know what’s in each tube, don’t know which animal is getting what treatment, don’t know which samples are where on the dosage curve.
The key is entered into a secure database, and at the same time, the experimenter is asked to pre-specify her criteria for validation of the hypothesis, as well as her analysis plan. How will replicates be combined and significance tested? What level of activity is necessary to say the hypothesis is successful? How about to provide positive data for hitting a project milestone? And immediately after an experiment has been completed, but before any results are unblinded or even calculated, the experimenter can also log any problems with cells, reagents, animals, whatever, that cropped up during the experimental process.
Once all that is done, the data can be assembled based on the analysis plan (also, preferably, by another party) and put into a presentation format. I would suggest the data remain blinded at this stage. The experimenter, and anyone else interested in the experimental result, can take a look and log their impressions of the data. Strong trend? Clear result? Questionable evidence that anything has had an effect? Or something in between? Only after all of that has been done is the experiment unblinded.
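To make that process concrete, here’s a minimal sketch of what the bookkeeping could look like. It’s written in Python, assumes a simple file-based workflow rather than any particular database or LIMS, and every function and file name in it is illustrative rather than a real tool; the design point is only that the key stays with a coordinator and can’t be opened until the analysis plan is on record.

```python
# A minimal sketch, assuming a file-based workflow; all names are illustrative.
import csv
import json
import os
import uuid

def blind_samples(sample_ids, key_path="blinding_key.csv"):
    """Map each real sample label to a random code; only the coordinator keeps the key."""
    key = {sid: uuid.uuid4().hex[:8] for sid in sample_ids}
    with open(key_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["sample_id", "blinded_code"])
        writer.writerows(key.items())
    # The experimenter sees only the codes, in an order unrelated to treatment.
    return sorted(key.values())

def preregister(criteria, plan_path="analysis_plan.json"):
    """Record the decision criteria and analysis plan before any data are seen."""
    with open(plan_path, "w") as f:
        json.dump(criteria, f, indent=2)

def unblind(key_path="blinding_key.csv", plan_path="analysis_plan.json"):
    """Refuse to reveal the key until an analysis plan has been filed."""
    if not os.path.exists(plan_path):
        raise RuntimeError("Pre-register your criteria before unblinding.")
    with open(key_path) as f:
        return {row["blinded_code"]: row["sample_id"] for row in csv.DictReader(f)}

# Example: the coordinator blinds treated and control samples together,
# the experimenter files her criteria, and only then can the key be read.
codes = blind_samples(["drug_10mg_rep1", "drug_10mg_rep2", "vehicle_rep1", "vehicle_rep2"])
preregister({"test": "two-sided t-test", "alpha": 0.05, "min_fold_change": 2.0})
print(unblind())
```

However it’s implemented, the essential features are the same: the mapping from real labels to codes lives with someone other than the experimenter, and the pre-specified criteria are on record before anyone calculates a result.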
Well. I can hear right now the scientists screaming that basic research isn’t the place for this. A blinding approach is unnecessarily cumbersome, and besides, scientists are trained—trained!—not to let emotions and other external factors sway their judgment. And I get that. We are trained to try to evaluate data impartially and rationally.
And at the same time I’d respond we’re also human. Research on behavioral economics and cognitive biases has produced a fascinating body of knowledge about how our brains, shaped by evolution and experience, sometimes move us to act in ways that depart from rationality and impartiality. If that weren’t the case, we wouldn’t need the double-blinded placebo controlled clinical trial in the first place. That structure is not there solely to prevent bias on the part of the patients. It’s also there for the investigators. Because we know that despite the best efforts of very intelligent, dedicated and trained scientists and medical workers, sometimes people will see what they want to see, not what’s really there.
I think, ultimately, the test would be having a company adopt a process like this and see what happens. Maybe take a couple of research units and ask them to do blinding, and compare how the science progresses versus others doing basic research in the conventional way. I’d love to see the results. The recent announcement by PLOS Biology that they would begin publishing meta-research—research on the process of research—as a way to look for better ways to do science suggests there is interest and an audience for experiments in how we do science.
On the other hand, maybe no company will want to take the risk, viewing blinding as providing a minor potential benefit at best. Interestingly enough, though, companies often do engage in blinding, but only externally, when providing compounds to an academic collaborator or a CRO. But I do think looking for solutions to reduce the problem of replication, given the money and time companies put into research, is an important thing to consider as drug development searches for every efficiency it can find.
1By the way, this is an example of another one of those tricks that prey upon cognitive biases: anecdotes. Our brains are geared to listen to stories, and stories have a disproportionate amount of influence on whether we find an argument believable. This is why politicians always tell stories. Whenever you hear a story, that’s a good sign to ramp up your critical thinking.
Kyle, You’ve very nicely framed the many arguments here, and I don’t disagree with anything you’ve said. The biggest problems I see with implementing a blinding plan are cost and the human element. If Jim is going to blindly work on John’s experiment, then this would take away time from other things he had on his plate to do. How would the extra work get done for the same price? When I was a grad student, I injected large quantities of mice at a time, which often took hours. If this were blinded, who would have injected my mice for me? And suppose Jim doesn’t get along with John, and he is worried John will somehow ruin his experiment to get back at him for some perceived slight in the past. This happened to me as an undergrad when the bottom half of a critical test tube disappeared from a cabinet where it was stored. The challenges are large, the solutions difficult to come by. All in all, a good idea to include blinding if practical, but this is often not the case.
Hi Stewart,
I agree that cost is a big potential issue. Like a lot of things, though, I wonder how much would be saved in time and person-hours by doing this. That’s one of the elements I would like to model at some point: how many false leads are pursued that would be prevented by doing more blinding, and therefore what is the price point at which blinding becomes cost effective.
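For what it’s worth, the crudest version of that model might look something like the sketch below. Every number in it is a placeholder I’ve made up purely for illustration; the idea is just that blinding becomes cost effective once its added cost per experiment falls below the expected cost of the false leads it prevents.

```python
# A hypothetical back-of-envelope model; all parameters are made-up placeholders.

def breakeven_blinding_cost(p_false_lead, fraction_prevented, cost_per_false_lead):
    """Maximum added cost per experiment at which blinding still breaks even."""
    return p_false_lead * fraction_prevented * cost_per_false_lead

# If 10% of key experiments spawn a false lead, blinding prevents half of those,
# and chasing a false lead costs $50,000, blinding breaks even at $2,500 per experiment.
print(breakeven_blinding_cost(0.10, 0.5, 50_000))
```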
The issue is the up-front costs. Even though blinding may help with the reproducibility issue, this will take time to figure out, and folks running the experiments will balk at spending the extra money up front since they won’t know if it will be worth the time and effort. And proving that the blinding helps will also be difficult, since we don’t know what the existing success/failure ratio is. How much of a change in this ratio would need to be seen in order to conclude that the blinding was of value? Blinding is a solid idea in theory, but getting it into practice will be tough. All of the scientists doing the experiments will assume it’s those sloppy other scientists, not themselves, who are plagued by biases and unwarranted justifications.