All opinions are my own and do not necessarily reflect those of Novo Nordisk
For another nice take on this topic see Paul Knoepfler’s blog post here.
One of the sacred (can I say sacred in reference to something scientific?) tenets of the scientific method is reproducibility. If something is real and measurable, if it’s a fact of the material world, then the expectation is that the result should be reproducible by another experimenter using the methods described in the original report. One of the best-known (among physicists, anyway) examples of irreproducible data is the Valentine’s Day Magnetic Monopole detected by Blas Cabrera back in 1982. Great experimental data. Never repeated, and therefore viewed as insufficient proof for the existence of a magnetic monopole.
So it’s troubling that in the past few years there have been numerous stories about the lack of reproducibility of different scientific experiments. In biomedical science the number of reports on the difficulty of reproducing results has grown so large that the NIH has begun thinking about how to confirm and require reproducibility of some kinds of experimental results. Just a few days ago another field, that of psychological priming, saw the publication of an article reporting that the effects of “high-performance priming” could not be reproduced. This is another field undergoing serious questioning about whether and why results don’t reproduce, with commentary from such luminaries as Daniel Kahneman.
Oddly, my own perspective is that all sides are right. Those critical of the irreproducibility of results are right: for us to move forward in science we need to be able to reproduce findings, otherwise we’re just mucking about with phenomenology. But at the same time I’m quite sympathetic to those whose results can’t be reproduced. I see some parallels to drug development. As we try to push forward through fields like biomedical science or psychology, we have moved past the lower-hanging fruit and are now in the midst of much harder problems and more complicated experiments, which by their nature are harder to reproduce. Drug development is also having problems harvesting the higher-hanging fruit of better, more targeted and nuanced drugs. Assuming they exist.
What do I mean by saying that experiments are by their nature harder to reproduce? Let me give a trivial example from biology. I have known labs that, when they are running out of fetal calf serum (FCS), which is used in their cell culture experiments, will order several lots of FCS from a vendor, test them all, and then select one for their experiments. In other words, some bottles of FCS work and some don’t, because every lot of FCS is isolated from a specific group of cows, and calves are different from one another.
And then once the lab finds that optimal batch, they’ll order every single bottle the vendor has. So right off the bat one can see the problem with reproducibility. If a lot of FCS is deemed the best for a set of experiments, no independent lab can now truly reproduce the experimental protocol. The lab cornering that lot of FCS has a very clear, logical goal: provide internal reproducibility. But the scientific community as a whole might never be able to repeat any reported results if those results are intimately tied to that particular lot of FCS.
Why do labs use FCS as opposed to some very specific chemical growth formulation? Because they have to. Experiments with cells, even when grown in a dish under very controlled conditions, are difficult, and many cells don’t even grow, much less function, without FCS. And as we search for subtler readouts and behaviors in trying to dissect biology even further, the experiments approach the level of art, because the kinds of variable elements I’m describing aren’t limited to FCS; they pervade experimental design.
Similarly, experiments in priming, which I think are among the most interesting developments in behavioral psychology over the past several years, concern human behavior with all the randomness and uncontrollable elements that implies. It’s almost circular in a way. Psychological priming posits that different subtle inputs to people, like holding a cup of hot versus cold coffee, can influence how that person will later behave. Given this, trying to design an experiment that limits inputs only to the ones being tested is pretty much impossible, especially since we still have no comprehensive idea about what kinds of inputs do prime behavior, and whether those inputs are consistent and reproducible across cultures, genotypes, ages, etc.
So what do we do? I don’t know. I hope over time we’ll be able to build different ways of thinking about experiments, how to power them, and how to measure their outputs, so as to remove the uncertainties that lead to irreproducible results. Maybe the study of complexity will help and will allow us to discern signal among very noisy data. Maybe the ability to gather more and more data about our experiments will help as well. However it’s done, it’s good there’s attention being paid to this area, because we need to understand and try to eliminate the irreproducibility as much as possible. Although, I have to admit, it would be a shame to put the Journal of Irreproducible Results (JIR) out of business.