Should Basic Lab Experiments Be Blinded to Chip Away at the Reproducibility Problem?

An earlier version of this piece appeared on the Timmerman Report.

Note added 23Feb2016: Also realized that I was highly influenced by Regina Nuzzo’s piece on biases in scientific research (and solutions) in Nature, which has been nicely translated to comic form here.

Some people believe biology is facing a “Reproducibility Crisis.” Reports out of industry and academia have pointed to difficulty in replicating published experiments, and scholars of science have even suggested that a majority of published studies may not be true. Even if you don’t think the lack of study replication has risen to the crisis point, what is clear is that lots of experiments and analyses in the literature are hard or sometimes impossible to repeat. I tend to take the view that in general people try their best and that biology is just inherently messy, with lots of variables we can’t control for because we don’t even know they exist. Or, we perform experiments that have been so carefully calibrated for a specific environment that they’re successful only in that time and place, and sometimes even just with that set of hands. On top of that, there are possible holes in how we train scientists, external pressures to publish or perish, and ever-changing technology.

Still, to keep biomedical research pushing ahead, we need to think about how to bring greater experimental consistency and rigor to the scientific enterprise. A number of people have made thoughtful proposals. Some have called for a clearer and much more rewarding pathway for reporting negative results. Others have created replication consortia to attempt confirmation of key experiments in an orderly and efficient way. I’m impressed by the folks at Retraction Watch and PubPeer who, respectively, call attention to retracted work and provide a forum for commenting on published work. That encourages rigorous, continual review of the published literature. The idea that publication doesn’t immunize research from further scrutiny appeals to me. Still others have called for teaching scientists how to use statistics with greater skill, appropriateness, and nuance. To paraphrase Inigo Montoya in The Princess Bride, “You keep using a p-value cutoff of 0.05. I do not think it means what you think it means.”

To these ideas, I’d like to throw out another thought rooted in behavioral economics and our growing understanding of cognitive biases. Would it help basic research to take a lesson from clinical trials and introduce blinding in our experiments?


Baseball, regression to the mean, and avoiding potential clinical trial biases

This post originally appeared on The Timmerman Report. You should check out the TR.

It’s baseball season. Which means it’s fantasy baseball season. Which means I have to keep reminding myself that, even though it’s already been a month and a half, that’s still a pretty short time in the long rhythm of the season and every performance has to be viewed with skepticism. Ryan Zimmerman sporting a 0.293 On Base Percentage (OBP)? He’s not likely to end up there. On the other hand, Jake Odorizzi with an Earned Run Average (ERA) less than 2.10? He’s good, but not that good. I try to avoid making trades in the first few months (although with several players on my team on the Disabled List, I may have to break my own rule) because I know that in small samples, big fluctuations in statistical performance don’t really tell us much about actual player talent.

One of the big lessons I’ve learned from following baseball and the revolution in sports analytics is that one of the most powerful forces in player performance is regression to the mean. This is the tendency for most outliers, over the course of repeated measurements, to move toward the mean of both individual and population-wide performance levels. There’s nothing magical, just simple statistical truth.
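To make that concrete, here is a minimal simulation sketch. Everything in it is an assumption for illustration: the league-wide true OBP of .320, the spread of talent, the plate appearance counts, and the function names are all hypothetical, and real plate appearances are not independent coin flips. It picks the hottest 10% of hitters after a short early stretch and checks how the same players do over the rest of the season.

```python
import random
import statistics


def observed_obp(true_obp, plate_appearances, rng):
    """Simulate an observed OBP: each plate appearance reaches base
    independently with probability true_obp (a simplifying assumption)."""
    times_on_base = sum(
        1 for _ in range(plate_appearances) if rng.random() < true_obp
    )
    return times_on_base / plate_appearances


def regression_demo(n_players=300, early_pa=150, later_pa=450, seed=42):
    """Compare early-season leaders with their own later performance."""
    rng = random.Random(seed)
    # Hypothetical talent distribution: true OBP centered on .320.
    talents = [rng.gauss(0.320, 0.020) for _ in range(n_players)]
    # Two independent samples drawn from the same underlying talent.
    early = [observed_obp(t, early_pa, rng) for t in talents]
    later = [observed_obp(t, later_pa, rng) for t in talents]
    # The "hot starters": top 10% of players ranked by early OBP.
    ranked = sorted(range(n_players), key=lambda i: early[i], reverse=True)
    hot = ranked[: n_players // 10]
    return {
        "league_early": statistics.mean(early),
        "hot_early": statistics.mean([early[i] for i in hot]),
        "hot_later": statistics.mean([later[i] for i in hot]),
    }


if __name__ == "__main__":
    for name, value in regression_demo().items():
        print(f"{name}: {value:.3f}")
```

Under these assumptions the hot starters are still somewhat better than average later on, but their later OBP sits much closer to the league mean than their early OBP did. Nobody got worse; the early number was part talent and part luck, and only the talent carried over.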

And as I lift my head up from ESPN sports and look around, I’ve started to wonder if regression to the mean might be affecting another interest of mine, and not for the better. I wonder if a lack of understanding of regression to the mean might be a problem in our search for ways to reach better health.

Baseball, Bayes, Fisher and the problem of the well-trained mind

One of the neat things about the people in the baseball research community is how willing many of them are to continually question the status quo. Maybe it’s because sabermetrics is itself a relatively new field, and so there’s a humility there. Assumptions always, always need to be questioned.

Case in point: a great post by Ken Arneson entitled “10 things I believe about baseball without evidence.” He uses the latest failure of the Oakland A’s in the recent MLB playoffs to highlight areas of baseball we still don’t understand, and for which we may not even be asking the right questions. Why, for example, haven’t the A’s advanced to the World Series for decades despite fielding good and often great teams? Yes, there’s luck and randomness, but at some point the weight of the evidence encourages you to take a second look. Otherwise, you become as dogmatic as those who still point to RBIs as the measure of the quality of a baseball batter. Which they are not.

One of the thought-provoking things Arneson brings up is the question of whether the tools we use shape the way we study phenomena–really, the way we think–and therefore unconsciously limit the kinds of questions we choose to ask. His example is the use of SQL in creating queries, and the assumption built into that data model that objects within a SQL database are individual events with no precedence or dependence upon others. And yet, as he points out, the act of hitting a baseball is an ongoing dialog between pitcher and batter. Prior events, we believe, have a strong influence on the outcome. Arneson draws an analogy to linguistic relativity, the hypothesis that the language a person speaks influences aspects of her cognition.

So let me examine this concept in the context of another area of inquiry–biological research–and ask whether something similar might be affecting (and limiting) the kinds of experiments we do and the questions we ask.


Why Derek Jeter being a lousy defensive shortstop gives me hope for innovation in industry

All opinions are my own and do not necessarily reflect those of Novo Nordisk

Hat tip to Jeff Sullivan of for the article that sparked this idea.

It used to be we knew what a good defender was in baseball.  And Derek Jeter was a good defender.  He had balletic grace, he scooped up balls and threw them with flair and panache, with an all-but-patented jump-throw that made announcers gush and coaches shake their heads in awe.  He was the complete package, a player who could hit, field, throw and lead, a first-ballot Hall of Famer.

Except that, when you look closely, it turns out his defense is lousy.

Defense used to be measured (still is, by many) via the eye test.  How does a player look when catching balls in play?  And this was backed up by the statistic of fielding percentage.  How many balls did a player field cleanly?  It makes intuitive sense.  The more balls a player fields correctly, why, the better defender he must be, right?

Except that’s only part of defense.  It’s nice if a player can catch a ball well.  But what about balls that get by him?  In the last decade or so, baseball analysts began studying the concept of range.  All things being equal, the realization came, range is actually more important than errors or how a player looks.  It’s one thing to catch everything that gets to within a few steps to the shortstop’s right and left.  It’s another thing entirely to catch 98% of everything spanning the third baseman’s left pocket to the grass on the far side of second base.  When you consider the huge number of balls that are hit in the vicinity of the shortstop every season, and the relative value of a hit versus an out, those extra feet of range translate into saved runs.  And saved runs contribute to wins.

Just as an aside, current defensive metrics suggest Derek Jeter has cost the Yankees over a hundred runs relative to an average shortstop over his career.  Still a Hall of Famer.  Not a great defender.

However, those saved runs and that increased range come with a cost.  By definition, the best shortstops will have more chances to make a fielding play, and if you have more chances, you are likely to make more errors.  Indeed, the very fact that a great fielding shortstop is able to get to more hard-hit balls on the edge of his range may well lead to a lower overall fielding percentage as well as a higher number of errors.
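A small worked example shows how the tradeoff can play out. The two shortstops and all of their numbers below are made up for illustration, and the model is simplified: every ball a fielder reaches counts as one chance, and every chance ends in either an out or an error.

```python
from dataclasses import dataclass


@dataclass
class Shortstop:
    """A hypothetical fielder's season line (illustrative numbers only)."""
    name: str
    chances: int  # balls the fielder got to (simplified definition)
    errors: int   # chances that were not converted into outs

    @property
    def plays_made(self) -> int:
        # Outs recorded: every chance that did not end in an error.
        return self.chances - self.errors

    @property
    def fielding_pct(self) -> float:
        # Share of chances handled cleanly.
        return self.plays_made / self.chances


# Limited range: reaches fewer balls, rarely botches the ones he reaches.
steady = Shortstop("steady hands, limited range", chances=400, errors=8)
# Wide range: reaches more difficult balls and misplays more of them.
rangy = Shortstop("wide range", chances=460, errors=14)

if __name__ == "__main__":
    for player in (steady, rangy):
        print(
            f"{player.name}: fielding pct {player.fielding_pct:.3f}, "
            f"errors {player.errors}, plays made {player.plays_made}"
        )
    print("extra outs from range:", rangy.plays_made - steady.plays_made)
```

With these made-up numbers, the rangier shortstop has the worse fielding percentage (about .970 against .980) and more errors (14 against 8), yet he turns 54 more batted balls into outs over the season.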

Fortunately for those shortstops, baseball teams are getting smarter and are realizing the tradeoff is worth it.  Scouting reports regularly cite range in addition to how a player looks, and fielding percentage is low on the list of statistics an organization cares about in evaluating a player.

And that gives me hope for innovation in two ways.  The first is the point above about the eye test.  We trust what we see and feel.  However, that’s not always the complete story.  Often in trying to implement innovation, there’s a gut feeling by those doing the evaluation–this is innovation, that isn’t, I can tell.  Except that anecdotal evidence suggests that no, in fact, people often can’t tell.  Just ask Kodak.  However, if baseball can come to realize that the eye test, while important, is just one part of the evaluation package, industries can also learn that lesson and look for other, possibly less subjective ways to measure innovation.

The second relates to two contradictory things that are often said about innovation, sometimes one right after the other.  We need to innovate.  And we need to de-risk it to make sure that it will work.  Unfortunately, there can be no real innovation without the very real risk of failure.  In an interview with Wired magazine, the inventor James Dyson is described as having worked his way through 5127 prototypes of his bagless vacuum cleaner before hitting success.  But if baseball can come to realize that a decreased probability of fielding success is actually a good thing when it means a shortstop is reaching defensive heights few others can, maybe industries can finally realize that failure, in the right cause, is something to be celebrated and embraced.