Metrics and the Heisenberg quality of gathering data about behavior

All opinions are my own and do not necessarily reflect those of Novo Nordisk

Thinking more about the Global Health Metrics Conference, one element that resonated was that measurement does not occur in a vacuum.  When metrics are gathered, and especially when they are gathered out in the open, by global health surveys for example, there’s the real issue of the act of measuring changing the validity of what’s being measured.  I’ve been thinking about this in the context of hiring and workplace management.  For example, if the media were to report that viewership of Khan Academy videos on YouTube was found to correlate highly with creativity in the workplace, I expect two things would happen.  First, viewership of Khan Academy would spike, and second, the metric would rapidly begin to lose what correlative and predictive power it had.  People would try to game the system.

In Global Health, where countries are incentivized to meet certain milestones, it requires real thought to either make sure the milestones are strongly causally related to the health goals, or else that the metrics undergo continual fine-tuning to ensure the desired effect.  If the metric were something like number of healthcare facilities, a country could ensure that number increases but there wouldn’t necessarily be a concomitant increase in actual health services delivery.  I’m sure these are topics the Global Health community wrestles with every day.

It’s kind of like with relationships.  While on the one hand, we can tell our partners what we want, and often see them do it, on the other hand don’t we really secretly want them to already know and behave accordingly, because somehow that’s more genuine?  It’s certainly why social science researchers often mislead their study subjects on the actual purpose of behavioral experiments.  Or, to quote from the movie Buckaroo Banzai, “Character is what you are in the dark.”

Ultimately, it seems best to try to measure behavior as closely as possible to the desired outcome.  That’s why baseball is nice.  We want good hitters, and to find good hitters it’s simple:  we measure how well a player can hit.

A Genomics Researcher’s Take on the Global Health Metrics Conference 2013

Over the past three days I had the opportunity to attend the Global Health Metrics Conference here in Seattle.  This is not my field; I’m a genomics researcher working in biomedical research and drug development, but I’ve also been curious about what’s going on in the area of public and global health.  This seemed like a good place to get a crash course.  The Lancet has kindly published all the abstracts and I wanted to give my impressions of what I heard.

First takeaway:  I was surprised and intrigued by how many parallels I saw between the work I do (primarily transcriptomics and genomics) and the work I saw reported.  Sure, global health researchers use surveys rather than high throughput sequencing, and gather data on nations rather than patients, and deal with the complexities of culture and government instead of human biology, and work in the public sphere as opposed to the private, and use a completely different vocabulary than I do, but other than that it was really similar.  So similar I put together this table:

Biomedical Genomics Research | Global Health Metrics
Increasing amount and types of data | Yup
Biomarkers | Indicators
Growing emphasis on efficacy measurements | Ditto
Lots of acronyms: NIH, AMA, EULAR, ADME | GBD, DALYs, CDVS, USAID
Struggle to understand what tissue, cell, analyte to measure | Struggle to characterize the right metric to demonstrate effects/efficacy
Gene X Environment interactions poorly understood | Local environment effects beginning to be captured
Personalized medicine | Nation specific solutions
Noisy data, lots of unknowns | Maybe even noisier data and, yeah, unknowns
More focus on longitudinal studies | Already there

And so on.  I’ll elaborate on a few more below.  Another immediate takeaway:  I wasn’t even aware of the Institute for Health Metrics and Evaluation (sorry guys).  Now that I am, it’s a place I’d like to visit.

One thing that really impressed me was the work that IHME has put into making the Global Burden of Disease survey lucid, simple and accessible.  The data presentation by Kyle Foreman and Peter Speyer (@Peterspeyer) was terrific.  Not so much for any specific piece of data (although the trends and findings are all pretty fascinating), but rather for their demonstration of the power of dynamic presentation and facile web-based tools.  Static PowerPoint charts are clearly so last decade.  Anyone wanting to check out their presentation can go here, or even better just go directly to the site.  As a scientist who also works with large, multifactorial datasets, I know the struggle to condense that data into a usable, comprehensible form.  I think Peter and Kyle have done a great job, and I also like the potential crowdsourcing aspect of it.  As I’ve commented on before, crowdsourcing methods, whether via games or other techniques, have real potential to fully utilize large datasets and also to solve big problems.

Of the many talks I heard, a few I’ll highlight, just for the specific points I took away.  On the first day, Tanya Marchant showed interesting and cautionary data about making sure that what you’re measuring really measures what you think you’re measuring.  In this case, measuring the presence of skilled birthing assistants as a proxy for maternal care during childbirth turns out to be incomplete because of other factors such as availability of basic medical supplies.  Reminds me of debates over things like how best to measure drug efficacy in clinical trials–for example, response versus progression free survival in oncology.

Joseph Dieleman presented his work looking at the effects of external health aid to developing nations.  In a perfect world, external aid would just be added to pre-existing health expenditures, and after aid expired, local governments would maintain spending at pre-aid levels, or even higher.  Well, turns out this isn’t always the way this happens.  Aid comes in, local health budget gets shifted “temporarily,” but temporarily turns to permanently when the external aid leaves.  One of the thoughts that went through my head during this conference was to remember the law of unintended consequences.

I enjoyed Michael Wolfson’s talk on functional health status.  Coming from an industry that really likes its tried and true measures like HDL/LDL levels, the concept of looking holistically at factors relating to actually feeling good was a nice contrast, and food for thought.

Bruce Hollingsworth had a great quote in his part, “People need incentives to provide accurate data.”  Yeah.  Tell me about it.  In transcriptomics it’s been a mantra for years that “Garbage in, garbage out,” in terms of incoming biological sample integrity and resulting data quality.  From what I saw, the data you can get trying to measure Global Health is maybe even noisier than the kinds of data that I normally deal with.  My main conjecture for why all hope is not lost due to data quality in Global Health is that GH researchers are able to bias the indicators they sample towards things with (hopefully) real meaning, else they would be adrift in a sea of not very useful data.  Maybe they feel that way anyway?  Bruce also made the point that there are external factors, again, which influence health.  Even people who know where to go for the best treatment may not because the facility is too far away.  Location, location, location.

Speaking of garbage (but not in a bad way), David Phillip’s talk later that day referred to the problem of trying to extract useful data out of vital health records full of things like garbage codes.  That is, causes of death that are supremely unhelpful from a public health perspective, such as (I’m exaggerating here) death by lack of life.  His work on extracting useful proportions from this data based on the overall data distribution reminded me of imputation techniques that are used in genomics.
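
For the curious, here is roughly what I picture when I think of reassigning garbage codes "based on the overall data distribution."  This is only a minimal sketch of simple proportional redistribution, not necessarily the method presented in the talk.  The function name, the cause labels and the counts are all made up for illustration.

```python
def redistribute_garbage(counts, garbage_codes):
    """Spread deaths recorded under garbage codes across the valid causes,
    in proportion to each valid cause's share of the valid total.

    Illustrative only: real garbage-code handling is far more nuanced.
    """
    valid = {cause: n for cause, n in counts.items() if cause not in garbage_codes}
    garbage_total = sum(n for cause, n in counts.items() if cause in garbage_codes)
    valid_total = sum(valid.values())
    if valid_total == 0:
        raise ValueError("no valid causes to redistribute onto")
    return {
        cause: n + garbage_total * n / valid_total
        for cause, n in valid.items()
    }


# Hypothetical counts, invented for this example.
deaths = {"heart disease": 60, "stroke": 30, "injury": 10, "ill-defined": 20}
adjusted = redistribute_garbage(deaths, {"ill-defined"})
for cause, n in adjusted.items():
    print(f"{cause}: {n:.1f}")
```

The total number of deaths is preserved; only the uninformative category disappears.  The obvious weakness is the assumption that garbage-coded deaths follow the same distribution as well-coded ones, which is the same kind of assumption that makes imputation in genomics something to handle with care.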

There were many more engaging talks, and I also had great conversations at lunch with different people. I suppose I shouldn’t be surprised by the similarities.  I think many research fields these days are converging on a similar emphasis on big data, analytics, efficacy, and finding the right metrics.  I also appreciated the long view shown by so many of these programs.  One of the drawbacks of private industry is the prevalence, often, of the short term view.  I could wish we had the decades-long commitment shown by various Global Health initiatives.

The aspect I find daunting in Global Health is how much uncertainty that community is dealing with, which greatly affects efficacy and efficiency.  An intervention might be exactly the right one when viewed in isolation, but can be so easily derailed by external factors.  Like biology, like baseball, it seems the key thing is to find the metrics that at least tell you that you made the change you hoped for, with the understanding that what happens at the end is so often, unfortunately, out of our control.

Not losing versus playing to win in Baseball and drug development

Another in a series about parallels between baseball and drug development

A recent post by Phil Birnbaum, who runs a baseball research site, did a nice job of highlighting how he feels statistical analysis may best serve baseball organizations:  by ensuring that they don’t make losing moves.  While everyone is trying to win, in an industry where so much is uncertain it may often be most effective to “First, concentrate on eliminating bad decisions, not on making good decisions better.  And, second, figure out what everyone else knows, but we don’t.”

This is a terrific observation, and one he backs up through the body of his post with examples from baseball and gambling strategy.  I think it applies quite well to drug development too.  I’ve made the earlier conjecture that drug development can be thought of as existing on the adaptive landscape, with improvements to drugs or drug classes getting harder and harder as you climb up that mountain of efficacy.  But when you’re on a slope, it’s really easy to go sideways or backwards.  So, the analogy here is that drug development, like baseball, needs to throw a lot of resources (not just statistical and analytical ones, either) into preventing a bad decision.

This thinking is also influenced by the ecology of pharma and biotech.  Let me be very clear about my initial assumption:  drug development is filled with really smart people, almost all of whom are dedicated, sharp, innovative, and really interested in winning.  So is baseball (well, except maybe for the Kansas City Royals).  But it’s hard to put together a good drug development pipeline.  Resources often help, but they aren’t enough.  And since the talent is there, the explanation for lackluster drug development progress may partly be found in companies still making poor decisions on assets.

Let me zero in on the second part of the quote in the first paragraph:  “And, second, figure out what everyone else knows, but we don’t.”  Here’s something else that companies could possibly do differently:  share data.  A really fascinating blurb in ScienceInsider just highlighted an effort by people at Johns Hopkins to try and get clinical trial information published, as long as it’s been publicly released in other formats such as through litigation or Freedom of Information Act requests.  While all the companies would prefer this not happen, if it happens uniformly, that can only be good for drug development as researchers learn more about why given trials were halted or failed.  If R&D costs as much as it does, part of the reason lies in duplicated effort.

To conclude, let me throw out another thought on decision making:  send in the crowds.  Crowdsourcing as a method for making decisions has been tried in a number of contexts and often has been found to lead to better overall decision making than more traditional methods.  If we want to make decisions on, for example, which drugs should move forward, setting up a system to poll everyone in the organization in a controlled, anonymous way might be enlightening.  I know this would not be a popular development for people in the C-suite, since, after all, that is their domain.  And I believe the assertion Malcolm Gladwell makes in Outliers that initial, small differences in environment can eventually lead to great differences in ability down the road as individuals get training and experiences not widely available.  Therefore those who are in the C-suite are different in their knowledge and outlook and know more about strategic decisions.  But they still don’t know everything, they still are human, they still have biases.  And a technician working in a lab in Boston may have noticed something in his cell cultures that  no one else is aware of.  If we want to make good decisions, shouldn’t we make sure that everyone possible has a voice?

Supply chains in Drug Development?

This is a response I made to a recent post at Xconomy about the idea of drug development adopting a supply chain approach.  http://www.xconomy.com/san-diego/2013/05/31/test-the-supply-chain-model-this-market-driven-relationship-is-a-fail/.  All opinions are my own and do not necessarily reflect those of Novo Nordisk.

I really appreciate the ongoing conversation about how to fix the problems that appear to be facing drug development–specifically a lack of truly transformative, life- and health-changing new drugs.  I think the idea of a supply chain process in drug development is worth looking at.  However, I am not convinced it will actually fix the problem.

In this piece, Standish Fleming suggests a market driven process isn’t meeting the needs of drug development because the potential suppliers in the market (the startup biotechs) don’t have a clear view of what the eventual buyers (the pharma) really want as part of their strategic goals.  Alignment is often a good thing.  I believe many startups may not have a clear idea of what actually constitutes a good drug as many of them arise out of academia. This is not a criticism, just a statement of how the academic and industrial systems have different cultures, goals and knowledge bases. I also appreciate the point that, with capital harder to get via venture funds, pulling pharma in to replace that investment at an earlier stage requires some sacrifice of control on the part of the biotech, with a corresponding gain in risk sharing and predictability.

But I don’t think alignment is enough.  I worry instead that the key problem is one that’s been suggested by David Shaywitz and others–we just don’t understand enough about diseases to make the next generation of drugs.  It seems that the buyers themselves don’t have a clear idea of what is most likely to make a good drug.  As evidence, I’d suggest that if pharma really knew what they wanted, failures in Phase I-III would be far lower since drugs would never be tested in humans until pharma were sure of an 80-90% success rate.  Baseball aside, a 30% or lower success rate generally doesn’t make for a good business strategy, but that’s what we’ve got.  And I agree with the point that there are a lot of smart people working on the problem across pharma, so it’s not just a question of brainpower.

If pharma can’t easily predict what kinds of drugs will succeed, then this model may just swap out VC funding for pharma funding with the same net effect.  Also, the development of a drug is an incredibly long process.  Asking a pharma to predict the market ten or more years ahead of time adds yet another uncertainty.

Since I live in Seattle, I’d like to throw out the analogy of the Dreamliner.  A key reason the Dreamliner exists is because of 9/11.  Before that, Boeing was designing a faster, near-sonic passenger jet.  After 9/11, the pressure for nations to become more fuel-efficient to allow less involvement in the Middle East led Boeing to change course and design a plane that would instead be a model of efficient design.  So there’s an example in which changes in the market outside of a company’s control can render all its best plans moot.

Another point about the Dreamliner is that the project relied on a supply chain that ended up delaying launch for over a year.  I know people at Boeing and they have good project managers and good communicators and they told their suppliers exactly what they needed, and problems still arose.  Even after launch, unexpected issues with batteries grounded the jets again.  How much messier might a supply chain relationship be between biotech and pharma?  Can deadlines and milestones be guaranteed when we won’t know until Phase I if we’re dealing with the next best thing in air travel or a flaming battery?

All this is not to say it couldn’t work; just that I’m skeptical.  I agree the current method seems inefficient and difficult to make work in the current funding environment.  I just wonder if maybe there is a third way.  Now, if only Bill Clinton could get into drug development…

Finding parallels between baseball and drug development

This piece was originally posted on Xconomy on March 1, 2013.  The views in it are my own and do not necessarily reflect the views of Novo Nordisk.

Consider a candidate.  Selecting that candidate takes thousands of hours of time and research–checking background, verifying data, assessing probabilities, projecting futures.  Once selected, more years of development follow, during which time the odds of success are less than 10%.  And if that candidate finally does make it, there’s just a small window of exclusivity before protection expires and that candidate goes out to the broader market.

I’m talking, of course, about baseball players.

So I’m a Mariners fan.  Have been since about 1999 (I moved to Seattle in 1996, so missed the big comeback year and it took me a little time to catch up).  And like all fans, I watch and hope, year after year, looking for signs of improvement, direction, some indication that there’s a plan. I keep looking towards some point in the not too distant future–let’s call it “next year” or even “year after next”–when I’ll once again be able to root for a winning team.

But while the Mariners may still be in what feels like eternal rebuilding, I’ve been able to find a silver lining in my fandom:  I’ve realized drug development seems to be learning from baseball.

Statistics, but the right ones.

There are a lot of parallels between the businesses of baseball and drug development.  Both involve long periods of development followed by limited periods of exclusivity for the product (drugs, players) being developed.  Resources (targets, talent) are rare.  Assets get traded or bought or sold.  There are the juggernauts and the mid-market and the small-market players.  And there’s the always-present need to keep doing more and finding better ways of winning, preferably with less.

One of the more fascinating developments in baseball has been the rise of a new statistical framework around the game.  Baseball has always been the most statistically conscious of sports, but it’s also been the most heavily invested in its own history and mythology.  Ken Burns is not making an 18 ½ hour documentary on Arena Football anytime soon.  That reverence for history means there has been a lot of resistance to new ideas.  For almost the entire modern era of baseball, certain statistics (ERA, W-L records, batting average, RBIs) have been the gold standard for performance, even though, when it comes down to it, they’re not really the best things to measure if you want to create a winning baseball team.

As Dave Cameron from Fangraphs has discussed on several occasions (like this one), statistics in baseball are how we figure out the answers to questions.  We might be asking who’s the Most Valuable Player (*cough*Trout*cough*) or what kind of pitcher or hitter a given team should be trying to get through free agency, or whether a player can be expected to sustain his level of performance.  Some statistics like RBIs, venerated for years, are actually not that useful since they partially reflect circumstances outside a hitter’s control but are treated as a direct proxy for ability.  Albert Pujols would have trouble cracking 70 RBIs a year if he were batting 9th.

But okay, drug development.  Better statistics are making their way into drug development, exemplified by the increasing emphasis on Big Data.  Collaborations are getting larger and drug companies are trying hard to capture as much data as possible, whether it’s clinical, metabolic, transcriptional, genomic, proteomic or any other flavor that becomes possible.  However, the key will be figuring out which statistics, which measurements, are really relevant to the main questions drug companies want to answer:  why do people get sick and how can we figure out what a drug will actually do once it gets into the human body?  Are drug companies figuring this out?  And for the moment I’m leaving out biotech startups, since taking advantage of Big Data is still beyond the reach of most small companies.

In my view the answer is maybe.  The move towards biomarkers throughout the drug development pipeline is reassuring as it shows a realization that we need to measure outcomes more clearly and quickly.  There is also a greater recognition that bioinformatics is a key element of a drug development pipeline.  More importantly, there needs to be a recognition that specific outcomes (the game-winning RBI, the successful Phase III trial) aren’t necessarily justifications for the decision-making that came before.

Giving Richie Sexson a multiyear contract to join the Mariners in 2005 was a bad decision.  Advanced metrics had pegged his skillset as a poor fit for the Mariners’ home stadium, and his likelihood of sustaining his performance at that point in his career was low.  As it happened, he did fade away after a couple of years, but the key point is that even if he had performed reasonably throughout his contract, it would still have been a bad decision based on what we knew given our best tools at the time.  Pharma needs to develop those tools to not just gather more data, but figure out how to ask the right questions and trust what the data is saying.  However, it’s not clear that Pharma has reached its Moneyball moment.

Undervalued Assets

Which leads to another lesson from baseball: the under-appreciated asset.  Contrary to what some commentators have suggested, Moneyball wasn’t ultimately about Billy Beane deciding to draft only fat slow guys who could take a walk and get on base.  The real story was the concept of finding the market inefficiencies in Major League Baseball to get an edge.  Oakland plays in a lousy stadium with a putrid revenue stream and a snooty neighbor across the Bay who refuses to let Oakland move to San Jose.  In order to compete, their front office recognized it was necessary to find players and skill sets that were less valued by their competitors even though those skill sets were just as important to winning baseball games as more conventional talents.

During the year chronicled in Moneyball, on base percentage (OBP) was batting average’s country mouse cousin.  Oakland’s insight was that batting average is just a proxy for not making outs, and in that respect OBP is a lot more important.  A player with a batting average of .300 who never walks makes an out 70% of the time.  But a player with a batting average of .250 who walks 15% of the time makes an out in only about 64% of his plate appearances, since walks don’t count as at-bats. In baseball the single most important commodity is outs, of which you have but 27.  Oakland exploited this market inefficiency to get inexpensive guys who may not have always hit the ball, but still got on base.  Here’s the thing:  as market inefficiencies have shifted, so has Oakland’s strategy in player acquisition.
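
To make the out-rate arithmetic concrete, here is a minimal Python sketch.  It assumes a deliberately simplified model in which every plate appearance ends in a walk, a hit or an out (no hit-by-pitch, sacrifices or errors), and in which batting average is hits per at-bat with walks excluded from at-bats.  The function name `out_rate` and its parameters are mine, invented for illustration.

```python
def out_rate(batting_avg, walk_rate):
    """Fraction of plate appearances that end in an out (simplified model).

    batting_avg: hits per at-bat
    walk_rate:   walks per plate appearance
    """
    at_bat_share = 1.0 - walk_rate          # plate appearances that are at-bats
    hit_share = batting_avg * at_bat_share  # hits per plate appearance
    on_base = walk_rate + hit_share         # simplified on base percentage
    return 1.0 - on_base


free_swinger = out_rate(0.300, 0.00)    # hits .300, never walks
patient_hitter = out_rate(0.250, 0.15)  # hits .250, walks 15% of the time

print(f"free swinger makes an out in {free_swinger:.0%} of plate appearances")
print(f"patient hitter makes an out in {patient_hitter:.0%} of plate appearances")
```

Under these assumptions the patient hitter makes outs noticeably less often despite the lower batting average, which is the whole point of looking at OBP.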

In drug development we can see some pharma searching for that kind of edge.  GSK and others are making a push into orphan diseases, turning an under-appreciated approach into what may become a glutted market to the latecomers.  How else might drug development search for an edge?  The logical answer is: every way it can.  As Jonah Keri described in his excellent book The Extra 2%, Tampa Bay, another small market team in a lousy stadium deal, has nevertheless managed to create a thriving, successful baseball club by taking advantage of every possible way it can compete, whether by taking on undervalued and risky assets or employing probably the most experimental and forward thinking manager in major league baseball.  If there is an advantage to be gained, Tampa Bay is exploring it.

Just recently, it’s been reported that Tampa Bay will face a reduction to their draft pool budget in 2013 because they spent too much money on International Free Agents.  This might seem like a problem for a team that needs to build via their farm system because of limited revenue.  But it may actually be a calculated risk that they can get more for their money by overstepping MLB rules rather than committing too heavily to what’s thought to be an overall poor US draft cohort this year.  Drug development companies would be well advised to take this kind of approach and encourage a broad exploration of every way in which efficiencies might be gained, whether it’s in discovery, manufacturing, patient recruitment, therapeutic areas or technology.  And most importantly, companies need to set up mechanisms to broadly communicate the results, good and bad, and support and laud all of them, not just the ones that succeed.

Randomness and the unlucky bounce

Which brings me to a last insight from baseball.  Baseball is a probabilistic game.  The best players in the world get a hit three times out of ten.  Random factors, or at least factors so uncontrollable as to essentially behave like random factors, can influence the outcome of a game.  Just ask the Chicago Cubs.  In Moneyball, Billy Beane is described as saying, “My s*** doesn’t work in the playoffs,” by which he meant that the team he built was designed to perform above average, on average, over the course of a 162 game baseball season.  But the playoffs are fickle and any team can beat any other team in a best of 5 or best of 7 series.
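
A quick back-of-the-envelope sketch shows how fickle a short series can be.  It assumes every game is an independent coin flip with the same fixed win probability for the underdog (no home-field advantage, no pitching matchups), which is obviously a simplification.  The 45% figure is a made-up number for illustration, and `series_win_prob` is just a name I picked.

```python
from math import comb


def series_win_prob(p, best_of):
    """Probability that a team winning each game with probability p
    takes a best-of-N series, assuming independent games."""
    wins_needed = best_of // 2 + 1
    # The series ends on the team's final win; before that it has
    # wins_needed - 1 wins and some number of losses in any order.
    return sum(
        comb(wins_needed - 1 + losses, losses) * p**wins_needed * (1 - p) ** losses
        for losses in range(wins_needed)
    )


underdog = 0.45  # hypothetical per-game win probability for the weaker team
for games in (5, 7):
    print(f"best of {games}: underdog wins {series_win_prob(underdog, games):.1%}")
```

With these assumptions the weaker team still takes the series roughly 40% of the time in either format.  An edge that adds up to a lot over 162 games barely shows in five or seven.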

We don’t appreciate how much randomness affects everything.  In The Drunkard’s Walk Leonard Mlodinow provides ample evidence about how little we really control everything around us, even though we might think we do.  He also shows the poor grasp people have on probabilities.  In baseball the probabilities are made manifest in the statistics we track, and maybe that’s helped drive the adoption, finally, of better statistical tools.

Baseball is full of random happenings.  Adam Dunn hit 40 home runs a year (more or less)  like a metronome for seven years, and then in 2011 was completely lost at the plate.  And then in 2012 he hit 41.  Drug development is full of randomness too.

Drug development sometimes seems to show a much more deterministic mindset.  I blame the successes of the 80s, when a whole raft of wonderful drugs entered the market, and lulled people into a sense that this kind of productivity could go on forever.  Pipelines came to be viewed by companies and analysts alike as though they were treadmills steadily pushing new drugs forward, as though making drugs was like manufacturing widgets.  And yet, even companies that create widgets (albeit very large and complex widgets) have problems meeting their deadlines and come up against unexpected issues.  How much more uncertain is drug development, which deals with trying to figure out how biology works?

What lesson can drug development companies take?  Here there is one important difference:  even a failing baseball team often makes money, whereas a failing pharma company faces being bought or imploding.  On the other hand, poorly performing franchises in the Major Leagues have been threatened with being shut down, or at least moved, so perhaps there are still some parallels there.  One key learning is the value of stability.  Over-reaction to poor results can be deadly to the long-term health of a ballclub, or a company, as it can lead to the loss of talent due to mis-assignment of blame.  Another key point is diversification of revenue streams.  Some of the best positioned ballclubs are there because they have worked hard to increase revenue beyond box-office sales and the occasional t-shirt purchase.  Similarly, some of the best positioned Pharma companies are diversified players like Roche and Johnson & Johnson.

Maybe the most important lesson is to realize in a random world there is no way to guarantee success in drug development, and therefore, the goal is to set up the best processes, with clear measurements and benchmarks; to evaluate constantly but to intervene rarely; to work on increasing the probability of success.  The 2001 Mariners won 116 games and still didn’t even make the World Series.  And yet, few question that they were the best team that year by far.  The goal for the Mariners after that season was to evaluate how they got there, try to separate luck from skill, and attempt to replicate those elements that were under the control of the players and the front office.  That could be the approach taken in drug development as well.

Baseball, or the Movie Industry, or Oil exploration, or…

I’d love to delve into other concepts, like Value Over Replacement Player (VORP) and how we might apply that to drugs and scientists, but that could be a thought for another day.  As the drug development industry continues its struggle with how to carve out its future (because, you know, eventually there won’t be any more companies left to buy), it seems potentially fruitful to try and learn from other industries that have been faced with similar challenges.