Big data and baseball efficiency: the traveling salesman had nothing on a baseball scout

All opinions are my own and do not necessarily reflect those of Novo Nordisk.

The MLB draft is coming up, and with any luck I’ll get this posted by Thursday and take advantage of the web traffic. I can hope! Anyway, on Tuesday I read a fascinating piece in FanGraphs laying out the nuts and bolts of how organizations scout for the draft. The piece, written by Tony Blengino (whose essays are rapidly becoming one of my favorite parts of this terrific baseball site), describes all the behind-the-scenes work that goes into preparing a major league organization for the Rule 4 draft. Blengino describes the dedication scouts show in following up on all kinds of prospects at the college and high school levels: what they do, how much they travel, and especially how much ground they need to cover to try to lay eyes on every kid in their area.

One neat insight for me was Blengino’s one-word description of most scouts as entrepreneurs. You could think of them almost as founders of a startup, with the kids they scout as the product the scouts are trying to sell to upper layers of management in the organization. As such, everything they can do to get a better handle on a kid’s potential can feed into the pitch to the scouting director.

I respect and envy scouts’ drive to keep looking for the next big thing, the next Jason Heyward or Mike Trout. As Blengino puts it, scouts play “one of the most vital, underrated, and underpaid roles in the game.” One might argue that in MLB, unlike the NFL or NBA, draft picks are typically years away from contributing, so how important can they really be? Yet numerous studies have shown that the draft presents an incredible opportunity for teams to build and sustain success. In fact, given that so much of an organization’s success hinges on figuring out which raw kids will be able to translate tools and potential into talent, one could argue (and others have) that scouting is a huge potential market inefficiency for teams to exploit. I’ll have a caveat about that later. In any case, every team wants to optimize the quality of talent flowing into its minor league system because, as we say in genomic data analysis, “garbage in, garbage out.”

As I was reading this piece, I started thinking about ways to create more efficiencies. And I started thinking about Big Data. Continue reading

The power law relationship in drug development

All opinions are my own and do not necessarily reflect those of Novo Nordisk.

A few weeks ago a friend and I had the great opportunity to see Nate Silver speak at the University of Washington. He’s a funny, engaging speaker, and for someone like me who makes his living generating and analyzing data, Silver’s work in sports, politics and other fields has been inspirational. Much of his talk covered elements of his book, The Signal and the Noise, which I read over a year ago. It was good to get a refresher. One of the elements that particularly struck me this time around, to the point that I took a picture of his slide, was the concept of the power law and its empirical relationship to so many of the phenomena we deal with in life.


Figure 1: Slide from Nate Silver’s talk demonstrating the power law relationship in business: how often the last 20% of accuracy (or quality, or sales, or…) comes from the last 80% of effort.
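To make the shape of that curve a bit more concrete, here is a minimal sketch. It assumes (my assumption, not something from Silver’s slide) that accuracy grows with effort as a simple power law, accuracy = effort^alpha, with alpha chosen so that the first 20% of effort buys 80% of the attainable accuracy. The function names are hypothetical and the numbers are purely illustrative.

```python
import math


def exponent_for(effort_frac: float, accuracy_frac: float) -> float:
    """Solve effort_frac ** alpha == accuracy_frac for the exponent alpha."""
    return math.log(accuracy_frac) / math.log(effort_frac)


def accuracy(effort: float, alpha: float) -> float:
    """Fraction of attainable accuracy reached with a given fraction of total effort.

    Assumes a hypothetical power-law curve: accuracy = effort ** alpha, 0 < alpha < 1.
    """
    return effort ** alpha


# Pick alpha so that 20% of the effort yields 80% of the accuracy (about 0.139).
alpha = exponent_for(0.2, 0.8)

for effort in (0.2, 0.5, 0.8, 1.0):
    print(f"effort {effort:.0%} -> accuracy {accuracy(effort, alpha):.1%}")
```

Under this assumed curve, going from half the effort to all of it moves accuracy only from roughly 91% to 100%, which is the diminishing-returns shape the slide describes.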

Because I spend way too much time thinking about the business of drug development, I started thinking of how this concept applies to our industry and specifically the problem the industry is facing with creating innovative medicines.

Continue reading

The fall and rise of the LEGO Kingdom: A review of “Brick by Brick”

All opinions are my own and do not necessarily reflect those of Novo Nordisk.

When people ask me what I did growing up, they expect me to say “surf.” I know this because when I tell them what I did for fun their next question is always, “What, you didn’t surf?” I didn’t. Still haven’t learned. Instead I did a lot of the things boys all over the US did. I watched TV. I hung out at the mall and at fast food restaurants. And I played with LEGO.

The brick fundamentally hasn’t changed since I was a kid. My son has a bunch and the basic essence is still snapping things together with that satisfying “click,” and the gradual accretion of form and function from individual, generic elements. Kind of like how life evolves, you know? And yet at the same time LEGO has undergone great changes in packaging, themes, toy categories, and target audiences. Today it’s one of the most respected and recognized toy brands in the world. But something I hadn’t realized until reading “Brick by Brick” by David Robertson and Bill Breen is how close LEGO actually came to crashing and burning in the 90s and early aughts, before recovering to once again become a commercial powerhouse.

Continue reading

The innovator’s dilemma in biopharma, part 3: What would disruption look like?

All opinions are my own and do not necessarily reflect those of Novo Nordisk.

h/t to @Frank_S_David, @scientre, and the LinkedIn Group Big Ideas in Pharma Innovation and R&D Productivity for links and ideas

Part 1 is here.

Part 2 is here.

In the previous parts of this series I’ve covered both why the biopharma industry is ripe for disruption and which markets might support a nascent, potentially disruptive technology until it matures enough to supplant the current dominant industry players. In this final part I’d like to ask what disruption would look like and offer some examples of directions and companies that exemplify what are, to my mind, these sorts of disruptive technologies and approaches. I say this, I might add, with the complete and utter knowledge that I’m wrong about who and what specifically will be disruptive! In any case, before we can identify disruption, it’s worth asking which elements of biopharma drug development serve as real bottlenecks to affecting human health, since these are the elements most likely to provide an avenue for disruption. Continue reading

Big Data and Public Health: An interview with Dr. Willem van Panhuis about Project Tycho, digitizing disease records, and new ways of doing research in public health

All opinions of the interviewer are my own and do not necessarily reflect those of Novo Nordisk.

One of the huge and perhaps still underappreciated aspects of the internet age is the digitization of information. While the invention of the printing press made the copying of information easy, quick and accurate, print still relied on books and other printed materials that were moved from place to place to spread information. Today digitization of information, cheap (almost free) storage, and the pervasiveness of the internet have vastly reduced barriers to use, transmission and analysis of information.

In an earlier post I described the project by researchers at the University of Pittsburgh that digitized US disease reports over the past 120+ years, creating a computable and freely available database of disease incidence in the US (Project Tycho, http://www.tycho.pitt.edu/). This incredible resource is there for anyone to download and use for research ranging from studies of vaccine efficacy to building epidemiological models to regional public health analyses and comparisons.

Their work fascinates me both for what it says about vaccines and for its connection to larger issues like Big Data in Public Health. I contacted the lead researcher on the project, Dr. Willem G. van Panhuis, and he very kindly agreed to an interview. What follows is our conversation about his work and the implications of this approach for Public Health research.


Dr. Willem van Panhuis. Image credit: Brian Cohen, 2013

Kyle Serikawa: Making this effort to digitize the disease records over the past ~120 years sounds like a pretty colossal undertaking. What inspired you and your colleagues to undertake this work?

Dr. Willem van Panhuis: One of the main goals of our center is to make computational models of how diseases spread and are transmitted. We’re inspired by the idea that by making computational models we can help decision makers with their policy choices. For example, in pandemics, we believe computational models will help decision makers to test their assumptions, to see how making different decisions will have different impacts.

So this led us to the thinking behind the current work. We believe that having better and more complete data will lead to better models and better decisions. Therefore, we needed better data.

On top of this, each model needs to be disease specific because each disease differs in how it spreads and what effects it has. However, the basic data collection process that goes into creating the model for each disease is actually pretty similar across diseases. It involves contacting those who hold the records of disease prevalence and its spread over time, collecting the data, and then making the data ready for analysis. That last part takes considerable effort, especially as Health Departments often do not have the capacity to spend a lot of time and effort responding to data requests from scientists.

The challenges are similar: we go through the same process every time we want to model a disease. So when we learned that a great source of much of the disease data in the public domain is in the form of these weekly surveillance reports published in MMWR and precursor journals, we had the idea that if we digitized the data once for all the diseases, it would provide a useful resource for everybody.

We can make models for ourselves, but we can also allow others to do the same without duplication of effort. Continue reading