The biological library to power drug development's GPT-moment

Everyone’s talking about biotech’s imminent GPT-like moment:

  • "There's no question that digital biology is going to be...one of the biggest revolutions ever...[F]or the very first time in our history, biology has the opportunity to be engineering, not science..."

    Jensen Huang, Berkeley Dean's Speakers Series, March 9th, 2024

  • "The thing I'm personally most excited about [with AI] is doing faster and better scientific discovery."

    Sam Altman, All In Pod, May 10th, 2024

  • "The next big game-changing revolution is in biology."

    Eric Schmidt, Time Magazine ("We Need to be Ready for Biotech’s ChatGPT Moment"), April 16th, 2024

But we won't see anything close to a GPT-moment in biology without better data.

Our mission is to develop the world’s largest library of human biological data - and eventually the foundational ML models that will unlock its full potential.

This library becomes the foundation for modern biotechnology - the infrastructure underlying the cityscape.

It supports, rather than supplants, the work that existing biotech companies are doing.

It lets them focus on their particular disease and on developing drugs, rather than on logistics, legal agreements, and data wrangling.

Eventually, the foundation becomes not only the data but also the models trained on those data. This is one area where models won't be easily commoditized, because the underlying data aren't in the public domain.

Organizations fine-tune with data specific to their use case (a particular organ or disease) and leverage our understanding of the rest of the body, just as enterprises take general-purpose LLMs and fine-tune them on, or retrieve from, their own proprietary assets.

The biological dataset most would expect two decades after the genomic revolution, in an era of electronic health records, doesn’t exist.