Summary & Insights
If we could accurately simulate a single human cell, predicting how it responds to any drug or genetic tweak, we would fundamentally rewire the pace of biomedical discovery. This is the ambitious moonshot at the heart of Arc Institute’s work: building foundation models for biology that create “virtual cells.” The conversation delves into why this approach is necessary, arguing that the traditional systems of scientific research—fragmented across disciplines and hampered by misaligned incentives—are too slow to tackle humanity’s most pressing health challenges. By physically colliding experts in neuroscience, immunology, machine learning, and genomics under one roof, Arc aims to cut through these knots and increase the “collision frequency” of ideas, all while developing the AI models that could one day let researchers run experiments at the speed of a neural network’s forward pass.
The virtual cell model is framed as the biological equivalent of AlphaFold, but with a broader scope. Instead of just predicting a protein’s structure, the goal is to map a cell’s manifold of possible states and predict the precise perturbations—like a drug or an edited gene—needed to shift it from a diseased state to a healthy one. The immediate, practical application is to serve as a co-pilot for lab scientists, suggesting testable hypotheses for target discovery. The discussion acknowledges this is a monumental task, with current capabilities placed somewhere between “GPT-1 and GPT-2” levels. Success would be measured not by abstract benchmarks, but by the model’s ability to rediscover Nobel Prize-winning biological transformations, like predicting that the Yamanaka factors can reprogram a skin cell into a stem cell.
A significant portion of the dialogue grapples with translating scientific breakthroughs into tangible drugs and business outcomes. The sobering statistic that 90% of drugs fail in clinical trials underscores a dual problem: we often target the wrong biological pathway, or we lack the right drug molecule to hit the right target effectively. Advanced AI models could dramatically improve the batting average in early discovery, but the subsequent bottlenecks of manufacturing, animal testing, and human clinical trials remain daunting, time-consuming, and expensive. The trillion-dollar market impact of GLP-1 drugs, however, offers a powerful blueprint, demonstrating that tackling large, society-wide health issues can create immense value and, in turn, fuel greater ambition across the entire biotech industry.
Ultimately, the vision extends far beyond a single cell. A reliable virtual cell model would be the fundamental unit from which to build simulations of tissues, organs, and eventually whole-body “digital twins.” This progression mirrors the evolution of AI itself, which started with narrow tasks like language translation before advancing toward general intelligence. The ambition is to compress the timeline of biological discovery from years to moments, enabling a future where millions of in-silico experiments could be run in parallel to rapidly identify treatments for aging, neurodegenerative diseases, and other complex conditions, fundamentally improving the human experience within our lifetime.
Surprising Insights
- Biology’s “Blurry Mirror”: A key premise for building virtual cell models with today’s data is that RNA (transcriptomic) measurements, while not a perfect picture, act as a “mirror” or echo of what’s happening at the protein and metabolic levels. When scaled to billions of data points, these lower-resolution signals can collectively reveal higher-order biological truths.
- Textbooks as Compressed, Incomplete Ground Truth: Textbooks are cited as a source of reliable “ground truth” for training models, but they are acknowledged to be a compressed, two-dimensional representation of biology filled with exceptions. The process of discovery is, in part, about finding those exceptions.
- The “Virtual Cell” as a More Practical Goal than a “Digital Twin”: The term “virtual cell” was chosen precisely because it is a more scoped and rigorous goal than the popular concept of a “digital twin” or “digital avatar” of a human, which operates at a much higher and more abstract level of biological complexity.
- Incentives, Not Just Funding, Slow Science: Beyond well-trodden discussions about research funding, the conversation highlights how academic incentive structures—the need for individual researchers to publish and secure their own careers—actively discourage the deep, multidisciplinary collaboration needed to solve modern, complex problems.
Practical Takeaways
- Increase “Collision Frequency” in Teams: To tackle multidisciplinary problems, intentionally design organizations or projects that force experts from disparate fields (e.g., machine learning, genomics, chemical biology) to work in close physical proximity on shared flagship goals, breaking down silos.
- Focus Models on Practical, Testable Predictions: When developing AI for science, prioritize models that output concrete, testable hypotheses for laboratory experiments (e.g., “try these 12 perturbations”) over those that simply optimize for abstract benchmark scores.
- Participate in Open Challenges to Drive Progress: Engage with open competitions like the Virtual Cell Challenge (virtualcellchallenge.org), which offer prizes and curated datasets to benchmark and accelerate community-wide progress on foundational biological models.
- Bet on Scalable Measurement Technologies Today: In data-hungry fields like biology, invest in and generate data using the measurement technologies that are most scalable right now (like single-cell RNA sequencing), even if they are imperfect, as scaling laws can reveal patterns invisible at smaller data sizes.
- Design for the Bottlenecks You Can’t Yet Solve: Even if AI solves the design problem for perfect drug molecules, recognize that the bottlenecks of physical manufacturing, animal testing, and human clinical trials will remain. Allocate resources and innovation to compressing these downstream phases as well.
Can we make science as fast as software?
In this episode, Erik Torenberg talks with Patrick Hsu (cofounder of Arc Institute) and a16z general partner Jorge Conde about Arc’s “virtual cells” moonshot, which uses foundation models to simulate biology and guide experiments.
They discuss why research is slow, what an AlphaFold-style moment for cell biology could look like, and how AI might improve drug discovery. The conversation also covers hype versus substance in AI for biology, clinical bottlenecks, capital intensity, and how breakthroughs like GLP-1s show the path from science to major business and health impact.
Resources:
Find Patrick on X: https://x.com/pdhsu
Find Jorge on X: https://x.com/JorgeCondeBio
Stay Updated:
Find a16z on X
Find a16z on LinkedIn
Listen to the a16z Podcast on Spotify
Listen to the a16z Podcast on Apple Podcasts
Follow our host: https://twitter.com/eriktorenberg
Please note that the content here is for informational purposes only; should NOT be taken as legal, business, tax, or investment advice or be used to evaluate any investment or security; and is not directed at any investors or potential investors in any a16z fund. a16z and its affiliates may maintain investments in the companies discussed. For more details please see a16z.com/disclosures.
Hosted by Simplecast, an AdsWizz company. See pcm.adswizz.com for information about our collection and use of personal data for advertising.
