MULTI-evolve Explorer

Interactive Protein Evolution Playground
Based on Tran et al., Science (2026) · Arc Institute · Patrick Hsu Lab

MULTI-evolve is a machine-learning-guided protein evolution framework that compresses months of iterative directed evolution into weeks. By training neural networks on ~200 strategically selected pairwise mutant measurements, it predicts which higher-order mutation combinations will produce hyperactive protein variants — without exhaustive screening.

~200
Variants needed for training
256×
Max improvement (APEX)
1 round
Experimental cycles needed

The MULTI-evolve Pipeline

From wild-type protein to hyperactive multi-mutant in five steps:

🧬
PLM Ensemble
Score all single mutations via multiple protein language models
🎯
Top Mutations
Select ~15-20 beneficial single mutations
🔬
Pairwise Testing
Measure all ~100-200 double-mutant combinations
🤖
Neural Network
Train on epistatic data to predict higher-order combos
🚀
Multi-Mutants
Design & test top predicted 5-7 mutation variants
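The five steps above can be sketched end-to-end on toy data. Everything here is illustrative stubs (random scores, a summed-pairwise ranking standing in for the trained neural network), not the paper's implementation:

```python
# Minimal sketch of the five MULTI-evolve steps on toy data.
# All helper logic is illustrative, not the paper's implementation.
import itertools
import random

random.seed(0)

def plm_ensemble_scores(positions):
    # Step 1 (stub): stand-in ensemble score for one mutation per position.
    return {f"A{p}G": random.gauss(0.0, 1.0) for p in positions}

def top_mutations(scores, k=15):
    # Step 2: keep the k highest-scoring single mutations.
    return sorted(scores, key=scores.get, reverse=True)[:k]

def measure_pairs(singles):
    # Step 3 (stub): "measure" fitness for every double mutant.
    return {frozenset(p): random.uniform(0.5, 4.0)
            for p in itertools.combinations(singles, 2)}

def design_multi_mutant(pair_fitness, order=5):
    # Steps 4-5 (stub): rank candidate combos by summed pairwise fitness,
    # standing in for the trained neural network.
    singles = sorted({m for key in pair_fitness for m in key})
    def score(combo):
        return sum(pair_fitness[frozenset(p)]
                   for p in itertools.combinations(combo, 2))
    return max(itertools.combinations(singles, order), key=score)

scores = plm_ensemble_scores(range(1, 40))
singles = top_mutations(scores)
pairs = measure_pairs(singles)
best = design_multi_mutant(pairs)
print(len(singles), len(pairs), len(best))
```

Note how 15 shortlisted singles yield only 105 double mutants to measure, in line with the ~100-200 pairwise budget above.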

Why Pairwise Epistasis?

Single mutations can improve function, but substantial gains require combining multiple mutations. The challenge: mutations don't combine linearly. Two mutations that each improve function 2× might produce 8× (synergy), roughly 4× (additivity), or even 1× (antagonism) when combined.

By systematically measuring all pairs of beneficial mutations, MULTI-evolve learns these interaction rules and can extrapolate to predict 5-, 6-, and 7-mutation combinations.
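One simple way to see how pairwise data supports extrapolation: under the additive convention used below (f_AB = f_A + f_B plus an epistasis term), a higher-order variant can be estimated from its singles plus all pairwise deviations. The numbers here are invented for illustration, not measurements from the paper:

```python
# Hypothetical example: extrapolate a triple mutant's fitness from
# singles and measured doubles, using the additive convention
# f_AB = f_A + f_B + eps_AB (illustrative numbers, not paper data).
singles = {"A": 2.0, "B": 1.5, "C": 1.2}                      # single fold-effects
pairs = {("A", "B"): 4.1, ("A", "C"): 2.9, ("B", "C"): 2.7}   # measured doubles

# Pairwise epistasis: deviation of each double from the additive expectation.
eps = {k: f_ab - sum(singles[m] for m in k) for k, f_ab in pairs.items()}

# First-order extrapolation for the triple: singles plus all pairwise deviations.
f_abc = sum(singles.values()) + sum(eps.values())
print(round(f_abc, 2))  # 5.0
```

The paper's neural network replaces this hand-built correction with a learned one, but the input signal is the same pairwise data.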

Combinatorial Explosion

For a protein of 100 amino acids, the mutation landscape spans 20^100 possible variants — more than atoms in the observable universe.

Traditional vs. MULTI-evolve

Epistatic Interaction Heatmap

Explore how pairwise mutations interact. Green = synergy, red = antagonism, amber = additive. Hover cells for details. Select a protein system below:

Epistatic score = observed − expected (additive); the color scale runs from antagonism (negative) to synergy (positive).

Epistasis Distribution

Synergy vs. Antagonism Frequency

What is Epistasis?

Epistasis describes the non-additive interaction between mutations. When two beneficial mutations combine, the result may be better (synergistic), worse (antagonistic), or exactly as expected (additive) compared to the sum of individual effects. MULTI-evolve's key insight is that pairwise epistatic patterns are sufficient to predict higher-order combinations.

For two mutations A and B with individual fitness effects f_A and f_B:

| Type | Condition | Interpretation |
|---|---|---|
| Synergy | f_AB > f_A + f_B | Mutations amplify each other |
| Additive | f_AB ≈ f_A + f_B | Independent contributions |
| Antagonism | f_AB < f_A + f_B | Mutations interfere with each other |
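The three conditions translate directly into code. The tolerance for "approximately additive" is an assumption for illustration, not a threshold from the paper:

```python
# Classify a double mutant per the table above. The tolerance `tol`
# for "approximately additive" is an illustrative assumption.
def classify(f_a, f_b, f_ab, tol=0.1):
    expected = f_a + f_b
    if f_ab > expected + tol:
        return "synergy"
    if f_ab < expected - tol:
        return "antagonism"
    return "additive"

print(classify(2.0, 2.0, 8.0))   # well above 4.0 -> synergy
print(classify(2.0, 2.0, 4.05))  # within tolerance -> additive
print(classify(2.0, 2.0, 1.0))   # below 4.0 -> antagonism
```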

Protein Language Model Ensemble

MULTI-evolve combines predictions from multiple protein language models to identify beneficial mutations. No single model finds all the hits — ensembling is critical. Across 73 protein datasets, the ensemble identified ~20 beneficial mutations on average, vs. ~11 from any single model.

Model Comparison: Mutations Identified

Scoring Methods

| Method | Description | Strength |
|---|---|---|
| Log-likelihood ratio | Direct probability of mutation vs. wild-type | Captures evolutionary conservation |
| Normalized scoring | Amino acid bias correction | Recovers hidden beneficial mutations (e.g., A134P in APEX) |
Key insight: Standard PLMs systematically penalize proline substitutions. Normalized scoring revealed A134P in APEX — a 53× activity boost that all standard methods missed.
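One simplified way to implement such a bias correction (an illustrative take on "normalized scoring", not the paper's exact procedure) is to subtract each amino acid's mean log-likelihood ratio across all positions, so a residue the model globally disfavors can still rank highly at positions where it stands out:

```python
# Simplified amino-acid bias normalization: subtract each amino acid's
# mean log-likelihood-ratio across positions, so globally penalized
# residues (e.g., proline) can still surface locally. Scores are invented.
from collections import defaultdict

def normalize(scores):
    """scores: {(position, aa): llr} -> bias-corrected scores."""
    by_aa = defaultdict(list)
    for (_, aa), s in scores.items():
        by_aa[aa].append(s)
    mean = {aa: sum(v) / len(v) for aa, v in by_aa.items()}
    return {k: s - mean[k[1]] for k, s in scores.items()}

raw = {(134, "P"): -1.0, (10, "P"): -3.0, (20, "P"): -3.2,
       (134, "G"): 0.1, (10, "G"): 0.0, (20, "G"): -0.1}
norm = normalize(raw)
best = max(norm, key=norm.get)
print(best)  # (134, 'P'): the least-penalized proline leads after correction
```

In the raw scores every proline looks bad; after correction, position 134's proline is the top-ranked mutation, mirroring how A134P surfaced.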

Ensemble Coverage

Each model captures different aspects of protein fitness. Combining them provides broader coverage:
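The coverage argument is just set union: each model recovers a different subset of hits, so the ensemble's union beats any single model. Apart from A134P, the mutation names below are invented for illustration:

```python
# Toy illustration of ensemble coverage: each model finds a different
# subset of beneficial mutations (names other than A134P are invented).
hits = {
    "ESM-2":       {"A134P", "K14N", "E112K", "T90S"},
    "ESM-1v":      {"K14N", "E112K", "D222G"},
    "ProteinMPNN": {"A134P", "D222G", "G50A"},
}
union = set().union(*hits.values())
print(max(len(s) for s in hits.values()), len(union))  # best single model vs. ensemble
```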

PLM Architecture Details

| Model | Type | Parameters | Training Data | Key Feature |
|---|---|---|---|---|
| ESM-2 | Sequence | 650M | UniRef50 | Evolutionary conservation |
| ESM-1v | Sequence | 650M | UniRef90 | Variant effect prediction |
| ProtTrans-T5 | Sequence | 3B | BFD+UniRef50 | Transfer learning |
| SaProt | Structure-aware | 650M | AF2 + UniRef50 | 3D context + sequence |
| ESM-IF | Inverse folding | 142M | CATH structures | Structure → sequence |
| ProteinMPNN | Inverse folding | 1.7M | PDB structures | Design-oriented |

Combinatorial Fitness Predictor

Select beneficial mutations to combine and predict the multi-mutant fitness. The model learns from pairwise epistatic interactions to extrapolate higher-order effects.

Select Mutations to Combine

Click mutations to toggle selection. Predicted fitness updates in real time.

1.0×
Predicted fold-improvement over wild-type

Individual Mutation Effects

Predicted vs. Measured (Benchmark)

Directed Evolution Simulator

Compare traditional iterative directed evolution against the MULTI-evolve approach. Adjust the parameters below and watch how each strategy explores the fitness landscape.
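The bookkeeping behind the two strategies can be sketched with back-of-the-envelope counts. The library sizes and round structure here are illustrative assumptions, not the simulator's actual parameters:

```python
# Toy comparison of experimental effort: iterative directed evolution
# (one beneficial mutation fixed per round) vs. MULTI-evolve's single
# pairwise-measurement round. All numbers are illustrative assumptions.
def traditional(target_mutations=7, library_per_round=1000):
    rounds = target_mutations              # one mutation fixed per round
    return rounds, rounds * library_per_round

def multi_evolve(n_singles=15, n_final=10):
    pairwise = n_singles * (n_singles - 1) // 2   # all double mutants
    return 1, n_singles + pairwise + n_final      # one experimental round

t_rounds, t_variants = traditional()
m_rounds, m_variants = multi_evolve()
print(t_rounds, t_variants)   # 7 7000
print(m_rounds, m_variants)   # 1 130
```

Even with generous assumptions for the traditional campaign, the one-round pairwise design measures orders of magnitude fewer variants.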


Fitness Over Rounds

Variants Tested (Cumulative)

MULTI-evolve Case Studies

Three protein engineering campaigns demonstrated the framework's power across diverse proteins.

APEX Peroxidase

Target: Enhanced peroxidase activity for proximity labeling in cells

Starting point: Wild-type APEX

Mutations found: A134P (53× alone), combined with 6 others

Best variant: 7-mutation APEX achieving 256× over wild-type, 4.8× over previously optimized APEX2

Key discovery: A134P was invisible to standard PLMs due to proline bias. Only normalized ensemble scoring revealed it.

| Variant | Mutations | Fold Improvement |
|---|---|---|
| Wild-type | (none) | 1.0× |
| APEX2 (prior art) | A134P | 53× |
| MULTI-evolve best | 7 mutations | 256× |

dCasRx for RNA Trans-Splicing

Target: RNA-guided trans-splicing for therapeutic RNA editing

Approach: Started from deep mutational scan of 11,000+ variants, extracted beneficial mutations, tested pairwise combinations

Best variant: 9.8× improvement in trans-splicing efficiency

Significance: Demonstrates MULTI-evolve can work with DMS pre-screening data

Anti-CD122 Antibody

Target: Therapeutic antibody targeting IL-2 receptor β chain (CD122)

Dual optimization: Simultaneously improved binding affinity AND expression

Best variants:

  • 2.7× binding improvement (to 1.0 nM KD)
  • 6.5× expression increase

Clinical relevance: Multi-objective protein optimization is critical for therapeutic development

Cross-Campaign Comparison

Bibliography & Sources

Primary research and key references underpinning the MULTI-evolve framework.

Rapid directed evolution guided by protein language models and epistatic interactions
Tran VQ, Nemeth M, Bartie LJ, et al. Science (2026)
Language models enable zero-shot prediction of the effects of mutations on protein function
Meier J, Rao R, Verkuil R, et al. NeurIPS (2021)
Evolutionary-scale prediction of atomic-level protein structure with a language model
Lin Z, Akin H, Rao R, et al. Science 379 (2023)
SaProt: Protein Language Modeling with Structure-aware Vocabulary
Su J, Han C, Zhou Y, et al. ICLR (2024)
Robust deep learning–based protein sequence design using ProteinMPNN
Dauparas J, Anishchenko I, Bennett N, et al. Science 378 (2022)
Directed evolution of APEX2 for electron microscopy and proximity labeling
Lam SS, Martell JD, Kamer KJ, et al. Nature Methods 12 (2015)
Transcriptome engineering with RNA-targeting type VI-D CRISPR effectors
Konermann S, Lotfy P, Brideau NJ, et al. Cell 173 (2018)
Deep models of protein evolution in time generate realistic evolutionary trajectories
PEINT framework. bioRxiv (2026)
From Prompt to Drug: Toward Pharmaceutical Superintelligence
Zhavoronkov A, et al. ACS Central Science (2026)

Open-Source Resources

| Resource | Description | Link |
|---|---|---|
| MULTI-evolve code | PLM predictions, neural network training, oligo design tools | GitHub |
| ESM-2 | Protein language model (650M params) | GitHub |
| ProteinMPNN | Structure-based sequence design | GitHub |
| SaProt | Structure-aware protein language model | GitHub |
Built by Q · Data from Tran et al., Science (2026) & Arc Institute · Not for clinical use