MULTI-evolve is a machine-learning-guided protein evolution framework that compresses months of iterative directed evolution into weeks. By training neural networks on ~200 strategically selected pairwise mutant measurements, it predicts which higher-order mutation combinations will produce hyperactive protein variants — without exhaustive screening.
From wild-type protein to hyperactive multi-mutant in five steps:
Single mutations can improve function, but substantial gains require combining multiple mutations. The challenge: mutations don't combine linearly. Two mutations that each improve function 2× (a gain of 1× over wild type each) might produce 8× (synergy), 3× (additivity, the two gains simply summed), or even 1× (antagonism) when combined.
By systematically measuring all pairs of beneficial mutations, MULTI-evolve learns these interaction rules and can extrapolate to predict 5-, 6-, and 7-mutation combinations.
For a protein of 100 amino acids, the mutation landscape spans 20^100 (about 10^130) possible sequences, far more than the roughly 10^80 atoms in the observable universe.
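The scale of the problem can be checked with a few lines of arithmetic. This is a minimal sketch: the pool size `n_beneficial = 20` is an illustrative assumption, and it further assumes all beneficial mutations sit at distinct positions so that any subset can be combined.

```python
from math import comb, log10

SEQ_LEN = 100   # protein length used in the text
ALPHABET = 20   # standard amino acids

# Full sequence space: 20^100 (Python integers are arbitrary precision).
n_sequences = ALPHABET ** SEQ_LEN
log10_sequences = SEQ_LEN * log10(ALPHABET)

# Assumed pool of beneficial single mutations (illustrative value only).
n_beneficial = 20
n_pairs = comb(n_beneficial, 2)  # size of an exhaustive pairwise library
n_up_to_7 = sum(comb(n_beneficial, k) for k in range(2, 8))

print(f"sequence space ~ 10^{log10_sequences:.0f}")
print(f"pairs among {n_beneficial} mutations: {n_pairs}")
print(f"combinations of 2 to 7 mutations: {n_up_to_7}")
```

Under these assumptions the pairwise library has 190 members, in the same range as the ~200 measurements mentioned above, while the set of 2- to 7-mutation combinations is several hundred times larger.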
Interactive heatmap of pairwise mutation interactions. Color key: green = synergy, red = antagonism, amber = additive.
For two mutations A and B with individual fitness effects fA and fB, and a double-mutant effect fAB:
| Type | Condition | Interpretation |
|---|---|---|
| Synergy | fAB > fA + fB | Mutations amplify each other |
| Additive | fAB ≈ fA + fB | Independent contributions |
| Antagonism | fAB < fA + fB | Mutations interfere with each other |
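The table can be turned into a small classifier. The sketch below assumes effects are expressed as gains over wild type (so a 2× variant has an effect of 1), and it introduces a hypothetical relative tolerance `tol` to stand in for measurement noise; neither detail is specified in the text.

```python
def classify_epistasis(f_a, f_b, f_ab, tol=0.1):
    """Label a pair by comparing f_ab with the additive expectation f_a + f_b.

    All effects are gains relative to wild type (assumption).
    `tol` is a hypothetical relative tolerance for calling a pair additive.
    """
    expected = f_a + f_b
    margin = tol * max(abs(expected), 1e-12)
    if f_ab > expected + margin:
        return "synergy"
    if f_ab < expected - margin:
        return "antagonism"
    return "additive"


# Two mutations that are each 2x (gain of 1 over wild type):
for combined_fold in (8.0, 3.0, 1.0):
    gain = combined_fold - 1.0
    print(f"{combined_fold}x ->", classify_epistasis(1.0, 1.0, gain))
```

With real data the tolerance would more naturally come from replicate variance than from a fixed fraction.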
MULTI-evolve combines predictions from multiple protein language models to identify beneficial mutations. No single model finds all the hits — ensembling is critical. Across 73 protein datasets, the ensemble identified ~20 beneficial mutations on average, vs. ~11 from any single model.
| Method | Description | Strength |
|---|---|---|
| Log-likelihood ratio | Direct probability of mutation vs. wild-type | Captures evolutionary conservation |
| Normalized scoring | Amino acid bias correction | Recovers hidden beneficial mutations (e.g., A134P in APEX) |
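A minimal sketch of the two scoring methods, assuming a model has already produced per-position log-probabilities as an `(L, 20)` array. The log-likelihood ratio follows the description in the table. The normalization shown here (subtracting each amino acid's mean score across positions) is only one plausible reading of "amino acid bias correction" and is an assumption, not the documented method. The input data are random toy values.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def llr_scores(log_probs, wild_type):
    """log p(mutant) - log p(wild type) for every position and amino acid."""
    wt_idx = np.array([AA_INDEX[aa] for aa in wild_type])
    wt_lp = log_probs[np.arange(len(wild_type)), wt_idx]
    return log_probs - wt_lp[:, None]


def normalized_scores(log_probs, wild_type):
    """ASSUMED bias correction: remove each amino acid's mean LLR over positions."""
    llr = llr_scores(log_probs, wild_type)
    return llr - llr.mean(axis=0, keepdims=True)


# Toy log-probabilities from a made-up model that systematically disfavors proline.
rng = np.random.default_rng(0)
toy_wt = "MKTAYIAKQR"
logits = rng.normal(size=(len(toy_wt), len(AMINO_ACIDS)))
logits[:, AA_INDEX["P"]] -= 3.0
toy_log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))

llr = llr_scores(toy_log_probs, toy_wt)
norm = normalized_scores(toy_log_probs, toy_wt)
p = AA_INDEX["P"]
print("mean proline LLR:", llr[:, p].mean())
print("mean proline normalized score:", norm[:, p].mean())
```

In the toy data, every proline substitution looks unfavorable under the raw ratio simply because the model dislikes proline everywhere; after the per-residue correction, proline substitutions compete on equal footing with the others.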
Each model captures different aspects of protein fitness. Combining them provides broader coverage:
| Model | Type | Parameters | Training Data | Key Feature |
|---|---|---|---|---|
| ESM-2 | Sequence | 650M | UniRef50 | Evolutionary conservation |
| ESM-1v | Sequence | 650M | UniRef90 | Variant effect prediction |
| ProtTrans-T5 | Sequence | 3B | BFD+UniRef50 | Transfer learning |
| SaProt | Structure-aware | 650M | AF2 + UniRef50 | 3D context + sequence |
| ESM-IF | Inverse folding | 142M | CATH structures | Structure → sequence |
| ProteinMPNN | Inverse folding | 1.7M | PDB structures | Design-oriented |
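One simple way to combine models is to let each nominate its top-scoring mutations and take the union. The text does not say how predictions are actually merged, so the union-of-top-k rule below is an assumption, and the model names, mutation names, and scores are made up for illustration.

```python
def top_k(scores, k):
    """Return the k highest-scoring mutation names from a {mutation: score} dict."""
    return set(sorted(scores, key=scores.get, reverse=True)[:k])


def ensemble_nominations(model_scores, k):
    """Union of each model's top-k picks, recording which models voted for each."""
    votes = {}
    for model_name, scores in model_scores.items():
        for mutation in top_k(scores, k):
            votes.setdefault(mutation, []).append(model_name)
    return votes


# Hypothetical scores from two placeholder models.
toy_scores = {
    "model_a": {"m1": 2.0, "m2": 1.5, "m3": 0.1, "m4": -1.0},
    "model_b": {"m1": 1.9, "m2": -0.5, "m3": 1.8, "m4": 1.1},
}

votes = ensemble_nominations(toy_scores, k=2)
for mutation in sorted(votes):
    print(mutation, "nominated by", votes[mutation])
```

Each toy model nominates two mutations, but the union contains three, which mirrors the broader coverage an ensemble provides over any single model.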
The model learns from pairwise epistatic interactions among beneficial mutations and extrapolates to higher-order effects, predicting the fitness of a chosen multi-mutant combination.
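The idea of extrapolating from pairs can be illustrated with a much simpler stand-in for the neural network: a linear regression with one term per mutation and one term per pair, fitted by least squares. This is a swapped-in technique for illustration, not the framework's model, and it assumes interactions beyond pairs are zero. All fitness values are synthetic.

```python
from itertools import combinations
import numpy as np


def featurize(variant, n_mut):
    """Intercept, one indicator per mutation, and one indicator per mutation pair."""
    x = np.zeros(n_mut)
    for i in variant:
        x[i] = 1.0
    pairs = [x[i] * x[j] for i, j in combinations(range(n_mut), 2)]
    return np.concatenate([[1.0], x, pairs])


def fit_pairwise_model(variants, fitness, n_mut):
    X = np.array([featurize(v, n_mut) for v in variants])
    coef, *_ = np.linalg.lstsq(X, np.asarray(fitness), rcond=None)
    return coef


def predict(coef, variant, n_mut):
    return float(featurize(variant, n_mut) @ coef)


# Synthetic ground truth: single effects plus a few pairwise interactions.
N_MUT = 4
single = [1.0, 0.5, 2.0, 0.8]
pair = {(0, 1): 0.5, (0, 2): -0.7, (1, 3): 1.2}


def true_fitness(variant):
    total = sum(single[i] for i in variant)
    total += sum(pair.get(p, 0.0) for p in combinations(sorted(variant), 2))
    return total


# Train on wild type, all singles, and all pairs only.
train = [()] + [(i,) for i in range(N_MUT)] + list(combinations(range(N_MUT), 2))
coef = fit_pairwise_model(train, [true_fitness(v) for v in train], N_MUT)

quad = (0, 1, 2, 3)
print("predicted 4-mutant effect:", predict(coef, quad, N_MUT))
print("true 4-mutant effect:", true_fitness(quad))
```

Because the synthetic landscape contains no third-order terms, the fit recovers the 4-mutant value from single and pairwise data alone; a real landscape would not be this clean.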
Interactive simulation: traditional iterative directed evolution compared with the MULTI-evolve approach, showing how each strategy explores the fitness landscape.
Three protein engineering campaigns demonstrated the framework's power across diverse proteins.
Target: Enhanced peroxidase activity for proximity labeling in cells
Starting point: Wild-type APEX
Mutations found: A134P (53× on its own), combined with 6 others
Best variant: 7-mutation APEX achieving 256× over wild-type, 4.8× over previously optimized APEX2
Key discovery: A134P was invisible to standard PLMs due to proline bias. Only normalized ensemble scoring revealed it.
| Variant | Mutations | Fold Improvement |
|---|---|---|
| Wild-type | — | 1.0× |
| APEX2 (prior art) | A134P | 53× |
| MULTI-evolve best | 7 mutations | 256× |
Target: RNA-guided trans-splicing for therapeutic RNA editing
Approach: Started from a deep mutational scan of 11,000+ variants, extracted beneficial mutations, and tested pairwise combinations
Best variant: 9.8× improvement in trans-splicing efficiency
Significance: Demonstrates that MULTI-evolve can work from deep mutational scanning (DMS) pre-screening data
Target: Therapeutic antibody targeting IL-2 receptor β chain (CD122)
Dual optimization: Simultaneously improved binding affinity and expression
Best variants:
Clinical relevance: Multi-objective protein optimization is critical for therapeutic development