MULTI-evolve Explorer

Interactive Protein Evolution Playground
Based on Tran et al., Science (2026) · Arc Institute · Patrick Hsu Lab

MULTI-evolve is a machine-learning-guided protein evolution framework that compresses months of iterative directed evolution into weeks. By training neural networks on ~200 strategically selected pairwise mutant measurements, it predicts which higher-order mutation combinations will produce hyperactive protein variants — without exhaustive screening.

~200
Variants needed for training
256×
Max improvement (APEX)
1 round
Experimental cycles needed

The MULTI-evolve Pipeline

From wild-type protein to hyperactive multi-mutant in five steps:

🧬
PLM Ensemble
Score all single mutations via multiple protein language models
🎯
Top Mutations
Select ~15-20 beneficial single mutations
🔬
Pairwise Testing
Measure all ~100-200 double-mutant combinations
🤖
Neural Network
Train on epistatic data to predict higher-order combos
🚀
Multi-Mutants
Design & test top predicted 5-7 mutation variants
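The five steps above can be sketched end-to-end on toy data. Everything here is illustrative stubs (random scores, a summed-pairwise ranking standing in for the trained neural network), not the paper's implementation:

```python
# Minimal sketch of the five MULTI-evolve steps on toy data.
# All helper logic is illustrative, not the paper's implementation.
import itertools
import random

random.seed(0)

def plm_ensemble_scores(positions):
    # Step 1 (stub): stand-in ensemble score for one mutation per position.
    return {f"A{p}G": random.gauss(0.0, 1.0) for p in positions}

def top_mutations(scores, k=15):
    # Step 2: keep the k highest-scoring single mutations.
    return sorted(scores, key=scores.get, reverse=True)[:k]

def measure_pairs(singles):
    # Step 3 (stub): "measure" fitness for every double mutant.
    return {frozenset(p): random.uniform(0.5, 4.0)
            for p in itertools.combinations(singles, 2)}

def design_multi_mutant(pair_fitness, order=5):
    # Steps 4-5 (stub): rank candidate combos by summed pairwise fitness,
    # standing in for the trained neural network.
    singles = sorted({m for key in pair_fitness for m in key})
    def score(combo):
        return sum(pair_fitness[frozenset(p)]
                   for p in itertools.combinations(combo, 2))
    return max(itertools.combinations(singles, order), key=score)

scores = plm_ensemble_scores(range(1, 40))
singles = top_mutations(scores)
pairs = measure_pairs(singles)
best = design_multi_mutant(pairs)
print(len(singles), len(pairs), len(best))
```

Note how 15 shortlisted singles yield only 105 double mutants to measure, in line with the ~100-200 pairwise budget above.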

Why Pairwise Epistasis?

Single mutations can improve function, but substantial gains require combining multiple mutations. The challenge: mutations don't combine linearly. Two mutations that each improve function 2× might produce 8× (synergy), roughly 4× (additivity), or even 1× (antagonism) when combined.

By systematically measuring all pairs of beneficial mutations, MULTI-evolve learns these interaction rules and can extrapolate to predict 5-, 6-, and 7-mutation combinations.
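One simple way to see how pairwise data supports extrapolation: under the additive convention used below (f_AB = f_A + f_B plus an epistasis term), a higher-order variant can be estimated from its singles plus all pairwise deviations. The numbers here are invented for illustration, not measurements from the paper:

```python
# Hypothetical example: extrapolate a triple mutant's fitness from
# singles and measured doubles, using the additive convention
# f_AB = f_A + f_B + eps_AB (illustrative numbers, not paper data).
singles = {"A": 2.0, "B": 1.5, "C": 1.2}                      # single fold-effects
pairs = {("A", "B"): 4.1, ("A", "C"): 2.9, ("B", "C"): 2.7}   # measured doubles

# Pairwise epistasis: deviation of each double from the additive expectation.
eps = {k: f_ab - sum(singles[m] for m in k) for k, f_ab in pairs.items()}

# First-order extrapolation for the triple: singles plus all pairwise deviations.
f_abc = sum(singles.values()) + sum(eps.values())
print(round(f_abc, 2))  # 5.0
```

The paper's neural network replaces this hand-built correction with a learned one, but the input signal is the same pairwise data.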

Combinatorial Explosion

For a protein of 100 amino acids, the mutation landscape spans 20^100 possible variants — more than atoms in the observable universe.

Traditional vs. MULTI-evolve

Epistatic Interaction Heatmap

Explore how pairwise mutations interact. Green = synergy, red = antagonism, amber = additive. Hover cells for details. Select a protein system below:

Epistatic score = observed − expected (additive); the color scale runs from antagonism (negative) to synergy (positive).

Epistasis Distribution

Synergy vs. Antagonism Frequency

What is Epistasis?

Epistasis describes the non-additive interaction between mutations. When two beneficial mutations combine, the result may be better (synergistic), worse (antagonistic), or exactly as expected (additive) compared to the sum of individual effects. MULTI-evolve's key insight is that pairwise epistatic patterns are sufficient to predict higher-order combinations.

For two mutations A and B with individual fitness effects f_A and f_B:

| Type | Condition | Interpretation |
|---|---|---|
| Synergy | f_AB > f_A + f_B | Mutations amplify each other |
| Additive | f_AB ≈ f_A + f_B | Independent contributions |
| Antagonism | f_AB < f_A + f_B | Mutations interfere with each other |
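The three conditions translate directly into code. The tolerance for "approximately additive" is an assumption for illustration, not a threshold from the paper:

```python
# Classify a double mutant per the table above. The tolerance `tol`
# for "approximately additive" is an illustrative assumption.
def classify(f_a, f_b, f_ab, tol=0.1):
    expected = f_a + f_b
    if f_ab > expected + tol:
        return "synergy"
    if f_ab < expected - tol:
        return "antagonism"
    return "additive"

print(classify(2.0, 2.0, 8.0))   # well above 4.0 -> synergy
print(classify(2.0, 2.0, 4.05))  # within tolerance -> additive
print(classify(2.0, 2.0, 1.0))   # below 4.0 -> antagonism
```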

Protein Language Model Ensemble

MULTI-evolve combines predictions from multiple protein language models to identify beneficial mutations. No single model finds all the hits — ensembling is critical. Across 73 protein datasets, the ensemble identified ~20 beneficial mutations on average, vs. ~11 from any single model.

Model Comparison: Mutations Identified

Scoring Methods

| Method | Description | Strength |
|---|---|---|
| Log-likelihood ratio | Direct probability of mutation vs. wild-type | Captures evolutionary conservation |
| Normalized scoring | Amino acid bias correction | Recovers hidden beneficial mutations (e.g., A134P in APEX) |
Key insight: Standard PLMs systematically penalize proline substitutions. Normalized scoring revealed A134P in APEX — a 53× activity boost that all standard methods missed.
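One simplified way to implement such a bias correction (an illustrative take on "normalized scoring", not the paper's exact procedure) is to subtract each amino acid's mean log-likelihood ratio across all positions, so a residue the model globally disfavors can still rank highly at positions where it stands out:

```python
# Simplified amino-acid bias normalization: subtract each amino acid's
# mean log-likelihood-ratio across positions, so globally penalized
# residues (e.g., proline) can still surface locally. Scores are invented.
from collections import defaultdict

def normalize(scores):
    """scores: {(position, aa): llr} -> bias-corrected scores."""
    by_aa = defaultdict(list)
    for (_, aa), s in scores.items():
        by_aa[aa].append(s)
    mean = {aa: sum(v) / len(v) for aa, v in by_aa.items()}
    return {k: s - mean[k[1]] for k, s in scores.items()}

raw = {(134, "P"): -1.0, (10, "P"): -3.0, (20, "P"): -3.2,
       (134, "G"): 0.1, (10, "G"): 0.0, (20, "G"): -0.1}
norm = normalize(raw)
best = max(norm, key=norm.get)
print(best)  # (134, 'P'): the least-penalized proline leads after correction
```

In the raw scores every proline looks bad; after correction, position 134's proline is the top-ranked mutation, mirroring how A134P surfaced.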

Ensemble Coverage

Each model captures different aspects of protein fitness. Combining them provides broader coverage:
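The coverage argument is just set union: each model recovers a different subset of hits, so the ensemble's union beats any single model. Apart from A134P, the mutation names below are invented for illustration:

```python
# Toy illustration of ensemble coverage: each model finds a different
# subset of beneficial mutations (names other than A134P are invented).
hits = {
    "ESM-2":       {"A134P", "K14N", "E112K", "T90S"},
    "ESM-1v":      {"K14N", "E112K", "D222G"},
    "ProteinMPNN": {"A134P", "D222G", "G50A"},
}
union = set().union(*hits.values())
print(max(len(s) for s in hits.values()), len(union))  # best single model vs. ensemble
```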

PLM Architecture Details

| Model | Type | Parameters | Training Data | Key Feature |
|---|---|---|---|---|
| ESM-2 | Sequence | 650M | UniRef50 | Evolutionary conservation |
| ESM-1v | Sequence | 650M | UniRef90 | Variant effect prediction |
| ProtTrans-T5 | Sequence | 3B | BFD+UniRef50 | Transfer learning |
| SaProt | Structure-aware | 650M | AF2 + UniRef50 | 3D context + sequence |
| ESM-IF | Inverse folding | 142M | CATH structures | Structure → sequence |
| ProteinMPNN | Inverse folding | 1.7M | PDB structures | Design-oriented |

Combinatorial Fitness Predictor

Select beneficial mutations to combine and predict the multi-mutant fitness. The model learns from pairwise epistatic interactions to extrapolate higher-order effects.

Select Mutations to Combine

Click mutations to toggle selection. Predicted fitness updates in real time.

1.0×
Predicted fold-improvement over wild-type

Individual Mutation Effects

Predicted vs. Measured (Benchmark)

Directed Evolution Simulator

Compare traditional iterative directed evolution against the MULTI-evolve approach. Adjust the parameters below and watch how each strategy explores the fitness landscape.
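The bookkeeping behind the two strategies can be sketched with back-of-the-envelope counts. The library sizes and round structure here are illustrative assumptions, not the simulator's actual parameters:

```python
# Toy comparison of experimental effort: iterative directed evolution
# (one beneficial mutation fixed per round) vs. MULTI-evolve's single
# pairwise-measurement round. All numbers are illustrative assumptions.
def traditional(target_mutations=7, library_per_round=1000):
    rounds = target_mutations              # one mutation fixed per round
    return rounds, rounds * library_per_round

def multi_evolve(n_singles=15, n_final=10):
    pairwise = n_singles * (n_singles - 1) // 2   # all double mutants
    return 1, n_singles + pairwise + n_final      # one experimental round

t_rounds, t_variants = traditional()
m_rounds, m_variants = multi_evolve()
print(t_rounds, t_variants)   # 7 7000
print(m_rounds, m_variants)   # 1 130
```

Even with generous assumptions for the traditional campaign, the one-round pairwise design measures orders of magnitude fewer variants.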


Fitness Over Rounds

Variants Tested (Cumulative)

MULTI-evolve Case Studies

Three protein engineering campaigns demonstrated the framework's power across diverse proteins.

APEX Peroxidase

Target: Enhanced peroxidase activity for proximity labeling in cells

Starting point: Wild-type APEX

Mutations found: A134P (53× alone), combined with 6 others

Best variant: 7-mutation APEX achieving 256× over wild-type, 4.8× over previously optimized APEX2

Key discovery: A134P was invisible to standard PLMs due to proline bias. Only normalized ensemble scoring revealed it.

| Variant | Mutations | Fold Improvement |
|---|---|---|
| Wild-type | (none) | 1.0× |
| APEX2 (prior art) | A134P | 53× |
| MULTI-evolve best | 7 mutations | 256× |

dCasRx for RNA Trans-Splicing

Target: RNA-guided trans-splicing for therapeutic RNA editing

Approach: Started from deep mutational scan of 11,000+ variants, extracted beneficial mutations, tested pairwise combinations

Best variant: 9.8× improvement in trans-splicing efficiency

Significance: Demonstrates MULTI-evolve can work with DMS pre-screening data

Anti-CD122 Antibody

Target: Therapeutic antibody targeting IL-2 receptor β chain (CD122)

Dual optimization: Simultaneously improved binding affinity AND expression

Best variants:

  • 2.7× binding improvement (to 1.0 nM KD)
  • 6.5× expression increase

Clinical relevance: Multi-objective protein optimization is critical for therapeutic development

Cross-Campaign Comparison

Bibliography & Sources

Primary research and key references underpinning the MULTI-evolve framework.

Rapid directed evolution guided by protein language models and epistatic interactions
Tran VQ, Nemeth M, Bartie LJ, et al. Science (2026)
Language models enable zero-shot prediction of the effects of mutations on protein function
Meier J, Rao R, Verkuil R, et al. NeurIPS (2021)
Evolutionary-scale prediction of atomic-level protein structure with a language model
Lin Z, Akin H, Rao R, et al. Science 379 (2023)
SaProt: Protein Language Modeling with Structure-aware Vocabulary
Su J, Han C, Zhou Y, et al. ICLR (2024)
Robust deep learning–based protein sequence design using ProteinMPNN
Dauparas J, Anishchenko I, Bennett N, et al. Science 378 (2022)
Directed evolution of APEX2 for electron microscopy and proximity labeling
Lam SS, Martell JD, Kamer KJ, et al. Nature Methods 12 (2015)
Transcriptome engineering with RNA-targeting type VI-D CRISPR effectors
Konermann S, Lotfy P, Brideau NJ, et al. Cell 173 (2018)
Deep models of protein evolution in time generate realistic evolutionary trajectories
PEINT framework. bioRxiv (2026)
From Prompt to Drug: Toward Pharmaceutical Superintelligence
Zhavoronkov A, et al. ACS Central Science (2026)

Open-Source Resources

| Resource | Description | Link |
|---|---|---|
| MULTI-evolve code | PLM predictions, neural network training, oligo design tools | GitHub |
| ESM-2 | Protein language model (650M params) | GitHub |
| ProteinMPNN | Structure-based sequence design | GitHub |
| SaProt | Structure-aware protein language model | GitHub |
Built by Q · Data from Tran et al., Science (2026) & Arc Institute · Not for clinical use