Computational Antibody Papers

Filter by tags

All

Filter by published year

All

TitleKey points

2025-07-03
Better antibodies engineered with a GLIMPSE of human data
- language models
- A novel antibody-specific language model, trained on paired human antibody data, and explicitly designed for practical antibody engineering applications.
- The model was trained on a carefully curated dataset of productive, paired sequences, prioritizing biological fidelity over sheer volume or data heterogeneity.
- It uses a masked language modelling (MLM) objective. The initial version was based on RoBERTa, while later versions introduced custom architectural modifications tailored to antibody sequences.
- The model was benchmarked on recapitulating clinical humanization decisions and outperformed prior models such as Sapiens and AntiBERTa.
- It was applied to redesign an existing therapeutic antibody, generating variants with retained or improved affinity, reduced predicted liabilities, and confirmed in vitro performance, including CHO expression and binding assays.
2025-07-03
Mask-prior-guided denoising diffusion improves inverse protein folding
- protein design
- non-antibody stuff
- Novel inverse folding algorithm based on a discrete diffusion framework.
- Unlike earlier methods that focused on masked language modeling (MLM) (e.g., LM-Design) or autoregressive sequence generation (e.g., ProteinMPNN), this work introduces a discrete denoising diffusion model (MapDiff) to iteratively refine protein sequences toward the native sequence. The method incorporates an IPA-based refinement step that selectively re-predicts low-confidence residues.
- Structural input is limited to the protein backbone only, represented as residue-level graphs. All-atom information is not used for either masked or unmasked residues.
- On the CATH 4.2 full test set, their method achieves the best sequence recovery rate of 61.03%, outperforming baselines such as: ProteinMPNN: 48.63% PiFold: 51.40% LM-Design: 53.19% GRADE-IF: 52.63%
- MapDiff also achieves the lowest perplexity (3.46) across models.
2025-07-03
Antibody Design using Chai-2
- generative methods
- binding prediction
- protein design
- Introduced a novel model, Chai-2, that shows over 100× improvement in de novo antibody design success rates compared to prior methods.
- The model is prompted with the structure of the target, epitope residues, and desired antibody format (e.g., scFv or VHH).
- Benchmarking was performed on 52 antigens that had no known antibodies in the PDB, ensuring evaluation on novel, unbiased targets.
- Generated antibodies were structurally and sequentially dissimilar to any known antibodies, indicating that Chai-2 designs novel binders, not memorized ones.
- For VHH (nanobody) formats, the model achieved an experimental hit rate of 20%, validated in a single experimental round.
2025-07-03
Developing drug-like single-domain antibodies (VHH) from in vitro libraries
- experimental techniques
- ngs
- Novel library design technique for VHHs that produces developable and humanized antibodies without the need for further optimization.
- The authors built a humanized VHH phage display library using four therapeutic VHH scaffolds, incorporating CDR1 and CDR2 sequences from human VH3 germline genes (filtered for sequence liabilities) and highly diverse CDR3s from CD19⁺ IgM⁺ human B cells.
- CDR1 and CDR2 libraries were filtered via yeast display for proper folding and protein A binding, while CDR3s were refined to remove poly-tyrosine stretches to reduce polyreactivity.
- An improved library version incorporated CDR1/2 variants selected for heat tolerance and further depleted CDR3s with poly-tyrosine motifs, increasing stability and developability.
- VHHs were tested for expression, thermal stability, aggregation, hydrophobicity, and polyreactivity, showing that the V2 library yielded a higher proportion of drug-like antibodies with favorable biophysical properties.
2025-06-24
AbEpiTope-1.0: Improved antibody target prediction byuse of AlphaFold and inverse folding
- binding prediction
- structure prediction
- A novel method that repurposes AlphaFold-2.3 structure predictions and combines them with inverse folding–based machine learning models to assess antibody-antigen binding accuracy and specificity.
- They generate antibody-antigen complex models using AlphaFold-2.3 and evaluate them using the 'AbAgIoU' metric, which measures the overlap between predicted and true epitope/paratope residues — penalizing both missing and extra contacts.
- They demonstrate that the learned scores can distinguish true from incorrect antibody-antigen pairings (including swapped antibody scenarios), significantly outperforming random baselines.
- The method relies only on antibody and antigen sequences as input, using AlphaFold to model structures — making it applicable in real-world settings where experimental structures are unavailable.
2025-06-24
REPURPOSING ALPHAFOLD3-LIKE PROTEIN FOLDING MODELS FOR ANTIBODY SEQUENCE AND STRUCTURE CO-DESIGN
- generative methods
- protein design
- Novel method to design antibodies based on boltz-1.
- They added a sequence head to boltz-1 to perform simultaneous sequence/structure co-design.
- They employed data from SAbDab to fine-tune boltz-1 on antibody-antigen complexes.
- They compared to dyMEAN and DiffAB looking at amino acid recovery, RMSD and Rosetta InterfaceAnalyzer energy - their model does better on these computational benchmarks.
2025-06-24
Benchmark for Antibody Binding Affinity Maturation and Design
- binding prediction
- language models
- generative methods
- databases
- Benchmark of machine learning models for antibody-antigen binding affinity.
- A curated dataset of over 150,000 antibody-antigen complexes with associated experimental affinity values is compiled from literature.
- The benchmark compares a wide range of model types: language models, inverse folding models, graph-based, and diffusion-based generative models.
- Inverse folding models that are globally structure-aware perform best.
- General protein models like ESM-IF and ProteinMPNN outperform antibody-specific models such as AntiFold, DiffAb, and dyMEAN.
- Surprisingly, ESM-3 underperforms relative to ESM-IF, despite incorporating structural signals and improving upon earlier ESM models.
2025-06-24
NanoBinder: a machine learning assisted nanobody binding prediction tool using Rosetta energy scores
- binding prediction
- nanobodies
- Introduced a novel machine learning method (NanoBinder) to predict the binding probability of nanobody-antigen structural complexes.
- Positive (binding) complexes were sourced from the SAbDab database, which contains experimentally validated nanobody-antigen interactions.
- Negative (non-binding) complexes were generated by structurally aligning nanobodies from different binding complexes (with RMSD < 2 Å) and recombining them with unrelated antigens to create likely non-binding pairs.
- Extracted Rosetta energy features from each complex and trained several machine learning models, including Random Forests, SVMs, AdaBoost, and Decision Trees, to classify binders vs. non-binders. Random Forests showed the best performance.
- They selected antibodies with known antigen targets (e.g., IL-6) and grafted their CDRs onto nanobody scaffolds using Rosetta-based protocols. The resulting nanobody-antigen complexes were evaluated in silico using NanoBinder, and selected candidates were experimentally validated. The predictions showed good correlation with binding outcomes, particularly for identifying non-binders.
2025-06-05
Learning the language of protein-protein interactions
- language models
- binding prediction
- Novel LLM (MINT) that natively encapsulates protein protein interactions.
- MINT (Multimeric INteraction Transformer) extends the ESM-2 protein language model by incorporating a cross-chain attention mechanism. This allows it to process multiple protein sequences simultaneously while preserving inter-sequence relationships and contextual information critical for modeling protein-protein interactions.
- MINT was trained on a large, curated subset of the STRING database, consisting of 96 million high-quality physical protein-protein interactions and 16.4 million unique protein sequences. The training employed a masked language modeling objective adapted for multimeric inputs.
- MINT was benchmarked on several general protein interaction tasks including binary interaction classification, binding affinity prediction (PDB-Bind), and mutational impact prediction (e.g., SKEMPI and MutationalPPI). It consistently outperformed existing PLMs, achieving state-of-the-art performance on multiple datasets such as a 29% improvement over baselines in SKEMPI.
- MINT outperformed antibody-specific models (e.g., IgBert, IgT5, and AbMap) on the FLAB benchmark and SARS-CoV-2 antibody mutant binding prediction tasks. It showed >10% performance improvement on three FLAB datasets and a 14% gain in low-data settings (0.5% training data) for SARS-CoV-2 binding predictions.
2025-06-05
AbBFN2: A flexible antibody foundation model based on Bayesian Flow Networks
- developability
- generative methods
- protein design
- Novel generative modeling framework (AbBFN2) using Bayesian Flow Networks (BFNs) for antibody sequence optimization.
- Trains on sequences from Observed Antibody Space (OAS) combined with genetic and biophysical annotations, leveraging a denoising approach for both conditional and unconditional sequence generation. Targets include optimizing Therapeutic Antibody Profiler (TAP) annotations.
- Computationally validated for germline assignment accuracy, species prediction (humanness), and TAP parameter optimization.
- Combines multiple antibody design objectives into a unified, single-step optimization process, unlike existing software methods which are typically specialized for individual tasks.