Computational Antibody Papers

Title | Key points

2025-11-17
HeavyBuilder: Analysis of High-Throughput of Antibody Heavy Chain Repertoires in the Structural Space
- server/webapp
- structure prediction
- A heavy-chain–only version of ABodyBuilder2, removing the light-chain component entirely.
- The model is substantially faster than ABodyBuilder2 and comparable to IgFold or AlphaFold2 owing to (i) smaller embedding dimensions (128–256 vs 384 in ABB2), (ii) use of fewer submodels (3 vs 4), (iii) omission of the refinement step by default, and (iv) the inherently shorter sequence length of single heavy chains.
- Accuracy-wise, HeavyBuilder performs on par with ABodyBuilder2, IgFold, and AlphaFold2 for framework and CDRH1–H2, and is slightly better for CDRH3 (∼3.4 Å RMSD vs ∼4 Å for others). While ABodyBuilder2 achieves 2.99 Å on CDRH3, that figure depends on the inclusion of the paired light chain, so the authors note that it is not a fair comparison.
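The CDRH3 figures above are root-mean-square deviations between predicted and experimental coordinates. A minimal sketch of the metric, assuming the two structures are already superimposed (for example on the framework) and supplied as matched coordinate lists; the function name and toy coordinates are illustrative and not taken from the paper:

```python
import math

def rmsd(pred, ref):
    """RMSD between two matched coordinate lists, in the units of the inputs.

    Assumption: both structures are already superimposed and atom-matched.
    """
    if len(pred) != len(ref) or not pred:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum((p - r) ** 2 for a, b in zip(pred, ref) for p, r in zip(a, b))
    return math.sqrt(squared / len(pred))

# Toy example: two atoms, each displaced by 3 Å along one axis.
pred = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
ref = [(3.0, 0.0, 0.0), (1.0, 3.0, 0.0)]
print(rmsd(pred, ref))  # 3.0
```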
2025-11-17
Machine learning approaches for interpretable antibody property prediction using structural data
- binding prediction
- Describes two models for antibody property prediction: ANTIPASTI (a CNN on structural correlation maps, for affinity) and INFUSSE (a graph + ProtBERT hybrid, for flexibility).
- ANTIPASTI predicts antibody–antigen binding affinity; INFUSSE predicts residue-level B-factors (flexibility).
- Both were tested on curated antibody and antibody–antigen datasets (no new wet-lab validation, only structural data).
- B-factor prediction links sequence, structure, and local dynamics, showing that antibody flexibility is partly learnable from data. INFUSSE is trained only on antibody/antigen data and outperforms a baseline trained on generic proteins.
2025-10-28
Paraplume: A fast and accurate paratope prediction method provides insights into repertoire-scale binding dynamics
- paratope prediction
- Novel paratope prediction model.
- It predicts antibody paratopes from sequence alone by concatenating embeddings from six protein language models: AbLang2, AntiBERTy, ESM-2, IgT5, IgBert, and ProtTrans.
- It requires neither antibody structural data nor antigen data.
- Across three benchmark datasets (PECAN, Paragraph, MIPE), it outperforms all sequence-based and structure-modeling methods, achieving PR-AUC up to ~0.76 and ROC-AUC up to ~0.97.
- The training set is similar in size to those of previous methods, so the better performance is not due solely to the growing number of structures in SAbDab.
- It was benchmarked against a positional-likelihood baseline (predicting commonly binding positions) and surpassed it by a reasonable margin (PR-AUC ~0.73 vs. ~0.62).
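A positional-likelihood baseline of the kind mentioned above can be sketched as follows. This is an assumption about how such a baseline might work, not the paper's implementation: each numbered position is scored by the fraction of training antibodies in which it was labelled as paratope. The position labels and toy data are hypothetical.

```python
from collections import Counter

def fit_positional_baseline(training_set):
    """training_set: list of (positions, paratope_positions), one per antibody.

    Returns {position: fraction of antibodies containing that position
    in which it was labelled as paratope}.
    """
    seen, bound = Counter(), Counter()
    for positions, paratope in training_set:
        seen.update(positions)
        bound.update(p for p in positions if p in paratope)
    return {pos: bound[pos] / seen[pos] for pos in seen}

def predict(baseline, positions):
    """Score each position of a new antibody; unseen positions get 0.0."""
    return [baseline.get(pos, 0.0) for pos in positions]

# Hypothetical toy data: three numbered heavy-chain positions per antibody.
train = [
    (["H31", "H52", "H100"], {"H52", "H100"}),
    (["H31", "H52", "H100"], {"H100"}),
]
baseline = fit_positional_baseline(train)
print(predict(baseline, ["H31", "H52", "H100", "H101"]))  # [0.0, 0.5, 1.0, 0.0]
```

Such a baseline ignores the actual sequence entirely, which is why it is a useful floor for any learned paratope predictor.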
2025-10-28
Germline-aware deep learning models and benchmarks for predicting antibody VH–VL pairing
- developability
- ngs
- Introduces a novel pairing predictor for VH–VL chains with a clever strategy for sampling negative pairs.
- Defines three negative sampling strategies:
  - Random pairing, where heavy and light chains are shuffled without constraints.
  - V-gene mismatching, where non-native pairs are generated by combining VH and VL sequences drawn from different V-gene families, but within biologically plausible V-gene segments. This captures realistic but unobserved combinations that could occur during recombination.
  - Full V(D)J mismatching, where heavy and light chains are paired using completely distinct germline origins across V, D, and J gene segments. This produces negative examples that are maximally diverse yet biologically meaningful, reflecting combinations never seen in natural repertoires.
- Shows that the space of possible VH–VL germline combinations is far larger than what is observed in public datasets, revealing non-random biological constraints on pairing.
- Demonstrates that models trained on V-gene and especially VDJ mismatched datasets achieve the highest and most generalizable performance, outperforming existing methods such as ImmunoMatch, p-IgGen, and Humatch — confirming that biologically grounded negative sampling is key to robust VH–VL pairing prediction.
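The first two negative sampling strategies can be illustrated with a small sketch. The record layout, field names, and the exact mismatch rule below are assumptions made for illustration (one possible reading of "V-gene mismatching"), not the authors' code; sequences are replaced by placeholder strings.

```python
import random

def random_negatives(pairs, rng):
    """Shuffle light chains across heavy chains, dropping any native pair."""
    lights = [p["light"] for p in pairs]
    rng.shuffle(lights)
    native = {(p["heavy"], p["light"]) for p in pairs}
    return [(p["heavy"], l) for p, l in zip(pairs, lights)
            if (p["heavy"], l) not in native]

def v_gene_mismatch_negatives(pairs, rng):
    """Pair each heavy chain with a light chain whose V-gene family differs
    from that of its native partner (assumed interpretation)."""
    negatives = []
    for p in pairs:
        candidates = [q["light"] for q in pairs
                      if q["light_v_family"] != p["light_v_family"]]
        if candidates:
            negatives.append((p["heavy"], rng.choice(candidates)))
    return negatives

# Hypothetical native pairs with placeholder sequence names.
pairs = [
    {"heavy": "VH_A", "light": "VL_A", "light_v_family": "IGKV1"},
    {"heavy": "VH_B", "light": "VL_B", "light_v_family": "IGKV3"},
    {"heavy": "VH_C", "light": "VL_C", "light_v_family": "IGLV2"},
]
rng = random.Random(0)
print(random_negatives(pairs, rng))
print(v_gene_mismatch_negatives(pairs, rng))
```

A full V(D)J mismatch would extend the same filter to D and J gene annotations on both chains.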
2025-10-28
peleke-1: A Suite of Protein Language Models Fine-Tuned for Targeted Antibody Sequence Generation
- language models
- generative methods
- Novel LLM suite for designing antibodies.
- Peleke-1 models were fine-tuned on 9,500 antibody–antigen complexes from SAbDab, each annotated with interacting residues identified from crystal structures.
- Structure was incorporated by annotating epitope residues explicitly in antigen sequences, allowing the LLMs to learn binding context without direct 3D input.
- Generated antibodies were assessed for humanness, structural validity, stability (FoldX), and binding affinity (HADDOCK3) across seven benchmark antigens.
- No wet-lab testing was performed.
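One way to picture the epitope annotation described above is to mark interacting residues inline in the antigen sequence, so that a sequence-only model can see them. The bracket markers, function name, and toy sequence below are hypothetical; the actual annotation format used by peleke-1 is not given here.

```python
def annotate_epitope(antigen_seq, epitope_indices, open_tag="[", close_tag="]"):
    """Wrap each epitope residue (0-based index) in marker tags."""
    epitope = set(epitope_indices)
    if any(i < 0 or i >= len(antigen_seq) for i in epitope):
        raise IndexError("epitope index outside the antigen sequence")
    return "".join(
        f"{open_tag}{aa}{close_tag}" if i in epitope else aa
        for i, aa in enumerate(antigen_seq)
    )

# Toy antigen fragment with residues 2, 3 and 6 marked as epitope.
print(annotate_epitope("MKTAYIAK", [2, 3, 6]))  # MK[T][A]YI[A]K
```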
2025-10-28
BoltzGen: Toward Universal Binder Design
- nanobodies
- protein design
- Novel protein design framework based on a unified all-atom diffusion model that performs both structure prediction and binder generation.
- It is fully open and free.
- Training setup resembles recent diffusion architectures (e.g., AlphaFold3, Chai), but its distinguishing feature is broad wet-lab validation across diverse target types.
- Experimental scale: generated tens of thousands of nanobody and protein designs for 9 novel targets (no homologous complexes in PDB).
- Results: tested 15 designs per target, obtaining nanomolar binders for 6 of 9 targets (≈67% target-level success rate), a notably strong experimental outcome.
2025-10-16
SimpleFold: Folding Proteins is Simpler than You Think
- structure prediction
- Novel protein structure predictor showing that a much simpler model architecture can get quite far.
- Architecture/training: SimpleFold swaps AF2/RF-style pair reps, triangle updates, MSAs, and equivariant blocks for plain Transformer layers trained with a flow-matching objective to generate full-atom structures; rotational symmetry is handled via SO(3) augmentation.
- Training data: Unlike previous predictors, it is not trained on crystal structures only: the model mixes ~160k PDB experimental structures with large distilled sets from AFDB SwissProt (~270k) and AFESM (≈1.9M; 8.6M for the 3B model), then fine-tunes on PDB + SwissProt. In practice this is therefore not a head-to-head comparison with other methods, which started from the smaller crystal-structure dataset.
- Performance: It’s competitive but generally below AlphaFold2/RoseTTAFold2/ESMFold on CAMEO22, while on CASP14 the 3B model beats ESMFold but does not surpass AlphaFold2; overall they claim ~95% of AF2/RF2 on most metrics, with especially strong results for ensemble generation.
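The two training ingredients named above, SO(3) augmentation and a flow-matching objective, can be sketched in plain Python. This uses the common linear-interpolant form of flow matching as an assumption; whether SimpleFold uses exactly this parameterisation is not stated in the summary, and all function names are illustrative.

```python
import math
import random

def random_rotation(rng):
    """Random 3x3 rotation matrix built from a normalised Gaussian quaternion."""
    w, x, y, z = (rng.gauss(0.0, 1.0) for _ in range(4))
    n = math.sqrt(w * w + x * x + y * y + z * z)
    w, x, y, z = w / n, x / n, y / n, z / n
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - z * w), 2 * (x * z + y * w)],
        [2 * (x * y + z * w), 1 - 2 * (x * x + z * z), 2 * (y * z - x * w)],
        [2 * (x * z - y * w), 2 * (y * z + x * w), 1 - 2 * (x * x + y * y)],
    ]

def rotate(coords, rot):
    """Apply the rotation to every 3D point (the SO(3) augmentation step)."""
    return [tuple(sum(rot[i][j] * c[j] for j in range(3)) for i in range(3))
            for c in coords]

def flow_matching_pair(x1, rng):
    """Sample t, the noisy input x_t = (1 - t) * x0 + t * x1, and the target
    velocity x1 - x0, where x0 is Gaussian noise and x1 is the data."""
    t = rng.random()
    x0 = [tuple(rng.gauss(0.0, 1.0) for _ in range(3)) for _ in x1]
    xt = [tuple((1 - t) * a + t * b for a, b in zip(n, d)) for n, d in zip(x0, x1)]
    velocity = [tuple(b - a for a, b in zip(n, d)) for n, d in zip(x0, x1)]
    return t, xt, velocity

rng = random.Random(0)
structure = [(0.0, 0.0, 0.0), (1.0, 2.0, 3.0)]       # toy coordinates
augmented = rotate(structure, random_rotation(rng))  # random global rotation
t, xt, target = flow_matching_pair(augmented, rng)
# A model would be trained to regress model(xt, t) onto `target`.
```

Because the network itself is not equivariant, showing it randomly rotated copies of each structure is what teaches it that orientation carries no information.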
2025-10-16
Accelerating antibody development: sequence and structure-based models for predicting developability properties via size exclusion chromatography
- developability
- Benchmarking of computational models for predicting antibody aggregation propensity (developability) using size-exclusion chromatography (SEC) readouts.
- Developed an experimental dataset of ~1,200 IgG1 antibodies, measured for monomer percentage and ΔRT (difference in retention time) relative to a reference.
- Evaluated four main prediction pipelines:
  - Sequence + structure-based features: hand-crafted biophysical features from Schrödinger, using AlphaFold2 or ImmuneBuilder for structure.
  - PLM (protein language model): e.g., ESM2-8M, fine-tuned or LoRA-adapted.
  - GNN (graph neural network): residue graphs from predicted structures.
  - PLM + GNN hybrid: sequence embeddings combined with structural graphs.
- Two structure prediction tools were benchmarked: AlphaFold2 (high accuracy, slow) and ImmuneBuilder (faster, antibody-optimized, slightly less accurate).
- The sequence + structure feature model achieved the highest accuracy overall, but low sensitivity (missed many problematic antibodies).
- The PLM-only pipeline performed nearly as well and offered a much faster, high-throughput solution, making it attractive for early screening.
- The GNN and PLM + GNN approaches performed comparably, with GNN slightly better for ΔRT predictions but more variable.
- Using ImmuneBuilder instead of AlphaFold2 reduced sensitivity slightly but greatly improved speed without major loss of accuracy.
- Overall, all pipelines performed within a narrow range of one another, but the faster, less resource-intensive approaches (PLM and ImmuneBuilder-based pipelines) offer strong trade-offs for early-stage developability screening.
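The two SEC readouts used as prediction targets can be written out explicitly. This is a sketch under stated assumptions: monomer percentage is taken as the monomer peak area over the total integrated peak area, and ΔRT as sample minus reference retention time (the sign convention is an assumption). Function names and values are illustrative.

```python
def monomer_percentage(peak_areas, monomer_peak="monomer"):
    """Monomer peak area as a percentage of the total integrated area."""
    total = sum(peak_areas.values())
    if total <= 0:
        raise ValueError("total peak area must be positive")
    return 100.0 * peak_areas[monomer_peak] / total

def delta_rt(sample_rt_min, reference_rt_min):
    """Retention-time shift of the sample relative to the reference, in minutes."""
    return sample_rt_min - reference_rt_min

# Hypothetical chromatogram: integrated areas per peak and retention times.
areas = {"aggregate": 5.0, "monomer": 90.0, "fragment": 5.0}
print(monomer_percentage(areas))  # 90.0
print(delta_rt(8.75, 8.5))        # 0.25
```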
2025-10-16
An adaptive autoregressive diffusion approach to design active humanized antibodies and nanobodies
- developability
- They introduce a template-free diffusion model for antibody humanization.
- It takes CDR sequences as input and reconstructs the framework regions without needing humanized templates.
- Benchmarked against Sapiens, Humatch, Llamanade, and AbNatiV across multiple datasets (e.g., HuAb348, Humab25, Nano300), showing improved humanness, germline identity, and binding retention.
- Demonstrates preserved or enhanced binding and stability in vitro, though no direct ADA correlation analysis was performed.
2025-10-16
Revealing bias in antibody language models through systematic training data processing with OAS-explore
- language models
- databases
- Investigation of how biases in the Observed Antibody Space (OAS) database, such as overrepresentation of a few donors and limited species or chain diversity, affect the performance and generalizability of antibody language models.
- The authors developed OAS-explore, an open-source pipeline to analyze, filter, balance, and sample OAS data by donor, species, chain type, and publication, enabling systematic assessment of data biases.
- By training 17 RoBERTa models on datasets with different compositions, they found that models struggle to generalize across chain types, species, individuals, and batches, and that even increased donor diversity alone does not guarantee better performance.
- They recommend systematic preprocessing, inclusion of more diverse data, and open sharing of datasets and pipelines to mitigate biases and improve antibody LM robustness.
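Balancing a training set by donor, one of the preprocessing steps discussed above, can be sketched as capping the number of sequences each donor contributes. This is not the OAS-explore API; the record fields, function name, and cap are hypothetical.

```python
import random
from collections import defaultdict

def balance_by_donor(records, max_per_donor, rng):
    """Down-sample so that no donor contributes more than max_per_donor records."""
    by_donor = defaultdict(list)
    for rec in records:
        by_donor[rec["donor"]].append(rec)
    balanced = []
    for donor in sorted(by_donor):  # sorted for reproducible ordering
        group = by_donor[donor]
        if len(group) > max_per_donor:
            group = rng.sample(group, max_per_donor)
        balanced.extend(group)
    return balanced

# Hypothetical dataset dominated by a single donor.
records = (
    [{"donor": "D1", "seq": f"seq_d1_{i}"} for i in range(100)]
    + [{"donor": "D2", "seq": f"seq_d2_{i}"} for i in range(5)]
)
balanced = balance_by_donor(records, max_per_donor=10, rng=random.Random(0))
print(len(balanced))  # 15
```

The same grouping key could be swapped for species, chain type, or publication to probe the other biases listed above.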