Software for predicting T-cell epitopes. The one relevant to antibodies is NetMHCIIpan - it predicts 15-mer epitopes for each human MHC class II allele, scoring each peptide as a strong or weak binder relative to naturally occurring peptides.
MHC Class I (MHC-I) and MHC Class II (MHC-II). MHC-I predominantly presents peptides derived from intracellular proteins, whereas MHC-II predominantly presents peptides from extracellular proteins.
People use either binding-affinity data or eluted-ligand data.
Predictions can be made from either multi-allele or single-allele binding data.
The combined dataset used for training NetMHCpan-4.1 consists of 13,245,212 data points covering 250 distinct MHC class I molecules, and the combined dataset used for training NetMHCIIpan-4.0 consists of 4,086,230 data points covering a total of 116 distinct MHC class II molecules.
The core improvement is the integration of NNAlign_MA into NetMHCpan/NetMHCIIpan.
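A minimal sketch of how such output is typically thresholded (not the tool's own code; the %rank cutoffs below are the commonly cited defaults of the 4.x releases and should be checked against the version you run):

```python
# Illustrative only: classify peptides into strong/weak binders by %rank
# (rank of the prediction score against a set of random natural peptides).
# Assumed default cutoffs: MHC-I (NetMHCpan-4.1) 0.5%/2%, MHC-II (NetMHCIIpan-4.0) 2%/10%.

def classify_binder(percent_rank: float, mhc_class: str = "II") -> str:
    """Label a peptide by its %rank relative to random natural peptides."""
    strong, weak = (0.5, 2.0) if mhc_class == "I" else (2.0, 10.0)
    if percent_rank < strong:
        return "strong binder"
    if percent_rank < weak:
        return "weak binder"
    return "non-binder"

# e.g. a 15-mer with %rank 1.3 against an HLA-DR allele
print(classify_binder(1.3, mhc_class="II"))  # -> "strong binder"
```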
Polyspecificity and polyreactivity are related terms; however, the former is thought to be driven by factors such as overlapping epitopes, whereas polyreactivity is driven by excess charge or hydrophobicity.
The baculovirus particle (BVP) assay is often used to test polyreactivity: mAbs are added at high concentrations to BVP-coated plates.
They generated a polyreactivity dataset (~300 molecules) that was heterogeneous in terms of molecule type (antibodies and nanobodies), specificity (including monospecific binders) and format.
They tested different concentrations (from 6.67 nM to 667 nM) and well-coating conditions (percentage of BVP) - this was aimed at reducing noise from experimental conditions.
They tested two prediction modes: language models and structural descriptors. For language models, ProtT5, ESM2 and AntiBERTy were used; descriptors were calculated from AlphaFold2-Multimer models. The language-model predictions were superior to those calculated from the AF2-Multimer structures.
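A sketch of the language-model route, assuming a public ESM2 checkpoint from HuggingFace and a simple scikit-learn classifier on top (the sequences, labels and checkpoint size below are placeholders, not the authors' pipeline):

```python
# Sketch: embed each antibody sequence with ESM2, fit a classifier on polyreactivity labels.
import torch
from transformers import AutoTokenizer, AutoModel
from sklearn.linear_model import LogisticRegression

tokenizer = AutoTokenizer.from_pretrained("facebook/esm2_t12_35M_UR50D")
model = AutoModel.from_pretrained("facebook/esm2_t12_35M_UR50D").eval()

def embed(seq: str) -> torch.Tensor:
    """Mean-pooled last-layer ESM2 embedding of one sequence."""
    inputs = tokenizer(seq, return_tensors="pt")
    with torch.no_grad():
        out = model(**inputs).last_hidden_state   # (1, L+2, d)
    return out[0, 1:-1].mean(dim=0)               # drop special tokens, mean-pool

sequences = ["EVQLVESGGGLVQPGGSLRLSCAAS", "QVQLQQSGAELARPGASVKMSCKAS"]  # toy fragments
labels = [1, 0]  # 1 = polyreactive in the BVP assay, 0 = not (placeholder labels)

X = torch.stack([embed(s) for s in sequences]).numpy()
clf = LogisticRegression(max_iter=1000).fit(X, labels)
print(clf.predict_proba(X)[:, 1])  # predicted polyreactivity probability
```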
They introduced a set of single and double mutations based on the most likely variants proposed by an ensemble of language models (ESM). Most of the mutations not only preserved binding but actually improved it.
They performed evolution with the ESM-1b language model and the ESM-1v ensemble of five language models (six language models in total).
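The gist of the substitution-proposal step, sketched with a single ESM-1v model standing in for the full ensemble (masked-marginal scoring; the acceptance rule here is a simplification of the paper's consensus-across-models criterion):

```python
# Sketch: mask each position and propose substitutions whose model probability
# exceeds the wild-type residue's probability at that position.
import torch
import esm  # fair-esm package

model, alphabet = esm.pretrained.esm1v_t33_650M_UR90S_1()
model.eval()
batch_converter = alphabet.get_batch_converter()

def propose_substitutions(seq: str):
    """Return (position, wt, mutant) triples the model prefers over wild type."""
    _, _, tokens = batch_converter([("wt", seq)])
    proposals = []
    for i, wt in enumerate(seq):
        masked = tokens.clone()
        masked[0, i + 1] = alphabet.mask_idx          # +1 for the prepended BOS token
        with torch.no_grad():
            logits = model(masked)["logits"]
        probs = logits[0, i + 1].softmax(dim=-1)
        wt_p = probs[alphabet.get_idx(wt)]
        for aa in "ACDEFGHIKLMNPQRSTVWY":
            if aa != wt and probs[alphabet.get_idx(aa)] > wt_p:
                proposals.append((i + 1, wt, aa))
    return proposals

print(propose_substitutions("EVQLVESGGGLVQPGGSLRLSCAAS"))  # toy VH fragment
```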
In the first round of evolution, they measured by biolayer interferometry (BLI) the antigen-binding strength of variants that contain only a single-residue substitution from wild type.
In the second round, they measured variants containing combinations of substitutions, selecting substitutions that corresponded to preserved or improved binding based on the results of the first round.
They performed these two rounds for all seven antibodies, measuring 8–14 variants per antibody in round one and 1–11 variants per antibody in round two.
Across all seven antibodies, they found that 71–100% of the first-round Fab variants (containing a single-residue substitution) retained sub-micromolar binding to the antigen, and 14–71% of first-round variants led to improved binding affinity (defined as a 1.1-fold or higher improvement in Kd compared to wild type).
Thirty-six of the 76 language-model-recommended single-residue substitutions (and 18 of the 32 substitutions that led to improved affinity) occur in framework regions.
They found that Fabs for 21 out of the 31 language-model-recommended, affinity-enhancing variants that they tested had a higher melting temperature (Tm) than wild type, and all variants maintained thermostability (Tm > 70 °C).
They tested for polyspecificity and found no dramatic changes in the polyspecificity profile.
Five out of 32 affinity-enhancing substitutions (~16%) involve changing the wild-type residue to a rare or uncommon residue.
The approach based on general protein language models consistently outperformed all baseline methods, including antibody-specific ones (!).
They developed a method to predict protein-protein interactions (PPIs) from sequence, built on a large protein language model. They train ProtBERT to predict PPIs.
They use the BioGRID dataset, where interactors are included if they are confirmed by two independent sources, such as two independent experimental techniques in two separate studies. In total they have 179,018 positive pairs.
They use Negatome 2.0 as the negative dataset. It relies on various sources, such as manual curation from the literature or subunits in the PDB that do not interact with each other. A total of 3,958 pairs was used.
They use ProtBERT-BFD as the pretrained base model.
They encode each protein pair as [CLS] Protein A [SEP] Protein B [SEP], mapping the final output to a binary interaction label.
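A sketch of that encoding with HuggingFace's ProtBERT-BFD checkpoint (freshly initialized classification head; the actual SYNTERACT weights and training loop are not reproduced here):

```python
# Sketch: sequence-pair classification on top of ProtBERT-BFD.
# ProtBERT expects residues separated by spaces.
import torch
from transformers import BertTokenizer, BertForSequenceClassification

tokenizer = BertTokenizer.from_pretrained("Rostlab/prot_bert_bfd", do_lower_case=False)
model = BertForSequenceClassification.from_pretrained("Rostlab/prot_bert_bfd", num_labels=2)

protein_a = "M K T A Y I A K Q R"   # toy sequences
protein_b = "M S D N E L V K L"

# Builds: [CLS] protein_a [SEP] protein_b [SEP]
inputs = tokenizer(protein_a, protein_b, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
print(torch.softmax(logits, dim=-1))  # class probabilities (meaningful only after fine-tuning)
```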
They achieve 92% accuracy on the test set.
They also evaluate on a dataset where negatives are proteins from different subcellular compartments. On positive samples in this dataset the model was 85% accurate; on negative samples, SYNTERACT was only 38% accurate, classifying many of the compartment-sampled negatives as interactors.
Proposing a Bayesian scheme to optimally select generated antibodies from a previously introduced language model (GLM).
They use the 1B GLM-AB model from BioMap. Training involves a variation of MLM that masks entire spans of the sequence.
The entire point is how to 'select' better antibodies according to some unknown 'fitness function' f. If you only get a few experimental data points at a time to evaluate f, you'd better make them count. Their combination of a Bayesian scheme and a language model optimizes how the next generated sequences are picked so that the best approximation to f is reached.
They use the Absolut! framework (a computational simulation) rather than wet-lab data.
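A generic sketch of such a selection loop, with a Gaussian-process surrogate and expected-improvement acquisition standing in for their scheme (the embeddings and the fitness oracle below are synthetic placeholders, not their model or data):

```python
# Sketch: pick the next candidate to "measure" so that the surrogate of f improves fastest.
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(0)
candidates = rng.normal(size=(200, 32))                 # stand-in for LM embeddings of generated Abs
measure_fitness = lambda x: -np.linalg.norm(x - 0.5)    # stand-in for the unknown fitness f

measured_idx = list(rng.choice(len(candidates), size=5, replace=False))
y = [measure_fitness(candidates[i]) for i in measured_idx]

for _ in range(10):                                     # ten rounds of "experiments"
    gp = GaussianProcessRegressor(kernel=RBF(), normalize_y=True)
    gp.fit(candidates[measured_idx], y)
    mu, sigma = gp.predict(candidates, return_std=True)
    best = max(y)
    z = (mu - best) / np.maximum(sigma, 1e-9)
    ei = (mu - best) * norm.cdf(z) + sigma * norm.pdf(z)  # expected improvement
    ei[measured_idx] = -np.inf                            # don't re-measure
    nxt = int(np.argmax(ei))
    measured_idx.append(nxt)
    y.append(measure_fitness(candidates[nxt]))

print("best fitness found:", max(y))
```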
Demonstration that training transformers on paired antibody data provides improvements over single-chain models. They created two models - one trained on single chains from the Jaffe dataset, the other with the pairing information.
When comparing embeddings, they only extracted one sequence from the paired transformer to make the comparison with the single-sequence transformer sound.
They showed what happens when one performs UMAP on the light-chain embeddings of the paired and unpaired transformers. The unpaired transformer produces a more random dispersal, whereas the paired model yields much tighter clustering, similar to heavy chains. Performance on heavy-chain clustering is similar for both.
They asked for predictions of masked positions in heavy chains when the chain is paired with either the native (mutated) light chain or the back-mutated germline one. The cross-entropy loss was much better when the prediction was made in the presence of the native mutated light chain.
They contrasted their paired model with ESM2 (650M). They fine-tuned ESM2 on the paired data and averaged all attention heads over all layers to get a single score. The fine-tuned ESM2 attends to the conserved cysteines and CDR regions, whereas the base ESM2 does not, focusing more on linear stretches.
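A sketch of the attention-averaging step on the base ESM2 (their fine-tuned paired checkpoint is their own; checkpoint size and the toy sequence are placeholders):

```python
# Sketch: average attention over all layers and heads into one L x L map,
# then see which positions receive the most attention.
import torch
from transformers import AutoTokenizer, EsmModel

tokenizer = AutoTokenizer.from_pretrained("facebook/esm2_t12_35M_UR50D")
model = EsmModel.from_pretrained("facebook/esm2_t12_35M_UR50D").eval()

seq = "EVQLVESGGGLVQPGGSLRLSCAASGFTFS"  # toy heavy-chain fragment
inputs = tokenizer(seq, return_tensors="pt")
with torch.no_grad():
    out = model(**inputs, output_attentions=True)

# out.attentions: one (1, heads, L, L) tensor per layer
attn = torch.stack(out.attentions).mean(dim=(0, 1, 2))   # -> (L, L) averaged map
received = attn.mean(dim=0)                               # attention each token receives
top = torch.topk(received[1:-1], k=5).indices + 1         # skip CLS/EOS tokens
print("most-attended residue positions (1-based):", top.tolist())
```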
Introducing ESM-2 and ESMFold. Scaling the transformer parameter count to 15B allows for more accurate structure prediction.
They make available an atlas of 617 million predicted structures.
The learning objective is MLM, masking 15% of the input protein sequence.
Perplexity ranges from 1 for a perfect model to 20 for a model that makes predictions at random. Intuitively, perplexity describes the number of amino acids the model is uncertain between when it makes a prediction.
After 270k training steps the 8M parameter model has a perplexity of 10.45, and the 15B model reaches a perplexity of 6.37.
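For intuition, a masked pseudo-perplexity of a single sequence can be computed as exp of the mean cross-entropy over positions, each masked in turn (the paper's numbers are computed on held-out data during training; the small checkpoint and toy sequence below are placeholders):

```python
# Sketch: masked pseudo-perplexity of one sequence under a small ESM-2 checkpoint.
import math
import torch
from transformers import AutoTokenizer, EsmForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("facebook/esm2_t6_8M_UR50D")
model = EsmForMaskedLM.from_pretrained("facebook/esm2_t6_8M_UR50D").eval()

seq = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"  # toy sequence
tokens = tokenizer(seq, return_tensors="pt")["input_ids"]

nlls = []
for i in range(1, tokens.shape[1] - 1):       # skip CLS/EOS
    masked = tokens.clone()
    true_id = masked[0, i].item()
    masked[0, i] = tokenizer.mask_token_id
    with torch.no_grad():
        logits = model(input_ids=masked).logits
    log_probs = logits[0, i].log_softmax(dim=-1)
    nlls.append(-log_probs[true_id].item())

print("pseudo-perplexity:", math.exp(sum(nlls) / len(nlls)))
```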
The 15B model achieves the best perplexity and structure-prediction accuracy.
For some structures, the accuracy of structure prediction jumps from 7.7 Å at 8M parameters to 7.0 Å at 35M parameters and to 3.2 Å at 150M parameters. The 3B model brings it down to 2.8 Å and the 15B model to 2.6 Å. For other structures, good prediction is only achieved at 15B.
Their structure predictor closely follows AlphaFold2, but instead of the Evoformer they use the representation from ESM-2.
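Basic ESMFold usage along the lines of the fair-esm README (assumes the package is installed with the esmfold extras and a GPU is available; the sequence is a placeholder):

```python
# Sketch: single-chain structure prediction with ESMFold, written out as a PDB file.
import torch
import esm

model = esm.pretrained.esmfold_v1().eval().cuda()

sequence = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"  # toy sequence
with torch.no_grad():
    pdb_string = model.infer_pdb(sequence)

with open("prediction.pdb", "w") as f:
    f.write(pdb_string)
```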
They contrast ESM with some other language models and show that, in a zero-shot fashion, its scores correlate with experimental measurements of variant effects.
They compare the performance of ESM and DeepScan on 41 deep mutational scanning datasets collated in a single paper. They claim ESM has better overall correlations, but this is not crystal clear from the graph nor, by their own admission, from a paired t-test.
They find that pretraining on UniRef30 gives the worst performance. Performance is reasonable for UniRef50 or UniRef70, with a dip again at UniRef100.
Binding sites have much higher conservation.
The core of the protein also appears to have lower conservation.
A 100B-parameter protein model, and a 1B antibody-specific model obtained by fine-tuning on antibody data.
For PLM training they employ data from UniRef90 and the ColabFold database. After filtering and deduplication they are left with approximately 350M sequences, or 100B tokens.
On proteins, xTrimoPGLM-100B outperforms ESM2-15B on 12 of 15 downstream tasks (e.g. thermostability, structure prediction).
They train a 1B protein model and then fine-tune it on antibodies from OAS.
Their masking procedure includes span masking, not just masking individual residues at a time.
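A generic span-masking sketch, not the xTrimoPGLM implementation (the span-length distribution, mask rate and mask symbol here are arbitrary illustrative choices):

```python
# Sketch: mask ~15% of residues in contiguous spans rather than as isolated tokens.
import random

def span_mask(seq: str, mask_rate: float = 0.15, mean_span: int = 3, mask_char: str = "#"):
    """Mask roughly mask_rate of the residues in contiguous spans."""
    seq = list(seq)
    n_to_mask = max(1, int(len(seq) * mask_rate))
    masked = 0
    while masked < n_to_mask:
        span = max(1, round(random.expovariate(1 / mean_span)))  # sample a span length
        start = random.randrange(len(seq))
        for i in range(start, min(start + span, len(seq))):
            if seq[i] != mask_char:
                seq[i] = mask_char
                masked += 1
    return "".join(seq)

random.seed(0)
print(span_mask("EVQLVESGGGLVQPGGSLRLSCAASGFTFS"))
```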
They use 678M OAS sequences.
They benchmarked the antibody model on naturalness and antibody structure prediction, and the OAS-fine-tuned xTrimoPGLM model outperformed ESMFold, AlphaFold2 and IgFold.