To address the issue, we curated databases of antibody sequences from leading resources such as GenBank, USPTO, WIPO, DDBJ and EBI, available at www.naturalantibody.com. Our databases currently hold more than 300,000 unique variable region sequences from scientific publications and patents, many of them annotated with their molecular targets. We hope that our databases will facilitate myriad AIRR-seq convergence studies.
The Patented Antibody Database comprises antibody sequences found in patent documents from primary sources (USPTO, WIPO) and third parties (DDBJ, EBI). The current release of the database is dated October 2020. The database currently encompasses 267,722 antibody chains (148,774 heavy chains and 118,948 light chains) from 19,037 patent families. The database is aimed at antibody engineers in industry and academia, to help them navigate the landscape of protected antibodies. Below we describe the contents of the database, how to interact with it and how to use the search functionality.
Typical patent disclosures characterize the antibody sequences and the molecules that they target.
We extracted antibody sequences from patents from primary sources (USPTO, WIPO) and third parties (DDBJ, EBI) into the Patented Antibody Database (PAD)3.
We retain only complete antibody variable-region sequences (all three CDRs and all four framework regions) that contain only the 20 canonical amino acids.
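The residue-level part of this filter can be sketched in a few lines of Python. This is a minimal illustration and not the pipeline used to build PAD: the function names are hypothetical, and the check for complete CDRs and framework regions is omitted because it would require an antibody numbering tool.

```python
# Hypothetical sketch: keep only sequences made of the 20 canonical amino acids.
# The completeness check (three CDRs, four framework regions) is NOT covered here.
CANONICAL_AMINO_ACIDS = frozenset("ACDEFGHIKLMNPQRSTVWY")


def has_only_canonical_residues(sequence: str) -> bool:
    """Return True if the sequence is non-empty and uses only canonical residues."""
    seq = sequence.strip().upper()
    return bool(seq) and set(seq) <= CANONICAL_AMINO_ACIDS


def filter_canonical(sequences):
    """Return the sequences that pass the canonical-residue check, in input order."""
    return [s for s in sequences if has_only_canonical_residues(s)]
```

Under this sketch, a sequence containing an ambiguity code such as `X` or a stop symbol such as `*` would be discarded.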
The current release (Nov 2020) of PAD holds 267,722 antibody chains (148,774 heavy chains and 118,948 light chains) from 19,037 patent families.
Patent families categorized as having antibodies for medicinal purposes account for 59.60% of documents (out of 15,951 families with CPC classification as of Jan 2020).
Perfect length-matched hits in PAD can be found for 96.98% of therapeutic antibodies with assigned International Nonproprietary Names.
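One way to read "perfect length-matched hit" is a database sequence of the same length as the query with 100% positional identity. The sketch below illustrates that reading only; it is an assumption about the definition, not the matching procedure used for PAD, and the function names are hypothetical.

```python
# Hypothetical sketch: identity between two sequences of equal length, and the
# best such hit in a collection. Assumes "length-matched" means equal length.
def length_matched_identity(query: str, hit: str):
    """Fraction of identical positions, or None if the lengths differ or the query is empty."""
    if not query or len(query) != len(hit):
        return None
    matches = sum(a == b for a, b in zip(query, hit))
    return matches / len(query)


def best_length_matched_hit(query: str, database):
    """Return (sequence, identity) for the best length-matched hit, or None if there is none."""
    best = None
    for candidate in database:
        identity = length_matched_identity(query, candidate)
        if identity is not None and (best is None or identity > best[1]):
            best = (candidate, identity)
    return best
```

In this reading, a perfect hit is one for which the returned identity equals 1.0.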
The good correspondence between patented antibody sequences and therapeutics indicates that the information contained within patent documents could reflect the engineering choices made during development of these molecules3.
The number of antibody sequences in patent documents is rising, suggesting that the number of antibodies reflective of engineering know-how will keep accumulating3.
Users can perform full-sequence searches of query antibody sequences at http://naturalantibody.com/pad, and the closest patented sequence hits will be displayed alongside the molecules identified in the patent document (Figure 2).