PDB Statistics: Growth in Number of Unique Protein Sequences in Released PDB Structures (Cumulative) at Identity 90%

This chart shows the annual and cumulative numbers of unique protein sequences in released PDB structures since the beginning of the PDB archive. It can be viewed at several levels of sequence identity. The cumulative bars show the growth in unique protein sequences (number of polymeric entities) over time. The yearly bars (dark blue) show how many new protein sequences were added in a given year.
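The relationship between the yearly and cumulative values can be checked with a short running-sum calculation. The following is a minimal sketch, not the method used by the statistics page: the function name `cumulative_totals` and the dictionary layout are illustrative assumptions, and the input values are the yearly counts for 1976 to 1980 taken from the table on this page.

```python
from itertools import accumulate

# Yearly counts of new protein sequences for 1976-1980 (from the table on this page).
new_per_year = {1976: 13, 1977: 10, 1978: 3, 1979: 6, 1980: 4}


def cumulative_totals(yearly):
    """Return {year: running total} for a {year: new count} mapping.

    Hypothetical helper for illustration only.
    """
    years = sorted(yearly)
    running = accumulate(yearly[y] for y in years)
    return dict(zip(years, running))


totals = cumulative_totals(new_per_year)
for year, total in totals.items():
    # Columns: year, new sequences that year, cumulative total
    print(year, new_per_year[year], total)
```

Each cumulative value is the previous total plus the number of new sequences for that year, so the last value in a run matches the "Total" column for that year.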

Note: The total number of sequence clusters in the statistics table links to the sequence cluster group search result page. To balance performance, the numbers are calculated with a default precision threshold, so the statistics count may differ slightly from the actual non-redundant group search result when the result count approaches or exceeds 10,000. The group search result page provides an accurate count; the statistics page provides the trend.

Year    Number of New Protein Sequences    Total Number of Protein Sequences
1976    13                                 13
1977    10                                 23
1978    3                                  26
1979    6                                  32
1980    4                                  36
1981    8                                  44
1982    18                                 62
1983    11                                 73
1984    11                                 84
1985    12                                 96
1986    9                                  105
1987    11                                 116
1988    25                                 141
1989    45                                 186
1990    48                                 234
1991    54                                 288
1992    66                                 354
1993    217                                571
1994    427                                998
1995    322                                1,320
1996    385                                1,705
1997    531                                2,236
1998    704                                2,940
1999    837                                3,777
2000    937                                4,714
2001    999                                5,713
2002    1060                               6,773
2003    1480                               8,253
2004    2046                               10,299
2005    2249                               12,548
2006    2518                               15,066
2007    2866                               17,932
2008    2637                               20,569
2009    2690                               23,259
2010    2717                               25,976
2011    2489                               28,465
2012    2693                               31,158
2013    2904                               34,062
2014    3571                               37,633
2015    2919                               40,552
2016    3400                               43,952
2017    3601                               47,553
2018    3338                               50,891
2019    3719                               54,610
2020    4598                               59,208
2021    4005                               63,213
2022    4937                               68,150
2023    4635                               72,785
2024    4829                               77,614
2025    5172                               82,786