PDB Statistics: Growth in Number of Unique Protein Sequences in Released PDB Structures (Cumulative) at Identity 70%

This chart shows the annual and cumulative numbers of unique protein sequences in released PDB structures since the beginning of the PDB archive. It can be viewed at several levels of sequence identity; the table on this page uses 70%. The cumulative bars show the growth in the number of unique protein sequences (counted as polymeric entities) over time. The yearly bars (dark blue) show how many new protein sequences were added in a given year.
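The relationship between the yearly and cumulative bars is a running sum: each cumulative value is the previous total plus that year's new sequences. A minimal sketch in Python, using the 1976-1980 counts from the table on this page; the names `new_per_year` and `cumulative_totals` are illustrative and are not part of any PDB tool or API:

```python
from itertools import accumulate

# Yearly counts of new protein sequences at 70% identity,
# copied from the statistics table for 1976-1980.
new_per_year = {1976: 13, 1977: 8, 1978: 3, 1979: 2, 1980: 4}


def cumulative_totals(yearly):
    """Return {year: running total}, processing years in ascending order."""
    years = sorted(yearly)
    totals = accumulate(yearly[y] for y in years)
    return dict(zip(years, totals))


totals = cumulative_totals(new_per_year)
for year, total in totals.items():
    # Columns: year, new sequences that year, cumulative total
    print(year, new_per_year[year], total)
```

The same running sum applied to every row reproduces the "Total Number of Protein Sequences" column from the "Number of New Protein Sequences" column.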

Note: The total number of sequence clusters in the statistics table links to the sequence cluster group search result page. For performance reasons, the statistics are calculated with a default precision threshold, so the count shown here may differ slightly from the actual non-redundant group search result when the result count approaches or exceeds 10,000. The group search result page provides the accurate count; the statistics page shows the trend.

Sequence cluster level: 70% identity

Year  Number of New Protein Sequences  Total Number of Protein Sequences
1976  13                               13
1977  8                                21
1978  3                                24
1979  2                                26
1980  4                                30
1981  8                                38
1982  18                               56
1983  9                                65
1984  10                               75
1985  12                               87
1986  9                                96
1987  11                               107
1988  22                               129
1989  40                               169
1990  40                               209
1991  51                               260
1992  57                               317
1993  178                              495
1994  347                              842
1995  277                              1,119
1996  327                              1,446
1997  458                              1,904
1998  594                              2,498
1999  708                              3,206
2000  817                              4,023
2001  884                              4,907
2002  926                              5,833
2003  1,311                            7,144
2004  1,812                            8,956
2005  2,003                            10,959
2006  2,218                            13,177
2007  2,446                            15,623
2008  2,282                            17,905
2009  2,306                            20,211
2010  2,301                            22,512
2011  2,053                            24,565
2012  2,203                            26,768
2013  2,332                            29,100
2014  2,826                            31,926
2015  2,285                            34,211
2016  2,572                            36,783
2017  2,666                            39,449
2018  2,620                            42,069
2019  2,795                            44,864
2020  3,461                            48,325
2021  2,734                            51,059
2022  3,563                            54,622
2023  3,436                            58,058
2024  3,500                            61,558
2025  3,711                            65,269