PDB Statistics: Growth in Number of Unique Protein Sequences in Released PDB Structures (Cumulative) at 50% Sequence Identity

This chart shows the annual and cumulative numbers of unique protein sequences in released PDB structures since the beginning of the PDB archive, and it can be viewed at several levels of sequence identity. The cumulative bars show the growth in unique protein sequences (number of polymeric entities) over time. The yearly bars (dark blue) show how many new protein sequences were added in each year.
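The relationship between the two sets of bars can be sketched as a running sum: each cumulative value is the previous year's total plus that year's new sequences. Below is a minimal Python illustration using the first five years of the table (1976 to 1980). The function name `cumulative_totals` is hypothetical and is not part of any RCSB PDB tool or API.

```python
from itertools import accumulate


def cumulative_totals(new_per_year: list[int]) -> list[int]:
    """Return running totals: each entry is the sum of all yearly counts so far."""
    return list(accumulate(new_per_year))


# New protein sequences per year for 1976-1980, copied from the table.
new_counts = [13, 8, 3, 2, 4]
print(cumulative_totals(new_counts))  # [13, 21, 24, 26, 30]
```

The printed values match the cumulative totals listed for 1976 through 1980.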

Note: The total number of sequence clusters in the statistics table links to the corresponding sequence cluster group search results page. To balance accuracy against performance, the statistics are calculated with a default precision threshold. As a result, when the count approaches or exceeds 10,000, it may differ slightly from the actual number of non-redundant groups returned by the group search. Use the group search results page for an exact count and this statistics page for the overall trend.

Year  New sequences  Cumulative total
1976             13                13
1977              8                21
1978              3                24
1979              2                26
1980              4                30
1981              8                38
1982             17                55
1983              8                63
1984             10                73
1985             10                83
1986              8                91
1987             10               101
1988             21               122
1989             34               156
1990             33               189
1991             45               234
1992             51               285
1993            163               448
1994            316               764
1995            248             1,012
1996            298             1,310
1997            420             1,730
1998            521             2,251
1999            640             2,891
2000            734             3,625
2001            785             4,410
2002            811             5,221
2003          1,185             6,406
2004          1,612             8,018
2005          1,763             9,781
2006          1,966            11,747
2007          2,139            13,886
2008          1,985            15,871
2009          1,969            17,840
2010          1,942            19,782
2011          1,708            21,490
2012          1,842            23,332
2013          1,938            25,270
2014          2,270            27,540
2015          1,859            29,399
2016          2,141            31,540
2017          2,148            33,688
2018          2,152            35,840
2019          2,256            38,096
2020          2,744            40,840
2021          2,215            43,055
2022          2,854            45,909
2023          2,727            48,636
2024          2,935            51,571
2025          2,932            54,503
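Because each total should equal the previous total plus that year's new sequences, the table can be checked for internal consistency. The following is a minimal Python sketch. It assumes the rows are transcribed by hand as (year, new, total) tuples. `ROWS` and `find_inconsistent_years` are hypothetical names used only for illustration.

```python
# (year, new protein sequences, cumulative total), transcribed by hand from the table.
ROWS = [
    (1976, 13, 13), (1977, 8, 21), (1978, 3, 24), (1979, 2, 26), (1980, 4, 30),
    (1981, 8, 38), (1982, 17, 55), (1983, 8, 63), (1984, 10, 73), (1985, 10, 83),
    (1986, 8, 91), (1987, 10, 101), (1988, 21, 122), (1989, 34, 156), (1990, 33, 189),
    (1991, 45, 234), (1992, 51, 285), (1993, 163, 448), (1994, 316, 764), (1995, 248, 1012),
    (1996, 298, 1310), (1997, 420, 1730), (1998, 521, 2251), (1999, 640, 2891), (2000, 734, 3625),
    (2001, 785, 4410), (2002, 811, 5221), (2003, 1185, 6406), (2004, 1612, 8018), (2005, 1763, 9781),
    (2006, 1966, 11747), (2007, 2139, 13886), (2008, 1985, 15871), (2009, 1969, 17840), (2010, 1942, 19782),
    (2011, 1708, 21490), (2012, 1842, 23332), (2013, 1938, 25270), (2014, 2270, 27540), (2015, 1859, 29399),
    (2016, 2141, 31540), (2017, 2148, 33688), (2018, 2152, 35840), (2019, 2256, 38096), (2020, 2744, 40840),
    (2021, 2215, 43055), (2022, 2854, 45909), (2023, 2727, 48636), (2024, 2935, 51571), (2025, 2932, 54503),
]


def find_inconsistent_years(rows):
    """Return the years whose listed total differs from the running sum of new counts."""
    running = 0
    mismatched = []
    for year, new, total in rows:
        running += new  # total implied by all yearly counts up to this year
        if running != total:
            mismatched.append(year)
    return mismatched


print(find_inconsistent_years(ROWS))  # []
print(sum(new for _, new, _ in ROWS))  # 54503
```

An empty list means every cumulative value equals the running sum of the yearly values up to that year. The sum of the yearly column also reproduces the 2025 total of 54,503.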