PDB Statistics: Growth in Number of Unique Protein Sequences in Released PDB Structures (Cumulative) at Identity 50%

This chart shows the annual and cumulative numbers of unique protein sequences in released PDB structures since the beginning of the PDB archive. It can be viewed at several levels of sequence identity. The cumulative bars show the growth in unique protein sequences (number of polymeric entities) over time. The yearly bars (dark blue) show how many new protein sequences were added in a given year.

Note: The total number of sequence clusters in the statistics table is linked to the sequence cluster group search result page. For performance reasons, the numbers are calculated with a default precision threshold, so the statistics count may differ slightly from the actual non-redundant group search result when the result count approaches or exceeds 10,000. The group search result page provides the accurate count; the statistics page shows the trend.

Sequence cluster level: 50% identity

Year  Number of New Protein Sequences  Total Number of Protein Sequences
1976  13                               13
1977  9                                22
1978  3                                25
1979  2                                27
1980  4                                31
1981  8                                39
1982  17                               56
1983  9                                65
1984  10                               75
1985  11                               86
1986  8                                94
1987  9                                103
1988  20                               123
1989  33                               156
1990  34                               190
1991  45                               235
1992  53                               288
1993  160                              448
1994  316                              764
1995  250                              1,014
1996  292                              1,306
1997  413                              1,719
1998  519                              2,238
1999  636                              2,874
2000  735                              3,609
2001  777                              4,386
2002  813                              5,199
2003  1,184                            6,383
2004  1,611                            7,994
2005  1,756                            9,750
2006  1,966                            11,716
2007  2,141                            13,857
2008  1,988                            15,845
2009  1,964                            17,809
2010  1,945                            19,754
2011  1,709                            21,463
2012  1,838                            23,301
2013  1,935                            25,236
2014  2,253                            27,489
2015  1,855                            29,344
2016  2,132                            31,476
2017  2,149                            33,625
2018  2,133                            35,758
2019  2,256                            38,014
2020  2,752                            40,766
2021  2,211                            42,977
2022  2,835                            45,812
2023  2,725                            48,537
2024  2,782                            51,319
2025  2,353                            53,672
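
The total column is the running sum of the yearly column. The minimal Python sketch below checks this for the first ten rows of the table (1976 to 1985). The variable names are illustrative and are not part of any PDB tool or API.

```python
from itertools import accumulate

# Yearly counts of new protein sequences for 1976-1985, copied from the table.
new_per_year = [13, 9, 3, 2, 4, 8, 17, 9, 10, 11]

# Totals reported in the table for the same years.
reported_totals = [13, 22, 25, 27, 31, 39, 56, 65, 75, 86]

# The running sum of the yearly counts should reproduce the reported totals.
cumulative = list(accumulate(new_per_year))

assert cumulative == reported_totals
print(cumulative[-1])  # 86, the total reported for 1985
```

The same check can be extended to the full table by appending the remaining rows to both lists.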