PDB Statistics: Growth in Number of Unique Protein Sequences in Released PDB Structures (Cumulative) at Identity 30%

This chart shows the annual and cumulative numbers of unique protein sequences in released PDB structures since the beginning of the PDB archive. It can be viewed at several levels of sequence identity. The cumulative bars show the growth in unique protein sequences (number of polymeric entities) over time. The yearly bars (dark blue) show how many new protein sequences were added in a given year.
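The relationship between the yearly and cumulative bars can be illustrated with a minimal sketch. It assumes that each cumulative value is simply the running sum of the yearly values, which is consistent with the 1976 to 1980 figures reported on this page. The variable names are illustrative only.

```python
from itertools import accumulate

# Yearly counts of new protein sequences for 1976-1980,
# copied from the statistics reported on this page.
years = [1976, 1977, 1978, 1979, 1980]
new_per_year = [12, 9, 3, 2, 3]

# Assumption: each cumulative bar is the running sum of the
# yearly bars up to and including that year.
cumulative = list(accumulate(new_per_year))

for year, new, total in zip(years, new_per_year, cumulative):
    print(f"{year}  new={new:>2}  total={total:>2}")
```

Under that assumption, the computed totals for these five years are 12, 21, 24, 26, and 29, matching the reported cumulative values.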

Note: The total number of sequence clusters in the statistics table is linked to the sequence cluster group search result page. To balance performance, the numbers are calculated with a default precision threshold, so the statistics count may differ slightly from the actual non-redundant group search result when the result count approaches or exceeds 10,000. The group search result page provides the accurate count; the statistics page shows the trend.

Year  Number of New Protein Sequences  Total Number of Protein Sequences
1976  12                               12
1977  9                                21
1978  3                                24
1979  2                                26
1980  3                                29
1981  7                                36
1982  15                               51
1983  7                                58
1984  10                               68
1985  10                               78
1986  8                                86
1987  9                                95
1988  17                               112
1989  27                               139
1990  32                               171
1991  43                               214
1992  52                               266
1993  140                              406
1994  294                              700
1995  222                              922
1996  264                              1,186
1997  379                              1,565
1998  456                              2,021
1999  556                              2,577
2000  638                              3,215
2001  681                              3,896
2002  700                              4,596
2003  977                              5,573
2004  1386                             6,959
2005  1453                             8,412
2006  1602                             10,014
2007  1689                             11,703
2008  1575                             13,278
2009  1485                             14,763
2010  1430                             16,193
2011  1246                             17,439
2012  1374                             18,813
2013  1440                             20,253
2014  1682                             21,935
2015  1387                             23,322
2016  1587                             24,909
2017  1639                             26,548
2018  1623                             28,171
2019  1673                             29,844
2020  2088                             31,932
2021  1639                             33,571
2022  2045                             35,616
2023  2004                             37,620
2024  2031                             39,651
2025  2141                             41,792