Clustering amino acid contents of protein domains: biochemical functions of proteins and implications for origin of biological macromolecules
Laboratory of Chemical Kinetics and Catalysis, Chair of Physical Chemistry, Chem. Dept. of Moscow State University, Moscow, 119899, Russia and Kimmel Cancer Center, Thomas Jefferson University, Philadelphia, PA 19107, USA
Front. Biosci. (Landmark Ed) 2001, 6(1), 1–12
Published: 1 April 2001

Structural classes of protein domains correlate with their amino acid compositions. Several successful algorithms that use only amino acid composition have been developed for the prediction of structural class or potential biochemical significance. This work deals with dynamic classification (clustering) of domains on the basis of their amino acid composition. Amino acid contents of domains from a non-redundant PDB set were clustered in the 20-dimensional space of amino acid contents. Although an empirical parameter was varied and the set is non-redundant, only one large cluster (tens to hundreds of proteins), surrounded by hundreds of small clusters (1-5 proteins), was identified. At least 64% of the domains in the core of the largest cluster are DNA (nucleotide)-interacting protein domains from various sources. About 90% of the proteins of the core are intracellular proteins. Of the DNA/nucleotide-interacting domains in the core, 83% belong to the mixed alpha-beta folds (a+b, a/b) and 14% are all-alpha (mostly helices) or all-beta (mostly beta-strands) proteins. When only the core domains that belong to one organism (E. coli) are considered, over 80% of them prove to be DNA/nucleotide-interacting proteins. The core is compact: amino acid contents of domains from the core lie in relatively narrow and specific ranges. The core also contains several Fe-S cluster-binding domains, and the amino acid contents of the core overlap with those of the ferredoxin and CO-dehydrogenase clusters, the oldest known proteins. As Fe-S clusters are thought to be the first biocatalysts, the results are discussed in relation to contemporary experiments and models dealing with the origin of biological macromolecules. The origin of most primordial proteins is considered here to be a result of co-adsorption of nucleotides and amino acids on specific clays, followed by en bloc polymerization of the adsorbed mixtures of amino acids.
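The idea of clustering domains in the 20-dimensional space of amino acid contents can be illustrated with a minimal Python sketch. The abstract does not name the clustering algorithm, the distance measure, or the empirical parameter, so everything below is an assumption made for illustration only: Euclidean distance, single-linkage grouping, and a hypothetical `threshold` argument standing in for the empirical parameter. The function names `composition` and `cluster_by_threshold` are likewise hypothetical.

```python
from collections import Counter
from math import dist

# The 20 standard amino acids, one axis of the composition space each.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(sequence):
    """Return the 20-dimensional vector of amino acid fractions of a sequence."""
    counts = Counter(r for r in sequence.upper() if r in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[aa] / total for aa in AMINO_ACIDS]


def cluster_by_threshold(vectors, threshold):
    """Group vectors by single linkage (an assumption, not the paper's method).

    Two vectors are linked when their Euclidean distance is at most
    `threshold`; clusters are the connected components of those links.
    Returns lists of indices into `vectors`, largest cluster first.
    """
    parent = list(range(len(vectors)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            if dist(vectors[i], vectors[j]) <= threshold:
                parent[find(i)] = find(j)

    clusters = {}
    for i in range(len(vectors)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values(), key=len, reverse=True)
```

Under these assumptions, calling `cluster_by_threshold([composition(s) for s in sequences], threshold)` for several values of `threshold` would show whether one large cluster persists while the remaining domains fall into small ones, which is the kind of behaviour the abstract reports.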

