Bioinformatic approaches to the investigation of the atavistic genes implicated in cancer

¹ DNA Damage Laboratory, Department of Physics, School of Applied Mathematical and Physical Sciences, Zografou Campus, National Technical University of Athens (NTUA), 15780 Athens, Greece

² Section of Cell Biology and Biophysics, Department of Biology, School of Sciences, National and Kapodistrian University of Athens, 15784 Athens, Greece

³ Izmir Biomedicine and Genome Center (IBG), 35340 Balcova, Izmir, Turkey

⁴ Izmir International Biomedicine and Genome Institute, Genomics and Molecular Biotechnology Department, Dokuz Eylül University, 35340 Balcova, Izmir, Turkey

^*Correspondence: athanasia.pavlopoulou@ibg.edu.tr (Athanasia Pavlopoulou); alexg@mail.ntua.gr (Alexandros G. Georgakilas)

Front. Biosci. (Landmark Ed) 2021, 26(8), 279–311; https://doi.org/10.52586/4944

Submitted: 7 June 2021 | Revised: 19 July 2021 | Accepted: 30 July 2021 | Published: 30 August 2021

This is an open access article under the CC BY 4.0 license (https://creativecommons.org/licenses/by/4.0/).

PDF

Brower Figures

Cite

Abstract

Introduction: Cancer is a widespread phenomenon occurring across multicellular organisms and represents a condition of atavism, wherein cells follow a path of reverse evolution that unlocks a toolkit of ancient pre-existing adaptations by disturbing hub genes of the human gene network. This results to a primitive cellular phenotype which resembles a unicellular life form. Methods: In the present study, we have employed bioinformatic approaches for the in-depth investigation of twelve atavistic hub genes (ACTG1, CTNNA1, CTNND1, CTTN, DSP, ILK, PKN2, PKP3, PLEC, RCC2, TLN1 and VASP), which exhibit highly disrupted interactions in diverse types of cancer and are associated with the formation of metastasis. To this end, phylogenetic analyses were conducted towards unravelling the evolutionary history of those hubs and tracing the origin of cancer in the Tree of Life. Results: Based on our results, most of those genes are of unicellular origin, and some of them can be traced back to the emergence of cellular life itself (atavistic theory). Our findings indicate how deep the evolutionary roots of cancer actually are, and may be exploited in the clinical setting for the design of novel therapeutic approaches and, particularly, in overcoming resistance to antineoplastic treatment.

Keywords

Atavism

Cancer

Bioinformatics

Evolution

Phylogeny

Biological pathways

Unicellularity

Multicellularity

2. Introduction

Cancer’s origin dates back to the emergence of multicellularity itself, about one billion years ago [1], since cancer and cancel-like phenomena have been observed in almost all species that exhibit either clonal or aggregative multicellularity [2], indicating that spontaneous tumor formation has deep evolutionary roots [3]. Cancer is the result of the breakdown of complex molecular and cellular mechanisms, which are necessary to enable multicellular cooperation by regulating cell growth, cell differentiation, cell death and senescence, resulting in a more primitive cellular phenotype that resembles a unicellular life form [4]. Each one of the hallmarks of cancer [5] is a direct “violation” of the principles of multicellular cooperation [2]. In point of fact, the transition from unicellular to multicellular life was only possible when cooperating cells acquired a selective advantage over those who lived independently by inhibiting their own growth and replication [6].

From an evolutionary perspective, cancer is suggested to occur because early in the evolution of life, cells were ‘designed’ so as to maximize their replication capacity. However, cancer is actually uncommon because during the emergence of multicellularity, natural selection at the level of the individual led to the emergence of robust mechanisms to suppress it [1]. Nevertheless, the paradox of cancer is that cancer cells, which initially disrupt the principles of multicellular cooperation, end up implementing those same principles [7, 8], especially in neoplasms of advanced stages [2]; as a consequence, tumors resemble more of an ecosystem than a simple aggregate of cells [1] or a rather “pseudomulticellular neotissue” [9]. Cancer cells represent a lower level of organization of life, similar to our Cambrian ancestors, and as such are capable of transitioning from multicellularity to unicellularity, but can never adapt the phenotype of complex multicellularity [10].

Although healthy cells, of both unicellular and multicellular origin, exhibit a finely tuned coordination of expression of biological processes during carcinogenesis, this coordination is markedly disrupted, resulting in a characteristic up-regulation of genes of unicellular origin, detected in various human cancer types where cancer cells maintain control over cell cycle activity [11], as well as significant inactivation of genes which are associated with multicellularity and, therefore, have evolved more recently [4]. This phenomenon of enhanced segregation of cellular processes with different evolutionary ages is called “mutual exclusivity” and is common among tumors, and is of particular importance for tumor development [4] and, ultimately, cancer progression.

For mutual exclusivity events to occur, certain fundamental alterations to the gene network need to take place. The human gene network consists of two main subnetworks comprised of genes of unicellular and multicellular origin, respectively. The “multicellular” network has been progressively built upon the “unicellular” one during billions of years of evolution, which led to the formation of an intricate network [12, 13], dedicated to maintain the complex phenotypes and cooperative growth required for multicellularity [4]. Those subnetworks are interconnected in the human gene network through “hub” genes that appeared during the early metazoan life in order to enhance intercellular cooperation [14], and more precisely at the evolutionary boundary between unicellularity and multicellularity; as a result, they reflect key points in the evolutionary transition from unicellular to multicellular life [15]. Those genes represent “points of vulnerability”, since mutual exclusivity occurs particularly due to the alteration of their interactions [4]. Hence, only a limited amount of driver mutations is thought to be responsible for the transition of a normal cell to a malignant state [16, 17, 18], indicating that mutations in these hub genes result in widespread dysregulation of the gene network and are sufficient to initiate carcinogenesis, partially, through a process of de-differentiation [19]. Particularly, since the “unicellular” subnetwork is denser and exhibits a higher inside/outside interaction ratio compared to the “multicellular” subnetwork, the former acts as an attractor that renders the cells of multicellular organisms vulnerable to carcinogenesis (principles of atavistic model for cancer evolution) [20].

The phenotype exhibited by cancer cells resembles that of a unicellular form of life because it is achieved through a process of de-differentiation, also referred to as “reverse evolution” [21]. Through a series of consecutive reversionary transitions, cancer progression follows a similar pattern, but in reverse, that is, to the gradual transition from unicellularity to multicellularity [22]. Particularly, the emergence of common characteristics among cancer cells, regardless of the tissue they originated from, indicates that the occurrence and progression of cancer may be a controlled transition from a complex multicellular phenotype to a primitive unicellular one [15]. Through a small number of mutations in hub genes, mutual exclusivity events occur, and cancer cells activate pre-existing ancestral genes and pathways which render cancer cells remarkably robust. Besides, the mechanisms and genes involved in carcinogenesis are mainly evolutionarily ancient and highly conserved, mostly because they play a crucial role in vital cellular functions of a healthy cell [23].

Therefore, according to the “atavistic model” of cancer, cancer represents an atavism at the cellular level and cancer cells are not just “rogue” cells that were generated through a series of random mutations, but rather an ancient form of life that lies dormant within healthy metazoan cells [24]. In other words, cancer cells do not construct a gene network ab intitio and acquire traits through random mutations and a few decades of internal Darwinian selection within the host’s body, but strategically take advantage of certain components of the existing gene networks [24]. Characteristic examples include the healthy cells’ ability to multiply rapidly and migrate during wound healing [25], traits that at the same time render cells vulnerable to cancer [26]. The same applies to rapid angiogenesis, which is necessary in wound healing to supply new cells with oxygen and nutrients [25], but also a hallmark of cancer [27], with cancer cells even utilizing the same angiogenic signaling pathways as the ones used by the healthy cells of a multicellular organism to develop the vascular system [28]. Another example is the behavior of cells in the early developmental stages, such as the invasion of cells into adjacent developing tissues during gastrulation and the ability of some cells to transform from immotile epithelial cells into motile mesenchymal cells, a process termed “epithelial-mesenchymal transition” (EMT). If cells do not possess the necessary capabilities to perform these functions, growth cannot be achieved; however, these same cell capabilities enable cancer cells to metastasize [26].

The pathways of cell differentiation are initially identical to all organisms and then branch off in different taxonomic divisions [29]. Accordingly, tracing the origin of emergence and evolutionary history of certain genes can be important, especially for the most ancient genes that play a crucial role in cancer progression compared to the more recently evolved ones, since the effects of inactivating the former tend to be more pronounced as they are more likely to lead to cell death [24].

In the present study, we investigated certain genes which have previously been proven to exhibit a highly disrupted number of interactions across multiple tumor types and contribute to the phenomenon of mutual exclusivity [4]. More precisely, Trigos and colleagues [4] studied a pair of cellular processes, that is, chromosome organization and cellular junction organization. The co-expression of genes involved in those processes is highly important for the eukaryotic cells of a multicellular organism and the co-expression of those genes was strongly disrupted in tumors. Both of these processes involve genes of both unicellular and multicellular origin, and the dramatic change in the co-expression of those genes suggests that mutual exclusivity between these processes occurs during carcinogenesis and is actually advantageous to the development of different cancer stages such as early cancer, late cancer, and metastasis. A set of twelve atavistic genes, which represent the “hubs” between the unicellular and multicellular processes was identified: RCC1, TLN1, VASP, ACTG1, PLEC, CTTN, DSP, ILK, PKN2, CTNNA1, CTNND1 and PKP3. Those genes act as regulators of co-expression of the genes involved in the two aforementioned processes and their hubness was dramatically changed in seven different solid tumor types. Therefore, due to their central role in the human gene network, those genes represent fundamental points of vulnerability particularly regarding the phenomenon of mutual exclusivity and, consequently, carcinogenesis. To these data is added the fact that these 12 genes interact with genes associated with genomic instability, as well as genes associated with poor prognosis for cancer progression and metastasis [30]. Moreover, these 12 genes play a critical role in regulatory networks associated with genomic instability and metastasis and are generally involved in key processes of carcinogenesis [4]. Therefore, these genes can be considered as pan-cancer molecular markers or regulators of malignancy in diverse cancer tumors [4].

To this end, we have employed a bioinformatics approach to explore the involvement of those genes in various cancer types and performed phylostratigraphic analyses [14, 31], in an effort to elucidate the evolutionary trajectory of these genes aiming towards tracing the origin of cancer in the Tree of Life.

3. Materials and methods

3.1 Sequence database searching

In this study, we followed the evolutionary lineage of the contemporary human species since the organisms or taxonomic divisions under investigation represent important links of human evolution: Homo sapiens (human), Pan troglodytes (chimpanzee), Macaca mulatta (Rhesus monkey), Callithrix jacchus (marmoset), Mus musculus (mouse), Rattus norvegicus (rat), Canis lupus familiaris (dog), Equus caballus (horse), Sus scrofa (pig), Bos taurus (cattle), Tursiops truncatus (dolphin), Pteropus vampyrus (bat), Monodelphis domestica (opossum), Ornythorhynchus anatinus (platypus), Gallus gallus (chicken), Xenopus laevis (frog), Latimeria chalumnae (coelacanth), Danio rerio (zebrafish), Callorhinchus milii (shark), Petromyzon marinus (lamprey), Ciona intestinalis (vase tunicate), Strongylocentrotus purpuratus (sea urchin), Amphimedon queenslandica (sponge), Monosiga brevicollis (choanoflagellate), Saccharomyces cerevisiae (baker’s yeast), Schizosaccharomyces pombe (fission yeast), Actinobacteria, Chlamydiae, Cyanobacteria, Proteobacteria, Firmicutes, Bacteroidetes, Archaea.

The official names of the genes were initially retrieved from the HGNC database [32, 33] and then the accession numbers of the peptide sequences corresponding to the human genes were retrieved from the publicly available non-redundant NCBI Reference Sequence Database (RefSeq) [34]. The amino acid sequences were used subsequently as probes in an extensive series of BLASTp [35] reciprocal searches in order to obtain the canonical homologous amino acid sequences corresponding to the species included in this study. This process was reiterated until no novel sequences could be detected, ensuring that a full representation of each gene’s family is obtained.

3.2 Alignment and phylogenetic analysis

To investigate the evolutionary history of each gene, we conducted comprehensive phylogenetic analysis using the entire length protein sequences of the species under study. The full-length amino acid sequences were aligned with MAFFT v.7 (https://mafft.cbrc.jp/alignment/server/) [36]. The alignments were subsequently used to reconstruct phylogenetic trees by employing two separate methods, Maximum Likelihood, a method based on a heuristic approach for finding the optimal tree that fits the observed data, and Neighbor Joining, a method based on a hierarchical clustering algorithm [37], as implemented in the software package MEGA (https://www.megasoftware.net/), version 10 [38]. Furthermore, MEGA v.10 [38] was used to estimate the best-fit substitution model, which best describes the number of observed amino acid substitutions per position. For both methods of phylogenetic tree reconstruction, bootstrap analyses (150 pseudo-replicates) were conducted in order to evaluate the robustness and the statistical significance of the inferred reconstructed trees. Finally, the phylogenetic trees were visualized with MEGA v.10 [38].

3.3 Pathways identification

The pathways and biological processes in which each gene is involved were retrieved from the Reactome Pathway database [39], the KEGG Pathways database [40, 41] and the biomedical literature.

3.4 Differential gene expression analysis

RNA sequencing data for tumor and corresponding normal tissue samples from the TCGA (The Cancer Genome Atlas) and GTEx (Genotype-Tissue Expression) databases, respectively, were retrieved from the GEPIA2 website (http://gepia2.cancer-pku.cn/). The atavistic genes that are differentially expressed (DE) between tumor and normal samples were identified using ANOVA; threshold value for absolute log fold change $\mid{}$ log ${}_{2}$ FC $\mid{}$ $\geq$ 2 and FDR-adjusted p-value (or q-value) $\leq$ 0.05.

3.5 Functional network of hubs

The twelve hub genes were provided as input to the STRING (version 11.0) (https://string-db.org/) database [42] of investigating and visualizing both known and predicted gene/protein associations.

4. Results

4.1 Atavistic genes’ association with cancer

By conducting a thorough and comprehensive review of studies published in PubMed (https://pubmed.ncbi.nlm.nih.gov/) and The Human Protein Atlas [43], we found those cancers each of the 12 genes is involved in; even different types of cancers that can affect the same organ (Table 1, Ref. [43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209, 210, 211, 212, 213, 214, 215, 216, 217, 218, 219, 220, 221, 222, 223, 224, 225, 226, 227, 228, 229, 230, 231, 232, 233, 234, 235, 236, 237]). In this context, certain genes have been shown to exhibit either oncogenic or tumor suppressive effects. In fact, both qualitative and quantitative modifications were identified in the genes under study which are associated with the development and progression of different cancers in different tissues. In particular, the types of alterations observed in those genes and/or their protein products, include aberrant, increased or decreased, expression (Table 2), epigenetic modifications, mutations, gene inactivation or amplification, copy number alterations, altered protein interactions and post-translational modifications, characteristic protein subcellular localization. Furthermore, these genes constitute diagnostic and prognostic biomarkers in cancer, regarding malignancy, disease stage, clinical subtypes, disease progression, metastasis, clinical outcome and survival and prediction of patients’ response to therapeutic treatments. In addition, each gene and/or its products constitute drug targets or are considered as novel potential drug targets for at least one type of cancer.

Table 1.Atavistic hub genes’ association with diverse types of cancers.

	Organ system	Types of cancer	Metastasis	Biomarker	Drug target	Type of alteration	Oncogenic (O) or tumor suppression (TS) effect	Reference
RCC2	Gene’s expression is regulated by p53, thus it plays an important role in metastasis							[44]
	It regulates apoptosis by blocking RAC1 signaling. The gene’s expression levels in tumor cells are used to predict response to chemotherapy.							[45]
	Brain and nervous system	Glioblastoma		+				[46]
	Breast	Breast Cancer	+			Wnt pathway		[47]
	Gastrointestinal	Gastric Cancer				Aberrant expression		[48]
		Colorectal Cancer		+				[49]
		Hepatic Cancer		+				[43]
		Pancreatic Cancer						[50]
	Genitourinary and gynecologic	Epithelial Ovarian Carcinoma	+		+			[51]
		Ovarian Cancer			+	RalA pathway		[52]
		Renal Cancer		+				[43]
		Cervical Cancer		+				[43]
	Skin	Melanoma		+				[43]
	Thoracic and respiratory	Lung Adenocarcinoma	+			Increased expression		[53]
TLN1	Bone and muscle sarcoma	Malignant Fibrous Histiocytoma (MFH)				Mutations		[54]
		Fibrosarcoma (FS)
		Ewing Sarcoma Family of Tumors (EFT)
	Brain and nervous system	Glioblastoma	+					[55]
	Breast	Triple-negative Breast Cancer		+				[56]
	Endocrine system	Thyroid Cancer	+			Increased expression		[57]
	Gastrointestinal	Hepatocellular Carcinoma	+	+		Decreased expression		[58, 59]
		Hepatocellular Carcinoma	+	+		ERK1/2 pathway		[58, 59]
		Colorectal Cancer		+				[43]
	Genitourinary and gynecologic	Prostate Cancer	+	+		Increased expression		[60, 61]
		Renal Cancer		+				[43]
		Ovarian Cancer				Increased expression		[62]
	Head and neck	Nasopharyngeal Carcinoma	+			Increased expression		[63]
	Head and neck	Oral Squamous Cell Carcinoma	+	+		Increased expression		[64]
	Hematopoietic	Chronic Myeloid Leukemia				Increased expression		[65]
VASP	Bone and muscle sarcoma	Osteosarcoma	+					[66]
	Breast	Breast Cancer	+					[67]
	Gastrointestinal	Hepatocellular Carcinoma	+	+		Increased expression		[43, 68]
		Colorectal Cancer	+	+				[69]
		Gastric (stomach) Cancer	+			PI3/AKT pathway		[70]
	Genitourinary and gynecologic	Renal Cell Carcinoma	+	+		Post-translational modifications (phosphorylation)		[43, 71]
	Hematopoietic	Chronic Myeloid Leukemia			+	Aberrant expression		[72]
						Protein interactions
						Post-translational modifications (phosphorylation)
	Skin	Melanoma	+					[73]
	Thoracic and respiratory	Lung Adenocarcinoma		+		Increased expression		[74]
ACTG1	Bone and muscle sarcoma	Osteosarcoma	+	+		Increased expression		[75]
	Breast	Breast Cancer				Increased expression		[76]
	Gastrointestinal	Alcohol related Hepatocellular Carcinoma		+		Increased expression		[77, 78]
	Gastrointestinal	Colorectal Cancer	+	+				[43, 79]
	Genitourinary and gynecologic	Renal Cancer		+				[43]
	Genitourinary and gynecologic	Cervical Cancer		+		Epigenetic modification (methylation)		[80]
	Hematopoietic	Acute Lymphoblastic Leukemia (children)		+		SNPs		[81]
	Skin	Skin Cancer	+	+		Increased expression		[82]
	Skin	Skin Cancer	+	+		ROCK Pathway		[82]
	Thoracic and respiratory	Non-small Cell Lung Cancer	+	+				[83]
PLEC	Gastrointestinal	Hepatocellular Carcinoma	+	+		Decreased expression		[84]
		Colon Cancer		+				[85]
		Pancreatic Ductal Adenocarcinoma	+	+				[86]
	Genitourinary and gynecologic	Renal Cancer		+				[43]
	Genitourinary and gynecologic	Testicular Germ Cell Tumors		+				[87]
	Head and neck	Paranasal sinus carcinoma				Increased expression		[88]
	Head and neck	Oral Squamous Cell Carcinoma		+		Decreased expression		[89]
	Skin	Melanoma	+			Copy number alterations		[90]
	Thoracic and respiratory	Lung Cancer		+				[43]
CTTN	CTTN is a well-established oncogene, associated with advanced disease stage and poor prognosis.							[91]
	Bone and muscle sarcoma	Osteosarcoma		+		Increased expression		[92]
	Brain and nervous system	Glioma				Increased expression		[93]
	Brain and nervous system	Glioblastoma	+					[94]
	Breast	Breast Cancer				Post-translational modification (phosphorylation)		[95]
	Endocrine system	Thyroid Cancer	+		+	Increased expression		[96]
	Endocrine system	Thyroid Cancer	+		+	Protein interactions		[96]
	Gastrointestinal	Gastric Cancer		+		SNP		[97]
		Hepatocellular Carcinoma	+	+		Increased expression		[98, 99]
		Hepatocellular Carcinoma	+	+		Protein interactions		[98, 99]
		Colorectal Cancer	+			Increased expression		[100, 101]
		Colorectal Cancer	+			EGFR-MAPK pathway		[100, 101]
	Genitourinary and gynecologic	Bladder Cancer	+					[102]
		Ovarian Epithelial Cancer		+		Increased expression		[103]
		Prostate Cancer	+			Increased expression		[104]
		Renal Clear Cell Carcinoma	+					[105]
	Head and neck	Head and Neck Squamous Cell Carcinomas		+	+			[91, 106]
		Esophageal Squamous Cell Carcinoma	+	+	+	Increased expression		[107, 108]
		Oral Squamous Cell Carcinoma	+	+	+	Gene amplification		[109, 110]
		Oral Squamous Cell Carcinoma	+	+	+	Increased expression		[109, 110]
		Pharyngolaryngeal Squamous Cell Carcinomas		+	+	Gene amplification		[106]
		Pharyngolaryngeal Squamous Cell Carcinomas		+	+	Increased expression		[106]
		Oropharynx Squamous Cell Carcinoma		+		Gene amplification		[91]
		Oropharynx Squamous Cell Carcinoma		+		Increased expression		[91]
		Laryngeal Cancer		+				[111, 112]
	Hematopoietic	B-cell Acute Lymphoblastic Leukemia	+	+		Increased expression		[113]
	Skin	Melanoma	+			Ubiquitination		[114]
	Skin	Cutaneous Squamous Cell Carcinoma		+		Post-translational modification (phosphorylation)		[115]
	Thoracic and respiratory	Non-small Cell Lung Cancer	+			Increased expression		[116]
DSP	Breast	Breast Cancer	+			Decreased expression		[117]
	Gastrointestinal	Colorectal Cancer		+	+			[118]
		Gastric Cancer				Decreased expression		[119]
		Gastric Cancer				Wnt/ $\beta$ -catenin pathway		[119]
		Hepatocellular Carcinoma	+			Decreased expression		[120]
	Genitourinary and gynecologic	Ovarian Cancer	+	+		Increased expression		[121]
		Ovarian Cancer	+	+		Immune response		[121]
		High-grade Serous Ovarian Cancer		+		Characteristic expression profile		[122]
		Renal Cancer		+				[43]
		Urothelial Cancer		+				[43]
	Head and neck	Oral Squamous Cell Carcinoma	+			Decreased expression		[123]
		Oropharyngeal Squamous Cell Carcinomas	+	+		Decreased expression		[124]
		Oropharyngeal Squamous Cell Carcinomas	+	+		Isomorph		[124]
	Skin	Melanoma	+	+		Immune response		[121]
	Thoracic and respiratory	Lung Adenocarcinoma				Characteristic expression profiles		[125]
		Lung Squamous Cell Carcinoma				Characteristic subcellular localization
		Adenosquamous Carcinoma				Characteristic subcellular localization
		Non-small Cell Lung Cancer		+		Decreased expression due to epigenetic modification (methylation)		[126]
		Non-small Cell Lung Cancer		+		Wnt/ $\beta$ -catenin pathway		[126]
ILK	Increased expression is associated with an aggressive phenotype and metastasis in many types of cancer.							[127]
	Brain and nervous system	Neuroblastoma			+	LIMS1/ILK pathway		[128]
	Brain and nervous system	Glioblastoma		+	+	ILKAP pathway		[129]
	Breast	Breast Cancer	+	+	+	Increased expression		[130, 131, 132]
						Twist-ITGB1-FAK/ILK pathway
						PI3K/Akt pathway
	Endocrine system	Thyroid Cancer		+	+	Aberrant expression		[133]
	Gastrointestinal	Colorectal Cancer	+	+		Increased expression		[134]
		Pancreatic Ductal Adenocarcinoma	+	+		Increased expression		[135]
		Hepatocellular Carcinoma				Increased expression		[136]
		Hepatocellular Carcinoma				Akt activation		[136]
		Gallbladder Squamous Cell Carcinoma	+	+	+	Increased expression		[137]
		Adenosquamous Gallbladder Carcinomas
		Gallbladder Adenocarcinoma
		Pancreatic Cancer	+	+	+	KRAS-ILK regulatory feedback loop		[138]
		Gastric Cancer	+	+		Increased expression		[139]
	Genitourinary and gynecologic	Ovarian Epithelial Cancer	+		+	Increased expression		[140, 141]
		Bladder Cancer	+	+	+	ILK/PI3K/Akt pathway		[142]
		Renal Clear Cell Carcinoma	+	+				[43, 143]
		Prostate Cancer			+	Cell cycle regulation		[144]
	Head and neck	Salivary Adenoid Cystic Carcinoma	+	+		Increased expression		[145]
		Laryngeal Squamous Cell Carcinoma	+	+	+	Increased expression		[146]
		Esophageal Squamous Cell Carcinoma		+		Increased expression		[147]
	Hematopoietic	Chronic Myeloid Leukemia			+			[148]
	Hematopoietic	Acute Myeloid Leukemia			+			[148]
	Skin	Melanoma				Impaired post-translational modification (phosphorylation)		[149]
	Thoracic and respiratory	Non-small Cell Lung Cancer	+	+				[150]
PKN2	Breast	Triple-negative Breast Cancer		+	+	Increased expression	TS	[151, 152]
	Head and Neck	Oral Squamous Cell Carcinoma (smokers)		+	+	Increased expression		[153]
	Head and Neck	Oral Squamous Cell Carcinoma (smokers)		+	+	Post-translational modification (hyperphosphorylation)		[153]
	Gastrointestinal	Colon Cancer		+	+	Decreased expression	TS	[43, 154]
	Gastrointestinal	Colon Cancer		+	+	DUSP6-Erk1/2 pathway	TS	[43, 154]
	Genitourinary and gynecologic	Prostate Cancer		+		Increased expression		[155, 156]
	Genitourinary and gynecologic	Bladder Cancer	+			Increased expression		[155]
CTNNA1	CTNNA1 is generally considered as a tumor suppressor
	Brain and nervous system	Glioblastoma				Increased expression		[157]
	Breast	Luminal Breast Cancer				Increased expression		[158]
		Triple-negative Breast Cancer (basal-like)				NF-kB pathway	TS	[159]
		Lobular Type Breast Carcinoma		+		Aberrant expression		[160]
	Endocrine system	Differentiated Thyroid Carcinoma	+	+		Decreased expression		[161]
	Gastrointestinal	Colorectal Cancer	+	+		Pseudogene CTNNAP1		[162, 163, 164]
						Aberrant expression
						Cell cycle regulation
		Gastric Cancer	+			Deleterious variants		[165]
		Gastric Cancer	+			Mutations		[165]
		Pancreatic Ductal Adenocarcinoma	+	+		Inactivation		[166, 167, 168]
						Aberrant expression
						Decreased expression
						Impaired epigenetic modification (methylation)
		Hepatocellular Carcinoma		+		Decreased expression		[169]
		Cholangiocarcinoma		+		Decreased expression		[170]
	Genitourinary and gynecologic	Bladder Cancer	+			Protein interactions		[171]
		Ovarian Cancer		+		Epigenetic modification (methylation)		[172]
		Renal Cell Carcinoma				Decreased expression		[43, 173]
		Prostate Cancer	+			Inactivation		[174]
		Prostate Cancer	+			Protein interactions		[174]
	Head and neck	Oral Squamous Cell Carcinoma		+		Decrease protein levels		[175]
		Oral Squamous Cell Carcinoma		+		Characteristic subcellular localization		[175]
		Esophageal Cancer	+			Decreased expression		[176]
	Hematopoietic	Myelodysplastic Syndromes		+		Chromosome 5q deletion		[177]
		Myelodysplastic Syndromes				Decreased expression
		Acute Myeloid Leukemia				Epigenetic modifications (methylation and histone deacetylation of the promoter)
	Immune system	Thymoma		+		Characteristic expression profile		[178]
	Skin	Melanoma	+			Decreased expression		[179]
	Skin	Melanoma	+			Inactivation		[179]
	Thoracic and respiratory	Non-small Cell Lung Cancer		+		Decreased expression		[180, 181, 182]
CTNND1	In fact, p120 catenin appears to have both pro-oncogenic and anti-oncogenic effects, depending on the localization and the specific function of p120 catenin in each cell compartment.							[183]
	Loss, downregulation or mislocalization of p120 catenin is observed in most human cancers.							[184]
	Bone and muscle sarcoma	Osteosarcoma	+			Increased expression		[185]
	Bone and muscle sarcoma	Synovial Sarcoma						[186]
	Brain and nervous system	Astrocytoma	+			Abnormal post-translational modification (hyperphosphorylation)		[187]
		Glioblastoma	+	+		Increased expression		[188]
		Neuroblastoma				Protein interactions		[189]
	Breast	Breast Cancer	+	+		Increased expression		[190, 191]
						Wnt/ $\beta$ -catenin pathway
						Isomorphs (basal-like and luminal subtypes)
						Characteristic subcellular localization
		Breast Invasive Lobular Carcinoma	+			Decreased expression		[191, 192]
						Characteristic subcellular localization
						ROCK1 pathway
		Endocrine-resistant Breast Cancer	+			Decreased expression		[193]
		Endocrine-resistant Breast Cancer	+			Characteristic subcellular localization		[193]
		Triple-negative Breast Cancer (basal like)	${}_{+}$			Decreased expression		[191, 193]
		Triple-negative Breast Cancer (basal like)	${}_{+}$			Characteristic subcellular localization		[191, 193]
	Gastrointestinal	Colorectal Cancer	+	+	+	Increased expression		[194, 195, 196]
		Colorectal Cancer	+	+	+	Decreased expression		[194, 195, 196]
		Gastric Cancer	+	+		Increased expression		[197, 198, 199]
		Gastric Cancer	+	+		Characteristic subcellular localization		[197, 198, 199]
		Hepatocellular Carcinoma				Increased expression	O	[200, 201]
						Decreased expression
						Wnt/ $\beta$ -catenin pathway	TS
		Pancreatic Ductal Adenocarcinoma	+	+	+	Decreased expression	TS	[202, 203, 204]
						Characteristic subcellular localization
						RAC1 pathway
		Solid Pseudopapillary Tumors of the Pancreas	+			Decreased expression		[203, 205]
		Solid Pseudopapillary Tumors of the Pancreas	+			Characteristic subcellular localization		[203, 205]
	Genitourinary and gynecologic	Bladder Cancer		+		Decreased expression		[206]
		Bladder Cancer		+		Characteristic subcellular localization		[206]
		Cervical Cancer	+			Aberrant expression		[207]
		Endometrial Cancer				Decreased expression		[208]
		Prostatic Adenocarcinoma	+	+		Decreased expression		[209]
		Renal Cancer	+	+	+	Isophorm Increased expression		[43, 210]
		Ovarian Cancer	+			Characteristic subcellular localization		[211]
		Ovarian Cancer	+			RAC1 pathway		[211]
	Head and neck	Head and Neck Squamous Cell Carcinomas	+			Decreased expression		[212]
		Esophageal Squamous Cell Carcinoma	+			Decreased expression	TS	[213]
		Esophageal Squamous Cell Carcinoma	+			Characteristic subcellular localization	TS	[213]
	Hematopoietic	Acute Lymphoblastic Leukemia		+		Increased expression		[214]
	Skin	Skin Squamous Cell Carcinoma				Decreased expression		[215, 216]
		Skin Squamous Cell Carcinoma				Characteristic subcellular localization		[215, 216]
		Melanoma				Aberrant expression		[179]
		Melanoma				Characteristic subcellular localization		[179]
	Thoracic and respiratory	Lung Cancer		+		Increased expression	O	[217]
		Lung Cancer		+		Characteristic subcellular localization	O	[217]
		Lung Adenocarcinoma	+			Decreased expression		[218]
		Lung Squamous Cell Carcinoma	+			Decreased expression		[218]
		Non-small Cell Lung Cancer				Decreased expression		[219]
PKP3	Downregulation of PKP3 leads to tumor formation, a decrease in cell adhesion, promotion of EMT and metastasis							[220, 221, 222]
	Breast	Breast Cancer		+		Increased expression		[223]
	Gastrointestinal	Gastric Adenocarcinoma				Decreased expression		[224]
		Gastric Adenocarcinoma				Inactivation		[224]
		Gastrointestinal Cancer	+	+		Increased expression		[225]
		Pancreatic Cancer		+				[43]
	Genitourinary and gynecologic	Ovarian Cancer	+	+	+	Increased expression	O	[121, 226, 227]
						MAPK-JNK-ERK1/2-mTOR pathway
						Immune response
		Bladder Cancer	+			Increased expression		[228]
		Bladder Cancer	+			Characteristic subcellular localization		[228]
		Prostatic Adenocarcinoma	+			Increased expression		[229, 230]
						Decreased expression
						Protein interactions
		Renal Cancer		+				[43]
		Uterine Carcinosarcoma		+		Epigenetic modification (methylation)		[231]
	Head and neck	Oropharynx Squamous Cell Carcinoma	+	+		Decreased expression		[232]
		Oropharynx Squamous Cell Carcinoma	+	+		Inactivation		[232]
		Nasopharyngeal Carcinoma	+			Decreased expression (DNP carcinogen factor)		[233]
	Skin	Melanoma	+	+	+	Immune response		[121, 234]
	Thoracic and respiratory	Lung Adenocarcinoma		+	+	Increased expression		[235]
		Mesothelioma	+			Increased expression		[236]
		Non-small Cell Lung Cancer	+			Aberrant expression		[237]

Table 2.The differential expression (DE) status of atavistic genes in diverse TCGA cancers.

Gene	Cancer type	DE
ACTG1	DLBC, THYM	Up
CTNNA1	PAAD, THYM	Up
CTTN	THYM, DLBC	Up
CTTN	OV, LAML, UCS	Down
DSP	CESC, COAD, LUAD, LUSC, OV, THYM, TGCT, UCEC,UCS, READ	Up
DSP	SKCM	Down
ILK	UCEC, UCS	Down
PKP3	UCS, LUSC, OV, THYM, STAD, UCEC, PAAD, READ, CESC, COAD	Up
PKP3	SKCM, LAML	Down
PLEC	PAAD	Up
RCC1	GBM, THYM, DLBC	Up
TLN1	READ	Down
VASP	GBM, PAAD	Up
Up, up-regulation; Down, down-regulation.

The critical role of these genes and their corresponding products in carcinogenesis is highlighted by the fact that they are all involved in cancer-relevant pathways and processes, usually more than one (Table 3). More specifically, the identified cancer signaling pathways include the Wnt pathway (RCC2, DSP, CTNND1), the MAPK pathway (TLN1, CTTN, PKP3), the VEGF pathway (TLN1, CTNNA1, CTNND1), the PI3K-Akt pathway (VASP, ILK, PKN2), the Hippo pathway (ACTG1, CTNNA1), the PPAR pathway (ILK) and the Rap1 pathway (VASP, CTNND1, ACTG1, TLN1). Moreover, particularly vital processes of the cell, which also play a role in carcinogenesis, are eukaryotic cell cycle regulation and mitosis (RCC2, ILK, CTNNA1, CTNND1), apoptosis (ACTG1, PLEC, DSP, CTTN, CTNND1), immune system processes (TLN1, VASP, ACTG1, DSP, PKP3) and interaction with proteoglycans in cancer (ACTG1, CTTN). Finally, the proteins that constitute the adherens junctions (ACTG1, CTNNA1, CTNND1) and contribute to focal adhesions (VASP, TLN1, ACTG1) play a critical role especially in metastasis.

Table 3.Atavistic genes’ association with pathways and processes related to cancer.

RCC2

TLN1

VASP

ACTG1

PLEC

CTTN

DSP

ILK

PKN2

CTNNA1

CTNND1

PKP3

Wnt

MAPK

VEGF

PI3K-Akt

Hippo

PPAR

Rap1

Cell cycle regulation and mitosis

Apoptosis

Immune system processes

Interaction with proteoglycans in cancer

Adherence junctions

Focal adhesion

Of note, based on the STRING database analysis, the most significantly enriched pathway of the 12 gene/proteins is ‘cell junction assembly’ (GO: 0034329; FDR = 4.10 $\times$ 10 ${}^{-15}$ ). In this network (Fig. 1), ten hub genes/gene products appear to interact either physically or functionally in the process of cell juction organization, having an integral role in EMT which is essential in cancer progression and metastasis [238].

Therefore, the importance of the examined genes is enhanced by their involvement in pathways related to carcinogenesis and the progression of the disease in various types of cancer.

Fig. 1.

STRING network depicting the associations (connecting lines) of the hub genes/gene products (nodes) under investigation in the cell junction assembly.

4.2 Phylogenetic analysis

In terms of overall topology, there is congruence between the phylogenetic trees that were generated with both methods. The trees constructed with the maximum likelihood method are considered more accurate and reliable and they display higher bootstrap values (Figs. 2,3,4, 5,6,7,8,9,10,11,12). These high bootstrap values indicate that the tree nodes are statistically significant and the inferred topology is biologically significant. The trees generated with the neighbor-joining method can be found at the Supplementary material.

The ILK, CTNNA1, CTNND1 and PKP3 genes are detected exclusively in multicellular Animals as shown in the phylogenetic trees in Figs. 2,3,4, respectively. Therefore, those genes most likely first appeared in a eukaryotic multicellular organism which was the common ancestor of Metazoa.

Fig. 2.

Maximum likelihood-based rooted phylogenetic tree of the protein sequences encoded by the ILK genes. TNNI3K is used as outgroup. The branch length represents evolutionary distance. A discrete Gamma distribution was used to model evolutionary rate differences among sites. The scale bar at the lower left denotes the length of amino acid substitutions per site.

Fig. 3.

Maximum likelihood-based unrooted phylogenetic tree of the protein sequences encoded by the CTNNA1, CTNNA2, CTNNA3, CTNND1 and CTNND2 genes. The conventions are the same as in Fig. 2.

Fig. 4.

Maximum likelihood-based rooted phylogenetic tree of the protein sequences encoded by the PKP1, PKP2, PKP3 and PKP4 genes. CTNND2 is used as outgroup. The conventions are the same as in Fig. 2.

Catenins A (CTNNA) and catenins D (CTNND) are members of the catenin superfamily (Fig. 3). CTNNA1, CTNNA2, CTNNA3, CTNND1, CTNND2 genes are paralogs which probably occurred through a series of gene duplication events. The corresponding orthologous proteins of each gene form distinct clades, and as a result the phylogenetic tree is divided into two major subtrees, comprised of protein sequences encoded by CTNNA and CTNND genes, respectively. The subtree of CTNNA includes the protein sequences encoded by CTNNA1, CTNNA2 and CTNNA3. CTNNA2 appears to be the primordial gene as it was first detected in Amphimedon queenslandica, thus it probably first emerged in an ancestor of Porifera after their divergence from Choanoflagellates, since an ortholog was not found in Monosiga brevicollis. On the other hand, CTNNA1 and CTNNA3 were detected for the first time in Callorhinchus milli, and thus, they appeared later in evolution and probably occurred due to duplication events of the CTNNA2 gene, as they most likely first appeared in an ancestor of Chondrichthyes after their divergence from Tunicates, since orthologs were not detected in Ciona intestinalis and in fact they demonstrate high similarity to each other. The subtree of catenins D includes the protein sequences encoded by CTNND1 and CTNND2. CTNND2 is apparently the primordial gene as it was first detected in Amphimedon queenslandica, and thus it probably first arose in an ancestor of Porifera after their divergence from Choanoflagellates, since an ortholog was not detected in Monosiga brevicollis. On the other hand, CTNND1 appeared later in evolution and probably occurred due to CTNND2 gene duplication, as it was detected for the first time in Ciona intestinalis. Therefore, CTNND1 might have arisen in an ancestor of Tunicates after their divergence from Echinodermata, given that CTNND1 orthologs were not detected in Strongylocentrotus purpuratus.

The PKP1, PKP2, PKP3, PKP4 genes are likely paralogs (Fig. 4) which probably occurred through a series of gene duplication events of an ancestral PKP gene. The corresponding orthologs of each gene form distinct clades. All PKP genes were detected for the first time in Callorhinchus milli, suggesting that they probably appeared in an ancestor of Chondrichthyes after their divergence from Tunicates, since PKP orthologs were not detected in Ciona intestinalis. Based on the inferred phylogenetic tree, however, PKP1 and PKP2 exhibit the highest similarity, followed by PKP3. We could speculate that PKP4 is the primordial gene of this family, since it is basal to PKP1, PKP2 and PKP3. PKP3 appeared later in evolution, probably due to PKP4 gene duplication, while PKP1 and PKP2 have emerged even more recently through another series of duplication events in a chondrichthyan PKP1/2 gene and have not accumulated a large number of mutations.

The VASP, CTTN and DSP genes are detected during the relatively late transition phase from unicellularity to multicellularity, based on the inferred phylogeny shown in Figs. 5,6,7, respectively.

VASP and Enah/Vasp-Like are likely paralogs (Fig. 5). In Monosiga brevicollis, Strongylocentrotus purpuratus, Ciona intestinalis and Callorhinchus milli the corresponding detected genes were annotated as VASP-like in RefSeq, but display high similarity to the VASP genes of the rest of Metazoa.

Fig. 5.

Maximum likelihood-based rooted phylogenetic tree of the protein sequences encoded by the VASP and ENAH-LIKE/ENAH/VASP-LIKE genes. EVL is used as outgroup. The conventions are the same as in Fig. 2.

Fig. 6.

Maximum likelihood-based rooted phylogenetic tree of the protein sequences encoded by the CTTN genes. HCLS1 is used as outgroup. The conventions are the same as in Fig. 2.

Fig. 7.

Maximum likelihood-based rooted phylogenetic tree of the protein sequences encoded by the DSP genes. PLEC is used as outgroup. The conventions are the same as in Fig. 2.

The PLEC and TLN1 genes were detected in unicellular and multicellular organisms, as shown in Figs. 8,9, respectively, and they probably first appeared in a eukaryotic unicellular organism that was the common ancestor of Animalia and Fungi.

PLEC homologs were found in Animalia as well as in Fungi (SAC6 in Saccharomyces cerevisiae and FM1 in Schizosaccharomyces pombe) (Fig. 8).

The TLN1 and TLN2 genes are likely paralogs (Fig. 9), and their corresponding orthologous protein sequences form distinct clades. TLN1/2 is probably the primordial gene of this family, since it was first detected in Amphimedon queenslandica, Monosiga brevicollis, Ciona intestinalis and Strongylocentrotus purpuratus which gave rise to TLN1 and TLN2 after successive gene duplications. Talin homologs were also identified in Fungi, that is, SLA2P in Saccharomyces cerevisiae and END4 in Schizosaccharomyces pombe, as also confirmed in literature [239, 240]. Therefore, although members of the TLN1 gene family were found only in Metazoa, it has deep evolutionary roots in some ancient unicellular eukaryotic organism, which is the common ancestor of Animalia and Fungi.

Fig. 8.

Maximum likelihood-based rooted phylogenetic tree of the protein sequences encoded by the PLEC genes. MAKF1 is used as outgroup. The conventions are the same as in Fig. 2.

Fig. 9.

Maximum likelihood-based rooted phylogenetic tree of the protein sequences encoded by the TLN1 and TLN2 genes. HIP1R is used as outgroup. The conventions are the same as in Fig. 2.

The characteristically long clade corresponding to Monosiga brevicollis, as shown in Figs. 5,6,7,8,9, is due to the evolutionary course of the organism, as it evolved for many millions of years separately from the rest of the Metazoa.

The evolutionary origin of ACTG1, RCC2 and PKN2 genes can be traced to a universal common ancestor, namely an ancient prokaryotic unicellular organism, that is the common ancestor of Animalia, Bacteria and Archaea as illustrated in Figs. 10,11,12, respectively.

Fig. 10.

Maximum likelihood-based rooted phylogenetic tree of the protein sequences encoded by the ACTA1, ACTA2, ACTB, ACTC1, ACTG1 and ACTG2 genes. ACTR1, ACTR1A and ACTRB are used as outgroups. The conventions are the same as in Fig. 2.

Fig. 11.

Maximum likelihood-based rooted phylogenetic tree of the protein sequences encoded by the RCC1 and RCC2 genes. HERC3 is used as outgroup. The conventions are the same as in Fig. 2.

Fig. 12.

Maximum likelihood-based rooted phylogenetic tree of the protein sequences encoded by the PKN1, PKN2, PKN3, PKN, PKC genes. SGK2 is used as outgroup. The conventions are the same as in Fig. 2.

The ACTA1, ACTA2, ACTB, ACTC1, ACTG1, ACTG2 genes are likely paralogs (Fig. 10), the orthologs of which cluster together in distinct clades. The phylogenetic tree is divided into two major subtrees, one comprised of protein sequences encoded by genes of different types of cytoplasmic actins and the other includes sequences encoded by genes of different types of muscle actins.

The subtree of cytoplasmic actins harbors the protein sequences encoded by ACTG1 (Actin cytoplasmic 2), ACTB (Actin cytoplasmic 1) as well as various other actin genes. ACTG1 is apparently the ancestral gene of this family as it was detected in Animalia, Bacteria (Microbacterium arborescens (Actinobacteria), Chlamydia trachomatis (Chlamydiae), Leptolyngbya sp. PCC 7376 (Cyanobacteria), Kangiella spongicola (Proteobacteria), Staphylococcus aureus (Firmicutes)), as well as Archaea (Candidatus Heimdallarchaeota (Archaea), Candidatus Lokiarchaeota (Archaea)). The sequences detected in bacteria do not cluster together, but are rather scattered among those corresponding to genes that encode the two types of cytoplasmic actins in Metazoa. This is indicative of high similarity among the aforementioned genes, since the encoded protein sequences appear to be highly conserved. Additionally, Actin genes found in Fungi (Saccharomyces cerevisiae and Schizosaccharomyces pombe) exhibit high similarity to Actin genes found in Archaea, which encode cytoplasmic actin 2.

On the other hand, the ACTB gene likely appeared later in evolution and probably occurred due to a duplication event of the ACTG1 gene, as it was detected for the first time in Petromyzon marinus. Also, in Ciona intestinalis, a cytoplasmic actin gene and a muscle actin gene were identified, and in fact the gene encoding the cytoplasmic actin shows high similarity to the only ACTIN gene detected in Monosiga brevicollis. Finally, Actin-85C-Like gene found in Amphimedon queenslandica also demonstrates high similarity to types of cytoplasmic actin.

The subtree of muscle actins, includes the protein sequences encoded by ACTA1 (Actin alpha 1, skeletal muscle), ACTA2 (Actin alpha 2, smooth muscle), ACTC1 (Actin alpha cardiac muscle 1) and ACTG2 (Actin gamma 2, enteric smooth muscle). Muscle actin genes appear to be the primordial genes of all muscle actin encoding genes as they were first identified in Strongylocentrotus purpuratus and Ciona intestinalis; thus, the gene probably first appeared in a common ancestor of Echinodermata and Chordata after their divergence from Porifera, since an ortholog was not detected in Amphimedon queenslandica. ACTA2 and ACTC1 genes were first detected in Callorhinchus milli, ACTA1 in Danio rerio, and ACTG2 in Latimeria chalumnae, respectively. We would expect to find the ACTA1 and ACTG2 genes in Callorhinchus milli, as it is the first organism in this study to have differentiated organs, but the sequences were not detected using BLAST. In any case, we believe that these genes occurred through a series of gene duplication events yielding four paralogs that probably first appeared in an ancestor of the Chondrichthyes after their divergence from Echinodermata and Tunicates.

In summary, those genes encoding different types of cytoplasmic actins are evolutionarily older, since those genes that encode different types of muscle actins appeared later in evolution. Specifically, ACTG1 is the ancestral gene of the entire family, since it is the evolutionarily older out of all paralogs found in Metazoa.

The RCC1 and RCC2 genes are likely paralogs, and their corresponding orthologous protein sequences form distinct clades (Fig. 11). RCC1 is probably the primordial gene of this family as it was detected in Metazoa, Bacteria Bifidobacterium asteroides (Actinobacteria), Parachlamydia sp. C2 (Chlamydiae), Synechococcus sp. WH 8103 (Cyanobacteria), Myxococcus xanthus (Proteobacteria), Cohnella sp. CAU 1483 (Firmicutes), Hymenobacter chitinivorans (Bacteroidetes) and Methanocella conradii (Archaea). RCC1 and RCC2 homologs have also been identified in Fungi, that is, SRM1 in Saccharomyces cerevisiae and PIM1 in Schizosaccharomyces pombe, as confirmed in literature [241, 242]. Furthermore, although the corresponding gene in Monosiga brevicollis and Amphimedon queenslandica is annotated as RCC in RefSeq, it displays high similarity to the RCC1 gene. Thus, RCC1 probably first appeared in a common ancestor of Eukaryotes, Bacteria and Archaea. RCC2 on the other hand, might have appeared later in evolution, as it was detected for the first time in Strongylocentrotus purpuratus, and probably occurred due to duplication of the RCC1 gene in an ancestor of Echinodermata after their divergence from the phylum of Porifera, since an RCC2 ortholog was not detected in Amphimedon queenslandica. Therefore, although RCC2 is found only in Metazoa, its ancestor is traced to some ancient prokaryotic unicellular organism, that is, the common ancestor of Eukaryotes, Bacteria and Archaea.

The PKN1, PKN2, PKN3, PKN, PKC genes are likely paralogs, the orthologs of which cluster together in distinct clades (Fig. 12). PKN was detected in Amphimedon queenslandica, in the unicellular Choanoflagellate Monosiga brevicollis, in Bacteria Streptomyces seoulensis (Actinobacteria), Chlamydia ibidis (Chlamydiae), Chloracidobacterium thermophilum (Cyanobacteria), Escherichia coli (Proteobacteria), Paenibacillus donghaensis (Firmicutes), Spirosoma panaciterrae (Bacteroidetes) and in Archaea (Methanoregula boonei and Thermococcus thioreducens) and is likely the primordial gene of this family and precursor of the PKN1, PKN2 and PKN3 genes which probably occurred through a series of PKN gene duplication events. The PKC gene was also detected in Petromyzon marinus, as well as in Fungi (Saccharomyces cerevisiae and Schizosaccharomyces pombe). PKN2 is apparently the oldest one as it was first detected in Ciona intestinalis and Strongylocentrotus purpuratus, so it probably first appeared in a common ancestor of Tunicates and Echinodermata after their divergence from Choanoflagellates, since an PKN2 ortholog was not detected in Monosiga brevicollis. PKN1 most likely emerged later as it was first detected in Callorhinchus milli; it probably first appeared in an ancestor of Chondrichthyes after their divergence from Tunicates, since an PKN1 ortholog was not detected in Ciona intestinalis. PKN3 was detected for the first time in Latimeria chalumnae, so it probably first emerged in an ancestor of Actinists after their divergence from Osteichthyes, since an PKN3 ortholog was not detected in Danio rerio. Therefore, although PKN2 was found exclusively in Metazoa, it has very deep evolutionary roots found in some ancient prokaryotic unicellular organism, that is the common ancestor of Animalia, Bacteria and Archaea.

5. Discussion

According to the findings of this in silico study, three different types of genes were identified within the 12 ‘hub’ genes, in terms of their evolutionary age. Fırst and foremost, genes of unicellular origin that are in fact associated with the emergence of the first cellular life forms. In particular, ACTG1, RCC2 and PKN2 were detected in all domains of life, namely the empire Eukaryota (Animal and Fungi kingdoms) and the empire Prokaryota (Eubacteria and Archaebacteria kingdoms). Second, genes of unicellular origin that are associated with the emergence of the first eukaryotic life forms, namely the VASP, DSP, CTTN, PLEC and TLN1 genes. Third, genes of multicellular origin that are in fact associated with the evolution of multicellularity in the Animal kingdom. In particular, the ILK, CTNNA1, CTNND1 and PKP3 genes were detected exclusively in Metazoa; hence, they could be considered as genes of multicellular origin. Therefore, these are highly conserved genes of unicellular origin, which are in fact associated with the emergence of the first cellular life forms. Consequently, the investigated genes are considered to have deep evolutionary roots, with the most recently evolved ones being linked to the emergence of Metazoa and the most ancient ones having an evolutionary age of billions of years, thereby coinciding with the emergence of the first cellular life forms (Fig. 13).

Fig. 13.

Species tree representing the evolutionary relationships among taxa under study. The marker symbols on the tree nodes indicate the common ancestral taxon, to which the evolutionary roots of a given gene family under study are traced.

In addition to their evolutionary age, the genes are involved in multiple cancer-related pathways and processes, and are associated with various forms of cancer, especially metastasis. Furthermore, they are characterized as hub genes and hold a particular position in the human gene network, on the boundary of unicellularity and multicellularity, and thus, contribute to the phenomenon of mutual exclusivity. Of particular note, all 12 genes have been linked to the most aggressive trait of cancer, that is, metastasis, in various types of cancers.

To further support the theory of “atavistic reversion” of cancer, the emerging model of the germ-and-soma life cycle [243] shows that cancer, uses not only first cellular, first eukaryotic, and Metzaoa I genes, but also genes of the late evolved unicellular organisms, respectively stemness and differentiation genes, as well as genome repair genes, including their own mechanisms of cancer stem cell production.

The findings of the present study can be applied on the design of therapeutic strategies, since the investigated genes and their products, as well as the processes and pathways in which they participate, could represent candidate pan-cancer biomarkers and potential targets for the development of a new class of pan-cancer treatment protocols that can be applied to any type of cancer. The development of therapeutic strategies based on the analysis of the configuration of the human gene network itself have been proposed. These approaches entail the cellular processes of a specific evolutionary age to be used as targets, and especially the genes of unicellular or multicellular origin that are highly interconnected and contribute to the phenomenon of mutual exclusivity, rendering them vulnerable and also potential drug targets [15].

Although the concept that cancer is linked with evolution was first proposed almost a century ago, laying in this way the foundation for the atavistic model of cancer [244, 245, 246], the application of evolutionary biology approaches to the study of neoplasms’ formation and progression is a recent endeavor. Comparative oncology is a novel and highly promising field of cancer research that can lead to a deeper understanding of cancer and contribute to the discovery of novel biomarkers and clinical therapeutic strategies.

Several other possible evolutionary approaches to cancer treatment and prevention have been proposed, mainly to address the problem of cancer cells’ remarkable resistance to therapeutic regimens. Given the resilience and diversity of different forms of cancer, the key lies in the common characteristics of all cancer cells, regardless of tumor type. The pathways and genes involved, in particular, can be utilized to the design of drugs that target cancer cells selectively and are effective against any cancer cell [5].

Another very promising approach concerns neoplasm ecology. In particular, the therapeutic approaches that have been proposed include the use of a combination of drugs [247], alteration of the competition between healthy and cancer cells by enhancing the adaptability of the former [248], selection of cells sensitive to chemotherapy [248] or cells exhibiting genetic stability [249] and, above all, treatments aiming at preventing the survival of resistant cells which lead to disease recurrence and recession after treatment [250].

Apart from the type of treatment, the manner in which it is administered is also important, as it can influence the evolutionary dynamics of tumor cells [250]. Chemotherapy, for example, has different effect on cell competition when administered in large individual doses instead of small continuous doses [251], since the latter strategy diminishes the likelihood of creating resistant clones in the neoplasm population [250].

6. Conclusions

Herein, we have made an effort to track the emergence of twelve important hub cancer genes in the evolutionary history by phylostratigraphy. Based on our analyses, those genes that have been proven to play a crucial role in several aspects of cancer biology, as being part of an intricate regulatory network, are evolutionarily ancient, with a high fraction of them (67%) being of unicellular origin and existing well before humans emerged and evolved. The fact that most of the hub genes are of unicellular origin adds further support to the atavistic model of cancer, according to which the biological origin of cancer is believed to date before the emergence of multicellular animals, approximately 600 million years ago. In the light of evolution, cancer arises as a phenomenon that is inextricably linked with multicellularity itself, and therefore this should be taken into consideration, in order to deeply understand and efficiently tackle probably one of the oldest chronic diseases on the planet. In this way, anti-neoplastic therapeutic strategies could be developed tailored to the atavistic model, by targeting the most recently evolved key innovations (i.e., the so-called weaknesses) that have lost their ancestral functionality in cancer.

7. Author contributions

AGG conceived the study; AL and AP designed and run the in silico experiments; AL, IT and AP analyzed the data; AL, IT, AP and AGG wrote the manuscript.

8. Ethics approval and consent to participate

Not applicable.

9. Acknowledgment

Işıl Takan acknowledges the YÖK (Yükseköğretim Kurulu) scholarship.

10. Funding

This research received no external funding.

11. Conflict of interest

The authors declare no conflict of interest.

Abbreviations

CESC, Cervical squamous cell carcinoma and endocervical adenocarcinoma; COAD, Colon adenocarcinoma; DLBC, Lymphoid Neoplasm Diffuse Large B-cell Lymphoma; GBM, Glioblastoma multiforme; LAML, Acute Myeloid Leukemia; LUAD, Lung adenocarcinoma; LUSC, Lung squamous cell carcinoma; OV, Ovarian serous cystadenocarcinoma; PAAD, Pancreatic adenocarcinoma; READ, Rectum adenocarcinoma; SKCM, Skin Cutaneous Melanoma; STAD, Stomach adenocarcinoma; TGCT, Testicular Germ Cell Tumors; THYM, Thymoma; UCEC, Uterine Corpus Endometrial Carcinoma; UCS, Uterine Carcinosarcoma.

Associated Data

Figures.zip

Neighbor Joining-based rooted phylogenetic trees.docx

References

[1]

Aktipis CA, Nesse RM. Evolutionary foundations for cancer biology. Evolutionary Applications. 2013; 6: 144–159.