Introduction: Cancer is a widespread phenomenon occurring across multicellular organisms and represents a condition of atavism, wherein cells follow a path of reverse evolution that unlocks a toolkit of ancient pre-existing adaptations by disturbing hub genes of the human gene network. This results to a primitive cellular phenotype which resembles a unicellular life form. Methods: In the present study, we have employed bioinformatic approaches for the in-depth investigation of twelve atavistic hub genes (ACTG1, CTNNA1, CTNND1, CTTN, DSP, ILK, PKN2, PKP3, PLEC, RCC2, TLN1 and VASP), which exhibit highly disrupted interactions in diverse types of cancer and are associated with the formation of metastasis. To this end, phylogenetic analyses were conducted towards unravelling the evolutionary history of those hubs and tracing the origin of cancer in the Tree of Life. Results: Based on our results, most of those genes are of unicellular origin, and some of them can be traced back to the emergence of cellular life itself (atavistic theory). Our findings indicate how deep the evolutionary roots of cancer actually are, and may be exploited in the clinical setting for the design of novel therapeutic approaches and, particularly, in overcoming resistance to antineoplastic treatment.
Cancer’s origin dates back to the emergence of multicellularity itself, about one billion years ago [1], since cancer and cancel-like phenomena have been observed in almost all species that exhibit either clonal or aggregative multicellularity [2], indicating that spontaneous tumor formation has deep evolutionary roots [3]. Cancer is the result of the breakdown of complex molecular and cellular mechanisms, which are necessary to enable multicellular cooperation by regulating cell growth, cell differentiation, cell death and senescence, resulting in a more primitive cellular phenotype that resembles a unicellular life form [4]. Each one of the hallmarks of cancer [5] is a direct “violation” of the principles of multicellular cooperation [2]. In point of fact, the transition from unicellular to multicellular life was only possible when cooperating cells acquired a selective advantage over those who lived independently by inhibiting their own growth and replication [6].
From an evolutionary perspective, cancer is suggested to occur because early in the evolution of life, cells were ‘designed’ so as to maximize their replication capacity. However, cancer is actually uncommon because during the emergence of multicellularity, natural selection at the level of the individual led to the emergence of robust mechanisms to suppress it [1]. Nevertheless, the paradox of cancer is that cancer cells, which initially disrupt the principles of multicellular cooperation, end up implementing those same principles [7, 8], especially in neoplasms of advanced stages [2]; as a consequence, tumors resemble more of an ecosystem than a simple aggregate of cells [1] or a rather “pseudomulticellular neotissue” [9]. Cancer cells represent a lower level of organization of life, similar to our Cambrian ancestors, and as such are capable of transitioning from multicellularity to unicellularity, but can never adapt the phenotype of complex multicellularity [10].
Although healthy cells, of both unicellular and multicellular origin, exhibit a finely tuned coordination of expression of biological processes during carcinogenesis, this coordination is markedly disrupted, resulting in a characteristic up-regulation of genes of unicellular origin, detected in various human cancer types where cancer cells maintain control over cell cycle activity [11], as well as significant inactivation of genes which are associated with multicellularity and, therefore, have evolved more recently [4]. This phenomenon of enhanced segregation of cellular processes with different evolutionary ages is called “mutual exclusivity” and is common among tumors, and is of particular importance for tumor development [4] and, ultimately, cancer progression.
For mutual exclusivity events to occur, certain fundamental alterations to the gene network need to take place. The human gene network consists of two main subnetworks comprised of genes of unicellular and multicellular origin, respectively. The “multicellular” network has been progressively built upon the “unicellular” one during billions of years of evolution, which led to the formation of an intricate network [12, 13], dedicated to maintain the complex phenotypes and cooperative growth required for multicellularity [4]. Those subnetworks are interconnected in the human gene network through “hub” genes that appeared during the early metazoan life in order to enhance intercellular cooperation [14], and more precisely at the evolutionary boundary between unicellularity and multicellularity; as a result, they reflect key points in the evolutionary transition from unicellular to multicellular life [15]. Those genes represent “points of vulnerability”, since mutual exclusivity occurs particularly due to the alteration of their interactions [4]. Hence, only a limited amount of driver mutations is thought to be responsible for the transition of a normal cell to a malignant state [16, 17, 18], indicating that mutations in these hub genes result in widespread dysregulation of the gene network and are sufficient to initiate carcinogenesis, partially, through a process of de-differentiation [19]. Particularly, since the “unicellular” subnetwork is denser and exhibits a higher inside/outside interaction ratio compared to the “multicellular” subnetwork, the former acts as an attractor that renders the cells of multicellular organisms vulnerable to carcinogenesis (principles of atavistic model for cancer evolution) [20].
The phenotype exhibited by cancer cells resembles that of a unicellular form of life because it is achieved through a process of de-differentiation, also referred to as “reverse evolution” [21]. Through a series of consecutive reversionary transitions, cancer progression follows a similar pattern, but in reverse, that is, to the gradual transition from unicellularity to multicellularity [22]. Particularly, the emergence of common characteristics among cancer cells, regardless of the tissue they originated from, indicates that the occurrence and progression of cancer may be a controlled transition from a complex multicellular phenotype to a primitive unicellular one [15]. Through a small number of mutations in hub genes, mutual exclusivity events occur, and cancer cells activate pre-existing ancestral genes and pathways which render cancer cells remarkably robust. Besides, the mechanisms and genes involved in carcinogenesis are mainly evolutionarily ancient and highly conserved, mostly because they play a crucial role in vital cellular functions of a healthy cell [23].
Therefore, according to the “atavistic model” of cancer, cancer represents an atavism at the cellular level and cancer cells are not just “rogue” cells that were generated through a series of random mutations, but rather an ancient form of life that lies dormant within healthy metazoan cells [24]. In other words, cancer cells do not construct a gene network ab intitio and acquire traits through random mutations and a few decades of internal Darwinian selection within the host’s body, but strategically take advantage of certain components of the existing gene networks [24]. Characteristic examples include the healthy cells’ ability to multiply rapidly and migrate during wound healing [25], traits that at the same time render cells vulnerable to cancer [26]. The same applies to rapid angiogenesis, which is necessary in wound healing to supply new cells with oxygen and nutrients [25], but also a hallmark of cancer [27], with cancer cells even utilizing the same angiogenic signaling pathways as the ones used by the healthy cells of a multicellular organism to develop the vascular system [28]. Another example is the behavior of cells in the early developmental stages, such as the invasion of cells into adjacent developing tissues during gastrulation and the ability of some cells to transform from immotile epithelial cells into motile mesenchymal cells, a process termed “epithelial-mesenchymal transition” (EMT). If cells do not possess the necessary capabilities to perform these functions, growth cannot be achieved; however, these same cell capabilities enable cancer cells to metastasize [26].
The pathways of cell differentiation are initially identical to all organisms and then branch off in different taxonomic divisions [29]. Accordingly, tracing the origin of emergence and evolutionary history of certain genes can be important, especially for the most ancient genes that play a crucial role in cancer progression compared to the more recently evolved ones, since the effects of inactivating the former tend to be more pronounced as they are more likely to lead to cell death [24].
In the present study, we investigated certain genes which have previously been proven to exhibit a highly disrupted number of interactions across multiple tumor types and contribute to the phenomenon of mutual exclusivity [4]. More precisely, Trigos and colleagues [4] studied a pair of cellular processes, that is, chromosome organization and cellular junction organization. The co-expression of genes involved in those processes is highly important for the eukaryotic cells of a multicellular organism and the co-expression of those genes was strongly disrupted in tumors. Both of these processes involve genes of both unicellular and multicellular origin, and the dramatic change in the co-expression of those genes suggests that mutual exclusivity between these processes occurs during carcinogenesis and is actually advantageous to the development of different cancer stages such as early cancer, late cancer, and metastasis. A set of twelve atavistic genes, which represent the “hubs” between the unicellular and multicellular processes was identified: RCC1, TLN1, VASP, ACTG1, PLEC, CTTN, DSP, ILK, PKN2, CTNNA1, CTNND1 and PKP3. Those genes act as regulators of co-expression of the genes involved in the two aforementioned processes and their hubness was dramatically changed in seven different solid tumor types. Therefore, due to their central role in the human gene network, those genes represent fundamental points of vulnerability particularly regarding the phenomenon of mutual exclusivity and, consequently, carcinogenesis. To these data is added the fact that these 12 genes interact with genes associated with genomic instability, as well as genes associated with poor prognosis for cancer progression and metastasis [30]. Moreover, these 12 genes play a critical role in regulatory networks associated with genomic instability and metastasis and are generally involved in key processes of carcinogenesis [4]. Therefore, these genes can be considered as pan-cancer molecular markers or regulators of malignancy in diverse cancer tumors [4].
To this end, we have employed a bioinformatics approach to explore the involvement of those genes in various cancer types and performed phylostratigraphic analyses [14, 31], in an effort to elucidate the evolutionary trajectory of these genes aiming towards tracing the origin of cancer in the Tree of Life.
In this study, we followed the evolutionary lineage of the contemporary human species since the organisms or taxonomic divisions under investigation represent important links of human evolution: Homo sapiens (human), Pan troglodytes (chimpanzee), Macaca mulatta (Rhesus monkey), Callithrix jacchus (marmoset), Mus musculus (mouse), Rattus norvegicus (rat), Canis lupus familiaris (dog), Equus caballus (horse), Sus scrofa (pig), Bos taurus (cattle), Tursiops truncatus (dolphin), Pteropus vampyrus (bat), Monodelphis domestica (opossum), Ornythorhynchus anatinus (platypus), Gallus gallus (chicken), Xenopus laevis (frog), Latimeria chalumnae (coelacanth), Danio rerio (zebrafish), Callorhinchus milii (shark), Petromyzon marinus (lamprey), Ciona intestinalis (vase tunicate), Strongylocentrotus purpuratus (sea urchin), Amphimedon queenslandica (sponge), Monosiga brevicollis (choanoflagellate), Saccharomyces cerevisiae (baker’s yeast), Schizosaccharomyces pombe (fission yeast), Actinobacteria, Chlamydiae, Cyanobacteria, Proteobacteria, Firmicutes, Bacteroidetes, Archaea.
The official names of the genes were initially retrieved from the HGNC database [32, 33] and then the accession numbers of the peptide sequences corresponding to the human genes were retrieved from the publicly available non-redundant NCBI Reference Sequence Database (RefSeq) [34]. The amino acid sequences were used subsequently as probes in an extensive series of BLASTp [35] reciprocal searches in order to obtain the canonical homologous amino acid sequences corresponding to the species included in this study. This process was reiterated until no novel sequences could be detected, ensuring that a full representation of each gene’s family is obtained.
To investigate the evolutionary history of each gene, we conducted comprehensive phylogenetic analysis using the entire length protein sequences of the species under study. The full-length amino acid sequences were aligned with MAFFT v.7 (https://mafft.cbrc.jp/alignment/server/) [36]. The alignments were subsequently used to reconstruct phylogenetic trees by employing two separate methods, Maximum Likelihood, a method based on a heuristic approach for finding the optimal tree that fits the observed data, and Neighbor Joining, a method based on a hierarchical clustering algorithm [37], as implemented in the software package MEGA (https://www.megasoftware.net/), version 10 [38]. Furthermore, MEGA v.10 [38] was used to estimate the best-fit substitution model, which best describes the number of observed amino acid substitutions per position. For both methods of phylogenetic tree reconstruction, bootstrap analyses (150 pseudo-replicates) were conducted in order to evaluate the robustness and the statistical significance of the inferred reconstructed trees. Finally, the phylogenetic trees were visualized with MEGA v.10 [38].
The pathways and biological processes in which each gene is involved were retrieved from the Reactome Pathway database [39], the KEGG Pathways database [40, 41] and the biomedical literature.
RNA sequencing data for tumor and corresponding normal tissue samples from the
TCGA (The Cancer Genome Atlas) and GTEx (Genotype-Tissue Expression) databases,
respectively, were retrieved from the GEPIA2 website
(http://gepia2.cancer-pku.cn/). The atavistic genes that are differentially
expressed (DE) between tumor and normal samples were identified using ANOVA;
threshold value for absolute log fold change
The twelve hub genes were provided as input to the STRING (version 11.0) (https://string-db.org/) database [42] of investigating and visualizing both known and predicted gene/protein associations.
By conducting a thorough and comprehensive review of studies published in PubMed (https://pubmed.ncbi.nlm.nih.gov/) and The Human Protein Atlas [43], we found those cancers each of the 12 genes is involved in; even different types of cancers that can affect the same organ (Table 1, Ref. [43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209, 210, 211, 212, 213, 214, 215, 216, 217, 218, 219, 220, 221, 222, 223, 224, 225, 226, 227, 228, 229, 230, 231, 232, 233, 234, 235, 236, 237]). In this context, certain genes have been shown to exhibit either oncogenic or tumor suppressive effects. In fact, both qualitative and quantitative modifications were identified in the genes under study which are associated with the development and progression of different cancers in different tissues. In particular, the types of alterations observed in those genes and/or their protein products, include aberrant, increased or decreased, expression (Table 2), epigenetic modifications, mutations, gene inactivation or amplification, copy number alterations, altered protein interactions and post-translational modifications, characteristic protein subcellular localization. Furthermore, these genes constitute diagnostic and prognostic biomarkers in cancer, regarding malignancy, disease stage, clinical subtypes, disease progression, metastasis, clinical outcome and survival and prediction of patients’ response to therapeutic treatments. In addition, each gene and/or its products constitute drug targets or are considered as novel potential drug targets for at least one type of cancer.
Organ system | Types of cancer | Metastasis | Biomarker | Drug target | Type of alteration | Oncogenic (O) or tumor suppression (TS) effect | Reference | |
RCC2 | Gene’s expression is regulated by p53, thus it plays an important role in metastasis | [44] | ||||||
It regulates apoptosis by blocking RAC1 signaling. The gene’s expression levels in tumor cells are used to predict response to chemotherapy. | [45] | |||||||
Brain and nervous system | Glioblastoma | + | [46] | |||||
Breast | Breast Cancer | + | Wnt pathway | [47] | ||||
Gastrointestinal | Gastric Cancer | Aberrant expression | [48] | |||||
Colorectal Cancer | + | [49] | ||||||
Hepatic Cancer | + | [43] | ||||||
Pancreatic Cancer | [50] | |||||||
Genitourinary and gynecologic | Epithelial Ovarian Carcinoma | + | + | [51] | ||||
Ovarian Cancer | + | RalA pathway | [52] | |||||
Renal Cancer | + | [43] | ||||||
Cervical Cancer | + | [43] | ||||||
Skin | Melanoma | + | [43] | |||||
Thoracic and respiratory | Lung Adenocarcinoma | + | Increased expression | [53] | ||||
TLN1 | Bone and muscle sarcoma | Malignant Fibrous Histiocytoma (MFH) | Mutations | [54] | ||||
Fibrosarcoma (FS) | ||||||||
Ewing Sarcoma Family of Tumors (EFT) | ||||||||
Brain and nervous system | Glioblastoma | + | [55] | |||||
Breast | Triple-negative Breast Cancer | + | [56] | |||||
Endocrine system | Thyroid Cancer | + | Increased expression | [57] | ||||
Gastrointestinal | Hepatocellular Carcinoma | + | + | Decreased expression | [58, 59] | |||
ERK1/2 pathway | ||||||||
Colorectal Cancer | + | [43] | ||||||
Genitourinary and gynecologic | Prostate Cancer | + | + | Increased expression | [60, 61] | |||
Renal Cancer | + | [43] | ||||||
Ovarian Cancer | Increased expression | [62] | ||||||
Head and neck | Nasopharyngeal Carcinoma | + | Increased expression | [63] | ||||
Oral Squamous Cell Carcinoma | + | + | Increased expression | [64] | ||||
Hematopoietic | Chronic Myeloid Leukemia | Increased expression | [65] | |||||
VASP | Bone and muscle sarcoma | Osteosarcoma | + | [66] | ||||
Breast | Breast Cancer | + | [67] | |||||
Gastrointestinal | Hepatocellular Carcinoma | + | + | Increased expression | [43, 68] | |||
Colorectal Cancer | + | + | [69] | |||||
Gastric (stomach) Cancer | + | PI3/AKT pathway | [70] | |||||
Genitourinary and gynecologic | Renal Cell Carcinoma | + | + | Post-translational modifications (phosphorylation) | [43, 71] | |||
Hematopoietic | Chronic Myeloid Leukemia | + | Aberrant expression | [72] | ||||
Protein interactions | ||||||||
Post-translational modifications (phosphorylation) | ||||||||
Skin | Melanoma | + | [73] | |||||
Thoracic and respiratory | Lung Adenocarcinoma | + | Increased expression | [74] | ||||
ACTG1 | Bone and muscle sarcoma | Osteosarcoma | + | + | Increased expression | [75] | ||
Breast | Breast Cancer | Increased expression | [76] | |||||
Gastrointestinal | Alcohol related Hepatocellular Carcinoma | + | Increased expression | [77, 78] | ||||
Colorectal Cancer | + | + | [43, 79] | |||||
Genitourinary and gynecologic | Renal Cancer | + | [43] | |||||
Cervical Cancer | + | Epigenetic modification (methylation) | [80] | |||||
Hematopoietic | Acute Lymphoblastic Leukemia (children) | + | SNPs | [81] | ||||
Skin | Skin Cancer | + | + | Increased expression | [82] | |||
ROCK Pathway | ||||||||
Thoracic and respiratory | Non-small Cell Lung Cancer | + | + | [83] | ||||
PLEC | Gastrointestinal | Hepatocellular Carcinoma | + | + | Decreased expression | [84] | ||
Colon Cancer | + | [85] | ||||||
Pancreatic Ductal Adenocarcinoma | + | + | [86] | |||||
Genitourinary and gynecologic | Renal Cancer | + | [43] | |||||
Testicular Germ Cell Tumors | + | [87] | ||||||
Head and neck | Paranasal sinus carcinoma | Increased expression | [88] | |||||
Oral Squamous Cell Carcinoma | + | Decreased expression | [89] | |||||
Skin | Melanoma | + | Copy number alterations | [90] | ||||
Thoracic and respiratory | Lung Cancer | + | [43] | |||||
CTTN | CTTN is a well-established oncogene, associated with advanced disease stage and poor prognosis. | [91] | ||||||
Bone and muscle sarcoma | Osteosarcoma | + | Increased expression | [92] | ||||
Brain and nervous system | Glioma | Increased expression | [93] | |||||
Glioblastoma | + | [94] | ||||||
Breast | Breast Cancer | Post-translational modification (phosphorylation) | [95] | |||||
Endocrine system | Thyroid Cancer | + | + | Increased expression | [96] | |||
Protein interactions | ||||||||
Gastrointestinal | Gastric Cancer | + | SNP | [97] | ||||
Hepatocellular Carcinoma | + | + | Increased expression | [98, 99] | ||||
Protein interactions | ||||||||
Colorectal Cancer | + | Increased expression | [100, 101] | |||||
EGFR-MAPK pathway | ||||||||
Genitourinary and gynecologic | Bladder Cancer | + | [102] | |||||
Ovarian Epithelial Cancer | + | Increased expression | [103] | |||||
Prostate Cancer | + | Increased expression | [104] | |||||
Renal Clear Cell Carcinoma | + | [105] | ||||||
Head and neck | Head and Neck Squamous Cell Carcinomas | + | + | [91, 106] | ||||
Esophageal Squamous Cell Carcinoma | + | + | + | Increased expression | [107, 108] | |||
Oral Squamous Cell Carcinoma | + | + | + | Gene amplification | [109, 110] | |||
Increased expression | ||||||||
Pharyngolaryngeal Squamous Cell Carcinomas | + | + | Gene amplification | [106] | ||||
Increased expression | ||||||||
Oropharynx Squamous Cell Carcinoma | + | Gene amplification | [91] | |||||
Increased expression | ||||||||
Laryngeal Cancer | + | [111, 112] | ||||||
Hematopoietic | B-cell Acute Lymphoblastic Leukemia | + | + | Increased expression | [113] | |||
Skin | Melanoma | + | Ubiquitination | [114] | ||||
Cutaneous Squamous Cell Carcinoma | + | Post-translational modification (phosphorylation) | [115] | |||||
Thoracic and respiratory | Non-small Cell Lung Cancer | + | Increased expression | [116] | ||||
DSP | Breast | Breast Cancer | + | Decreased expression | [117] | |||
Gastrointestinal | Colorectal Cancer | + | + | [118] | ||||
Gastric Cancer | Decreased expression | [119] | ||||||
Wnt/ | ||||||||
Hepatocellular Carcinoma | + | Decreased expression | [120] | |||||
Genitourinary and gynecologic | Ovarian Cancer | + | + | Increased expression | [121] | |||
Immune response | ||||||||
High-grade Serous Ovarian Cancer | + | Characteristic expression profile | [122] | |||||
Renal Cancer | + | [43] | ||||||
Urothelial Cancer | + | [43] | ||||||
Head and neck | Oral Squamous Cell Carcinoma | + | Decreased expression | [123] | ||||
Oropharyngeal Squamous Cell Carcinomas | + | + | Decreased expression | [124] | ||||
Isomorph | ||||||||
Skin | Melanoma | + | + | Immune response | [121] | |||
Thoracic and respiratory | Lung Adenocarcinoma | Characteristic expression profiles | [125] | |||||
Lung Squamous Cell Carcinoma | Characteristic subcellular localization | |||||||
Adenosquamous Carcinoma | ||||||||
Non-small Cell Lung Cancer | + | Decreased expression due to epigenetic modification (methylation) | [126] | |||||
Wnt/ | ||||||||
ILK | Increased expression is associated with an aggressive phenotype and metastasis in many types of cancer. | [127] | ||||||
Brain and nervous system | Neuroblastoma | + | LIMS1/ILK pathway | [128] | ||||
Glioblastoma | + | + | ILKAP pathway | [129] | ||||
Breast | Breast Cancer | + | + | + | Increased expression | [130, 131, 132] | ||
Twist-ITGB1-FAK/ILK pathway | ||||||||
PI3K/Akt pathway | ||||||||
Endocrine system | Thyroid Cancer | + | + | Aberrant expression | [133] | |||
Gastrointestinal | Colorectal Cancer | + | + | Increased expression | [134] | |||
Pancreatic Ductal Adenocarcinoma | + | + | Increased expression | [135] | ||||
Hepatocellular Carcinoma | Increased expression | [136] | ||||||
Akt activation | ||||||||
Gallbladder Squamous Cell Carcinoma | + | + | + | Increased expression | [137] | |||
Adenosquamous Gallbladder Carcinomas | ||||||||
Gallbladder Adenocarcinoma | ||||||||
Pancreatic Cancer | + | + | + | KRAS-ILK regulatory feedback loop | [138] | |||
Gastric Cancer | + | + | Increased expression | [139] | ||||
Genitourinary and gynecologic | Ovarian Epithelial Cancer | + | + | Increased expression | [140, 141] | |||
Bladder Cancer | + | + | + | ILK/PI3K/Akt pathway | [142] | |||
Renal Clear Cell Carcinoma | + | + | [43, 143] | |||||
Prostate Cancer | + | Cell cycle regulation | [144] | |||||
Head and neck | Salivary Adenoid Cystic Carcinoma | + | + | Increased expression | [145] | |||
Laryngeal Squamous Cell Carcinoma | + | + | + | Increased expression | [146] | |||
Esophageal Squamous Cell Carcinoma | + | Increased expression | [147] | |||||
Hematopoietic | Chronic Myeloid Leukemia | + | [148] | |||||
Acute Myeloid Leukemia | ||||||||
Skin | Melanoma | Impaired post-translational modification (phosphorylation) | [149] | |||||
Thoracic and respiratory | Non-small Cell Lung Cancer | + | + | [150] | ||||
PKN2 | Breast | Triple-negative Breast Cancer | + | + | Increased expression | TS | [151, 152] | |
Head and Neck | Oral Squamous Cell Carcinoma (smokers) | + | + | Increased expression | [153] | |||
Post-translational modification (hyperphosphorylation) | ||||||||
Gastrointestinal | Colon Cancer | + | + | Decreased expression | TS | [43, 154] | ||
DUSP6-Erk1/2 pathway | ||||||||
Genitourinary and gynecologic | Prostate Cancer | + | Increased expression | [155, 156] | ||||
Bladder Cancer | + | Increased expression | [155] | |||||
CTNNA1 | CTNNA1 is generally considered as a tumor suppressor | |||||||
Brain and nervous system | Glioblastoma | Increased expression | [157] | |||||
Breast | Luminal Breast Cancer | Increased expression | [158] | |||||
Triple-negative Breast Cancer (basal-like) | NF-kB pathway | TS | [159] | |||||
Lobular Type Breast Carcinoma | + | Aberrant expression | [160] | |||||
Endocrine system | Differentiated Thyroid Carcinoma | + | + | Decreased expression | [161] | |||
Gastrointestinal | Colorectal Cancer | + | + | Pseudogene CTNNAP1 | [162, 163, 164] | |||
Aberrant expression | ||||||||
Cell cycle regulation | ||||||||
Gastric Cancer | + | Deleterious variants | [165] | |||||
Mutations | ||||||||
Pancreatic Ductal Adenocarcinoma | + | + | Inactivation | [166, 167, 168] | ||||
Aberrant expression | ||||||||
Decreased expression | ||||||||
Impaired epigenetic modification (methylation) | ||||||||
Hepatocellular Carcinoma | + | Decreased expression | [169] | |||||
Cholangiocarcinoma | + | Decreased expression | [170] | |||||
Genitourinary and gynecologic | Bladder Cancer | + | Protein interactions | [171] | ||||
Ovarian Cancer | + | Epigenetic modification (methylation) | [172] | |||||
Renal Cell Carcinoma | Decreased expression | [43, 173] | ||||||
Prostate Cancer | + | Inactivation | [174] | |||||
Protein interactions | ||||||||
Head and neck | Oral Squamous Cell Carcinoma | + | Decrease protein levels | [175] | ||||
Characteristic subcellular localization | ||||||||
Esophageal Cancer | + | Decreased expression | [176] | |||||
Hematopoietic | Myelodysplastic Syndromes | + | Chromosome 5q deletion | [177] | ||||
Decreased expression | ||||||||
Acute Myeloid Leukemia | Epigenetic modifications (methylation and histone deacetylation of the promoter) | |||||||
Immune system | Thymoma | + | Characteristic expression profile | [178] | ||||
Skin | Melanoma | + | Decreased expression | [179] | ||||
Inactivation | ||||||||
Thoracic and respiratory | Non-small Cell Lung Cancer | + | Decreased expression | [180, 181, 182] | ||||
CTNND1 | In fact, p120 catenin appears to have both pro-oncogenic and anti-oncogenic effects, depending on the localization and the specific function of p120 catenin in each cell compartment. | [183] | ||||||
Loss, downregulation or mislocalization of p120 catenin is observed in most human cancers. | [184] | |||||||
Bone and muscle sarcoma | Osteosarcoma | + | Increased expression | [185] | ||||
Synovial Sarcoma | [186] | |||||||
Brain and nervous system | Astrocytoma | + | Abnormal post-translational modification (hyperphosphorylation) | [187] | ||||
Glioblastoma | + | + | Increased expression | [188] | ||||
Neuroblastoma | Protein interactions | [189] | ||||||
Breast | Breast Cancer | + | + | Increased expression | [190, 191] | |||
Wnt/ | ||||||||
Isomorphs (basal-like and luminal subtypes) | ||||||||
Characteristic subcellular localization | ||||||||
Breast Invasive Lobular Carcinoma | + | Decreased expression | [191, 192] | |||||
Characteristic subcellular localization | ||||||||
ROCK1 pathway | ||||||||
Endocrine-resistant Breast Cancer | + | Decreased expression | [193] | |||||
Characteristic subcellular localization | ||||||||
Triple-negative Breast Cancer (basal like) | Decreased expression | [191, 193] | ||||||
Characteristic subcellular localization | ||||||||
Gastrointestinal | Colorectal Cancer | + | + | + | Increased expression | [194, 195, 196] | ||
Decreased expression | ||||||||
Gastric Cancer | + | + | Increased expression | [197, 198, 199] | ||||
Characteristic subcellular localization | ||||||||
Hepatocellular Carcinoma | Increased expression | O | [200, 201] | |||||
Decreased expression | ||||||||
Wnt/ |
TS | |||||||
Pancreatic Ductal Adenocarcinoma | + | + | + | Decreased expression | TS | [202, 203, 204] | ||
Characteristic subcellular localization | ||||||||
RAC1 pathway | ||||||||
Solid Pseudopapillary Tumors of the Pancreas | + | Decreased expression | [203, 205] | |||||
Characteristic subcellular localization | ||||||||
Genitourinary and gynecologic | Bladder Cancer | + | Decreased expression | [206] | ||||
Characteristic subcellular localization | ||||||||
Cervical Cancer | + | Aberrant expression | [207] | |||||
Endometrial Cancer | Decreased expression | [208] | ||||||
Prostatic Adenocarcinoma | + | + | Decreased expression | [209] | ||||
Renal Cancer | + | + | + | Isophorm Increased expression | [43, 210] | |||
Ovarian Cancer | + | Characteristic subcellular localization | [211] | |||||
RAC1 pathway | ||||||||
Head and neck | Head and Neck Squamous Cell Carcinomas | + | Decreased expression | [212] | ||||
Esophageal Squamous Cell Carcinoma | + | Decreased expression | TS | [213] | ||||
Characteristic subcellular localization | ||||||||
Hematopoietic | Acute Lymphoblastic Leukemia | + | Increased expression | [214] | ||||
Skin | Skin Squamous Cell Carcinoma | Decreased expression | [215, 216] | |||||
Characteristic subcellular localization | ||||||||
Melanoma | Aberrant expression | [179] | ||||||
Characteristic subcellular localization | ||||||||
Thoracic and respiratory | Lung Cancer | + | Increased expression | O | [217] | |||
Characteristic subcellular localization | ||||||||
Lung Adenocarcinoma | + | Decreased expression | [218] | |||||
Lung Squamous Cell Carcinoma | ||||||||
Non-small Cell Lung Cancer | Decreased expression | [219] | ||||||
PKP3 | Downregulation of PKP3 leads to tumor formation, a decrease in cell adhesion, promotion of EMT and metastasis | [220, 221, 222] | ||||||
Breast | Breast Cancer | + | Increased expression | [223] | ||||
Gastrointestinal | Gastric Adenocarcinoma | Decreased expression | [224] | |||||
Inactivation | ||||||||
Gastrointestinal Cancer | + | + | Increased expression | [225] | ||||
Pancreatic Cancer | + | [43] | ||||||
Genitourinary and gynecologic | Ovarian Cancer | + | + | + | Increased expression | O | [121, 226, 227] | |
MAPK-JNK-ERK1/2-mTOR pathway | ||||||||
Immune response | ||||||||
Bladder Cancer | + | Increased expression | [228] | |||||
Characteristic subcellular localization | ||||||||
Prostatic Adenocarcinoma | + | Increased expression | [229, 230] | |||||
Decreased expression | ||||||||
Protein interactions | ||||||||
Renal Cancer | + | [43] | ||||||
Uterine Carcinosarcoma | + | Epigenetic modification (methylation) | [231] | |||||
Head and neck | Oropharynx Squamous Cell Carcinoma | + | + | Decreased expression | [232] | |||
Inactivation | ||||||||
Nasopharyngeal Carcinoma | + | Decreased expression (DNP carcinogen factor) | [233] | |||||
Skin | Melanoma | + | + | + | Immune response | [121, 234] | ||
Thoracic and respiratory | Lung Adenocarcinoma | + | + | Increased expression | [235] | |||
Mesothelioma | + | Increased expression | [236] | |||||
Non-small Cell Lung Cancer | + | Aberrant expression | [237] |
Gene | Cancer type | DE |
ACTG1 | DLBC, THYM | Up |
CTNNA1 | PAAD, THYM | Up |
CTTN | THYM, DLBC | Up |
CTTN | OV, LAML, UCS | Down |
DSP | CESC, COAD, LUAD, LUSC, OV, THYM, TGCT, UCEC,UCS, READ | Up |
DSP | SKCM | Down |
ILK | UCEC, UCS | Down |
PKP3 | UCS, LUSC, OV, THYM, STAD, UCEC, PAAD, READ, CESC, COAD | Up |
PKP3 | SKCM, LAML | Down |
PLEC | PAAD | Up |
RCC1 | GBM, THYM, DLBC | Up |
TLN1 | READ | Down |
VASP | GBM, PAAD | Up |
Up, up-regulation; Down, down-regulation. |
The critical role of these genes and their corresponding products in carcinogenesis is highlighted by the fact that they are all involved in cancer-relevant pathways and processes, usually more than one (Table 3). More specifically, the identified cancer signaling pathways include the Wnt pathway (RCC2, DSP, CTNND1), the MAPK pathway (TLN1, CTTN, PKP3), the VEGF pathway (TLN1, CTNNA1, CTNND1), the PI3K-Akt pathway (VASP, ILK, PKN2), the Hippo pathway (ACTG1, CTNNA1), the PPAR pathway (ILK) and the Rap1 pathway (VASP, CTNND1, ACTG1, TLN1). Moreover, particularly vital processes of the cell, which also play a role in carcinogenesis, are eukaryotic cell cycle regulation and mitosis (RCC2, ILK, CTNNA1, CTNND1), apoptosis (ACTG1, PLEC, DSP, CTTN, CTNND1), immune system processes (TLN1, VASP, ACTG1, DSP, PKP3) and interaction with proteoglycans in cancer (ACTG1, CTTN). Finally, the proteins that constitute the adherens junctions (ACTG1, CTNNA1, CTNND1) and contribute to focal adhesions (VASP, TLN1, ACTG1) play a critical role especially in metastasis.
RCC2 | TLN1 | VASP | ACTG1 | PLEC | CTTN | DSP | ILK | PKN2 | CTNNA1 | CTNND1 | PKP3 | |
Wnt | + | + | + | |||||||||
MAPK | + | + | + | |||||||||
VEGF | + | + | + | |||||||||
PI3K-Akt | + | + | + | |||||||||
Hippo | + | + | ||||||||||
PPAR | + | |||||||||||
Rap1 | + | + | + | + | ||||||||
Cell cycle regulation and mitosis | + | + | + | + | ||||||||
Apoptosis | + | + | + | + | + | |||||||
Immune system processes | + | + | + | + | + | |||||||
Interaction with proteoglycans in cancer | + | + | ||||||||||
Adherence junctions | + | + | + | |||||||||
Focal adhesion | + | + | + |
Of note, based on the STRING database analysis, the most significantly enriched
pathway of the 12 gene/proteins is ‘cell junction assembly’ (GO: 0034329; FDR =
4.10
Therefore, the importance of the examined genes is enhanced by their involvement in pathways related to carcinogenesis and the progression of the disease in various types of cancer.
STRING network depicting the associations (connecting lines) of the hub genes/gene products (nodes) under investigation in the cell junction assembly.
In terms of overall topology, there is congruence between the phylogenetic trees that were generated with both methods. The trees constructed with the maximum likelihood method are considered more accurate and reliable and they display higher bootstrap values (Figs. 2,3,4, 5,6,7,8,9,10,11,12). These high bootstrap values indicate that the tree nodes are statistically significant and the inferred topology is biologically significant. The trees generated with the neighbor-joining method can be found at the Supplementary material.
The ILK, CTNNA1, CTNND1 and PKP3 genes are detected exclusively in multicellular Animals as shown in the phylogenetic trees in Figs. 2,3,4, respectively. Therefore, those genes most likely first appeared in a eukaryotic multicellular organism which was the common ancestor of Metazoa.
Maximum likelihood-based rooted phylogenetic tree of the protein sequences encoded by the ILK genes. TNNI3K is used as outgroup. The branch length represents evolutionary distance. A discrete Gamma distribution was used to model evolutionary rate differences among sites. The scale bar at the lower left denotes the length of amino acid substitutions per site.
Maximum likelihood-based unrooted phylogenetic tree of the protein sequences encoded by the CTNNA1, CTNNA2, CTNNA3, CTNND1 and CTNND2 genes. The conventions are the same as in Fig. 2.
Maximum likelihood-based rooted phylogenetic tree of the protein sequences encoded by the PKP1, PKP2, PKP3 and PKP4 genes. CTNND2 is used as outgroup. The conventions are the same as in Fig. 2.
Catenins A (CTNNA) and catenins D (CTNND) are members of the catenin superfamily (Fig. 3). CTNNA1, CTNNA2, CTNNA3, CTNND1, CTNND2 genes are paralogs which probably occurred through a series of gene duplication events. The corresponding orthologous proteins of each gene form distinct clades, and as a result the phylogenetic tree is divided into two major subtrees, comprised of protein sequences encoded by CTNNA and CTNND genes, respectively. The subtree of CTNNA includes the protein sequences encoded by CTNNA1, CTNNA2 and CTNNA3. CTNNA2 appears to be the primordial gene as it was first detected in Amphimedon queenslandica, thus it probably first emerged in an ancestor of Porifera after their divergence from Choanoflagellates, since an ortholog was not found in Monosiga brevicollis. On the other hand, CTNNA1 and CTNNA3 were detected for the first time in Callorhinchus milli, and thus, they appeared later in evolution and probably occurred due to duplication events of the CTNNA2 gene, as they most likely first appeared in an ancestor of Chondrichthyes after their divergence from Tunicates, since orthologs were not detected in Ciona intestinalis and in fact they demonstrate high similarity to each other. The subtree of catenins D includes the protein sequences encoded by CTNND1 and CTNND2. CTNND2 is apparently the primordial gene as it was first detected in Amphimedon queenslandica, and thus it probably first arose in an ancestor of Porifera after their divergence from Choanoflagellates, since an ortholog was not detected in Monosiga brevicollis. On the other hand, CTNND1 appeared later in evolution and probably occurred due to CTNND2 gene duplication, as it was detected for the first time in Ciona intestinalis. Therefore, CTNND1 might have arisen in an ancestor of Tunicates after their divergence from Echinodermata, given that CTNND1 orthologs were not detected in Strongylocentrotus purpuratus.
The PKP1, PKP2, PKP3, PKP4 genes are likely paralogs (Fig. 4) which probably occurred through a series of gene duplication events of an ancestral PKP gene. The corresponding orthologs of each gene form distinct clades. All PKP genes were detected for the first time in Callorhinchus milli, suggesting that they probably appeared in an ancestor of Chondrichthyes after their divergence from Tunicates, since PKP orthologs were not detected in Ciona intestinalis. Based on the inferred phylogenetic tree, however, PKP1 and PKP2 exhibit the highest similarity, followed by PKP3. We could speculate that PKP4 is the primordial gene of this family, since it is basal to PKP1, PKP2 and PKP3. PKP3 appeared later in evolution, probably due to PKP4 gene duplication, while PKP1 and PKP2 have emerged even more recently through another series of duplication events in a chondrichthyan PKP1/2 gene and have not accumulated a large number of mutations.
The VASP, CTTN and DSP genes are detected during the relatively late transition phase from unicellularity to multicellularity, based on the inferred phylogeny shown in Figs. 5,6,7, respectively.
VASP and Enah/Vasp-Like are likely paralogs (Fig. 5). In Monosiga brevicollis, Strongylocentrotus purpuratus, Ciona intestinalis and Callorhinchus milli the corresponding detected genes were annotated as VASP-like in RefSeq, but display high similarity to the VASP genes of the rest of Metazoa.
Maximum likelihood-based rooted phylogenetic tree of the protein sequences encoded by the VASP and ENAH-LIKE/ENAH/VASP-LIKE genes. EVL is used as outgroup. The conventions are the same as in Fig. 2.
Maximum likelihood-based rooted phylogenetic tree of the protein sequences encoded by the CTTN genes. HCLS1 is used as outgroup. The conventions are the same as in Fig. 2.
Maximum likelihood-based rooted phylogenetic tree of the protein sequences encoded by the DSP genes. PLEC is used as outgroup. The conventions are the same as in Fig. 2.
The PLEC and TLN1 genes were detected in unicellular and multicellular organisms, as shown in Figs. 8,9, respectively, and they probably first appeared in a eukaryotic unicellular organism that was the common ancestor of Animalia and Fungi.
PLEC homologs were found in Animalia as well as in Fungi (SAC6 in Saccharomyces cerevisiae and FM1 in Schizosaccharomyces pombe) (Fig. 8).
The TLN1 and TLN2 genes are likely paralogs (Fig. 9), and their corresponding orthologous protein sequences form distinct clades. TLN1/2 is probably the primordial gene of this family, since it was first detected in Amphimedon queenslandica, Monosiga brevicollis, Ciona intestinalis and Strongylocentrotus purpuratus which gave rise to TLN1 and TLN2 after successive gene duplications. Talin homologs were also identified in Fungi, that is, SLA2P in Saccharomyces cerevisiae and END4 in Schizosaccharomyces pombe, as also confirmed in literature [239, 240]. Therefore, although members of the TLN1 gene family were found only in Metazoa, it has deep evolutionary roots in some ancient unicellular eukaryotic organism, which is the common ancestor of Animalia and Fungi.
Maximum likelihood-based rooted phylogenetic tree of the protein sequences encoded by the PLEC genes. MAKF1 is used as outgroup. The conventions are the same as in Fig. 2.
Maximum likelihood-based rooted phylogenetic tree of the protein sequences encoded by the TLN1 and TLN2 genes. HIP1R is used as outgroup. The conventions are the same as in Fig. 2.
The characteristically long clade corresponding to Monosiga brevicollis, as shown in Figs. 5,6,7,8,9, is due to the evolutionary course of the organism, as it evolved for many millions of years separately from the rest of the Metazoa.
The evolutionary origin of ACTG1, RCC2 and PKN2 genes can be traced to a universal common ancestor, namely an ancient prokaryotic unicellular organism, that is the common ancestor of Animalia, Bacteria and Archaea as illustrated in Figs. 10,11,12, respectively.
Maximum likelihood-based rooted phylogenetic tree of the protein sequences encoded by the ACTA1, ACTA2, ACTB, ACTC1, ACTG1 and ACTG2 genes. ACTR1, ACTR1A and ACTRB are used as outgroups. The conventions are the same as in Fig. 2.
Maximum likelihood-based rooted phylogenetic tree of the protein sequences encoded by the RCC1 and RCC2 genes. HERC3 is used as outgroup. The conventions are the same as in Fig. 2.
Maximum likelihood-based rooted phylogenetic tree of the protein sequences encoded by the PKN1, PKN2, PKN3, PKN, PKC genes. SGK2 is used as outgroup. The conventions are the same as in Fig. 2.
The ACTA1, ACTA2, ACTB, ACTC1, ACTG1, ACTG2 genes are likely paralogs (Fig. 10), the orthologs of which cluster together in distinct clades. The phylogenetic tree is divided into two major subtrees, one comprised of protein sequences encoded by genes of different types of cytoplasmic actins and the other includes sequences encoded by genes of different types of muscle actins.
The subtree of cytoplasmic actins harbors the protein sequences encoded by ACTG1 (Actin cytoplasmic 2), ACTB (Actin cytoplasmic 1) as well as various other actin genes. ACTG1 is apparently the ancestral gene of this family as it was detected in Animalia, Bacteria (Microbacterium arborescens (Actinobacteria), Chlamydia trachomatis (Chlamydiae), Leptolyngbya sp. PCC 7376 (Cyanobacteria), Kangiella spongicola (Proteobacteria), Staphylococcus aureus (Firmicutes)), as well as Archaea (Candidatus Heimdallarchaeota (Archaea), Candidatus Lokiarchaeota (Archaea)). The sequences detected in bacteria do not cluster together, but are rather scattered among those corresponding to genes that encode the two types of cytoplasmic actins in Metazoa. This is indicative of high similarity among the aforementioned genes, since the encoded protein sequences appear to be highly conserved. Additionally, Actin genes found in Fungi (Saccharomyces cerevisiae and Schizosaccharomyces pombe) exhibit high similarity to Actin genes found in Archaea, which encode cytoplasmic actin 2.
On the other hand, the ACTB gene likely appeared later in evolution and probably occurred due to a duplication event of the ACTG1 gene, as it was detected for the first time in Petromyzon marinus. Also, in Ciona intestinalis, a cytoplasmic actin gene and a muscle actin gene were identified, and in fact the gene encoding the cytoplasmic actin shows high similarity to the only ACTIN gene detected in Monosiga brevicollis. Finally, Actin-85C-Like gene found in Amphimedon queenslandica also demonstrates high similarity to types of cytoplasmic actin.
The subtree of muscle actins, includes the protein sequences encoded by ACTA1 (Actin alpha 1, skeletal muscle), ACTA2 (Actin alpha 2, smooth muscle), ACTC1 (Actin alpha cardiac muscle 1) and ACTG2 (Actin gamma 2, enteric smooth muscle). Muscle actin genes appear to be the primordial genes of all muscle actin encoding genes as they were first identified in Strongylocentrotus purpuratus and Ciona intestinalis; thus, the gene probably first appeared in a common ancestor of Echinodermata and Chordata after their divergence from Porifera, since an ortholog was not detected in Amphimedon queenslandica. ACTA2 and ACTC1 genes were first detected in Callorhinchus milli, ACTA1 in Danio rerio, and ACTG2 in Latimeria chalumnae, respectively. We would expect to find the ACTA1 and ACTG2 genes in Callorhinchus milli, as it is the first organism in this study to have differentiated organs, but the sequences were not detected using BLAST. In any case, we believe that these genes occurred through a series of gene duplication events yielding four paralogs that probably first appeared in an ancestor of the Chondrichthyes after their divergence from Echinodermata and Tunicates.
In summary, those genes encoding different types of cytoplasmic actins are evolutionarily older, since those genes that encode different types of muscle actins appeared later in evolution. Specifically, ACTG1 is the ancestral gene of the entire family, since it is the evolutionarily older out of all paralogs found in Metazoa.
The RCC1 and RCC2 genes are likely paralogs, and their corresponding orthologous protein sequences form distinct clades (Fig. 11). RCC1 is probably the primordial gene of this family as it was detected in Metazoa, Bacteria Bifidobacterium asteroides (Actinobacteria), Parachlamydia sp. C2 (Chlamydiae), Synechococcus sp. WH 8103 (Cyanobacteria), Myxococcus xanthus (Proteobacteria), Cohnella sp. CAU 1483 (Firmicutes), Hymenobacter chitinivorans (Bacteroidetes) and Methanocella conradii (Archaea). RCC1 and RCC2 homologs have also been identified in Fungi, that is, SRM1 in Saccharomyces cerevisiae and PIM1 in Schizosaccharomyces pombe, as confirmed in literature [241, 242]. Furthermore, although the corresponding gene in Monosiga brevicollis and Amphimedon queenslandica is annotated as RCC in RefSeq, it displays high similarity to the RCC1 gene. Thus, RCC1 probably first appeared in a common ancestor of Eukaryotes, Bacteria and Archaea. RCC2 on the other hand, might have appeared later in evolution, as it was detected for the first time in Strongylocentrotus purpuratus, and probably occurred due to duplication of the RCC1 gene in an ancestor of Echinodermata after their divergence from the phylum of Porifera, since an RCC2 ortholog was not detected in Amphimedon queenslandica. Therefore, although RCC2 is found only in Metazoa, its ancestor is traced to some ancient prokaryotic unicellular organism, that is, the common ancestor of Eukaryotes, Bacteria and Archaea.
The PKN1, PKN2, PKN3, PKN, PKC genes are likely paralogs, the orthologs of which cluster together in distinct clades (Fig. 12). PKN was detected in Amphimedon queenslandica, in the unicellular Choanoflagellate Monosiga brevicollis, in Bacteria Streptomyces seoulensis (Actinobacteria), Chlamydia ibidis (Chlamydiae), Chloracidobacterium thermophilum (Cyanobacteria), Escherichia coli (Proteobacteria), Paenibacillus donghaensis (Firmicutes), Spirosoma panaciterrae (Bacteroidetes) and in Archaea (Methanoregula boonei and Thermococcus thioreducens) and is likely the primordial gene of this family and precursor of the PKN1, PKN2 and PKN3 genes which probably occurred through a series of PKN gene duplication events. The PKC gene was also detected in Petromyzon marinus, as well as in Fungi (Saccharomyces cerevisiae and Schizosaccharomyces pombe). PKN2 is apparently the oldest one as it was first detected in Ciona intestinalis and Strongylocentrotus purpuratus, so it probably first appeared in a common ancestor of Tunicates and Echinodermata after their divergence from Choanoflagellates, since an PKN2 ortholog was not detected in Monosiga brevicollis. PKN1 most likely emerged later as it was first detected in Callorhinchus milli; it probably first appeared in an ancestor of Chondrichthyes after their divergence from Tunicates, since an PKN1 ortholog was not detected in Ciona intestinalis. PKN3 was detected for the first time in Latimeria chalumnae, so it probably first emerged in an ancestor of Actinists after their divergence from Osteichthyes, since an PKN3 ortholog was not detected in Danio rerio. Therefore, although PKN2 was found exclusively in Metazoa, it has very deep evolutionary roots found in some ancient prokaryotic unicellular organism, that is the common ancestor of Animalia, Bacteria and Archaea.
According to the findings of this in silico study, three different types of genes were identified within the 12 ‘hub’ genes, in terms of their evolutionary age. Fırst and foremost, genes of unicellular origin that are in fact associated with the emergence of the first cellular life forms. In particular, ACTG1, RCC2 and PKN2 were detected in all domains of life, namely the empire Eukaryota (Animal and Fungi kingdoms) and the empire Prokaryota (Eubacteria and Archaebacteria kingdoms). Second, genes of unicellular origin that are associated with the emergence of the first eukaryotic life forms, namely the VASP, DSP, CTTN, PLEC and TLN1 genes. Third, genes of multicellular origin that are in fact associated with the evolution of multicellularity in the Animal kingdom. In particular, the ILK, CTNNA1, CTNND1 and PKP3 genes were detected exclusively in Metazoa; hence, they could be considered as genes of multicellular origin. Therefore, these are highly conserved genes of unicellular origin, which are in fact associated with the emergence of the first cellular life forms. Consequently, the investigated genes are considered to have deep evolutionary roots, with the most recently evolved ones being linked to the emergence of Metazoa and the most ancient ones having an evolutionary age of billions of years, thereby coinciding with the emergence of the first cellular life forms (Fig. 13).
Species tree representing the evolutionary relationships among taxa under study. The marker symbols on the tree nodes indicate the common ancestral taxon, to which the evolutionary roots of a given gene family under study are traced.
In addition to their evolutionary age, the genes are involved in multiple cancer-related pathways and processes, and are associated with various forms of cancer, especially metastasis. Furthermore, they are characterized as hub genes and hold a particular position in the human gene network, on the boundary of unicellularity and multicellularity, and thus, contribute to the phenomenon of mutual exclusivity. Of particular note, all 12 genes have been linked to the most aggressive trait of cancer, that is, metastasis, in various types of cancers.
To further support the theory of “atavistic reversion” of cancer, the emerging model of the germ-and-soma life cycle [243] shows that cancer, uses not only first cellular, first eukaryotic, and Metzaoa I genes, but also genes of the late evolved unicellular organisms, respectively stemness and differentiation genes, as well as genome repair genes, including their own mechanisms of cancer stem cell production.
The findings of the present study can be applied on the design of therapeutic strategies, since the investigated genes and their products, as well as the processes and pathways in which they participate, could represent candidate pan-cancer biomarkers and potential targets for the development of a new class of pan-cancer treatment protocols that can be applied to any type of cancer. The development of therapeutic strategies based on the analysis of the configuration of the human gene network itself have been proposed. These approaches entail the cellular processes of a specific evolutionary age to be used as targets, and especially the genes of unicellular or multicellular origin that are highly interconnected and contribute to the phenomenon of mutual exclusivity, rendering them vulnerable and also potential drug targets [15].
Although the concept that cancer is linked with evolution was first proposed almost a century ago, laying in this way the foundation for the atavistic model of cancer [244, 245, 246], the application of evolutionary biology approaches to the study of neoplasms’ formation and progression is a recent endeavor. Comparative oncology is a novel and highly promising field of cancer research that can lead to a deeper understanding of cancer and contribute to the discovery of novel biomarkers and clinical therapeutic strategies.
Several other possible evolutionary approaches to cancer treatment and prevention have been proposed, mainly to address the problem of cancer cells’ remarkable resistance to therapeutic regimens. Given the resilience and diversity of different forms of cancer, the key lies in the common characteristics of all cancer cells, regardless of tumor type. The pathways and genes involved, in particular, can be utilized to the design of drugs that target cancer cells selectively and are effective against any cancer cell [5].
Another very promising approach concerns neoplasm ecology. In particular, the therapeutic approaches that have been proposed include the use of a combination of drugs [247], alteration of the competition between healthy and cancer cells by enhancing the adaptability of the former [248], selection of cells sensitive to chemotherapy [248] or cells exhibiting genetic stability [249] and, above all, treatments aiming at preventing the survival of resistant cells which lead to disease recurrence and recession after treatment [250].
Apart from the type of treatment, the manner in which it is administered is also important, as it can influence the evolutionary dynamics of tumor cells [250]. Chemotherapy, for example, has different effect on cell competition when administered in large individual doses instead of small continuous doses [251], since the latter strategy diminishes the likelihood of creating resistant clones in the neoplasm population [250].
Herein, we have made an effort to track the emergence of twelve important hub cancer genes in the evolutionary history by phylostratigraphy. Based on our analyses, those genes that have been proven to play a crucial role in several aspects of cancer biology, as being part of an intricate regulatory network, are evolutionarily ancient, with a high fraction of them (67%) being of unicellular origin and existing well before humans emerged and evolved. The fact that most of the hub genes are of unicellular origin adds further support to the atavistic model of cancer, according to which the biological origin of cancer is believed to date before the emergence of multicellular animals, approximately 600 million years ago. In the light of evolution, cancer arises as a phenomenon that is inextricably linked with multicellularity itself, and therefore this should be taken into consideration, in order to deeply understand and efficiently tackle probably one of the oldest chronic diseases on the planet. In this way, anti-neoplastic therapeutic strategies could be developed tailored to the atavistic model, by targeting the most recently evolved key innovations (i.e., the so-called weaknesses) that have lost their ancestral functionality in cancer.
AGG conceived the study; AL and AP designed and run the in silico experiments; AL, IT and AP analyzed the data; AL, IT, AP and AGG wrote the manuscript.
Not applicable.
Işıl Takan acknowledges the YÖK (Yükseköğretim Kurulu) scholarship.
This research received no external funding.
The authors declare no conflict of interest.
CESC, Cervical squamous cell carcinoma and endocervical adenocarcinoma; COAD, Colon adenocarcinoma; DLBC, Lymphoid Neoplasm Diffuse Large B-cell Lymphoma; GBM, Glioblastoma multiforme; LAML, Acute Myeloid Leukemia; LUAD, Lung adenocarcinoma; LUSC, Lung squamous cell carcinoma; OV, Ovarian serous cystadenocarcinoma; PAAD, Pancreatic adenocarcinoma; READ, Rectum adenocarcinoma; SKCM, Skin Cutaneous Melanoma; STAD, Stomach adenocarcinoma; TGCT, Testicular Germ Cell Tumors; THYM, Thymoma; UCEC, Uterine Corpus Endometrial Carcinoma; UCS, Uterine Carcinosarcoma.