Bioinformatic approaches to the investigation of the atavistic genes implicated in cancer

disturbing hub genes of the human gene network. This results to a primitive cellular phenotype which resembles a unicellular life form. Methods : In the present study, we have employed bioinformatic approaches for the in-depth investigation of twelve atavistic hub genes ( ACTG1 , CTNNA1 , CTNND1 , CTTN , DSP , ILK , PKN2 , PKP3 , PLEC , RCC2 , TLN1 and VASP ), which exhibit highly disrupted interactions in diverse types of cancer and are associated with the formation of metastasis. To this end, phylogenetic analyses were conducted towards unrav-elling the evolutionary history of those hubs and tracing the origin of cancer in the Tree of Life. Results : Based on our results, most of those genes are of unicellular origin, and some of them can be traced back to the emergence of cellular life itself (atavistic theory). Our findings indicate how deep the evolutionary roots of cancer actually are, and may be exploited in the clinical setting for the design of novel therapeutic approaches and, particularly, in overcoming resistance to antineoplastic treatment.


Abstract
Introduction: Cancer is a widespread phenomenon occurring across multicellular organisms and represents a condition of atavism, wherein cells follow a path of reverse evolution that unlocks a toolkit of ancient preexisting adaptations by disturbing hub genes of the human gene network.This results to a primitive cellular phenotype which resembles a unicellular life form.Methods: In the present study, we have employed bioinformatic approaches for the in-depth investigation of twelve atavistic hub genes (ACTG1, CTNNA1, CTNND1, CTTN, DSP, ILK, PKN2, PKP3, PLEC, RCC2, TLN1 and VASP), which exhibit highly disrupted interactions in diverse types of cancer and are associated with the formation of metastasis.To this end, phylogenetic analyses were conducted towards unravelling the evolutionary history of those hubs and tracing the origin of cancer in the Tree of Life.Results: Based on our results, most of those genes are of unicellular origin, and some of them can be traced back to the emergence of cellular life itself (atavistic theory).Our findings indicate how deep the evolutionary roots of cancer actually are, and may be exploited in the clinical setting for the design of novel therapeutic approaches and, particularly, in overcoming resistance to antineoplastic treatment.

Introduction
Cancer's origin dates back to the emergence of multicellularity itself, about one billion years ago [1], since cancer and cancel-like phenomena have been observed in almost all species that exhibit either clonal or aggregative multicellularity [2], indicating that spontaneous tumor for-mation has deep evolutionary roots [3].Cancer is the result of the breakdown of complex molecular and cellular mechanisms, which are necessary to enable multicellular cooperation by regulating cell growth, cell differentiation, cell death and senescence, resulting in a more primitive cellular phenotype that resembles a unicellular life form [4].Each one of the hallmarks of cancer [5] is a direct "violation" of the principles of multicellular cooperation [2].In point of fact, the transition from unicellular to multicellular life was only possible when cooperating cells acquired a selective advantage over those who lived independently by inhibiting their own growth and replication [6].
From an evolutionary perspective, cancer is suggested to occur because early in the evolution of life, cells were 'designed' so as to maximize their replication capacity.However, cancer is actually uncommon because during the emergence of multicellularity, natural selection at the level of the individual led to the emergence of robust mechanisms to suppress it [1].Nevertheless, the paradox of cancer is that cancer cells, which initially disrupt the principles of multicellular cooperation, end up implementing those same principles [7,8], especially in neoplasms of advanced stages [2]; as a consequence, tumors resemble more of an ecosystem than a simple aggregate of cells [1] or a rather "pseudomulticellular neotissue" [9].Cancer cells represent a lower level of organization of life, similar to our Cambrian ancestors, and as such are capable of transitioning from multicellularity to unicellularity, but can never adapt the phenotype of complex multicellularity [10].
Although healthy cells, of both unicellular and multicellular origin, exhibit a finely tuned coordination of expression of biological processes during carcinogenesis, this coordination is markedly disrupted, resulting in a characteristic up-regulation of genes of unicellular origin, detected in various human cancer types where cancer cells maintain control over cell cycle activity [11], as well as significant inactivation of genes which are associated with multicellularity and, therefore, have evolved more recently [4].This phenomenon of enhanced segregation of cellular processes with different evolutionary ages is called "mutual exclusivity" and is common among tumors, and is of particular importance for tumor development [4] and, ultimately, cancer progression.
For mutual exclusivity events to occur, certain fundamental alterations to the gene network need to take place.The human gene network consists of two main subnetworks comprised of genes of unicellular and multicellular origin, respectively.The "multicellular" network has been progressively built upon the "unicellular" one during billions of years of evolution, which led to the formation of an intricate network [12,13], dedicated to maintain the complex phenotypes and cooperative growth required for multicellularity [4].Those subnetworks are interconnected in the human gene network through "hub" genes that appeared during the early metazoan life in order to enhance in-tercellular cooperation [14], and more precisely at the evolutionary boundary between unicellularity and multicellu-larity; as a result, they reflect key points in the evolutionary transition from unicellular to multicellular life [15].Those genes represent "points of vulnerability", since mutual exclusivity occurs particularly due to the alteration of their interactions [4].Hence, only a limited amount of driver mutations is thought to be responsible for the transition of a normal cell to a malignant state [16][17][18], indicating that mutations in these hub genes result in widespread dysregulation of the gene network and are sufficient to initiate carcinogenesis, partially, through a process of de-differentiation [19].Particularly, since the "unicellular" subnetwork is denser and exhibits a higher inside/outside interaction ratio compared to the "multicellular" subnetwork, the former acts as an attractor that renders the cells of multicellular organisms vulnerable to carcinogenesis (principles of atavistic model for cancer evolution) [20].
The phenotype exhibited by cancer cells resembles that of a unicellular form of life because it is achieved through a process of de-differentiation, also referred to as "reverse evolution" [21].Through a series of consecutive reversionary transitions, cancer progression follows a similar pattern, but in reverse, that is, to the gradual transition from unicellularity to multicellularity [22].Particularly, the emergence of common characteristics among cancer cells, regardless of the tissue they originated from, indicates that the occurrence and progression of cancer may be a controlled transition from a complex multicellular phenotype to a primitive unicellular one [15].Through a small number of mutations in hub genes, mutual exclusivity events occur, and cancer cells activate pre-existing ancestral genes and pathways which render cancer cells remarkably robust.Besides, the mechanisms and genes involved in carcinogenesis are mainly evolutionarily ancient and highly conserved, mostly because they play a crucial role in vital cellular functions of a healthy cell [23].Therefore, according to the "atavistic model" of cancer, cancer represents an atavism at the cellular level and cancer cells are not just "rogue" cells that were generated through a series of random mutations, but rather an ancient form of life that lies dormant within healthy metazoan cells [24].In other words, cancer cells do not construct a gene network ab intitio and acquire traits through random mutations and a few decades of internal Darwinian selection within the host's body, but strategically take advantage of certain components of the existing gene networks [24].Characteristic examples include the healthy cells' ability to multiply rapidly and migrate during wound healing [25], traits that at the same time render cells vulnerable to cancer [26].The same applies to rapid angiogenesis, which is necessary in wound healing to supply new cells with oxygen and nutrients [25], but also a hallmark of cancer [27], with cancer cells even utilizing the same angiogenic signaling pathways as the ones used by the healthy cells of a multicel-lular organism to develop the vascular system [28].Another example is the behavior of cells in the early developmental stages, such as the invasion of cells into adjacent developing tissues during gastrulation and the ability of some cells to transform from immotile epithelial cells into motile mesenchymal cells, a process termed "epithelial-mesenchymal transition" (EMT).If cells do not possess the necessary capabilities to perform these functions, growth cannot be achieved; however, these same cell capabilities enable cancer cells to metastasize [26].
The pathways of cell differentiation are initially identical to all organisms and then branch off in different taxonomic divisions [29].Accordingly, tracing the origin of emergence and evolutionary history of certain genes can be important, especially for the most ancient genes that play a crucial role in cancer progression compared to the more recently evolved ones, since the effects of inactivating the former tend to be more pronounced as they are more likely to lead to cell death [24].
In the present study, we investigated certain genes which have previously been proven to exhibit a highly disrupted number of interactions across multiple tumor types and contribute to the phenomenon of mutual exclusivity [4].More precisely, Trigos and colleagues [4] studied a pair of cellular processes, that is, chromosome organization and cellular junction organization.The co-expression of genes involved in those processes is highly important for the eukaryotic cells of a multicellular organism and the coexpression of those genes was strongly disrupted in tumors.Both of these processes involve genes of both unicellular and multicellular origin, and the dramatic change in the coexpression of those genes suggests that mutual exclusivity between these processes occurs during carcinogenesis and is actually advantageous to the development of different cancer stages such as early cancer, late cancer, and metastasis.A set of twelve atavistic genes, which represent the "hubs" between the unicellular and multicellular processes was identified: RCC1, TLN1, VASP, ACTG1, PLEC, CTTN, DSP, ILK, PKN2, CTNNA1, CTNND1 and PKP3.Those genes act as regulators of co-expression of the genes involved in the two aforementioned processes and their hubness was dramatically changed in seven different solid tumor types.Therefore, due to their central role in the human gene network, those genes represent fundamental points of vulnerability particularly regarding the phenomenon of mutual exclusivity and, consequently, carcinogenesis.To these data is added the fact that these 12 genes interact with genes associated with genomic instability, as well as genes associated with poor prognosis for cancer progression and metastasis [30].Moreover, these 12 genes play a critical role in regulatory networks associated with genomic instability and metastasis and are generally involved in key processes of carcinogenesis [4].Therefore, these genes can be considered as pan-cancer molecular markers or regulators of malignancy in diverse cancer tumors [4].
To this end, we have employed a bioinformatics approach to explore the involvement of those genes in various cancer types and performed phylostratigraphic analyses [14,31], in an effort to elucidate the evolutionary trajectory of these genes aiming towards tracing the origin of cancer in the Tree of Life.
The official names of the genes were initially retrieved from the HGNC database [32,33] and then the accession numbers of the peptide sequences corresponding to the human genes were retrieved from the publicly available non-redundant NCBI Reference Sequence Database (Ref-Seq) [34].The amino acid sequences were used subsequently as probes in an extensive series of BLASTp [35] reciprocal searches in order to obtain the canonical homologous amino acid sequences corresponding to the species included in this study.This process was reiterated until no novel sequences could be detected, ensuring that a full representation of each gene's family is obtained.

Alignment and phylogenetic analysis
To investigate the evolutionary history of each gene, we conducted comprehensive phylogenetic analysis using the entire length protein sequences of the species under study.The full-length amino acid sequences were aligned with MAFFT v.7 (https://mafft.cbrc.jp/alignment/server/) [36].The alignments were subsequently used to reconstruct phylogenetic trees by employing two separate methods, Maximum Likelihood, a method based on a heuristic approach for finding the optimal tree that fits the observed data, and Neighbor Joining, a method based on a hierarchical clustering algorithm [37], as implemented in the software package MEGA (https://www.megasoftware.net/),version 10 [38].Furthermore, MEGA v. 10 [38] was used to estimate the best-fit substitution model, which best describes the number of observed amino acid substitutions per position.For both methods of phylogenetic tree reconstruction, bootstrap analyses (150 pseudo-replicates) were conducted in order to evaluate the robustness and the statistical significance of the inferred reconstructed trees.Finally, the phylogenetic trees were visualized with MEGA v.10 [38].

Pathways identification
The pathways and biological processes in which each gene is involved were retrieved from the Reactome Pathway database [39], the KEGG Pathways database [40,41] and the biomedical literature.

Differential gene expression analysis
RNA sequencing data for tumor and corresponding normal tissue samples from the TCGA (The Cancer Genome Atlas) and GTEx (Genotype-Tissue Expression) databases, respectively, were retrieved from the GEPIA2 website (http://gepia2.cancer-pku.cn/).The atavistic genes that are differentially expressed (DE) between tumor and normal samples were identified using ANOVA; threshold value for absolute log fold change | log 2 FC| ≥2 and FDRadjusted p-value (or q-value) ≤0.05.

Functional network of hubs
The twelve hub genes were provided as input to the STRING (version 11.0) (https://string-db.org/)database [42] of investigating and visualizing both known and predicted gene/protein associations.

Atavistic genes' association with cancer
By conducting a thorough and comprehensive review of studies published in PubMed (https://pubmed.ncbi.nlm.nih.gov/) and The Human Protein Atlas [43], we found those cancers each of the 12 genes is involved in; even different types of cancers that can affect the same organ (Table 1, Ref. ).In this context, certain genes have been shown to exhibit either oncogenic or tumor suppressive effects.In fact, both qualitative and quantitative modifications were identified in the genes under study which are associated with the development and progression of different cancers in different tissues.In particular, the types of alterations observed in those genes and/or their protein products, include aberrant, increased or decreased, expression (Table 2), epigenetic modifications, mutations, gene inactivation or amplification, copy number alterations, altered protein interactions and post-translational modifications, characteristic protein subcellular localization.Furthermore, these genes constitute diagnostic and prognostic biomarkers in cancer, regarding malignancy, disease stage, clinical subtypes, disease progression, metastasis, clinical outcome and survival and prediction of patients' response to therapeutic treatments.In addition, each gene and/or its products constitute drug targets or are considered as novel potential drug targets for at least one type of cancer.
Of note, based on the STRING database analysis, the most significantly enriched pathway of the 12 gene/proteins is 'cell junction assembly' (GO: 0034329; FDR = 4.10 × 10 −15 ).In this network (Fig. 1), ten hub genes/gene products appear to interact either physically or functionally in the process of cell juction organization, having an integral role in EMT which is essential in cancer progression and metastasis [238].
Therefore, the importance of the examined genes is enhanced by their involvement in pathways related to carcinogenesis and the progression of the disease in various types of cancer.

Phylogenetic analysis
In terms of overall topology, there is congruence between the phylogenetic trees that were generated with both methods.The trees constructed with the maximum likelihood method are considered more accurate and reliable and they display higher bootstrap values (Figs.2,3,4, 5,6,7,8,9,10,11,12).These high bootstrap values indicate that the tree nodes are statistically significant and the inferred topology is biologically significant.The trees generated with the neighbor-joining method can be found at the Supplementary material.

CTNND1
In fact, p120 catenin appears to have both pro-oncogenic and anti-oncogenic effects, depending on the localization and the specific function of p120 catenin in each cell compartment.[183] Loss, downregulation or mislocalization of p120 catenin is observed in most human cancers.[184] Bone   The ILK, CTNNA1, CTNND1 and PKP3 genes are detected exclusively in multicellular Animals as shown in the phylogenetic trees in Figs.2,3,4, respectively.Therefore, those genes most likely first appeared in a eukaryotic multicellular organism which was the common ancestor of Metazoa.
Catenins A (CTNNA) and catenins D (CTNND) are members of the catenin superfamily (Fig. 3).CTNNA1, CTNNA2, CTNNA3, CTNND1, CTNND2 genes are paralogs which probably occurred through a series of gene duplication events.The corresponding orthologous proteins of each gene form distinct clades, and as a result the phylogenetic tree is divided into two major subtrees, comprised of protein sequences encoded by CTNNA and CTNND genes, respectively.The subtree of CTNNA includes the protein sequences encoded by CTNNA1, CTNNA2 and CTNNA3.CTNNA2 appears to be the primordial gene as it was first detected in Amphimedon queenslandica, thus it probably first emerged in an ancestor of Porifera after their divergence from Choanoflagellates, since an ortholog was not found in Monosiga brevicollis.On the other hand, CTNNA1 and CTNNA3 were detected for the first time in Callorhinchus milli, and thus, they appeared later in evolution and probably occurred due to duplication events of the CTNNA2 gene, as they most likely first appeared in an ancestor of Chondrichthyes after their divergence from Tunicates, since orthologs were not detected in Ciona intestinalis and in fact they demonstrate high similarity to each other.The subtree of catenins D includes the protein sequences encoded by CTNND1 and CTNND2.CTNND2 is apparently the primordial gene as it was first detected in Amphimedon queenslandica, and thus it probably first arose in an ancestor of Porifera after their divergence from Choanoflagellates, since an ortholog was not detected in Monosiga brevicollis.On the other hand, CTNND1 appeared later in evolution and probably occurred due to CTNND2 gene duplication, as it was detected for the first time in Ciona intestinalis.Therefore, CTNND1 might have arisen in an ancestor of Tunicates after their divergence from Echinodermata, given that CTNND1 orthologs were not detected in Strongylocentrotus purpuratus.
The PKP1, PKP2, PKP3, PKP4 genes are likely paralogs (Fig. 4) which probably occurred through a series of gene duplication events of an ancestral PKP gene.The corresponding orthologs of each gene form distinct clades.All PKP genes were detected for the first time in Callorhinchus milli, suggesting that they probably appeared in an ancestor of Chondrichthyes after their divergence from Tunicates, since PKP orthologs were not detected in Ciona intestinalis.Based on the inferred phylogenetic tree, however, PKP1 and PKP2 exhibit the highest similarity, followed by PKP3.We could speculate that PKP4 is the primordial gene of this family, since it is basal to PKP1, PKP2 VASP and Enah/Vasp-Like are likely paralogs (Fig. 5).In Monosiga brevicollis, Strongylocentrotus purpuratus, Ciona intestinalis and Callorhinchus milli the corresponding detected genes were annotated as VASP-like in RefSeq, but display high similarity to the VASP genes of the rest of Metazoa.
The PLEC and TLN1 genes were detected in unicellular and multicellular organisms, as shown in Figs.8,9, respectively, and they probably first appeared in a eukaryotic unicellular organism that was the common ancestor of Animalia and Fungi.
PLEC homologs were found in Animalia as well as in Fungi (SAC6 in Saccharomyces cerevisiae and FM1 in Schizosaccharomyces pombe) (Fig. 8).
The TLN1 and TLN2 genes are likely paralogs (Fig. 9), and their corresponding orthologous protein sequences form distinct clades.TLN1/2 is probably the primordial gene of this family, since it was first detected in Amphimedon queenslandica, Monosiga brevicollis, Ciona intestinalis and Strongylocentrotus purpuratus which gave rise to TLN1 and TLN2 after successive gene duplications.Talin homologs were also identified in Fungi, that is, SLA2P in Saccharomyces cerevisiae and END4 in Schizosaccharomyces pombe, as also confirmed in literature [239,240].Therefore, although members of the TLN1 gene family were found only in Metazoa, it has deep evolutionary roots in some ancient unicellular eukaryotic organism, which is the common ancestor of Animalia and Fungi.
The characteristically long clade corresponding to Monosiga brevicollis, as shown in Figs.5,6,7,8,9, is due to the evolutionary course of the organism, as it evolved for many millions of years separately from the rest of the Metazoa.
The evolutionary origin of ACTG1, RCC2 and PKN2 genes can be traced to a universal common ancestor, namely an ancient prokaryotic unicellular organism, that is the common ancestor of Animalia, Bacteria and Archaea as illustrated in Figs.10,11,12, respectively.The ACTA1, ACTA2, ACTB, ACTC1, ACTG1, ACTG2 genes are likely paralogs (Fig. 10), the orthologs of which cluster together in distinct clades.The phylogenetic tree is divided into two major subtrees, one comprised of protein sequences encoded by genes of different types of cytoplasmic actins and the other includes sequences encoded by genes of different types of muscle actins.
The subtree of cytoplasmic actins harbors the protein sequences encoded by ACTG1 (Actin cytoplasmic 2), ACTB (Actin cytoplasmic 1) as well as various other actin genes.ACTG1 is apparently the ancestral gene of this family as it was detected in Animalia, Bacteria (Microbacterium arborescens (Actinobacteria), Chlamydia trachomatis (Chlamydiae), Leptolyngbya sp.PCC 7376 (Cyanobacteria), Kangiella spongicola (Proteobacteria), Staphylococcus aureus (Firmicutes)), as well as Archaea (Candidatus Heimdallarchaeota (Archaea), Candidatus Lokiarchaeota (Archaea)).The sequences detected in bacteria do not cluster together, but are rather scattered among those corresponding to genes that encode the two types of cytoplasmic actins in Metazoa.This is indicative of high similarity among the aforementioned genes, since the encoded protein sequences appear to be highly conserved.Additionally, Actin genes found in Fungi (Saccharomyces cerevisiae and Schizosaccharomyces pombe) exhibit high similarity to Actin genes found in Archaea, which encode cytoplasmic actin 2. On the other hand, the ACTB gene likely appeared later in evolution and probably occurred due to a duplication event of the ACTG1 gene, as it was detected for the first time in Petromyzon marinus.Also, in Ciona intestinalis, a cytoplasmic actin gene and a muscle actin gene were identified, and in fact the gene encoding the cytoplasmic actin shows high similarity to the only ACTIN gene detected in Monosiga brevicollis.Finally, Actin-85C-Like gene found in Amphimedon queenslandica also demonstrates high similarity to types of cytoplasmic actin.
The subtree of muscle actins, includes the protein sequences encoded by ACTA1 (Actin alpha 1, skeletal muscle), ACTA2 (Actin alpha 2, smooth muscle), ACTC1 (Actin alpha cardiac muscle 1) and ACTG2 (Actin gamma 2, enteric smooth muscle).Muscle actin genes appear to be the primordial genes of all muscle actin encoding genes as they were first identified in Strongylocentrotus purpuratus and Ciona intestinalis; thus, the gene probably first appeared in a common ancestor of Echinodermata and Chordata after their divergence from Porifera, since an ortholog was not detected in Amphimedon queenslandica.ACTA2 and ACTC1 genes were first detected in Callorhinchus milli, ACTA1 in Danio rerio, and ACTG2 in Latimeria chalumnae, respectively.We would expect to find the ACTA1 and ACTG2 genes in Callorhinchus milli, as it is the first or- ganism in this study to have differentiated organs, but the sequences were not detected using BLAST.In any case, we believe that these genes occurred through a series of gene duplication events yielding four paralogs that probably first appeared in an ancestor of the Chondrichthyes after their divergence from Echinodermata and Tunicates.
In summary, those genes encoding different types of cytoplasmic actins are evolutionarily older, since those  genes that encode different types of muscle actins appeared later in evolution.Specifically, ACTG1 is the ancestral gene of the entire family, since it is the evolutionarily older out of all paralogs found in Metazoa.
The RCC1 and RCC2 genes are likely paralogs, and their corresponding orthologous protein sequences form distinct clades (Fig. 11).RCC1 is probably the primordial gene of this family as it was detected in Metazoa, Bacteria Bifidobacterium asteroides (Actinobacteria), Parachlamydia sp.C2 (Chlamydiae), Synechococcus sp.WH 8103 (Cyanobacteria), Myxococcus xanthus (Proteobacteria), Cohnella sp.CAU 1483 (Firmicutes), Hymenobacter chitinivorans (Bacteroidetes) and Methanocella conradii (Archaea).RCC1 and RCC2 homologs have also been identified in Fungi, that is, SRM1 in Saccharomyces cerevisiae and PIM1 in Schizosaccharomyces pombe, as confirmed in literature [241,242].Furthermore, although the corresponding gene in Monosiga brevicollis and Amphimedon queenslandica is annotated as RCC in RefSeq, it displays high similarity to the RCC1 gene.Thus, RCC1 probably first appeared in a common ancestor of Eukaryotes, Bacteria and Archaea.RCC2 on the other hand, might have appeared later in evolution, as it was detected for the first time in Strongylocentrotus purpuratus, and probably occurred due to duplication of the RCC1 gene in an ancestor of Echinodermata after their divergence from the phylum of Porifera, since an RCC2 ortholog was not detected in Amphimedon queenslandica.Therefore, although RCC2 is found only in Metazoa, its ancestor is traced to some ancient prokaryotic unicellular organism, that is, the common ancestor of Eukaryotes, Bacteria and Archaea.
PKN was detected in Amphimedon queenslandica, in the unicellular Choanoflagellate Monosiga brevicollis, in Bacteria Streptomyces seoulensis (Actinobacteria), Chlamydia ibidis (Chlamydiae), Chloracidobacterium thermophilum (Cyanobacteria), Escherichia coli (Proteobacteria), Paenibacillus donghaensis (Firmicutes), Spirosoma panaciterrae (Bacteroidetes) and in Archaea (Methanoregula boonei and Thermococcus thioreducens) and is likely the primordial gene of this family and precursor of the PKN1, PKN2 and PKN3 genes which probably occurred through a series of PKN gene duplication events.The PKC gene was also detected in Petromyzon marinus, as well as in Fungi (Sac- charomyces cerevisiae and Schizosaccharomyces pombe).PKN2 is apparently the oldest one as it was first detected in Ciona intestinalis and Strongylocentrotus purpuratus, so it probably first appeared in a common ancestor of Tunicates and Echinodermata after their divergence from Choanoflagellates, since an PKN2 ortholog was not detected in Monosiga brevicollis.PKN1 most likely emerged later as it was first detected in Callorhinchus milli; it probably first appeared in an ancestor of Chondrichthyes after their divergence from Tunicates, since an PKN1 ortholog was not detected in Ciona intestinalis.PKN3 was detected for the first time in Latimeria chalumnae, so it probably first emerged in an ancestor of Actinists after their divergence from Osteichthyes, since an PKN3 ortholog was not detected in Danio rerio.Therefore, although PKN2 was found exclusively in Metazoa, it has very deep evolutionary roots found in some ancient prokaryotic unicellular organism, that is the common ancestor of Animalia, Bacteria and Archaea.

Discussion
According to the findings of this in silico study, three different types of genes were identified within the 12 'hub' genes, in terms of their evolutionary age.Fırst and foremost, genes of unicellular origin that are in fact associated with the emergence of the first cellular life forms.In particular, ACTG1, RCC2 and PKN2 were detected in all domains of life, namely the empire Eukaryota (Animal and Fungi kingdoms) and the empire Prokaryota (Eubacteria and Archaebacteria kingdoms).Second, genes of unicellular origin that are associated with the emergence of the first eukaryotic life forms, namely the VASP, DSP, CTTN, PLEC and TLN1 genes.Third, genes of multicellular origin that are in fact associated with the evolution of multicellularity in the Animal kingdom.In particular, the ILK, CTNNA1, CTNND1 and PKP3 genes were detected exclusively in Metazoa; hence, they could be considered as genes of multicellular origin.Therefore, these are highly conserved genes of unicellular origin, which are in fact associated with the emergence of the first cellular life forms.Consequently, the investigated genes are considered to have deep evolutionary roots, with the most recently evolved ones being linked to the emergence of Metazoa and the most ancient ones having an evolutionary age of billions of years, thereby coinciding with the emergence of the first cellular life forms (Fig. 13).
In addition to their evolutionary age, the genes are involved in multiple cancer-related pathways and processes, and are associated with various forms of cancer, especially metastasis.Furthermore, they are characterized as hub genes and hold a particular position in the human gene network, on the boundary of unicellularity and multicellularity, and thus, contribute to the phenomenon of mutual exclusivity.Of particular note, all 12 genes have been linked to the most aggressive trait of cancer, that is, metastasis, in various types of cancers.
To further support the theory of "atavistic reversion" of cancer, the emerging model of the germ-and-soma life cycle [243] shows that cancer, uses not only first cellular, first eukaryotic, and Metzaoa I genes, but also genes of the late evolved unicellular organisms, respectively stemness and differentiation genes, as well as genome repair genes, including their own mechanisms of cancer stem cell production.
The findings of the present study can be applied on the design of therapeutic strategies, since the investigated genes and their products, as well as the processes and pathways in which they participate, could represent candidate pan-cancer biomarkers and potential targets for the development of a new class of pan-cancer treatment protocols that can be applied to any type of cancer.The development of therapeutic strategies based on the analysis of the configuration of the human gene network itself have been proposed.These approaches entail the cellular processes of a specific evolutionary age to be used as targets, and espe-cially the genes of unicellular or multicellular origin that are highly interconnected and contribute to the phenomenon of mutual exclusivity, rendering them vulnerable and also potential drug targets [15].
Although the concept that cancer is linked with evolution was first proposed almost a century ago, laying in this way the foundation for the atavistic model of cancer [244][245][246], the application of evolutionary biology approaches to the study of neoplasms' formation and progression is a recent endeavor.Comparative oncology is a novel and highly promising field of cancer research that can lead to a deeper understanding of cancer and contribute to the discovery of novel biomarkers and clinical therapeutic strategies.
Several other possible evolutionary approaches to cancer treatment and prevention have been proposed, mainly to address the problem of cancer cells' remarkable resistance to therapeutic regimens.Given the resilience and diversity of different forms of cancer, the key lies in the common characteristics of all cancer cells, regardless of tumor type.The pathways and genes involved, in particular, can be utilized to the design of drugs that target cancer cells selectively and are effective against any cancer cell [5].
Another very promising approach concerns neoplasm ecology.In particular, the therapeutic approaches that have been proposed include the use of a combination of drugs [247], alteration of the competition between healthy and cancer cells by enhancing the adaptability of the former [248], selection of cells sensitive to chemotherapy [248] or cells exhibiting genetic stability [249] and, above all, treatments aiming at preventing the survival of resistant cells which lead to disease recurrence and recession after treatment [250].
Apart from the type of treatment, the manner in which it is administered is also important, as it can influence the evolutionary dynamics of tumor cells [250].Chemotherapy, for example, has different effect on cell competition when administered in large individual doses instead of small continuous doses [251], since the latter strategy diminishes the likelihood of creating resistant clones in the neoplasm population [250].

Conclusions
Herein, we have made an effort to track the emergence of twelve important hub cancer genes in the evolutionary history by phylostratigraphy.Based on our analyses, those genes that have been proven to play a crucial role in several aspects of cancer biology, as being part of an intricate regulatory network, are evolutionarily ancient, with a high fraction of them (67%) being of unicellular origin and existing well before humans emerged and evolved.The fact that most of the hub genes are of unicellular origin adds further support to the atavistic model of cancer, according to which the biological origin of cancer is believed to date before the emergence of multicellular animals, approximately 600 million years ago.In the light of evolution, cancer arises as a phenomenon that is inextricably linked with multicellularity itself, and therefore this should be taken into consideration, in order to deeply understand and efficiently tackle probably one of the oldest chronic diseases on the planet.In this way, anti-neoplastic therapeutic strategies     could be developed tailored to the atavistic model, by targeting the most recently evolved key innovations (i.e., the so-called weaknesses) that have lost their ancestral functionality in cancer.

Author contributions
AGG conceived the study; AL and AP designed and run the in silico experiments; AL, IT and AP analyzed the data; AL, IT, AP and AGG wrote the manuscript.

Fig. 1 .
Fig. 1.STRING network depicting the associations (connecting lines) of the hub genes/gene products (nodes) under investigation in the cell junction assembly.

Fig. 2 .
Fig. 2. Maximum likelihood-based rooted phylogenetic tree of the protein sequences encoded by the ILK genes.TNNI3K is used as outgroup.The branch length represents evolutionary distance.A discrete Gamma distribution was used to model evolutionary rate differences among sites.The scale bar at the lower left denotes the length of amino acid substitutions per site.

Fig. 3 .
Fig. 3. Maximum likelihood-based unrooted phylogenetic tree of the protein sequences encoded by the CTNNA1, CTNNA2, CTNNA3, CTNND1 and CTNND2 genes.The conventions are the same as in Fig. 2.

Fig. 4 .
Fig. 4. Maximum likelihood-based rooted phylogenetic tree of the protein sequences encoded by the PKP1, PKP2, PKP3 and PKP4 genes.CTNND2 is used as outgroup.The conventions are the same as in Fig. 2.

Fig. 5 .
Fig. 5. Maximum likelihood-based rooted phylogenetic tree of the protein sequences encoded by the VASP and ENAH-LIKE/ENAH/VASP-LIKE genes.EVL is used as outgroup.The conventions are the same as in Fig. 2.

Fig. 6 .
Fig. 6.Maximum likelihood-based rooted phylogenetic tree of the protein sequences encoded by the CTTN genes.HCLS1 is used as outgroup.The conventions are the same as in Fig. 2.

Fig. 7 .
Fig. 7. Maximum likelihood-based rooted phylogenetic tree of the protein sequences encoded by the DSP genes.PLEC is used as outgroup.The conventions are the same as in Fig. 2.

Fig. 8 .
Fig. 8. Maximum likelihood-based rooted phylogenetic tree of the protein sequences encoded by the PLEC genes.MAKF1 is used as outgroup.The conventions are the same as in Fig. 2.

Fig. 9 .
Fig. 9. Maximum likelihood-based rooted phylogenetic tree of the protein sequences encoded by the TLN1 and TLN2 genes.HIP1R is used as outgroup.The conventions are the same as in Fig. 2.

Fig. 11 .
Fig. 11.Maximum likelihood-based rooted phylogenetic tree of the protein sequences encoded by the RCC1 and RCC2 genes.HERC3 is used as outgroup.The conventions are the same as in Fig. 2.

Fig. 12 .
Fig. 12. Maximum likelihood-based rooted phylogenetic tree of the protein sequences encoded by the PKN1, PKN2, PKN3, PKN, PKC genes.SGK2 is used as outgroup.The conventions are the same as in Fig. 2.

Fig. 13 .
Fig. 13.Species tree representing the evolutionary relationships among taxa under study.The marker symbols on the tree nodes indicate the common ancestral taxon, to which the evolutionary roots of a given gene family under study are traced.

Table 3 . Atavistic genes' association with pathways and processes related to cancer.
PKP3 appeared later in evolution, probably due to PKP4 gene duplication, while PKP1 and PKP2 have emerged even more recently through another series of duplication events in a chondrichthyan PKP1/2 gene and have not accumulated a large number of mutations.The VASP, CTTN and DSP genes are detected during the relatively late transition phase from unicellularity to multicellularity, based on the inferred phylogeny shown in Figs.5,6,7, respectively.