Structural Characterization and Comparative Analyses of the Chloroplast Genome of Eastern Asian Species Cardamine occulta (Asian C. flexuosa With.) and Other Cardamine Species

Background: Cardamine flexuosa has been treated as two separate species in the genus Cardamine based on geographical distribution: European C. flexuosa and Eastern Asian C. flexuosa. The two species show no morphological differences that distinguish them from each other. Recently, the Eastern Asian species has been regarded as Cardamine occulta based on its ecological habitat. Therefore, we were interested in analyzing the C. occulta chloroplast genome and its characteristics at the molecular level. Methods: The complete chloroplast (cp) genome of C. occulta was assembled de novo from next-generation sequencing data, and various bioinformatics tools were applied for comparative studies. Results: The C. occulta cp genome had a quadripartite structure, 154,796 bp in size, consisting of one large single-copy region of 83,836 bp and one small single-copy region of 17,936 bp, separated by two inverted repeat regions (IRa and IRb) of 26,512 bp each. This complete cp genome harbored 113 unique genes, including 80 protein-coding genes (PCGs), 29 tRNA genes, and four rRNA genes. Of these, six PCGs, eight tRNA genes, and four rRNA genes were duplicated in the IR regions, and one gene, infA, was a pseudogene. Comparative analysis showed that all the Cardamine species encoded a small, variable number of repeats and SSRs in their cp genomes. In addition, 56 divergent regions were found in the coding (Pi > 0.03) and non-coding (Pi > 0.10) regions. Furthermore, KA/KS nucleotide substitution analysis indicated that thirteen protein-coding genes are highly diverged and identified 29 amino acid sites under potentially positive selection in these genes. Phylogenetic analyses suggested that C. occulta has a closer genetic relationship to C. fallax, with a strong bootstrap value. Conclusions: The identified hotspot regions could be helpful in developing molecular genetic markers for resolving the phylogenetic relationships and for species validation in the controversial Cardamine clade.


Introduction
Chloroplast genomes have a stable and simple genetic structure, are haploid, and are generally uniparentally transmitted [1]. In plant cells, this organelle is involved in nitrogen fixation, photosynthesis, and the biosynthesis of starch, fatty acids, essential amino acids, and pigments [2,3]. The cp genomes of flowering plants usually have a typical circular structure, 107-280 kb in length, consisting of a large single-copy (LSC) region and a small single-copy (SSC) region separated by two large inverted repeat (IR) regions [4,5]. Owing to the maternal inheritance characteristics, the nucleotide substitution rate of cp genes is lower than that of nuclear genes but higher than that of mitochondrial genes. Nevertheless, the rate of plastome evolution appears to be taxon- and gene-dependent [6]. Consequently, molecular phylogenetic analyses of plants depend strongly on plastome sequence data [7]. Thus far, more than 6100 land plant chloroplast genomes are available in the NCBI organellar genome database and can be used in comparative studies to resolve the phylogeny of controversial clades. In addition, recent studies showed that the cp genome contains various polymorphic regions in both coding and non-coding regions, generated through genomic expansion, contraction, inversion, indels, or genome rearrangement, that can be used widely as effective tools for plant phylogenomic analyses [8].
The genus Cardamine (bittercress) is one of the largest genera of the family Brassicaceae and is distributed widely across all continents except Antarctica [9]. The genus comprises more than 200 species, is exceptionally complex, and remains controversial and unresolved in many respects [10]. Among the Cardamine taxa, Cardamine flexuosa With. is distributed in Europe and Eastern Asia [10], and the European and Eastern Asian taxa show no morphological differences that can be used to distinguish them [11]. Until 2006, the Eurasian taxon C. flexuosa was considered a single species [10]. From 2006 onwards, C. flexuosa was treated as two different species based on their location [12]. Recent studies showed that the two species differ in their ecological habitats and reported that the Eastern Asian C. flexuosa should be treated as C. occulta [9]. Differences were also found in ploidy level: the tetraploid C. flexuosa originated in Europe, whereas the octoploid C. occulta Hornem. is from Eastern Asia and has been introduced to other continents [11,13]. The diploid species C. amaraeformis and C. hirsuta are the parental species of C. flexuosa. In contrast, the tetraploids C. scutata (with the diploid species C. amaraeformis and C. parviflora as parents) and C. kokaiensis (with the diploid species C. parviflora as a parent) are the parental species of C. occulta [9].
Lihova et al. [10] reported that populations of the Eastern Asian species C. occulta were distinguished from the European species C. flexuosa in phylogenetic studies of the internal transcribed spacer (ITS) region of rDNA and the trnL-trnF region of cpDNA. From a biogeographical perspective, the diploid parental species C. amaraeformis is currently absent from Eastern Asia, whereas the tetraploids C. scutata and C. kokaiensis are distributed there [14]. Previous studies suggested that the species morphologically closest to C. amaraeformis are C. torrentis Nakai, C. amariformis Nakai, and C. valida, all present in Eastern Asia [15,16]. Therefore, the diploid C. amaraeformis may have had a significantly broader distribution in the past, reaching easternmost Asia and contributing to multiple polyploidization events there [17]. A previous study characterized the cp genome of C. amaraeformis, a parental species in the ancestry of C. occulta [18]. The present study therefore characterized the complete chloroplast genome sequence of C. occulta (Asian C. flexuosa With.) and carried out phylogenetic analyses to address this issue. Moreover, no extensive comparative studies of the genus Cardamine have been carried out at the whole-plastome level. Therefore, this study compared the cp genome of C. occulta with those of fourteen other Cardamine species and identified hotspot regions that could help develop molecular markers to distinguish the controversial Cardamine species. Overall, this study provides valuable information for understanding the evolutionary relationships of C. occulta within the Cardamine clade.

Annotation of C. occulta Chloroplast Genome
The online program Dual Organellar GenoMe Annotator (DOGMA) was used to annotate the chloroplast genome sequence of C. occulta [23]. The initial annotation, including putative start and stop codons and the intron positions of homologous genes, was refined by comparison with closely related Cardamine species. The transfer RNA genes were confirmed using tRNAscan-SE version 1.21 with default settings [24]. A circular cp genome map of C. occulta was produced using the OrganellarGenomeDRAW (OGDRAW) program [25].

Comparative Chloroplast Genome Analysis of Cardamine Genus
The mVISTA program in Shuffle-LAGAN mode was used to compare the cp genome of C. occulta with those of 14 other closely related Cardamine species, using the C. occulta annotation as a reference [26]. The boundaries between the IR and SC regions of all the Cardamine species were also compared.

Analysis of the Genetic Divergence in the Cardamine cp Genomes
Genetic divergence was investigated by extracting the protein-coding genes, intergenic regions, and intron-containing regions of the 15 Cardamine cp genomes and aligning each region individually using Geneious Prime (Biomatters, New Zealand). The genetic divergence among the Cardamine species was estimated from the nucleotide diversity (π) and the total number of polymorphic sites using DnaSP v5 [27]. In this analysis, gaps and missing data were excluded.
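To make the two divergence statistics concrete, the following is a minimal Python sketch of how π and the number of polymorphic sites can be computed from a small alignment. It is not the DnaSP implementation: the function name `nucleotide_diversity` and the toy sequences are hypothetical, and π is taken here as the mean number of pairwise differences per site after dropping every alignment column that contains a gap or an ambiguous base.

```python
from itertools import combinations


def nucleotide_diversity(seqs):
    """Return (pi, polymorphic_sites, sites_used) for aligned sequences.

    Illustrative sketch only: columns with gaps or non-ACGT symbols are
    discarded in all sequences (complete deletion) before counting.
    """
    if len(seqs) < 2:
        raise ValueError("at least two sequences are required")
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same length")

    valid = set("ACGT")
    columns = [
        col for col in zip(*(s.upper() for s in seqs))
        if all(base in valid for base in col)
    ]
    if not columns:
        raise ValueError("no gap-free columns left in the alignment")

    # A site is polymorphic when more than one base occurs in the column.
    polymorphic = sum(1 for col in columns if len(set(col)) > 1)

    # Count differences over every pair of sequences at every kept column.
    n_pairs = len(seqs) * (len(seqs) - 1) // 2
    differences = sum(
        1 for col in columns for a, b in combinations(col, 2) if a != b
    )
    pi = differences / (n_pairs * len(columns))
    return pi, polymorphic, len(columns)


if __name__ == "__main__":
    toy_alignment = ["ATGCA-GT", "ATGTAAGT", "ATGCAAGA"]  # hypothetical data
    print(nucleotide_diversity(toy_alignment))
```

In a region-by-region analysis, a function of this kind would be applied to each aligned gene, intron, or intergenic spacer separately, so that the regions can be ranked by π.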

Characterization of the Substitution Rates of Cardamine cp Genomes
The cp genome of C. occulta was compared with the cp genomes of the other 14 Cardamine species to determine the synonymous (KS) and non-synonymous (KA) substitution rates. The exons of each functional protein-coding gene were extracted from these genomes and aligned separately using Geneious Prime (Biomatters, New Zealand). The aligned sequences were translated into protein sequences, and the KA and KS substitution rates were evaluated using DnaSP, excluding stop codons [27].

Positive Selection Analysis
Positive selection analysis was carried out based on the substitution analysis of the Cardamine species. A site-specific model was applied to estimate the ratio of non-synonymous (KA) to synonymous (KS) substitutions for thirteen protein-coding genes (atpB, ccsA, cemA, matK, ndhA, ndhD, ndhF, ndhG, ndhJ, petA, petD, rps16, and ycf2) of all Cardamine species using EasyCodeML [28]. The sequences of each of the thirteen protein-coding genes were aligned separately using the MAFFT program, and a maximum likelihood phylogenetic tree was constructed using RAxML v. 7.2.6 [29]. The codon substitution models M0, M1a, M2a, M3, M7, M8, and M8a were analyzed. Likelihood ratio tests (LRTs) were performed under the site-specific model to detect positively selected sites using four comparisons: M0 (one-ratio) vs. M3 (discrete), M1a (neutral) vs. M2a (positive selection), M7 (β) vs. M8 (β and ω > 1), and M8a (β and ω = 1) vs. M8 [28]. The LRT of each comparison was used to evaluate the strength of selection, and Chi-square (χ2) p-values < 0.05 were considered significant. If the LRT p-value was significant (<0.05), the Bayes Empirical Bayes (BEB) method was used to identify the codons under positive selection. BEB posterior probabilities higher than 0.95 and 0.99 indicate sites possibly under positive selection and under highly positive selection, marked with single and double asterisks, respectively.
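The likelihood ratio test described above can be illustrated with a short Python sketch. It is not EasyCodeML's code: the function names, the log-likelihood values in the example, and the degrees of freedom in `LRT_DF` are assumptions (the values 4, 2, 2, and 1 are those conventionally used for these nested site-model pairs). The statistic is 2ΔlnL, twice the difference between the log-likelihoods of the alternative and null models, and it is compared against a χ2 distribution.

```python
import math

# Conventional degrees of freedom for each nested comparison (assumed here).
LRT_DF = {("M0", "M3"): 4, ("M1a", "M2a"): 2, ("M7", "M8"): 2, ("M8a", "M8"): 1}


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable (df = 1 or even df)."""
    if x <= 0:
        return 1.0
    if df == 1:
        return math.erfc(math.sqrt(x / 2.0))
    if df % 2 == 0:
        # Closed form for df = 2k: exp(-x/2) * sum_{i<k} (x/2)^i / i!
        half = x / 2.0
        term = 1.0
        total = 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    raise ValueError("only df = 1 or even df are supported in this sketch")


def likelihood_ratio_test(lnl_null, lnl_alt, df):
    """Return (2*delta_lnL, p_value) for a pair of nested models."""
    statistic = max(2.0 * (lnl_alt - lnl_null), 0.0)
    return statistic, chi2_sf(statistic, df)


if __name__ == "__main__":
    # Hypothetical log-likelihoods for one gene under M7 (null) and M8.
    stat, p = likelihood_ratio_test(-1000.0, -995.0, LRT_DF[("M7", "M8")])
    print(f"2*dlnL = {stat:.3f}, p = {p:.5f}, significant = {p < 0.05}")
```

Only when the p-value falls below 0.05 would the BEB posterior probabilities of the alternative model be inspected for individual codons.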

Repeat Sequences and Simple Sequence Repeats (SSR) Analysis of Cardamine Genus
The program REPuter was used to identify repeat sequences in the Cardamine cp genomes, including forward, reverse, palindromic, and complementary repeats [30]. The following parameters were used to detect repeats in REPuter: (1) a Hamming distance of 3, (2) a minimum sequence identity of 90%, and (3) a repeat size of more than 30 bp. In addition, Phobos software v1.0.6 was used to find SSRs in the Cardamine cp genomes; the parameters for match, mismatch, gap, and N positions were set at 1, -5, -5, and 0, respectively [31]. Only one IR region was used in the repeat and SSR marker analyses.
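The four repeat types and the Hamming-distance threshold can be illustrated with a small Python sketch. This is not REPuter's algorithm, which searches a whole genome efficiently; the hypothetical function `classify_repeat` only checks how two already-extracted, equal-length segments relate to each other, allowing at most three mismatches as in the parameters above.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("segments must have the same length")
    return sum(x != y for x, y in zip(a, b))


def classify_repeat(first, second, max_hamming=3):
    """Return the repeat type relating two segments, or None.

    forward:     second matches first as written
    reverse:     second matches first when read backwards
    complement:  second matches first after base complementing
    palindromic: second matches first after reverse complementing
    The types are tried in this order, so the first match wins.
    """
    first, second = first.upper(), second.upper()
    candidates = {
        "forward": second,
        "reverse": second[::-1],
        "complement": second.translate(COMPLEMENT),
        "palindromic": second[::-1].translate(COMPLEMENT),
    }
    for kind, transformed in candidates.items():
        if hamming(first, transformed) <= max_hamming:
            return kind
    return None


if __name__ == "__main__":
    unit = "A" * 20 + "C" * 10  # hypothetical 30-bp segment
    partner = unit[::-1].translate(COMPLEMENT)  # its reverse complement
    print(classify_repeat(unit, partner))
```

For a 30-bp repeat, three mismatches correspond to the 90% minimum identity quoted above.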

Phylogenetic Tree Analysis of Brassicaceae
This study used the cp genomes of 38 Brassicaceae species and two outgroup species for phylogenetic analysis based on five separate data sets: 68 homologous CDSs, the LSC, SSC, and IR regions, and the whole genomes. The 39 complete cp genome sequences were downloaded from the NCBI Organelle Genome Resource database (Supplementary Table 1). For ML analysis, the aligned protein-coding gene sequences were saved in PHYLIP format using Clustal X v2.1. Phylogenetic analysis was performed using the maximum likelihood (ML) method with the GTRGAMMA model in RAxML v. 8.2.X with 1000 bootstrap replications [29]. The same five data sets were also analyzed by Bayesian Markov chain Monte Carlo (MCMC) inference using MrBayes v3.2.6 [32,33] in Geneious Prime v2022.0.2, with the HKY85 substitution model and the gamma model of rate variation.

General Characteristics of the Cardamine occulta Chloroplast Genome
The complete Cardamine occulta chloroplast genome showed a quadripartite structure of 154,796 bp, including a small single-copy (SSC) region of 17,936 bp and a large single-copy (LSC) region of 83,836 bp, which were separated by a pair of inverted repeats (IRa and IRb) of 26,512 bp each (Fig. 1; Table 1). The average GC content of the cp genome was 36.3%. The IR regions had the highest GC content (42.4%), followed by the LSC (34%) and SSC regions (29.2%). The C. occulta cp genome encoded 113 unique genes: 80 protein-coding genes, 29 tRNAs, and four rRNAs. Among the 113 genes, fourteen contained one intron (eight protein-coding and six tRNA genes), and three contained two introns (clpP, ycf3, and rps12). The rps12 gene was trans-spliced, with its 5'-end exon located in the LSC region and its intron and 3'-end exon duplicated in the IR regions. In addition, 18 genes were duplicated in the IR regions (Supplementary Table 2).
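The reported figures can be cross-checked with simple arithmetic: the genome length should equal LSC + SSC + 2 × IR, and the gene categories should add up to the stated totals. The Python sketch below does this with the values given in this paper (the per-category numbers of IR-duplicated genes are those stated in the abstract); the function and variable names are hypothetical.

```python
# Region lengths (bp) and gene counts as reported in the text.
LSC_BP = 83_836
SSC_BP = 17_936
IR_BP = 26_512  # length of each of the two inverted repeats

UNIQUE_GENES = {"protein_coding": 80, "tRNA": 29, "rRNA": 4}
DUPLICATED_IN_IR = {"protein_coding": 6, "tRNA": 8, "rRNA": 4}


def summarize_plastome(lsc, ssc, ir, unique, duplicated):
    """Derive totals from the region lengths and per-category gene counts."""
    genome_length = lsc + ssc + 2 * ir
    unique_total = sum(unique.values())
    duplicated_total = sum(duplicated.values())
    return {
        "genome_length": genome_length,
        "unique_genes": unique_total,
        "duplicated_genes": duplicated_total,
        "genes_including_duplicates": unique_total + duplicated_total,
        "ir_fraction": 2 * ir / genome_length,
    }


if __name__ == "__main__":
    summary = summarize_plastome(
        LSC_BP, SSC_BP, IR_BP, UNIQUE_GENES, DUPLICATED_IN_IR
    )
    for key, value in summary.items():
        print(f"{key}: {value}")
```

With these inputs, the derived genome length is 154,796 bp, and the 113 unique genes plus 18 IR duplicates give 131 genes in total, matching the values reported in the text.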

Comparative Analysis of the Species of Cardamine Genera
The LSC-IRb and SSC-IRa borders of the C. occulta cp genome were compared with those of the other fourteen Cardamine species (Fig. 2). An intact copy of the rps19 gene spanned the LSC/IRb border in all Cardamine species and extended 106 bp to 135 bp into the IRb region, so that the rpl2 gene was located in the IRb region. Similarly, the ycf1 pseudogene and ndhF were present at the IRa/SSC border of all the Cardamine cp genomes and overlapped each other; the overlap of these two coding regions ranged from 30 to 192 bp. In all the Cardamine cp genomes, the SSC-IRb junction contained the full-length ycf1 gene, whereas the IRa/LSC junction contained the fragmented rps19 and trnH genes.

[Figure caption, fragment] ...and ribosomal RNA genes. The dashed, dark grey area in the inner circle represents the GC content, and the light grey area represents the genome AT content. LSC, large single-copy; SSC, small single-copy; IR, inverted repeat.

Divergence Analysis of cp Sequence and High Variation Region of Cardamine Genera
Genome-wide comparative analyses of the fifteen Cardamine cp genomes were performed using mVISTA to estimate the level of sequence divergence. The cp genomes displayed strong sequence similarity, indicating that the plastomes are highly conserved (Fig. 3). The coding regions and IR regions were more conserved than the non-coding regions and single-copy regions, with low variation among the Cardamine species.

Synonymous (KS) and Non-Synonymous (KA) Substitution Rate Analysis
Synonymous and non-synonymous substitution rates were calculated for 74 protein-coding genes of the fifteen Cardamine cp genomes. The KA/KS ratio of most of the protein-coding genes was less than 1, except for the following protein-coding genes: accD ranged from 0 to 1.3235, atpB

Selective Pressure Events in the cp Genome of Cardamine Genera
The selective pressure on thirteen protein-coding genes of the fifteen Cardamine species was analyzed based on the substitution rate: four NADH-dehydrogenase subunit genes (ndhA, ndhD, ndhG, and ndhI), two cytochrome subunit genes (petA and petD), one small ribosomal subunit gene (rps16), one ATP synthase subunit gene (atpB), and accD, ccsA, cemA, matK, and ycf2. A protein-coding gene was considered to be under positive selection if its KA/KS ratio was >1.0 between two cp genomes or across all the genomes. The ω2 values of the thirteen genes ranged from 1.0 to 234.47818 in the M2a model (Supplementary Table 3). Furthermore, Bayes empirical Bayes (BEB) analysis under the M7 vs. M8 comparison was applied to locate the sites under selection in the thirteen protein-coding genes. It identified seven sites under potentially positive selection in four protein-coding genes (ccsA, 2; matK, 2; ndhF, 2; and petA, 1) with posterior probabilities greater than 0.95, and 22 sites (ccsA, 2; cemA, 1; matK, 1; ndhA, 2; ndhF, 2; ndhG, 1; ndhI, 10; petA, 1; petD, 1; ycf2, 2) with posterior probabilities greater than 0.99; the 2ΔlnL values ranged from 0.328019 to 455.6721 (Table 3). On the other hand, no positively selected sites were detected in atpB, ndhD, and rps16.

Repeat Structure and SSRs Analysis
Repeat sequences were examined in the fifteen Cardamine plastomes. In total, 691 repeat sequences, comprising forward, reverse, complement, and palindromic repeats, were observed. Forward (322; 46.6%) and palindromic repeats (327; 47.3%) were the most common, whereas reverse and complement repeats were comparatively rare (21 each; 3.04%) (Fig. 6a). Complement repeats were absent in C. amaraeformis, C. enneaphyllos, C. hirsuta, and C. parviflora. Similarly, reverse repeats were absent in the C. occulta cp genome, and both reverse and complement repeats were absent in C. impatiens and C. oligosperma. In addition, the length of the repeats (>30 bp) was analyzed; repeat sizes among the fifteen plastomes varied from 30 to 87 bp, and most repeats (472; 70.97%) were 30-39 bp in size (Fig. 6b).
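The percentages reported for the four repeat types follow directly from the counts. As a small worked example in Python (the function name `repeat_shares` is hypothetical; the counts are those given in the text), each share is the type's count divided by the total number of repeats.

```python
# Repeat counts per type across the fifteen plastomes, as reported in the text.
REPEAT_COUNTS = {"forward": 322, "palindromic": 327, "reverse": 21, "complement": 21}


def repeat_shares(counts):
    """Return the total count and each type's share of it in percent."""
    total = sum(counts.values())
    shares = {kind: 100.0 * n / total for kind, n in counts.items()}
    return total, shares


if __name__ == "__main__":
    total, shares = repeat_shares(REPEAT_COUNTS)
    print(f"total repeats: {total}")
    for kind, pct in shares.items():
        print(f"{kind:12s} {REPEAT_COUNTS[kind]:4d} {pct:6.2f}%")
```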

Phylogenetic Analysis
ML and MrBayes analyses were performed separately to determine the phylogenetic position of C. occulta precisely. Five individual data sets from 40 cp genome sequences (68 concatenated protein-coding genes; the LSC, SSC, and IR regions; and the whole genome) were used to infer the phylogenetic relationships among closely related species of Brassicaceae. All five data sets yielded similar trees in both the ML (Fig. 7; Supplementary Figs. 1-4) and Bayesian analyses (Supplementary Figs. 5-9). All the phylogenetic trees showed that the Cardamine species formed a monophyletic group. The tree topology showed that C. occulta is closely related to C. fallax, with strong support (100% bootstrap for ML and a posterior probability of 1.0 for MrBayes) (Fig. 7). Within the Cardamine clade, C. pentaphyllos and C. kitaibelii are the basal groups. The remaining Cardamine species were divided into two clades: C. bulbifera, C. quinquefolia, C. impatiens, C. glanduligera, C. macrophylla, C. oligosperma, and C. hirsuta formed one clade, and the other clade consisted of C. occulta, C. fallax, C. amaraeformis, C. parviflora, C. enneaphyllos, and C. resedifolia, with a 78% bootstrap value.

Discussion
The species Cardamine occulta is distributed predominantly in Eastern Asia [9]. It closely resembles the European species C. flexuosa, and the two were long considered a single species [11] because they show no morphological differences. Since 2006, the two species have been differentiated based on their ecological habitats [9-11,13,14]. Cardamine is a large genus in the Brassicaceae family of flowering plants that contains more than 200 species of annuals and perennials [9]. Thus far, fourteen Cardamine chloroplast genomes have been sequenced and analyzed, but no extensive comparative studies of the genus have been carried out. Therefore, the present study sequenced the whole plastid genome of C. occulta, a controversial species, from South Korea using the Illumina HiSeq 2500 platform and characterized it. Comparative studies were carried out with fourteen other Cardamine species. The complete chloroplast genome sequence of C. occulta is 154,796 bp long and contains 131 individual genes, which is within the range of other Cardamine species. The GC content of C. occulta is 36.3%, similar to that of all other Cardamine species, suggesting that the GC content of the Cardamine cp genomes is consistent and highly conserved. The overall genomic structure of C. occulta, such as the gene order and gene number, is identical to that of other Cardamine and Brassicaceae species, except for the length of the atpB gene in C. amaraeformis. The Cardamine plastomes were conserved, and no rearrangement events were found. All the Cardamine species have lost the protein-coding gene initiation factor A (infA) from their cp genomes. This gene has been lost independently in multiple angiosperm lineages, including other species within the Brassicaceae. This gene loss might have been due to an interruption of the nuclear-encoded DNA replication, recombination, and repair machinery that controls the cp genome and the evolution of the plant organelle genome [34].
The results of the mVISTA analyses revealed high levels of similarity among the plastomes, indicating low divergence between the C. occulta plastome and those of the other Cardamine species. Furthermore, lower sequence divergence was detected in the IR regions than in the SC regions, as previously reported [8,35-37]. One conceivable reason is that, in the cp genome, which has multiple copies per cell, gene conversion with a slight bias against new mutations would reduce the mutation load in the two IR regions much more efficiently than in the single-copy regions because of the duplicated nature of the IRs [38-41]. The expansion and contraction of the IR and single-copy junction regions are considered the leading mechanism driving variation in the size of angiosperm plastomes, playing a vital role in their evolution [38,42-44]. The present study did not identify any significant expansion or contraction in the IR/SC regions. Previous studies reported that the size of the whole cp genome does not always vary with the expansion or contraction of IRs [45-50].
Comparative studies of the fifteen Cardamine cp genome sequences revealed several regions of sequence polymorphism. Most of the sequence variation was dispersed in the LSC and SSC regions, whereas the IR regions displayed relatively low sequence variation. The lower sequence divergence of the IR regions compared to the SC regions in Cardamine species and other plants may be due to copy correction between the IR sequences during gene conversion. Mutations and rearrangements are not distributed evenly across the cp genome. Instead, they are concentrated in hypervariable regions, which are regarded as hotspots and can serve as specific molecular markers [51]. The present study identified 27 protein-coding and 29 intron and intergenic hypervariable regions. Among these, the most variable regions, namely the protein-coding genes (>0.050; ccsA, matK, ndhF, rps16, and rpl32) and the intron and intergenic regions (>0.150; rpl32-trnL, trnH-psbK, trnG-trnR, trnF-ndhJ, rpl1-rpl36, and rps15-ycf1), could be used for DNA barcoding and molecular phylogenetic studies in the Cardamine clade (Table 2).
This study analyzed the substitution rate in the fifteen Cardamine species. C. occulta was used as the reference genome and compared with the other cp genomes. Initially, the substitution rates of all the individual protein-coding genes of the fifteen Cardamine species were averaged. The results showed that the averaged KA/KS ratio of all the protein-coding genes was less than 1. In addition, the synonymous and non-synonymous substitutions of all the protein-coding genes were analyzed individually (Fig. 5a,b). The results showed that the IR regions had a lower substitution rate than the SC regions. Furthermore, a gene is considered to be under positive selection if its KA/KS (ω) ratio is >1.0 between two cp genomes or across all the genomes. On this basis, this study identified thirteen protein-coding genes of the fifteen Cardamine species that were under selective pressure: accD, atpB, ccsA, cemA, matK, ndhA, ndhD, ndhG, ndhI, petA, petD, rps16, and ycf2. These genes fell into six groups related to photosynthesis or to transcription and translation: (1) subunits of ATP synthase (atpB); (2) chloroplast envelope membrane protein (cemA); (3) subunits of NADH dehydrogenase (ndhA, ndhD, ndhG, and ndhI); (4) subunits of the cytochrome b/f complex (petA and petD); (5) a small ribosomal subunit gene (rps16); and (6) other genes, namely a maturase gene (matK), a subunit of the acetyl-CoA carboxylase gene (accD), a cytochrome synthesis gene (ccsA), and a gene of unknown function (ycf2). This deviation from unity was attributed to indel and amino acid substitution events in the protein-coding genes of the Cardamine species.
Hence, selection analysis of the exons of thirteen protein-coding genes across the publicly available Cardamine chloroplast genomes was performed to understand the selective pressure, using site-specific models with four model comparisons and LRTs. The positive selection model, M2a, showed that the ω2 values of seven genes ranged from 1.0 to 234.47818 (Supplementary Table 3). Furthermore, BEB analysis showed that seven sites in four protein-coding genes (ccsA, matK, ndhF, and petA) are under potentially positive selection with posterior probabilities of more than 0.95, and 22 sites with posterior probabilities greater than 0.99 (Table 3). Nevertheless, no positively selected sites could be determined in the atpB, ndhD, and rps16 genes, even though some Cardamine species have higher ω values for these genes. Overall, the nucleotide diversity results show that hotspot mutations in 11 protein-coding genes (accD, atpB, ccsA, cemA, matK, ndhA, ndhD, ndhG, ndhI, petD, and rps16) were acquired at a significantly higher rate than expected under neutrality, suggesting that the hotspot mutations are the result of positive selection. The occurrence of hotspot mutations is strong evidence of positive selection, showing that the substitution of a specific amino acid offers an adaptive benefit under specific conditions [52,53]. Previous studies reported that genes under strong positive selection could play a major role in the plant genetic system or in photosynthesis [54-58]. Moreover, the thirteen genes of the fifteen Cardamine species have undergone positive selection, which might be a consequence of adaptation to their diverse habitats. Finally, the highly variable coding and non-coding regions and the thirteen protein-coding genes found to be under positive selection in the fifteen Cardamine cp genomes could be used to develop molecular markers for phylogenetic and genomic studies or as candidates for DNA barcoding in future studies.
Repeat units are distributed at high frequency in chloroplast genomes and play a substantial role in their evolution [59-62]. The fifteen Cardamine plastomes contained a variable number of each repeat type. Liu et al. [60] reported that variation in the number and type of repeats plays a significant role in plastome structural organization, but found no correlation between these large repeat regions and rearrangement endpoints. In addition, microsatellite repeats are widely present in plastomes, exhibit a high level of polymorphism, and are used as molecular markers in genetic studies [63,64]. Simple sequence repeats (SSRs) play a major role in genome rearrangement and recombination [65]. The content of the different SSRs and their distribution across the chloroplast regions were similar among the Cardamine species, and the distribution of SSRs in the Cardamine plastomes was not associated with any genome rearrangement. On the other hand, the repeat sequences in the cp genomes of the Cardamine species could be helpful for developing lineage-specific markers for genetic diversity and evolutionary studies.
The whole chloroplast genome of a plant offers a solid foundation for evolutionary, taxonomic, and phylogenetic studies [58,66-71]. Both the ML and Bayesian phylogenetic analyses in the current study showed that the Cardamine species formed a monophyletic clade. The Cardamine clade is subdivided into two clades, and C. occulta clusters with C. fallax with a high bootstrap value. On the other hand, cytogenetic studies showed that the tetraploids C. scutata (with the diploid species C. amaraeformis and C. parviflora as parents) and C. kokaiensis (with the diploid species C. parviflora as a parent) are the parental species of C. occulta [9,11,17]. In contrast, the diploid species C. amaraeformis and C. hirsuta are the parental species of C. flexuosa [9]. Moreover, C. fallax was suggested to be hexaploid. Based on the DNA sequence data of C. fallax, it was postulated that this species or its diploid progenitors might have influenced the origin of C. occulta. Nevertheless, the cp genomes of C. flexuosa (the European species), C. kokaiensis, and C. scutata need to be included in future studies to understand their phylogenetic position and their relationship with the other Cardamine species.

Conclusions
The complete chloroplast genome of Cardamine occulta was sequenced, assembled, and analyzed in the present study, providing valuable genomic resources for the genus Cardamine. Overall, the gene content and arrangement were similar and highly conserved among the Cardamine species. Comparative analyses of the chloroplast genomes identified variable regions with potential application as species-specific DNA barcodes. Furthermore, thirteen protein-coding genes have diverged widely and are under potentially positive selection, which may result from adaptation to the ecosystem. Finally, both ML and Bayesian phylogenetic analyses of various cp data sets show that C. occulta has a closer genetic relationship to C. fallax. In conclusion, this study will facilitate future research, particularly on resolving the controversial Cardamine clade. Nevertheless, in future studies, the cp genomes of C. flexuosa (the European species), C. kokaiensis, and C. scutata need to be incorporated to understand their phylogenetic position and their relationship with C. occulta.

Data Availability
The genome sequence data that support the findings of this study are openly available in NCBI GenBank (https://www.ncbi.nlm.nih.gov) under the accession number MZ043777. The associated BioProject, SRA, and BioSample numbers are PRJNA738458, SRR14833115, and SAMN19729360, respectively.

Abbreviations
cp, chloroplast; LSC, large single-copy; SSC, small single-copy; IR, inverted repeat; tRNA, transfer RNA; rRNA, ribosomal RNA; KS, synonymous substitution; KA, non-synonymous substitution; ω, non-synonymous vs. synonymous ratio; SSR, simple sequence repeats; LRT, likelihood ratio test; π, nucleotide diversity.

Author Contributions
GR and SJP designed the research study. GR performed the research, analyzed the data, and prepared a manuscript draft and figures. All authors contributed to editorial changes in the manuscript. All authors read and approved the final manuscript.

Ethics Approval and Consent to Participate
Not applicable.