Assembly and Characterization of the Mitochondrial Genome of Fallopia aubertii (L. Henry) Holub

Background: Fallopia aubertii (L. Henry) Holub is a perennial semi-shrub with both ornamental and medicinal value. The mitochondrial genomes of plants contain valuable genetic traits that can be utilized for the exploitation of genetic resources. Characterizing the F. aubertii mitochondrial genome can provide insight into the role of mitochondria in plant growth and development, metabolism regulation, evolution, and response to environmental stress. Methods: In this study, we sequenced the mitochondrial genome of F. aubertii using the Illumina NovaSeq 6000 platform and Nanopore platform. We conducted a comprehensive analysis of the mitochondrial genome of F. aubertii, examining gene composition, repetitive sequences, RNA editing sites, phylogeny, and organelle genome homology. To achieve this, we employed several bioinformatics methods, including sequence alignment analysis, repetitive sequence analysis, and phylogenetic analysis. Results: The mitochondrial genome of F. aubertii has 64 genes, including 34 protein-coding genes (PCGs), three rRNAs, and 27 tRNAs. There were 77 short tandem repeat sequences detected in the mitochondrial genome, five tandem repeat sequences identified by Tandem Repeats Finder (TRF), and 50 scattered repeat sequences observed, including 22 forward repeat sequences and 28 palindrome repeat sequences. A total of 367 RNA editing sites were predicted in PCGs, with the highest number (33) found within ccmB. Ka/Ks values estimated for mitochondrial genes of F. aubertii and three closely related species representing Caryophyllales were less than 1 for most of the genes. The maximum likelihood evolutionary tree showed that F. aubertii and Nepenthes × ventrata are most closely related. Conclusions: In this study, we obtained basic information on the mitochondrial genome of F. aubertii, investigated repeat sequences and homologous segments, predicted RNA editing sites, and utilized the Ka/Ks ratio to estimate the selection pressure on mitochondrial genes of F. aubertii. We also discussed the systematic evolutionary position of F. aubertii based on mitochondrial genome sequences. Our study revealed variations in the sequence and structure of mitochondrial genomes in Caryophyllales. These findings are of great significance for identifying and improving valuable plant traits and serve as a reference for future molecular studies of F. aubertii.


Introduction
The mitochondrial genome, which is circular or linear DNA [1,2], usually contains dozens of genes that encode proteins and RNA molecules involved in the regulation of mitochondrial function and morphology. The primary function of mitochondria is the production of adenosine triphosphate (ATP), which provides energy to cells [3]. Mitochondrial DNA plays a crucial role in metabolic processes like respiration and photosynthesis, which have a direct or indirect impact on the growth and development of organisms [4]. However, mutations, deletions, or insertions in mitochondrial genes can alter this effect [5][6][7]. Characterizing the mitochondrial genes of higher plants can provide insight into the role of mitochondria in plant growth and development, metabolism regulation, and response to environmental stress [8,9]. Similarly, the mitochondrial genome contains valuable information that can be utilized for the development of molecular markers, genetic engineering, and elucidation of the phylogenetic and evolutionary connections among plant species [10,11].
Fallopia aubertii (L. Henry) Holub is a perennial semi-shrub or deciduous vine that is widely distributed in Asia, Europe, and North America [12][13][14]. It is a heliophilous plant with strong adaptability but is intolerant of shade or water-logging [15], and is commonly found on slopes, riverbanks, forest edges, and beaches. Regarding population ecology, F. aubertii is a relatively dispersed species that is malleable and adaptable to various environments. In recent years, researchers have studied the growth and development, genetic variation, chemical constituents, and pharmacodynamic effects of F. aubertii [14,16,17]. Our research team has isolated multiple compounds from F. aubertii, studied its anti-gout efficacy, and sequenced and annotated its chloroplast genome [18][19][20]. In this study, we utilize high-throughput sequencing technology and bioinformatics methods to analyze the mitochondrial genome features of F. aubertii. In addition, we reconstruct the phylogenetic relationships among F. aubertii and its closest relatives by using the mitochondrial genome sequence. The current study provides a scientific foundation and a point of reference for the exploitation and development of the species' resources, as well as the preservation of its biological diversity.

Sample Collection and DNA Sequencing
The sample of F. aubertii was collected from the campus of Qinghai Minzu University in Qinghai, China (36.59° N, 101.82° E) and identified by Professor Yong-chang Lu.
The fresh young leaves were dried in silica gel and stored at -20 °C. Genomic DNA was extracted from the whole sample of F. aubertii using the modified CTAB method and was subsequently evaluated for purity and integrity using agarose gel electrophoresis and a NanoDrop 2000c spectrophotometer (NanoDrop Technologies, Wilmington, DE, USA). The qualified DNA samples were sent to Shanghai Origingene Biotechnology Pharmaceutical Technology Co., Ltd., where the genomic DNA was sequenced using the Illumina NovaSeq 6000 platform (BIOZERON Co., Ltd., Shanghai, China) and Nanopore platform (Oxford Nanopore Technologies, Oxford, UK).

Assembly and Annotation of Mitochondrial Genome
We used GetOrganelle v1.7.1 (Max Planck Institute of Molecular Plant Physiology, Munich, Germany) [21] to perform de novo assembly of the mitochondrial genome. The mitochondrial genome of Bougainvillea spectabilis (GenBank Accession Number: MW167296), a closely related species, was used as the reference sequence. To select the potential mitochondrial reads, we used BLAST searches against the mitogenome of B. spectabilis and the GetOrganelle results from a pool of Illumina reads. The mitochondrial Illumina reads were assembled into ∼50 mitogenome contigs using SPAdes v3.14.1 (Algorithmic Biology Lab, St. Petersburg Academic University of the Russian Academy of Sciences, St Petersburg, Russia) [22]. Nanopore reads were aligned against the GetOrganelle and SPAdes assembled scaffolds using BWA v0.7.1 (Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Cambridge, UK) [23]. The aligned Nanopore reads were extracted to perform de novo assembly of the mitochondrial genome using Canu v2.0, and Pilon v1.23 (https://github.com/broadinstitute/pilon/; Broad Institute of MIT and Harvard, Cambridge, MA, USA) [24] was then used for error correction. The genes were predicted based on the method of homology alignment prediction. The coding genes, tRNAs, rRNAs, and possible pseudogenes were annotated by BLAST+ 2.7.1 and tRNAscan-SE v2.0.7 (Todd Lowe Lab, Dept. of Biomolecular Engineering, School of Engineering University of California, Santa Cruz, CA, USA) [25]. tRNAscan-SE was also used to predict the secondary structure of tRNAs in F. aubertii. The mitochondrial genome map was drawn using OGDRAW (https://chlorobox.mpimpgolm.mpg.de/OGDraw.html) [26].

Analysis of RNA Editing
To predict the RNA-editing sites within the mitochondrial genes of F. aubertii, the online tool PmtREP (http://www.genepioneer.com/) was used, with a threshold value of 0.2 for the mitochondrial sequences.

Analysis of GC Content and GC Skew
We applied the cloud platform (http://www.genepioneer.com/) to analyze the GC content of the F. aubertii mitochondrial coding sequence (CDS). To visualize the GC skew, we uploaded the CDS of F. aubertii to the online Proksee [30] tool and selected the "GC Content" and "GC Skew" tabs.
tetragonoides (MW971440.1)) was calculated. The homologous protein-coding sequences were obtained by comparing protein-coding sequences among mitochondrial genomes of F. aubertii and its closely related species to find the best match using BLAST [31]. Mafft v7.427 (Research Institute for Microbial Diseases, Suita, Osaka, Japan) [32] (https://mafft.cbrc.jp/alignment/server/) was used for homologous protein sequence alignment, whereas a Perl script was utilized to map the aligned protein sequences to the coding sequence to obtain an aligned

Phylogenetic Analysis
A phylogenetic tree was constructed using the mitochondrial genome sequences of F. aubertii, Malania oleifera (Olacaceae, Santalales), and 15 other species from the order Caryophyllales, with M. oleifera used as an outgroup. We extracted the shared coding sequences from the 17 genomes mentioned above and used BLAST to identify homologous sequences. The resulting sequences were aligned using Mafft and then concatenated. A trimming percentage of 0.7 was set using the trim function to remove any poorly aligned bases. The maximum likelihood phylogenetic tree was constructed using MEGA 7 [34], with GTR+F+R3 selected as the substitution model. Bootstrap support values for each branch were determined from 1000 replicates.
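The concatenation and trimming steps described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function names are hypothetical, and it assumes that the trimming value of 0.7 means a column is kept only when at least 70% of its sequences have a non-gap character. The source does not define the threshold precisely, so that reading is an assumption.

```python
from typing import Dict, List


def concatenate(alignments: List[Dict[str, str]]) -> Dict[str, str]:
    """Join per-gene alignments end to end; every alignment must hold the same taxa."""
    taxa = list(alignments[0])
    for aln in alignments:
        if set(aln) != set(taxa):
            raise ValueError("taxa differ between alignments")
    return {t: "".join(aln[t] for aln in alignments) for t in taxa}


def trim_columns(alignment: Dict[str, str], min_occupancy: float = 0.7) -> Dict[str, str]:
    """Keep columns whose fraction of non-gap characters is >= min_occupancy.

    Assumption: this is one plausible reading of a 'trimming percentage of 0.7'.
    """
    names = list(alignment)
    length = len(alignment[names[0]])
    if any(len(alignment[n]) != length for n in names):
        raise ValueError("all aligned sequences must have the same length")
    keep = []
    for i in range(length):
        column = [alignment[n][i] for n in names]
        occupancy = sum(c != "-" for c in column) / len(column)
        if occupancy >= min_occupancy:
            keep.append(i)
    return {n: "".join(alignment[n][i] for i in keep) for n in names}


# Toy alignments for two hypothetical genes across three hypothetical taxa.
gene_a = {"sp1": "ATG-CA", "sp2": "ATGTCA", "sp3": "AT--CA"}
gene_b = {"sp1": "GGC", "sp2": "GGT", "sp3": "GGC"}
supermatrix = trim_columns(concatenate([gene_a, gene_b]))
print(supermatrix)
```

In the toy data, the two gapped columns of the first gene fall below the 0.7 occupancy threshold and are dropped, while all columns of the second gene are retained.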

Analysis of Homologous Fragments of Chloroplast-Mitochondrial Genomes
A BLAST search was conducted between the mitochondrial and chloroplast genomes of F. aubertii to identify homologous regions. The resulting homologous segments were then compared and analyzed for sequence similarity, number of protein-coding genes, length, and composition of intergenic regions.
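As an illustration of how homologous segments might be collected from such a search, the sketch below parses BLAST tabular output (the standard 12-column format 6) and keeps hits above identity and length cut-offs. The cut-off values, the sequence labels "mt" and "cp", and the example hits are all hypothetical; the source does not report the filtering criteria that were used.

```python
from typing import List, NamedTuple


class Hit(NamedTuple):
    query: str
    subject: str
    identity: float  # percent identity
    length: int      # alignment length
    q_start: int
    q_end: int
    s_start: int
    s_end: int


def parse_hits(lines: List[str], min_identity: float = 80.0, min_length: int = 30) -> List[Hit]:
    """Parse tabular BLAST lines and keep hits passing both thresholds.

    Assumption: thresholds are illustrative, not taken from the study.
    """
    hits = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        f = line.split("\t")
        hit = Hit(f[0], f[1], float(f[2]), int(f[3]),
                  int(f[6]), int(f[7]), int(f[8]), int(f[9]))
        if hit.identity >= min_identity and hit.length >= min_length:
            hits.append(hit)
    return hits


def total_aligned_length(hits: List[Hit]) -> int:
    """Sum of alignment lengths; overlapping hits are not merged in this sketch."""
    return sum(h.length for h in hits)


# Hypothetical hits between a mitochondrial query and a chloroplast subject.
example = [
    "mt\tcp\t98.50\t200\t3\t0\t1000\t1199\t500\t699\t1e-90\t350",
    "mt\tcp\t75.00\t120\t30\t0\t5000\t5119\t900\t1019\t1e-10\t80",
    "mt\tcp\t100.00\t25\t0\t0\t7000\t7024\t100\t124\t1e-5\t47",
]
kept = parse_hits(example)
print(len(kept), total_aligned_length(kept))
```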

Basic Features of the Mitochondrial Genome
The second-generation sequencing platform yielded a data set of 4645.7 Mb of raw reads and 4432 Mb of clean reads, whereas the third-generation sequencing platform generated 21,993 reads with a total size of 67 Mb.
The genome (NCBI accession number: MW664926.1) obtained in this study is a circular DNA molecule with a length of 350,156 bp (Fig. 1), and a GC content of 44.73%. The base composition of the mitochondrial genome is asymmetric, and the base composition, size, and proportion of the coding and non-coding regions are shown in Table 1. The GC content in tRNA and rRNA is relatively high, accounting for 0.59% and 1.47% of the mitochondrial genome sequence, respectively. The F. aubertii mitochondrial genome contains 64 genes, including 34 PCGs, three rRNAs, and 27 tRNAs (Table 2). Out of the 34 protein-coding genes, only five (ccmFC, rps3, nad4, nad5, and nad7) contain introns and were categorized as cis-splice genes (Fig. 2). We examined 27 tRNA genes and determined that they can transport 17 standard amino acids. The length of tRNA ranged from 71 bp to 88 bp, with a length of 1816 bp in total. Among these genes, multicopy genes were found for trnE-UUC, trnH-GUG, and trnM-CAU. Additionally, we predicted the secondary structures of tRNAs (Fig. 3). All genes except for tRNA-Leu, tRNA-Ser, and tRNA-Tyr were predicted to have typical cloverleaf structures, and the base mismatches were mostly G-U. The mitochondrial genome contains three different tRNAs, each with different anticodons that specifically transport serine (Ser). These results are of great significance for further study of the function and stability of tRNA.

Mitochondrial Repetitive Sequences Analysis
Repetitive sequences can serve as genome-specific genetic markers for phylogenetic relationships among species. STRs show a high degree of polymorphism and are commonly used as molecular markers for genetic diversity studies and for germplasm characterization, identification, and selection. In the analysis of STRs in the mitochondrial genome of F. aubertii, 77 STR loci were detected. There are 28 mononucleotide motifs, 16 dinucleotide motifs, seven trinucleotide motifs, 22 tetranucleotide motifs, and four pentanucleotide motifs. No hexanucleotide motifs were detected (Table 3). Additionally, five long tandem repeat sequences were detected by TRF software in the mitochondrial genome of F. aubertii (Table 4).
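To make the STR categories concrete, the following sketch scans a sequence for perfect mono- to hexanucleotide repeats with a regular expression. It is not the tool used in this study, and the minimum repeat counts (10, 5, 4, 3, 3, 3) are an assumption borrowed from settings commonly seen in organelle genome analyses; the source does not state its thresholds.

```python
import re
from typing import List, Tuple

# Assumption: minimum number of repeat units per motif length (not from the source).
MIN_REPEATS = {1: 10, 2: 5, 3: 4, 4: 3, 5: 3, 6: 3}


def find_ssrs(seq: str) -> List[Tuple[int, str, int]]:
    """Return (0-based start, motif, repeat count) for each perfect STR found."""
    seq = seq.upper()
    found = []
    for unit_len, min_rep in MIN_REPEATS.items():
        # A motif of unit_len bases followed by at least (min_rep - 1) exact copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_rep - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip motifs that are just a shorter unit repeated, e.g. "AA" or "ATAT".
            if any(motif == motif[:k] * (unit_len // k)
                   for k in range(1, unit_len) if unit_len % k == 0):
                continue
            found.append((m.start(), motif, len(m.group(0)) // unit_len))
    return sorted(found)


# Toy sequence with one mono-, one di-, and one trinucleotide repeat.
toy = "CG" + "A" * 10 + "CG" + "AT" * 5 + "CG" + "AGC" * 4 + "TG"
ssrs = find_ssrs(toy)
print(ssrs)
```

Counting the returned motifs by length would give a per-class tally comparable in form to the one reported above, although the exact numbers depend on the thresholds chosen.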
Scattered repetitive sequences are crucial in the study of gene mutation, genome origin and evolution, and species formation [35]. In this study, a total of 50 scattered repeats were identified in the mitochondrial genome of F. aubertii, consisting of 22 forward repeats and 28 palindromic repeats. No reverse or complementary repeats were detected. The longest identified repeat sequence was 5272 bp, whereas most of the repeats were 50-200 bp in length (Fig. 4).
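The four repeat classes named above differ only in how the second copy relates to the first. A minimal sketch, with a hypothetical function name and toy sequences, classifies a pair of equal-length sequences accordingly; it handles exact matches only and is not the repeat-finding software used in the study.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def classify_repeat(a: str, b: str) -> str:
    """Classify copy b relative to copy a (exact matches only)."""
    a, b = a.upper(), b.upper()
    if a == b:
        return "forward"
    if a == b[::-1].translate(COMPLEMENT):
        return "palindromic"  # b is the reverse complement of a
    if a == b[::-1]:
        return "reverse"
    if a == b.translate(COMPLEMENT):
        return "complement"
    return "none"


for second in ["ATGCC", "GGCAT", "CCGTA", "TACGG"]:
    print(second, classify_repeat("ATGCC", second))
```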

Prediction of RNA Editing Sites
The prediction of RNA editing sites in the mitochondrial protein-coding genes of F. aubertii resulted in a total of 367 sites in 30 genes (Fig. 5). Among them, ccmB and nad4 contained the most RNA editing sites, with ccmB having the highest number (33 sites), accounting for 8.99% of the total RNA editing sites. The atp8 gene had the fewest RNA editing sites, accounting for only 0.27% of the total number of sites. The results showed that 7.9% of hydrophobic amino acids were converted to hydrophilic amino acids, 49.86% of hydrophilic amino acids were converted into hydrophobic amino acids, and 42.24% of hydrophobic amino acids remained unchanged (Table 5). The RNA editing sites were C to T conversions, with the first base of the codon accounting for 32.0% of the editing sites and the second base for 68.0%. Some RNA editing sites may result in the formation of stop codons, but we did not identify such phenomena in the mitochondrial genome of F. aubertii. Furthermore, the proportion of amino acid conversion to leucine after RNA editing was the largest, accounting for 46.59%.
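The effect of a single C to T edit on the encoded amino acid can be illustrated with a short sketch. It uses the standard genetic code and a hydrophobic set (A, V, L, I, P, F, M, W) that is an assumption; classification schemes differ, and the source does not state which one underlies Table 5. The function names and example codons are hypothetical.

```python
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, enumerated in TCAG order at each codon position.
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# Assumption: one common hydrophobic grouping; the study's scheme is not stated.
HYDROPHOBIC = set("AVLIPFMW")


def edit_codon(codon: str, position: int) -> str:
    """Apply a C-to-T edit at a 1-based codon position."""
    codon = codon.upper()
    if codon[position - 1] != "C":
        raise ValueError("edited base must be C")
    return codon[: position - 1] + "T" + codon[position:]


def label(aa: str) -> str:
    if aa == "*":
        return "stop"
    return "hydrophobic" if aa in HYDROPHOBIC else "hydrophilic"


def describe_edit(codon: str, position: int):
    """Return (amino acid before, amino acid after, property change)."""
    before = CODON_TABLE[codon.upper()]
    after = CODON_TABLE[edit_codon(codon, position)]
    return before, after, f"{label(before)}->{label(after)}"


for codon, pos in [("TCA", 2), ("CCA", 1), ("CTT", 1), ("CGA", 1)]:
    print(codon, pos, describe_edit(codon, pos))
```

The first example (TCA to TTA, serine to leucine) is a second-position edit that converts a hydrophilic residue into leucine, the kind of change reported above as most frequent; the last example shows how an edit could in principle create a stop codon.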

Mitochondrial Genome Size and GC Content of F. aubertii Compared with Other Species
The CDS of F. aubertii were analyzed to determine the GC content of each gene and calculate the average GC content (Table 6). The nad5 gene was the longest, possibly due to its introns. The GC content varied across PCGs, with the matR gene having the highest GC content (51.5%) and the nad4L gene the lowest (34.65%). The average AT skew was negative, and 13 genes had positive AT skew. Visual analysis of GC skew showed that 22 genes with positive GC skew had higher G-base content than C-base content, and 12 genes had negative GC skew and higher C-base content (Fig. 6). Caryophyllales plants, including F. aubertii, had a mitochondrial genome size range of 247 Kb-509 Kb, with an average GC content of about 44% (Table 7).
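For reference, the quantities discussed here follow the usual definitions: GC content is (G + C) / length, GC skew is (G - C) / (G + C), and AT skew is (A - T) / (A + T). The sketch below computes them for a toy sequence; the function names are hypothetical and this is not the online platform used in the study.

```python
def gc_content(seq: str) -> float:
    """Fraction of G and C bases in the sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def gc_skew(seq: str) -> float:
    """(G - C) / (G + C); positive when G outnumbers C."""
    seq = seq.upper()
    g, c = seq.count("G"), seq.count("C")
    return (g - c) / (g + c) if g + c else 0.0


def at_skew(seq: str) -> float:
    """(A - T) / (A + T); positive when A outnumbers T."""
    seq = seq.upper()
    a, t = seq.count("A"), seq.count("T")
    return (a - t) / (a + t) if a + t else 0.0


toy_cds = "GGGCAATT"  # hypothetical sequence: 3 G, 1 C, 2 A, 2 T
print(gc_content(toy_cds), gc_skew(toy_cds), at_skew(toy_cds))
```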

Ka, Ks Analysis
Comparing the Ka and Ks in homologous genes during evolution is essential for the study of gene functions and evolutionary relationships across species, as well as for exploring issues such as adaptation and genetic diversity of plant communities. The Ka/Ks ratios of 34 PCGs in the mitochondrial genome of F. aubertii relative to N. ventrata, A. githago, and T. tetragonoides were analyzed (Fig. 7). Among these 34 genes, the rps16 gene had a zero Ka/Ks value. Overall, four of the 34 genes (nad4, rps13, rps1, and ccmFN) had Ka/Ks values greater than one, indicating that these four genes were under positive selective pressure during evolution. The remaining genes had Ka/Ks values less than 1, which implies purifying selection and relative conservation.
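Ka/Ks estimation requires codon-level alignments, and the Methods mention mapping aligned protein sequences back onto their coding sequences. A minimal sketch of that mapping step is shown below. It is written in Python for illustration and is not the Perl script used in the study; the function name and toy sequences are hypothetical.

```python
def thread_codons(aligned_protein: str, cds: str) -> str:
    """Insert codon-sized gaps into a CDS so it follows a gapped protein alignment.

    Each residue consumes one codon; each '-' in the protein becomes '---'.
    A trailing stop codon in the CDS is tolerated and dropped.
    """
    residues = aligned_protein.replace("-", "")
    if len(cds) not in (3 * len(residues), 3 * len(residues) + 3):
        raise ValueError("CDS length does not match the protein length")
    out = []
    i = 0
    for ch in aligned_protein:
        if ch == "-":
            out.append("---")
        else:
            out.append(cds[i:i + 3])
            i += 3
    return "".join(out)


# Toy example: protein M-KL aligned with a gap after the first residue.
print(thread_codons("M-KL", "ATGAAACTG"))
```

Keeping gaps in multiples of three preserves the reading frame, so synonymous and nonsynonymous differences can be counted codon by codon in the resulting alignment.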

Phylogenetic Analysis
To better understand the phylogenetic position of F. aubertii, 16 plant mitogenomes from the NCBI database were downloaded (Table 7). The phylogenetic tree showed that most branch nodes had high support values above 99%, and species from the same family were grouped together, indicating high result reliability (Fig. 8). The phylogenetic tree also showed that F. aubertii and N. ventrata clustered into one clade with a 92% bootstrap value (Fig. 8). Caryophyllales plants were further divided into two subgroups [36,37]. F. aubertii and N. ventrata were non-core taxa of Caryophyllales, and the other species belonged to the core group of Caryophyllales.

Discussion
Analyzing mitochondrial genes can lead to the optimization of plant growth characteristics at the genetic level, thus increasing yield and enhancing plant adaptation to the environment, which is essential for agricultural production and food safety [38][39][40][41]. The study of genome size and GC content is crucial for understanding plant evolution and adaptability, as they greatly influence traits such as morphology, physiology, and biochemistry of plants [42]. The size of angiosperm mitochondrial genomes is usually 200-800 Kb [43], and the entire length of the F. aubertii mitochondrial genome is 350,156 bp with a GC content of 44.73%, which is similar to other Caryophyllales plants [44,45]. Research on genome size and GC content can provide important insights into plants' genetic characteristics and biological functions [46] and form the scientific basis for applications such as plant breeding and biotechnology [47,48].
Repeat sequences are essential for gene expression and regulation [49]. They are the main driving force for gene diversification and evolution [50]. The present study identified STRs, long tandem repeats, and scattered repeats. A total of 77 STR loci, five long tandem repeats, and 50 scattered repeat sequences were detected by analyzing the F. aubertii mitochondrial genome repeats. Repetitive sequences play an important role in the processes of insertion, deletion, and other rearrangements that may occur within the mitochondrial genome [51]. Mitochondrial repetitive sequences have been widely used in plant genetic improvement and molecular marker-assisted breeding [52,53]. Analysis of the mitochondrial repeat sequences of F. aubertii has helped us to gain insights into the organelle sequences, genetic diversity, and other relevant features of the plant, which are essential for species conservation and germplasm identification [54].
RNA editing is a type of genetic variation event in RNA post-transcriptional modification, which is crucial for maintaining the stability of gene expression [55,56]. Compared with DNA editing, RNA editing can respond more quickly to external stimuli or environmental changes, thereby meeting the needs of survival [57]. The diversity and complexity of RNA function, as well as its impact on gene expression and regulation, can be better understood by exploring the location and type of RNA editing, leading to further study into RNA function and regulatory mechanisms [58]. In this study, the RNA editing sites in the mitochondrial genome of F. aubertii were mainly C to T conversions, with the second base of the codon being the most frequently changed. The proportion of amino acid conversion to leucine was the largest, accounting for 46.59%. Leucine is the main amino acid produced after editing, similar to RNA editing results in higher angiosperms [59].
Ka/Ks analysis can be used to understand the selection pressures that acted during the evolution of a gene and to determine whether a gene has undergone rapid evolution [60]. In addition to predicting protein structure and function, Ka/Ks analysis can be used to research the functional diversity and evolutionary pathways of gene families [61]. A Ka/Ks value > 1 indicates positive selection pressure during the gene's evolution. Conversely, Ka/Ks < 1 denotes negative selection. Most of the genes in F. aubertii are conserved. The four genes rps1, nad4, rps13, and ccmFN all have Ka/Ks > 1, which is significant for understanding the positive selection pressure these four genes experienced during evolution. We referred to the literature on the mitochondrial genome of order Caryophyllales [62] and found that the phenomenon of four genes with Ka/Ks > 1 is unique to F. aubertii. Nevertheless, there are few studies concerning the selective pressure on the mitochondrial genome of Caryophyllales, and our findings need to be confirmed by further studies. The positive selection of the above genes would contribute to a faster rate of evolution in organisms and would also increase the adaptive divergence of mitochondrial genes [63,64].
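The interpretation rule stated above can be written as a small helper. This is only a sketch: the function name is hypothetical, the example values are invented, and the treatment of Ks = 0 as undefined and of a ratio equal to 1 as neutral are conventional choices that the source does not discuss.

```python
import math


def classify_selection(ka: float, ks: float) -> str:
    """Interpret a Ka/Ks ratio using the conventional thresholds."""
    if ks == 0:
        return "undefined"  # the ratio cannot be computed without synonymous changes
    ratio = ka / ks
    if math.isclose(ratio, 1.0):
        return "neutral"
    return "positive" if ratio > 1 else "purifying"


# Invented Ka and Ks values, for illustration only.
for ka, ks in [(0.2, 0.1), (0.05, 0.2), (0.1, 0.1), (0.1, 0.0)]:
    print(ka, ks, classify_selection(ka, ks))
```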
In this study, a maximum likelihood phylogenetic tree was constructed based on the shared mitochondrial CDSs. F. aubertii and N. ventrata formed sister taxa, indicating a close relationship between them, which agrees with previous research [62,65]. In the evolution of the chloroplast genome system in Caryophyllales, Polygonaceae plants are also closely related to Nepenthaceae plants [66][67][68]. Our study provides a scientific basis for further studies on the phylogeny of the Caryophyllales.
There is a horizontal transfer of genes between organelle genomes [69]. We compared the mitochondrial and chloroplast genomes of F. aubertii. Some genes were found to be similar, and these genes were highly homologous. Between the chloroplast and mitochondrial genomes of F. aubertii, there were 58 homologous segments and 14 homologous genes, including 11 tRNAs, two ribosomal protein genes, and a gene for cytochrome c biogenesis (ccmC). In angiosperms, tRNA genes are frequently transferred from the chloroplast genome to the mitochondrial genome [70,71]. The homology of chloroplast and mitochondrial genomes is an important research direction in biology and evolution. By studying the functional activity of migrating genes, we have gained a greater understanding of species evolution [72].

Conclusions
In this study, we provided a comprehensive analysis of the mitochondrial genome of F. aubertii. The genome was found to have a circular structure with a length of 350,156 bp and a GC content of 44.73%. The genome contains 64 genes, including 34 protein-coding genes, three rRNAs, and 27 tRNAs. Analyses of repeat sequences revealed the presence of short tandem repeats, long tandem repeats, and scattered repeats in the mitochondrial genome of F. aubertii. RNA editing sites were predicted, and the majority of editing events involved the conversion of C to T, with leucine being the most frequently converted amino acid. Analysis of the Ka/Ks ratios indicated that most mitochondrial genes in F. aubertii have undergone negative selection, whereas four genes (rps1, nad4, rps13, and ccmFN) showed evidence of positive selection. Phylogenetic analysis revealed that F. aubertii and Nepenthes × ventrata are closely related, supporting previous studies on the evolutionary relationships within the order Caryophyllales. Additionally, a comparison of the mitochondrial and chloroplast genomes showed the presence of homologous regions and genes, suggesting potential gene transfer events between these organellar genomes.
This study provides valuable information on the mitochondrial genome of F. aubertii and serves as a reference for future molecular studies and the exploitation of genetic resources in this species. The findings contribute to the identification and improvement of valuable plant traits and highlight the significance of mitochondrial genomes in plant evolution and adaptation.

Fig. 1. A circular map of the F. aubertii mt genome. The counterclockwise genes are located on the outer side of the circle, whereas the clockwise genes are located on the inner side of the circle. The diagram shows the names of the contained genes. Colors are used to represent the different functional groups of genes. The grey circles inside represent GC content. The asterisks (*) represent intron-containing genes.

Fig. 2. Schematic map of the cis-splicing genes. Black blocks represent exons, and white blocks represent introns. The numbers beneath the protein-coding genes indicate the locations of introns and exons. One intron is present in each of ccmFC and rps3. Three introns are in nad4 and four each in nad5 and nad7.

Fig. 3. tRNA secondary structure prediction. Twenty-four tRNA secondary structures were identified in the mitochondrial genome of F. aubertii. The top left of the structure is the amino acid abbreviation letter. Base mismatches exist, usually U-C base mismatches. The mismatched bases are marked in red boxes.

Fig. 4. Scattered repeats of F. aubertii mt genome. Different types of repeats are represented by different colors: blue represents forward repeats and orange represents palindromic repeats. The height of the colored bar represents the number of sequences. No scattered repeats in the length range of 2000-2999 bp were found.

Fig. 5. Distribution of RNA editing sites in F. aubertii mt protein-coding genes (PCGs). RNA editing sites in 30 coding genes in F. aubertii were predicted. There are no RNA editing sites in cox1, ccmFC, ccmFN, or atp9. The horizontal axis is the protein-coding gene, and the number on the top of the color bar refers to the number of predicted RNA editing sites.

Fig. 6. GC skew of F. aubertii mt CDS. The outermost circle represents the coding area of a gene, the middle circle represents the variation in GC content across the genome, and the innermost circle represents the GC skew, where green represents skew+ and purple represents skew-.

Fig. 7. Ka/Ks values of the PCGs. The mitochondrial protein-encoding genes of F. aubertii were compared with N. ventrata (MH798871.1), A. githago (MW553037.1), and T. tetragonoides (MW971440.1), respectively, for Ka/Ks analysis. The three closely related species contain protein-coding genes different from those in F. aubertii, so the Ka/Ks values of some genes in the figure are 0.

Fig. 8. A phylogenetic tree of 17 species. Reconstruction of phylogenetic relationships was based on the shared CDS using the maximum likelihood model with M. oleifera (Olacaceae) as an outgroup. F. aubertii is indicated using the star icon. The plants grouped together are marked according to the family.