Abstract

Background:

Metabarcoding of environmental DNA (eDNA), a technique using high-throughput sequencing, has transformed biodiversity monitoring by identifying organisms from DNA fragments present in the environment. This method, particularly useful for aquatic ecosystems, allows for non-invasive species monitoring, helping to provide insight into ecosystem composition and taxonomic diversity. The objective of this study was to assess the efficacy of eDNA metabarcoding for fish species identification in a model community from the northeast Pacific Ocean using 12S ribosomal RNA (12S rRNA) marker.

Methods:

Water samples were collected from the tank of the Primorsky Aquarium, which contains fish species from the Sea of Japan, Sea of Okhotsk, and Bering Sea. DNA was extracted using syringe filters and enriched with polymerase chain reaction (PCR) of mitochondrial 12S rRNA fragment, followed by sequencing on Illumina platform. The resulting reads were processed using the bayesian generalized uncertainty modeling (BEGUM) pipeline and their taxonomic diversity was assessed by basic local alignment search tool (BLAST) search. Using in silico PCR, we also assessed the possible association of detection failures of some species with the presence of primer-to-target sequence mismatches.

Results:

From a fish community of only 20 species in the tank, we identified 56 operational taxonomic units (OTUs) corresponding to 28 genera. Among these OTUs, 20 species were unambiguously classified by BLAST-based analysis, though only 9 of them corresponded to the species actually present in the tank. Significant problems included inconsistent reference data and marker biases that affected the accuracy of species identification. In addition to DNA contamination from feed, contamination from the water source may have introduced extraneous DNA into the samples. Also, using in silico PCR analysis with a small number of available reference sequences, we demonstrated a significantly higher number of primer mismatches for species that were not identified.

Conclusions:

This study highlights the relative efficacy of eDNA metabarcoding for fish species identification, but also highlights the need to improve reference databases and minimise contamination, searching for references and primers to improve accuracy. Further research should focus on optimising marker selection and controlling methodological bias to ensure robust biodiversity estimates.

1. Introduction

Metabarcoding, leveraging high-throughput sequencing, has revolutionized biodiversity assessment by allowing the identification of organisms through their DNA sequences. This technique, when paired with environmental DNA (eDNA) analysis, captures and enriches specific regions of extracellular DNA molecules released into the environment by various organisms. eDNA analysis is especially valuable for non-invasive monitoring of biodiversity, offering a comprehensive snapshot of ecosystem composition without the need for direct specimen collection [1, 2, 3]. This method not only allows researchers to trace organisms through environmental samples, but also enhances our understanding into taxonomic diversity [4, 5], estimating fish biomass [6, 7, 8, 9], and evaluating genetic diversity via amplified sequence variants [10, 11, 12, 13].

In order to validate these methods mock community analysis has been used where bunch of specimens with known taxonomic composition are treated and analyzed as a proxy of natural community [14]. In aquatic environments, mock community analyses using eDNA metabarcoding have commonly employed the species composition of fish housed in aquariums [15, 16]. As a result, it was found that eDNA can effectively detect the majority of bony fish species present in a controlled mesocosm environment, but facing challenges in detecting cartilaginous fishes and sea turtles, due to biases in primer specificity [15] and may be highly influenced by marker selection [16]. Furthermore, analysis of DNA from known fish egg ratios has been explored to evaluate metabarcoding efficacy, revealing that the accuracy of species identification is significantly dependent on the quality of the reference database and the specific molecular markers used [17]. Similar outcomes were highlighted in the research on neotropical fish mock communities which are influenced by factors such as DNA concentration, community composition, and marker choice, thus emphasizing the importance of using multiple markers and conducting controlled experiments for accurate biodiversity assessments [18]. This overall still makes it crucial to assess potential limitations of the approach and generate the necessary methodological recommendations before launching the analysis of natural samples of the local fauna by the modern technique of DNA metabarcoding.

The northwestern Pacific, including the Sea of Japan, Sea of Okhotsk, and the Bering Sea, maintains significant fish diversity, with over 1300 species inhabiting these waters [19, 20]. The unified methods, including the construction of a reference library of DNA fragments, facilitate accurate species identification [21, 22, 23], providing a robust framework for ongoing research and conservation efforts. Recent systematic revisions, aided by molecular genetic approaches, have further refined the understanding of this diverse marine life [24, 25, 26, 27, 28, 29]. In this study, we present findings on the evaluation of marine fish composition in the northwestern Pacific area, specifically in the mock communities settled in Primorsky Aquarium (Russia, Vladivostok), using eDNA metabarcoding techniques based on 12S ribosomal RNA (12S rRNA) mitochondrial marker. This study also aimed a comparison between the in silico polymerase chain reaction (PCR) approach and traditional eDNA metabarcoding, focusing on the use of a standard set of MiFish 12S rRNA primers [30] to simulate PCR amplification of available reference sequences from databases. By simulating PCR amplification in silico, we aimed to evaluate how well the selected primers perform in amplifying DNA sequences from various species in a controlled environment.

2. Materials and Methods
2.1 eDNA Metabarcoding Analysis of the Mock Community

Water from a tank (volume 1.68 m3) at the Primorsky Aquarium (Russky Island, Vladivostok, Primorsky Krai, Russia) was taken on December 12, 2019. Water comes to the tank directly from the bay after a small amount of purification. This tank was holding 20 species spanning 11 families and 19 genera of the coast of the Far Eastern seas area (Sea of Japan, Sea of Okhotsk, and the Bering Sea). This selection includes species from both benthic and pelagic habitats, with varying ecological roles. The water temperature was maintained in the range of 10–13 °C according to the sea temperature at that time of year at that location. Taxonomic information and the number of fishes held in the tank is provided in the Table 1. Pseudopleuronectes sp. among them could not be identified to species level due to small size. Fishes in this tank are fed with minced fillet of pink (Oncorhynchus gorbuscha (Walbaum, 1792)) and chum salmons (O. keta (Walbaum, 1792)), greenlings (Hexagrammos sp. Tilesius, 1810), chub mackerel (Scomber japonicus Linnaeus, 1758), squids (Todarodes pacificus (Steenstrup, 1880), Berryteuthis magister (Berry, 1913)), and shrimp (Pandalus latirostris Rathbun, 1902). Three replicate samples of water (about 450 mL each) were collected from the tank with a syringe (150 mL capacity, luer lock type). The full volume of water from each replicate was filtered through a single syringe filter (25 mm diameter, 0.45 µm pore size, PES material).

Table 1. Fish species composition in the tank of the Primorsky Aquarium and their detection status.
No. Species Name Family No. of Individuals Detected by eDNA Metabarcoding No. of Reads Detected by In Silico PCR
1 Agonomalus proboscidalis Agonidae 1 No 0 No
2 Alcichthys elongatus Cottidae 3 No 0 No
3 Bathymaster derjugini Bathymasteridae 1 Yes 163 No
4 Bero elegans Cottidae 3 No 0 No
5 Blepsias cirrhosus Agonidae 1 Yes 79 Yes
6 Brachyopsis segaliensis Agonidae 2 No 0 Yes
7 Chirolophis japonicus Stichaeidae 3 No 0 Yes
8 Chirolophis saitone Stichaeidae 2 No 0 No
9 Gymnocanthus herzensteini Cottidae 5 Yes 32 Yes
10 Gymnogobius heptacanthus Gobiidae 5 Yes 59 Yes
11 Hypomesus japonicus Osmeridae 3 Yes 3741 Yes
12 Hypsagonus jordani Agonidae 2 Yes 1932 No
13 Hypoptychus dybowskii Hypoptychidae 5 Yes 78 Yes
14 Lumpenus sagitta Lumpenidae 2 No 0 Yes
15 Opisthocentrus ocellatus Opisthocentridae 3 Yes 622 Yes
16 Pholidapus dybowskii Opisthocentridae 10 No 0 Yes
17 Podothecus gilberti Agonidae 3 No 0 No*
18 Pseudopleuronectes sp. Pleuronectidae 2 No 0 Yes
19 Pungitius pungitius Gasterosteidae 6 Yes 317 Yes
20 Stichaeopsis epallax Stichaeidae 5 No 0 No

*, not present in GenBank. eDNA, environmental DNA.

To preserve the DNA on the filter, 1 mL of Longmire’s buffer [31] was passed through it, after which the inlet and outlet ports were sealed with combi-stopper plugs. The fixed filter was then stored at –20 °C. DNA was isolated from the preserved filter using the M-Sorb-OOM kit (Sintol, Moscow, Russia). The manufacturer’s protocol was modified such that the lysis buffer was warmed to 65 °C, pushed in the reverse direction of filtration and poured into a clean test tube (backflushing method, after [32]). The isolated DNA was stored at –20 °C. A mitochondrial fragment —12S rRNA with a length of ~170 bp [30]—was used to retrieve the information on the taxonomic diversity from the eDNA samples. A primer pair, each carrying a unique 7-nucleotide tag (doubly-tagged primers) developed in ecotag (Alpine Ecology Laboratory, Grenoble, France) [33] was used to amplify the fragment for each sample. Negative PCR controls were also performed using separate pairs of tagged primers. The PCR for each sample was performed using three replicates. The reaction mixture consisted of 10 µL of AmpliTaq Gold 360 Master Mix (Applied Biosystems, Foster City, CA, USA), 0.5 µL of each forward and reverse primer (10 µM), 0.16 µL of bovine serum albumin, 10 ng of DNA, and deionized water to reach a final volume of 20 µL. The thermal cycling protocols are provided in Table 2.

Table 2. Thermal cycling protocol for 12S ribosomal RNA (12S rRNA) marker amplification.
Preheating Cycles Denaturation Annealing Elongation Final elongation
95 °C for 3 min. 35 98 °C for 20 sec. 60 °C for 15 sec. 72 °C for 15 sec. 72 °C for 5 min.

The amplification products were assessed by running the fragments on a 1% agarose gel, staining with ethidium bromide, and visualizing them under ultraviolet light. The amplicons were purified using the Cleanup S-Cap kit (Evrogen, Moscow, Russia) and normalized (refer to [34]) prior to pooling. The amount of control reactions was taken as the average from the resulting volume of the normalized samples before pooling. The normalized amplicons were then combined with PCR controls and sequenced at Novogene (Tianjin, China). The library was prepared with the NEBNext Ultra II DNA Library Prep Kit for Illumina (New England Biolabs, Ipswich, MA, USA) and sequenced on an Illumina NovaSeq 6000 platform employing a 250 bp paired-end sequencing approach.

The obtained reads were processed according to the bayesian generalized uncertainty modeling (BEGUM) metabarcoding pipeline (open-source software) [35, 36]. After adapter removal and preliminary evaluation of the read quality with fastqc (Babraham Institute, Cambridge, UK), possible read errors were corrected in Spades (Center for Algorithmic Biotechnology, St. Petersburg State University, St. Petersburg, Russia) [37]. Subsequently, paired-end reads were assembled into consensus sequences using PandaSeq (Department of Biology, University of Waterloo, Waterloo, Ontario, Canada) [38]. We then used the BEGUM program to demultiplex and filter the reads for each sample. Clustering was done by sumaclust (Institut de Biologie Computationnelle, Montpellier, France) [39] with parameters -t 0.98 and -R 0.85. The taxonomic assignment for the generated operational taxonomic units (OTUs) was performed using basic local alignment search tool (BLAST) (National Center for Biotechnology Information, Bethesda, MD, USA) [40] implemented in Ubuntu command line interface. The taxonomic information was then summarized in a single table based on the output from the Metagenome Analyzer (MEGAN) community edition program (University of Tübingen, Tübingen, Germany) [41]. The top percent of lowest common ancestor (LCA) parameters was set to 2.0 implementing the naïve LCA algorithm with 80 percent to cover. The number of OTUs after taxonomic referencing was corrected using the LULU package (Center for GeoGenetics, University of Copenhagen, Copenhagen, Denmark) [42] with the following parameters: minimum match –95%, minimum relative co-occurrence –0.97.

2.2 In silico PCR Based on 12S rRNA Sequences

In addition, an in silico approach was used to test the assumption of the presence of more mismatches in the sequences of those species that were not identified by eDNA metabarcoding. For this purpose, 12S rRNA sequences available for the taxa kept in the tank were retrieved from GenBank (Supplementary Table 1). Additionally, 5 sequences of Oncorhynchus gorbuscha and 24 sequences of O. Keta were included in the analysis to assess the potential impact of feed-derived eDNA on our results. For Pseudopleuronectes sp. the sequences of all species of this genus that may occur in the far eastern seas area were searched. For this purpose, a search based on the query was performed. Next, in silico PCR with MiFish primers [30] was performed using the retrieved sequences in the ecoPCR tool of the OBITools package (Laboratoire d’Écologie Alpine, Grenoble, France) [33, 43, 44]. PCR parameters allowed up to 3 mismatches relative to the target sequence per primer. Melting temperature (Tm) was estimated based on [45]. Salt concentration used for estimating Tm was set 0.05. Based on the data obtained, aggregation by species was performed so that each species corresponded to one observation (Supplementary Table 2). To test the hypothesis whether there is a statistically significant difference between detected and undetected species in the number of primers mismatches detected, Welch’s t-test [46] was performed in the stats R package (R Foundation for Statistical Computing, Vienna, Austria) [47]. Sequences that were not amplified in silico were not included in the analysis. Visualisation of the analysis was performed in the ggplot2 R package (R Foundation for Statistical Computing, Vienna, Austria) [48]. All figures were labeled and finalized using Inkscape (Inkscape Project Inc., Silver Spring, MD, USA, https://inkscape.org/).

3. Results and Discussion

After demultiplexing, the 12S rRNA fragment had 42,260 reads, 2827 of which were unique. The control identified 60 reads. After filtering the reads and deleting the chimeric sequences, there were 38,441 sequences for 12S rRNA. Sequences clustered into 56 OTUs, four of which were assigned to bacteria with 57 reads in total. The remaining 52 OTUs were related to ray-finned fishes, classified into seven orders, 15 families and 28 genera (Fig. 1). Species-level taxonomy was unambiguously assigned to 20 OTUs (39%). There were 15 OTUs (26.8%) with taxonomic contradictions at the species level; at the genus level, there were nine OTUs (16%), and one OTU at the family level. The conflict in this case is defined as equal (or undetectable) probability of assigning the same OTU to different taxa according to BLAST results.

Fig. 1.

Bubble graph showing the number of reads for different operational taxonomic units (OTUs) obtained for 12S rRNA, amplified based on the water eDNA samples from a fish tank at the Primorsky Aquarium. The taxonomic orders are presented horizontally and the OTUs information are placed vertically. The size of circles demonstrates the number of reads assigned to the different OTUs. OTU names in bold have no less than 20 reads. Each underlined OTU accounts for more than 1000 reads.

This may indicate either erroneous taxonomic identification of one or several sequences that make up the reference library, or that the marker is highly conserved hence divergence is not sufficient enough to distinguish two or more species. These and other limitations of the approach have been mentioned earlier when estimating fish diversity [2, 3, 49, 50]. On the other hand, despite the data correction with the LULU R package (version 0.1.0), some species names, specifically Hypomesus japonicus, Oncorhynchus keta, and Opisthocentrus ocellatus, were assigned to three different OTUs simultaneously (see Supplementary Table 3). Selecting a standard delineation threshold for clustering based on hierarchical approaches can distort the true number of OTUs. Using a fixed similarity threshold in clustering can sometimes misrepresent the true diversity of OTUs, either by overestimating or underestimating the number of clusters. Clustering approaches that adapt to the inherent structure of the data—without applying a strict cut-off threshold—such as unsupervised Bayesian clustering [51] and the Swarm algorithm [52], may provide more accurate and biologically meaningful results. More than 1000 reads were accounted for each of the clusters among Hypomesus japonicus McAllister, 1963, Hypsagonus jordani Jordan & Starks, 1904 and Stichaeidae sp.—26 OTUs had 20 to 1200 reads, whereas each of the other 23 OTUs had less than 20 reads.

The largest number of reads came from two species of the genus Oncorhynchus (whose fillets are used for feeding, see above)—more than 13,000 for each species. Hence, it is encouraged to account for this when planning sufficient sequencing depth for mesocosm experiments [15, 16] to provide adequate coverage for the detection of DNA for targeted organisms with low amounts of eDNA. An option would be to use blocking primers for the forage organisms, similar to those used in diet assessment. However, it is necessary to consider all the advantages and limitations of this approach [53]. Primer bias should not be excluded either [54, 55], so we decided to investigate its possible impact.

A total of 155 sequences from 24 fish species were available in GenBank for the 12S rRNA fragment (Supplementary Table 1). One sequence of Oncorhynchus gorbuscha and 5 sequences of O. keta were successfully amplified in silico, with 1 mismatch per forward primer detected. The species G. herzensteini had the highest number of sequences (30). Only one 12S rRNA sequence each was found for the species A. proboscidalis, B. derjugini, C. saitone, H. jordani and S. epallax. The mean number of 12S rRNA sequences per species was 6.46 ± 7.42 (mean ± standard deviation). 17 of 24 species were successfully amplified using in silico PCR (had at least one successfully amplified sequence). The highest number of primer mismatches (3) was attributed to the species H. japonicus, while A. elongatus and B. elegans had 2 mismatches each. The rest of the in silico amplified showed 1 mismatch each. At the same time, it should be noted that all mismatches of H. japonicus are concentrated on one (‘forward’) primer, whereas in A. elongatus and B. elegans 1 mismatch per each (forward and reverse) primer was found. Mismatches near the 3 end of the primer can significantly reduce amplification efficiency, whereas mismatches near the 5 end have less effect [56, 57]. It is possible that the mismatches in H. japonicus sequences occurred in less critical regions of the primer binding sites, allowing for successful amplification, while the mismatches in A. elongatus and B. elegans may have been located in positions that more adversely affected in silico amplification. Furthermore, differences in species-specific behaviors and physiology can influence the amount of eDNA shed into the environment which may increase the probability of detecting certain species. Hypomesus japonicus is known to be an active swimmer, often forming schools [58]. This increased activity can lead to higher eDNA shedding rates [59]. In contrast, Alcichthys elongatus and Bero elegans may exhibit less active behaviors, potentially resulting in lower eDNA contributions despite their presence in the tank.

Although we had the same number of individuals (three) for each species in the tank, their behaviors and physiological differences could have led to varying levels of eDNA shedding. While A. elongatus is larger in size—approximately three times larger than H. japonicus and B. elegans—we lack precise biomass measurements to quantify their contributions accurately.

Despite the small representation of the group of species not detected under mock community conditions but detected by in silico PCR, Welch’s t-test showed (Fig. 2) a significantly higher number of mismatches in these species compared to successfully detected species by both methods. The success of in silico PCR (see Supplementary Table 2) varied widely between species. The analyses still allow speculation on the primer bias identified above. Although one of the species successfully detected by eDNA had a greater number mismatches, both species not detected by eDNA mock community also exhibited multiple primers mismatches in silico. This approach, which to our knowledge is not very popular in the field of eDNA metabarcoding, can be used at the preliminary stage of research, between the collection and validation of a local reference library and the selection of optimal primers for the target genome regions.

Fig. 2.

Comparison of the mean primer mismatches between species detected and not detected via eDNA metabarcoding. The boxplot shows two classes of variable (0, not detected via eDNA metabarcoding, 1, detected) significantly discriminated by Welch’s t-test (t = 6.5, df = 14, p-value = 0.00001402). Vertical lines indicate median values.

We unambiguously identified nine out of 20 taxa (45%) in the tank. Two taxa present (Alcichthys elongatus (Steindachner 1881), Bero elegans (Steindachner, 1881)) were not identified, i.e., there is no OTU that might be related to these species. Another 11 taxa (55%) may somehow be concealed behind OTUs with conflicting taxonomic reference (see above), such as Chirolophis sp., Lumpenus sp., Pleuronectidae sp. (Fig. 1, Supplementary Table 3). The indication of all these taxa may come from environmental DNA originating from the water storage reservoir, whose intake is located next to the Primorsky Aquarium building. Therefore, these OTUs likely result from external environmental DNA introduced via the water supply rather than solely from the fish present in the tank. The presence of such external DNA is expected and, regretfully, is not unusual [15, 16]. In addition to reducing the probability of contamination, the development and verification of a reference library for fish identification should be considered as an important way to reduce uncertainty and improve the reliability of high-throughput monitoring methods based on aquatic eDNA [3, 49].

4. Conclusions

Our study highlights the potential and challenges of using eDNA metabarcoding for monitoring fish biodiversity in controlled environments. Despite the potential of this method, our results showed that only 9 out of the 20 species present in the tank were unambiguously identified, and there were 25 instances of incorrect species identifications, resulting in a low accuracy rate. These findings indicate a substantial potential for misidentification, suggesting that the method, as applied in our study, should be used with extreme caution and awareness of the its limitations.

The high rate of misidentification may be attributed to several factors, including inconsistent and incomplete reference databases, primer biases affecting amplification efficiency, and contamination from external DNA sources such as feed and water supply. These limitations underscore the critical need for comprehensive and accurate reference libraries, improved primer design, and stringent contamination controls to enhance the accuracy and reliability of eDNA metabarcoding. By acknowledging these limitations, we aim to contribute to the advancement of eDNA metabarcoding methods and encourage further studies to overcome the obstacles identified in our research. Improving these methodologies is essential before eDNA metabarcoding can be confidently applied in practical biodiversity monitoring and conservation efforts.

Availability of Data and Materials

The data used to support the findings of this study are available from the corresponding author upon reasonable request.

Author Contributions

OR performed the collection of water from tank and DNA isolation from syringe filters. ST designed the research, performed genetic analysis and analyzed the data. ST wrote the manuscript. Both authors contributed to editorial changes in the manuscript. Both authors read and approved the final manuscript. Both authors have participated sufficiently in the work to take public responsibility for appropriate portions of the content and agreed to be accountable for all aspects of the work in ensuring that questions related to its accuracy or integrity.

Ethics Approval and Consent to Participate

The Commission on Biomedical Ethics of A.V. Zhirmunsky National Scientific Center of Marine Biology, Far Eastern Branch of the Russian Academy of Sciences reviewed and approved the study (Approval Record No. 1-040924, Meeting No. 8, September 04, 2024). The commission determined that the experimental procedures involving water samples from aquariums containing various fish species are justified for the study’s objectives and comply with current Russian and international ethical regulations for scientific research involving animals.

Acknowledgment

The authors are grateful to staff of Primorsky Aquarium for assistance with environmental DNA sampling.

Funding

The study was funded by the Federal scientific and technical program in the field of environmental development of the Russian Federation and climate change for 2021 – 2030, Russian Federation (Project No 123080800009-5).

Conflict of Interest

The authors declare no conflict of interest.

Supplementary Material

Supplementary material associated with this article can be found, in the online version, at https://doi.org/10.31083/FBS26247.

References

Publisher’s Note: IMR Press stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Cite

Share