IMR Press / FBL / Volume 27 / Issue 5 / DOI: 10.31083/j.fbl2705159
Open Access Original Research
GpemDB: A Scalable Database Architecture with the Multi-omics Entity-relationship Model to Integrate Heterogeneous Big-data for Precise Crop Breeding
1 School of Mechanical Engineering, Shanghai Jiao Tong University, 200240 Shanghai, China
2 Shanghai Agrobiological Gene Center, 201106 Shanghai, China
3 College of Plant Science and Technology, Huazhong Agricultural University, 430070 Wuhan, Hubei, China
4 College of Life Science and Technology, Shanghai Jiao Tong University, 200240 Shanghai, China
*Correspondence: gongliang_mi@sjtu.edu.cn (Liang Gong); chlliu@sjtu.edu.cn (Chengliang Liu)
These authors contributed equally.
Academic Editor: Tatsuya Akutsu
Front. Biosci. (Landmark Ed) 2022, 27(5), 159; https://doi.org/10.31083/j.fbl2705159
Submitted: 21 December 2021 | Revised: 25 March 2022 | Accepted: 24 April 2022 | Published: 17 May 2022
Copyright: © 2022 The Author(s). Published by IMR Press.
This is an open access article under the CC BY 4.0 license.
Abstract

Background: With the development of high-throughput genome sequencing and phenotype screening techniques, it has become possible to leverage multi-omics data to speed up the breeding process. However, the heterogeneity of big data hampers progress, and the lack of a comprehensive database supporting end-to-end association analysis impedes the efficient use of these data. Methods: To address this problem, this paper proposes a scalable entity-relationship model and database architecture to manage cross-platform data sets, explore the relationships among multi-omics data, and ultimately improve breeding efficiency. First, the targeted omics data of crops should be normalized before being stored in the database; the content and structure of typical breeding data are demonstrated with a case study of rice (Oryza sativa L.). Second, the structure, patterns and hierarchy of the multi-omics data are described with the entity-relationship modeling technique. Third, statistical tools frequently used in agricultural analysis are embedded in the database to support breeding. Results: A general-purpose, scalable database called GpemDB, which integrates genomics, phenomics, enviromics and management data, was developed. It is the first database designed to manage all four of these data types together. GpemDB, which comprises a Gpem metadata-level layer and an informative-level layer, provides a visual scheme for displaying the database content and helps users manage, analyze and share breeding data. Conclusions: GpemDB has been successfully applied to a rice population, demonstrating that this database architecture and model are promising as a powerful tool for exploiting big data in high-precision, efficient crop research and breeding.
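The abstract does not publish GpemDB's actual schema. As a minimal sketch only, assuming a central accession (germplasm) entity that links the four data domains named above, the entity-relationship idea can be illustrated with Python's built-in `sqlite3` module. Every table name, column name and function name below (`accession`, `genotype`, `environment`, `management`, `phenotype`, `build_demo_db`, `mean_trait_by_allele_and_site`) is hypothetical, and the records are made-up toy values, not data from the paper.

```python
import sqlite3

# Hypothetical schema (illustrative names, not GpemDB's published design):
# one central accession entity plus one table per data domain in the abstract.
SCHEMA = """
CREATE TABLE accession (
    accession_id TEXT PRIMARY KEY,
    species      TEXT NOT NULL
);
-- Genomics: one row per accession x marker.
CREATE TABLE genotype (
    accession_id TEXT NOT NULL REFERENCES accession(accession_id),
    marker_id    TEXT NOT NULL,
    allele       TEXT NOT NULL
);
-- Enviromics: one row per trial environment (site/season).
CREATE TABLE environment (
    env_id      TEXT PRIMARY KEY,
    site        TEXT NOT NULL,
    mean_temp_c REAL
);
-- Management (assumed here to be keyed to an environment).
CREATE TABLE management (
    env_id   TEXT NOT NULL REFERENCES environment(env_id),
    practice TEXT NOT NULL,
    amount   REAL
);
-- Phenomics: a trait measured on an accession in an environment.
CREATE TABLE phenotype (
    accession_id TEXT NOT NULL REFERENCES accession(accession_id),
    env_id       TEXT NOT NULL REFERENCES environment(env_id),
    trait        TEXT NOT NULL,
    trait_value  REAL NOT NULL
);
"""


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database and load made-up toy records."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO accession VALUES (?, ?)",
                     [("A1", "Oryza sativa"), ("A2", "Oryza sativa")])
    conn.executemany("INSERT INTO genotype VALUES (?, ?, ?)",
                     [("A1", "M1", "G"), ("A2", "M1", "T")])
    conn.execute("INSERT INTO environment VALUES ('E1', 'Shanghai', 25.0)")
    conn.execute("INSERT INTO management VALUES ('E1', 'nitrogen_kg_per_ha', 180.0)")
    conn.executemany("INSERT INTO phenotype VALUES (?, ?, ?, ?)",
                     [("A1", "E1", "plant_height_cm", 100.0),
                      ("A1", "E1", "plant_height_cm", 104.0),
                      ("A2", "E1", "plant_height_cm", 90.0)])
    conn.commit()
    return conn


def mean_trait_by_allele_and_site(conn, marker_id, trait):
    """Cross-domain query joining the genomics, phenomics and enviromics tables."""
    return conn.execute(
        """
        SELECT g.allele, e.site, AVG(p.trait_value)
        FROM genotype AS g
        JOIN phenotype AS p ON p.accession_id = g.accession_id
        JOIN environment AS e ON e.env_id = p.env_id
        WHERE g.marker_id = ? AND p.trait = ?
        GROUP BY g.allele, e.site
        ORDER BY g.allele, e.site
        """,
        (marker_id, trait),
    ).fetchall()


if __name__ == "__main__":
    db = build_demo_db()
    print(mean_trait_by_allele_and_site(db, "M1", "plant_height_cm"))
```

Run as a script, the sketch prints `[('G', 'Shanghai', 102.0), ('T', 'Shanghai', 90.0)]` for the toy records. Linking management records to environments rather than to individual accessions is an assumption made here for brevity; GpemDB's own model may organize these relationships differently.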

Keywords
database
multi-omics
phenomics
metadata-level
informative-level
visualization platform
big data
crop
precise breeding
rice
Figures
Fig. 1.