Comparative analysis of Coronaviridae nucleocapsid and surface glycoprotein sequences
1 Southern University and A and M College, Baton Rouge, LA 70813
2 University of Missouri Dalton Cardiovascular Research Center, Columbia, MO 65211
Send correspondence to: Babu V. Bassa, Department of Environmental Toxicology, 108 Fisher Hall, P.O. Box 9264, Southern University and A&M College, Baton Rouge, LA 70813, Tel: 573-449-7444, Fax: 225-771-5350, E-mail: bbassa9824@gmail.com
Front. Biosci. (Landmark Ed) 2020, 25(10), 1894–1900; https://doi.org/10.2741/4883
Published: 1 June 2020
Abstract

We analyzed the nucleocapsid and surface proteins from several Coronaviridae viruses using an alignment-free computer program. Three isolates of the novel human coronavirus (SARS-CoV-2, 2019), which is responsible for the current pandemic, were examined together with older SARS strains of human and animal coronaviruses. The nucleocapsid and glycoprotein sequences are identical in the three novel 2019 human isolates, and they are closely related to the corresponding sequences in six bat and human SARS coronaviruses. This strongly supports the bat origin of the pandemic novel coronavirus. One surface glycoprotein fragment of 111 amino acids is the largest conserved common permutation in the examined bat SARS-like and human SARS viruses, including the Covid-19 virus. BLAST analysis confirmed that this fragment is conserved only in the human and bat SARS strains. This fragment is likely involved in infectivity and is of interest for vaccine development. Surface glycoprotein and nucleocapsid protein sequence homologies of 58.9% and 82.5%, respectively, between the novel SARS-CoV-2 strains and the human SARS (2018) virus suggest that existing anti-SARS vaccines may provide some protection against the novel coronavirus.

Keywords
Coronavirus
Nucleocapsid Protein
SARS-CoV-2
Sequence Homology
Surface Glycoprotein
2. INTRODUCTION

Viruses belonging to the family Coronaviridae are known to cause severe and acute lung inflammation in humans and other animals (1). Comparative analysis of coronavirus proteins is useful for understanding the origins of these viruses and their relationships to one another, for developing more specific diagnostic tests, and for designing vaccines against the novel coronavirus that is causing widespread morbidity and mortality across the world. We used an alignment-free software program (Compare) developed by one of us (Babu V. Bassa) to compare the surface glycoproteins and nucleocapsid proteins of the coronaviruses. Alignment-free programs are considered superior to alignment-based programs because of known uncertainties associated with the alignment of sequences (2). Our program extracts common amino acid sequences (permutations) of five or more residues from any given pair of proteins. This procedure identifies conserved fragments and provides information on the physical similarities among the primary structures of biological sequences.

3. METHODS

The analysis was done using the unique software tool “Compare”. The algorithm was implemented in Microsoft’s Visual Basic language for the Windows Operating System. An outline of the algorithm is presented in Figure 1. The source code and the raw data will be made available to the Journal for distribution.

Figure 1

This example shows how the common permutation of the two sequences (“mnopqrs”) is identified.
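The idea illustrated in Figure 1 can be sketched in a few lines of Python. This is only an illustrative sketch and not the authors' Visual Basic implementation of "Compare", whose source is not reproduced here. It assumes that a "common permutation" is a contiguous fragment present in both sequences, and it assumes a greedy longest-first strategy with masking so that reported fragments do not overlap. The function name `common_fragments` and the two toy sequences are hypothetical.

```python
from difflib import SequenceMatcher


def common_fragments(seq_a, seq_b, min_len=5):
    """Return non-overlapping fragments shared by seq_a and seq_b.

    Each result is (start_in_a, start_in_b, fragment), found longest first.
    Assumption: greedy longest-first extraction; the real tool may differ.
    """
    a, b = list(seq_a), list(seq_b)
    fragments = []
    while True:
        matcher = SequenceMatcher(None, a, b, autojunk=False)
        match = matcher.find_longest_match(0, len(a), 0, len(b))
        if match.size < min_len:
            break
        fragment = "".join(a[match.a:match.a + match.size])
        fragments.append((match.a, match.b, fragment))
        # Mask the matched region with different sentinels in each sequence
        # so that it cannot be matched again.
        a[match.a:match.a + match.size] = ["\x00"] * match.size
        b[match.b:match.b + match.size] = ["\x01"] * match.size
    return fragments


# Toy sequences (hypothetical) that share the fragment "mnopqrs".
print(common_fragments("abcdmnopqrsxyz", "ghmnopqrstuv"))
```

Masking each matched region keeps the summed fragment lengths from exceeding the sequence length, which matters if the lengths are later used to compute a percentage.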

In keeping with the current terminology, the coronavirus strain of the current outbreak is referred to as “novel coronavirus”, and the strains that caused severe acute respiratory syndrome prior to the 2019 outbreak are referred to as “SARS strains” throughout the manuscript. Similarly, bat SARS viruses are referred to as “bat SARS-like strains”.

The coronavirus sequences used in this analysis were obtained from GenBank. The GenBank accession numbers for all comparisons are given in Table 1. Additionally, a large number of animal coronaviruses were screened and were found to be very distant from the novel coronavirus in terms of sequence homology (described later in this section). The severe acute respiratory syndrome coronavirus 2 isolate 2019-nCoV WHU01, GenBank accession number MN988668 (11-FEB-2020), was obtained as the complete genome (3). The severe acute respiratory syndrome coronavirus 2 isolate Wuhan-Hu-1, GenBank accession number NC_045512.2 (13-MAR-2020), was obtained as the complete genome (4). The SARS coronavirus, GenBank accession number NC_004718 (13-AUG-2018), was obtained as the complete genome (5).

Table 1. Nucleocapsid and surface glycoprotein sequence relationships among various strains of coronaviruses
Reference strain for all comparisons: SARS-CoV-2, accession MT072688

                               Nucleocapsid protein           Surface glycoprotein
Coronavirus                    %Homology  Largest common      %Homology  Largest common      Accession
                                          permutation                    permutation
SARS-CoV-2(1) Feb 2020         100        419                 100        1273                MN988668
SARS-CoV-2(2) Mar 2020         100        419                 100        1273                NC_045512
Bat.RaTG13 Mar 2020            99         177                 99         440                 MN996532
Rhinophol. affins-2014         85.4       99                  59.2       111                 KF569996
SARS.BAT-July2017              83.7       66                  57.9       111                 JX993988
SARS.Human Aug 2018            82.5       43                  58.9       111                 NC_004718
SARS.Bat Dec 2017              82.5       43                  59.4       111                 KY417152
Kenyan Bats Feb 2020           73.3       44                  50.9       97                  KY352407
M.East.Res.Syndrome May 2016   16.6       8                   5.4        14                  KX034100
Avian Aug 2018                 8.2        6                   3.1        8                   NC_001451
Human. Dec.2018                6.5        12                  4.8        7                   NC_003045
Bat.Corona.HKUB Apr 2008       1          4                   2.1        8                   EU420139

Actual fragments are shown in Figures 1-3. Viruses are arranged in decreasing order of homology with the novel coronavirus.

Several hundred whole-genome sequences from the family Coronaviridae are available in GenBank; however, many were found to be repeats of the same strains. In the initial phase of this study we screened multiple combinations (more than 100) of nucleocapsid proteins and surface glycoproteins from various coronaviruses. The three novel coronavirus isolates (SARS-CoV-2) were identical, and they were similar to SARS and bat SARS-like viruses. Based on this initial screening, pairs of viral strains were chosen for the final analyses. Nucleocapsid protein (NCP) and surface glycoprotein (SGP) were selected for comparative analyses because of their known importance in infection and in the immune response. To calculate sequence homologies, the lengths of all common fragments of five or more amino acid residues were summed, and the percentages were computed based on the total sequence lengths. The homology parameter so obtained is a relative index applicable only to this method of calculation.
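The homology index described above amounts to a simple ratio, sketched below in Python. This is a hedged illustration rather than the authors' code: the function name `percent_homology` is hypothetical, and the choice of denominator (here, the length of a single reference sequence passed in by the caller) is an assumption, because the text says only that percentages were based on "the total sequence lengths".

```python
def percent_homology(fragment_lengths, total_length, min_len=5):
    """Percentage of a sequence covered by common fragments.

    fragment_lengths: lengths of the common fragments found for one pair.
    total_length: length of the sequence used as the denominator
                  (assumption: a single reference sequence).
    Fragments shorter than min_len residues are ignored.
    """
    if total_length <= 0:
        raise ValueError("total_length must be positive")
    covered = sum(n for n in fragment_lengths if n >= min_len)
    return 100.0 * covered / total_length


# One 419-residue fragment covering a 419-residue protein, as reported in
# Table 1 for the nucleocapsid protein of identical SARS-CoV-2 isolates.
print(percent_homology([419], 419))

# Hypothetical fragment lengths; the 4-residue fragment is below the cutoff.
print(percent_homology([43, 20, 4], 100))
```

Because the index depends on the five-residue cutoff and on the chosen denominator, values computed this way are comparable only with other values computed the same way, which is the caveat the text itself raises.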

4. RESULTS

Protein fragments from the three isolates of the novel coronavirus (SARS-CoV-2) are identical, and they have high degrees of sequence homology with SARS and bat SARS-like strains reported prior to the current outbreak (Table 1). The latest reported bat strain, Bat.RaTG13-Mar-2020 (6), has 99% sequence homology with the novel coronavirus with respect to both SGP and NCP. The Kenyan bat coronavirus genome that was deposited in 2016 by the Centers for Disease Control laboratory, Atlanta, Georgia (7), has significant homology with the novel coronavirus with respect to both NCP and SGP (Table 1). The largest common permutation conserved in the SGP of the novel coronavirus, SARS strains, and bat SARS-like coronaviruses is 111 residues long. It is part of the 440-residue fragment of Bat.RaTG13 (Table 1). Therefore, the 111-residue fragment is present in the novel coronavirus, in at least one human SARS strain of coronavirus, and in at least one bat SARS-like strain of coronavirus (Table 1). Based on this observation, we subjected the 111-residue fragment to a BLAST search and found that it is preserved only in SARS viruses (data not presented). The 111-residue SGP motif is absent in avian, MERS, some human, and some bat strains of coronaviruses. The NCP in coronaviruses is only 419 amino acids long. As shown in Figure 3 and Table 1, there are several polypeptide motifs originating from this protein that are common to the novel coronavirus, SARS, and bat SARS-like strains of coronavirus. A compilation of common polypeptide motifs is presented in Figure 1, Figure 2, and Figure 3. These polypeptide motifs will be useful as detection tools in studying the origins of the novel coronavirus. They will also be helpful in designing vaccine candidates.

Figure 2

For each numbered motif, the size is given in parentheses and the location of the fragment in the molecule is indicated by the underlined residue number.

Figure 3

For each numbered motif, the size is given in parentheses and the location of the fragment in the molecule is indicated by the underlined residue number.

Figure 4

For each numbered motif, the size is given in parentheses and the location of the fragment in the molecule is indicated by the underlined residue number.

5. DISCUSSION

Unlike alignment-based sequence comparison programs, our software tool compares sequences by identifying and profiling the common permutations between given pairs of biological sequences. The resulting picture is easily understood and interpreted, and it avoids some of the uncertainties associated with alignment-based sequence comparison programs. The size of the largest common permutation is an easily understood parameter of the relationship between the sequences in a pair. The program and its applications are more fully described in the Methods section and in Figure 1. The validity of the program was established through use: it has reproduced results obtained by other programs with GenBank data.

The nucleocapsid protein and surface glycoprotein of the SARS coronaviruses (Figure 5) are integral parts of the virus structure. They can be identical or can have varying degrees of similarity (homology) among viruses in this group (Table 1). As complex molecules on the virus surface, they are responsible for differences in host range, infectivity and pathogenicity.

Figure 5

Structural depiction of a coronavirus (Source: Drazen JM). The spike glycoprotein (surface glycoprotein) and nucleocapsid protein sequences were analyzed in the current study. Our analysis revealed that one 111-residue fragment of the surface glycoprotein is conserved in many lethal strains of human and bat coronaviruses.

The data presented strongly support a very close relationship between some bat coronaviruses and the novel human coronaviruses that are causing much morbidity and mortality across the world. The statistical probability that so many common permutations, in two different proteins, would occur between any two strains of viruses purely by chance is infinitesimally small. The similarity between bat and human strains is, however, disputed by some scientists (8). With regard to this dispute and the natural selection hypothesis, we strongly disagree with the idea of using biological activities to determine the origins of viruses or any other species. We prefer, and have presented, physical evidence. Regardless of the dispute on the origins of the highly lethal virus strains, the high degree of homology (a physical characteristic) raises the theoretical possibility that vaccines against pre-2019 SARS strains will provide some cross-protection against the novel coronavirus strains. The comparative data between current and past strains of coronaviruses specifically establish an approach for interim vaccine development. Avian, bovine, equine, canine, feline, calf-giraffe, rabbit, water deer, and some strains of human and bat coronaviruses have very low sequence homology with novel coronaviruses as analyzed by “Compare” (data not shown).
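The statement about chance can be made concrete with a rough bound. This is an illustrative estimate, not a calculation from the study, and it assumes a deliberately crude null model in which residues are independent and uniformly distributed over the 20 amino acids. Under that assumption, the expected number of identical fragments of length k shared by two unrelated sequences of lengths m and n is bounded by the number of position pairs times the probability that k residues match:

```latex
\[
E \;\le\; (m-k+1)(n-k+1)\,20^{-k} \;\le\; mn\,20^{-k}
\]
% Example with the surface glycoprotein values from Table 1:
% m = n = 1273 residues, k = 111 residues
\[
mn \approx 1.6\times10^{6}, \qquad 20^{-111} \approx 10^{-144.4},
\qquad E \;\lesssim\; 10^{-138}
\]
```

Real protein sequences are neither uniform nor independent, so only the order of magnitude is meaningful. The estimate indicates that a shared 111-residue fragment is not expected by chance and is instead consistent with a relationship between the sequences.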

In conclusion, our data strongly support a close relationship among the bat SARS-like strains, the human SARS virus (2018 strain), and the novel coronavirus. The identified protein fragments are highly conserved in the lethal and highly contagious SARS strains of the viruses, including both the older and the most recent ones. These proteins are essential to the virulence, lethality, and infectivity of the viruses. They will be useful in designing vaccines and improved diagnostic tests, and in understanding the nature of infection by these viruses and their potential future mutations.

Abbreviations
NCP: Nucleocapsid protein, SGP: Surface glycoprotein, MERS: Middle East Respiratory Syndrome.

References
[1]
Deng SQ, Peng HJ. Characteristics of and public health responses to the coronavirus disease 2019 outbreak in China. J Clin Med 9(2), E575 (2020). DOI: 10.3390/jcm9020575 PMid:32093211 PMCid:PMC7074453
[2]
Vinga S, Almeida J. Alignment-free sequence comparison-a review. Bioinformatics 19, 513-23 (2003). DOI: 10.1093/bioinformatics/btg005 PMid:12611807
[3]
Chen L, Liu W, Zhang Q, Xu K, Ye G, Wu W, Sun Z, Liu F, Wu K, Zhong B, Mei Y, Zhang W, Chen Y, Li Y, Shi M, Lan K, Liu Y. RNA based mNGS approach identifies a novel human coronavirus from two pneumonia cases in 2019 Wuhan outbreak. Emerg Microbes Infect 9, 313-319 (2020). DOI: 10.1080/22221751.2020.1725399 PMid:32020836 PMCid:PMC7033720
[4]
Wu F, Zhao S, Yu B, Chen YM, Wang W, Hu Y, Song ZG, Tao ZW, Tian JH, Pei YY, Yuan ML, Zhang YL, Dai FH, Liu Y, Wang QM, Zheng JJ, Xu L, Holmes EC, Zhang YZ. A novel coronavirus associated with a respiratory disease in Wuhan of Hubei province, China. National Center for Biotechnology Information. Submitted (17-JAN-2020)
[5]
Snijder EJ, Bredenbeek PJ, Dobbe JC, Thiel V, Ziebuhr J, Poon LL, Guan Y, Rozanov M, Spaan WJ, Gorbalenya AE. Biochem Biophys Res Commun 316, 476-483 (2004)
[6]
Zhou P, Yang XL, Wang XG, Hu B, Zhang L, Si HR, Zhu Y, Li B, Huang CL, Chen HD, Chen J, Luo Y, Guo H, Jiang RD, Liu MQ, Chen Y, Shen XR, Wang X, Zhen XS, Zhao K, Chen QJ, Deng F, Liu LL, Yang B, Zhang FX, Wang YY, Xiao GF, Shi ZL. A pneumonia outbreak associated with a new coronavirus of probable bat origin. Nature 579, 270-273 (2020). DOI: 10.1038/s41586-020-2012-7 PMid:32015507 PMCid:PMC7095418
[7]
Tao Y, Tong S. Complete genome sequence of a severe acute respiratory syndrome-related coronavirus from Kenyan bats. Microbiol Resour Announc 8, e00548-19 (2019). DOI: 10.1128/MRA.00548-19 PMid:31296683 PMCid:PMC6624766