Site-Specific Profiling of N-Glycans in Drosophila melanogaster

Background: Drosophila melanogaster is a well-studied and highly tractable genetic model system for deciphering the molecular mechanisms underlying various biological processes. Although glycosylation is one of the most critical post-translational modifications of proteins, our understanding of it in Drosophila still lags behind that in other model organisms. Methods: In this study, we systematically investigated the site-specific N-glycan profile of Drosophila melanogaster using intact glycopeptide analysis. This approach identified the glycans, proteins, and glycosites in Drosophila, together with site-specific glycosylation information showing which glycans are attached to which glycosylation sites. Results: The majority of N-glycans in Drosophila were of the high-mannose type (69.3%), consistent with reports in other insects. Fucosylated N-glycans were also highly abundant (22.7%), and most of them were mono-fucosylated. In addition, 24 different sialylated glycans attached to 16 glycoproteins were identified, and these proteins were mainly associated with developmental processes. Gene ontology analysis showed that N-glycosylated proteins in Drosophila were involved in multiple biological processes, such as axon guidance, N-linked glycosylation, cell migration, cell spreading, and tissue development. Interestingly, we found that seven glycosyltransferases and four glycosidases were N-glycosylated, which suggests that N-glycans may play a regulatory role in the synthesis and degradation of N-glycans and glycoproteins. Conclusions: To our knowledge, this work represents the first comprehensive analysis of site-specific N-glycosylation in Drosophila, thereby providing new perspectives for understanding the biological functions of glycosylation in insects.


Introduction
Glycosylation is one of the most important post-translational modifications of proteins. It is found in all eukaryotes, ranging from single-celled microorganisms like yeast to mammals like humans [1]. The diversity of glycan types and structures has expanded during the evolution of eukaryotes, likely due to the need for increased molecular cues and regulation [2]. Despite their diverse structures, glycans often share certain features, such as common core structures and terminal modifications. Protein glycosylation exists in two major forms, N-glycans and O-glycans [3]. N-glycans are covalently attached to proteins at asparagine (Asn) residues by an N-glycosidic bond, while O-glycans are normally linked to Ser/Thr residues by an O-glycosidic bond. So far, N-glycosylation and O-glycosylation are the most studied types of glycosylation in the field of glycobiology.
The analysis of protein glycosylation relies mostly on glycomic and glycoproteomic approaches. Glycomics studies the whole glycome of a cell by releasing the glycans from glycoproteins and analyzing them with mass spectrometry [4]. In contrast, glycoproteomics focuses on the proteins that carry the glycans, aiming to map their glycosites. The combination of glycomics and glycoproteomics has contributed greatly to the analysis of glycosylation in a variety of organisms, in particular humans [5]. In recent years, the development of mass spectrometry technologies and bioinformatics tools has enabled the direct analysis of intact glycopeptides and glycoproteins [6][7][8]. This strategy not only identifies the glycans, proteins, and glycosites but also yields site-specific glycosylation information, which allows glycans to be assigned to their respective glycosylation sites. However, applications of intact glycopeptide analysis are still focused mostly on mammals, in particular humans and mice, because of their importance for understanding the biological roles of glycosylation in diseases such as cancers. Intact glycopeptide analysis in invertebrates has yet to be explored to the same extent.
Drosophila melanogaster, also known as the fruit fly, is a well-studied and highly tractable genetic model system used to decipher the molecular mechanisms underlying various biological processes. Presently, most of the data on insect glycosylation come from research in Drosophila [9][10][11]. To better study the functions of glycosylation in Drosophila, it is critical to understand the diversity of glycans and their expression. Analysis of glycosylation has shown that the N-glycan profile of Drosophila changes as development proceeds, indicating possible roles for certain glycan structures during different stages of development [12]. Vandenborre et al. [13] investigated the glycosylation of Drosophila with lectin affinity chromatography and characterized the expression of hybrid and complex types of glycans. Despite these efforts, the information obtained from previous studies was very limited due to the technical constraints of the mass spectrometry methods used. With the development of intact glycopeptide analysis, comprehensive profiling of glycosylation at the site-specific level in Drosophila has finally become feasible.
Here, the expression of site-specific N-glycans in Drosophila is profiled with cutting-edge glycoproteomic approaches. The application of intact glycopeptide analysis enabled the identification of N-glycans with different structures, as well as their modifications, glycosites, and glycoproteins. This work provides important data for understanding the expression and biological functions of site-specific N-glycans in Drosophila.

Insects and Protein Extraction
Drosophila melanogaster was maintained on a corn meal-based diet under standard conditions of 23-25 °C, 65-70% relative humidity, and a 16:8 (light/dark) photoperiod. Homogenates of around 50 mg adult Drosophila melanogaster (approximately 30-50 insects, 5 days post-emergence) were prepared in 8 M urea/1 M NH4HCO3 (Sigma-Aldrich, St Louis, MO, USA) solution using a tissue homogenizer. The extract was sonicated until the upper solution was clear, and the supernatant was harvested as total protein after centrifugation at 15,000 × g for 15 min. The protein concentration was determined with BCA protein assay reagent (Beyotime, Beijing, China).

Protein Digestion
The sample was reduced with 5 mM DTT (Sigma-Aldrich) for 1 h at room temperature before alkylation with 20 mM iodoacetamide (Sigma-Aldrich) at room temperature in the dark for 30 min. After a 5-fold dilution with deionized water, proteins were digested with sequencing-grade trypsin (Promega, Madison, WI, USA; protein:enzyme, 50:1, w/w) at 37 °C overnight with shaking. The digested sample was adjusted to pH <3 with 10% trifluoroacetic acid (TFA; Sigma-Aldrich) and centrifuged at 13,000 × g for 10 min. The supernatant was purified with Sep-Pak C18 cartridges (Waters, Milford, MA, USA), and peptides were eluted in 60% acetonitrile (ACN)/0.1% TFA (Sigma-Aldrich). The peptide concentration was measured with BCA protein assay reagent.

Glycopeptide Enrichment
Glycopeptide enrichment was performed on Oasis MAX cartridges (Waters) using 200 µg of digested peptides from Drosophila tissues for glycoproteomic analysis. First, the Oasis MAX cartridge was sequentially equilibrated with ACN, 100 mM triethylammonium acetate (Sigma-Aldrich), ddH2O, and 95% ACN/1% TFA. Then, the desalted peptides were dried, reconstituted in 95% ACN/1% TFA, and loaded onto the cartridge twice. The cartridge was washed three times with 95% ACN/1% TFA. The glycopeptides were eluted with 600 µL of 50% ACN/0.1% formic acid (FA) (Sigma-Aldrich). Finally, the glycopeptides were dried and resuspended in 20 µL of 0.1% FA. One microgram of glycopeptides was used for each liquid chromatography-tandem mass spectrometry (LC-MS/MS) analysis.

Mass Spectrometry Analysis
The enriched glycopeptides were analyzed on an Easy-nanoLC 1200 system coupled to an Orbitrap Fusion Lumos mass spectrometer (Thermo Fisher Scientific, Waltham, MA, USA). Peptides were separated on a 75 µm × 25 cm Acclaim PepMap100 separating column protected by a 2 cm guard column. The mobile phase consisted of 0.1% FA in water (A) and 0.1% FA in 80% ACN (B) at a flow rate of 300 nL/min. The LC gradient profile was set as follows: 1-7% B for 3 min, 7-40% B for 107 min, 40-68% B for 3 min, 68-99% B for 4 min, and 99% B for 3 min. All experiments were performed in data-dependent acquisition (DDA) mode with the top 10 ions isolated at a window of 0.7 m/z. Only precursors with charge states ranging from +2 to +7 were considered for MS/MS events. Full MS scans were acquired in the Orbitrap mass analyzer over m/z 700-2000 at a resolution of 120,000 with an automatic gain control (AGC) target of 5 × 10^5, followed by data-dependent higher-energy collision dissociation (HCD) MS/MS (at a resolution of 15,000) with two HCD energies (20% and 33%). The MS2 analysis was performed only when two oxonium ions (m/z = 138.055 and 204.087) were detected. A dynamic exclusion time of 5 s was used to discriminate against previously selected ions. Three LC-MS/MS replicates were performed.
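The oxonium-ion condition described above can be illustrated with a minimal sketch. Only the two m/z values and the requirement that both ions be present come from the text; the function names, the example peak lists, and the 20 ppm matching tolerance are assumptions made for illustration and do not describe the instrument's actual trigger settings.

```python
# Diagnostic oxonium ions named in the acquisition method.
OXONIUM_MZ = (138.055, 204.087)


def within_ppm(observed, theoretical, tol_ppm):
    """Return True if the observed m/z lies within tol_ppm of the theoretical m/z."""
    return abs(observed - theoretical) / theoretical * 1e6 <= tol_ppm


def has_oxonium_ions(peaks_mz, tol_ppm=20.0):
    """Return True only if every diagnostic ion matches at least one peak.

    tol_ppm is an assumed tolerance, not a value taken from the method.
    """
    return all(
        any(within_ppm(mz, target, tol_ppm) for mz in peaks_mz)
        for target in OXONIUM_MZ
    )


# Hypothetical fragment peak lists (m/z values only).
glyco_scan = [126.055, 138.0551, 204.0868, 366.139]
plain_scan = [129.102, 147.113, 175.119, 204.0868]

print(has_oxonium_ions(glyco_scan))  # True: both diagnostic ions are present
print(has_oxonium_ions(plain_scan))  # False: the 138.055 ion is missing
```

Requiring both ions rather than either one reduces the chance that an unrelated fragment near a single diagnostic mass is mistaken for glycan evidence.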

Database Search
All LC-MS/MS data for intact glycopeptide analysis were searched using Glyco-Decipher software v1.0 [8]. The parameters were set as follows: precursor mass tolerance of 5 ppm; fragment mass tolerance of 20 ppm; full trypsin digestion with a maximum of three missed cleavages; carbamidomethylation at C as a fixed modification; and oxidation at M as a variable modification. The Drosophila melanogaster reference proteome database (Proteome ID UP000000803 from UniProt.org) was used. The results were filtered at 1% FDR at both the peptide and glycan levels.
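To make the digestion setting concrete, the sketch below enumerates fully tryptic peptides with a bounded number of missed cleavages. It is not how Glyco-Decipher is implemented. The cleavage rule (after K or R, but not before P) is a commonly used convention assumed here, the N-X-S/T sequon filter (X not proline) is the canonical N-glycosylation motif rather than something stated in the text, and the toy sequence is invented apart from the peptide INTTLLK.

```python
import re


def tryptic_peptides(sequence, max_missed=3):
    """List fully tryptic peptides with up to max_missed missed cleavages."""
    # Assumed rule: cleave C-terminal to K or R unless the next residue is P.
    fragments = [f for f in re.split(r"(?<=[KR])(?!P)", sequence) if f]
    peptides = []
    for start in range(len(fragments)):
        for missed in range(max_missed + 1):
            end = start + missed + 1
            if end > len(fragments):
                break
            peptides.append("".join(fragments[start:end]))
    return peptides


def has_sequon(peptide):
    """Return True if the peptide contains an N-X-S/T motif with X not proline."""
    return re.search(r"N[^P][ST]", peptide) is not None


# Invented toy sequence that embeds the peptide INTTLLK.
toy_protein = "MKINTTLLKSAR"
candidates = [p for p in tryptic_peptides(toy_protein, max_missed=1) if has_sequon(p)]
print(candidates)
```

Each additional allowed missed cleavage enlarges the candidate list, which is one reason the search space grows quickly when this parameter is raised.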

Gene Ontology and Pathway Analysis
The Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analyses were performed using DAVID Bioinformatics Resources (https://david.ncifcrf.gov/home.jsp).

Overall Profile of N-Glycosylation in Drosophila
N-glycan profiles in Drosophila were determined using intact glycopeptide analysis. The analysis of three LC-MS/MS runs identified a total of 657 unique intact glycopeptides, which consisted of 67 different glycans attached to 274 glycosites from 166 proteins (Fig. 1a, Supplementary Table 1). Based on previous studies [14], we classified N-glycans from Drosophila into six categories: pauci-mannose, high-mannose, complex, fucosylated, and sialylated N-glycans, as well as N-glycans with both fucosylation and sialylation (Fig. 1b). In terms of unique glycans, around two-thirds of the N-glycans (45 of 67) were fucosylated or sialylated, consisting of 21 fucosylated N-glycans, 15 sialylated N-glycans, and 9 N-glycans with both fucosylation and sialylation. This contrasts sharply with their abundance: together they accounted for only 26.5% of all glycans according to the identified peptide-spectrum matches (PSMs). These results demonstrate the high structural diversity of fucosylated and sialylated N-glycans in Drosophila. In contrast, only ten of the 67 unique N-glycans in Drosophila were pauci-mannose or high-mannose glycans, yet they modified 69.3% of all glycopeptides despite this limited structural diversity. Fig. 1c shows the top ten most expressed N-glycans according to PSM numbers in the LC-MS/MS analysis. Seven of the top ten glycans were high-mannose glycans, confirming their high abundance in Drosophila. In addition, one pauci-mannose and two fucosylated N-glycans were also highly expressed.
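The six-way grouping described above could be sketched as a rule-based classifier over composition strings. The category names follow the text, but the decision rules are assumptions inferred from the examples given elsewhere in the article (Hex3HexNAc2 treated as pauci-mannose, Hex4HexNAc2 to Hex10HexNAc2 as high-mannose); the authors' exact criteria may differ.

```python
import re


def parse_composition(name):
    """Turn 'Hex5HexNAc4NeuAc2' into {'Hex': 5, 'HexNAc': 4, 'NeuAc': 2}."""
    return {unit: int(count) for unit, count in re.findall(r"([A-Za-z]+)(\d+)", name)}


def classify(name):
    """Assign a glycan composition to one of the six categories (assumed rules)."""
    comp = parse_composition(name)
    fuc = comp.get("Fuc", 0)
    sia = comp.get("NeuAc", 0)
    if fuc and sia:
        return "fucosylated & sialylated"
    if sia:
        return "sialylated"
    if fuc:
        return "fucosylated"
    if comp.get("HexNAc", 0) == 2:
        # Assumed threshold: up to three hexoses counts as pauci-mannose.
        return "pauci-mannose" if comp.get("Hex", 0) <= 3 else "high-mannose"
    return "complex"


for glycan in ["Hex3HexNAc2", "Hex5HexNAc2", "Hex3HexNAc2Fuc1", "Hex5HexNAc4NeuAc2"]:
    print(glycan, "->", classify(glycan))

# Share of unique glycans that are fucosylated and/or sialylated (counts from the text).
diverse = 21 + 15 + 9
print(diverse, round(diverse / 67, 3))  # 45 0.672
```

Checking the modification-based categories before the mannose-based ones means a core-fucosylated glycan such as Hex3HexNAc2Fuc1 is reported as fucosylated rather than pauci-mannose, which matches how FucT6's glycans are described later in the article.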

High-Mannose and Fucosylated N-Glycans are Predominant in N-Glycome of Drosophila
Since high-mannose and fucosylated glycans were predominant in the glycome of Drosophila, we further investigated the expression of their individual glycan structures. In high-mannose glycans, the number of hexoses ranged from four (Hex4HexNAc2) to as many as ten (Hex10HexNAc2), with vastly different expression levels (Fig. 2a). The expression level of Hex5HexNAc2 was the highest, modifying around 20% of all glycopeptides, while that of Hex4HexNAc2 was the lowest, identified in less than 1% of all glycopeptides. Among fucosylated glycans, the majority were mono-fucosylated (98%), with only 2% being di-fucosylated (Fig. 2b).
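As a rough illustration of how the high-mannose series above differs in mass, the sketch below sums monosaccharide residue masses for each composition. The residue masses are standard monoisotopic values supplied here as an assumption (they are not given in the article), and the function name is hypothetical. The result is the mass the glycan adds to a peptide, not the mass of the free glycan.

```python
import re

# Standard monoisotopic residue masses in Da (assumed, not taken from the article).
RESIDUE_MASS = {
    "Hex": 162.052824,
    "HexNAc": 203.079373,
    "Fuc": 146.057909,
    "NeuAc": 291.095417,
}


def glycan_residue_mass(name):
    """Sum the residue masses for a composition string such as 'Hex5HexNAc2'."""
    return sum(
        RESIDUE_MASS[unit] * int(count)
        for unit, count in re.findall(r"([A-Za-z]+)(\d+)", name)
    )


# High-mannose series reported in the text: four to ten hexoses on a HexNAc2 core.
for hexoses in range(4, 11):
    name = f"Hex{hexoses}HexNAc2"
    print(name, round(glycan_residue_mass(name), 4))
```

Consecutive members of the series differ by one hexose residue, so their glycopeptides are spaced about 162.05 Da apart for the same peptide backbone.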
To further understand their biological functions, we performed gene ontology analysis on proteins that were modified with high-mannose and fucosylated glycans (Fig. 2c). In the biological process category, GO terms shared by proteins with high-mannose and fucosylated glycans included axon guidance, substrate adhesion-dependent cell spreading, positive regulation of cell migration, cell adhesion mediated by integrin, tissue development, and organ morphogenesis, which suggests that these proteins may be closely associated with the development of the fruit fly. Previous studies have also shown that N-glycosylation may influence the activities of some of the development-related proteins we identified [15,16]. For example, N-glycosylation of laminin 332 was reported to regulate its biological activities [17]. Further, N-glycosylation of DSCAM1 was critical for its dimer formation and hence could modulate its biological functions [18]. Notably, proteins with high-mannose glycans were more closely associated with GO terms such as protein N-glycosylation via asparagine, basement membrane assembly and organization, and lymph gland development, while proteins with fucosylated glycans participated more in cell adhesion. In addition, KEGG pathway analysis suggested that proteins with high-mannose glycans were associated with N-glycan biosynthesis processes, revealing pathways such as N-glycan biosynthesis, various types of N-glycan biosynthesis, and protein processing in endoplasmic reticulum.

Sialylated Glycans in Drosophila and Their Validation
The occurrence of sialic acids in the fruit fly was reported three decades ago. Studies have shown that sialylation in Drosophila is probably restricted to the nervous system and is tightly regulated by spatially and temporally restricted expression of the Drosophila sialyltransferase, DsiaT, throughout development [19,20]. Since then, very few studies have mapped the sialylation of proteins in Drosophila. In our results, a total of 38 intact glycopeptides with sialylated glycans were identified, which consisted of 24 different sialylated glycans attached to 16 glycoproteins (Supplementary Table 1). Fig. 3a shows the expression of all sialylated glycans with at least two PSMs. In terms of sialic acid numbers, the expression of glycans with two sialic acids was the highest, followed by glycans with a single sialic acid and glycans with three sialic acids. In addition, a few glycans containing four sialic acids were also identified (Fig. 3b).
To validate the expression of sialylated glycans, we manually checked the LC-MS/MS spectra of the glycan Hex5HexNAc4NeuAc2, which was the most expressed sialylated glycan in Drosophila (Fig. 3c). According to the spectra, Hex5HexNAc4NeuAc2 had two possible structures, both of which carry two terminal sialic acids. This is similar to a previous study on the developmental elaboration of N-glycans in the Drosophila embryo [12]. Since insects generally do not express complex-type N-glycans [9], the first structure was more likely to be accurate. Furthermore, we checked the tissue specificity of proteins with sialylated glycans. Several proteins were found to be expressed in tissues such as the embryo and brain. For example, Magi, modified mostly by Hex5HexNAc4NeuAc2, was specifically expressed in neural cells and cleavage-stage embryos. These results were consistent with previous reports and support the reliability of our intact glycopeptide identifications [20].

Functional Annotation of Glycosylated Proteins
To understand the overall biological significance of glycosylation in Drosophila, we performed gene ontology analysis using all identified glycosylated proteins. Glycosylated proteins were involved mostly in the biological processes of axon guidance, protein N-glycosylation via asparagine, positive regulation of cell migration, substrate adhesion-dependent cell spreading, and basement membrane assembly (Fig. 4a). The cellular component analysis showed that the majority of glycosylated proteins were localized in the membrane system, including the endomembrane system, integral component of membrane, plasma membrane, basement membrane, and endoplasmic reticulum (Fig. 4b). In terms of molecular function, these proteins were involved in calcium ion binding, transmembrane receptor protein tyrosine kinase activity, lipid transporter activity, metalloexopeptidase activity, and receptor binding (Fig. 4c). Altogether, these results suggest that glycosylated proteins in Drosophila may play a crucial role in the N-glycosylation process itself. Subsequently, we analyzed the KEGG pathways of the glycosylated proteins identified in Drosophila (Fig. 4d). KEGG analysis revealed that glycosylated proteins participated in pathways such as protein processing in endoplasmic reticulum, N-glycan biosynthesis, and various types of N-glycan biosynthesis, all of which are closely associated with the N-glycosylation process and consistent with our GO observations.

Glycosylation of Glyco-Related Enzymes
To explore the significance of glycosylated proteins in the N-glycosylation process, we analyzed the glycosylation profiles of glyco-related enzymes. In Drosophila, a total of six proteins involved in N-glycan biosynthesis were found to be glycosylated, comprising five glycosyltransferases and one glycosidase (Table 1). Among the five glycosyltransferases, four were subunits of the oligosaccharyltransferase (OST) complex, which transfers the N-glycan precursor (GlcNAc2Man9Glc3) from dolichol to a polypeptide and is one of the most essential enzymes in the N-glycosylation pathway (Fig. 5a). These four glyco-related enzymes were modified with high-mannose glycans, including Hex7HexNAc2, Hex8HexNAc2, Hex9HexNAc2, and Hex10HexNAc2. The other glycosyltransferase, FucT6, an alpha-1,6-fucosyltransferase that adds fucose to the common core structure of N-glycans, was modified by two fucosylated glycans, Hex3HexNAc2Fuc1 and Hex3HexNAc3Fuc1. Finally, the only glycosidase, Glucosidase 2 alpha subunit, which trims the terminal glucose residues from the N-glycan GlcNAc2Man9Glc3, was modified by the high-mannose glycans Hex6HexNAc2, Hex7HexNAc2, Hex8HexNAc2, and Hex9HexNAc2. Furthermore, we found that three glyco-related enzymes involved in the degradation of glycoproteins were modified by pauci-mannose and high-mannose glycans (Table 1, Fig. 5b). Uggt, a glucosyltransferase in the endoplasmic reticulum (ER) that recycles glycoproteins possessing unfolded domains, was modified by Hex8HexNAc2 and Hex9HexNAc2. The other two glyco-related enzymes, Edem1 and Edem2, are both alpha-1,2-mannosidases involved in the ER-associated degradation of incorrectly folded glycoproteins. These two proteins were modified by Hex3HexNAc2, Hex6HexNAc2, and Hex7HexNAc2.

Discussion
With the development of mass spectrometry-based glycomic and glycoproteomic approaches, especially intact glycopeptide analysis tools, our understanding of the diversity of N-glycans has improved greatly. A growing number of software solutions now enable the comprehensive analysis of glycosylation events, which involves determining the peptide sequence, the site of glycosylation, and the identity of the attached glycans. However, most tools for glycosylation analysis are better suited to mammals, such as humans and mice, which are closely associated with the study of human health and disease. The application of cutting-edge technologies to invertebrates such as insects often lags behind because most software tools for glycopeptide annotation rely on a glycan database, which is still poorly established for insects compared with mammals. For example, the N-glycan databases in Byonic, a well-known intact glycopeptide analysis tool, include a human glycan database of 182 glycans and a mammalian glycan database of 309 glycans, yet there is no glycan database for insects [21]. Many software tools have similar issues when applied to insects. For instance, the current glycan branch database in StrucGP is built on human and mouse glycomes and may not be appropriate for the analysis of insect N-glycans. A recently developed tool, Glyco-Decipher, uses a glycan database-independent peptide matching approach and implements a glycan database containing 10,936 entries corresponding to 1766 unique glycan compositions extracted from GlyTouCan [8,22]. Glyco-Decipher is reported to perform better than other software for unusual glycans or poor peptide backbone fragmentation, which makes it well suited to the glycosylation analysis of insects. Therefore, Glyco-Decipher was chosen for the glycosylation analysis of Drosophila in this study.
In recent years, the characterization of glycosylation in insects has been expanding rapidly, in particular for Drosophila melanogaster, which has been a widely used model organism for several decades [12,23]. In general, the initial processing of N-glycans within the endoplasmic reticulum in insects is similar to that in most eukaryotes, whereas the processing in the Golgi apparatus diverges significantly from that in mammalian species such as humans. Studies have shown that insects express mostly pauci-mannose or oligo-mannose glycans. Our results showed that 69.3% of all glycopeptides in the fruit fly were modified with high-mannose glycans, consistent with previous reports. Furthermore, 22.7% of N-glycans were found to be fucosylated in the fruit fly. In addition, we identified 16 glycoproteins that were modified with sialylated glycans, which were mostly involved in the development of Drosophila. Finally, we also identified a minority of glycans (0.6% of PSMs) with both fucosylation and sialylation, which have not been reported previously. The presence of these glycans was not verified in this work and requires further studies to validate.
Owing to the intact glycopeptide analysis, we further discovered that several glyco-related enzymes were N-glycosylated in Drosophila. These include both glycosyltransferases and glycosidases, which are involved in N-glycan biosynthesis as well as N-glycoprotein degradation. One of the glycosyltransferases, encoded by Stt3A, was also identified in previous work [11]. In addition, we observed that the N-glycans vary among different glyco-related enzymes, which is likely related to their catalytic activities. Accumulating evidence has shown that N-glycans may influence the processing and functions of glycosyltransferases, including their secretion, stability, and substrate/acceptor specificity, which may have a profound impact on glycosyltransferase activity [24]. Our results suggest that Drosophila N-glycans may have a regulatory role in the biosynthesis of N-glycans, as well as in the degradation of N-glycoproteins. Further studies will be required to explore the relationship between glyco-related enzymes and their N-glycan modifications.

Conclusions
This work systematically analyzed the expression of site-specific N-glycans in the classic model organism Drosophila melanogaster using intact glycopeptide analysis coupled with Glyco-Decipher. To our knowledge, this work represents the first comprehensive analysis of site-specific N-glycosylation in Drosophila, and it should shed light on the biological roles of glycosylation in insects.

Fig. 1. Overall profile of site-specific N-glycans in Drosophila. (a) Heatmap of identified N-glycans on intact glycopeptides from Drosophila. The heatmap shows the peptide-spectrum matches (PSMs) of the intact glycopeptides, comprising different glycans (upper) and their glycosite locations in different glycoproteins (left). Glycans were classified into six categories: pauci-mannose, high-mannose, complex, fucosylated, and sialylated N-glycans, and N-glycans with both fucosylation and sialylation. Glycosites were sorted into five groups according to the types of glycan modifications: G1: pauci-mannose; G2: pauci-mannose and high-mannose; G3: pauci-mannose, high-mannose, and complex; G4: pauci-mannose, high-mannose, complex, and sialylated glycans; G5: all types of glycans. (b) Expression of different glycan types. The inner circle represents the numbers of unique intact glycopeptides, while the outer circle shows the PSM numbers of intact glycopeptides. The classification of glycans indicated by color is the same as in Fig. 1a. (c) Top ten N-glycans and their possible structures identified in Drosophila.

Fig. 2. Structural and functional analysis of high-mannose and fucosylated N-glycans in Drosophila. (a) Hexose numbers of high-mannose glycans in Drosophila. (b) Distribution of mono-fucosylated and di-fucosylated glycans in Drosophila. (c) Biological processes of proteins with high-mannose and fucosylated N-glycans in Drosophila. (d) KEGG pathways of proteins with high-mannose glycans in Drosophila.

Fig. 3. Sialylated N-glycans in Drosophila. (a) Relative expression of different sialylated glycans in Drosophila according to PSM numbers. (b) Expression of sialylated glycans with one, two, three, and four sialic acids. The left axis shows the number of unique intact glycopeptides (IGPs) and the right axis shows the PSM numbers. (c) Representative spectra of the intact glycopeptide INTTLLK+Hex5HexNAc4NeuAc2 from Magi (accession Q9W2L2) in Drosophila.

Fig. 5. Glycosylation of glyco-related enzymes. (a) Glycosylation (left) and functions (right) of enzymes in N-glycan biosynthesis. (b) Glycosylation (left) and functions (right) of enzymes in N-glycoprotein degradation. The colors in the heatmap represent the PSMs of intact glycopeptides identified by liquid chromatography-tandem mass spectrometry (LC-MS/MS).