In Silico Optimization of SARS-CoV-2 Spike Specific Nanobodies

¹ Warshel Institute for Computational Biology, School of Life and Health Sciences, School of Medicine, The Chinese University of Hong Kong, 518172 Shenzhen, Guangdong, China

² School of Chemistry and Materials Science, University of Science and Technology of China, 230026 Hefei, Anhui, China

³ Chenzhu Biotechnology Co., Ltd., 310005 Hangzhou, Zhejiang, China

*Correspondence: baichen@cuhk.edu.cn (Chen Bai)
†These authors contributed equally.

Front. Biosci. (Landmark Ed) 2023, 28(4), 67; https://doi.org/10.31083/j.fbl2804067

Submitted: 24 September 2022 | Revised: 1 December 2022 | Accepted: 12 December 2022 | Published: 6 April 2023

This is an open access article under the CC BY 4.0 license.

Abstract

Background: The severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has spread worldwide, caused a global pandemic, and killed millions of people. The spike protein embedded in the viral membrane is essential for recognizing human receptors and invading host cells. Many nanobodies have been designed to block the interaction between spike and other proteins. However, the constantly emerging viral variants limit the effectiveness of these therapeutic nanobodies. Therefore, it is necessary to find a prospective antibody design and optimization approach to deal with existing or future viral variants. Methods: We attempted to optimize nanobody sequences based on the understanding of molecular details by using computational approaches. First, we employed a coarse-grained (CG) model to learn the energetic mechanism of the spike protein activation. Next, we analyzed the binding modes of several representative nanobodies with the spike protein and identified the key residues on their interfaces. Then, we performed saturated mutagenesis of these key residue sites and employed the CG model to calculate the binding energies. Results: Based on analysis of the folding energy of the angiotensin-converting enzyme 2 (ACE2)-spike complex, we constructed a detailed free energy profile of the activation process of the spike protein, which provided a clear mechanistic explanation. In addition, by analyzing the binding free energy changes following mutations, we determined how mutations can improve the complementarity between the nanobodies and the spike protein. We then chose the 7KSG nanobody as a template for further optimization. Finally, based on the results of the single-site saturated mutagenesis in the complementarity determining regions (CDRs), we combined mutations and designed four novel, potent nanobodies, all exhibiting higher binding affinity to the spike protein than the original one.
Conclusions: These results provide a molecular basis for the interactions between spike protein and antibodies and promote the development of new specific neutralizing nanobodies.

Keywords

SARS-CoV-2 spike protein

nanobody

coarse-grained (CG) model

binding free energy

1. Introduction

The coronavirus disease 2019 (COVID-19) pandemic, caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), had resulted in over 608 million infections and more than 6.50 million deaths worldwide as of September 16th, 2022 (https://www.who.int/emergencies/diseases/novel-coronavirus-2019). The COVID-19 pandemic has emerged as a global health crisis with far-reaching implications for the global economy, science, peace, and security [1]. Therefore, to meet this challenge, tremendous efforts have been devoted to developing therapeutic approaches against SARS-CoV-2.

The SARS-CoV-2 spike protein is the focus of therapeutic and vaccine development efforts. The spike protein of SARS-CoV-2 is a large class I trimeric fusion protein, which consists of two subunits, S1 and S2 (Fig. 1A, Ref. [2]) [3, 4, 5]. The S1 subunit contains a receptor-binding domain (RBD), which is responsible for the interaction with angiotensin-converting enzyme 2 (ACE2) to gain entry into the host [6], while the S2 subunit mediates membrane fusion and viral entry [7]. Another key structural feature of the spike protein is its extensive glycosylation, which plays a crucial role in viral pathogenesis (gray part in Fig. 1A) [8]. The activation of the SARS-CoV-2 spike protein is closely related to the approach and binding of the ACE2 receptor. Xu et al. [9] described the process of spike activation and elucidated the cryo-EM structures of three key conformational states of the spike trimer (Fig. 1B). Their results suggested that ACE2 facilitates the capture of the pre-existing open conformation of S trimers rather than triggering a trimer opening event. Under ACE2-free conditions, the majority of the spike trimers (94%) are in the tightly closed ground prefusion state (S-closed), and only a minority (6%) are in the intrinsically transient open state with one RBD up, representing the fusion-prone state (S-open); the two states form a dynamic balance under equilibrium conditions. ACE2 traps the RBD, overcomes the energy barrier, breaks the balance, and shifts the conformational landscape toward the open state. Once ACE2 traps the up RBD, the associated ACE2-RBD exhibits combined continuous swing motions on the topmost surface of the S trimer. After ACE2 binding, 26.2% of the trimers are in the S-open state, 73.8% are in the S-ACE2 state (an open state in which the spike protein is bound to the ACE2 receptor, the S-complex state), and none are in the S-closed state.
After the spike protein binds to the receptor, transmembrane protease serine 2 (TMPRSS2) [10], a type II transmembrane serine protease located on the host cell membrane, promotes virus entry into the cell by activating the spike protein. To fulfill its function, the spike protein of SARS-CoV-2 first binds to the receptor ACE2 through the RBD and is then proteolytically activated by human proteases. Therefore, blocking the interaction between RBD and ACE2 plays a vital role in inhibiting the infection of host cells by pathogenic SARS-CoV-2.

Fig. 1.

The structure and the activation process of SARS-CoV-2 spike protein. (A) The structure of SARS-CoV-2 spike protein. The dashed line indicates the rotation symmetry axis of the spike trimers. (B) The activation process of SARS-CoV-2 spike protein. Three conformations of the spike protein of SARS-CoV-2 are shown (trimers in forest green, glass green, and lavender pink), together with the position of the ACE2 receptor (yellow). The light yellow copy represents the initial position of ACE2 when bound to the spike protein. The binding poses of ACE2 on the spike protein were obtained with the protein-protein docking method HDOCK [2]. The red dot indicates the center of ACE2. The distance between ACE2 and the spike is defined as the distance between the center of the distant ACE2 (yellow) and the center of ACE2 in the initial position (light yellow).

Nanobodies are single-domain antibodies derived from camelids and sharks. They show high sequence identity with the human VH gene family III [11]. Nanobodies have favorable biomedical properties, including high thermostability, high solubility, and deep tissue penetration because of their small size (~15 kDa). Thus, nanobodies are popular for many biotechnology and medical applications [12, 13, 14, 15, 16, 17]. Recently, the potency of nanobodies against SARS-CoV-2 infection has been demonstrated in cell-based assays [18, 19, 20, 21, 22, 23, 24, 25, 26, 27] and most recently in animal studies [28, 29]. An ultrapotent nanobody construct (PiN-21) has shown high preclinical efficacy, preventing viral pneumonia at a very low dose (0.2 mg/kg) [28]. Zhou et al. [27] employed nanobody maturation technology to develop several nanobodies targeting the SARS-CoV-2 spike protein. Their crystal structures showed that the nanobodies successfully block the interaction between RBD and ACE2 [27]. Thus, stable and potent nanobodies that target the RBD of SARS-CoV-2 are promising therapeutics to help mitigate the evolving pandemic.

In recent years, numerous efforts have been made in the development of vaccines, but no completely efficient treatment has yet been found. Therefore, in this study, we report a computational approach to optimize and design potent nanobodies that broadly target all SARS-CoV-2 variants. First, we employed the coarse-grained (CG) model to simulate the activation process of SARS-CoV-2 spike protein. Our results give an adequate explanation of the activation mechanism of the spike protein based on energy. Next, fourteen different nanobodies were used to identify their binding modes to the viral spike protein, and we identified the critical residues for their binding. Then, we selected the 7KSG nanobody as a template for further optimization. We performed saturated mutagenesis of these key residue sites and employed a CG model to calculate the binding energies. We found that mutations V27D, L29E, Y32E, S49D, C50K, S53D, R57E, A97D, T102E, Y104E, S105E, N107K, H109K, Y110E, C112D, S113K, M116D, and Y118D in complementarity determining regions (CDRs) of the nanobody can strengthen the binding between the spike protein and the nanobody. Based on the results of single mutants, we employed multiple mutations on nanobodies and designed four potent nanobodies exhibiting a higher binding affinity than the original one.

2. Materials and Methods

2.1 Modelling the S Trimers

In this work, we used Modeller [30, 31] to perform homology modeling in constructing the binding complexes of ACE2-SARS-CoV-2 and nanobody-SARS-CoV-2. First, the structures of the three key states of S trimers were extracted from the cryo-EM structures (Protein Data Bank identification (PDB ID): 7DF3, 7DK3, and 7DF4) resolved by Xu et al. [9]. The experimental structures underwent a repair process that included completing the missing residues, removing extra ligands, and trimming all structures to the same length. After repair, a targeted molecular dynamics (TMD) method was used to construct the conformational pathways between the different structures and to sample a series of intermediate conformations representing the transition process. At each TMD step, the initial structure was aligned to the target structure using the backbone heavy atoms, and a force was then applied to the atoms of the initial structure to move them toward the corresponding atoms of the target structure. The system was restrained throughout the simulation to prevent abnormal translation and rotation. For these structures, the solvent was treated implicitly. Extensive MD relaxation with the Molaris-XG software (version 9.15, Los Angeles, CA, USA) [32, 33] was carried out until convergence was reached. This software was developed by Dr. Warshel and his team at the University of Southern California.

ACE2 undergoes no conformational transition, only changes in position and orientation, so the intermediate conformations of ACE2 were obtained by an angle-distance interpolation method. The method first calculates four parameters between the initial structure and the target structure in the Cartesian coordinate system: the center-of-mass distance and the differences in their three Euler angles. These four parameters are then divided equally according to the number of intermediates to be obtained. Each intermediate structure is obtained by translating and rotating the initial structure according to the values of the four parameters after division.
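The equal division of the four parameters can be sketched as follows. This is a minimal illustration, not the implementation used in this work: it assumes the pose change is fully described by one center-of-mass distance and three Euler-angle differences, the function name `interpolate_pose` and the input values are hypothetical, and the rigid-body transformation of the atomic coordinates is not shown.

```python
def interpolate_pose(distance, euler_diffs, n_intermediates):
    """Split a center-of-mass distance and three Euler-angle differences
    into equal increments.

    Returns one (distance, (angle1, angle2, angle3)) tuple per intermediate,
    giving the cumulative displacement and rotation from the initial pose.
    """
    if n_intermediates < 1:
        raise ValueError("n_intermediates must be at least 1")
    if len(euler_diffs) != 3:
        raise ValueError("expected three Euler-angle differences")
    n_steps = n_intermediates + 1  # the final step would land on the target pose
    poses = []
    for k in range(1, n_intermediates + 1):
        fraction = k / n_steps
        poses.append((distance * fraction,
                      tuple(angle * fraction for angle in euler_diffs)))
    return poses


# Hypothetical inputs: a 12 Å displacement and Euler-angle differences in degrees.
poses = interpolate_pose(12.0, (30.0, -15.0, 9.0), n_intermediates=2)
```

With two intermediates, each parameter is split into thirds, so the first intermediate sits one third of the way along the path and the second two thirds.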

2.2 Coarse-Grained (CG) Model and the Total Energy Calculation

The coarse-grained model was employed to calculate the free energy of each structure and the relevant binding energies. The CG model we employed, developed by Arieh Warshel, not only gives a reliable description of protein stability and function but also accounts for the electrostatic effects of proteins [34, 35]. In the CG model, the side chain of each residue is reduced to a simplified united atom, while the backbone atoms are treated explicitly. The total CG energy is defined as follows:

(1) $\displaystyle \Delta G_{\text{fold}}^{CG} = \Delta G_{\text{side}}^{CG} + \Delta G_{\text{main}}^{CG} + \Delta G_{\text{main-side}}^{CG}$
$\displaystyle = c_{1}\Delta G_{\text{side}}^{vdw} + c_{2}\Delta G_{\text{solv}}^{CG} + c_{3}\Delta G_{HB}^{CG} + \Delta G_{\text{side}}^{\text{elec}} + \Delta G_{\text{side}}^{\text{polar}} + \Delta G_{\text{side}}^{\text{hyd}} + \Delta G_{\text{main-side}}^{elec} + \Delta G_{\text{main-side}}^{vdw}$

Here the terms are the side chain van der Waals energy, main chain solvation energy, main chain hydrogen bond energy, side chain electrostatic energy, side chain polar energy, side chain hydrophobic energy, main chain/side chain electrostatic energy, and main chain/side chain van der Waals energy, respectively. The scaling coefficients c1, c2, and c3 are 0.10, 0.25, and 0.15, respectively, in this work [34, 36].
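As a minimal sketch of how the eight terms of Eq. (1) combine, the following snippet applies the scaling coefficients quoted above. The dictionary keys and the numerical term values are hypothetical placeholders chosen for illustration; they are not output from the CG model.

```python
# Scaling coefficients c1, c2, c3 as stated in the text for Eq. (1).
C1, C2, C3 = 0.10, 0.25, 0.15


def cg_folding_energy(terms):
    """Combine the eight CG energy terms (kcal/mol) according to Eq. (1)."""
    return (C1 * terms["side_vdw"]        # side chain van der Waals
            + C2 * terms["solv"]          # main chain solvation
            + C3 * terms["hb"]            # main chain hydrogen bond
            + terms["side_elec"]          # side chain electrostatic
            + terms["side_polar"]         # side chain polar
            + terms["side_hyd"]           # side chain hydrophobic
            + terms["main_side_elec"]     # main chain/side chain electrostatic
            + terms["main_side_vdw"])     # main chain/side chain van der Waals


# Hypothetical term values, for illustration only.
example_terms = {
    "side_vdw": -100.0, "solv": -40.0, "hb": -60.0, "side_elec": -5.0,
    "side_polar": -3.0, "side_hyd": -8.0,
    "main_side_elec": -2.0, "main_side_vdw": -4.0,
}
total = cg_folding_energy(example_terms)
```

Only the first three terms are scaled; the remaining five enter with unit weight, as in Eq. (1).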

To evaluate the CG energy, we first calculated reliable charges for the ionizable groups of the protein using the Monte Carlo Proton Transfer (MCPT) algorithm [37, 38]. This method allows proton transfer between pairs of ionizable residues or between an ionizable residue and the bulk solvent. The transfer is repeated until the electrostatic energy of the folded protein converges; the resulting ionization states of the protein residues are then used to evaluate the CG free energy.

2.3 The Binding Free Energy Change Calculation

The binding free energies for the nanobody-spike protein are defined as follows.

(2) $\Delta G_{\text{binding}} = \Delta G_{\text{nanobody--spike}} - \Delta G_{\text{nanobody}} - \Delta G_{\text{spike}}$

The change in the binding free energy of the nanobody-spike complex upon mutation of the nanobody is defined as follows:

(3) $\displaystyle \Delta\Delta G_{\text{binding}} = \Delta G_{\text{binding}_{\text{mutant}}} - \Delta G_{\text{binding}_{WT}}$
$\displaystyle = \left(\Delta G_{\text{nanobody--spike}_{\text{mutant}}} - \Delta G_{\text{nanobody}_{\text{mutant}}} - \Delta G_{\text{spike}}\right) - \left(\Delta G_{\text{nanobody--spike}_{WT}} - \Delta G_{\text{nanobody}_{WT}} - \Delta G_{\text{spike}}\right)$
$\displaystyle = \left(\Delta G_{\text{nanobody--spike}_{\text{mutant}}} - \Delta G_{\text{nanobody--spike}_{WT}}\right) + \left(\Delta G_{\text{nanobody}_{WT}} - \Delta G_{\text{nanobody}_{\text{mutant}}}\right)$
$\displaystyle = \Delta G_{1} + \Delta G_{2}$

Here $\Delta G_{1}$ is the change in the free energy of the complex and $\Delta G_{2}$ is the negative of the change in the free energy of the isolated nanobody. The spike term cancels because only the nanobody is mutated. WT: wild type.
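The cancellation in Eq. (3) can be checked numerically with a short sketch. The function names and all energy values below are hypothetical and serve only to show that the two-term form gives the same result as subtracting two full binding energies from Eq. (2), because the free energy of the unmutated binding partner cancels.

```python
def binding_energy(g_complex, g_nanobody, g_partner):
    """Eq. (2): binding free energy from complex and isolated-component energies."""
    return g_complex - g_nanobody - g_partner


def ddg_binding(complex_mut, complex_wt, nanobody_mut, nanobody_wt):
    """Eq. (3): binding free energy change as dG1 + dG2 (kcal/mol)."""
    dg1 = complex_mut - complex_wt      # change in the complex
    dg2 = nanobody_wt - nanobody_mut    # negative change in the free nanobody
    return dg1 + dg2


# Hypothetical free energies in kcal/mol, for illustration only.
g_partner = -355.0
wt = binding_energy(-500.0, -120.0, g_partner)
mutant = binding_energy(-506.0, -122.0, g_partner)
ddg = ddg_binding(-506.0, -500.0, -122.0, -120.0)
```

A negative value of `ddg` corresponds to a mutation that strengthens binding under this sign convention.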

3. Results

3.1 The Activation Process of the SARS-CoV-2 Spike Protein

Based on the structures of the spike trimer in different conformational states, Xu et al. [9] revealed the mechanism of ACE2-induced conformational transitions of the S trimer from the ground prefusion state toward the post-fusion state. During activation of the S trimer, the conformation of the S protein transitions from a “closed” to an “open” state, with one of its receptor-binding domains up, thereby gaining the ability to infect host cells. The presence of ACE2 alters the conformational distribution of the S protein and promotes its activation; however, these findings lack a structural and energetic explanation.

To understand the energetic mechanism of spike protein activation, we constructed a series of structural models that couple the approach of ACE2 with the conformational changes of the S trimer. First, we extracted the structures of the three key states of S trimers from the cryo-EM structures (PDB ID: 7DF3, 7DK3, and 7DF4) resolved by Xu et al. [9]. After repairing the experimental structures (see Methods), we used a targeted molecular dynamics (TMD) [39] method to obtain the intermediate structures connecting the three states (Fig. 1B) and thus form the conformational change trajectory. For the three key states, we found the optimal binding poses of ACE2 on their RBD by using a protein-protein docking method, HDOCK [2]. This method was developed by Huang’s group at the Huazhong University of Science and Technology. Then, we took the center of ACE2 in the optimal binding mode as the initial position and pulled ACE2 away along the rotation symmetry axis of the S trimer (the dashed line in Fig. 1A). The Y-axis in Fig. 2A represents the distance between the center of the pulled ACE2 and the center of ACE2 in the initial position (Fig. 1B and Fig. 2). The X-axis represents the conformational change trajectory of the S trimer (Fig. 2A). Since the binding position of ACE2 differs between the three key states, the intermediate positions of ACE2 were obtained by an angle-distance interpolation method (Fig. 1B, see Methods). With this modeling, we simulated the complete process in which ACE2 gradually approaches the S trimer while the conformation of the S trimer changes from S-closed to S-open and then to the S-complex.

Fig. 2.

The energetics of the activation of SARS-CoV-2. (A) The energy landscape of the activation of spike protein coupling with the distance of ACE2. The gray region indicates the area that is covered by glycan ligands. The Y-axis represents the distance between the center of the pulled ACE2 and the center of ACE2 in the initial position (see Fig. 1B). The X-axis represents the conformational change trajectory of the S trimer, from S-closed to S-open and then to S-complex. (B) The energy profile of Path 3 and the comparison of the energy barrier during the activation process between the wild-type and omicron variant (inset figure).

We calculated the folding energy of each ACE2-spike complex and obtained the energy landscape (Fig. 2). The color indicates the relative folding free energy, with the point in the upper left corner as the zero point (Fig. 2A). Three paths are depicted as white lines on the energy landscape (Fig. 2A). Path 1 indicates that, when ACE2 is far enough away (100 Å) from the S trimer, the conversion of the spike protein to the activated conformation (S-complex) requires crossing a larger energy barrier. This result is consistent with our previous calculation, in which the energy barrier of the spike conformational change in the absence of ACE2 was 25.44 kcal/mol [40]. Path 2 shows the energy change resulting from the approach of ACE2 when the S trimer conformation does not change. The lower left corner of the energy landscape indicates that ACE2 has a stable interaction with the spike protein in the S-closed state. In the S-closed state, however, the surface of the spike is densely packed with glycan ligands [3, 41]. The glycan ligands block the binding pathway of ACE2 near this region (the gray region in Fig. 2A). According to the energy profile, Path 3 is the most plausible pathway for the spike protein to take. As ACE2 approaches, the conformation of the spike protein bypasses the high-energy barrier region to reach the S-open state, where ACE2 can bind to the spike protein. The spike protein then continues to convert to the activated S-complex state (Supplementary Movie 1). We extracted the energy profile of Path 3 and identified three possible energy barriers (Fig. 2B). These three energy barriers are lower than the barrier in the absence of ACE2 (25.44 kcal/mol), which confirms the inducing role of ACE2 in the activation of the spike protein. We also calculated the change in the energy barriers when the mutations of the Omicron variant were introduced into the spike protein.
The results show that the mutations led to significant decreases in all three energy barriers (Fig. 2B). In addition, we calculated the change in the energy barriers for other SARS-CoV-2 variants (Supplementary Fig. 1). Compared with the wild type, almost all the energy barriers are lower in the other SARS-CoV-2 variants. This indicates that the spike protein is more readily activated in the SARS-CoV-2 variants and may explain their high transmissibility.
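Reading barriers off a one-dimensional energy profile such as the one extracted along Path 3 can be sketched as follows. This is an illustrative approach under a simple assumption, namely that each barrier is the height of a local maximum above the most recent local minimum (or the starting point); the function name `find_barriers` and the profile values are hypothetical and do not reproduce the data in Fig. 2B.

```python
def find_barriers(profile):
    """Locate energy barriers along a 1D energy profile.

    Returns a list of (min_index, max_index, height) tuples, where height is
    the energy of a local maximum relative to the preceding local minimum.
    The first point of the profile serves as the initial reference minimum.
    """
    barriers = []
    last_min = 0
    for i in range(1, len(profile) - 1):
        previous, current, following = profile[i - 1], profile[i], profile[i + 1]
        if current <= previous and current < following:
            last_min = i  # new local minimum
        elif current > previous and current >= following:
            barriers.append((last_min, i, current - profile[last_min]))
    return barriers


# Hypothetical energy profile in kcal/mol along a reaction coordinate.
profile = [0.0, 3.0, 1.0, 5.0, 2.0, 4.0, -1.0]
barriers = find_barriers(profile)
```

Comparing the heights returned for a wild-type profile and a variant profile gives the kind of barrier-by-barrier comparison described above.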

Overall, we calculated a 2D energy landscape, identified a minimum-energy pathway from the S-closed state to the S-complex, and calculated the changes in the energy barriers for the SARS-CoV-2 variants. Our results suggest that the distance between ACE2 and the spike protein is vital for spike protein activation. Compared with the wild type, the new coronavirus variants have lower energy barriers, which can explain why they show higher transmissibility. Moreover, these new variants have higher viral infectivity and a higher potential for immune evasion. Therefore, it is necessary to design a vaccine against these new variants. Stable and potent nanobodies that target the RBD of SARS-CoV-2 are promising therapeutics to help neutralize the new variants. The RBD of the spike protein is a prime target for therapeutic nanobodies. By blocking the interaction between ACE2 and the spike protein, nanobodies can inhibit the entry of SARS-CoV-2. Thus, it is important to identify the key residues of the RBD in the spike protein for nanobody binding.

3.2 Identify Key Residues of the RBD in the Spike Protein for Nanobody Binding

The region of the RBD surface in contact with nanobodies differs between the structures. Therefore, it is necessary to explore the detailed structural information of their epitopes and binding modes on the viral spike protein. To compare the nanobody-interacting residues on the SARS-CoV-2 RBD, we used the spike protein of SARS-CoV-2 in the “S-complex” state (PDB ID: 7DF4), which contains an RBD in the “up” conformation. Steric effects play an important role when nanobodies block ACE2 binding to the spike. When nanobodies bind to the side sites, they cannot generate sufficient steric hindrance to block the interaction between ACE2 and the spike protein [42]. Furthermore, the resolved crystal structures of the nanobodies that bind to the side sites contain only the RBD in the “up” conformation, not the whole spike protein. When we superimposed them onto the wild-type spike, the nanobodies at these binding sites clashed spatially with the rest of the spike protein. Thus, we considered only the binding sites at the top of the RBD. In total, 14 different nanobodies were used to identify the nanobody binding sites on the spike protein. All of them were placed on the RBD by alignment to the “up” RBD structure in the spike protein (Fig. 3A). The PDB IDs of the 14 nanobody structures are 6ZHD, 6ZXN, 7B17, 7B18, 7C8V, 7C8W, 7KGK, 7KSG, 7LX5, 7MDW, 7N9A, 7N9T, 7TPR, and 7VBN, respectively [19, 21, 29, 43, 44, 45, 46, 47].

Fig. 3.

The interaction mode and the binding free energy between the spike protein and nanobodies. (A) The binding sites on the spike protein of different nanobodies. (B) The binding free energy of nanobodies. (C) The key residues of spike protein binding to four high-affinity nanobodies. (D) The binding interface of the highest affinity nanobody (PDB ID: 7KSG) and spike protein. Residues shared between the interacting interfaces of the nanobody-spike complexes are highlighted in different colors. Green indicates residues that appear in the interfaces of 7B17, 7KSG, and 7C8W; purple indicates residues present in the interfaces of 7B17, 7KSG, 7C8W, and 7N9T; pink indicates residues that appear in the interfaces of 7KSG, 7C8W, and 7N9T. Key residues are shown in stick representations.

The binding free energies of these 14 different nanobodies are shown in Fig. 3B. They all show high binding affinity to the wild-type SARS-CoV-2 spike. We then selected the four nanobodies with the highest affinity (PDB ID: 7B17, 7C8W, 7KSG, and 7N9T) to further analyze their binding modes. As shown in Fig. 3C, we counted all the residues of the spike protein at the RBD-nanobody interfaces to find the key residues that drive the binding process. As expected, many interacting residues are identical between different nanobodies. Five identical residues (Y449, L452, E484, F490, and L492) appear in the interfaces of all four high-binding nanobodies. Among the residues shared by exactly three of the high-binding nanobodies, twelve (G446, L455, F456, G485, F486, Y489, Q493, S494, Y495, G496, Q498, and N501) appear in the interfaces of 7B17, 7KSG, and 7C8W, while only three (Y351, N450, and T470) are present in the interfaces of 7KSG, 7C8W, and 7N9T. No residue is shared only by the interfaces of 7B17, 7KSG, and 7N9T, or only by the interfaces of 7B17, 7C8W, and 7N9T. In total, 20 residues are present in at least three different interfaces of high-binding nanobodies. These residues are key sites that contribute to the high nanobody-binding affinity of SARS-CoV-2. Among them, residues G446, Y449, L455, F456, F486, Y489, Q493, G496, Q498, and N501 are also key sites within the RBD involved in ACE2 binding [48, 49]. Among the 14 different nanobodies, 7KSG has the highest binding affinity ( ${\Delta{}G}_{\text{binding}}$ = –25.07 kcal/mol). The interface of the 7KSG-spike complex also contains all 20 key residues, which are critical sites for nanobody binding (Fig. 3D). Therefore, we chose 7KSG as the initial template for further nanobody optimization. The details of the structure of 7KSG are shown in Supplementary Fig. 2.
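The counting of shared interface residues can be illustrated with a short sketch. The interface sets below are reconstructed only from the shared residues listed in this section; residues that occur in just one or two interfaces are not listed in the text and are therefore omitted, so these sets are an assumption and not the complete interfaces. The function name `shared_residues` is hypothetical.

```python
from collections import Counter

# Shared residue groups as listed in the text.
ALL_FOUR = {"Y449", "L452", "E484", "F490", "L492"}
B17_KSG_C8W = {"G446", "L455", "F456", "G485", "F486", "Y489",
               "Q493", "S494", "Y495", "G496", "Q498", "N501"}
KSG_C8W_N9T = {"Y351", "N450", "T470"}

# Partial interfaces rebuilt from the shared groups only (assumption).
interfaces = {
    "7B17": ALL_FOUR | B17_KSG_C8W,
    "7KSG": ALL_FOUR | B17_KSG_C8W | KSG_C8W_N9T,
    "7C8W": ALL_FOUR | B17_KSG_C8W | KSG_C8W_N9T,
    "7N9T": ALL_FOUR | KSG_C8W_N9T,
}

# RBD residues also reported in the text as key sites for ACE2 binding.
ACE2_SITES = {"G446", "Y449", "L455", "F456", "F486",
              "Y489", "Q493", "G496", "Q498", "N501"}


def shared_residues(interface_sets, min_count=3):
    """Return residues that occur in at least min_count interfaces."""
    counts = Counter(res for residues in interface_sets.values() for res in residues)
    return {res for res, n in counts.items() if n >= min_count}


key_residues = shared_residues(interfaces)
ace2_overlap = key_residues & ACE2_SITES
```

Under these assumptions the residues found in at least three interfaces are the 20 key sites, and every listed ACE2-binding site falls within them.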

3.3 Mutation Impacts on SARS-CoV-2 Nanobodies

To determine whether there is any connection between these 14 nanobodies, we carried out a multiple sequence alignment (MSA) to examine their similarities and differences. The MSA was performed using the Kalign web server of EMBL-EBI services [50]. As shown in Fig. 4A, all nanobodies are very similar to each other. The differences among these 14 nanobodies are mainly concentrated in CDR1, CDR2, and CDR3.

Fig. 4.

Analysis of the mutation effects. (A) The multiple sequence alignment of nanobody. The color code designates the conserved region in nanobodies while the residues in CDRs are shown in the logo. (B) The mode of nanobody binding to spike protein for nanobody optimization. (C) The heatmap of the binding free energy change ( ${\Delta{}\Delta{}G}_{\text{binding}}$ ) due to the amino acid substitution of CDRs. Mutations V27D, L29E, and Y32E in CDR1, mutations S49D, C50K, S53D, and R57E in CDR2, and mutations A97D, T102E, Y104E, S105E, N107K, H109K, Y110E, C112D, S113K, M116D, and Y118D in CDR3, 17 mutations were selected for further optimization (circled by black dashed box).

The interface between the spike protein (orange regions in Fig. 4B) and the nanobody (blue region in Fig. 4B) is a major factor in determining whether the nanobody can bind well to the RBD of the spike protein. We introduced mutations into the nanobody to find positions where substitutions improve the complementarity and increase the binding affinity. In the saturated mutagenesis of residues in the CDRs (G26-I34, S49-T58, and A97-Y118), a total of 779 mutations were considered in the nanobody (PDB ID: 7KSG). The binding free energy changes between the spike protein and the mutated nanobodies are shown in Fig. 4C. Overall, most mutations in the CDRs lead to mildly negative binding free energy changes. Compared with the mutations in CDR1 and CDR2, most mutations in CDR3 strengthen the binding between the spike protein and the nanobody (blue squares in Fig. 4C). Mutations A97D, T102E, Y104E, S105E, N107K, H109K, Y110E, C112D, S113K, M116D, and Y118D in CDR3 all give rise to negative binding free energy changes, which strengthen the binding. Some disruptive mutations, such as D30C, G101N, Y108H, D114A, D115C, and D117A in the CDRs, lead to positive binding free energy changes (red squares in Fig. 4C), indicating weakened binding between the spike protein and the nanobody.
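The size of the single-site library follows directly from the CDR ranges: 9 + 10 + 22 = 41 positions, each substituted by the 19 other standard amino acids, gives 779 mutations. A minimal sketch of this enumeration is shown below; the function names are hypothetical, and the wild-type residues used in the example are limited to three CDR1 positions named in the text (V27, L29, and Y32).

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, one-letter codes

# Inclusive residue ranges of the three CDRs as given in the text.
CDR_RANGES = {"CDR1": (26, 34), "CDR2": (49, 58), "CDR3": (97, 118)}


def count_single_mutants(ranges):
    """Number of single-site substitutions over all positions in the ranges."""
    n_positions = sum(end - start + 1 for start, end in ranges.values())
    return n_positions * (len(AMINO_ACIDS) - 1)


def single_mutants(wild_type):
    """List labels such as 'V27D' for every substitution at each given position.

    wild_type maps a residue number to its one-letter wild-type amino acid.
    """
    labels = []
    for position in sorted(wild_type):
        native = wild_type[position]
        for residue in AMINO_ACIDS:
            if residue != native:
                labels.append(f"{native}{position}{residue}")
    return labels


library_size = count_single_mutants(CDR_RANGES)
cdr1_subset = single_mutants({27: "V", 29: "L", 32: "Y"})
```

Each label produced this way corresponds to one cell of a heatmap such as Fig. 4C.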

Residues that directly contact or lie close to the spike protein have a larger impact on the binding energy than residues that do not directly contact the spike protein. For example, residues T102, Y104, and S105 are closer to the spike protein than residues G101, D114, and D115 (Supplementary Fig. 3). After single-site mutation, mutations T102E, Y104K, and S105E all strengthen the binding between the spike protein and the nanobody, while mutations of G101, D114, and D115 have little or negative effect on the binding affinity between the spike protein and nanobodies. We found that, for most residues in the CDRs, mutation to a negatively charged residue strengthened the binding affinity. Previous studies have demonstrated that ACE2 has many negatively charged residues, which result in a large increase in the Coulomb force between the spike protein and ACE2 [51, 52]. Therefore, the more negative charges on the binding surface of the nanobodies, the stronger the attraction to the spike protein. This is consistent with our results. Our calculations support the idea that the binding energy change is a practical measure for predicting mutational effects.

The variants of SARS-CoV-2 with multiple mutations in the RBD are the major factor in the development of resistance against vaccines. To design potent nanobodies with broad-spectrum activity neutralizing SARS-CoV-2 variants, we considered not only single mutations but also multiple mutations in nanobodies. We selected those residues for which the ${\Delta{}\Delta{}G}_{\text{binding}}$ is less than –2 kcal/mol after a single mutation. If several substitutions at the same residue gave a ${\Delta{}\Delta{}G}_{\text{binding}}$ of less than –2 kcal/mol, we chose the one with the highest binding affinity. Accordingly, mutations V27D, L29E, and Y32E in CDR1, mutations S49D, C50K, S53D, and R57E in CDR2, and mutations A97D, T102E, Y104E, S105E, N107K, H109K, Y110E, C112D, S113K, M116D, and Y118D in CDR3 were selected for multiple mutations. Next, we split these mutations into three categories based on their locations in the nanobody and built four new mutated nanobodies: 7KSG_CDR1 (mutations V27D, L29E, and Y32E in CDR1), 7KSG_CDR2 (mutations S49D, C50K, S53D, and R57E in CDR2), 7KSG_CDR3 (mutations A97D, T102E, Y104E, S105E, N107K, H109K, Y110E, C112D, S113K, M116D, and Y118D in CDR3), and 7KSG_ALL (all 17 mutations in CDRs). As expected, these four nanobodies exhibit higher binding affinity than the original one (Supplementary Table 1). 7KSG_ALL had the highest binding affinity ( ${\Delta{}\Delta{}G}_{\text{binding}}$ = –40.04 kcal/mol). A specific combination of mutations can thus facilitate the optimization of a potent nanobody with a higher binding affinity.
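The selection rule described above (keep substitutions below the –2 kcal/mol cutoff, and keep only the most favorable one per residue) can be sketched as follows. The function name `select_mutations` is hypothetical, and the binding free energy changes in the example are invented for illustration; they are not values from Fig. 4C.

```python
def select_mutations(ddg_by_mutation, threshold=-2.0):
    """Pick the most favorable substitution at each position.

    ddg_by_mutation maps labels such as 'V27D' to the binding free energy
    change in kcal/mol. Only substitutions with a change below the threshold
    are kept, and one label (the most negative) is returned per position.
    """
    best = {}
    for label, ddg in ddg_by_mutation.items():
        if ddg >= threshold:
            continue  # not favorable enough to be selected
        position = int(label[1:-1])  # 'V27D' -> 27
        if position not in best or ddg < best[position][1]:
            best[position] = (label, ddg)
    return [best[position][0] for position in sorted(best)]


# Hypothetical binding free energy changes (kcal/mol), for illustration only.
example_ddg = {"V27D": -2.5, "V27E": -2.1, "L29E": -3.0, "D30C": 1.2, "G26A": -0.5}
selected = select_mutations(example_ddg)
```

The selected single mutations can then be grouped by CDR to build combined variants such as those described above.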

4. Discussion

Nanobodies are composed of the target-binding fragment of monoclonal antibodies. Compared with traditional antibodies, nanobodies have several advantages. For example, they are significantly smaller, so they can access and lodge onto conventionally inaccessible regions of therapeutic targets [53]. They also exhibit favorable biophysical properties. In addition, nanobodies can be efficiently produced in prokaryotic expression systems at low cost. Thus, the search for potent nanobody therapies on an industrial scale is becoming one of the most feasible strategies for combating SARS-CoV-2.

However, at this early stage, the nanobodies optimized in this study mainly target the wild-type coronavirus. To address future variants and escape mutants, we will systematically analyze all coronavirus variants through computational approaches in future studies: we will find the key sites of their binding interfaces, computationally design a single-site saturated mutagenesis library in the binding sites of the nanobodies, calculate the binding free energies, and then combine mutations to design novel, potent nanobodies with broad-spectrum activity. In this study, we only calculated the conformational free energy of the spike protein in the S-complex state for the nanobody design. The lower the free energy, the more stable the structure, which suggests that the spike protein and the antibody do not dissociate easily. In the future, we will expand from considering only the binding energy of the nanobody and the spike protein in the S-complex state to considering the changes in the energy barriers along the energy landscape. If the energy barrier for the designed nanobody is lower than that in the activation path of the spike protein and ACE2, the designed nanobody may be a potential candidate against SARS-CoV-2. This may be a promising method to design stable and potent nanobodies in the future. The SARS-CoV-2 spike protein is composed of the S1 and S2 subunits. Compared with the S1 subunit, the S2 subunit contains more conserved residues [54]. Recent studies [55, 56] have found that some antibodies bind the conserved fusion peptide region adjacent to the S2 subunit and can broadly target the SARS-CoV-2 variants. However, because of the steric constraints of the spike density, it is hard for antibodies to access these regions [57]. Therefore, it is important to design nanobodies that can target the conserved and functionally essential sites on coronaviruses. We plan to study these concepts in the future.

Artificial intelligence technologies have been widely applied in the development of antibodies, for example in recognizing antigen epitopes [58, 59, 60, 61], exploring the sequence space of CDRs [62, 63, 64], optimizing CDR sequences, predicting antibody structures [65], predicting binding modes [66], and predicting the binding affinity of antibodies to antigens [67]. Many attempts to develop SARS-CoV-2 antibodies have also been reported [68, 69, 70, 71]. Expanded data sets and new deep learning technologies offer the potential to develop better antibodies [72]. Our work provides a data set that relates residue substitutions in nanobody CDRs to the binding affinity for the spike protein. This data set can be used to develop models that predict the neutralization ability of artificially designed antibodies. The sequence space of CDRs is too large to be fully explored. Therefore, based on the assumption that the effect of multiple point mutations is approximately equal to the cumulative effect of the single point mutations, we designed combinations of mutations to obtain antibodies with stronger affinity. By using artificial intelligence methods such as reinforcement learning and/or genetic algorithms, we can try as many mutation combinations as possible at an affordable computational cost. Overall, integrating artificial intelligence technologies expands our ability to conduct these studies.
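The additivity assumption stated above can be made concrete with a short Python sketch. It scores a multi-site variant as the sum of its single-site binding free energy changes and ranks the combinations. This is a plain exhaustive enumeration over a handful of sites, not the reinforcement learning or genetic algorithm search mentioned in the text, and it is not necessarily the procedure used in this study. The function name `rank_combinations` and all numerical values are hypothetical.

```python
from itertools import combinations


def rank_combinations(single_ddg, max_sites=3, top_n=5):
    """Rank multi-site variants by summed single-site ddG values.

    single_ddg maps (position, amino_acid) to the binding free energy
    change of that single substitution (negative = stronger binding).
    Assumes the effects of point mutations are additive.
    """
    # Keep only the most favorable substitution at each position.
    best_per_site = {}
    for (pos, aa), ddg in single_ddg.items():
        if ddg < 0 and (pos not in best_per_site or ddg < best_per_site[pos][1]):
            best_per_site[pos] = (aa, ddg)

    scored = []
    for k in range(2, max_sites + 1):
        for sites in combinations(sorted(best_per_site), k):
            total = sum(best_per_site[p][1] for p in sites)
            label = "+".join(f"{p}{best_per_site[p][0]}" for p in sites)
            scored.append((total, label))

    scored.sort()  # most negative predicted ddG first
    return scored[:top_n]


# Hypothetical single-site results, for illustration only.
toy_ddg = {
    (27, "D"): -1.2,
    (27, "E"): -0.8,
    (52, "E"): -0.5,
    (101, "D"): -2.0,
    (103, "K"): 0.7,
}
ranked = rank_combinations(toy_ddg)
for total, label in ranked:
    print(f"{label}: {total:.2f}")
```

Because the number of combinations grows rapidly with the number of sites, an exhaustive loop like this is only practical for small site sets, which is where heuristic search methods become useful. Any top-ranked combination would still need to be re-evaluated directly, since additivity is only an approximation.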

5. Conclusions

In this work, we constructed a series of intermediate structures for the coupling process between the approaching ACE2 and the S trimer to explore the energy basis of spike protein activation. Using these structures, we generated the free energy profile of the conformational changes and found one possible lower-energy pathway. To investigate the key residues in the nanobody-spike interface, we compared 14 nanobody-bound structures and analyzed the binding modes of the 4 nanobodies with the highest binding affinity. We found 20 conserved residues (Y449, L452, E484, F490, L492, G446, L455, F456, G485, F486, Y489, Q493, S494, Y495, G496, Q498, N501, Y351, N450, and T470) that appear at the interface of three or four nanobodies. Some of these residues (G446, Y449, L455, F456, F486, Y489, Q493, G496, Q498, and N501) are also key sites within the RBD that are involved in the interface of the ACE2-spike complex. Next, we selected the nanobody with the best binding affinity among the 14 as a starting structure for optimizing and designing novel nanobodies. We introduced a single-site saturated mutagenesis library at the CDR positions to explore the effect of various mutations on binding affinity. After calculating the binding free energy changes following mutation, we found that, for most residues in the CDRs, mutation to negatively charged residues strengthened the binding affinity. Based on the results of the single-site mutations, we combined mutations in the CDRs and designed four novel nanobodies. The optimized nanobodies all exhibit higher binding affinity than the original one.
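The selection of residues that appear at the interface of three or four nanobodies can be sketched in Python as a simple counting step. The function name `shared_interface_residues`, the nanobody labels, and the residue assignments below are hypothetical toy data and do not reproduce the interface analysis of this study.

```python
from collections import Counter


def shared_interface_residues(interfaces, min_count=3):
    """Return spike residues present in at least min_count interfaces.

    interfaces maps a nanobody label to the set of spike residues
    (e.g. "Y449") found at its binding interface.
    """
    counts = Counter()
    for residues in interfaces.values():
        counts.update(set(residues))  # count each residue once per nanobody
    shared = [res for res, n in counts.items() if n >= min_count]
    # Sort by residue number, which follows the one-letter code.
    return sorted(shared, key=lambda res: int(res[1:]))


# Hypothetical interface sets for four nanobodies, for illustration only.
toy_interfaces = {
    "nb_a": {"Y449", "E484", "F490", "Q493"},
    "nb_b": {"Y449", "E484", "L452", "Q493"},
    "nb_c": {"Y449", "F490", "Q493", "N501"},
    "nb_d": {"E484", "Q493", "T470"},
}
shared = shared_interface_residues(toy_interfaces, min_count=3)
print(shared)
```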

In conclusion, studying the mechanism of the activation process gives us a more comprehensive understanding of the coronavirus infection and immune evasion. Identifying the key residues in the interface between the nanobody and the spike protein can provide useful information for understanding the binding mechanism of the nanobody-spike complex. Our results suggest that this approach can be a promising method to develop nanobodies with high binding affinity and broad-spectrum activity to neutralize SARS-CoV-2 variants.

Availability of Data and Materials

The datasets used during the current study are available from the corresponding author on reasonable request.

Author Contributions

XZ and KA designed the research study. XZ, KA, JY, and PX performed the research. XZ and KA analyzed the data and drafted the manuscript. CB provided help and advice on conception, acquisition of data and supervision. All authors contributed to editorial changes in the manuscript. All authors read and approved the final manuscript. All authors have participated sufficiently in the work and agreed to be accountable for all aspects of the work.

Ethics Approval and Consent to Participate

Not applicable.

Acknowledgment

Not applicable.

Funding

This research was funded by the National Natural Science Foundation of Youth Fund Project (grant no. 22103066), the 2021 Basic Research General Project of Shenzhen, China (grant no. 20210316202830001) and Warshel Institute for Computational Biology at the Chinese University of Hong Kong, Shenzhen (grant no. C10120180043).

Conflict of Interest

CB is the founder of Chenzhu Biotechnology Co., Ltd. CB participated in this research. All authors declare that they have no conflict of interest.

Supplementary Material

Supplementary material.zip

References

[1]

Wang C, Horby PW, Hayden FG, Gao GF. A novel coronavirus outbreak of global health concern. Lancet. 2020; 395: 470–473.


[2]

Yan Y, Tao H, He J, Huang S. The HDOCK server for integrated protein-protein docking. Nature Protocols. 2020; 15: 1829–1852.