† These authors contributed equally.
As a tool for modifying the genome, gene editing technology has developed rapidly in recent years, especially in the past two years. With the emergence of new gene editing technologies, such as transposon-based editing tools, numerous advances have been made, including precise genome editing, dual base editing, and prime editing. This report focuses on the development of gene editing tools in recent years, describes the progress made in classic editing tools, base editors, and other new editing tools, and provides insights into challenges and opportunities.
Gene editing technology, also known as genome editing, is used to engineer specific modifications within target genes. Double-strand breaks (DSBs) are produced at specific locations in the genome using specially engineered nucleases. Once DSBs are introduced, cells principally repair these lesions using either non-homologous end-joining (NHEJ) or homologous recombination (HR) [1, 2, 3, 4, 5, 6]. Prior to the introduction of gene editing technologies, natural, physical, or chemical mutagenesis and random insertion of transgenic DNA were the principal approaches used to generate mutations within targeted cells. However, none of these approaches could edit genes at specific desired loci; they suffered from random mutagenic events and low efficiency, and they were time-consuming, laborious, and costly for most laboratories. The recent development and application of gene editing technology, which shows great potential in agricultural breeding and crop improvement [8, 9, 10, 11, 12, 13, 14] as well as gene therapy for human disease [15, 16, 17, 18, 19, 20, 21, 22, 23], has ushered in a new era for global life science research. For example, rice quality was improved by regulating the amylose content through engineering of the Waxy (Wx) gene. Further, tumor cell apoptosis was induced by specifically targeting the PLK1 gene in mice with glioblastoma multiforme [25, 26, 27], and gene editing approaches were used to screen for new potential therapeutic targets for SARS-CoV-2. In October 2020, Doudna and Charpentier won the Nobel Prize in Chemistry for their contributions to Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR)-based gene editing, underscoring how rapidly gene editing technology is developing and the significant role it is likely to play in the future.
In this manuscript, we review the traditional gene editing tools as well as introduce the recent developments of new gene editing tools. See Fig. 1 (Ref. [1, 29]) for details.
Traditional gene editing tools principally include Zinc Finger Nucleases (ZFNs) [30, 31, 32], Transcription Activator-Like Effector Nucleases (TALENs), and Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR). ZFNs and TALENs are first- and second-generation technologies, both of which use nucleases that contain a DNA-binding domain and a DNA cleavage domain [35, 36]. In these technologies, the DNA-binding domain is engineered to recognize a specific DNA sequence, which is then cleaved by the nuclease domain. However, the wide application of these first- and second-generation gene editing technologies is hampered by low target recognition rates, high cost, high off-target probability, and complex structure.
These shortcomings motivated the development of the third-generation gene editing technology, the CRISPR-Cas system, which is derived from an adaptive immune system in bacteria. In the laboratory, CRISPR gene editing relies on two key components: a CRISPR-associated (Cas) protein and a single guide RNA (sgRNA). Cas9 is a nuclease that binds to the sgRNA; this binding both activates Cas9 and targets it, through a 20-nucleotide sequence present within the sgRNA, to a specified genomic locus that lies next to a short Protospacer Adjacent Motif (PAM) [37, 38, 39, 40, 41, 42, 43, 44, 45]. Cas9 subsequently catalyzes a DSB close to the PAM site, and low-fidelity DSB repair by NHEJ forms a small insertion/deletion (indel) at the cleaved site, thus placing a mutation within the targeted locus. In addition to Cas9, other Cas proteins with divergent PAM recognition motifs have been identified and characterized (see Table 1, Ref. [46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56]).
| Name | Type | PAM recognition sites | Year | Developer |
|---|---|---|---|---|
| SpCas9 | Cas9 | NGG | 2013 | Zhang et al. |
| xCas9 | Cas9 | NGN | 2016 | Liu et al. |
| SpCas9-NG | Cas9 | NGN/NANG | 2018 | Nishimasu et al. |
| SpG | Cas9 | NNN | 2020 | Kleinstiver et al. |
| SpRY | Cas9 | NNN | 2020 | Kleinstiver et al. |
| LbCpf1 | Cas12 | TTTN | 2018 | Qi et al. |
| FnCpf1 | Cas12 | NTTN | 2018 | Qi et al. |
| CasX | Cas12 | TTCA | 2019 | Doudna et al. |
| AaCas12b | Cas12 | TTTN/NTTN | 2019 | Zhang et al. |
| BhCas12b | Cas12 | TTTN/NTTN | 2019 | Zhang et al. |
| MAD7 | Cas12 | TTTN/CTTN | 2018 | Inscripta [53, 54] |
| CasΦ | Cas12 | TTTN | 2020 | Doudna et al. |
| Cas13a | Cas13 | NNN | 2017 | Zhang et al. |
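The SpCas9 targeting rule described above (a 20-nt protospacer lying immediately 5′ of an NGG PAM) can be illustrated with a minimal Python sketch. The function name `find_spcas9_sites`, the forward-strand-only scan, and the assumed cut position 3 bp upstream of the PAM are illustrative choices and are not taken from any particular design tool.

```python
import re

def find_spcas9_sites(seq, protospacer_len=20):
    """Return (protospacer, pam, cut_index) for each NGG PAM on the given strand.

    Only the strand supplied in `seq` is scanned; a real design tool would
    also scan the reverse complement. `cut_index` is the 0-based position of
    the first base after the assumed blunt cut, 3 bp upstream of the PAM.
    """
    seq = seq.upper()
    sites = []
    # A lookahead is used so that overlapping PAMs are all reported.
    for match in re.finditer(r"(?=([ACGT]GG))", seq):
        pam_start = match.start()
        if pam_start < protospacer_len:
            continue  # not enough upstream sequence for a full protospacer
        protospacer = seq[pam_start - protospacer_len:pam_start]
        sites.append((protospacer, match.group(1), pam_start - 3))
    return sites

# Hypothetical 27-nt sequence: a 20-nt protospacer, a TGG PAM, then AAAA.
example = "ACGTACGTACGTACGTACGT" + "TGG" + "AAAA"
sites = find_spcas9_sites(example)
print(sites)
```

Swapping the regular expression for another motif would adapt the scan to a different 3′ PAM from Table 1; Cas12-type enzymes would need a separate branch, since their PAM lies on the other side of the protospacer.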
Cas9 contains two nuclease domains, termed RuvC and HNH, which cut the non-targeted and targeted DNA strands, respectively. An enzyme that cuts only one of the two DNA strands (i.e., a nickase) is obtained when a key catalytic residue in either domain is converted to alanine (D10A in RuvC or H840A in HNH). This mutant form of Cas9 is termed nCas9. If the RuvC and HNH domains are mutated simultaneously, a nuclease-dead Cas9 that retains the ability to bind sgRNA is obtained (termed dCas9).
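The relationship between the catalytic mutations and the resulting Cas9 forms can be written down as a small lookup. This is only a sketch: the function name `classify_cas9` and the data layout are hypothetical, and it relies on D10A lying in the RuvC domain and H840A lying in the HNH domain of SpCas9.

```python
# Catalytic mutation -> nuclease domain it inactivates (SpCas9 numbering).
CATALYTIC_MUTATIONS = {"D10A": "RuvC", "H840A": "HNH"}
# Strand cut by each domain, following the description in the text.
DOMAIN_STRAND = {"RuvC": "non-target", "HNH": "target"}

def classify_cas9(mutations):
    """Return (variant_name, strands_still_cut) for a set of mutations."""
    inactivated = {CATALYTIC_MUTATIONS[m] for m in mutations
                   if m in CATALYTIC_MUTATIONS}
    strands_cut = sorted(strand for domain, strand in DOMAIN_STRAND.items()
                         if domain not in inactivated)
    if len(strands_cut) == 2:
        name = "Cas9"    # both domains active: double-strand break
    elif len(strands_cut) == 1:
        name = "nCas9"   # one domain active: nick on a single strand
    else:
        name = "dCas9"   # no cutting; binding only
    return name, strands_cut

for muts in ([], ["D10A"], ["H840A"], ["D10A", "H840A"]):
    print(muts, classify_cas9(muts))
```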
Plant gene editing was first reported in 2013, and CRISPR technology has more recently been used to produce high-yield rice cultivars [57, 58, 59, 60], high oleic acid soybeans [61, 62], and fragrant corn. CRISPR/Cas9 has also been used in medical applications, such as the development of a CRISPR test strip for COVID-19 [64, 65] and the removal of HIV sequences from living cells [66, 67]. Various Cas proteins identified and developed in recent years are shown in Table 1 (Ref. [46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56]), and Fig. 2 illustrates the molecular principles behind CRISPR/Cas9-based gene editing.
Fig. 2. CRISPR/Cas9 gene editing.
In 2016, David Liu’s lab developed a single base editor using a specifically designed Cas9 fusion protein. Termed cytosine base editor (CBE), this technique does not require DSBs for single base conversion and can greatly improve the efficiency of base editing. As in classic CRISPR/Cas9 technology, an sgRNA binds to the fusion protein. The fusion protein consists of a modified Cas9 (dCas9 or nCas9), a cytosine deaminase, and a uracil glycosylase inhibitor. Guided by the sgRNA, the modified Cas9 delivers the cytosine deaminase to the targeted locus, where it catalyzes deamination of a targeted cytosine (C) to uracil (U). The uracil glycosylase inhibitor prevents the U from being removed from the DNA, and following a round of DNA replication, a C-to-T mutation is placed within the targeted gene [69, 70, 71, 72, 73, 74, 75, 76, 77].
In 2017, David Liu’s lab also reported the adenine base editor (ABE), which promotes mutation of adenine (A) to guanine (G) by using an adenine deaminase enzyme [2, 49, 71]. When the Cas9 fusion protein containing the adenine deaminase is targeted to genomic DNA by the sgRNA, the adenine deaminase catalyzes deamination of adenine to inosine (I), which is read and replicated as guanine. Thus, the direct substitution of an A-T base pair with a G-C base pair is achieved following DNA replication [78, 79]. At present, single base editors have been applied to gene editing, gene therapy, the generation of animal models, and functional gene screening.
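The net outcome of the cytosine and adenine base editors described above can be sketched as a string transformation on the protospacer. The example below is deliberately idealized: it assumes every eligible base inside an activity window is converted, and the window of positions 4 to 8 (counted from the PAM-distal end) is an illustrative assumption, not a value taken from this review. The function name `simulate_base_edit` is hypothetical.

```python
# Net base change produced by each editor after DNA replication.
EDITOR_RULES = {"CBE": ("C", "T"), "ABE": ("A", "G")}

def simulate_base_edit(protospacer, editor, window=(4, 8)):
    """Return the protospacer after an idealized CBE or ABE edit.

    `window` holds 1-based inclusive positions counted from the PAM-distal
    end; it is an assumed activity window used only for illustration.
    Real editors convert a fraction of eligible bases, not all of them.
    """
    target, product = EDITOR_RULES[editor]
    first, last = window
    edited = []
    for position, base in enumerate(protospacer.upper(), start=1):
        if first <= position <= last and base == target:
            edited.append(product)  # deamination followed by replication
        else:
            edited.append(base)
    return "".join(edited)

guide = "GACCATCAGGTTACGTAGCA"  # hypothetical 20-nt protospacer
print(simulate_base_edit(guide, "CBE"))
print(simulate_base_edit(guide, "ABE"))
```

Bases outside the window are left untouched, which mirrors why the position of the target base within the protospacer matters when choosing an sgRNA for base editing.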
In base editing technology, the development of more flexible tools has proven important. For example, BE4max and ABEmax, base editors with codon-optimized sequences and additional nuclear localization signals, increase base editing efficiency by 1.9-fold and 1.3–7.9-fold, respectively. Codon optimization and the added nuclear localization sequences in these variants improved editing efficiencies in mammalian cells and in vivo [81, 82]. In 2020, Xueli Zhang and Changhao Bi of the Chinese Academy of Sciences constructed a cytosine deaminase-nCas9-Ung protein complex, creating a novel glycosylase base editor (GBE). This advance yielded a single base editing system that can induce interconversions between pyrimidines and purines [83, 84]. The principle of base editing is shown in Fig. 3.
Fig. 3. Schematic diagram of base editor.
The combination of CBE and ABE can be used to perform four base conversions (C to T, G to A, A to G, and T to C).
Designing strategies for temporal control of base editing activity, or using exogenous small molecules, will enhance the clinical potential of these gene editing tools [94, 95, 96, 97]. However, the application of genome editing technologies must take into account the probability of off-target editing. This important caveat was thoroughly and systematically evaluated for prime editing by Caixia Gao’s team in April 2021 [98, 99]. The team first examined the mismatch tolerance of prime editing in plant cells and concluded that the frequency of editing was influenced by the number and location of mismatches between primer-binding sites (PBS) and prime editing guide RNAs (pegRNAs). The team also evaluated the activity of 12 pegRNAs at 179 loci and scored off-target editing sites by whole genome sequencing of 29 rice plants edited using prime editing technology. The team found that prime editing did not result in off-target single nucleotide variations (SNVs) or small indels. Rice mutants with single-base, multi-base, and precise deletion edits were successfully obtained with an efficiency of up to 21.8%, which could not be achieved by existing gene editing systems.
The single base editor can only catalyze the conversion of a single type of base, limiting its wide application. Therefore, development of a new base editor technology that can efficiently produce two different base mutations at the same time would greatly enrich base editing tools, especially in the area of gene therapy. About 58% of genetic diseases in humans are caused by base mutations. When only the conversion of a single type of base can be catalyzed, it is difficult to treat genetic diseases caused by two or more base mutations. In June 2020, Dali Li’s group fused human activation-induced cytosine deaminase (termed hAID), adenine deaminase, and nCas9 to develop a new type of dual-function, high-activity base editor termed A&C-BEmax. A&C-BEmax can efficiently convert C to T and A to G simultaneously.
At the same time, Nozomu Yachie’s laboratory at the University of Tokyo developed a new double base editor model for animal cells in which the cytosine deaminase PMCDA1 from lamprey and the cytosine deaminase Rapobec1 from mouse were fused to the C- and N-termini of nCas9, respectively. This engineering step resulted in three new double base editors (Target-ACE, Target-ACEmax, and ACBEmax). Using this system, the investigators achieved simultaneous A/C mutations in mammalian cells with average editing efficiencies of up to 50% for C and 40% for A. In addition, Keith Joung’s lab developed a double base editing tool using a similar strategy. Specifically, they fused a shorter adenosine deaminase to the N-terminus and PMCDA1 to the C-terminus of nCas9. The resultant co-mutation efficiency of A and C at 18 of 25 targets was higher than 15% on average [100, 102].
Caixia Gao and Jiayang Li of the Institute of Genetics of the Chinese Academy of Sciences fused both the cytosine deaminase APOBEC3A and the adenine deaminase ABE7.10 at the N-terminus of nCas9. In this way, four new types of saturated-targeted endogenous gene mutation editors (STEME), termed STEME-1 to STEME-4, were constructed. These editors have the distinct advantage of inducing simultaneous C-to-T and A-to-G mutations.
Gene editing tools have typically functioned by breaking one or both DNA strands. A method for accurate editing, by which an engineered sequence can be directly inserted into the cell genome without causing cell disruption, has long been desired. Therefore, researchers sought to harness the phenomenon of transposition (i.e., gene jumping) to insert a desired DNA sequence at a targeted locus without cell disruption.
In June 2019, after discovering a unique transposable element in Vibrio cholerae [109, 110], a Columbia University team led by Sternberg built a gene editing tool termed INTEGRATE (Insertion of transposable elements by guide RNA-assisted targeting). With this technology, large gene segments can be inserted into the genome without introducing DNA breaks. Typical gene editing tools rely on DNA scission, which commonly leads to errors being introduced at the site of strand break repair. In addition, DNA breaks trigger DNA damage responses that may result in other adverse cellular reactions. Sternberg found that transposons can be integrated into specific sites in the bacterial genome without the need to cut DNA. Importantly, the sites where the integrases insert DNA are controlled entirely by their associated CRISPR system. The gene editing tool Sternberg created is capable of inserting any DNA sequence into any location within the bacterial genome. Sequencing of the edited bacteria confirmed that INTEGRATE achieves precise insertion, with no extra copies in non-target locations. As in CRISPR, a guide RNA directs the integrase to the targeted loci [112, 113].
Sternberg used cryo-electron microscopy to model and refine the transposon protein TniQ. The TniQ protein binds to the Cascade complex as an end-to-end dimer, at an interface near the 3′ end of the CRISPR RNA (crRNA) that involves Cas6 and Cas7. The natural Cas8-Cas5 fusion protein binds to the 5′ end of the crRNA and contacts the TniQ dimer via a variable insertion domain [106, 114]. The target DNA-bound structure reveals key interactions necessary for PAM recognition and R-loop formation. This work not only laid the structural foundation for understanding how DNA targeting by TniQ-Cascade can subsequently recruit downstream transposase proteins, but also provides guidance on protein design for programmable DNA insertion in future genome engineering applications.
In June 2019, Feng Zhang’s team obtained a transposase from the cyanobacterium Scytonema hofmanni whose three subunits are associated with a CRISPR effector protein, Cas12k [106, 116]. The system was termed CAST, or CRISPR-associated transposase, in which Cas12k is used to search for specific sequence sites in the genome. Cas12k has no endonuclease activity and only binds DNA; the transposase then inserts gene fragments directly into the target sites. Because this process does not rely on homologous recombination, it offers a safety advantage. Feng Zhang’s team also coupled nCas9 to the single-stranded DNA transposase TnpA and tested the protein complex in the E. coli genome to promote site-specific integration of foreign DNA. This suggests that gene knock-in can be achieved using transposons; however, there are still difficulties in the preparation and in vivo delivery of single-stranded DNA templates.
The team subsequently investigated the effects of different CAST genes and different lengths of tracrRNA on the activity of the CAST system. Four CAST genes were found to be essential for foreign gene integration, and 216 bp of tracrRNA was sufficient to produce foreign gene integration. Additionally, the team confirmed that this integration occurred only once, thus avoiding multiple gene insertions. Although not yet studied outside of bacteria, CAST still holds promise for a next-generation gene editing system with enhanced efficiency [116, 117, 118, 119].
In addition to the CRISPR/Cas9 system, researchers also hope to use new enzymes with editing or sequence recognition activity to create new gene editing tools. Moreover, the use of classic editing technologies in commercial applications incurs high patent licensing fees. How to expand the range of gene editing tools and avoid high licensing fees has become one of the hot topics in the field.
FEN1 (Flap Endonuclease-1) is an endonuclease that recognizes a 3′ flap structure. Researchers combined FEN1 with the cleavage domain of the Fok I endonuclease (FN1) to create a structure-guided DNA editing tool termed SGN. After the 3′ flap structure formed by the target sequence and the guide DNA (gDNA) is recognized, the target sequence is cut by FN1 dimers. Studies have shown that a pair of gDNAs can guide SGN to correctly cleave reporter and endogenous genes in the genome of zebrafish embryos. However, the results also indicate that the SGN system is inefficient, and the researchers were unable to calculate the off-target probability from CYP26B1 and ZNF703 gene targeting experiments. See Fig. 4 for the working principle of SGN.
Fig. 4. Schematic diagram of Structure-Guided Endonuclease.
Inspired by biodiversity, the US company Inscripta has developed a new class of CRISPR endonucleases, known as MADzymes. Compared to Cas9, MADzymes have a lower off-target probability, different PAM recognition sites, and a smaller size. Most importantly, the MAD7 nuclease is the first MAD enzyme that can promote the wide use of the CRISPR tool in both academic and commercial settings. Inscripta announced future collaborations with researchers around the world to develop and use the enzyme free of charge. The MAD7 nuclease is a 147.9 kDa polypeptide that shows a preference for TTTN and CTTN PAM sites. Homology modeling identified some structural similarities between the MAD7 nuclease and the Cpf1 family, with a similarity of about 31%. The homology model generated with SWISS-MODEL software (Biozentrum, University of Basel, Switzerland) is shown in Fig. 5. The MAD7 nuclease has been shown to have editing activity in mammalian cells and microbial systems [53, 54].
Fig. 5. Structure of MAD7 homologous modeling protein.
In July 2020, Doudna reported the discovery of a new ultra-compact CRISPR-Cas system termed CRISPR-CasΦ. The compact system has been demonstrated to be active in vitro, as well as in human and plant cells, and its target recognition ability has been extended compared with other CRISPR-Cas proteins. One of the pivotal advantages of the CasΦ system is its compact size.
Researchers have long worked to optimize Cas9 by improving its compatibility with different PAM sequences, in the hope of one day no longer requiring a PAM at all, said Benjamin P. Kleinstiver of Harvard Medical School in March 2020. To this end, Kleinstiver et al. genetically engineered two Cas9 variants that bind and cleave DNA without the need for a specific PAM, and termed these enzymes SpG and SpRY.
Through editing tests in human cells, SpG was found to be more efficient at recognizing NGN PAMs than SpCas9 and other variants. Further experiments showed that the editing activity of SpG was 51.2% at NGG sites and 53.7% for the other three groups. SpRY can recognize almost all PAM sequences, and the experimental data show that the off-target probability of SpG and SpRY is close to that of Cas9. The SpRY variant developed in this study is currently the Cas9 mutant with the broadest PAM compatibility, almost completely removing the PAM restriction and greatly expanding the range of editable genomic sites. The resulting single-base editing system extends precision editing almost to the whole genome [49, 128, 129].
Gene editing technology is constantly evolving, and its application has expanded from microorganisms to animals and plants. It has also extended to human health and, especially in recent years, has been combined with synthetic biology and other cutting-edge technologies to achieve multidisciplinary integration. This technology shows great potential in creating new agronomic traits and enhancing drug production by changing metabolic pathways in organisms and building new metabolic networks; examples include high oleic soybean oil and lettuce with high vitamin C content. Gene editing technology also promotes the development of agricultural breeding, and its role in improving crop stress resistance and disease resistance cannot be overemphasized. In addition, gene editing technology has contributed positively to human gene therapy, including correction of the thalassemia mutation through CRISPR/Cas9 gene editing and the use of edited CAR-T cells to potentially increase the body’s resistance to blood cancers. Nevertheless, several hurdles remain to be overcome in gene editing technology.
For biological applications and clinical treatment, off-target editing leads to unpredictable risks. In classic CRISPR/Cas9 editing, sgRNAs guide the Cas9 enzyme to cut at a corresponding sequence. Generally, the sgRNA recognition sequence is about 20 nt, but Cas9 tolerates a certain number of mismatches, including bulges formed when the sgRNA pairs with sequences other than the intended target. High GC content in the sgRNA can also lead to mismatches and off-target editing. Meanwhile, studies show that delivery of Cas9 protein and sgRNA with plasmids results in over-expression, which greatly increases the risk of off-target editing. To address the off-target problem, computational tools are used to predict the off-target rates of sgRNAs, the GC content of the sgRNA is reduced, and the sgRNA is shortened or the related Cas9 protein is modified. In this way, the risk of off-target editing can be mitigated. However, these tools remain limited by PAM sequence requirements, and error-prone NHEJ repair in vivo greatly increases the probability of unintended edits. Therefore, using HR-based repair of DSBs in vivo is also an effective means to improve the accuracy of gene editing.
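The kind of computational screening mentioned above can be sketched in a few lines of Python: compute the GC content of a guide and list genomic windows that differ from it by only a few mismatches. This is a simplified illustration, not a published scoring method. It scans a single strand, ignores PAM requirements and bulges, and all names (`gc_content`, `count_mismatches`, `candidate_off_targets`) and sequences are hypothetical.

```python
def gc_content(guide):
    """Fraction of G and C bases in the guide sequence."""
    guide = guide.upper()
    return (guide.count("G") + guide.count("C")) / len(guide)

def count_mismatches(guide, site):
    """Number of positions at which two equal-length sequences differ."""
    if len(guide) != len(site):
        raise ValueError("guide and site must have the same length")
    return sum(1 for g, s in zip(guide.upper(), site.upper()) if g != s)

def candidate_off_targets(guide, genome, max_mismatches=3):
    """List (index, window, mismatches) for near matches on one strand.

    Exact matches (the intended target) are excluded. PAM checks and
    bulge tolerance are deliberately left out of this sketch.
    """
    hits = []
    n = len(guide)
    for i in range(len(genome) - n + 1):
        window = genome[i:i + n].upper()
        mismatches = count_mismatches(guide, window)
        if 0 < mismatches <= max_mismatches:
            hits.append((i, window, mismatches))
    return hits

# Toy example with a shortened 10-nt guide for readability.
guide = "GATTACAGGC"
genome = "AAAA" + "GATTACAGGC" + "TTTT" + "GATTTCAGGC" + "AAAA"
print(gc_content(guide))
print(candidate_off_targets(guide, genome, max_mismatches=1))
```

Guides with many near matches elsewhere in the genome, or with extreme GC content, would be deprioritized in such a screen.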
Currently, transposon-based gene editing tools such as INTEGRATE and CAST have received significant attention, since these approaches can not only achieve site-specific editing in microorganisms but also reduce off-target editing. Accurate editing can be achieved without double-strand breaks, which represents a leap forward in gene editing tools and provides new ideas for clinical application and gene therapy. However, at present, accurate editing has only been achieved in bacteria. To extend these tools to animals and plants, the related sequences need to be modified by codon optimization. The challenges of site-specific knock-in, preparation of single-stranded DNA templates, and in vivo delivery remain to be solved. At the same time, we also hope that transposon-based editing tools will be designed and developed for animals and plants, so that related applications can be realized as soon as possible.
The PAM sequence is a necessary condition for CRISPR/Cas9 cleavage, since the Cas protein relies on the PAM for rapid target recognition. If the PAM is absent, the Cas protein will not cleave the target even if the sgRNA completely matches the target sequence. Therefore, the PAM sequence is of great importance in the CRISPR system. The stricter the PAM requirement, the lower the risk of off-target editing; however, the number of target sequences that can be designed is also greatly reduced. It is therefore particularly important to broaden PAM compatibility by finding new Cas proteins, designing new Cas protein variants, or developing new tools.
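The trade-off between PAM stringency and the number of designable targets can be made concrete with a rough calculation. The sketch below assumes a random sequence with equal base frequencies, which real genomes do not have, and uses standard IUPAC ambiguity codes; the function names `pam_match_probability` and `count_pam_sites` are hypothetical.

```python
# IUPAC nucleotide codes mapped to the bases they allow.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

def pam_match_probability(pam):
    """Chance that a random position matches the PAM (equal base frequencies assumed)."""
    probability = 1.0
    for symbol in pam.upper():
        probability *= len(IUPAC[symbol]) / 4
    return probability

def count_pam_sites(seq, pam):
    """Count positions on the given strand where the PAM pattern matches."""
    seq, pam = seq.upper(), pam.upper()
    k = len(pam)
    return sum(
        all(base in IUPAC[symbol] for base, symbol in zip(seq[i:i + k], pam))
        for i in range(len(seq) - k + 1)
    )

for pam in ("NGG", "NGN", "NNN", "TTTN"):
    print(pam, pam_match_probability(pam))
```

Under this simplification an NGG PAM is expected at roughly 1 in 16 positions, an NGN PAM at 1 in 4, and a fully relaxed NNN PAM at every position, which is why broadening PAM compatibility enlarges the set of candidate target sites.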
At present, the widely used CRISPR/Cas system mainly delivers the DNA sequences encoding the essential gene editing components through viruses or plasmids. For example, adeno-associated virus (AAV) has been widely used in animal experiments as a powerful tool to deliver gene therapy, as it is not known to cause any human disease. For gene editing in vivo, transgenes are introduced into organisms; however, delivery efficiency is reduced when the viral or plasmid construct is too large. To tackle this problem, researchers use physical microinjection or electroporation for gene delivery. At present, only electroporation has been applied, but this method requires specific equipment and is expensive, which has become a major obstacle to its application. Evolving nanotechnologies are expected to address this issue. For example, graphene derivatives, DNA nanocapsules, and gold nanoparticles have become the materials of choice for researchers due to their advantages of small size, low price, and wide application scenarios. Such advances are expected to solve the problem of editing tool delivery in the foreseeable future.
Advancements in the Cas9 RNP system have also attracted the attention of researchers because it is fast, safe, avoids the problem of plasmid integration, and can be applied to various model organisms and cell types. This DNA-free genome editing method involves assembling Cas9 protein and sgRNA into a ribonucleoprotein complex (RNP) in vitro, and then transferring the RNP into cells by particle bombardment. This approach has the advantages of accuracy, specificity, simplicity, and low cost, and has great application potential.
At present, CRISPR gene editing technology is becoming more mature. However, top research teams hold a large number of patents on the technology. To develop and apply related technologies, an investigator is required to pay high licensing fees, which can place an undue economic burden on many researchers and research institutes. This is likely to limit the development and application of editing tools. In recent years, China and the United States have become the most competitive players in gene editing technology research and development. However, emerging companies, led by Inscripta in the United States, have chosen to release, to some degree, the rights to key tools such as MAD7, as this enables more researchers to participate in gene editing research, accelerates advances in editing tool development, and expands potential applications of their technology.
Gene editing technology offers a significant advance in agriculture and gene therapy, since it is a safer alternative to traditional transgenic technology. It is important that governments pay more attention to the development of related industries and gradually pilot the application of related technologies (after the introduction of corresponding legislation), so as to improve the welfare of their people. In March 2018, the United States Department of Agriculture issued a statement on plant breeding innovation. Subsequently, the Australian federal government revised its genetic technology regulations, and governments in various countries are gradually liberalizing their policies on gene-edited crops. At the same time, we should also encourage the public to view the development of gene editing technology scientifically and rationally, mobilize the enthusiasm of more researchers to explore gene editing technology, and build a scientifically healthy technical workforce.
SH and YY wrote this manuscript; FS, XH, XJ, YD, PL, FC and DX participated in the writing and modification of this manuscript; YL conceptualized the idea. All authors have read and agreed to the published version of the manuscript.
In this paper, SWISS-MODEL software is used for homologous modeling, and BIORENDER software is used for drawing.
This research was funded by National Natural Science Foundation of China (31100456).
The authors declare no conflict of interest.