An in silico overview on the usefulness of tags and linkers in plant molecular pharming

Plant molecular pharming is a promising concept based on the large-scale production of recombinant proteins, including antibodies, vaccines and enzymes, for human or veterinary use. This new branch of the biopharmaceutical industry offers practical and safety advantages over traditional production systems. In higher plants, the complex cellular machinery makes possible the synthesis and post-translational modification of heterologous protein macromolecules. The main obstacle to using plant systems at industrial scale is most often the low yield of the recombinant proteins. To improve production levels, many studies have focused on the choice of plant species, tissues, organs and cell suspension cultures, or on various upstream and downstream constituents of the expression cassettes. Likewise, new engineering technologies in plant molecular pharming have emerged that rely on soybean agglutinin (SBA), hydrophobin, zein and elastin-like peptide tags, which are used to extract and purify recombinant proteins in some host systems as part of different expression cassettes. Linkers, known to be very useful tools in recombinant protein design, separate different domains or units of the heterologous gene and thereby preserve the functionality of the protein of interest. Here, we computationally compare fusions of one tag, SBA, with a pharmaceutical human protein, ADA, joined either directly or through the specific flexible (GGGGS)3 linker. The in silico analysis focuses on mRNA stability and on the properties of the tagged and tagged-linked ADA recombinant fusion proteins.


Characteristics of plant molecular pharming
Genetically engineered plants have emerged as one of the most attractive platforms for producing therapeutic recombinant proteins such as antibodies, vaccines, enzymes, and other pharmaceutical entities, such as human alpha-1-antitrypsin or human serum albumin, necessary for the improvement of veterinary and human healthcare systems throughout the world (He et al., 2011; Hefferon, 2012; Hefferon, 2014; Ou et al., 2014; Paul, Teh, Twyman, & Ma, 2013; Zhang et al., 2012). Plants offer scalability, low cost and safety, as they are devoid of mammalian pathogens. Moreover, plant cells have the capacity to correctly assemble and fold complex proteins by providing post-translational modifications indispensable to the functionality and stability of foreign multimeric proteins (Fu et al., 2014; Ma et al., 2013; Stoger, Fischer, Moloney, & Ma, 2014). One drawback to the production of pharmaceutical proteins in plants is the relatively low yield of protein products that are generated. However, maize, wheat, barley and rice are the most important cereals capable of producing recombinant proteins at high levels ranging from 7 to 10% of total soluble protein in seeds (Yang, Wakasa, & Takaiwa, 2008).

He and collaborators indicated that the expression levels of human serum albumin (HSA) in transgenic rice seeds are high enough for downstream and commercial processing (He et al., 2011; Kuo et al., 2013). The rice endosperm is capable of accumulating pharmaceutical proteins such as human alpha-1-antitrypsin, and thus appears to be a promising production platform for high-value plant-made pharmaceuticals (Zhang et al., 2012).

Other approaches used to improve yield production
To increase the level of targeted proteins in the host plant, scientists have been employing other physiological, biochemical and molecular approaches (Matic et al., 2012). At the gene-transfer-system level, some assays have focused on using bacterial plasmids containing expression cassettes, while others exploit viral vector systems to express heterologous genes and their proteins (Hefferon, 2014; Hefferon, 2012). Foreign genes can be expressed transiently, using specific vectors and/or vacuum infiltration-centrifugation, or stably, based on the integration of the gene into the chromatin structure (Kingsbury & McDonald, 2014; Shah, Almaghrabi, & Bohlmann, 2013). For example, in a study conducted by Chen and co-workers on the factors controlling biosynthesis and accumulation of wild-type and mutated human immune-regulatory interleukin-10 (IL-10) transformed in Arabidopsis, mutagenized transcripts (at 2762 and at 3262) demonstrated similar nascent mRNA synthesis but improved stability. Likewise, the mutated transcripts were associated with plant lines showing a more stable and efficient translational machinery than the wild-type IL-10 lines (Chen et al., 2013).
Innovative genomic and epigenetic approaches in plant molecular pharming have recently emerged. Some genome re-engineering and epigenetic approaches can have tremendous potential through the suppression or decrease of protease activities. For example, RNA interference is capable of shifting the metabolism of a given host plant and can repress the silencers that inhibit the expression of transiently transformed genes (Arzola, Chen, Rattanaporn, Maclean, & McDonald, 2011; Hakkinen et al., 2013; Rigano, De Guzman, Walmsley, Frusciante, & Barone, 2013; Tremblay, Diao, Huner, Jevnikar, & Ma, 2011).

Re-engineering the expression cassette
The yield of plant-made pharmaceuticals (PMPs) is in general low. To overcome this main limiting factor, many studies have been conducted to improve the production of therapeutic proteins by re-engineering the expressed genes and proteins at the upstream and downstream processing levels (Makhzoum et al., 2013; Twyman, Schillberg, & Fischer, 2013). Many promoters, codon-optimized ORFs, 5' UTRs, 3' UTRs and peptides have been tested to enhance transcription and translation levels and therefore the yield of recombinant proteins (Buyel, Kaever, Buyel, & Fischer, 2013; Gallie, 2002; Kanoria & Burma, 2012; Laguia-Becher et al., 2010). Promoter studies are very useful for increasing the expression and accumulation of heterologous pharmaceutical recombinant proteins, for example with the strong and ubiquitous 35S CaMV promoter, or for targeting spatiotemporal expression by using promoters specific to certain cells, tissues or organs (Abdullah, Rahmah, Sinskey, & Rha, 2008; Makhzoum et al., 2013; Makhzoum, Petit-Paly, St Pierre, & Bernards, 2011).
Gene expression can also be enhanced through the 5' UTR regions. In addition to well-known translational enhancers, including the viral 5' UTRs Ω and AMV for transgene expression in plant systems (especially in dicots such as tobacco) (Gallie, 2002), 5' UTR sequences from the photosystem I (PHOTO) gene and the geranylgeranyl reductase (GGR) gene in the model plant A. thaliana have shown a high level of GUS expression in comparison to the 5' UTRs of Ω and AMV. In contrast, the 5' UTR of the alcohol dehydrogenase gene (NtADH) revealed similar levels of GUS expression; nonetheless, this was 30- to 100-fold greater than the control (Agarwal et al., 2014; Satoh, Kato, & Shinmyo, 2004). As another example, a new synthetic (20 nt) 5' UTR enabled GUS and GFP reporter gene expression to reach levels 10- to 50-fold higher than controls in transgenic tobacco and cotton plants. Moreover, its promoting role was confirmed under the control of both the strong 35S promoter and the weak nos promoter (Kanoria & Burma, 2012). In addition to improved promoter strength, codon adaptation, and 5' UTR and 3' UTR choice in expression cassettes, other constituents such as signal peptide tags have been utilized to target accumulation to specific organelles in plant cells. For example, inclusion of the KDEL endoplasmic reticulum retention signal will cause the foreign protein of interest to accumulate in the endoplasmic reticulum, where it is protected against protease activity present in the cytosol (Bundo et al., 2014).
Targeting recombinant proteins to specific cellular compartments is also an important parameter that affects plant-made pharmaceutical expression levels. This targeting influences post-translational modifications, which play a key role in the biological activity and accumulation of recombinant proteins in plants (Warzecha, 2008). For example, the endosperm of rice provides an ideal site offering correct folding and adequate protection for the foreign protein against proteolysis, without any detectable loss of activity (Rademacher et al., 2008).

Plant Science Today (2014) 1(4): 201-212. ISSN: 2348-1900. Horizon e-Publishing Group

Fusion tags in plant molecular pharming
Short peptide tags can be fused to constructs harbouring foreign genes in order to facilitate recovery during protein purification. Some short affinity tags (His tag, Strep tag, Arg tag, and FLAG tag) have been successfully used in bacterial and yeast biosynthesis and purification systems (Agarwal et al., 2014; Zhao, Li, & Liang, 2013). However, the small tags in plant molecular pharming are characterized by their ineffectiveness, lack of scalability and high cost (Joensuu et al., 2010; Waugh, 2005). Some fusion tags have been gaining increasing attention as a new, effective way of facilitating the expression, extraction and purification of pharmaceutical proteins, such as hydrophobin (Joensuu et al., 2010; Joensuu, Conley, Linder, & Menassa, 2012), the zein domain (Alvarez, Topal, Martin, & Cardineau, 2010), elastin-like polypeptides (Conley, Joensuu, Richman, & Menassa, 2011; Floss et al., 2009) and soybean agglutinin (SBA). The SBA affinity tag is an effective system which can be used for quick and efficient purification of recombinant proteins (Tremblay et al., 2011). Elastin-like polypeptides (ELP) were employed in recent studies to increase the stability and purification yield of the human immunodeficiency virus (HIV)-neutralizing antibody 2G12 (light or heavy chain), transformed and expressed in transgenic tobacco plants (with 1% of total soluble proteins (TSP) from leaves and seeds) (Conley, Joensuu, Jevnikar, Menassa, & Brandle, 2009; Floss et al., 2009). Another interesting tag is the hydrophobin HFBI gene from Trichoderma reesei (hydrophobicity changer), which was used as part of a fusion with green fluorescent protein (GFP) (GFP-HFBI) and accumulated in protein bodies in transiently agro-infiltrated tobacco, with a high yield of 51% of TSP and a 91% recovery of the GFP-HFBI fusion protein (Joensuu et al., 2010; Joensuu et al., 2012). The Zera® domain (N-terminal proline-rich domain: γ-zein ER-accumulating domain) of the maize zein gene is able to form protein bodies (PBs). This domain, in addition to other constituents, led to a threefold increase in the accumulation of the Yersinia pestis F1-V antigen fusion protein when using the fusion construct, in comparison to the construct lacking it, in three separate transient or stable transformation systems in Nicotiana benthamiana, Medicago sativa (alfalfa) and Nicotiana tabacum NT1 cells (Alvarez et al., 2010). To facilitate the removal of these tags after purification, the recognition sequence of the TEV protease, Glu-Asn-Leu-Tyr-Phe-Gln-(Gly/Ser) (ENLYFQ(G/S)), can be added between the protein of interest and the tag (Kapust & Waugh, 2000; Tropea, Cherry, & Waugh, 2009; Waugh, 2011).
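The tag-removal strategy described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the tag and cargo sequences are hypothetical placeholders (not SBA, ADA or any real protein), the tag is placed at the N-terminus, the function names are invented for this example, and cleavage is modelled simply as a cut after the Gln of ENLYFQ whenever Gly or Ser follows.

```python
import re

# TEV recognition motif quoted in the text: ENLYFQ(G/S).
# The cut is modelled after Q, so the G/S residue stays on the released protein.
TEV_SITE = re.compile(r"ENLYFQ(?=[GS])")


def build_cleavable_fusion(tag: str, protein: str, p1_prime: str = "G") -> str:
    """Join tag and protein as tag-ENLYFQ-(G or S)-protein."""
    if p1_prime not in ("G", "S"):
        raise ValueError("the residue after ENLYFQ must be G or S")
    return tag + "ENLYFQ" + p1_prime + protein


def tev_cleave(fusion: str) -> list[str]:
    """Split a sequence after every ENLYFQ that is followed by G or S."""
    fragments, start = [], 0
    for match in TEV_SITE.finditer(fusion):
        fragments.append(fusion[start:match.end()])
        start = match.end()
    fragments.append(fusion[start:])
    return fragments


# Hypothetical placeholder sequences, used only to exercise the functions.
TAG = "MKTAYIAKQR"
CARGO = "MAQTPAFDKP"

fusion = build_cleavable_fusion(TAG, CARGO)
tag_part, cargo_part = tev_cleave(fusion)
```

In this sketch the released cargo keeps one extra N-terminal residue (Gly or Ser) and the tag fragment keeps the ENLYFQ residues, which is worth bearing in mind when a construct is designed.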

Linkers in plant molecular pharming
Linkers are short peptide sequences that occur between protein domains. An important consideration in recombinant protein production is the use of one or more linkers between two fused genes instead of direct fusion; this approach shows promise in maintaining protein functionality and activity while avoiding interference from other connected proteins. Within a single expression cassette, linkers are very attractive tools for expressing several proteins, ORFs or peptides together without perturbing the structure of each unit of the fusion or having a negative effect on their activity and stability. Linkers are often composed of flexible residues such as glycine and serine so that the adjacent protein domains are free to move relative to one another. Longer linkers are used when it is necessary to ensure that two adjacent domains do not sterically interfere with one another. Some scientists have agreed that short linkers can have a negative effect on scFv antibody folding due to spatial occupancy. Conversely, long linkers can influence the functionality and enhance the antigenicity of scFv antibodies (Guo et al., 2006; Le Gall, Reusch, Little, & Kipriyanov, 2004; J. Zhang, Yun, Shang, Zhang, & Pan, 2009).
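The difference between a direct fusion and a linked fusion can be made concrete with a short Python sketch. It assumes nothing beyond plain string handling: the two domain sequences are hypothetical placeholders, and the helper names (`make_linker`, `fuse`, `flexible_fraction`) are invented for this illustration rather than taken from any published tool.

```python
def make_linker(unit: str, repeats: int) -> str:
    """Repeat a linker unit; ("GGGGS", 3) gives the (GGGGS)3 linker."""
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    return unit * repeats


def fuse(domains: list[str], linker: str = "") -> str:
    """Join domains in order, placing the linker between neighbours."""
    return linker.join(domains)


def flexible_fraction(linker: str) -> float:
    """Fraction of linker residues that are glycine or serine."""
    if not linker:
        raise ValueError("linker must not be empty")
    return sum(residue in "GS" for residue in linker) / len(linker)


# Hypothetical placeholder domains, not real protein sequences.
DOMAIN_A = "MKTAYIAKQR"
DOMAIN_B = "MAQTPAFDKP"

gs3 = make_linker("GGGGS", 3)             # flexible 15-residue linker
direct = fuse([DOMAIN_A, DOMAIN_B])       # direct fusion, no linker
linked = fuse([DOMAIN_A, DOMAIN_B], gs3)  # domains separated by (GGGGS)3
```

Here `flexible_fraction` is only a crude proxy for flexibility: it scores a glycine-serine linker as 1.0 and says nothing about the actual conformation, which needs structural modelling.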
Several short peptides have been used toward this goal. For example, the usefulness of the linker EAAAK has been demonstrated in the design of a synthetic gene encoding a combination of the carboxy-terminal fragment of intimin, the middle region of Tir and the carboxy-terminal part of EspA of enterohemorrhagic Escherichia coli O157:H7 (EHEC), reaching 0.2% to 0.33% TSP in transgenic canola seeds and 0.1% to 0.3% TSP in transgenic tobacco lines (Amani, Mousavi, Rafati, & Salmanian, 2009, 2011). Another example is the AG linker, a simple alanine-glycine linker of six amino acids with three AG repeats (AGAGAG). It was added between LT-B and GFP (green fluorescent protein) using oligonucleotide extensions in the PCR (polymerase chain reaction) primers, as part of the expression cassette in the vectors pLM03, pLM08 and pLM09; the percentage of TSP of LT-B::GFP was line dependent, and accumulation ranged from undetectable to 0.059% of TSP extracted from maize kernels (Moeller, Gan, & Wang, 2009). Modified linkers can improve the level of protein accumulation, as in the case of a single-chain Fv linked to the CH3 domain of a human anti-rat transferrin receptor IgG3 heavy chain by the flexible and optimized linker (GGGGS)3 and stably transformed into mammalian cells (Trinh, Gurbaxani, Morrison, & Seyfzadeh, 2004). Linkers have also previously been designed using a genetic algorithm, taking into consideration their lengths, flexibilities and compositions (Zhang et al., 2009).

In silico and computational analysis and optimization
In molecular pharming, as in many other fields of applied molecular biology, gene design and optimization is a very useful strategy to enhance protein production yield. In order to estimate the quantity, quality and function of recombinant proteins, it is necessary to predict the optimized gene structure as well as the structure and function of the mRNA and the protein. Gene and protein design therefore plays an important role in the quality and activity of the product obtained. Indeed, by using computational tools we can study gene structure, mRNA and the activities of proteins, as well as predict the production yield of the proteins.
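As a rough illustration of what gene design involves at the sequence level, the following Python sketch back-translates a peptide into DNA and reports the GC content of the result. It is a toy under loud assumptions: every codon in the table is a valid standard-genetic-code codon for its amino acid, but the choice of one "preferred" codon per residue is hypothetical and is not taken from any measured plant codon-usage table or from Visual Gene Developer.

```python
# One codon per amino acid. The codons are valid for their residues, but the
# "preferred" choice is HYPOTHETICAL and not derived from real codon-usage data.
PREFERRED_CODON = {
    "A": "GCT", "R": "AGA", "N": "AAC", "D": "GAT", "C": "TGC",
    "Q": "CAA", "E": "GAG", "G": "GGA", "H": "CAC", "I": "ATT",
    "L": "CTT", "K": "AAG", "M": "ATG", "F": "TTC", "P": "CCA",
    "S": "TCT", "T": "ACT", "W": "TGG", "Y": "TAC", "V": "GTT",
}


def back_translate(protein: str, stop: bool = False) -> str:
    """Translate a protein sequence into DNA using the table above."""
    try:
        dna = "".join(PREFERRED_CODON[residue] for residue in protein.upper())
    except KeyError as err:
        raise ValueError(f"unsupported residue: {err.args[0]}") from None
    return dna + ("TAA" if stop else "")


def gc_content(dna: str) -> float:
    """Fraction of G and C bases in a DNA sequence."""
    if not dna:
        raise ValueError("sequence must not be empty")
    return sum(base in "GC" for base in dna.upper()) / len(dna)


linker_dna = back_translate("GGGGS")  # one GGGGS linker unit
linker_gc = gc_content(linker_dna)
```

A real optimization would weight synonymous codons by host usage and check the resulting mRNA for stable secondary structure; this sketch only shows the data flow from protein sequence to a candidate coding sequence.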
Software such as Visual Gene Developer may help to optimize the study of pharmaceutical recombinant genes and proteins by analyzing synthetic genes and testing new algorithms in bioinformatics (Jung & McDonald, 2011).The software can be found and downloaded from http://www.visualgenedeveloper.net.
In silico analysis is a very good tool for the design of new drugs and for studying existing vaccines and antibodies, as it is necessary to optimize drug design by the simulation of spatial structures and molecules (Jiang & Zhou, 2005; Kamphausen et al., 2002). The design of suitable linkers to separate domains of bifunctional proteins is an indispensable tool in protein engineering (Arai, Ueda, Kitayama, Kamiya, & Nagamune, 2001). Web servers and software have been created for this purpose, such as "Linker", a program to generate linkers for protein fusions (Crasto & Feng, 2000). Additionally, a more recent web server, also called LINKER, serves to generate peptide sequences with extended conformation. Table 1 summarizes some potentially useful linkers employed in recombinant protein studies by many researchers. These linkers can be studied for their positive roles in pharmaceutical plant recombinant protein assays and for in silico optimization of pharmaceutical protein fusions.
For fusions made with or without linkers, it will be interesting to study whether the modified gene and protein structures are as stable and functional as their natural homologs. As an example, we examine this question using a human enzyme of pharmaceutical interest, human adenosine deaminase (ADA). Deficiency in ADA (41 kDa), a crucial enzyme of the purine salvage pathway, causes a genetically inherited disorder of the immune system that can be treated with functional ADA. The gene has been expressed experimentally in transgenic tobacco plants, with specific activities of 0.001 and 0.003 units per mg of TSP in leaves, and in transgenic tobacco BY-2 cell suspensions, with approximately 16 mg/L (Singhabahu, George, & Bringloe, 2013, 2014).
Therefore, as a case study, we compare the in silico modeling and design of ADA recombinant protein in plant molecular pharming, fused to an SBA tag with or without a specific linker. The goal of this paper is to show the useful role of linkers and tags in plant molecular pharming of recombinant proteins and gene fusions, based on gene structure, mRNA and protein, in comparison to proteins which lack tags and linkers. We computationally compare and predict some features that contribute to pharmaceutical protein yield, at the level of transcription (gene codon and structure optimization, mRNA frequency and stability) and translation (protein structure and function, stability and post-translational modification).

Results and discussion
The proteins obtained after translation with the longest amino acid sequences were selected. The functional characterization of both proteins is shown in Table 2. A low GRAVY value is associated with a hydrophilic protein. The GRAVY value for a peptide or protein is calculated as the sum of the hydropathy values of all the amino acids, divided by the number of residues in the sequence. Proteins with large negative values are relatively more hydrophilic than proteins with less negative values (Roy, Maheshwari, Chauhan, Sen, & Sharma, 2011). The instability index of both proteins was calculated to be less than 40, which classifies them as stable molecules (Roy et al., 2011), but the protein without a linker is slightly more stable than the protein with a linker. Similarly, the pI values of both proteins were determined to be less than 7, indicating an acidic nature. The aliphatic index for SBA-ADA is slightly higher than that for SBA-linker-ADA, which suggests that SBA-ADA has a higher thermal stability (Gasteiger et al., 2005; Wilkins et al., 1999). The secondary structure analysis of SBA-ADA and SBA-linker-ADA gives almost identical results. Both proteins have a high number of alpha helices, which is as expected because alpha helices are the most common structural motifs. The mRNA structure of SBA-linker-ADA shows a lower energy value of -564.10 kcal/mol, which is an attribute of structural stability, whereas the value for SBA-ADA is -536.10 kcal/mol. The best predicted 3D structures for the SBA-ADA and SBA-linker-ADA proteins are shown in Fig. 1. The structural stability of these proteins was validated through different servers. The PROCHECK analysis revealed that the SBA-linker-ADA protein has a more stable structure than SBA-ADA (Fig. 2). The stereochemical properties, including Ramachandran plots, G-factor, and the number of bad contacts, are given in Table 3. The protein with the linker has more residues in allowed regions and fewer residues in disallowed regions, and similarly has a better G-factor value. The statistical analysis of non-bonded interactions between different atom types was performed using ERRAT 2.0 and is shown in graphical form in Fig. 3. The ERRAT score for SBA-linker-ADA is 86.304, which indicates a better structure. The solvent accessibility of the residues of both proteins indicates that SBA-linker-ADA has more solvent-accessible residues (Fig. 4).
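Two of the sequence-based indices discussed above are simple enough to sketch directly. The Python example below computes GRAVY as the mean hydropathy per residue, as defined in the text, and the aliphatic index with the commonly used formula based on the mole percentages of Ala, Val, Ile and Leu. It assumes the Kyte-Doolittle hydropathy scale and the coefficients 2.9 and 3.9; whether the tools behind Table 2 use exactly these settings is an assumption, and the test sequences are hypothetical placeholders, not SBA-ADA.

```python
# Kyte-Doolittle hydropathy values (assumed to be the scale behind GRAVY here).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(seq: str) -> float:
    """Sum of hydropathy values divided by the number of residues."""
    if not seq:
        raise ValueError("sequence must not be empty")
    return sum(KYTE_DOOLITTLE[residue] for residue in seq) / len(seq)


def aliphatic_index(seq: str) -> float:
    """Ala% + 2.9 * Val% + 3.9 * (Ile% + Leu%), using mole percentages."""
    if not seq:
        raise ValueError("sequence must not be empty")

    def mole_percent(residue: str) -> float:
        return 100.0 * seq.count(residue) / len(seq)

    return (mole_percent("A") + 2.9 * mole_percent("V")
            + 3.9 * (mole_percent("I") + mole_percent("L")))


# Hypothetical placeholder fusion, with and without a (GGGGS)3 linker.
direct = "MKTAYIAKQR" + "MAQTPAFDKP"
linked = "MKTAYIAKQR" + "GGGGS" * 3 + "MAQTPAFDKP"

gravy_direct, gravy_linked = gravy(direct), gravy(linked)
ai_direct, ai_linked = aliphatic_index(direct), aliphatic_index(linked)
```

Because glycine and serine do not enter the aliphatic index formula, inserting a glycine-serine linker dilutes the aliphatic residues and lowers the index. In this sketch `ai_linked` is smaller than `ai_direct`, the same direction as the slightly lower aliphatic index reported above for SBA-linker-ADA.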

Conclusions and perspectives
The in silico studies on the SBA-ADA and SBA-linker-ADA proteins suggest that, in terms of sequence analysis, SBA-ADA is slightly more stable, but the overall combined sequence (physico-chemical properties) and structural (mRNA and 3D structure) analysis of these proteins revealed notable differences. This in silico analysis illustrates the potential role of tags and linkers in increasing the yield and the efficacy of pharmaceutical recombinant proteins in plant molecular pharming. It will be very interesting to extend these analyses by employing more tags and linkers, based on comparative modeling and design, to optimize expression cassettes and predict the putative effects of these constituents on the level of foreign protein production. Other tags, such as hydrophobin, zein and elastin-like peptide tags, and other linkers (those in Table 1 and others) can be studied for their suitability and enhancing effects on extracting and purifying recombinant proteins in plant host systems.
Other computational approaches, such as molecular dynamics simulations, may also be employed to understand time-dependent changes in protein-linker association. Such simulations provide a deep understanding of protein structure and function at the atomic level.

Table 1. Summary of some linkers that have been used in recombinant proteins and have good potential for use in plant molecular pharming. The linkers below were obtained from the International Genetically Engineered Machine (iGEM) Foundation website.

Table 2. Functional characterization of SBA-ADA and SBA-linker-ADA proteins

Table 3. PROCHECK values for SBA-ADA and SBA-linker-ADA proteins