In-silico structural modelling of cytochrome complex proteins of white turmeric (Curcuma zedoaria (Christm.) Roscoe)

Curcuma zedoaria (Christm.) Roscoe (white turmeric) is a perennial herbaceous plant of the family Zingiberaceae, found mainly in the wild in tropical and subtropical regions worldwide. Cytochrome proteins in plants play important roles in growth and development and in protection against stresses and diseases. In C. zedoaria, cytochrome proteins such as psbF, psbE, petB, petD, petN, petG and ccsA are involved in the degradation of misfolded proteins, ATP formation, cyclic electron flow and the biogenesis of c-type cytochromes. Because no structures of these C. zedoaria cytochrome proteins are available in structural databases, they were modelled in silico. The physicochemical parameters of the sequences were estimated with the Expasy ProtParam web tool; the Self-Optimized Prediction Method with Alignment (SOPMA) server and MODELLER version 9.23 were used for structure modelling; and the Qualitative Model Energy Analysis (QMEAN) and Protein Structure Analysis (ProSA) servers were used to validate the secondary and tertiary structures of these proteins. The QMEAN4 values of the modelled psbF, psbE, petB, petD, petN, petG and ccsA proteins were -2.04, -1.20, -3.01, -1.57, -2.11, -1.74 and -12.87, respectively. The Z-scores obtained from the ProSA server were 0.5, -0.83, -1.5, -0.58, -0.02, 0.14 and -3.73. All seven modelled structures have been submitted to the Protein Model DataBase (PMDB). These results may support further work on determining the crystal structures of these hypothetical proteins, their structural motifs and physicochemical properties, and protein-protein interaction studies of various cytochrome proteins.


Introduction
Curcuma zedoaria (white turmeric) is a rhizomatous medicinal herb of the family Zingiberaceae. It is native to Bangladesh, India and Sri Lanka and is also cultivated in Brazil, China, Japan, Nepal and Thailand (1). C. zedoaria has been used in many folk medicine systems, especially Ayurveda, to treat menstrual disorders, nausea, stomach diseases and carcinoma (2). Rural inhabitants use the rhizome for its carminative, rubefacient, expectorant, diuretic, demulcent and stimulant properties, and the root to treat dyspepsia, flatulence, cough, cold and fever (3). The plant has also been used to treat liver cancer, chronic pelvic inflammation and coronary heart disease, and to help prevent leukopenia caused by cancer therapies (4). Despite this medicinal importance, the plant has been little studied at the molecular level; for example, no structures of its cytochrome proteins are available. Cytochrome proteins play a central role in electron transport chains and redox catalysis in living organisms. In plants, cytochrome b559 subunit beta (psbF) and cytochrome b559 subunit alpha (psbE) are key components of the photosystem II (PSII) reaction centre; they participate in secondary electron transfer pathways and protect PSII against photoinhibition (5). Cytochrome b6 (petB), cytochrome b6-f complex subunit 4 (petD), cytochrome b6-f complex subunit 8 (petN) and cytochrome b6-f complex subunit 5 (petG) are subunits of the cytochrome b6-f complex, which mediates electron transfer between PSII and photosystem I (PSI) (6). The cytochrome c biogenesis protein (ccsA) is required for the biogenesis of c-type cytochromes (cytochrome c6 and cytochrome f) at the heme attachment step. This functional information is taken from the UniProt database (7). However, 3D structures of these C. zedoaria cytochrome proteins are lacking in protein structure databases.
Protein structures are essential for elucidating the molecular mechanisms underlying biological processes. High-quality protein structures are generally determined by X-ray crystallography and nuclear magnetic resonance (NMR) spectroscopy, which are time-consuming, costly and difficult, particularly for membrane proteins (8). In-silico 3D structure prediction by homology modelling is an emerging alternative (8). Knowledge of physicochemical properties also aids understanding of protein structure, and Expasy ProtParam is an important tool for computing them (8,9). Secondary structure can be predicted with the SOPMA server, which uses consensus multiple alignments; this method correctly predicts 69.5% of residues for a three-state description of secondary structure (alpha-helix, beta-sheet and coil) (10). MODELLER is well known for homology (comparative) modelling of protein tertiary structure (11,12). PROCHECK, the Qualitative Model Energy Analysis (QMEAN) server and the Protein Structure Analysis (ProSA) server are standard tools for validating modelled 3D structures (13,14).
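The 69.5% figure quoted for SOPMA is a three-state accuracy (Q3): the percentage of residues whose predicted state matches a reference assignment. A minimal sketch of that calculation follows, assuming the one-letter codes H (helix), E (strand) and C (coil); the function name and example strings are hypothetical and this is not SOPMA's own code.

```python
def q3_accuracy(predicted: str, observed: str) -> float:
    """Percentage of residues whose predicted three-state label (H, E or C)
    matches the observed label at the same position."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed strings must have equal length")
    if not observed:
        raise ValueError("empty sequence")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * matches / len(observed)


# Hypothetical 10-residue example in which 8 of 10 positions agree.
print(q3_accuracy("HHHHCCEEEC", "HHHCCCEEEE"))
```

Q3 counts every residue equally, so it says nothing about whether whole helices or strands are placed correctly; segment-level measures exist for that purpose.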
In the present study, seven cytochrome proteins were targeted for in-silico structure prediction with Expasy ProtParam, SOPMA, MODELLER and RaptorX, and the resulting 3D structures were validated with the WHAT IF, PROCHECK, QMEAN and ProSA servers.

Primary structure prediction
Primary structure analysis was carried out with the Expasy ProtParam web server (15), which computes physicochemical properties including the theoretical isoelectric point (pI), molecular weight, total numbers of positively (+R) and negatively (-R) charged residues, extinction coefficient (Ec), instability index (II), aliphatic index (AI) and grand average of hydropathy (GRAVY).
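Several ProtParam outputs are simple functions of amino-acid composition. The sketch below reproduces three of them from their published formulas: GRAVY as the mean Kyte-Doolittle hydropathy, the Ikai aliphatic index, and the extinction coefficient at 280 nm from Trp, Tyr and cystine counts. The function names and the example peptide are hypothetical; pI and the instability index are left out because they need larger lookup tables.

```python
# Kyte-Doolittle hydropathy values, the scale used for GRAVY.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(seq: str) -> float:
    """Grand average of hydropathy: mean Kyte-Doolittle value per residue."""
    seq = seq.upper()
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


def aliphatic_index(seq: str) -> float:
    """Ikai (1980): X(Ala) + 2.9 * X(Val) + 3.9 * (X(Ile) + X(Leu)),
    where X is the mole percent of each residue."""
    seq = seq.upper()

    def mole_percent(aa: str) -> float:
        return 100.0 * seq.count(aa) / len(seq)

    return (mole_percent("A") + 2.9 * mole_percent("V")
            + 3.9 * (mole_percent("I") + mole_percent("L")))


def extinction_coefficients(seq: str) -> tuple[int, int]:
    """Molar extinction coefficients at 280 nm (M^-1 cm^-1), returned as
    (all Cys paired as cystines, all Cys reduced)."""
    seq = seq.upper()
    base = 5500 * seq.count("W") + 1490 * seq.count("Y")
    return base + 125 * (seq.count("C") // 2), base


example = "MAVILWYCCK"  # hypothetical 10-residue peptide
print(round(gravy(example), 3), round(aliphatic_index(example), 1),
      extinction_coefficients(example))
```

A positive GRAVY, as found for all seven proteins here, indicates an overall hydrophobic sequence.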

Secondary structure prediction
The secondary structures of the targeted proteins were predicted with SOPMA (10) using the same parameters for all proteins: four conformational states (helix, sheet, turn and coil), a similarity threshold of 8 and a window width of 17.
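SOPMA assigns one conformational state to each residue, and the percentages reported per protein are the share of residues in each state. Assuming the lowercase one-letter codes h, e, t and c for the four states used here, the composition can be tallied as below; the function is an illustrative helper, not part of SOPMA, and the example string is hypothetical.

```python
from collections import Counter

# One-letter codes assumed for the four conformational states.
STATES = {"h": "alpha helix", "e": "extended strand",
          "t": "beta turn", "c": "random coil"}


def state_composition(prediction: str) -> dict[str, float]:
    """Percentage of residues in each state, from a per-residue state string."""
    symbols = prediction.lower()
    if not symbols:
        raise ValueError("empty prediction")
    unexpected = set(symbols) - set(STATES)
    if unexpected:
        raise ValueError(f"unexpected state symbols: {sorted(unexpected)}")
    counts = Counter(symbols)
    return {name: round(100.0 * counts[code] / len(symbols), 2)
            for code, name in STATES.items()}


# Hypothetical 10-residue prediction string.
print(state_composition("hhhhhhcctt"))
```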

Homology modelling
Each target protein sequence was used as a query for a blastp search (Basic Local Alignment Search Tool, BLAST) (16) against the PDB database (17) to identify homologous structures. Homology modelling was performed with MODELLER version 9.23 (11), followed by advanced modelling and loop refinement. Because blastp returned no hits for the cytochrome c biogenesis protein (ccsA), its 3D structure was modelled with the RaptorX web server (18). The 3D structures were visualised in BIOVIA Discovery Studio 2020.
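Template selection from blastp can be automated by parsing BLAST+ tabular output (`-outfmt 6`, whose twelve default columns are listed in the code). The sketch keeps the highest-scoring hit per query below an E-value cut-off. The 1e-5 threshold and the bit-score ranking are assumptions, since the study does not state its selection criteria, and the example rows and template IDs are invented.

```python
import csv
import io

# Default column order of BLAST+ tabular output (-outfmt 6).
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def best_templates(tabular: str, max_evalue: float = 1e-5) -> dict[str, dict]:
    """For each query, return the hit with the highest bit score among hits
    with E-value <= max_evalue. Queries without such a hit are omitted."""
    best: dict[str, dict] = {}
    for row in csv.reader(io.StringIO(tabular), delimiter="\t"):
        if not row or row[0].startswith("#"):  # skip blank and comment lines
            continue
        hit = dict(zip(FIELDS, row))
        for key in ("pident", "evalue", "bitscore"):
            hit[key] = float(hit[key])
        if hit["evalue"] > max_evalue:
            continue
        current = best.get(hit["qseqid"])
        if current is None or hit["bitscore"] > current["bitscore"]:
            best[hit["qseqid"]] = hit
    return best


# Invented rows: two candidate templates for petB, one weak hit for ccsA.
example = (
    "petB\ttmpl1_A\t85.0\t215\t32\t0\t1\t215\t1\t215\t1e-120\t400\n"
    "petB\ttmpl2_A\t60.0\t210\t84\t1\t3\t212\t2\t211\t1e-80\t280\n"
    "ccsA\ttmpl3_A\t22.0\t40\t31\t2\t100\t139\t5\t44\t2.5\t25\n"
)
for query, hit in best_templates(example).items():
    print(query, hit["sseqid"], hit["pident"], hit["evalue"])
```

In this invented example ccsA has no qualifying template, which is the situation in which the study switched to RaptorX.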

Model evaluation
The 3D structures were validated with online servers: the WHAT IF web server was used to obtain fixed PDB structures and the PROCHECK server (19) to generate Ramachandran plots. The QMEAN server (20) was used to compute a composite scoring function that estimates both local (per-residue) and global (whole-structure) errors. The ProSA tool was used to evaluate the overall quality of each model; its Z-score indicates the quality of a model relative to all experimental protein structures determined by X-ray crystallography and NMR (21). Models of sufficient quality were submitted to PMDB.
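A Ramachandran plot places each residue by its backbone torsion angles: phi, defined by atoms C(i-1), N, CA, C, and psi, defined by N, CA, C, N(i+1). The dependency-free sketch below computes a signed torsion angle from four atomic coordinates with a standard projection formula. It illustrates the geometry only and is not PROCHECK's own code; the helper names are hypothetical.

```python
import math

Vec = tuple[float, float, float]


def _sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0: Vec, p1: Vec, p2: Vec, p3: Vec) -> float:
    """Signed torsion angle in degrees (between -180 and 180) for four atoms."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = (b1[0] / norm, b1[1] / norm, b1[2] / norm)  # unit central bond
    # Project b0 and b2 onto the plane perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * c for c in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * c for c in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


# Toy coordinates: a cis (eclipsed) arrangement of four points.
print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (1, 0, 1)))
```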

Primary structure prediction
The ProtParam tool of the Expasy server was used to compute the physicochemical properties of the sequences. The molecular weights of the proteins ranged from 3169.83 to 37482.02 Da (Table 2), and their pI values from 4.60 to 10.74. psbE had the lowest GRAVY score (0.076), while petB, ccsA, petD, psbF, petG and petN had GRAVY scores of 0.503, 0.528, 0.605, 0.646, 1.114 and 1.545, respectively (Table 2). Other physicochemical properties (-R, +R, Ec, II and AI) of petB, petD, petN, petG and ccsA are listed in Table 2.
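The pI values above, and the acidic/basic and stable/unstable labels drawn from them later, rest on two rules: a protein is acidic if its pI is below 7 and basic if above, and it is classed as stable if its instability index is below 40. The sketch estimates pI by bisection, finding the pH at which the Henderson-Hasselbalch net charge is zero, and then applies both thresholds. The pKa values are an approximate illustrative set, not the scale ProtParam uses, so results will not match Table 2 exactly; all names are hypothetical.

```python
# Approximate pKa values (illustrative only; ProtParam uses a different scale).
PKA_POS = {"Nterm": 8.6, "K": 10.8, "R": 12.5, "H": 6.5}
PKA_NEG = {"Cterm": 3.6, "D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}


def net_charge(seq: str, ph: float) -> float:
    """Henderson-Hasselbalch net charge of a polypeptide at the given pH."""
    seq = seq.upper()
    counts = {aa: seq.count(aa) for aa in "KRHDECY"}
    counts["Nterm"] = counts["Cterm"] = 1
    pos = sum(counts[g] / (1 + 10 ** (ph - pka)) for g, pka in PKA_POS.items())
    neg = sum(counts[g] / (1 + 10 ** (pka - ph)) for g, pka in PKA_NEG.items())
    return pos - neg


def isoelectric_point(seq: str, tol: float = 1e-4) -> float:
    """pH of zero net charge, found by bisection on [0, 14]; net charge
    decreases monotonically with pH, so the root is unique."""
    lo, hi = 0.0, 14.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if net_charge(seq, mid) > 0:
            lo = mid  # still positively charged: pI lies at higher pH
        else:
            hi = mid
    return (lo + hi) / 2


def classify(pi: float, instability_index: float) -> tuple[str, str]:
    """Acidic/basic by pI relative to 7; stable if instability index < 40."""
    charge = "acidic" if pi < 7 else "basic" if pi > 7 else "neutral"
    stability = "stable" if instability_index < 40 else "unstable"
    return charge, stability


print(round(isoelectric_point("MAVILWYCCK"), 2))
print(classify(4.60, 35.0))  # hypothetical instability index
```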
SOPMA predicted random coil contents of 48.72%, 28.92%, 23.50%, 43.12%, 6.90%, 24.32% and 30.09% for the seven proteins (Table 3). Helices and coils predominated in the cytochrome proteins, which suggests compact, tightly packed structures and is consistent with the location of the modelled proteins in transmembrane regions. The full secondary structure predictions are listed in Table 3.
MODELLER 9.23 was used to build the 3D structures of the proteins from the blastp templates. Because no template was found for ccsA, the RaptorX online server was used to model it. The modelled 3D structures are shown in Fig. 1. To check the stereochemistry of the 3D structures, Ramachandran plots were generated with the PROCHECK server, which provides a convenient view of the distribution of torsion angles in a protein structure. The QMEAN and ProSA servers were used to estimate the quality of the predicted models. The Ramachandran plot for psbF showed that, of its 39 residues, 33 were non-glycine, non-proline residues; of these, 31 (93.9%) were in the most favoured regions, 2 (6.1%) in additionally allowed regions, 0 (0.0%) in generously allowed regions and 0 (0.0%) in disallowed regions (Fig. 2A, Table 4). The remaining residues comprised 2 end residues, 2 glycine residues and 2 proline residues. For psbE, 69 (98.6%) residues were in the most favoured regions, 1 (1.4%) in additionally allowed regions, 0 (0.0%) in generously allowed regions and 0 (0.0%) in disallowed regions (Fig. 2B, Table 4). For petB, 176 (94.1%) residues were in the most favoured regions, 11 (5.9%) in additionally allowed regions, 0 (0.0%) in generously allowed regions and 0 (0.0%) in disallowed regions (Fig. 2C, Table 4). For petD, 123 (95.3%) residues were in the most favoured regions, 5 (3.9%) in additionally allowed regions, 1 (0.8%) in generously allowed regions and 0 (0.0%) in disallowed regions (Fig. 2D, Table 4). For petN, 24 (96.0%) residues were in the most favoured regions, 1 (4.0%) in additionally allowed regions, 0 (0.0%) in generously allowed regions and 0 (0.0%) in disallowed regions (Fig. 2E, Table 4).
For petG, 28 (93.3%) residues were in the most favoured regions, 2 (6.7%) in additionally allowed regions, 0 (0.0%) in generously allowed regions and 0 (0.0%) in disallowed regions (Fig. 2F, Table 4). For ccsA, 285 (94.1%) residues were in the most favoured regions, 17 (5.6%) in additionally allowed regions, 1 (0.3%) in generously allowed regions and 0 (0.0%) in disallowed regions (Fig. 2G, Table 4).
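Each Ramachandran percentage reported here is a region's share of the non-glycine, non-proline residues, so the counts and percentages can be checked against each other. A small sketch using the petD counts from this section; the helper is hypothetical, and rounding to one decimal place mirrors the reported values.

```python
def region_percentages(counts: dict[str, int]) -> dict[str, float]:
    """Percentage of non-glycine, non-proline residues in each Ramachandran
    region, rounded to one decimal place."""
    total = sum(counts.values())
    return {region: round(100.0 * n / total, 1) for region, n in counts.items()}


# Region counts reported for petD (Table 4).
petd = {"most favoured": 123, "additionally allowed": 5,
        "generously allowed": 1, "disallowed": 0}
print(region_percentages(petd))
```

Applying the same calculation to the 33 non-glycine, non-proline residues of psbF (31 and 2) gives 93.9% and 6.1%, in line with the reported figures.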
The QMEAN web server was used to assess the quality of the psbF, psbE, petB, petD, petN, petG and ccsA models on the basis of the QMEAN4 global score and Z-score. QMEAN Z-scores around zero indicate good agreement with experimental structures of similar size, whereas scores of -4.0 or below indicate low-quality models (20). The ProSA web server was used to evaluate the overall quality of each model; its Z-score compares each model with all structures determined by X-ray crystallography and NMR. The QMEAN4 values of psbF, psbE, petB, petD, petN, petG and ccsA were -2.04, -1.20, -3.01, -1.57, -2.11, -1.74 and -12.87, respectively: the six template-based models lie between -3.01 and -1.20, whereas the RaptorX model of ccsA scored markedly lower. Fig. 3 shows the QMEAN4 values of the proteins.
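QMEAN and ProSA both express model quality as a Z-score: the number of standard deviations by which a model's raw score departs from the mean score of reference experimental structures. The sketch shows only the standardisation step and the -4.0 low-quality cut-off documented for QMEAN Z-scores. It assumes the QMEAN4 values reported in this study are on the Z-score scale, as their negative values suggest; the function names are hypothetical, and real reference means and standard deviations come from each server's reference set.

```python
def z_score(raw: float, ref_mean: float, ref_sd: float) -> float:
    """Standardise a raw quality score against a reference distribution."""
    if ref_sd <= 0:
        raise ValueError("reference standard deviation must be positive")
    return (raw - ref_mean) / ref_sd


def qmean_flag(z: float, low_quality_cutoff: float = -4.0) -> str:
    """Label a QMEAN Z-score relative to the low-quality cut-off."""
    return "low quality" if z <= low_quality_cutoff else "above cut-off"


# QMEAN4 values reported in this study.
reported = {"psbF": -2.04, "psbE": -1.20, "petB": -3.01, "petD": -1.57,
            "petN": -2.11, "petG": -1.74, "ccsA": -12.87}
for protein, z in reported.items():
    print(protein, z, qmean_flag(z))
```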

Discussion
The medicinal properties of C. zedoaria derive mainly from its secondary metabolites, which are synthesized through several pathways. Cytochrome complex proteins play a vital role in producing these secondary metabolites, and they also have important roles in ATP formation, cyclic electron flow and the biogenesis of c-type cytochromes in this plant. Apart from a few proteins, such as that encoded by the CzR1 gene (22,23), most available C. zedoaria protein sequences have not yet been analysed structurally in silico. Seven such proteins, psbF, psbE, petB, petD, petN, petG and ccsA, whose structures are not available in structural databases, were targeted in the current study. Because the function of a protein depends mainly on its structure, predicting the structure is an important step before further validation. In this study, the structures of the seven cytochrome proteins were modelled in silico from sequences retrieved from the UniProt database. ProtParam showed that four proteins (psbE, petD, petN and petG) have a pI below 7, indicating an acidic nature, whereas the other three (psbF, petB and ccsA) have a pI above 7, indicating a basic nature (Table 2). At their pI, proteins carry no net charge and are stable and compact (24). Five proteins (psbF, petB, petN, petG and ccsA) had an II below 40, which classifies them as stable. The GRAVY values of the C. zedoaria proteins ranged from 0.076 to 1.545; a low GRAVY value indicates a greater likelihood of interaction with water molecules (25), and psbE had the lowest value. The SOPMA server was then used to predict the secondary structures of the selected proteins, and alpha helices and extended strands dominated over the other conformational states.
This suggests that the proteins are highly stable, because these two elements are held together by hydrogen bonds that confer rigidity on protein structures (26). The best templates were then selected from the blastp results for building the 3D structures. In-silico modelling was performed with the advanced modelling protocol of MODELLER 9.23, and the best structures were selected on the basis of their DOPE scores. PROCHECK was used for structural validation and Ramachandran plot analysis. No residue of any modelled protein fell in a disallowed region of the Ramachandran plot, which indicates good model quality (27). The QMEAN server was then used to obtain QMEAN4 scores; in the QMEAN4 plot, a red star marks the position of each modelled protein relative to experimental structures of similar size, and scores nearer to zero indicate higher model quality. In the local quality plot, the residues of the model are shown along the x-axis and their expected similarity to the native structure on the y-axis; residues scoring above 0.6 are generally considered to be modelled with high quality (20). Finally, ProSA compared the models with native structures to confirm their overall quality. All ProSA Z-scores of the modelled proteins were within the range observed for native conformations. The black dot in the ProSA plots (Fig. 4) shows the position of each modelled protein relative to all reported protein structures determined by X-ray crystallography and NMR (28). The ccsA and petB models fell in the X-ray region (light blue region of Fig. 4), whereas the psbF, psbE, petD, petN and petG models fell in the NMR region (deep blue region of Fig. 4).
These results indicate that the predicted protein models are of generally good overall quality and could be helpful for further characterisation.
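Two selection rules used in this discussion can be stated directly in code: keep the model with the lowest DOPE score, and count the residues whose QMEAN local score exceeds 0.6. The record layout below is an assumption modelled on the per-model dictionaries that MODELLER's automodel collects in its outputs list (keys such as 'name', 'failure' and, when DOPE assessment is requested, 'DOPE score'). The model names, scores and local scores are invented for illustration.

```python
def best_model_by_dope(outputs: list[dict]) -> dict:
    """Return the successfully built model with the lowest (most negative)
    DOPE score; failed models are ignored."""
    ok = [m for m in outputs if m.get("failure") is None]
    if not ok:
        raise ValueError("no successfully built models")
    return min(ok, key=lambda m: m["DOPE score"])


def fraction_high_quality(local_scores: list[float], threshold: float = 0.6) -> float:
    """Fraction of residues whose local quality score exceeds the threshold."""
    if not local_scores:
        raise ValueError("no local scores given")
    return sum(s > threshold for s in local_scores) / len(local_scores)


# Invented model records in the assumed layout.
models = [
    {"name": "petB.B99990001.pdb", "failure": None, "DOPE score": -21000.5},
    {"name": "petB.B99990002.pdb", "failure": None, "DOPE score": -21450.2},
    {"name": "petB.B99990003.pdb", "failure": "optimisation error"},
]
print(best_model_by_dope(models)["name"])
print(fraction_high_quality([0.8, 0.7, 0.5, 0.65]))
```

DOPE is a statistical potential, so lower values are better only when comparing models of the same sequence; they are not comparable across different proteins.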

Conclusion
The current study focused on the cytochrome proteins of C. zedoaria (Christm.) Roscoe. Using various tools and servers, the 3D structures of seven cytochrome complex proteins were predicted and validated. These 3D structures may help further investigations, such as determining the crystal structures of these hypothetical proteins, their structural motifs and physicochemical properties, protein-protein interactions of various cytochrome proteins, and their functions and mechanisms of action. C. zedoaria is a highly endangered plant with immense medicinal value and needs to be studied extensively for its future improvement.