Metagenomic insights into plant growth promoting genes inherent in bacterial endophytes of Emilia sonchifolia (Linn.) DC.

Studies on the genomes of endophytes reveal the metabolic potential of the endophytic microbiome, including both its culturable and unculturable fractions. Metagenome analysis on the Illumina HiSeq platform gives access to the genetic information encoding the molecular machinery that takes part in the plant growth promotion activity of endophytes, including the production of plant growth hormones and the enhancement of nutrient availability for the host plant. The present work was undertaken to identify the genes involved in plant growth promotion activities in the endophytes of Emilia sonchifolia (Linn.) DC. through metagenome analysis. Metagenomic studies include the analysis of functional annotations, which aids in the detection of biocatalysts taking part in the metabolic pathways of host plants. Annotation of the predicted genes against databases such as NCBI Nr, KEGG, eggNOG and CAZy yielded a vast array of information on the genetic diversity of the endophytic microbiome. The metagenome analysis of endophytic bacteria from the medicinal plant E. sonchifolia unveiled characteristic functional genes involved in plant growth promotion, such as genes for nitrogen metabolism (nif) and siderophore production (enterobactin category), ipdC and tnaA (IAA production), ACC deaminase coding genes (regulation of elevated ethylene levels in host tissues), and genes for Mo-nitrogenase, nitrous-oxide reductase (nosZ), nitrate reductase (narG, napA), nitrite reductase (nirD) (nutrient assimilation and absorption), enterobactin siderophore synthetase components F and D, and acid phosphatase. These findings help explain the effective plant-microbe relationship and the role of bacterial endophytes in regulating the growth of host plants.


Introduction
Endophytes are non-pathogenic microorganisms that live inside plant tissues and are found in many plants. Many researchers have screened the endophytic microbes associated with medicinal plants to analyse the active and passive roles of these microorganisms in the synthesis of host-derived bioactive compounds (1,2). Multilevel interactions such as plant-microbe, microbe-microbe and microbe-environment interactions add to the complexity of endophytic microbiome analysis (3). The identification of closely related endophytic microbes and their functional genes is difficult because of their genome plasticity (4). [...] analysis further widened our understanding of microbial genetic diversity associated with such plants. Metagenome analysis of microbial communities explains the community composition, the development of plant-microbe interactions, and the physiological and biosynthetic potential of the association (5). Metagenome data mining opens new avenues for DNA- and gene-level information, followed by the identification of genes and their expression patterns. Genomic analyses provide information on many unanswered problems in endophytism, such as the reason for the coexistence of different microbes, the extent of plant-microbe interaction and plant-microbe symbiotic coevolution (6). The construction of metagenomic libraries followed by phylogenetic analysis can explicate the diversity of microbial communities associated with different plants.
High-throughput sequencing techniques such as Illumina have pushed back the barriers in genomic studies and provided a reliable new platform for researchers to screen the hidden world of microbes living in the microhabitats inside plant tissues. Illumina HiSeq technology generates a large quantity of genomic data, from which relatively high-quality data on protein-coding genes are obtained. Comparing these sequence data with known databases can identify the functional roles of the predicted genes. The genomic information thus generated will help to identify the metabolic capacities of microbes and to address many of the unexplored areas of plant-microbe interactions.
To maintain a stable symbiotic relationship, endophytic bacteria enhance the production of some growth regulators for the host. In this way, they activate the host metabolic machinery by producing compounds that sustain their endophytic mode of life (7). Endophytes produce plant growth regulators such as indole acetic acid (IAA) (8) and the enzyme ACC deaminase (9), and enhance the availability of nutrients such as phosphorus (10), nitrogen (11) and iron (12). Indirect plant growth promotion occurs through the ability of endophytes to prevent the growth of pathogenic microorganisms (13,14) and to help the plant tolerate different stress conditions (15,16). Endophytic association enhanced the expression of selected host genes (17), and the alterations in metabolism were beneficial to both partners. Hence, these associations are considered symbiotic rather than pathogenic.
Most endophytic bacteria reported were from the group Proteobacteria, which are soil bacteria (18,19). Microbial ecology and diversity studies on the Illumina MiSeq/HiSeq platforms reduce errors and increase the likelihood of finding rare and beneficial endophytes with higher phylogenetic resolution at a comparatively lower cost (20,21,22). Because of its effective therapeutic applications in traditional medicine, E. sonchifolia has been screened thoroughly for its phytochemical components (23,24). High-throughput screening can therefore reveal the role of endophytes in the biosynthetic potential of the host. During the biodiversity analysis, two phyla, Proteobacteria and Firmicutes, were identified from E. sonchifolia (25). The present investigation of endophytic genes and their functional roles unveiled the role of the endophytes in the plant growth promotion activity of this beneficial medicinal plant.

DNA extraction and library construction
DNA was extracted from surface-sterilized Emilia sonchifolia (Linn.) DC. It was quantified (Qubit® 4.0 fluorometer, Invitrogen, Carlsbad, CA, USA) and fragmented randomly by sonication (Covaris 220). The adaptors were indexed, which allowed easy identification of reads on the Illumina platform. The fragments were amplified by PCR for 8 cycles using P5 (5' AATGATACGGCGACCACCGAGATCTACAC 3') and P7 (5' CAAGCAGAAGACGGCATACGAGAT 3') primers. These are universal primers; both carry specific sequences that anneal to the flow cell for bridge PCR, and the P7 primer carried a six-base index for multiplexing. VAHTS™ DNA Clean Beads were used to clean the PCR-amplified products, which were quantified with the Qubit® 4.0 fluorometer (Invitrogen, Carlsbad, CA, USA). The next-generation sequencing libraries were prepared following the manufacturer's protocol (VAHTS Universal DNA library preparation kit for Illumina).

Illumina HiSeq sequencing
The indexed libraries were loaded on the Illumina HiSeq instrument, and the different indices were multiplexed and sequenced according to the manufacturer's instructions (Illumina, San Diego, CA, USA). Sequencing was carried out in a 2 x 150 paired-end (PE) configuration; image analysis and base calling were conducted by the HiSeq Control Software (HCS), OLB and GA Pipeline-1.6 (Illumina) on the HiSeq instrument. Bcl2fastq (v2.17.1.14) processed the original image data for base calling and quality analysis, and the output was saved in FASTQ format.

Processing and assembly of data
The Phred quality score was calculated from the ASCII-encoded quality values, and data with a quality score below 20 (Q20) were discarded. The pass-filtered data, saved as FASTQ files (one for read 1 and the other for read 2), were selected to create the paired-end data. GC content was also calculated to reduce AT-GC sequencing bias. The next-generation data quality software Cutadapt (v1.9.1) was used to trim adaptors and to eliminate low-quality and N-rich reads. Primers and reads shorter than 75 bp were also removed in this filtering step. Possible contamination of the reads with host sequences was removed with BWA (v0.7.12), which filtered out host sequences based on the host genome.
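The quality criteria described above can be illustrated with a minimal Python sketch. It is not the Cutadapt implementation used in this study; it assumes the Phred+33 ASCII encoding, applies the Q20 cutoff to the mean read quality, and uses a hypothetical N-content limit (max_n_fraction) that is not stated in the text. Only the Q20 and 75 bp thresholds come from the protocol above.

```python
from statistics import mean

PHRED_OFFSET = 33  # assumption: Phred+33 (Sanger / Illumina 1.8+) encoding


def phred_scores(quality_string, offset=PHRED_OFFSET):
    """Decode a FASTQ quality string into integer Phred scores."""
    return [ord(ch) - offset for ch in quality_string]


def gc_content(sequence):
    """Fraction of G and C among the called (non-N) bases of a read."""
    called = [base for base in sequence.upper() if base in "ACGT"]
    if not called:
        return 0.0
    return sum(base in "GC" for base in called) / len(called)


def passes_filter(sequence, quality_string,
                  min_mean_q=20, min_length=75, max_n_fraction=0.1):
    """Return True if a read survives the length, N-content and quality checks.

    min_mean_q and min_length follow the Q20 and 75 bp thresholds in the text;
    max_n_fraction is a hypothetical value chosen only for illustration.
    """
    if len(sequence) < min_length:
        return False
    if sequence.upper().count("N") / len(sequence) > max_n_fraction:
        return False
    return mean(phred_scores(quality_string)) >= min_mean_q
```

For example, an 80 bp read whose quality string consists only of the character "I" (Q40) passes this filter, whereas the same read with the quality character "#" (Q2) is discarded.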

Metagenome assembly and gene prediction
The clean data were used for whole-genome de novo assembly with MEGAHIT (v1.1.3) using different k-mer sizes (39, 59, 79, 119). After assembly, the scaffold set with the largest N50 was selected for gene prediction. Coding genes were predicted with Prodigal (v3.02), which is suited to gene prediction in metagenomic data or unknown microorganisms. Sequence clustering was done with CD-HIT (v4.5.6) to reduce the redundancy of the predicted gene sequences; unique gene sequences were clustered at 95% identity and 90% coverage by default. Pre-processed reads were then aligned to the non-redundant gene set with SOAPaligner (v2.21) to obtain the gene abundance, or read coverage, of the genes. The number of aligned reads was normalised by gene length and used as the measure of gene abundance.
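The selection of the assembly with the largest N50 can be sketched as follows. This is an illustrative calculation only; the function names are hypothetical, and the lengths passed in would come from the MEGAHIT output for each k-mer size. N50 is taken here in its usual sense: the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the total assembly length.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("no lengths supplied")


def best_assembly(assemblies):
    """Return the key (e.g. k-mer size) whose assembly has the largest N50.

    `assemblies` maps a k-mer size to the list of sequence lengths
    obtained with that k-mer size.
    """
    return max(assemblies, key=lambda k: n50(assemblies[k]))
```

With toy lengths such as {39: [100, 50, 50], 59: [150, 30, 20]}, best_assembly selects the k = 59 assembly, whose N50 (150) is larger than that of the k = 39 assembly (100).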

Gene functional annotation
Gene functional annotations were predicted by aligning the predicted genes against several databases: the Nr database (non-redundant protein database), the KEGG pathway database (Kyoto Encyclopedia of Genes and Genomes), eggNOG (evolutionary genealogy of genes: Non-supervised Orthologous Groups, version 4.0) and the CAZy database (Carbohydrate-Active enZYmes Database). Diamond (version 0.8.15.77) and BLAST (version 2.2.31+) were used for the database search and alignment. The gene annotations obtained from each database were used to determine the relative abundance of the different functional categories.
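One way to summarise annotations as relative abundances of functional categories is sketched below. The exact procedure used in this study is not described, so this is an assumption: the abundance of every gene is added to its annotated category, genes without an annotation are pooled under a hypothetical "unannotated" label, and the totals are scaled to fractions.

```python
from collections import defaultdict


def category_relative_abundance(gene_abundance, gene_category):
    """Sum gene abundances per functional category and scale to fractions.

    gene_abundance: dict mapping gene ID -> abundance value
    gene_category:  dict mapping gene ID -> functional category name
    """
    totals = defaultdict(float)
    for gene, abundance in gene_abundance.items():
        # Genes missing from the annotation table are pooled together.
        category = gene_category.get(gene, "unannotated")
        totals[category] += abundance
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {cat: value / grand_total for cat, value in totals.items()}
```

The same function can be applied to KEGG, eggNOG or CAZy category assignments, since it only needs a gene-to-category mapping.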

Results
The original data were analysed using Bcl2fastq (v2.17.1.14), which checked the base quality of the first 25 bases in each read and handled the conversion of the data to FASTQ format. Pass-filtered data without errors and with a Phred score higher than 20 (Q20) were kept. The per-base quality score and the distribution of mean read quality were calculated.

Metagenome Assembly and Gene Prediction
We assembled the clean, quality-optimized data into scaffolds and obtained the detailed assembly results. After assembly of the metagenome data, 92250 reads were generated with an average length of 1432.89 bp. Prodigal (v3.02), which is designed for gene prediction in metagenomic data or data from unknown organisms, analysed the assembled data. Sequence clustering generated unique sequences, and the unigene statistics were analysed. The average length of the annotated read was 672, and it created 82bp and 161694 sequence reads. SOAPaligner (v2.21) detected the coding regions of the assembled scaffolds. The count of each unigene was recorded and used to calculate the unigene abundance. The unigene with sequence ID K141-73893-1 gave the maximum count of 69. The sequence data were submitted to the NCBI BioSample database (SRA) with accession numbers SAMN11617377 and SAMN11616726.
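The text states that unigene abundance was derived from read counts normalised by gene length but does not give the formula. A common formulation, shown here only as an assumed sketch, divides each read count by the gene length and then scales the values so that they sum to one. The function name and input dictionaries are hypothetical.

```python
def length_normalised_abundance(read_counts, gene_lengths):
    """Relative abundance of each gene from read counts and gene lengths.

    read_counts:  dict mapping gene ID -> number of aligned reads
    gene_lengths: dict mapping gene ID -> gene length in bp
    """
    # Reads per base corrects for longer genes attracting more reads.
    per_base = {gene: read_counts[gene] / gene_lengths[gene]
                for gene in read_counts}
    total = sum(per_base.values())
    if total == 0:
        return {gene: 0.0 for gene in per_base}
    return {gene: value / total for gene, value in per_base.items()}
```

Under this scheme two genes with the same read count but lengths of 1000 bp and 500 bp receive relative abundances of one third and two thirds respectively.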

Gene functional annotation
The protein sequences of the predicted genes were compared with protein databases to obtain the gene functional annotations. In the search against the NCBI Nr (non-redundant) database, the protein sequences of the predicted genes showed significant similarity to reference genes. The alignment cutoff was set at over 60% similarity between the predicted gene and the reference, with an e-value threshold of 1e-5. This database search revealed 148289 annotated sequences with sequence similarity. The Nr annotation lists each protein and the species to which it belongs (Table 1).
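The cutoffs above can be applied to a tabular alignment file as in the following sketch. It assumes the standard 12-column tabular output of BLAST+ (-outfmt 6), which DIAMOND also produces, and it interprets the 60% criterion as a minimum percent identity; that interpretation is an assumption, since the text does not specify whether identity or alignment coverage is meant. The example rows are invented for illustration.

```python
import csv
import io

# Standard 12-column tabular layout of BLAST+ (-outfmt 6) and DIAMOND.
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def best_hits(handle, max_evalue=1e-5, min_identity=60.0):
    """Keep, per query gene, the highest-bitscore hit passing both cutoffs."""
    best = {}
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank and comment lines
        hit = dict(zip(COLUMNS, row))
        if float(hit["evalue"]) > max_evalue:
            continue
        if float(hit["pident"]) < min_identity:
            continue
        current = best.get(hit["qseqid"])
        if current is None or float(hit["bitscore"]) > float(current["bitscore"]):
            best[hit["qseqid"]] = hit
    return best


# Invented example: gene2 fails the identity cutoff, gene3 the e-value cutoff.
example = (
    "gene1\tprotA\t85.0\t200\t30\t0\t1\t200\t1\t200\t1e-50\t350\n"
    "gene1\tprotB\t70.0\t200\t60\t0\t1\t200\t1\t200\t1e-20\t210\n"
    "gene2\tprotC\t45.0\t150\t82\t1\t1\t150\t5\t154\t1e-08\t60\n"
    "gene3\tprotD\t90.0\t50\t5\t0\t1\t50\t1\t50\t0.002\t40\n"
)
hits = best_hits(io.StringIO(example))
```

Keeping a single best hit per gene is a design choice of this sketch; an annotation pipeline may instead retain several hits per gene.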
In the present data analysis, the functional potential of the endophytic microbiome was analysed using the KEGG database. From this, we identified 250 pathways and divided them into six major categories, viz. cellular processes, environmental information processing, genetic information processing, human diseases, metabolism and organismal systems. Each category was further subdivided and analysed. The summary of annotated genes and the major functional annotations of genes according to the KEGG database are shown in Table 2.
The functional annotation based on orthologous groups was done against the eggNOG (v4.5) database. The gene annotations covered 25 functional categories, such as energy production and conversion, general function, cell motility and function unknown. CAZy is the database used to analyse carbohydrate-active enzymes. These comprise six major functional classes: glycoside hydrolases (GHs), glycosyl transferases (GTs), polysaccharide lyases (PLs), carbohydrate esterases (CEs), auxiliary activities (AAs) and carbohydrate-binding modules (CBMs). The CAZy annotations clarified the microbial carbohydrate metabolism of the endophytes, and some of the major genes of carbohydrate metabolism were recognized in the annotation.
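The grouping of CAZy annotations into the six classes listed above can be sketched as follows. The sketch assumes that each annotation is a CAZy family identifier made of a class prefix and a family number (for example GH13 or CBM50); the function name and the example identifiers are illustrative only.

```python
import re
from collections import Counter

# Class prefixes and names as listed in the text.
CAZY_CLASSES = {
    "GH": "glycoside hydrolases",
    "GT": "glycosyl transferases",
    "PL": "polysaccharide lyases",
    "CE": "carbohydrate esterases",
    "AA": "auxiliary activities",
    "CBM": "carbohydrate-binding modules",
}

# A family identifier is a class prefix followed by a family number.
FAMILY_PATTERN = re.compile(r"^(GH|GT|PL|CE|AA|CBM)(\d+)")


def cazy_class_counts(family_ids):
    """Count annotated genes per CAZy class, ignoring unrecognised labels."""
    counts = Counter()
    for family in family_ids:
        match = FAMILY_PATTERN.match(family.strip())
        if match:
            counts[CAZY_CLASSES[match.group(1)]] += 1
    return counts
```

For instance, the identifiers GH13, GH5_2, GT2 and CBM50 would be tallied as two glycoside hydrolases, one glycosyl transferase and one carbohydrate-binding module.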

Role of endophyte in plant growth promotion
Metagenome analysis resolved the plant growth promotion activity of the endophytic bacteria because it disclosed many protein-coding genes and transport systems that enhance the growth of the host plant. Plant growth enhancement occurs either through the production of plant growth hormones or through endophytes augmenting nutrient uptake and utilization. The endophytic metagenome carries genes such as ipdC, tnaA, ytrE, acuA, acuB, acuC, trpC and trpF, whose products take part in the production or degradation of phytohormones such as indole-3-acetic acid and ethylene. The presence of genes of the zeatin biosynthesis pathway indicates the role of endophytes in the production of plant growth regulators. Activation of plant defence generates the plant hormone salicylic acid, and a salicylate hydroxylase annotation was found in the metagenomic data. Annotated genes were also identified in phenylpropanoid biosynthesis, terpenoid backbone biosynthesis and inositol phosphate metabolism, which lead to the production of plant growth regulators. Annotated genes such as speA, speB and speE encode enzymes such as arginine decarboxylase and agmatinase, which increase the fitness of the plant cell and promote the growth of both endophyte and host (Table 2).
Table 2. Annotated genes and their major functional annotations involved in plant growth promotion based on KEGG database.
Nutrients such as phosphorus, nitrogen and iron often act as limiting factors in plant growth. The endophytic metagenome data contain genes such as pstS, pstB, pstA, pstC and yjbB, which take an active part in phosphate transport. Phosphonate and phosphinate metabolism (Fig. 1), which is involved in phosphate mobilization, was noticed among the annotated pathways. Based on the endophytic metagenome data, the phosphonate metabolic pathway contains many genes that enhance phosphate availability (Table 3). Nitrogen metabolism-related genes are commonly found in root nodule-forming bacteria. In the present analysis, important genes and gene products taking part in nitrogen metabolism were identified. Enzymes such as nitrous-oxide reductase (nosZ), nitrite reductase (large (nirB) and small (nirD) subunits), nitrate reductase (alpha (narG, narZ) and gamma (narI, narV) subunits), nitronate monooxygenase and nitrilase were identified from the annotations. Proteins involved in nitrogen fixation (nifU), a nitrate/nitrite transporter system and regulatory or sensor systems for the transport and utilization of nitrogen were also found in the metagenome data (Table 4). Glutamine is involved in nitrogen metabolism, and a metabolic pathway for glutamine and glutamate with gene annotations was recognized (Fig. 2).
Table 4. Annotated genes and their functional role in nitrogen metabolism based on KEGG database.
In microbes, iron is sequestered with the help of iron chelators such as siderophores. Plants obtain this iron for their metabolic activities. In the metagenome analysis, a pathway for the synthesis of the siderophore group of non-ribosomal peptides was recognised. Enterobactin synthetase components F and D (entF, entD), which synthesize siderophores of the enterobactin category, and catecholate siderophore receptors (Fiu) were identified (Table 5). The iron complex transport system of ABC transporters showed the iron uptake capacity of the endophytes. Many genes involved in the biosynthesis of the siderophore group of non-ribosomal peptides (Fig. 3) were also observed in this study.
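Screening an annotation list for the plant growth promotion markers reported in this section can be sketched as a simple set lookup. The trait groupings and the function name below are illustrative choices rather than part of the analysis pipeline; the gene symbols are those named in the Results above.

```python
# Marker genes named in the Results, grouped by trait for illustration.
PGP_MARKERS = {
    "IAA production": {"ipdC", "tnaA"},
    "phosphate transport": {"pstS", "pstA", "pstB", "pstC"},
    "nitrogen metabolism": {"nifU", "nosZ", "nirB", "nirD",
                            "narG", "narZ", "narI", "narV"},
    "siderophore (enterobactin)": {"entF", "entD"},
}


def screen_traits(annotated_symbols, markers=PGP_MARKERS):
    """Map each trait to the sorted marker genes found in the annotations."""
    present = set(annotated_symbols)
    found = {}
    for trait, genes in markers.items():
        hits = sorted(genes & present)
        if hits:
            found[trait] = hits
    return found
```

Given the gene symbols ipdC, entF, entD and recA, the function reports IAA production (ipdC) and enterobactin siderophore synthesis (entD, entF) and ignores recA, which is not among the markers.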

Discussion
The plant growth promotion activities of endophytic microbes are being studied for practical application. They can be incorporated into the agricultural sector to enhance the growth of economically valuable crops. Plant growth promotion activities have mainly been studied in endophytes from cultivated crops such as rice, wheat and sugarcane. Plant growth-promoting activity and the increase in productivity and biomass of rice plants were analysed after inoculation with Azospirillum sp. B510 (26,27). Endophytes increase the growth of their host by various mechanisms. They can take part in the production of different classes of plant growth hormones (28,29,30), or they enhance plant growth by increasing the availability of nutrients or enhancing their uptake (31).

Table 5. Annotated genes and their functional role in iron acquisition based on KEGG database.
Irr: Fur family transcriptional regulator, iron response regulator
sitA: manganese/iron transport system substrate-binding protein
sitB: manganese/iron transport system ATP-binding protein
sitC, D: manganese/iron transport system permease protein
ko02010 ABC transporters; efeO: iron uptake system component EfeO
The best-known phytohormone produced by endophytes is IAA, which is synthesised from tryptophan through indole pyruvate (32). The presence of two notable enzymes, indole pyruvate decarboxylase (ipdC) and tryptophanase (tnaA), indicated the IAA production capacity of the endophytic microbiome. Along with this, the enzyme salicylate hydroxylase pointed to the possible conversion of tryptophan to IAA and salicylic acid (SA). Nitrilase (EC 3.5.5.1) is an enzyme reported to be involved in the biosynthesis of IAA from indole-3-acetonitrile, which is a tryptophan-independent pathway (33). The different enzymes and pathways for IAA biosynthesis explain the role of IAA as a signalling molecule in addition to plant growth promotion (34). Acetoin and 2,3-butanediol were reported to promote plant growth in Arabidopsis, and their biosynthesis was recognized in rhizobacteria (35). A small fraction of acetoin is produced by the activity of the poxB gene during pyruvate metabolism (36). The endophytic metagenome carries the gene for pyruvate dehydrogenase (poxB), acetoin utilization proteins C, B and A (acuC, acuB, acuA) and ATP-binding proteins (ytrE, ytrF) involved in the acetoin utilization transport system. Many endophytes produce cytokinins (37). In this study, enzymes of the zeatin biosynthetic pathway, such as tRNA dimethylallyltransferase (miaA), were detected.
Most studies of endophytic bacteria have described the enzyme 1-aminocyclopropane-1-carboxylate (ACC) deaminase (9,38,39), which is produced under stress conditions to manage elevated levels of ethylene. It cleaves ACC, the precursor of ethylene, to α-ketobutyrate and ammonium (40) and thereby reduces the conversion of ACC to ethylene, which affects root growth. ACC deaminase was also identified in this study. Studies using the endophytic plant growth-promoting bacterium Burkholderia phytofirmans PsJN showed that a mutation in the ACC deaminase gene abolishes the ability of the endophytic bacterium to promote root elongation (41).
Spermidine is a plant growth-promoting substance reported from the rhizobacterium Bacillus subtilis OkB105 (42). Three genes, speE (spermidine synthase), speB (agmatinase) and speA (arginine decarboxylase), which are involved in the synthesis of polyamines such as spermidine, spermine and putrescine, were also detected in the present study. Terpenoids play an important role in plants in the assembly of photosynthetic reaction centres, in stress tolerance (43) and in defence mechanisms (44). Terpenoids comprise a vast family of natural products. Different pathways that synthesise terpenoids, such as the mevalonate and isoprenoid pathways, have been identified. Various microorganisms have been screened for the large-scale production of terpenoids. Microbes such as yeast, E. coli (45) and Streptomyces (46,47) have been used and have proved promising for the large-scale production of terpenoids. In this study, the metagenome analysis revealed that the endophytic genome carries genes of the mevalonate pathway of terpenoid biosynthesis, suggesting that the endophytes interact with the host plant to promote its growth.
Nitrogen and phosphorus are important nutrients for plant growth. Because phosphorus is unavailable to plants in its insoluble form, it acts as a limiting factor for plant growth (48,49). Endophytic bacteria can solubilise poorly soluble inorganic phosphate and thereby enhance plant growth. The plant growth promotion activity of phosphate-solubilising bacteria such as Pseudomonas (50) and Enterobacter (51) has been reported. Endophytic bacteria can solubilise both organic and inorganic phosphate, and the genomes of different plant growth-promoting endophytic bacteria contain genes for the specific enzymes and regulatory systems taking part in this process (52). Phytases are thermally stable enzymes involved in the solubilisation of organic phosphate. The storage form of phosphate in plants is phytate (myo-inositol 1,2,3,4,5,6-hexakisphosphate), and phytases remove phosphate groups from phytate. This also prevents phytate from chelating mineral nutrients, making other mineral nutrients available as well (10,53). The appA gene, which codes for acid phosphatase, was identified and isolated from E. coli (54). The solubilisation of inorganic or mineral phosphate in bacteria was found to be associated with the synthesis of organic acids and the direct oxidation of glucose to gluconic acid (GA) (55,56). The enzyme taking part in this oxidation reaction is glucose dehydrogenase, with pyrroloquinoline quinone (PQQ) as a cofactor (pqqABCDE). Metagenome analysis of endophytic bacteria from this medicinal plant shows endophytes with great potential to solubilise both organic and inorganic phosphate. The genome contains appA genes for 4-phytase/acid phosphatase and genes for 3-phytase, which take part in the solubilisation of organic phosphate.
The complete operon pqqABCDE for the cofactor pyrroloquinoline quinone (PQQ) and the gene gcd for glucose dehydrogenase were present, which highlights the potential to solubilise inorganic phosphate. The presence of exopolyphosphatase (ppx-gppA) and inorganic pyrophosphatase (ppa) shows the potential to solubilise phosphate and make it available to plants. Along with these phosphate solubilisation capacities, the metagenome contains genes for phosphate starvation-inducible proteins (PhoH/L), the phosphate transport system substrate-binding and ATP-binding proteins (pstS/B) and a low-affinity inorganic phosphate transporter (pit).
Many researchers have studied the nitrogen fixation ability of endophytic bacteria. In the present study, the dominant family of endophytic bacteria was Enterobacteriaceae. Many members of the family Enterobacteriaceae have already been reported as nitrogen fixers. Enterobacter sp. and Klebsiella sp. from sugarcane carried the nifH gene for nitrogen fixation (57). In most nitrogen-fixing microbes, nitrogen fixation occurs with the help of Mo-nitrogenase. This enzyme is composed of two metalloproteins, NifDK and NifH (58). The genes nifD, nifK and nifH, and the protein-coding genes that regulate and assemble Mo-nitrogenase, such as nifU, nifA, nifN, nifE, nifX, nifQ, nifT and nifS, were present in this study. Along with genes for nitrogen fixation, there were enzymes for dissimilatory nitrate reduction to ammonia. The additional mechanism for nitrogen assimilation occurs in two steps, viz. denitrification and reduction (59). We also found different enzymes involved in these processes in the metagenome analysis, such as nitrous-oxide reductase (nosZ), nitrate reductase (narG, napA) and nitrite reductase (nirD). All these enzymes were also reported from the endophytic Enterobacter sp. SA187 isolated from Indigofera argentea (60). Glutamine is one of the amino acid metabolites that plays a major role in ammonium assimilation. The cellular nitrogen status regulates the extent of glutamine metabolism in a bacterial cell. This shows the role of glutamine in the assimilation of nitrogen and as a signalling molecule in nitrogen metabolism (61).
Complete genome analysis of the plant growth-promoting endophytic bacterium Enterobacter sp. 638 isolated from poplar revealed different iron uptake systems (62). Microorganisms have developed different solutions for iron acquisition, such as the production of chelators (siderophores). Siderophores act as transport vehicles for iron (63). Microbial siderophores act as a direct supply of iron for plants, and a sufficient amount of iron, which is related to the immune system of the plant, helps to prevent some diseases. The biocontrol efficiency of Pseudomonas fluorescens against Fusarium wilt in tomato was analysed and was also attributed to siderophore production; P. fluorescens was found to be effective in siderophore production and in the prevention of Fusarium wilt (64). The presence of genes for enterobactin siderophore synthetase components F and D (entF, entD), other proteins and enzymes involved in its production (entE, entC, entB) and the enterobactin exporter (entS) discloses the iron acquisition potential of the endophytic bacteria from E. sonchifolia.

Conclusion
The metagenome analysis of the endophytic bacterial genomes revealed high genetic diversity and diverse functional genes. The endophytic microbiome carries genes that are potentially involved in plant growth promotion activities such as the production of plant growth regulators, the enhancement of nitrogen availability through its assimilation by various pathways, the solubilisation of phosphate and increased iron acquisition through the production of siderophores. Genes with functional roles in nitrogen metabolism (nif) and siderophore production (enterobactin category) were noted in the annotations. The IAA production capacity of the endophytic microbiome was indicated by annotations of enzyme-coding genes such as ipdC and tnaA. The presence of an ACC deaminase coding gene in the metagenomic data indicates a role for the endophytes in regulating elevated ethylene levels in host tissues. A large number of genes with functional roles in nutrient assimilation and absorption, such as genes for assembling Mo-nitrogenase, nitrous-oxide reductase (nosZ), nitrate reductase (narG, napA), nitrite reductase (nirD), enterobactin siderophore synthetase components F and D and acid phosphatase, help explain the effective plant-microbe interactive relationship and the role of bacterial endophytes in regulating the growth of the host plant.