Comparative Analyses of Cytochrome P450s and Those Associated with Secondary Metabolism in Bacillus Species

Cytochrome P450 monooxygenases (CYPs/P450s) are among the most catalytically diverse enzymes, capable of performing enzymatic reactions with chemo-, regio-, and stereoselectivity. Our understanding of the role of P450s in secondary metabolite biosynthesis is expanding. Among bacteria, Bacillus species are known to produce secondary metabolites, and recent studies have revealed the presence of secondary metabolite biosynthetic gene clusters (BGCs) in these species. However, neither a comprehensive comparative analysis of Bacillus P450s nor an analysis of the P450s involved in secondary metabolite synthesis in Bacillus species has been reported. This study addresses these two research gaps. In silico analysis of P450s in 128 Bacillus species revealed the presence of 507 P450s that can be grouped into 13 P450 families and 28 subfamilies. No P450 family was found to be conserved in Bacillus species. Bacillus species were found to have fewer P450s, P450 families, and subfamilies, and a lower P450 diversity percentage, than mycobacterial species. A large number of P450s (112) were found to be part of different secondary metabolite BGCs, and an association between a specific P450 family and secondary metabolite BGCs was identified in Bacillus species. This study opens new avenues for further characterization of secondary metabolite BGCs, especially P450s, in Bacillus species.


Bacillus Species Have the Lowest Number of P450 Families and Subfamilies
As per the International P450 Nomenclature Committee rules [27][28][29], all 507 P450s found in 114 Bacillus species can be grouped into 13 P450 families and 28 subfamilies (Figures 3 and 4). Phylogenetic analysis of Bacillus P450s showed that P450s belonging to the same family clustered together, suggesting that the annotation of P450s in this study is accurate (Figure 1). Bacillus species have fewer P450 families and subfamilies than mycobacterial species (60 species), which have 77 P450 families and 132 subfamilies [25]. Because of this low number of P450 families, the P450 diversity percentage in Bacillus species was also lower (3.9%) than that of mycobacterial species (72%) [25]. Among the 13 P450 families, CYP107 has the highest number of P450s (165 P450s), contributing 31.5% of the 507 P450s (Figure 3 and Table S1). P450 subfamily analysis revealed that most P450 families have a single subfamily (Figure 4). Among P450 families, CYP107 has the highest number of subfamilies (eight), followed by CYP109 (six), CYP152 (three), and CYP106 (two) (Figure 4).
The remaining nine P450 families, CYP102, CYP113, CYP1179, CYP1221, CYP1341, CYP134, CYP1756, CYP197, and CYP223, each have a single subfamily (Figure 4). Interestingly, the CYP102 family, despite having the second largest number of P450s, has only a single subfamily, "A". Analysis of P450 subfamily profiles revealed that specific subfamilies dominate particular families (Figure 4): subfamily "J" is dominant in the CYP107 family, subfamily "B" in the CYP109 family, and subfamily "A" in the CYP152 family. Analysis of P450 family profiles revealed that no P450 family is conserved across all Bacillus species (Figure 5). Most Bacillus species have the CYP102, CYP107, CYP109, CYP106, and CYP152 families (Figure 5), whereas CYP197, CYP223, and CYP1341 are present in only a single Bacillus species (Supplementary Dataset 1).
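Family and subfamily assignments can be read directly from the standard CYP name format (for example, CYP107H4 is member 4 of subfamily H in family CYP107). A minimal Python sketch, assuming P450 names are available as plain strings in that format, that recovers the subfamilies present in each family; the example list uses only names mentioned in this article, not the full 507-P450 dataset:

```python
import re
from collections import defaultdict

# CYP name = family number + subfamily letter(s) + member number, e.g. "CYP107H4".
CYP_NAME = re.compile(r"^CYP(\d+)([A-Z]+)(\d+)$")


def subfamilies_per_family(names):
    """Map each CYP family (e.g. "CYP107") to the set of subfamily letters seen."""
    families = defaultdict(set)
    for name in names:
        match = CYP_NAME.match(name)
        if match is None:
            raise ValueError(f"not a standard CYP name: {name!r}")
        family_number, subfamily, _member = match.groups()
        families[f"CYP{family_number}"].add(subfamily)
    return dict(families)


# Example input: P450 names mentioned in this article.
names = ["CYP107H4", "CYP107K3", "CYP107K1", "CYP113L1", "CYP102A48", "CYP134A1", "CYP152A1"]
profile = subfamilies_per_family(names)
for family, subfamilies in sorted(profile.items()):
    print(family, sorted(subfamilies))
```

Run over all annotated names, the per-family subfamily counts should sum to 28: 8 + 6 + 3 + 2 = 19 for CYP107, CYP109, CYP152, and CYP106, plus one for each of the remaining nine families.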

Bacillus Species Have the Lowest Number of Secondary Metabolite BGCs
To identify P450s involved in the biosynthesis of secondary metabolites, the gDNA and plasmid DNA of each Bacillus species (Table S2) were subjected to secondary metabolite BGC analysis using antiSMASH [30]. In total, 203 plasmids were identified in 60 Bacillus species (Table S2). Analysis of the 128 Bacillus species genomes revealed 1098 and 26 secondary metabolite BGCs on gDNA and plasmid DNA, respectively (Figure 6 and Table S3).
The number of secondary metabolite BGCs per Bacillus genome (gDNA) ranged from one to 14. Interestingly, of the 203 plasmid DNAs from 60 Bacillus species (Table S3), only 21 plasmid DNAs from 18 Bacillus species were found to carry secondary metabolite BGCs (Figure 6 and Table S3). The number of secondary metabolite BGCs on plasmid DNAs ranged from one to four (Figure 6 and Table S3).

Analysis of the types of secondary metabolite BGCs revealed 33 and 10 types of BGCs on gDNA and plasmid DNAs, respectively, in Bacillus species (Figure 7 and Supplementary Dataset 4). The number of BGC types in individual Bacillus species ranged from one to 10 (Figure 7). Among the BGC types, nonribosomal peptide (Nrps) BGCs were dominant in Bacillus species, both on gDNA and on plasmid DNAs (Figure 7). Seven BGC types were common to gDNA and plasmid DNAs (Figure 7), suggesting that these plasmids might be involved in horizontal gene transfer of different BGCs among Bacillus species. Notably, horizontal gene transfer of BGCs is a common phenomenon among Bacillus species [21]. Interestingly, three distinct types of secondary metabolite gene clusters, namely Sactipeptide-Lantipeptide-T1pks-Nrps, Arylpolyene-Nrps, and Lantipeptide-T1pks-Nrps, were identified only on plasmid DNAs (Figure 7).
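The comparison of BGC types between gDNA and plasmid DNA reduces to set operations on the type labels. A minimal Python sketch; the gDNA list and the shared plasmid types below are illustrative placeholders, and only the three plasmid-only types are taken from the text:

```python
def compare_bgc_types(gdna_types, plasmid_types):
    """Split BGC type labels into shared, plasmid-only and gDNA-only sets."""
    gdna, plasmid = set(gdna_types), set(plasmid_types)
    return gdna & plasmid, plasmid - gdna, gdna - plasmid


# Illustrative labels only; real input would be the antiSMASH type of each cluster per replicon.
gdna_types = ["Nrps", "Transatpks-Nrps", "Bacteriocin", "Terpene", "Nrps"]
plasmid_types = [
    "Nrps",
    "Bacteriocin",
    "Sactipeptide-Lantipeptide-T1pks-Nrps",
    "Arylpolyene-Nrps",
    "Lantipeptide-T1pks-Nrps",
]
shared, plasmid_only, gdna_only = compare_bgc_types(gdna_types, plasmid_types)
print(sorted(shared))
print(sorted(plasmid_only))
```

Applied to the full dataset, the shared set would hold the seven common types and the plasmid-only set the three plasmid-specific types.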

Large Number of P450s Found to Be Part of Secondary Metabolite BGCs in Bacillus Species
Among the 507 P450s identified in 128 Bacillus species, 112 P450s (22%) from 50 Bacillus species were found to be part of secondary metabolite BGCs (Table 1). Of the 13 P450 families, only seven, namely CYP107, CYP113, CYP134, CYP152, CYP102, CYP109, and CYP1179, were found in secondary metabolite BGCs (Figure 8). At the subfamily level, P450s of subfamilies H and K of the CYP107 family were part of secondary metabolite BGCs, even though subfamily J is dominant in that family (Figure 4). In the CYP152 family, P450s of subfamily A were found in secondary metabolite BGCs. CYP107 family P450s dominate the P450s involved in secondary metabolite BGCs, accounting for 61% (68 of 112 P450s), followed by CYP113, CYP152, CYP102, CYP109, and CYP1179 (Figure 8). Notably, these P450 families are also highly populated in Bacillus species (Figures 3 and 5). This further supports the previous hypothesis that species populate specific P450s if they are useful for adaptation to certain ecological niches or for their physiology [31][32][33][34][35]. Considering their large numbers, wide distribution, and presence in secondary metabolite BGCs, it can be hypothesized that P450s belonging to the CYP107, CYP102, CYP109, CYP152, and CYP113 families play a key role in Bacillus physiology, including the synthesis of different secondary metabolites. Although secondary metabolite BGCs were found on plasmid DNA, none of these plasmid-borne clusters contained a P450.
Analysis of the association between P450 families and secondary metabolite BGCs revealed that CYP107 family P450s were mostly associated with the Nrps-Transatpks-Otherks and Transatpks-Nrps BGCs, CYP113 family P450s with the Transatpks BGC, and CYP134 family P450s with the "Other" category, a putative gene cluster type (Table 1).
Table 1 (excerpt; columns: cluster number, BGC type, P450; n/a = cluster number not available):
n/a, Transatpks-Nrps, CYP107H4
9, Bacteriocin-Nrps, CYP113L1
Bacillus velezensis UCMB5033
6, Transatpks-Nrps, CYP107K3
7, Transatpks-Nrps, CYP107H4
10, Transatpks, CYP113L1
n/a, Nrps-Transatpks-Otherks, CYP107K1
7, Other, CYP102A48
9, Other, CYP134A1
n/a, Transatpks-Nrps, CYP107K3
11, Transatpks, CYP113L1
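Tallying (family, BGC type) pairs is enough to expose such associations. A minimal Python sketch that uses the BGC type and P450 pairs listed in the Table 1 excerpt of this section as example input (species and cluster numbers are ignored); it is an illustration, not the counting procedure used for the full table:

```python
import re
from collections import Counter

# (BGC type, P450) pairs from the Table 1 excerpt, in order of appearance.
ROWS = [
    ("Transatpks-Nrps", "CYP107H4"),
    ("Bacteriocin-Nrps", "CYP113L1"),
    ("Transatpks-Nrps", "CYP107K3"),
    ("Transatpks-Nrps", "CYP107H4"),
    ("Transatpks", "CYP113L1"),
    ("Nrps-Transatpks-Otherks", "CYP107K1"),
    ("Other", "CYP102A48"),
    ("Other", "CYP134A1"),
    ("Transatpks-Nrps", "CYP107K3"),
    ("Transatpks", "CYP113L1"),
]


def family_of(p450_name):
    """Return the family prefix of a CYP name, e.g. "CYP107H4" -> "CYP107"."""
    match = re.match(r"CYP\d+", p450_name)
    if match is None:
        raise ValueError(f"not a CYP name: {p450_name!r}")
    return match.group(0)


def family_bgc_counts(rows):
    """Count how often each (family, BGC type) pair occurs."""
    return Counter((family_of(p450), bgc_type) for bgc_type, p450 in rows)


def dominant_bgc_per_family(counts):
    """For each family, keep the BGC type with the highest count (first seen wins ties)."""
    best = {}
    for (family, bgc_type), n in counts.items():
        if family not in best or n > best[family][1]:
            best[family] = (bgc_type, n)
    return best


counts = family_bgc_counts(ROWS)
for family, (bgc_type, n) in dominant_bgc_per_family(counts).items():
    print(family, bgc_type, n)
```

With only this excerpt, CYP107 pairs most often with Transatpks-Nrps and CYP113 with Transatpks; the full Table 1 would be needed to reproduce every association described in the text.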

Bacillus P450s Are Indeed Involved in the Synthesis of Secondary Metabolites
Based on the in silico analysis in this study, seven P450 families, namely CYP107, CYP113, CYP134, CYP152, CYP102, CYP109, and CYP1179, were identified as part of secondary metabolite BGCs in Bacillus species (Figure 8). Functional data available for some of these P450s indicate that the P450s predicted in this study are indeed involved in the biosynthesis of different secondary metabolites, and some P450 families, such as CYP105, CYP107, and CYP109, display highly diverse functions [9,12,36]. CYP102A1 from B. megaterium [24,37,38] and CYP152A1 from B. subtilis [39,40] are fatty acid hydroxylases. P450s belonging to the CYP106, CYP107, CYP109, and CYP134 families hydroxylate different steroids, albeit with different substrate specificities [22]. CYP134A1 is involved in the synthesis of the natural product pulcherriminic acid [41], and CYP107H1 (P450 BioI) is involved in the synthesis of polyketides [42]. Based on functionally characterized homologs from other organisms, CYP105, CYP107, and CYP109 family P450s are associated with the degradation and biotransformation of a diverse array of xenobiotics and secondary metabolites [36,43,44]. CYP113 P450s are involved in the biosynthesis of secondary metabolites such as erythromycin [45,46] and tylosin [47,48]. Although CYP102 and CYP152 P450s were found in secondary metabolite BGCs in this study, their role in secondary metabolite biosynthesis has not yet been elucidated.

Species and Database
In total, 128 Bacillus species genomes publicly available at KEGG (https://www.genome.jp/kegg-bin/show_organism?category=Bacillus) were used in this study. The Bacillus species used, along with their names, species codes, and individual genome database links, are listed in Table S4.
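The genome list can also be retrieved programmatically. A minimal Python sketch, assuming the KEGG REST `list/organism` endpoint, which returns one tab-separated line per organism (T number, organism code, name, lineage). Filtering by the name prefix "Bacillus " is our own assumption and, because KEGG is updated and taxa are reclassified over time, it will not necessarily reproduce the 128 genomes used here:

```python
import urllib.request

KEGG_ORGANISM_LIST = "https://rest.kegg.jp/list/organism"


def parse_organism_list(text, genus="Bacillus"):
    """Return (T number, organism code, name) for entries whose name starts with the genus."""
    hits = []
    for line in text.splitlines():
        fields = line.split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        t_number, org_code, name = fields[0], fields[1], fields[2]
        if name.startswith(genus + " "):
            hits.append((t_number, org_code, name))
    return hits


def fetch_organism_list(url=KEGG_ORGANISM_LIST):
    """Download the KEGG organism list as text (requires network access)."""
    with urllib.request.urlopen(url) as response:
        return response.read().decode("utf-8")


# Offline example with made-up lines in the same four-column layout.
sample = (
    "T90001\txb1\tBacillus exampleus strain A\tProkaryotes;Bacteria;Firmicutes;Bacillus\n"
    "T90002\txe1\tEscherichia exampleus\tProkaryotes;Bacteria;Gammaproteobacteria;Escherichia\n"
)
print(parse_organism_list(sample))
```

For live data, `parse_organism_list(fetch_organism_list())` combines both steps; the organism codes it returns are the species codes used in KEGG genome links.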

Genome Data Mining and Annotation of P450s
P450 mining in Bacillus species was carried out following the methods described elsewhere [25]. Briefly, the whole proteome of each Bacillus species was downloaded from the databases listed in Table S4 and submitted to the NCBI Batch Web CD-Search Tool (http://www.ncbi.nlm.nih.gov/Structure/bwrpsb/bwrpsb.cgi). Proteins belonging to the P450 superfamily were selected and, following the International P450 Nomenclature Committee rules, proteins sharing >40% and >55% amino acid identity were grouped into the same family and subfamily, respectively [27][28][29]. Proteins sharing less than 40% identity with named P450s were assigned to a new P450 family.
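The identity cut-offs above can be expressed as a small classification rule. A minimal Python sketch, assuming pairwise percent identities have already been computed (for example from pairwise alignments). The single-linkage grouping is our illustrative way of applying the cut-offs to unnamed sequences; it is not the Nomenclature Committee's procedure, which names sequences against reference P450s:

```python
def relationship(identity_percent):
    """Classify a P450 pair using the >40% (family) and >55% (subfamily) identity rules."""
    if identity_percent > 55:
        return "same subfamily"
    if identity_percent > 40:
        return "same family"
    return "different family"


def group_by_threshold(names, identities, threshold):
    """Single-linkage groups: P450s share a group if linked by pairs above threshold."""
    parent = {name: name for name in names}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps lookups short
            x = parent[x]
        return x

    for (a, b), identity in identities.items():
        if identity > threshold:
            parent[find(a)] = find(b)

    groups = {}
    for name in names:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(group) for group in groups.values())


# Hypothetical identities between four unnamed P450s.
names = ["p450_1", "p450_2", "p450_3", "p450_4"]
identities = {("p450_1", "p450_2"): 62.0, ("p450_2", "p450_3"): 45.0, ("p450_3", "p450_4"): 30.0}
print(group_by_threshold(names, identities, 40))  # candidate families
print(group_by_threshold(names, identities, 55))  # candidate subfamilies
```

Single linkage can chain together sequences whose direct identity is below the cut-off, so groups from this sketch would still need checking against named reference P450s.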

Phylogenetic Analysis of P450s
The phylogenetic tree of Bacillus P450s was built as described elsewhere [25], with slight modifications. Briefly, the Bacillus P450 protein sequences, together with the outgroup M. tuberculosis CYP51B1 (Rv0764c), were aligned with MAFFT v6.864 [49] as implemented on the T-REX web server [50]. The alignments were then automatically subjected to tree inference and optimization by the T-REX web server. Finally, the best inferred trees were visualized, colored, and rendered with iTOL (http://itol.embl.de/about.cgi) [51].
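The alignment input is simply the Bacillus P450 sequences plus the outgroup in FASTA format. A minimal Python sketch, assuming sequences are held as (identifier, sequence) pairs; the identifiers and sequences below are placeholders, and running a local MAFFT with `--auto` is a stand-in for the web server used in this study, not its exact configuration:

```python
import shutil
import subprocess


def to_fasta(records, width=60):
    """Render (identifier, sequence) pairs as FASTA text, wrapping sequences at `width`."""
    lines = []
    for identifier, sequence in records:
        lines.append(f">{identifier}")
        lines.extend(sequence[i:i + width] for i in range(0, len(sequence), width))
    return "\n".join(lines) + "\n"


def build_alignment_input(bacillus_p450s, outgroup):
    """Combine the outgroup with the Bacillus P450s (outgroup listed first for convenience)."""
    return to_fasta([outgroup, *bacillus_p450s])


def align_with_mafft(fasta_path):
    """Align a FASTA file with a locally installed MAFFT and return the aligned FASTA text."""
    if shutil.which("mafft") is None:
        raise RuntimeError("mafft was not found on PATH")
    result = subprocess.run(
        ["mafft", "--auto", fasta_path], capture_output=True, text=True, check=True
    )
    return result.stdout  # MAFFT writes the alignment to standard output


# Placeholder records; real input would be the full-length protein sequences.
outgroup = ("CYP51B1_Rv0764c", "M" + "A" * 69)
bacillus = [("Bacillus_P450_1", "MK" * 10), ("Bacillus_P450_2", "MT" * 12)]
fasta_text = build_alignment_input(bacillus, outgroup)
print(fasta_text)
```

Sequence order does not change what MAFFT aligns; the outgroup only matters later, when the tree is rooted.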

P450 Diversity Percentage Analysis
P450 diversity percentage analysis was carried out as described elsewhere [25,34]. Briefly, the P450 diversity percentage in Bacillus species was calculated as the number of P450 families expressed as a percentage of the total number of P450s.
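Written out, the P450 diversity percentage is (number of P450 families / total number of P450s) × 100. A minimal Python sketch of that formula; the counts in the example are illustrative, not values from this study:

```python
def p450_diversity_percentage(n_families, n_p450s):
    """Number of P450 families expressed as a percentage of the total number of P450s."""
    if n_p450s <= 0:
        raise ValueError("the total number of P450s must be positive")
    if not 0 <= n_families <= n_p450s:
        raise ValueError("n_families must lie between 0 and n_p450s")
    return 100.0 * n_families / n_p450s


# Illustrative genus with 6 families among 48 P450s.
print(p450_diversity_percentage(6, 48))  # 12.5
```

A higher value means the P450s are spread across more distinct families; a lower value means a few families account for most of the P450s.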

Generation of P450 Profile Heat-Maps
The presence or absence of P450 families in Bacillus species was displayed as heat-maps generated from P450 family data. Family presence was coded as −3 (green) and family absence as 3 (red). A tab-delimited file was imported into MeV (MultiExperiment Viewer) [52], and the data were clustered hierarchically using a Euclidean distance metric. The 128 Bacillus species formed the horizontal axis (see Supplementary Dataset 3 for species codes), and CYP family numbers formed the vertical axis.
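The encoding step can be sketched in Python. This is a minimal illustration, assuming each species' set of P450 families is already known; the species names are hypothetical, the tab-delimited layout is generic rather than MeV's exact import format, and only the Euclidean distance underlying the clustering is shown. The linkage method used in MeV is not stated in the text; `scipy.cluster.hierarchy.linkage` with `metric="euclidean"` is one option for reproducing the clustering itself:

```python
import math

PRESENT, ABSENT = -3, 3  # values used in the text: -3 = family present, 3 = family absent


def presence_matrix(species_families, families):
    """Rows = CYP families, columns = species (sorted), cells = PRESENT or ABSENT."""
    species = sorted(species_families)
    rows = [
        [PRESENT if family in species_families[sp] else ABSENT for sp in species]
        for family in families
    ]
    return species, rows


def to_tsv(species, families, rows):
    """Tab-delimited text with a header row of species names."""
    header = "\t".join(["family", *species])
    body = ["\t".join([family, *map(str, row)]) for family, row in zip(families, rows)]
    return "\n".join([header, *body]) + "\n"


def species_distance(rows, i, j):
    """Euclidean distance between the family profiles of species columns i and j."""
    return math.sqrt(sum((row[i] - row[j]) ** 2 for row in rows))


# Hypothetical species and family sets.
species_families = {
    "species_A": {"CYP102", "CYP107"},
    "species_B": {"CYP102"},
    "species_C": {"CYP102", "CYP107"},
}
families = ["CYP102", "CYP107"]
species, rows = presence_matrix(species_families, families)
print(to_tsv(species, families, rows))
print(species_distance(rows, 0, 1))  # profiles differ in one family
```

With this encoding, every family that differs between two species adds 6² = 36 to the squared distance, so two species differing in k families are 6·√k apart.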

Secondary Metabolite BGCs Analysis
The genome and plasmid IDs of each Bacillus species, taken from the various species databases (Table S2), were submitted to antiSMASH [30] to identify secondary metabolite BGCs. Results were downloaded both as gene cluster sequences and as Excel spreadsheets summarizing cluster information for each species, and P450s that are part of a specific gene cluster were then identified. The standard gene cluster abbreviations used by antiSMASH [30] were retained in this study.
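Deciding whether a P450 is part of a cluster amounts to an interval-overlap test between gene and cluster coordinates on the same replicon. A minimal Python sketch, assuming the coordinates have already been extracted from the antiSMASH output as 1-based inclusive (start, end) pairs; all identifiers and coordinates below are hypothetical:

```python
from typing import NamedTuple


class Region(NamedTuple):
    record: str  # chromosome or plasmid accession
    start: int   # 1-based, inclusive
    end: int     # 1-based, inclusive


def overlaps(a, b):
    """True if two regions lie on the same record and share at least one base."""
    return a.record == b.record and a.start <= b.end and b.start <= a.end


def p450s_in_clusters(p450_genes, clusters):
    """Map each P450 gene ID to the cluster IDs whose regions it overlaps."""
    hits = {}
    for gene_id, gene in p450_genes.items():
        matched = [cluster_id for cluster_id, region in clusters.items() if overlaps(gene, region)]
        if matched:
            hits[gene_id] = matched
    return hits


clusters = {"cluster_6": Region("chromosome_1", 1_000, 50_000)}
p450_genes = {
    "p450_a": Region("chromosome_1", 20_000, 21_200),  # inside cluster_6
    "p450_b": Region("chromosome_1", 90_000, 91_200),  # same record, outside the cluster
    "p450_c": Region("plasmid_1", 20_000, 21_200),     # different record
}
print(p450s_in_clusters(p450_genes, clusters))
```

Counting a partial overlap as membership is a design choice; requiring full containment (gene start and end both inside the cluster region) would be stricter.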

Comparative Analysis of P450s
Mycobacterial P450 data were retrieved from a previously published study [25] and used for comparison with Bacillus P450s. The numbers of P450 families and subfamilies and the P450 diversity percentage were compared between the genera Mycobacterium and Bacillus.

Conclusions
Comparative analysis of P450s in bacterial species is gaining momentum, fueled by the availability of a large number of bacterial genome sequences. This study is an attempt to perform a comprehensive comparative analysis of P450s in Bacillus species and to identify the P450s involved in secondary metabolite synthesis. Future work will focus on understanding the role of the Bacillus P450s identified in this study in the synthesis of various secondary metabolites.