Physicochemical Characterization and Comparative Analysis of Ribulose 1,5 Biphosphate Carboxylase-Oxygenase Like Proteins (RLP) from Halophilic Chromohalobacter salexigens BKL 5 and Non-Halophilic Counterparts Using in Silico Approaches

This study analyzes and compares halophilic RuBisCo-like proteins (RLP) with non-halophilic homologs using computational approaches. The data were protein sequences from NCBI together with the sequence of the cloned and expressed RuBisCo-like protein of Chromohalobacter salexigens BKL 5 from our previous study. The analysis took the form of Principal Component Analysis (PCA) and Partial Least Squares (PLS). The tools used were Origin Lab Full Version 2019 and MetaboAnalyst 5.0. The parameters tested were isoelectric point (pI), negative charge (acidic amino acids), aliphatic index, and GRAVY index, computed with the ProtParam tool from Expasy. The analysis showed that the amino acid residues in halophilic RuBisCo-like proteins that differ significantly from their homologs were glutamic acid and alanine, while the discriminating variables were negative charge and aliphatic index. These results show that some parameters obtained from the program can be used as discriminants to differentiate halophilic and non-halophilic RuBisCo-like proteins.

One of the characteristics of halophilic proteins is a high content of negatively charged amino acid residues, which, especially in proteins from extremely halophilic bacteria, can reach 20% of the total residues [5]. These negatively charged residues help maintain the stability of the protein structure through intramolecular charge interactions and the neutralization of oppositely charged ions in the medium. Another characteristic of halophilic proteins is their low content of hydrophobic residues [6]. This adaptation reduces hydrophobic interactions, thereby maintaining the stability of halophilic proteins in high-salt environments [7].
Researchers have conducted several studies to evaluate the relationship between the structure of halophilic proteins and their stability, comparing halophilic and non-halophilic proteins at all structural levels, from primary and secondary to tertiary. The results showed that residue composition differs between halophilic and non-halophilic proteins in both the surface and interior regions [7]. Electrostatic interactions and a distinctive amino acid composition are the characteristics that distinguish halophilic from non-halophilic proteins.
Computational analysis has become a new approach to accelerate research in the protein field [8]. The essence of computational analysis is finding patterns and distributions in existing data and subsequently making predictions for new datasets. These data typically comprise data points characterized by various features or descriptors, such as protein sequences, their secondary and tertiary structures, and the physicochemical properties of amino acids. The number of features in such datasets usually varies from dozens to thousands, rendering these problems high-dimensional. For halophilic proteins, research based on computational analysis has been carried out by Zhang and Ge [4] and Asy'ari et al. [9].
We have isolated the RuBisCo-like protein (RLP) gene from Chromohalobacter salexigens BKL 5, sourced from the hypersaline environment of the Bledug Kuwu Mud Volcano in Central Java, where the salt content is approximately 10% [10]. This gene was cloned into the pCold plasmid and subsequently transformed and expressed in the Escherichia coli BL21 system. These findings remain unpublished. Based on its habitat, this bacterium is classified as an extremophile, an organism that tolerates and thrives in the most extreme and challenging conditions of life. As a result of these extreme environmental stresses, extremophiles have developed several interesting adaptations in their cellular membranes, proteins, and extracellular metabolites [11].
These environments are often so unique that the organisms within them exhibit highly specialized adaptations at the protein level. For instance, they possess enzymes capable of functioning effectively in extreme conditions without denaturing. Such proteins can operate under conditions where mesophilic proteins might fail. Several variables can make an environment extreme, such as pH, temperature, salinity, or the presence of other external factors, such as heavy metals or radiation [12]. In this case, Chromohalobacter salexigens BKL 5 is categorized as a halophile, an organism that can survive the ionic stresses imposed by saline environments.
Chromohalobacter salexigens BKL 5 is a moderately halophilic Gram-negative bacterium that occurs as single or paired cells. Its colonies are creamy, round, and 2 mm in diameter. The bacterium grows optimally at a salt concentration of 7.5 to 10%. The strain was found in and isolated from mud in the Bledug Kuwu Mud Volcano, Central Java [10].
RuBisCo is classified into four groups: I, II, III, and IV. Groups I, II, and III catalyze the carboxylation or oxygenation of ribulose 1,5-bisphosphate (RuBP). Group IV, known as RuBisCo-like protein (RLP), is unable to fix CO2 due to substitutions of amino acid residues in its active site [13]. RLP plays a role in the methionine metabolic pathway and catalyzes the enolization of a RuBP analog substrate, specifically 2,3-diketo-5-methylthiopentyl-1-phosphate [13,14].
This study represents the first instance of using computational methods for a comparative analysis of halophilic and non-halophilic RuBisCo-like proteins (RLPs). The objective was to identify and compare the differences between these proteins based on their amino acid compositions. The analysis utilized protein sequences sourced from the National Center for Biotechnology Information (NCBI) at www.ncbi.nlm.nih.gov. Parameters related to amino acids, such as isoelectric point (pI), aliphatic index, GRAVY index, and negative charge, were key factors in this comparative study.

Materials
Fifteen RLP sequences were obtained from NCBI (www.ncbi.nlm.nih.gov) in FASTA format (Table 1). A literature search showed that six of these RLPs belonged to halophilic bacteria, while the remaining nine were derived from non-halophilic bacteria. The dataset comes from various bacterial classes, with the halophilic group originating specifically from bacteria classified as moderately halophilic.

Analysis of Amino Acid Composition
RLP amino acid sequences were analyzed using the ProtParam tool available on Expasy (www.expasy.org). The results obtained were the composition of amino acid residues and several parameters, such as pI (isoelectric point), aliphatic index, and GRAVY index [19,20].
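The sequence-derived parameters above can be reproduced approximately offline. The following is a minimal sketch, not the ProtParam implementation: it assumes the Kyte-Doolittle hydropathy scale for GRAVY and the standard aliphatic index coefficients (2.9 for valine, 3.9 for isoleucine and leucine). The function names and the ten-residue test peptide are hypothetical and chosen only for illustration. The isoelectric point is omitted because it depends on the pKa set chosen.

```python
from collections import Counter

# Kyte-Doolittle hydropathy values (the scale underlying the GRAVY index).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def composition_percent(seq):
    """Mole percent of each of the 20 standard residues in seq."""
    seq = seq.upper()
    counts = Counter(seq)
    return {aa: 100.0 * counts.get(aa, 0) / len(seq) for aa in KYTE_DOOLITTLE}

def gravy(seq):
    """Grand average of hydropathy: mean Kyte-Doolittle value per residue."""
    seq = seq.upper()
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)

def aliphatic_index(seq):
    """Aliphatic index from the mole percent of Ala, Val, Ile, and Leu."""
    x = composition_percent(seq)
    return x["A"] + 2.9 * x["V"] + 3.9 * (x["I"] + x["L"])

def negative_charge(seq):
    """Number of acidic residues (Asp + Glu)."""
    seq = seq.upper()
    return seq.count("D") + seq.count("E")

# Hypothetical peptide, for illustration only (not an RLP sequence).
toy = "MAEEDLVIAG"
print(round(aliphatic_index(toy), 2), round(gravy(toy), 2), negative_charge(toy))
```

Applied to each FASTA record, these functions would give one row of the parameter table per protein. Values for real sequences should still be checked against ProtParam itself.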

Calculation of PLS Regression
Partial Least Squares (PLS) regression is a method that reduces several variables into a smaller set of predictors [35]. These predictors were used to determine regressions and correlations. VIP (Variable Importance in Projection) analysis was performed to identify the amino acid residues and parameters that contribute most to differentiating halophilic from non-halophilic RLPs. This analysis was conducted using the MetaboAnalyst 5.0 software [36]. The 3D structures of the RLP proteins and the positions of the amino acids were visualized using the ChimeraX tool, following the guidelines provided in the Chimera user manual (www.cgl.ucsf.edu) [37].
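To make the VIP calculation concrete, the sketch below fits a single-response PLS model with the NIPALS algorithm and derives VIP scores from the weight vectors. It is an illustration under stated assumptions, not the MetaboAnalyst procedure: the class label is assumed to be coded as 1 (halophilic) and 0 (non-halophilic), variables are autoscaled, and the function name `pls1_vip` and the toy matrix are hypothetical.

```python
import numpy as np

def pls1_vip(X, y, n_components=2):
    """Fit PLS1 by NIPALS on autoscaled X and return one VIP score per column."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Autoscale the predictors and center the response.
    Xr = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    yr = y - y.mean()
    n_features = X.shape[1]
    W = np.zeros((n_features, n_components))  # unit-norm weight vectors
    ss = np.zeros(n_components)               # y variance explained per component
    for a in range(n_components):
        w = Xr.T @ yr
        w /= np.linalg.norm(w)
        t = Xr @ w                  # scores
        tt = t @ t
        p = Xr.T @ t / tt           # X loadings
        q = (yr @ t) / tt           # y loading
        Xr = Xr - np.outer(t, p)    # deflate X
        yr = yr - q * t             # deflate y
        W[:, a] = w
        ss[a] = q ** 2 * tt
    # VIP_j = sqrt(p * sum_a(ss_a * w_ja^2) / sum_a(ss_a))
    return np.sqrt(n_features * (W ** 2 @ ss) / ss.sum())

# Hypothetical data: 6 proteins x 3 variables; column 0 tracks the class label.
X_toy = np.array([[12, 5, 7], [11, 6, 8], [13, 5, 6],
                  [6, 5, 7], [7, 6, 6], [5, 6, 8]])
y_toy = np.array([1, 1, 1, 0, 0, 0])
vip = pls1_vip(X_toy, y_toy)
print(np.round(vip, 2))
```

Because each weight vector has unit norm, the squared VIP scores sum to the number of variables, so a score above 1 marks a variable with above-average influence on the class separation.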

Prediction of 3D Protein Structure, Validation, and Visualization
The results of 3D structural prediction with AlphaFold are shown in Figure 1a. Structural validation of the predicted model gave a value of 92.54% and a ProSA z-score of -6.57 (Figure 1b). ProSA is a tool used to calculate the energy required for amino acids to fold into their functional configurations. A more negative value indicates greater protein stability. The z-score, in turn, describes the overall quality of the tested protein model. The Ramachandran plot analysis conducted through the PROCHECK web server indicates that the amino acid residues of the protein structure are located in allowed positions. Specifically, 89.8% of the residues are located in the most favored regions, 9.6% in the additionally allowed regions, and 0.5% in the generously allowed regions, as shown in Figure 1c. This shows that structure prediction using AlphaFold yields excellent results and that the model can serve as a reference for testing and analyzing the RLP structure of Chromohalobacter salexigens BKL 5.

Analysis of Amino Acid Composition
Amino acids are the building blocks of proteins, and their composition and configuration determine function and activity. The function of a protein is strongly dependent on its structure [39]. Proteins from organisms adapted to extreme environments show structural alterations that vary in the mechanisms by which they maintain optimal activity [40].
RLP functions as an enolase catalyst and exhibits significant structural variation. Consequently, these proteins display very low similarity to each other, with only about 30% resemblance in their structures [41,42]. This occurs because the configuration of amino acids in RLP tends to vary, even though the amino acid composition of these proteins generally remains relatively consistent. Analysis of amino acid composition using ProtParam from Expasy, visualized as a heatmap, shows that RLPs from halophilic and non-halophilic bacteria have similar patterns, as depicted in Figure 2.

Calculation of PLS Regression
Further analysis was carried out to validate the heatmap results with a chemometric approach based on machine learning. The PLS method was used to determine the most significant variables as discriminants between the halophilic and non-halophilic classes. The results show that the two amino acids with high VIP scores are alanine and glutamic acid (Figure 3a). In the halophilic RLP group, the amount of glutamic acid was greater than in the non-halophilic class. Conversely, the alanine content of halophilic RLP was lower than that of non-halophilic RLP. Analysis of the correlation coefficient for glutamic acid between halophilic and non-halophilic RLP strengthened the PLS results. Specifically, it revealed an inverse correlation between the levels of glutamic acid and alanine (Figure 3b), where an increase in the amount of glutamic acid is accompanied by a decrease in the amount of alanine.
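An inverse relationship of this kind is usually summarized by a Pearson correlation coefficient. The short sketch below shows the calculation. The glutamic acid and alanine percentages are invented placeholders rather than values from this study, and the function name `pearson_r` is hypothetical.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Placeholder mole percentages for six proteins (illustrative only).
glu = [8.1, 7.6, 7.9, 5.2, 5.8, 4.9]
ala = [9.8, 10.3, 10.0, 12.4, 12.1, 12.9]
r = pearson_r(glu, ala)
print(round(r, 3))  # negative when Glu rises as Ala falls
```

A coefficient close to -1 would correspond to the strong inverse pattern described for Figure 3b.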
Water is less available to proteins at high salt concentrations (higher than 0.1 M) because most water surrounds salt in an ionic lattice [43]. The lower availability of water can cause hydrophobic amino acids in a protein to lose hydration and tend to aggregate. High salt concentrations therefore strengthen hydrophobic interactions in a protein. Salt also interferes with the electrostatic interactions between charged amino acids. Non-halophilic proteins cannot function in high salt concentrations because the hydrophobic and electrostatic interactions they normally rely on for proper folding and stability are greatly altered. This can even destabilize the protein, potentially causing global unfolding and aggregation and ultimately leading to precipitation [44].
One of the most notable adaptations of halophilic proteins is a large increase in acidic residues, such as glutamic and aspartic acid, on the protein surface. This feature is almost ubiquitous among halophilic proteins and can be used to distinguish halophilic from non-halophilic protein sequences [45]. PLS analysis of RLP shows that the acidic amino acid that differs significantly between halophilic RLP and its non-halophilic homologs is glutamic acid (Figure 3a), which is more abundant in halophilic RLP. Figure 4 shows that glutamic acid in RLP is located mostly on the protein surface. This indicates that RLP adapts to a hypersaline environment through a higher glutamic acid content that is distributed over its surface. Glutamic acid, like aspartic acid, is very important for protein solubilization. There are several possible roles for these acidic residues. It is thought that the increased negative charge on the protein surface allows the protein to compete with positively charged ions from the salt and facilitates additional protein hydration [46,47,48].
The distribution of alanine in halophilic RLP supports the notion that its lower alanine content (an average of 10% of total residues in halophilic RLP, compared with more than 12% in non-halophilic RLP) allows the protein to be more flexible. Another adaptation that maintains protein stability where little water is available as a medium is a reduction in the number of hydrophobic amino acids, which increases interaction with the environment. In halophilic RLP, the amino acid in question is alanine (Figure 3). Reducing these amino acids is essential to decrease hydrophobic interactions, thereby potentially enhancing protein flexibility under high salt stress conditions [49].
The availability of free water molecules in a halophilic environment is low because water is bound around the positive ions from salts [50]. Halophilic proteins show several general adaptations, including a decreased presence of hydrophobic residues and an increased abundance of acidic amino acid residues [51]. Negative charges on the protein surface interact with hydrated ions, while a low content of hydrophobic residues reduces aggregation [52]. Amino acid residues with polar characteristics can enhance stability by interacting with water within the cytoplasm. Another adaptation involves an abundance of residues with relatively polar side chains, such as threonine or serine, in place of hydrophobic residues [53]. The variables obtained from ProtParam were analyzed to determine which factors best distinguish the halophilic class. The heatmap results show that halophiles tend to have a high negative charge and a low aliphatic index (Figure 5). These results were confirmed by PLS analysis, which consistently gave the highest VIP scores to the negative charge and the aliphatic index (Figure 6).
The aliphatic index is the relative volume of a protein occupied by the aliphatic (hydrophobic) side chains of specific amino acids (alanine, valine, leucine, and isoleucine) [54,55]. The VIP analysis results show that the aliphatic index of halophilic RLP is relatively lower than that of its non-halophilic homologs (Figure 6). This reduction is primarily due to a lower content of alanine residues, which contribute to the aliphatic content, by approximately 2% (calculations not shown) [48]. The low aliphatic index is also related to the folding of a protein, where the lower the aliphatic index, the smaller the folds. The low aliphatic index of halophilic RLP allows the protein to be more flexible and more exposed to solvents [56].
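For reference, the aliphatic index as commonly defined and as reported by ProtParam can be written as follows, where each X is the mole percent of the corresponding residue. The coefficients are the standard literature values and are stated here as an assumption about the tool's implementation. The second line is an illustrative calculation of the effect of a two-point drop in alanine alone, with the other three residues held constant.

```latex
\mathrm{AI} = X_{\mathrm{Ala}} + a\,X_{\mathrm{Val}} + b\,\bigl(X_{\mathrm{Ile}} + X_{\mathrm{Leu}}\bigr),
\qquad a = 2.9,\quad b = 3.9

\Delta \mathrm{AI} = \Delta X_{\mathrm{Ala}} = -2
\quad \text{when } X_{\mathrm{Ala}} \text{ falls by 2 percentage points and } X_{\mathrm{Val}},\, X_{\mathrm{Ile}},\, X_{\mathrm{Leu}} \text{ are unchanged}
```

Because alanine carries a coefficient of 1, a change in its mole percent shifts the index by the same number of units, whereas the same change in valine, isoleucine, or leucine would shift it roughly three to four times as much.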
Siglioccolo et al. [57] determined that the hydrophobic interaction within the core of halophilic proteins is consistently less than that observed in non-halophilic proteins. They propose that the lower hydrophobic contact in the core may counterbalance the increased strength of hydrophobic interactions in high salt concentrations. Weaker hydrophobic interactions, due to smaller hydrophobic residues, can enhance the flexibility of proteins in high-salt environments. This increased flexibility helps prevent the protein's hydrophobic core from becoming excessively rigid [43].
Low hydrophobic interactions in general, particularly at the level of core and conserved hydrophobic contacts, may help the structure avoid loss of function in hypersaline environments. The reduction of hydrophobic contacts may be even more critical in the early stages of folding, when intramolecular hydrophobic nuclei must form correctly to guide the polypeptide through the folding funnel to the native state [57,58].

PCA Analysis
Principal Component Analysis (PCA) of several variables obtained from ProtParam shows that these parameters can be effectively used to determine whether a protein, specifically RLP, is halophilic (Figure 7). The first two principal components, PC1 (negative charge; aliphatic index) and PC2 (GRAVY index; isoelectric point (pI)), together account for 88.06% of the total variance and are therefore considered representative. PCA is a method that reduces many variables into a smaller set of uncorrelated variables, the principal components, from which conclusions can be drawn. By using parameters such as the isoelectric point (pI), negative charge (from acidic amino acids), aliphatic index, and GRAVY index in combination with computational analysis, it is possible to distinguish halophilic RLP (red labels) from its non-halophilic counterpart (black labels). This method is useful for the rapid identification of proteins, particularly when the origin or habitat of a protein with a specific sequence is unknown.
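The PCA step can be sketched with a singular value decomposition of the autoscaled parameter table. This is an illustration only, not the Origin workflow used in the study: the six rows of values for pI, negative charge, aliphatic index, and GRAVY are invented placeholders, and the function name `pca` is hypothetical.

```python
import numpy as np

def pca(X, n_components=2):
    """PCA of autoscaled data via SVD.

    Returns (scores, loadings, explained_variance_ratio) for the
    first n_components principal components.
    """
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)   # autoscale each column
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    explained = s ** 2 / np.sum(s ** 2)                # fraction of total variance
    scores = U[:, :n_components] * s[:n_components]
    return scores, Vt[:n_components], explained[:n_components]

# Placeholder table: columns are pI, negative charge, aliphatic index, GRAVY.
params = np.array([
    [4.6, 62, 88, -0.25],   # rows 0-2: hypothetical halophilic proteins
    [4.8, 60, 90, -0.22],
    [4.7, 64, 87, -0.28],
    [5.9, 45, 99, -0.05],   # rows 3-5: hypothetical non-halophilic proteins
    [6.1, 43, 101, -0.02],
    [5.8, 47, 98, -0.08],
])
scores, loadings, explained = pca(params)
print(np.round(explained, 3))   # variance fraction carried by PC1 and PC2
print(np.round(scores[:, 0], 2))  # PC1 scores; the two groups fall on opposite sides
```

The sum of the two explained-variance fractions is the quantity reported as the percentage of total variance represented by PC1 and PC2. The loadings indicate which input parameters dominate each component.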

Conclusion
PLS calculations showed that glutamic acid and alanine are the amino acids with the most significant differences, as indicated by the highest VIP values. These two amino acids are the distinguishing residues that reflect the adaptation of RLP to a halophilic environment. Glutamic acid, an acidic amino acid, plays a crucial role in maintaining the solubility of RLP under high salt stress conditions and is essential for the formation of salt bridges that help sustain protein structure. Alanine, one of the hydrophobic amino acids, is present in lower amounts in halophilic RLP than in its non-halophilic counterparts. This reduction decreases hydrophobic interactions within the protein, thereby increasing its exposure to solvents. PCA of parameters such as the isoelectric point (pI), negative charge (from acidic amino acids), aliphatic index, and GRAVY index completely separated halophilic RLP from its non-halophilic homologs; these parameters can therefore be used as variables in chemometric procedures to determine whether an RLP protein is halophilic or non-halophilic.

Figure 2. Heatmap of amino acid composition in several RLPs. The area in the square box is the halophilic RLP. The scale in the right column describes the level of amino acid composition; the greater the value, the higher the composition.

Figure 4. Distribution of glutamic acid (red) and alanine (blue) in the 3D RLP structure of Chromohalobacter salexigens BKL 5, visualized using Chimera.

Figure 6. Results of PLS analysis of (a) several VIP score parameters and (b) correlation matrix.

Figure 7. PCA of RLP from several bacteria. Red labels indicate halophilic RLP, while black labels indicate non-halophilic RLP.