How Quickly Do Proteins Fold and Unfold, and What Structural Parameters Correlate with These Values?

The correlations between the logarithm of the unfolding rate of 108 proteins and their structural parameters were calculated. We showed that there is a good correlation between the logarithms of the folding and unfolding rates (0.79) and between protein stability and the logarithm of the unfolding rate (0.79). Thus, the faster a protein folds, the faster it unfolds. Folding and unfolding rates are higher for proteins with two-state kinetics than for proteins with multi-state kinetics. At the same time, two-state bacterial proteins fold and unfold two orders of magnitude faster than two-state eukaryotic proteins, and multi-state bacterial proteins fold and unfold slower than multi-state eukaryotic proteins. Although the folding rates of thermophilic and mesophilic proteins are close, the unfolding rates of thermophilic proteins are about two orders of magnitude lower than those of mesophilic proteins. The correlation between unfolding rate and stability of thermophilic proteins is high (0.90). We also found that the unfolding rate correlates with such structural parameters as the size of the protein, the radius of the cross-section, the logarithm of the absolute contact order, and the radius of gyration. This information will be useful for engineering and designing new proteins with desired properties.


Introduction
The problem of predicting folding rates (kf) for proteins with two-state and multi-state kinetics is still important and extensively studied [1–20]. Many articles are devoted to the study of protein folding rates and their correlation with various structural parameters [2,6–8,16]. In 1998, a relative contact order (rCO) parameter was suggested, which is the average distance along the sequence between all pairs of contacting residues, normalized to the size of the protein (the number of amino acid residues, hereafter the protein length). This parameter reflects the topological complexity of the protein chain. It was shown that the rCO correlates well (correlation coefficient is 0.81) with the logarithm of the folding rate for 12 two-state proteins [2]. Subsequent studies have shown that there is no correlation between rCO and the logarithm of the folding rate of proteins [6,7,16]. It turned out that only the absolute contact order (AbsCO, contact order multiplied by protein length) correlated with the logarithm of the folding rate (the correlation coefficient is −0.77) [16]. It was found that the structural parameters that depend on the protein length (L) correlated well with the logarithm of the folding rate [16]. In the set of papers [13,15,16,21,22], the authors considered the different structural parameters of protein globule compactness: radius of gyration (Rg); normalized radius of gyration (Rg/Rg*, where Rg* is the radius
The correlation between the logarithms of the unfolding and folding rates is 0.79 for all proteins. Moreover, this correlation is better for two-state (0.78) than for multi-state proteins (0.73).
The separation of the 108 proteins by structural classes (α, β, α/β, and α + β) revealed that the correlation between the logarithms of the folding and unfolding rates is better for proteins from the α and β classes (0.78 and 0.75) than for proteins from the α/β and α + β classes (0.59 and 0.60). Moreover, two-state proteins make the largest contribution to this correlation (see Table 1).
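The correlation coefficients quoted above are assumed here to be ordinary Pearson coefficients; the text does not name the estimator. Under that assumption, the per-group calculation can be sketched as follows. The records are hypothetical placeholders, not entries from the 108-protein dataset, and the function names are illustrative.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)

# Hypothetical (ln kf, ln ku, kinetics) records, for illustration only.
records = [
    (9.1, 2.0, "two-state"),
    (6.5, -1.2, "two-state"),
    (4.0, -3.9, "two-state"),
    (7.8, 0.4, "two-state"),
    (3.2, -5.0, "multi-state"),
    (1.0, -8.1, "multi-state"),
    (4.4, -4.2, "multi-state"),
    (-0.5, -9.0, "multi-state"),
]

def correlation_by_group(rows):
    """Correlate ln kf with ln ku for all rows and for each kinetics label."""
    groups = {"all": list(rows)}
    for row in rows:
        groups.setdefault(row[2], []).append(row)
    return {
        name: pearson_r([r[0] for r in g], [r[1] for r in g])
        for name, g in groups.items()
    }

for name, r in correlation_by_group(records).items():
    print(f"{name}: r = {r:.2f}")
```

The same grouping step would apply to structural classes or to organism origin by changing the label stored in the third field.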
It was previously shown that L, ln(AbsCO), Vasa/Sasa, and Rg correlate well with the logarithm of the protein folding rate [16]. Thus, it can be assumed that these parameters will also correlate well with the logarithm of the unfolding rate. Accordingly, four parameters were examined: L is the number of amino acid residues in the protein, ln(AbsCO) is the logarithm of the absolute contact order, Vasa/Sasa is the radius of the cross-section, and Rg is the radius of gyration. The values of these structural parameters are lower for two-state proteins than for multi-state proteins: 78 ± 5 vs. 130 ± 8 for L, 3.14 ± 0.05 vs. 3.59 ± 0.06 for Vasa/Sasa, 6.91 ± 0.06 vs. 7.22 ± 0.06 for ln(AbsCO), and 12.1 ± 0.3 vs. 14.2 ± 0.3 for Rg (see Table 2). The logarithms of the folding and unfolding rates are higher for two-state proteins than for multi-state proteins: 6.08 ± 0.50 vs. 2.51 ± 0.59 for the folding rate and −1.51 ± 0.79 vs. −6.09 ± 1.03 for the unfolding rate, respectively (see Figure 1B).
Table 2. Average values of structural parameters for 108 proteins. Vasa/Sasa = radius of cross-section. Rg = radius of gyration. ln(AbsCO) = logarithm of the absolute contact order. L = length of the protein.
For the 108 proteins, the correlations between the logarithm of the unfolding rate (ln(ku)) and structural parameters such as L, Vasa/Sasa, ln(AbsCO), and Rg were calculated (Table 3 and Figure 2). Vasa/Sasa and ln(AbsCO) are better correlated with the logarithm of the unfolding rate of two-state proteins: these correlations are −0.79 and −0.87 for two-state proteins, in comparison with −0.63 and −0.69 for multi-state proteins. The correlation between Rg and the logarithm of the unfolding rate is almost the same for two-state and multi-state proteins (−0.61 vs. −0.60, respectively). Moreover, L is better correlated with the logarithm of the unfolding rate of multi-state proteins. A good correlation (0.79) between the protein stability (−(ln kf − ln ku)) and the logarithm of the unfolding rate was observed.
Table 3. Correlations of the logarithm of the unfolding rate (ln(ku)) with protein stability (−(ln kf − ln ku)) and structural parameters (L, Vasa/Sasa, ln(AbsCO), and Rg) for 108 proteins.
After the separation of the 108 proteins by structural classes (α, β, α/β, and α + β), we observed that the correlations between the logarithm of the unfolding rate (ln(ku)) and L, Vasa/Sasa, ln(AbsCO), and Rg are better for proteins from the α and β classes (see Table 4). These correlations are the highest for proteins from the β class (higher than 0.8 in absolute value). Two-state proteins make the largest contribution to these correlations (see Table 4). The only exception is the correlation between ln(ku) and L for proteins from the β class, which is stronger for multi-state proteins (−0.86) than for two-state proteins (−0.84).
Table 4. Correlations between the logarithm of the unfolding rate (ln(ku)) and structural parameters (L, Vasa/Sasa, ln(AbsCO), and Rg) for four structural classes of proteins (α, β, α/β, and α + β).
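As an illustration of two of the structural parameters discussed above, the following sketch computes a contact order and a radius of gyration from one 3D point per residue. It follows the verbal definitions in the Introduction (rCO as the mean sequence separation of contacting residues divided by L, AbsCO as rCO multiplied by L). The 8 Å cutoff, the minimum sequence separation of 2, the one-point-per-residue representation, and the unweighted radius of gyration are all assumptions made for this sketch; the cited works may use a different contact definition, atom set, or scaling, so the numbers it produces need not be on the same scale as the tabulated values.

```python
import math

def contact_orders(coords, cutoff=8.0, min_separation=2):
    """Return (rCO, AbsCO) for a chain given one (x, y, z) point per residue.

    A pair (i, j) counts as a contact when the two points lie within
    `cutoff` and are at least `min_separation` apart along the sequence.
    Both defaults are illustrative assumptions.
    """
    n = len(coords)
    total_separation = 0
    n_contacts = 0
    for i in range(n):
        for j in range(i + min_separation, n):
            if math.dist(coords[i], coords[j]) <= cutoff:
                total_separation += j - i
                n_contacts += 1
    if n_contacts == 0:
        raise ValueError("no contacts found with the given cutoff")
    abs_co = total_separation / n_contacts  # mean sequence separation
    r_co = abs_co / n                       # normalized by chain length
    return r_co, abs_co

def radius_of_gyration(coords):
    """Unweighted radius of gyration: RMS distance of points from their centroid."""
    n = len(coords)
    centroid = [sum(p[k] for p in coords) / n for k in range(3)]
    return math.sqrt(sum(math.dist(p, centroid) ** 2 for p in coords) / n)

# Toy chain: four points on a straight line, 1 unit apart.
toy = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
r_co, abs_co = contact_orders(toy, cutoff=1.5, min_separation=1)
print(r_co, abs_co, radius_of_gyration(toy))
```

With real structures, the points would typically be read from a coordinate file; that step is omitted here because the source does not say which atoms were used.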

Figure 2. Correlations of the logarithm of the unfolding rates of 108 proteins with their structural parameters: L (length of the protein), Vasa/Sasa (radius of the cross-section), ln(AbsCO) (logarithm of the absolute contact order), and Rg (radius of gyration). Linear fits and their equations are shown: the orange line corresponds to two-state proteins and the purple line to multi-state proteins. R² is the reliability of the linear approximation.

Unfolding Rates of Bacterial and Eukaryotic Proteins
To find the dependence of the unfolding rates on the origin of the proteins, the 42 bacterial and 53 eukaryotic proteins from our database were studied separately. Two-state bacterial proteins fold and unfold faster than two-state eukaryotic proteins. For multi-state proteins, we observed that bacterial proteins fold and unfold slower than eukaryotic proteins (see Figures 3 and 4). The same result was observed when the dataset consisted of 35 bacterial and 38 eukaryotic proteins [23]. The correlation between the logarithms of the unfolding and folding rates is 0.73 for bacterial and 0.75 for eukaryotic proteins. Moreover, for bacterial proteins, this correlation is better for two-state (0.69) than for multi-state proteins (0.45). For eukaryotic proteins, this correlation is better for multi-state (0.81) than for two-state proteins (0.72). The values of Vasa/Sasa, ln(AbsCO), and Rg are slightly higher for the bacterial proteins, and this gap increases for multi-state proteins: 3.74 ± 0.07 vs. 3.47 ± 0.07 for Vasa/Sasa, 7.40 ± 0.07 vs. 7.14 ± 0.10 for ln(AbsCO), and 14.8 ± 0.5 vs. 13.9 ± 0.4 for Rg, respectively (Table 5).
Biomolecules 2020, 9, x 6 of 14
Then, the correlations between the logarithm of the unfolding rate and structural parameters for bacterial and eukaryotic proteins were investigated.
The correlations between the logarithm of the unfolding rate and L, Vasa/Sasa, and ln(AbsCO) are almost the same for all bacterial and eukaryotic proteins: −0.67 vs. −0.68 for L, −0.72 vs. −0.69 for Vasa/Sasa, and −0.80 vs. −0.79 for ln(AbsCO), respectively (Table 6 and Figure 4). The difference is observed only for Rg, which correlates better with the logarithm of the unfolding rate of bacterial proteins (−0.71). Considering these correlations separately for two-state and multi-state bacterial and eukaryotic proteins gives the following picture. For two-state proteins, the correlation between the logarithm of the unfolding rate and Vasa/Sasa is almost the same for bacterial and eukaryotic proteins (−0.75 vs. −0.77). Rg and ln(AbsCO) correlate better with the logarithm of the unfolding rate of two-state bacterial proteins than with that of eukaryotic proteins (−0.86 vs. −0.77 for ln(AbsCO) and −0.64 vs. −0.47 for Rg, respectively). L, on the contrary, correlates better with the logarithm of the unfolding rate of two-state eukaryotic proteins (−0.61 vs. −0.75). For multi-state proteins, we observed the same picture as for two-state proteins for the correlations of L and Rg with the logarithm of the unfolding rate. Both Vasa/Sasa and ln(AbsCO) correlate better with the logarithm of the unfolding rate of multi-state eukaryotic proteins than with that of bacterial proteins (see Table 6).
Table 6. Correlations between the logarithm of the unfolding rate (ln(ku)) and structural parameters (L, Vasa/Sasa, ln(AbsCO), and Rg) for two-state and multi-state bacterial and eukaryotic proteins.
The amino acid composition of bacterial and eukaryotic proteins was analyzed (Figure 5). The bacterial proteins with two-state kinetics are enriched in Ala, Gly, Lys, and Asn, compared with eukaryotic proteins with two-state kinetics. The eukaryotic proteins with two-state kinetics contain more His, Leu, Pro, Arg, Ser, and Trp than the bacterial proteins with two-state kinetics (see Figure 5).
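A composition comparison of this kind can be sketched as below. The sketch pools residues over all sequences in a group before computing fractions; whether the original analysis pooled residues or averaged per-protein fractions is not stated, so this is an assumption, and the sequences shown are hypothetical fragments rather than proteins from the dataset.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # one-letter codes of the 20 standard residues

def composition(sequences):
    """Fraction of each standard residue, pooled over all given sequences."""
    counts = Counter()
    for seq in sequences:
        counts.update(ch for ch in seq.upper() if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no standard residues found")
    return {aa: counts[aa] / total for aa in AMINO_ACIDS}

def enrichment(group_a, group_b):
    """Per-residue difference in fractions (group_a minus group_b)."""
    fa = composition(group_a)
    fb = composition(group_b)
    return {aa: fa[aa] - fb[aa] for aa in AMINO_ACIDS}

# Hypothetical sequence fragments, for illustration only.
group_a = ["MKAGKNAAGK", "AKGNLKAG"]
group_b = ["MHLPRSWLSR", "PLHRSW"]

diff = enrichment(group_a, group_b)
for aa, delta in sorted(diff.items(), key=lambda kv: kv[1], reverse=True)[:4]:
    print(f"{aa}: {delta:+.3f}")
```

Positive differences mark residues that are more frequent in the first group, negative ones residues that are more frequent in the second.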


Unfolding Rates of Proteins from Thermophilic and Mesophilic Organisms
Since a lot of attention has been paid to the search for differences between thermophilic and mesophilic proteins, in particular in their folding rates, we also decided to conduct our analysis for these proteins. All bacterial proteins were divided into thermophilic and mesophilic groups. Hereafter, we refer to proteins from thermophilic organisms as thermophilic proteins and to proteins from mesophilic organisms as mesophilic proteins. The correlation between the logarithms of the unfolding and folding rates is better for mesophilic (0.76) than for thermophilic (0.73) proteins. Moreover, for mesophilic proteins, this correlation is better for two-state (0.76) than for multi-state proteins (0.12). For thermophilic proteins, no conclusion can be drawn, because there are only two proteins with multi-state kinetics. There is a correlation between stability and the logarithm of the unfolding rate for thermophilic (0.90) and mesophilic (0.73) proteins. The logarithms of the folding rates of thermophilic and mesophilic proteins are almost the same (4.75 ± 1.20 vs. 4.58 ± 0.79) (Figure 6 and Table 7). Still, mesophilic proteins unfold faster than thermophilic proteins (logarithm of the unfolding rate of −3.27 ± 1.12 for mesophilic vs. −5.63 ± 2.31 for thermophilic proteins). The same picture is observed for two-state thermophilic and mesophilic proteins. Schematic "chevron" plots for thermophilic and mesophilic proteins are presented in Figure 7.
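Because the rates are reported as natural logarithms, a difference between two mean ln(ku) values can be converted into a ratio of rates, or into base-10 orders of magnitude, with a short helper such as the one below. The sketch assumes that the higher mean, −3.27, belongs to the mesophilic proteins, as "unfold faster" implies, and that the values are natural logarithms; the exponential of a difference of mean logarithms is a ratio of geometric means, not of arithmetic mean rates. The function names are illustrative.

```python
import math

def fold_change_from_ln(ln_rate_a, ln_rate_b):
    """Ratio rate_a / rate_b, given the natural logarithms of both rates."""
    return math.exp(ln_rate_a - ln_rate_b)

def orders_of_magnitude(ln_rate_a, ln_rate_b):
    """Base-10 orders of magnitude separating rate_a from rate_b."""
    return (ln_rate_a - ln_rate_b) / math.log(10)

# Mean ln(ku) values quoted in the paragraph above (assignment to groups is an assumption).
ln_ku_mesophilic = -3.27
ln_ku_thermophilic = -5.63

ratio = fold_change_from_ln(ln_ku_mesophilic, ln_ku_thermophilic)
decades = orders_of_magnitude(ln_ku_mesophilic, ln_ku_thermophilic)
print(f"ratio of geometric-mean rates: {ratio:.1f}")
print(f"orders of magnitude: {decades:.2f}")
```

The same helper applies to any pair of group means in the text, for example the two-state and multi-state subsets, whose values may differ from the overall means used here.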

Finally, the correlations of the logarithm of the unfolding rate and structural parameters for thermophilic and mesophilic proteins were examined (Table 8).
Table 8. Correlations between the logarithm of the unfolding rate (ln(ku)) and structural parameters (L, Vasa/Sasa, ln(AbsCO), and Rg) for two-state and multi-state thermophilic and mesophilic proteins.
The thermophilic proteins are enriched with Lys, Arg, and Val, in comparison with the mesophilic proteins, and enriched in Lys, Asp, Ala, and the mesophilic proteins contain more Asp, Asn, Ser, and Thr, in comparison with the thermophilic proteins (Figure 9). The same can be said about two-state thermophilic and mesophilic proteins. These data are also consistent with those that we obtained earlier in the study of 373 pairs of structurally similar thermophilic and mesophilic proteins [27].


Discussion
In this paper, we tried to find parameters that are important for predicting protein unfolding rates. For this purpose, a database of 108 proteins with known unfolding and folding rates was used, and such structural parameters as L, ln(AbsCO), Vasa/Sasa, and Rg were considered.
A good correlation (0.79) between the logarithm of the unfolding rate and protein stability was observed for the 108 proteins.
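The stability measure used in the Results, −(ln kf − ln ku), can be made concrete with the group means reported there. For a two-state protein, where the equilibrium constant is kf/ku, this quantity equals the folding free energy in units of RT, so more negative values mean a more stable protein. The sketch below only combines the reported means; because both means are taken over the same proteins, the difference of the means equals the mean of the per-protein differences, but the ± values cannot be propagated without the underlying data. The function name is illustrative.

```python
def stability(ln_kf, ln_ku):
    """Stability measure from the text: -(ln kf - ln ku).

    For two-state kinetics this is the folding free energy divided by RT,
    so a more negative value corresponds to a more stable protein.
    """
    return -(ln_kf - ln_ku)

# Mean ln kf and ln ku reported in the Results for the two kinetic classes.
two_state = stability(6.08, -1.51)
multi_state = stability(2.51, -6.09)

print(f"two-state:   {two_state:.2f}")    # -7.59
print(f"multi-state: {multi_state:.2f}")  # -8.60
```

A positive correlation between this measure and ln(ku) therefore means that proteins with a lower unfolding rate tend to be more stable.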
First, we divided the proteins in our database into two-state and multi-state proteins. On average, the logarithms of the folding and unfolding rates are higher for two-state proteins than for multi-state proteins. A good correlation (not lower than 0.70) between the logarithms of the folding and unfolding rates was observed for both two- and multi-state proteins. The logarithm of the unfolding rate of two-state proteins correlates better with Vasa/Sasa (−0.79) and ln(AbsCO) (−0.87), and that of multi-state proteins correlates better with L (−0.71).
Then, we separately studied bacterial and eukaryotic proteins from our database. Two-state bacterial proteins fold and unfold faster than two-state eukaryotic proteins, and multi-state eukaryotic proteins fold and unfold faster than multi-state bacterial proteins. The logarithm of the unfolding rate of two-state bacterial proteins correlates better with ln(AbsCO) (−0.86) and Rg (−0.64), and that of two-state eukaryotic proteins correlates better with L (−0.75). For multi-state proteins, the following picture is observed: the logarithm of the unfolding rate of bacterial proteins correlates better with Rg (−0.58), and that of eukaryotic proteins correlates better with L (−0.77), Vasa/Sasa (−0.69), and ln(AbsCO) (−0.81).
Finally, we separately studied the thermophilic and mesophilic bacterial proteins from our database. The logarithm of the unfolding rate correlates with protein stability for both thermophilic (0.90) and mesophilic proteins (0.73). The unfolding rates of thermophilic proteins are about two orders of magnitude lower than those of mesophilic proteins, whereas the folding rates of thermophilic and mesophilic proteins are almost the same. The logarithm of the unfolding rate of two- and multi-state thermophilic proteins correlates better with all considered structural parameters (L, Vasa/Sasa, ln(AbsCO), and Rg) than that of the mesophilic proteins.
We have tried to find out which parameters are most important for the prediction of the unfolding rates for proteins from different structural classes (α, β, α/β, and α + β); proteins of different origins (bacterial and eukaryotic); and proteins from different organisms (thermophilic and mesophilic).

Conclusions
Thus, it has been shown that there is a good correlation between the logarithms of the unfolding and folding rates (0.79) and between the logarithm of the unfolding rate and protein stability (0.79) for 108 proteins. The correlation between the logarithms of the unfolding and folding rates is better for: two-state (0.78) than for multi-state (0.73) proteins; the α and β (0.78 and 0.75) than for the α/β and α + β (0.59 and 0.60) structural classes; eukaryotic (0.75) than for bacterial (0.73) proteins; and mesophilic (0.76) than for thermophilic (0.73) proteins. Among the parameters considered (L, Vasa/Sasa, ln(AbsCO), and Rg), ln(AbsCO) correlates best with the logarithm of the unfolding rate for all 108 proteins; for proteins from the α and β structural classes; and for bacterial, eukaryotic, and mesophilic proteins.

Supplementary Materials:
The following are available online: http://www.mdpi.com/2218-273X/10/2/197/s1. Table S1. Logarithm of folding and unfolding rates and some structural parameters for 108 proteins. Table S2. Average values of different parameters for four structural classes of proteins (α, β, α/β, and α + β).

Abbreviations
ln(kf), logarithm of the folding rate; ln(ku), logarithm of the unfolding rate; ln(kmt), logarithm of the mid-transition folding/unfolding rate; L, length of the protein; rCO, relative contact order; ln(AbsCO), logarithm of the absolute contact order; Vasa/Sasa, radius of cross-section; Rg, radius of gyration.