A Combination of Feature Selection and Random Forest Techniques to Solve a Problem Related to Blast-Induced Ground Vibration

In mining and civil engineering applications, a reliable and proper analysis of ground vibration due to quarry blasting is an extremely important task. While advances in machine learning have led to numerous powerful regression models, the usefulness of these models for modeling the peak particle velocity (PPV) remains largely unexplored. Using an extensive database comprising quarry site datasets enriched with vibration variables, this article compares the predictive performance of five machine learning models, namely classification and regression trees (CART), chi-squared automatic interaction detection (CHAID), random forest (RF), artificial neural network (ANN), and support vector machine (SVM), for PPV analysis. Before these models were developed, feature selection was applied in order to select the most important input parameters for PPV. The results of this study show that RF performed substantially better than any of the other investigated regression models, including the frequently used SVM and ANN models. The results and process analysis of this study can be utilized by other researchers and designers in similar fields.


Introduction
Blasting is one of the most frequently used techniques in rock excavation. However, it may result in several environmental impacts, including air over-pressure, back break, fly-rock, and ground vibration [1][2][3][4][5][6][7][8]. Typically, these impacts result from the waste of a huge amount of explosive energy. The existing literature on surface mining and blasting operations confirms that ground vibration is one of the most undesirable phenomena and can damage surrounding structures [9][10][11][12]. This includes damage to nearby rock masses, underground workings, slopes, railroads, roads, existing groundwater conduits, and the ecology of the adjacent area [13][14][15][16]. Therefore, an accurate estimation of ground vibration helps engineers reduce the environmental impacts of blasting.

Table 1. Summary of the previous studies on peak particle velocity (PPV). ANFIS-adaptive neuro-fuzzy inference system; ANN-artificial neural network; FIS-fuzzy inference system; SVM-support vector machine; PSO-particle swarm optimization; ICA-imperialism competitive algorithm; CART-classification and regression trees; GEP-gene expression programming.

Methods
In the present study, the authors compared five ML techniques to identify the best-performing method. Firstly, two decision tree methods (CART and CHAID) that can predict a target variable with continuous values were selected. The authors then developed another tree-based model, random forest (RF), since several studies in the literature acknowledged its better performance compared to other tree-based methods. RF uses an ensemble technique that creates a final prediction from more than 100 single trees; thus, the results should be more accurate. Finally, the authors added two more models, SVM and ANN, to provide a more comprehensive comparison, since these two models showed desirable performance in previous studies and can also be used to predict a continuous target variable. Before developing these models, a feature selection (FS) technique was used to reduce the dimensionality of the data and to identify the most important and relevant input variables.

Input Selection Technique
Several techniques and methods are available to select the relevant input variables before developing a new model. FS is one of the most frequently used techniques in ML for input selection [32]. It aims to reduce the dimensionality of the data and remove irrelevant inputs. FS improves the predictive accuracy of ML in several ways; for example, it enhances the efficiency of learning and the effectiveness of data collection [33]. FS assesses variable quality using the correlation between the input variables and the target variable, and it selects the variables with the highest correlations. Typically, FS ranks the input variables according to the intrinsic properties of the data and chooses the top k variables according to thresholds. The screening rules and cut-off values are shown in Table 2.
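The filter-style FS step described above can be sketched as follows. This is a minimal illustration of ranking inputs by their absolute correlation with the target and keeping the top k; it uses synthetic data and hypothetical variable names, not the study's screening rules from Table 2.

```python
import numpy as np

def select_top_k(X, y, names, k):
    """Rank input variables by absolute Pearson correlation with the
    target and keep the top k (a simple filter-type FS step)."""
    corrs = [abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(X.shape[1])]
    order = np.argsort(corrs)[::-1]  # strongest correlation first
    return [names[j] for j in order[:k]]

# Toy data: x0 drives y, x1 is pure noise, x2 is anti-correlated with y.
rng = np.random.default_rng(0)
x0 = rng.normal(size=200)
x1 = rng.normal(size=200)
x2 = -x0 + rng.normal(scale=0.1, size=200)
y = 3 * x0 + rng.normal(scale=0.1, size=200)
X = np.column_stack([x0, x1, x2])

print(select_top_k(X, y, ["x0", "x1", "x2"], k=2))
```

Note that a pure correlation filter ranks x2 highly despite its redundancy with x0; this is why FS thresholds and screening rules (Table 2) still require engineering judgment.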

Decision Trees
Decision trees are among the most popular ML methods. There are several algorithms for developing decision tree models, including classification and regression trees (CART), chi-squared automatic interaction detection (CHAID), the quick, unbiased, efficient statistical tree (QUEST), and C5. Among these algorithms, only CART and CHAID can be used for continuous target variables. CART is a binary decision tree, which was developed by Breiman [83]. CART seeks to achieve the best feasible split; to choose the best partition, it attempts to lessen the impurity of the leaf nodes. CART uses three main indices to select the best partition: the Gini criterion, entropy, and the twoing criterion. The performance of CART is influenced by the selection of these indices. A simple presentation of the CART structure is given in Figure 1.

CHAID is a widely used, non-binary decision tree, which was developed by Kass [84]. CHAID creates the decision tree using several sequential combinations and splits, based on a chi-square test. To avoid overfitting, one of the most common drawbacks of decision trees, it automatically prunes the tree. CHAID also produces a number of rules, each with a confidence level (accuracy). The confidence level is defined as the ratio of records having the specific value for the target variable to the given values for the independent variables.
To develop the CART model, the maximum tree depth was set as five, meaning that the sample was recursively split at most five times and that there are five levels below the root node. To avoid overfitting and to deal with missing values, the authors pruned the tree using the "maximum surrogates" technique, which eliminates bottom-level splits that do not substantially improve the accuracy of the tree. To control how the CART tree is constructed, two stopping rules were considered: (1) the minimum records in the parent branch and (2) the minimum records in the child branch. The former was set as two, preventing a split if the number of records in the parent node to be divided was less than two. The latter was set as one, preventing a split if the number of records in any child branch created by the split was less than one. The authors used two parameters to determine the ensemble behavior during boosting and bagging: (1) the combining rule for a continuous target and (2) the number of component models for boosting and/or bagging. The mean of the predicted values from the base models was used to combine the ensemble values for the continuous target of this study.
For boosting and bagging, the number of base models to build was set as 10; this number may improve the accuracy and stability of the CART model. To fine-tune the CART tree-building process, two advanced parameters were used: (1) the minimum change in impurity, which was used to create a new split in the tree and was set as 0.0001, and (2) the overfit prevention set, which tracks errors during training to prevent CART from modeling chance variation in the data; the percentage of records was set as 30.
Similarly to the CART model, the same values were used for the maximum tree depth, minimum records in the parent branch, minimum records in the child branch, combining rule for a continuous target, and number of component models for boosting and/or bagging to develop the CHAID model. However, more settings were used to develop the CHAID decision tree model than the CART model. To fine-tune the CHAID tree-building process, five parameters were used: (1) the significance level for splitting, (2) the significance level for merging, (3) the technique for adjusting significance values, (4) the minimum change in expected cell frequencies, and (5) the maximum iterations for convergence. The significance levels for splitting and merging were set as 0.05. To control the false-positive error rate, the Bonferroni method was used to adjust the significance values. In order to converge on the optimal estimate used in the chi-square test for a specific split, the minimum change in expected cell frequencies was set as 0.001. The last parameter specified the maximum number of iterations before stopping, whether convergence occurred or not.
Decision trees are regarded as unstable, high-variance models, which makes them prone to different generalization behavior with small changes in the training data [85]. The RF method can be used to avoid these drawbacks. RF is a relatively new tree-based method and was developed by Breiman [86]. RF is an ensemble technique that produces more accurate predictions than other tree-based methods by merging a huge number of decision trees. RF uses a bagging technique to build each ensemble member from a different bootstrap sample of the data. Bagging alone tends to sample from the space of decision trees in a way that can produce almost identical (low-diversity) predictions; RF therefore also selects a random subset of input variables at each split, which increases the diversity among the component trees.
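The RF idea above, bagging plus randomized splitting, can be sketched with scikit-learn. The input names and the synthetic PPV-like relationship below are illustrative assumptions only, loosely inspired by the study's variables, not its actual data or results.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Hypothetical stand-ins for blast-design inputs (names for illustration only).
rng = np.random.default_rng(1)
n = 400
MC = rng.uniform(800, 9500, n)   # maximum charge per delay (kg)
D = rng.uniform(280, 530, n)     # distance from blast face (m)
PF = rng.uniform(0.3, 1.2, n)    # powder factor
X = np.column_stack([MC, D, PF])
# Synthetic PPV-like target: grows with charge, decays with distance.
y = 100 * np.sqrt(MC) / D**1.3 * (1 + 0.5 * PF) + rng.normal(scale=0.2, size=n)

# 100 trees, each grown on a bootstrap sample with randomized splits,
# combined by averaging; importances come as a by-product of training.
rf = RandomForestRegressor(n_estimators=100, random_state=0)
rf.fit(X, y)
for name, imp in zip(["MC", "D", "PF"], rf.feature_importances_):
    print(f"{name}: {imp:.2f}")
```

The `feature_importances_` attribute is the mechanism behind the variable-importance comparison reported later (Figure 4); because the importances are averaged over many diverse trees, they are less sensitive to masking by correlated inputs than a single tree's importances.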

Artificial Neural Network
The artificial neural network (ANN) is a non-linear statistical data modeling technique for decision-making [87]. ANN is able to find patterns in data, or to unveil relations between input variables and the target variable. ANN is applied in different fields, such as function approximation, classification, and data processing. This technique uses a connectionist approach for computation. In most cases, an ANN can be viewed as an adaptive system that modifies its structure according to internal and external information flowing through the network during the learning stage. In an ANN, the weights represent the connections of the biological neurons: an excitatory connection is represented by a positive weight, while an inhibitory connection is represented by a negative value. Typically, the ANN uses a multilayer perceptron (MLP) approximator, which is a class of feed-forward ANN. An MLP comprises at least three layers of nodes: an input layer, a hidden layer, and an output layer. Its multiple layers differentiate the MLP from a linear perceptron; in fact, it can differentiate data that are not linearly separable [87].
The authors developed the ANN model through an MLP approximator, which allowed for more complex associations at the possible cost of increased training and scoring time. Moreover, the authors set the maximum number of minutes for the algorithm to run as 15; once an ensemble model is created, this is the training time allowed for each component model of the ensemble. The authors used the mean of the predicted values from the base models to combine the ensemble values for the continuous target of this study. For boosting and bagging, the number of base models to build was set as 10; this number may enhance the accuracy and stability of the model. To control the options that do not fit neatly into other groups of settings, the authors used two parameters: (1) the overfit prevention set and (2) missing values in predictors. The former is an independent set of data records used to track errors during training to prevent the method from modeling chance variation in the data; the percentage of records was set as 30. The latter specifies how to deal with missing values; the authors used the listwise deletion approach, which eliminates records with missing values on predictors from model building.
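A minimal MLP regressor for a continuous target can be sketched as follows. This is not the authors' configuration (their tool, hidden-layer size, and time limit are not reproduced); the architecture, iteration cap, and synthetic data are assumptions for illustration, and input scaling is added because MLP training is sensitive to feature scales.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic non-linear regression problem.
rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(500, 2))
y = X[:, 0] ** 2 + X[:, 1] + 0.05 * rng.normal(size=500)

# Feed-forward MLP with one hidden layer of 16 units; inputs are
# standardized first, which is standard practice for MLP training.
mlp = make_pipeline(
    StandardScaler(),
    MLPRegressor(hidden_layer_sizes=(16,), max_iter=2000, random_state=0),
)
mlp.fit(X, y)
print(round(mlp.score(X, y), 3))  # training R^2
```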

Support Vector Machine
Support vector machine (SVM) is one of the most frequently used methods for regression and classification problems [88]. Several advantages of SVM are acknowledged in the literature; for example, this method can be effectively applied to high-dimensional and linearly non-separable datasets [89,90]. According to Cortes and Vapnik [91], statistical learning theory is the basic theory behind SVM. Several kernel functions, including linear, radial basis function, sigmoid, and polynomial, also influence the performance of SVM [92]. The SVM aims to determine an ideal separation hyperplane that can distinguish two classes [92]. Furthermore, the SVM can reduce both the error (for training and testing datasets) and the complexity of the model [93].
To develop the SVM model and transform the data in this study, the authors used the radial basis function (RBF) as the kernel function. It is worth mentioning that the authors developed SVM models using different kernel functions, including linear, RBF, sigmoid, and polynomial; since the RBF resulted in higher accuracy than the other kernels, the model developed using the RBF was used in this study. Additionally, a stopping criterion was used to determine when to stop the optimization algorithm, with values ranging from 1.0 × 10⁻¹ to 1.0 × 10⁻⁶. The authors ran the SVM models using different values; according to the results, the value of 1.0 × 10⁻³ was selected as the best, since it resulted in the highest accuracy and lowest training time among the stopping criterion values. A regularization parameter (C) was used to control the trade-off between maximizing the margin and minimizing the training error term. C values ranged from 1 to 10; because higher values of C may result in higher accuracy, the authors used the value of 10.
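The reported SVM settings (RBF kernel, stopping tolerance 1.0 × 10⁻³, C = 10) map naturally onto scikit-learn's `SVR`, sketched below on synthetic data. The data and the scaling step are illustrative assumptions; the study's tool and remaining hyperparameters are not reproduced.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic one-dimensional regression problem.
rng = np.random.default_rng(0)
X = rng.uniform(0, 6, size=(300, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=300)

# RBF kernel, stopping tolerance 1e-3, regularization C = 10,
# mirroring the settings reported in the text.
svr = make_pipeline(StandardScaler(), SVR(kernel="rbf", tol=1e-3, C=10))
svr.fit(X, y)
print(round(svr.score(X, y), 3))  # training R^2
```

Larger C values penalize training errors more heavily, which is the margin/error trade-off described above; with C = 10 the fit follows the training data closely.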

Case Study and Data Collection
In this research, an extensive dataset was collected from a mine with ongoing blasting operations. A general view of the granite mine is shown in Figure 2. Blasting operations are common in this mine and are repeated at different intervals. The production capacity of this mine is about 500-700 thousand tons per year, and blasting operations use large quantities of explosives: charges of 856 kg to 9420 kg are usually used in holes with diameters between 76 mm and 89 mm. These large-scale blasting operations increase the hazards in the mine. In this research, 102 blasting operations were accurately measured and evaluated with all design details. Important parameters that help determine PPV values, such as the burden, spacing, hole diameter, hole depth, total charge, number of holes, stemming length, maximum charge per delay, powder factor, sub-drilling, and distance from the blast face, were measured and recorded. Additionally, a Vibra ZEB seismograph was used for PPV recording; the distance from the seismograph to the blast was about 280-530 m. As discussed in Section 1, the influence of these parameters on blast-induced ground vibration is well established, and previous research also provides recommendations on the selection of effective parameters, the importance of each item, and design issues, which can be consulted for further information. Finally, the input data collected for prediction and simulation with the mathematical and probabilistic models are summarized in Table 3, where the ranges of the recorded data can be seen. The relationships between the PPV and the other input indicators are demonstrated in the correlation matrix plot (Figure 3), which shows the pairwise relationship between each pair of indicators with the corresponding correlation coefficients.
In the remaining sections, after input selection through the FS technique, five ML methods, i.e., RF, CART, CHAID, ANN, and SVM, are applied to predict PPV, and then the best PPV prediction model is selected and introduced.

Input Selection Using FS Technique
An FS technique was used to reduce the dimensionality of the data and select the most important input variables. Table 4 shows the results obtained from the FS technique. The FS technique reduced the number of input variables from six to five. Thus, FS identified the maximum charge per delay (MC), hole depth (HD), stemming (ST), powder factor (PF), and distance (D) as the most important input variables. These variables were then used to develop five ML models to predict the PPV.

Development of ML Models
The authors of this study developed five ML models, namely RF, CART, CHAID, SVM, and ANN. Each of these models was evaluated in terms of performance using several indices. These models also determined the importance of each input variable for predicting the PPV; thus, the authors compared the importance of each input variable across the developed models (Figure 4). According to the chart, PF was the most important variable for predicting the PPV in the CHAID, RF, and SVM models, while the CART and ANN models identified D as the most important variable. It is worth noting that RF uses an ensemble approach to develop the final prediction; thus, the RF results may be more reliable than those of the other decision tree methods.


Assessment of the Proposed Models
The present study used five performance indices to assess the performance of the developed models: the coefficient of determination (R²), mean absolute error (MAE), root-mean-square error (RMSE), variance accounted for (VAF), and a20-index [32,40-42,44,94-97]. These indices were widely used in previous studies for the performance assessment of ML models. The computation formulas are presented in Figure 5. This study also used a simple ranking system proposed by Zorlu et al. [86] to rank the performance of each proposed PPV prediction model. Before model development, the authors split the data into training (70%) and testing (30%) datasets. For each dataset, this system assigned a rank of five to the model with the best value for each performance index and a rank of one to the model with the worst value. The total performance rating of each model was calculated by summing its ranks across the indices for each dataset (Equation (1)), and the final rank of each model was obtained by summing its training and testing dataset totals. Table 5 shows the results of the performance evaluation and the final ranking of each proposed model. Overall, the RF model obtained the highest final rank (41) by far, followed by ANN, CHAID, and SVM; CART achieved the lowest final rank (10). RF had the highest total rank (23) for the training dataset, while SVM and ANN had the highest total rank (21) for the testing dataset. The CART model had the lowest total rank for both the training and testing datasets.
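The five indices can be computed as below. The formulas in Figure 5 are not reproduced here, so these are the standard textbook definitions of R², MAE, RMSE, VAF, and the a20-index (fraction of predictions within ±20% of the measured value); the toy numbers are illustrative only.

```python
import numpy as np

def indices(y_true, y_pred):
    """Standard definitions of the five performance indices in the text."""
    resid = y_true - y_pred
    r2 = 1 - np.sum(resid**2) / np.sum((y_true - y_true.mean()) ** 2)
    mae = np.mean(np.abs(resid))
    rmse = np.sqrt(np.mean(resid**2))
    vaf = 100 * (1 - np.var(resid) / np.var(y_true))
    # a20-index: fraction of predictions within +/-20% of the measured value.
    a20 = np.mean(np.abs(resid) / np.abs(y_true) <= 0.20)
    return r2, mae, rmse, vaf, a20

# Toy measured vs. predicted PPV values (mm/s), for illustration only.
y_true = np.array([2.1, 3.4, 5.6, 7.8, 4.2])
y_pred = np.array([2.0, 3.6, 5.9, 7.5, 4.0])
r2, mae, rmse, vaf, a20 = indices(y_true, y_pred)
print(f"R2={r2:.3f} MAE={mae:.3f} RMSE={rmse:.3f} VAF={vaf:.1f}% a20={a20:.2f}")
```

The Zorlu-style ranking then simply orders the models by each index per dataset and sums the per-index ranks.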
The performance of the proposed models was also evaluated using a gain chart, shown in Figure 6. Gains are simply the proportion of total hits that occur in each percentile or increment. In the chart, the diagonal red line denotes the at-chance model, the blue (highest) line denotes the perfect prediction model, and the five lines in between represent the models developed in this study. Higher lines indicate better models, particularly on the left side of the gain chart. The areas between each model and the red line show how much better the proposed models are than the at-chance model, while the areas between each model and the perfect model show where the proposed models can be improved; the aim is for the developed models to be as close to the perfect model as possible. In this analysis, samples with a PPV of greater than 5.59 mm/s were treated as the target (hit) category.
For the training dataset, after going through 40% of the data, the at-chance model had correctly identified 40% of the samples with a PPV of greater than 5.59 mm/s, whereas the perfect model had identified 100% of them. The developed models each reached 100% of these samples after going through 44% (RF), 46% (CHAID), 54% (ANN), 55% (SVM), and 68% (CART) of the data. The gain chart thus illustrates that the RF model performed better than the other proposed models for the training dataset, since its curve was closest to the perfect model. For the testing dataset, the ANN and SVM models performed slightly better than the RF model and the other proposed models.
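The gains values above can be reproduced from first principles: sort the records by descending model score and count what fraction of all hits (PPV > 5.59 mm/s) falls in the top portion. The sketch below uses made-up PPV values and a hypothetical "perfect" score for illustration.

```python
import numpy as np

def gains_at(frac, scores, hits):
    """Fraction of all hits captured in the top `frac` of records
    when sorted by descending model score (one point on a gains curve)."""
    order = np.argsort(scores)[::-1]
    n_top = int(round(frac * len(scores)))
    return hits[order[:n_top]].sum() / hits.sum()

# Hit = PPV above the 5.59 mm/s threshold used in the text (toy values).
ppv = np.array([2.0, 8.1, 6.3, 4.9, 9.7, 1.2, 5.8, 3.3, 7.4, 0.9])
hits = (ppv > 5.59).astype(int)

# A perfect model scores every hit above every non-hit.
perfect_score = ppv
print(gains_at(0.5, perfect_score, hits))  # captures all 5 hits in top 50%
```

An at-chance model's curve is the diagonal because a random ordering captures hits in proportion to the fraction of records examined.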
The authors provide a comparison between the R² values achieved in the previous studies listed in Table 1 and the current study. Figure 7 shows the average number of input variables and the average R² values for several types of models. This figure shows that the ANFIS models, with two input variables, obtained an average R² value of 0.98, the highest among the models. ANFIS was followed by SVM and hybrid models (PSO-ANN and ICA-ANN), which had 4.5 and eight input variables on average and achieved an average R² value of 0.93. In addition, ANN and FIS models, which had 2.6 and 3.3 input variables on average, obtained an average R² value of 0.92. Turning to the present study, the authors used a hybrid approach to develop several models using six input variables. The FS-RF model developed in this study showed a higher R² than the previous hybrid models, including PSO-ANN and ICA-ANN, while using fewer input variables. The other models developed in this study achieved a lower R² than the previously developed models.

Comparing the results of this study with similar studies in terms of predictive techniques, the present study has some advantages. For example, Hasanipanah et al. [78] only developed a CART model (one of the techniques implemented in this study) to predict blast-induced PPV, using the MC and D parameters as inputs. In addition, Bui et al. [98] developed a hybrid model of PSO and k-nearest neighbors (KNN) to estimate the blast-induced PPV, using a database of 152 blasting events and three different kernel functions for the KNN models. They used two input variables, MC and D, for predicting PPV, and compared the results of the PSO-KNN models with RF and support vector regression (SVR) in terms of R², RMSE, and MAE. Their results show that the R² values of PSO-KNN (training = 0.982, testing = 0.977) were higher than those of the RF (training = 0.996, testing = 0.953) and SVR (training = 0.973, testing = 0.944) models for the testing dataset. The difference between our study and the study by Bui et al. [98] lies in three aspects: (1) different input parameters, (2) a different case study, type of rock mass, and region, and (3) different predictive techniques. According to the above discussion, the methods presented in this study have a high level of originality and novelty in the field of blasting environmental issues.

Summary and Conclusions
This study set out to compare different ML techniques for predicting ground vibration (PPV) due to quarry blasting. Five methods, namely RF, CART, CHAID, ANN, and SVM, were used. Before developing the models, an FS method was used to identify the most important input variables and reduce the dimensionality of the data. Concerning variable importance, the analysis showed that the magnitudes of variable importance differed between the considered models, indicating that these models identified the relationships between variables in a dissimilar manner. While RF performed better on the training dataset than the other methods used in this study, ANN and SVM together obtained the highest rank for the testing dataset. The higher rank of RF for the training dataset shows that its flexibility, achieved by combining multiple decision trees, is particularly useful for predicting ground vibration. The performance of RF was significantly better than that of other tree-based regression models such as CHAID and CART. This difference can be connected to the larger diversity among the learned trees of RF, a consequence of RF's procedure of randomized splitting at nodes; typically, ensemble regression models perform better when there is notable diversity among the component models [99]. Moreover, the performance of CART was inferior to both RF and CHAID, which can be explained by the fact that CART models are more prone to overfitting, while RF and CHAID conceptually aim to reduce model variance and consequently avoid overfitting.
The aim of the FS techniques is not only to increase model accuracy; they also aim to reduce model dimensionality/complexity, training time, and the risk of overfitting [100]. In addition, these techniques may increase the generalization ability of the model [101]. It is worth mentioning that FS techniques are heuristic: they are only an indication of what may work well. The authors of the present study compared the predictive accuracies achieved by the different models with and without FS. It was found that the hybridization of FS and RF increased the R² value while decreasing the number of input variables. These results confirm the effectiveness of employing the FS technique before developing the RF model to predict the PPV.
The evaluation using the gain chart confirmed the results of the performance indices. The RF performance for the training dataset was better than that for the testing dataset: the RF model correctly predicted 100% of the samples with a PPV of greater than 5.59 mm/s after going through 44% of the training dataset, whereas for the testing dataset it required 65% of the data to do so.
The better performance of RF over SVM and ANN was relatively unexpected, since both SVM and ANN use more advanced techniques than RF to predict the target variable. The findings of this study suggest that RF can be used as an effective method to predict the PPV. Moreover, RF is a more powerful method than other decision tree methods for determining the importance of input variables, as it considers a huge number of trees (e.g., 100) and prevents an important input from being masked by another correlated input. However, caution must be taken when applying the results to other studies, since the present study used only five input variables, namely MC, HD, ST, PF, and D. Future studies can develop the RF model to predict the PPV using more or different input variables.

Conflicts of Interest:
The authors declare no conflicts of interest.