Hybrid Integration Approach of Entropy with Logistic Regression and Support Vector Machine for Landslide Susceptibility Modeling

The main purpose of the present study is to apply three classification models, namely the index of entropy (IOE) model, the logistic regression (LR) model, and the support vector machine (SVM) model with a radial basis function (RBF) kernel, to produce landslide susceptibility maps for Fugu County, Shaanxi Province, China. First, landslide locations were extracted from field investigation and aerial photographs, and a total of 194 landslide polygons were transformed into points to produce a landslide inventory map. Second, the landslide points were randomly split into two groups (70/30) for training and validation, respectively. Then, 10 landslide explanatory variables (slope aspect, slope angle, altitude, lithology, mean annual precipitation, distance to roads, distance to rivers, distance to faults, land use, and normalized difference vegetation index (NDVI)) were selected, and potential multicollinearity among them was checked with the Pearson correlation coefficient (PCC), the variance inflation factor (VIF), and tolerance (TOL). Subsequently, landslide susceptibility maps for the study region were obtained using the IOE model, the LR–IOE model, and the SVM–IOE model. Finally, the performance of these three models was verified and compared using the receiver operating characteristic (ROC) curve. In terms of success rate, the LR–IOE model has the highest accuracy (90.11%), followed by the IOE model (87.43%) and the SVM–IOE model (86.53%). The prediction rate shows the same ranking, with the LR–IOE model having the highest accuracy (81.84%), followed by the IOE model (76.86%) and the SVM–IOE model (76.61%). Thus, the landslide susceptibility map (LSM) for the study region can provide an effective reference for the Fugu County government in land planning and landslide risk mitigation.


Introduction
Landslides often occur in mountainous and hilly areas and are one of the most dangerous geological disasters [1]. Landslides can cause huge economic losses and a large number of casualties. According to statistics, almost 1000 people and 4 billion dollars are lost annually in the world [2], and this figure still keeps growing. China is also a region where landslides frequently occur; it has been reported that 7122 geological disasters occurred in 2017, causing 327 deaths, 173 injured, 25 missing, and a loss of 3.54 billion CNY [3]. In addition, in northwestern China, landslides pose a greater threat to

Study Area
Fugu County, whose geographic coordinates are 110°25′ to 111°15′ east longitude and 38°42′ to 39°33′ north latitude, covers an area of 3229 km² (Figure 1). The elevation in the study area is between 761 and 1423 m above sea level, and increases from east to west. The temperate zone with an arid continental monsoon climate is the main climate type in the study region; the historical maximum and minimum temperatures are 38.9 °C and −24 °C, while the average annual temperature is 9.1 °C. The average annual rainfall is 428.6 mm, and the geographical distribution of rainfall shows a gradual increase from northwest to southwest. Most of the precipitation is concentrated from July to September, accounting for 69% of the annual rainfall. There are 62 rivers with drainage areas above 1 × 10⁷ m² in the study region, and the average annual runoff is 5.911 × 10⁹ m³. The overall topography of the study area is high in the northwest and low in the southwest. The main landform types can be divided into four types as follows: loess girder landform, loess gully landform, canyon hilly landform, and valley terraces. The dip direction of the rock formations is roughly southwest-northwest, with a dip angle of approximately 5-8 degrees except for a few areas, where it is about 20 degrees. The Carboniferous-Permian strata in the east and the Jurassic strata in the northwest are coal-bearing strata, and the lithology in the study area is shown in Table 1.
Owing to the rich coal resources in the study area, the mining industry is well developed and the population is concentrated, which has caused serious damage to the environment. At the same time, it has also led to massive landslides.

Landslide Inventory Map
A landslide inventory map is the first step in a landslide susceptibility analysis and includes historical and newly discovered landslides and their related information [43], such as the location, the date of occurrence, the extent of landslide phenomena in a region, and the types of mass movements that have left discernible traces [55]. In order to obtain a practical and accurate landslide inventory map, data collection and an adequate field survey were essential in the current study. A digital elevation model (DEM) of the study region with 30 m resolution was obtained from ASTER GDEM, downloaded from Geospatial Data Cloud [56]. The geological map and mean annual precipitation data were provided by the government of Fugu County. Based on field investigations, a total of 194 landslide polygons, including 162 slides, 29 falls, and 3 debris flows, were drawn according to the depletion zone; these landslides were triggered by rainfall and excavation. In the study area, the smallest and largest landslides were about 39 m² and 13.5 × 10⁴ m², respectively. Because only 12% of the landslides are over 10,000 m² in size, the landslide polygons were transformed into points using the centroid method, and the landslide inventory map (Figure 1) was then obtained [57,58].
To avoid overfitting problems in modeling, a total of 194 nonlandslide points were randomly generated and mapped on the landslide inventory map. All of the landslide and nonlandslide points were then randomly divided into two groups: the training dataset, comprising 272 (70%) points, was used to train the models, and the validation dataset, comprising 116 (30%) points, was used for validation purposes.
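The random 70/30 split described above can be sketched as follows. This is a minimal illustration, not the procedure actually used in the study: the function name `split_points`, the fixed seed, and the integer point ids are assumptions introduced here.

```python
import random

def split_points(points, train_fraction=0.7, seed=42):
    """Shuffle a copy of `points` and split it into training and validation lists."""
    rng = random.Random(seed)          # fixed seed only for reproducibility (assumption)
    shuffled = list(points)
    rng.shuffle(shuffled)
    n_train = round(train_fraction * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]

# 194 landslide points (label 1) and 194 nonlandslide points (label 0);
# the integer ids are placeholders for real point coordinates.
points = [(i, 1) for i in range(194)] + [(i, 0) for i in range(194, 388)]
train, valid = split_points(points)
print(len(train), len(valid))  # 272 116
```

With 388 points in total, rounding 70% gives the 272/116 split reported in the text.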

Landslide Explanatory Variables
In order to produce the landslide susceptibility map, 10 landslide explanatory variables, namely slope aspect, altitude, slope angle, lithology, mean annual precipitation, distance to roads, distance to rivers, distance to faults, land use, and normalized difference vegetation index (NDVI), were selected to produce data layers representing themselves with a resolution of 30 × 30 m. Slope aspect, altitude, and slope angle maps were extracted from DEM data using ArcGIS software. Land use and NDVI were extracted from GF-2 satellite images gathered from the China Center for Resources Satellite Data and Application. Lithology, distance to roads, mean annual precipitation, distance to rivers, and distance to faults maps were extracted based on existing data.
The slope aspect, which is considered to be a prerequisite condition, was frequently adopted by many works in the literature to produce a landslide susceptibility map [30]. The slope aspect was reclassified into nine groups, based on the equal interval method, as follows: Northwest, west, southwest, south, southeast, east, northeast, north, flat, respectively ( Figure 2a).
As it is considered to be another critical factor, the slope angle has been widely used in relevant research [59]. In the current research, the slope angle was divided into six categories based on the Jenks natural break method, beginning with 0°-6.65° (Figure 2b).
Altitude is also considered a significant factor for landslide susceptibility mapping [1]. Thus, based on the Jenks natural break method, elevation values were classified into seven ranges.
Differences in lithology are the basis of landslide formation conditions [60]. According to field investigations and the existing geological data and maps, lithological units were divided into six categories (Table 1) and the lithology map was produced (Figure 2d).
Distance to roads is used as an important landslide explanatory variable to prepare the distance to roads map [64]. In this study, the values of distance to roads were reclassified into five ranges based on equal interval method as follows: <200 m, 200-400 m, 400-600 m, 600-800 m, and >800 m (Figure 2f).
River erosion of slope is considered to be a significant explanatory variable inducing landslides; thus, distance to rivers is employed to be a quantitative index of river erosion [25]. In this study, with 200 m as the interval, the values of distance to rivers were reclassified into five ranges based on equal interval method as follows: <200 m, 200-400 m, 400-600 m, 600-800 m, and >800 m (Figure 2g).
Fault movement is not only a requirement for individual landslide occurrences, but also a controlling factor for regional landslide occurrences [12]. Numerous field surveys have indicated that the more intense the fault movement, the more landslides are triggered. In the current research, with 2000 m as the interval, the values of distance to faults were reclassified into five ranges based on the equal interval method as follows: <2000 m, 2000-4000 m, 4000-6000 m, 6000-8000 m, and >8000 m (Figure 2h).
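The equal-interval reclassification used for the three distance variables can be sketched as below. This is only an illustration: the function name is hypothetical, and assigning a value that falls exactly on a class boundary to the upper class is an assumption, since the text does not state how boundaries were handled.

```python
def equal_interval_class(distance_m, interval_m, n_classes=5):
    """Return a 1-based class index; the last class is open-ended (e.g. >800 m)."""
    if distance_m < 0:
        raise ValueError("distance must be non-negative")
    # Boundary values go to the upper class (assumption, not stated in the text).
    return min(int(distance_m // interval_m) + 1, n_classes)

# Distance to roads or rivers uses a 200 m interval; distance to faults uses 2000 m.
print(equal_interval_class(150, 200))    # class 1: <200 m
print(equal_interval_class(950, 200))    # class 5: >800 m
print(equal_interval_class(7999, 2000))  # class 4: 6000-8000 m
```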
Land use differs from region to region, and different land uses may lead to an asymmetrical distribution of landslides [65]. Thus, land use was also employed as an explanatory variable in the study region, and was divided into five categories as follows: water, residential areas, bare land, forest/grassland, and farmland (Figure 2i).
NDVI reflects the surface condition and provides a quantitative estimate of vegetation growth and biomass. The effect of vegetation on slope stability depends on the biomass, the position within the hillslope profile, the root-zone depth, and the capacity of roots to crack rocks and to prevent or ease water infiltration [66,67]. Therefore, NDVI is also considered to be a pivotal explanatory variable. NDVI is defined as follows:

$$\mathrm{NDVI} = \frac{NIR - R}{NIR + R} \qquad (1)$$

where R stands for the red part of the electromagnetic spectrum, while NIR represents the near-infrared part of the electromagnetic spectrum. Using the Jenks natural break method, the NDVI values were reclassified into five categories as follows: −0.39 to −0.019, −0.019 to 0.063, 0.063-0.134, 0.134-0.216, and 0.216-0.607 (Figure 2j).
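As a small worked example, NDVI can be computed per pixel from the red and near-infrared reflectances and then assigned to one of the five classes listed above. The function names and the sample reflectance values are illustrative assumptions; only the class breaks come from the text.

```python
import bisect

# Upper bounds of the first four NDVI classes, taken from the text.
NDVI_BREAKS = [-0.019, 0.063, 0.134, 0.216]

def ndvi(nir, red):
    """Normalized difference vegetation index for one pixel."""
    total = nir + red
    if total == 0:
        return float("nan")  # undefined when both bands are zero
    return (nir - red) / total

def ndvi_class(value):
    """Return the 1-based class index (1..5) for an NDVI value."""
    return bisect.bisect_right(NDVI_BREAKS, value) + 1

value = ndvi(nir=0.4, red=0.3)   # hypothetical reflectances
print(round(value, 3), ndvi_class(value))
```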

Multicollinearity Diagnosis
In the study region, not all explanatory variables have a positive impact on the classification results. Multicollinearity problems may exist between explanatory variables, which may lead to an overfit in modeling. Thus, the Pearson correlation coefficient (PCC), the variance inflation factor (VIF), and tolerance (TOL) were introduced to detect the potential multicollinearity problems [68].
The PCC is a statistical linear correlation coefficient, and it is usually used to measure the linear relationship between distance variables. For two sets of samples $X_i$ and $Y_i$ ($i = 1, 2, 3, \ldots, n$), the PCC between them can be expressed as:

$$\mathrm{PCC} = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n}(y_i - \bar{y})^2}} \qquad (2)$$

where $x_i$ and $y_i$ are the variable values of $X_i$ and $Y_i$, and $\bar{x}$ and $\bar{y}$ are the averages of $X_i$ and $Y_i$, respectively. In general, the greater the absolute value of the PCC, the higher the risk of multicollinearity between the landslide explanatory variables [69], and a PCC of >0.7 indicates a multicollinearity problem [70].
The VIF and TOL are two important indexes for a multicollinearity diagnosis. VIF refers to the ratio of the variance when there is multicollinearity between the conditioning factors to the variance when there is no multicollinearity, and the tolerance is the reciprocal of VIF [71]. In general, the larger the VIF values and the smaller the tolerance values are, the stronger the multicollinearity between the conditioning factors. In this study, explanatory variables with VIF >2 or TOL <0.4 should be abandoned [72].
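The diagnostics above can be illustrated with a short sketch. It assumes the usual definition VIF_j = 1/(1 − R_j²), computed here through the standard identity that VIF_j is the j-th diagonal entry of the inverse correlation matrix; the study itself does not say how its VIF values were obtained, and all function names and sample columns are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [list(row) + [float(i == j) for j in range(n)] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular correlation matrix (perfect collinearity)")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def vif_and_tol(columns):
    """Return (VIF list, TOL list) for a list of explanatory-variable columns."""
    corr = [[pearson(a, b) for b in columns] for a in columns]
    inv = invert(corr)
    vif = [inv[i][i] for i in range(len(columns))]
    return vif, [1.0 / v for v in vif]

# Hypothetical sample values for three explanatory variables.
columns = [[1, 2, 3, 4, 5], [2, 1, 4, 3, 5], [5, 3, 4, 1, 2]]
vif, tol = vif_and_tol(columns)
print([round(v, 3) for v in vif], [round(t, 3) for t in tol])
```

A variable would then be flagged when its VIF or TOL crosses the thresholds quoted in the text.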

Index of Entropy (IOE) Method
The first classification model applied in the present study is the index of entropy (IOE) model, which is a bivariate statistical model; the IOE is also used as the input data to build the hybrid models in the subsequent modeling. Entropy measures the degree of unsteadiness and indeterminacy of a system, and also indicates which elements in a natural environment are most related to the development of mass movement [23]. In addition, the entropy represents the degree to which different explanatory variables affect the development of landslides in a landslide susceptibility analysis. The weight values ($W_j$) of each landslide explanatory variable are determined by the following equations [73]:

$$FR_{ij} = \frac{y}{x} \qquad (3)$$

$$S_{ij} = \frac{FR_{ij}}{\sum_{i=1}^{N_j} FR_{ij}} \qquad (4)$$

$$M_j = -\sum_{i=1}^{N_j} S_{ij} \log_2 S_{ij} \qquad (5)$$

$$M_{j\max} = \log_2 N_j \qquad (6)$$

$$I_j = \frac{M_{j\max} - M_j}{M_{j\max}} \qquad (7)$$

$$W_j = I_j \times FR_{ij} \qquad (8)$$

where $FR_{ij}$ is the frequency ratio value; x and y represent the percentage of domain and the percentage of landslides, respectively; $S_{ij}$ stands for the probability density; the entropy values are represented by $M_j$ and $M_{j\max}$; $N_j$ is the number of categories or ranges of each explanatory variable; and $I_j$ is the information parameter. Then, the final weight values are calculated by SPSS software. Because three explanatory variables (aspect, lithology, and land use) are generated from vector graphics with no attribute values, the FR values of aspect, lithology, and land use were used as input data for the computation of $W_j$. Finally, the landslide susceptibility map for the IOE model is produced using the following equation:

$$LSI_{IOE} = \sum_{j=1}^{n} \frac{e}{f_j} \times C \times W_j \qquad (9)$$

where $LSI_{IOE}$ stands for the sum of all the categories; j indexes the explanatory variable maps ($j = 1, 2, \ldots, n$); e is the number of classes within the explanatory variable map with the greatest number of groups; $f_j$ is the number of classes within a particular explanatory variable map; and C indicates the value of the categories after secondary classification [74].
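The entropy computation for a single explanatory variable can be sketched as follows. This is an illustration under stated assumptions: the formulas are inferred from the symbol definitions above (frequency ratio, normalized probability density, base-2 entropy, information parameter), the per-class weight is taken as the information parameter times the frequency ratio, and the class percentages are invented for the example.

```python
import math

def ioe_for_variable(domain_pct, landslide_pct):
    """Return (FR, S, I, W) for one explanatory variable with N classes.

    domain_pct[i]    : percentage of the study area in class i (x)
    landslide_pct[i] : percentage of landslides in class i (y)
    """
    if len(domain_pct) < 2 or len(domain_pct) != len(landslide_pct):
        raise ValueError("need at least two classes of equal length")
    fr = [y / x for x, y in zip(domain_pct, landslide_pct)]   # frequency ratio
    total = sum(fr)
    s = [v / total for v in fr]                               # probability density
    m = -sum(p * math.log2(p) for p in s if p > 0)            # entropy M_j
    m_max = math.log2(len(fr))                                # maximum entropy
    info = (m_max - m) / m_max                                # information parameter I_j
    weights = [info * v for v in fr]                          # assumed W = I_j * FR_ij
    return fr, s, info, weights

# Hypothetical class percentages for a five-class variable.
domain = [30.0, 25.0, 20.0, 15.0, 10.0]
slides = [45.0, 25.0, 15.0, 10.0, 5.0]
fr, s, info, weights = ioe_for_variable(domain, slides)
print([round(v, 3) for v in fr], round(info, 4))
```

When landslides are spread evenly over the classes, all frequency ratios are equal, the entropy reaches its maximum, and the information parameter is zero, so the variable carries no weight.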

Integration of Logistic Regression and Index of Entropy Model
The logistic regression (LR) model is integrated with the IOE to build a new hybrid model, namely, the LR-IOE model in this study. Logistic regression is a commonly used statistical analysis method for regression analysis of binary dependent variables. The advantage of the LR model is that the independent variables can be discrete or continuous and do not need to satisfy a normal distribution [75]. In a logistic regression analysis, the dependent variable takes the values 0 and 1, representing nonlandslide occurrences and landslide occurrences, respectively. The LR model can be expressed as the following equation:

$$P = \frac{1}{1 + e^{-Z}} \qquad (10)$$

where P stands for the probability of landslide occurrence, whose value ranges from 0 to 1; Z is calculated by the following equation, with output values ranging from −∞ to +∞:

$$Z = B_0 + B_1 X_1 + B_2 X_2 + \cdots + B_n X_n \qquad (11)$$

where n is the number of independent variables; $B_i$ ($i = 1, 2, 3, \ldots, n$) are the logistic regression coefficients and $X_i$ are the values of the n explanatory variables; and $B_0$ is a constant.
Because the $S_{ij}$ values were obtained from the IOE model and share a uniform dimension, using them as inputs can avoid linear correlation between landslides and explanatory variables and also reduce the noise in modeling. In this study, the 10 explanatory variables were reclassified with the corresponding $S_{ij}$ values. Then, the $S_{ij}$ values were used as the input data to build the hybrid model (LR-IOE), with $B_0$ and $B_i$ estimated through the forward stepwise method.
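Once the coefficients are known, evaluating the LR-IOE probability for one map cell reduces to the logistic function applied to a linear combination of the cell's reclassified values. The sketch below assumes this standard form; the coefficient values and inputs are purely hypothetical and are not the fitted values of the study.

```python
import math

def landslide_probability(s_values, coefficients, intercept):
    """Logistic probability P = 1 / (1 + exp(-Z)) with Z = B0 + sum(Bi * Xi)."""
    z = intercept + sum(b * x for b, x in zip(coefficients, s_values))
    # Numerically stable evaluation for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

# Hypothetical S_ij inputs for three variables and hypothetical coefficients.
p = landslide_probability([0.30, 0.12, 0.45], [2.0, -1.5, 3.0], intercept=-1.0)
print(round(p, 4))
```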

Integration of Support Vector Machine and Index of Entropy Model
The basic idea of the support vector machine is to transform the input space into a high-dimensional space through an inner product function using the training data [76]. The support vectors are defined as the training samples that have the smallest distance from the optimal hyperplane [40]. In this study, SVM is designed to solve binary classification problems, which means that positive and negative samples exist at the same time.
Consider a set of training vectors $x_i$ ($i = 1, 2, 3, \ldots, n$), each belonging to one of two classes denoted as $y_i = \pm 1$ [77]. SVM aims to find an n-dimensional hyperplane that distinguishes the two categories while ensuring that the two classes are as far as possible from the hyperplane. Mathematically, this can be expressed as follows:

$$\min \frac{1}{2}\|w\|^2 \qquad (12)$$

followed by the constraints:

$$y_i\left((w \cdot x_i) + k\right) \ge 1 \qquad (13)$$

where $\|w\|$ stands for the norm of the hyperplane normal and k is a constant. By applying the Lagrangian multipliers ($\lambda_i$), the cost function can be written as:

$$L = \frac{1}{2}\|w\|^2 - \sum_{i=1}^{n} \lambda_i \left(y_i\left((w \cdot x_i) + k\right) - 1\right) \qquad (14)$$

In addition, slack variables $\xi_i$ are applied to handle nonseparable problems [76]; thus, Equations (12) and (13) can be modified as:

$$\min \frac{1}{2}\|w\|^2 + \frac{1}{vn}\sum_{i=1}^{n}\xi_i \qquad (15)$$

$$y_i\left((w \cdot x_i) + k\right) \ge 1 - \xi_i \qquad (16)$$

where v accounts for misclassification, with values ranging from 0 to 1. In addition, by introducing a kernel function, a nonlinear decision boundary can be calculated. In the current research, the radial basis function (RBF), which is considered to be one of the most powerful kernels [78], is selected to calculate $LSI_{SVM}$ and produce the landslide susceptibility map. The radial basis function is shown as follows:

$$K(x_i, x_j) = \exp\left(-\frac{\|x_i - x_j\|^2}{2\delta^2}\right) \qquad (17)$$

where δ accounts for the width of the Gaussian kernel function [19]. Similarly, the $S_{ij}$ values were used as the input data for the SVM model to build the new hybrid model (SVM-IOE).
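A minimal sketch of the RBF kernel follows. It assumes the Gaussian form exp(−‖u − v‖² / (2δ²)), in which δ acts as the kernel width; other software parameterizes the same kernel with a single coefficient multiplying the squared distance, so the exact form used in the study is an assumption here, as are the sample vectors.

```python
import math

def rbf_kernel(u, v, delta):
    """Gaussian (RBF) kernel between two equally long feature vectors."""
    if delta <= 0:
        raise ValueError("delta must be positive")
    squared_distance = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-squared_distance / (2.0 * delta ** 2))

# Hypothetical S_ij feature vectors of two map cells.
cell_a = [0.30, 0.12, 0.45]
cell_b = [0.28, 0.20, 0.10]
print(round(rbf_kernel(cell_a, cell_b, delta=0.5), 4))
```

The kernel equals 1 for identical vectors and decays toward 0 as they move apart; a smaller δ makes that decay faster and the decision boundary more local.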

The ROC Curve
To test the performance of the LSMs obtained by the three models, the receiver operating characteristic (ROC) curve was applied. Based on a series of different dichotomies (cutoffs or decision thresholds), the ROC curve plots 1 − specificity on the X-axis and sensitivity on the Y-axis, which can be expressed as:

$$X = 1 - \text{specificity} = 1 - \frac{TN}{TN + FP} \qquad (18)$$

$$Y = \text{sensitivity} = \frac{TP}{TP + FN} \qquad (19)$$

where TP represents true positive, TN is true negative, FP is false positive, and FN is false negative [79]. The quality of the three models in predicting the occurrence or non-occurrence of landslides can be measured by the area under the ROC curve (AUC) [9]. The AUC values range from 0 to 1; the closer the AUC value is to 1, the higher the accuracy of the model prediction. Conversely, an AUC value below 0.5 indicates that the model prediction has no practical value [80].
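The construction of the ROC curve and its AUC can be illustrated with a short sketch that sweeps every distinct score as a threshold and integrates with the trapezoidal rule. The labels and susceptibility scores below are invented for the example; the function names are hypothetical.

```python
def roc_points(labels, scores):
    """Return (1 - specificity, sensitivity) pairs, one per distinct threshold."""
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points

def auc(points):
    """Area under a piecewise-linear curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

# 1 = landslide, 0 = nonlandslide; scores are hypothetical susceptibility indices.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
print(round(auc(roc_points(labels, scores)), 4))  # 0.8889
```

In this toy case 8 of the 9 landslide/nonlandslide pairs are ranked correctly, which is exactly the AUC of 8/9.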

Assessment of Explanatory Variables
In this study, the training dataset was used to evaluate the explanatory variables, and the Pearson correlation coefficient between pairs of explanatory variables was calculated (Table 2). The results show that the lowest PCC value is −0.009, between altitude and NDVI, and the highest PCC value is 0.368, between slope aspect and distance to rivers. All PCC values are less than 0.7.
The calculation results of VIF and TOL are shown in Table 3. It can be observed that the maximum VIF value is 1.926 and the minimum TOL value is 0.519, which means all the explanatory variables can be applied for landslide susceptibility modeling.

Result of IOE Model
The calculation method of $W_j$ has already been described in Section 4.2, Equations (3)-(8), and the results are shown in Table 4. The $FR_{ij}$ values shown in Table 4 were used as the input data for slope aspect, lithology, and land use. For the remaining explanatory variables, the original (continuous) data were used as input data to compute the IOE values. Based on the obtained results, the landslide susceptibility index for the IOE model ($LSI_{IOE}$) was calculated using Equation (9) and was written as follows:

$$\begin{aligned} LSI_{IOE} = {} & (\text{slope aspect} \times 0.084) + (\text{slope angle} \times 0.064) + (\text{altitude} \times 0.874) \\ & + (\text{lithology} \times 0.119) + (\text{mean annual precipitation} \times 0.232) + (\text{distance to roads} \times 0.517) \\ & + (\text{distance to rivers} \times 0.127) + (\text{distance to faults} \times 0.030) + (\text{land use} \times 0.974) + (\text{NDVI} \times 0.303) \end{aligned} \qquad (20)$$

In the end, all of the 10 explanatory variables were used to build the IOE model, and the $LSI_{IOE}$ values range from −10.37 to 11.67. $LSI_{IOE}$ values reflect the probability of landslide occurrence. In other words, the closer the $LSI_{IOE}$ value is to 11.67, the higher the probability of landslide occurrence, and the closer it is to −10.37, the lower the probability. Then, the natural break method was applied to classify the final LSM produced by the IOE model into four categories: low (−10.37 to −4.33), moderate (−4.33 to −1.65), high (−1.65 to 1.64), and very high (1.64 to 11.67) (Figure 3a). The area percentages of the low, moderate, high, and very high regions are 31.24%, 16.39%, 33.23%, and 19.14%, respectively.
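The weighted overlay and the four-class mapping can be sketched for a single map cell as follows. The weights and class breaks are those quoted in the text; the dictionary keys, the function names, and the sample cell values are hypothetical, and assigning a boundary value to the upper class is an assumption.

```python
import bisect

# Weights W_j quoted in the text for the IOE model.
WEIGHTS = {
    "slope_aspect": 0.084, "slope_angle": 0.064, "altitude": 0.874,
    "lithology": 0.119, "precipitation": 0.232, "dist_roads": 0.517,
    "dist_rivers": 0.127, "dist_faults": 0.030, "land_use": 0.974,
    "ndvi": 0.303,
}
BREAKS = [-4.33, -1.65, 1.64]                      # natural-break limits from the text
LEVELS = ["low", "moderate", "high", "very high"]

def lsi_ioe(cell):
    """Weighted sum of the ten explanatory-variable values of one cell."""
    return sum(WEIGHTS[name] * cell[name] for name in WEIGHTS)

def susceptibility_level(lsi):
    """Map an LSI value to one of the four susceptibility classes."""
    return LEVELS[bisect.bisect_right(BREAKS, lsi)]

cell = {name: 1.0 for name in WEIGHTS}             # hypothetical input values
value = lsi_ioe(cell)
print(round(value, 3), susceptibility_level(value))  # 3.324 very high
```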

Result of LR-IOE Model
The calculation method of Z has already been described in Equation (11). The $S_{ij}$ values shown in Table 4 were used as the input data for all 10 explanatory variables through the reclassification method to build the LR-IOE model and to compute $B_0$ and $B_i$ using SPSS software. Based on the results, Z was evaluated with Equation (11). Subsequently, the $LSI_{LR-IOE}$ values were obtained, which range from 0.016 to 0.983. $LSI_{LR-IOE}$ values reflect the probability of landslide occurrence. In other words, the closer the $LSI_{LR-IOE}$ value is to 1, the higher the probability of landslide occurrence, and the closer it is to 0, the lower the probability. Similarly, the natural break method was applied to classify the final LSM produced by the LR-IOE model into four categories: low (0.016-0.248), moderate (0.248-0.445), high (0.445-0.688), and very high (0.688-0.983) (Figure 3b). The area percentages of the low, moderate, high, and very high regions are 16.77%, 33.06%, 21.05%, and 29.12%, respectively.

Result of SVM-IOE Model
In the current research, the parameters of the radial basis function were selected by the grid search method with 10-fold cross validation, and then the $S_{ij}$ values were used as the input data to calculate the $LSI_{SVM-IOE}$ values with the SVM-IOE model. The $LSI_{SVM-IOE}$ values range from 0.061 to 0.984. The closer the value is to 1, the higher the probability of landslide occurrence, and the closer it is to 0, the lower the probability. Then, the natural break method was applied to classify the final LSM produced by the SVM-IOE model into four categories: low (0.061-0.271), moderate (0.271-0.437), high (0.437-0.658), and very high (0.658-0.984) (Figure 3c). The area percentages of the low, moderate, high, and very high regions are 15.08%, 29.56%, 33.39%, and 21.97%, respectively.
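The grid search with 10-fold cross validation mentioned above can be outlined generically as below. This is a skeleton only: the parameter names `C` and `gamma`, the grid values, and the `toy_score` function (a stand-in for training an SVM on the training folds and scoring it on the held-out fold) are all assumptions introduced for illustration, since the text does not report the grid that was searched.

```python
import itertools
import random

def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle sample indices and deal them into k nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]

def grid_search(n_samples, grid, score_fn, k=10, seed=0):
    """Return the parameter combination with the best mean cross-validated score."""
    folds = k_fold_indices(n_samples, k, seed)
    names = list(grid)
    best_params, best_score = None, float("-inf")
    for values in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        fold_scores = []
        for i, valid_idx in enumerate(folds):
            train_idx = [j for m, fold in enumerate(folds) if m != i for j in fold]
            fold_scores.append(score_fn(params, train_idx, valid_idx))
        mean_score = sum(fold_scores) / k
        if mean_score > best_score:
            best_params, best_score = params, mean_score
    return best_params, best_score

def toy_score(params, train_idx, valid_idx):
    """Placeholder score peaking at C=10, gamma=0.1; replace with a real model fit."""
    return -((params["C"] - 10) ** 2 + (params["gamma"] - 0.1) ** 2)

grid = {"C": [1, 10, 100], "gamma": [0.01, 0.1, 1]}   # hypothetical grid
best_params, best_score = grid_search(272, grid, toy_score)
print(best_params)
```

In practice `score_fn` would train the classifier on `train_idx`, evaluate it on `valid_idx`, and return a validation metric such as accuracy or AUC.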

Validation of Landslide Susceptibility Maps
In the current study, the ROC curve was used to validate and compare the performance of the IOE, LR-IOE, and SVM-IOE models. The final AUC values represent the success and prediction rate derived from the training and validating dataset, respectively.
For the success rate, the AUC values for the IOE, LR-IOE, and SVM-IOE models were observed to be 0.8743, 0.9011, and 0.8653, respectively (Figure 4a). That is to say, the training accuracy of the susceptibility maps is 87.43%, 90.11%, and 86.53%, respectively. For the prediction rate, the AUC values for the IOE, LR-IOE, and SVM-IOE models were found to be 0.7686, 0.8184, and 0.7661, respectively (Figure 4b). In other words, the prediction accuracy of the susceptibility maps is 76.86%, 81.84%, and 76.61%, respectively. Generally, the results of both the success rate and the prediction rate show reasonable and practical accuracies in the current research, and the LR-IOE model shows the best result.

Discussion
Spatial prediction of landslides is a critical process in the study of landslides and the accuracy of prediction will be affected by the models that we used, and the input data extracted from explanatory variables. However, there is no definitive conclusion about the methods used to select and evaluate explanatory variables. Therefore, it is necessary to investigate the methods which will help us to obtain reasonable conclusions. In this study, we calculated the IOE and PCC to assess 10 explanatory variables, and evaluated three classification models, namely, IOE, LR-IOE, and SVM-IOE, for landslide susceptibility mapping.
According to the PCC values (Table 2), all pairwise correlations among the 10 factors are less than 0.7, which means that these 10 factors should not introduce collinearity-related noise into landslide susceptibility modeling. From the index of entropy (Table 4), we can see that residential areas have the highest value (7.555), which means that most landslides occurred in this land use class. We believe that the reason is the concentration of population and the intense human engineering activities in this area. Similarly, the closer to the road, the higher the frequency of landslides. For the slope aspect, most landslides occurred on south-facing slopes; the reason may be the climate, and the same result was also reported by the authors of [37] (p. 82). Category C (siltstone, sandstone, mudstone, shale, coal seam, glutenite) in lithology is the unit where the largest number of landslides has occurred. This may be due to the softness of the sandstone and siltstone structures and strong weathering and erosion. In the case of slope angle and mean annual precipitation, the rate of landslide occurrence is roughly proportional to them. The reason may be that the infiltration of a large amount of water increases the water content and weight of the rock and soil mass and thereby its sliding force, and that the steeper the slope, the stronger the sliding force of the rock and soil mass. Interestingly, as the values of distance to faults, distance to rivers, distance to roads, altitude, and NDVI increase, the IOE gradually decreases. The reason for this phenomenon is that road construction usually causes instability, while roads in the study region are generally built at low altitudes and away from faults. The roots of vegetation are conducive to the stability of the soil, while river erosion affects the stability of the slope. These conditions are roughly the same as those observed in the field.
In this study, the selection of explanatory variables was based on previous studies and field observations, which introduces some subjectivity. In addition, although we calculated all the $W_j$ values for the 10 explanatory variables, it is not clear how sensitive the method developed in this work is to the number of classes and to the choice of the break points. This will therefore be the focus of future research.
As shown in Figure 4, we can see the AUC value of the LR-IOE model is the highest among the three models, whether it is for the success or prediction rate, which means that the LR-IOE model performs best in landslide susceptibility mapping in this study. However, the AUC value of the SVM-IOE model is the lowest, which may be due to the fact that the SVM-IOE model is more dependent on the selection of the kernel function, and there is no objective way to solve it.
In terms of the proportions in the final susceptibility maps (Figure 5), the high and very high classes obtained by the three models together cover about 52% of the study region. Among the three, the LR-IOE model gives the lowest proportion (50.17%), which suggests that it delineates susceptible zones more efficiently; this can in turn improve the efficiency of decision-making and reduce costs.

Conclusions
In the present study, the IOE model, the LR-IOE model, and the SVM-IOE model were used to obtain landslide susceptibility maps for the Fugu County of Shaanxi Province, China. Ten explanatory variables, namely, altitude, slope aspect, mean annual precipitation, slope angle, lithology, distance to roads, land use, distance to rivers, distance to faults, and NDVI, were selected, and potential multicollinearity among them was checked using PCC, VIF, and TOL. The analysis showed no multicollinearity problems among these 10 factors, so all of them were used for landslide susceptibility modeling. A total of 194 landslides were used, including landslides recognized from extensive field investigations and historical landslide records, and 194 nonlandslide points were randomly generated. To build the models, 272 (70%) landslide and nonlandslide points were randomly selected, and the remaining 116 (30%) landslide and nonlandslide points were used for validation. A natural break method was used to classify the study region into four categories: low, moderate, high, and very high. Finally, the performance of the resulting landslide susceptibility maps was evaluated using AUC values. In terms of the success rate, the LR-IOE model has the highest training accuracy (90.11%), followed by the IOE model (87.43%) and the SVM-IOE model (86.53%). As for the prediction rate, the LR-IOE model has the highest prediction accuracy (81.84%), followed by the IOE model (76.86%) and the SVM-IOE model (76.61%). Thus, the results indicate that all three models perform well in landslide susceptibility mapping. The LR-IOE model performed best in this research and is the most suitable of the three for landslide susceptibility mapping in the study area.
The results of this study provide useful information for engineers, decision makers, and urban planners in the study region.
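The sample counts in the 70/30 split described above can be checked with a short sketch. It assumes that the landslide and nonlandslide points were split separately with the same fraction, which the text does not state explicitly; the identifiers, seeds, and function name are hypothetical.

```python
import random

def split_points(point_ids, train_fraction=0.7, seed=0):
    """Shuffle the ids reproducibly and return (training, validation) lists."""
    rng = random.Random(seed)
    shuffled = list(point_ids)
    rng.shuffle(shuffled)
    n_train = round(train_fraction * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]

# 194 landslide and 194 nonlandslide points, as reported in the text.
landslide_ids = [f"L{i}" for i in range(194)]
nonlandslide_ids = [f"N{i}" for i in range(194)]

ls_train, ls_valid = split_points(landslide_ids, seed=1)
nl_train, nl_valid = split_points(nonlandslide_ids, seed=2)

n_train_total = len(ls_train) + len(nl_train)   # 136 + 136 = 272
n_valid_total = len(ls_valid) + len(nl_valid)   # 58 + 58 = 116
```

Under this assumption each class contributes 136 training and 58 validation points, which reproduces the 272 and 116 totals reported above.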
