Spatial Proximity-Based Geographically Weighted Regression Model for Landslide Susceptibility Assessment: A Case Study of Qingchuan Area, China

Landslides pose a serious threat to the safety of human life and property in mountainous regions. Susceptibility assessment for landslides is critical in landslide management strategy. Recent studies indicate that the traditional assessment models in many previous studies commonly assume a fixed relationship between influencing factors and landslide occurrence within an area, resulting in an inadequate evaluation for the local landslides susceptibility. To address this issue, in this paper we propose a spatial proximity-based geographically weighted regression (S-GWR) model considering spatial non-stationarity of landslide data for assessing the landslide susceptibility. Spatial proximity is the basic input condition for the proposed S-GWR model. The challenge lies in defining the spatial proximity expression that shows the geographical features of landslides and therefore affects the model ability of S-GWR. Our solution chooses the slope unit as spatial adjacency, rather than the grid unit in DTM. The multicollinearity between landslide influencing factors is then eliminated through variance inflation factor (VIF) method and principal component analysis (PCA). The proposed model is subsequently validated by using data in Qingchuan County, southwestern China. Spatial non-stationary is identified for landslide data. A comparison with grid unit and four traditional evaluation models is conducted. Validation results using the area under the ROC (receiver operating characteristic) curve and success rate curve indicate that the spatial proximity-based GWR model with slope unit has the highest predictive accuracy (0.859 and 0.850 respectively).


Introduction
Landslides are catastrophic natural hazards frequently posing risks to the major societal, economic, and environmental on an international scale [1]. According to the report from EM-DAT [2], 21,412 landslides occurred worldwide between 1900 and 2014, resulting in 38,521,499 fatalities, with 7,229,487,068 people affected and total direct economic losses exceeding $2.7 trillion.
Landslide susceptibility assessment has long been recognized as a useful tool for landslide hazard management through land use planning and better decision making in landslide prone areas [3]. It is generally based on heuristic, statistical, or deterministic models [4][5][6][7][8]. Heuristic models are subjective and much susceptible to the expectation of the results [9,10]. Deterministic models have been reported with higher accuracy, but are limited by the difficulty of obtaining detailed landslide database [11,12]. Statistical models are the most widely used models due to their simplicity and high efficiency [13,14]. Many remarkable studies on the above aspects have been made, laying a solid foundation for landslide susceptibility mapping. However, in general, most of the previous studies consider the relationship between triggering factors and landslide occurrence as a fixed effect within an area, whereas different degrees of parameter influence may occur, such that, with the change of location, the effect of parameters can be consequently changed. The uncertainties due to this varied relationship remain a scientific challenge.
The second law of geography suggests that there exists variability over space of a given relationship between variables widely in spatial data [15], which is the so-called spatial non-stationarity. In view of that landslide susceptibility assessment is heavily based on spatial data, the relationship between influencing factors and landslide susceptibility may also have the characteristics of spatial non-stationarity. Previous studies, e.g., [16], have also suggested that the effective parameters in the occurrence of a natural disaster phenomenon do not have the same importance in different parts of an area. The existence of spatial non-stationarity indicates that average relationships fitted to the whole study area of traditional models might be inappropriate since they do not accurately fit local conditions [17]. This spatial non-stationarity characteristics in the data pose difficulties in landslide susceptibility assessment based on the traditional models.
Geographically weighted regression (GWR), the most popular local regression format, shows great capability in dealing with spatial non-stationary relationships [18]. It allows the relationship between dependent and independent variables to vary over space, as well as that regression coefficients in the model are calculated for each spatial zone [19]. This method has been applied in various fields of study such as social economics, geography, and meteorology [20][21][22]. However, previous studies applying GWR model for the assessment of geological hazard susceptibility have not yet been reported.
One difficulty limiting the application of GWR model in landslide susceptibility assessment is spatial proximity. It is the basic input condition and core problem for GWR model, and the issue regarding an adequate expression for the spatial proximity at different locations directly affects the modeling ability of GWR model [23,24]. Spatial proximity is the distance relationship between two units in space, and the closer the distance, the greater the impact. The key to determining the spatial proximity is segmenting the study area into map units to effectively express the spatial adjacency relationship between landslide data. The relationship should satisfy the requirements of GWR for good internal homogeneity and between-units heterogeneity. The commonly used segmentation methods in previous studies relating to GWR model can be categorized into two major kinds, i.e., administrative units [25] and grid units [26]. Administrative unit is mostly used for social and economic issues, and its segmentation does not accord with the neighborhood characteristics of landslide data. As such, administrative unit is rarely used in geological hazard assessment. Grid unit is a popular mapping unit for susceptibility assessment since it is easily accessible, but it is not associated with geological environments. Slope unit is a relative new mapping unit for evaluating landslide susceptibility, which is generated according to hydrology theory and is the watershed area defined by drainage lines (valley lines) and water divide lines (ridge lines) [27]. It is the basic topographical unit of geological hazard occurrence. Slope unit has higher internal homogeneity and between-unit heterogeneity than grid unit. It is closely related to geological environment conditions. In this sense, slope unit provides an alternative solution for spatial proximity expression of the GWR model for landslide susceptibility assessment.
Two other key issues of GWR model are the multicollinearity elimination and the kernel function establishment. Previous studies, e.g., [28], indicated that GWR is highly susceptible to the effects of multicollinearity between explanatory variables, and collinearity among pairs of explanatory variables or multicollinearity among more than two variables often lead to problems such as parameter estimate instability and unintuitive parameter signs. These problems remain significant owing to the complicated conditions of landslide posing a high possibility of correlations between explanatory variables. Kernel function is based on the distances between observations and calibration units to place emphasis on observations that are closer in space [28]. The selection of kernel function type and the determination of its bandwidth are crucial to the spatial proximity modeling of GWR.
In this study, we attempt to propose a spatial proximity based on geographically weighted regression (S-GWR) model for landslide susceptibility assessment. The presented model resolves the spatial non-stationarity of landslide susceptibility assessment with GWR model. Firstly, we generate slope units to establish spatial adjacency. Then, variance inflation factor (VIF) method [29] and principal component analysis (PCA) method [30] were adopted to eliminate multi-collinearity, and kernel function was determined according to the characteristics of landslide data. Finally, we chose Qingchuan County, Sichuan Province, China, as the study area to validate the applicability of the model, and further compared the established model with the grid-unit GWR model and other evaluation models.

Study Area
The study area is the Qingchuan County in the transitional region between the Sichuan Basin and the Western Sichuan Plateau. This area has long been recognized as one of the most landslide-prone areas of China [31]. It locates between 32 • 12 ~32 • 56 N in latitude and 104 • 36 ~105 • 38 E in longitude, covering a total area of 3217 km 2 . The minimum elevation of the Qingchuan County is approximately 500 m and the maximum is 3820 m, characterized by northwestern part with higher elevation than the southeastern. Slope gradient reaches a maximum of about 80 • , with a mean value of 38 • .
The tectonics and geological settings in the area are complex. Because of the neotectonics, soft-lithology and hard-lithology usually appears alternately. There are about eight types of lithological outcrops throughout the study region (as shown in Figure 1), including the sedimentary rock (limestones, sandstone, and conglomerate) from Cambrian to Jurassic age, magmatic (granite), metamorphic rock (shales, schists, gneiss) from Cambrian to Jurassic age and Quaternary loess unconsolidated sedimentary. Two main active faults cross the area: the Pingwu-Qingchuan fault located in the north and crossing the whole territory, and the Yingxiu-Beichuan fracture which belongs to the Longmenshan fault belt, is a thrust fault 60 • -70 • NW dipping. Bailong river, Qingzhu river and Qiaozhuang river are distributed in the area. The discharges of the three rivers are measured approximately 525, 30, and 40 m 3 s −1 , respectively, serving as the main channel for atmospheric precipitation and groundwater drainage.
Appl. Sci. 2020, 10, x FOR PEER REVIEW 3 of 16 owing to the complicated conditions of landslide posing a high possibility of correlations between explanatory variables. Kernel function is based on the distances between observations and calibration units to place emphasis on observations that are closer in space [28]. The selection of kernel function type and the determination of its bandwidth are crucial to the spatial proximity modeling of GWR.
In this study, we attempt to propose a spatial proximity based on geographically weighted regression (S-GWR) model for landslide susceptibility assessment. The presented model resolves the spatial non-stationarity of landslide susceptibility assessment with GWR model. Firstly, we generate slope units to establish spatial adjacency. Then, variance inflation factor (VIF) method [29] and principal component analysis (PCA) method [30] were adopted to eliminate multi-collinearity, and kernel function was determined according to the characteristics of landslide data. Finally, we chose Qingchuan County, Sichuan Province, China, as the study area to validate the applicability of the model, and further compared the established model with the grid-unit GWR model and other evaluation models.

Study Area
The study area is the Qingchuan County in the transitional region between the Sichuan Basin and the Western Sichuan Plateau. This area has long been recognized as one of the most landslideprone areas of China [31]. It locates between 32°12′~32°56′ N in latitude and 104°36′~105°38′ E in longitude, covering a total area of 3217 km 2 . The minimum elevation of the Qingchuan County is approximately 500 m and the maximum is 3820 m, characterized by northwestern part with higher elevation than the southeastern. Slope gradient reaches a maximum of about 80°, with a mean value of 38°.
The tectonics and geological settings in the area are complex. Because of the neotectonics, softlithology and hard-lithology usually appears alternately. There are about eight types of lithological outcrops throughout the study region (as shown in Figure 1), including the sedimentary rock (limestones, sandstone, and conglomerate) from Cambrian to Jurassic age, magmatic (granite), metamorphic rock (shales, schists, gneiss) from Cambrian to Jurassic age and Quaternary loess unconsolidated sedimentary. Two main active faults cross the area: the Pingwu-Qingchuan fault located in the north and crossing the whole territory, and the Yingxiu-Beichuan fracture which belongs to the Longmenshan fault belt, is a thrust fault 60°-70° NW dipping. Bailong river, Qingzhu river and Qiaozhuang river are distributed in the area. The discharges of the three rivers are measured approximately 525, 30, and 40 m 3 s −1 , respectively, serving as the main channel for atmospheric precipitation and groundwater drainage.

Flowchart of Research
The methodologies used in this study are as shown in Figure 2. The flowchart consists of three major steps. major steps.
The first step is dataset preparation, including the preparation of basic data and the determination and extraction of landslide-related causal factors (LRCFs); The second step is the establishment of the landslide susceptibility assessment model, which is the most critical step. Firstly, slope units are generated to establish spatial adjacency of landslide data. After obtaining the data in the slope units, the multicollinearity between data is eliminated by variance inflation factor method (VIF) and principal component analysis (PCA). Then, the spatial kernel function of GWR model is determined, and finally the GWR model based on slope unit (S-GWR) is established for landslide susceptibility evaluation.
The last step is model validation, including applying the S-GWR model to the actual area and generating the landslide susceptibility. The results are then used for validation and compared with the grid-unit GWR model (G-GWR) and other evaluation models including ordinary least squares regression (OLS), artificial neural network (ANN), and information models (I).

Dataset Preparation
The basic data utilized in the study include the digital elevation model (DEM) of the study area with a spatial resolution of 10 m; a geological map with the scale of 1:100,000; aerial photographs taken by Ministry of Land and Resources. After field identification and interpretation of the color aerial photographs, 973 landslide points are identified, and the distribution is as shown in Figure 1.
Owing to that a variety of factors make a contribution to the landslide occurrence, an adequate selection of the effective parameters is essential to identify areas prone to landslide. We choose these factors in accordance with the well-illustrated criterion in many previous studies (as listed in Table  1). In detail, firstly, the actual conditions of research area are considered. The study area has a complex topography, so topographic factors, such as slope and slope height, provide terrain conditions for landslides. The alternating appearance of soft and hard lithology provides a material basis for the development of landslides. Besides, there are two large faults across the study area, and The first step is dataset preparation, including the preparation of basic data and the determination and extraction of landslide-related causal factors (LRCFs); The second step is the establishment of the landslide susceptibility assessment model, which is the most critical step. Firstly, slope units are generated to establish spatial adjacency of landslide data. After obtaining the data in the slope units, the multicollinearity between data is eliminated by variance inflation factor method (VIF) and principal component analysis (PCA). Then, the spatial kernel function of GWR model is determined, and finally the GWR model based on slope unit (S-GWR) is established for landslide susceptibility evaluation.
The last step is model validation, including applying the S-GWR model to the actual area and generating the landslide susceptibility. The results are then used for validation and compared with the grid-unit GWR model (G-GWR) and other evaluation models including ordinary least squares regression (OLS), artificial neural network (ANN), and information models (I).

Dataset Preparation
The basic data utilized in the study include the digital elevation model (DEM) of the study area with a spatial resolution of 10 m; a geological map with the scale of 1:100,000; aerial photographs taken by Ministry of Land and Resources. After field identification and interpretation of the color aerial photographs, 973 landslide points are identified, and the distribution is as shown in Figure 1.
Owing to that a variety of factors make a contribution to the landslide occurrence, an adequate selection of the effective parameters is essential to identify areas prone to landslide. We choose these factors in accordance with the well-illustrated criterion in many previous studies (as listed in Table 1). In detail, firstly, the actual conditions of research area are considered. The study area has a complex topography, so topographic factors, such as slope and slope height, provide terrain conditions Appl. Sci. 2020, 10, 1107 5 of 16 for landslides. The alternating appearance of soft and hard lithology provides a material basis for the development of landslides. Besides, there are two large faults across the study area, and the landslide frequency may increase as the distance to a major or minor fault decreased [26]. There are also many river systems in the study area. In the dense-river-areas, with the increase of catchment area and reduction of erosion basis, the shearing force of river flow is enhanced, which gives rise to the steep gorges in the downstream and provides premise conditions for the growth of geological disasters. Table 1. Criteria for a rational selection of landslide factors on the basis of literature review.

Criteria
Former Studies Using the Same Criterion for Landslide Susceptibility Assessment Based on the analysis of the research area, the factors commonly used in previous studies on landslide susceptibility are further referenced (Table 1), and 7 variables (elevation, slope, slope height, aspect, distance-to-stream, distance-to-fault, and lithology) are finally identified for modeling. All these factors are processed within ArcGIS. Slope, elevation, slope height, and aspect are calculated using DEM. All data are directly obtained by ArcGIS without data transformation. Lithology map is digitized from the existing geological map. Lithology map is digitized from the existing geological map. Distance-to-fault is obtained by calculating the distance to the nearest fault, and distance-to-stream is the distance to the nearest stream.

Establishment of Spatial Proximity
Spatial proximity is the distance relationship between two units in space. GWR model establishes a model for each unit according to the spatial proximity, and the areas within a unit are considered homogeneous. Therefore, it is essential to determine the spatial adjacency relationship between units, i.e., the boundary of map units. Administrative boundaries [25] and grid boundaries [26] were commonly used as the spatial proximity expressions of GWR in previous studies, but they are inconsistent with the neighborhood characteristics of landslides. These boundaries could not perform well to express the heterogeneity between units and the homogeneity within units, affecting the modeling ability of GWR model. Therefore, we incorporate the methodology of slope units to express spatial proximity. Slope unit is the basic unit of geological disasters. It divides the terrain into mapping units with similar hydrological and geomorphological conditions, and is shaped by similar processes occurring in the natural landscape under the same geo-environmental conditions [37]. A slope unit division map is formed by the GIS-based hydrologic analysis tool [38]. Firstly, reverse DEM is generated by subtracting the elevation value from the highest elevation value in each unit. Secondly, fill the DEM and reverse DEM, and the flow direction can be obtained by these filled DEMs. Then, by setting the minimum number of cells that flow to the calculating point, the watershed can be calculated. Eventually, by combining the watershed by DEM and the watershed by reverse DEM, slope units can be obtained.

Elimination of Multicollinearity
Collinearity among pairs of explanatory variables or multicollinearity among more than two variables in regression analysis is known to cause problems such as parameter estimate instability, unintuitive parameter signs, high coefficient of determination (R 2 ) diagnostics despite few or no significant parameters, and others [28]. Due to the huge and complex landslide data, the complicated relationship between geological and topographical factors should be considered. Therefore, the multicollinearity elimination in landslide susceptibility assessment is very important. Correlations in GWR parameters, both within a set of local parameter estimates for all locations (global multicollinearity) and among different parameter estimates at each location (local multicollinearity), are the symptom of multicollinearity among explanatory variables [28].
Global multicollinearity of the entire area can be easily distinguished by the variance inflation factor (VIF) method [29], which measures the effect of collinearity on the estimated variance of a regression coefficient. Local multicollinearity is the linear dependencies in the design matrix of local regression model. Principal component analysis (PCA) [30] is adopted to eliminate local multicollinearity since the diagnosis of local multicollinearity is very complicated. Through data transformation and processing, the influencing factors of landslide susceptibility can be grouped into less integrated factors, which not only maintains the main information of original factors, but also weakens the correlation among them. First obtained initial factor loading matrix. There are two main principles for selecting the principal component: (1) the principal component eigenvalue is greater than 1; (2) the cumulative contribution rate of principal components reaches 80%. Then, the factor rotation is performed so that the obtained factors have clear professional interpretation significance. Quartimax method [39], a common factor rotation method, was used for factor rotation.

GWR Modeling
GWR model extends the ordinary least squares regression (OLS) [40] by weighting the spatial dependence [41]. Based on the established spatial proximity, the coefficients of the model are estimated for each unit, and the value and symbol of the coefficients vary at different units [16]. This model is in the form of Equation (1) where (µ i , v i ) represents the coordinates of an ith unit in space and p is the number of independent factors. x ij is the jth independent variable of the ith unit. β 0 (µ i , v i ) is the intercept parameter in position i and ε i is the random error. β ij (µ i , v i ) is the local regression coefficient for the jth explanatory variable in position i, which varies with the change of spatial position and is a very important parameter for the embodiment of spatial non-stationary. The establishment of the weight kernel function is an important step of GWR, which is used to determine the scope and degree of spatial dependence. The establishment process includes the selection of the type of kernel function and the determination of its bandwidth, and previous studies have found that the latter has a greater impact on the result of GWR than the former [42]. This paper used a common kernel function, Gauss kernel function [43], as the type of kernel function. Its function form is where w ij is the weight for unit j in the neighborhood of unit i, and d ij is the distance between the center point of the unit i and j as the measurement of spatial proximity degree. b is the bandwidth of the Guass kernel function. Many approaches are available for determining the bandwidth, including cross-validation (CV) method [42], Akaike information criterion (AIC) method [44]. Compared with CV method, AIC method is easier to avoid over-fitting problems, and the selected optimal model is often more effective. Therefore, the AIC method was adopted, and the bandwidth corresponding to the weight function with the minimum AIC value is the optimal bandwidth. Equation (3) is used for the calculation of AIC, AIC = 2n ln(σ) + n ln(2π) + n n + tr(S) n − 2 − tr(S) where tr(S) is the function of b, andσ is the maximum likelihood estimation for GWR model.

Validation Processes
The GEZM validation process is important; without this, the study lacks scientific credibility. In this research, we first compare the prediction results with actual units free of or containing landslides [45]. Sample points with 30% of the number of units, accounting for totally 291 landslide points, are randomly selected. The values of the predicted result belonging to the low-susceptibility and very-low-susceptibility area are regarded as the stable points and the others are as the unstable points. Then, calculate the missing rate, which is the ratio of actual unstable slopes classified as stable slopes.
Although the missing rate is one indicator that evaluates the model results, the results are subject to the threshold value, which limits the accuracy of results [46]. Therefore, we introduced ROC curve [47] and success rate curve [34] for further evaluation. The two methods are independent of the specific decision threshold and are able to further verify the accuracy of the result. ROC curve method is a measure of the ability to discriminate landslides from non-landslide locations. Success rate curve, different from the ROC curve, only considers the prediction of the landslide samples. The areas under the curve (AUC) for ROC curve and success rate curve (0.5 to 1.0) are used to assess the accuracy of the models.

Landslide Susceptibility Map Using the Proposed Model
Based on the hydrologic analysis tool in ArcGIS environment, we segment the topography of the study area into 55,899 slope units in total. The schematic diagram of slope units is shown in Figure 3.
Appl. Sci. 2020, 10, x FOR PEER REVIEW 7 of 16 corresponding to the weight function with the minimum AIC value is the optimal bandwidth. Equation (3) is used for the calculation of AIC, where is the function of , and is the maximum likelihood estimation for GWR model.

Validation Processes
The GEZM validation process is important; without this, the study lacks scientific credibility. In this research, we first compare the prediction results with actual units free of or containing landslides [45]. Sample points with 30% of the number of units, accounting for totally 291 landslide points, are randomly selected. The values of the predicted result belonging to the low-susceptibility and verylow-susceptibility area are regarded as the stable points and the others are as the unstable points. Then, calculate the missing rate, which is the ratio of actual unstable slopes classified as stable slopes.
Although the missing rate is one indicator that evaluates the model results, the results are subject to the threshold value, which limits the accuracy of results [46]. Therefore, we introduced ROC curve [47] and success rate curve [34] for further evaluation. The two methods are independent of the specific decision threshold and are able to further verify the accuracy of the result. ROC curve method is a measure of the ability to discriminate landslides from non-landslide locations. Success rate curve, different from the ROC curve, only considers the prediction of the landslide samples. The areas under the curve (AUC) for ROC curve and success rate curve (0.5 to 1.0) are used to assess the accuracy of the models.

Landslide Susceptibility Map Using the Proposed Model
Based on the hydrologic analysis tool in ArcGIS environment, we segment the topography of the study area into 55,899 slope units in total. The schematic diagram of slope units is shown in Figure  3. Firstly, in order to test the global multicollinearity, the variance inflation factor (VIF) value of each influencing factor is calculated (results shown in Table 2). A greater VIF value indicates a greater multicollinearity of the influencing factors. A value greater than 10 denotes that multicolinearity problems may exist. In this case, elevation factor for instant, the maximum VIF value is 1.88, which Firstly, in order to test the global multicollinearity, the variance inflation factor (VIF) value of each influencing factor is calculated (results shown in Table 2). A greater VIF value indicates a greater multicollinearity of the influencing factors. A value greater than 10 denotes that multicolinearity Appl. Sci. 2020, 10, 1107 8 of 16 problems may exist. In this case, elevation factor for instant, the maximum VIF value is 1.88, which is apparently lower than 10. Table 2 indicates that all of the influencing factors used in the proposed S-GWR model pass the global multicollinearity test. Then, the local multicollinearity problem is subsequently processed with PCA method. According to the principles of selecting principal components, four principal components are selected in our study, and the components explained 81.10% of the total variances. After the factor rotation, four new principal components were obtained. As is shown in Table 3, the accumulative loadings of four principal components decrease systematically. The first component can be interpreted as geological factor, as it has high positive loadings of distance-to-fault and lithology. The second component has high positive loadings of slope and slope height, indicating that this component represents slope shape factor. In the same way, the third and the fourth component represent hydrographic factor and aspect factor, respectively. The S-GWR model is constructed using the Gaussian kernel function and the AIC method. The neighbors of the model estimated by the AIC method is 1000. According to the prediction results, landslide susceptibility of Qingchuan County is mapped. The landslide susceptibility map is classified into five classes based on the natural breaks method [48], including very low, low, moderate, high, and very-high (shown in Figure 4). The landslide susceptibility map shows that the high-susceptibility area and very-high-susceptibility area are highly consistent with the actual landslide points, and very-low-to very-high-susceptibility classes occupy 46.37%, 25.67%, 17.62%, 8.69%, and 1.65% of study area. Appl. Sci. 2020, 10, x FOR PEER REVIEW 9 of 16

Validation of the Spatial Non-Stationary
In order to validate the spatial non-stationary of the S-GWR model, we explore the spatial distribution of the regression coefficients between four principal components and landslide susceptibility. Results are shown in Figure 5. The coefficients estimated by S-GWR vary greatly with districts in space. This result implies a noticeable spatial non-stationarity of the relationship between four principal components and landslide susceptibility. The regression coefficients of each component have positive and negative values, which indicates that the relevance between each influencing factor and landslide susceptibility has varying direction and strength in space.
Spatial distribution of the regression coefficient of geological factors is approximately from northeast to southwest, similar to the distribution of actual lithology features and fault zones. The coefficient values of geological factors tend to decrease with the increase of the distance to fault zone and the large absolute values are mainly distributed in the metasandstones, phyllite, and sandy slates area. The distribution area with large absolute values of the coefficients of slope shape factor consist of the actual landslide points. In most of the eastern part of the study area, slope shape factor is positively correlated with landslide susceptibility, but the opposite effect exists in some central regions. The majority of the coefficient of hydrographic factor is negative, indicating that there is a significant negative correlation between the distance-to-stream and landslide susceptibility in most regions. This correlation is quite obvious in the central area, where most of the landslides occurred, such as Chaoba township, Courtyard Hui township, and the northeast area near the Bailong river.
The aspect coefficient distribution map shows that the absolute value in the eastern region is larger than that in the western region. Compared with the first three principal components, the difference between the maximum and minimum values of aspect factor coefficients is the smallest.

Validation of the Spatial Non-Stationary
In order to validate the spatial non-stationary of the S-GWR model, we explore the spatial distribution of the regression coefficients between four principal components and landslide susceptibility. Results are shown in Figure 5. The coefficients estimated by S-GWR vary greatly with districts in space. This result implies a noticeable spatial non-stationarity of the relationship between four principal components and landslide susceptibility. The regression coefficients of each component have positive and negative values, which indicates that the relevance between each influencing factor and landslide susceptibility has varying direction and strength in space.
Spatial distribution of the regression coefficient of geological factors is approximately from northeast to southwest, similar to the distribution of actual lithology features and fault zones. The coefficient values of geological factors tend to decrease with the increase of the distance to fault zone and the large absolute values are mainly distributed in the metasandstones, phyllite, and sandy slates area. The distribution area with large absolute values of the coefficients of slope shape factor consist of the actual landslide points. In most of the eastern part of the study area, slope shape factor is positively correlated with landslide susceptibility, but the opposite effect exists in some central regions. The majority of the coefficient of hydrographic factor is negative, indicating that there is a significant negative correlation between the distance-to-stream and landslide susceptibility in most regions. This correlation is quite obvious in the central area, where most of the landslides occurred, such as Chaoba township, Courtyard Hui township, and the northeast area near the Bailong river.
The aspect coefficient distribution map shows that the absolute value in the eastern region is larger than that in the western region. Compared with the first three principal components, the difference between the maximum and minimum values of aspect factor coefficients is the smallest. Appl. Sci. 2020, 10, x FOR PEER REVIEW 10 of 16

Validation of the Accuracy
In order to validate the accuracy and effectiveness of the proposed S-GWR model, we compare the prediction results with actual units free of or containing landslides. Table 4 shows the prediction result of the proposed S-GWR model. Among the 291 actual landslide points, 36 points are classified as stable points with a missing rate of 12.37%. ROC curve and success rate curve are used for further evaluation. Figure 6 shows that the AUC value of ROC curve is 0.859, and the AUC value of success rate curve is 0.850.

Validation of the Accuracy
In order to validate the accuracy and effectiveness of the proposed S-GWR model, we compare the prediction results with actual units free of or containing landslides. Table 4 shows the prediction result of the proposed S-GWR model. Among the 291 actual landslide points, 36 points are classified as stable points with a missing rate of 12.37%. ROC curve and success rate curve are used for further evaluation. Figure 6 shows that the AUC value of ROC curve is 0.859, and the AUC value of success rate curve is 0.850.

Comparison with Grid-Unit-Based GWR (G-GWR)
In order to verify the applicability of slope units in expressing spatial proximity of the GWR, the commonly used grid unit is adopted as a comparison. For the validity of contrast and simplicity of calculation, grid size is selected as 200 × 200 m, approximately equal to the slope unit, generating 80,264 grid units.
The following procedure is the same as the process of the proposed S-GWR. The variance inflation factor (VIF) values of each influencing factor are calculated, with the maximum test result of 2.14 (as shown in Table 5). It indicates that the original G-GWR passes the global multicollinearity test. Based on the principles of selecting principal components, G-GWR also obtained four components, including geological factor, slope shape factor, hydrographic factor, and aspect factor. The establishment of kernel function of G-GWR is the same as S-GWR. The established G-GWR model is then applied to the landslide susceptibility assessment in Qingchuan County. The sample evaluation contingency table of the G-GWR model is shown in Table  6. It shows that 61 points of the 291 actual landslide points are classified as stable points, with a missing rate of 20.96%, which is higher than that of the S-GWR model with 12.37%. The ROC curves

Comparison with Grid-Unit-Based GWR (G-GWR)
In order to verify the applicability of slope units in expressing spatial proximity of the GWR, the commonly used grid unit is adopted as a comparison. For the validity of contrast and simplicity of calculation, grid size is selected as 200 × 200 m, approximately equal to the slope unit, generating 80,264 grid units.
The following procedure is the same as the process of the proposed S-GWR. The variance inflation factor (VIF) values of each influencing factor are calculated, with the maximum test result of 2.14 (as shown in Table 5). It indicates that the original G-GWR passes the global multicollinearity test. Based on the principles of selecting principal components, G-GWR also obtained four components, including geological factor, slope shape factor, hydrographic factor, and aspect factor. The establishment of kernel function of G-GWR is the same as S-GWR. The established G-GWR model is then applied to the landslide susceptibility assessment in Qingchuan County. The sample evaluation contingency table of the G-GWR model is shown in Table 6. It shows that 61 points of the 291 actual landslide points are classified as stable points, with a missing rate of 20.96%, which is higher than that of the S-GWR model with 12.37%. The ROC curves and success rate curves of the two models are shown in Figure 7. The AUC values of ROC curve and success rate curve of G-GWR are 0.837 and 0.827 respectively, both lower than those of S-GWR. The comparative verification of the S-GWR model and the G-GWR model shows that the S-GWR model has a better performance in classification precision, thus can better identify susceptible areas to landslide occurrence. This comparison indicates that slope unit is more suitable for GWR model in the expression of spatial relationship than traditional grid unit, in particular for the purpose of landslide susceptibility assessment. has a better performance in classification precision, thus can better identify susceptible areas to landslide occurrence. This comparison indicates that slope unit is more suitable for GWR model in the expression of spatial relationship than traditional grid unit, in particular for the purpose of landslide susceptibility assessment.

Comparison with OLS, ANN, and I Models
In order to further verify the applicability and accuracy of the landslide susceptibility assessment using the GWR model, we choose the ordinary least squares (OLS) model [40], artificial neural network (ANN) model [32], and information (I) model [49] as compared models, which are commonly used for the susceptibility assessment. The slope units and grid units are adopted as the sampling units of the three models respectively. The ROC curve and success rate curve of eight models are shown in Figures 8 and 9. It can be seen from the figures that for the same unit type, the AUC values of ROC curve and success rate curve of GWR models are both higher than those of ANN, I, and OLS models. It shows that the GWR model performs better, and further implies the importance of spatial non-stationarity in landslide data sets. As to the results by the same model via different unit types, the AUC values of ROC curve and success rate curve of the models with slope unit are all higher than those of the models with grid unit. It indicates that adopting slope unit as mapping unit can express the landslide characteristics better than the traditional grid unit, highlighting that slope unit is more suitable for landslide assessment.

Comparison with OLS, ANN, and I Models
In order to further verify the applicability and accuracy of the landslide susceptibility assessment using the GWR model, we choose the ordinary least squares (OLS) model [40], artificial neural network (ANN) model [32], and information (I) model [49] as compared models, which are commonly used for the susceptibility assessment. The slope units and grid units are adopted as the sampling units of the three models respectively. The ROC curve and success rate curve of eight models are shown in Figures 8 and 9. It can be seen from the figures that for the same unit type, the AUC values of ROC curve and success rate curve of GWR models are both higher than those of ANN, I, and OLS models. It shows that the GWR model performs better, and further implies the importance of spatial non-stationarity in landslide data sets. As to the results by the same model via different unit types, the AUC values of ROC curve and success rate curve of the models with slope unit are all higher than those of the models with grid unit. It indicates that adopting slope unit as mapping unit can express the landslide characteristics better than the traditional grid unit, highlighting that slope unit is more suitable for landslide assessment.

Conclusions
In this paper, a spatial proximity-based geographically weighted regression (S-GWR) model is proposed for assessing the landslide susceptibility. The presented model solves the issues of the spatial non-stationarity between landslide factors and its occurrence that usually neglected in previous landslide susceptibility assessment studies. In order to express spatial proximity properly, slope units are adopted. The multicollinearity between the data is eliminated through VIF method and principal component analysis.
The Qingchuan area in China after the 2008 Ms. 8.0 Wenchuan earthquake is selected as a case study to illustrate the effect and validity of the proposed S-GWR model. The result shows that the four influencing factors (including geological factor, slope shape factor, hydrographic factor, and

Conclusions
In this paper, a spatial proximity-based geographically weighted regression (S-GWR) model is proposed for assessing the landslide susceptibility. The presented model solves the issues of the spatial non-stationarity between landslide factors and its occurrence that usually neglected in previous landslide susceptibility assessment studies. In order to express spatial proximity properly, slope units are adopted. The multicollinearity between the data is eliminated through VIF method and principal component analysis.
The Qingchuan area in China after the 2008 Ms. 8.0 Wenchuan earthquake is selected as a case study to illustrate the effect and validity of the proposed S-GWR model. The result shows that the four influencing factors (including geological factor, slope shape factor, hydrographic factor, and

Conclusions
In this paper, a spatial proximity-based geographically weighted regression (S-GWR) model is proposed for assessing the landslide susceptibility. The presented model solves the issues of the spatial non-stationarity between landslide factors and its occurrence that usually neglected in previous landslide susceptibility assessment studies. In order to express spatial proximity properly, slope units are adopted. The multicollinearity between the data is eliminated through VIF method and principal component analysis.
The Qingchuan area in China after the 2008 Ms. 8.0 Wenchuan earthquake is selected as a case study to illustrate the effect and validity of the proposed S-GWR model. The result shows that the four influencing factors (including geological factor, slope shape factor, hydrographic factor, and aspect factor) in the S-GWR model all have noticeable spatial non-stationary effects on the landslide susceptibility. By quantitatively testing, the missing rate of S-GWR is 12.37%, which is much lower than that of G-GWR of 20.96%, and the AUC values of ROC curve and success rate curve of S-GWR (0.859 and 0.850, respectively) are both higher than those of G-GWR (0.837 and 0.827, respectively). Besides, the AUC values of the ROC curve and success rate curve of the GWR models are higher than those of ANN, I, and OLS models, and the accuracy of each model using slope unit is higher than those of the grid unit with similar cell size.
Our study verifies the importance of considering the spatial non-stationary and the applicability of the GWR model in the landslide susceptibility assessment. It also suggests that slope unit can better express the spatial relationship between landslide data and make the evaluation results more accurate.