Source Identification and Apportionment of Trace Elements in Soils in the Yangtze River Delta, China

Trace element pollution has attracted considerable attention worldwide. However, it is difficult to identify and apportion the sources of multiple trace element pollutants over large areas because of the considerable spatial complexity and variability in the distribution of trace elements in soil. In this study, we collected a total of 2051 topsoil (0–20 cm) samples and analyzed the general pollution status of soils in the Yangtze River Delta, Southeast China. We applied principal component analysis (PCA), a finite mixture distribution model (FMDM), and geostatistical tools to identify and quantitatively apportion the sources of seven trace elements (chromium (Cr), cadmium (Cd), mercury (Hg), copper (Cu), zinc (Zn), nickel (Ni), and arsenic (As)) in soil. The PCA results indicated that the trace elements in soil in the study area were mainly from natural, mixed (multi-pollutant), and industrial sources. The FMDM also fitted three log-normal sub-distributions. The results from the two models were very similar: Cr, As, and Ni were mainly from natural sources related to parent material weathering; Cd, Cu, and Zn were mainly from mixed sources, with a considerable portion from anthropogenic activities such as traffic pollutants, domestic garbage, and agricultural inputs; and Hg was mainly from industrial wastes and pollutants.


Introduction
Trace element pollution has attracted considerable attention around the world [1][2][3][4][5]. Soils contaminated with trace elements are of particular concern because of the high toxicity of these elements and their resistance to degradation [6]. Trace elements that enter ecosystems and, via food chains, the human body pose high risks to ecosystem function and human health [7][8][9]. Soil trace elements generally derive from natural or anthropogenic sources [10,11]. Formerly, trace elements were introduced to soils mainly through the weathering of parent materials, but with the rapid expansion of urban, industrial, and agricultural activities in recent years, such as electroplating, fossil fuel combustion, and the overuse of chemical fertilizers, high trace element concentrations in soil now mainly reflect human activities [3,12]. The geographical distribution of trace elements in soil is complex [7]. To remediate polluted soils and prevent further pollution, we need to identify and apportion the sources of trace elements both qualitatively and quantitatively. This information can help decision-makers understand how trace element concentrations vary spatially and why they fluctuate erratically, which is a necessary prerequisite for mitigating and preventing soil pollution [13].
The identification and apportionment of soil trace element sources date back to the 1970s [14]. Various methods are currently used to identify the sources of trace elements in soil, including chemical mass balance (CMB) [15], molecular markers (MM) [16], isotope tracing (IT) [17], UNMIX models [18], factor analysis (FA) [19], positive matrix factorization (PMF) [20], and absolute principal component scores-multiple linear regression (APCS-MLR) [21]. The application of such models is generally limited by their demand for extensive prior information and large, high-resolution datasets, for example, on the number of pollution sources, the properties of soils in the source areas, and the relative stability of source transport processes [22,23].
Principal component analysis (PCA) is a widely used multivariate statistical method that transforms multidimensional data into a small number of uncorrelated variables, reducing the dimensionality of the data while retaining most of the information in the original variables [24][25][26]. The finite mixture distribution model (FMDM), by contrast, is a mathematical approach for the statistical modeling of large random datasets [27]. It can be applied to determine whether the trace elements in soil samples originate from natural background or anthropogenic sources without prior knowledge of the overall contamination status of the study area [28,29]. To date, however, few studies have applied and compared different models to identify the sources of trace elements. To fill this gap, we compared the abilities of PCA and FMDM to identify and apportion the sources of trace elements in soils of the Yangtze River Delta. The main objectives of this study were to (1) describe the concentrations and distribution of trace elements in soil and assess the pollution status of the Yangtze River Delta; (2) apply PCA and FMDM to determine the sources of seven trace elements over a large area; and (3) confirm and apportion the main sources of trace elements in soil.

Study Area and Sampling
The study area is a crucial port in the Yangtze River Delta, China (Figure 1). The area is dominated by plains and low hills, with an average elevation of about 4.2 m; elevation generally decreases from southwest to northeast. A total of 2051 topsoil (0-20 cm) samples were collected. The study area was first stratified by land use type, and systematic grid sampling was then applied; at some grid nodes, additional samples were collected from nearby areas (Figure 1). At each sampling point, the sample collected at the grid intersection was combined with five subsamples collected from locations within 5 m to form a composite sample. The coordinates of the sampling locations were recorded with a differential global positioning system.
We air-dried the soil samples in the laboratory at ambient conditions for several days, then passed them through a 2-mm nylon sieve and stored them for further analysis of soil properties [30]. Soil pH was measured by the glass electrode method at a soil/solution ratio of 1:2.5 (m/v). For total Cr, As, Cu, Zn, and Ni, soil samples were digested with HCl-HNO3-HClO4; for total Cd, they were digested with HF-HNO3-HClO4. For total Hg, samples were digested with HNO3-HCl in a hot water bath and analyzed by double-channel atomic fluorescence spectrometry. Reagent blanks and standard reference materials were used throughout the analytical procedure for quality assurance and quality control, and the recoveries of the trace elements ranged from 90% to 110% [31].
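As a minimal illustration of the recovery check described above, the sketch below computes a blank-corrected recovery for a reference material and tests it against the 90-110% window. The formula used here (measured value minus reagent blank, divided by the certified value) is an assumption about how recovery was computed, and all numbers are hypothetical.

```python
# Hypothetical QA/QC check: recovery of a certified reference material (CRM).
# Assumption: recovery = (measured CRM value - reagent blank) / certified value x 100.

def recovery_percent(measured, certified, blank=0.0):
    """Blank-corrected recovery (%) of a reference material."""
    return 100.0 * (measured - blank) / certified


def within_acceptance(measured, certified, blank=0.0, low=90.0, high=110.0):
    """True if the recovery falls inside the 90-110% window reported in the text."""
    return low <= recovery_percent(measured, certified, blank) <= high


# Hypothetical values (mg/kg), not measurements from this study
print(recovery_percent(0.195, 0.200, blank=0.005))  # about 95%
print(within_acceptance(0.250, 0.200))               # 125% recovery -> False
```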

Auxiliary Variables Data
The study area is known for its industrial prosperity: in 2016, its total industrial output accounted for 45% of its total economic output. We obtained information on 48,206 industrial enterprises in the study area from Baidu Map POI (Point of Interest) data listing all the industrial enterprises in 2016. We removed irrelevant non-industrial entries and then classified the industrial enterprises into four categories: textiles, metal products, chemical products, and other industries. Information on the soil parent materials was obtained from a 1:20,000 soil map of Zhejiang Province (1990) [32], and information on soil types from a 1:50,000 Map of Chinese Soil (1990) published by the National Soil Survey of China.

Single Pollution Index (SPI)
We used the single pollution index (SPI) to assess the degree of trace element pollution in soil. The SPI is calculated as follows:
$$P_i = \frac{C_i}{S_i}$$
where Pi is the SPI of trace element i in soil, Ci is the measured concentration of trace element i, and Si is the corresponding regulatory standard value in China [33]. When Pi is less than or equal to 1, the content of trace element i is within a safe range; when Pi is between 1 and 2, it slightly exceeds the standard value; when Pi is between 2 and 3, the soil may be moderately contaminated by trace element i; and when Pi is greater than 3, the soil may be severely contaminated by trace element i.
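A minimal Python sketch of the SPI classification, assuming Pi = Ci/Si and treating each class as closed on its upper bound (the text does not state how values exactly at 1, 2, or 3 are assigned). The standard value and concentrations in the usage line are hypothetical, not the Chinese regulatory values cited in [33].

```python
from collections import Counter

CLASSES = ("safe", "slight", "moderate", "severe")


def single_pollution_index(c, s):
    """P_i = C_i / S_i: measured concentration divided by the regulatory value."""
    return c / s


def spi_class(p):
    """Map P_i to the four classes in the text (upper bounds assumed inclusive)."""
    if p <= 1:
        return "safe"
    if p <= 2:
        return "slight"
    if p <= 3:
        return "moderate"
    return "severe"


def class_shares(concentrations, standard):
    """Percentage of samples in each SPI class."""
    counts = Counter(spi_class(single_pollution_index(c, standard)) for c in concentrations)
    return {k: 100.0 * counts[k] / len(concentrations) for k in CLASSES}


# Hypothetical concentrations (mg/kg) and a hypothetical standard value
print(class_shares([0.2, 0.4, 0.6, 1.2, 1.8], standard=0.5))
# {'safe': 40.0, 'slight': 20.0, 'moderate': 20.0, 'severe': 20.0}
```

Summaries of this kind (share of sampling points per class) are what the SPI results are reported as.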

Ordinary Kriging (OK)
We used ordinary kriging (OK) to map the spatial variation of each soil trace element across the study area. The OK estimator is as follows [34,35]:
$$Z^*(x_0) = \sum_{i=1}^{n} \phi_i Z(x_i), \qquad \sum_{i=1}^{n} \phi_i = 1$$
where Z*(x0) is the predicted value at the unsampled location x0, Z(xi) is the observed value at the sampled location xi, n is the number of observed samples used in the prediction, and ϕi is the optimal weight of the ith observation; the weights sum to 1 and yield an unbiased prediction with minimum variance.
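The kriging weights ϕi are obtained by solving the ordinary kriging system, in which a Lagrange multiplier enforces that the weights sum to 1. The sketch below assumes an exponential semivariogram with hypothetical nugget, sill, and range parameters and synthetic sample data; the study used ArcGIS Geostatistical Analyst, so this illustrates the technique rather than reproducing the reported maps.

```python
import numpy as np


def exponential_variogram(h, nugget=0.0, sill=1.0, range_=1000.0):
    """Hypothetical exponential semivariogram; gamma(0) = 0 by definition."""
    gamma = nugget + (sill - nugget) * (1.0 - np.exp(-3.0 * h / range_))
    return np.where(h > 0, gamma, 0.0)


def ordinary_kriging(xy, z, x0, **vario):
    """Predict Z*(x0) = sum(phi_i * Z(x_i)) with weights constrained to sum to 1."""
    n = len(z)
    # Semivariances between all pairs of sampled locations
    d = np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=-1)
    a = np.ones((n + 1, n + 1))
    a[:n, :n] = exponential_variogram(d, **vario)
    a[n, n] = 0.0  # Lagrange-multiplier row/column enforces sum(phi) = 1
    # Semivariances between the sampled locations and the target x0
    b = np.ones(n + 1)
    b[:n] = exponential_variogram(np.linalg.norm(xy - x0, axis=1), **vario)
    sol = np.linalg.solve(a, b)
    phi, mu = sol[:n], sol[n]
    z_hat = phi @ z
    kriging_variance = phi @ b[:n] + mu
    return z_hat, phi, kriging_variance


# Hypothetical sample coordinates (m) and concentrations (mg/kg)
rng = np.random.default_rng(1)
xy = rng.uniform(0, 5000, size=(30, 2))
z = rng.lognormal(mean=np.log(30), sigma=0.4, size=30)
z_hat, phi, var = ordinary_kriging(xy, z, np.array([2500.0, 2500.0]), sill=1.0, range_=1500.0)
print(round(z_hat, 2), round(phi.sum(), 6))
```

With a zero nugget, ordinary kriging is an exact interpolator: predicting at a sampled location returns the observed value with zero kriging variance, which is a convenient sanity check on the system above.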

Principal Component Analysis (PCA)
We used PCA to distinguish different groups of trace elements. PCA uses an orthogonal transformation to convert a set of observations of possibly correlated variables into a set of linearly uncorrelated variables (principal components). We calculated the correlation matrix from the non-transformed data, performed an orthogonal rotation with Kaiser normalization, and extracted the factors with eigenvalues greater than 1, allowing a maximum of 25 iterations [36].
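The PCA workflow described above (correlation matrix, retention of components with eigenvalues greater than 1, orthogonal rotation with Kaiser normalization, at most 25 iterations) can be sketched in Python with NumPy as follows. We assume the orthogonal rotation is varimax, which the text does not name, and the seven-variable dataset is synthetic; the study itself used the R psych package, whose output may differ in column order and sign.

```python
import numpy as np


def varimax(loadings, max_iter=25, tol=1e-6):
    """Varimax rotation with Kaiser normalization (rows scaled by communality)."""
    h = np.sqrt((loadings ** 2).sum(axis=1, keepdims=True))
    a = loadings / h
    p, k = a.shape
    rot = np.eye(k)
    crit_old = 0.0
    for _ in range(max_iter):
        lam = a @ rot
        # Gradient of the varimax criterion, projected onto an orthogonal matrix via SVD
        grad = a.T @ (lam ** 3 - lam @ np.diag((lam ** 2).sum(axis=0)) / p)
        u, s, vt = np.linalg.svd(grad)
        rot = u @ vt
        crit = s.sum()
        if crit_old and crit < crit_old * (1 + tol):
            break
        crit_old = crit
    return (a @ rot) * h  # undo the Kaiser normalization


def pca_kaiser(x):
    """PCA on the correlation matrix, keeping components with eigenvalue > 1."""
    corr = np.corrcoef(x, rowvar=False)
    eigval, eigvec = np.linalg.eigh(corr)
    order = np.argsort(eigval)[::-1]
    eigval, eigvec = eigval[order], eigvec[:, order]
    keep = eigval > 1.0
    loadings = eigvec[:, keep] * np.sqrt(eigval[keep])
    explained = eigval[keep] / eigval.sum()  # variance shares of the unrotated components
    return varimax(loadings), explained


# Synthetic data: three hypothetical latent sources driving seven variables
rng = np.random.default_rng(0)
n = 2000
f = rng.standard_normal((n, 3))
groups = [0, 0, 0, 1, 1, 2, 2]
noise = np.array([0.5, 0.5, 0.5, 0.5, 0.5, 0.8, 0.8])
x = f[:, groups] + noise * rng.standard_normal((n, 7))
rotated, explained = pca_kaiser(x)
dominant = np.abs(rotated).argmax(axis=1)  # component on which each variable loads most
print(np.round(explained, 2), dominant)
```

Variables that share a latent source end up loading on the same rotated component, which is the basis for grouping elements such as Cr, As, and Ni.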

Finite Mixture Distribution Model (FMDM)
For a random variable x, if a mixture distribution consists of m components and the distribution of the ith component is described by a probability density function (pdf) fi(x), then the overall pdf f(x) of the mixture distribution can be expressed as [27,37]:
$$f(x) = \sum_{i=1}^{m} \pi_i f_i(x), \qquad \sum_{i=1}^{m} \pi_i = 1$$
where πi is the mixing weight of the ith sub-distribution. Many natural processes follow a normal or a log-normal distribution. Here, we used the log-normal pdf to describe the trace element contents of soils from different sources [38]:
$$f_i(x) = \frac{1}{x\,\sigma_i\sqrt{2\pi}} \exp\left[-\frac{(\ln x - \mu_i)^2}{2\sigma_i^2}\right]$$
where μi and σi are the mean and standard deviation of ln x in the ith sub-distribution, which can be estimated with the expectation maximization (EM) algorithm [39]. We then used the chi-square goodness-of-fit test to test the null hypothesis H0 that the fitted model is consistent with the observed distribution. Once these parameters have been determined, the cut-off value between the ith and (i + 1)th components can be calculated.
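A minimal EM sketch of the log-normal FMDM: because each fi(x) is log-normal, the mixture can be fitted as a normal mixture on ln x. Two choices here are assumptions: the number of components is selected by BIC (the default criterion of the mclust package, although the text does not state how the number was chosen), and the cut-off is defined as the concentration at which the weighted densities of adjacent sub-distributions are equal, because the study's cut-off formula is not given in the text. The chi-square goodness-of-fit step is omitted, and the data are synthetic.

```python
import numpy as np


def normal_logpdf(y, mu, sd):
    """Log density of a normal distribution (applied to y = ln x)."""
    return -0.5 * ((y - mu) / sd) ** 2 - np.log(sd * np.sqrt(2.0 * np.pi))


def fit_lognormal_mixture(x, m, n_iter=1000, tol=1e-8):
    """EM fit of an m-component log-normal mixture, i.e. a normal mixture on ln x."""
    y = np.log(np.asarray(x, dtype=float))
    mu = np.quantile(y, (np.arange(m) + 0.5) / m)  # spread initial means over the data
    sd = np.full(m, y.std())
    w = np.full(m, 1.0 / m)
    ll_old = -np.inf
    for _ in range(n_iter):
        # E-step: responsibility of each sub-distribution for each sample
        dens = w * np.exp(normal_logpdf(y[:, None], mu, sd))
        total = dens.sum(axis=1, keepdims=True)
        r = dens / total
        ll = np.log(total).sum()
        # M-step: update mixing weights, log-means and log-standard deviations
        nk = np.maximum(r.sum(axis=0), 1e-12)
        w = nk / len(y)
        mu = (r * y[:, None]).sum(axis=0) / nk
        sd = np.maximum(np.sqrt((r * (y[:, None] - mu) ** 2).sum(axis=0) / nk), 1e-6)
        if ll - ll_old < tol:
            break
        ll_old = ll
    # Log-likelihood on the ln x scale (differs from the x scale by a constant)
    ll = np.log((w * np.exp(normal_logpdf(y[:, None], mu, sd))).sum(axis=1)).sum()
    order = np.argsort(mu)
    return w[order], mu[order], sd[order], ll


def bic(ll, m, n):
    """BIC with 3m - 1 free parameters; lower is better in this convention."""
    return -2.0 * ll + (3 * m - 1) * np.log(n)


def cutoff(w, mu, sd, i):
    """Assumed rule: concentration where weighted sub-densities i and i+1 are equal."""
    grid = np.linspace(mu[i], mu[i + 1], 20001)
    g = (np.log(w[i]) + normal_logpdf(grid, mu[i], sd[i])
         - np.log(w[i + 1]) - normal_logpdf(grid, mu[i + 1], sd[i + 1]))
    idx = np.nonzero(np.diff(np.sign(g)))[0]
    return float(np.exp(grid[idx[0]])) if idx.size else float("nan")


# Synthetic data: a hypothetical low-value and a hypothetical high-value population
rng = np.random.default_rng(42)
x = np.concatenate([rng.lognormal(np.log(0.1), 0.3, 700),
                    rng.lognormal(np.log(0.5), 0.3, 300)])
fits = {m: fit_lognormal_mixture(x, m) for m in (1, 2, 3)}
best_m = min(fits, key=lambda m: bic(fits[m][3], m, len(x)))
w, mu, sd, _ = fits[best_m]
cut = cutoff(w, mu, sd, 0) if best_m > 1 else float("nan")
print(best_m, np.round(w, 2), np.round(np.exp(mu), 3), round(cut, 3))
```

Since both log-normal densities share the 1/x factor, equality of the weighted densities on the ln x scale is the same as equality on the concentration scale, so the cut-off can be searched for in log space.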

Data Analysis
In this study, we performed the descriptive statistical analyses in Microsoft Excel 2016 (Office 2016, Redmond, WA, USA) and used the Geostatistical Analyst tool in ArcGIS 10.2 (ESRI, Redlands, CA, USA) for kriging interpolation. PCA of the seven soil trace elements was carried out with the psych package in R 3.4.2 [40], and the FMDM was fitted in R 3.4.2 with the mclust package [41].

Summary Statistical Analysis of Soil Trace Elements
The average concentrations of Cr, Cd, Hg, As, Cu, Zn, and Ni in the study area were 67.72, 0.197, 0.288, 6.58, 34.77, 110.67, and 29.22 mg/kg, respectively (Table 1). Apart from Cr, the average concentrations of all the trace elements were higher than their background values; those of Hg, Cd, Zn, and Cu were much higher, suggesting that these trace elements may have been affected by human activities [26].
The coefficient of variation (CV) reflects the degree of dispersion of the sample data [42]. The CVs of the soil trace elements in the study area ranked as follows: Hg (104.55%) > Ni (56.96%) > Cd (50.68%) > Cu (47.99%) > Cr (43.41%) > As (40.33%) > Zn (33.0%). The highly uneven distribution of Hg (CV above 100%) suggests anthropogenic impacts, and the other trace elements also showed varying degrees of variability.
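The CV is the standard deviation divided by the mean, expressed as a percentage. The short sketch below assumes the sample standard deviation (ddof=1), which the text does not specify, and uses hypothetical concentrations to show how elements can be ranked by CV.

```python
import numpy as np


def coefficient_of_variation(values):
    """CV (%) = sample standard deviation / mean x 100 (ddof=1 assumed)."""
    x = np.asarray(values, dtype=float)
    return 100.0 * x.std(ddof=1) / x.mean()


# Hypothetical concentrations (mg/kg), not data from this study
samples = {
    "Hg": [0.1, 0.5, 0.9],
    "Zn": [100.0, 110.0, 120.0],
}
ranking = sorted(samples, key=lambda e: coefficient_of_variation(samples[e]), reverse=True)
print(ranking)  # ['Hg', 'Zn']
```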

Analysis of Auxiliary Variable Data
Previous studies have shown that industrial pollution is a major source of trace element pollution in soil [44][45][46]. Trace elements in industrial wastewater, waste gas, and waste residues enter the soil via sewage irrigation, garbage dumping, and atmospheric deposition, resulting in high trace element concentrations in the soil [46][47][48]. The density of industrial enterprises in the study area (Figure 2) was highest for metal product enterprises, followed by chemical product and textile enterprises, and lowest for other enterprises; most enterprises were clustered around the urban areas.
As Table 2 indicates, Cr and Ni concentrations increased with the number of textile enterprises, reflecting the quantitative impact of the textile industry on these two trace elements. Similarly, chemical product enterprises influenced the concentrations of Hg, Cu, Zn, and Ni; metal product enterprises influenced the concentrations of Cr, Hg, Cu, and Zn; and other enterprises influenced the Ni, Hg, and Cu concentrations. In general, as the total number of enterprises increased, the contents of all seven trace elements increased markedly, which shows that industrial activities were a major contributor to trace elements in soils.
Rhyolitic and tuffaceous residual-slope facies dominated the parent materials in the study area (Table 3 and Figure 3a). Lacustrine and marsh parent materials extended over much of the north-central part of the study area, while coastal sedimentary parent materials covered large areas of the northern and southern parts. The main soil types (Table 4 and Figure 3b) were paddy, fluvo-aquic, red, coastal saline, skeletal, yellow, and purple soils. Paddy soils and red soils dominated the study area: red soils were distributed across the entire study area, and paddy soils were mainly distributed over most of the central and northern parts. Fluvo-aquic and coastal saline soils were mostly distributed in the northern coastal areas, and skeletal soils in the central and southern regions.

Assessment of Trace Element Pollution
The SPI values showed that, apart from Hg, the soils were only slightly polluted by trace elements. For the other six trace elements, more than 90% of the sampling points had Pi values less than or equal to 1 (Table 5), indicating that these elements posed little risk to the environment or ecosystems. In contrast, there were obvious signs of Hg contamination in the study area: 18.97%, 6.05%, and 5.70% of the soils were slightly, moderately, and severely polluted by Hg, respectively, so that in total 30.72% of the soils were adversely affected by Hg.

Spatial Distribution of Soil Trace Elements
Figure 4 presents the general spatial distribution of trace elements in soils in the study area. Cr, As, and Ni concentrations were highest in the central urban areas and the coastal areas (Figure 4a,d,g). Table 3 shows that Cr, As, and Ni were mainly associated with aleurite and silt facies (with silt facies dominant in the coastal areas), whereas the highest concentrations of these trace elements in the urban centers indicate that they were also closely related to anthropogenic activities. As shown in Figure 4b,f, areas with high Cd and Zn concentrations were relatively scattered, mainly in the northern and central urban areas; the chemical industries probably influenced the Cd and Zn concentrations (Table 2). Concentrations of Hg and Cu were highest in the central parts of the urban areas (Figure 4c,e), close to the metal and textile industries (Table 2).

Source Identification Based on PCA
Cattell's scree plot (Figure 5) shows that the seven trace elements fell into three main components, and the PCA results (Table 6) show that these three components together explained 70.0% of the total variance. The first principal component (PC1) comprised Cr, As, and Ni; the second (PC2) comprised Cd, Cu, and Zn; and the third (PC3) represented Hg. PC1, PC2, and PC3 explained 30%, 25%, and 15% of the variance, respectively.
PC1 explained 30% of the variance (Table 6), with factor loadings of 0.88, 0.73, and 0.87 for Cr, As, and Ni, respectively, suggesting that Cr, As, and Ni shared a common source. The OK interpolation map of PC1 (Figure 6a) shows that PC1 gradually decreased from the coast to the inland areas, with high values mainly confined to the city centers, the northeast, and the southern coastal areas. This pattern closely matches the spatial distributions described above, so these trace elements were attributed mainly to natural sources.
PC2 explained 25% of the variance (Table 6), with factor loadings of 0.81, 0.72, and 0.82 for Cd, Cu, and Zn, respectively, which suggests a common source. However, because Zn and Cu also had loadings of 0.45 and 0.28, respectively, on PC1, they may have partly shared the PC1 sources. Areas with high PC2 values (Figure 6b) were scattered across the study area, with clusters in the northwestern, southern, and central urban parts. This suggests that PC2 represented mixed pollutant sources, such as traffic exhaust, domestic garbage, and agricultural inputs.
PC3 explained 15% of the variance, with a factor loading of 0.67 for Hg. The OK interpolation map of PC3 (Figure 6c) shows that high values were mainly distributed in the central parts of the urban areas, which closely matches the spatial distribution of Hg (Figure 4c). This indicates that industrial pollution was the main source of Hg.

Source Identification Based on FMDM
Overall, the FMDM results showed that Cr, As, Hg, and Ni conformed to log-normal mixture distributions, whereas Cd, Cu, and Zn each conformed to a single log-normal distribution; the fits for all seven trace elements passed the chi-square goodness-of-fit test (p > 0.05).
Cd, Cu, and Zn fitted the single log-normal distribution model (Figure 7b,e,f). The FMDM could not clearly identify whether these trace elements had natural or anthropogenic sources, because their sources did not differ significantly across the study area. However, the modelled average concentrations were higher than the soil background values, which suggests anthropogenic contributions. These three trace elements may therefore have derived in part from human activities that generate multiple pollutants, including urban development, population growth, domestic garbage, traffic pollution, and agricultural inputs.
The concentrations of Cr, As, and Ni each fitted a double log-normal distribution model (Figure 7a,d,g), whose two sub-distributions in theory represent natural and anthropogenic populations. As shown in Table 7, the modelled means of the natural sub-distributions of Cr, As, and Ni were lower than the soil background values. The means of the anthropogenic sub-distributions exceeded the soil background values but were still within the background ranges (Table 1); the FMDM therefore identified the two sub-distributions as a low-value natural distribution and a high-value mixed natural-anthropogenic distribution. This suggests that Cr, As, and Ni may have been influenced by anthropogenic activities, but that these impacts were not much greater than those of the high-value natural background. In such cases, the anthropogenic contributions of these trace elements would not be apparent.
The Hg concentrations fitted a triple log-normal distribution model (Table 7 and Figure 7c), in which one natural distribution and two anthropogenically influenced distributions were identified. The modelled mean of the natural distribution was below the soil background value, whereas the modelled means of both anthropogenically influenced distributions far exceeded the soil background value and the upper limit of its range (Table 1), showing that anthropogenic industrial activities were a major influence on Hg contents in soils in the study area.

Conclusions
Source identification and apportionment of soil trace elements are important prerequisites for remediating trace element-contaminated soils and preventing further contamination. In this study, we identified and quantitatively apportioned the sources of seven trace elements in soils of the Yangtze River Delta using PCA and FMDM. Our results showed that Cr, As, and Ni in soil were mainly from natural sources through the weathering of parent materials; Cd, Cu, and Zn were mainly from multiple mixed sources, such as traffic pollutants, domestic garbage, and agricultural inputs; and Hg was mainly from industrial waste and discharges. These findings provide valuable information on trace element contamination for policymakers and administrators.