Acoustic–Seismic Mixed Feature Extraction Based on Wavelet Transform for Vehicle Classification in Wireless Sensor Networks

An acoustic–seismic mixed feature extraction method based on the wavelet coefficient energy ratio (WCER) of the target signal is proposed in this study for classifying vehicle targets in wireless sensor networks. The signal was decomposed into a set of wavelet coefficients using the à trous algorithm, which is a concise method used to implement the wavelet transform of a discrete signal sequence. After the wavelet coefficients of the target acoustic and seismic signals were obtained, the energy ratio of each layer coefficient was calculated as the feature vector of the target signals. Subsequently, the acoustic and seismic features were merged into an acoustic–seismic mixed feature to improve the target classification accuracy after the acoustic and seismic WCER features of the target signal were simplified using the hierarchical clustering method. We selected the support vector machine method for classification and utilized the data acquired from a real-world experiment to validate the proposed method. The calculated results show that the WCER feature extraction method can effectively extract the target features from target signals. Feature simplification can reduce the time consumption of feature extraction and classification, with no effect on the target classification accuracy. The use of acoustic–seismic mixed features effectively improved target classification accuracy by approximately 12% compared with either acoustic signal or seismic signal alone.


Introduction
Wireless sensor networks (WSNs) consist of nodes capable of sensing, signal processing, and communicating. In addition to object detection, also of interest are the inherent properties of control and activation in WSNs [1]. One application of WSNs is the classification of moving vehicles in the interested area, which is a precondition for further decision-making in military operations. Moreover, vehicle classification is an important signal processing task and has found widespread civilian applications, such as intelligent transportation systems [2] and real-time traffic surveillance [3].
Signal feature extraction is the basis of vehicle classification and significantly affects the results of classification. Feature extraction methods of target signals applied to WSNs can be classified into two categories: Data fusion algorithms and algorithms based on a special sensor signal.
The feature extraction algorithm, based on data fusion, requires at least two signal sequences of the vehicle. The Haar discrete wavelet transform (DWT)-based method [4] is a vehicle feature extraction method based on data fusion. This method was proposed after combining the multi-sensor data fusion method with the wavelet transform (WT) method. This method employs a maximum entropy measure to determine significant wavelet coefficients. Features are obtained by calculating the analysis, and denoising [15]. After wavelet composition of the signal, the wavelet coefficients represent the signal components in different frequency bands, making it feasible to differentiate signals of different targets.

Wavelet Decomposition Based on à Trous Algorithm
The wavelet decomposition method and reconstruction method are based on WT. WT decomposes a signal into wavelet coefficients by using a set of base functions [16]. A set of functions gained following basic wavelet function Ψ(t) were flexed in scale and displaced in the time domain.
where Ψ(t) is the wavelet "prototype", which can be considered a bandpass function [17], a is the contraction-expansion or scale factor, and b is the displacement factor. Instead of continuously varying the parameters, as in the case of the continuous wavelet transform [18], which contains a large amount of redundant information, we can analyze the signal with a smaller number of scales with a varying number of translations at each scale. This is the DWT process [19]. The à trous algorithm [20] is one of the most concise methods to implement the fast DWT. The algorithm utilizes the equivalent translocation property of the z-transform and performs a convolution operation of orthogonal filter coefficients and signal to calculate the wavelet coefficients of the signal sequence after inserting a certain number of zeros between the coefficients of the orthogonal filter. Figure 1 shows the scheme of the three-level wavelet decomposition based on the à trous algorithm. x(k) is the signal sequence and h j and g j are the orthogonal filter bank coefficients of the (j − 1) level in the wavelet decomposition process of the signal. cA j (k) indicates the wavelet approximation coefficients of the jth level and represents the low-frequency components of the current wavelet decomposition of the signal. cD j (k) indicates the wavelet detail coefficients of the jth level and represents the high-frequency components of the current wavelet decomposition of the signal. h j represents h 0 up-sampled by 2 j , while g j represents g 0 up-sampled by 2 j . When using the à trous algorithm to decompose the target signal, the orthogonal filter bank coefficients of the upper level are interpolated to obtain the filter coefficients of the current level first and, thereafter, the input sequence of the current level is convolved with the orthogonal filter banks to calculate the wavelet coefficients. The approximation coefficient obtained at the previous level is utilized as the input signal of the next level. The formulas for the calculation of cA j (k) and cD j (k) are as follows: h j (l)cA j−1 (k − l + 1), h j (l) = 0 while l > L j (2) cD j (k) = cD j−1 (k) ⊗ g j = k ∑ l=1 g j (l)cD j−1 (k − l + 1), g j (l) = 0 while l > L j (3) where ⊗ represents a convolution operation; k = 1, 2, . . . , K and K is the length of the signal sequence; and j = 1, 2, . . . , J and J is the desired wavelet composition depth. The lengths of the orthogonal filter bank coefficients, g i and h i , at each decomposition level are different. L j represents the length of the orthogonal filter bank coefficients at the jth level. cA 0 (k) is equal to the initial signal sequence, x(k), when calculating cA 1 (k) and cD 1 (k) using Equations (2) and (3), respectively. The length of the result sequence after the convolution operation is K + L j − 1, which indicates that the sequence length after convolution at different levels of wavelet decomposition is not the same. To simplify the calculation, we intercept a sequence of length K from the middle of the convolution result sequence at each wavelet decomposition level. Considering the wavelet decomposition of the first layer as an example, the convolution result of the signal, x(k), and decomposition high-pass filters, g0, is the wavelet detail coefficient, cD1(k), and the wavelet approximation coefficient, cA1(k), is obtained using the convolved signal, x(k), and the decomposition low-pass filters, h0. cA1(k), is, thereafter, utilized as the input signal of the second level. After this wavelet decomposition algorithm is completed, the signal, x(k), is decomposed into a set of approximation coefficients and detail coefficients.

Wavelet Coefficients Energy Ratio (WCER) Method
As the wavelet coefficients of the signal correspond to different frequency bands of the target signal with different signal spectrum distribution, the wavelet coefficients of the target signals have different characteristics and can be utilized to extract spectrum features for classification purposes.
The difference on spectrums of different target signals make the energy of the signal component in different frequency bands discrepant, hence, the wavelet coefficients of acoustic and seismic signals of different targets vary. WCER, which can indicate the difference between energies of coefficients, can intuitively reflect the energy difference of signals in different frequency bands. WCER can be expressed as the wavelet detail coefficient energy ratio (WDcer) and wavelet approximation coefficient energy ratio (WAcer), which are defined as follows: where J is the desired wavelet transform depth, PD(j) represents the jth wavelet detail coefficient energy, and PA(j) represents the jth wavelet approximation coefficient energy. The formulas for calculating PD(j) and PA(j) are as follows: where N is the length of the wavelet coefficient, which is also the length of the signal sequence. After WDcer and WAcer are calculated, the feature of the target signal based on WCER can be obtained. The feature vector of the target signal is denoted as f.

Signal Feature Extraction
The feature vector should be extracted after a specified length of the signal sequence is acquired. Assuming the feature of the target signal sequence is extracted every K points, the target signal that is acquired is divided into N segments. The process of feature extraction based on WCER is shown in Figure 2. Considering the wavelet decomposition of the first layer as an example, the convolution result of the signal, x(k), and decomposition high-pass filters, g 0 , is the wavelet detail coefficient, cD 1 (k), and the wavelet approximation coefficient, cA 1 (k), is obtained using the convolved signal, x(k), and the decomposition low-pass filters, h 0 . cA 1 (k), is, thereafter, utilized as the input signal of the second level. After this wavelet decomposition algorithm is completed, the signal, x(k), is decomposed into a set of approximation coefficients and detail coefficients.

Wavelet Coefficients Energy Ratio (WCER) Method
As the wavelet coefficients of the signal correspond to different frequency bands of the target signal with different signal spectrum distribution, the wavelet coefficients of the target signals have different characteristics and can be utilized to extract spectrum features for classification purposes.
The difference on spectrums of different target signals make the energy of the signal component in different frequency bands discrepant, hence, the wavelet coefficients of acoustic and seismic signals of different targets vary. WCER, which can indicate the difference between energies of coefficients, can intuitively reflect the energy difference of signals in different frequency bands. WCER can be expressed as the wavelet detail coefficient energy ratio (WDcer) and wavelet approximation coefficient energy ratio (WAcer), which are defined as follows: where J is the desired wavelet transform depth, PD(j) represents the jth wavelet detail coefficient energy, and PA(j) represents the jth wavelet approximation coefficient energy. The formulas for calculating PD(j) and PA(j) are as follows: where N is the length of the wavelet coefficient, which is also the length of the signal sequence. After WDcer and WAcer are calculated, the feature of the target signal based on WCER can be obtained. The feature vector of the target signal is denoted as f. f = [WDcer(1), · · · , WDcer(J); WAcer(1), · · · , WAcer(J)]

Signal Feature Extraction
The feature vector should be extracted after a specified length of the signal sequence is acquired. Assuming the feature of the target signal sequence is extracted every K points, the target signal that is acquired is divided into N segments. The process of feature extraction based on WCER is shown in Figure 2.  Figure 2 shows the process of feature extraction based on WCER. J is the desired wavelet decomposition depth and K is the length of each segment of the signal. cDj(k) (k = 1, 2, …, K) represents the wavelet detail coefficient and cAj(k) represents the wavelet approximation coefficient of the jth level. F represents the feature matrix of the target signal. f = [W1, W2, …, Wm] represents the feature vector and fi is the feature vector of the ith segment of the target signal. Wm is the mth variables of the feature vector, f, and [w1m, w2m, …, wNm] is the observation sequence of the variable Wm.
If the sensors acquire acoustic and seismic signals of targets at a sampling rate of 4096 Hz and the signal generated by the target lasts 10 s, then the sensor node acquired 40,960 points sequence. With the K set to 512, then the acquired signal sequence is divided into 80 segments and every 512 points is taken as a segment for feature extracting using the WCER method. This means point 1 to point 512 is the first segment, point 513 to point 1024 is the second segment, and the entire data sequence is segmented in this way.
Subsequently, we utilize the feature extraction method, based on the WCER, to process the realworld acquired signal of vehicles in WSNs. The datasets were gathered from a real-world experiment, the third SensIT situational experiment [12]. In the experiment, 75 sensor nodes with acoustic, seismic, and infrared sensing capability were located at the Marine Corps Air Ground Combat Center. Testing runs were performed by driving different kinds of vehicles across the testing field. Four target vehicle classes were used: Assault amphibian vehicle (AAV), main battle tank, highmobility multipurpose wheeled vehicle, and dragon wagon (DW). The acoustic and seismic data of each run were recorded and the sampling rate of the signals was 4096 Hz. There were three paths in the experimental area for the vehicles to pass and a testing run was accomplished by driving the vehicle on the path. A series of numbers was utilized to indicate these runs, e.g., DW3, DW4, DW5.  Figure 2 shows the process of feature extraction based on WCER. J is the desired wavelet decomposition depth and K is the length of each segment of the signal. cD j (k) (k = 1, 2, . . . , K) represents the wavelet detail coefficient and cA j (k) represents the wavelet approximation coefficient of the jth level. F represents the feature matrix of the target signal. f = [W 1 , W 2 , . . . , W m ] represents the feature vector and f i is the feature vector of the ith segment of the target signal. W m is the mth variables of the feature vector, f, and [w 1m , w 2m , . . . , w Nm ] is the observation sequence of the variable W m .
If the sensors acquire acoustic and seismic signals of targets at a sampling rate of 4096 Hz and the signal generated by the target lasts 10 s, then the sensor node acquired 40,960 points sequence. With the K set to 512, then the acquired signal sequence is divided into 80 segments and every 512 points is taken as a segment for feature extracting using the WCER method. This means point 1 to point 512 is the first segment, point 513 to point 1024 is the second segment, and the entire data sequence is segmented in this way.
Subsequently, we utilize the feature extraction method, based on the WCER, to process the real-world acquired signal of vehicles in WSNs. The datasets were gathered from a real-world experiment, the third SensIT situational experiment [12]. In the experiment, 75 sensor nodes with acoustic, seismic, and infrared sensing capability were located at the Marine Corps Air Ground Combat Center. Testing runs were performed by driving different kinds of vehicles across the testing field. Four target vehicle classes were used: Assault amphibian vehicle (AAV), main battle tank, high-mobility multipurpose wheeled vehicle, and dragon wagon (DW). The acoustic and seismic data of each run were recorded and the sampling rate of the signals was 4096 Hz. There were three paths in the experimental area for the vehicles to pass and a testing run was accomplished by driving the vehicle on the path. A series of numbers was utilized to indicate these runs, e.g., DW3, DW4, DW5. The acoustic and seismic data of each run were recorded by the sensor nodes deployed at the side of the road. The details of the experiment are described in the paper mentioned above.
K, which is also the length of the segment of the target signal sequence, was set to 512 and the desired wavelet transform depth, J, was set to 8. In addition, a nearly orthogonal design of a bi-orthogonal filter bank, shown in Table 1, was utilized as the orthogonal filter bank to decompose the signal sequence. Table 1. Coefficients of quasi-orthogonal bi-orthogonal filters.
Subsequently, we utilize the feature extraction method, based on the WCER, to extract the acoustic and seismic features of AAV3 runs.
In the AAV3 run, the acoustic and seismic signals of AAV were sensed and acquired by 18 sensor nodes in the WSN monitoring system. All the acquired signal sequences were divided into 4676 segments at intervals of 512 points for feature extraction. Figure 3b shows the extracted features of the segmented acoustic signal sequence using the WCER method and Figure 3d shows the extracted seismic features. The horizontal axis represents the number of signal segments, which also could be considered as a set of observations of a certain feature variable, while the vertical axis represents the WCER of each segmented signal. The acoustic and seismic data of each run were recorded by the sensor nodes deployed at the side of the road. The details of the experiment are described in the paper mentioned above. K, which is also the length of the segment of the target signal sequence, was set to 512 and the desired wavelet transform depth, J, was set to 8. In addition, a nearly orthogonal design of a biorthogonal filter bank, shown in Table 1, was utilized as the orthogonal filter bank to decompose the signal sequence.  Subsequently, we utilize the feature extraction method, based on the WCER, to extract the acoustic and seismic features of AAV3 runs.
In the AAV3 run, the acoustic and seismic signals of AAV were sensed and acquired by 18 sensor nodes in the WSN monitoring system. All the acquired signal sequences were divided into 4676 segments at intervals of 512 points for feature extraction. Figure 3b shows the extracted features of the segmented acoustic signal sequence using the WCER method and Figure 3d shows the extracted seismic features. The horizontal axis represents the number of signal segments, which also could be considered as a set of observations of a certain feature variable, while the vertical axis represents the WCER of each segmented signal.

Feature Vector Simplification Using Hierarchical Cluster Method
The acoustic and seismic signal features of AAV3, shown in Figure 3, indicate that differences between parts of the WCER feature variables are evident and can reflect the target characteristic, whereas the differences between the remaining feature variables are insignificant. Simplifying the feature vector will remove the redundant variables in the feature vector, which can decrease the computational complexity of the subsequent target classification and effectively reduce the time consumption of the process of target classification. The hierarchical cluster method [21] and principal component analysis (PCA) method [22] are two effective approaches to reduce the dimensions of the feature matrix.

Hierarchical Cluster Method
The hierarchical cluster method can aggregate variables into several clusters based on their similarity. After determining representative variables from aggregated clusters to reduce the number of variables, the dimension of the feature vector will be reduced. The feature vector of the signal obtained using the WCER method is f = [W1, W2, …, Wm], m = 2J, and J is the desired wavelet decomposition depth. The observation matrix of the feature vector of the target signal is denoted as F, whose column vector is the data sequence of the variable in the feature vector, f. By analyzing the sample observation matrix, the main variables in the feature vector are selected to realize its dimensionality reduction.
The hierarchical cluster method utilizes the distance between variables in the feature vector, f, and between clusters in the process of clustering to quantify the similarities between variables and the clusters. First, a pair of the nearest feature variables is selected to be merged into a new class according to the calculated feature variable distance matrix, D, expressed in Equation (7), then the two closest clusters are merged into a class. This process is repeated until a preset number of clusters is achieved: where dist (Wu,Wv) represents the distance between the variable, Wu, and the variable, Wv. Wu is the uth variable in the feature vector and Wv is the vth variable. Figure 4 shows the distance between variables in the feature vector and between clusters in the clustering process. d (r,s) represents the distance between the cluster, r, and the cluster, s.

Feature Vector Simplification Using Hierarchical Cluster Method
The acoustic and seismic signal features of AAV3, shown in Figure 3, indicate that differences between parts of the WCER feature variables are evident and can reflect the target characteristic, whereas the differences between the remaining feature variables are insignificant. Simplifying the feature vector will remove the redundant variables in the feature vector, which can decrease the computational complexity of the subsequent target classification and effectively reduce the time consumption of the process of target classification. The hierarchical cluster method [21] and principal component analysis (PCA) method [22] are two effective approaches to reduce the dimensions of the feature matrix.

Hierarchical Cluster Method
The hierarchical cluster method can aggregate variables into several clusters based on their similarity. After determining representative variables from aggregated clusters to reduce the number of variables, the dimension of the feature vector will be reduced. The feature vector of the signal obtained using the WCER method is f = [W 1 , W 2 , . . . , W m ], m = 2J, and J is the desired wavelet decomposition depth. The observation matrix of the feature vector of the target signal is denoted as F, whose column vector is the data sequence of the variable in the feature vector, f. By analyzing the sample observation matrix, the main variables in the feature vector are selected to realize its dimensionality reduction.
The hierarchical cluster method utilizes the distance between variables in the feature vector, f, and between clusters in the process of clustering to quantify the similarities between variables and the clusters. First, a pair of the nearest feature variables is selected to be merged into a new class according to the calculated feature variable distance matrix, D, expressed in Equation (7), then the two closest clusters are merged into a class. This process is repeated until a preset number of clusters is achieved: where dist (W u ,W v ) represents the distance between the variable, W u , and the variable, W v . W u is the uth variable in the feature vector and W v is the vth variable.  Figure 4 shows the distance between variables in the feature vector and between clusters in the clustering process. d (r,s) represents the distance between the cluster, r, and the cluster, s. The distance between the variables, Wu and Wv, is calculated using their correlation coefficient and the calculation method is expressed in Equation (8).
where R (Wu,Wv) represents the correlation coefficient of Wu and Wv. C(Wu,Wv) is the covariance of Wu and Wv, and it is calculated as follows: After obtaining the distance matrix, we must also calculate the distance between the clusters in the clustering process. In this study, the unweighted pair grouping method with arithmetic mean (UPGMA) [23] was utilized to calculate the distance between clusters. UPGMA is a widely used bottom-up hierarchical clustering method that defines cluster similarity [24] in terms of the average pairwise distance between all the objects in two different clusters. If the two clusters in the clustering process are the cluster, r, and the cluster, s, the formula for calculating the distance between them utilizing the UPGMA is expressed in Equation (10).
where nr is the number of objects in cluster r and ns is the number of objects in cluster s. Wrp is the pth subject in cluster r and Wsq is the qth subject in cluster s.

PCA Method
The PCA method is another efficient approach to reduce the dimension of the feature vector. This approach reduces the dimensions of the feature by extracting several factors that contribute the most to signal differences in the observation matrix of the target feature vector.
The feature vector of the signal is f = [W1, W2, …, Wm] and the column vector, [wi1, wi2, …, wim], in the feature matrix, F, is the data sequence of the variable, Wi, in the feature vector, f. The variable should be normalized according to Equation (11) before performing principal component analysis. The distance between the variables, W u and W v , is calculated using their correlation coefficient and the calculation method is expressed in Equation (8).
where R (W u ,W v ) represents the correlation coefficient of W u and W v . C(W u ,W v ) is the covariance of W u and W v , and it is calculated as follows: After obtaining the distance matrix, we must also calculate the distance between the clusters in the clustering process. In this study, the unweighted pair grouping method with arithmetic mean (UPGMA) [23] was utilized to calculate the distance between clusters. UPGMA is a widely used bottom-up hierarchical clustering method that defines cluster similarity [24] in terms of the average pairwise distance between all the objects in two different clusters. If the two clusters in the clustering process are the cluster, r, and the cluster, s, the formula for calculating the distance between them utilizing the UPGMA is expressed in Equation (10).
where n r is the number of objects in cluster r and n s is the number of objects in cluster s. W rp is the pth subject in cluster r and W sq is the qth subject in cluster s.

PCA Method
The PCA method is another efficient approach to reduce the dimension of the feature vector. This approach reduces the dimensions of the feature by extracting several factors that contribute the most to signal differences in the observation matrix of the target feature vector. The feature vector of the signal is f = [W 1 , W 2 , . . . , W m ] and the column vector, [w i1 , w i2 , . . . , w im ], in the feature matrix, F, is the data sequence of the variable, W i , in the feature vector, f. The variable should be normalized according to Equation (11) before performing principal component analysis.
where µ i is the mean and S i is the standard deviation of the sequence of the variable, W i . After calculating the correlation coefficient of the variables in the feature vector, the correlation coefficient matrix can be achieved.
Then, all the eigenvalues, λ i (i = 1, 2, . . . , m), of the correlation coefficient matrix, C, and the corresponding eigenvectors, [u i1 , u i2 , . . . , u im ], are calculated. Then the eigenvectors consists of m new indicator variables shown as follows: · · · · · · · · · y m = u m1 · W 1 + u m2 · W 2 + · · · + u mm · W m (13) where y i is the ith principal component. The contribution rate of the principal component, which means how much the new indicator variable contributes to the difference of the target features, is calculated from its corresponding eigenvector: When the sum of the former P contribution rates of the of new indicator variables is approximately 1 (usually 0.9 or 0.95), the former P indicator variables, [y 1 , y 2 , . . . , y P ], are selected as the new features vector instead of the original m features.

Comprasion of Two Feature Simplification Methods
To show the performance of these two methods, we calculated the time consumption of the signal feature extraction using the methods for feature dimensionality reduction. We also calculated the classification accuracy of the features after reducing the dimensions using two methods. The acoustic feature data of AAV3 and AAV4 after dimension reduction were used to train the SVM classifier. Subsequently, we used the trained classifier to classify the dataset of the other 15 runs (AAV5-AAV11, DW5-DW12) after their acoustic feature data were simplified using the two methods. The classification accuracy is denoted as Ac. We evaluated the accuracy as follows: The classification accuracy of the ith dataset is denoted as Ac(i) (i = 1, 2, . . . , M) and the average classification accuracy Aca is: where M is the number of datasets, acquired from different targets, that have been classified. A comparison of the time needed and classification accuracy when using the hierarchical clustering and principal factor analysis methods to simplify features is shown in Table 2. fa_HC represents the acoustic features of the target after being simplified by hierarchical clustering and fa_PCA represents the acoustic features of the target after being simplified by principal factor analysis. It can be seen from Table 2 that after training the classifier, when using hierarchical clustering to reduce the dimensions of the feature, the feature extraction time is slightly less than when using principal factor analysis and the classification accuracy is higher. Therefore, the hierarchical clustering method was chosen for simplifying the feature vector.

Feature Vector Simplification Using Hierarchical Clustering
The acoustic and seismic signal features of AAV3 were considered as examples for simplifying the variables in the feature vectors using the hierarchical clustering method. Figure 5 shows the clustering results of the feature vectors of AAV3. Eight-level wavelet decomposition of the acoustic and seismic signals of AAV3 yields two feature vectors with 16 variables. The acoustic feature vector is denoted as fa and the seismic feature vector is denoted as fs. After clustering variables with a distance less than 1, the variables with large differences in the feature vectors are retained and the variables with small differences are aggregated into a cluster. when using principal factor analysis and the classification accuracy is higher. Therefore, the hierarchical clustering method was chosen for simplifying the feature vector.

Feature Vector Simplification Using Hierarchical Clustering
The acoustic and seismic signal features of AAV3 were considered as examples for simplifying the variables in the feature vectors using the hierarchical clustering method. Figure 5 shows the clustering results of the feature vectors of AAV3. Eight-level wavelet decomposition of the acoustic and seismic signals of AAV3 yields two feature vectors with 16 variables. The acoustic feature vector is denoted as fa and the seismic feature vector is denoted as fs. After clustering variables with a distance less than 1, the variables with large differences in the feature vectors are retained and the variables with small differences are aggregated into a cluster.
The 16 variables in the acoustic feature vector, (fa = [W 1 ,⋯,W 16 ]), were reduced to 9 after feature vector simplification. The variables, W1, W2, W3, and W4, were aggregated into a cluster and replaced with the variable, W4. The variable, W12, was utilized instead of the variables, W9, W10, W11, and W12, after they were aggregated into a cluster. Further, the variables, W15 and W16, were aggregated into a cluster and the variable, W15, was subsequently utilized instead.  (7)].
The results of using simplified features for target classification will be discussed in Section 5.1. The 16 variables in the acoustic feature vector, ( fa = [W 1 , · · · , W 16 ]), were reduced to 9 after feature vector simplification. The variables, W 1 , W 2 , W 3 , and W 4 , were aggregated into a cluster and replaced with the variable, W 4 . The variable, W 12 , was utilized instead of the variables, W 9 , W 10 , W 11 , and W 12 , after they were aggregated into a cluster. Further, the variables, W 15 and W 16 , were aggregated into a cluster and the variable, W 15 , was subsequently utilized instead.
The 16 variables in the acoustic feature vector, ( fs = [W 1 , · · · , W 16 ])), were reduced to 7. The variables, W 1 , W 2 , W 3 , and W 4 , were aggregated into a cluster and replaced with the variable, W 4 . The variable, W 14 , was utilized instead of the variables, W 9 , W 10 , W 11 , W 12 , W 13 , and W 14 , after they were aggregated into a cluster. Further, the variables, W 15 and W 16 , were aggregated into a cluster and the variable, W 15 , was subsequently utilized instead.  (7)].
The results of using simplified features for target classification will be discussed in Section 5.1.

Acoustic-Seismic Mixed Feature
After simplifying the features, we utilized the simplified acoustic and seismic features of the signal to classify the target.

Support Vector Machine Classifier
The commonly used classification methods are k-nearest neighbor (KNN) [25], decision tree (DT) [26], naive Bayes (NB) [27], and support vector machine (SVM). We used the acoustic feature data of AAV3 and AAV4 to train the classifiers and classify the acoustic feature data of the other 15 runs (AAV5-AAV11, DW5-DW12). Then, average classification accuracy and time needed were recorded to compare the performance of these methods in vehicle classification. Table 3 shows the average classification accuracy and average time needed for different classification methods. The results indicate that KNN has the highest accuracy, but the classification takes too much time. The time needed for the DT and NB methods is very short, but their classification accuracy is lower than that of the KNN and SVM. In summary, the SVM method has satisfactory classification accuracy and acceptable time needs. Therefore, the SVM algorithm was used as a classifier to validate the performance of the proposed WCER feature extraction method before and after feature simplification.
The SVM classification algorithm was performed by using the C support vector classifier (C-SVC) shared in LIBSVM [28], a commonly used SVM software library.
The principle of classification using SVM is to consider a multivariate feature as an independent point in a multidimensional space and, thereafter, utilize the training data to determine an optimal hyperplane to classify the independent points. Summarized mathematically, C-SVC is aimed at solving the following optimization problem: under the following constraints: where N t is the number of training vectors, f i is the sample feature vector for training, and y i is the class label corresponding to the feature vector. Function, φ(f i ), maps the feature vector, f i , to a high-dimensional space and R is the regularization parameter, which is set to 1 in this study. Usually, we solve the following dual problem: under the following constraints: where e = [1, . . . , 1] T is the unit vector. Q is an N t by N t positive semi-definite matrix. Q ij = y i y j K(f i f j ), is the kernel function. Moreover, the function for classifying a feature, f, is as follows: The label of the class is 1 or −1 for representing different kinds of targets, which indicates that y i ∈ {1, −1}. Furthermore, we utilized a radial basis function kernel [29], with the following format: where the parameter, γ, in the kernel function is set to 0.5.

Vehicle Classification Using SVM Classifier
We extracted features from the acoustic and seismic signal data collected from 18 sensor nodes in runs AAV3, AAV4, DW3, and DW4. These extracted features were used to train the SVM classifier. Subsequently, we used the trained classifier to classify the dataset of the other 15 runs (AAV5-AAV11, DW5-DW12) before and after simplifying the acoustic and seismic WCER features.
The classification results, listed in Table 4, show that the classification accuracy of AAV is high, but that of DW was unsatisfactory, when classifying the vehicles using simplified acoustic features. By contrast, the classification accuracy of AAV was low and that of DW was satisfactory when the simplified seismic features were used. Therefore, we mixed acoustic and seismic features of the target signal to achieve satisfactory classification accuracy for both AAV and DW. The acoustic-seismic mixed features of the target signal can be described by the vectors, fmix and fmix = [fanew; fsnew].  Figure 6 shows the acoustic-seismic mixed features of signals acquired from AAV3 ( Figure 6a) and DW3 (Figure 6b). The results of using acoustic-seismic mixed features for target classification are discussed in Section 5.2.

Performance of WCER Method
As there is no effective approach to directly evaluate the performance of the proposed WCER feature extraction method, we evaluated the method indirectly by comparing the performance of an SVM classifier using different features.

Effects of Feature Simplification and Mixed Features on Classification Accuracy
To study the performance when using simplified features for target classification, we compared the classification accuracy obtained using the features before and after simplification. The features extracted from runs AAV3, AAV4, DW3, and DW4 were utilized to train the SVM classifier before and after simplification. Subsequently, the trained classifier was used to classify the target using the features extracted from the dataset of the other 15 runs (AAV5-AAV11, DW5-DW12) before and after simplification. The set of parameters of the classifier is described in detail in Section 4. The classification accuracy obtained by using the acoustic-seismic mixed features was calculated similarly for studying the effect of feature mixing on vehicle classification. Table 5 shows the classification accuracy obtained by using different feature vectors based on the WCER method. fa (fs) indicates that the targets were classified using the feature extracted from the acoustic (seismic) signal of the target without simplification and fanew (fsnew) indicates that the targets were classified using the feature extracted from the acoustic (seismic) signal of the target after feature simplification. fmix represents the classification of vehicles using acoustic-seismic mixed features of the target signal.
To further clarify the classification accuracy obtained by using different extracted features, Table 4 summarizes the average classification accuracy.
We also calculated the time consumption of target classification using these features to compare the efficiency of using different features. Two types of time use were calculated to compare the efficiency: Time consumption of the feature extraction procedure and the time needed for subsequent target classification using the extracted features. To compare the time consumption of the feature extraction step, we utilized the feature extraction methods to process the dataset of 15 runs (AAV5-AAV11, DW5-DW12) and calculated the average time consumed by feature extraction in each run. Subsequently, the average time consumption of target classification using these feature vectors was recorded to study the effect of using different feature vectors on the time consumption of the target classification. The average classification accuracy and average time consumption when using different feature vectors are shown in Table 6. The results show that the WCER feature extraction method can effectively extract the target features from the target signals. When using the target acoustic and seismic features extracted using the WCER method to classify vehicles, the classification accuracy reached approximately 67%. Feature simplification can reduce the time consumption of feature extraction and classification, while having no effect on target classification accuracy. Compared with the use of feature vectors before simplification, the time consumption after feature simplification decreased slightly and the time consumption of target classification decreased by almost half. The average classification accuracy obtained by using simplified WCER features of the acoustic signal was 0.56% lower than that obtained by using unsimplified features, and the classification accuracy decreased by 0.38% after the features extracted from the seismic signal were simplified. This is mainly because the removed parts of the feature vectors in the simplification process were redundant and had no effect on the overall characteristics of the target signal. Using mixed features can effectively improve the target classification accuracy. Compared with the use of acoustic or seismic features, classification accuracy was improved by approximately 12% when the acoustic-seismic mixed feature of the target signal was used.

Comparison with FFT-Based Feature Extraction Method
The experimental results in Section 5.1 show that the WCER feature extraction method can effectively extract the characteristics of the target signal and that the feature mixing approach can improve classification accuracy. However, whether the performance of this feature extraction method is better than that of the existing methods is yet to be determined.
The FFT-based feature extraction method mentioned in Section 1 is a mature and commonly used feature extraction method. To extract the features of the target signal, the FFT of the acoustic signals is computed every 512 points. Subsequently, the averages of every two points of the first 100 FFT values that have been normalized are used as the extracted features of 50 dimensions for target classification. We used this feature extraction method to compare with the proposed hybrid feature extraction method.
The data of runs AAV3, AAV4, DW3, and DW4 were used for training and the features extracted from the dataset of the other 15 runs (AAV5-AAV11, DW5-DW12) were classified by the trained SVM classifier.
The average time consumptions of the FFT-based and the WCER feature extraction methods were calculated to compare the efficiency of these methods.
As shown in Tables 7 and 8, the average classification accuracy of the WCER method was 2.88% higher than that of the FFT-based method when the features extracted from the target acoustic signal were used. This is mainly because only the energy of the low-frequency band of the target signal is extracted by the FFT-based method and utilized for target classification. Compared with the FFT-based method, the proposed WCER feature extraction method utilizes all the signal spectrum energy characteristics to classify targets. We also observed that the time consumed by the proposed WCER method to extract features from the target acoustic signal is less than that of the FFT-based method because wavelet decomposition of the signal sequence takes less time than the FFT operation of the same signal sequence. Moreover, the number of variables in the signal feature extracted by the WCER method is less than the number of feature variables extracted by the FFT-based method, resulting in less time spent on subsequent target classification when using the proposed WCER method. Compared with the FFT-based feature extraction method, average classification accuracy is improved by 15.5% when using acoustic-seismic mixed features. At the same time, the time consumption of target feature extraction is increased significantly. This is because the increased number of calculations leads to more time spent as the WCER method requires extraction of not only the features of the target acoustic signal, but also of the seismic features of the same target when using the acoustic-seismic mixed features.

Conclusions
In this paper, we proposed a novel feature extraction method named WCER based on WT for vehicle classification in WSNs. The acoustic-seismic mixed feature extraction method is presented after the acoustic and seismic WCER features of the target signal were simplified using the hierarchical clustering method. The flow of the proposed method for extracting acoustic-seismic mixed features of the target is shown in Figure 7. Finally, we studied the target classification accuracy and time consumption of the proposed feature extraction method. The experiment results show that classification accuracy of the targets detected in WSN is effectively improved by using the acoustic-seismic mixed feature of the target.
Compared with the FFT-based feature extraction method, average classification accuracy is improved by 15.5% when using acoustic-seismic mixed features. At the same time, the time consumption of target feature extraction is increased significantly. This is because the increased number of calculations leads to more time spent as the WCER method requires extraction of not only the features of the target acoustic signal, but also of the seismic features of the same target when using the acoustic-seismic mixed features.

Conclusions
In this paper, we proposed a novel feature extraction method named WCER based on WT for vehicle classification in WSNs. The acoustic-seismic mixed feature extraction method is presented after the acoustic and seismic WCER features of the target signal were simplified using the hierarchical clustering method. The flow of the proposed method for extracting acoustic-seismic mixed features of the target is shown in Figure 7. Finally, we studied the target classification accuracy and time consumption of the proposed feature extraction method. The experiment results show that classification accuracy of the targets detected in WSN is effectively improved by using the acousticseismic mixed feature of the target. Another advantage of using the wavelet-based feature extraction method is that the wavelet coefficients can be processed using the thresholding method [30] for denoising before calculating the WCER of the target signal. Thus, the combination of the denoising and feature extraction methods can effectively reduce the time consumption of the entire signal processing of WSN from denoising to target classification. Another advantage of using the wavelet-based feature extraction method is that the wavelet coefficients can be processed using the thresholding method [30] for denoising before calculating the WCER of the target signal. Thus, the combination of the denoising and feature extraction methods can effectively reduce the time consumption of the entire signal processing of WSN from denoising to target classification.

Conflicts of Interest:
The authors declare no conflict of interest.

Abbreviations
The following abbreviations are used in this manuscript