Adaptive Multiclass Mahalanobis Taguchi System for Bearing Fault Diagnosis under Variable Conditions

Bearings are vital components in industrial machines, so diagnosing faults of rolling element bearings and ensuring normal operation is essential. However, fault diagnosis of rolling element bearings under variable conditions and adaptive feature selection have rarely been discussed. It is therefore essential to develop a practicable method for handling faults under variable conditions. To address these issues, this paper builds on the Mahalanobis Taguchi System (MTS) and overcomes two of its shortcomings: (1) MTS is an effective tool for classifying faults and is robust to operating conditions, but it can only handle binary classification problems; this paper constructs a multiclass measurement scale to deal with multi-classification problems. (2) MTS can determine important features, but it uses a hard threshold to select them; this paper selects a proper feature sequence instead of a threshold, which overcomes the poor adaptivity of the threshold configuration for signal-to-noise gain. The resulting method, named adaptive Multiclass Mahalanobis Taguchi System (aMMTS), is combined with variational mode decomposition (VMD) and singular value decomposition (SVD) and employed to diagnose faults under variable conditions. Finally, the method is verified using signal data collected from the Case Western Reserve University Bearing Data Center. The results show that it is accurate for bearing fault diagnosis under variable conditions.


Introduction
Rolling element bearings are widely used in industrial machines and are among their most critical components. If faults occur in bearings, equipment can be damaged and disasters may follow. Therefore, it is essential to monitor the health condition of bearings. The analysis of vibration signals has been a hot research topic and is used to detect bearing faults. It is crucial to recognize faults occurring in bearings as early as possible and avoid fatal breakdowns. For decades, many researchers have conducted extensive research on fault diagnosis. At present, fault diagnosis methods are divided into model-based methods and data-driven methods. The model-based methods generally build on the physics of the process, generating the residuals between the measured process variables and their estimates [1], such as Hidden Markov Modeling (HMM) [...] less adaptivity of the threshold configuration for signal-to-noise gain. During feature selection, the threshold is normally set as a constant, which might result in overfitting. If the threshold value is too large, some critical features might be eliminated. Conversely, if the threshold value is too low, some useless or harmful features might be selected. Therefore, if the threshold value does not match the training data sufficiently, misclassifications emerge.
To overcome this drawback, this paper presents a novel method named adaptive Multiclass Mahalanobis Taguchi System (aMMTS) for bearing fault diagnosis. The method extends the MTS to multi-classification by treating each condition as a separate benchmark. The result is based on the MDs between a sample and each benchmark: the sample receives the label of the benchmark whose MD is minimum. The method can be described briefly as follows. Firstly, after the Mahalanobis space (MS) is constructed by the two-level orthogonal array of the Taguchi method, aMMTS calculates the MDs from the data to the benchmark data and obtains the features' signal-to-noise ratios (SNRs) and gain values by using the two-level orthogonal array. Secondly, the features are selected adaptively by recalculating the MDs after arranging the features' gain values in ascending and descending order. The proposed method is therefore able to select the best classification result according to the adaptively chosen sequence of features' SNRs instead of a hard threshold. Here, the sequence of SNRs is determined by a function that selects several maximum or minimum features to calculate the MDs. Finally, the set of features with the best results is selected as the final feature vector. In this method, two different sets of training samples are employed to calculate the SNRs and to obtain the final feature vectors, respectively. With these improvements, the proposed aMMTS is capable of overcoming the drawback of the conventional MTS and preventing over-fitting. Therefore, the aMMTS is insensitive to the operating conditions and can be employed for bearing fault diagnosis.
Moreover, this method is combined with variational mode decomposition (VMD) [29] and singular value decomposition (SVD) to diagnose the faults. VMD is an entirely non-recursive algorithm and is used to decompose the signal. It has been shown that, owing to the nonlinear vibration characteristics of bearings, VMD is more efficient than empirical mode decomposition (EMD) and the Fourier transform (FT) under variable conditions. SVD is used to extract the features. Therefore, VMD and SVD are employed in this paper.
The rest of this article is organized as follows: Section 2 introduces the algorithms involved in this paper. Section 3 describes the experiments that validate the proposed method. Section 4 presents the conclusions.

Methodology
In this paper, the main steps of fault diagnosis are signal decomposition, feature extraction and fault detection. The detailed process is as follows. Step 1: VMD in conjunction with wavelet denoising is employed to eliminate noise and decompose the raw signals. Step 2: Features are extracted from the decomposed signals by SVD. Step 3: The proposed aMMTS is employed for fault diagnosis. The steps of this method are shown in Figure 1.

VMD
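After wavelet denoising is used to estimate the noise of the raw signal, this paper employs VMD to decompose the non-stationary signals. VMD decomposes a signal $f$ into several simple intrinsic mode functions whose center frequencies and bandwidths are band-limited and determined by iteratively searching for the optimal solution of a variational model. The constrained problem is given as:

$$\min_{\{\mu_k\},\{\omega_k\}} \left\{ \sum_{k} \left\| \partial_t \left[ \left( \delta(t) + \frac{j}{\pi t} \right) * \mu_k(t) \right] e^{-j\omega_k t} \right\|_2^2 \right\} \quad \text{s.t.} \quad \sum_{k} \mu_k = f$$

where $\mu_k$ are the sub-signals and $\omega_k$ represents the center frequencies of the sub-modes. The optimal solution is obtained as a minimization problem, which is addressed by introducing a quadratic penalty and Lagrangian multipliers $\lambda$ [30]:

$$L(\{\mu_k\},\{\omega_k\},\lambda) = \alpha \sum_{k} \left\| \partial_t \left[ \left( \delta(t) + \frac{j}{\pi t} \right) * \mu_k(t) \right] e^{-j\omega_k t} \right\|_2^2 + \left\| f(t) - \sum_{k} \mu_k(t) \right\|_2^2 + \left\langle \lambda(t),\, f(t) - \sum_{k} \mu_k(t) \right\rangle$$

where $\alpha$ denotes the balancing parameter of the constraint. $\hat{\mu}_k^{1}$, $\omega_k^{1}$ and $\hat{\lambda}^{1}$ are initialized to all zeros. In iteration $n$, all sub-signals $\mu_k$ are updated for all $\omega \geq 0$ as follows:

$$\hat{\mu}_k^{n+1}(\omega) = \frac{\hat{f}(\omega) - \sum_{i<k} \hat{\mu}_i^{n+1}(\omega) - \sum_{i>k} \hat{\mu}_i^{n}(\omega) + \hat{\lambda}^{n}(\omega)/2}{1 + 2\alpha\,(\omega - \omega_k^{n})^2}$$

All center frequencies $\omega_k$ of the sub-modes are updated as follows:

$$\omega_k^{n+1} = \frac{\int_0^{\infty} \omega\, |\hat{\mu}_k^{n+1}(\omega)|^2 \, d\omega}{\int_0^{\infty} |\hat{\mu}_k^{n+1}(\omega)|^2 \, d\omega}$$

After the loop over $k$ ends, $\hat{\lambda}$ is updated:

$$\hat{\lambda}^{n+1}(\omega) = \hat{\lambda}^{n}(\omega) + \tau \left( \hat{f}(\omega) - \sum_{k} \hat{\mu}_k^{n+1}(\omega) \right)$$

The iteration is repeated until:

$$\sum_{k} \frac{\left\| \hat{\mu}_k^{n+1} - \hat{\mu}_k^{n} \right\|_2^2}{\left\| \hat{\mu}_k^{n} \right\|_2^2} < \varepsilon$$

Here $\tau$ is the update step of the multiplier and $\varepsilon$ is the convergence tolerance, as in the standard VMD formulation [30].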
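The update rules of VMD can be illustrated with a short NumPy sketch. It is a simplified illustration rather than the implementation used in this paper: it works on the one-sided spectrum without the mirror extension of reference implementations, measures frequency in cycles per sample, and the function name `vmd_sketch` and the defaults for `tau`, `tol` and `max_iter` are assumptions introduced here.

```python
import numpy as np

def vmd_sketch(signal, n_modes, alpha=1024.0, tau=0.0, tol=1e-7, max_iter=200):
    """Simplified VMD on the one-sided spectrum (no mirror extension)."""
    f = np.asarray(signal, dtype=float)
    n = len(f)
    freqs = np.fft.rfftfreq(n)          # cycles per sample, from 0 to 0.5
    f_hat = np.fft.rfft(f)
    u_hat = np.zeros((n_modes, len(freqs)), dtype=complex)
    omega = np.zeros(n_modes)           # center frequencies start at zero
    lam = np.zeros(len(freqs), dtype=complex)
    eps = np.finfo(float).eps           # guards against division by zero
    for _ in range(max_iter):
        u_prev = u_hat.copy()
        for k in range(n_modes):
            # Wiener-filter update of mode k around its current center frequency.
            others = u_hat.sum(axis=0) - u_hat[k]
            u_hat[k] = (f_hat - others + lam / 2) / (1 + 2 * alpha * (freqs - omega[k]) ** 2)
            # Center frequency = power-weighted mean frequency of the mode.
            power = np.abs(u_hat[k]) ** 2
            omega[k] = (freqs @ power) / (power.sum() + eps)
        # Dual ascent on the multiplier (inactive when tau is 0).
        lam = lam + tau * (f_hat - u_hat.sum(axis=0))
        change = sum(
            np.sum(np.abs(u_hat[k] - u_prev[k]) ** 2)
            / (np.sum(np.abs(u_prev[k]) ** 2) + eps)
            for k in range(n_modes)
        )
        if change < tol:
            break
    modes = np.fft.irfft(u_hat, n=n, axis=1)   # back to the time domain
    return modes, omega

# Toy two-tone signal (made-up frequencies, not bearing data).
t = np.arange(512)
toy = np.cos(2 * np.pi * 0.05 * t) + 0.5 * np.cos(2 * np.pi * 0.2 * t)
toy_modes, toy_omega = vmd_sketch(toy, n_modes=2)
```

With `tau=0` the multiplier stays at zero, so the modes need not sum exactly to the input; a positive `tau` enforces the reconstruction constraint more strictly.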

SVD
After the signals are decomposed into several modes by VMD, features are extracted from the modes by SVD and assembled into a feature matrix. SVD is a powerful linear-algebra tool for feature extraction. According to SVD, a matrix can be decomposed as follows:

$$X = U \omega V^{T}$$

where $X$ represents an $m \times n$ matrix, $U$ ($m \times m$) and $V$ ($n \times n$) are two orthogonal matrices, and $\omega$ is the diagonal matrix of singular values ($\omega_{ij} = 0$ for $i \neq j$, and $\omega_{11} \geq \omega_{22} \geq \cdots \geq 0$). The diagonal elements $\omega_{11}, \omega_{22}, \cdots, \omega_{mm}$ are the singular values of $X$. The columns of $U$ are called the left singular vectors and the columns of $V$ the right singular vectors; the columns of each matrix are mutually orthogonal and serve as basis vectors. To capture the intrinsic information in the matrix, the singular values are selected. Consequently, SVD is employed to decompose the eigenmatrix and obtain the singular value vector $(\omega_{11}, \omega_{22}, \cdots, \omega_{mm})$.
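As a minimal sketch of this feature extraction step, the snippet below computes the singular value vector of a matrix whose rows stand in for decomposed modes. The helper name `singular_value_features` and the random toy matrix are illustrative assumptions, not part of the original method description.

```python
import numpy as np

def singular_value_features(mode_matrix):
    """Return the singular values of a (modes x samples) matrix, largest first."""
    x = np.asarray(mode_matrix, dtype=float)
    # compute_uv=False skips U and V; NumPy returns the values in descending order.
    return np.linalg.svd(x, compute_uv=False)

# Toy stand-in for three decomposed modes of one signal (values are made up).
rng = np.random.default_rng(0)
toy_imfs = rng.standard_normal((3, 200))
features = singular_value_features(toy_imfs)   # one feature per mode
```

Each signal then contributes one row of singular values to the feature matrix.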

Mahalanobis-Taguchi System
The Mahalanobis-Taguchi System is a pattern recognition method that integrates the MD, the orthogonal table and other tools such as the SNR. It was proposed by Taguchi [31], who introduced the SNR from the field of experimental design into pattern recognition; the SNR can reduce the dimensions of the data, and the orthogonal table is used to construct the MS. The MSs are used to calculate the MDs of the experimental data, and the valid features are distinguished by the SNR. Then, the MSs are recalculated using the valid features, and the results are obtained. The calculation of the MD is described as follows:

Mahalanobis Distance
The MD uses normal data to normalize the fault data and measures the distance between a point and the normal group. The MD is calculated as follows:

$$MD_j = \frac{1}{k} Z_j^{T} C^{-1} Z_j, \qquad Z_j = (z_{1j}, z_{2j}, \ldots, z_{kj})^{T}, \qquad z_{ij} = \frac{x_{ij} - \bar{x}_i}{s_i}$$

where $MD_j$ represents the MD of the $j$th sample, $k$ represents the number of features, $x_{ij}$ represents the value of the $i$th feature of the $j$th sample, $\bar{x}_i$ represents the mean of the $i$th feature, $s_i$ represents the standard deviation of the $i$th feature, and $C^{-1}$ represents the inverse of the correlation coefficient matrix.
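The calculation can be sketched in a few lines of NumPy. This is an illustration under stated assumptions: the function names are hypothetical, the standard deviation is the sample one (`ddof=1`), and the reference group is random toy data rather than bearing features.

```python
import numpy as np

def fit_mahalanobis_space(reference):
    """Mean, standard deviation and inverse correlation matrix of the reference group."""
    ref = np.asarray(reference, dtype=float)        # shape (samples, features)
    mean = ref.mean(axis=0)
    std = ref.std(axis=0, ddof=1)                   # sample standard deviation (assumption)
    corr_inv = np.linalg.inv(np.corrcoef(ref, rowvar=False))
    return mean, std, corr_inv

def mahalanobis_distance(samples, space):
    """Scaled MD of each row of `samples` with respect to a fitted space."""
    mean, std, corr_inv = space
    z = (np.atleast_2d(samples) - mean) / std       # normalize with reference statistics
    k = z.shape[1]
    # Row-wise quadratic form z C^{-1} z^T, divided by the number of features.
    return np.einsum("ij,jk,ik->i", z, corr_inv, z) / k

rng = np.random.default_rng(1)
normal_group = rng.standard_normal((50, 4))         # toy reference data
space = fit_mahalanobis_space(normal_group)
md_normal = mahalanobis_distance(normal_group, space)
md_shifted = mahalanobis_distance(normal_group + 5.0, space)
```

With this scaling, the average MD of the reference group itself is (n - 1)/n, that is, close to 1, while samples far from the reference group receive much larger values.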

Taguchi Method
In the Mahalanobis-Taguchi System, the MD measures the deviation of the test value from the normal value. The Taguchi method selects the features that contribute most to identifying bearing faults, and the selected valid features are then used to calculate the MD. The method is as follows.

Orthogonal Array
An appropriate two-level orthogonal array is selected, and the k original features obtained by VMD and SVD are assigned to the columns of the orthogonal array. In the orthogonal array, "1" indicates that the feature is selected and "2" indicates that the feature is not selected, and an MS is generated according to each row of the orthogonal array.
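To make the role of the orthogonal array concrete, the sketch below builds a two-level array from a Sylvester-type Hadamard matrix. This construction is an assumption chosen for illustration; the array actually used in the experiments may differ (for example, a standard L12 table), and the function name is hypothetical.

```python
import numpy as np

def two_level_orthogonal_array(n_factors):
    """Two-level orthogonal array with entries 1 (feature used) and 2 (feature not used)."""
    runs = 2
    while runs - 1 < n_factors:          # a Hadamard matrix of order r gives r - 1 usable columns
        runs *= 2
    h = np.array([[1]])
    while h.shape[0] < runs:
        h = np.kron(np.array([[1, 1], [1, -1]]), h)   # Sylvester construction
    columns = h[:, 1:n_factors + 1]      # drop the constant first column
    return np.where(columns == 1, 1, 2)

oa = two_level_orthogonal_array(8)       # 16 rows (one MS per row) x 8 features
```

Each row defines one feature subset; the first row uses all features, and every pair of columns contains each of the four level combinations equally often.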

SNR and Its Gain
The main function of the signal-to-noise ratio (SNR) is to select valid features. The SNR $\eta_i$ of the $i$th row of the orthogonal array is calculated in the larger-the-better form commonly used in MTS:

$$\eta_i = -10 \log_{10} \left[ \frac{1}{t} \sum_{j=1}^{t} \frac{1}{MD_j} \right]$$

where $t$ represents the number of training samples.
$\eta_i$ represents the recognition effect of the feature combination in row $i$. A valid feature is selected by comparing the mean SNR of each feature at its two levels:

$$\bar{\eta}_j = \frac{1}{m_j} \sum_{i \,\in\, \text{level } j} \eta_i, \qquad j = 1, 2$$

where $j = 1, 2$ represents the two levels, $i$ runs over the rows of the orthogonal array in which the feature is at level $j$, and $m_j$ is the number of such rows. $\bar{\eta}_1$ represents the recognition effect for abnormal conditions when the feature is used (level '1'), and $\bar{\eta}_2$ represents the recognition effect when the feature is not used (level '2').
The SNR gain is $\Delta\eta = \bar{\eta}_1 - \bar{\eta}_2$. If $\Delta\eta > 0$, the feature is a valid feature; if $\Delta\eta < 0$, the feature is not a valid feature.
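A small worked example may help here. The sketch assumes the larger-the-better SNR that is common in MTS and defines the gain of a feature as the mean SNR over rows that use it minus the mean SNR over rows that do not. The function names, the toy four-row array and the SNR values are all made up for illustration.

```python
import numpy as np

def larger_the_better_snr(md_values):
    """SNR in dB of one orthogonal-array row from the MDs of abnormal samples."""
    md = np.asarray(md_values, dtype=float)
    return -10.0 * np.log10(np.mean(1.0 / md))

def snr_gains(orthogonal_array, row_snr):
    """Gain of each feature: mean SNR at level 1 minus mean SNR at level 2."""
    oa = np.asarray(orthogonal_array)
    eta = np.asarray(row_snr, dtype=float)
    gains = np.empty(oa.shape[1])
    for col in range(oa.shape[1]):
        used = eta[oa[:, col] == 1].mean()      # rows where the feature is used
        unused = eta[oa[:, col] == 2].mean()    # rows where it is not used
        gains[col] = used - unused
    return gains

# Toy array with 3 two-level factors and made-up row SNRs in dB.
toy_oa = np.array([[1, 1, 1], [1, 2, 2], [2, 1, 2], [2, 2, 1]])
toy_eta = np.array([6.0, 4.0, 2.0, 0.0])
toy_gains = snr_gains(toy_oa, toy_eta)          # gains of 4, 2 and 0 dB
```

In this toy case the first two features have positive gains, while the third has zero gain and would not count as a valid feature under the positive-gain rule.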

Adaptive Multiclass MTS
After the SNR gain is calculated, this paper presents the adaptive multiclass MTS to overcome the drawback of the conventional MTS in threshold selection. It introduces the following improvements:
(1) Solving the multiple-classification problem. Samples from each class are selected as the benchmark data. Then, the distances between the other data and each benchmark are calculated by MTS. The label of the benchmark with the minimum distance is assigned as the label of the training data.
(2) Selecting the features adaptively. The feature sequence is selected several times, and the best fault recognition is the one with the minimum MDs between the data and the benchmark. This removes the errors caused by hard threshold selection.
(3) Avoiding the overfitting problem. Since the SNR gains are calculated from training samples, different training samples are employed to calculate the MDs and identify bearing faults, and separate validation samples are used to validate the identified result and recalculate the MDs.
The method is shown in Figure 2. The adaptive multiclass MTS can be described as follows. First, the data are labeled, the multi-class MSs are constructed, the MDs between the MSs are calculated and the SNR gains are obtained. This step is shown in Figure 3, where m is the number of samples, n is the number of features, and j is the index of the label. The training data are divided into three parts, one of which is named the Benchmark. The data $A_j$ and $C_j$ each represent one kind of data (such as normal, inner race fault, outer race fault or rolling element fault), and data B includes all kinds of data.
Second, new sequences of feature parameters are generated by sorting the features' SNR gains in ascending and in descending order, and two collections are obtained from these sequences. Writing the sorted gains of the $n$ features as $\Delta\eta_{(1)} \leq \Delta\eta_{(2)} \leq \cdots \leq \Delta\eta_{(n)}$, the ascending sequence is $(\Delta\eta_{(1)}, \ldots, \Delta\eta_{(n)})$ and the descending sequence is $(\Delta\eta_{(n)}, \ldots, \Delta\eta_{(1)})$. This step is shown in Figure 4.
Third, since the positions of the features' SNR gains are the same as the positions of the corresponding features in the sequences, the features used to recalculate the MDs are selected by the corresponding SNR gains. The MDs between each kind of A and $C_i$ are calculated. This step is shown in Figure 5.
Figure 5. The third step of aMMTS.
Fourth, the proper sequence of SNR gains is chosen by a function based on the minimum MD. If the label of a sample and the label of the benchmark with the minimum MD are the same, the recognition result is correct, and the number of correct recognition results is accumulated. S is the recognition result. This step is shown in Figure 6.
Figure 6. The fourth step of aMMTS.
Finally, the recognition result is verified: if there is a unique optimal recognition result, the position of each SNR gain in the sequence gives the order of the corresponding feature; if there are several optimal recognition results, step 3 is repeated to recalculate the result. Afterwards, the sequence with the best recognition effect is determined as the feature sequence, which solves the self-adaptation problem of thresholds.
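The steps above can be sketched as follows. This is a simplified illustration, not the authors' implementation: candidate feature sets are taken as leading runs of the descending and ascending gain orderings, a candidate is scored by the number of correctly labeled validation samples, ties keep the first candidate instead of triggering a further validation round, and all names, the toy data and the gain values are hypothetical.

```python
import numpy as np

def scaled_md(samples, reference):
    """Scaled MD of each sample with respect to the MS built from `reference`."""
    mean = reference.mean(axis=0)
    std = reference.std(axis=0, ddof=1)
    corr_inv = np.linalg.inv(np.corrcoef(reference, rowvar=False))
    z = (samples - mean) / std
    return np.einsum("ij,jk,ik->i", z, corr_inv, z) / z.shape[1]

def classify(samples, benchmarks, features):
    """Label each sample with the benchmark class that gives the smallest MD."""
    labels = list(benchmarks)
    mds = np.stack([scaled_md(samples[:, features], benchmarks[c][:, features])
                    for c in labels])
    return np.array(labels)[mds.argmin(axis=0)]

def candidate_subsets(gains, min_size=2):
    """Leading runs of the descending and ascending gain orderings."""
    ascending = np.argsort(gains)
    descending = ascending[::-1]
    for order in (descending, ascending):
        for size in range(min_size, len(gains) + 1):
            yield np.sort(order[:size])

def select_features(gains, benchmarks, val_x, val_y):
    """Keep the candidate subset with the most correct validation labels."""
    best, best_correct = None, -1
    for subset in candidate_subsets(gains):
        correct = int(np.sum(classify(val_x, benchmarks, subset) == val_y))
        if correct > best_correct:        # ties keep the earlier candidate
            best, best_correct = subset, correct
    return best, best_correct

# Toy data: two classes that differ only in the first two of four features.
rng = np.random.default_rng(2)

def make_group(center, n):
    return rng.standard_normal((n, 4)) + np.array(center, dtype=float)

benchmarks = {"normal": make_group([0, 0, 0, 0], 30),
              "inner": make_group([10, 10, 0, 0], 30)}
val_x = np.vstack([make_group([0, 0, 0, 0], 10), make_group([10, 10, 0, 0], 10)])
val_y = np.array(["normal"] * 10 + ["inner"] * 10)
toy_gains = np.array([3.0, 2.0, -0.5, -1.0])      # hypothetical SNR gains
best_features, n_correct = select_features(toy_gains, benchmarks, val_x, val_y)
```

Because no threshold on the gain is fixed in advance, the number of retained features is decided by the validation outcome, which is the adaptive element of the method.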

Results
In this paper, the experimental data are from the Case Western Reserve University Bearing Data Center. The experiment involved three different faults that occurred on three components: inner race, outer race and rolling element. The vibration signals were acquired at four different speeds: 1797 r/min, 1772 r/min, 1750 r/min and 1730 r/min, and the sampling frequency was set to 12 kHz. To demonstrate the aMMTS, this study randomly selected data from the dataset under the defect of 0.07 inches. The number of samples is shown in Table 1. There were 2192 samples: 548 for inner race, 548 for outer race, 548 for rolling element and 548 for normal. The data were divided into three parts: training data, validation data and test data. Training samples were used to construct the MS, generate the SNR gains and calculate the MDs by using the sequences of SNR gains; they were divided into three parts, one of which was set as the benchmark group. To avoid the over-fitting problem, group A was used to construct the MS and generate the SNR gains, and group B was used to calculate the MDs with the sequences of SNR gains and identify faults.
Validation samples were used to verify the recognition result when several candidates had the same minimum MD, and the sequence was selected according to the best result.
Test samples were used to validate the proposed method.

Signal Decomposition by Using VMD and Wavelet Denoising
First of all, this study employed wavelet denoising to remove the noise from the raw signals. First, the Daubechies 5 (db5) wavelet was used to decompose the signal, yielding the wavelet decomposition vector and the bookkeeping vector. Second, the thresholded wavelet coefficients were calculated from the wavelet decomposition vector and the bookkeeping vector, with the detail levels to be compressed set to [1, 2, 3] and the corresponding percentages of lower coefficients set to [100, 90, 80]. Lastly, the thresholds, db5 and the decomposed signals were used to reconstruct the denoised signal. VMD was then used to decompose the signal, which requires presetting the number of IMF components K and the penalty parameter α, which constrains the bandwidth. α took the default value 1024 and K was 8. An example is shown in Figure 7, and the IMFs are shown in Figure 8.
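The rate-compression rule described above (zeroing a given percentage of the smallest detail coefficients at each level) can be illustrated without a wavelet library. The sketch below is only that rule applied to made-up coefficient arrays; it does not perform the db5 decomposition or the reconstruction, and the helper name and toy values are assumptions.

```python
import numpy as np

def zero_smallest_percent(coeffs, percent):
    """Set the `percent`% of coefficients with the smallest magnitude to zero."""
    c = np.array(coeffs, dtype=float)            # copy, so the input stays untouched
    n_zero = int(round(len(c) * percent / 100.0))
    if n_zero > 0:
        smallest = np.argsort(np.abs(c))[:n_zero]
        c[smallest] = 0.0
    return c

levels = [1, 2, 3]            # detail levels to be compressed
percents = [100, 90, 80]      # percentage of lower coefficients zeroed per level

# Made-up detail coefficients standing in for a three-level decomposition.
toy_details = {
    1: np.array([0.1, -0.2, 0.05, 0.3]),
    2: np.array([1.0, -0.1, 0.2, 0.05, -0.4, 0.3, 0.02, 0.6, -0.7, 0.15]),
    3: np.array([2.0, -0.5, 0.1, 1.5, -0.3]),
}
compressed = {lv: zero_smallest_percent(toy_details[lv], p)
              for lv, p in zip(levels, percents)}
```

With these settings the finest detail level is removed entirely, and only the largest coefficients survive at levels 2 and 3, which is where most of the broadband noise is suppressed.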

Feature Extraction by Using SVD
SVD was used to analyze the IMFs. After the signal decomposition, the IMF matrix was decomposed by SVD to obtain the singular value vectors. The singular value vectors were taken as features and formed the feature matrix. The feature matrix was then used to diagnose the fault by aMMTS. To avoid the over-fitting problem, the features were divided into training samples, validation samples and test samples. The features of the above IMFs are shown in Table 2. The features obtained by SVD are shown in Table 3.

Fault Diagnosis Using aMMTS
After the feature extraction, the aMMTS was used to identify and diagnose fault modes. The steps of aMMTS are as follows. Firstly, the MSs of the training and benchmark samples were constructed; the eight-factor, two-level orthogonal array is shown in Table 4, and the MS based on Table 2 is shown in Table 5. Secondly, the MDs were calculated, and the SNR gains were obtained from the benchmark samples and training samples. The SNR gains are shown in Table 6. Thirdly, the MDs between the benchmark samples and validation samples were calculated by using the ascending and descending orders of SNR. Fourthly, the validation samples were used to verify the correctness of the feature selection when more than one smallest MD existed. Fifthly, the best sequence was chosen and set as the sequence of features. Lastly, the best sequence was used to identify the test samples. An example with the outer race as the benchmark is shown in Figure 9. Finally, the test samples were used to test the result of the method, with inner race, rolling element, outer race and normal as the benchmarks. The results are shown in Table 7, and the MDs between the benchmarks and the test samples are shown in Figure 10.

Table 7. Columns: Inner Race, Outer Race, Rolling Element, Normal, Total.
Figure 10. (a) The MDs between benchmark (Inner race) and testing data; (b) The MDs between benchmark (Outer race) and testing data; (c) The MDs between benchmark (Rolling element) and testing data; (d) The MDs between benchmark (Normal) and testing data.
As shown in Table 7 and Figure 10, this method classified and diagnosed the bearing faults accurately by using the different benchmarks. The recognition results for the normal condition and the outer race fault reached 100%. However, the method is not accurate enough in diagnosing the inner race and rolling element faults. Nevertheless, for the normal condition and the inner race fault, it is effective in industrial application.

Discussion
Rolling element bearings are among the most frequently used components in rotating machinery. This paper presents a method based on wavelet denoising and VMD-SVD-aMMTS to diagnose bearing faults under variable conditions. Firstly, VMD is used to decompose the signal. Secondly, SVD is used to extract the features. The adaptive aMMTS then uses feature sequences and multiple benchmarks to overcome the drawbacks of MTS in adaptive feature selection, multi-classification and over-fitting. The experimental results show that the method can diagnose faults accurately.
However, in practice, there is an imbalance between fault data and normal data, and the behavior of aMMTS on imbalanced data has not been studied here. The absence of faulty data may create new problems, such as over-fitting. Therefore, additional experiments on imbalanced data should be carried out to improve the method.
Author Contributions: N.W. collected and analyzed the data, made charts and diagrams, conceived and performed the experiments and wrote the paper; Z.W. and L.J. conceived the structure, provided guidance and modified the manuscript; X.C. analyzed the data and contributed analysis tools. Y.Q. provided guidance. Y.Z. revised the reviews.


Conflicts of Interest:
The authors declare no conflict of interest.