Time-Shift Multi-scale Weighted Permutation Entropy and GWO-SVM Based Fault Diagnosis Approach for Rolling Bearing

Multi-scale permutation entropy (MPE) is an effective nonlinear dynamic approach for complexity measurement of time series and it has been widely applied to fault feature representation of rolling bearing. However, the coarse-grained time series in MPE becomes shorter and shorter with the increase of the scale factor, which causes an imprecise estimation of permutation entropy. In addition, the different amplitudes of the same patterns are not considered by the permutation entropy used in MPE. To solve these issues, the time-shift multi-scale weighted permutation entropy (TSMWPE) approach is proposed in this paper. The inadequate process of coarse-grained time series in MPE was optimized by using a time shift time series and the process of probability calculation that cannot fully consider the symbol mode is solved by introducing a weighting operation. The parameter selections of TSMWPE were studied by analyzing two different noise signals. The stability and robustness were also studied by comparing TSMWPE with TSMPE and MPE. Based on the advantages of TSMWPE, an intelligent fault diagnosis method for rolling bearing is proposed by combining it with gray wolf optimized support vector machine for fault classification. The proposed fault diagnostic method was applied to two cases of experimental data analysis of rolling bearing and the results show that it can diagnose the fault category and severity of rolling bearing accurately and the corresponding recognition rate is higher than the rate provided by the existing comparison methods.


Introduction
The rolling bearing is one of the most important parts of rotating machinery and the one most prone to failure [1]. Due to the complexity of mechanical conditions, once a rolling bearing begins working with failures, it is very likely to cause unpredictable safety accidents and economic losses. Thus, it is particularly important to implement condition monitoring and fault diagnosis for rolling bearings [2,3]. Generally, the vibration signals of rolling bearings with failures exhibit non-linear and non-stationary characteristics, and traditional linear or stationary time domain analysis methods have certain limitations when dealing with these types of vibration signals [4][5][6][7]. As a non-linear dynamic analysis tool, entropy plays an important role in measuring the complexity and randomness of time series stemming from a nonlinear dynamical system.
In recent years, many nonlinear dynamic methods have been proposed, including approximate entropy (APE) [8], sample entropy (SE) [9], permutation entropy (PE) [10] and fuzzy entropy (FE) [11]. The literature [12] indicates that PE was able, at times, to differentiate similar failure modes of complex gearbox vibration signals when it was applied to wind turbine gearbox fault diagnosis. In literature [13], PE was applied to the experimental data analysis of rolling bearings and the result showed that PE can detect the randomness and dynamic behavior of vibration signals and can effectively identify the bearing fault types and degrees. However, the entropy-based indicators mentioned above are generally limited to single-scale analysis of time series and the information of the time series on other scales is ignored, which results in a serious loss of information. For this reason, multi-scale sample entropy (MSE) [14], multi-scale fuzzy entropy (MFE) [15] and multi-scale permutation entropy (MPE) were developed to measure the complexity of time series at different scales. In literature [16], MSE was applied to rotor fault diagnosis and the experimental results show that MSE contains more information than single-scale sample entropy. In literature [17], MFE was studied and then used to extract the fault features of rolling bearings. However, MSE and MFE also have some intrinsic weaknesses. MPE is an effective method for evaluating the random mutation behavior of time series [18], and its strong anti-noise ability and low computational cost make it stand out. However, because the coarse-grained time series become shorter and shorter as the scale factor increases, much information of the time series is lost.
Based on the idea of time-shift coarse-grained time series [19], in this paper, time-shift multi-scale permutation entropy (TSMPE) is developed to enhance the robust performance of MPE, together with the time-shift multi-scale weighted permutation entropy (TSMWPE) based on weighted permutation entropy [20,21]. TSMWPE fully considers the probability calculation of the same modes which have different amplitudes of the state vector in symbol sequence after a reconstruction matrix of coarse-grained time series. TSMWPE optimizes the inadequate coarse-grained time series used in MPE and the process of probability calculation that cannot fully consider the symbol modes. The parameters of TSMWPE are determined by analyzing Gaussian white noise (WGN) and pink noise (1/f Noise) data [22] and stability analysis is performed by comparing TSMWPE with TSMPE and MPE. Finally, an intelligent fault diagnosis approach for rolling bearings is proposed based on TSMWPE and gray wolf optimized support vector machine (GWO-SVM) [23] and then applied to two kinds of experimental data analysis of rolling bearing.
The rest of this paper is structured as follows. In Section 2, the algorithms of MPE and WPE are reviewed and then TSMWPE is developed. The selection of parameters in TSMWPE is studied and the stability analysis among TSMWPE, TSMPE and MPE are made in Section 3. In Section 4, the TSMWPE and GWO-SVM based fault diagnosis approach for rolling bearing is proposed and applied to two sets of experiment data analysis to verify its effectiveness by comparing it with the existing other methods. Finally, Section 5 concludes this paper.

MPE method
MPE is proposed to measure the complexity and randomness of time series in multiple scales and its steps can be described as follows.
(1) For a given maximum scale factor τ_max, the coarse-grained time series is constructed from the original time series x(i), i = 1, 2, ..., N by

$$ y_j^{(\tau)} = \frac{1}{\tau} \sum_{i=(j-1)\tau+1}^{j\tau} x(i), \quad 1 \le j \le \lfloor N/\tau \rfloor, \tag{1} $$

where j indexes the points of the coarse-grained time series, whose length is ⌊N/τ⌋.
(2) For the scale factor τ ≥ 2, the permutation entropy of each coarse-grained time series is calculated.
Finally, the entropy values of all scales are obtained and regarded as a function of the scale factor. The definition above shows that the coarse-grained time series becomes shorter as the scale factor increases, so coarse-graining inevitably loses important information of the original time series.
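The two MPE steps can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the function names are hypothetical, ties are broken by position, and the entropy is normalized by ln(m!) so that values lie between 0 and 1, which is an assumption suggested by the 0.95 to 1 range reported later.

```python
import math
from collections import Counter


def coarse_grain(x, tau):
    """Average consecutive, non-overlapping windows of length tau (formula (1))."""
    n = len(x) // tau
    return [sum(x[j * tau:(j + 1) * tau]) / tau for j in range(n)]


def permutation_entropy(x, m=6, lam=1):
    """Permutation entropy with embedding dimension m and delay lam.

    Assumption: the value is normalized by ln(m!) to fall in [0, 1].
    """
    n_vec = len(x) - (m - 1) * lam
    if n_vec <= 0:
        raise ValueError("series too short for the chosen m and lam")
    # Each state vector is mapped to the ordering (pattern) of its elements.
    counts = Counter(
        tuple(sorted(range(m), key=lambda k: x[i + k * lam]))
        for i in range(n_vec)
    )
    h = -sum((c / n_vec) * math.log(c / n_vec) for c in counts.values())
    return h / math.log(math.factorial(m))


def mpe(x, tau_max, m=6, lam=1):
    """Entropy of the coarse-grained series for scales 1..tau_max."""
    return [permutation_entropy(coarse_grain(x, tau), m, lam)
            for tau in range(1, tau_max + 1)]
```

For N = 3000 and τ = 20 the coarse-grained series has only 150 points, while m = 6 allows 720 patterns, which illustrates why the estimate degrades at large scales.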

Algorithm of TSMPE
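In this part, TSMPE is first developed to solve the problem of MPE mentioned above. Its steps are as follows.
(1) For the original time series x(i), i = 1, 2, ..., N and a scale factor k = τ, the time-shift coarse-grained time series are constructed as

$$ y_{k,\beta} = \left( x_{\beta},\, x_{\beta+k},\, x_{\beta+2k},\, \ldots,\, x_{\beta+\Delta(\beta,k)k} \right), \quad \beta = 1, 2, \ldots, k, \tag{2} $$

where β is the starting point, k is the time interval and Δ(β,k) = ⌊(N − β)/k⌋.
(2) For scale factor τ ≥ 2, the PEs of each time-shift coarse-grained time series are calculated.
(3) The PEs obtained for the k time-shift coarse-grained time series are averaged by

$$ \mathrm{TSMPE}(x, \tau, m, \lambda) = \frac{1}{k} \sum_{\beta=1}^{k} \mathrm{PE}\left( y_{k,\beta}, m, \lambda \right), $$

where m is the embedding dimension and λ is the delay time.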
TSMPE optimizes the insufficient time series coarse granulation process of the MPE algorithm, which makes the time-shift coarse-grained time series have little dependence on the length of the original time series. However, TSMPE does not optimize the process of phase space reconstruction, which undoubtedly makes TSMPE not fully consider the influences of different amplitudes of the same pattern.
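A minimal Python sketch of the time-shift idea follows. It assumes the usual time-shift construction, in which the β-th series at scale k takes every k-th point starting from x_β, and it averages a normalized permutation entropy over the k shifts. The function names and the normalization by ln(m!) are illustrative assumptions, not taken from the paper.

```python
import math
from collections import Counter


def time_shift_series(x, k):
    """Return the k time-shift series x[beta], x[beta+k], ... for beta = 0..k-1."""
    return [x[beta::k] for beta in range(k)]


def permutation_entropy(x, m=6, lam=1):
    """Permutation entropy normalized by ln(m!) (normalization is an assumption)."""
    n_vec = len(x) - (m - 1) * lam
    if n_vec <= 0:
        raise ValueError("series too short for the chosen m and lam")
    counts = Counter(
        tuple(sorted(range(m), key=lambda j: x[i + j * lam]))
        for i in range(n_vec)
    )
    h = -sum((c / n_vec) * math.log(c / n_vec) for c in counts.values())
    return h / math.log(math.factorial(m))


def tsmpe(x, k, m=6, lam=1):
    """Average the permutation entropy over the k time-shift series at scale k."""
    shifted = time_shift_series(x, k)
    return sum(permutation_entropy(y, m, lam) for y in shifted) / k
```

Unlike coarse-graining, which produces one series per scale, this construction uses every sample of the original signal exactly once at each scale, spread over k sub-series.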

Algorithm of TSMWPE
In this subsection, TSMWPE algorithm is proposed to enhance the performance of MPE and TSMPE and its steps can be briefly described as follows.

1. For the original time series x(i), i = 1, 2, ..., N, the time-shift coarse-grained time series y_{k,β} are obtained by Equation (2).

2. Each time-shift coarse-grained time series is reconstructed in phase space with embedding dimension m and delay time λ, which gives a reconstruction matrix.

3. Each row in this matrix is regarded as a state vector and each state vector is mapped into one of the m! possible sorting modes π_r. The weighted frequency f_ω(π_r) of the r-th permutation in the time series is

$$ f_\omega(\pi_r) = \sum_{s=1}^{S} f(\pi_r(s))\, \omega_r(s), $$

where the sum runs over the S state vectors. If the s-th state vector can be mapped into the sorting mode π_r, then f(π_r(s)) = 1, otherwise f(π_r(s)) = 0. ω_r(s) is the variance of the state vector. It is the weight that distinguishes state vectors having the same pattern but different amplitudes.

4. The weighted relative probability p_ω(π_r) of each pattern is obtained by

$$ p_\omega(\pi_r) = \frac{f_\omega(\pi_r)}{\sum_{r=1}^{m!} f_\omega(\pi_r)}. $$

5. The weighted permutation entropy of each time-shift coarse-grained time series is computed from these probabilities as

$$ H_w = -\sum_{r=1}^{m!} p_\omega(\pi_r) \ln p_\omega(\pi_r). $$

6. Finally, with k = τ, the k values H_w(y_{k,β}, m, λ) are obtained and the final TSMWPE of the original time series is their average:

$$ \mathrm{TSMWPE}(x, \tau, m, \lambda) = \frac{1}{k} \sum_{\beta=1}^{k} H_w\left( y_{k,\beta}, m, \lambda \right). $$

Theoretically, TSMWPE depends little on the length of the raw signal because it uses time-shift coarse-grained time series. Meanwhile, after the phase space is reconstructed, the weighting takes full account of state vectors that share the same pattern but have different amplitudes when the pattern probabilities are computed. The flowchart of the proposed method is shown in Figure 1.
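The weighting idea can be illustrated with a short Python sketch. It is an approximation under stated assumptions: the weight of a state vector is its population variance, the entropy is normalized by ln(m!), and the time-shift series are taken as every k-th sample. The names `weighted_permutation_entropy` and `tsmwpe` are hypothetical.

```python
import math
from collections import defaultdict


def weighted_permutation_entropy(x, m=6, lam=1):
    """Permutation entropy in which each state vector counts with its variance."""
    n_vec = len(x) - (m - 1) * lam
    if n_vec <= 0:
        raise ValueError("series too short for the chosen m and lam")
    weighted_freq = defaultdict(float)
    for i in range(n_vec):
        vec = [x[i + j * lam] for j in range(m)]
        mean = sum(vec) / m
        weight = sum((v - mean) ** 2 for v in vec) / m  # variance as weight
        pattern = tuple(sorted(range(m), key=vec.__getitem__))
        weighted_freq[pattern] += weight
    total = sum(weighted_freq.values())
    if total == 0.0:
        return 0.0  # constant series: no amplitude information at all
    h = -sum((f / total) * math.log(f / total)
             for f in weighted_freq.values() if f > 0.0)
    # Assumption: normalize by ln(m!) so the result lies in [0, 1].
    return h / math.log(math.factorial(m))


def tsmwpe(x, k, m=6, lam=1):
    """Average the weighted permutation entropy over the k time-shift series."""
    shifted = [x[beta::k] for beta in range(k)]
    return sum(weighted_permutation_entropy(y, m, lam) for y in shifted) / k
```

Because the weights are variances, a large-amplitude excursion contributes more to its pattern's probability than a small fluctuation with the same ordering, which is the property the plain permutation entropy lacks.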

Selection of Parameter m
The computation of the TSMWPE algorithm is affected by the parameters m, λ and N. If m is too small, fewer patterns can be generated by the elements contained in the state vector, which makes the reconstruction matrix less meaningful. Next, m = [4,5,6,7], λ = 1, and N = 3000 are selected to study the influence of m on TSMWPE. Without loss of generality, white Gaussian noise is analyzed. When m = 4 or 5, the entropy has a small range of variation, from 0.95 to 1. The result indicates that the smaller the value of parameter m, the fewer types of patterns are produced by each state vector. Even though the number of state vectors with the same pattern but different amplitudes is relatively small, their occurrence frequency is relatively large, which increases the weighted relative probability of each pattern. Eventually the calculated entropy value becomes larger.
When m = 6 or 7, there is a large range of amplitude changes, which indicates that the change of the entropy value over different scale factors is reflected more fully than when m = 4 or 5. However, when m = 7, TSMPE and MPE are highly consistent and difficult to distinguish from each other. Based on the analysis above, the parameter m = 6 is chosen in TSMWPE, TSMPE and MPE. In addition, it is worth noting that the curve of TSMWPE is significantly more stable than that of MPE, which indicates that TSMWPE exhibits good stability in feature extraction.

Selection of Parameter λ
Based on the analysis above, the effect of m on TSMWPE, TSMPE and MPE curves is studied and m is set as 6 in the following part. The delay time λ = [1,2,3,4] and the 1/f noise with a length of 3000 is chosen to analyze the effect of different delay time on TSMWPE. The TSMWPE, TSMPE and MPE of 1/f noise signals under different time delays are computed and shown in Figure 4. From Figure 4 the linear trend of TSMWPE is consistent with that of TSMPE and MPE. The entropy values are very close for different time delays. Therefore, the time delay generally has a very slight effect on TSMWPE and thus the delay time λ is set as 1. In addition, it can be found that the TSMWPE and TSMPE curves are smoother than MPE, because the process of time-shift coarse-grained time series saves more useful time information and can effectively improve the process of a single coarse-grained time series.

Selection of Parameter N
In this subsection, 1/f noise with lengths of N = 1000, 1500, 2000, 2500, 3000, 3500, 4000 and 4500 is analyzed with m = 6 and time delay λ = 1 to determine the effect of parameter N on TSMWPE. The TSMWPE of 1/f noise with different lengths is shown in Figure 5. It can be found from Figure 5 that the TSMWPE curve of 1/f noise shows a slight fluctuation when N ≤ 2500, and when N ≥ 2500 it appears to be quite stable and has a nearly parallel trend. Therefore, N ≥ 3000 is generally selected in the subsequent steps.

Stability Analysis
Based on the analysis above, we set m = 6 and λ = 1. Twenty sets of WGN and 1/f noise with a length of 3000 are selected to verify the superiority of TSMWPE in feature extraction. The mean standard deviations of TSMWPE, TSMPE and MPE under the same parameters are shown in Figure 6. It can be seen from Figure 6 that for WGN and 1/f noise, the standard deviations of TSMWPE and TSMPE are much smaller than that of MPE, which indicates that TSMPE and TSMWPE are more stable than MPE.
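The kind of comparison described above can be reproduced in outline with the following sketch, which measures the spread of MPE and TSMPE across independent white Gaussian noise realizations. It is only illustrative: 1/f noise and TSMWPE are omitted for brevity, the entropy is normalized by ln(m!) by assumption, and `compare_spread` is a hypothetical helper, so the numbers it produces are not the paper's.

```python
import math
import random
import statistics
from collections import Counter


def permutation_entropy(x, m, lam):
    """Permutation entropy normalized by ln(m!) (normalization is an assumption)."""
    n_vec = len(x) - (m - 1) * lam
    counts = Counter(
        tuple(sorted(range(m), key=lambda j: x[i + j * lam]))
        for i in range(n_vec)
    )
    h = -sum((c / n_vec) * math.log(c / n_vec) for c in counts.values())
    return h / math.log(math.factorial(m))


def mpe_at(x, tau, m, lam):
    """Entropy of the coarse-grained (window-averaged) series at scale tau."""
    n = len(x) // tau
    y = [sum(x[j * tau:(j + 1) * tau]) / tau for j in range(n)]
    return permutation_entropy(y, m, lam)


def tsmpe_at(x, tau, m, lam):
    """Mean entropy of the tau time-shift series at scale tau."""
    return sum(permutation_entropy(x[b::tau], m, lam) for b in range(tau)) / tau


def compare_spread(n_series=20, n=3000, scales=(5, 10, 20), m=6, lam=1, seed=0):
    """Standard deviation of each measure across noise realizations, per scale."""
    rng = random.Random(seed)
    noise = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(n_series)]
    result = {}
    for tau in scales:
        result[tau] = {
            "MPE": statistics.pstdev([mpe_at(x, tau, m, lam) for x in noise]),
            "TSMPE": statistics.pstdev([tsmpe_at(x, tau, m, lam) for x in noise]),
        }
    return result
```

Calling `compare_spread()` with the defaults mirrors the setting in the text (20 series of length 3000, m = 6, λ = 1); the choice of scales 5, 10 and 20 is arbitrary.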



Entropy 2019, 21, 621

GWO-SVM
Generally, the parameters in the original SVM are set according to the user's experience. Once the kernel function has been selected, it is not changed. This inevitably limits the classification performance of SVM. Therefore, it is necessary to study the selection of kernel functions and the setting of parameters [26]. In this paper, the penalty factor c and the parameter g of the radial basis kernel function of SVM are optimized by the gray wolf optimization algorithm.
The GWO algorithm was proposed by Mirjalili et al. [27] in 2014, inspired by the division of labor among wolves and their collaborative hunting. It is a swarm intelligence algorithm that simulates the hierarchy and the hunting behavior of wolves. The highest-ranking wolf is wolf species A, which is located at the top of the hierarchy and is responsible for leadership, decision making and other behaviors. It is followed by wolf species B, wolf species C and wolf species E. Although wolf species B and wolf species C are not the highest-ranking wolf species, they can succeed wolf species A and become the new leader when wolf species A loses its leadership. Wolf species E, the lowest level of wolf, is responsible for balancing the relationships inside the population.
The GWO algorithm treats each wolf as a potential solution, where wolf species A is the first optimal solution, while wolf species B and C are respectively the second optimal solution and the sub-optimal solution. The GWO algorithm is an iterative optimization process in which the positions of wolves A, B and C are constantly updated. The wolves update the distance and position through formulas (8) and (9) to complete the search for the prey.

$$ D = \left| C \cdot X_p(t) - X(t) \right| \tag{8} $$

$$ X(t+1) = X_p(t) - A \cdot D \tag{9} $$

where D is the distance between the gray wolf and the prey, t is the number of iterations, X_p indicates the position of the prey, and X indicates the position of the gray wolf, whose initial position coordinates are defined as (c, g). A and C are coefficients with A = 2a × r_2 − a and C = 2r_1. When |A| > 1 the search is global, that is, the gray wolf group expands the search range to find a better prey. In contrast, when |A| ≤ 1 the search is local, and the gray wolf group narrows the encirclement and searches for the prey nearby. The convergence factor a = 2 − 2t/t_max decreases linearly from 2 to 0 as the number of iterations increases, where t_max is the maximum number of iterations. r_1 and r_2 are random values in [0,1].
When the gray wolves judge the position of the prey, the head wolves A lead wolves B and wolves C to surround the prey. Because wolves A, B and C are the closest to the prey, the positions of these three wolves gradually approach the prey, as described by

$$ D_a = \left| C_1 \cdot X_a - X(t) \right| \tag{10} $$

$$ D_b = \left| C_2 \cdot X_b - X(t) \right| \tag{11} $$

$$ D_c = \left| C_3 \cdot X_c - X(t) \right| \tag{12} $$

where X_a, X_b and X_c represent the current locations of wolves A, B and C, respectively. C_1, C_2 and C_3 are random variables. X(t) is the current location of the wolf. The step lengths and directions of wolves E towards wolves A, B and C are defined by formulas (13)-(15) and the final position of wolves E is defined by formula (16).

$$ X_1 = X_a - A_1 \cdot D_a \tag{13} $$

$$ X_2 = X_b - A_2 \cdot D_b \tag{14} $$

$$ X_3 = X_c - A_3 \cdot D_c \tag{15} $$

$$ X(t+1) = \frac{X_1 + X_2 + X_3}{3} \tag{16} $$
When the wolves are hunting, wolves A, wolves B and wolves C have different fitness values for the prey. By calculating these fitness values, the first optimal solution, the second optimal solution and the sub-optimal solution are obtained, and the current position information is saved. Meanwhile, the wolves judge the moving direction of the prey and approach the prey to complete the hunt based on the three sets of positional information. After that, the positions of the gray wolves are updated again until the first optimal solution is provided. The position coordinate corresponding to the first optimal solution is defined as (best c, best g). The flowchart of GWO-SVM is shown in Figure 7. GWO-SVM optimizes the penalty factor c and the parameter g in the kernel function of the original SVM so that the best c and the best g can be found, which in theory makes it superior to SVM. The best c and the best g change from one model to another.
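The update rules of the GWO search can be sketched as follows. This is a minimal illustration, not the authors' implementation: the three leaders are kept as the best positions found so far, positions are clipped to the search bounds, and the objective passed in is a hypothetical stand-in. In the setting of this paper the objective would be the classification error of an SVM trained with the candidate (c, g).

```python
import random


def gwo_minimize(objective, bounds, n_wolves=10, t_max=50, seed=0):
    """Minimize objective over a box using the gray wolf position updates."""
    rng = random.Random(seed)
    dim = len(bounds)
    wolves = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_wolves)]
    leaders = []  # best three positions so far: wolves A, B and C
    for t in range(t_max):
        ranked = sorted(leaders + wolves, key=objective)
        leaders = [list(w) for w in ranked[:3]]
        a = 2.0 - 2.0 * t / t_max  # convergence factor, decreases from 2 to 0
        updated = []
        for x in wolves:
            new_x = []
            for d in range(dim):
                steps = []
                for leader in leaders:
                    A = 2.0 * a * rng.random() - a
                    C = 2.0 * rng.random()
                    D = abs(C * leader[d] - x[d])
                    steps.append(leader[d] - A * D)
                lo, hi = bounds[d]
                # Average of the three leader-guided moves, kept inside the box.
                new_x.append(min(max(sum(steps) / 3.0, lo), hi))
            updated.append(new_x)
        wolves = updated
    return min(leaders + wolves, key=objective)


def toy_error(position):
    """Hypothetical smooth stand-in for the SVM error as a function of (c, g)."""
    c, g = position
    return (c - 10.0) ** 2 + (g - 0.5) ** 2


best_c, best_g = gwo_minimize(toy_error, bounds=[(0.01, 100.0), (0.01, 10.0)])
```

The bounds on c and g used here are arbitrary illustration values; suitable ranges depend on the data and the SVM library.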

The Proposed Fault Diagnosis Approach
Due to the advantages of TSMWPE and GWO-SVM, the GWO-SVM based multi-class classifier is constructed to achieve an intelligent fault diagnosis of rolling bearing. The steps of the proposed methods for rolling bearing can be described as follows.


Case 1
In this subsection, the experimental data of Case Western Reserve University [28] are used to verify the proposed method in fault diagnosis of rolling bearings. As shown in Figure 9, the test rig consists of a fan end bearing, a drive end bearing and a torque transducer. The tested rolling bearing is a 6205-2RS JEM SKF deep groove ball bearing, and single point faults are seeded on the rolling bearings by electric discharge machining. In the test, the rotary speed is 1730 r/min, the load of the rolling bearing is 2205 W and the sampling frequency is 12 kHz. The data with fault diameters 0.5334 mm and 0.1778 mm are used in the following part. The vibration signals are collected from the normal (Norm) bearing and from bearings with local single point pitting on the inner race (IR), ball element (BE) and outer race (OR), and they are successively denoted in Table 1. Experimental data is collected by the acceleration sensor in which the binary counting method is adopted. Each class has 20 samples with a length of 4096 points and the waveforms of the vibration signals of the rolling bearings are shown in Figure 10.
The curves of TSMPE and TSMWPE are very similar, but they differ from that of MPE, especially for BE1 and OR1. In addition, TSMWPE is more stable than MPE under all states of the rolling bearings, especially for OR2. It is very difficult to judge the superiority of TSMWPE by observing the curves of TSMWPE, TSMPE and MPE alone. The fault identification accuracy is therefore compared by combining TSMWPE, TSMPE and MPE with GWO-SVM for fault feature extraction and classification. The seven states of the rolling bearing are marked as 1 to 7. Among the 20 samples of each class, 10 samples are randomly selected as training data, while the remaining 10 are used for testing. The best c and the best g obtained by GWO-SVM for the three feature extraction methods are given in Tables 2-4. It can be seen from Tables 2-4 that when the number of used features is different, the corresponding best c and best g are different. That indicates that the best c and the best g change with the feature input in order to optimize the performance of SVM.
The corresponding identification accuracy for different numbers of features is shown in Figure 12. It can be seen from Figure 12 that when a single feature, i.e., only WPE, is used, the recognition accuracy of the TSMWPE and GWO-SVM based fault diagnosis method is 92.8%. The recognition accuracy of the proposed method stays at 100% when the number of input features is larger than one. The recognition accuracies of the MPE and GWO-SVM based fault diagnosis method are 84.3% and 98.6% when the single feature and the first three features are used, respectively. When the first, the first two and the first three TSMPE features are input into the trained GWO-SVM classifier, the identification accuracies are 85.7%, 92.9% and 97.1%. Also, the original un-optimized SVM is used for comparison to verify the necessity and superiority of GWO-SVM, where the kernel function used in SVM is a polynomial function. It can be seen from Figure 12 that the identification accuracy of the TSMWPE and SVM based fault diagnosis method is 85.7% and 98.6% when the single feature and the first two features are used, and it remains at 100% when more than three features are input. However, Figure 12 shows that for an equal number of input features (fewer than five), the highest fault identification accuracy is generally obtained by the proposed method rather than by the other methods. Therefore, we set the number of input features in the range from 5 to 10 for accurate and fast diagnosis. The above analysis also indicates that TSMWPE is an effective method for distinguishing the fault categories and degrees of rolling bearings.

Table 3. The best c and the best g in TSMPE and GWO-SVM based fault diagnosis method for case 1.

Figure 12. Comparison of identification accuracy for different number of features.

Case 2
The experimental data of rolling bearing used in this subsection were provided by Soochow University [29,30] to further verify the effectiveness of the proposed method. The test bearing is 6205-2RS deep groove ball bearing and the faulty rolling bearings are machined by a metal electric engraving machine to set a local fault. The spindle speed is 900 r/min and sampling frequency is 10 kHz. The experiment has six different fault classes and locations of rolling bearings, which are listed in Table 5. The test rig of rolling bearing is shown in Figure 13. The operating system includes plum coupling, driving motor, testing bearing, normal bearing, acceleration sensor, buffer device, dynamometer and loading device. Each class of rolling bearing has 28 samples with a length of 4096 points and the vibration signal waveforms of rolling bearing are depicted in Figure 14. The experimental data of rolling bearing used in this subsection were provided by Soochow University [29,30] to further verify the effectiveness of the proposed method. The test bearing is 6205-2RS deep groove ball bearing and the faulty rolling bearings are machined by a metal electric engraving machine to set a local fault. The spindle speed is 900 r/min and sampling frequency is 10 kHz. The experiment has six different fault classes and locations of rolling bearings, which are listed in Table 5. The test rig of rolling bearing is shown in Figure 13. The operating system includes plum coupling, driving motor, testing bearing, normal bearing, acceleration sensor, buffer device, dynamometer and loading device. Each class of rolling bearing has 28 samples with a length of 4096 points and the vibration signal waveforms of rolling bearing are depicted in Figure 14.  Figure 13. The rolling bearing test rig of Soochow university for case 2. Figure 13. The rolling bearing test rig of Soochow university for case 2.
Loading device Dynamometer Figure 13. The rolling bearing test rig of Soochow university for case 2.   Figure 15a-c. First, it can be obviously obtained from Figure 15 that the standard deviation of MPE is larger than that of TSMPE and TSMWPE, especially for OR2. Second, the TSMPEs are much denser than TSMWPEs and MPEs, while TSMWPEs are more scattered than TSMPEs and MPEs. The above analysis indicates that the TSMWPE based feature extraction method has irreplaceable superiority to MPE and TSMPE and is more stable and robust than TSMPE and MPE. First, it can be obviously obtained from Figure 15 that the standard deviation of MPE is larger than that of TSMPE and TSMWPE, especially for OR2. Second, the TSMPEs are much denser than TSMWPEs and MPEs, while TSMWPEs are more scattered than TSMPEs and MPEs. The above analysis indicates that the TSMWPE based feature extraction method has irreplaceable superiority to MPE and TSMPE and is more stable and robust than TSMPE and MPE.  Next, 10 samples of each class rolling bearing are randomly chosen from 28 samples for training and the left 18 are used for testing. Therefore, the fault features (with dimensions 60 × 20) can be obtained and employed to train the GWO-SVM based multi-classifier and the left fault feature sets (with dimensions 108 × 20) are inputting to the trained multi-classifier for testing. The identification accuracy of the proposed method with different numbers of inputting features used are given in Figure 16, together with that of the TSMPE and MPE based methods, where the optimized parameters in GWO-SVM for the three methods are shown in Tables 6-8. It can be seen from Figure 16 that for the TSMWPE and GWO-SVM based fault diagnosis method, the recognition accuracy when the single feature is considered is 93.5% and it remains at 100% when the number of features is larger than 2. 
In fact, the identification accuracy of the proposed method is higher than that of MPE and TSMPE based methods for different numbers of features.
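The split described above can be sketched in a few lines. This is only an illustration of the bookkeeping, not the authors' code: the function name, the seed and the reading of the 20 feature columns as one entropy value per scale factor are assumptions made here.

```python
import random

N_CLASSES = 6           # rolling bearing classes listed in Table 5
SAMPLES_PER_CLASS = 28  # samples of 4096 points per class
N_TRAIN = 10            # training samples drawn at random from each class
N_FEATURES = 20         # assumed: one entropy value per scale factor

def split_indices(seed=0):
    """Return (train, test) lists of (class_id, sample_id) pairs."""
    rng = random.Random(seed)  # the seed is arbitrary, for repeatability only
    train, test = [], []
    for c in range(N_CLASSES):
        ids = list(range(SAMPLES_PER_CLASS))
        rng.shuffle(ids)
        train += [(c, i) for i in ids[:N_TRAIN]]
        test += [(c, i) for i in ids[N_TRAIN:]]
    return train, test

train, test = split_indices()
train_shape = (len(train), N_FEATURES)
test_shape = (len(test), N_FEATURES)
print(train_shape, test_shape)  # (60, 20) (108, 20)
```

Each row of the two matrices would hold the entropy features of one sample; the class label is the first element of the corresponding index pair.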
The GWO-SVM in the proposed method was also replaced by the original SVM for comparison. Following the same process as above, the identification accuracy of the TSMWPE, TSMPE and MPE based fault feature extraction methods combined with SVM for classification is given in Figure 16. Figure 16 shows that, for the first eight features, the identification accuracy of the TSMWPE and SVM based method gradually increases from 79.6% to 89.8%; it then remains at 89.8% from the eighth to the fourteenth feature and at 90.7% from the fourteenth to the twentieth feature. The identification accuracy of the TSMPE and SVM based fault diagnosis method is stable at 83.3% when the number of features is larger than 4, and that of the MPE and SVM based method varies from 82.4% to 84.3%. The identification accuracy of the GWO-SVM based fault classification is higher than 90%, while that of the SVM based multi-classifier is around 90% or lower, which indicates the superiority of GWO-SVM to SVM. A closer look at Figure 16 confirms that GWO-SVM is superior to SVM and that the proposed TSMWPE and GWO-SVM based fault diagnosis method has higher fault identification rates than the other methods compared. In general, we set the number of input features between 5 and 10 to obtain good diagnostic performance. These results therefore demonstrate the superiority of TSMWPE to TSMPE and MPE in feature extraction, and of GWO-SVM to the original SVM.
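Since the test set holds 6 × 18 = 108 samples, each reported accuracy corresponds to a whole number of correctly classified samples. The short check below converts the percentages quoted in this subsection into counts; the helper name is hypothetical, and the counts are inferred from rounding rather than taken from the paper.

```python
N_TEST = 108  # 6 classes x 18 test samples

def correct_count(accuracy_percent, n_test=N_TEST):
    """Nearest whole number of correct test samples for a reported accuracy."""
    return round(accuracy_percent / 100 * n_test)

reported = [79.6, 82.4, 83.3, 84.3, 89.8, 90.7, 93.5, 100.0]
for acc in reported:
    k = correct_count(acc)
    # Re-derive the percentage to confirm it rounds back to the reported value.
    print(f"{acc:5.1f}% -> {k}/{N_TEST} = {100 * k / N_TEST:.1f}%")
```

Under this reading, 93.5% means 101 of 108 test samples were identified correctly, and one additional correct sample changes the accuracy by about 0.9 percentage points.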

Conclusions
In this paper, the TSMWPE algorithm was proposed to measure the complexity and irregularity of time series. It improves on the traditional coarse-graining procedure and, by calculating the weighted relative probability of each pattern, accounts for identical symbol modes with different amplitudes. The superiority of TSMWPE to MPE and TSMPE was further verified by two simulation analyses. Based on TSMWPE and GWO-SVM, a new fault diagnosis method for rolling bearing was proposed and applied to two cases of experimental rolling bearing data. The proposed fault feature extraction method was compared with the MPE and TSMPE based ones, and the results show that TSMWPE performs better than MPE and TSMPE and that the TSMWPE and GWO-SVM based fault diagnosis method has a higher recognition accuracy than both the TSMPE and GWO-SVM based method and the MPE and GWO-SVM based method. The GWO-SVM classifier was also compared with the original SVM to verify the effectiveness of the proposed method. Additionally, the number of input features was discussed and a recommended range was given. In future work, the TSMWPE algorithm will be further studied and applied to machine condition monitoring.
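The two ideas summarized above, time-shifted subsequences in place of coarse-graining and amplitude-weighted pattern probabilities, can be illustrated with a minimal sketch. It assumes the commonly used definitions: each embedding vector is weighted by its variance, and for scale k the entropies of the k subsequences x[b::k] are averaged. The function names and the parameter values (m = 3, delay = 1, 20 scales) are illustrative assumptions, not the settings used in the paper.

```python
import math
import random

def weighted_permutation_entropy(y, m=3, delay=1):
    """Normalised weighted permutation entropy of a sequence (0 to 1)."""
    weights = {}  # ordinal pattern -> accumulated weight
    total = 0.0
    for j in range(len(y) - (m - 1) * delay):
        vec = [y[j + i * delay] for i in range(m)]
        mean = sum(vec) / m
        w = sum((v - mean) ** 2 for v in vec) / m  # variance used as weight
        # Ordinal pattern; ties keep their original order (an assumption).
        pattern = tuple(sorted(range(m), key=lambda i: vec[i]))
        weights[pattern] = weights.get(pattern, 0.0) + w
        total += w
    if total == 0.0:
        return 0.0  # constant or too-short input: no pattern carries weight
    h = -sum((w / total) * math.log(w / total)
             for w in weights.values() if w > 0)
    return h / math.log(math.factorial(m))

def tsmwpe(x, max_scale=20, m=3, delay=1):
    """Entropy at scales 1..max_scale, averaged over the k time shifts."""
    features = []
    for k in range(1, max_scale + 1):
        # x[b::k] keeps every k-th point starting at offset b.
        vals = [weighted_permutation_entropy(x[b::k], m, delay)
                for b in range(k)]
        features.append(sum(vals) / k)
    return features

rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(4096)]
features = tsmwpe(noise)
print(len(features))  # 20 values, one per scale
```

Because every time-shifted subsequence contributes to the average, the estimate at scale k still draws on all points of the original signal, whereas a single coarse-grained series at that scale has only about 1/k as many points.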