The Prediction of Human Abdominal Adiposity Based on the Combination of a Particle Swarm Algorithm and Support Vector Machine

Background: Abdominal adiposity is an important risk factor for chronic cardiovascular diseases, so predicting abdominal adiposity and obesity can help reduce the risk of developing such diseases. However, current prediction models display low accuracy and a strong dependence on sample size. The purpose of this study is to propose a new prediction method based on an improved support vector machine (SVM) to address these problems. Methods: A total of 200 individuals participated in this study and were divided into a modeling group and a test group. Their physiological parameters (height, weight, age, the four parameters of abdominal impedance, and body fat mass) were measured using a body composition tester (the universal INBODY measurement device) based on bioelectrical impedance analysis (BIA). Intelligent algorithms were applied to the modeling group to build predictive models, and the test group was used to evaluate model performance. First, the boundary parameter C and the kernel parameter gamma were optimized by the particle swarm algorithm. We then developed an algorithm that classifies human abdominal adiposity using the SVM with these parameter settings and constructed the prediction model with it. Finally, we designed experiments to compare the performance of the proposed method with that of other methods. Results: Different abdominal obesity prediction models were obtained for the 1 kHz and 250 kHz frequency bands. The experimental data demonstrate that, for the 250 kHz band, the proposed method reduces the false classification rate by 10.7%, 15%, and 33% relative to the SVM alone, the regression model, and the waistline measurement model, respectively. For the 1 kHz band, the proposed model is also more accurate. Conclusions: The proposed method effectively improves prediction accuracy and reduces the algorithm's dependence on sample size, and can provide a reference for the prediction of abdominal obesity.


Introduction
As the number of people with obesity increases, obesity has become a worldwide epidemic. The number of patients with obesity has exceeded the number of patients with infectious diseases or undernutrition. Obesity constitutes a severe menace to human health [1], as overweight and obesity can readily cause high blood pressure [2]. Unfortunately, the number of patients with obesity is increasing every year in many countries. The main manifestation of obesity is abdominal obesity, which is an important factor leading to chronic diseases [3]. Bioelectrical impedance analysis (BIA), owing to its non-invasiveness, freedom from radiation, convenience, and low cost, has gradually been applied to research on human abdominal adiposity. However, most current BIA-based methods depend on specific samples and cope poorly with small sample sizes, so their prediction accuracy is unsatisfactory.
In this paper, we propose a method of abdominal adiposity prediction based on BIA and an improved support vector machine (iSVM), in order to improve prediction accuracy and to reduce the model's sensitivity to sample size. The main contributions are: (1) the particle swarm algorithm (PSA) is applied to optimize the parameters of the SVM, which yields the improved SVM (iSVM); (2) an abdominal adiposity classification model is developed using the iSVM to process the data obtained from BIA; (3) simulation experiments are designed to evaluate the prediction accuracy and effectiveness of abdominal adiposity classification in two frequency bands.

Subjects
The study was approved by the Ethics Review Board of Lingnan Normal University. Ninety-eight men and 102 women were recruited via posted notices and word of mouth, and informed consent was obtained. The sample consisted of 200 healthy Chinese adults, who were instructed to fast for 4 h and to avoid heavy exercise, alcohol, and caffeine for 10 h prior to the visit. The participants fell into three clinical categories: normal, overweight, and obese. Their physiological parameters, including BMI, weight, body fat mass, and electrical impedance, were measured on a body composition tester based on BIA technology.

Selection of the Key Characteristics of Human Abdominal Adiposity for the Prediction Model
The first step in constructing the prediction model is to select the key characteristics of human abdominal adiposity. We used the eight-segment body impedance model [21] to obtain the four abdominal impedance parameters $R_1 \sim R_4$. Let $T = (O, F, C)$ be the body composition dataset. We classified the important body composition parameters, such as weight, height, age, sex, and abdominal impedance, as the first kind of characteristics. The inverse ($1/R_i$), the square ($R_i^2$), and the product ($R_i R_j$) of the abdominal segmental impedances form the second kind of characteristics. We included $R_1, R_2, R_3, R_4, A, H, W$ from the first kind of characteristics (where $A$, $H$, and $W$ denote age, height, and weight) and $1/R_1, 1/R_2, 1/R_3, 1/R_4$ from the second kind in a set called the original characteristic parameters, denoted by $F = \{f_1, f_2, \cdots, f_m\}$. We chose body fat mass (BFM) as the body composition index. We selected the key characteristics using a characteristics selection algorithm that combines filtering and clustering [22]. We first used the Filter characteristics selection algorithm to remove the characteristics irrelevant to the body composition index (BFM). We then used the M-Chameleon clustering method to remove the redundant characteristics. Finally, the characteristics most suitable for predicting abdominal adiposity were obtained. The whole process is illustrated in Figure 1.
As mentioned above, the data were included in the set $T = (O, F, C)$, where $O = \{o_1, o_2, \cdots, o_n\}$ is the set of data samples, $F = \{f_1, f_2, \cdots, f_m\}$ is the set of characteristics, and $C = \{c_1, c_2, \cdots, c_n\}$ is the set of body composition classes. Because BFM is often used to evaluate adiposity in the abdomen, the BFM body composition index $C$ was used as an input. By characteristic filtering, we deleted the characteristics that are irrelevant to the body composition index $C$. We thus obtained the initial set of characteristics $F' = \{f_1, f_2, \cdots, f_h\}$ and the initial dataset $T' = (O, F', C)$. We then used the clustering algorithm to pre-process the data. Each cluster is updated by removing the distant characteristics. In the final clusters, we selected the characteristics whose relative interconnection and relative proximity with the center are less than or equal to 90%.
The selected characteristics were included in the best candidate characteristics set $X = \{f_1, f_2, \cdots, f_l\}$.
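The construction of the original characteristic parameters and the filtering step can be illustrated with a minimal Python sketch (the authors' own program was written in R). The function names are hypothetical, and the Pearson-correlation filter with a threshold of 0.3 is only an assumed stand-in for the Filter algorithm of [22], whose details are not given here; the M-Chameleon clustering step is not sketched.

```python
import math


def original_characteristics(r, age, height, weight):
    """Build the 11 original characteristic parameters for one subject.

    r holds the four abdominal impedances R1..R4 (all non-zero).
    """
    if len(r) != 4:
        raise ValueError("expected four abdominal impedances R1..R4")
    # First kind: R1..R4, age (A), height (H), weight (W).
    feats = {f"R{i}": float(v) for i, v in enumerate(r, start=1)}
    feats.update({"A": float(age), "H": float(height), "W": float(weight)})
    # Second kind: the inverses 1/R1..1/R4.
    feats.update({f"1/R{i}": 1.0 / v for i, v in enumerate(r, start=1)})
    return feats


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def filter_by_relevance(rows, bfm, threshold=0.3):
    """Keep characteristics whose |correlation| with BFM reaches the threshold.

    ASSUMPTION: a correlation filter stands in for the Filter algorithm.
    """
    kept = []
    for name in rows[0]:
        column = [row[name] for row in rows]
        if abs(pearson(column, bfm)) >= threshold:
            kept.append(name)
    return kept
```

Here `rows` is a list with one characteristics dictionary per subject and `bfm` is the matching list of body fat mass values.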

An iSVM-Based Algorithm for Classifying Abdominal Adiposity
SVM is a sparse and robust classifier. It uses the hinge loss function to compute the empirical risk and adds a regularization term to control the structural risk; its decision boundary is the maximum-margin hyperplane for the training samples. More importantly, SVM can perform non-linear classification through the kernel method and is one of the most common kernel learning methods. The classification model in this paper is therefore an SVM with the radial basis function (RBF) kernel. The range of the boundary parameter C depends on the range of tolerable error, which is the mean of the sampling error and the structural risk. The range of the kernel parameter gamma depends on the range of the training samples. At present, the globally optimal kernel parameter is usually obtained by cross-validation (CV), which requires a lot of time to achieve high prediction accuracy. We therefore decided to optimize the SVM parameters with the PSA.
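The two ingredients named above, the RBF kernel controlled by gamma and the hinge loss weighted by C, can be written out in a few lines. This is a minimal Python sketch with hypothetical function names, not the authors' R implementation; it shows the standard definitions only and does not train a model.

```python
import math


def rbf_kernel(x, z, gamma):
    """RBF kernel K(x, z) = exp(-gamma * ||x - z||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)


def hinge_loss(y, score):
    """Hinge loss max(0, 1 - y * score) for a label y in {-1, +1}."""
    return max(0.0, 1.0 - y * score)


def decision_value(x, support_vectors, dual_coefs, bias, gamma):
    """Kernel SVM decision value: sum_i a_i * K(sv_i, x) + b.

    dual_coefs are the signed dual coefficients of the support vectors.
    """
    total = sum(a * rbf_kernel(sv, x, gamma)
                for sv, a in zip(support_vectors, dual_coefs))
    return total + bias
```

A larger gamma makes the kernel decay faster with distance, so each support vector influences a smaller neighborhood; a larger C penalizes hinge-loss violations more heavily relative to the regularization term.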

Optimization of the SVM Parameters by the Particle Swarm Algorithm
The particle swarm algorithm is a random search algorithm based on swarm cooperation, developed by simulating the foraging behavior of bird flocks. The PSA reflects the social sharing of information. The parameters to be optimized are called "particles," which constitute a population $X = (X_1, \cdots, X_n)$. In a $D$-dimensional search space, the position of the $i$-th particle is denoted by $X_i = (X_{i1}, \cdots, X_{iD})^T$. The best position found so far by the whole swarm is $gbest = (g_1, \cdots, g_D)$. For the $i$-th particle, the best position along its own trajectory is denoted by $pbest_i = (p_{i1}, \cdots, p_{iD})$. The movement of a particle in the $D$-dimensional space is determined by its position and velocity. The velocity of the $i$-th particle is represented by $V_i = (V_{i1}, \cdots, V_{iD})$, which is iterated throughout the optimization according to:
$$V_{id}^{k+1} = \omega V_{id}^{k} + c_1 r_1 \left(p_{id} - X_{id}^{k}\right) + c_2 r_2 \left(g_d - X_{id}^{k}\right),$$
where $i = 1, 2, \cdots, n$; $d = 1, 2, \cdots, D$; $k$ is the iteration number; and $V_{id}^{k}$ represents the $d$-th component of the velocity vector of particle $i$ at the $k$-th iteration. Here $c_1$ and $c_2$ are the acceleration coefficients, which adjust the particle's largest learning step, and $r_1$ and $r_2$ are two random numbers in the range $[0, 1]$ that enhance the randomness of the search. Here $\omega$ is a non-negative number representing the inertia weight, which adjusts the search domain and balances local against global optimization. It is updated at every iteration; in the commonly used linearly decreasing form, the update is:
$$\omega = \omega_{start} - \left(\omega_{start} - \omega_{end}\right) \frac{k}{T_{max}},$$
where $\omega_{start}$ is the initial inertia weight, $\omega_{end}$ is the final inertia weight, and $T_{max}$ is the total number of iterations. After $V_{id}^{k}$ is updated, the position $X_{id}^{k}$ is updated according to:
$$X_{id}^{k+1} = X_{id}^{k} + V_{id}^{k+1}.$$
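The velocity, inertia-weight, and position updates described above can be sketched for a single particle as follows. This is a minimal Python illustration under stated assumptions: the linearly decreasing inertia schedule and the default values (w_start = 0.9, w_end = 0.4, c1 = c2 = 2.0) are common textbook choices, not values reported in this paper, and the function names are hypothetical.

```python
import random


def inertia_weight(k, t_max, w_start=0.9, w_end=0.4):
    """Linearly decreasing inertia weight (assumed schedule and defaults)."""
    return w_start - (w_start - w_end) * k / t_max


def update_particle(x, v, pbest, gbest, w, c1=2.0, c2=2.0, rng=random):
    """One PSO step for a single particle; returns (new_x, new_v).

    Each dimension d uses fresh random numbers r1, r2 drawn from [0, 1).
    """
    new_x, new_v = [], []
    for d in range(len(x)):
        r1, r2 = rng.random(), rng.random()
        vd = (w * v[d]
              + c1 * r1 * (pbest[d] - x[d])    # pull toward own best
              + c2 * r2 * (gbest[d] - x[d]))   # pull toward swarm best
        new_v.append(vd)
        new_x.append(x[d] + vd)
    return new_x, new_v
```

When a particle already sits at both its personal best and the swarm best, the two attraction terms vanish and only the inertia term moves it.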

The Algorithm
In the training of the iSVM, every parameter of the training samples is taken from the characteristic parameters selected in Section 2.2. The steps are as follows (see also Figure 2).
Step 1: Import and then normalize the data. Mix the data sampled from the obese, overweight, and normal populations. From the mixed data, randomly pick 100 samples to form the training set and another 100 to form the test set.
Step 2: Initialize the particle swarm with 100 particles.
Step 3: Train the model in the space of the 100 training samples.
Step 4: Check whether the desired accuracy has been achieved or the maximal number of iterations has been reached. If not, go to Step 5. If so, take the optimized boundary parameter C and kernel parameter gamma: choose a practical integer value for C in the range [1, 4] and a practical value for gamma in the range [1, 20], searching for the optimal gamma in the particle swarm with the best speed. Finally, the test set is used for testing.
Step 5: Update the position and velocity of the particles. Update gbest and pbest. Construct a new optimal SVM model. Then go to Step 4.
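The loop in Steps 2 to 5 can be sketched as follows, with each particle holding one candidate pair (C, gamma) inside the ranges [1, 4] and [1, 20]. This is a minimal Python illustration rather than the authors' R program. The argument `error_fn` is a hypothetical callback that should train an RBF SVM with the given parameters and return its validation error; `toy_error` is only a smooth stand-in so that the sketch runs without an SVM library. The inertia schedule, c1 = c2 = 1.5, and the position clamping are assumptions.

```python
import random

C_RANGE = (1.0, 4.0)        # search range for C, from Step 4
GAMMA_RANGE = (1.0, 20.0)   # search range for gamma, from Step 4


def clamp(value, low, high):
    return max(low, min(high, value))


def pso_tune(error_fn, n_particles=100, max_iter=50, target_error=0.0,
             w_start=0.9, w_end=0.4, c1=1.5, c2=1.5, seed=0):
    """Search (C, gamma) with PSO; returns (best_C, best_gamma, best_error)."""
    rng = random.Random(seed)
    bounds = (C_RANGE, GAMMA_RANGE)
    # Step 2: initialize positions uniformly in the box, velocities at zero.
    xs = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vs = [[0.0, 0.0] for _ in range(n_particles)]
    # Step 3: evaluate every particle (one model training per particle).
    pbest = [x[:] for x in xs]
    pbest_err = [error_fn(*x) for x in xs]
    best = min(range(n_particles), key=pbest_err.__getitem__)
    gbest, gbest_err = pbest[best][:], pbest_err[best]
    for k in range(max_iter):
        # Step 4: stop once the desired accuracy is reached.
        if gbest_err <= target_error:
            break
        w = w_start - (w_start - w_end) * k / max_iter
        # Step 5: update velocity, position, pbest, and gbest.
        for i in range(n_particles):
            for d, (lo, hi) in enumerate(bounds):
                r1, r2 = rng.random(), rng.random()
                vs[i][d] = (w * vs[i][d]
                            + c1 * r1 * (pbest[i][d] - xs[i][d])
                            + c2 * r2 * (gbest[d] - xs[i][d]))
                xs[i][d] = clamp(xs[i][d] + vs[i][d], lo, hi)
            err = error_fn(*xs[i])
            if err < pbest_err[i]:
                pbest[i], pbest_err[i] = xs[i][:], err
                if err < gbest_err:
                    gbest, gbest_err = xs[i][:], err
    return gbest[0], gbest[1], gbest_err


def toy_error(c, gamma):
    """Hypothetical stand-in for a validation error; minimum at (2, 8)."""
    return (c - 2.0) ** 2 + ((gamma - 8.0) / 5.0) ** 2
```

Calling `pso_tune(toy_error)` returns a pair close to the minimum of the stand-in function. In a real run, C would then be rounded to a practical integer value, as Step 4 describes.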

Results
The experimental data consisted of the body composition data and the characteristic parameters of subjects of different sex, height, weight, and age groups. For the two frequency bands, 1 kHz and 250 kHz, we selected a total of 200 individuals from the normal, overweight (24 ≤ BMI < 28), and obese (BMI ≥ 28) groups. The BMI classification was based on the Chinese standard. From the 200 subjects, we randomly selected 100 to form the training set, and the remaining 100 formed the test set. We constructed the classification model with the optimal C and gamma set to 2 and 8, respectively. The simulation program was written in R. Table 1 lists the characteristics of the training set and the test set that were screened in the two frequency bands. For each of the two frequency bands, we modeled the selected characteristics using the iSVM. We then fed the 100 test samples into the model for classification. The simulation results are presented in Figures 3 and 4, where green, black, and red indicate the normal, obese, and overweight subjects, respectively.
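The BMI grouping used to label the subjects can be expressed directly. The following minimal Python sketch uses the thresholds stated above (24 and 28); the function names are hypothetical, and treating every value below 24 as "normal" is an assumption that mirrors the three groups of this study rather than a separate underweight class.

```python
def bmi(weight_kg, height_m):
    """Body mass index: weight (kg) divided by height (m) squared."""
    return weight_kg / height_m ** 2


def bmi_class(value):
    """Label a BMI value with the three groups used in this study."""
    if value >= 28.0:
        return "obese"
    if value >= 24.0:
        return "overweight"
    return "normal"  # assumption: no separate underweight class
```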

Accuracy of the Prediction
In Figures 3 and 4, each X or O represents a single sample. The X's are the support vectors, which together partition the classification plane; the O's represent the other samples. The figures show that the iSVM model distinguishes the three classes well, as indicated by the colors. If there were only two characteristic parameters, the classification could be carried out in a two-dimensional space. However, the 1 kHz and 250 kHz samples have 10- and 9-dimensional characteristic parameters, respectively, so the classified samples differ in dimensionality.
To quantify the prediction accuracy, the actual classes of the obese, overweight, and normal-weight test data and the corresponding predicted classes are listed in Table 2 for direct comparison. To compare the accuracy of the SVM alone with that of the iSVM model, we classified the prediction data several times; the results are summarized in Table 3.
Table 3. Comparison of the classification accuracy in the two frequency bands.
For the 250 kHz frequency band, the SVM alone produces considerable classification error, which motivates the use of the PSA to obtain a better-optimized boundary parameter C. In other words, the iSVM significantly reduces the number of misclassified samples.
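A comparison of actual and predicted classes of this kind reduces to counting label pairs. The following minimal Python sketch, with hypothetical function names and made-up labels in the usage note, shows one way to tabulate the pairs and compute the false classification rate; it does not reproduce the values of Tables 2 and 3.

```python
from collections import Counter


def confusion_counts(actual, predicted):
    """Count (actual, predicted) label pairs over the test samples."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return Counter(zip(actual, predicted))


def false_classification_rate(actual, predicted):
    """Fraction of test samples whose predicted class differs from the actual one."""
    if not actual:
        raise ValueError("no samples given")
    counts = confusion_counts(actual, predicted)
    wrong = sum(n for (a, p), n in counts.items() if a != p)
    return wrong / len(actual)
```

For example, with actual labels `["normal", "normal", "overweight", "obese"]` and predictions `["normal", "overweight", "overweight", "obese"]`, one of four samples is misclassified, so the rate is 0.25.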

Model Comparison
Waistline measurement and regression analysis are the two conventional methods of obesity measurement. To verify the effectiveness of the present method, we applied all three methods to the test sets, analyzed the results, and compared the methods. The results are described as follows.
Model 1 is based on the subjects' waistline measurements. The obesity status inferred from the measurement data is compared with the medical measurement standard. The time consumption of the three methods differs somewhat because it depends on the operator's skill level; it is therefore irrelevant to the present research and is not considered in the following.
The tables show that the RMS error of the iSVM prediction is clearly smaller than that of the traditional regression model, and that the classification error of the iSVM method is clearly smaller than that of the simple waistline measurement method. The iSVM takes slightly more time because of the computation and training needed to improve the accuracy.
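The RMS error used in this comparison is the square root of the mean squared difference between measured and predicted values. A minimal Python sketch, with a hypothetical function name and no data from the paper:

```python
import math


def rms_error(actual, predicted):
    """Root-mean-square error between two equally long sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))
```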

Discussion
To obtain more effective predictions than traditional methods such as waist measurement and BMI, image detection and BIA technology are the main approaches used to predict abdominal obesity. However, image detection technology has some obvious disadvantages. CT is particularly widespread, but its radiation harms the human body to a certain degree. MRI involves no radiation, but it is expensive and has site-specific constraints. Ultrasound imaging is non-invasive and cheap, but its diagnostic precision depends on operator experience and equipment performance. X-ray imaging is simple and inexpensive, but it also involves radiation. Therefore, we chose BIA technology for our research on abdominal obesity prediction.
The proposed model based on BIA technology is harmless to humans, convenient, and inexpensive. Moreover, because hidden obesity is difficult to predict directly, we used the abdominal electrical impedance as a predictor. Using abdominal impedance as a variable in the prediction model allows abdominal obesity to be predicted more accurately, because the abdominal impedance changes as abdominal fat increases. In comparison with the waist circumference model, the proposed model has higher prediction accuracy and can therefore predict hidden obesity more effectively.
The support vector machine is often used to solve small-sample problems, and the PSA is an optimization algorithm. We therefore proposed a prediction method that combines the particle swarm algorithm with the support vector machine to obtain a prediction model that is suitable for small sample sizes and achieves high accuracy. In comparison with the SVM alone and the traditional regression model, the proposed model shows better prediction results. The SVM alone performs poorly because its parameters are not optimized, and the traditional regression model performs poorly because the sample size is insufficient for it. Accordingly, the method in this paper offers advantages in terms of safety, cost, prediction of hidden abdominal obesity, prediction from small samples, and prediction accuracy.

Conclusions
In this paper, we proposed a new model for classifying abdominal adiposity. The parameters of the SVM, which uses the RBF kernel function, were optimized by the PSA so that the SVM achieves a higher modeling accuracy. The experimental results demonstrate that the present model has high accuracy and can effectively predict abdominal adiposity. Moreover, the method is radiation-free, inexpensive, and convenient. The present study therefore provides a new model for practical abdominal adiposity prediction. In future work, we will consider combining image detection technology with BIA technology to predict abdominal obesity.