A Dynamic Time Warping Based Algorithm to Evaluate Kinect-Enabled Home-Based Physical Rehabilitation Exercises for Older People

Older people face difficulty engaging in conventional rehabilitation exercises for improving physical functions over a long time period due to the passive nature of the conventional exercise, inconvenience, and cost. This study aims to develop and validate a dynamic time warping (DTW) based algorithm for assessing Kinect-enabled home-based physical rehabilitation exercises, in order to support auto-coaching in a virtual gaming environment. A DTW-based algorithm was first applied to compute motion similarity between two time series from an individual user and a virtual coach. We chose eight bone vectors of the human skeleton and body orientation as the input features and proposed a simple but innovative method to further convert the DTW distance to a meaningful performance score in terms of the percentage (0–100%), without training data and experience of experts. The effectiveness of the proposed algorithm was validated through a follow-up experiment with 21 subjects when playing a Tai Chi exergame. Results showed that the algorithm scores had a strong positive linear relationship (r = 0.86) with experts’ ratings and the calibrated algorithm scores were comparable to the gold standard. These findings suggested that the DTW-based algorithm could be effectively used for automatic performance evaluation of an individual when performing home-based rehabilitation exercises.


Introduction
The global healthcare system is under great pressure due to rapid population aging as well as a shortage of healthcare personnel and budget [1]. An increasing proportion of older people is facing serious challenges of impaired physical functions such as muscle strength, balance, and mobility [2]. All these negative changes result in difficulties for older people maintaining independence of daily living, which would further cause anxiety, low self-esteem, and decreased quality of life [3,4]. Epidemiological studies show that a low physical activity level is strongly correlated to functional decline of the elderly. Physical exercise is an effective way to counteract the age-related functional decline [5].
There is strong evidence that appropriate physical rehabilitation exercises can improve the physical activity level and activities of daily living of older people [4,6,7]. Conventional exercise therapies for older people are generally conducted in a formal rehabilitation center or clinical setting, which requires direct supervision from a professional therapist. Even though conventional exercise therapies have been shown to be effective in increasing physical activity and improving motor functions and balance, they suffer from low rates of uptake and adherence [8,9] due to a lack of enjoyment, inconvenient transportation, and high cost [10].
For example, Kobayashi et al. [11] examined the effects of physical trainee and the standard motion of the trainer in dance teaching. They utilized existing training data and the experience of experts to determine three boundaries of DTW matching costs for categorizing individual performances into four levels: below average, average, good, and excellent. Most studies require large representative training data for each exercise to convert the matching cost from DTW to the final performance score. This task is resource-intensive, and it is difficult to generalize conversion criteria established for a specific exercise program to different ones. In addition, the final performance evaluation is categorical and thus qualitative, and it is not sensitive enough to recognize a user's gradual progress during exercise interventions. Chatzitofis et al. [29] and Mocanu et al. [30] developed home-based rehabilitation systems for heart health and physical activity. DTW was used to compute the quantitative performance score. However, how the DTW matching cost was converted to a quantitative score was not described, and the quantitative scores were not validated against ground-truth ratings. Osgouei et al. [31] recently proposed an objective method for quantitative performance evaluation of rehabilitation exergames. Using a shoulder abduction exercise as an example and two angle features (shoulder angle and arm angle), they presented how DTW was applied to compute the motion similarity between the unknown and reference trajectories of the human skeleton joints. A normalization approach with estimated lower and upper bounds for the DTW distance was utilized to further convert any DTW distance to an objective similarity score (0-100). The proposed method is promising since it does not require any training data. However, the proposed objective similarity score was not validated against physicians' evaluation of the performances. In addition, it remains questionable whether the proposed method can be extended from simple repetitive exercises to complex whole-body exercises.
This study aims to develop and validate a DTW-based algorithm for motion similarity evaluation, in order to support effective Kinect-enabled home-based virtual coaching. We proposed a simple but innovative method to directly convert the DTW matching cost to a meaningful performance score in terms of percentage (0-100%), without training data or the experience of experts. We further validated the effectiveness of our algorithm through a follow-up experiment with human subjects performing a complex whole-body exercise (Tai Chi) instead of simple, repetitive exercises, which would show good generalization of our proposed method to different exercise programs. The developed algorithm is expected to evaluate a user's performance similarly to domain experts, which makes it promising for application to home-based physical rehabilitation exercises that improve the quality of life of the elderly.

3D Pose Comparison
Human motion is the coordinated movement of different body parts. Motion data from the Kinect sensor can be seen as a sequence of frames comprising the 3D coordinates of the joint positions of the human skeleton, and each bone, determined by two connected joints, can be seen as a 3D vector in space. In order to assess the similarity between the trainer's motion (coach) and the trainee's motion (end user), the first step is to quantify the difference between the two 3D poses at a given frame. In this study, we chose the sum of the angle differences between all corresponding bone vectors of the two 3D poses as the distance measure. Eight major bones of the human skeleton (the upper and lower arms, and the upper and lower legs) were chosen for motion comparison since most human motions during rehabilitation exercises involve the coordination of upper and lower limb movements. The angle difference (θ) between two corresponding bone vectors of the trainer and trainee is illustrated in Figure 1 and can be calculated by the law of cosines (see Equation (1)).
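The pose distance described above can be sketched in a few lines of Python. This is a minimal illustration rather than the study's implementation (the exergame itself was written in C#): the function names `bone_angle_deg` and `pose_distance` are hypothetical, and the angle is computed with the dot-product form of the law of cosines, which is assumed here to match Equation (1).

```python
import numpy as np

def bone_angle_deg(u, v):
    """Angle in degrees between two 3D bone vectors (dot-product form)."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    cos_theta = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    # Clipping guards against rounding slightly outside [-1, 1].
    return float(np.degrees(np.arccos(np.clip(cos_theta, -1.0, 1.0))))

def pose_distance(pose_a, pose_b):
    """Sum of angle differences over corresponding bone vectors.

    Each pose is a sequence of 3D bone vectors in the same bone order
    (eight bones in the setting described in the text).
    """
    return sum(bone_angle_deg(a, b) for a, b in zip(pose_a, pose_b))
```

For example, two poses whose eight bone vectors are pairwise perpendicular would be at a distance of 8 × 90 = 720 degrees, while identical poses are at distance 0.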

Motion Comparison
Given two sequences of motion data X and Y, each frame of motion data is a 3D pose. We applied DTW to find the optimal matching between the trainer's motion and the trainee's motion while minimizing the effects of shifting and distortion in time [32]. Since we chose eight bone vectors for motion comparison and the motion of each bone constitutes one-dimensional time-series data, both the trainer's and the trainee's data have eight dimensions. The matrix of each dataset has dimension 8-by-n, where n is the total number of frames of motion data. To explain how DTW works, let us start with the one-dimensional case. The motion of a body part can be denoted as $S = (s_1, s_2, \ldots, s_n)$ and $T = (t_1, t_2, \ldots, t_m)$, which correspond to the trainer's motion and the trainee's motion, respectively. Each element of S and T is a normalized bone vector of that body part in a certain 3D coordinate system. To compare the similarity of sequences S and T by DTW, an n-by-m cost matrix is constructed in which the (i, j)th element, denoted $C(s_i, t_j)$, is the angle difference between $s_i$ and $t_j$ (see Equation (1)). A warping path P defines an alignment between S and T in the cost matrix and should satisfy three conditions: the boundary condition, the monotonicity condition, and the step size condition [33]. There could be multiple feasible warping paths in the cost matrix, and the total matching cost of one warping path P between S and T is defined by the equation below.
$$c_P(S, T) = \sum_{k=1}^{s} C(s_{i_k}, t_{j_k}) \quad (2)$$
where s is the length of the warping path P and $(i_k, j_k)$ is the kth element of P.
The goal of DTW is to find the optimal warping path, which has the minimal cumulative distance among all the possible warping paths. The DTW distance DTW(S, T) is defined as the total matching cost of the optimal warping path. In order to find the optimal warping path, a dynamic programming method is used. The recursive equation is given by the equation below.
$$D(i, j) = C(s_i, t_j) + \min\{D(i-1, j),\; D(i, j-1),\; D(i-1, j-1)\} \quad (3)$$
where $1 < i \le n$ and $1 < j \le m$. D(i, j) represents the cumulative matching cost between the standard data (S) and the testing data (T) from (1, 1) to (i, j). DTW can be generalized from the one-dimensional case to a multi-dimensional case [34]. In the multi-dimensional case, $s_i$ and $t_j$ are not single bone vectors but sets of bone vectors, which represent whole-body motion. Multi-dimensional DTW(S, T) is calculated in the same way as in the one-dimensional case, except that $C(s_i, t_j)$ is redefined as the sum of the angle differences over all the dimensions. The output of DTW is the matching cost, that is, the cumulative distance along the optimal warping path. Therefore, the lower the matching cost is, the closer the two motion sequences are and the better the motion performance is. In order to quantitatively assess the motion performance of a trainee, we further convert the DTW distance (matching cost) to a meaningful performance score in terms of the percentage (0-100%) using the following equation.
$$\text{Performance score} = \left(1 - \frac{DTW(S, T)}{90 \times 8 \times s}\right) \times 100\% = \left(1 - \frac{\sum_{k=1}^{s} C(s_{i_k}, t_{j_k})}{90 \times 8 \times s}\right) \times 100\% \quad (4)$$
where s is the length of the optimal warping path, 8 stands for the eight bone vectors selected for motion evaluation, $C(s_{i_k}, t_{j_k})$ is the kth element of the optimal warping path in the cost matrix (the sum of the angle differences over the eight bone vectors), and the DTW distance DTW(S, T) is the sum of these elements along the optimal path. We assume the angle difference between two corresponding bone vectors is within 90 degrees based on an earlier study [18], so the maximum DTW(S, T) along the optimal path is 90 × 8 × s. Because the output distance DTW(S, T) is a measure of dissimilarity between the two motion time series (the longer the distance, the greater the deviation), the last part of Equation (4) is a percentage score (0-100%) that measures the level of similarity between the trainee's motion and the trainer's motion.
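The following Python sketch puts the pieces of this section together: a multi-dimensional cost matrix, the dynamic-programming recursion, and the conversion of the DTW distance to a percentage score. It is an illustration under stated assumptions, not the authors' code. The names `cost_matrix`, `dtw_with_path_length`, and `performance_score` are hypothetical; the inputs are assumed to be unit-length bone vectors; and when several predecessor cells tie on cost, the sketch arbitrarily prefers the shorter path, a choice the text does not specify.

```python
import numpy as np

def cost_matrix(S, T):
    """n-by-m matrix of summed angle differences, in degrees.

    S has shape (n, d, 3) and T has shape (m, d, 3): n or m frames,
    d unit-length bone vectors per frame (assumption: already normalized).
    """
    cos = np.einsum("idk,jdk->ijd", S, T)  # dot product per frame pair, per bone
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))).sum(axis=2)

def dtw_with_path_length(cost):
    """Return (DTW distance, length of the optimal warping path)."""
    n, m = cost.shape
    D = np.full((n, m), np.inf)          # cumulative matching cost
    steps = np.zeros((n, m), dtype=int)  # path length needed to reach each cell
    D[0, 0], steps[0, 0] = cost[0, 0], 1
    for i in range(n):
        for j in range(m):
            if i == 0 and j == 0:
                continue
            candidates = []
            if i > 0:
                candidates.append((D[i - 1, j], steps[i - 1, j]))
            if j > 0:
                candidates.append((D[i, j - 1], steps[i, j - 1]))
            if i > 0 and j > 0:
                candidates.append((D[i - 1, j - 1], steps[i - 1, j - 1]))
            # Lowest cumulative cost wins; ties go to the shorter path (assumption).
            best_cost, best_steps = min(candidates)
            D[i, j] = cost[i, j] + best_cost
            steps[i, j] = best_steps + 1
    return float(D[-1, -1]), int(steps[-1, -1])

def performance_score(dtw_distance, path_length, n_dims=8, max_angle=90.0):
    """Percentage score: 100% for identical motion, 0% at the assumed bound."""
    return 100.0 * (1.0 - dtw_distance / (max_angle * n_dims * path_length))
```

A typical call would be `d, s = dtw_with_path_length(cost_matrix(S, T))` followed by `performance_score(d, s)`. Note that angle differences above the assumed 90-degree bound would produce a negative score, so a practical implementation might clamp the result to the 0-100 range.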

Body Orientation Offset
Calculating bone vectors for both the trainee's and the trainer's motion in the world coordinate system could be problematic if the trainee is not oriented exactly as the trainer is. The difference (error) in body orientation would be directly transferred to all eight bone vectors. To solve this problem, instead of using joint positions in the world coordinate system, we calculate the joint position data in a local coordinate system of the human model, which is updated in real time. The establishment of the local coordinate system was adopted from the Unity3D Mecanim system [35]. The up vector is defined by the midpoint of the left/right upper arms and the midpoint of the left/right upper legs. The left vector is the average of the upper-body left vector (defined by the left upper arm and the right upper arm) and the lower-body left vector (defined by the left upper leg and the right upper leg). The forward vector is the cross product of the up vector and the left vector. In order to make these three vectors orthogonal to each other, the final left vector is the cross product of the forward vector and the up vector. Then the up vector is aligned to the normal vector of the ground plane as the y-axis. Hence, the final left vector and forward vector are finalized as the x-axis and z-axis, respectively, based on the rotation matrix. The main purpose of this step is to guarantee that the x-axis and the z-axis always lie in the ground plane. The origin of this local coordinate system is the projection of the center of mass on the ground.
As shown in Figure 2, the only difference between the motions of the two avatars is the body orientation (see z-axis). Under the local coordinate system, all the bone vectors of the two avatars are exactly the same. Under the world coordinate system, however, there are clear angle differences among all corresponding bone vectors of the two avatars, which are caused by the different body orientations.
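A simplified Python sketch of such a body-attached frame is given below. It is not the Mecanim implementation: the function names are hypothetical, the shoulder and hip joints are assumed to stand in for the upper-arm and upper-leg reference points, and the "align up to the ground normal" step is approximated by projecting the forward vector onto the ground plane. Because bone vectors are directions, the origin of the frame is not needed to express them and is omitted.

```python
import numpy as np

def _unit(v):
    return v / np.linalg.norm(v)

def local_frame(l_shoulder, r_shoulder, l_hip, r_hip, ground_normal=(0.0, 1.0, 0.0)):
    """Rows of the returned 3x3 matrix are the local x (left), y (up), z (forward) axes."""
    l_shoulder, r_shoulder, l_hip, r_hip = (
        np.asarray(p, dtype=float) for p in (l_shoulder, r_shoulder, l_hip, r_hip)
    )
    # Up: from the midpoint of the hips to the midpoint of the shoulders.
    up = _unit((l_shoulder + r_shoulder) / 2 - (l_hip + r_hip) / 2)
    # Left: average of the upper-body and lower-body left directions.
    left = _unit(_unit(l_shoulder - r_shoulder) + _unit(l_hip - r_hip))
    # Forward: cross product of up and left, as described in the text.
    forward = _unit(np.cross(up, left))
    # Use the ground normal as the y-axis and keep x and z in the ground plane.
    # (Degenerate case of a forward vector parallel to the normal is not handled.)
    y_axis = _unit(np.asarray(ground_normal, dtype=float))
    z_axis = _unit(forward - np.dot(forward, y_axis) * y_axis)
    x_axis = _unit(np.cross(z_axis, y_axis))  # final left = forward x up
    return np.stack([x_axis, y_axis, z_axis])

def to_local(vec, frame):
    """Express a world-space direction vector in the local frame."""
    return frame @ np.asarray(vec, dtype=float)
```

With this construction, turning the whole body about the vertical axis leaves the local coordinates of every bone vector unchanged, which is the property illustrated in Figure 2.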
Calculating angular differences in the local coordinate system of the human model removes the accumulated errors on the eight bone vectors caused by different body orientations. However, the difference in body orientation between the trainee and the trainer should also be considered in this case. Hence, we added one more dimension, body orientation, to the evaluation of motion similarity. In total, we used nine dimensions to calculate the motion similarity: eight bone vectors under the local coordinate system of the human model and the forward vector of the body, which reflects the body (mainly trunk) orientation (z-axis in Figure 2). The final performance score is then updated by Equation (5).
$$\text{Performance score} = \left(1 - \frac{DTW(S, T)}{90 \times 9 \times s}\right) \times 100\% \quad (5)$$
where all the notations are the same as in Equation (4) and 9 stands for the nine dimensions.

Validation of the Developed DTW-Based Algorithm
In order to validate the developed algorithm, we performed a follow-up experiment where the final performance scores from the algorithm were compared with ratings given by the domain experts. An 8-form Tai Chi exercise was selected to evaluate the proposed algorithm.

Experimental Participants
Twenty-one middle-aged and older subjects (age: 55.2 ± 4.2 years, height: 166.1 ± 7.9 cm, weight: 65.36 ± 8.3 kg) from a local Tai Chi academy participated in the algorithm validation experiment. For the sake of convenience, the experiment was conducted in the same Tai Chi academy instead of each subject's home. All the subjects were in healthy conditions and without musculoskeletal diseases or injuries that may affect their Tai Chi performance. Prior to the participation of the experiment, each subject signed informed consent on the experiment protocol, which was approved by the KAIST Institutional Review Board (IRB-18-070).

Experimental Setup and Procedure
The Tai Chi exergame was developed for real-world application in C# on the Unity3D platform (Unity 5.5.2f1) with the Kinect V2 sensor. The main scene of the exergame is implemented with two avatars: a virtual trainer and a trainee (Figure 3). The avatar-based rendering of motion preserves the privacy of the user, which is critical for healthcare systems. The motion of the trainee avatar is updated by the real motion of the user. The user's motion is captured by Kinect V2 (Microsoft Corp, Redmond, WA, USA) and mapped to the trainee avatar using the Kinect V2 asset for Unity3D [36]. The motion of the trainer avatar is retargeted from the pre-recorded standard motion of a certified Tai Chi instructor (Master Level).
The whole experimental setting is illustrated in Figure 4. Both Kinect V2 and the Xsens motion capture system were used to capture each subject's motion. The Kinect sensor was placed at a height of 0.8 m. Since subjects should stretch their arms frequently because of the characteristics of Tai Chi motion, subjects were instructed to stand around 3.0-3.5 m away from the Kinect sensor. Considering that some Tai Chi motion with body rotation may not be well captured by the Kinect due to self-occlusion [37], a wearable inertial sensor-based motion capture system, Xsens MVN BIOMECH (Xsens Technologies B.V., Enschede, The Netherlands), was also utilized to obtain high-quality motion data for the algorithm validation [38,39].
A smartphone, supported by a tripod, was used to record each subject's motion when he/she was playing the Tai Chi exergame. The motion videos were distributed to three Tai Chi experts for independent performance evaluation. Each expert was provided a 10-cm visual analog scale (VAS) [40] and asked to place a vertical mark on the scale to indicate the performance level of motion for each subject. The anchor statements for VAS in this study are "cannot follow Tai Chi at all" (score of 0) on the left and "master level with standard Tai Chi motion" (score of 100) on the right. The raw scale score is then converted to a 0-100 scale.

Statistical Analysis
The intraclass correlation coefficient (ICC) was used to check the inter-rater reliability of experts' subjective ratings [41]. Good consistency and agreement among different experts are the prerequisite to consider experts' rating as the gold standard for validating the developed DTW-based algorithm. ICC is a widely used reliability index and the general guideline of ICC is as follows: ICC < 0.5, poor reliability, 0.5 < ICC < 0.75, moderate reliability, 0.75 < ICC < 0.9, good reliability, and ICC > 0.9, excellent reliability [42].
More importantly, final performance scores from the developed algorithm were compared with the experts' ratings (as a gold standard). The Pearson correlation coefficient (r) between final performance scores from the developed algorithm and those from experts was calculated to assess the strength of the linear relationship between the two evaluation methods. In addition, linear regression was used to calibrate performance scores from the algorithm so that the scores from the two evaluation methods could be consistent. Differences between algorithm scores after calibration and experts' ratings were analyzed. The SPSS statistical package version 20 (IBM Corp., Armonk, NY, USA) was used for statistical analysis.
Figure 5 shows the subjective ratings of 21 subjects from three independent experts. The ICC value for the three experts was 0.861 (95% CI: 0.688-0.942), which was within the range of 0.75 to 0.9 [41,42] and indicates that the experts' ratings were consistent, with good inter-rater reliability.
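For readers without SPSS, an inter-rater ICC can be sketched in Python as follows. The text does not state which ICC form was used, so the two-way random-effects, absolute-agreement, single-rater form, ICC(2,1), is an assumption here, and the function names are hypothetical. The `reliability_level` helper simply encodes the guideline thresholds quoted above.

```python
import numpy as np

def icc_2_1(ratings):
    """ICC(2,1) for a ratings array of shape (n_subjects, k_raters)."""
    x = np.asarray(ratings, dtype=float)
    n, k = x.shape
    grand = x.mean()
    # Sums of squares for subjects (rows), raters (columns), and residual error.
    ss_rows = k * np.sum((x.mean(axis=1) - grand) ** 2)
    ss_cols = n * np.sum((x.mean(axis=0) - grand) ** 2)
    ss_error = np.sum((x - grand) ** 2) - ss_rows - ss_cols
    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_error / ((n - 1) * (k - 1))
    return float((msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n))

def reliability_level(icc):
    """Map an ICC value to the guideline categories cited in the text [42]."""
    if icc < 0.5:
        return "poor"
    if icc < 0.75:
        return "moderate"
    if icc < 0.9:
        return "good"
    return "excellent"
```

Here `ratings` would be a 21-by-3 array (subjects by experts); `reliability_level(0.861)` returns "good", while the lower confidence bound of 0.688 maps to "moderate".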

Evaluation Comparison between Experts and the Developed Algorithm
The subjective ratings from the three experts were averaged and regarded as the gold standard to validate the developed DTW-based algorithm. Figure 6 shows the scatter plot of the algorithm's final performance scores and the averaged experts' ratings for 21 subjects. The Pearson correlation coefficient (r) was 0.86 (t = 7.45, p < 0.001), which indicates a strong positive linear relationship between scores from the two evaluation methods. Additional analysis of performance scores from the two evaluation methods showed that, under many circumstances, the algorithm would overestimate the performance when compared with the experts' rating. Therefore, we calibrated performance scores of the algorithm using the fitted equation from linear regression (see Figure 6) so that the algorithm would generate evaluation scores similar to those of the domain experts for practical applications. Figure 7 further demonstrates that the calibrated performance scores from the algorithm were comparable to the experts' ratings. The score difference between the two evaluation methods had a mean of 9.5 and a standard deviation of 7.0 (Maximum: 21.6, Minimum: 0.1).
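The correlation and calibration steps can be sketched in Python as below. This is an illustration only: the function names are hypothetical, no study data are reproduced, and the regression coefficients from Figure 6 are not assumed. The t statistic uses the standard formula t = r·sqrt(n − 2)/sqrt(1 − r²); with the rounded r = 0.86 and n = 21 it gives about 7.35, close to the reported 7.45, and the small gap is consistent with r having been rounded.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation coefficient between two score vectors."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc, yc = x - x.mean(), y - y.mean()
    return float(np.sum(xc * yc) / np.sqrt(np.sum(xc ** 2) * np.sum(yc ** 2)))

def t_statistic(r, n):
    """t statistic for testing r against zero with n paired observations."""
    return float(r * np.sqrt(n - 2) / np.sqrt(1.0 - r ** 2))

def fit_calibration(algorithm_scores, expert_ratings):
    """Least-squares line mapping algorithm scores to expert ratings."""
    slope, intercept = np.polyfit(algorithm_scores, expert_ratings, deg=1)
    return float(slope), float(intercept)

def calibrate(score, slope, intercept):
    """Apply the fitted line to a raw algorithm score."""
    return slope * score + intercept
```

In use, `fit_calibration` would be run once on the algorithm scores and averaged expert ratings of a small set of representative subjects, and `calibrate` would then be applied to every new raw score.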

Discussion
We developed a DTW-based algorithm for assessing motion similarity between an individual user and a virtual coach. DTW was designed to handle local changes in timing (due to speed variations) and is therefore desirable for evaluating rehabilitation exercises performed by the elderly for self-care at home. The effectiveness of the algorithm was validated through a follow-up experiment. In the validation experiment, the Tai Chi exercise was chosen as the representative physical exercise to verify the proposed algorithm for two major reasons. First, the effectiveness of Tai Chi exercise for improving physical functions has been shown by many previous studies [43,44]. Second, Tai Chi is a complex, whole-body motion. If the algorithm performs well in evaluating Tai Chi motion, it should be generalizable to other, simpler rehabilitation exercises.
Inter-rater reliability analysis revealed that the reliability level of the experts' ratings was "good" (0.75 < ICC = 0.861 < 0.90). However, the 95% confidence interval of the ICC was wide (0.688-0.942), which indicates that, in the worst case, the reliability level was only "moderate" (0.5 < ICC = 0.688 < 0.75). The wide confidence interval warned that, even though the overall agreement was high among the three experts, there was non-negligible disagreement in their ratings [45]. Paired t-tests also confirmed a significant difference in performance ratings between the third expert and the other two experts (Figure 5). The inconsistency in subjective ratings among the three experts highlighted the potential benefits of applying our developed algorithm to assess exercise performance automatically and objectively.
The strong linear relationship (r = 0.86) between the algorithm score and the experts' evaluation (gold standard) implied that the developed algorithm was sensitive to differences in performance level between subjects, much as the domain experts were. Unexpectedly, a detailed analysis revealed that the algorithm score was significantly higher than the experts' rating. This could be mainly due to the different baselines of the two evaluation methods. The algorithm evaluation was based purely on the sum of angle differences among nine corresponding body vectors, and a subject with all angle differences at 90 degrees was considered the worst (performance score = 0). Since even the subjects rated worst by the experts had most of their angle differences within 45 degrees, the algorithm would overestimate the subject's performance score due to the ceiling effect [46]. To reduce this overestimation and enable our algorithm to provide scores similar to those of the domain experts, the linear regression equation (Figure 6) was applied to calibrate the algorithm score. The experimental results showed that the calibrated algorithm scores were comparable to the experts' ratings. Taken together, these findings demonstrated that, even though the developed DTW-based algorithm could be a good tool for objectively ranking exercise performance among subjects, the algorithm score should be calibrated against experts' ratings of a small number of representative subjects. In this way, good consistency between the algorithm's evaluation and the experts' evaluation can be achieved in practical applications.
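The baseline effect and the calibration step described above can be illustrated with a minimal Python sketch. It assumes, as a simplification, that the raw score is a linear mapping of the mean angle difference onto 0-100 with 90 degrees as the worst case, and that calibration is an ordinary least-squares line. The function names are hypothetical, and the regression coefficients of Figure 6 are not reproduced here.

```python
def score_from_angles(angle_diffs_deg, max_angle_deg=90.0):
    """Map angle differences (degrees) to a 0-100 score.

    Assumption: a linear mapping in which a mean difference of
    max_angle_deg gives 0 and a mean difference of 0 gives 100.
    """
    if not angle_diffs_deg:
        raise ValueError("need at least one angle difference")
    # Clip every difference into [0, max_angle_deg] before averaging.
    clipped = [min(max(a, 0.0), max_angle_deg) for a in angle_diffs_deg]
    mean_diff = sum(clipped) / len(clipped)
    return 100.0 * (1.0 - mean_diff / max_angle_deg)


def fit_calibration(algorithm_scores, expert_ratings):
    """Least-squares line: expert_rating ~ slope * algorithm_score + intercept."""
    n = len(algorithm_scores)
    if n < 2 or n != len(expert_ratings):
        raise ValueError("need two or more paired observations")
    mean_x = sum(algorithm_scores) / n
    mean_y = sum(expert_ratings) / n
    sxx = sum((x - mean_x) ** 2 for x in algorithm_scores)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(algorithm_scores, expert_ratings))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def calibrate(algorithm_score, slope, intercept):
    """Apply the fitted line and keep the result inside 0-100."""
    return min(max(slope * algorithm_score + intercept, 0.0), 100.0)
```

Under this assumed mapping, a subject whose nine angle differences are all 45 degrees would still receive a raw score of 50, which is one way to see why raw scores sit above the experts' ratings. Fitting `fit_calibration` on a few representative subjects and passing new raw scores through `calibrate` corresponds to the calibration step discussed above.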
Earlier studies used binary classification or three-point and four-point Likert scales to obtain experts' ratings for validating their algorithms [24-28]. Such validation is coarse and is likely to inflate the reported accuracy because each point on the scale covers a wide range of performance, especially for binary classification and a three-point Likert scale. To the best of our knowledge, only one reported study also used a 0-100 score as the experts' rating to validate the developed algorithm, as we did [21]. However, the highest correlation coefficient between their DTW-based algorithm score and the expert's rating was 0.64, much lower than ours (r = 0.86). The improved performance in our study could be related to the selection of different motion features. Instead of the simple joint angles used by Capecci et al. [21], we chose 3D bone vectors of the human skeleton to better preserve the spatial information of the motion, because a joint angle alone cannot define the spatial orientation of the two bones connected by that joint. In addition, since theoretical upper and lower bounds (180 and 0 degrees) always exist for the angle difference between two corresponding bone vectors, converting the DTW matching cost to a final percentage score is straightforward and reasonable in our study, and it requires neither training data nor expert experience. In this study, we assumed the upper bound to be 90 degrees instead of 180 degrees, based on an earlier study [18] and our practical exercise scenario.
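As a rough illustration of why a bounded angular cost makes the DTW matching cost easy to interpret, the following Python sketch runs classic DTW over two sequences of frames, where each frame is a list of 3D bone vectors and the local cost is the mean angle between corresponding vectors. Dividing the accumulated cost by the warping path length is an assumption made here so that the result stays in degrees; the function names are hypothetical and the sketch is not the exact formulation used in the study.

```python
import math


def angle_deg(u, v):
    """Angle in degrees (0 to 180) between two non-zero 3D vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    cosine = max(-1.0, min(1.0, dot / (norm_u * norm_v)))
    return math.degrees(math.acos(cosine))


def frame_cost(frame_a, frame_b):
    """Mean angle between corresponding bone vectors of two frames."""
    return sum(angle_deg(u, v) for u, v in zip(frame_a, frame_b)) / len(frame_a)


def dtw_mean_cost(seq_a, seq_b):
    """Classic DTW; returns accumulated cost divided by path length."""
    n, m = len(seq_a), len(seq_b)
    if n == 0 or m == 0:
        raise ValueError("both sequences need at least one frame")
    inf = float("inf")
    # total[i][j]: cheapest accumulated cost aligning the first i and j frames.
    # steps[i][j]: number of matched frame pairs on that cheapest path.
    total = [[inf] * (m + 1) for _ in range(n + 1)]
    steps = [[0] * (m + 1) for _ in range(n + 1)]
    total[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            prev_total, prev_steps = min(
                (total[i - 1][j - 1], steps[i - 1][j - 1]),  # both advance
                (total[i - 1][j], steps[i - 1][j]),          # seq_a advances
                (total[i][j - 1], steps[i][j - 1]),          # seq_b advances
            )
            total[i][j] = prev_total + frame_cost(seq_a[i - 1], seq_b[j - 1])
            steps[i][j] = prev_steps + 1
    return total[n][m] / steps[n][m]
```

Because every local cost lies between 0 and 180 degrees, the path-normalized cost lies in the same range, so it can be mapped to a percentage without training data. A user who performs the same poses more slowly than the coach is aligned frame by frame and incurs no extra cost for the speed difference.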
It is worth mentioning that eliminating the confounding effect caused by body orientation offset is a major challenge in algorithm development. In fact, both bone vector-based and joint position-based algorithms are very sensitive to body orientation, especially when evaluating complex whole-body exercises with rotational motions. Chua et al. [47] also pointed out this issue when they evaluated Tai Chi motion. We calculated the joint positions and bone vectors in real time in the local coordinate system of the human model instead of the world coordinate system, which removes the error induced by the body orientation offset throughout the exercise. Compensating for the body orientation offset has practical value for older people because they might not be able to orient themselves exactly as the virtual coach does during the rehabilitation exercise.
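The idea of expressing bone vectors in a body-attached coordinate system can be sketched in Python as follows. The choice of joints used to build the frame (left hip, right hip, and a point on the upper spine) is an assumption for illustration only, as are the function names; the sketch does not claim to match the local coordinate system of the human model used in the prototype.

```python
import math


def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def unit(a):
    norm = math.sqrt(dot(a, a))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero-length vector")
    return tuple(x / norm for x in a)


def body_frame(hip_left, hip_right, spine_top):
    """Orthonormal axes attached to the trunk (hypothetical joint choice)."""
    hip_center = tuple((l + r) / 2.0 for l, r in zip(hip_left, hip_right))
    x_axis = unit(sub(hip_right, hip_left))      # along the hip line
    up_hint = sub(spine_top, hip_center)         # roughly along the trunk
    z_axis = unit(cross(x_axis, up_hint))        # perpendicular to hips and trunk
    y_axis = cross(z_axis, x_axis)               # completes the orthonormal frame
    return x_axis, y_axis, z_axis


def to_local(vector, frame):
    """Coordinates of a world-space vector along the three body axes."""
    return tuple(dot(vector, axis) for axis in frame)
```

If the whole skeleton is turned about the vertical axis, the world-space bone vectors change but their local coordinates do not, so a user who faces the sensor at a different angle than the coach is not penalized for the offset itself.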
To further examine the use and acceptance of exergaming technology for home-based physical rehabilitation by the primary target users (older people), we applied the technology acceptance model [48-51] and designed a questionnaire with 11 constructs (Appendix A) to evaluate user acceptance of our Tai Chi exergaming prototype system (Figure 3). Forty-one older adults (age 77.3 ± 5.4 years, height 159.3 ± 8.5 cm, weight 59.0 ± 9.5 kg) from a local senior welfare center participated in this survey. They were asked to try the prototype system and play the Tai Chi exergame before giving their questionnaire responses on a five-point Likert scale (1 corresponding to "strongly disagree" and 5 corresponding to "strongly agree"). Figure 8 presents a summary of their responses. The results showed that the older people perceived relatively high vulnerability (3.21 out of 5) and severity (3.63) in terms of difficulties in self-care and independent living, and they had high intentions (Behavior intention = 4.08) to use our system in the future. They considered our system very useful (Perceived usefulness = 4.43), viewed it positively (Attitude = 4.29), found it entertaining (Hedonic motivation = 3.82), and perceived a low privacy risk (Perceived privacy risk = 1.18). Interestingly, even though the older people were somewhat confident in their ability to use this system to improve their health conditions (Self-efficacy = 3.75), the expected effort (3.07) and response cost (2.72) were considerably high. Taken together, these findings implied that our Tai Chi exergaming prototype system is useful for older people performing home-based physical rehabilitation exercises. However, the prototype system needs to be improved to make it easier to use and more cost-effective.
We also collected valuable feedback from the participants for improving our prototype system, mainly the following: (1) audio effects should be added to make the exergame more entertaining and enjoyable; (2) the standard pace of the Tai Chi exergame should be slower and adjustable by each individual; (3) the avatar should be enlarged so that it can be seen clearly, and timely feedback on problematic motions should be provided; and (4) social networking functions (such as sharing exergaming performance scores with friends) should be further developed.
Figure 8. Results of the user acceptance questionnaire for the Tai Chi exergaming prototype system. Remarks: (1) Scores from all older participants were averaged for each construct. (2) Except for perceived privacy risk and response cost, all constructs are positively associated with the user's intention to adopt the system.
There were several limitations in the current study. First, the conversion from DTW distance to a percentage score (0-100%) is based on the assumption that the maximum angle difference between two corresponding bone vectors is 90 degrees. Even though this assumption works well for most body parts, the exact value of 90 degrees is not always appropriate. Second, we focused on motion correctness for the performance evaluation in this study; rhythm mismatch was not yet considered in the overall performance evaluation [28]. In addition, detailed feedback on problematic motions of specific body parts should be provided in future work so that older individuals are informed in a timely manner and can improve their rehabilitation exercises. Third, even though the primary target users of our Tai Chi exergaming system are older adults, the 21 participants in the validation of the DTW-based algorithm included both middle-aged and older adults, in order to cover a wide range of Tai Chi proficiency levels under practical constraints. Our next step will be to refine the prototype system and test it with a large number of older adults in the home environment to verify the practicality of the system. Last but not least, a single Kinect sensor often produces poor skeleton tracking for some rotational motions during rehabilitation exercises due to self-occlusion and limited sensing range [37]. Further research on combining data from multiple Kinect sensors to achieve more accurate and robust skeleton tracking is needed.

Conclusions
We developed a DTW-based algorithm to automatically evaluate a user's performance during physical rehabilitation exercises. We chose eight bone vectors of the human skeleton and body orientation as the input features and proposed a simple but innovative method to convert the DTW matching cost to a meaningful performance score expressed as a percentage (0-100%), without requiring training data or expert experience. The effectiveness of the proposed algorithm was tested through a follow-up experiment in which 21 subjects performed a complex whole-body exercise (Tai Chi) instead of simple repetitive exercises. Results showed that the algorithm scores had a strong positive linear relationship (r = 0.86) with experts' ratings and that the calibrated algorithm scores were comparable to the gold standard. These findings suggested that our algorithm could be effectively used for automatic performance evaluation of an older individual performing home-based physical rehabilitation exercises.