Validation of Marker-Less System for the Assessment of Upper Joints Reaction Forces in Exoskeleton Users

This paper presents the validation of a marker-less motion capture system used to evaluate the upper limb stress of subjects using exoskeletons for locomotion. The system fuses the human skeletonization provided by commercial 3D cameras with forces exchanged by the user to the ground through upper limbs utilizing instrumented crutches. The aim is to provide a low cost, accurate, and reliable technology useful to provide the trainer a quantitative evaluation of the impact of assisted gait on the subject without the need to use an instrumented gait lab. The reaction forces at the upper limbs’ joints are measured to provide a validation focused on clinically relevant quantities for this application. The system was used simultaneously with a reference motion capture system inside a clinical gait analysis lab. An expert user performed 20 walking tests using instrumented crutches and force platforms inside the observed volume. The mechanical model was applied to data from the system and the reference motion capture, and numerical simulations were performed to assess the internal joint reaction of the subject’s upper limbs. A comparison between the two results shows a root mean square error of less than 2% of the subject’s body weight.


Introduction
About 90,000 persons each year are made dependent on a wheelchair for mobility due to a Spinal Cord Injury (SCI) [1]. In recent years, wearable robots were developed to make it possible for people with SCI to walk. As proven by scientific studies [2], these solutions have been successful in reducing the development of secondary diseases such as obesity, diabetes, sores, and osteoporosis. In improving the quality and duration of life of people with SCI, and, at the same time, lead to the overall containment of the costs of the health system.
In previous research activities, our research group worked on the experimentation of a commercial exoskeleton, which is a REwalk personal version (P5), with SCI subjects. The present paper is heavily based on these previous works and it is meant to expand the research further. The experimentation took place in facilities dedicated to the recovery of paraplegic subjects. The subjects were trained to use the exoskeleton in gait labs with the support of specialized physiotherapists, acting both as a caregiver and instructor to correct the posture during the walk session. As described in Reference [3], the subjects, after wearing the exoskeleton, were equipped with a pair of crutches that (i) served as an aid to walking along a predefined path and (ii) were instrumented with strain gauge bridges and a triaxial accelerometer to measure both axial and shear forces, tilt angles, and impact time on the ground in real-time [4][5][6]. A simplified biomechanical model, based on the OpenSim platform [7], was designed to represent the system formed by the subject, the exoskeleton, and the crutches. These previous works aimed at providing a quantitative measure of the internal forces acting on the shoulders of the patient to improve the training performance and avoid incorrect use of the exoskeleton that could lead to pathologies at the shoulders. The results of the work were encouraging. The accuracy of the measurement of the internal forces was proven adequate to enhance the therapist's evaluation and the patient's engagement.
In this paper, we want to expand the research further. Monitoring upper joint reaction forces and gait objectively is only the first step for the correct, continuous, and autonomous practice of the subject in living environments. One of the issues is represented by the fact that the acquisition of the joint trajectories is carried out using marker-based optical tracking systems (MBS), which are the gold standard for clinical gait assessment [8,9]. The research in this field has produced several MBS that are currently market available. Among them, the Vicon (by Vicon Motion Systems Ltd, Oxford, OX5 1QU, UK), the Xsens (by Xsens Technologies, Enschede, PR, Netherlands), the Phase Space (by PhaseSpace Inc., San Leandro, CA, USA), the Optitrack (by NaturalPoint Inc., Corvallis, OR, USA) and the Smart-DX (by BTS Bioengineering, Garbagnate, MI, Italy) are marker-based optical systems that provide very accurate human capture. However, they suffer from limitations. In particular, the high cost and computational complexity limit their adoption to the gait labs [10]. Even for those who can have access to the gait labs, the setting available in the laboratory is not representative of realistic living environments, either for the presence of the clinical staff, for the limited training time slots, and for the typologies of paths that can be reproduced in indoor clinical settings.
Wearable IMU sensors are being intensively studied for real-world applications and could represent a reasonable choice for extending the exoskeleton-based training in living environments. They are used to estimate the orientation of the human body in healthcare-related applications, such as fall detection of elderly people [11,12], body and postural orientation [13,14], sports and athlete's limb dynamics [15][16][17], gaming [18], and robotic prosthetic body parts especially upper limb rehabilitation [19,20]. Gait analysis is also successfully carried out using wearable IMUs, such as in References [21,22].
Their use for human motion capture presents the advantages of ubiquitous performance, high accuracy, small size, low cost, and reduced complexity and costs. However, their adoption is not free from limitations. Combining body acceleration and orientation changes in accurate measurements is still an issue since the accelerometers measure both linear and gravitational accelerations. Drift compensation requires fusing different sensors technologies, as in References [23][24][25] or developing sophisticated elaboration algorithms [26]. The presence of the exoskeleton is another issue because it affects the quality of motion recognition. To overcome these limitations, different solutions have been proposed in the literature, such as the use of neural networks-based analysis [27], sensor data fusion to estimate the forces exerted by the foot on the sole [28], and in-shoe pressure mapping to identify the gait phases [29]. However, increasing the number of sensors means increasing the whole system complexity both in terms of system set-up and of data analysis, and does not help its usage in uncontrolled environments. A different alternative could be to directly acquire the signal generated by the exoskeleton joint angular sensors, as in Reference [30], but this solution does not apply to commercial exoskeletons, which usually are completely closed systems.
The examination of the state-of-the-art system has convinced us that these sensors need contact with the exoskeleton (lower limbs) and the body segments of the subject (upper limbs) constitute a significant obstacle to their continuous use since it requires time and precision in positioning. For example, marker setting is time-consuming and small errors in the positioning of either MBS markers or IMUs, as well as soft tissue artifacts, induce large errors in estimating the joint centers [31]. In addition, a significant degree of effort of the subject is necessary. This aspect should not be underestimated because the subject could be unable to collaborate or not compliant to collaborate. Vision-based marker-less systems (MLS) represent a valuable alternative to the above-mentioned systems. Both 2D and 3D cameras can be used to acquire the RGB image or the depth information of the subject and suitable procedures and methods have been developed to track the patient's motion. MLS can provide a reasonable option to develop a motion capture system suitable for functional assessment activities in living environments, provided that the adopted devices guarantee good accuracy of the estimate of both the upper limb forces and the gait quality [32,33]. At present, among the commercial devices suitable for the application, the Microsoft Kinect TM sensor deserve particular attention due to the cost-effectiveness and portability of this device [34,35]. This also provides the subject's skeleton estimation based on 25 joints of the human body [36]. This embedded capability has captured the interest of many research communities in a large number of fields such as multimedia [37], gaming [38], robotics [39], gesture recognition [40,41], and motion analysis [42][43][44]. At present, the Kinect is the most investigated sensor for advanced and affordable applications in the healthcare field such as elderly care and rehabilitation [45][46][47][48][49], functional assessment activities and posture [50][51][52][53], and daily life activities [54]. The accuracy and reliability of the Kinect sensor in gait assessment and motor function has been extensively investigated. Works published in the last five years provide comparative analyses of multi-Kinect systems benchmarked against MBS setups used as the gold standard, and all show good-to-excellent agreement with the gold standard in gait assessment [55][56][57] and motor function [58,59].
In this paper, a multi-Kinect system is proposed for providing the joint trajectories of both upper limb body segments and exoskeleton lower limbs. The devices' position and the orientation have been optimized to maximize the coverage of the training path, which minimizes the occlusions of body/exoskeleton segments. Suitable multi-view calibration, previously developed by some of the authors, has been used to perform the extrinsic calibration of the Kinects, i.e., to estimate the roto-translation matrices that map the skeletons produced by every single Kinect in a common reference system [60]. To comply with the fact that the Kinect-embedded skeletonization is designed to optimally track the front side of the body while non-frontal views show larger errors [61], a suitably developed Kalman-based algorithm is proposed to fuse every single skeleton in a single one, which represents the joint trajectories in the global reference system. The system is benchmarked against the BTS system, used as the gold standard. To this aim, the experimental work presented in this scenario has been carried out using the same setup as the one in Reference [3] where the subject walks in the gait lab, and is simultaneously tracked by the multi-Kinect marker-less (ML) system and by the BTS marker-based (MB) system.
While assessing the impact of external forces on joints is important to prevent injuries, a direct measurement is not feasible. Hence, a field of biomechanical models are being developed to address different issues as well as software solutions to handle this issue. The implementations of these models feed on motion capture and on external forces measurements to assess the moments and forces acting internally. Complex models are not always the best solution since their validation could be difficult [62], and the trade-off between computation-time and results could be complex [63]. In our case, the acquired data feed a simplified mechanical model developed in OpenSim [64] that, using the information from both the instrumented crutches and the force platform, estimates the internal forces exerted at the shoulders. Since this model was used in a previous assessment of the same activity [3], it was used as a reference process for the acquired data.

ML Multi-Kinect Motion Tracking System
The ML multi-Kinect system is composed of four Kinects V2, which provide both an RGB sensor (1920 × 1080 pixels) and a Time of Flight (ToF) depth sensor (512 × 424 pixels) spanning a range of 0.5 m-5 m and 70 • × 60 • field of view at a maximum frame rate of 30 fps and maximum standard uncertainty of 18 mm at 5 m [35]. The Kinects are organized in the measurement space as schematically shown in Figure 1. Their mutual distance along the direction of walking is 5 m, and the perpendicular distance is 2 m. Due to the high amount of data sent by the Kinect sensor, it requires a dedicated USB 3.0 bus and external 12 V power. The Microsoft APIs developed for the Kinect V2 offer the skeletonization functionalities, which are strategic in our application for assessing the human kinematics. However, they are not compliant with the use of multiple Kinects on a single PC. For this reason, in our system, each sensor is connected to a dedicated client unit, an Intel NUC PCs, which provides RGB images, depth images, and the skeleton of the subject. The clients are organized in a local LAN network mastered by a server unit. The clients communicate with the server using an MQTT protocol. The server acquires the skeletons from every single client.
Sensors 2020, 20, x FOR PEER REVIEW 4 of 31 each sensor is connected to a dedicated client unit, an Intel NUC PCs, which provides RGB images, depth images, and the skeleton of the subject. The clients are organized in a local LAN network mastered by a server unit. The clients communicate with the server using an MQTT protocol. The server acquires the skeletons from every single client.   Figure 2 shows the skeletal data provided by each Kinect. In our implementation, nodes 19 and 15 collapse with nodes 20 and 16, respectively, and nodes 11 and 7 collapse with nodes 12 and 8, which resulted in a total of 16 nodes. This choice was motivated by the need to account for the most stable nodes, i.e., those measured with acceptable precision (5 cm, as stated by Reference [59]), and by the fact that the hand is, in our test case, rigidly linked with the lower arm, since they are both strapped to the crutch, which constitutes a single rigid body (lower arm, hand, and crutch). shows the skeletal data provided by each Kinect. In our implementation, nodes 19 and 15 collapse with nodes 20 and 16, respectively, and nodes 11 and 7 collapse with nodes 12 and 8, which resulted in a total of 16 nodes. This choice was motivated by the need to account for the most stable nodes, i.e., those measured with acceptable precision (5 cm, as stated by Reference [59]), and by the fact that the hand is, in our test case, rigidly linked with the lower arm, since they are both strapped to the crutch, which constitutes a single rigid body (lower arm, hand, and crutch).  The server unit implements the three following tasks: (i) skeleton synchronization, (ii) skeleton registration, and (iii) skeleton fusion. Skeleton synchronization and skeleton registration have been previously developed by some of the authors and are exhaustively described in Reference [60], while skeleton fusion has been developed specifically for the application of interest and is detailed in Appendix A.
Skeleton synchronization: when dealing with multi-Kinect systems, skeleton synchronization is mandatory to avoid possible misalignments among the skeletons captured by each device (client). Even if the acquisition frequency is very stable, macro relative time delays greater than the sampling period (∆t = 33 ms) are possible and result in noticeable misalignments among the captured skeletons, which, in turn, hinder their accurate registration in a common reference. To prevent this situation, in Reference [60], a dedicated synchronization software has been implemented on the multi-Kinect network, based on the Network Protocol Time (NTP) by Meinberg. From experimental tests, this manages to provide a temporal synchronization among the clients with an expanded uncertainty of ±5 ms (P = 95%).
Skeleton registration: as described in Reference [60], this is accomplished by a dedicated calibration framework, aimed at estimating the extrinsic parameters. These are the rotation matrix and the translation vector mapping the skeleton node positions from the coordinate system local to each Kinect to the arbitrarily chosen global measurement system. In Reference [60], the authors focused on the development of a simple, fast, and easily reproducible calibration procedure, based on the use of a colored ball as the calibration tool, which can be moved by hand in the Field of View (FoV). Unlike the checkerboards commonly used to calibrate multiple Kinects [42], the calibration tool is lightweight, easy to handle, and suitable for calibration in non-technical environments. The method is based on the acquisition of the RGB and depth images of the target by each local client and on the time synchronization and spatial matching of the 3D trajectories of the calibration tool. In their experiments, a standard uncertainty of the extrinsic parameters was assessed to be lower than 2 mm and 10 −2 radians, which is better than the uncertainty values attainable using state-of-the-art, bulky calibration checkerboards [60].
Skeleton fusion: this procedure was specifically designed for this work and is described in detail in Appendix A. It is carried out using a probabilistic filtering approach. This choice was motivated by the fact that the estimated node position varies discontinuously and is noisy due to occlusions, to the influence of non-frontal views, and to the uncertainty inherently associated to both the measured nodes and the calibration parameters. Among probabilistic filtering methods, the Kalman filter has been chosen [65][66][67]. An example of the fusion process is shown in Figure 3. In this case, green dots represent measurements y m n for m = [1, 2] and the nodes for the considered skeleton model at time k. Orange diamonds correspond to X n p at the time instant 0 and k, respectively. Orange lines are the node trajectories from time 0 to time k.

Hardware
The measurement set-up used to validate the ML multi-Kinect system is shown in the photograph of the gait lab taken before starting the trials (Figure 4). The four Kinects (red round frames) have been positioned following the geometry presented in Figure 1. The gait lab was equipped with the Smart-DX system for motion capture and with four P-6000 load platforms. The Smart-DX system is based on eight wall-mounted infrared video cameras (four cameras are visible in the photograph, framed in the blue rectangles) with a sampling frequency of 2 kHz, a resolution of 4 Mpx, and a standard accuracy of 0.1 mm for the static position for the whole walking range of 6 m. The P-6000 load platforms (framed in the purple rectangle) provide a 6 N overall expanded uncertainty (P = 99%) on each axis. They were split in two separate rows, hidden in the floor, to measure the left and right foot separately. Ground reaction forces at the feet and gait events were captured synchronously with the data set provided by the Smart-DX system. The crutches, framed by the green rectangles in the figure, were instrumented using a set of four 350 Ω strain gauges applied to the external surface of each crutch, in a full bridge configuration, to detect axial force. One Arduino Nano V3.0 board acquires force data as well as accelerations of each crutch thanks to an IMU (LSM9DS1 by STMicroelectronics, 1204 Genève, Switzerland, ±20 m/s 2 ± 4 rad/s ). In this application, we chose to acquire the voltage signal from the bridge with an ADC channel. Therefore, the resolution is 10 bit giving a voltage of about 4.8 mV. These values are applied frequently for similar applications. For example, in Reference [68], an Arduino board is used. The crutches used in the experimentation were improved with respect to those described in References [1,3]. They have a simplified electronic circuit permitting an easier and prolonged functioning and improved accuracy thanks to the compensation of bridge hysteresis and non-linearity.

Hardware
The measurement set-up used to validate the ML multi-Kinect system is shown in the photograph of the gait lab taken before starting the trials ( Figure 4). The four Kinects (red round frames) have been positioned following the geometry presented in Figure 1. The gait lab was equipped with the Smart-DX system for motion capture and with four P-6000 load platforms. The Smart-DX system is based on eight wall-mounted infrared video cameras (four cameras are visible in the photograph, framed in the blue rectangles) with a sampling frequency of 2 kHz, a resolution of 4 Mpx, and a standard accuracy of 0.1 mm for the static position for the whole walking range of 6 m. The P-6000 load platforms (framed in the purple rectangle) provide a 6 N overall expanded uncertainty (P = 99%) on each axis. They were split in two separate rows, hidden in the floor, to measure the left and right foot separately. Ground reaction forces at the feet and gait events were captured synchronously with the data set provided by the Smart-DX system. The crutches, framed by the green rectangles in the figure, were instrumented using a set of four 350 Ω strain gauges applied to the external surface of each crutch, in a full bridge configuration, to detect axial force. One Arduino Nano V3.0 board acquires force data as well as accelerations of each crutch thanks to an IMU (LSM9DS1 by STMicroelectronics, 1204 Genève, Switzerland, 20 / 4 / ). In this application, we chose to acquire the voltage signal from the bridge with an ADC channel. Therefore, the resolution is 10 bit giving a voltage of about 4.8 mV. These values are applied frequently for similar applications. For example, in Reference [68], an Arduino board is used. The crutches used in the experimentation were improved with respect to those described in References [1,3]. They have a simplified electronic circuit permitting an easier and prolonged functioning and improved accuracy thanks to the compensation of bridge hysteresis and non-linearity.  All data were transmitted in real-time using a Bluetooth module (ESD200) with a sampling frequency of 40 Hz and recorded using a custom-made software developed in LabVIEW and running on a Bluetooth-enabled personal computer. Each crutch was calibrated to its full range of 600 N, and evaluated by comparison with industrial precision load cells, which show an expanded uncertainty (1-h drift, hysteresis, non-linearity P = 99%) of 6 N, and a resolution of 1 N. The crutches and their validation are deeply described in Reference [11].
Before the measurement campaign, the calibration of both crutches was verified using an industrial load cell (Gefran TH) and applying a periodic loading. The same test was performed at the end of the test campaign to ensure that usage and transportation did not alter the static sensitivity of the system. The crutches' length is adjustable and was extended to adapt the crutches to the user's preference. Before each test, the crutches were lifted from the ground and any residual offset in the force reading was compensated.

Software
A multibody numerical simulator is used to assess the internal reaction from kinematics and dynamics using a biomechanical model. The biomechanical model has been implemented using the open source OpenSim software developed at the Stanford University [7,64].
The patient is modeled as a passive multibody system reproducing a simplified version of the skeletal portion of the human body. Following the guidelines provided in Reference [62], both muscular components and the soft tissues are neglected to minimize the number of the model parameters. The model chosen was detailed in Reference [5]. As shown in Figure 5, 12 joints (four cylindrical and eight spherical) and 13 rigid bodies are considered (the crutches are rigidly connected to the forearms), which results in 31 rotational degrees of freedom. Inertial properties, mass, and dimension of each segment are based on the anthropometric data taken from Reference [69]. In addition, the length of each body segment is corrected using the kinematic data provided by the vision system (either marker-based or marker-less) [70].
All data were transmitted in real-time using a Bluetooth module (ESD200) with a sampling frequency of 40 Hz and recorded using a custom-made software developed in LabVIEW and running on a Bluetooth-enabled personal computer. Each crutch was calibrated to its full range of 600 N, and evaluated by comparison with industrial precision load cells, which show an expanded uncertainty (1-h drift, hysteresis, non-linearity P = 99%) of 6 N, and a resolution of 1 N. The crutches and their validation are deeply described in Reference [11].
Before the measurement campaign, the calibration of both crutches was verified using an industrial load cell (Gefran TH) and applying a periodic loading. The same test was performed at the end of the test campaign to ensure that usage and transportation did not alter the static sensitivity of the system. The crutches' length is adjustable and was extended to adapt the crutches to the user's preference. Before each test, the crutches were lifted from the ground and any residual offset in the force reading was compensated.

Software
A multibody numerical simulator is used to assess the internal reaction from kinematics and dynamics using a biomechanical model. The biomechanical model has been implemented using the open source OpenSim software developed at the Stanford University [7,64].
The patient is modeled as a passive multibody system reproducing a simplified version of the skeletal portion of the human body. Following the guidelines provided in Reference [62], both muscular components and the soft tissues are neglected to minimize the number of the model parameters. The model chosen was detailed in Reference [5]. As shown in Figure 5, 12 joints (four cylindrical and eight spherical) and 13 rigid bodies are considered (the crutches are rigidly connected to the forearms), which results in 31 rotational degrees of freedom. Inertial properties, mass, and dimension of each segment are based on the anthropometric data taken from Reference [69]. In addition, the length of each body segment is corrected using the kinematic data provided by the vision system (either marker-based or marker-less) [70].  The influence of the exoskeleton segments and components on the corresponding body segment has been accounted for by adding the following masses: 5 kg to each lower leg and 5 kg to each upper leg as well as 3 kg to the torso section (motor and batteries). The influence of the crutches has been modeled by adding 1.5 kg to each forearm. In addition, the center of mass and the inertial properties of the forearms have been computed by simulating numerically the system formed by the forearm, the hand, and the crutch using suitable computer assisted design software (Solidworks, by Dassault Systèmes, Waltham, MA, USA). Upper and lower leg centers of mass have not been corrected. The model formulation clearly hinders from evaluating the interactions between the patient and the exoskeleton such as those induced by muscle contractions, muscle stiffness, and spasticity as well as the influence of the relative movements between the body and the exoskeleton segments. However, this formulation is very simple, and represents a reasonable compromise between computing efficiency and data accuracy.

Protocol
The experimentation has been performed on a single user of a P5 Rewalk exoskeleton, who was considered an expert user, with more than 10 h per week for at least nine months of domestic usage. In Figure 4, the patient is visible at the start position of the test path. The subject was a 58-year-old male (68 kg, 1.78 m), right-hand dominant, with a complete spinal cord injury at level T12. The preparation of the subject was necessary in view of the acquisition of the kinematic data from the marker-based motion capture.
Twenty-five retro-reflective spherical markers (20 mm in diameter) were applied on both the patient and the exoskeleton by following a Davis protocol [71], which have been suitably modified to account for the exoskeleton [72,73]. Hence, this obtains a custom protocol purposely designed for this case. These are the markers represented by red dots in Figure 6. With respect to the standard Davis protocol, markers RK, RP, LK, and LK were positioned about 10 cm laterally with respect to their original position. In addition, the sacral marker was replaced by three markers in triangular configuration placed on the exoskeleton's battery pack. Moreover, to get a complete assessment of each degree of freedom of the model, five markers (yellow dots in the figure) were added in correspondence to the subject forehead, 2 cm higher than the nose (FH) on each elbow (RE, LE) and on the upper arm at the midpoint between the elbow and shoulder (RA, LA). To account for the crutches, six further markers (blue dots) were positioned on the crutches tip (RTIP, LTIP), on the handle end (RHE, LHE), and at the intersection between the handle and the crutch (RHI, LHI).
The influence of the exoskeleton segments and components on the corresponding body segment has been accounted for by adding the following masses: 5 kg to each lower leg and 5 kg to each upper leg as well as 3 kg to the torso section (motor and batteries). The influence of the crutches has been modeled by adding 1.5 kg to each forearm. In addition, the center of mass and the inertial properties of the forearms have been computed by simulating numerically the system formed by the forearm, the hand, and the crutch using suitable computer assisted design software (Solidworks, by Dassault Systèmes, Waltham, MA, USA). Upper and lower leg centers of mass have not been corrected. The model formulation clearly hinders from evaluating the interactions between the patient and the exoskeleton such as those induced by muscle contractions, muscle stiffness, and spasticity as well as the influence of the relative movements between the body and the exoskeleton segments. However, this formulation is very simple, and represents a reasonable compromise between computing efficiency and data accuracy.

Protocol
The experimentation has been performed on a single user of a P5 Rewalk exoskeleton, who was considered an expert user, with more than 10 h per week for at least nine months of domestic usage. In Figure 4, the patient is visible at the start position of the test path. The subject was a 58-year-old male (68 kg, 1.78 m), right-hand dominant, with a complete spinal cord injury at level T12. The preparation of the subject was necessary in view of the acquisition of the kinematic data from the marker-based motion capture.
Twenty-five retro-reflective spherical markers (20 mm in diameter) were applied on both the patient and the exoskeleton by following a Davis protocol [71], which have been suitably modified to account for the exoskeleton [72,73]. Hence, this obtains a custom protocol purposely designed for this case. These are the markers represented by red dots in Figure 6. With respect to the standard Davis protocol, markers RK, RP, LK, and LK were positioned about 10 cm laterally with respect to their original position. In addition, the sacral marker was replaced by three markers in triangular configuration placed on the exoskeleton's battery pack. Moreover, to get a complete assessment of each degree of freedom of the model, five markers (yellow dots in the figure) were added in correspondence to the subject forehead, 2 cm higher than the nose (FH) on each elbow (RE, LE) and on the upper arm at the midpoint between the elbow and shoulder (RA, LA). To account for the crutches, six further markers (blue dots) were positioned on the crutches tip (RTIP, LTIP), on the handle end (RHE, LHE), and at the intersection between the handle and the crutch (RHI, LHI). The subject received no instruction on his movement and was free of walking along the path on his own accord. He reported no difficulty using the device and showed no sign of fatigue during or after the tests. A total of 20 walking tests were performed in both directions. During the tests, the ML multi-Kinect system, the Smart-DX MB system, the load platforms, and the crutches simultaneously acquired the respective data. The skeleton fusion was carried out by following the procedures in Sensors 2020, 20, 3899 9 of 25 Appendix A and the tracking of the physical markers was carried out by following the procedures embedded in the BTS device. The following four data sets were acquired in parallel.

1.
The vector x n P of the trajectories of the N nodes representing the fused skeleton of the subject, provided by our ML multi-Kinect system, 2.
the vector x P−Markers of the trajectories calculated by the BTS Smart-DX vision system of the Physical Markers shown in Figure 6; 3.
the GF RF data set of the measurements provided by the force platforms of both the forces exchanged between the subject feet, the ground, and the gait phases, and 4.
the GC RF data set of the measurements from the crutches of the forces exchanged between the upper limbs of the subject and the ground. In addition to the forces, the impact times of the crutches with the ground are acquired.

Validation Methodology
The methodology used to assess the ability of the ML multi-Kinect motion capture system to estimate the internal forces acting at the patient upper limbs is based on the following steps.

•
Kinematic data set alignment: since the ML multi-Kinect system and the MB BTS System are not synchronized to each other, the skeleton trajectories of vector x n P must be aligned in time with the trajectories of vector x P−Markers . This task is carried out by the procedure presented in Section 4.1. • Mapping skeleton nodes to virtual markers: the skeletal data x n P provide the trajectories of the joints of the body-exoskeleton combination while the BTS data x P−Markers refer to the trajectories of the physical markers placed on the segments of the body-exoskeleton combination. This last set is the one expressed in the correct form and then elaborated on by the biomechanical model. Starting from the joints trajectories, it is mandatory to calculate the trajectories of new points (thereafter called virtual markers), placed where physical markers would be. A suitable procedure presented in Section 4.2. has been designed to perform this task. The output is represented by the data set denoted by x V−Markers .
• Inverse Dynamic Analysis: the core of the process is to solve the set of dynamic equilibrium equations that represent the biomechanical model of the patient body. This model performs the inverse kinematics and the inverse dynamic analysis of the gait, which provides the indirect estimate of the subject internal forces at the upper limbs. The analysis based on the biomechanical model is run twice. The first run is carried out using as input data sets x P−Markers , GF RF , and GG RF . The output is used as the reference measurement of the internal forces at the upper limbs. The second run uses as inputs data sets x V−Markers , GF RF , and GC RF . The output of this run represents the estimate of the internal reaction forces at the upper limbs measured by the ML system.

•
Validation of the M-K marker-less vision system: it is a comparison of those estimated with the reference upper limbs' internal reaction forces. The quantities evaluated for this comparison were the longitudinal, lateral, and vertical components of the internal joint reactions in the right and left shoulders and elbows for a total of 12 forces. To provide an overview of the validation, the root mean square of the difference between reference and estimated values of these 12 forces was computed.

Kinematic Dataset Alignment
The alignment of data set x n P to data set x P−Markers has been achieved by measuring the following.

•
The distance between the right and the left knee: D1 Kinect and D1 BTS for the M-K marker-less system and the BTS system respectively, • the distance between the right knee and its contralateral elbow: D2 Kinect and D2 BTS , • the distance D3 between the left knee and its contralateral elbow: D3 Kinect and D3 BTS .
These distances have been derived using the position of the corresponding nodes of the fused skeleton and the position of the corresponding physical markers of the BTS model (Table 1). The idea is that the patient motion represents an invariant during the walk. Hence, the evolution of the corresponding distances measured by the two systems shows the same periodicity in time. This was used to align the M-K marker-less system to the reference BTS. The correlation function between values was computed, and the delay corresponding to its maximum value was used as a time offset between the systems.
A decimation was then used to sub-sample all collected data to 30 Hz. As an example, the behavior of D1, D2, and D3 measured using the x nodes and the x P−Markers data sets, respectively, displayed as a function of time is shown in Figure 7. The periodicity is evident especially for the plots of values D1.  , LK, RE The idea is that the patient motion represents an invariant during the walk. Hence, the evolution of the corresponding distances measured by the two systems shows the same periodicity in time. This was used to align the M-K marker-less system to the reference BTS. The correlation function between values was computed, and the delay corresponding to its maximum value was used as a time offset between the systems.
A decimation was then used to sub-sample all collected data to 30 Hz. As an example, the behavior of 1, 2, and 3 measured using the and the data sets, respectively, displayed as a function of time is shown in Figure 7. The periodicity is evident especially for the plots of values D1.  Table 1 as a function of time, after alignment using correlation. Dots indicate the reference (BTS) system, which lines the M-Kinect system.  Table 1 as a function of time, after alignment using correlation. Dots indicate the reference (BTS) system, which lines the M-Kinect system.

Mapping Nodes to Virtual Markers
This procedure is based on two steps: the first is to obtain body segments (head, limbs, torso, pelvis, and feet) from skeleton nodes. The second aims at estimating the position of virtual markers placed on each segment.

Kinematic Model Conversion
The conversion of the skeleton nodes in body segments of the reference model is performed by defining a local reference system for each segment, as follows.
• Pelvis: the pelvis structure is built starting from the hip nodes x 18 p , x 13 p . The vector V hips that links nodes x 18 p , x 13 p is set as theŶ principal direction for the local reference system. Its midpoint MP is used as a base to compute the spinal vector V spine by using x 2 p as a reference endpoint node located in the middle of the torso. V spine is used as the temporaryẐ * principal direction, (temporary because V hips and V spine are usually not orthogonal one to the other). Vectors V hips and V spine are normalized (V hips andV spine ) and used to compute theX vector, which points to the frontal direction. Lastly, the cross product ofX andŶ provides the orthogonal vectorẐ, which completes the base for the pelvis reference system H pelvis . The origin of the local reference system is defined by moving the hips midpoint MP along the vectorV spine of a fixed quantity δ spine . Equation (1) reports the full notation, and Figure 8 shows a schematic representation of the geometry.
• Torso: for the torso, similarly to the pelvis, the initial reference vector is computed from the nodes of the shoulders. The spinal vector computed for the pelvis is used in a vertical direction. The same normalization procedure is applied. • Limbs: the reference systems are settled in correspondence with the nodes, from the inner to the outer part of the body. For the upper arm is the shoulder P shoulder , while, for the lower arm, the elbow P elbow is used. The wrist P wrist and the heel represent the endpoints of the respective limbs.X is aligned with the joint whileŶ is computed as the normal vector of the plane defined by the motion of the considered limbs frame-by-frame.Ẑ is derived from the cross product of the previous two. Equation (2) reports the calculation for one of the upper limbs.
• Foot: the foot is defined starting from the definition of the ankle H ankle . The same orientation is applied to two virtual points.
• Crutch: since the patient's forearms are strapped to the crutches, each forearm-crutch couple is considered a single rigid body, including the hand. Two virtual points are defined from the reference system of the wrist H wrist to model the crutch. Given its geometry, a point is defined for the handle and one is defined for the endpoint.
All the numeric values δ spine , α f oot , β f oot , and γ f oot were defined by considering the anthropometric tables [69] scaled to the subject [70].

Virtual Markers Definition
The reference system (by BTS) records the trajectories of the markers. To make sure that both systems were handled by following the same numerical process while simulating the joint reaction, virtual markers were created for the kinetic data measured using the Kinect system. Each segment in the model is considered rigid and each marker trajectory x v−marker could be obtained by multiplying the relative position of the marker in the body segment P v−marker , which is constant in time with the reference system of the body segment H segment , which is time-dependent, as shown in Equation (5).
The relative positions P v−marker of the markers with respect to their parent bodies are input parameters for the OpenSim model. An incorrect definition of these parameters would lead to an inaccurate kinematic analysis [61], which, in turn, would lead to an incorrect comparison between the two systems. To avoid this, the relative position P v−marker of the marker with respect to the reference system was assessed for both systems, starting from the marker trajectories recorded by the reference marker-based system, x p−marker , and the bodies' reference systems measured by the marker-less motion capture system, H segment , as shown in Equation (6).
For each test, an optimization system (genetic algorithm 'ga' in Matlab) solved Equation (6), which minimized the distance e marker = x p−marker − x v−marker between the trajectories of the Kinect virtual marker and its associated BTS marker. To consider the relative rotation between the two systems references, a global roto-translation matrix was included in the guesses on the numerical optimization. Using the transformation matrices, the delay and the virtual markers' trajectories, it was possible to represent the skeleton from the marker-less system in the same reference system (BTS-based), which mimicks marker-based measurement results. Figure 9 shows the graphical definition of the virtual markers' positions.

Inverse Dynamic Analysis
After having scaled the mechanical model to the patient characteristics and to the exoskeleton mass, as reported in Section 3.2, the OpenSim numerical simulator was used to perform two sequential analysis.

1.
the inverse kinematic analysis, which provided the degrees of freedom (DoF) value of the model in time, starting from the markers (either virtual or physical) positions, 2.
the inverse dynamic analysis, to compute the motor torques applied to joints, and the internal joint reaction forces, starting from the kinematic results and the external forces acting on the body (GC RF data and GF RF data from the instrumented crutches and the force platforms, respectively).
based), which mimicks marker-based measurement results. Figure 9 shows the graphical definition of the virtual markers' positions.

Inverse Dynamic Analysis
After having scaled the mechanical model to the patient characteristics and to the exoskeleton mass, as reported in Section 3.2, the OpenSim numerical simulator was used to perform two sequential analysis. 1. the inverse kinematic analysis, which provided the degrees of freedom (DoF) value of the model in time, starting from the markers (either virtual or physical) positions, 2. the inverse dynamic analysis, to compute the motor torques applied to joints, and the internal joint reaction forces, starting from the kinematic results and the external forces acting on the body (GCRF data and GFRF data from the instrumented crutches and the force platforms, respectively). The result of those analyses, which were used for validating the system due to their clinical significance, are the joints' internal reaction forces related to the upper limbs, listed in Table 2. While for kinematic analysis, the whole 6-m walk was used. Only the two central steps of each walk were used for the dynamic analysis since the force platforms could accommodate only two steps (see Figure 1 and Figure 4).
To provide data easily comparable with medical literature, two different normalizations were performed, following standard clinical practice [74].
• Gait phase normalization: ground reaction forces were used to detect the gait events (heel contact, toe off), and time was scaled to be 0% at the first right heel contact and 100% at the second right heel contact on the force platform. • Body weight normalization: results in term of force were divided by the weight of the system composed by the subject, the exoskeleton, and the crutches.
All forces were then decomposed along the three directions of the global reference system: Figure 9. Definition of the virtual markers' position based on the real marker relative position and the local reference system of the body measured using the multi-Kinect system.
The result of those analyses, which were used for validating the system due to their clinical significance, are the joints' internal reaction forces related to the upper limbs, listed in Table 2. While for kinematic analysis, the whole 6-m walk was used. Only the two central steps of each walk were used for the dynamic analysis since the force platforms could accommodate only two steps (see Figures 1 and 4).
To provide data easily comparable with medical literature, two different normalizations were performed, following standard clinical practice [74].
• Gait phase normalization: ground reaction forces were used to detect the gait events (heel contact, toe off), and time was scaled to be 0% at the first right heel contact and 100% at the second right heel contact on the force platform. • Body weight normalization: results in term of force were divided by the weight of the system composed by the subject, the exoskeleton, and the crutches.
All forces were then decomposed along the three directions of the global reference system: • Vertical: axis normal to the gait lab floor, directed upwards, • Longitudinal: axis normal to the vertical one and along the walking direction of the corridor, • Lateral: axis normal to the vertical and longitudinal axis, directed towards the right side. Right Elbow Vertical Force acting on the right upper arm given by the right lower arm, along the vertical axis of the ground reference.

JR 8 Right Elbow Longitudinal
Force acting on the right upper arm given by the right lower arm, along the longitudinal axis of the ground reference.

JR 9
Right Elbow Lateral Force acting on the right upper arm given by the right lower arm, along the lateral axis of the ground reference.

Results
The metric used to validate the system is the root mean squared value of the error (RMSE j ) for each Joint Reaction JR j . In this case, the error is the difference between the Joint Reactions computed starting from the M-Kinect kinematics, JR M−kinect j , and those assessed starting from the BTS kinematics, JR re f erence j . The RMSE j was computed for each joint reaction j, adding up all T n samples of all tests N, following Equation (7). Table 3 summarizes the results of all 20 walking tests performed. As can be noticed, the difference between the joint reactions computed using the M-Kinect system and those computed using the BTS reference system is about 1% and the maximum value is 1.5% in case of the vertical right elbow. Table 3. Comparison between inverse dynamic results. All tests were used. Results are expressed as a percentage of the total weight of subjects and exoskeletons. To ease the comparison, the average value of each joint reaction among all tests at the same gait phase was computed for both the reference BTS system and the M-Kinect system. Results are shown in Figures 10-15. To provide a reference of the variability of the forces between different tests, the standard deviation of the assessments among different tests at the same gait phase is shown as a shaded area.

Elbow
Longitudinal 0.8% 0.5% Vertical 1.5% 1.0% Lateral 0.4% 0.4% To ease the comparison, the average value of each joint reaction among all tests at the same gait phase was computed for both the reference BTS system and the M-Kinect system. Results are shown in Figure 10-15. To provide a reference of the variability of the forces between different tests, the standard deviation of the assessments among different tests at the same gait phase is shown as a shaded area.

Discussion
The internal joint reactions, of paramount interest to the therapist providing training, were computed using the M-Kinect data and showed very good correspondence with the ones computed using the reference BTS system data, as shown in Table 3, with a root mean square error closer to the variability of the movement. The proposed data fusion structure, used to create the kinematic data from multiple Kinect units, achieved the desired performance.
However, a minor issue was highlighted from the field test, which was a non-negligible initialization of the fusion process. As can be seen by looking at Figure (right), the estimate from the M-Kinect system is clearly different from the estimate from the reference system in the early phases of gait (0-20%). At the same time, in those phases, the M-Kinect estimate shows high variability. Since this is also the initial part of the recorded kinematic, the issue could be linked with the Kalman filter initialization. The filter initialization takes around 1 s. This time is required to enable the damping of the initial oscillations resulting from the expansion of the kinematic model from a collapsed set of nodes to the skeleton structure. This time can be reduced by setting more constrained values in the

Discussion
The internal joint reactions, of paramount interest to the therapist providing training, were computed using the M-Kinect data and showed very good correspondence with the ones computed using the reference BTS system data, as shown in Table 3, with a root mean square error closer to the variability of the movement. The proposed data fusion structure, used to create the kinematic data from multiple Kinect units, achieved the desired performance.
However, a minor issue was highlighted from the field test, which was a non-negligible initialization of the fusion process. As can be seen by looking at Figure 13 (right), the estimate from the M-Kinect system is clearly different from the estimate from the reference system in the early phases of gait (0-20%). At the same time, in those phases, the M-Kinect estimate shows high variability.
Since this is also the initial part of the recorded kinematic, the issue could be linked with the Kalman filter initialization. The filter initialization takes around 1 s. This time is required to enable the damping of the initial oscillations resulting from the expansion of the kinematic model from a collapsed set of nodes to the skeleton structure. This time can be reduced by setting more constrained values in the initialization of the filter, at the cost of a more unstable response and possibly no convergence of the initialization. Alternatively, the inclusion of more Kinects would provide a wider measurement area, and, thus, the earlier initialization of the model (before the execution of the target gait). This second option implies higher costs, while the hardware and software complexity remain the same given the modular design of the proposed system. A more practical workaround could be a different positioning of the Kinects, which is off-centered with respect to the force platforms, to allow for the filter to settle before the first contact with the force platform, and having the user walk in one direction only.
Another issue could be related to the mechanical model chosen. Different biomechanical models, especially those based on muscular forces, could lead to different solutions. Due to this fact, these results should not be taken as a global reference for joint reaction assessment with other models, but as a tool for relative assessments, e.g., between a pre-training and a post-training behavior. In this case, the simplified rigid-links model was chosen explicitly to allow a robust comparison, which replaced muscular forces with simplified torques, between the kinematic systems.
A limitation of the current study is the involvement of a single expert user as the tester. The system has not been tested with children or non-collaborative users of exoskeletons. The system will be further tested with the same approach involving more users. Many variables could affect walking patterns such as age, gender, size, lesion level, exoskeleton model, and configuration. To provide a comparison focused mainly on the system, this work was limited to an expert user.

Conclusions
Learning how to use an exoskeleton is not an easy task and teaching a patient how to use it is no less difficult, especially since no quantitative feedback is normally provided. The marker-less motion capture system proposed, in conjunction with instrumented crutches and a simplified mechanical model, provides a low-cost and easy to set-up measuring system, which requires only force platforms on the ground and no instrumentation on the patient himself. This is critical for the application since marker placement on a person not able to stand without robotic support is difficult and time-consuming. The system requires no devices on the exoskeleton or the patient, which simplifies the set-up phase and makes its usage in training, even by non-experts, feasible. The dynamic results show a good reliability of the joints' internal reaction forces with RMS error values compatible with clinical practice in gait lab assessments.
The total cost of the hardware components of the marker-less motion capture is about 3000 €, which makes the solution affordable in most clinical settings. The instrumented crutches are not commercially available yet even though the hardware is based on off-the-shelf and low-cost components. It is foreseeable that the cost of a pair will be limited to less than 3000 € as well.
Furthermore, future improvements will foresee the reorganization of the system. As for the device considered, Microsoft recently withdrew Kinect from the market. However, the motion capture system in this case presented can be based on almost all the RGB-D sensors such as the Intel RealSense D435. RGB frames can be processed by different skeletonization algorithms. One of the most popular algorithms is Open Pose and can be considered as the state-of-the-art approach for estimating a real-time human pose. Furthermore, a future alternative 3D camera can be represented by the Kinect Azure. Kinect Azure is an innovative and promising development kit equipped with the most advanced sensors of artificial intelligence, which is going to be soon released by Microsoft. A further element that such a change would provide is a sharp separation between the device and the skeletonization processing now performed in the same hardware (closed APIs). No relevant changes are expected in the organization and processing flow. It is worth noting that the minimum number of cameras is four because, to properly obtain the subject skeleton during its walk, it is necessary to register it both from behind and from the front (see Figure 1).
A limitation of the proposed system is the dependence on a set of force platforms, which limits the portability of the solution and increases its cost. A solution, to be investigated in future improvements of the system, may include a mechanical model able to assess the ground reaction forces using the kinematic data integrated with instrumented insoles or with the patient weight data. The accuracy of such a system should be investigated, especially concerning its ability to detect the gait phases [53].  x k = Ax k−1 + w k (A1) In Equation (A1), the state transition matrix is A = Diag{A 1 , · · · , A 1 } ∈ R 9N×9N , where: Vector x k = X 1 , · · · , X N T ∈ R 9N is defined as the 3D joint position, velocity, and acceleration at discrete time k, and ∆t is the time between two consecutive sets of data collected from the Kinects. Parameter w k is a random variable representing the process noise. The measurement model for the multi-node tracking is defined below.
In Equation (A3), vector y k = Y 1 , · · · , Y M T ∈ R 3NM represents the measurements from the M Kinects at time k. Hence, Y m = y m 1 , · · · , y m n , · · · y m N T is the vector of the N nodes collected from the mth Kinect. Each element y m n ∈ R 3 is a vector representing the coordinate along x, y, and z of the nth node measured by the mth Kinect at time k. Matrix C ∈ R 3NM×9N is the measurement matrix, Sensors 2020, 20, 3899 20 of 25 dynamically built at each kth iteration filling a null matrix with a I 3×3 matrix in correspondence of the considered node along columns, and the considered Kinect along the rows.
where C m ∈ R 3N×9N is as follows.
In addition, C m n ∈ R 3×9N is a null matrix with the identity matrix I 3×3 positioned at column 9n + 1. For example, if we consider N = 2 nodes and M = 3 Kinects, the model parameters assume the following form.
Parameter v k in Equation (A3) is a random variable representing the measurement noise. The process noise and the measurement noise are assumed to be independent of each other with normal probability distributions characterized by the process noise covariance Sw ∈ R 9N×9N and measurement noise covariance Sv ∈ R 3NM×3NM , respectively. The Kalman filter estimates the process state at time k by means of time update equations and obtains feedback in the form of measurements by means of the measurement update equations. In our case, the time update equations assume the following form.
In Equation (A9), K k is the Kalman gain, a matrix R 9N×3NM , acting as the blending factor that minimizes the a posteriori error covariance P k . Matrix Sv = Diag Sv 1 , · · · , Sv m , · · · , Sv M with sub-matrix Sv m = Diag Sv m 1 , · · · , Sv m n , · · · , Sv m N , and Sv m n = I 3×3 ·σ 2 Y ∈ R 3×3 , expresses the standard uncertainty associated to each y m n measurement. The filter is initialized as follows:x 0 = C T ·y 0 where y 0 is the first set of acquired nodes. Matrix Sw is initialized using Sw i = 0.05 2 (m/s 2 ) 2 and the matrix P is initially set equal to Sw. The matrix Sv is initialized using σ 2 Y = 0.005 2 m 2 . At each kth step,x − k and P − k are estimated. The Kalman gain is calculated and used to estimate the current statex k and its associated error covariance P k .
The measurement noise covariance matrix Sv is updated at every time instant k using a dedicated procedure to minimize the influence of outlier nodes. Figure A1 explains the situation for n = n * and M = 4. In this case, y 4 n * is the outlier node, which decreases the influence of its measurement. We performed the following steps.

1.
We calculate the median value among the four measurements, y n * = median y 1 n * , y m n * , y M n * ; 2.
For each of the m nodes, the distance d m n * from y n * is computed, d m n * = y m n * − y n * ; 3.
The matrix Sv is updated by considering a modified value for σ 2 Y in each node by applying the notation reported in Equation (A10). In this case, Th represents a threshold value set to 1 cm. A d m n * value higher than Th causes a linear amplification of the measurement uncertainty for the mth node, and, thus, its lower influence in the Kalman filter update. The measurement noise covariance matrix is updated at every time instant k using a dedicated procedure to minimize the influence of outlier nodes. Figure A1 explains the situation for n = n * and M = 4. In this case, * is the outlier node, which decreases the influence of its measurement. We performed the following steps. 1. We calculate the median value among the four measurements, * = * , * , * ; 2. For each of the m nodes, the distance * from * is computed, * = * − * ; 3. The matrix is updated by considering a modified value for in each node by applying the notation reported in Equation (A10).
. * = * * * (A10) In this case, represents a threshold value set to 1 cm. A * value higher than causes a linear amplification of the measurement uncertainty for the m th node, and, thus, its lower influence in the Kalman filter update.

Measurements
undergo a suitably developed pre-processing called Orientation Testing before being used in Equation (A9). The Orientation Testing has been developed to consider the fact that some Kinects will see the subject from behind, and that no prior information is available to define which devices or subjects will be affected by this. Therefore, some measurements undergo a Figure A1. Example of the update procedure for Sv with n = n* and M = 4. Th indicates the threshold.