A Walking-in-Place Method for Virtual Reality Using Position and Orientation Tracking

People are interested in traveling in an infinite virtual environment, but no standard navigation method exists yet in Virtual Reality (VR). The Walking-In-Place (WIP) technique is a navigation method that simulates movement to enable immersive travel with less simulator sickness in VR. However, attaching sensors to the body is troublesome. A previously introduced method that performed WIP using an Inertial Measurement Unit (IMU) helped address this problem, because it does not require placing additional sensors on the body. That study demonstrated, through evaluation, acceptable WIP performance. However, the method has limitations, including a high step-recognition error rate when the user performs various body motions within the tracking area. Previous works also did not evaluate WIP step recognition accuracy. In this paper, we propose a novel WIP method using position and orientation tracking, which are provided in most PC-based VR HMDs. Our method also does not require additional sensors on the body and is more stable than the IMU-based method for non-WIP motions. We evaluated our method with nine subjects and found that the WIP step accuracy was 99.32% regardless of head tilt, and the error rate was 0% for squat motion, which is a motion prone to error. We distinguish jog-in-place as “intentional motion” and all other motions as “unintentional motion”; our method correctly recognizes only jog-in-place. We also apply a saw-tooth virtual velocity function to our method mathematically. Natural navigation is possible when the virtual velocity approach is applied to the WIP method. Our method is useful for various applications that require jogging.


Introduction
Virtual Reality (VR) has gained popularity, providing immersive experiences beyond those of three-dimensional (3D) desktop-based games. Additionally, smartphone-based head-mounted displays (HMDs), such as USC ICT/MRX laboratory's FOV2GO introduced in 2012 [1], Google Cardboard introduced in 2014 [2], and Samsung Gear VR introduced in 2015 [3], have been rapidly adopted. These HMDs allow the user to experience VR anytime and anywhere by attaching a smartphone to the adapter. The limited touch input to the smartphone-based HMD itself can be replaced by a highly usable motion controller that works with Bluetooth. Also, new systems have widened the field-of-view (FOV) and improved the resolution of the smartphone display, thus providing a good visual impression. This progress is useful when playing simple VR games or watching 360-degree videos. However, the performance limitations of smartphone-based VR HMDs are obvious. Improving rendering quality lowers the refresh rate, which does not create the best experience for the user. For this reason, PC-based system recognizes the foot motion and step pace and length by measuring the friction of the feet as the user moves [27][28][29][30]. Unfortunately, the hardware is so bulky and expensive that it is hard to commercialize.
The WIP technique is a method that imitates walking [11], providing a higher presence than existing controllers and less simulator sickness [9,13]. The WIP technique can be used in two ways: a march-in-place method in which the HMD is not shaken [31], and a jog-in-place method in which the HMD is shaken [32]. The march-in-place method stably recognizes steps with additional sensors on the heels [14], ankles [33], shins [34], or ground [35,36]. The jog-in-place method moves the HMD with large motions, instead of having an additional sensor [9,10,16,37]. Methods of attaching a magnetic or a beacon sensor with external trackers to the user's knees [14,34] or attaching smartphones to the ankles [33] have also been introduced, but these methods are too cumbersome to use practically. A method was developed that uses a neural net that takes the head-tracker height signal as its input [6]. The author pointed out latency as the disadvantage of this method, but the latency might be improved by deep learning [38]. Methods for recognizing WIP steps in VR include using a floor pad [36] or Wii board [35]. The walking pad and Wii board require additional equipment and have the disadvantage of restricting movement to the specific use of the hardware. Because the IMU can be attached to the body to track its position [39][40][41] or to recognize posture [42], a method of recognizing the WIP steps using a built-in IMU in the HMD was shown to be useful [9,10,15,16]. However, the accuracy was not proven, and unintended steps occur even when performing motions other than WIP. If navigation is started when WIP is not started, this will lead to nausea due to information mismatch between the vestibular and visual sensory organs [7,43] and the user may collide with virtual objects or walls.

Methods
The goal was to navigate in a virtual environment via WIP, while not actually moving forward. We used the position and orientation tracking output to achieve this goal. We obtained the HMD's x, y, and z axial positions and rotations from the external tracker. These variables were $X_{pos}$ (m), $Y_{pos}$ (m), $Z_{pos}$ (m), $X_{rot}$ (°), $Y_{rot}$ (°), and $Z_{rot}$ (°), respectively (Figure 1). We usually used $Y_{pos}$ to represent the position above the ground and $X_{rot}$ to represent the head pitch.
We demonstrate how to recognize WIP steps in two phases (Figure 2). In the calibration phase (Section 3.1), we estimate the central axis of the quasi-sinusoidal trace of the $Y_{pos}$ log, and set a range that covers the trace. The central axis, which depends on the user's eye-level height, allows our method to be used by people of various heights. The range is used to ignore input values from non-WIP motions. In the recognition phase (Section 3.2), we show how to recognize steps and determine the virtual initial velocity and virtual velocity based on the user-adjusted WIP recognition range.

Calibration
We find the user's eye-level height so that our algorithm accurately recognizes only WIP steps. The same posture is required each time the tracker reads the position and orientation of the HMD.

Central Axis of WIP
When a user wearing an HMD faces forward and performs WIP (for example, jog-in-place), a quasi-sinusoidal trace of the $Y_{pos}$ log can be obtained on the y-axis, perpendicular to the ground. Since the pattern of $Y_{pos}$ is similar to a sine wave, its central axis can be inferred (Figure 3). If we know this central axis, we can determine the range for recognizing WIP steps. We call the central axis of WIP H (m). H approximates the user's eye-level height if the user does not intentionally bow. When a person walks, they look not only ahead but also up and down. Therefore, H should not be fixed at one point, but should change according to the head pitch. This is because the sensor that detects $Y_{pos}$ is inside the HMD [44] rather than at the center of the user's head. H is calculated from $X_{rot}$ as follows:
$$H = \begin{cases} H_{initial} + C_{up} \sin\left(X_{rot} \cdot \frac{2\pi}{360^{\circ}}\right), & X_{rot} \geq 0^{\circ} \\ H_{initial} + C_{down} \sin\left(X_{rot} \cdot \frac{2\pi}{360^{\circ}}\right), & X_{rot} < 0^{\circ} \end{cases}$$
where H follows the pitch change relative to the front-facing pose ($X_{rot} = 0^{\circ}$), $Y_{pos}$ in that pose is referred to as $H_{initial}$ (m), and $C_{up}$ (m) and $C_{down}$ (m) are constants used as ratios. $X_{rot}$ becomes negative when the user tilts their head down and positive when it is raised. Therefore, when the user tilts the head up or down, we multiply $\sin(X_{rot} \cdot 2\pi / 360^{\circ})$ by $C_{up}$ or $C_{down}$, respectively, to find H, corresponding to the head pitch around $H_{initial}$. This means that when WIP is performed, the central axis of the cycle can range from the maximum $H_{initial} + C_{up}$ to the minimum $H_{initial} - C_{down}$. However, the user cannot actually tilt their head 90 degrees. We experimentally obtained these parameters as $C_{up}$ = 0.06 and $C_{down}$ = 0.13 and, as we would expect, this seems to be related to the structure of the human neck.
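A minimal Python sketch of the pitch-dependent central axis is given below. It assumes that $C_{up}$ applies when the head is raised ($X_{rot} \geq 0°$) and $C_{down}$ when it is lowered, as the text describes. The function name `central_axis` and its argument names are hypothetical and are not taken from the original implementation.

```python
import math

# Constants reported in the text (m); treat them as user-tunable defaults.
C_UP = 0.06
C_DOWN = 0.13


def central_axis(h_initial: float, x_rot_deg: float,
                 c_up: float = C_UP, c_down: float = C_DOWN) -> float:
    """Return the central axis H (m) for a head pitch given in degrees.

    h_initial is Y_pos measured while facing forward (X_rot = 0 degrees).
    Assumption: pitch is positive when the head is raised, negative when
    it is lowered, matching the sign convention described in the text.
    """
    scale = c_up if x_rot_deg >= 0.0 else c_down
    return h_initial + scale * math.sin(math.radians(x_rot_deg))
```

For example, with $H_{initial}$ = 1.64 m the sketch yields H = 1.70 m at a pitch of +90° and H = 1.51 m at −90°, which are the extremes $H_{initial} + C_{up}$ and $H_{initial} - C_{down}$ mentioned above.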

Walking in Place Recognition Range
Once H is specified, a range can be set to cover the WIP pattern of $Y_{pos}$. This range is the WIP step recognition range, which operates around H. H is used because it corresponds to the user's eye-level height and head pitch. When WIP is performed, the HMD moves up and down due to the repetitive motion of the lower body, with amplitudes that depend on the length and posture of the legs. To specify the range, the WIP motion itself must be specified in detail, because it may be ambiguous to the user. WIP motion can be classified as "jog-in-place" motion [32] and "march-in-place" motion [31] (Figure 4). Jog-in-place causes a large change in $Y_{pos}$. This makes it easier for our algorithm, which we propose later, to recognize WIP steps [45]. March-in-place provides a completely different result: detecting its steps is difficult because the change in $Y_{pos}$ during a step is much smaller than for jog-in-place. For this reason, additional or more sensitive sensors should be attached to the body to recognize march-in-place steps [14,34]. We call jog-in-place "intentional motion" and other motions "unintentional motion". This indicates that our method recognizes only jog-in-place steps. "Unintentional motion" refers to all motion that our method does not recognize, such as march-in-place and non-WIP motions. We performed intentional motion for a certain period of time to check the pattern, and we obtained $\delta$ (m), which is the difference between the top peak of $Y_{pos}$ and H. We set the range to $H \pm \delta C_{spacing}$ to cover the WIP pattern properly (Figure 5). $C_{spacing}$ is a constant greater than 1. If the range was too narrow, not all the cycles caused by intentional motion were included. If it was too wide, WIP steps were recognized in unwanted situations. The method used for recognizing the WIP step is explained below.
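The range check can be sketched in a few lines of Python. The helper names `recognition_range` and `in_recognition_range` are hypothetical. The default $C_{spacing}$ = 1.36 is the value reported for Figure 5, while $\delta$ must be measured per user during calibration.

```python
from typing import Tuple


def recognition_range(h: float, delta: float,
                      c_spacing: float = 1.36) -> Tuple[float, float]:
    """Return (low, high) bounds of the WIP recognition range H +/- delta * C_spacing."""
    if c_spacing <= 1.0:
        raise ValueError("C_spacing is described as a constant greater than 1")
    half_width = delta * c_spacing
    return h - half_width, h + half_width


def in_recognition_range(y_pos: float, h: float, delta: float,
                         c_spacing: float = 1.36) -> bool:
    """True when a Y_pos sample lies inside the recognition range around H."""
    low, high = recognition_range(h, delta, c_spacing)
    return low <= y_pos <= high
```

As an illustration only, a half-width of 0.08 m with $C_{spacing}$ = 1.36 corresponds to $\delta$ = 0.08 / 1.36 ≈ 0.059 m. With H = 1.64 m, a sample at 1.60 m is accepted, whereas a sample at 1.40 m (for instance, during a squat) is ignored.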

Figure 5. $Y_{pos}$, H, and the WIP recognition range ($H \pm \delta C_{spacing}$), which is symmetric around H. This range is applied to cover the pattern that appears when a user with a height of 1.78 m performs WIP. The initial position of H is 1.64 m because the position of the user's eye level is measured. $Y_{pos}$ stays within the range even if the user intentionally shakes their head up and down. We set $C_{spacing}$ = 1.36, so the WIP recognition range is H ± 0.08 m.

Step Recognition
Once the range is determined, the WIP steps can be recognized using the periodic pattern. This pattern can be specifically identified by the WIP cycle. The authors of GUD-WIP [34] introduced the WIP cycle inspired by the biomechanics of the real walking cycle [46]. They used the march-in-place method, which is considered unintentional motion in our method, with sensors attached to the shins, but some of their ideas are applicable to our method (jog-in-place). Their cycle repeats in the order foot off → (initial swing period) → maximum step height → (terminal swing period) → foot strike → (initial double support period) → opposite foot off → (initial swing period) → maximum step height → (terminal swing period) → opposite foot strike → (second double support period). Because we use the jog-in-place method, we describe our method in terms of the biomechanics of the real running cycle [47]. The cycle of our method repeats in the order right toe strike → mid stance → toe off → double limb unsupported → left toe strike → mid stance → toe off → double limb unsupported → right toe strike (Figure 6). When performing our method (intentional motion), $Y_{pos}$ reaches its bottom peak at the moment when the knee is bent the most (mid stance, midfoot strike), and increases when pushing on the ground (toe off). When the user pushes harder on the ground, the maximum step height increases (double limb unsupported, maximum step height). Then, the cycle repeats. Our method changes the "Maximum step height" to "Maximum HMD height", because the height of the HMD is more important than the height of the steps in our method.

We find the bottom peak of the step recognition cycle caused by WIP (midfoot strike). Our method recognizes the WIP step at the bottom peak because that is the moment of pushing against the floor. To find the bottom peak of the data, we use a queue, a first-in-first-out structure that holds n data and releases the oldest datum first. When the central datum among the n inserted data is the smallest, it is recognized as a WIP step. The accuracy and latency vary depending on where in the queue we look for the smallest value. If that location is close to the newest input, both the latency and the accuracy are lowered. If it is far from the newest input, the latency increases but the accuracy is not guaranteed. For this reason, we check the central datum among the n inserted data. If n is large, the accuracy of step recognition can improve, but the latency may also increase. If an appropriate n is set, high accuracy can be expected with low latency. If there is noise in the input data, the WIP step recognition accuracy may decrease. This problem can be mitigated by a moving average filter of size k, which averages k data and can optionally be applied before data are inserted into the queue (Figure 7). The step latency l (ms) is determined by the filter size k (k ≥ 1) of the moving average filter, the queue size n (n ≥ 3), and the frame refresh rate f of the system.
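The queue-based bottom-peak search with an optional moving average filter can be sketched as follows. This is an illustrative Python sketch, not the original implementation. The class name `StepDetector` and the helper `count_steps` are hypothetical. The filter is assumed to be a trailing average over the last k samples, and n is assumed to be odd so that the queue has a unique central datum. The recognition-range check is omitted for brevity.

```python
from collections import deque
from typing import Deque, Iterable


class StepDetector:
    """Detects bottom peaks of Y_pos with a moving average and a FIFO queue."""

    def __init__(self, n: int = 5, k: int = 3) -> None:
        if n < 3 or n % 2 == 0:
            raise ValueError("n must be an odd integer >= 3 (assumption: unique center)")
        if k < 1:
            raise ValueError("k must be >= 1")
        self.n = n
        self.k = k
        self._raw: Deque[float] = deque(maxlen=k)    # last k raw samples
        self._queue: Deque[float] = deque(maxlen=n)  # last n filtered samples

    def update(self, y_pos: float) -> bool:
        """Feed one Y_pos sample; return True when a WIP step is recognized."""
        self._raw.append(y_pos)
        if len(self._raw) < self.k:
            return False
        # Trailing moving average over the last k raw samples.
        self._queue.append(sum(self._raw) / self.k)
        if len(self._queue) < self.n:
            return False
        mid = self.n // 2
        center = self._queue[mid]
        # A step is the case where the central datum is strictly the smallest.
        return all(center < v for i, v in enumerate(self._queue) if i != mid)


def count_steps(samples: Iterable[float], n: int = 5, k: int = 3) -> int:
    """Count recognized steps over a finite sequence of Y_pos samples."""
    detector = StepDetector(n=n, k=k)
    return sum(1 for y in samples if detector.update(y))
```

In this sketch a bottom peak is reported n // 2 frames after the filtered minimum enters the queue, which illustrates why a larger n increases latency. A constant signal produces no steps because the central datum must be strictly smaller than its neighbors.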

Figure 7. The step recognition process when the moving average filter size (k) = 3 and the queue size (n) = 5, where $R$ is the input $Y_{pos}$ data, f is a frame number, and $\bar{R}$ is the filtered data. The yellow circles are the input data and the green circles are the data filtered through the moving average filter. The blue diagram shows the process of recognizing a WIP step based on the n filtered data.

Virtual Velocity Decision
VR-STEP [9], a WIP study using an IMU, determines only the virtual velocity for each step, using the step frequency. In contrast, we determine both the virtual initial velocity and the decreasing virtual velocity for each step to simulate more natural locomotion. VR-STEP uses only the time interval of the step; we instead used the difference in $Y_{pos}$ between the steps to determine the virtual initial velocity, $v_0$ (m/s). $v_0$ is determined by linear interpolation as follows:
$$v_0 = V_{min} + (V_{max} - V_{min}) \frac{s - S_{min}}{S_{max} - S_{min}}, \quad S_{min} \leq s \leq S_{max}, \; I_{min} \leq i_{step} \leq I_{max}$$
where s (m) is the difference between the top peak (maximum HMD height) and the bottom peak (midfoot strike) of the WIP pattern, which lies between the $S_{min}$ (m) and $S_{max}$ (m) thresholds. The step interval $i_{step}$ (s) is measured between two bottom peaks and lies between the $I_{min}$ (s) and $I_{max}$ (s) thresholds. $V_{min}$ (m/s) and $V_{max}$ (m/s) are the minimum and maximum values that $v_0$ can take, respectively (Figure 8).
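The linear interpolation of $v_0$, gated by the amplitude and step-interval thresholds, can be sketched in Python as below. The function name `initial_velocity` is hypothetical. Returning `None` is an assumed way of signalling that $v_0$ should not be updated. Threshold values are not given in this section and must come from calibration with the target user.

```python
from typing import Optional


def initial_velocity(s: float, i_step: float,
                     s_min: float, s_max: float,
                     i_min: float, i_max: float,
                     v_min: float, v_max: float) -> Optional[float]:
    """Map the peak-to-peak amplitude s (m) to a virtual initial velocity (m/s).

    Returns None when s or the step interval i_step (s) falls outside its
    thresholds, meaning the motion is not treated as intentional and the
    caller should keep the previous v0.
    """
    if s_max <= s_min:
        raise ValueError("s_max must be greater than s_min")
    if not (s_min <= s <= s_max):
        return None
    if not (i_min <= i_step <= i_max):
        return None
    ratio = (s - s_min) / (s_max - s_min)  # 0 at S_min, 1 at S_max
    return v_min + ratio * (v_max - v_min)
```

With purely illustrative placeholder thresholds ($S_{min}$ = 0.02 m, $S_{max}$ = 0.16 m, $V_{min}$ = 1.0 m/s, $V_{max}$ = 3.0 m/s), an amplitude of s = 0.09 m lies halfway between the amplitude thresholds and therefore maps to about 2.0 m/s.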

is determined by the smallest s that can be obtained when the targ Finding the top peak is similar to recognizing a step. The top peak is obtained in the step recognition cycle. When a central datum among the inserted n data in the queue is the largest, it is recognized as the top peak between two steps. The bottom peak represents the smallest Y pos per step recognition. In most cases, the value of s is located between S min and S max thresholds (s ∈ [S min , S max ]) ( Figure 9). These thresholds distinguish between intentional motion and unintentional motions. S min is determined by the smallest s that can be obtained when the target user performs the intentional motion, which ignores small head movement such as gait, roll, yaw and small pitch movement. S max is determined by the largest s that can be obtained when the target user performs the intentional motion, which ignores large head movement such as large pitch movement. S max does not necessarily match 2δC spacing . This is because the Y pos WIP pattern may not symmetry around H depending on the target user (S min < s ≤ S max ≤ 2δC spacing ). The s does not exactly represent the virtual initial velocity when recognizing the current step, but it is expected to correspond to the previous the virtual initial velocity. The reason for using this method is that our algorithm cannot accurately estimate the virtual velocity of the current WIP step. We solved this problem by being inspired by the behavior of people who gradually change their pace. Our algorithm only recognizes WIP steps based on Y pos difference. Thus, the virtual initial velocity is estimated using s. Since the value of s is fairly small, we calculate v 0 using linear interpolation. v 0 is located between V min and V max (v 0 ∈ [V min , V max ]). V min is the minimum virtual initial velocity correlate with S min , and V max is the maximum virtual initial velocity correlate with S max . 
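The top-peak rule described above (the central datum of the n queued samples being the largest) can be sketched as follows. This is a minimal illustration and not the authors' implementation: the function name `find_top_peaks`, the strict "greater than" comparison, and the sample heights are assumptions made for the example.

```python
from collections import deque


def find_top_peaks(heights, n=5):
    """Return the indices of samples that are top peaks.

    A sample is reported as a top peak when it sits at the centre of a
    window of n consecutive samples and is strictly larger than every
    other sample in that window (the strict comparison is an assumption).
    """
    if n % 2 == 0:
        raise ValueError("n must be odd so that the window has a central sample")
    window = deque(maxlen=n)  # queue holding the n most recent samples
    half = n // 2
    peaks = []
    for i, y in enumerate(heights):
        window.append(y)
        if len(window) < n:
            continue  # not enough samples queued yet
        centre = window[half]
        others = [v for j, v in enumerate(window) if j != half]
        if all(centre > v for v in others):
            peaks.append(i - half)  # index of the central sample
    return peaks


# Hypothetical filtered HMD heights (m) covering two up-down cycles.
heights = [1.60, 1.61, 1.63, 1.66, 1.64, 1.62, 1.60,
           1.59, 1.61, 1.65, 1.67, 1.66, 1.63]
print(find_top_peaks(heights, n=5))
```

Because the central sample can only be tested once the samples after it have arrived, a peak is reported n // 2 frames after it occurs.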
We used $i_{step}$ to prevent $v_0$ from being updated by unintentional motions. The minimum step interval $I_{min}$ (s) and maximum step interval $I_{max}$ (s) thresholds are determined from the interval between two bottom peaks when the target user performs the intentional motion. When the intentional motion is performed correctly, $i_{step}$ lies between the $I_{min}$ and $I_{max}$ thresholds and $v_0$ is updated. The $i_{step}$ between bottom peaks in adjacent WIP steps satisfies this condition, whereas the $i_{step}$ produced by an unintentional motion (e.g., shaking the head) is less likely to satisfy it. If the condition is not satisfied, $v_0$ is not updated, which means that the motion is not regarded as intentional. The thresholds also reject WIP steps that are too fast or too slow; we expect the target user not to perform such steps. $i_{step}$ thus serves as a condition for WIP on the time axis and reduces step recognition caused by unintentional motions.
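The update of the virtual initial velocity, with linear interpolation on s and the step-interval gate, can be sketched as below. This is a hedged sketch and not the authors' code: the function name `update_v0` and the threshold values for s and the step interval are hypothetical, and keeping the previous velocity when s falls outside its thresholds is an assumption. Only the velocity range of 4 to 12 m/s is taken from the text (Figure 10).

```python
def update_v0(v0_prev, s, i_step, s_min, s_max, v_min, v_max, i_min, i_max):
    """Return the virtual initial velocity after a recognized step.

    s      : top-peak minus bottom-peak HMD height (m)
    i_step : time between two bottom peaks (s)
    The previous value is kept when the step interval is outside
    [i_min, i_max], as described in the text.
    """
    if not (i_min <= i_step <= i_max):
        return v0_prev  # too fast or too slow: treated as unintentional
    if not (s_min <= s <= s_max):
        return v0_prev  # assumption: out-of-range amplitude is also ignored
    ratio = (s - s_min) / (s_max - s_min)
    return v_min + ratio * (v_max - v_min)


# Hypothetical thresholds; only v_min = 4 and v_max = 12 come from Figure 10.
params = dict(s_min=0.02, s_max=0.10, v_min=4.0, v_max=12.0,
              i_min=0.2, i_max=0.6)

print(update_v0(6.0, s=0.06, i_step=0.4, **params))  # mid-range step
print(update_v0(6.0, s=0.06, i_step=1.5, **params))  # interval too long
```

With these numbers a step halfway between the amplitude thresholds maps to the midpoint of the velocity range, while a step whose interval is outside the gate leaves the velocity unchanged.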
We determine an initial value of the virtual velocity for each step because a person's speed, unlike that of a machine, is not constant. As we studied the WIP method, we found that, in addition to latency, virtual velocity is related to immersion and motion sickness. In LLCM-WIP [14], the authors suggested that a saw-tooth function provides a more natural feel to the user than an impulse function or a box function when modeling velocity. They used march-in-place with sensors on the heels, but we expected the approach to be useful for our method as well. The virtual velocity $v$ (m/s) that enables natural navigation is obtained as follows:

$$v = v_0 - a \, t_{nav}$$

where $t_{nav}$ (s) is the time variable between WIP steps, running from 0 to $i_{step}$, and $a$ (m/s²) is the acceleration used to reduce $v$. $t_{nav}$ increases until $v_0$ is updated (Figure 10). $a$ can be set appropriately for the virtual environment. For example, if the floor is as slippery as ice, we recommend setting $a$ to a small value. In environments with wind blowing from the front of the user, $a$ should be set larger. The smaller $a$ is, the longer the navigation time; the larger $a$ is, the shorter the navigation time. There are two situations to consider when determining $a$ [14]: when the user continues WIP and when the user stops WIP. When continuing WIP at a constant pace, the user should not experience a visual stall, because they are still moving. When the user stops WIP, they should not experience visual movement. Either experience reduces the user's immersion and can cause motion sickness. Because of $a$, $v$ follows a saw-tooth waveform. The user experiences an impact at midfoot strike, where $v_0$ is updated. In the double limb unsupported period, the user experiences a deceleration vertically through the vestibular organ and horizontally through the visual organs (Figure 11). These experiences provide a feeling of walking in response to the user's steps through the optic flow.
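The saw-tooth behaviour of the virtual velocity can be illustrated with a short sketch. The function name `sawtooth_velocity`, the step times and the initial velocities are hypothetical; a = 9 m/s² and initial velocities in [4, 12] m/s are the values quoted for Figure 10. Clamping the velocity at zero once the user stops stepping is an assumption made here so that the example never moves backward.

```python
def sawtooth_velocity(t, steps, a):
    """Virtual velocity (m/s) at time t.

    steps : list of (detection_time_s, v0) pairs sorted by time
    a     : deceleration applied between steps (m/s^2)
    The velocity restarts from v0 at each detected step and then decays
    linearly; it is clamped at zero (an assumption, see the lead-in).
    """
    current = None
    for t_step, v0 in steps:
        if t_step <= t:
            current = (t_step, v0)  # most recent step at or before t
        else:
            break
    if current is None:
        return 0.0  # no step detected yet
    t_nav = t - current[0]  # time since the last v0 update
    return max(0.0, current[1] - a * t_nav)


# Two hypothetical steps, 0.4 s apart.
steps = [(0.0, 8.0), (0.4, 10.0)]
for t in (0.2, 0.4, 0.5, 2.0):
    print(t, sawtooth_velocity(t, steps, a=9.0))
```

Sampling this function every frame produces the saw-tooth shape: a jump to the new initial velocity at each detected step followed by a linear decay until the next one.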
Figure 10. The virtual velocity v (m/s) when using WIP. This is the result when $v_0 \in [4, 12]$ and a = 9 m/s². If a step is detected, v is updated to the new $v_0$.

Figure 11. One part of Figure 10. This shows how the virtual velocity is synchronized with the user's motion. The midfoot strike shows the largest change in physical acceleration, where $v_0$ is updated. In the double limb unsupported period, both the physical velocity and the virtual velocity are reduced. The saw-tooth function can be applied to our method.
WIP is a unidirectional navigation method. In previous studies [8,16], navigating backward was performed by lifting the head up to compensate for the disadvantages of unidirectional WIP. We also used this backward navigation method. If the user tilts their head up more than T degrees and performs WIP, the direction of v is reversed. We have experimentally found that the user experiences the least burden when T is 30 degrees.
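The direction rule can be written in a few lines. This is only a sketch: the function name `signed_velocity` and the convention that an upward head tilt is a positive pitch angle in degrees are assumptions, while T = 30 degrees is the value reported above.

```python
def signed_velocity(v, head_pitch_up_deg, t_deg=30.0):
    """Reverse the travel direction when the head is tilted up past T.

    v                 : non-negative virtual speed (m/s)
    head_pitch_up_deg : upward head tilt in degrees (assumed positive upward)
    """
    return -v if head_pitch_up_deg > t_deg else v


print(signed_velocity(5.0, 35.0))  # head tilted up past T: backward
print(signed_velocity(5.0, 10.0))  # normal posture: forward
```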

Evaluation
We analyzed the efficacy of the above methods through evaluation.

Instrumentation
We used an HTC Vive [18], which provides a room-scale position and orientation tracking system. It consists of an HMD, two controllers, and two infrared laser emitter units. However, we did not use the two controllers in the evaluation. The HMD supports a 110° FOV with a resolution of 1080 × 1200 in each eye at a frame refresh rate of 90 Hz. The HTC Vive's tracker works using the inside-out principle. It is operated by two emitters, called lighthouses [48]. When the laser hits the 32 photodiodes located on the HMD surface, the HMD's position and orientation are tracked via the reflection time difference [44]. The lighthouses can cover up to a 4 × 4 m² play area. We used a 2.4 × 2.4 m² play area for the evaluation (Figure 12).

Virtual Environment
To demonstrate the performance of the positional tracker-based WIP, navigation tasks were performed along straight trajectories, as in most other WIP studies [14,19,35,49]. We used the Unity5 game engine [50] to construct the virtual environment, which is a space of 20 × 400 m², surrounded by a wall and without obstacles (Figure 13). The floor and the wall are uneven to provide visual cues for the user to navigate. The user interface (UI) shows the subjects the calibration progress numerically. A subject is calibrated automatically if they maintain the proper posture for a certain period (about 2 s). This process can also be performed by operating the trigger. The calibration is necessary for setting the optimal H at the user's eye-level height. The parameters used in Equation 2 during evaluation were k = 3, n = 5, f = 90 Hz. These result in a latency of 44 ms.

Subjects
We recruited nine subjects, two women and seven men, aged 24 to 33 (mean = 28.56, SD = 2.96) for our evaluation. We asked the subjects to complete a study consent form before evaluation. We explained that if the subjects experience severe simulator sickness during the evaluation, or if they become too fatigued even after taking a break, they may stop immediately. We informed the subjects that tracking data with six degrees of freedom would be recorded and that video would be taken during the whole process. We advised them to wear lightweight clothing before the evaluation and provided a pair of sandals if their shoes were uncomfortable.

Interview
We briefly interviewed the subjects before and after the evaluation. We wanted to learn about the usability of our method, even though this interview is not related to the WIP step accuracy evaluation. The pre-evaluation interview was conducted to gather background information about the subjects. All subjects mentioned that they had played 3D games during the past year. Five of them had experience playing VR games. Four subjects answered that they usually wear glasses; only one of them wore glasses during the evaluation. After the evaluation, we asked about the usability of our method and whether the subjects experienced motion sickness. Although a questionnaire for measuring simulator sickness exists [51], we did not administer it because our focus was on evaluating the accuracy of the WIP step. We also verbally asked whether the navigation felt natural, to check that the saw-tooth function virtual velocity is applicable to our method.

Procedure
We evaluated the accuracy of WIP step recognition with two tasks and the error rate of unintentional motion with one task (Figure 14). Errors in WIP step accuracy comprise motions recognized as steps that were not intended as steps and motions intended as steps that were not recognized as steps [6]. The first task (task 1), used to measure the accuracy of WIP, was forward navigation. The second task (task 2) was backward navigation with the head tilted up by more than 30° (T = 30°). This was to ensure that the WIP recognition range works well even when the user's head is tilted up. Finally, the task used to evaluate the error rate of unintentional motion was the squat (task 3). The movement begins in a standing posture; the subject then moves their hips back, bends their knees and hips, lowers their body, and returns to an upright posture. The squat evaluation was required because it is not only one of the motions that can be performed within the tracking area, but was also perceived as a step in the WIP study using an IMU.

Figure 14. The order in which the evaluation occurred. From left to right: calibration, task 1 (forward navigation), task 2 (backward navigation), and task 3 (squat).

Subjects completed tasks 1 and 2 in five segments each, for a total of 10 segments. One segment had 100 steps, with a 1 min break between segments. Finally, the subjects completed 10 squats in one segment without a break. We gave each subject a sufficient explanation of the postures required for each task, and showed videos demonstrating intentional motion [32] and unintentional motion [31]. We also allowed each subject more than 1 min of practice time. Once the subjects were well informed about the methods, they fastened the headband so that the HMD would not fall off during the evaluation and then performed the calibration task. The subjects kept a straight posture for a few seconds, after which the UI informed them that calibration was complete. Because the subjects could lose balance when wearing the HMD and performing WIP [10], we provided something to hold onto, if needed (Figure 15).

Figure 15. When performing WIP, a user could grab or lean on something to maintain balance. A subject preparing to complete a task. Using a chair with handles, the subject maintains balance. The chair is fixed so it cannot be pushed.

Results
Table 1 shows the results of task 1 (forward navigation), task 2 (backward navigation), and task 3 (squat) for the nine subjects. $H_{initial}$ was measured to be about 0.15 m shorter than the subject's actual height. The table shows the average error rate (%) and standard deviation (SD) for tasks 1 and 2. The average error rate shown in the table is the average of the results of the five segments. The error includes both recognition failures and additional recognitions. For example, even if the number of recognized steps amounted to 101 when the first segment of the first task was completed, the step error was three if there were one failure and two additional recognitions (Figure 16). The total number of steps (task 1 + task 2) obtained from the evaluation was 9 × 100 × 5 × 2 (number of subjects × steps × segments × tasks) = 9000. The average step accuracy was 99.32%. In task 3, no WIP step was recognized for any of the subjects.
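The error-counting rule in the example above can be checked with a small sketch. The function name `segment_error` and its return layout are hypothetical; the numbers (100 intended steps, one failure, two additional recognitions) are those of the example in the text.

```python
def segment_error(intended, failed, additional):
    """Summarize one segment of the step-accuracy evaluation.

    intended   : number of steps the subject performed
    failed     : intended steps that were not recognized
    additional : recognized steps that were not intended
    Returns (recognized_count, error_count, error_rate_percent).
    """
    recognized = intended - failed + additional
    errors = failed + additional  # both kinds of mistake count as errors
    return recognized, errors, 100.0 * errors / intended


# The worked example from the text: 101 recognized steps hide 3 errors.
print(segment_error(intended=100, failed=1, additional=2))

# Total number of intended steps in tasks 1 and 2.
total_steps = 9 * 100 * 5 * 2  # subjects x steps x segments x tasks
print(total_steps)
```

The example shows why the recognized count alone is not a good accuracy measure: failures and additional recognitions partly cancel in the count but both contribute to the error.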

Discussion
Our evaluation results show high WIP step accuracy (99.32%) using position and orientation data only. Our method recognizes WIP steps well regardless of head tilt. This accuracy is comparable to or slightly higher than the informal evaluation results (>98%) of previous WIP studies using an IMU [9,17,52]. Additionally, our method follows an evaluation process that has not been used in previous studies. We also confirmed the appropriateness of the WIP range by evaluating unintentional motion through the squat task. When the number of steps exceeded the expected amount, the subjects restarted after resting. One woman and one man had difficulty with this process, but the evaluation was completed successfully. To determine the accuracy of steps per task, two researchers cross-checked the data sets to avoid human error in the data analysis (Figure 16). We also compared the video with each subject's log.
To avoid confusing the subjects, we neither showed the number of steps recognized by the algorithm nor provided audio feedback.
Previously, a WIP study was performed using an IMU inside the HMD [9]. When the subject was instructed to jog-in-place during the evaluation, high accuracy was reported. Based on this, we classified WIP into two categories: march-in-place [31], for which the recognition rate is poor, and jog-in-place [32], for which the recognition rate is good. Jog-in-place facilitated the recognition of WIP steps. Our algorithm does not detect a step when the user performs an unintentional motion. In our evaluation, our method achieved a higher step recognition rate than that reported for the other jog-in-place method and was more robust to unintentional motions [9]. We showed the videos to the subjects and explained intentional and unintentional motion. Because the motions were described to the subjects in detail, low error rates were obtained in task 1 (forward navigation) and task 2 (backward navigation), as shown in Table 1.
In the interview, the subjects talked about their experiences during the evaluation. The second subject said they felt a little lost during the task but did not feel nausea because they were holding onto a chair. We provided the subjects something to hold onto during the evaluation in case they lost balance [10]. We usually use a ring-shaped platform, but since we only evaluated one-way navigation and the squat motion, we provided chairs when necessary. The eighth subject said it would be more convenient to lift the head and perform WIP, but the resultant difference was insignificant. The ninth subject felt that the WIP motion was awkward in itself. No subjects experienced simulator sickness in task 1, but in task 2, two subjects complained of dizziness, stating that navigating backward was unfamiliar. This backward navigation method was proposed in previous studies to overcome the drawbacks of WIP [8,16]. Its disadvantage is that the user cannot look back, which can be solved by creating a virtual rear-view mirror [16]. Had we used a virtual rear-view mirror, these two subjects might not have felt dizzy. We also received some comments about the virtual velocity. All subjects who had previously played VR games with a motion controller said that the speed change using WIP motion is very natural. The others said they did not feel any discomfort in terms of speed. We did not receive any comments from subjects about a mismatch between visual feedback and real head motion. The saw-tooth function virtual velocity can therefore be used for jog-in-place as well as for march-in-place. This means that the user does not feel uncomfortable whether the virtual velocity is determined from the step frequency or from the up-down difference of the HMD. However, the authors of LLCM-WIP [14] noted that the saw-tooth function is still not a good approximation of the rhythmic phase of human walking.
During the evaluation, we found that the frame refresh rate dropped from 90 Hz (l = 44 ms) to 60 Hz (l = 67 ms). Because of this, we expected subjects to feel dizzy, but no subject mentioned it. The drop occurred when storing the user's log for analysis of the evaluation, and it could be resolved by removing the code that stores the logs. Fortunately, we found that the users were able to use our method comfortably even at 60 Hz.
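The two latencies quoted above can be related to the refresh rate with a short calculation. The four-frame delay used here is inferred from the reported numbers (44 ms at 90 Hz and 67 ms at 60 Hz) and is not stated in the text, and the function name `latency_ms` is hypothetical.

```python
def latency_ms(frames, refresh_hz):
    """Latency in milliseconds of a delay of `frames` frame periods."""
    return 1000.0 * frames / refresh_hz


# Assumed delay of four frame periods (our inference, see the lead-in).
for hz in (90, 60):
    print(hz, round(latency_ms(4, hz)))
```

Under this reading, the latency grows only because each frame lasts longer at the lower refresh rate, while the number of frames of delay stays the same.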
Our method has several limitations. In general, WIP techniques are known to be more fatiguing than hand-based methods [53]. One of the most significant limitations of our method is that it is more tiring than the march-in-place methods [14,34]. Many subjects felt tired during the evaluation, and we lowered the temperature of the evaluation space at the request of one subject. Another limitation is that the first step is not reflected in the virtual velocity, because the algorithm must first recognize a step and then measure the HMD height difference up to the next step. This may cause a problem when creating or playing a real game. A further problem is that the first WIP step is occasionally not recognized (Figure 16). This happens when the user starts WIP weakly, so it can be avoided if the user consciously starts with a strong step. Finally, our method cannot give users the best experience because the locomotion direction and the view direction coincide.

Conclusions
In this paper, we proposed a novel WIP method using position and orientation tracking. Our method is more accurate than the existing WIP method using an IMU. We distinguished jog-in-place as "intentional motion" and other motions as "unintentional motion", and the results indicate that our method recognizes only intentional motion as WIP steps. Our method is also more stable against unintentional motion within the tracking area. We applied the saw-tooth function virtual velocity to our method in a mathematical way, and this velocity provided subjects with a natural navigation experience. We expect our method to be a useful way to travel through an infinite virtual environment in VR applications that require a variety of motions, such as VR military training and VR running exercise.
In a future study, we will continue our research in three directions. First, we will evaluate the robustness of our method for many other non-WIP motions. Second, we will develop an algorithm which can analyze the difference of view and locomotion direction when performing WIP without additional sensors on the body. Third, we will combine our WIP recognition methods and redirected walking methods to present new methods to provide a better experience within the room-scale tracking area.