Validation of a Wireless Bluetooth Photoplethysmography Sensor Used on the Earlobe for Monitoring Heart Rate Variability Features during a Stress-Inducing Mental Task in Healthy Individuals

Heart rate variability (HRV), using electrocardiography (ECG), has gained popularity as a biomarker of the stress response. Alternatives to ECG for HRV monitoring, like photoplethysmography (PPG), are being explored as cheaper and unobtrusive non-invasive technologies. We report a new wireless PPG sensor that was tested for its ability to detect changes in HRV elicited by a mentally stressful task, and to determine whether its signal can be used as a surrogate of ECG for HRV analysis. Data were collected simultaneously from volunteers using a PPG and an ECG sensor, during a resting task and a mentally stressful task. HRV metrics were extracted from these signals and compared to determine the agreement between them and to determine if any changes occurred in the metrics due to the stressful task. For both tasks, a moderate/good agreement was found in the mean interbeat intervals, SDNN, LF, and SD2, and a poor agreement for the pNN50, RMSSD|SD1, and HF metrics. The majority of the tested HRV metrics obtained from the PPG signal showed a significant decrease caused by the mental task. The disagreement found between specific HRV features calls for caution when comparing metrics from different technologies. Nevertheless, the tested sensor was successful at detecting changes in the HRV caused by a mental stressor.


Introduction
Heart rate variability (HRV) is the fluctuation over time of the intervals between consecutive heartbeats and is accepted as a non-invasive biomarker of the activity of the autonomic nervous system [1][2][3]. The analysis of the HRV has been used as a diagnostic and clinical research tool, since changes in HRV have been associated with several cardiovascular, metabolic, and mental disorders [2][3][4][5]. This marker has also shown potential for the monitoring of stress and pain responses and has been increasingly used in the sports field as a tool to improve athletic performance [3,6,7,8,9,10].
The ECG signal is considered the gold standard from which the R-peaks from the QRS-complex can be identified using automatic computerized algorithms. The distance between these peaks is then used to create time-series of intervals between successive heartbeats (RR intervals) [4,11,12]. In clinical

Participants
Twenty-two volunteers were initially recruited. Exclusion criteria included the use of medication or any health condition that could induce abnormal changes in the heart rate signal. Data from four participants had to be excluded, as the collected signal had artifacts that would render the HRV analysis impossible, leaving our study with a total of 18 participants (age 31.1 ± 7.19 years old, 11 females and 7 males).

Software and Scripts Development
All the software and scripts developed for the data acquisition, visualization, processing, statistical analysis, and the computerized version of the SCWT were created using the Python programming language (Python Software Foundation, Version 3.7.4, available at https://www.python.org/) and the packages and modules that are part of the SciPy ecosystem [23][24][25][26][27].

Experimental Setup and Data Acquisition
The main steps of the experimental setup and data acquisition are represented in Figure 1. The experiment consisted of two five-minute tasks, separated by a one-minute break. Both tasks were performed with the participants in a sitting position. Participants could breathe freely. During the first task, the volunteers were asked to rest for five minutes. After a one-minute break, the participants were requested to perform a computerized version of the SCWT, developed specifically for the task. This test was used because it is easy to implement and has been shown to be an adequate mental stressor by previous studies, in which feelings of increased distress, heart rate, and the galvanic skin response have been reported [3,28,29,30].

Figure 1. Representation of the experimental setup that was used for data acquisition. The experiment entailed two tasks with a duration of five minutes, separated by a one-minute break. Data was acquired and transmitted to a computer via Bluetooth low energy (BLE) using the Polar H10 and James One heart rate sensors, simultaneously.

Data acquisition took place during both tasks and was performed simultaneously using a Polar H10 ECG sensor with the original Pro Strap and the James One PPG sensor (Figure 2). Both devices have a sampling rate of 1 kHz and proprietary algorithms embedded in their microcontrollers that can detect when a heartbeat occurs, conveying directly via BLE the inter-beat intervals. To obtain the PP intervals, the James One sensor initially filters the PPG signal using a low-pass filter at 5 Hz (Butterworth, 2nd order) to avoid high-frequency noise. Its interval detection algorithm is applied to the first derivative of the filtered PPG signal to avoid baseline drifts. Then, the derivative signal is squared to increase the signal-to-noise ratio. The intervals are detected between derivative peaks, which are identified as the maximum values of the squared derivative signal above a dynamic threshold.

Following the guidelines provided in the Polar H10 instructions manual, the electrode area of the strap was moistened to improve signal acquisition and was adjusted under the chest. To minimize movement artifacts, the James One sensor was placed in the left earlobe, held in place with a magnet. The data from both devices was transmitted via BLE to a computer containing a software application specifically developed for the task. This application includes a graphical user interface and allows data to be recorded, timestamped, and visualized in real-time.

Sensors 2020, 20, 3905
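The PP interval detection steps described above (low-pass filtering, first derivative, squaring, thresholded peak picking) can be sketched as follows. This is only an illustrative offline approximation, not the proprietary algorithm embedded in the James One: the function name pp_intervals_ms, the zero-phase filtering, the form of the "dynamic threshold" (a fraction of a sliding local maximum), and the 350 ms minimum peak distance are all assumptions made for the example.

```python
import numpy as np
from scipy.ndimage import maximum_filter1d
from scipy.signal import butter, filtfilt, find_peaks


def pp_intervals_ms(ppg, fs=1000.0, cutoff_hz=5.0, threshold_fraction=0.3):
    """Estimate pulse-to-pulse intervals (ms) from a raw PPG trace."""
    # 2nd-order Butterworth low-pass at 5 Hz, as described in the text.
    b, a = butter(2, cutoff_hz, btype="low", fs=fs)
    # Zero-phase filtering is an offline choice made for this sketch.
    filtered = filtfilt(b, a, np.asarray(ppg, dtype=float))

    # First derivative suppresses baseline drift; squaring emphasises
    # the steep systolic upstroke relative to the rest of the pulse.
    squared = np.gradient(filtered) ** 2

    # ASSUMPTION: the "dynamic threshold" is modelled as a fraction of the
    # local maximum of the squared derivative in a 2-second sliding window.
    local_max = maximum_filter1d(squared, size=int(2 * fs))
    threshold = threshold_fraction * local_max

    # ASSUMPTION: peaks closer than 350 ms are treated as the same beat.
    peaks, _ = find_peaks(squared, height=threshold, distance=int(0.35 * fs))

    # Convert distances between consecutive peaks from samples to ms.
    return np.diff(peaks) / fs * 1000.0
```

A real-time embedded implementation would use a causal filter and an adaptive threshold updated beat by beat; the sliding maximum used here is just one simple way to make the threshold follow changes in signal amplitude.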

Intervals Synchronization
Although the same application was used to receive and timestamp the PP and RR intervals, real-time synchronization of the data was not possible. One of the possible factors for this asynchronization is the different way these devices transmit data using the BLE protocol. Every second, the Polar H10 sends an array that contains the RR intervals that occurred in that time window. On the other hand, James One sends a PP interval each time a heartbeat occurs. This, allied with other uncontrollable factors, like potential delays in data transmission, the fact that BLE protocol does not provide timestamped intervals (data is timestamped when it is received on the acquisition software), and the delay originated from the different nature of the signals (PTT), makes it necessary to align the intervals before data analysis takes place. A script was created that performs this task automatically, by finding the position where the minimum variance and the maximal cross-correlation between the two time-series (RR and PP intervals) are present.
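The alignment step can be illustrated with a small search over beat offsets. The sketch below is an assumption about how such a script might look, not the authors' code: it only implements the cross-correlation criterion (the minimum-variance criterion is omitted), and the function name best_lag and the max_lag parameter are hypothetical.

```python
import numpy as np


def best_lag(rr, pp, max_lag=10):
    """Find the beat offset that best aligns `pp` with `rr`.

    A lag k >= 0 means pp[i] is paired with rr[i + k]; a negative lag
    means rr[i] is paired with pp[i - k]. Returns (lag, correlation).
    """
    rr = np.asarray(rr, dtype=float)
    pp = np.asarray(pp, dtype=float)
    best = (0, -np.inf)
    for lag in range(-max_lag, max_lag + 1):
        a, b = (rr[lag:], pp) if lag >= 0 else (rr, pp[-lag:])
        n = min(len(a), len(b))
        if n < 3:
            continue  # too few overlapping beats to correlate
        r = np.corrcoef(a[:n], b[:n])[0, 1]  # Pearson correlation at this lag
        if r > best[1]:
            best = (lag, r)
    return best
```

Once the lag is known, both series can be trimmed to their overlapping part so that each PP interval is compared with the RR interval of the same heartbeat.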

Artifacts Correction
Software was developed to detect and correct potential artifacts in the obtained intervals, originating either from technical problems (e.g., motion artifacts) or from a physiological source (e.g., ectopic beats), to improve the overall quality of the HRV analysis. A PP or RR interval was considered abnormal if its value was outside the 350-1350 millisecond range or if it deviated by more than 20% from the mean of the preceding and the subsequent interval [31,32]. Following other studies' recommendations, abnormal intervals were replaced with values interpolated from adjacent intervals using linear interpolation (Figure 3) [2,32,33].
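A minimal sketch of the two detection rules and the linear interpolation is given below. It applies the rules literally as stated in the text; the function name correct_artifacts and its parameters are hypothetical, and the authors' software may handle edge cases differently.

```python
import numpy as np


def correct_artifacts(intervals_ms, low=350.0, high=1350.0, max_dev=0.20):
    """Flag abnormal intervals and replace them by linear interpolation.

    Returns (corrected_intervals, boolean_mask_of_flagged_intervals).
    """
    x = np.asarray(intervals_ms, dtype=float)

    # Rule 1: physiologically implausible values.
    bad = (x < low) | (x > high)

    # Rule 2: interior intervals deviating more than 20% from the mean
    # of the preceding and the subsequent interval.
    neighbour_mean = (x[:-2] + x[2:]) / 2.0
    bad[1:-1] |= np.abs(x[1:-1] - neighbour_mean) > max_dev * neighbour_mean

    good = ~bad
    if not good.any():
        raise ValueError("no valid intervals left to interpolate from")

    # Replace flagged intervals with values interpolated from valid ones.
    idx = np.arange(len(x))
    corrected = x.copy()
    corrected[bad] = np.interp(idx[bad], idx[good], x[good])
    return corrected, bad
```

Note that a literal application of the second rule also flags the immediate neighbours of a large artifact, because the artifact distorts their neighbour mean; the interpolation then simply bridges the whole flagged run.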

HRV Features Calculations
Time-domain, frequency-domain, and non-linear HRV measurements were calculated from the PP and RR intervals of each participant. The following time-domain features were used to quantify the amount of HRV in the five-minute time windows (short-term measurement): the mean of the intervals, the standard deviation of the intervals (SDNN), the percentage of adjacent intervals that differ by more than 50 milliseconds (pNN50), and the root mean square of successive differences between heartbeats (RMSSD) "obtained by first calculating each successive time difference between heartbeats in ms. Then, each of the values is squared and the result is averaged before the square root of the total is obtained." [13]. All the chosen time-domain features are expressed in milliseconds, except for the pNN50, which is expressed as a percentage [1][2][3]13,34].

To obtain the frequency-domain features, power spectral density was estimated using the Lomb-Scargle periodogram method. Only two of the three distinguishable core spectral components were used as features: the low frequency (LF; 0.04-0.15 Hz) and the high frequency (HF; 0.15-0.4 Hz) bands. The very-low-frequency band (0.003-0.04 Hz) was not included since short-term records were used. All the frequency-domain measures are expressed in milliseconds squared (ms²) [1][2][3]13,35].

For the non-linear HRV analysis, where the unpredictability of the five-minute records is evaluated, scatter plots were created by plotting the current interval against the next interval (Poincaré plot). An ellipse centered on the average of the intervals was plotted. From this ellipse, the following HRV parameters were extracted: standard deviation 1 (SD1), which represents the ellipse width, and standard deviation 2 (SD2), which represents its length. All the non-linear metrics are expressed in milliseconds [13,36]. Some studies use RMSSD and SD1 as independent metrics, apparently unaware that one can be obtained from the other by multiplication by a constant. Even though these metrics carry the same information, we decided to include both, but we will refer to them as a single metric, RMSSD|SD1, in the discussion [37].
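The time-domain and Poincaré features can be computed from an interval series in a few lines. The sketch below is illustrative only: the function name hrv_features is hypothetical, SDNN uses the sample standard deviation, SD1 is taken as RMSSD divided by the square root of 2 (the constant relation mentioned above), and SD2 uses the common closed-form approximation based on SDNN and SD1 rather than an explicit ellipse fit. The frequency-domain features are not covered here.

```python
import numpy as np


def hrv_features(intervals_ms):
    """Time-domain and Poincare HRV features from intervals in ms."""
    x = np.asarray(intervals_ms, dtype=float)
    diff = np.diff(x)  # successive differences between intervals

    sdnn = x.std(ddof=1)                          # sample SD of the intervals
    rmssd = np.sqrt(np.mean(diff ** 2))           # root mean square of differences
    pnn50 = 100.0 * np.mean(np.abs(diff) > 50.0)  # % of differences above 50 ms

    # SD1 follows from RMSSD by a constant factor.
    sd1 = rmssd / np.sqrt(2.0)
    # ASSUMPTION: closed-form approximation SD2^2 = 2 * SDNN^2 - SD1^2.
    sd2 = np.sqrt(max(2.0 * sdnn ** 2 - sd1 ** 2, 0.0))

    return {
        "mean": float(x.mean()),
        "sdnn": float(sdnn),
        "rmssd": float(rmssd),
        "pnn50": float(pnn50),
        "sd1": float(sd1),
        "sd2": float(sd2),
    }
```

Because SD1 is derived directly from RMSSD here, any agreement statistic computed for one of them applies equally to the other, which is why the two are reported together as RMSSD|SD1.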

Interbeat Intervals Description
To get a better understanding of the obtained intervals using the James One and the Polar H10 during the execution of the required tasks, all the participants' intervals were pooled together and spread into four groups, called James One-Rest, Polar H10-Rest, James One-SCWT, and Polar H10-SCWT. For each of these groups, a histogram with the intervals was plotted, and descriptive statistics were performed.

Determination of the Agreement between HRV Features
For the analysis of the agreement between the HRV features extracted from the intervals we followed a mixed approach and combined a set of different methods to determine the agreement.
First, we compared the relative variability between the HRV metrics extracted from the intervals using the James One and the Polar H10. This comparison was performed using the differences between the coefficients of variation (CV) calculated from the means and standard deviations (SD) of each pair of HRV features (e.g., SDNN of James One at rest is paired with SDNN of Polar H10 at rest for comparisons).
As a measure of effect size, we included Cohen's d. This value was calculated as the difference between the means of each pair of HRV metrics divided by their pooled SD. Since we had a small sample size, we applied a correction factor, yielding what is normally called Hedges' g (Hg). Using Cohen's guidelines (which depend on the situation), an Hg value equal to 0.2 represents a small effect, 0.5 a medium effect, and 0.8 a large effect [38,39].
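The effect size calculation can be sketched as below. The small-sample correction shown is the commonly used approximation 1 - 3 / (4(n1 + n2) - 9); whether the authors used this exact form is an assumption, and the function name hedges_g is hypothetical.

```python
import math
from statistics import mean, stdev


def hedges_g(a, b):
    """Cohen's d with pooled SD, multiplied by a small-sample correction."""
    n1, n2 = len(a), len(b)
    # Pooled standard deviation of the two groups.
    pooled_sd = math.sqrt(
        ((n1 - 1) * stdev(a) ** 2 + (n2 - 1) * stdev(b) ** 2) / (n1 + n2 - 2)
    )
    d = (mean(a) - mean(b)) / pooled_sd
    # ASSUMPTION: approximate correction factor for small samples.
    correction = 1.0 - 3.0 / (4.0 * (n1 + n2) - 9.0)
    return d * correction
```

For two groups of 18 participants the correction factor is about 0.98, so Hg is only slightly smaller in magnitude than the uncorrected Cohen's d.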
Lin's Concordance Correlation Coefficient (LCCC) was used to measure the agreement between the HRV metrics when the data were normally distributed (Shapiro-Wilk test, p > 0.05) [40,41]. The following criteria for the interpretation of the LCCC were proposed by McBride [42]: almost perfect agreement if LCCC > 0.99, substantial agreement if 0.99 > LCCC > 0.95, moderate agreement if 0.95 > LCCC > 0.90, and poor agreement if LCCC < 0.90 [41,42]. However, since these are merely a suggestion, we applied a more conservative approach and considered that there was a poor agreement if LCCC < 0.95.
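For reference, the LCCC can be computed directly from its definition: twice the covariance divided by the sum of the two variances and the squared difference of the means. The sketch below uses population (1/n) moments, as in Lin's original formulation; the function name lin_ccc is hypothetical.

```python
from statistics import mean


def lin_ccc(x, y):
    """Lin's concordance correlation coefficient for paired samples."""
    n = len(x)
    mx, my = mean(x), mean(y)
    # Population variances and covariance (divide by n).
    var_x = sum((v - mx) ** 2 for v in x) / n
    var_y = sum((v - my) ** 2 for v in y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    # The (mx - my)^2 term penalises systematic bias between methods.
    return 2.0 * cov / (var_x + var_y + (mx - my) ** 2)
```

Unlike Pearson's correlation, this coefficient drops below 1 when one device reads systematically higher than the other, even if the two series are perfectly correlated.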
Another recommended procedure to assess the agreement between two methods is the Bland-Altman analysis [4,43,44,45]. This method does not require that the measurements obtained from the methods follow a normal distribution, but it assumes that the differences between the measures do [44,45]. Not all the differences between the HRV metrics obtained followed a normal distribution (Shapiro-Wilk test, p < 0.05). Even though a non-parametric approach is described by Bland et al. and used by the British Hypertension Society, its implementation was not possible as we could not find reference values for the maximum acceptable differences needed to build the limits for each HRV metric [40,45,46]. Nevertheless, Bland et al. suggest that "a non-normal distribution of differences may not be as serious here as in other statistical contexts", so we decided to include the parametric approach of the Bland-Altman analysis [45]. In this approach, the average of the HRV metrics from both devices was plotted against the difference between methods. The mean of the differences between measurements, also known as bias or systematic error, was calculated and plotted, as well as the associated upper and lower limits of agreement (LoA, bias ± 1.96 × SD of the differences) and the associated confidence intervals (CI, 95%) [43][44][45]. The Bland-Altman ratio (BA ratio) was calculated by dividing half the range of the LoA by the mean of the pair means of each HRV metric. Some authors suggest that a ratio lower than 0.1 indicates a good agreement, values higher than 0.1 and lower than 0.2 a moderate agreement, and higher than 0.2 a poor agreement [13,34,47].
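The parametric Bland-Altman quantities described above can be sketched as follows. The function name bland_altman is hypothetical, and the confidence intervals of the bias and of the LoA are left out for brevity.

```python
from statistics import mean, stdev


def bland_altman(a, b):
    """Return (bias, (lower LoA, upper LoA), BA ratio) for paired data."""
    diffs = [p - q for p, q in zip(a, b)]
    pair_means = [(p + q) / 2.0 for p, q in zip(a, b)]

    bias = mean(diffs)   # systematic error between the two methods
    sd = stdev(diffs)    # sample SD of the differences
    loa = (bias - 1.96 * sd, bias + 1.96 * sd)

    # Half the LoA range divided by the mean of the pair means.
    ba_ratio = (1.96 * sd) / mean(pair_means)
    return bias, loa, ba_ratio
```

Here `a` and `b` would hold one HRV metric (for example SDNN) for all participants, obtained from the James One and the Polar H10, respectively.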
An additional strategy was to plot all the pairs of HRV metrics against a 45° line (that would represent a perfect agreement) to visually evaluate how they deviate from it.

Comparison of HRV Metrics during Rest and SCWT
To determine if James One could be used to detect changes in the HRV metrics caused by a mental stressor, we compared the data obtained from the resting task with the one from the SCWT. This comparison was also performed with the data obtained from the Polar H10, making a total of two comparison groups: (1) James One-Rest vs SCWT; and (2) Polar H10-Rest vs SCWT.
All the time and frequency domain features described in Section 2.4.3 were used. The use of absolute values for HF and LF alone can lead to an incorrect interpretation of results, so we also included the LF/HF ratio [30,48]. Only the nonlinear feature SD2 was included as using both RMSSD and SD1 is redundant [37].
Box plots were used to display the calculated HRV features (Rest vs SCWT). Each box plot is divided by a bar that represents the median (50th percentile). The spaces between the middle bar and the top and the bottom of the box indicate the 75th and 25th percentiles, respectively. The interquartile range (IQR) is the distance between the 75th and 25th percentiles. The whiskers that extend from the boxes represent the maximum (75th + 1.5 × IQR) and the minimum (25th − 1.5 × IQR) limits, within which a value is not regarded as an outlier. Values plotted outside these whiskers are considered outliers and are shown as ♦. The Wilcoxon signed-rank test was used to compare the HRV metrics, as not all of them had a normal distribution. A p < 0.05 was considered significant. As a measure of effect size, we included the common-language effect size (CLES), which gives "the probability that a score sampled at random from one distribution will be greater than a score sampled from some other distribution." [49].
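The comparison between the rest and SCWT conditions can be sketched with SciPy's signed-rank test and a brute-force CLES. The CLES variant shown (proportion of all cross-pairs in which the first value is larger, with ties counted as one half) is an assumption, since the text does not state which estimator was used; the function name cles is hypothetical.

```python
from scipy.stats import wilcoxon


def cles(x, y):
    """Probability that a random value from x exceeds a random value from y."""
    # ASSUMPTION: brute-force estimate over all cross-pairs, ties count 0.5.
    greater = sum(a > b for a in x for b in y)
    ties = sum(a == b for a in x for b in y)
    return (greater + 0.5 * ties) / (len(x) * len(y))


def compare_conditions(rest, scwt, alpha=0.05):
    """Paired Wilcoxon signed-rank test plus CLES for one HRV metric."""
    _, p_value = wilcoxon(rest, scwt)
    return {
        "p": float(p_value),
        "significant": bool(p_value < alpha),
        "cles": cles(rest, scwt),
    }
```

In this sketch `rest` and `scwt` hold the same HRV metric for the same participants in the same order, since the signed-rank test operates on the paired differences.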

Results
All the participants completed the required tasks. After the data was processed, four of the 22 participants had to be excluded due to anomalies (signal artifacts and abnormal PP/RR intervals) in the collected data that would render the HRV analysis and comparison impossible. Two of the participants had one gap at the end of their records using the James One, which could not be fixed with interpolation, so these gaps and the corresponding RR intervals obtained from the Polar H10 were removed so that both data records would have the same length. For one of the participants, this gap was located at the end of task 1 (rest) and accounted for 20 s. The other participant's gap was present at the end of task 2 (SCWT) and accounted for 30 s. From the 18 participants, a total of 14,296 intervals were obtained using the devices during the required tasks, 6976 at rest and 7320 during the SCWT.
As shown in Table 1, the developed software detected 12 (0.17%) and 7 (0.1%) abnormal intervals in the James One and Polar H10 recordings during task 1 (rest), respectively. During task 2 (SCWT), 9 (0.12%) abnormal intervals were found in the James One recordings and 2 (0.03%) in the Polar H10 recordings. The minimum, maximum, mean, SD, skewness, kurtosis, and the CVs of the obtained PP and RR intervals are reported in Table 2. Both devices displayed similar CVs during the proposed tasks. The histograms in Figure 4 show the density of the distribution of the intervals from both devices during the required tasks. The intervals acquired at rest appear to have a unimodal distribution, while during the SCWT the plot suggests a bimodal distribution.

Table 3 shows the mean and the associated SD, differences between the CVs of the James One and the Polar H10, Hg value, LCCC, the mean of the differences between measurements (bias) and the associated SD, LoA from the Bland-Altman analysis, and the BA ratio for all the pairs of HRV metrics obtained at rest. The criteria used to interpret the data are also shown. Figure 5 shows the pairs of HRV metrics at rest plotted against a 45° line (that would represent a perfect agreement). The Bland-Altman plots are shown in Figures A1 and A2 (Appendix A).

Table 3. Mean and the associated SD, differences between the CVs of the James One (CVJ) and the Polar H10 (CVP), Hg value, mean differences (bias) and the associated SD, for all the pairs of HRV metrics obtained at rest. Using Cohen's guidelines, an Hg value equal to 0.2 represents a small effect, 0.5 a medium effect, and 0.8 a large effect [38,39]. The agreement between the parameters is shown as LCCC. Values higher than 0.99 represent an almost perfect agreement, values lower than 0.99 and higher than 0.95 a substantial or moderate agreement, and values lower than 0.95 a poor agreement [41,42]. BA ratios lower than 0.1 represent a good agreement, higher than 0.1 and lower than 0.2 a moderate agreement, and higher than 0.2 a poor agreement [13,34,47].

Figure 5. Pairs of HRV metrics obtained at rest plotted against a 45° line (that would represent a perfect agreement). The HRV metrics obtained from the James One are represented on the x-axis and the ones from the Polar H10 on the y-axis.

Table 4 shows the mean and the associated SD, differences between the CVs of the James One and the Polar H10, Hg value, LCCC, the mean of the differences between measurements (bias) and the associated SD, LoA from the Bland-Altman analysis, and the BA ratio for all the pairs of HRV metrics obtained during the SCWT. The criteria used to interpret the data are also shown. Figure 6 shows the pairs of HRV metrics during the SCWT plotted against a 45° line (that would represent a perfect agreement). The Bland-Altman plots are shown in Figures A3 and A4 (Appendix A).

Table 4. Mean and the associated SD, differences between the CVs of the James One and the Polar H10, Hg value, mean differences (bias) and the associated SD, for all the pairs of HRV metrics obtained during the SCWT. Using Cohen's guidelines, an Hg value equal to 0.2 represents a small effect, 0.5 a medium effect, and 0.8 a large effect [38,39]. The agreement between the parameters is shown as LCCC. Values higher than 0.99 represent an almost perfect agreement, values lower than 0.99 and higher than 0.95 a substantial or moderate agreement, and values lower than 0.95 a poor agreement [41,42]. BA ratios lower than 0.1 represent a good agreement, higher than 0.1 and lower than 0.2 a moderate agreement, and higher than 0.2 a poor agreement [13,34,47].

Figure 6. Pairs of HRV metrics obtained during the SCWT plotted against a 45° line (that would represent a perfect agreement). The HRV metrics obtained from the James One are represented on the x-axis and the ones from the Polar H10 on the y-axis.

Comparison between HRV Metrics at Rest and SCWT
Figure 7 and Table 5 show the results of the comparison between the HRV metrics obtained during the rest task and the SCWT. Overall, there was a decrease in all of the metrics for both devices when comparing the values obtained from the resting task with the values from the SCWT, except for the LF/HF ratio, which increased in James One. For James One, this decrease was significant for the mean intervals, SDNN, pNN50, RMSSD, HF, and SD2. As for the Polar H10, the decreases were significant for the mean intervals, SDNN, HF, and SD2.

Figure 7. Box plots of the comparison between the HRV metrics obtained during the rest task and the SCWT. Each box plot is divided by a bar that represents the median (50th percentile). The spaces between the middle bar and the top and the bottom of the box indicate the 75th and 25th percentiles, respectively. The IQR is the distance between the 75th and 25th percentiles. The whiskers that extend from the boxes represent the maximum (75th + 1.5 × IQR) and the minimum (25th − 1.5 × IQR) limits, within which a value is not regarded as an outlier. Values plotted outside these whiskers are considered outliers and are shown as ♦.

Table 5. Comparison of the HRV metrics obtained during the resting stage and the SCWT. A ↓ indicates a decrease in the HRV metrics and ↑ an increase. A p-value lower than 0.05 indicates that the differences between the rest and the SCWT were significant. The effect size is reported as CLES.

Discussion
The need for unobtrusive, simple, and inexpensive methodologies for cardiovascular system monitoring, together with the recent boom in the wearables market, has made the PPG methodology resurface in recent years [3,4,10,15,50]. Even though several commercially available wearable devices can monitor the heart rate signal using PPG technology, they present one or several of the following drawbacks: low quality of the acquired data, lack of access to the data or no information about the algorithms used to process the data, privacy concerns (e.g., data sold to third-party companies), high cost, proprietary software required to obtain the data, and technical problems [3,50–52]. The James One, however, has a low price and allows direct access to its data, which can be obtained using standard BLE protocols, giving developers the freedom to create their own algorithms for data processing and analysis.
The pulse wave signal generated by the PPG technology has been studied as a potential surrogate of the ECG signal for HRV analysis, but a consensus about its validity as an alternative has still not been reached. At resting conditions and using young and healthy participants, some of the available studies propose that this technology is a viable alternative, while others suggest that it has a tendency to overestimate some short-term variability metrics (pNN50, RMSSD, and HF) [4]. Several technical and physiological factors, among which the execution of mentally stressful tasks is included, have been suggested as potential sources of disagreement between the PRV and HRV metrics [4,16].
In our work, we aimed to validate for the first time the HRV measurements calculated from the PP intervals obtained from the heart rate sensor, James One, under two different conditions, at rest and during a mental stress-inducing task. As a reference, we used the RR intervals provided by a Polar H10 chest strap, an improved version of previously validated heart rate chest straps [1,3,6,11,22]. Regardless of the presence or absence of agreement between the HRV metrics obtained from these two devices, we tried to determine if the data extracted from the James One could be used to detect changes caused by the execution of a mentally stressful task.
Overall, the quality of the signal obtained from the James One and the Polar H10 appears to be good for both tasks, as a low number of abnormal intervals were detected. This was expected for the data collected from the Polar chest straps, for which high signal quality is reported, even in physically demanding activities [1,11,22]. Motion-generated artifacts are one of the most common problems in the pulse wave signal, but other factors like skin tone or temperature can also hurt the signal quality [4,15,19,20]. Due to the nature of our experiment (lack of physical exertion), the similar experimental conditions (e.g., similar skin tones and ambient temperatures), and the location of the sensor (left earlobe as opposed to the wrist or finger), this problem seems to have been mitigated.
Pearson's correlation coefficients or mean comparisons (e.g., paired t-tests) are commonly used in the literature for the assessment of the agreement between the intervals and the HRV metrics obtained from two different devices. It is important to note, however, that the results obtained from these approaches can be highly misleading, and their use to evaluate the agreement between methods has been discouraged in previous works, where better alternatives are suggested (Bland-Altman analysis and Lin's Concordance Correlation Coefficient) [4,41,43–45]. The use of these (occasionally wrong) methodologies makes the comparison of results between articles hard, as authors sometimes suggest that there is a good agreement between methods when, in fact, there is only a good correlation. As mentioned in our methodology, we decided to combine different methodologies to study the agreement between the HRV features extracted from the PP and RR intervals, as we felt that a single simple approach could give misleading results.
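The distinction between correlation and agreement can be illustrated with a minimal sketch. The values below are invented for illustration and are not data from this study; the function names are hypothetical. A test device that reads 1.5 times the reference plus a constant offset is perfectly correlated with the reference, yet Lin's Concordance Correlation Coefficient penalizes both the scale and the location shift.

```python
from statistics import fmean


def pearson_r(x, y):
    """Pearson's correlation coefficient of two equally long sequences."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def lin_ccc(x, y):
    """Lin's concordance correlation coefficient (population moments)."""
    n = len(x)
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    sxx = sum((a - mx) ** 2 for a in x) / n
    syy = sum((b - my) ** 2 for b in y) / n
    # The squared mean difference in the denominator penalizes systematic bias.
    return 2 * sxy / (sxx + syy + (mx - my) ** 2)


# Hypothetical values of one HRV metric from a reference and a test device.
reference = [20.0, 30.0, 40.0, 50.0, 60.0]
test_device = [1.5 * v + 5.0 for v in reference]  # scaled and shifted copy

r = pearson_r(reference, test_device)
ccc = lin_ccc(reference, test_device)
print(f"Pearson r = {r:.3f}, Lin's CCC = {ccc:.3f}")
```

In this toy case the correlation is perfect while the concordance is well below the thresholds usually taken to indicate agreement, which is the situation the paragraph above warns about.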
For the rest task, the plots of the pairs of HRV metrics in Figure 5 show that some features, like the pNN50, RMSSD|SD1, and HF, are overestimated by the James One in comparison to the corresponding features calculated from the Polar H10. The SDNN and SD2 seem to be slightly overestimated, the mean intervals somewhat underestimated, and the LF (although higher values look slightly overestimated) appears to fit the 45° line. These observations are corroborated by the values of the bias reported in Table 3. The mean intervals, LF, and SD2 display small differences in their CVs, low Hg values, and, when possible to calculate, LCCC values that represent almost perfect agreement, and BA ratios that report a good agreement (except for the LF feature, whose BA ratio indicates a moderate agreement). The pNN50 and HF metrics appear to have considerable differences in the CVs and Hg values. Their BA ratios seem to indicate a poor agreement. Although not as high as for the previous metrics, the differences in the CVs of the RMSSD|SD1 are still relevant when compared to the values from the mean intervals, LF, and SD2 metrics. This metric also shows an Hg value higher than 0.2. Its LCCC value suggests a poor agreement, while its BA ratio suggests a moderate agreement. The SDNN metric shows relatively low differences in the CVs, a small but still considerable Hg value, a moderate to substantial agreement in its LCCC, and a good agreement in its BA ratio. Taking all of this into account, we considered that the mean intervals, LF, and SD2 metrics show good agreement between the James One and Polar H10. The SDNN appears to have a moderate agreement and the pNN50, RMSSD|SD1, and HF a poor agreement.
Regarding the HRV metrics calculated from the PP and RR intervals obtained during the SCWT, it is once again possible to see (using the plots in Figure 6) that the James One tends to overestimate the pNN50, RMSSD|SD1, and HF, slightly overestimate the SDNN and SD2, and marginally underestimate the mean intervals. Again, the LF band appears to fit the 45° line (even though higher values look slightly overestimated). The values of the bias from Table 4 seem to support these observations. The mean intervals, SDNN, LF, and SD2 appear to have similarly low differences in their CVs, Hg values lower than 0.2, LCCC values that indicate a perfect agreement (when available), and BA ratios that suggest good agreement (except for the LF, which is classified as a moderate agreement). Although less pronounced than in the rest task, the pNN50, RMSSD|SD1, and HF show noticeable differences in their CVs (in comparison to the mean intervals, SDNN, LF, and SD2 metrics) and Hg values. The RMSSD|SD1 LCCC value suggests a poor agreement. The BA ratios show a poor agreement for the pNN50 and the HF, and a moderate agreement for the RMSSD|SD1. Taking all of this into account, we considered that the mean intervals, SDNN, LF, and SD2 metrics show a good agreement between the James One and Polar H10, and that the pNN50, RMSSD|SD1, and HF metrics show a poor agreement.
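As a rough sketch of how the bias and a BA ratio of the kind discussed above can be obtained, the snippet below computes the Bland-Altman bias, the 95% limits of agreement, and a ratio between the half-range of those limits and the mean of the pairwise means. That definition of the BA ratio is an assumption made for illustration and may differ from the one used in our methodology; the function name and the input values are hypothetical and are not data from this study.

```python
from statistics import fmean, stdev


def bland_altman(test, reference):
    """Return (bias, limits of agreement, BA ratio) for paired measurements."""
    diffs = [t - r for t, r in zip(test, reference)]
    pair_means = [(t + r) / 2 for t, r in zip(test, reference)]
    bias = fmean(diffs)                  # positive: test device reads higher
    half_width = 1.96 * stdev(diffs)     # half-range of the 95% limits
    limits = (bias - half_width, bias + half_width)
    # Assumed definition: half-range of the limits over the mean pairwise mean.
    ba_ratio = half_width / fmean(pair_means)
    return bias, limits, ba_ratio


# Hypothetical RMSSD values (ms) for five participants.
rmssd_ppg = [42.0, 35.0, 58.0, 27.0, 49.0]
rmssd_ecg = [38.0, 33.0, 50.0, 26.0, 45.0]

bias, limits, ba_ratio = bland_altman(rmssd_ppg, rmssd_ecg)
print(f"bias = {bias:.2f} ms, LoA = ({limits[0]:.2f}, {limits[1]:.2f}) ms, "
      f"BA ratio = {ba_ratio:.3f}")
```

A positive bias in such an analysis corresponds to the overestimation by the PPG-derived metric described in the text.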
Our findings are in line with other studies that suggest a tendency for the overestimation of certain HRV metrics when using a PPG sensor, a disagreement between short-term variability parameters such as the pNN50, RMSSD|SD1, and HF, and an agreement in metrics like the mean intervals, SDNN, LF, and SD2 [4,16,53]. It is unlikely that the disagreement found between the metrics is due to technical factors, as the overall quality of the signal obtained from the James One appears to be good (as we have previously reported) and the sensor has a high sampling rate (1 kHz). The lack of agreement in the short-term variability metrics could be explained by the variations in the PTT caused by the unconstrained breathing rates of our participants, as suggested by Schäfer and Vagedes [4]. This idea is supported by other studies, like the one conducted by Chen et al. [53], where a comparison of the impact of different respiration modes on the agreement between PRV and HRV metrics was performed. They found that during paced breathing, almost all the tested metrics (mean intervals, SDNN, RMSSD|SD1, LF, HF, and SD2) had a moderate/good agreement. However, with intermittent breath holding, short-term variability metrics like the RMSSD|SD1 and HF had an insufficient agreement. They suggested that the maintenance of a steady breathing pace translated into a reduction in the variation of the PTT, potentially decreasing the differences between the PRV and HRV parameters. They concluded that, regardless of the tested respiratory mode, parameters like mean intervals, SDNN, LF, and SD2 showed a satisfactory agreement, as opposed to short-term metrics [53]. Weinschenk et al. [34] also reported stronger agreements between PRV and HRV metrics when the breathing conditions were controlled in comparison to spontaneous breathing rates.
Although the execution of a mentally stressful task had an impact on the values of some of the tested HRV features (Table 5), the agreement between the metrics did not appear to suffer any alteration apart from the SDNN metric, which improved from a moderate to a good agreement. As previously shown in another study and as seen in our results, reporting both RMSSD and SD1 is redundant, as both metrics show similar results [37].
As shown in Table 5 and Figure 7, the HRV metrics extracted from the PP intervals obtained from the James One seem to present adequate sensitivity to detect changes in the HRV caused by a mental stressor. Almost all the tested metrics had a decrease that was deemed significant (mean intervals, SDNN, pNN50, RMSSD, HF, and SD2). For these metrics, an average of CLES ≈ 0.63 indicates that there is a 63% chance that if a value is randomly taken from the rest task, it will be greater than a value randomly sampled from the SCWT [49]. Although there was a decrease in the LF and an increase in the LF/HF ratio, these alterations were not significant. These results are in line with evidence from other studies that use mental stressors, where decreases in the heartbeat intervals, SDNN, pNN50, RMSSD, and absolute values of HF and LF, and increases in LF/HF ratios, are reported [8,30,54,55]. Even though the data obtained from the Polar H10 indicate a decrease in all the tested metrics due to the execution of the SCWT, these changes were only significant for the mean intervals, SDNN, HF, and SD2 (CLES ≈ 0.62). Oddly enough, the LF/HF ratio showed a slight (non-significant) decrease. Interestingly, although all metrics with moderate to good device agreement present similar differences between tasks, the features pNN50 and RMSSD, which are poorly congruent between devices, seem to discriminate tasks clearly when calculated from James One.
It is worth noting that a poor agreement among some of the tested metrics does not mean that the signal obtained from the James One sensor lacks quality or is of no use for monitoring changes in the HRV metrics with poor concordance. It only tells us that these metrics cannot be used interchangeably with the ones obtained from the Polar H10. Furthermore, the poor agreement in some metrics seems to result from the nature of the signal itself, as other studies report similar results [4,16]. In fact, considering that anxiety and stress states have been described as having a decreasing effect on HRV [8], the metrics calculated from James One often seem to be more sensitive to the stress induction of the SCWT, as seen in the results of Table 5, where the James One was able to detect significant changes in the pNN50 and RMSSD, whilst the Polar H10 was not. Nevertheless, some features extracted from the intervals provided by the James One, like the mean intervals, LF, and SD2, and, to a certain degree, the SDNN, can be used interchangeably with the features obtained from the Polar H10, which uses ECG.
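The redundancy between RMSSD and SD1 follows from their definitions: both are computed from the successive differences of the interval series, and SD1 is approximately RMSSD divided by √2 whenever the mean successive difference is close to zero. The sketch below illustrates this with an invented interval series (not data from this study) and hypothetical function names; it uses the population variance of the differences, which is one of several conventions.

```python
from math import sqrt
from statistics import fmean, pvariance


def successive_differences(intervals):
    """Differences between consecutive interbeat intervals."""
    return [b - a for a, b in zip(intervals, intervals[1:])]


def rmssd(intervals):
    """Root mean square of the successive differences."""
    diffs = successive_differences(intervals)
    return sqrt(fmean([d * d for d in diffs]))


def sd1(intervals):
    """Poincaré SD1: sqrt of half the variance of the successive differences."""
    diffs = successive_differences(intervals)
    return sqrt(pvariance(diffs) / 2)


# Hypothetical interbeat intervals in milliseconds.
intervals_ms = [800.0, 810.0, 790.0, 820.0, 805.0, 800.0]

rmssd_value = rmssd(intervals_ms)
sd1_value = sd1(intervals_ms)
print(f"RMSSD = {rmssd_value:.2f} ms, SD1 = {sd1_value:.2f} ms, "
      f"RMSSD / sqrt(2) = {rmssd_value / sqrt(2):.2f} ms")
```

Because the two metrics differ only by a nearly constant scale factor, agreement and between-task statistics computed on one essentially reproduce those of the other.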
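The probabilistic reading of the CLES given above can be made concrete with a small sketch that counts, over all pairs of values, how often a value from one condition exceeds a value from the other (ties counted as one half). This pairwise-counting form is a common definition for two samples and is an assumption here; it is not necessarily the exact computation behind Table 5, which involves paired measurements. The function name and the input values are hypothetical and are not data from this study.

```python
def cles(sample_a, sample_b):
    """Probability that a random value of sample_a exceeds one of sample_b."""
    wins = 0.0
    for a in sample_a:
        for b in sample_b:
            if a > b:
                wins += 1.0
            elif a == b:
                wins += 0.5  # ties count as half a win
    return wins / (len(sample_a) * len(sample_b))


# Hypothetical SDNN values (ms) at rest and during the mental task.
rest = [50.0, 42.0, 61.0, 38.0]
scwt = [45.0, 40.0, 39.0, 52.0]

effect = cles(rest, scwt)
print(f"CLES (rest > SCWT) = {effect:.4f}")
```

A value of 0.5 would mean that neither condition tends to produce larger values, so values around 0.63 reflect a modest but consistent shift towards lower HRV during the task.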

Conclusions
At rest and during the execution of the SCWT, there was good agreement between the mean intervals, LF, and SD2 metrics extracted from the PP intervals provided by the James One and the RR intervals obtained from the Polar H10. As for the SDNN metric, there was a good agreement during the SCWT and a moderate agreement at rest. Metrics that reflect short-term variability, like the pNN50, RMSSD|SD1, and HF, appear to have a poor agreement between sensors and seem to be overestimated when using PP intervals, in comparison to the corresponding features calculated from RR intervals. The execution of a mental task did not appear to negatively affect the agreement.
As previously reported, some incongruence was observed between specific HRV metrics calculated from RR and PP intervals which imposes some caution when comparing HRV metrics calculated from different technologies. Nevertheless, the data extracted from James One could be successfully used to detect changes in the HRV caused by the execution of a mentally stressful task. The mental stressor caused a significant decrease in the mean intervals, SDNN, pNN50, RMSSD|SD1, HF, and SD2.
Considering its low cost and usage flexibility, the reported results suggest that James One may be a promising, yet robust, sensor for measuring stress induction through HRV metrics.
[...] for providing the James One heart rate sensor and the documentation and firmware upgrades necessary to make it work.

Conflicts of Interest:
Nuno Dias is the co-founder of the MindProber Labs, the company responsible for the creation and development of the James One.
Ethical Statements: All subjects gave their informed consent for inclusion before they participated in the study.
The study was conducted in accordance with the Declaration of Helsinki, and the protocol was approved by the Ethics Subcommittee for the Life and Health Sciences (SECVS) (Document ID: SECVS 011/2018).