Assessing Map-Reading Skills Using Eye Tracking and Bayesian Structural Equation Modelling

Map reading is an important skill for acquiring spatial information. Previous studies have mainly used results-based assessments to learn about map-reading skills. However, how to model the relationship between map-reading skills and eye movement metrics is not well documented. In this paper, we propose a novel method to assess map-reading skills using eye movement metrics and Bayesian structural equation modelling. We recruited 258 participants to complete five map-reading tasks, which covered map visualization, topology, navigation, and spatial association. The results indicated that map-reading skills could be reflected in three measures derived from selected eye movement metrics, namely, the measure of first fixation, the measure of processing, and the measure of search. The model fitted well for all five tasks, and the scores generated by the model reflected the accuracy and efficiency of the participants' performance. This study may provide a new approach to facilitate the quantitative assessment of map-reading skills based on eye tracking.

Introduction
Map reading is a fundamental skill required by many daily tasks, such as map-based navigation [1], the acquisition of spatial information from a map [2], and map-based education [3][4][5]. Assessing map-reading skills is therefore important, as it helps to identify learners' current level of skill and to evaluate the outcomes of geographical education.
There are different components and levels of competence regarding map reading. Board identified navigation, measurement, and visualization as the three main groups of map-reading tasks that should be addressed when assessing map-reading skills [6]. Clarke defined "functional map literacy" as "the ability to understand and use maps in daily life, for work and in the community" and identified levels of competence concerning map literacy [2]. Research has been conducted to measure these skills for purposes such as assisting education and aiding navigation.
Previous studies have developed various standardized tests to assess map-reading skills, and topographic map reading has been a major subject of these tests. Carswell used the Test of Topographic Map Skills (TTMS), which had been devised and validated, to investigate the topographic map-reading abilities of children [7]. The TTMS assesses the ability to read symbols, direction, scale, elevation, and grid systems and to interpret map-based information. Gilhooly et al. used a 7-item contour-map-reading test to assess the skill level of test subjects [8]; the questions covered spot height, intervisibility, cross-section identification, direction of flow, and distance. Pederson et al. implemented an 8-question test to examine the outcome of topographic map-reading sessions [9]. The test involved knowledge […]

[…] studies, the expertise of the map user was reflected by eye movement metrics. These studies indicate that eye tracking has the potential to assess users' map-reading skills.

In this study, we used eye tracking and structural equation modelling to quantitatively assess map-reading skills. We collected eye movement data from 258 undergraduate students at Beijing Normal University and used measures related to first fixation, processing, and search to evaluate map-reading tasks that included map visualization, topology, navigation, and spatial association.

Participants
We recruited 272 participants (aged 20 ± 2 years) from Beijing Normal University, all of whom were undergraduate students in the Faculty of Geographical Science. Participants with myopia were allowed to take part wearing their glasses. Fourteen individuals were excluded because of failures during the calibration phase, and the remaining 258 participants completed the experiment.

Apparatus
We used a Tobii T120 eye tracker with a 17-inch monitor and a sampling rate of 60 Hz. The eye tracker recorded with an accuracy of 0.5° and a spatial resolution of 0.2°. The monitor had a resolution of 1280 × 1024 pixels and displayed 16.7 million colours (true 8-bit) to present the stimuli. Data were recorded with Tobii Studio. The experiment was conducted in a well-lit room on campus, and no disruptions occurred during the experimental period. Recordings with a sampling rate (the percentage of gaze samples successfully captured) lower than 70% were removed before further analysis.

Materials
Five tasks were presented to the participants in sequence, and all tasks included multiple-choice questions that were based on a map. The descriptions of the tasks and the sample maps can be found in Table 1. All materials were written in Chinese, which was the native language of the participants.

Procedure
Participants were guided to complete and sign an entry form with their basic information when they entered the room. They then completed a calibration procedure to ensure the accuracy of the recordings. Next, they were instructed to submit their answers and advance to the next task using the computer mouse and keyboard. They were also told that the test had no time limit and that they could not return to a previous page once they had advanced. The participants then completed the map-reading tasks.

Eye Movement Metrics
Previous research reported that expert and novice map readers show differences in terms of fixations and scan paths, indicating differences in both the processing and searching of information [26][27][28]. To assess the performance of the participants, we selected the following metrics (Table 2).

Table 2. Selected eye movement metrics used to assess the performance of participants.
Measure of first fixation: time to first fixation (AOI¹); first-fixation duration (AOI)
Measure of processing: percentage of total fixation duration (AOI); percentage of fixation count (AOI)
Measure of search: saccade count; scan-path length (screen pixels)

¹ AOI is used as an abbreviation for "area(s) of interest".

• Measure of first fixation: Metrics related to first fixation are also important for describing visual behaviour. Two metrics were selected regarding first fixation: the time to the first fixation of the AOIs and the first-fixation duration within the AOIs. The time to the first fixation of an area can indicate the saliency of that area [31]. Quickly fixating on a particular area also indicates deliberately directed attention that might be a result of expertise [27]. First-fixation duration indicates the interest of the participant in a particular area, and it can also indicate potential difficulties in interpretation.
• Measure of processing: Two fixation-related metrics were selected as the measures of processing: percentage of total fixation duration and percentage of fixation count. The total fixation duration (dwell time) is the sum of the durations of all fixations; a long total fixation duration on an area could indicate interest or difficulty in interpretation [27]. Fixation count can also represent the interest of the participant [22]. Because we selected areas of interest (AOIs) for each map, the percentage of total fixation duration and the percentage of fixation count inside the AOIs were calculated as measures of processing.
• Measure of search: The measure of search included two saccade-related metrics, i.e., saccade count and scan-path length (in screen pixels). Saccade count is the number of saccades recorded; more saccades indicate that more effort was spent on searching. Scan-path length is the sum of all scan (saccade) paths, and a longer scan path suggests a less efficient search [31]. (Note that saccade count and scan-path length are not limited to the AOIs; they are calculated over the entire stimulus.)
• General performance: Effectiveness (accuracy) and efficiency are the two major measures used to assess the general performance of a participant [23]. In our case, accuracy was measured based on whether the answer submitted by the participant was correct, and efficiency was measured based on the response time. These metrics were not included in the modelling process, but they were used to evaluate the model (see Section 5).
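To make the six modelled metrics concrete, the following Python sketch computes them from a list of fixations. It is a hypothetical re-implementation for illustration only, not Tobii Studio's algorithm. It assumes that AOIs are axis-aligned rectangles, that time to first fixation is measured from stimulus onset, and that saccade count can be approximated by the number of transitions between consecutive fixations (rather than detected from raw gaze samples). Standardization and sign reversal are not applied here.

```python
from dataclasses import dataclass
from math import hypot
from typing import Dict, List, Optional, Tuple

# Hypothetical AOI format: axis-aligned rectangle (left, top, right, bottom) in screen pixels.
Rect = Tuple[float, float, float, float]


@dataclass
class Fixation:
    x: float            # gaze position in screen pixels
    y: float
    start_ms: float     # onset relative to stimulus onset (assumption)
    duration_ms: float


def in_any_aoi(f: Fixation, aois: List[Rect]) -> bool:
    """True if the fixation point falls inside at least one AOI rectangle."""
    return any(left <= f.x <= right and top <= f.y <= bottom
               for left, top, right, bottom in aois)


def eye_movement_metrics(fixations: List[Fixation], aois: List[Rect]) -> Dict[str, Optional[float]]:
    fixations = sorted(fixations, key=lambda f: f.start_ms)
    inside = [f for f in fixations if in_any_aoi(f, aois)]
    first = inside[0] if inside else None

    total_duration = sum(f.duration_ms for f in fixations)
    aoi_duration = sum(f.duration_ms for f in inside)

    # Scan path over the whole stimulus: summed distance between consecutive fixations.
    scan_path = sum(hypot(b.x - a.x, b.y - a.y) for a, b in zip(fixations, fixations[1:]))

    return {
        # Measure of first fixation (AOI-based)
        "time_to_first_fixation_ms": first.start_ms if first else None,
        "first_fixation_duration_ms": first.duration_ms if first else None,
        # Measure of processing (percentages inside the AOIs)
        "pct_fixation_duration": 100 * aoi_duration / total_duration if total_duration else None,
        "pct_fixation_count": 100 * len(inside) / len(fixations) if fixations else None,
        # Measure of search (whole stimulus); saccades approximated as fixation-to-fixation moves
        "saccade_count": max(len(fixations) - 1, 0),
        "scan_path_length_px": scan_path,
    }


# Usage with made-up data: three fixations, one AOI around (400, 500).
fixations = [Fixation(100, 100, 0, 200), Fixation(400, 500, 250, 300), Fixation(700, 100, 600, 150)]
metrics = eye_movement_metrics(fixations, aois=[(350, 450, 450, 550)])
print(metrics)
```

Only the middle fixation lies in the AOI, so it determines both first-fixation metrics, while the two search metrics use all three fixations.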

Bayesian Structural Equation Modelling and Data Imputation
We used structural equation modelling to model the relationship between eye movement metrics and map-reading skills. The structural equation modelling was conducted using the AMOS program [32].
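As a sketch in conventional SEM notation, the second-order reflective model proposed here (Figure 1) can be written as follows. The symbols are our own labels rather than the authors', and scale-setting constraints (for example, fixing a variance or one loading to 1) are not shown.

```latex
\begin{aligned}
\eta_k &= \gamma_k\,\xi + \zeta_k, \qquad k \in \{F, P, S\},\\
x_{kj} &= \lambda_{kj}\,\eta_k + \varepsilon_{kj}, \qquad j \in \{1, 2\}.
\end{aligned}
```

Here ξ is map-reading skill; η_F, η_P and η_S are the measures of first fixation, processing and search; γ_k are the path coefficients from map-reading skill to each measure; λ_kj are the path coefficients from each measure to its two standardized indicators x_kj (time to first fixation and first-fixation duration for η_F, the percentages of fixation duration and fixation count for η_P, saccade count and scan-path length for η_S); and ζ_k and ε_kj are residual terms.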
The proposed model associating eye movement metrics with map-reading skills is presented as "Proposed model" in Figure 1. The ellipses represent the latent variables (i.e., measure of processing, measure of search, measure of first fixation, and map-reading skill), and the rectangles represent the observed variables (i.e., the eye movement metrics). We proposed a reflective model in which variance in map-reading skill results in variance in the three measures, each of which has two eye movement metrics as indicators. Before fitting the model, we standardized all observed variables. In addition, for ease of interpretation, three variables (i.e., time to first fixation, saccade count, and scan-path length) were transformed by reversing their sign, so that higher scores indicated better performance [33].
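The preprocessing just described (z-standardization followed by sign reversal of three indicators) can be sketched in Python as follows. The variable names are hypothetical, and the use of the sample standard deviation is an assumption, since the text does not state which estimator was used; the sketch only illustrates the transformation, not the AMOS estimation itself.

```python
from statistics import mean, stdev
from typing import Dict, List

# Hypothetical names for the three indicators whose sign is reversed
# so that higher values indicate better performance.
REVERSED = {"time_to_first_fixation", "saccade_count", "scan_path_length"}


def standardize(values: List[float]) -> List[float]:
    """Convert raw values to z-scores (sample standard deviation assumed)."""
    mu, sd = mean(values), stdev(values)
    return [(v - mu) / sd for v in values]


def prepare_indicators(data: Dict[str, List[float]]) -> Dict[str, List[float]]:
    """Standardize every indicator, then flip the sign of the reversed ones."""
    prepared = {}
    for name, values in data.items():
        z = standardize(values)
        prepared[name] = [-v for v in z] if name in REVERSED else z
    return prepared


raw = {
    "time_to_first_fixation": [1.0, 2.0, 3.0],   # made-up values (seconds)
    "pct_fixation_count": [10.0, 20.0, 30.0],    # made-up values (percent)
}
prepared = prepare_indicators(raw)
print(prepared)
```

Because reversing the sign of a z-score is the same as standardizing the negated variable, the order of the two steps does not matter.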
To assess performance, we calculated latent-variable scores for each participant using data imputation. Before imputation, we toggled the latent variables into observed variables with missing values [32], as shown in "Model for data imputation" in Figure 1. Bayesian imputation was then applied to estimate these missing values.
Figure 1. From the proposed model to the model for data imputation using toggling [32].
After imputation, we combined the results from the multiple imputed datasets to obtain the regression weights (i.e., path coefficients): the multiple-imputation estimate of each regression weight was the mean of the regression weights estimated from all completed datasets [32,34].
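The pooling rule stated above (the mean of each regression weight over all completed datasets) can be sketched in Python. The path labels and weights below are made up for illustration, and the sketch pools point estimates only; pooling of standard errors (for example, with Rubin's rules) is not described in the text and is not shown.

```python
from statistics import mean
from typing import Dict, List


def pool_regression_weights(per_dataset: List[Dict[str, float]]) -> Dict[str, float]:
    """Average each regression weight over all completed (imputed) datasets."""
    paths = per_dataset[0].keys()
    return {path: mean(weights[path] for weights in per_dataset) for path in paths}


# Made-up weights estimated separately on three completed datasets.
estimates = [
    {"skill->processing": 0.60, "skill->search": 0.30},
    {"skill->processing": 0.62, "skill->search": 0.28},
    {"skill->processing": 0.58, "skill->search": 0.32},
]
pooled = pool_regression_weights(estimates)
print(pooled)
```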
We used several indices to evaluate the fit of the models. The most common index is the χ² statistic, for which a significant value suggests a poor fit. However, χ² is sensitive to sample size: with a large sample, it tends to be significant even when the discrepancy between the data and the model is small [35]. Therefore, we adopted the χ²/df statistic (where df stands for "degrees of freedom"), with a value less than 2.0 indicating a good fit. Another fit measure is the root mean square error of approximation (RMSEA), for which a value less than 0.08 suggests a fair fit and a value less than 0.05 suggests a good fit [36]. The normed fit index (NFI) compares the proposed model against the null model, and a value above 0.95 is considered to indicate good model fit [37].
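The three criteria can be computed from the model and null-model χ² values with the standard formulas, sketched below in Python. The RMSEA expression uses the common N - 1 convention (some software uses N instead), and the input numbers are illustrative only, not the values reported in Table 3.

```python
import math


def chi2_over_df(chi2: float, df: int) -> float:
    return chi2 / df


def rmsea(chi2: float, df: int, n: int) -> float:
    """Point estimate of RMSEA, truncated at zero when chi2 < df."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


def nfi(chi2_model: float, chi2_null: float) -> float:
    """Normed fit index: proportional reduction in chi2 relative to the null model."""
    return (chi2_null - chi2_model) / chi2_null


# Illustrative numbers only (not the values in Table 3).
chi2, df, n, chi2_null = 10.0, 8, 258, 500.0
fit = {"chi2/df": chi2_over_df(chi2, df), "RMSEA": rmsea(chi2, df, n), "NFI": nfi(chi2, chi2_null)}

# Thresholds used in this study: chi2/df < 2.0, RMSEA < 0.05, NFI > 0.95.
good = fit["chi2/df"] < 2.0 and fit["RMSEA"] < 0.05 and fit["NFI"] > 0.95
print(fit, "good fit" if good else "check fit")
```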

Model Fit
The fit measures for each model are shown in Table 3. For all the models, the χ²/df statistics were less than 2, the NFI was greater than 0.95, and the RMSEA was below 0.05. These results suggest good fits in all five models.

Path Coefficients
According to the proposed model, map-reading skills are reflected by the measure of first fixation, the measure of processing, and the measure of search. The path coefficients for the model after imputation are shown in Table 4, and the imputed models with coefficients are shown in Appendix A (Figures A1-A5). The path coefficients from map-reading skill to measure of first fixation varied across the five tasks, ranging from 0.341 to 0.725 (mean = 0.482, SD = 0.139).
The path coefficients from map-reading skill to measure of processing also varied across tasks, with a larger spread (mean = 0.619, SD = 0.183). These coefficients were larger in absolute value than those from map-reading skill to measure of first fixation and to measure of search, although the differences were not significant.
Regarding the path coefficients from map-reading skill to measure of search, a negative value was reported for task #5 (β5 = −0.300), while tasks #1, #2, #3, and #4 all had positive values.
Regarding the measurement model, the path coefficients from the measure of processing and the measure of search to their own indicators showed a clear pattern. The path coefficients from measure of processing to its indicators (fixation count percentage and fixation duration percentage) were above 0.92 for all five tasks. Similarly, all path coefficients from measure of search to its indicators were greater than 0.96. However, the path coefficients from measure of first fixation to its indicators showed a different pattern. The path coefficient from measure of first fixation to time to first fixation was approximately 0.65 (β1 = 0.618, β2 = 0.708, β3 = 0.641, β4 = 0.658, β5 = 0.665), whereas the path coefficient from measure of first fixation to first-fixation duration was small and negative in all five tasks (β1 = −0.249, β2 = −0.034, β3 = −0.165, β4 = −0.104, β5 = −0.117).
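As a quick arithmetic check of the two first-fixation patterns, the coefficients listed above can be summarized in Python. This summary is not part of the reported tables, and the choice of the sample standard deviation is ours.

```python
from statistics import mean, stdev

# Path coefficients reported above for tasks #1-#5.
ttff = [0.618, 0.708, 0.641, 0.658, 0.665]      # first fixation -> time to first fixation
ffd = [-0.249, -0.034, -0.165, -0.104, -0.117]  # first fixation -> first-fixation duration

for name, betas in [("time to first fixation", ttff), ("first-fixation duration", ffd)]:
    print(f"{name}: mean = {mean(betas):.3f}, SD = {stdev(betas):.3f}, "
          f"range = [{min(betas):.3f}, {max(betas):.3f}]")
```

The time-to-first-fixation coefficients average about 0.66, while the first-fixation-duration coefficients average about −0.13, with every value below zero.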

Imputed Scores
To evaluate the scores generated by the models, we tested them against the answers and response times of the participants.

Imputed Scores and Response Times
No strong correlation was observed between map-reading skill scores and response times (Table 7). The absolute values of the correlations between the map-reading skill scores and the response times were less than 0.6 for all five tasks, and both positive and negative values were observed (r1 = −0.073; r2 = −0.358; r3 = −0.531; r4 = −0.431; r5 = 0.273).

Eye Movement Metrics and Map-Reading Skills: Path Coefficients
The participants' map-reading skills were primarily reflected by the measure of processing. This result suggests that participants with better map-reading skills focused more on important information, which is consistent with the finding of Ooms et al. that experts tended to fixate more on major structural elements [27].
The path coefficients from map-reading skill to measure of first fixation differed among the tasks, ranging from 0.341 to 0.725. This result suggested that being able to locate important information quickly contributed to better performance. This was supported by the results for task #2 (the topology task), where the path coefficient was 0.725. This result could possibly be explained by the ability of participants with better map-reading skills to quickly identify the wrong answers and select the correct answer. Ooms et al. also suggested that novices were more easily distracted at the beginning of the task and thus were not able to fixate on key information immediately [27]. Additionally, inability to locate key information might suggest that a participant was confused by the map [38].
The path coefficients from map-reading skills to measure of search varied greatly among the five tasks, indicating that participants with better map-reading skills did not necessarily search with higher efficiency. For task #1 (the visualization task), the path coefficient was 0.097, indicating that map-reading skill was not reflected by the measure of search. This result might be explained by the findings of Ooms et al. [27], who concluded that people with different skill levels might still have a similar scan path during certain map tasks. Tasks #2 (the topology task), #3, and #4 (navigation tasks) reported similar path coefficients. This result suggests that for these three tasks, the participants with better skills searched with relatively high efficiency. A negative path coefficient was found for task #5 (the spatial association task). To solve task #5, participants had to determine the spatial association between the two maps. To achieve this, they had to view each map multiple times to match the pattern on one map with that on the other. Frequent switching between maps could result in more saccades and longer saccade paths, which could lead to lower search efficiency. In short, the difference in path coefficients might be explained by differences in the tasks themselves and the skills involved. Stofer and Che suggested that when experts were asked specifically about the visualization, they tended to have more fixations per visualization [28], which indicated greater meaning making [19]. Since more fixations per visualization result in more saccades, this might explain why participants with better map-reading skills did not always search with higher efficiency.
The path coefficients from the measure of processing and the measure of search to their respective eye movement metrics were all above 0.9. For the measure of search, this is consistent with previous studies showing that large numbers of saccades and lengthy scan paths indicate inefficient searches. However, the two metrics for the measure of first fixation had very different path coefficients. The path coefficients from the measure of first fixation to time to first fixation were above 0.6 for all five tasks, while those from the measure of first fixation to first-fixation duration were close to zero or weakly negative (−0.249 to −0.034). This difference suggests that although these two metrics are both related to first fixation, they may not reflect a common underlying construct [39].

Reflections of Accuracy and Efficiency: Imputed Scores
Both the accuracy and efficiency of the participants were reflected in the scores generated by the model. In our model, accuracy was weighted more heavily than efficiency in the score, as the path coefficient from map-reading skill to measure of processing was larger than that from map-reading skill to measure of search in all tasks. This weighting is sensible in practice because selecting the correct answer is typically valued more than completing the task quickly.
Instead of the map-reading skill score, the efficiency of the participant was better reflected by the score related to the measure of search. A strong negative correlation was observed between the measure of search score and response time. This result suggested that the efficiency in solving the task was related to the efficiency in searching for information. The results from tasks #3 and #4 (the navigation tasks) are consistent with the finding of Liao et al. that when participants spent less effort searching, they completed the navigation task more quickly [23].
The correlation between the map-reading skill score and response time was weak for task #1 (r1 = −0.073), implying that participants who finished the task faster were not necessarily rewarded with higher scores. This result might be explained by the low path coefficient (0.097) from map-reading skill to measure of search in task #1 (the map visualization task), in which participants scanned the maps in similar ways.
For task #5 (the spatial association task), the positive correlation between the map-reading skill score and response time revealed that participants who completed the task in a shorter time actually received lower scores. A closer examination of their answers revealed that, among the 117 participants who submitted an incorrect answer, 68 had chosen the same incorrect statement (population falls as precipitation decreases), and 44 of those 68 participants had response times shorter than the average. Whereas the correct answer (population first rises but then falls as precipitation continues to increase) reflected the spatial association for the whole area, this incorrect statement reflected only part of it. We can therefore infer that participants needed to study the details of the map closely, rather than scanning it briefly, to select the correct answer. Thus, for task #5, many participants answered quickly but incorrectly and consequently received lower map-reading skill scores.

Conclusions
This study proposed a model that quantitatively associates map-reading skills with eye movement metrics using structural equation modelling. An eye tracking study was conducted in which 258 participants completed five map-reading tasks covering visualization, topology, map-based navigation, and spatial association. Map-reading skills were indicated by the measure of first fixation, the measure of processing, and the measure of search. The map-reading skill score for each task was calculated for each participant using Bayesian imputation, and the scores were then tested against the participants' answers and response times. The model fitted well for all five tasks. The path coefficients indicated that map-reading skills could be reflected by measures related to first fixation, processing and search. The same eye movement metrics could potentially be applied to assess map-reading skills in other map tasks. The scores generated by the model generally reflected the performance of the participants in terms of both accuracy and efficiency.
It should be noted that the eye movement metrics are not limited to the ones we selected. Further analysis with more eye movement metrics (such as mean fixation duration), and perhaps new grouping methods, may improve the accuracy of the model. Future work will also include the analysis of gender effects and the possible effects of the different courses participants have taken, which could be examined using data from the participants' entry forms. Additionally, modelling each task with multiple (for example, 3-5) stimuli would increase consistency and confidence. We intend to address this in future experiments, along with validating the model with more stimuli.

Conflicts of Interest:
The authors declare no conflicts of interest.

Acknowledgments: The authors would like to thank all the reviewers for their helpful comments and suggestions.
