Extracting Spatial Patterns of Intercity Tourist Movements from Online Travel Blogs

Spatial patterns of tourist mobility are important for tourism management and planning. The large volume of traveler-generated content accumulated on the internet provides a unique opportunity to reveal comprehensive spatial patterns of tourist movements. Instead of concentrating on a single city or attraction, as previous research has done, this work investigates intercity travel flows extracted from online travel blogs in China from 2012 to 2016. The descriptive statistics of the travel flows are analyzed first, and the distribution of travel volume is found to follow a power law. Based on the intercity travel flows, a network is then constructed to investigate tourism interactions between cities. After four communities and 14 sub-communities are detected in the network, a tourism spatial layout with regional agglomeration effects is recognized. Based on a spatial interaction model, this research concludes that distance is essential in determining tourist movements: intercity travel flows decline with distance following a power-law function. These results reveal the spatial patterns of tourist movements at an intercity scale. They will be helpful for arranging tourism resources, predicting tourist flows, and maintaining sustainable tourism.


Introduction
The global tourism market has been booming in recent years. The consequent impact on the environment has drawn attention to sustainable tourism. Tourism mobility is an important component of sustainable strategies for tourism [1,2]. Tourists' destination choices, trips, and transportation are essential considerations for effective policy interventions that address the impacts of tourism on the environment and destinations [3,4]. Supported by emerging information technologies, people enjoy sharing their travel experiences on tourism websites. The many high-quality travel blogs and reviews that have accumulated online provide new approaches and abundant resources for investigating tourism mobility. By extracting information from tourists' experiences, their travel flows can be reconstructed. Tourist movements are not only a direct manifestation of tourists' behavior and perception but also reflect the interaction between travelers and attractions. An in-depth understanding of the inherent mechanisms and spatial patterns will benefit tourism planning and sustainable development.
Tourist movements have long been a subject of research. The concept of tourism flow was first proposed by Mercer [5] and Rajotte [6], who concluded that the spatial scale of tourism flows was related to the leisure time of tourists. Distance has been found to be another essential factor in determining travel activities. Using questionnaires completed by self-driving travelers in Nanning, Liu et al. found that the spatial characteristics of the travel flows were consistent with the general distance decay pattern, although some highly attractive destinations caused irregular deviations [7]. Tourist experience is determined by various factors [8]. Factors that affect the selection of tourist destinations, such as travelers' age, cognitive psychology, price, spatial configuration, and urban soundscape, have also been investigated in detail [9][10][11][12][13][14][15].
Benefiting from big data in recent years, researchers have gained deeper insight into the spatial activities of tourists. GPS traces, transit smart card data, social media, and other online content generated by tourists have become the most popular and convenient resources. By integrating GPS technologies and surveys, movement patterns of various tourists were found within a city or around a tourist attraction [16][17][18]. Mobile phone positioning data showed that the distribution of tourist volume followed a distance attenuation effect [19]. Survey questionnaires and travelers' active participation (in-vehicle GPS data and transit smart card data) have improved the precision of travel models [20]. Social media, tourism websites, and other online platforms provide numerous opportunities for tourists to present their travel experiences. People also form expectations about a tourism destination by trusting the social media content they receive [21]. This online user-generated content (UGC) contains rich information for discovering tourist movement patterns. Wise et al. assessed UGC using an interpretative framework in a case where the Facebook page "See You in Iran" was used to promote tourism in Iran [22]. Hu et al. crawled tourists' geo-tagged tweets and then applied DBSCAN and network analysis methods to detect tourist movement patterns in New York [23]. Ahani et al. developed a method to predict spa hotel segmentation and travel choice by applying machine learning approaches to online reviews and ratings [24]. Using information collected from an open tourism web service, the temporal heterogeneity of intracity tourist movements was explored, and the power law of distance decay in tourist mobility was confirmed in Nanjing, China [25]. Posts on the NAVER blog in Korea showed that London, Paris, Venezia, and Firenze were key cities through which Korean backpackers tended to enter Europe [26].
Tourist trajectories were extracted from geo-tagged Flickr photos, and motifs of tourist mobility patterns were then detected [27]. Wu et al. introduced a tourism hotspot network approach to investigate travel patterns from social media data for tourism resource planning [28]. Moreover, travel topics and sentiments were discovered from online traveler-generated content, such as geo-tagged Flickr photos [29,30], tweets [31], and multi-source travelogues [32]. Based on these various data sources, travelers' profiles and movement patterns were depicted, tourism market segmentation and travel choices were predicted, and tourism was ultimately promoted. Multi-source data thus strongly support tourism research and applications.
Most previous works focused on individual travel mobility at small scales for attraction descriptions or tourism recommendations, and the investigation of tourist movements has mainly been confined to a single city or attraction. However, tourists frequently travel between cities within their own country. They travel from the city where they live to the destination city, forming a flow between the two cities. Intercity tourism is therefore more valuable for the national tourism market. Furthermore, collective intercity travel implies specific tourist mobility at a large spatial scale, which differs from intracity movements. Intercity tourism involves more long-distance transportation, accommodation, entertainment, shopping, and so on, all of which are primary elements of sustainable tourism. The characteristics of the departure and destination cities, travel modes, and travel distances affect the activities of tourists, which in turn affect the local environment and economy. Revealing the spatial arrangement and patterns of intercity travel will provide local and national governments with necessary insights into tourism mobility. It can help planners assess the attractiveness and carrying capacity of tourism destinations and optimize tourism policies based on collective travel movements. Transportation can also be improved to reduce environmental impacts according to tourist distributions and travel distances, thereby supporting sustainable development.
Under the framework of social sensing [33], online travelogues are utilized to extract intercity tourism flows. This tourist-generated content provides a new perspective for investigating tourist spatial behaviors, which differ from those found in other data sources. Statistical physics, complex network approaches, and spatial interaction theory are integrated to provide the theoretical basis and methods for investigating the spatial structures, interactions, and patterns of travel mobility [34][35][36]. By exploring intercity tourism flows, city-level movement patterns of tourists are revealed from multiple perspectives, and the community structure of tourist cities is discovered at multiple scales. This knowledge can contribute to the arrangement of tourism resources, the prediction of tourist flows, and tourism recommendation.

Data
Baidu Travel (https://lvyou.baidu.com/) is one of the most popular tourism websites in China for sharing travel experiences through blogs. For each trip, the origin city and the destination city described in a blog can be extracted as a triple (user, origin city, destination city). Each travel entry represents one movement from one city to another. Domestic tourism differs from overseas tourism in its motives and travel modes. To simplify the problem, only travel within mainland China was considered, and travel entries whose departures or destinations lie outside the research area were filtered out. In total, 1,105,928 travel entries from 72,999 users between 2012 and 2016 were collected. The valid travel flows between cities in mainland China are represented as bright red lines in Figure 1. The brighter the color at a location, the more flows go from and to the corresponding city. Note that the number of users cannot exactly represent the tourist distribution in cities because users are unevenly distributed across cities. Penetration, which was proposed for human mobility on Twitter [37], was employed to measure the representativeness of users in a city. The travel penetration was defined as the ratio of the number of users to the total population of a city. As expected, the more developed the economy of a city, the greater its penetration, which is similar to the diffusion of social media such as Weibo and Twitter. To ensure representativeness, cities with fewer than 20 users or less than 0.001% penetration were filtered out. Finally, 259 cities were retained.
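The representativeness filter described above can be sketched as follows. This is a minimal illustration rather than the authors' pipeline: the function name `filter_cities`, the dictionary inputs, and the toy numbers are all hypothetical, and penetration is taken as users divided by population, so that 0.001% corresponds to a fraction of 1e-5.

```python
def filter_cities(users_by_city, population_by_city,
                  min_users=20, min_penetration=1e-5):
    """Keep cities with enough users and enough penetration.

    Penetration is users / population; 0.001% equals 1e-5 as a fraction.
    Returns {city: penetration} for the retained cities.
    """
    kept = {}
    for city, users in users_by_city.items():
        population = population_by_city.get(city)
        if not population:  # skip cities with unknown or zero population
            continue
        penetration = users / population
        if users >= min_users and penetration >= min_penetration:
            kept[city] = penetration
    return kept


# Hypothetical toy numbers, not taken from the paper's data.
users = {"A": 500, "B": 15, "C": 30}
population = {"A": 2_000_000, "B": 100_000, "C": 9_000_000}
print(filter_cities(users, population))
```

In this toy input, city B is dropped for having fewer than 20 users and city C for falling below the penetration threshold, so only city A is retained.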

Methods
Cities are connected by a large number of travel flows to form a network structure. This structure contains rich information about the relationship between cities, revealing the topological property and spatial arrangement of tourism movements. At the same time, the flows and structures are distributed in space, so spatial effects, especially distance, have impacts on them. Therefore, quantitative, structural and spatial factors were the three primary aspects of investigating tourism movements. Statistical methods were first utilized to explore the distributions and disparities of tourists, travel inflows and outflows. Then, a complex network structure of intercity travels was constructed based on the travel origin and destination pairs to discover the topological and structural properties of flow patterns. Finally, the distance decay of intercity tourism interaction was modeled to investigate the spatial effects on tourist mobility.

Statistical Analysis
Statistical analysis was used to investigate the distributions of travels described in travel blogs, thus discovering the macroscopic patterns of collective tourist mobility between cities. The correlation between the effective users and the rank of a city was calculated by Equation (1) using log-log transformations:

$$\ln P_r = \ln P_1 - q \ln r \tag{1}$$

where r represents the volume rank of a city, $P_r$ is the user volume of the r-th city, $P_1$ is the theoretical value for the top-ranked city, and q represents the rate at which a city's user volume declines as its rank increases. In the same way, the accumulated travels of all users in a city were fitted against its rank by log-log transformations. Equation (1) tests whether the relationship satisfies the rank-size law, which is a power-law distribution. It characterizes the disparity whereby a small number of cities contribute or attract most tourists while most cities have only a small number of tourists. Additionally, the average gyration radius of all users in each city was calculated to measure the collective travel pattern. The gyration radius indicates the average movement extent of all travelers from a city. The larger the gyration radius, the longer the preferred travel distances.
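The rank-size fit can be illustrated with a short least-squares sketch. It assumes the rank-size form ln P_r = ln P_1 − q ln r, so that a positive q means decline with rank; the sign convention, the function name `fit_rank_size`, and the synthetic volumes are assumptions made for illustration, not values from the paper.

```python
import math


def fit_rank_size(volumes):
    """Fit ln P_r = ln P_1 - q ln r by ordinary least squares.

    volumes: positive numbers in any order (they are ranked internally).
    Returns (P_1, q, r_squared).
    """
    ranked = sorted(volumes, reverse=True)
    xs = [math.log(r) for r in range(1, len(ranked) + 1)]
    ys = [math.log(p) for p in ranked]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    syy = sum((y - mean_y) ** 2 for y in ys)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy * sxy) / (sxx * syy)
    # The slope of the log-log line is -q under the assumed convention.
    return math.exp(intercept), -slope, r_squared


# Synthetic volumes that follow P_r = 8000 * r**-0.9 exactly (illustrative only).
volumes = [8000 * r ** -0.9 for r in range(1, 101)]
p1, q, r2 = fit_rank_size(volumes)
print(p1, q, r2)
```

On real data the points scatter around the fitted line, and the coefficient of determination then indicates how well the rank-size law describes the distribution.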

Complex Network Analysis
Complex network analysis was used to analyze human mobility or behavior to reveal spatial structures or interactions. Two cities are considered to interact with each other if users travel from one to the other. All cities are connected by travel flows to construct a network structure in which a vertex denotes a city and an edge denotes the interaction relationship. Then, the intercity interaction characteristics were investigated by complex network methods.
Generally, a city interaction network of tourism is an undirected weighted graph represented by $G = (V, E, W)$. Each edge in $E$ connects two vertices $V_i$ and $V_j$ in $V$, and $W_{ij} \in W$ is the weight of that edge. $W_{ij}$ is defined by Equation (2), where $F_{ij}$ represents the travel flows from city i to city j. The number of vertices is $N = |V|$ and the number of edges is $M = |E|$. The structural characteristics of the city interaction network were evaluated by statistical parameters including centrality, small-world property, degree distribution, and assortativity. The degree distribution can be described by the distribution function $P(k)$, which is the probability that the degree of a random node is exactly k. In the proposed network of intercity travel flows, however, the degree spectrum is discrete, with $P(k) = 0$ for some values of k, so the cumulative degree distribution function [38], which is the probability that a vertex has a degree no less than k, is employed instead, as in Equation (3):

$$P_c(k) = \sum_{k' \geq k} P(k') \tag{3}$$

If the degree distribution satisfies the power law $P(k) \propto k^{-\gamma}$, the cumulative distribution follows a power law with an exponent of $\gamma - 1$. If the degree distribution satisfies an exponential distribution, $P(k) \propto e^{-\gamma k}$, where $\gamma > 0$ is a constant, the cumulative distribution also satisfies an exponential distribution with the same exponent.
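A small sketch may help make the network construction and the cumulative degree distribution concrete. It treats the network as undirected and, as a labeled assumption, sums the flows in both directions to obtain the edge weight; the exact definition in Equation (2) may differ. The function names and the toy flows are hypothetical.

```python
from collections import Counter, defaultdict


def build_network(flows):
    """Build an undirected weighted adjacency map from (origin, dest, count)."""
    weights = defaultdict(float)
    for origin, dest, count in flows:
        if origin == dest:
            continue  # ignore trips that stay within one city
        edge = tuple(sorted((origin, dest)))
        weights[edge] += count  # assumption: weight sums flows in both directions
    adjacency = defaultdict(dict)
    for (a, b), w in weights.items():
        adjacency[a][b] = w
        adjacency[b][a] = w
    return dict(adjacency)


def cumulative_degree_distribution(adjacency):
    """Return {k: share of vertices whose degree is at least k}."""
    degrees = [len(nbrs) for nbrs in adjacency.values()]
    n = len(degrees)
    counts = Counter(degrees)
    result = {}
    running = 0
    for k in sorted(counts, reverse=True):
        running += counts[k]
        result[k] = running / n
    return result


# Hypothetical toy flows, not taken from the paper's data.
flows = [("Beijing", "Shanghai", 5), ("Shanghai", "Beijing", 3),
         ("Beijing", "Xi'an", 2), ("Beijing", "Guilin", 4)]
network = build_network(flows)
print(cumulative_degree_distribution(network))
```

Only degrees that actually occur appear as keys, which mirrors the motivation for using the cumulative form when the degree spectrum has gaps.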
Network assortativity describes the tendency of nodes in a network to connect to nodes with similar degrees. It is measured by the assortativity coefficient $\Gamma$ and by $k_{nn}(k)$. The assortativity coefficient $\Gamma$ is given by Equation (4), where j and k represent the degrees of the nodes at the two ends of an edge. The range of $\Gamma$ is [−1, 1]. If $\Gamma > 0$, the network is assortative, i.e., nodes with larger degrees tend to connect to other nodes with larger degrees. Conversely, if $\Gamma < 0$, the network is disassortative, indicating that nodes with larger degrees tend to connect to nodes with smaller degrees. $k_{nn}(k)$ is the average degree of the neighbors of nodes with degree k, as calculated by Equation (5). If $k_{nn}(k) \propto k^{-\mu}$ with $\mu > 0$ holds for any k, the network is disassortative and $\mu$ is the disassortativity index. The disassortativity of a network can thus be verified by both $\Gamma$ and $\mu$.
The city interaction network was partitioned by a community detection method to find clustering structures. Specifically, the spin-glass model based on modularity optimization [39] was employed to investigate the community structure of the city interaction network.
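The two assortativity measures can be sketched in a few lines. The sketch assumes that Equation (4) is the usual Pearson correlation between the degrees at the two ends of each edge and that Equation (5) is the mean neighbor degree averaged over all nodes of degree k; both readings are assumptions based on the verbal definitions, and the function names are hypothetical.

```python
from collections import defaultdict


def degree_assortativity(adjacency):
    """Pearson correlation of the degrees at the two ends of each edge.

    Undefined (division by zero) when every node has the same degree.
    """
    degree = {v: len(nbrs) for v, nbrs in adjacency.items()}
    # Each undirected edge is listed in both directions, which makes the
    # two end-point degree samples symmetric.
    pairs = [(degree[u], degree[v]) for u in adjacency for v in adjacency[u]]
    m = len(pairs)
    mean = sum(j for j, _ in pairs) / m
    cov = sum((j - mean) * (k - mean) for j, k in pairs) / m
    var = sum((j - mean) ** 2 for j, _ in pairs) / m
    return cov / var


def average_neighbor_degree(adjacency):
    """Return {k: mean neighbor degree over all nodes of degree k}."""
    degree = {v: len(nbrs) for v, nbrs in adjacency.items()}
    per_k = defaultdict(list)
    for v, nbrs in adjacency.items():
        if nbrs:
            per_k[degree[v]].append(sum(degree[u] for u in nbrs) / len(nbrs))
    return {k: sum(vals) / len(vals) for k, vals in per_k.items()}


# A star graph is the extreme disassortative case: one hub, three leaves.
star = {"hub": {"a": 1, "b": 1, "c": 1},
        "a": {"hub": 1}, "b": {"hub": 1}, "c": {"hub": 1}}
print(degree_assortativity(star))
print(average_neighbor_degree(star))
```

In the star graph every edge joins the highest-degree node to a lowest-degree node, so the coefficient reaches its lower bound and the mean neighbor degree falls as k rises.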

Spatial Interaction Model
Intercity interactions are described by the travel flows of tourists, in which some cities act primarily as sources, with more outflows and fewer inflows, and others act as sinks, with more inflows and fewer outflows. The interactions reveal not only the travel patterns of each city but also the relations between cities. The former were investigated through the incoming or outgoing flows of a city as a travel destination or origin, respectively. The latter were evaluated by an interaction model that measures distance decay.
Intercity travel patterns were discovered from the similarity of travel flows between cities. Let $F_{ij}$ denote the frequency of travels from city i to city j, and let N be the number of cities. The outflows of city i are then represented as the vector $outflow_i = \langle F_{i1}, F_{i2}, \ldots, F_{iN} \rangle$, and its inflows as $inflow_i = \langle F_{1i}, F_{2i}, \ldots, F_{Ni} \rangle$. The spatial distributions of the outflows or inflows indicate the choice patterns of tourist destinations from a city or of tourist origins to a city, respectively. Pearson's correlation coefficient was employed to measure the tourist similarity of two cities, as in Equation (6):

$$R_{ij} = \frac{\sum_{t=1}^{N} (F_{it} - \bar{F}_i)(F_{jt} - \bar{F}_j)}{\sqrt{\sum_{t=1}^{N} (F_{it} - \bar{F}_i)^2}\,\sqrt{\sum_{t=1}^{N} (F_{jt} - \bar{F}_j)^2}} \tag{6}$$

where $R_{ij}$ is the correlation coefficient between city i and city j, $F_{it}$ is the flow from i to t, and $\bar{F}_i$ is the average flow of city i. From the perspective of a tourism source, $R_{ij}$ measures the similarity of the destination choices of travelers from the two cities if both $\bar{F}_i$ and $\bar{F}_j$ are average outflows. From the perspective of a sink, in contrast, $R_{ij}$ measures the similarity of the tourist origins of the two cities if both $\bar{F}_i$ and $\bar{F}_j$ are average inflows. Based on this similarity measurement, all cities were clustered by a hierarchical method to discover the spatial patterns of the tourist flow distributions.
The intercity interaction of tourist mobility was modeled by the gravity model [40]. Constrained by the distance decay effect, the spatial interaction intensity of two cities is generally negatively correlated with the distance between them. A negative power-law function was employed as the distance decay function to fit the following gravity model:

$$F_{ij} = k \frac{p_i^{\alpha} p_j^{\beta}}{d_{ij}^{\gamma}} \tag{7}$$

where $F_{ij}$ represents the flow frequency from city i to city j, $p_i$ represents the number of users traveling from city i, $p_j$ represents the number of users traveling to city j, $d_{ij}$ is the geographic distance between the two cities, and k is a constant factor. $\alpha$, $\beta$ and $\gamma$ are parameters to be estimated, of which $\gamma$ is the distance friction coefficient that evaluates the impact of distance on flows.
This model was solved by linear regression of the inverse gravity model. Equation (7) is transformed to Equation (8) by logarithmic transformation:

$$\ln F_{ij} = \ln k + \alpha \ln p_i + \beta \ln p_j - \gamma \ln d_{ij} \tag{8}$$

Finally, the optimal values of the parameters $\alpha$, $\beta$ and $\gamma$ were obtained by the least squares method.
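The log-linear estimation can be sketched with plain least squares on the normal equations. This is a minimal sketch under stated assumptions, not the authors' procedure: it assumes the power-law gravity form F_ij = k p_i^α p_j^β / d_ij^γ, strictly positive inputs, and no weighting or zero-flow handling. The function names and the synthetic records are hypothetical.

```python
import math


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_gravity(records):
    """Estimate (k, alpha, beta, gamma) from (F_ij, p_i, p_j, d_ij) records.

    Fits ln F = ln k + alpha ln p_i + beta ln p_j - gamma ln d by least squares.
    """
    records = list(records)
    rows = [[1.0, math.log(p_i), math.log(p_j), -math.log(d)]
            for _, p_i, p_j, d in records]
    ys = [math.log(f) for f, _, _, _ in records]
    # Normal equations: (X^T X) theta = X^T y
    xtx = [[sum(r[a] * r[b] for r in rows) for b in range(4)] for a in range(4)]
    xty = [sum(r[a] * y for r, y in zip(rows, ys)) for a in range(4)]
    ln_k, alpha, beta, gamma = solve_linear_system(xtx, xty)
    return math.exp(ln_k), alpha, beta, gamma


# Synthetic records generated from known parameters (illustrative only).
records = [(2.0 * p_i ** 0.7 * p_j ** 0.3 / d ** 0.5, p_i, p_j, d)
           for p_i in (100, 1000, 5000)
           for p_j in (200, 800, 3000)
           for d in (50, 300, 1200)]
print(fit_gravity(records))
```

Because the synthetic flows are generated without noise, the fit recovers the generating parameters up to rounding. With real flows, city pairs that have zero observed flow cannot be log-transformed and would need separate treatment.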

Tourist Distributions
By fitting the effective tourist volumes of cities to Equation (1), it was found that user volumes followed a power-law distribution against their rank with $P_1$ = 8259 and q = −0.84, as shown in Figure 2a. The numbers of users in different cities differed significantly: a small number of cities concentrated most of the travelers, while most cities accounted for a small proportion of travelers. The first-ranked city was Beijing with 8143 users, which was very close to the estimated $P_1$. The $R^2$ reached 0.981 at the 0.01 significance level, indicating that the fit explained almost all the variance of the traveler counts across cities. The spatial disparity of tourists was therefore significant. Next, the total travels of all users in each city were accumulated, and the travel volumes and ranks of the cities were fitted by Equation (1) after log-log transformations in the same way, as shown in Figure 2b. $P_1$ and q were estimated as 168,700 and 0.93, respectively, with $R^2$ = 0.987 at the 0.01 significance level. The travel volume of a city thus presented a negative power-law decay with its rank, and the spatial disparity of travel volumes was significant as well.

Flow Distributions
To reveal the tourism sources and sinks, all travel inflows and outflows extracted from the travel blogs were quantified for every city. The top 20 cities are shown in Figure 3. Cities with large outflows indicated tourism sources, and cities with large inflows were usually sinks. Among the top 20 cities by outflow, most were provincial capitals or cities in eastern China with developed economies, for example, Beijing, Shanghai, Guangzhou, and Xi'an. Residents of these cities had relatively higher incomes, which led to more tourists. Because social networks were more mature there, people were also more willing to share their travel experiences on the Internet. The top 30 travel destinations from four representative tourism source cities are shown in Figure 4. Most travelers clearly preferred big tourism cities or surrounding cities. Short-distance travel was favored as well when time and cost were taken into account. Among the top 30 cities by inflow, many well-known tourism cities ranked relatively high, for instance, Guilin, Sanya, Aba Prefecture, and Qingdao. Rich tourist resources in these cities attracted large numbers of visitors from all over the country, thus forming representative tourist sinks. The top origin cities of travelers to four tourist-sink cities are shown in Figure 5. Distance was less of a constraint on inflows than on outflows.

Interaction Network Features
The city interaction network of tourism was constructed as shown in Figure 6. The network contains 259 city nodes and 9283 interaction edges. The colors of the edges are the same as the colors of the departure cities. Cities with more connections are closer to the center, while cities with fewer connections are more likely to be on the periphery. Table 1 shows the basic statistics of the network.
The average degree of nodes in the network was 71.7, which represented the average number of interactions per city. The diameter of the network was 2. Almost all cities were connected to Beijing, which was the main hub city. Consequently, any two cities could be associated indirectly through Beijing. Centrality indicates the importance and influence of a city in the tourism network. A city's prominence, transit capacity, and accessibility are measured by degree centrality, betweenness centrality, and closeness centrality, respectively. The top 20 cities by degree centrality, betweenness centrality, and closeness centrality in the city interaction network of tourism are shown in Table 2. The top 20 cities were mostly first- or second-tier cities in China, including the national capital (Beijing), municipalities directly under the central government (e.g., Shanghai, Shenzhen and Chongqing), and provincial capitals (e.g., Guangzhou, Chengdu and Wuhan) with developed economies. The correlations between these indicators and the ranks of all cities are shown in Figure 7. They all follow exponential distributions, which are plotted as the red lines. The $R^2$ values, which equal 0.97, 0.99 and 0.97 at the 0.01 significance level, indicate that the correlations are strong and significant. The degree centrality decayed exponentially with rank. Beijing ranked first, with a normalized degree centrality of 1. All other cities were connected to Beijing directly, which was also the reason that the diameter of the network was 2. Forty-four cities had values greater than 0.5, which meant that these cities interacted directly with more than half of the nodes in the network. Cities with a better economy and a developed tourism industry had higher degree centrality. The betweenness centrality followed an exponential decay with a heavy-tailed distribution. It was positively related to degree centrality.
The descending trend of the closeness centrality satisfied the exponential decay. Cities with a more developed economy and tourism industry had relatively higher closeness centralities as well.

Small-World Property
The small-world property measures the interconnectivity between cities. The average shortest path length of the interaction network was 1.72, which was very close to the 1.71 of the random network. The clustering coefficient was 0.812, which was much higher than the 0.277 of the random network. Therefore, the city interaction network of tourism had small-world properties and was relatively compact. A city with more tourism resources would drive surrounding small cities to form a large tourism region.
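The two small-world indicators can be computed on an unweighted adjacency structure as sketched below. The sketch assumes a connected, undirected network and uses the mean of the local clustering coefficients; whether the reported coefficient is this average or a global transitivity measure is not stated, so that choice is an assumption. The function names and the toy graph are hypothetical.

```python
from collections import deque


def average_shortest_path_length(adjacency):
    """Mean hop distance over all pairs of mutually reachable vertices."""
    total, pairs = 0, 0
    for source in adjacency:
        dist = {source: 0}
        queue = deque([source])
        while queue:  # breadth-first search from the source
            u = queue.popleft()
            for v in adjacency[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs


def average_clustering(adjacency):
    """Mean local clustering coefficient, ignoring edge weights."""
    coeffs = []
    for nbrs in adjacency.values():
        nbrs = list(nbrs)
        k = len(nbrs)
        if k < 2:
            coeffs.append(0.0)  # convention: no triangles possible
            continue
        links = sum(1 for a in range(k) for b in range(a + 1, k)
                    if nbrs[b] in adjacency[nbrs[a]])
        coeffs.append(2 * links / (k * (k - 1)))
    return sum(coeffs) / len(coeffs)


# Toy graph: a triangle A-B-C with one pendant vertex D attached to C.
toy = {"A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B", "D"}, "D": {"C"}}
print(average_shortest_path_length(toy), average_clustering(toy))
```

A small-world network combines a path length close to that of a comparable random graph with a much higher clustering coefficient, which is the comparison made in the paragraph above.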

Degree Distribution
Degree distribution measures the disparity of the connections between cities. The degree of the interaction network followed an exponential distribution, as shown in Figure 8a. The disparity of travel flows was obvious, although it was weaker than under the power-law distribution that is more common in complex networks. A small number of famous cities attracted many tourists from most cities, while most cities only had tourists from a few cities. Fewer than 13% of cities interacted with more than half of the other cities.

Disassortativity
Disassortativity indicates the connectivity between dissimilar cities. The correlation between k and $k_{nn}(k)$ is plotted in Figure 8b. The network was disassortative because $k_{nn}(k)$ followed a power-law decay, i.e., $k_{nn}(k) \propto k^{-\mu}$ with the disassortativity index $\mu$ = 0.297. As a result, cities with high degrees were intermediary hub nodes, and they were dispersed in the network rather than clustered together. Conversely, cities with low degrees were usually at the periphery of the network. In addition, the assortativity coefficient $\Gamma$ was −0.43. The negative coefficient verified the disassortativity of the network. Cities with high degrees tended to connect to cities with low degrees, and vice versa. Therefore, travel was more popular between big cities and small cities. Short tours around cities were the main choice on short holidays because of their convenience and the limited time available. Cities with more tourism resources promoted the tourism of the adjacent small cities.

Community
Modularity is designed to measure the strength of the division of a network into communities. Higher modularity represents closer connections between nodes within a community and sparser connections between communities. By iteratively varying the number of communities in the spin-glass model, the modularity of the city interaction network was found to peak at four communities, so the network was partitioned into four communities as shown in Figure 9a. There were more travel flows between cities inside a community.
The four communities formed four significantly clustered regions in mainland China. Community 1 was the northeastern region centered on Beijing, mainly including the Beijing-Tianjin-Hebei-Shanxi-Inner Mongolia region, the three northeastern provinces, and the Shandong Peninsula. Community 2 was the eastern region centered on Shanghai, mainly including Jiangsu, Zhejiang, Shanghai, and Fujian. Community 3 was the southern region, mainly consisting of Hunan, Hubei, Guangdong, Guangxi, and Hainan. Community 4 was the western region, including Xinjiang, Qinghai, and some northwestern and southwestern cities. To reveal more details, the network was further partitioned into 14 sub-communities based on the previous four communities, as shown in Figure 9b. For example, community 1 was divided into three sub-communities: the Shandong Peninsula, the three northeastern provinces, and Beijing-Tianjin-Hebei-Shanxi-Inner Mongolia. Each sub-community contained a core city with higher degree centrality, betweenness centrality, and closeness centrality. The core city drove the tourism development of its adjacent cities.
Regional boundaries of the communities were consistent with the administrative boundaries. For example, the Shandong Peninsula region was completely consistent with the provincial boundary of Shandong Province. This implied that provincial administrations had a great influence on tourism arrangement. As the city interaction network of tourism was a small-world network, cities within a community had more travel flows between each other, while fewer travel flows occurred between communities. In effect, the communities corresponded to the tourism regions that resulted from the small-world property of the network.
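To make the modularity criterion concrete, the following sketch evaluates Newman's modularity Q for a given partition of an undirected weighted graph. It only scores a partition; it does not perform the spin-glass optimization itself, and the function name and toy graph are hypothetical.

```python
def modularity(adjacency, community):
    """Newman modularity Q of a partition of an undirected weighted graph.

    adjacency: {u: {v: weight}} with symmetric entries.
    community: {vertex: community label}.
    """
    # Sum over all directed entries equals twice the total edge weight.
    two_m = sum(w for nbrs in adjacency.values() for w in nbrs.values())
    internal = {}  # twice the internal edge weight of each community
    strength = {}  # total weighted degree of each community
    for u, nbrs in adjacency.items():
        cu = community[u]
        for v, w in nbrs.items():
            strength[cu] = strength.get(cu, 0.0) + w
            if community[v] == cu:
                internal[cu] = internal.get(cu, 0.0) + w
    return sum(internal.get(c, 0.0) / two_m - (s / two_m) ** 2
               for c, s in strength.items())


# Toy graph: two triangles (1-2-3 and 4-5-6) joined by the single edge 3-4.
graph = {1: {2: 1, 3: 1}, 2: {1: 1, 3: 1}, 3: {1: 1, 2: 1, 4: 1},
         4: {3: 1, 5: 1, 6: 1}, 5: {4: 1, 6: 1}, 6: {4: 1, 5: 1}}
split = {1: "a", 2: "a", 3: "a", 4: "b", 5: "b", 6: "b"}
merged = {v: "all" for v in graph}
print(modularity(graph, split), modularity(graph, merged))
```

Splitting the toy graph at the bridge scores higher than keeping every vertex in one community, which is the kind of comparison used when choosing the number of communities.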

Similarity of Travel Flows
Travel flows of a city can reveal its tourism patterns. The outflows indicate a source city and its choices of tourist destination. Similarly, the inflows manifest a sink city and its tourist origins. Through the outflow and inflow vectors of cities, tourism choices about destinations or origins were discovered.
For the tourism sources, the similarity between a given city and the others was measured by Pearson's correlation coefficients of their outflows. Four representative source cities, Beijing, Shanghai, Guangzhou, and Xi'an, were selected to illustrate the spatial distribution of outflow similarities, as shown in Figure 10. In each map, the green point represents the location of the specific city, and the gradient colors denote the similarities between this city and the corresponding regions. Cities that are closer to each other usually exhibit higher travel similarity, i.e., more similar travel destinations.
Hierarchical clustering was employed to investigate the global travel similarities of all cities. In the hierarchical clustering dendrogram, several cutting levels were chosen manually from the bottom to the top according to its tree structure. After trial and error to obtain good cluster separation and visualization, four cluster numbers, 29, 14, 7 and 2, were selected to explore spatial aggregation trends among nearby cities. The corresponding four-level hierarchical results are shown in Figure 11. The results exhibited the similarity of the travel destination choices of all cities. In the process of bottom-up aggregation, cities that were closer to each other tended to merge into one region. In the end, two regions, the north and the south, separated by the Yangtze River, were distinguished.
In the same way, the similarities of inflows, i.e., of tourist origins, were measured to investigate the sink cities. Four famous tourism cities, Aba, Chengdu, Dali, and Guilin, were selected to plot the spatial distribution of inflow similarities in Figure 12, in which the green points represent the four cities. The clustering results are shown in Figure 13. It was found that distance had less effect on tourist origins than on destination choices. Even though the trend of geographical aggregation still existed, similar cities usually attracted similar tourists. For example, developed cities and tourism cities were usually clustered into their respective groups.
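The outflow similarity between two cities can be sketched as a Pearson correlation over their outflow vectors. The flow dictionary, city list, and function names below are hypothetical, and treating a constant vector as uncorrelated is an assumption made to avoid division by zero. Whether self-flows are excluded from the vectors is not specified, so all cities are included here.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # assumption: a constant vector is treated as uncorrelated
    return sxy / math.sqrt(sxx * syy)


def outflow_similarity(flow, cities, i, j):
    """Similarity of destination choices between source cities i and j.

    flow: {(origin, destination): count}; missing pairs count as zero.
    """
    out_i = [flow.get((i, t), 0) for t in cities]
    out_j = [flow.get((j, t), 0) for t in cities]
    return pearson(out_i, out_j)


# Hypothetical toy flows, not taken from the paper's data.
cities = ["A", "B", "C", "D"]
flow = {("A", "B"): 1, ("A", "C"): 10, ("A", "D"): 2,
        ("B", "A"): 1, ("B", "C"): 20, ("B", "D"): 4}
print(outflow_similarity(flow, cities, "A", "B"))
```

Swapping the key order to (t, i) and (t, j) would give the inflow similarity used for sink cities.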

Distance Decay Effect
Travel frequency declined with travel distance following a negative power law, as shown in Figure 14a, with exponent γ = 0.585. The correlation between the travel frequency and the gyration radius of travelers is shown in Figure 14b. It did not decline monotonically as expected but was instead divided into two parts. When the gyration radius was less than 800 km, the travel frequency increased linearly. Beyond that, an increase in the gyration radius led to a sharp drop in the travel frequency. Fitting the second part separately, the correlation approximately followed a negative power law with exponent γ = 1.60. This exponent is in a similar range to those obtained from other mobility datasets that also obey negative power-law distributions, such as cell phone call records (γ = 1.75) [41], the dispersal of bank notes in the United States (γ = 1.59) [42], Foursquare check-ins (γ = 1.88) [43], and geo-located tweets across the world in 2012 (γ = 1.62) [37].
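The gyration radius of a single traveler can be sketched as the root mean square distance of the visited cities from their centroid. The sketch below makes several assumptions: great-circle (haversine) distances on a spherical Earth, a centroid taken as the arithmetic mean of latitude and longitude (a simplification that is reasonable only for regional extents), and hypothetical function names and coordinates.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, spherical approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def gyration_radius_km(points):
    """Root mean square distance of (lat, lon) points from their centroid."""
    n = len(points)
    # Assumption: the centroid is the plain average of the coordinates.
    c_lat = sum(lat for lat, _ in points) / n
    c_lon = sum(lon for _, lon in points) / n
    squared = sum(haversine_km(lat, lon, c_lat, c_lon) ** 2
                  for lat, lon in points)
    return math.sqrt(squared / n)


# Approximate coordinates of three cities visited by one hypothetical user.
visited = [(39.9, 116.4), (31.2, 121.5), (34.3, 108.9)]
print(gyration_radius_km(visited))
```

Averaging this quantity over all users of a city gives a city-level value of the kind plotted against travel frequency.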

Gravity Law
By solving the gravity model defined by Equation (7), the parameters α, β and γ were estimated as 0.716, 0.275 and 0.48, respectively. The tourist volume, in both origin and destination cities, thus had a sublinear correlation with the travel flows, with the origin volume having a stronger influence than the destination volume. The distance friction coefficient was relatively small (γ = 0.48), so tourist movement between cities showed only a slight distance decay effect. The exponent was essentially consistent with that obtained by Xiao et al. (0.4 < γ < 0.6) from air passenger flows in China [44]. An exponent of approximately 0.5 indicated a hybrid tourist pattern of both long-distance travel to specific destinations and short-haul trips with random choices.

Conclusions
Social media and online content generated by travelers provide a good way to investigate tourist activities and experiences. Using travel blogs from online tourism websites, the collective spatial patterns of intercity tourist movement are discovered from multiple perspectives. The rank of travel volume follows a power-law distribution. Developed cities generate more tourists, and more travel flows occur between developed cities. To investigate intercity travel, an interaction network is constructed based on intercity travel flows. The network is found to have an exponential degree distribution, disassortativity, and the small-world property. The spatial arrangement of tourism in mainland China is also recognized after four communities and 14 sub-communities are detected. Intercity tourism presents a regional agglomeration effect because travel flows are denser within a community than between communities. By distinguishing tourist sinks and sources, it is found that tourists from similar cities usually have similar tourism choices. In particular, distance is essential in determining tourist movements: intercity travel flows decline with distance following a negative power law.
These results reveal the spatial patterns of tourist movements at an intercity scale. The spatial disparity of tourist sources is significant in China. Developed cities contribute most of the tourists because of their developed economies and large populations. Tourists from similar cities have similar travel choices, and similar attractions have similar tourist sources. Spatial factors, especially travel distance, have a large impact on tourist mobility. Tourism intentions are strongly regional. The spatial mobility, arrangement, and patterns of tourism discovered here will be helpful for arranging tourism resources, predicting tourist flows, and understanding tourist activities. They can provide a basis for local, regional, and national governments in tourism planning, city management, and sustainable development.
This work can be extended and deepened in several directions. The textual content of online travel blogs is not included in this research, and text mining can be conducted to discover detailed thematic information about tourist attractions. Moreover, the representativeness of travel blogs for general tourist mobility still merits discussion because of their sparsity, incompleteness, and possible bias.

Conflicts of Interest:
The authors declare no conflict of interest.