Exploiting Two-Dimensional Geographical and Synthetic Social Influences for Location Recommendation

Abstract: With the rapid development of location-based social networks (LBSNs), personalized geo-social recommendation has come to play a significant role, because human behaviors exhibit specific distribution patterns. In addition to user preference and social influence, geographical influence has also been widely researched in location recommendation. Kernel density estimation (KDE) is a key method in modeling geographical influence. However, most current studies based on KDE do not consider the problems of influence range and outliers in users’ check-in behaviors. In this paper, we propose a method to exploit geographical and synthetic social influences (GeSSo) on location recommendation. GeSSo uses a kernel estimation approach with a quartic kernel function to model geographical influences, and two kinds of weighted distance are adopted to calculate the bandwidth. Furthermore, we consider the social closeness and connections between friends, and a synthetic friend-based recommendation method is introduced to model social influences. Finally, we adopt a sum framework that combines a user’s preferences for a location with geographical and social influences. Extensive experiments are conducted on three real-life datasets. The results show that our method achieves superior performance compared to other advanced geo-social recommendation techniques.


Introduction
In recent years, wireless communication network technology, handheld devices, and location technology have developed rapidly. Many location-based social networks (LBSNs) services are widely used, such as Foursquare and Gowalla. In LBSNs, when people visit or check in at a place, their locations and check-in information are shared with other people [1]. These historical check-in data in LBSNs contain abundant knowledge about users' interests and locations, which can help people discover places of interest and form new social connections [2][3][4]. Therefore, these check-in data are beneficial in a wide range of applications, such as location recommendation [5,6], event recommendation [7], and friend recommendation [8,9]. Location recommendation based on LBSNs plays an important role in providing better location-based services.
Location recommendation suggests locations that people may be interested in. It provides references for travel and greatly facilitates everyday life. Unlike traditional recommendation systems, location recommendation in LBSNs has several unique characteristics [10], such as geographical features, regional popularity, dynamic user mobility, and implicit user feedback. This is because physical interactions are needed when users visit locations [11]. Furthermore, modern recommendation tasks usually take place in a context-rich environment that includes textual, spatial, and temporal information [5,12]. For example, users are linked to other users via social links, users' mobility is affected by geographical distance, and users share their experiences via traditional social networks (e.g., Microblog, Twitter, and Facebook). Some previous studies [4,13,14] exploit one of the above factors to improve location recommendation. In general, the decision process of a user visiting a location is complex and can be affected by many factors.
Research shows that the geographical information of locations significantly affects the check-in behavior of users. Users tend to explore the periphery of locations they have visited [15]. Thus, if a user visits a certain area most frequently, the check-in probability of an unvisited location in this area is higher, and the probability diminishes as the distance from the area increases. To better explore users' geographical distribution over locations, different geographical models have been proposed to model users' check-in behaviors [7,10,11,15-24]. For example, in [15], a power-law probabilistic model is proposed to capture the geographical influence among points of interest (POIs), and Liu et al. [10] assume that users' check-in behaviors follow a multi-center distribution; both of these studies model the geographical distance distribution. Zhang et al. [16] introduce one-dimensional kernel density estimation to obtain personalized distance distributions. In general, there are two major limitations in these studies. (1) In contrast to two-dimensional geographical models, one-dimensional geographic distance distributions cannot intuitively reflect spatial distributions. (2) The check-in locations of a user are usually distributed across several areas, and the separation between these areas may be large [25], e.g., some people prefer visiting places around their homes while others prefer exploring new interesting places around the world. Therefore, personalized two-dimensional geographical models are more intuitive and reasonable in modeling geographical influence. In recent years, the approaches in [11,18] have extended one-dimensional kernel density estimation to two dimensions. The results show that two-dimensional kernel density estimation (KDE) models perform better in location recommendation.
However, because of data sparsity and outliers, it is difficult to find a suitable bandwidth to fit the distribution for two-dimensional KDE models.
In addition, with the growth of social networks (e.g., Meetup, Twitter, and Facebook), social links have been utilized to improve the quality of recommendations. Users often establish social links and share their experiences. For example, they often visit museums or stores together. This means friends are more likely to share common locations than non-friends, although most friends have little overlap in their check-in locations [5]. Social collaborative filtering (SCF) is used to recommend unvisited locations to a user based on his or her friends' preferences [18]. Ogundele et al. [7] adopted SCF to model the relevance of a group to a user and her friends. Zhang et al. [16,26] transformed the residence distance of users with social friendships into a normalized similarity. Guo et al. [20] adopted user-based collaborative filtering (CF) by regarding a user's friends as neighbors. In general, there are two limitations in these studies. (1) Most of these methods only use part of the information in social networks, such as residence distance or the social connections between friends. (2) They overlook the fact that friends with closer social ties are more likely to trust each other's recommendations. For example, if two users have visited the same location and also have some common friends in the social network, their connection could be strengthened. Therefore, social closeness and social connections should be considered together to achieve higher performance.
In this paper, we explore the geographical and social influences on location recommendation. Specifically, we focus on the two-dimensional geographical influence of locations through capturing the spatial distribution of user preferences based on a kernel density estimation. The proposed method uses a new bandwidth computing method and a quartic kernel function, which can more accurately estimate the probability of a user checking in at a new location. Additionally, we propose a synthetic friend-based recommendation method combining social closeness and social connections between friends in the recommendation process. Moreover, a unified framework is used to combine user's preferences, geographical and social influences. Finally, experimental results on three real-life datasets show that our method achieves superior performance compared to the other recommendation techniques that are evaluated in our experiments.
The rest of this article is structured as follows: In Section 2, related work on location recommendation, particularly in the areas of geographical and social influence, is briefly reviewed. The details of the geographical model, social model, and fusion method are introduced in Section 3. Section 4 gives the experiment settings. In Section 5, we conduct experiments on three real-life datasets and analyze the proposed methods compared with other baseline methods. Finally, we conclude this research in Section 6.

Related Work
In this section, we summarize related work in location recommendation in two categories: geographical influence and social influence. Next, we will present the most relevant work in each category.

Location Recommendation Using Geographical Information.
A unique feature of LBSNs distinguishes location recommendation from traditional recommendation techniques. In general, traditional recommendation techniques have been used for non-spatial items, such as movies, music, foods, or books. However, physical interactions are needed for users to visit locations in LBSNs. At the same time, according to the First Law of Geography, "Everything is related to everything else, but near things are more related than distant things". Therefore, the geographic information (i.e., longitude and latitude) of locations and the geographical proximity between two locations have a significant impact on users' check-in behaviors. Recent studies show that users tend to visit locations close to their homes or offices and may be interested in locations near the visited locations [15].
Since the geographical information of locations significantly affects the check-in behaviors of users, many researchers integrate geographic information into the study of location recommendation. In recent years, the majority of research has applied geographic connection matrixes, geographic distance matrixes, and user location matrixes, and then performed location recommendation by combining matrix decomposition [1,17,27,28] and deep-learning-based models [14,29-32]. The methods above have achieved satisfactory results and become the state of the art. However, they still have some limitations: the geographic matrixes are sparse, and deep learning methods have difficulty modeling the two-dimensional geographic distribution of locations or users directly.
In addition, some geographical analysis models, including KDE [7,11,16-22,33,34], the multi-center Gaussian model (MGM) [10,23], and the power-law distribution (PD) [15,24], have been introduced in location recommendation. These models significantly improve the recommendation quality. MGM and PD methods are parametric estimation techniques. By contrast, non-parametric estimation (i.e., KDE) does not make any assumptions about the implied distribution form, and it learns the distribution form from the data. Zhang et al. [16,19-22] used a one-dimensional KDE (1D-KDE) model for geographical modeling; these methods learn the distance distribution from users' check-in history. Zhang et al. [11,17,33,34] introduced a two-dimensional KDE (2D-KDE) model to determine the check-in probability distribution; 2D-KDE is more intuitive and reasonable than 1D-KDE. Furthermore, Ogundele et al. [7,18] adopted an adaptive kernel estimation method (A-KDE), which uses a personalized bandwidth for each visited location, and the adaptive bandwidth itself is also learned from the underlying check-in data. A-KDE achieves better results than 1D-KDE and 2D-KDE models, but it is time consuming. In general, previous works personalized geographical information by constructing geographical matrixes or geographical models for users or locations. Our proposed model differs from these works, as we focus on leveraging 2D-KDE with a new bandwidth method to model the geographical distribution for better recommendation performance.
Location Recommendation Using Social Information.
Based on the fact that friends are more likely to visit common locations, a user's preference can be influenced by his or her group of friends, who are likely to share some common interests [35,36]. As implicit feedback, social information has been widely used to improve the accuracy of location recommendation. Recently, some studies have obtained user similarity from social relationships between friends and combined it with traditional recommendation technologies, such as memory-based [37,38] or model-based [39] collaborative filtering technologies. Based on these observations, social collaborative filtering (SCF) methods were proposed in [37,40]. The user similarities of user-based CF and item-based CF are derived from user location matrixes. In contrast to these methods, the social similarity of SCF is obtained from social influence among friends.
As a main type of auxiliary information, social information has mainly been used through ensemble methods [41,42] and regularization methods [43] for location recommendation. The common rationale behind these methods is that users' preferences are similar to those of their friends. However, most of these methods only use part of the information in social networks (either social closeness or social connections). Inspired by the above research, we obtained users' similarity from friends' social closeness and social connections, and then integrated the similarity into the unified framework.

Methods
In this section, the proposed method GeSSo (geographical and synthetic social influences) will be introduced in detail. We first summarize the notations in Section 3.1. Then we describe the overview of proposed model in Section 3.2.

Problem Statement
Before we describe the proposed model, the key notation in this article is defined in Table 1. Then, we present some basic definitions, including location, location recommendation, and geographical coordinates.

Notation          Meaning
$U$               Set of users in the LBSN, $\{u_1, u_2, \cdots, u_{|U|}\}$
$L$               Set of POIs in the LBSN, $\{l_1, l_2, \cdots, l_{|L|}\}$
$L_u$             Set of locations that user $u$ visited, $L_u = \{l_1, l_2, \cdots, l_t\} \subset L$
$F_u$             Set of users having social relations with $u$, $F_u = \{u_1, u_2, \cdots, u_n\} \subset U$
$p(l \mid L_u)$   Predicted probability of $u$ visiting $l$ given $L_u$
$r_{u,l}$         Actual rating of user $u$ for the visited location $l$
$\hat{r}_{u,l}$   Predicted rating of user $u$ for the unvisited location $l$
$h$               Bandwidth, i.e., search radius

Definition 1 (Location).
A uniquely identified spatial position, also known as a point of interest. In this paper, we use $l$ to represent a location and $L$ to represent the set of locations. Each location corresponds to a specific place in the real world and has geographical coordinates.

Definition 2 (Location recommendation).
Given a user u, we recommend locations that u has not visited but might be interested in according to the contextual information, such as check-in, social, geographical, temporal, and categorical information.

Definition 3 (Geographical coordinates).
A location is associated with a pair of geographical latitude and longitude coordinates.

Geographical and Synthetic Social Influences (GeSSo) Model
The overview framework of this paper is shown in Figure 1. (1) User preference model (Section 3.2.1). Using a user-based CF method to model users' check-in history, the similarity between two users is computed based on their common locations. (2) Geographical model (Section 3.2.2). We introduce a two-dimensional KDE model that adopts a default bandwidth and a quartic kernel function.
(3) Social model (Section 3.2.3). A social model is built by considering the social connections and closeness between two users. (4) Fusion framework (Section 3.2.4). We adopt a linear fusion framework to integrate user preferences and geographical and social influence.

User Preference Model
As shown in previous works, a user's preference is significant information for enhancing the quality of location recommendation [15,16]. Therefore, we predict the user's preference $p_{Pre}(l_m \mid L_u)$ based on user-based collaborative filtering (UCF) [44], as given by Equations (1) and (2), where $CosSim(u, u')$ is the similarity between users $u$ and $u'$. In our study, we use cosine similarity to measure user similarity.
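The user preference step can be illustrated with a minimal sketch. It assumes the common user-based CF form in which the predicted preference is the similarity-weighted average of other users' check-in frequencies; this form, the function names, and the toy data are illustrative assumptions, not the paper's exact Equations (1) and (2).

```python
import math

def cosine_similarity(ratings_u, ratings_v):
    """Cosine similarity between two check-in vectors (dict: location -> frequency)."""
    common = set(ratings_u) & set(ratings_v)
    dot = sum(ratings_u[l] * ratings_v[l] for l in common)
    norm_u = math.sqrt(sum(r * r for r in ratings_u.values()))
    norm_v = math.sqrt(sum(r * r for r in ratings_v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def predict_preference(user, location, ratings):
    """Similarity-weighted average of other users' ratings (assumed UCF form)."""
    num = 0.0
    den = 0.0
    for other, other_ratings in ratings.items():
        if other == user:
            continue
        sim = cosine_similarity(ratings[user], other_ratings)
        num += sim * other_ratings.get(location, 0.0)
        den += sim
    return num / den if den > 0 else 0.0

# Hypothetical toy data: user -> {location: check-in frequency}
ratings = {
    "u1": {"cafe": 3, "museum": 1},
    "u2": {"cafe": 2, "park": 4},
    "u3": {"museum": 5, "park": 1},
}
score = predict_preference("u1", "park", ratings)
```

Here `score` estimates how strongly `u1` would prefer the unvisited location `park`, given that both similar users have checked in there.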

Geographical Influence Model
Unlike parametric estimation, non-parametric estimation does not make any assumptions about the implied distribution form; instead, it learns the distribution form from the data. Moreover, compared with one-dimensional distance distributions, such as power-law distributions [15,24] and 1D-KDE [16,19], a two-dimensional check-in probability distribution is more intuitive and reasonable. Traditional two-dimensional kernel density estimation methods [11,18] cannot effectively avoid an excessively large bandwidth and are not suitable for addressing the outlier and data sparsity problems. To this end, we introduce a two-dimensional kernel density estimation method (WDQ-KDE) based on a fixed bandwidth, which is calculated from two kinds of weighted distance, i.e., the standard distance and the median distance. In general, the kernel estimation method consists of two steps: the default bandwidth calculation and the kernel estimation of the geographical relevance score.
Step 1: Default bandwidth calculation. Consider a user $u$ and the set of visited locations $L_u$. Each location $l_i = (x_i, y_i)$ consists of longitude $x_i$ and latitude $y_i$. The bandwidth $h$ (i.e., the search radius) is calculated by Equation (3), where $SD$ is the weighted standard distance, $DM$ is the weighted median distance, $w_i$ represents the check-in frequency of user $u$ at location $l_i$, and $\min(\cdot)$ returns the minimal value among a list of numbers. Specifically, $SD$ reflects the dispersion of the locations relative to the center. The computation of the standard distance is based on a spherical coordinate system with a spatial reference point; therefore, we calculate the relative standard distance from the standard deviation $\sigma_w = (\sigma_x, \sigma_y)$ to the origin $l_O = (0, 0)$, i.e., $SD = dis(\sigma_w, l_O)$ (Equation (4)), together with Equation (5), where $\sigma_x$ and $\sigma_y$ represent the weighted standard deviations of longitude and latitude, respectively. Additionally, $DM$ is the average distance between $l_i \in L_u$ and $l_C$, given by Equation (6), where $dis(l_i, l_C)$ denotes the distance between $l_i \in L_u$ and $l_C$, $l_C = (X_C, Y_C)$ is the weighted mean center of the locations in $L_u$, and $X_C$ and $Y_C$ represent the weighted averages of longitude and latitude, respectively. The check-in frequencies of a user reflect the user's preference for locations. Therefore, we take the check-in frequency of user $u$ at location $l_i$ as the weight of the location, i.e., $r_{u,l_i} = w_i$: the higher the check-in frequency, the greater the weight of the location.
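The ingredients of the bandwidth in Step 1 can be sketched as follows. This is only an illustration under stated assumptions: the weighted standard deviations use the population form (division by the total weight), $DM$ is computed as the weighted average distance to the mean center as described in the text, and the final combination of $SD$ and $DM$ into $h$ (Equation (3)) is deliberately not reproduced. The function names are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0

def great_circle_km(p, q):
    """Haversine distance in km between two (longitude, latitude) points in degrees."""
    lon1, lat1, lon2, lat2 = map(math.radians, (p[0], p[1], q[0], q[1]))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))

def bandwidth_components(points, weights):
    """Return the weighted mean center, SD, and DM for one user's check-ins.

    points  -- list of (longitude, latitude) pairs
    weights -- check-in frequency w_i of each location
    """
    total = sum(weights)
    # Weighted mean center l_C = (X_C, Y_C)
    xc = sum(w * x for (x, _), w in zip(points, weights)) / total
    yc = sum(w * y for (_, y), w in zip(points, weights)) / total
    # Weighted standard deviations of longitude and latitude (assumed population form)
    sx = math.sqrt(sum(w * (x - xc) ** 2 for (x, _), w in zip(points, weights)) / total)
    sy = math.sqrt(sum(w * (y - yc) ** 2 for (_, y), w in zip(points, weights)) / total)
    # SD: great-circle distance from (sigma_x, sigma_y) to the origin (0, 0)
    sd = great_circle_km((sx, sy), (0.0, 0.0))
    # DM: weighted average distance between each location and the mean center
    dm = sum(w * great_circle_km(p, (xc, yc)) for p, w in zip(points, weights)) / total
    return (xc, yc), sd, dm

center, sd, dm = bandwidth_components([(-1.0, 0.0), (1.0, 0.0)], [1, 1])
```

For two equally weighted points one degree either side of the origin on the equator, both `sd` and `dm` equal one degree of arc, roughly 111 km.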
Step 2: Kernel estimation for the geographical relevance score. With the global bandwidth $h$ in Equation (3), the geographical probability $p_{Geo}(l_m \mid L_u)$ that user $u$ visits an unvisited location $l_m \notin L_u$ is given by Equation (7) together with Equations (8) and (9), where $d(l_m, l_i)$ represents the distance between $l_i \in L_u$ and $l_m \notin L_u$, $K(\cdot)$ is the kernel function, and $N$ is the total check-in frequency of user $u$ in $L_u$. This article applies the quartic kernel function [45], which is useful in two-dimensional kernel estimation. As shown in Equation (8), when the distance between $l_i$ and $l_m$ is larger than $h$, location $l_i$ has no influence on $l_m$. Note that we use the great-circle distance for all distance calculations, including $dis(\sigma_w, l_O)$ in Equation (4), $dis(l_i, l_C)$ in Equation (6), and $d(l_m, l_i)$ in Equation (7). This is because the check-in datasets differ in the number of entities and in geographical range: Euclidean distance is applicable to small-scale scenarios but not to global datasets.
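A weighted quartic-kernel estimate of the kind described in Step 2 might look like the sketch below. The normalizing constant $3/\pi$ (the usual two-dimensional quartic kernel) and the $1/(N h^2)$ scaling are assumptions, since the exact forms of Equations (7) to (9) are not reproduced here; the function names and toy check-ins are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0

def great_circle_km(p, q):
    """Haversine distance in km between two (longitude, latitude) points in degrees."""
    lon1, lat1, lon2, lat2 = map(math.radians, (p[0], p[1], q[0], q[1]))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))

def quartic_kernel(t):
    """Quartic (biweight) kernel on scaled distance t; zero once t reaches 1."""
    if t >= 1.0:
        return 0.0
    return (3.0 / math.pi) * (1.0 - t * t) ** 2  # assumed 2D normalization

def geo_score(candidate, checkins, h_km):
    """Weighted kernel density at `candidate`.

    checkins -- list of ((longitude, latitude), frequency) for the user's visited locations
    h_km     -- global bandwidth (search radius) in km
    """
    total = sum(w for _, w in checkins)  # N: total check-in frequency
    if total == 0 or h_km <= 0:
        return 0.0
    acc = sum(w * quartic_kernel(great_circle_km(candidate, loc) / h_km)
              for loc, w in checkins)
    return acc / (total * h_km ** 2)

checkins = [((0.0, 0.0), 2), ((0.01, 0.0), 1)]
near = geo_score((0.0, 0.0), checkins, h_km=5.0)
far = geo_score((1.0, 1.0), checkins, h_km=5.0)
```

The candidate at `(1.0, 1.0)` lies far outside the 5 km search radius of every check-in, so it receives no geographical support, which mirrors the cutoff behavior attributed to Equation (8).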

Social Influence Model
In the real world, friends tend to have similar preferences or behaviors. For example, friends often go to movie theaters or restaurants together, or a user may check in at a market that is shared by his or her friends [46]. Therefore, a user's preferences for locations can be influenced by his or her friends. We predict a user's preference based on the preferences of his or her friends. Additionally, friends who have closer social ties may place more trust in each other's recommendations [15]. The closeness of a social tie can be measured through the number of mutual friends.
In summary, we compute users' comprehensive similarity based on social information, which combines social closeness and connections to make recommendations. We define the social similarity between $u$ and $u'$ as in Equation (10), where $\eta$ is a tuning parameter in the range $[0, 1]$, and $CloSim(u, u')$ and $ConSim(u, u')$ represent the social closeness and social connection, respectively, between users $u$ and $u'$, given by Equations (11) and (12). Equation (11) uses a simple and effective measure, i.e., Jaccard similarity. Finally, to take full advantage of the social relations, a synthetic social model (SSo) is built based on SCF technology: the social rating $p_{Soc}(l_m \mid L_u)$ of $u$ for an unvisited location $l_m$ is estimated by Equation (13).
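The synthetic social model can be sketched as below. Several pieces are assumptions made for illustration: closeness is taken as the Jaccard similarity of friend sets (as the text states for Equation (11)), connection is assumed to be the Jaccard similarity of visited-location sets, the two are assumed to be mixed linearly by $\eta$, and the social rating is assumed to be a similarity-weighted average over friends. None of these is confirmed to be the exact form of Equations (10) to (13).

```python
def jaccard(a, b):
    """Jaccard similarity of two collections treated as sets."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def social_similarity(u, v, friends, ratings, eta=0.5):
    """Mix of social closeness and social connection (assumed linear in eta)."""
    closeness = jaccard(friends[u], friends[v])   # shared friends
    connection = jaccard(ratings[u], ratings[v])  # shared visited locations (assumption)
    return eta * closeness + (1.0 - eta) * connection

def social_score(u, location, friends, ratings, eta=0.5):
    """Similarity-weighted average of friends' check-in frequencies at `location`."""
    num = 0.0
    den = 0.0
    for f in friends[u]:
        sim = social_similarity(u, f, friends, ratings, eta)
        num += sim * ratings[f].get(location, 0.0)
        den += sim
    return num / den if den > 0 else 0.0

# Hypothetical toy data
friends = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b"}}
ratings = {"a": {"x": 1}, "b": {"x": 2, "y": 3}, "c": {"y": 1}}
score = social_score("a", "y", friends, ratings, eta=0.5)
```

In this toy example, friend `b` shares both a mutual friend and a visited location with `a`, so `b`'s check-ins at `y` count more than those of `c`.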

Fusion Framework
The fusion step that combines the various kinds of contextual information is an important issue in context-aware decision making and recommendation. In this paper, the goal of the fusion framework is to fuse the scores of a user's preferences $p_{Pre}(l_m \mid L_u)$ (Equation (1)), the geographical influence $p_{Geo}(l_m \mid L_u)$ (Equation (7)), and the social influence $p_{Soc}(l_m \mid L_u)$ (Equation (13)) to obtain a better quality of location recommendation.
Crisp rules [7,16,26], fuzzy rules [47-49], machine learning and deep learning approaches [22,50], and hybrid approaches [51] are four major approaches for calculating the final score based on the various input contextual features. The sum rule belongs to the crisp rules and is a simple and conventional fusion method. Therefore, we apply the sum fusion rule [15] to combine the three abovementioned results into the final score. Let $S_{u,l_m}$ represent the probability score of user $u$ for an unvisited location $l_m$, and let $S^{Pre}_{u,l_m}$, $S^{Geo}_{u,l_m}$, and $S^{Soc}_{u,l_m}$ denote the normalized probability scores of the user preference, geographical influence, and social influence, respectively. The fusion model is defined in Equation (14) together with Equations (15) and (16), where $\alpha$ and $\beta$ are weighting parameters ($0 \le \alpha + \beta \le 1$). They denote the relative importance of geographical and social influence compared with user preference. We tune the parameters $\alpha$ and $\beta$ to find their optimal settings. The parameters reflect the weights of user preference, geographical influence, and social influence in obtaining optimal recommendations.
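A sum-rule fusion of this kind can be sketched as follows. The sketch assumes that user preference receives the remaining weight $1 - \alpha - \beta$ and that each score is normalized by its maximum over the candidate locations; both choices follow common practice for the sum rule and are assumptions rather than the paper's exact Equations (14) to (16).

```python
def normalize(scores):
    """Scale scores by their maximum so the largest becomes 1 (assumed normalization)."""
    top = max(scores.values(), default=0.0)
    return {loc: (s / top if top > 0 else 0.0) for loc, s in scores.items()}

def fuse(pre, geo, soc, alpha=0.2, beta=0.2):
    """Linear sum fusion: alpha weights geography, beta weights social influence."""
    if not 0.0 <= alpha + beta <= 1.0:
        raise ValueError("alpha + beta must lie in [0, 1]")
    pre_n, geo_n, soc_n = normalize(pre), normalize(geo), normalize(soc)
    candidates = set(pre_n) | set(geo_n) | set(soc_n)
    return {
        loc: (1.0 - alpha - beta) * pre_n.get(loc, 0.0)
             + alpha * geo_n.get(loc, 0.0)
             + beta * soc_n.get(loc, 0.0)
        for loc in candidates
    }

# Hypothetical raw scores for two candidate locations
pre = {"a": 2.0, "b": 1.0}
geo = {"a": 0.1, "b": 0.4}
soc = {"a": 0.0, "b": 3.0}
final = fuse(pre, geo, soc, alpha=0.2, beta=0.2)
ranking = sorted(final, key=final.get, reverse=True)
```

With these toy scores, location `b` overtakes `a` because its geographical and social support outweigh its lower preference score.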

Dataset Description
We used three publicly available real check-in datasets, which were crawled from three location-based social networks (Gowalla, Foursquare, and Yelp). The Foursquare dataset is provided in [24]. The Gowalla and Yelp datasets are provided in [5]. These datasets have different scales for the size of entities (i.e., users and POIs) and geographical ranges. The statistics of the datasets after preprocessing are shown in Table 2. Figure 2 shows the distributions of the locations in the three datasets.

Evaluated Recommendation Methods
We used geographical, social, and check-in data for location recommendation. In this section, we evaluate the recommendation efficiency with three criteria: overall performance, geographical influence, and social influence. We compared the proposed method with the following baselines.

• USG. USG is a unified location recommendation framework, which explores user preferences and geographical and social influences for location recommendation. It uses a sum rule to integrate user preferences and geographical and social influences [15].
• Lore. This method models sequential, geographical, and social influences for location recommendation. It uses an unweighted two-dimensional KDE model for geographical modeling [11]. The similarity between friends is computed based on the distance between residences. Because the residence locations are not available, we define users' most frequently visited locations as their residences. It uses a product fusion rule to integrate different factors.
• GeoSoCa. This method models three types of contextual information, namely, geographical, social, and categorical information. It uses an adaptive weighted two-dimensional KDE model for geographical modeling [18].
• SCF. SCF is a social-based collaborative filtering method that makes location recommendations based on the Jaccard similarity between friends. The similarity between friends is computed based on their common friends [37].

Performance Metrics
Two widely used standard metrics, i.e., precision (Pre@K) and recall (Rec@K), are used to evaluate the quality of location recommendation models. For each user, the precision reflects the proportion of recovered locations to the K recommended locations, and the recall reflects the proportion of recovered locations to the locations actually visited in the testing dataset. The averages of the precision and recall over all users are reported in Equations (17) and (19), respectively, which are given by
$$Pre@K = \frac{1}{|U|} \sum_{u \in U} Pre_u@K, \quad (17)$$
$$Pre_u@K = \frac{|V_k \cap V_u^{test}|}{K}, \quad (18)$$
$$Rec@K = \frac{1}{|U|} \sum_{u \in U} Rec_u@K, \quad (19)$$
$$Rec_u@K = \frac{|V_k \cap V_u^{test}|}{|V_u^{test}|}, \quad (20)$$
where $Pre_u@K$ and $Rec_u@K$ represent the precision and recall of user $u$, respectively, $V_k$ is the set of recommended locations, and $V_u^{test}$ is the set of locations that were visited by $u$ in the testing dataset. In our experiment, we test the performance when K = 5, 10, 20, 50.
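The two metrics can be computed per user and then averaged, as in the following minimal sketch. The function names and the toy recommendation list are illustrative; users with an empty test set are assigned a recall of zero, which is a convention assumed here.

```python
def precision_recall_at_k(ranked, test_set, k):
    """Precision and recall of the top-k recommended locations for one user."""
    top_k = ranked[:k]
    hits = len(set(top_k) & set(test_set))  # recovered locations
    precision = hits / k
    recall = hits / len(test_set) if test_set else 0.0
    return precision, recall

def mean_metrics(recommendations, test_sets, k):
    """Average precision@k and recall@k over all users."""
    pairs = [precision_recall_at_k(recommendations[u], test_sets[u], k)
             for u in recommendations]
    n = len(pairs)
    return (sum(p for p, _ in pairs) / n, sum(r for _, r in pairs) / n)

# Hypothetical ranked list and held-out visits for a single user
p, r = precision_recall_at_k(["a", "b", "c", "d", "e"], {"b", "e", "z"}, k=5)
```

Two of the five recommended locations appear in the user's test set of three, giving a precision of 0.4 and a recall of 2/3.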

Experiment Settings
The datasets were split into three parts: a training set, a tuning set, and a test set [5,24]. Note that we only use the training and test sets in our experiments. For the Foursquare dataset, 62.5% of the POIs visited by each user are randomly selected as training data and 25% of the POIs as testing data. For the Gowalla and Yelp datasets, the 70% of each user's check-ins with the earliest timestamps are labeled as training data and the most recent 20% of check-ins as testing data.
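The chronological split described for Gowalla and Yelp can be sketched per user as below. The function name, the `(location, timestamp)` tuple layout, and the rounding of split sizes are assumptions for illustration; the random POI-level split used for Foursquare is not covered.

```python
def chronological_split(checkins, train_frac=0.7, test_frac=0.2):
    """Split one user's check-ins by time: earliest for training, latest for testing.

    checkins -- list of (location, timestamp) tuples
    Returns (train, tuning, test); tuning holds whatever lies in between.
    """
    ordered = sorted(checkins, key=lambda c: c[1])
    n = len(ordered)
    n_train = round(n * train_frac)
    n_test = round(n * test_frac)
    train = ordered[:n_train]
    test = ordered[n - n_test:] if n_test else []
    tuning = ordered[n_train:n - n_test]
    return train, tuning, test

# Hypothetical user with ten check-ins at timestamps 0..9 (given out of order)
checkins = [(f"l{t}", t) for t in (3, 0, 9, 5, 1, 7, 2, 8, 4, 6)]
train, tuning, test = chronological_split(checkins)
```

With ten check-ins, this yields seven training, one tuning, and two testing check-ins, and every training check-in precedes every testing check-in.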
All algorithms were implemented in Python and run on a machine with a 3.4-GHz Intel Xeon E5-1620 Processor and 16GB RAM. Note that η, α and β are not free parameters and are learned from check-in data according to Equations (10) and (14), respectively.

Results and Discussion
In this section, we conduct extensive experiments to evaluate the performance of the proposed method for location recommendation. First, we analyze the recommendation accuracy of all methods in Section 5.1. We compared geographical recommendation methods in Section 5.2. The social recommendation methods are described in Section 5.3. Finally, the effect of the number of check-in locations and kernel density estimation models are discussed in Sections 5.4 and 5.5, respectively.

Overall Performance Results
In this section, we compare the effectiveness of the overall recommendations. Figures 3 and 4 show the performance @K (K = 5, 10, 20, 50) of the sum rule for integrating user preferences and geographical and social influences. All approaches are shown in terms of their best performance (i.e., the performance under the optimal parameter settings).

Results for the Geographical Influence Methods
Figures 5-7 show the recommendation accuracy of WDQ-KDE, DQ-KDE, Lore [11], and GeoSoca [18] on three large-scale real datasets, i.e., Foursquare, Gowalla, and Yelp. Since Lore uses an unweighted two-dimensional KDE model with a Gaussian kernel function, the model Next, we explain two parameters, i.e., α (for geographical influence) and β (for social influence), that can be controlled to tune the performance of GeSSo. Similar to [15], we tune them to explore the roles played by user preferences, social and geographical influences in achieving optimal performance. In our experiments, the optimal parameters are α = 0.2, β = 0.2 for Gowalla, and α = 0.2, β = 0.2 for Yelp. The results show that the factor of user preferences weighs more than the factor of social and geographical factors.
In both the Gowalla and Yelp datasets, GeSSo always performs the best in terms of accuracy in precision and recall. It is worth noting that GeSSo outperforms WDQ-KDE and SSo by approximately 50% of precision and recall on both datasets. GeSSo exhibits slightly better performance than the UCF method. The result shows that user preferences reflect a user's historical check-in behavior and play a significant role in recommendation. As discussed above, we find that the more influences are considered, the better the performance.          As we can see, as K increases, the precision decreases and the recall increases. This is because more recommended locations for users can include more locations that users would like to check in at as well as more locations that are less likely to be visited by users. Among the four geographical models, WDQ-KDE performs the best, and AWG-KDE and WDQ-KDE perform much better than G-KDE and DQ-KDE on the Foursquare and Gowalla datasets. However, WDQ-KDE and DQ-KDE have better performance than AWG-KDE and G-KDE on the Yelp dataset. This is because the As we can see, as K increases, the precision decreases and the recall increases. This is because more recommended locations for users can include more locations that users would like to check in at as well as more locations that are less likely to be visited by users. Among the four geographical models, WDQ-KDE performs the best, and AWG-KDE and WDQ-KDE perform much better than G-KDE and DQ-KDE on the Foursquare and Gowalla datasets. However, WDQ-KDE and DQ-KDE have better performance than AWG-KDE and G-KDE on the Yelp dataset. This is because the locations of the Yelp dataset are widely distributed in several cities around the world, and performance usually suffers because of the existing outliers.

Results for the Geographical Influence Methods
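The precision and recall at K reported throughout this section can be sketched for a single user as follows. This is a minimal illustration of the standard definitions, not the authors' evaluation code; the function name and the toy data are assumptions, and the reported figures would be averages of such values over all test users.

```python
def precision_recall_at_k(ranked, relevant, k):
    """Precision@K and recall@K for one user.

    ranked:   recommended locations, best first
    relevant: set of locations the user actually visited in the test data
    """
    top_k = ranked[:k]
    hits = sum(1 for loc in top_k if loc in relevant)
    precision = hits / k
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Toy example: six recommendations, three held-out visited locations.
ranked = ["a", "b", "c", "d", "e", "f"]
relevant = {"a", "d", "x"}

for k in (2, 5):
    print(k, precision_recall_at_k(ranked, relevant, k))
```

In this toy case the precision falls from 0.5 to 0.4 while the recall rises from 1/3 to 2/3 as K grows from 2 to 5, which mirrors the trend described in the text.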
WDQ-KDE vs. DQ-KDE. WDQ-KDE is greatly superior to DQ-KDE on the Foursquare and Gowalla datasets. These two models both calculate bandwidths according to two kinds of distance, i.e., standard distance and median distance. Specifically, DQ-KDE does not consider the visiting frequency, which reflects users' potential preferences. WDQ-KDE is an enhanced version of DQ-KDE that uses the visiting frequency as the weight of a location. On the Yelp dataset, WDQ-KDE and DQ-KDE have the same performance, and they can effectively address the phenomenon of outliers.
WDQ-KDE vs. AWG-KDE and G-KDE. AWG-KDE has slightly lower accuracy than WDQ-KDE on all three datasets. AWG-KDE uses an adaptive bandwidth for each check-in data point and uses the check-in frequency as the weight of a location. It performs much better than G-KDE, which uses a fixed bandwidth, on the Foursquare and Gowalla datasets. WDQ-KDE performs better than both methods because G-KDE cannot effectively avoid an excessive bandwidth, and AWG-KDE is not well suited to outliers and data sparsity. We perform further analysis in Section 5.5.
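The difference between a frequency-weighted and an unweighted quartic-kernel estimate can be sketched as below. This is a simplified illustration under stated assumptions, not the paper's model: the bandwidth is passed in as a plain number (the rule that derives it from the standard and median distances is not reproduced here), distances are planar Euclidean distances on already projected coordinates, and all names and data are hypothetical.

```python
import math


def quartic_kernel_2d(u_sq):
    """2D quartic (biweight) kernel, given the squared scaled distance.

    It is zero beyond the bandwidth, so distant points contribute nothing.
    """
    if u_sq >= 1.0:
        return 0.0
    return (3.0 / math.pi) * (1.0 - u_sq) ** 2


def weighted_quartic_kde(query, points, weights, bandwidth):
    """Weighted kernel density at `query` from visited `points`."""
    total_weight = sum(weights)
    density = 0.0
    for (x, y), w in zip(points, weights):
        d_sq = (query[0] - x) ** 2 + (query[1] - y) ** 2
        density += w * quartic_kernel_2d(d_sq / bandwidth ** 2)
    return density / (total_weight * bandwidth ** 2)


# Hypothetical user: one location visited 9 times, a far-away one visited once.
points = [(0.0, 0.0), (10.0, 0.0)]
visit_counts = [9, 1]

weighted = weighted_quartic_kde((0.0, 0.0), points, visit_counts, bandwidth=1.0)
unweighted = weighted_quartic_kde((0.0, 0.0), points, [1, 1], bandwidth=1.0)
between = weighted_quartic_kde((5.0, 0.0), points, visit_counts, bandwidth=1.0)
print(weighted, unweighted, between)
```

With visit counts as weights, the frequently visited location receives a higher density than in the unweighted case, and a query point lying between the two locations but outside the bandwidth of both receives a density of zero because the quartic kernel has finite support.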

Results for Social Influence Methods
The model described in Equation (13) is denoted SSo. Figures 8 and 9 depict the recommendation accuracy of SCF, Con, and SSo on two large-scale real datasets, i.e., Gowalla and Yelp. SSo fuses social closeness and connection. We model the factor of social closeness in Equation (11), i.e., SCF, and the influence of connection in Equation (12), i.e., Con. Note that the Foursquare dataset does not have social information, and therefore, we only report the results on Gowalla and Yelp.
As K increases, the precision decreases and the recall increases. In our experiments, social influence is defined based on two factors: (1) the ratio of the number of common friends; (2) whether two users are friends. Among the three social models, SSo performs the best on both the Gowalla and Yelp datasets. Con performs much better than SCF on both datasets.
Through the experiments on the SSo method, the optimal setting for η in Equation (10) is no larger than 0.05 on both the Gowalla and Yelp datasets. However, it does not follow that the factor of social closeness should be weighted more than the factor of connection. This is because the two factors are calculated in different ways, and the values of Con in Equation (12) are relative. To fuse the results of SCF and Con, we need to find the optimal parameter to adjust the value of Con. The optimal value of the parameter η of SSo is 0.01 on Gowalla and 0.05 on Yelp.
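The scale argument above can be made concrete with a small sketch. The exact forms of Equations (10)-(13) are not reproduced in this section, so everything below is an assumption: social closeness is taken as a Jaccard-style ratio of common friends, the fusion is taken as a convex combination with η applied to the connection score, and the numbers are made up purely to show how two scores on different scales interact.

```python
def closeness(friends_u, friends_v):
    """Assumed closeness: ratio of common friends (Jaccard form)."""
    union = friends_u | friends_v
    return len(friends_u & friends_v) / len(union) if union else 0.0


def fused_parts(scf_score, con_score, eta):
    """Assumed fusion: contributions of closeness and connection to the sum."""
    return (1.0 - eta) * scf_score, eta * con_score


# Hypothetical friend sets: two common friends out of four distinct friends.
scf = closeness({"b", "c", "d"}, {"c", "d", "e"})

# Hypothetical connection score on a much larger, unnormalized scale.
con = 20.0

closeness_part, connection_part = fused_parts(scf, con, eta=0.05)
print(scf, closeness_part, connection_part)
```

In this toy setting the connection term still contributes more to the fused score than the closeness term even though η is only 0.05, which illustrates why a small η does not by itself mean that closeness is the more heavily weighted factor.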
From Figures 8 and 9, we conclude that the factor of connection performs better than the factor of social closeness. More mutual friends between a user and his/her friends indicate a closer relationship between them; however, the similarity in friends' check-in behaviors may not be reflected in the strength of their social closeness. Previous research found that the preferences of a user's friends may be different [15,52]. In this research, we find that the factors of social closeness and connection can enhance the accuracy of recommendations to some extent.

Effect of the Number of Check-in Locations
Figures 10-12 show the recommendation accuracy of the geographical recommendation methods for various numbers of check-in locations of users. The numbers of check-in locations are divided into five groups; users in the higher groups have visited more locations and are called "active users". As users visit more locations, the precision increases, because more check-in data are available to the recommendation methods, which can then estimate the scores of these active users for new locations more accurately. However, the recall fluctuates as the number of visited locations increases. The reason is that users who have visited many locations usually also have many visited locations in the testing dataset. As a whole, WDQ-KDE performs better than the other methods on the three datasets.
(a) Pre@10 Foursquare (b) Rec@10 Foursquare
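The grouping used in this analysis can be sketched as follows. The group boundaries used in the experiments are not given in this section, so the boundaries, the function name, and the per-user values below are hypothetical toy numbers chosen only to show the mechanics of splitting users into five groups and averaging a metric per group.

```python
from bisect import bisect_left
from collections import defaultdict


def mean_metric_by_group(users, upper_bounds):
    """Average a per-user metric within groups of check-in location counts.

    users:        list of (num_checkin_locations, metric_value)
    upper_bounds: ascending inclusive upper bounds; n bounds give n + 1 groups
    """
    grouped = defaultdict(list)
    for num_locations, value in users:
        group = bisect_left(upper_bounds, num_locations)
        grouped[group].append(value)
    return {g: sum(vals) / len(vals) for g, vals in sorted(grouped.items())}


# Hypothetical boundaries: <=10, 11-20, 21-50, 51-100, >100 locations.
upper_bounds = [10, 20, 50, 100]

# Made-up (number of check-in locations, Pre@10) pairs.
users = [(5, 0.02), (8, 0.04), (15, 0.05), (40, 0.06), (80, 0.08), (150, 0.10)]

print(mean_metric_by_group(users, upper_bounds))
```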

Effect of the Kernel Density Estimation Model
To examine the effects of the bandwidth and kernel function on location recommendation, we randomly select one user from each of the Foursquare and Gowalla datasets. Figures 13 and 14 depict the density distributions of the different kernel density estimation models. The horizontal axis represents longitude and the vertical axis represents latitude. The orange points in Figures 13 and 14 indicate the locations that the users have visited. According to Equation (7), the kernel density value at a certain place represents the score of the location. We use various shades of blue to represent the kernel density value, i.e., the location score. The darker the color is, the higher the density.
As shown in Figure 13, these locations are distributed on a small scale, and the estimated density values and gradients increase from (a) to (d). However, the coverage area of the minimum contour line decreases. The locations in Figure 14 are distributed in a larger area and gather into three clusters, and Figure 14d has the largest estimated density values. The results show that (1) the bandwidth computing method based on Equation (3) enlarges the range of the scores and makes the density values fall off sharply from the center to the periphery, i.e., the density gradient is pronounced; (2) our method reduces the radiation range of frequently visited locations and also enlarges the values around these "active points". This is suitable for cases in which a user often checks in at locations in different cities or countries. Therefore, WDQ-KDE can effectively select the area of interest for users.

Conclusions and Future Work
In this paper, we proposed an effective location recommendation method called GeSSo. With GeSSo, we mainly explored geographical influences on users' check-in behaviors in LBSNs and modeled a personalized two-dimensional kernel density estimation method that addresses the data sparsity and outlier problems. Furthermore, we designed a friend-based method to measure the similarity between users based on their social closeness and social connection. In addition, user preferences and geographical and social influences are integrated into a unified score using a sum rule. Experiments on real datasets indicated that GeSSo provides better location recommendations than the other recommendation techniques evaluated in our experiments.
There are three directions for future study: (1) with proper methods, more contexts, such as temporal and categorical contexts [53], can be built into this method; (2) we are also interested in exploring geographical characteristics using spatial analysis methods for location recommendation; (3) machine learning-based or neuro-fuzzy-based fusion approaches are important future directions to explore.
