A Multi-Period Product Recommender System in Online Food Market based on Recurrent Neural Networks

Abstract: A recommender system helps customers find information, products, or services (such as music, books, movies, web sites, and digital content), so it can help customers make rapid routine decisions and save time and money. However, most existing recommender systems do not recommend items that the target customer has already purchased, so they are not suitable for modeling customers' repetitive purchase behavior or purchasing order. In this research, we propose a multi-period product recommender system that can learn customers' purchasing order and repetitive purchase patterns. For this purpose we apply the Recurrent Neural Network (RNN), an artificial neural network structure specialized in time series data analysis, instead of collaborative filtering techniques. Recommendation periods are segmented into time-steps, and the proposed RNN-based recommender system can recommend items over multiple periods in a time sequence. Several experiments with real online food market data show that the proposed system achieves higher accuracy and diversity from a multi-period perspective than the collaborative filtering-based system. From the experimental results, we conclude that the proposed system is suitable for multi-period product recommendation and gives robust performance because it accounts well for customers' purchasing orders and repetitive purchase patterns. Moreover, in terms of sustainability, we expect that our study contributes to the reduction of food waste by inducing planned consumption, and to the reduction of shopping time and effort.


Introduction
As interest in personalization services increases in various fields, recommender systems applying various knowledge discovery techniques are being studied commercially and academically [1]. In particular, product recommendation systems, mostly developed for online commerce, have gradually become important for sales and customer relationships as well as for helping consumers choose [2]. Collaborative filtering (CF) is known as the best-performing technique in product recommender systems [3,4]. The underlying assumption of collaborative filtering is that 'customers with similar preferences for particular items will show similar preferences for other items'. CF-based recommendation models predict preference based on the similarity between users or items, but scalability and sparsity problems may occur as data increase with the growth of e-commerce [4]. Moreover, as online business and technology advance, not only is customers' transaction data increasing [5], but consumers' consumption patterns are also changing [6]. Consumers not [...] site is very large, but even the most active users would have rated only a small part of all items. Therefore, even the most popular items may have very little rating data. In addition, as online purchases have become popular recently, the available item-user data sets have grown. As a result, the computational complexity of recommendation has increased, resulting in scalability problems [4]. Besides the sparsity and scalability problems caused by the increase of usable data, most CF approaches tend to recommend almost the same products repeatedly because they do not reflect changes in customers' preferences over time. Most CF approaches use only information about a user's purchases over a specific period and do not utilize information about the order in which items were purchased. Song et al. [19] attempted to find changes in customers' behavior by creating association rules from datasets at two different points in time and comparing the rules to detect changes in purchase behavior over time. However, they only compared two points of purchasing using association rules and did not find any long-term pattern changes.
On the other hand, research has been proposed that reflects temporal dynamics in order to take into account the changing preferences of customers [20][21][22]. Methods that reflect temporal dynamics include reducing the weight of an item according to the time elapsed since it was purchased, and using a moving window in which only data from a certain period are used for recommendation. As seen in previous studies, a forgetting method has been used to capture the preferences of customers. This forgetting is done either by using a fixed-size moving window over the data and repeatedly training the model with the data in the window, or by using a decay function in the similarity calculation that makes older items less relevant [22,23]. In this study, we use an RNN, an artificial neural network model suitable for sequential data in which both input and output are sequences, to capture preference changes at the individual user level. We propose an improved recommendation model that learns how the previous purchase history affects the next purchase.

RNN and Recommendation
An RNN (Recurrent Neural Network) is a structure appropriate for analyzing time series data, since past information is stored in the hidden nodes and transferred to the next steps, which means that previous input data can affect predictions of the current output [8]. The previous state is stored or memorized in the current state; that is, the data at time t can be considered when predicting the next state at time t+1. The hidden unit of the RNN serves as a memory block, and each memory block receives data at a specific time t. The memory block receiving the input at time t transfers the information to the connected memory block. However, as the input sequence gets longer, error signals may fail to propagate across many time-steps, which leads to the problem called long-term dependency. In 1997, Hochreiter and Schmidhuber [24] discussed the vanishing gradient problem and presented LSTM (Long Short-Term Memory) to address it. In 2014, Cho et al. [25] presented GRU (Gated Recurrent Unit), a simpler variant of LSTM. Many variants of RNN have appeared, but LSTM and GRU are the most commonly used RNN structures.
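As a rough illustration of how the hidden state carries earlier inputs forward, the following minimal sketch implements one plain (non-gated) recurrent update, h_t = tanh(W_x x_t + W_h h_{t-1} + b), in pure Python. The weights `W_X`, `W_H`, `B` and the function names are hypothetical toy values chosen for illustration; this is not the LSTM used in the study.

```python
import math

def rnn_step(x, h_prev, w_x, w_h, b):
    """One vanilla RNN update: h_t = tanh(W_x x_t + W_h h_{t-1} + b)."""
    h = []
    for j in range(len(b)):
        z = b[j]
        z += sum(w_x[j][k] * x[k] for k in range(len(x)))
        z += sum(w_h[j][k] * h_prev[k] for k in range(len(h_prev)))
        h.append(math.tanh(z))
    return h

def run_rnn(sequence, w_x, w_h, b):
    """Feed a sequence of input vectors through the cell, starting from h = 0."""
    h = [0.0] * len(b)
    for x in sequence:
        h = rnn_step(x, h, w_x, w_h, b)
    return h

# Hypothetical toy weights: 2 inputs, 2 hidden units.
W_X = [[0.5, -0.3], [0.1, 0.8]]
W_H = [[0.2, 0.0], [-0.4, 0.6]]
B = [0.0, 0.0]

# The same two inputs in a different order give a different final state,
# which is what lets a recurrent model be sensitive to purchase order.
h_ab = run_rnn([[1, 0], [0, 1]], W_X, W_H, B)
h_ba = run_rnn([[0, 1], [1, 0]], W_X, W_H, B)
```

LSTM and GRU cells replace this single tanh update with gated updates, but the idea of passing a state from one time-step to the next is the same.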
Recently, RNNs have been actively studied in research on recommender systems. Zhang et al. [26] used an RNN to model online advertisement clicks and attempted to predict the user's next clicks. Hidasi et al. [27] framed the recommendation problem as session-based recommendation and proposed the use of Gated Recurrent Units (GRUs), a variant of RNN, for modeling user behavior in a session-based scenario on internet sites. Building on this session-based recommendation study, various follow-up studies have been conducted. Tan et al. [28] presented various training methods to improve the performance of the model presented in Hidasi et al. [27], namely data augmentation and a method to account for shifts in the input data distribution. However, session-based recommendation is only justified by the fact that most e-commerce sites do not track user behavior beyond the session level. On the other hand, Song et al. [29] modeled both static and temporal user characteristics, assuming that user interest changes over time. The authors treated the entire dataset as static features and the most recent dataset as temporal features. Yu et al. [30] viewed the problem as next-basket prediction in e-commerce. They proposed an RNN model that takes real-valued representations of the baskets as input and is trained to rank the items in the next basket. In those scenarios, items for a shopping cart are suggested based on a user's history of past shopping carts.
As previous studies have shown, attempts have been made to improve the accuracy of recommender systems using an RNN model that learns time-related or sequential information from continuous data. In order to capture customers' preferences, we first divide the recommendation point into several sections to see whether customers' preferences change. In this study, the RNN and its variants are reviewed, and an RNN-based recommendation model that considers purchase order is presented and compared with traditional recommendation models.

Overview
The purpose of this study is to introduce a recommendation model that considers purchase order to capture customers' changing preferences, and to examine the recommendation results of the proposed model from a multi-period perspective. Generally, customers receive recommended items at time T based on their purchase history up to time point T-1. Likewise, in this study, it is assumed that the transaction information up to time point T-1 is analyzed and items are recommended to the target customer for purchase at the next time point T. Here the time point T does not denote a specific timespan but a position in the basket sequence. For example, if a customer has purchased 10 times through the online food market, the total T equals 10, and the purchase at T-1 is the previous purchase (basket), i.e., the 9th purchase (basket). More specifically, the set of all customers is represented as $U = \{u_1, \ldots, u_p\}$, and the set of all items is represented as $I = \{i_1, \ldots, i_q\}$. Let $B_T$ be the set of all transaction data by all customers at time T, represented as $B_T = \{B_T^{u_1}, \ldots, B_T^{u_p}\}$. To be precise, T stands for the time-step of a purchase in the sequence, not the actual time. $B_T^{u_p}$ is a subset of $I$ and consists of the items purchased in the same order, grouped into a basket at time T. Then, the total purchasing history of a specific customer $u_p$ up to time T is sorted in the order of purchase time, as $B^{u_p} = \{B_1^{u_p}, B_2^{u_p}, \ldots, B_{T-1}^{u_p}, B_T^{u_p}\}$. In this study, given the previous purchasing history, a sequential prediction problem is set for each user $u_p$ to recommend the set of items to be purchased at the next time, $B_{T+1}^{u_p}$. To do this, an RNN-based recommendation model is proposed to consider purchasing order. We divide the recommendation period into various time points, and experiment using real transaction data to see how well the proposed model reflects customers' changing preferences. The schematic framework of the proposed approach is described in Figure 1.
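The basket notation above can be made concrete with a small sketch. It assumes, purely for illustration, that transactions arrive as `(user, order_number, item)` rows; the function names and the toy rows are hypothetical and not taken from the study's code.

```python
from collections import defaultdict

def build_basket_sequences(transactions):
    """Group (user, order_number, item) rows into ordered baskets per user."""
    baskets = defaultdict(lambda: defaultdict(set))
    for user, order_number, item in transactions:
        baskets[user][order_number].add(item)
    # Sort each user's baskets by order number, i.e. by purchase order.
    return {
        user: [orders[n] for n in sorted(orders)]
        for user, orders in baskets.items()
    }

def next_basket_pairs(sequence):
    """Pair each basket B_t with its successor B_{t+1} as (input, target)."""
    return list(zip(sequence[:-1], sequence[1:]))

# Hypothetical toy rows, deliberately out of order.
rows = [
    ("u1", 2, "milk"), ("u1", 1, "milk"), ("u1", 1, "eggs"),
    ("u1", 3, "eggs"), ("u2", 1, "tofu"),
]
sequences = build_basket_sequences(rows)
pairs = next_basket_pairs(sequences["u1"])
```

Each `(input, target)` pair corresponds to one step of the sequential prediction problem: given the basket at one time-step, predict the basket at the next.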

RNN-based Recommendation Model
The recommendation model presents items likely to be purchased by customers at the next time T+1 based on the previous purchasing history. In this study, we propose a recommendation model that considers both purchase information and purchasing order using an RNN, which can take sequential patterns into account. The proposed RNN-based recommendation model is shown in Figure 2. When purchasing patterns repeat over time, the RNN-based recommendation model can learn the temporal changes according to the purchasing order because the RNN has a mechanism for storing previous information in its hidden units. To design an RNN model that learns purchase information and purchasing order, the data must be fed in a form from which time-dependent features can be learned. Since the input of a neural network must be a vector, each item $i_q$ is encoded as a one-hot vector, and the purchase information $B_T^{u_p}$ of the customer $u_p$ is converted into a multi-hot encoding by adding the one-hot encodings of the items in the same purchase order at time T. $x_T$ denotes the multi-hot vector of all items in $B_T^{u_p}$, and its length equals the number of all items. The entry for item $i_q$ is 1 if the customer purchased the item, and 0 otherwise. The output $o_T$ is passed through the softmax function and represents the probability of purchase. The model can be seen as learning the previous purchase information and representing, as probabilities, the purchase pattern that will appear at the next time. In the learning process, the difference between the predicted output ($o_T$) and the actual target ($y_T$) is calculated by the loss function, here categorical cross-entropy, and the weights are updated through back-propagation of the error. Finally, the top-N items with the greatest probability are recommended.
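The input encoding and output ranking described above can be sketched in pure Python as follows. The item list, the raw `scores` (standing in for the network's output before softmax), and all function names are hypothetical illustrations under these assumptions, not the study's implementation.

```python
import math

def multi_hot(basket, item_index):
    """Encode a basket as a 0/1 vector whose length is the number of items."""
    x = [0] * len(item_index)
    for item in basket:
        x[item_index[item]] = 1
    return x

def softmax(scores):
    """Turn raw output scores into probabilities that sum to 1."""
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probabilities, target):
    """Categorical cross-entropy of predicted probabilities against a 0/1 target."""
    return -sum(t * math.log(p) for t, p in zip(target, probabilities) if t)

def top_n(probabilities, items, n):
    """Return the n items with the highest predicted purchase probability."""
    ranked = sorted(range(len(items)), key=lambda k: probabilities[k], reverse=True)
    return [items[k] for k in ranked[:n]]

ITEMS = ["apple", "bread", "milk", "tofu"]
ITEM_INDEX = {item: k for k, item in enumerate(ITEMS)}

x_t = multi_hot({"bread", "tofu"}, ITEM_INDEX)    # input vector x_T
scores = [0.1, 2.0, 1.0, -1.0]                    # assumed raw network outputs
o_t = softmax(scores)                             # output o_T as probabilities
loss = cross_entropy(o_t, multi_hot({"bread"}, ITEM_INDEX))
recommended = top_n(o_t, ITEMS, 2)
```

Note that already purchased items are not filtered out of the ranking, in line with the study's aim of allowing repeat purchases to be recommended.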

Multi-period Recommender Systems
In this study, the proposed RNN-based recommendation model is evaluated over multiple periods by observing its performance at various recommendation time-steps. Traditional recommender system studies focus only on the accuracy at the next point in time when recommending items to the customer and evaluating model performance, which means they evaluate the accuracy of the model only once. However, customers' preferences may change over time, which may degrade the performance of the recommendation model. So, in this study, recommendation periods are segmented into time-steps, and the proposed RNN-based recommendation model is evaluated over multiple periods in a time sequence. To be more precise, multi-period recommender systems evaluate performance not only at time point T but also at the subsequent time points such as T + 1, T + 2, and so on. Figure 3 shows an example of a multi-period recommender system.

Evaluation Metrics
Most recommender system studies measure the accuracy of the recommended items, because low accuracy means that the recommended items were not consumed by the users. To measure accuracy, recall, precision and F1 metrics are widely used in previous studies [4,11,12,18,31]. In this paper, the F1 measure is used to measure the accuracy of recommender systems because it considers both precision and recall. On the other hand, if a recommendation system recommends similar items every time, there is a risk of reducing diversity across consumers as a whole, and satisfaction with the recommendation system will decrease. Lathia et al. [32] argued that diversity should be pursued while maintaining a certain level of accuracy in order to increase satisfaction with recommendation systems, and suggested the following diversity metric.
$\mathrm{diversity}(L_1, L_2, N) = \dfrac{|L_2 \setminus L_1|}{N}$
$L_1$ and $L_2$ are the previous and subsequent recommended lists, and $N$ is the number of recommended items. In this study, this temporal diversity metric is also used to measure the diversity of the recommended list.
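Both metrics are short enough to write out. The sketch below computes F1 from a recommended list and the set of actually purchased items, and temporal diversity as the share of the current list that was absent from the previous one; the function names and the example lists are hypothetical.

```python
def f1_score(recommended, purchased):
    """Harmonic mean of precision and recall for one recommendation list."""
    rec, bought = set(recommended), set(purchased)
    hits = len(rec & bought)
    if hits == 0:
        return 0.0
    precision = hits / len(rec)
    recall = hits / len(bought)
    return 2 * precision * recall / (precision + recall)

def temporal_diversity(previous_list, current_list):
    """Fraction of the current top-N list that was not in the previous list."""
    n = len(current_list)
    if n == 0:
        return 0.0
    return len(set(current_list) - set(previous_list)) / n

# Two of five recommended items were bought, out of three purchased items.
f1 = f1_score(["a", "b", "c", "d", "e"], {"a", "c", "x"})

# Two of the five items in the later list are new.
div = temporal_diversity(["a", "b", "c", "d", "e"], ["a", "b", "f", "g", "e"])
```

In the first example precision is 2/5 and recall is 2/3, so F1 is 0.5; in the second, diversity is 2/5 = 0.4.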

Data Description
The data used in this study is transaction data from a fresh food delivery service company in the USA, published in 2017 on the data analysis competition platform Kaggle. As mentioned above, its products' prices are generally low and customers purchase them habitually. Therefore, it is a good data set for our experiment because our methodology considers repurchase behavior, which other recommender systems do not address. Moreover, the transactional data was collected for one year, so we could ignore seasonal factors that could affect model building and performance. The data provides real purchase information for each customer and an order number, which is an index assigned to each customer according to the order of purchase. For the experiments, the items purchased by customers are arranged in the order of purchase time, and all the purchases corresponding to the same order number are grouped into the same shopping list. In order to compare the recommendation performance of the models over time, we used the shopping information of 7716 customers who have 10 shopping carts. Generally, recommender systems based on such transaction data cannot infer preference from extremely popular items because they are products almost everyone buys. On the other hand, because low-selling products are usually purchased by customers with unusual tastes, they could act as outliers when building a model. For these reasons, we excluded the items in the top and bottom 10% of sales volume, i.e., items that appeared too often or too rarely, and used a total of 9073 items for the experiments. As mentioned earlier, in this study, a recommendation model is evaluated over multiple recommendation periods. For this purpose, at the recommendation time point T, all the information before point T is regarded as training data and the subsequent buying information is considered as test data.
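One possible reading of the item filtering step is sketched below: items are ranked by how many baskets they appear in, and the top and bottom 10% of that ranking are dropped. The source does not spell out the exact procedure, so ranking by basket count, the tie-breaking by item name, and the function name `filter_items_by_sales` are all assumptions made for illustration.

```python
from collections import Counter

def filter_items_by_sales(baskets, fraction=0.10):
    """Keep items outside the top and bottom `fraction` of the sales ranking."""
    counts = Counter(item for basket in baskets for item in basket)
    # Rank from least to most sold; ties are broken by item name (assumption).
    ranked = sorted(counts, key=lambda item: (counts[item], item))
    cut = int(len(ranked) * fraction)
    return set(ranked[cut:len(ranked) - cut])

# Hypothetical toy data: item "i<k>" appears in exactly k baskets, k = 1..10.
toy_baskets = [
    [f"i{k}" for k in range(1, 11) if k >= j]
    for j in range(1, 11)
]
kept_items = filter_items_by_sales(toy_baskets)
```

With ten items and a 10% cut on each side, the least sold item (`i1`) and the most sold item (`i10`) are removed and the other eight are kept.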

Experimental Setup
As in most studies on neural networks, we experimented with various recurrent neural network structures (basic RNN, LSTM, GRU) and parameters to select the recommendation model. First, we experimented with the number of hidden nodes in the basic RNN, LSTM, and GRU. As the number of hidden nodes increases, the initial learning progresses rapidly and converges quickly. However, since the number of parameters to learn also increases with the number of hidden nodes, the number of hidden nodes is set to 100. Also, since LSTM performs slightly better than the basic RNN and GRU, LSTM is used as the final recommendation model in this study. Since increasing the number of layers does not improve the performance of the proposed model, the number of hidden layers is set to one in our LSTM structure. The optimization function also has to be determined experimentally, as no single optimization function is known to fit all problems. Figure 4 shows the experimental results of various optimization algorithms. In this figure, an epoch is complete when the entire dataset has been passed through the neural network model once. For example, 10 epochs mean that the entire dataset was passed through the model 10 times. As the number of epochs increases, the loss value decreases. Categorical cross-entropy was used as the loss function. Among the optimization algorithms, the Adam optimizer, which shows the best performance, is selected, and its hyper-parameters are set to the values known as the best defaults.

Experimental Results
The proposed LSTM-based recommendation model and the comparison recommendation models are examined using accuracy metrics when the top five items are recommended from the multi-period perspective. First, we analyze the models over various recommendation periods and examine whether the accuracy changes as the periods change over time. The comparison models are item-based CF and a popularity model. Item-based CF (represented by CF) is a similarity-based recommendation model that recommends items similar to those purchased by the target user. The popularity model (represented by POP) recommends the best-selling items; it is simple but widely applied in many circumstances.
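The two comparison models can be sketched as follows. The source does not state which similarity measure its item-based CF uses, so cosine similarity over binary user-item purchase vectors is an assumption here, as are the function names and the toy `histories`.

```python
import math
from collections import Counter, defaultdict

def popularity_top_n(histories, n):
    """POP: the n best-selling items over all customers' training baskets."""
    counts = Counter(item for baskets in histories.values()
                     for basket in baskets for item in basket)
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:n]]

def item_cosine_similarity(histories):
    """Cosine similarity between items over binary user-item purchase vectors."""
    buyers = defaultdict(set)
    for user, baskets in histories.items():
        for basket in baskets:
            for item in basket:
                buyers[item].add(user)
    sim = defaultdict(dict)
    for a in buyers:
        for b in buyers:
            overlap = len(buyers[a] & buyers[b])
            if a != b and overlap:
                sim[a][b] = overlap / math.sqrt(len(buyers[a]) * len(buyers[b]))
    return sim

def item_cf_top_n(user, histories, sim, n):
    """CF: score each item by its summed similarity to the user's purchased items."""
    owned = set().union(*histories[user])
    scores = Counter()
    for item in owned:
        for other, s in sim[item].items():
            scores[other] += s
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:n]]

# Hypothetical toy purchase histories (one basket per customer).
histories = {
    "u1": [{"milk", "eggs"}],
    "u2": [{"milk", "eggs", "tofu"}],
    "u3": [{"milk"}],
}
sim = item_cosine_similarity(histories)
pop = popularity_top_n(histories, 2)
cf_u3 = item_cf_top_n("u3", histories, sim, 2)
```

The popularity list is the same for every customer, whereas the CF list depends on the target customer's own purchases.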
Experiment I: In this experiment, the model is trained with five shopping lists for all customers, namely the first to fifth shopping lists. Then the recommendation model is used to predict what customers will purchase from a multi-period perspective. The recommendation model is trained at time T with the purchasing history up to time T-1. The recommended item lists after time T, denoted by T+1, T+2 and so on, are also predicted using the model trained at time T. Since each recommendation model, which is trained up to time T, is examined from the multi-period perspective, this method is denoted as MP (multi-period) in this study. Figure 5 shows the experimental method and the recommendation accuracy measured from a multi-period perspective using the F1 score when recommending the top five items. The experimental results evaluated over multiple periods from T to T+4 in a time sequence show that the accuracy of both the proposed model and the comparison models decreases over time. Overall, the accuracy of the popularity model is as low as about 1%. The popularity model resembles a mass marketing strategy that recommends the best-selling items and does not provide personalized recommendations. This result shows that a personalized recommendation model can better satisfy consumers. As shown in Figure 5, the F1 score of the proposed LSTM-based recommendation model is higher than that of the CF-based model: about 21% higher at T and about 10% higher at T+4. These results show that the LSTM-based recommendation model considering purchase order not only recommends items more accurately than the CF-based model, but also predicts more accurately from the multi-period perspective.
Experiment II: In experiment I, the model was trained up to time T, and the recommendation results were examined from the multi-period perspective. Experiment II is conducted assuming that the actual purchase information at time T+1, T+2 and so on is known for all customers. In other words, the recommendation model is trained again as time advances. In these experiments, a moving window method is used, which means that the latest five shopping lists are used to train the recommendation model; it is denoted as MOV (MOVing window) in this study. Figure 6 shows the MOV method (a) and the resulting F1 score (b) when the model recommends the top five items. As expected, when the recommendation model learns the actual purchase information, it shows higher accuracy than the recommendation model fixed at point T. The popularity model again shows about 1% accuracy, although it is slightly more accurate with the moving window method. The F1 score of the proposed LSTM-based recommendation model with the MOV method is higher than that of the MP method by about 8% at T+1, 19% at T+2, 29% at T+3 and 33% at T+4. On the other hand, the F1 score of the CF-based recommendation model with the MOV method is higher than that of the MP method by about 7% at T+1, 13% at T+2, 18% at T+3 and 25% at T+4. These results show that the proposed LSTM-based recommendation model predicts what customers prefer more accurately than the CF-based model. This result also implies that customers' preferences change over time, because the moving window method is more accurate than the multi-period method.
Experiment III: In experiments I and II, the training data is limited to five shopping lists. To identify whether there is a data recency effect, experiment III is conducted, in which all previous transaction data at recommendation time T are used as training data. We denote this cumulative method as CUM and compare it with MOV for both the LSTM model and the CF model.
Experimental results show that the overall F1 score of the LSTM-based model is higher than that of the CF-based model. But the graph in Figure 7b shows a different pattern between the LSTM-based model and the CF-based model. In the CF-based recommendation model, the moving window method, which learns only the data within a specific window size, is more accurate than the cumulative method. However, there is no significant difference between the moving window method and the cumulative method in the proposed LSTM-based model. This implies that the LSTM-based model is less affected by data recency than the CF-based model. The reason is that the LSTM model can consider both long-term and short-term memory during training using customers' purchase order.
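The difference between the MP, MOV and CUM training schemes of experiments I to III comes down to which baskets are used to train the model for a given recommendation time-step. A minimal sketch, assuming each customer's baskets sit in a 0-based list and using a hypothetical helper name `training_baskets`:

```python
def training_baskets(sequence, base, offset, method, window=5):
    """Baskets used to train the model that recommends for step T + offset.

    sequence: a customer's baskets in purchase order (0-based list).
    base: index of the basket bought at time T.
    """
    target = base + offset
    if method == "MP":    # trained once at T, reused for T+1, T+2, ...
        return sequence[max(0, base - window):base]
    if method == "MOV":   # retrained on the latest `window` baskets
        return sequence[max(0, target - window):target]
    if method == "CUM":   # retrained on every basket seen so far
        return sequence[:target]
    raise ValueError(f"unknown method: {method}")

# Baskets are represented by their purchase numbers 1..10 for readability;
# time T is the 6th purchase (index 5), so T+3 is the 9th purchase.
purchases = list(range(1, 11))
mp = training_baskets(purchases, base=5, offset=3, method="MP")
mov = training_baskets(purchases, base=5, offset=3, method="MOV")
cum = training_baskets(purchases, base=5, offset=3, method="CUM")
```

For the recommendation at T+3, MP still trains on purchases 1 to 5, MOV on purchases 4 to 8, and CUM on purchases 1 to 8.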

Table 1 shows the diversity results of the LSTM-based recommendation model and the CF-based recommendation model. The diversity measure used in this experiment ranges from 0 to 1. A score closer to one means that the subsequent recommended item list differs more from the previous recommended item list. The experimental results show that the CF-based model recommends almost the same items at the previous and subsequent recommendation times, while the proposed LSTM-based model recommends more diverse items. Although the CF-based model recommends more diverse items with the moving window method than with the cumulative method, the LSTM-based model shows no significant difference between the two methods. However, since the moving window method can utilize transaction data more efficiently, it is considered a more effective learning method than the cumulative method when there is no significant difference in recommendation accuracy. Furthermore, the diversity of the proposed model is relatively higher even under the condition that previously purchased items can also be recommended. This result implies that the LSTM-based model not only accounts well for customers' purchasing orders but also captures customers' purchasing patterns well, so it can suggest more diverse items.
Experiment IV: The results of the previous experiments show that knowing the actual purchasing information helps to recommend items more accurately. But in the real world, when recommending items at time T+1, T+2 and so on, the actual purchasing information is unknown, as in the multi-period method. Also, as shown above, in the multi-period method the recommendation model trained at time T is also used at the subsequent recommendation times (T+1, T+2 and so on), which degrades the recommendation quality over time from a long-term perspective. Therefore, we propose another method of predicting and recommending items that are expected to be purchased from a long-term perspective, under the assumption that the customer buys the items recommended by the model. We assume that the customer purchased the top five items recommended by the model at the previous time. Then, the top five items are used as training data to retrain the model at the next time. For comparison with the multi-period method, the method that uses both all the previous transaction data and the model's top five recommended items as training data is called the cumulative multi-period method, denoted as CUMMP in this study. On the other hand, the moving window multi-period method, denoted as MOVMP, excludes the oldest transaction data from the training data: MOVMP uses only the latest transaction data together with the model's top five recommended items as training data. As shown in Figure 8, the actual purchase information after time T is unknown, as in the multi-period method. However, in the CUMMP and MOVMP methods, the unknown data are replaced with the model's top five recommended items. Figure 9 shows the comparison of the proposed LSTM-based recommendation model and the CF-based recommendation model, specifically the F1 scores of the MP, CUMMP and MOVMP methods.
As expected, both the proposed LSTM-based recommendation model and the CF-based model show that recommendation accuracy decreases with time in long-term prediction. The LSTM-based recommendation model has about a 28%, 28% and 29% accuracy decline in the MP, CUMMP and MOVMP methods, respectively. Meanwhile, the CF-based recommendation model has about a 21%, 24% and 24% accuracy decline in the MP, CUMMP and MOVMP methods, respectively. The LSTM-based model shows a slightly larger decline, but its recommendation accuracy is still high. Also, there is no significant difference among the MP, CUMMP and MOVMP methods in the proposed LSTM-based model, whereas in the CF-based model the accuracy of the CUMMP and MOVMP methods decreases further than that of the MP method. Since the CF-based model with the CUMMP or MOVMP method uses the model's recommended items (together with purchased items) as training data, it can be interpreted that the CUMMP and MOVMP methods do not reflect the changing preferences of customers compared to MP. On the other hand, the accuracy of the LSTM-based recommendation model remains higher than that of CF even with the CUMMP or MOVMP methods, and shows no significant difference from the MP method. From experiment IV, we conclude that the LSTM-based recommendation model gives robust results compared to the CF-based model, because the LSTM-based model gives almost the same result with purchase data only as with purchase data plus predicted data.
Figure 9. Evaluation results of MP, CUMMP and MOVMP by F1 score.
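The CUMMP and MOVMP procedures of experiment IV can be sketched as a rollout in which each recommendation list is appended to the history as if it had been purchased. In the sketch below, `most_frequent` is a hypothetical stand-in recommender used only to keep the example self-contained; in the study this role is played by the retrained LSTM-based or CF-based model.

```python
from collections import Counter

def most_frequent(train_baskets, n):
    """Stand-in recommender (assumption): the n most frequent items in training."""
    counts = Counter(item for basket in train_baskets for item in basket)
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:n]]

def rollout(history, recommend, steps, method, window=5, n=5):
    """Recommend for `steps` future time-steps, feeding each top-n list back
    into the training data as a pseudo-basket."""
    known = list(history)
    recommendations = []
    for _ in range(steps):
        if method == "CUMMP":      # all real and pseudo baskets so far
            train = known
        elif method == "MOVMP":    # only the latest `window` baskets
            train = known[-window:]
        else:
            raise ValueError(f"unknown method: {method}")
        top = recommend(train, n)
        recommendations.append(top)
        known.append(set(top))     # assume the customer bought what was recommended
    return recommendations

# Hypothetical toy history of five baskets.
history = [{"a", "b"}, {"a"}, {"a", "c"}, {"b"}, {"a", "b"}]
cummp = rollout(history, most_frequent, steps=2, method="CUMMP", n=1)
movmp = rollout(history, most_frequent, steps=2, method="MOVMP", window=2, n=1)
```

In this toy run CUMMP keeps recommending the item that dominates the whole history, while MOVMP follows the item that dominates the most recent baskets.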

Conclusions
In this study, a Recurrent Neural Network, which is specialized in time series data analysis, is applied to a recommendation model. The proposed LSTM-based recommendation model is evaluated against the following two models: an item-based collaborative filtering model, which is widely used in recommender systems research as a benchmark system, and a popularity model, which is relatively simple but still used in business. Recommendation periods are segmented into time-steps, and the proposed LSTM-based recommendation model is evaluated over multiple periods in a time sequence. Real online transaction data of a fresh food delivery market is used as the data set, and the recommendation model's performance is evaluated by accuracy and diversity from a multi-period perspective.
The experimental results are as follows. First, the LSTM-based recommendation model outperformed the CF-based model from a multi-period perspective. Specifically, the F1 score of the proposed LSTM-based recommendation model is about 21% higher at T and about 10% higher at T+4. This result shows that considering the purchase order not only helps to improve the recommendation quality but also gives a more accurate prediction over multiple periods. In addition, the proposed LSTM-based recommendation model recommends more diverse items than the CF-based model. This implies that the proposed model captures customers' purchase patterns well and offers various items to customers. Also, experimental results show that the proposed model has no significant difference in accuracy regardless of the size of the training data, which indicates the robustness of the proposed model. However, even if there is no significant difference in recommendation accuracy and diversity, it is better to use the data efficiently through the moving window method, considering the cost of training the LSTM-based recommendation model. Finally, from the perspective of long-term predictions, both the LSTM-based model and the CF-based model result in decreasing recommendation accuracy over time, but the accuracy curve of the LSTM-based model descends more gradually than that of the CF-based model.
This study extended the recommendation periods into multiple time-steps and evaluated the performance over multiple periods in a time sequence. Unlike previous recommender systems research, which focuses on recommendation accuracy at a single point in time, we compare the accuracy of the recommendation models over multiple periods and show that the proposed model performs better even from a multi-period perspective.
In applying the suggested recommendation methodology in the real market, it will be necessary to set the update frequency of the recommendation model. In actual operation, we suggest updating the model daily or weekly to reflect the changing preferences of users and to increase the accuracy and diversity of the recommendation results. But the exact update frequency should be decided through experiments with a real data set, and may differ depending on the characteristics of the items, the number of items, the number of customers, and the average time between two consecutive purchases.
In terms of sustainability, highly accurate multi-point recommendations that reflect changing customer preferences can help market managers prevent products with a very short shelf life, such as fresh vegetables, from being discarded. Furthermore, as our suggested multi-point recommender system is a kind of decision support system, it could help customers make rapid routine decisions and save time and money. Moreover, we expect that our study contributes to reducing customers' food waste by inducing planned consumption.
In this study, online retail transaction data are used, but applying the LSTM-based recommendation model to more diverse transaction data sets is a promising future research area. In addition, combining the CF-based model with the proposed RNN-based recommendation model is also a promising research topic.