Detecting the Proportion of Traders in the Stock Market: An Agent-Based Approach

In this research, an agent-based model (ABM) of the stock market is constructed to detect the proportion of different types of traders. We model a simple stock market with three types of traders (noise traders, fundamental traders, and technical traders) trading a single asset. Bayesian optimization is used to tune the hyperparameters of the traders' strategies and of the stock market. The experimental results on Bayesian calibration with the Kolmogorov–Smirnov (KS) test demonstrated that the proposed separate calibrations reduced simulation error, with plausible estimated parameters. Using empirical data from the Dow Jones Industrial Average (DJIA) index, we found that fundamental traders account for 9%–11% of all traders in the stock market. Statistical analysis of the simulated data shows that the model reproduces important stylized facts of real stock markets, such as leptokurtosis, heavy-tailed returns, and volatility clustering.


Introduction
The study of complex systems has long been of great interest because of their rich properties and behavior, for example, the solar system [1] or biological systems [2]. By analyzing such natural systems, people discovered fundamental laws that help them to build other systems, such as computer networks [3] or power-grid systems [4]. A financial market can also be considered a complex system. For decades, researchers have faced the problem of analyzing and modeling financial markets [5][6][7]. One of the most important questions is "How can we develop an artificial stock trading system and describe investors' behaviors in a way that is similar to reality?". Earlier models were based on a single, fully rational representative agent and failed to reproduce many of the properties of real markets [8,9]. To overcome these problems, a new behavioral approach emerged, referred to as agent-based modeling, characterized by markets populated with boundedly rational, heterogeneous agents [10].
An agent-based model (ABM) of a financial market is a class of quantitative models that simulates the decisions and interactions of different traders in order to understand their behaviors and their impact on the market [11]. In recent years, ABMs of financial markets have remained a prominent topic of research [6,12]. The behavior of the agents is described in a variety of ways; their interaction and the

Background
In the financial market, there are many types of agents (investors, banks, central banks, etc.). These agents can interact directly, through trading or exchanging derivatives. They can also interact indirectly, because any decision or behavior of an agent affects the others [18]. Moreover, agents have different sensitivities to exogenous factors in the market, so the same news can affect them differently [19]. Therefore, modeling a financial market requires understanding the characteristics of agents, such as their behavior and the interactions between investors, as well as the impact of exogenous factors such as public news on the market [13,20]. The studies in this paper focus on clarifying these properties.
The first approach addresses the types of agents and how they affect market dynamics. Pioneering works have proposed a number of interesting agent-based financial market models. The concept of zero-intelligence traders, or noise traders, who decide randomly whether to buy or sell, was introduced in a double auction market [21]. Studies of market participants who use technical and fundamental analysis provide a strong empirical foundation for building financial market models [22]. Technical traders are considered a source of instability because their trading often creates positive feedback in market dynamics [23]. In contrast, fundamental trading, by its definition, has a stabilizing effect on market dynamics [24].
The second approach concerns the diversity of agents. To make every fundamental agent and technical agent unique, each is assigned a different memory length or reaction intensity to price and fundamental changes [13]. Exogenous factors such as news, social media, and insider information [25] also affect each agent differently. In addition, agent sentiments such as risk aversion play an important role in the decision-making process [14,26].
The third approach concerns the interaction between agents. Besides the interactions that directly affect market prices, investors' decisions also affect each other indirectly. In [27,28], the authors introduced the propagation of investors' decisions with a fixed probability. In another study [13], interactions and diversity among fundamental traders and technical traders were introduced. The average utility associated with each type was calculated, and agents could rely on profits to choose an effective investment strategy.
Each of these approaches has been extended in various interesting directions. These extensions highlight the stylized facts of financial markets, such as heavy tails in the distribution of stock returns and volatility clustering [10,28]. Another stylized fact that has been proposed is the positive autocorrelation in volatility and trading volume [13,29].
One important issue is how to tune the hyperparameters of the traders' strategies and of the market. Grid search evaluates the model on every combination of hyperparameter values in a predefined list [30]. Random search evaluates the model on only a finite number of combinations drawn at random from that list [31]. In principle, grid search finds the best value on the grid, but as the number of parameters increases, this becomes impractical, because the number of combinations, and hence the time and cost of implementation, grows exponentially. Random search does not run as many cases as grid search, so it is significantly faster [31]. However, because it is random, each run may return a different optimum, and there is no guarantee that this optimum is global rather than local. Bayesian optimization is an adaptive approach to parameter optimization that trades off between exploring new areas of the parameter space and exploiting historical information to find the parameters that maximize the objective function quickly [30]. We therefore use Bayesian optimization to tune the hyperparameters of the technical traders' trading strategies and of the market environment in our model.
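To make the cost comparison concrete, here is a minimal Python sketch of grid search and random search on a toy two-parameter loss. The function `objective` is a hypothetical stand-in for one costly model run, not the calibration target used in this paper, and neither search below is Bayesian.

```python
import itertools
import random


def objective(x: float, y: float) -> float:
    """Toy loss with its minimum at (0.3, 0.7); stands in for one costly model run."""
    return (x - 0.3) ** 2 + (y - 0.7) ** 2


def grid_search(steps: int):
    """Evaluate every point of a steps x steps grid on [0, 1]^2.

    The number of evaluations is steps ** d for d parameters, which is why
    grid search becomes impractical as parameters are added.
    """
    axis = [i / (steps - 1) for i in range(steps)]
    points = list(itertools.product(axis, axis))
    best = min(points, key=lambda p: objective(*p))
    return best, len(points)


def random_search(n_trials: int, seed: int = 0):
    """Evaluate n_trials points drawn uniformly from [0, 1]^2; the result depends on the seed."""
    rng = random.Random(seed)
    points = [(rng.random(), rng.random()) for _ in range(n_trials)]
    best = min(points, key=lambda p: objective(*p))
    return best, len(points)
```

With 11 values per axis the grid costs 121 runs for two parameters but 11**5 = 161,051 runs for five, whereas random search costs exactly the chosen budget. Neither method uses earlier evaluations to decide where to look next, which is what Bayesian optimization adds.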

The Model
This section outlines the construction of a new agent-based model, which aims to explain the origin of the stylized facts and to estimate the proportion of each type of trader in the stock market.

Environment
The agent-based model in this paper focuses on analyzing the behavior of traders and the properties of the log returns generated by the simulation. The model considers a fixed number of agents, denoted N, trading a single asset. The current time is denoted t and the corresponding price S(t). At each time step t, trader i chooses one of three actions, buying one unit of the stock, selling one unit, or holding, denoted $s_i(t) \in \{+1, -1, 0\}$, respectively. For liquidity purposes, the model assumes that every trader always has enough wealth and shares; in other words, traders are always able to buy or sell. The market price and market return are then calculated. The market price S(t) is updated according to

$$S(t) = S(t-1)\,\exp\big(r(t)\big),$$

and the return r(t) is determined according to

$$r(t) = \ln\frac{S(t)}{S(t-1)} = g\!\left(\frac{D(t)}{N}\right),$$

where the excess demand is $D(t) = \sum_{i=1}^{N} s_i(t)$, the price impact function is $g(z) = \arctan(z/\lambda)$, and λ measures the market depth or liquidity.
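The price-formation step can be sketched in a few lines of Python. This is a minimal sketch, assuming the log return is $r(t) = g(D(t)/N)$ with $g(z) = \arctan(z/\lambda)$ and the price is updated as $S(t) = S(t-1)\exp(r(t))$; the function names `price_impact` and `market_step` are hypothetical.

```python
import math
from typing import Sequence, Tuple


def price_impact(z: float, lam: float) -> float:
    """Price impact function g(z) = arctan(z / lambda); a larger lambda means a deeper market."""
    return math.atan(z / lam)


def market_step(prev_price: float, actions: Sequence[int], lam: float) -> Tuple[float, float]:
    """Clear one period: return (S(t), r(t)) given S(t-1) and actions s_i(t) in {+1, -1, 0}.

    Assumes r(t) = g(D(t) / N) and S(t) = S(t-1) * exp(r(t)).
    """
    n = len(actions)
    excess_demand = sum(actions)              # D(t) = sum_i s_i(t)
    r = price_impact(excess_demand / n, lam)  # log return r(t)
    return prev_price * math.exp(r), r
```

Because $|D(t)/N| \le 1$, the return produced by this sketch never exceeds $\arctan(1/\lambda)$ in absolute value, which bounds the size of any single daily move.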
Because traders tend to form discrete clusters [32], we define the types of traders in this model by their predefined trading rules. Three types of traders operate in the market. Noise traders make decisions at random, according to their reactions to news and their propensity for sentiment contagion. Technical traders analyze charts and make decisions based on current patterns and trend analysis, and fundamental traders trade based on the fundamental profit-generating potential of the stock.

Noise Agent
Noise agents are traders who trade on the basis of misunderstood information and news about future prices. They make decisions and trade based on inaccurate analysis of the market [33]. In this model, noise agents rely only on the current market situation to decide whether to buy, sell, or hold [13]. The key variable is the sentiment of the noise agents, which captures their reaction to the news they receive [20]. Good sentiment means that agents see the news as good news and expect future prices to increase (bullish), while bad sentiment means that agents consider the news bad and expect future prices to decrease (bearish).
The decision of noise agents is determined by the following process. At each time step t:
• Agents receive public information as a signal I(t).
• Each agent i compares the signal to his threshold $\theta_i(t)$ to make a decision.
• After the market return is calculated, he updates his threshold with probability $p_i$.
The inflow of news arrives at the market as a signal I(t), with $I(t) \sim \mathcal{N}(0, \sigma_I^2)$. Each agent decides whether the news is significant; if it is, he makes a decision according to the sign of I(t). To describe heterogeneity, we represent each agent's sentiment toward the news as a decision threshold $\theta_i(t)$, with the initial trading threshold set between 1 and 2 times the standard deviation of the news,

$$\sigma_I \le \theta_i(0) \le 2\sigma_I.$$

Let $N_{noise}$ denote the number of noise agents. The trading rule of each noise agent i, $i = 1, \dots, N_{noise}$, is

$$s_i(t) = \begin{cases} +1, & I(t) > \theta_i(t),\\ -1, & I(t) < -\theta_i(t),\\ 0, & \text{otherwise}. \end{cases}$$

Market-related sentiments are formed differently for each agent, and each agent responds differently to the information. To avoid the artificial ordering of agents found in sequential choice models, we follow [20] and use an asynchronous scheme to update the agents' thresholds. At each time step, each agent i updates his threshold $\theta_i(t)$ with probability $0 \le p_i \le 1$. The threshold is set equal to the absolute return at time t, $|r(t)| = |\ln(S(t)/S(t-1))|$. Introducing independent and identically distributed random variables $u_i(t)$, $i = 1, \dots, N_{noise}$, uniformly distributed on [0, 1], the updating scheme is

$$\theta_i(t) = \begin{cases} |r(t)|, & u_i(t) < p_i,\\ \theta_i(t-1), & \text{otherwise}. \end{cases}$$

Next, technical agents and fundamental agents are added to the model.
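The noise-agent decision rule and the asynchronous threshold update can be sketched as follows. This is a minimal sketch under stated assumptions: initial thresholds are drawn uniformly in $[\sigma_I, 2\sigma_I]$ (the text only says "between 1 and 2 times" the news standard deviation), and all helper names are hypothetical.

```python
import random
from typing import List


def initial_thresholds(n: int, sigma_i: float, rng: random.Random) -> List[float]:
    """Draw theta_i(0) in [sigma_I, 2 * sigma_I]; the uniform draw is an assumption."""
    return [rng.uniform(sigma_i, 2.0 * sigma_i) for _ in range(n)]


def noise_actions(signal: float, thresholds: List[float]) -> List[int]:
    """Buy (+1) if I(t) > theta_i, sell (-1) if I(t) < -theta_i, otherwise hold (0)."""
    actions = []
    for theta in thresholds:
        if signal > theta:
            actions.append(1)
        elif signal < -theta:
            actions.append(-1)
        else:
            actions.append(0)
    return actions


def update_thresholds(thresholds: List[float], probs: List[float],
                      last_return: float, rng: random.Random) -> List[float]:
    """Asynchronous update: theta_i <- |r(t)| if u_i(t) < p_i, with u_i(t) ~ U[0, 1)."""
    return [abs(last_return) if rng.random() < p else theta
            for theta, p in zip(thresholds, probs)]
```

Each agent draws its own $u_i(t)$, so in any period only a random subset of agents resets its threshold to the latest absolute return.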

Technical Agent
Unlike noise agents, technical agents use past prices to infer private information. They make decisions by calculating technical indicators from historical prices [34]. Technical analysis aims to identify trends at an early stage and to reverse the decision when the trend reverses [35]. In this model, the agents are assumed to use a real-world technique called the moving-average oscillator (MAO) to predict future price moves [36]. The technique takes two moving averages of the price with different lengths, a short-term moving average A and a long-term moving average B, with window lengths $l_A < l_B$. The moving-average crossover is the difference M(t) = A(t) − B(t) between these two moving averages. M(t) > 0 indicates that the price is trending upward, and the agents buy; M(t) < 0 indicates that the price is trending downward, and the agents sell.
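A minimal sketch of the moving-average oscillator signal follows. The function names are hypothetical, and simple (equally weighted) moving averages are an assumption, since the averaging scheme is not specified.

```python
from typing import Optional, Sequence


def moving_average(prices: Sequence[float], t: int, length: int) -> float:
    """Simple moving average of the `length` prices ending at day t (inclusive)."""
    return sum(prices[t - length + 1 : t + 1]) / length


def mao_signal(prices: Sequence[float], t: int, l_a: int, l_b: int) -> Optional[int]:
    """Moving-average oscillator: +1 (buy) if M(t) > 0, -1 (sell) if M(t) < 0, 0 if flat.

    M(t) = A(t) - B(t), with A the short (l_a) and B the long (l_b) moving average.
    Returns None while fewer than l_b prices are available.
    """
    if not 0 < l_a < l_b:
        raise ValueError("require 0 < l_a < l_b")
    if t < l_b - 1:
        return None
    m = moving_average(prices, t, l_a) - moving_average(prices, t, l_b)
    return 1 if m > 0 else (-1 if m < 0 else 0)
```

For example, `mao_signal(prices, t, 35, 51)` evaluates the signal with the window lengths reported for the technical agents.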
To account for individual heterogeneity, each agent could use a different length of historical data, so one moving average strategy might indicate an uptrend while another indicates a downtrend. However, in our model, the difference in window lengths $l_A$ and $l_B$ has no effect on the agents' trading decisions: when the historical price data show a strong uptrend or downtrend, all investors make the same decision. Thus, we assume that all technical agents use Bayesian optimization to estimate the optimal lengths of the short and long windows.
We formulate an optimization problem with three main ingredients:
• Domain space: the range of each hyperparameter. Following the studies in [37], our approach starts with a wide domain and then focuses on specific areas around the optimal parameters found in the previous run. Thus, each agent is assigned a random short window and long window drawn from uniform distributions on [5, 40] and [50, 100], respectively.
• Objective function: a score function that indicates how well a pair of short and long periods performs. Cumulative returns and the Sharpe ratio are commonly used to evaluate a technical strategy [38]; in our model, the Sharpe ratio is the evaluation metric of choice. Because Hyperopt minimizes its objective function by default, we use the negative Sharpe ratio as the objective.
• Surrogate function and selection function: the surrogate function proposes sets of values that are expected to reduce the objective score, and a selection criterion chooses among them. Following [30], we use the tree Parzen estimator as the surrogate function and the expected improvement criterion as the selection function.
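The objective function described above can be sketched as follows. Several details are assumptions not stated in the text: a zero risk-free rate, annualization with 252 trading days, and a position set at the close of day t and held over day t + 1. `tune_windows` is a hypothetical helper showing how such a loss could be handed to hyperopt's TPE algorithm (`fmin` with `tpe.suggest` and `hp.quniform` search spaces); it is not the authors' code.

```python
import math
import statistics
from typing import Dict, Sequence


def negative_sharpe(prices: Sequence[float], l_a: int, l_b: int,
                    periods_per_year: int = 252) -> float:
    """Loss for a minimizer: minus the annualized Sharpe ratio of a moving-average crossover.

    Assumptions: zero risk-free rate, 252 trading days per year, and a position
    chosen at the close of day t and held over day t + 1.
    """
    strategy_returns = []
    for t in range(l_b - 1, len(prices) - 1):
        short_ma = sum(prices[t - l_a + 1 : t + 1]) / l_a
        long_ma = sum(prices[t - l_b + 1 : t + 1]) / l_b
        position = 1 if short_ma > long_ma else (-1 if short_ma < long_ma else 0)
        strategy_returns.append(position * math.log(prices[t + 1] / prices[t]))
    if len(strategy_returns) < 2:
        return 0.0
    sd = statistics.stdev(strategy_returns)
    if sd == 0.0:
        return 0.0  # no variability: treat the Sharpe ratio as zero
    return -statistics.mean(strategy_returns) / sd * math.sqrt(periods_per_year)


def tune_windows(prices: Sequence[float], max_evals: int = 100) -> Dict[str, float]:
    """TPE search over l_A in [5, 40] and l_B in [50, 100]; requires the hyperopt package."""
    from hyperopt import fmin, hp, tpe  # optional dependency, imported only when called

    space = {
        "l_a": hp.quniform("l_a", 5, 40, 1),
        "l_b": hp.quniform("l_b", 50, 100, 1),
    }

    def loss(params):
        return negative_sharpe(prices, int(params["l_a"]), int(params["l_b"]))

    return fmin(fn=loss, space=space, algo=tpe.suggest, max_evals=max_evals)
```

Calling `tune_windows(prices)` returns a dict with keys `l_a` and `l_b`; without a fixed random state, repeated runs may return different windows.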
The optimal lengths of the short and long windows, which give the largest Sharpe ratio, are 35 and 51, respectively. The buy/sell signals are shown in Figure 1. To describe heterogeneity, we introduce $\psi_i(t)$, the average profit that technical agent i gained over the previous $n_i$ trading days,

$$\psi_i(t) = \frac{1}{n_i}\sum_{j=1}^{n_i} r(t-j),$$

which serves as her decision threshold. At each time step t, each agent i updates her threshold $\psi_i(t)$ with probability $0 \le p_i \le 1$, following the same updating scheme as the noise agents.
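The profit threshold $\psi_i(t)$ is a trailing mean of past returns. A minimal sketch with hypothetical helper names, reusing the probability-$p_i$ update of the noise agents:

```python
import random
from typing import Sequence


def average_past_return(returns: Sequence[float], t: int, n_i: int) -> float:
    """psi_i(t) = (1 / n_i) * sum_{j=1..n_i} r(t - j): mean of the n_i returns before day t."""
    if t < n_i:
        raise ValueError("need at least n_i past returns")
    return sum(returns[t - j] for j in range(1, n_i + 1)) / n_i


def update_psi(psi_prev: float, returns: Sequence[float], t: int, n_i: int,
               p_i: float, rng: random.Random) -> float:
    """Recompute psi_i(t) with probability p_i, otherwise keep psi_i(t - 1)."""
    if rng.random() < p_i:
        return average_past_return(returns, t, n_i)
    return psi_prev
```

With $n_i$ between 5 and 22 days, the threshold reflects returns over roughly one week to one month.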
Like the noise agents, technical agents are also affected by public information. The news reactions $Q_i(t)$ are represented by
Define the sign function as

$$\operatorname{sign}(x) = \begin{cases} +1, & x > 0,\\ 0, & x = 0,\\ -1, & x < 0. \end{cases}$$

Let $N_{ta}$ denote the total number of technical agents. The trading rule of technical agent i, $i = 1, \dots, N_{ta}$, is represented by
The amplification of trends and the unlimited buying power of technical agents can cause problems, because the price can grow without bound or drop to extremely small values very quickly. Thus, fundamental agents are necessary to keep the market reasonably stable.

Fundamental Agent
Fundamental agents are traders who measure the intrinsic value of the stock by analyzing its accounting, financial, and economic data [39]. They trade in the belief that the stock price will return to its fundamental value in the long run. The fundamental value of the stock f(t) is public to all agents and is assumed to follow a random walk,

$$f(t) = f(t-1) + \eta(t),$$

where the fundamental shocks are $\eta(t) \sim \mathcal{N}(0, \sigma_\eta^2)$. At each time step t, fundamental agents make a decision based on the difference between S(t) and f(t): if the price is below (above) its fundamental value, the agents place a buy (sell) order. This strategy narrows the gap between the price and its fundamental value. Fundamental agents thus help to stabilize the market and have the opposite effect on prices to technical agents. To account for individual heterogeneity, they are given heterogeneous beliefs about the fundamental price, which produces more interesting dynamics [40].
Let $N_{fa}$ denote the total number of fundamental agents. The trading rule of fundamental agent i, $i = 1, \dots, N_{fa}$, is represented by
where κ is a positive coefficient that describes the speed of adjustment. The idiosyncratic term $\epsilon_i$ is a random term that accounts for each individual's own interpretation. We take it to be normally distributed around zero, with a standard deviation controlled by the user, $\epsilon_i \sim \mathcal{N}(0, \sigma_\epsilon^2)$.
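The fundamental value process and the direction of fundamental trading can be sketched as below. The random walk follows the description of f(t); the rule in `fundamental_action` is a hypothetical form built only from the verbal description (buy below f(t), sell above it, with speed κ and idiosyncratic noise $\epsilon_i$), not the paper's exact equation. Both helper names are hypothetical.

```python
import random
from typing import List


def fundamental_path(f0: float, sigma_eta: float, steps: int,
                     rng: random.Random) -> List[float]:
    """Random walk f(t) = f(t-1) + eta(t), with eta(t) ~ N(0, sigma_eta^2)."""
    path = [f0]
    for _ in range(steps):
        path.append(path[-1] + rng.gauss(0.0, sigma_eta))
    return path


def fundamental_action(price: float, fundamental: float, kappa: float,
                       sigma_eps: float, rng: random.Random) -> int:
    """HYPOTHETICAL rule: sign(kappa * (f(t) - S(t)) + eps_i), eps_i ~ N(0, sigma_eps^2).

    Buys (+1) when the price is below the perceived fundamental value and
    sells (-1) when it is above; not the paper's exact equation.
    """
    x = kappa * (fundamental - price) + rng.gauss(0.0, sigma_eps)
    return 1 if x > 0 else (-1 if x < 0 else 0)
```

In a rule of this form, the noise only changes the decision when $\sigma_\epsilon$ is comparable to $\kappa|f(t) - S(t)|$, so the relative scale of the two parameters matters.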
Finally, all agents are put together in the artificial financial market, which includes $N_{noise}$ noise agents, $N_{ta}$ technical agents, and $N_{fa}$ fundamental agents. The total number of traders is

$$N = N_{noise} + N_{ta} + N_{fa}.$$

Data Description
As an empirical basis, we used a data set covering 6 years of daily quotes of the DJIA index, from January 2013 to January 2019. This period contains 1508 daily returns, whose main statistical properties are given in Table 1. The analysis focused on stylized facts of the distribution of stock returns and on volatility clustering in the market. Based on this data set, we estimated the proportion of each type of trader in our artificial stock market.

Model Calibration
We constructed an artificial stock market of N traders, with between 10 and 100 traders of each type. Each trading period was taken to be one day, so the results can be compared with the properties of daily returns. We chose λ ∈ [5, 20], so that the maximum absolute daily return, reached when all traders take the same side, ranges from about 5% (λ = 20) to about 20% (λ = 5). We denote $1/p_i$ as the latency of agent i in updating his threshold $\theta_i(t)$. $p_i$ was chosen within [0.001, 0.1], so that $1/p_i$ can be interpreted in days; for example, $p_i = 0.01$ means that the agent updates his threshold on average every 100 days. The number of days $n_i$ over which each technical agent calculates profit was chosen within [5, 22], which covers traders interested in returns over horizons from 1 week to 1 month. The short and long windows of the technical strategy, shared by all technical traders, are given in Section 2: $l_A = 35$, $l_B = 51$. The amplitude $\sigma_I$ of the news arrival process was chosen in the range [0.001, 0.01] to reproduce a realistic range of values for the volatility. The standard deviation of the fundamental shocks, $\sigma_\eta$, was chosen in $[0.5\sigma_p, 1.5\sigma_p]$, where $\sigma_p = 3255.43$ is the standard deviation of the real price. The standard deviation of the idiosyncratic term was chosen in the range [0.001, 1]. The results discussed in the next section are generic within these parameter ranges.
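The two interpretations used in the calibration ranges, the return bound implied by λ and the update latency $1/p_i$, can be checked numerically. This sketch assumes $r(t) = \arctan(D(t)/(N\lambda))$, so the bound is reached when all N traders take the same side; the helper names are hypothetical.

```python
import math


def max_abs_return(lam: float) -> float:
    """Largest |r(t)| under r(t) = arctan(D(t) / (N * lambda)), reached when |D(t)| = N."""
    return math.atan(1.0 / lam)


def mean_update_period(p: float) -> float:
    """Average number of days between threshold updates (latency 1 / p)."""
    return 1.0 / p


for lam in (5.0, 20.0):
    print(f"lambda = {lam}: max |r| = {max_abs_return(lam):.2%}")
for p in (0.001, 0.01, 0.1):
    print(f"p = {p}: updates every {mean_update_period(p):.0f} days on average")
```

For λ = 5 and λ = 20 it prints maximum daily moves of about 19.74% and 5.00%, and for p = 0.001, 0.01, and 0.1 average update periods of 1000, 100, and 10 days.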
For clarity, we summarize the parameters and calibration ranges used in our model in Table 2. We formulated an optimization problem using the Python package "hyperopt" with three main ingredients:
• Domain space: each trader was assigned initial values drawn from uniform distributions over the ranges in Table 2.
• Objective function: the Kullback–Leibler (KL) divergence and the KS test were used as the evaluation metrics. For this run, the algorithm found the best hyperparameters (those that minimized the loss) in just under 100 trials.
• Surrogate function and selection function: the tree Parzen estimator was used as the surrogate function, and the expected improvement criterion was used as the selection function.

Both the KL divergence and the KS test found an optimal set of parameters with small simulation error, as shown in Table 3. The proportions of each type of trader obtained with the KL and KS objectives were $N_{noise}$: 12.2%, $N_{ta}$: 78.3%, $N_{fa}$: 9.6% and $N_{noise}$: 13.6%, $N_{ta}$: 75.6%, $N_{fa}$: 10.6%, respectively. Note that we did not compare the values of KL and KS directly, because the two measures are defined differently. Instead, we selected the better objective function by examining its stability over many simulations. The KL divergence was unstable across simulations, with its minimum value ranging from 0.08 to 4.5. In contrast, the KS test was stable, fluctuating only between 0.03 and 0.06. The stylized facts present in the simulated returns are discussed below. Moreover, we considered several configurations with fixed numbers of each type of trader in the ABM. The results in Table 4 show that the KS test gave the lowest value when all three types of traders were present in the market. Thus, the KS test is an effective objective function for measuring simulation error in the Bayesian calibration of our model.
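Both calibration objectives compare the distribution of simulated returns with that of the actual returns. A minimal, dependency-free sketch follows; the helper names are hypothetical, and the binning and smoothing in `kl_divergence` are assumptions. Histogram-based KL estimates in general depend on the choice of bins and on how empty bins are handled.

```python
import bisect
import math
from typing import Sequence


def ks_statistic(a: Sequence[float], b: Sequence[float]) -> float:
    """Two-sample Kolmogorov-Smirnov statistic D = sup_x |F_a(x) - F_b(x)|."""
    a_sorted, b_sorted = sorted(a), sorted(b)
    d = 0.0
    for x in a_sorted + b_sorted:  # the supremum is attained at a sample point
        fa = bisect.bisect_right(a_sorted, x) / len(a_sorted)
        fb = bisect.bisect_right(b_sorted, x) / len(b_sorted)
        d = max(d, abs(fa - fb))
    return d


def kl_divergence(p_sample: Sequence[float], q_sample: Sequence[float],
                  bins: int = 20, eps: float = 1e-10) -> float:
    """Histogram estimate of KL(P || Q) on shared bins; eps smooths empty bins to avoid log(0)."""
    lo = min(min(p_sample), min(q_sample))
    hi = max(max(p_sample), max(q_sample))
    width = (hi - lo) / bins or 1.0  # guard against all-equal samples

    def histogram(sample):
        counts = [0] * bins
        for x in sample:
            counts[min(int((x - lo) / width), bins - 1)] += 1
        total = len(sample) + eps * bins
        return [(c + eps) / total for c in counts]

    p, q = histogram(p_sample), histogram(q_sample)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
```

With SciPy available, `scipy.stats.ks_2samp(actual, simulated)` computes the same two-sample statistic together with a p-value.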

Simulation Results
In this section, we consider a stock market with three types of traders (noise, technical, and fundamental traders), using the KS test as the objective function for calibration. In the first stage, only the noise traders join the market, because technical and fundamental traders need to collect historical data. In the second stage, after a number of days (here equal to the optimal long window), technical and fundamental traders join the market. The calibrated parameter set is shown in Table 3. Figure 2 shows the daily movement of the actual and simulated returns.

Distribution of Returns
The statistical properties of the actual and simulated returns are given in Tables 5 and 6. The simulated returns have properties similar to the real ones. In financial data, the heavy tail of the return distribution shows up as a positive excess kurtosis [41]. The Shapiro-Francia test rejected normality for both the actual and the simulated returns at the 0.1% significance level, with p-values of $2.2 \times 10^{-22}$ and $1.5 \times 10^{-12}$, respectively. The quantile-quantile (Q-Q) plots in Figure 3 show the same result. In Figure 4a, the heavy tails of the actual and simulated return distributions are not clearly visible on linear axes, so in Figure 4b we use a logarithmic vertical axis to zoom in on the tails. The returns produced by the model were consistently leptokurtic, with heavy tails.

Figure 2 shows the stock returns for both the actual and the simulated data. We studied the stationarity of the returns using the augmented Dickey-Fuller (ADF) unit root test; the results are shown in Table 7. The p-values of the ADF test statistic for the real and simulated returns are $2.31 \times 10^{-30}$ and $3.44 \times 10^{-30}$, respectively. Since both values are smaller than 0.01, we rejected the null hypothesis of a unit root, that is, that the returns are non-stationary. In addition, the autocorrelation function (ACF) plots in Figure 5 show no significant correlation between the returns and their lags.
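The heavy-tail diagnostics can be sketched without external libraries. `excess_kurtosis` returns the sample excess kurtosis, and `shapiro_francia_w` returns the Shapiro-Francia statistic $W'$ computed with Blom's approximation to the expected normal order statistics. The p-values quoted in the text require a further approximation (for example, Royston's) that is not implemented here; both helper names are hypothetical.

```python
import random
from statistics import NormalDist, fmean
from typing import Sequence


def excess_kurtosis(x: Sequence[float]) -> float:
    """Sample excess kurtosis m4 / m2**2 - 3; about 0 for normal data, > 0 for heavy tails."""
    mu = fmean(x)
    m2 = fmean([(v - mu) ** 2 for v in x])
    m4 = fmean([(v - mu) ** 4 for v in x])
    return m4 / m2 ** 2 - 3.0


def shapiro_francia_w(x: Sequence[float]) -> float:
    """Shapiro-Francia W': squared correlation of the sorted data with Blom normal scores.

    Values close to 1 are consistent with normality; no p-value is computed here.
    """
    n = len(x)
    xs = sorted(x)
    std_normal = NormalDist()
    scores = [std_normal.inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]
    mx, ms = fmean(xs), fmean(scores)
    sxy = sum((a - mx) * (b - ms) for a, b in zip(xs, scores))
    sxx = sum((a - mx) ** 2 for a in xs)
    sss = sum((b - ms) ** 2 for b in scores)
    return sxy ** 2 / (sxx * sss)


if __name__ == "__main__":
    rng = random.Random(42)
    normal = [rng.gauss(0.0, 0.01) for _ in range(5000)]
    # The difference of two i.i.d. exponentials is Laplace distributed (heavy-tailed).
    laplace = [rng.expovariate(100.0) - rng.expovariate(100.0) for _ in range(5000)]
    for name, sample in (("normal", normal), ("laplace", laplace)):
        print(name, round(excess_kurtosis(sample), 2), round(shapiro_francia_w(sample), 4))
```

The Laplace sample has a clearly positive excess kurtosis and a lower $W'$ than the normal sample of the same size.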

Volatility Clustering
The next stylized fact we studied was volatility clustering, the tendency of large price changes to cluster together. Volatility clustering can be seen in Figure 6, where the ACFs of the absolute returns decay slowly. In both cases, the autocorrelations of absolute returns remained significantly positive over many lags, which is clear evidence of volatility clustering.
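Volatility clustering appears as slowly decaying positive autocorrelation in $|r(t)|$ while $r(t)$ itself is nearly uncorrelated. The sketch below shows this on a synthetic series whose volatility alternates between calm and turbulent blocks; it is an illustration, not DJIA or model data, and `acf` and `regime_returns` are hypothetical helpers.

```python
import random
from statistics import fmean
from typing import List, Sequence


def acf(x: Sequence[float], lag: int) -> float:
    """Sample autocorrelation at `lag` (the estimator used in standard ACF plots)."""
    mu = fmean(x)
    denom = sum((v - mu) ** 2 for v in x)
    num = sum((x[t] - mu) * (x[t - lag] - mu) for t in range(lag, len(x)))
    return num / denom


def regime_returns(n: int, block: int, calm: float, turbulent: float,
                   rng: random.Random) -> List[float]:
    """Synthetic returns whose volatility alternates between calm and turbulent blocks."""
    return [rng.gauss(0.0, turbulent if (t // block) % 2 else calm) for t in range(n)]
```

Applied to such a series, `acf(returns, 1)` stays close to zero while `acf([abs(r) for r in returns], k)` remains clearly positive for small lags k, mirroring the pattern in Figure 6.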

Discussion and Concluding Remarks
In this work, we proposed a novel approach to calibrating hyperparameters in an agent-based model with different types of traders. Bayesian calibration with the Kolmogorov-Smirnov test yielded an optimal set of parameters. The model reproduced some of the important stylized facts of stock returns, such as leptokurtosis, heavy-tailed returns, and volatility clustering.
Our Bayesian approach differs from recent studies on the calibration of ABMs applied to financial markets [12,15,16]. Those calibrations consider large parameter sets, which can make them complex and costly. In particular, in [15], the authors generated 400,000 initial points to calibrate their model. In [12], the amount of data was reduced because of the high computational complexity of the Island model. Our approach is based on Bayes' rule and uses the results of previous evaluations to narrow the search space. Thus, the calibration requires less time and fewer iterations to find the optimal hyperparameters. Moreover, the calibrated results showed that fundamental traders accounted for 9%–11% of all traders in the United States stock market. The proportion of traders in the stock market provides an additional indicator that traders can use to develop effective strategies.
Although Bayesian approaches offer several advantages, the complexity of the objective function could be a problem, since each evaluation still has its own cost. In our model, the Sharpe ratio used as the objective function is very simple to calculate. However, if we consider a more complex model with more types of investors and interactions between them, the dimension of the domain space will increase. Optimal settings differ between datasets, and in a high-dimensional space, the correlations between hyperparameters are difficult to study.
This work is only a first step toward fully assessing the properties of agent-based models of the United States stock market. In future research, the model can be applied to other exchanges, such as the New York Stock Exchange, or to emerging markets in Asia. Different market and investor characteristics may yield many interesting results for managers and investors. Moreover, deep learning is a good methodology for dealing with high-dimensional problems. Reinforcement learning could also be used to address the limitations of Bayesian optimization [42], with our Bayesian approach serving as a benchmark for comparison.