The Outperformance Probability of Mutual Funds

We propose the outperformance probability as a new performance measure, which can be used to compare a strategy with a specified benchmark, and we develop the basic statistical properties of its maximum-likelihood estimator in a Brownian-motion framework. These results are used to investigate whether mutual funds are able to beat the S&P 500 or the Russell 1000. Most of the mutual funds under consideration are, in fact, able to beat the market. We argue that one should refer to differential returns when comparing a strategy with a given benchmark, rather than comparing both the strategy and the benchmark with the money-market account. The latter approach explains why mutual funds often appear to underperform the market, but this conclusion is fallacious.


Motivation
The total value of assets under management in open-end funds was 46.7 trillion US dollars at the end of 2018 (ICI 2019). Nowadays, investors and investment advisors have access to a large number of performance measures for comparing investment vehicles with each other. Most performance measures are based on the seminal work of Jensen (1968), Lintner (1965), Mossin (1966), Sharpe (1964, 1966), and Treynor (1961). The best-known performance measure is the Sharpe ratio, which divides the expected excess return on investment by the standard deviation of the excess return. Most other performance measures in the literature follow the same principle, i.e., they divide the return on investment by its risk, where the precise meaning of "return" and "risk" differs from one performance measure to another. A typical example is the expected excess return to value at risk, which goes back to Alexander and Baptista (2003) as well as Dowd (2000). Other examples are conditional Sharpe ratios, for which one chooses the expected shortfall, the tail conditional expectation, or a similar downside risk measure as the denominator (Baweja et al. 2015; Chow and Lai 2015). The number of possible combinations of numerators and denominators is practically unbounded. Depending on whether one chooses the expected excess return or the first higher partial moment as the numerator and some lower partial moment, drawdown measure, or value-at-risk measure as the denominator, the resulting performance measure is called Omega (Keating and Shadwick 2002), Sortino ratio (Sortino et al. 1999), Calmar ratio (Young 1991), Sterling ratio (Kestner 1996), Burke ratio (Burke 1994), etc. A good overview of these performance measures can be found in Table 1 of Eling and Schuhmacher (2006). However, any list of return-to-risk measures is necessarily far from exhaustive.
In this work, we propose a new performance measure called the outperformance probability (OP). It differs in several respects from return-to-risk measures:
1. The OP compares some strategy with a specified benchmark, which need not be the money-market account.
2. It is a probability. Thus, it is easy to understand, also for a nonacademic audience, i.e., for people who are not trained in statistics or probability theory.
3. The holding period of the investor is considered random. This enables us to compute the performance of an investment opportunity for arbitrary liquidity preferences.
Ad 1: Most financial studies that make use of performance measures compare, e.g., the Sharpe ratio (Sharpe 1966) of some strategy with the Sharpe ratio of a benchmark. These Sharpe ratios measure the outperformance of the strategy and of the benchmark, respectively, with respect to the money-market account, i.e., the riskless asset. Hence, one needs to calculate two performance measures to evaluate the given strategy. By contrast, our approach is based on the idea of analyzing differential returns, not necessarily excess returns.[1] This idea goes back to Sharpe (1994). More precisely, since we do not use the money market as an anchor point, we need only one measure to compare two different investment opportunities. We will see that this sheds a completely different light on the question of whether or not one should prefer actively managed funds to passively managed funds. This fundamental question is still discussed in the finance literature, and it is commonplace that most fund managers are not able to beat their benchmarks after accounting for all management fees and agency costs. We will come back to this point in Section 3.1. However, our empirical study reveals that most actively managed funds are, in fact, able to outperform their benchmarks. This observation might be explained as follows: The difference between the performance of two investment opportunities is not the same as the performance of one investment opportunity compared to the other. Simply put, performance measures are, in general, not linear. Let $R_S$ be the return on some strategy, $R_B$ be the return on a benchmark, and $r$ be the risk-free interest rate after some holding period. Further, let $\pi$ be any performance measure. Hence, the performance of the strategy with respect to the money-market account is $\pi(R_S - r)$, whereas $\pi(R_B - r)$ is the corresponding performance of the benchmark.
Now, the problem is that, in general,
$$\pi(R_S - R_B) \neq \pi(R_S - r) - \pi(R_B - r).$$
We suggest calculating $\pi(R_S - R_B)$ in order to understand whether or not the strategy is better than its benchmark, rather than comparing $\pi(R_S - r)$ with $\pi(R_B - r)$. By comparing $\pi(R_S - r)$ with $\pi(R_B - r)$, one investigates which of the two investment opportunities is better able to outperform the money-market account. However, the fact that one investment opportunity is better able to outperform the riskless asset than another does not imply that the first investment opportunity outperforms the second one. This observation is crucial, and we will provide an analytical example in Section 2.4.
Ad 2: We do not claim that the OP is more practicable than other performance measures. Indeed, given the computational capacities available nowadays, it is quite easy to compute any other performance measure as well. Instead, we refer to a social problem that is frequently discussed in the literature on statistical literacy, namely that a major part of the population has no clear understanding of basic statistics. Hence, most people cannot comprehend the precise meaning of "expected value," "standard deviation," "variance," etc. Presumably, every one of us knows that it is hard to explain the difference between the sample mean and the expected value to a layperson. The same holds true regarding the empirical variance, sample variance, population variance, etc. Further, our own experience shows that, at the beginning of their studies, many finance students have difficulty understanding the distinction between an ex-ante and an ex-post performance measure, which can be attributed to the fact that they are struggling with statistics and probability theory.[2] Thus, we are especially concerned about the problem of financial literacy, not only among an academic audience. According to the OECD (2016), the "overall levels of financial literacy, indicated by combining scores on knowledge, attitudes and behaviour are relatively low" in all 30 countries that participated in the OECD study. More precisely, it is reported that "only 42% of adults across all participating countries and economies are aware of the additional benefits of interest compounding on savings" and that "only 58% could compute a percentage to calculate a simple interest on savings." Moreover, "only about two in three adults [. . . ] were aware that it is possible to reduce investment risk by buying a range of different stocks" and, in some countries, "no more than half of respondents understood the financial concept of diversification." The OECD study reveals a very low level of numeracy: Many people are not even able to calculate the balance of an account after an interest payment of 2%. Similar results can be found in the comprehensive review article by Lusardi and Mitchell (2013). For example, many people cannot say whether single stocks or mutual funds are riskier.[3] In light of these findings, we conclude that it is clearly impossible for most people to comprehend the core message of a return-to-risk measure. The OP is a probability, and thus it can be much better understood by a nonacademic audience, at least intuitively. Hence, it might help bridge theory and practice.
[1] Each excess return is a differential return but not vice versa.
Ad 3: Performance measures usually presume a fixed holding period. It is well known that a given performance ratio cannot be extended to other holding periods without making simplifying, and sometimes quite problematic, assumptions, which are frequently discussed in the literature (see, e.g., Lin and Chou 2003; Lo 2002; Sharpe 1994). For example, suppose that the value process $\{S_t\}_{t \geq 0}$ of some asset follows a geometric Brownian motion with drift coefficient $\mu \in \mathbb{R}$ and diffusion coefficient $\sigma > 0$.[4] We may assume without loss of generality that the instantaneous risk-free interest rate per year is zero. Hence, the (excess) return on the asset at time $T > 0$ is given by
$$R = \frac{S_T}{S_0} - 1 = \exp\!\left(\left(\mu - \frac{\sigma^2}{2}\right)T + \sigma W_T\right) - 1,$$
where $\{W_t\}$ is a standard Brownian motion, which means that the expected return is $E(R) = e^{\mu T} - 1$. Further, the variance of the return amounts to $\mathrm{Var}(R) = \left(e^{\sigma^2 T} - 1\right)e^{2\mu T}$. We conclude that the Sharpe ratio of the strategy is
$$\mathrm{Sh} = \frac{E(R)}{\sqrt{\mathrm{Var}(R)}} = \frac{e^{\mu T} - 1}{\sqrt{\left(e^{\sigma^2 T} - 1\right)e^{2\mu T}}}.$$
Figure 1 depicts the Sharpe ratio of two different assets depending on the holding period T. Asset 1 possesses the parameters $\mu_1 = 0.1$ and $\sigma_1 = 0.2$, whereas Asset 2 has the parameters $\mu_2 = 0.2$ and $\sigma_2 = 0.3$. We can see that the Sharpe ratio is essentially determined by the time of liquidation. Interestingly, the Sharpe ratio is neither linear nor even monotonic in time. If we use the Sharpe ratio as a performance measure, it can happen that we prefer one asset for shorter holding periods but the other asset for longer holding periods: In the given example, Asset 2 has the higher Sharpe ratio for short holding periods, whereas Asset 1 has the higher Sharpe ratio for long holding periods.[5] Hence, the optimal choice between Asset 1 and Asset 2 heavily depends on the investment horizon. That is, in order to give a clear recommendation, we must know the holding period of the investor. However, most investors do not liquidate their strategies after a fixed period of time. More precisely, the time of liquidation is not known to them in advance, which means that it represents a random variable.
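As a numerical check of this horizon dependence, the following Python sketch evaluates the closed-form Sharpe ratio $(e^{\mu T}-1)/\sqrt{(e^{\sigma^2 T}-1)e^{2\mu T}}$ for the two assets of Figure 1 and locates, by bisection, the holding period at which their Sharpe ratios coincide. The helper names are illustrative and not taken from the text, and the bisection assumes a single sign change on the search interval.

```python
import math


def sharpe_ratio(mu: float, sigma: float, horizon: float) -> float:
    """Sharpe ratio of R = S_T/S_0 - 1 for a GBM with drift mu, diffusion sigma (r = 0)."""
    expected = math.exp(mu * horizon) - 1.0
    variance = (math.exp(sigma ** 2 * horizon) - 1.0) * math.exp(2.0 * mu * horizon)
    return expected / math.sqrt(variance)


def crossing_time(a: tuple, b: tuple, lo: float, hi: float, tol: float = 1e-10) -> float:
    """Bisection for the horizon at which assets a and b have equal Sharpe ratios.
    Assumes the difference changes sign exactly once on [lo, hi]."""
    def diff(t: float) -> float:
        return sharpe_ratio(*a, t) - sharpe_ratio(*b, t)

    f_lo = diff(lo)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = diff(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid  # same sign as the left end: root lies to the right
        else:
            hi = mid
    return 0.5 * (lo + hi)


asset_1 = (0.1, 0.2)  # (mu_1, sigma_1)
asset_2 = (0.2, 0.3)  # (mu_2, sigma_2)

for T in (1.0, 5.0, 10.0):
    print(T, round(sharpe_ratio(*asset_1, T), 4), round(sharpe_ratio(*asset_2, T), 4))

t_star = crossing_time(asset_1, asset_2, 1.0, 10.0)
print(f"Sharpe ratios coincide at T = {t_star:.2f} years")
```

Under these parameters, Asset 2 has the higher Sharpe ratio at T = 1, Asset 1 has the higher one at T = 10, and the crossing lies slightly above 5 years, in line with footnote 5. Both ratios eventually decay towards zero as T grows, which is the non-monotonicity mentioned above.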
We take the individual liquidity preference of the investor into account by specifying a holding-time distribution. Our approach is time-continuous, whereas most other performance measures are based on a one-period model. This clearly distinguishes the OP from other performance measures, and we consider it one of our main contributions to the literature.
[2] It is a matter of fact that some students do not even understand the distinction between parameter and estimator after attending a statistics course.
[3] According to Lusardi and Mitchell (2013, pp. 15-16), we can expect that a large number of people cannot understand the question at all because they are unfamiliar with stocks, bonds, and mutual funds.
[4] From now on, we will omit the subscript "t ≥ 0" for notational convenience.
[5] In the given example, the critical time point is approximately T = 5 years.
In order to test the OP of different trading strategies, we selected 10 actively managed mutual funds that try to beat the S&P 500 or the Russell 1000 stock-market index. We chose these funds because they focus on growth stocks in the US, have a large amount of assets under management, and their issuers enjoy a good reputation. The question of whether or not it is worth investing in actively managed funds at all is essential because, if they do not outperform their benchmarks, market participants might be better advised to buy exchange-traded funds (ETFs). More precisely, they should prefer so-called index ETFs, which aim at tracking a stock-market index. Analogously, if it turns out that the mutual funds are not able to outperform the bond market, it could be better to buy some US treasury-bond ETFs.[6] Further, it could happen that the money-market account is preferable and, in the worst case, that it is even better to stay away from financial markets altogether.
[6] The reason why we focus on ETFs is discussed in Section 3.1.
What are the merits of the OP compared to other performance measures?
1. The first one is conceptual:
(a) A simple thought experiment reveals that comparing two performance measures with one another, where each one compares an investment opportunity with the money-market account, can lead to completely different conclusions than evaluating a single performance measure that compares the given investment opportunities directly, without taking the money-market account into consideration at all. The former comparison addresses the question of which of the two investment opportunities is better able to outperform the money-market account, whereas the latter addresses the question of whether the first investment opportunity is able to outperform the second. In general, performance measures are not linear, and so the former comparison does not imply the latter, i.e., an investment opportunity that is better able to produce excess returns than another investment opportunity need not be better than the other.
(b) Most performance measures presume that the holding period of the investor is fixed. This assumption is clearly violated in real life because investors usually do not know in advance when they will liquidate their assets. We solve this problem by incorporating an arbitrary holding-time distribution, which specifies the individual liquidity preference of the investor. It can be either discrete or continuous, and it can have a finite or an infinite right endpoint. Any fixed holding period can be considered a special case, which means that we can treat one-period models, too.

2. The second one is empirical:
(a) The natural logarithm of the assets under management of the mutual funds under consideration is highly correlated with their inverse coefficient of variation (ICV). The ICV is a return-to-risk measure that is based on differential log-returns, not (necessarily) on excess log-returns, and in the Brownian-motion framework it is the main ingredient of the OP. This means that capital allocation and relative performance are strongly connected, which suggests that market participants implicitly take differential (log-)returns into account when making their investment decisions.
(b) We underpin our results by comparing the p-values of the differences between the Sharpe ratios of all mutual funds and the Sharpe ratios of the given benchmarks. These p-values indicate that it is hard to distinguish between the former and the latter Sharpe ratios. For this reason, we cannot say that any fund is better than its benchmark by comparing two Sharpe ratios with one another. A completely different picture emerges when considering the p-values of the ICVs of all funds with respect to their benchmarks. Those p-values are much lower and economically significant, i.e., it turns out that most funds are able to beat their benchmarks.
The rest of this work is organized as follows: In Section 2 we present and discuss the OP. To be more precise, Section 2.1 contains our basic assumptions and our general definition of the OP. In Section 2.2 we investigate its theoretical properties, whereas in Section 2.3 we derive the statistical properties of our maximum-likelihood (ML) estimator for the OP. Section 2.4 contains a general discussion of the OP compared to other performance measures. Further, Section 3 contains the empirical part of this work, in which we investigate the question of whether or not mutual funds are able to beat their benchmarks. More precisely, in Section 3.1 we discuss some general observations related to the performance of mutual funds and index ETFs, whereas Section 3.2 contains our empirical results for different holding-time distributions. Our conclusion is given in Section 4.

Basic Assumptions and Definition
Let $\{S_t\}$ be the value process of some trading strategy and $\{B_t\}$ be the value process of a benchmark. We implicitly assume that $\{S_t\}$ and $\{B_t\}$ are almost surely positive. The strategy starts at time $t = 0$ and stops at some random time $T > 0$. Hence, $T$ can be considered the time of liquidation and is referred to as the holding period. Throughout this work, time is measured in years. Moreover, we suppose that $S_0 = B_0 = 1$ without loss of generality.
Definition 1 (Outperformance probability). The OP is defined as $\Pi = P(S_T > B_T)$.
Hence, Π is the probability that the value of the strategy will be greater than the value of the benchmark when the investor liquidates the strategy. The time of liquidation, T, is considered a random variable, and so the law of total probability leads us to
$$\Pi = E\big[P(S_T > B_T \mid T)\big].$$
The performance measure Π ranges between 0 and 1. In the case that Π = 1, the strategy is always better than the benchmark, where "always" means "with probability 1," i.e., almost surely. This can be considered a limiting case, which usually does not appear in real life because otherwise the investor would have a weak arbitrage opportunity (Frahm 2016).[7] By contrast, if Π = 0, the benchmark turns out to be always at least as good as the strategy, which does not mean that the benchmark is always better.[8] More generally, if the strategy outperforms its benchmark with probability Π, the benchmark outperforms the strategy with probability less than or equal to 1 − Π, since the event $S_T = B_T$ may have positive probability. Therefore, the OP is not symmetric. Finally, if Π is near 0.5, there is no clear recommendation for or against the strategy. However, if a fund manager claims to be better than the benchmark, we should expect Π to be greater than 0.5.
We make the basic assumption that the time of liquidation does not depend on whether or not the strategy outperforms the benchmark, viz.
$$P(S_T > B_T \mid T = t) = P(S_t > B_t) \quad \text{for all } t > 0.$$
Put another way, the investor's decision whether or not to terminate the strategy at time t does not depend on its performance relative to the benchmark. It follows that
$$\Pi = \int_0^\infty P(S_t > B_t)\,\mathrm{d}F(t),$$
where F represents the cumulative distribution function of the holding period T.

Theoretical Properties
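In this work, we rely on the standard model of financial mathematics, i.e., we suppose that $\{S_t\}$ and $\{B_t\}$ follow a 2-dimensional geometric Brownian motion. This means that the stochastic processes obey the stochastic differential equations
$$\frac{\mathrm{d}S_t}{S_t} = \mu_S\,\mathrm{d}t + \sigma_S\,\mathrm{d}W_t^S, \qquad \frac{\mathrm{d}B_t}{B_t} = \mu_B\,\mathrm{d}t + \sigma_B\,\mathrm{d}W_t^B,$$
where $\{W_t^S\}$ and $\{W_t^B\}$ are (possibly correlated) standard Brownian motions, and we define $\gamma_S := \mu_S - \sigma_S^2/2$ and $\gamma_B := \mu_B - \sigma_B^2/2$. The numbers $\gamma_S$ and $\gamma_B$ are called the growth rates of $\{S_t\}$ and $\{B_t\}$, respectively (Frahm 2016). Thus, we have that
$$\log S_t - \log B_t = (\gamma_S - \gamma_B)\,t + \sigma_S W_t^S - \sigma_B W_t^B$$
and, in particular, $X := \log S_1 - \log B_1 \sim \mathcal{N}(\gamma_S - \gamma_B, \sigma_X^2)$ with $\sigma_X^2 := \mathrm{Var}(X)$. Hence, the random variable X represents the relative log-return on the strategy with respect to its benchmark after the first period of time. We implicitly assume that $\sigma_X^2 > 0$ in order to avoid the case in which X is degenerate, i.e., constant with probability 1. In our Brownian-motion framework, this happens if and only if $\Pi \in \{0, 1\}$. It follows that
$$P(S_t > B_t) = P(\log S_t - \log B_t > 0) = \Phi\!\left(\sqrt{t}\,\frac{\gamma_S - \gamma_B}{\sigma_X}\right),$$
where Φ is the cumulative distribution function of the standard normal distribution. This leads us to
$$\Pi = \int_0^\infty \Phi\!\left(\sqrt{t}\,\frac{\mu_S - \sigma_S^2/2 - \mu_B + \sigma_B^2/2}{\sigma_X}\right)\mathrm{d}F(t),$$
which demonstrates that the OP depends not only on the drift coefficient $\mu_S$ but also, essentially, on the diffusion coefficient $\sigma_S$ of the considered strategy. For example, if we choose the money-market account as a benchmark, we obtain $\mu_B = r$ and $\sigma_B = 0$, where r is the instantaneous risk-free interest rate per year. In this standard case, the OP amounts to
$$\Pi = \int_0^\infty \Phi\!\left(\sqrt{t}\,\frac{\mu_S - \sigma_S^2/2 - r}{\sigma_S}\right)\mathrm{d}F(t).$$
In general, the benchmark is risky, and we can see that the inverse coefficient of variation of the strategy with respect to its benchmark, i.e.,
$$\mathrm{ICV} = \frac{E(X)}{\sqrt{\mathrm{Var}(X)}} = \frac{\gamma_S - \gamma_B}{\sigma_X},$$
plays a crucial role when calculating the OP. Note that the ICV refers to the differential log-return $X = \log(S_1) - \log(B_1)$, not to the excess log-return $\log(S_1) - r$. Now we are ready for our first theorem. Its proof is contained in the preceding explanations and is therefore omitted.
[7] More precisely, the strategy would dominate the benchmark in the sense of Merton (1973).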
Theorem 1 (Outperformance probability). Under the aforementioned assumptions, the OP is given by
$$\Pi = \int_0^\infty \Phi\big(\sqrt{t}\,\mathrm{ICV}\big)\,\mathrm{d}F(t). \tag{1}$$
Hence, Π is strictly increasing in ICV. This means that the higher the ICV, the higher the OP.[9] However, under the given assumptions, the OP can never be 0 or 1.
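The building block $P(S_t > B_t) = \Phi(\sqrt{t}\,\mathrm{ICV})$ of Theorem 1 can be checked by simulation. The Python sketch below assumes, beyond the text, that the two Brownian motions have a constant correlation ρ, so that $\mathrm{Var}(X) = \sigma_S^2 + \sigma_B^2 - 2\rho\sigma_S\sigma_B$; all parameter values and function names are hypothetical and only serve as an illustration.

```python
import math
import random


def norm_cdf(x: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def icv_gbm(mu_s, sigma_s, mu_b, sigma_b, rho):
    """ICV of X = log S_1 - log B_1 for two GBMs with assumed correlation rho."""
    growth_diff = (mu_s - 0.5 * sigma_s ** 2) - (mu_b - 0.5 * sigma_b ** 2)
    var_x = sigma_s ** 2 + sigma_b ** 2 - 2.0 * rho * sigma_s * sigma_b
    return growth_diff / math.sqrt(var_x)


def mc_outperformance(mu_s, sigma_s, mu_b, sigma_b, rho, t, n_paths=100_000, seed=1):
    """Monte Carlo estimate of P(S_t > B_t) with S_0 = B_0 = 1."""
    rng = random.Random(seed)
    growth_diff = (mu_s - 0.5 * sigma_s ** 2) - (mu_b - 0.5 * sigma_b ** 2)
    root_t = math.sqrt(t)
    hits = 0
    for _ in range(n_paths):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rng.gauss(0.0, 1.0)
        w_s = root_t * z1                                            # W_t^S
        w_b = root_t * (rho * z1 + math.sqrt(1.0 - rho ** 2) * z2)  # W_t^B, correlation rho
        log_diff = growth_diff * t + sigma_s * w_s - sigma_b * w_b   # log S_t - log B_t
        hits += log_diff > 0.0
    return hits / n_paths


params = dict(mu_s=0.12, sigma_s=0.20, mu_b=0.08, sigma_b=0.15, rho=0.9)  # hypothetical
t = 5.0
closed_form = norm_cdf(math.sqrt(t) * icv_gbm(**params))
estimate = mc_outperformance(**params, t=t)
print(f"closed form: {closed_form:.4f}, Monte Carlo: {estimate:.4f}")
```

The two numbers should agree up to Monte Carlo error (roughly 0.001 for 100,000 paths). Note that only the growth-rate difference and Var(X) enter, which is why the ICV alone drives the OP.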
For example, suppose that the holding period is fixed and equal to T > 0. In this case, the OP simply amounts to $\Pi = \Phi(\sqrt{T}\,\mathrm{ICV})$. By contrast, if the holding period is uniformly distributed between 0 and M > 0, we obtain
$$\Pi = \frac{1}{M}\int_0^M \Phi\big(\sqrt{t}\,\mathrm{ICV}\big)\,\mathrm{d}t.$$
Further, if the holding period is exponentially distributed with parameter λ > 0, we have that
$$\Pi = \lambda\int_0^\infty \Phi\big(\sqrt{t}\,\mathrm{ICV}\big)\,e^{-\lambda t}\,\mathrm{d}t,$$
and if we assume that T has a Weibull distribution with parameters κ > 0 and γ > 0, we obtain
$$\Pi = \int_0^\infty \Phi\big(\sqrt{t}\,\mathrm{ICV}\big)\,\frac{\kappa}{\gamma}\left(\frac{t}{\gamma}\right)^{\kappa-1} e^{-(t/\gamma)^{\kappa}}\,\mathrm{d}t.$$
Note that for κ = 1 and γ = λ^{−1}, the Weibull distribution turns into an exponential distribution with parameter λ. The spectrum of possible holding-time distributions is practically unlimited. In this work, we restrict ourselves to the aforementioned holding-time distributions. F need not be continuous or even differentiable; for instance, the first holding-time distribution, which asserts that T is a constant, is deterministic, which poses no problem at all. Figure 2 depicts the densities of the uniform, exponential, and Weibull distributions for different parameterizations. We have chosen all parameters such that the mean holding period equals 5 years. Moreover, Figure 3 contains some curves that quantify the OP depending on the ICV. On the left-hand side, we can see how Π depends on ICV if the holding period T is considered a constant, which is the usual assumption in the finance literature, whereas on the right-hand side it is assumed that T is uniformly distributed between 0 and M, i.e., the maximum number of years that the investor maintains the given strategy.
[9] Thus, if a strategy has a higher OP than another strategy, given some holding-time distribution, the same holds true for any other holding-time distribution.
Figure 3: OP depending on the ICV and the (maximal) holding period. On the left-hand side, the holding period T is considered fixed, whereas on the right-hand side, T is supposed to be uniformly distributed between 0 and M years.
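Equation (1) can be evaluated for any holding-time distribution by writing $T = F^{-1}(U)$ with U uniform on (0, 1), so that $\Pi = \int_0^1 \Phi\big(\sqrt{F^{-1}(p)}\,\mathrm{ICV}\big)\,\mathrm{d}p$. The Python sketch below applies the midpoint rule to this representation for the fixed, uniform, exponential, and Weibull cases, treating κ as the Weibull shape and γ as its scale (which reproduces the exponential case for κ = 1 and γ = 1/λ). This quantile-based quadrature is our illustrative choice, not necessarily how the figures were produced; because the integrand is bounded and monotone in p, its error is of order 1/n_grid. For the exponential case, integration by parts yields the closed form $\Pi = \tfrac{1}{2}\big(1 + \mathrm{ICV}/\sqrt{2\lambda + \mathrm{ICV}^2}\big)$, which serves as a check.

```python
import math


def norm_cdf(x: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def op_from_quantile(icv: float, quantile, n_grid: int = 20_000) -> float:
    """Pi = integral over p in (0, 1) of Phi(sqrt(F^{-1}(p)) * ICV), midpoint rule."""
    total = 0.0
    for k in range(n_grid):
        p = (k + 0.5) / n_grid
        total += norm_cdf(math.sqrt(quantile(p)) * icv)
    return total / n_grid


def op_fixed(icv: float, horizon: float) -> float:
    return norm_cdf(math.sqrt(horizon) * icv)


def op_uniform(icv: float, m_max: float) -> float:
    return op_from_quantile(icv, lambda p: p * m_max)


def op_exponential(icv: float, lam: float) -> float:
    return op_from_quantile(icv, lambda p: -math.log1p(-p) / lam)


def op_weibull(icv: float, kappa: float, gamma: float) -> float:
    # Weibull quantile with shape kappa and scale gamma.
    return op_from_quantile(icv, lambda p: gamma * (-math.log1p(-p)) ** (1.0 / kappa))


def op_exponential_closed_form(icv: float, lam: float) -> float:
    return 0.5 * (1.0 + icv / math.sqrt(2.0 * lam + icv ** 2))


# Parameters chosen so that E(T) = 5 years in every case.
weibull_scale = 5.0 / math.gamma(1.5)  # shape 2: E(T) = scale * Gamma(1 + 1/2)
for icv in (0.1, 0.3, 0.5):
    print(
        icv,
        round(op_fixed(icv, 5.0), 4),
        round(op_uniform(icv, 10.0), 4),
        round(op_exponential(icv, 0.2), 4),
        round(op_weibull(icv, 2.0, weibull_scale), 4),
    )
```

Because every one of these holding-time distributions enters only through the weights of $\Phi(\sqrt{t}\,\mathrm{ICV})$, the ranking of two strategies by OP is the same for all of them, as stated in footnote 9.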

Statistical Inference
Here, we propose a parametric estimator for Π, based on the assumption that the value processes $\{S_t\}$ and $\{B_t\}$ follow a 2-dimensional geometric Brownian motion. Let
$$X_t = \log\frac{S_t}{S_{t-\Delta}} - \log\frac{B_t}{B_{t-\Delta}}$$
be the difference between the log-return on the strategy and the log-return on the benchmark at time t.
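The period between t − Δ and t, i.e., Δ > 0, is fixed. Throughout this work, we assume that one year consists of 252 trading days, and so we have that Δ = 1/252. Given n observations $X_\Delta, X_{2\Delta}, \ldots, X_{n\Delta}$, the ML-estimator for the ICV is
$$\widehat{\mathrm{ICV}}_n = \frac{\bar{X}_n}{\sqrt{\Delta}\,\hat{\sigma}_n}, \qquad \bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_{i\Delta}, \qquad \hat{\sigma}_n^2 = \frac{1}{n}\sum_{i=1}^{n}\big(X_{i\Delta} - \bar{X}_n\big)^2.$$
Our first proposition asserts that the ML-estimator for the ICV is strongly consistent.
Proposition 1. $\widehat{\mathrm{ICV}}_n \to \mathrm{ICV}$ almost surely as $n \to \infty$.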
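Proof. The strong law of large numbers implies that
$$\frac{\bar{X}_n}{\Delta} \to \gamma_S - \gamma_B \quad \text{and} \quad \frac{\hat{\sigma}_n^2}{\Delta} \to \mathrm{Var}(X) \quad \text{almost surely}$$
as n → ∞. Hence, the continuous mapping theorem immediately reveals that $\widehat{\mathrm{ICV}}_n$ converges almost surely to ICV as the number of observations, n, grows to infinity.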
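The estimator can be sketched in a few lines of Python. The example assumes i.i.d. normal differential log-returns, as implied by the geometric Brownian motion, and uses hypothetical annual parameters (growth-rate difference 0.03 and Std(X) = 0.10, hence ICV = 0.3). Note that the ML variance divides by n rather than n − 1.

```python
import math
import random

DELTA = 1.0 / 252.0  # one trading day, as assumed in the text


def icv_ml(xs, delta=DELTA):
    """ML estimate of the ICV from differential log-returns sampled every delta years."""
    n = len(xs)
    mean = sum(xs) / n
    var_ml = sum((x - mean) ** 2 for x in xs) / n  # ML variance: divide by n, not n - 1
    return mean / math.sqrt(delta * var_ml)


def simulate_diff_log_returns(growth_diff, std_x, n, delta=DELTA, seed=7):
    """Hypothetical i.i.d. X_t ~ N(growth_diff * delta, std_x**2 * delta)."""
    rng = random.Random(seed)
    return [rng.gauss(growth_diff * delta, std_x * math.sqrt(delta)) for _ in range(n)]


true_icv = 0.03 / 0.10
short_sample_estimate = icv_ml(simulate_diff_log_returns(0.03, 0.10, n=252 * 20))
long_sample_estimate = icv_ml(simulate_diff_log_returns(0.03, 0.10, n=252 * 2000))
print(true_icv, round(short_sample_estimate, 3), round(long_sample_estimate, 3))
```

With 20 years of daily data the estimate can still be far from 0.3, whereas an (unrealistically) long sample of 2000 years pins it down, which illustrates strong consistency and, at the same time, how slowly it sets in.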
The next proposition refers to the asymptotic distribution of $\sqrt{n}\,\big(\widehat{\mathrm{ICV}}_n - \mathrm{ICV}\big)$.
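Proposition 2. We have that
$$\sqrt{n}\,\big(\widehat{\mathrm{ICV}}_n - \mathrm{ICV}\big) \xrightarrow{d} \mathcal{N}\!\left(0, \frac{1}{\Delta} + \frac{\mathrm{ICV}^2}{2}\right).$$
Proof. The ICV can be treated like a Sharpe ratio, and thus $1 + \mathrm{ICV}^2/2$ is the asymptotic variance of $\sqrt{n}\,(\widehat{\mathrm{ICV}}_n - \mathrm{ICV})$ in the case that Δ = 1 (Frahm 2018, p. 6). For general Δ, the per-period ratio $\bar{X}_n/\hat{\sigma}_n$ estimates $\sqrt{\Delta}\,\mathrm{ICV}$ with asymptotic variance $1 + \Delta\,\mathrm{ICV}^2/2$, and dividing by $\sqrt{\Delta}$ yields the stated result.
Hence, provided that the sample size n is large, the standard error of $\widehat{\mathrm{ICV}}_n$ is
$$\mathrm{Std}\big(\widehat{\mathrm{ICV}}_n\big) \approx \sqrt{\frac{1}{n\Delta} + \frac{\mathrm{ICV}^2}{2n}}.$$
Our empirical study in Section 3 reveals that the ICV does not exceed 1 and, since the sample size n is large, we can ignore the term $\mathrm{ICV}^2/(2n)$.[10] Note also that $m := n\Delta$ corresponds to the sample length measured in years. Thus, we obtain the following convenient rule of thumb:
$$\mathrm{Std}\big(\widehat{\mathrm{ICV}}_n\big) \approx \frac{1}{\sqrt{m}}.$$
The ML-estimator for Π is obtained, in the usual way, by substituting ICV with $\widehat{\mathrm{ICV}}_n$.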
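The standard error and the rule of thumb are easy to tabulate. The Python sketch below compares the asymptotic standard error from Proposition 2 with $1/\sqrt{m}$ and adds a small Monte Carlo check under hypothetical parameters (ICV = 0.5, Std(X) = 1 per year, 5 years of daily data, 300 replications). The function names are illustrative.

```python
import math
import random

DELTA = 1.0 / 252.0


def icv_std_error(icv, n, delta=DELTA):
    """Asymptotic standard error of the ML-estimated ICV (Proposition 2)."""
    return math.sqrt(1.0 / (n * delta) + icv ** 2 / (2.0 * n))


def icv_std_rule_of_thumb(m_years):
    """Rule of thumb: drop ICV**2/(2n), leaving 1/sqrt(m) with m = n * delta."""
    return 1.0 / math.sqrt(m_years)


def icv_ml(xs, delta=DELTA):
    n = len(xs)
    mean = sum(xs) / n
    var_ml = sum((x - mean) ** 2 for x in xs) / n
    return mean / math.sqrt(delta * var_ml)


def mc_icv_std(icv, m_years, reps=300, seed=11, delta=DELTA):
    """Empirical std of the ICV estimate across simulated samples with Std(X) = 1 p.a."""
    rng = random.Random(seed)
    n = round(m_years / delta)
    estimates = []
    for _ in range(reps):
        xs = [rng.gauss(icv * delta, math.sqrt(delta)) for _ in range(n)]
        estimates.append(icv_ml(xs, delta))
    mean = sum(estimates) / reps
    return math.sqrt(sum((e - mean) ** 2 for e in estimates) / (reps - 1))


for m in (5, 10, 20, 40):
    print(m, round(icv_std_error(0.5, 252 * m), 4), round(icv_std_rule_of_thumb(m), 4))

empirical = mc_icv_std(0.5, 5)  # compare with icv_std_error(0.5, 1260), about 0.447
print(round(empirical, 4))
```

The comparison also shows why the sampling frequency hardly matters here: for a fixed sample length m, a larger n only shrinks the already negligible term $\mathrm{ICV}^2/(2n)$.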
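Definition 2 (ML-estimator). The ML-estimator for Π is given by
$$\hat{\Pi}_n = \int_0^\infty \Phi\big(\sqrt{t}\,\widehat{\mathrm{ICV}}_n\big)\,\mathrm{d}F(t).$$
The following theorem asserts that $\hat{\Pi}_n$ is strongly consistent.
Theorem 2. $\hat{\Pi}_n \to \Pi$ almost surely as $n \to \infty$.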
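Proof. From Proposition 1 we already know that $\widehat{\mathrm{ICV}}_n$ converges almost surely to ICV as n → ∞. According to Equation (1), we have that
$$\Pi = \int_0^\infty \Phi\big(\sqrt{t}\,\mathrm{ICV}\big)\,\mathrm{d}F(t),$$
and Π is continuous in ICV by dominated convergence, because Φ is continuous and bounded by 1. Hence, the continuous mapping theorem reveals that $\hat{\Pi}_n$ converges almost surely to Π as the number of observations, n, grows to infinity.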
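The next theorem provides the asymptotic distribution of $\sqrt{n}\,(\hat{\Pi}_n - \Pi)$, which can be used to calculate confidence intervals or conduct hypothesis tests. For this purpose, we need only assume that the expected value of $\sqrt{T}$ is finite.[11] In fact, in most practical applications we even have that E(T) < ∞, which holds true also for all holding-time distributions that are considered in this work. Hence, the moment condition is always satisfied here.
Theorem 3. Suppose that $E(\sqrt{T}) < \infty$. Then
$$\sqrt{n}\,\big(\hat{\Pi}_n - \Pi\big) \xrightarrow{d} \mathcal{N}\!\left(0, \left(\frac{1}{\Delta} + \frac{\mathrm{ICV}^2}{2}\right)\left(\int_0^\infty \sqrt{t}\,\varphi\big(\sqrt{t}\,\mathrm{ICV}\big)\,\mathrm{d}F(t)\right)^{2}\right),$$
where φ is the probability density function of the standard normal distribution.
Proof. For all t ≥ 0, the derivative of $\Phi(\sqrt{t}\,\mathrm{ICV})$ with respect to ICV is $\sqrt{t}\,\varphi(\sqrt{t}\,\mathrm{ICV}) \le \sqrt{t}/\sqrt{2\pi}$, and $E(\sqrt{T}) < \infty$ by assumption. Thus, we can apply the dominated convergence theorem, i.e.,
$$\frac{\mathrm{d}\Pi}{\mathrm{d}\,\mathrm{ICV}} = \int_0^\infty \sqrt{t}\,\varphi\big(\sqrt{t}\,\mathrm{ICV}\big)\,\mathrm{d}F(t).$$
Now, by applying the delta method and using Proposition 2, we obtain
$$\sqrt{n}\,\big(\hat{\Pi}_n - \Pi\big) \xrightarrow{d} \mathcal{N}\!\left(0, \left(\frac{\mathrm{d}\Pi}{\mathrm{d}\,\mathrm{ICV}}\right)^{2}\left(\frac{1}{\Delta} + \frac{\mathrm{ICV}^2}{2}\right)\right),$$
which leads us to the desired result.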
Hence, provided that n is sufficiently large, we can apply Theorem 3 to approximate the standard error of $\hat{\Pi}_n$ by
$$\mathrm{Std}\big(\hat{\Pi}_n\big) \approx \sqrt{\frac{1}{n\Delta} + \frac{\mathrm{ICV}^2}{2n}}\;\int_0^\infty \sqrt{t}\,\varphi\big(\sqrt{t}\,\mathrm{ICV}\big)\,\mathrm{d}F(t).$$
Figure 4 quantifies the standard error of $\hat{\Pi}_n$ as a function of m, i.e., the number of years of observation, for different ICVs. Note that $\mathrm{Std}(\hat{\Pi}_n)$ is an even function of ICV, and so we may focus on ICV ≥ 0. On the left-hand side, it is assumed that T equals 5 years, whereas on the right-hand side, T is supposed to be uniformly distributed between 0 and 10 years. Evidently, the specific choice of the holding-time distribution is not essential for the standard errors. They are high even if the sample length, m, is large. This is a typical problem of performance measurement (Frahm 2018). Further, we can see that the standard error of the ML-estimator essentially depends on the ICV. More precisely, the higher the ICV, the lower the standard error.
Figure 4: Standard error of $\hat{\Pi}_n$ depending on the ICV. On the left-hand side, the holding period T is considered fixed and is equal to 5 years, whereas on the right-hand side, T is supposed to be uniformly distributed between 0 and 10 years.
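The standard-error approximation of Theorem 3 can be evaluated numerically by writing $T = F^{-1}(U)$ with U uniform on (0, 1), so that $\mathrm{d}\Pi/\mathrm{d}\,\mathrm{ICV} = \int_0^1 \sqrt{F^{-1}(p)}\,\varphi\big(\sqrt{F^{-1}(p)}\,\mathrm{ICV}\big)\,\mathrm{d}p$. The Python sketch below covers the two settings described for Figure 4 (T = 5 years and T uniform on [0, 10]) for m = 20 years of data, assuming 252 trading days per year; function names are illustrative.

```python
import math


def norm_pdf(x: float) -> float:
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def dpi_dicv(icv: float, quantile, n_grid: int = 20_000) -> float:
    """dPi/dICV = E[sqrt(T) * phi(sqrt(T) * ICV)], midpoint rule over p = F(T)."""
    total = 0.0
    for k in range(n_grid):
        root_t = math.sqrt(quantile((k + 0.5) / n_grid))
        total += root_t * norm_pdf(root_t * icv)
    return total / n_grid


def op_std_error(icv: float, m_years: float, quantile, trading_days: int = 252) -> float:
    """Delta-method standard error of the estimated OP (Theorem 3)."""
    n = trading_days * m_years
    delta = 1.0 / trading_days
    return math.sqrt(1.0 / (n * delta) + icv ** 2 / (2.0 * n)) * dpi_dicv(icv, quantile)


def fixed_5(p: float) -> float:
    return 5.0  # T = 5 years with probability 1


def uniform_10(p: float) -> float:
    return 10.0 * p  # T uniform on [0, 10] years


for icv in (0.0, 0.25, 0.5, 1.0):
    print(icv, round(op_std_error(icv, 20, fixed_5), 4), round(op_std_error(icv, 20, uniform_10), 4))
```

Even with 20 years of daily data, the standard error at ICV = 0 is close to 0.2 in both settings, which illustrates why the standard errors remain high.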
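A typical null hypothesis is $H_0\colon \Pi \le \Pi_0$ with $\Pi_0 \in (0, 1)$, which can be tested by means of the p-value
$$p_n = 1 - \Phi\!\left(\frac{\hat{\Pi}_n - \Pi_0}{\mathrm{Std}(\hat{\Pi}_n)}\right),$$
where the standard error is evaluated at $\widehat{\mathrm{ICV}}_n$. More precisely, given some significance level α, $H_0$ can be rejected if and only if $p_n < \alpha$, in which case Π turns out to be significantly larger than $\Pi_0$.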
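A minimal sketch of this one-sided test in Python, assuming the normal approximation from Theorem 3 with the standard error evaluated at the estimated ICV. The inputs (an ICV estimate of 0.4, a fixed holding period of T = 5 years, and m = 20 years of daily data) are hypothetical.

```python
import math


def norm_cdf(x: float) -> float:
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def op_p_value(pi_hat: float, pi_0: float, std_error: float) -> float:
    """One-sided asymptotic p-value for H0: Pi <= pi_0 (normal approximation)."""
    return 1.0 - norm_cdf((pi_hat - pi_0) / std_error)


# Hypothetical inputs: fixed holding period T = 5, m = 20 years of daily data.
icv_hat, horizon, m_years, n = 0.4, 5.0, 20, 252 * 20
pi_hat = norm_cdf(math.sqrt(horizon) * icv_hat)
density = math.exp(-0.5 * horizon * icv_hat ** 2) / math.sqrt(2.0 * math.pi)  # phi(sqrt(T) * ICV)
std_error = math.sqrt(1.0 / m_years + icv_hat ** 2 / (2.0 * n)) * math.sqrt(horizon) * density
p_value = op_p_value(pi_hat, 0.5, std_error)
print(f"Pi_hat = {pi_hat:.3f}, Std = {std_error:.3f}, p-value = {p_value:.4f}")
```

In this example, the estimated OP of about 0.81 is significantly larger than 0.5 at the 5% level, although the standard error is roughly 0.13.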

Discussion
Classical performance measures are based on the Capital Asset Pricing Model (CAPM). The most prominent performance measures are the Sharpe ratio (Sharpe 1966), the Treynor ratio (Treynor 1961), and Jensen's Alpha (Jensen 1968). Let R be the discrete return on some strategy after a predefined period of time. Further, let r be the discrete risk-free interest rate after the given holding period and R M be the return on the market portfolio.

Consider the linear regression equation
$$R - r = \alpha + \beta\,(R_M - r) + \varepsilon,$$
where α is the Alpha, β is the Beta, and ε, with E(ε) = 0, represents the unsystematic risk of the strategy. Further, let $\sigma_\varepsilon^2$ be the variance of ε.
An actively managed fund aims at beating its benchmark. Let us suppose that the benchmark is the market portfolio. If the fund manager performs a strategy with β = 1, we have that
$$R - r = \alpha + (R_M - r) + \varepsilon,$$
and so we obtain the differential return $R - R_M = \alpha + \varepsilon$. Hence, if we want to know whether or not the fund is able to beat the market portfolio, we should analyze the differential return $R - R_M$ and not the excess return $R - r$. More precisely, we should calculate the generalized Sharpe ratio according to Sharpe (1994), i.e.,
$$\mathrm{Sh}_{1994} = \frac{E(R - R_M)}{\sqrt{\mathrm{Var}(R - R_M)}} = \frac{\alpha}{\sigma_\varepsilon},$$
and not the ordinary Sharpe ratio (Sharpe 1966), i.e.,
$$\mathrm{Sh}_{1966} = \frac{E(R - r)}{\sqrt{\mathrm{Var}(R - r)}},$$
which is based on the CAPM (Lintner 1965; Mossin 1966; Sharpe 1964). What happens if the fund's Beta does not equal 1? In this case, we should compare the fund with a benchmark strategy that possesses the Beta of the fund. Let β be the proportion of equity that is invested in the market portfolio and 1 − β be the proportion of equity that is deposited in the money market. The corresponding (benchmark) return is thus
$$R_B = \beta R_M + (1 - \beta)\,r.$$
Hence, we obtain the differential return $R - R_B = \alpha + \varepsilon$, which leads us to the same result as before, i.e., $\mathrm{Sh}_{1994} = \alpha/\sigma_\varepsilon$.
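The reduction to $R - R_B = \alpha + \varepsilon$ can be illustrated by simulation. The Python sketch below draws hypothetical market returns, generates fund returns from the regression equation with β = 0.8, and computes the ordinary Sharpe ratios as well as the generalized Sharpe ratio of the differential return against the β-matched benchmark, which should be close to α/σ_ε = 0.4. Normal distributions and all parameter values are assumptions made only for illustration.

```python
import math
import random


def sample_sharpe(xs):
    """Sample mean divided by sample standard deviation."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / (n - 1)
    return mean / math.sqrt(var)


# Hypothetical parameters for one holding period.
r, alpha, beta, sigma_eps = 0.02, 0.02, 0.8, 0.05
mu_m, sigma_m, n = 0.08, 0.20, 200_000

rng = random.Random(42)
market = [rng.gauss(mu_m, sigma_m) for _ in range(n)]
# Regression equation: R - r = alpha + beta * (R_M - r) + eps, with E(eps) = 0.
fund = [r + alpha + beta * (rm - r) + rng.gauss(0.0, sigma_eps) for rm in market]
# Benchmark with the fund's Beta: beta in the market, 1 - beta in the money market.
bench = [beta * rm + (1.0 - beta) * r for rm in market]

sh1966_fund = sample_sharpe([x - r for x in fund])
sh1966_market = sample_sharpe([x - r for x in market])
sh1994 = sample_sharpe([x - b for x, b in zip(fund, bench)])
print(round(sh1966_fund, 3), round(sh1966_market, 3), round(sh1994, 3), alpha / sigma_eps)
```

The differential return against the β-matched benchmark removes the market exposure entirely, so its Sharpe ratio isolates α/σ_ε, whereas each ordinary Sharpe ratio mixes α with the market's excess return.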
Why is it so important to consider differential returns instead of excess returns if we want to understand whether or not a given strategy is able to outperform its benchmark? The simplest illustration goes like this: Consider two random variables X and Y. The probability that X is positive is P(X > 0), and the probability that Y is positive is P(Y > 0). Suppose that we want to know the probability that X exceeds Y. Then we would calculate P(X > Y) = P(X − Y > 0), not P(X > 0) − P(Y > 0), and the same holds true for performance measurement.
This means that, in order to judge whether or not a strategy is able to outperform its benchmark, we should not compare one ordinary Sharpe ratio with another. Ordinary Sharpe ratios are calculated on the basis of excess returns, and so they answer the question of how well a given investment opportunity is able to beat the money-market account. Analogously, by comparing P(X > 0) with P(Y > 0), we answer the question of whether it is more probable that X or that Y exceeds 0. However, in performance measurement we want to know whether or not a strategy is able to outperform its benchmark, which is a completely different question that has nothing to do with the money-market account. Accordingly, we have to calculate the probability that X exceeds Y rather than compare P(X > 0) with P(Y > 0).
This shall be illustrated, in financial terms, by a simple thought experiment. Suppose, for the sake of simplicity but without loss of generality, that the risk-free interest rate is zero. Further, let us assume that the return on some mutual fund is $R = R_M + \varepsilon$, where ε is a positive random variable that is uncorrelated with $R_M$. Hence, the fund always yields a better return than the market portfolio, and so it is clearly preferable. The ordinary Sharpe ratio of the market portfolio is
$$\mathrm{Sh}^{1966}_M = \frac{\mu_M}{\sigma_M},$$
where $\mu_M > 0$ is the expected return on the market portfolio and $\sigma_M > 0$ represents the standard deviation of the return on the market portfolio. Thus, we have that $\mathrm{Sh}^{1966}_M > 0$. By contrast, the ordinary Sharpe ratio of the strategy is
$$\mathrm{Sh}^{1966}_S = \frac{\mu_M + \mu_\varepsilon}{\sqrt{\sigma_M^2 + \sigma_\varepsilon^2}},$$
where $\mu_\varepsilon > 0$ is the expected value and $\sigma_\varepsilon^2 > 0$ is the variance of ε; the variances add up because ε is uncorrelated with $R_M$. We conclude that $\mathrm{Sh}^{1966}_S$ is also positive. Now, depending on $\mu_\varepsilon$ and $\sigma_\varepsilon^2$, we can construct situations in which the fund appears to be better than the market portfolio and other situations in which it appears to be worse. To be more precise, we can find a positive random variable ε with a fixed expectation and an arbitrarily high variance. For example, suppose that $\varepsilon = \gamma + \delta\xi$, where $\gamma, \delta > 0$ and ξ is a random variable such that $P(\xi = 0) = 1 - p$ and $P(\xi = 1/p) = p > 0$. Hence, we have that $E(\varepsilon) = \gamma + \delta$ and $\mathrm{Var}(\varepsilon) = \delta^2(1 - p)/p$, which means that $\mathrm{Sh}^{1966}_S \searrow 0$ as $p \searrow 0$. Put another way, it holds that $\mathrm{Sh}^{1966}_S < \mathrm{Sh}^{1966}_M$ if we make p sufficiently low. Nonetheless, the OP of the fund is
$$\Pi = P(R > R_M) = P(\varepsilon > 0) = 1,$$
which means that the fund outperforms the market portfolio almost surely! It is evident that in such a case every rational investor should prefer the fund. Why do we come to such a misleading conclusion by comparing the two ordinary Sharpe ratios with one another? The reason is that we analyze the marginal distributions of $R$ and $R_M$ instead of their joint distribution. In fact, since ε is positive, it is better for the investor to have a large variance $\sigma_\varepsilon^2$, not a small one. Thus, it makes no sense to penalize the expected return on the fund by $\sigma_\varepsilon^2$, but this is precisely what we do when we calculate $\mathrm{Sh}^{1966}_S$. However, it is worth noting that, in this particular case, it makes no sense either to calculate the generalized Sharpe ratio
$$\mathrm{Sh}^{1994} = \frac{\mu_\varepsilon}{\sigma_\varepsilon}.$$
The problem is that ε is positive, which means that $\sigma_\varepsilon$ is the wrong risk measure in this context. However, irrespective of the parameters $\mu_\varepsilon$ and $\sigma_\varepsilon$, we always have that Π = 1, which clearly indicates that the fund is better than its benchmark. We could imagine plenty of other examples that are less striking, but the overall conclusion remains the same: one should use differential (log-)returns instead of excess (log-)returns when comparing some strategy with a benchmark.
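The thought experiment can be reproduced numerically. The following sketch uses hypothetical values µ_M = 0.08, σ_M = 0.20, γ = δ = 0.01 and a zero risk-free rate; it evaluates the two ordinary Sharpe ratios in closed form and estimates P(R > R_M) by simulation, drawing R_M from a normal distribution (an assumption made only for the simulation). The function names are illustrative.

```python
import math
import random


def sharpe_market(mu_m, sigma_m):
    """Ordinary Sharpe ratio of the market when the risk-free rate is zero."""
    return mu_m / sigma_m


def sharpe_fund(mu_m, sigma_m, gamma, delta, p):
    """Ordinary Sharpe ratio of R = R_M + eps with eps = gamma + delta*xi,
    xi in {0, 1/p}, and eps uncorrelated with R_M."""
    mean_eps = gamma + delta                   # E(eps)
    var_eps = delta**2 * (1.0 - p) / p         # Var(eps)
    return (mu_m + mean_eps) / math.sqrt(sigma_m**2 + var_eps)


def simulated_op(mu_m, sigma_m, gamma, delta, p, n=20_000, seed=7):
    """Monte Carlo estimate of P(R > R_M)."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(n):
        r_m = rng.gauss(mu_m, sigma_m)
        xi = 1.0 / p if rng.random() < p else 0.0
        r = r_m + gamma + delta * xi
        wins += r > r_m                        # True counts as 1
    return wins / n


params = dict(mu_m=0.08, sigma_m=0.20, gamma=0.01, delta=0.01)
for p in (0.5, 0.001):
    print(p, round(sharpe_fund(p=p, **params), 4),
          round(sharpe_market(0.08, 0.20), 4),
          simulated_op(p=p, **params))
```

With p = 0.5 the fund's ordinary Sharpe ratio (about 0.50) exceeds the market's 0.40; with p = 0.001 it drops to about 0.27, yet the simulated probability that the fund beats the market equals 1 in both cases.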
The main goal of this work is to provide a performance measure that is easier to understand for a nonacademic audience. The OP is based on the ICV, which compares the log-return on the strategy with the log-return on the benchmark. Thus, it is compatible with the principal argument of Sharpe (1994), and so we hope that it can serve its purpose in the finance industry. Most investors want to know whether they can beat a risky benchmark, not the money-market account. It is worth emphasizing that the OP is based on a continuous-time model and that we do not make any use of capital-market theory. For this reason, the OP cannot be compared with performance measures that are based on a one-period model without making simplifying assumptions and thus sacrificing the merits of continuous-time finance. This holds true, in particular, for the classical performance measures mentioned above. We use log-returns instead of discrete returns because our model is formulated in continuous time. More precisely, it is built on the standard assumption that the value processes follow a geometric Brownian motion. It is worth noting that return-to-risk measures can change substantially when log-returns are substituted with discrete returns or vice versa, even if we work with daily asset (log-)returns.
The classical performance measures are based on the CAPM, and so they are vulnerable to model misspecification. By contrast, the OP is not based on any capital-market model; it is a probabilistic measure. Thus, it is model-independent, although we assume that the value processes of the strategy and of the benchmark follow a two-dimensional geometric Brownian motion. This assumption is made only for statistical reasons and has nothing to do with the OP itself (see Definition 1). Hence, it is possible to find nonparametric estimators for the OP, which are not based on any parametric model for the value processes of the strategy and of the benchmark. We plan to investigate such estimators in the future.
Nonetheless, in this work we deal with the ML-estimator for Π and this estimator is not unimpeachable:

• It is based on a parametric model, namely the geometric Brownian motion. While this model is standard in financial mathematics, we may doubt that value processes follow a geometric Brownian motion in real life. The assumption that log-returns are independent and identically normally distributed contradicts the stylized facts of empirical finance.
• We assume that the time of liquidation does not depend on whether or not the strategy outperforms the benchmark. This assumption might be violated, for example, if investors suffer from the disposition effect, i.e., if they tend to sell winners too early and ride losers too long (Shefrin and Statman 1985).
• The holding-time distribution also follows a parametric model, which may or may not be correct. Indeed, we have to choose some model for T, but we need not necessarily know its parameters. However, in order to estimate the parameters of F, we would need appropriate data, and then the statistical properties of the ML-estimator might change substantially.
The reader might ask whether we want to suggest a better alternative to return-to-risk measures from a purely theoretical perspective or to provide a more illustrative performance measure for the asset-management industry. The answer is that we aim at both. However, it is worth emphasizing that we do not try to convince the audience that return-to-risk measures are useless or fallacious per se. In fact, the OP itself is based on a return-to-risk measure, namely the ICV, and we will see that this is its most important ingredient. The question is whether one should use two return-to-risk measures or just one in order to compare a strategy with some benchmark. We decidedly advocate the latter approach and refer to the arguments raised by Sharpe (1994). Simply put, performance measures are not linear, and so the difference between the performances of two investment opportunities is not the same as the performance of the first investment opportunity relative to the second. Our own return-to-risk measure is the ICV, and we transform the ICV into a probability by treating the time of liquidation as stochastic. This is done in order to account for the individual liquidity preference of the investor and, especially, to make the measure easier to interpret in practice.

General Observations
Actively managed funds usually aim at beating some benchmark, typically a stock-market index. Financial markets and computer technology provide many ways to pursue that target, for example, stock picking, market timing, portfolio optimization, technical analysis, fundamental analysis, time-series analysis, high-frequency trading, and algorithmic trading. The list of opportunities is virtually endless, but the problem is that most sophisticated trading strategies produce high transaction costs and other expenses such as management fees and agency costs. The principal question is whether or not actively managed funds are able to beat their benchmarks after all expenses are taken into account. Jensen (1968) reports that the 115 (open-end) mutual funds that he considered in his study, which covers the period from 1945 to 1964, "were on average not able to predict security prices well enough to outperform a buy-the-market-and-hold policy." He also states that "there is very little evidence that any individual fund was able to do significantly better than that which we expected from mere random chance." By contrast, Ippolito (1989) investigated 143 mutual funds over the period from 1965 to 1984 and found that "mutual funds, net of all fees and expenses, except load charges, outperformed index funds on a risk-adjusted basis." This is in contrast to many other studies, but he emphasizes that "the industry alpha, though significantly positive, is not sufficiently large to overcome the load charges that characterize the majority of funds in the sample." Hence, after sales charges or commissions are taken into account, the Alpha of mutual funds disappears. Further, Grinblatt and Titman (1989) considered the sample period from 1974 to 1984. They write that "superior performance may in fact exist, particularly among aggressive-growth and growth funds and those funds with the smallest net asset values." However, "these funds also have the highest expenses so that their actual returns, net of all expenses, do not exhibit abnormal performance." Hence, they conclude that "investors cannot take advantage of the superior abilities of these portfolio managers by purchasing shares in their mutual funds." Finally, Fama and French (2010) considered the period from 1984 to 2006 and concluded that "fund investors in aggregate realize net returns that underperform three-factor, and four-factor benchmarks by about the costs in expense ratios," which means that "if there are fund managers with enough skill to produce benchmark-adjusted expected returns that cover costs, their tracks are hidden in the aggregate results by the performance of managers with insufficient skill." To sum up, it seems better for market participants to track a stock-market index or to hold the market portfolio.
There is a large number of passively managed ETFs on the market that aim at replicating a specific stock-market index like the S&P 500 or the Russell 1000. Elton et al. (1996) vividly explain why it makes sense to compare actively managed funds with index ETFs, and not with stock-market indices, in order to analyze whether or not fund managers are able to beat the market: "The recent increase in the number and types of index funds that are available to individual investors makes this a matter of practical as well as theoretical significance. Numerous index funds, which track the Standard and Poor's (S&P) 500 Index or various small-stock, bond, value, growth, or international indexes, are now widely available to individual investors. [...] Given that there are sufficient index funds to span most investors' risk choices, that the index funds are available at low cost, and that the low cost of index funds means that a combination of index funds is likely to outperform an active fund of similar risk, the question is, why select an actively managed fund?" Before the first index ETF, i.e., the SPDR S&P 500 ETF (SPY), was launched in 1993, authors had to think about how a private investor would replicate a stock index by himself. That is, they had to take all transaction costs, exchange fees, taxes, time and effort, etc., into account. Index ETFs solved that problem all at once. There are no load charges and, except for minor broker fees and very low transaction costs, private investors do not have to take care of anything in order to track a stock index. Many index funds are very liquid and their tracking errors seem to be very small. Thus, it is clear why investors are usually not able or willing to replicate a stock-market index by themselves, which is why we treat index ETFs as benchmarks in this work.
Compared to actively managed funds, index ETFs have much lower expense ratios, but their performance might be worse. In order to clarify whether or not the additional costs of an actively managed fund are overcompensated by better performance, we compare 10 mutual funds with index ETFs. Most of the funds contain US large-cap stocks and have 1 to 60 billion US dollars of assets under management. In any case, they try to outperform the S&P 500 or the Russell 1000. Hence, our first benchmark is the SPDR S&P 500 ETF, which is almost equivalent to the S&P 500 total return stock index and serves as a proxy for the overall US stock market. The second benchmark is the iShares Russell 1000 Growth ETF (IWF). That ETF seeks to track the Russell 1000 Growth Index, which is composed of large- and mid-cap US equities exhibiting growth characteristics. In order to reflect the US mid-term treasury-bond market, we chose the iShares 7-10 Year Treasury Bond ETF (IEF). This ETF is composed of US treasury bonds with remaining maturities between 7 and 10 years, including only AAA-rated securities. Our next benchmark is the US money-market account, based on the London Interbank Offered Rate (LIBOR) for overnight US-dollar deposits. Our last benchmark is cash, without any interest or risk, i.e., "keeping the money under the mattress." In this special case, the OP quantifies the probability that the final value of the strategy exceeds its initial value (normalized to 1), i.e., Π = P(S_T > 1). Hence, to outperform cash just means to generate a positive return, which is the least one should expect when investing money.
Our empirical analysis is based on daily price observations from 2 January 2003 to 31 December 2018. Hence, the sample also covers the 2008 financial crisis. The length of that history amounts to n = 4027 trading days. All ETFs and mutual funds, together with their symbols, their estimated drift coefficients ($\hat\mu$), diffusion coefficients ($\hat\sigma$), growth rates ($\hat\gamma$), expense ratios (ER), and assets under management (AUM), are presented in Table 1. The table is separated into two parts: the first part shows the benchmarks, whereas the second part contains the mutual funds. We can see that the expense ratios of the mutual funds are up to 90 basis points higher than that of the S&P 500 ETF (SPY). On average, investors pay 67 basis points more for a mutual fund than for the S&P 500 ETF and 56 basis points more than for the Russell 1000 ETF (IWF). MSEGX possesses the highest drift coefficient and growth rate, whereas FKGRX has the lowest volatility, i.e., diffusion coefficient, among all mutual funds. PINDX has the lowest drift coefficient and growth rate but the second-highest volatility. While the stock-index ETFs are hard to distinguish according to $\hat\mu$ and $\hat\sigma$, it should be noted that the Russell 1000 ETF has a slightly higher estimated growth rate, $\hat\gamma$, than the S&P 500 ETF.
Table 1. Basic characteristics of the benchmarks and funds that are taken into consideration, i.e., the estimated drift coefficients ($\hat\mu$), diffusion coefficients ($\hat\sigma$), growth rates ($\hat\gamma$) as well as their expense ratios (ER) and assets under management (AUM).
However, the quantities $\hat\mu$, $\hat\sigma$, and $\hat\gamma$ are subject to estimation errors, and it is well known that the estimation errors regarding µ and γ are much larger than those regarding σ. Even if we work with daily observations, they cannot be neglected. Hence, Table 1 gives no clear recommendation for or against the mutual funds if we compare the quantities of the index ETFs and the mutual funds with each other.
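Under the geometric-Brownian-motion assumption, quantities like those in Table 1 can be estimated from daily log-returns. The sketch below shows one way to do this, assuming equally spaced observations and 252 trading days per year (a hypothetical convention, close to the roughly 252 trading days per year in this sample). It uses the ML-type variance (division by the number of returns) and the GBM relation γ = µ − σ²/2 between the growth rate and the drift. The function name is illustrative.

```python
import math
import statistics


def gbm_estimates(prices, periods_per_year=252):
    """Annualized estimates (mu, sigma, gamma) under a geometric Brownian motion.

    gamma is the growth rate (expected log-return per year), sigma the diffusion
    coefficient, and mu = gamma + sigma**2 / 2 the drift coefficient."""
    log_returns = [math.log(b / a) for a, b in zip(prices, prices[1:])]
    gamma = statistics.mean(log_returns) * periods_per_year
    sigma2 = statistics.pvariance(log_returns) * periods_per_year  # ML variance
    mu = gamma + sigma2 / 2.0
    return mu, math.sqrt(sigma2), gamma


# Toy price path whose log-returns alternate between +0.01 and -0.01.
prices = [100.0, 100.0 * math.exp(0.01)] * 2 + [100.0]
print(gbm_estimates(prices))
```

In the toy example the growth rate is zero while the drift is positive, which illustrates why µ and γ should not be confused when comparing funds.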
On average, the annualized drifts, volatilities, and growth rates of the mutual funds are close to those of the S&P 500 ETF and the Russell 1000 ETF. Does the overall picture change if we evaluate the ordinary Sharpe ratios of the mutual funds? Table 2 contains the ordinary and the generalized Sharpe ratios of all mutual funds, on an annualized basis, with respect to our two stock-market benchmarks and the bond market. We can see that 9 out of 10 mutual funds have an ordinary Sharpe ratio that is greater than the (ordinary) Sharpe ratio of the S&P 500 ETF. However, 7 mutual funds possess a Sharpe ratio that is lower than the Sharpe ratio of the Russell 1000 ETF. Hence, we could believe that most fund managers are not able to beat that benchmark. Moreover, except for FKGRX, all funds are inferior to the US treasury-bond ETF. This would suggest that it is better to stay out of the stock market altogether and to prefer mid-term US treasury bonds. This conclusion is fallacious. Indeed, the overall picture changes completely when we consider the generalized Sharpe ratios. In fact, 9 out of 10 mutual funds have a positive Sharpe ratio with respect to the S&P 500 ETF, which means that almost all funds were able to beat this benchmark. Moreover, 7 funds have a positive Sharpe ratio with respect to the Russell 1000 ETF, and so the majority of the funds have also beaten that benchmark. Finally, since all generalized Sharpe ratios with respect to the US treasury-bond ETF are positive, every fund was preferable to bonds. Hence, the results are entirely different and our conclusion is reversed! The estimated ICV of a fund tells us how the fund manager performed against some benchmark and, according to Equation (1), it essentially determines the (estimated) OP. Table 3 contains the ICVs of the 10 mutual funds with respect to our 5 benchmarks. This table confirms the results obtained with the generalized Sharpe ratios.
This means that 9 out of 10 mutual funds were, in fact, able to outperform the S&P 500 ETF. Further, 7 funds outperformed the Russell 1000 ETF. The (uniformly) worst performer is PINDX, which is never preferable to the index ETFs. The best performer with respect to the index ETFs is TRBCX, whose track record is extraordinarily good. We conclude that most funds are preferable to the index ETFs, which was not clear to us after inspecting the ordinary Sharpe ratios in Table 2. It is worth noting that all mutual funds perform much better if we compare them with the S&P 500 ETF rather than the Russell 1000 ETF. Table 3 also reveals that mutual funds are clearly superior to bonds, the money market, and cash.
Table 3. ICVs of the mutual funds with respect to the benchmarks. The standard errors can be approximated by $1/\sqrt{m} = 0.2502$ (see Equation (2)).
Clearly, TRBCX is the best performer among all mutual funds, and so it is evident why it has, by far, the largest amount of assets under management (over 60 billion US dollars). Figure 5 plots the ICV of each mutual fund, both with respect to the S&P 500 and with respect to the Russell 1000, against the natural logarithm of its assets under management. It turns out that the ICV of a fund is highly correlated with the log-amount of assets under management. To be more precise, the correlation between ICV and log(AUM) amounts to 71.22% if we refer to the S&P 500 and to 65.67% if we refer to the Russell 1000. However, we can observe two outliers, namely MSEGX and PGFAX, which performed well between 2003 and 2018 relative to their assets under management.

Figure 5. ICV against the natural logarithm of the assets under management of the mutual funds with respect to the S&P 500 (left) and the Russell 1000 (right).
Our results demonstrate that it is highly important to consider differential returns and not excess returns if we want to compare some strategy with a specified benchmark. By comparing the ordinary Sharpe ratio of some strategy with the ordinary Sharpe ratio of a given benchmark (see Table 1) one checks which of the two is better able to outperform the money-market account. However, this is not what we want to know when we try to understand whether or not a strategy is able to outperform its benchmark. For this purpose we have to calculate the generalized Sharpe ratios in Table 2 (Sharpe 1994) or the ICVs in Table 3.
Our arguments shall also be clarified from a statistical point of view. Let $\widehat{\mathrm{Sh}}_{S,n}$ be the empirical estimator for the annualized Sharpe ratio of the strategy and $\widehat{\mathrm{Sh}}_{B,n}$ the empirical estimator for the annualized Sharpe ratio of the benchmark. Further, let $\Delta\mathrm{Sh} := \mathrm{Sh}_S - \mathrm{Sh}_B$ be the difference between the annualized Sharpe ratio of the strategy, $\mathrm{Sh}_S$, and the annualized Sharpe ratio of the benchmark, $\mathrm{Sh}_B$. Then one uses the test statistic $\widehat{\Delta\mathrm{Sh}}_n := \widehat{\mathrm{Sh}}_{S,n} - \widehat{\mathrm{Sh}}_{B,n}$ in order to test the null hypothesis $H_0\colon \Delta\mathrm{Sh} \le 0$. Our results are presented on the right-hand side of Table 4, which also contains the corresponding p-values for $H_0$ in parentheses. By contrast, the left-hand side of Table 4 contains the estimated ICVs and their corresponding p-values for the null hypothesis $H_0\colon \mathrm{ICV} \le 0$. We can see that most p-values on the left-hand side are smaller than those on the right-hand side. This holds true in particular when we compare the funds with the Russell 1000 ETF and the US treasury-bond ETF. Hence, using the ICV instead of comparing two Sharpe ratios with one another in order to test whether a strategy outperforms its benchmark leads to results that are more significant in an economic sense.
To be more precise, in the majority of cases, the estimated ICVs are positive and their p-values are low enough to conclude that most fund managers are able to beat their benchmarks. However, the results are still not significant in a statistical sense.
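One way to compute an ICV estimate and a test of H_0: ICV ≤ 0 is sketched below. It treats the ICV as the annualized mean of the differential log-returns divided by their annualized standard deviation, uses the approximate standard error 1/√m with m taken to be the sample length in years (an assumption on our part, roughly consistent with 1/√m = 0.2502 for a sample of about 16 years), and computes a one-sided p-value from the normal distribution. The function name and the synthetic data are hypothetical.

```python
import math
import random
import statistics
from statistics import NormalDist


def icv_test(strategy_prices, benchmark_prices, years):
    """Estimate the ICV and a one-sided p-value for H0: ICV <= 0."""
    diff = [math.log(s1 / s0) - math.log(b1 / b0)
            for s0, s1, b0, b1 in zip(strategy_prices, strategy_prices[1:],
                                      benchmark_prices, benchmark_prices[1:])]
    dt = years / len(diff)                              # period length in years
    drift = statistics.mean(diff) / dt                  # annual relative log-return
    vol = math.sqrt(statistics.pvariance(diff) / dt)    # annual standard deviation
    icv = drift / vol
    se = 1.0 / math.sqrt(years)                         # approximate standard error
    p_value = 1.0 - NormalDist().cdf(icv / se)
    return icv, se, p_value


# Synthetic example: a strategy whose log-price drifts upward relative to a flat benchmark.
rng = random.Random(3)
years, per_year = 16, 252
log_s = [0.0]
for _ in range(years * per_year):
    log_s.append(log_s[-1] + rng.gauss(0.002, 0.01))
strategy = [math.exp(v) for v in log_s]
benchmark = [1.0] * len(strategy)
print(icv_test(strategy, benchmark, years))
```

Note that only the differential log-returns enter the calculation; the money-market account plays no role, which is exactly the point made above.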

Empirical Results
In this section, we present our estimates of the OPs for different holding-time distributions. More precisely, we consider a fixed, a uniformly distributed, an exponentially distributed, and a Weibull distributed holding period. In order to make the results comparable, we chose the parameters such that the mean holding period equals 5 years for each holding-time distribution.
Our principal question is whether or not actively managed funds are able to beat their benchmarks, which can be answered in a very simple way by estimating their OPs. If Π ≤ 0.5, we can say that the fund manager is not able to beat the specified benchmark. Then it makes little sense for an investor to buy the fund share because, with a probability greater than or equal to 50%, the benchmark performs at least as well as the mutual fund. Hence, our null hypothesis is $H_0\colon \Pi \le 0.5$. Of course, we have to take estimation errors into account, and so we report the estimated OPs along with their standard errors and p-values.

Fixed Holding Period
Let us begin by considering a fixed holding period T > 0. In this case, the ML-estimator for the OP is simply
$$\hat\Pi_n = \Phi\big(\sqrt{T}\,\widehat{\mathrm{ICV}}_n\big).$$
Further, its standard error is approximately
$$\sqrt{\frac{T}{m}}\;\varphi\big(\sqrt{T}\,\widehat{\mathrm{ICV}}_n\big),$$
where φ denotes the standard normal density. Table 5 contains the results if we assume that the holding period equals 5 years. We can see that 9 of the 10 mutual funds that are taken into consideration are able to outperform the S&P 500 and 7 funds can outperform the Russell 1000. In fact, this also follows from Table 3 because the OP is greater than 0.5 if and only if ICV > 0. Nonetheless, Table 5 provides some additional information. It reveals that the OPs typically range between 0.5 and 0.7 with regard to the index ETFs. An apparent exception is the T. Rowe Price Blue Chip Growth Fund (TRBCX), which has a remarkably high OP of 0.7515 with respect to the S&P 500. The worst fund is the Pioneer Disciplined Growth Fund (PINDX), which underperforms both the S&P 500 and the Russell 1000, with OPs of only 0.4337 and 0.3400, respectively.
Despite the fact that most funds outperform their stock-market benchmarks, the results are not statistically significant. All p-values related to the index ETFs exceed α = 0.05. Only TRBCX comes close to α, with a p-value of 0.0779, when the S&P 500 ETF is used as the benchmark. Of course, failing to reject the null hypothesis does not mean that Π ≤ 0.5 for any fund. Performance measurement is very susceptible to estimation risk (Frahm 2018), and thus, in this context, we cannot expect to obtain statistically significant results. Nevertheless, we may say that the results are, at least, economically significant. This was surprising to us, since it is usually asserted in the finance community that fund managers, in general, do not do better than their benchmarks. This was already discussed in Section 3.1.
All mutual funds outperform the bond market, but these results are not significant either. However, most funds at least outperform the money-market account and cash at a significance level of α = 0.01, as can be seen from the p-values at the end of Table 5. The corresponding OPs usually exceed 0.8, which means that the probability of generating some excess return or of making any profit at the time of liquidation is greater than 80%. Once again, an exception is PINDX, whose OPs with respect to the money-market account and cash are lower than 80%.
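For a fixed holding period, the OP estimate Φ(√T·ICV_n), a delta-method standard error of the form √(T/m)·φ(√T·ICV_n) (φ the standard normal density), and a one-sided test of H_0: Π ≤ 0.5 can be sketched as follows. As a plausibility check, the ICV implied by the reported OP of 0.7515 for TRBCX (T = 5) is backed out and m is set to 1/0.2502²; under these assumptions the sketch returns a p-value of about 0.078, in line with the 0.0779 reported above. The function name is hypothetical.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def op_fixed_holding(icv, T, m):
    """OP, approximate standard error, and one-sided p-value for H0: OP <= 0.5
    when the holding period T (in years) is fixed."""
    z = math.sqrt(T) * icv
    op = STD_NORMAL.cdf(z)
    se = math.sqrt(T / m) * STD_NORMAL.pdf(z)      # delta method, SE(ICV) = 1/sqrt(m)
    p_value = 1.0 - STD_NORMAL.cdf((op - 0.5) / se)
    return op, se, p_value


m = 1.0 / 0.2502**2                                 # implied by 1/sqrt(m) = 0.2502
icv_trbcx = STD_NORMAL.inv_cdf(0.7515) / math.sqrt(5)   # back-solved, not a reported value
op, se, p = op_fixed_holding(icv_trbcx, T=5, m=m)
print(f"OP = {op:.4f}, SE = {se:.4f}, p = {p:.4f}")
```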

Uniformly Distributed Holding Period
Now, we assume that the investor has a holding period that is uniformly distributed between 0 and M years. The ML-estimator for Π is
$$\hat\Pi_n = \frac{1}{M}\int_0^M \Phi\big(\sqrt{t}\,\widehat{\mathrm{ICV}}_n\big)\,dt,$$
and its standard error can be approximated by
$$\frac{1}{M\sqrt{m}}\int_0^M \sqrt{t}\;\varphi\big(\sqrt{t}\,\widehat{\mathrm{ICV}}_n\big)\,dt.$$
Table 6 contains the results if we assume that the holding period is uniformly distributed between 0 and 10 years. As we can see, the results are not essentially different from those in Table 5. However, it is conspicuous that the OPs are slightly lower. The same effect can also be observed in Figure 3 if we compare the OPs on the left-hand side for T = 5 with the OPs on the right-hand side for M = 10. The reason is that liquidating a strategy with a positive ICV prematurely lowers the OP, and later liquidations cannot compensate for earlier ones if the strategy is profitable compared to its benchmark, i.e., ICV > 0. Formally, for ICV > 0 the map t ↦ Φ(√t ICV) is concave, so by Jensen's inequality a random holding period yields a lower OP than a fixed holding period with the same mean.
While the OPs in Table 6 are slightly lower than those in Table 5, we can verify that all funds are still better than the bond market. The majority of funds are even significantly better than the money-market account and cash at the 1% level. To sum up, it is generally preferable to invest in mutual funds even if the holding period of the investor is uniformly distributed. However, there are noticeable differences in the performance of the mutual funds, which means that investors should be aware that some funds perform much better than others. The question of whether or not the estimated OP of a fund is a persistent performance measure is not answered in this study. If it is persistent, i.e., if it can be used as an ex-ante and not only as an ex-post performance measure, investors should clearly prefer funds with a high estimated OP.
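For the uniform case, and more generally for any holding-time distribution with a known quantile function, the OP E[Φ(√T·ICV)] and its delta-method standard error can be approximated by averaging over a grid of quantiles. The sketch below does this with a midpoint rule. As a cross-check, it also evaluates a closed form for T ~ U(0, M), Π = Φ(c) − (Φ(c) − 1/2 − cφ(c))/c² with c = √M·ICV, which is obtained by integration by parts and included only for verification. The function names and the ICV and m values in the example are hypothetical.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def op_by_quantiles(icv, quantile, m, n_grid=20_000):
    """Midpoint-rule approximation of OP = E[Phi(sqrt(T)*ICV)] and of its
    delta-method standard error, given the quantile function of T."""
    op_sum = 0.0
    slope_sum = 0.0
    for k in range(n_grid):
        t = quantile((k + 0.5) / n_grid)
        z = math.sqrt(t) * icv
        op_sum += STD_NORMAL.cdf(z)
        slope_sum += math.sqrt(t) * STD_NORMAL.pdf(z)   # d/dICV of Phi(sqrt(t)*ICV)
    return op_sum / n_grid, slope_sum / n_grid / math.sqrt(m)


def op_uniform_closed_form(icv, M):
    """Closed form of E[Phi(sqrt(T)*ICV)] for T ~ U(0, M) and ICV != 0."""
    c = math.sqrt(M) * icv
    return STD_NORMAL.cdf(c) - (STD_NORMAL.cdf(c) - 0.5 - c * STD_NORMAL.pdf(c)) / c**2


icv, M, m = 0.3, 10.0, 16.0                  # hypothetical ICV and sample length
op_u, se_u = op_by_quantiles(icv, lambda u: M * u, m)
print(op_u, op_uniform_closed_form(icv, M), STD_NORMAL.cdf(math.sqrt(5) * icv))
```

The last printed value, the OP for a fixed 5-year horizon, is larger than the uniform OP, which illustrates why the OPs in Table 6 are slightly lower than those in Table 5.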

Exponentially Distributed Holding Period
In this section, we discuss the results that are obtained by assuming that the holding period is exponentially distributed. This distributional assumption might be considered more realistic than a fixed or a uniformly distributed holding period. It assigns a high probability to early liquidations and a low probability to late ones. The exponential distribution has no finite right endpoint, which means that T is unbounded above. Hence, this distribution allows for large holding periods. However, the survival probability P(T > t) decreases exponentially with t.
The ML-estimator for the OP is
$$\hat\Pi_n = \frac{1}{2} + \frac{\widehat{\mathrm{ICV}}_n}{2\sqrt{\widehat{\mathrm{ICV}}_n^{\,2} + 2\lambda}},$$
where λ > 0 is the parameter of the exponential distribution. The standard error of the ML-estimator can be approximated by
$$\frac{\lambda}{\sqrt{m}\,\big(\widehat{\mathrm{ICV}}_n^{\,2} + 2\lambda\big)^{3/2}}.$$
We set the parameter λ to 0.2, which means that the mean holding period equals 5 years. The results are given in Table 7. As we can see, the OPs are, once again, slightly lower than those in Table 6 and thus also lower than those in Table 5. The reason is that the exponential distribution assigns a substantially higher probability to early liquidations than to late ones (see Figure 2). The overall conclusion does not differ from our previous findings, but it is worth noting that the standard errors and thus also the p-values are somewhat lower than in Tables 5 and 6. Now, we have a p-value of 0.0675 for TRBCX against the S&P 500 ETF, which comes very close to the significance level of 5%. In any case, the mutual funds still outperform the bond market and the riskless investments, i.e., the money market and cash.
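For an exponentially distributed holding period with rate λ, integrating Φ(√t·ICV) against the exponential density by parts gives the closed form 1/2 + ICV/(2√(ICV² + 2λ)), and the delta method gives the standard error λ/(√m·(ICV² + 2λ)^{3/2}). The sketch below implements both, cross-checks the closed form against a midpoint rule over exponential quantiles, and reuses the ICV back-solved from the fixed-horizon OP of 0.7515 for TRBCX with m = 1/0.2502² (assumptions on our part). The resulting p-value of about 0.068 is close to the 0.0675 reported above. The function names are hypothetical.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def op_exponential(icv, lam, m):
    """OP, approximate standard error, and one-sided p-value for H0: OP <= 0.5
    when the holding period is exponentially distributed with rate lam."""
    q = icv**2 + 2.0 * lam
    op = 0.5 + icv / (2.0 * math.sqrt(q))
    se = lam / (math.sqrt(m) * q**1.5)              # delta method, SE(ICV) = 1/sqrt(m)
    p_value = 1.0 - STD_NORMAL.cdf((op - 0.5) / se)
    return op, se, p_value


def op_exponential_numeric(icv, lam, n_grid=20_000):
    """Midpoint rule over the exponential quantile function, for cross-checking."""
    total = 0.0
    for k in range(n_grid):
        t = -math.log1p(-(k + 0.5) / n_grid) / lam
        total += STD_NORMAL.cdf(math.sqrt(t) * icv)
    return total / n_grid


m = 1.0 / 0.2502**2
icv_trbcx = STD_NORMAL.inv_cdf(0.7515) / math.sqrt(5)   # back-solved, not reported
op, se, p = op_exponential(icv_trbcx, lam=0.2, m=m)
print(f"OP = {op:.4f}, SE = {se:.4f}, p = {p:.4f}")
print(f"numeric OP = {op_exponential_numeric(icv_trbcx, 0.2):.4f}")
```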

Weibull Distributed Holding Period
Finally, we report the results that are obtained by assuming that the holding period is Weibull distributed. The Weibull distribution generalizes the exponential distribution and allows us to consider unimodal holding-time distributions with a mode greater than 0 (see Figure 2). The corresponding ML-estimator for the OP is
$$\hat\Pi_n = \int_0^\infty \Phi\big(\sqrt{t}\,\widehat{\mathrm{ICV}}_n\big)\,\frac{\kappa}{\gamma}\Big(\frac{t}{\gamma}\Big)^{\kappa-1} e^{-(t/\gamma)^{\kappa}}\,dt,$$
and its standard error can be approximated by
$$\frac{1}{\sqrt{m}}\int_0^\infty \sqrt{t}\;\varphi\big(\sqrt{t}\,\widehat{\mathrm{ICV}}_n\big)\,\frac{\kappa}{\gamma}\Big(\frac{t}{\gamma}\Big)^{\kappa-1} e^{-(t/\gamma)^{\kappa}}\,dt.$$
In our empirical study, the parameters of the Weibull distribution are κ = 2 (shape) and γ = 5.6419 (scale); here γ denotes the scale parameter, not the growth rate. Once again, this leads to a mean holding period of 5 years, since γΓ(1 + 1/κ) = 5.6419 · Γ(1.5) ≈ 5. The results are contained in Table 8. Now, we can observe that the OPs with respect to the index ETFs are close to those obtained by assuming a fixed holding period of T = 5 years (see Table 5). However, the OPs with respect to the US treasury-bond ETF, the money-market account, and cash are slightly lower than in Table 5 but still higher than in Tables 6 and 7. Overall, the results are not strongly driven by our specific assumption about the distribution of the holding period. It turns out that the OP is essentially determined by the ICV, not by the holding-time distribution, provided that the considered holding-time distributions have the same mean.
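For the Weibull case, the sketch below applies the same quantile-averaging idea. It assumes that κ is the shape and γ the scale parameter, checks that κ = 2 and γ = 5.6419 give a mean holding period of 5 years (mean = γΓ(1 + 1/κ)), and compares the Weibull OP with the OP for a fixed 5-year horizon for a hypothetical ICV of 0.3 and a hypothetical m = 16. With κ = 1 the Weibull distribution reduces to the exponential one, which provides a further check. The function names are illustrative.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def weibull_scale_for_mean(mean, kappa):
    """Scale parameter that gives the requested mean: mean = scale * Gamma(1 + 1/kappa)."""
    return mean / math.gamma(1.0 + 1.0 / kappa)


def op_weibull(icv, kappa, scale, m, n_grid=20_000):
    """Midpoint-rule approximation of the OP and its delta-method standard error
    for a Weibull(kappa, scale) holding period, via the Weibull quantile function."""
    op_sum = slope_sum = 0.0
    for k in range(n_grid):
        u = (k + 0.5) / n_grid
        t = scale * (-math.log1p(-u)) ** (1.0 / kappa)   # Weibull quantile
        z = math.sqrt(t) * icv
        op_sum += STD_NORMAL.cdf(z)
        slope_sum += math.sqrt(t) * STD_NORMAL.pdf(z)
    return op_sum / n_grid, slope_sum / n_grid / math.sqrt(m)


scale = weibull_scale_for_mean(5.0, 2.0)
print(round(scale, 4))                                    # 5.6419
icv, m = 0.3, 16.0                                        # hypothetical values
print(op_weibull(icv, 2.0, scale, m)[0], STD_NORMAL.cdf(math.sqrt(5) * icv))
```

The Weibull OP stays close to, but slightly below, the fixed-horizon OP, in line with the observation that the holding-time distribution matters much less than the ICV when the mean holding period is the same.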

Conclusions
We propose the OP as a new performance measure, especially for a nonacademic audience. It differs in many aspects from return-to-risk measures: (i) The OP compares some strategy with a specified benchmark, not necessarily with the money-market account. (ii) It is easy to understand for people who are not educated in statistics or probability theory. (iii) The holding period of the investor is considered random. In our Brownian-motion framework, the OP is essentially determined by the ICV, i.e., the expected annual relative log-return on the strategy with respect to its benchmark divided by the standard deviation of that log-return. By contrast, the choice of the holding-time distribution seems to be secondary, provided that the mean holding period is the same for all distributions.
The ML-estimator for the OP is strongly consistent and, if the square root of the holding period has a finite expectation, it is also asymptotically normally distributed under the usual standardization. This moment condition is very mild and allows us to consider almost any holding-time distribution. Our asymptotic results enable us to calculate confidence intervals or to conduct hypothesis tests. The basic null hypothesis is that the OP does not exceed 50%, which means that the strategy does not outperform its benchmark. Our empirical results reveal that most mutual funds are, in fact, able to beat their benchmarks, although the results are not statistically significant. Nonetheless, they are economically significant. This might be surprising because it is commonplace in the finance literature that most fund managers cannot beat the market. However, the crucial point is that one should refer to differential returns when comparing some strategy with a given benchmark, not to excess returns.
The best performer is the T. Rowe Price Blue Chip Growth Fund, whereas the worst one is the Pioneer Disciplined Growth Fund. It is evident that market participants prefer to invest their money in those funds that have a high ICV with respect to the S&P 500 or the Russell 1000. More precisely, the ICV with respect to any index ETF is highly correlated with the log-amount of assets under management of the mutual fund. This means that relative performance and capital allocation are strongly connected to one another, which suggests that market participants take differential returns implicitly into account when making their investment decisions.
Author Contributions: G.F. developed the theoretical part, whereas both authors are responsible for the empirical part of this work.
Funding: This research received no external funding.