Machine Learning for Algorithmic Trading

How to manage portfolio risk and return

Portfolio management aims to pick and size positions in financial instruments that achieve the desired risk-return trade-off relative to a benchmark. As a portfolio manager, in each period, you select positions that optimize diversification to reduce risk while achieving a target return. Across periods, these positions may require rebalancing to account for changes in weights resulting from price movements in order to achieve or maintain a target risk profile.

The evolution of modern portfolio management

Diversification permits us to reduce risks for a given expected return by exploiting how imperfect correlation allows one asset's gains to make up for another asset's losses. Harry Markowitz introduced modern portfolio theory (MPT) in 1952 and provided the mathematical tools to optimize diversification by choosing appropriate portfolio weights.

Markowitz showed how portfolio risk, measured as the standard deviation of portfolio returns, depends on the covariance among the returns of all assets and their relative weights. This relationship implies the existence of an efficient frontier of portfolios that maximize portfolio returns for a given level of portfolio risk.

However, mean-variance frontiers are highly sensitive to the estimates of the inputs required for their calculation, namely expected returns, volatilities, and correlations. In practice, mean-variance portfolios that constrain these inputs to reduce sampling errors have performed much better. These constrained special cases include equal-weighted, minimum-variance, and risk-parity portfolios.

The capital asset pricing model (CAPM) is an asset valuation model that builds on the MPT risk-return relationship. It introduces the concept of a risk premium that an investor can expect in market equilibrium for holding a risky asset; the premium compensates for the time value of money and the exposure to overall market risk that cannot be eliminated through diversification (as opposed to the idiosyncratic risk of specific assets).

The economic rationale for non-diversifiable risk includes, for example, macro drivers of the business risks affecting all equity returns or bond defaults. Hence, an asset's expected return, $E[r_i]$, is the sum of the risk-free interest rate, $r_f$, and a risk premium proportional to the asset's exposure to the expected excess return of the market portfolio, $r_m$, over the risk-free rate:

$$E[r_i] = \alpha_i + r_f + \beta_i \left(E[r_m] - r_f\right)$$

In theory, the market portfolio contains all investable assets and, in equilibrium, will be held by all rational investors. In practice, a broad value-weighted index approximates the market, for example, the S&P 500 for US equity investments.

The coefficient $\beta_i$ measures the exposure of asset $i$ to the excess returns of the market portfolio. If the CAPM is valid, the intercept component, $\alpha_i$, should be zero. In reality, the CAPM assumptions are often not met, and alpha captures the returns left unexplained by exposure to the broad market.
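
To make the relationship concrete, the following minimal sketch estimates alpha and beta by regressing an asset's excess returns on the market's excess returns; the data here is synthetic and purely illustrative, not drawn from the book's examples:

import numpy as np

# Synthetic illustration (not the book's data): simulate market excess returns
# and an asset whose true beta is 1.2 with zero alpha
rng = np.random.default_rng(42)
market_excess = rng.normal(0.005, 0.04, size=120)   # monthly market excess returns
asset_excess = 1.2 * market_excess + rng.normal(0, 0.02, size=120)

# OLS regression of asset excess returns on market excess returns:
# the slope estimates beta, the intercept estimates alpha
beta, alpha = np.polyfit(market_excess, asset_excess, deg=1)
print(f'alpha: {alpha:.4f}, beta: {beta:.4f}')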

As discussed in the previous chapter, over time, research uncovered non-traditional sources of risk premiums, such as the momentum or the equity value effects that explained some of the original alpha. Economic rationales, such as behavioral biases of under- or overreaction by investors to new information, justify risk premiums for exposure to these alternative risk factors.

These factors evolved into investment styles designed to capture these alternative betas, which became tradable in the form of specialized index funds. Similarly, risk management now aims to control the exposure to numerous sources of risk beyond the market portfolio.

After isolating contributions from these alternative risk premiums, true alpha becomes limited to idiosyncratic asset returns and the manager's ability to time risk exposures.

The efficient market hypothesis (EMH) has been refined over the past several decades to rectify many of the original shortcomings of the CAPM, including imperfect information and the costs associated with transactions, financing, and agency. Many behavioral biases have the same effect, and some frictions are modeled as behavioral biases.

Modern portfolio theory and practice have evolved significantly over the last several decades. We will introduce several approaches:

  • Mean-variance optimization and its shortcomings
  • Alternatives such as minimum-variance and 1/N allocation
  • Risk parity approaches
  • Risk factor approaches

Mean-variance optimization

Modern portfolio theory solves for the optimal portfolio weights to minimize volatility for a given expected return or maximize returns for a given level of volatility. The key requisite inputs are expected asset returns, standard deviations, and the covariance matrix.

How it works

Diversification works because the variance of portfolio returns depends on the covariance of the assets. It can be reduced below the weighted average of the asset variances by including assets with less than perfect correlation.

In particular, given a vector, $\omega$, of portfolio weights and the covariance matrix, $\Sigma$, the portfolio variance, $\sigma_{\text{PF}}^2$, is defined as:

$$\sigma_{\text{PF}}^2 = \omega^T \Sigma \omega$$

Markowitz showed that the problem of maximizing the expected portfolio return subject to a target risk has an equivalent dual representation of minimizing portfolio risk subject to a target expected return level, $\mu_{\text{PF}}$. Hence, the optimization problem becomes:

$$\min_{\omega} \; \sigma_{\text{PF}}^2 = \omega^T \Sigma \omega \quad \text{s.t.} \quad \omega^T \mu = \mu_{\text{PF}}, \quad \|\omega\|_1 = 1$$
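
To make the notation concrete, the following minimal sketch computes the portfolio variance $\omega^T \Sigma \omega$ for a small synthetic covariance matrix and, for the fully invested long-short case without bound constraints, the closed-form global minimum-variance weights $\Sigma^{-1}\mathbf{1} / (\mathbf{1}^T\Sigma^{-1}\mathbf{1})$; both the data and the closed-form shortcut are illustrative and not part of the chapter's code:

import numpy as np

# Synthetic example (not the book's data): three assets with a given covariance
sigma = np.array([[0.04, 0.006, 0.012],
                  [0.006, 0.09, 0.018],
                  [0.012, 0.018, 0.16]])
weights = np.array([0.5, 0.3, 0.2])

# Portfolio variance: w' Sigma w
pf_var = weights @ sigma @ weights

# Closed-form global minimum-variance weights (fully invested, shorts allowed):
# w* = Sigma^{-1} 1 / (1' Sigma^{-1} 1)
ones = np.ones(len(sigma))
inv_sigma = np.linalg.inv(sigma)
w_gmv = inv_sigma @ ones / (ones @ inv_sigma @ ones)
print(pf_var, w_gmv.round(3))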

Finding the efficient frontier in Python

We can calculate an efficient frontier using scipy.optimize.minimize and the historical estimates for asset returns, standard deviations, and the covariance matrix. SciPy's minimize function implements a range of constrained and unconstrained optimization algorithms for scalar functions that output a single number from one or more input variables (see the SciPy documentation for more details). The code can be found in the strategy_evaluation subfolder of the repository for this chapter and implements the following sequence of steps:

First, the simulation generates random weights using the Dirichlet distribution and computes the mean, standard deviation, and SR for each sample portfolio using the historical return data:

import numpy as np
import pandas as pd
from numpy.random import dirichlet, choice

# n_assets, NUM_PF, periods_per_year, rf_rate, and monthly_returns are
# defined earlier in the notebook
def simulate_portfolios(mean_ret, cov, rf_rate=rf_rate, short=True):
    # Draw NUM_PF random weight vectors whose absolute values sum to one
    alpha = np.full(shape=n_assets, fill_value=.05)
    weights = dirichlet(alpha=alpha, size=NUM_PF)
    if short:  # randomly flip signs to allow short positions
        weights *= choice([-1, 1], size=weights.shape)
    # Compound the mean periodic return into an annualized return
    returns = weights @ mean_ret.values + 1
    returns = returns ** periods_per_year - 1
    # Annualized standard deviation from the historical periodic returns
    std = (weights @ monthly_returns.T).std(1)
    std *= np.sqrt(periods_per_year)
    sharpe = (returns - rf_rate) / std
    return pd.DataFrame({'Annualized Standard Deviation': std,
                         'Annualized Returns': returns,
                         'Sharpe Ratio': sharpe}), weights

Next, we set up the quadratic optimization problem to solve for the minimum standard deviation for a given return or the maximum SR. To this end, we define the functions that measure the key performance metrics:

def portfolio_std(wt, rt=None, cov=None):
    """Annualized PF standard deviation"""
    return np.sqrt(wt @ cov @ wt * periods_per_year)

def portfolio_returns(wt, rt=None, cov=None):
    """Annualized PF returns"""
    return (wt @ rt + 1) ** periods_per_year - 1

def portfolio_performance(wt, rt, cov):
    """Annualized PF returns & standard deviation"""
    r = portfolio_returns(wt, rt=rt)
    sd = portfolio_std(wt, cov=cov)
    return r, sd

Next, we define a target function that represents the negative SR for SciPy's minimize function to optimize, given the constraints that the weights are bounded by [-1, 1] (or [0, 1] when short sales are excluded) and sum to one in absolute terms:

from scipy.optimize import minimize

def neg_sharpe_ratio(weights, mean_ret, cov):
    r, sd = portfolio_performance(weights, mean_ret, cov)
    return -(r - rf_rate) / sd

# Equality constraint: absolute weights sum to one
weight_constraint = {'type': 'eq',
                     'fun': lambda x: np.sum(np.abs(x)) - 1}

def max_sharpe_ratio(mean_ret, cov, short=False):
    return minimize(fun=neg_sharpe_ratio,
                    x0=x0,  # initial guess, e.g., equal weights
                    args=(mean_ret, cov),
                    method='SLSQP',
                    bounds=((-1 if short else 0, 1),) * n_assets,
                    constraints=weight_constraint,
                    options={'ftol': 1e-10, 'maxiter': 10000})

Then, we compute the efficient frontier by iterating over a range of target returns and solving for the corresponding minimum variance portfolios. To this end, we formulate the optimization problem using the constraints on portfolio risk and return as a function of the weights, as follows:

def min_vol_target(mean_ret, cov, target, short=False):
    def ret_(wt):
        return portfolio_returns(wt, mean_ret)

    # Require the portfolio return to hit the target, in addition to the
    # weight constraint defined above
    constraints = [{'type': 'eq', 'fun': lambda x: ret_(x) - target},
                   weight_constraint]
    bounds = ((-1 if short else 0, 1),) * n_assets
    return minimize(portfolio_std, x0=x0, args=(mean_ret, cov),
                    method='SLSQP', bounds=bounds,
                    constraints=constraints,
                    options={'ftol': 1e-10, 'maxiter': 10000})

The solution requires iterating over ranges of acceptable values to identify optimal risk-return combinations:

def efficient_frontier(mean_ret, cov, ret_range):
    return [min_vol_target(mean_ret, cov, ret) for ret in ret_range]
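
The following sketch shows how these pieces might be wired together; it assumes the functions above and the notebook's globals (monthly_returns, n_assets, NUM_PF, periods_per_year, rf_rate, and x0) are already defined, and it is illustrative rather than the notebook's exact code:

# Illustrative wiring, not the notebook's exact code
mean_ret, cov = monthly_returns.mean(), monthly_returns.cov()

# Random portfolios for the scatter plot
simul_perf, simul_wt = simulate_portfolios(mean_ret, cov, short=False)

# Maximum Sharpe ratio portfolio; optimal weights in max_sr.x
max_sr = max_sharpe_ratio(mean_ret, cov)

# Efficient frontier over a grid of target returns
ret_range = np.linspace(simul_perf['Annualized Returns'].min(),
                        simul_perf['Annualized Returns'].max(), 50)
frontier = efficient_frontier(mean_ret, cov, ret_range)
frontier_std = [res.fun for res in frontier]  # minimum volatility per target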

The simulation yields a subset of the feasible portfolios, and the efficient frontier identifies the optimal in-sample return-risk combinations that were achievable given historical data.

Figure 5.2 shows the result, including the minimum variance portfolio, the portfolio that maximizes the SR, and several portfolios produced by alternative optimization strategies that we'll discuss in the following sections:

Figure 5.2: The efficient frontier and different optimized portfolios

The portfolio optimization can be run at every evaluation step of the trading strategy to optimize the positions.

Challenges and shortcomings

The preceding mean-variance frontier estimation illustrates in-sample, that is, backward-looking, optimization. In practice, portfolio optimization requires forward-looking inputs. However, expected returns are notoriously difficult to estimate accurately. Mean-variance optimization is therefore best viewed as a starting point and benchmark for numerous improvements.

The covariance matrix can be estimated somewhat more reliably, which has given rise to several alternative approaches. However, covariance matrices with correlated assets pose computational challenges since the optimization problem requires inverting the matrix. The resulting high condition number induces numerical instability, which in turn gives rise to the Markowitz curse: the more diversification is required (because investment opportunities are correlated), the more unreliable the weights produced by the algorithm.
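
A quick way to see this issue is to inspect the condition number of the covariance matrix before inverting it; large values signal an unstable inverse and, hence, unstable weights. A minimal sketch with synthetic, highly correlated returns (illustrative only):

import numpy as np

# Synthetic example: four assets driven largely by one common factor,
# so their returns are highly correlated
rng = np.random.default_rng(1)
common = rng.normal(0, 0.04, size=(500, 1))
returns = common + rng.normal(0, 0.005, size=(500, 4))

cov = np.cov(returns, rowvar=False)
print('condition number:', np.linalg.cond(cov))  # large value => unstable inverse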

Many investors prefer to use portfolio-optimization techniques with less onerous input requirements. We will now introduce several alternatives that aim to address these shortcomings, including a more recent approach based on machine learning.

Alternatives to mean-variance optimization

The challenges with accurate inputs for the mean-variance optimization problem have led to the adoption of several practical alternatives that constrain the mean, the variance, or both, or omit the more challenging return estimates altogether, as in the risk parity approach that we'll discuss later in this section.

The 1/N portfolio

Simple portfolios provide useful benchmarks to gauge the added value of complex models that carry the risk of overfitting. The simplest strategy, an equal-weighted portfolio, has been shown to be one of the best performers.

Famously, DeMiguel, Garlappi, and Uppal (2009) compared the out-of-sample performance of portfolios produced by various mean-variance optimizers, including robust Bayesian estimators, portfolio constraints, and optimal combinations of portfolios, to the simple 1/N rule. They found that the 1/N portfolio produced a higher Sharpe ratio than the alternatives on various datasets, which they explain by the high cost of estimation errors that often outweighs the benefits of sophisticated optimization out of sample.

More specifically, they found that the estimation window required for the sample-based mean-variance strategy and its extensions to outperform the 1/N benchmark is around 3,000 months for a portfolio with 25 assets and about 6,000 months for a portfolio with 50 assets.

The 1/N portfolio is also included in Figure 5.2 in the previous section.
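
Using the helper functions defined earlier, the 1/N benchmark takes a single line to construct. A minimal sketch, assuming n_assets, mean_ret, cov, and rf_rate are set as above:

import numpy as np

# Equal weights across all assets (the 1/N rule)
weights_1n = np.full(n_assets, 1 / n_assets)
ret_1n, std_1n = portfolio_performance(weights_1n, mean_ret, cov)
sharpe_1n = (ret_1n - rf_rate) / std_1n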

The minimum-variance portfolio

Another alternative is the global minimum-variance (GMV) portfolio, which prioritizes the minimization of risk. It is shown in Figure 5.2 and can be calculated, as follows, by minimizing the portfolio standard deviation using the mean-variance framework:

def min_vol(mean_ret, cov, short=False):
    bounds = ((-1 if short else 0, 1),) * n_assets
    return minimize(fun=portfolio_std,
                    x0=x0,
                    args=(mean_ret, cov),
                    method='SLSQP',
                    bounds=bounds,
                    constraints=weight_constraint,
                    options={'ftol': 1e-10, 'maxiter': 10000})

The corresponding minimum volatility portfolio lies on the efficient frontier, as shown previously in Figure 5.2.

Global Portfolio Optimization – the Black-Litterman approach

The Global Portfolio Optimization approach of Black and Litterman (1992) combines economic models with statistical learning. It is popular because it generates estimates of expected returns that are plausible in many situations.

The technique assumes that the market is a mean-variance-efficient portfolio, as implied by the CAPM equilibrium model. It builds on the fact that the observed market capitalizations can be considered as the optimal weights assigned to each security by the market. Market weights reflect market prices that, in turn, embody the market's expectations of future returns.

The approach can thus reverse-engineer the unobservable future expected returns from the assumption that the market is close enough to equilibrium, as defined by the CAPM. Investors can adjust these estimates to their own beliefs using a shrinkage estimator. The model can be interpreted as a Bayesian approach to portfolio optimization. We will introduce Bayesian methods in Chapter 10, Bayesian ML – Dynamic Sharpe Ratios and Pairs Trading Strategies.
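
A minimal sketch of the reverse-optimization step, using a hypothetical covariance matrix, market-capitalization weights, and risk-aversion coefficient (the subsequent blending of these implied returns with investor views is omitted):

import numpy as np

# Hypothetical inputs (not from the book): covariance of excess returns,
# market-cap weights, and a risk-aversion coefficient
cov = np.array([[0.04, 0.006, 0.012],
                [0.006, 0.09, 0.018],
                [0.012, 0.018, 0.16]])
w_mkt = np.array([0.5, 0.3, 0.2])
delta = 2.5

# Implied equilibrium (CAPM prior) excess returns: Pi = delta * Sigma * w_mkt
pi = delta * cov @ w_mkt
print(pi.round(4))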

How to size your bets – the Kelly criterion

The Kelly criterion has a long history in gambling because it provides guidance on how much to stake on each bet in an (infinite) sequence of bets with varying (but favorable) odds to maximize terminal wealth. It was published in a 1956 paper, A New Interpretation of the Information Rate, by John Kelly, who was a colleague of Claude Shannon's at Bell Labs. He was intrigued by bets placed on contestants of the new quiz show "The $64,000 Question," where a viewer on the West Coast used the three-hour broadcast delay to obtain insider information about the winners.

Kelly drew a connection to Shannon's information theory to solve for the bet that is optimal for long-term capital growth when the odds are favorable, but uncertainty remains. His rule maximizes logarithmic wealth as a function of the odds of success of each game and includes implicit bankruptcy protection since log(0) is negative infinity so that a Kelly gambler would naturally avoid losing everything.

The optimal size of a bet

Kelly began by analyzing games with a binary win-lose outcome. The key variables are:

  • b: The odds defining the amount won for a $1 bet. Odds = 5/1 implies a $5 gain if the bet wins, plus recovery of the $1 capital.
  • p: The probability defining the likelihood of a favorable outcome.
  • f: The share of the current capital to bet.
  • V: The value of the capital as a result of betting.

The Kelly criterion aims to maximize the growth rate, G, of the value of infinitely repeated bets:

$$G = \lim_{N\rightarrow\infty} \frac{1}{N} \log \frac{V_N}{V_0}$$

When W and L are the numbers of wins and losses, then:

$$G = \frac{W}{N}\log(1 + bf) + \frac{L}{N}\log(1 - f) \;\xrightarrow{N\rightarrow\infty}\; p\log(1 + bf) + (1 - p)\log(1 - f)$$

We can maximize the rate of growth, G, with respect to f, as illustrated using SymPy (you can find this in the kelly_rule notebook):

from sympy import symbols, solve, log, diff

share, odds, probability = symbols('share odds probability')
Value = probability * log(1 + odds * share) + \
        (1 - probability) * log(1 - share)
solve(diff(Value, share), share)

[(odds*probability + probability - 1)/odds]

We arrive at the optimal share of capital to bet:

$$f^* = \frac{bp - (1 - p)}{b} = p - \frac{1 - p}{b}$$

For example, with even odds (b = 1) and a 60 percent win probability, the optimal bet is f* = 0.6 - 0.4 = 20 percent of capital.

Optimal investment – single asset

In a financial market context, both outcomes and alternatives are more complex, but the Kelly criterion logic still applies. It was made popular by Ed Thorp, who first applied it profitably to gambling (described in the book Beat the Dealer) and later started the successful hedge fund Princeton/Newport Partners.

With continuous outcomes, the growth rate of capital is defined by an integral over the probability distribution of the different returns, which can be optimized numerically:

$$E[G] = \int \log(1 + fr)\,P(r)\,dr$$

We can solve this expression for the optimal f* using the scipy.optimize module. The quad function computes the value of a definite integral between two values a and b using FORTRAN's QUADPACK library (hence its name). It returns the value of the integral and an error estimate:

import numpy as np
from scipy.integrate import quad
from scipy.optimize import minimize_scalar
from scipy.stats import norm

def norm_integral(f, m, st):
    """Negative expected log growth for bet size f, assuming normal returns"""
    val, er = quad(lambda s: np.log(1 + f * s) * norm.pdf(s, m, st),
                   m - 3 * st, m + 3 * st)
    return -val

def norm_dev_integral(f, m, st):
    """Derivative of the expected log growth with respect to f"""
    val, er = quad(lambda s: (s / (1 + f * s)) * norm.pdf(s, m, st),
                   m - 3 * st, m + 3 * st)
    return val

m = .058  # mean return
s = .216  # standard deviation of returns

# Option 1: minimize the (negative) expectation integral
sol = minimize_scalar(norm_integral, args=(m, s),
                      bounds=[0., 2.], method='bounded')
print('Optimal Kelly fraction: {:.4f}'.format(sol.x))

Optimal Kelly fraction: 1.1974

Optimal investment – multiple assets

We will use an example with various equities. E. Chan (2008) illustrates how to arrive at a multi-asset application of the Kelly criterion and shows that the result is equivalent to the (potentially levered) maximum Sharpe ratio portfolio from mean-variance optimization.

The computation involves the dot product of the precision matrix, which is the inverse of the covariance matrix, and the vector of mean returns:

from numpy.linalg import inv

mean_returns = monthly_returns.mean()
cov_matrix = monthly_returns.cov()
# Kelly weights: precision (inverse covariance) matrix times mean returns
precision_matrix = pd.DataFrame(inv(cov_matrix), index=stocks, columns=stocks)
kelly_wt = precision_matrix.dot(mean_returns).values

The Kelly portfolio is also shown in the previous efficient frontier diagram (after normalization so that the sum of the absolute weights equals one). Many investors prefer to reduce the Kelly weights to reduce the strategy's volatility, and Half-Kelly has become particularly popular.
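
A minimal sketch of this normalization and of half-Kelly scaling, assuming kelly_wt from the previous snippet:

import numpy as np

# For the frontier plot: scale so the absolute weights sum to one
kelly_wt_norm = kelly_wt / np.abs(kelly_wt).sum()

# Half-Kelly: invest half of each Kelly fraction (the rest stays in cash)
# to reduce the strategy's volatility
half_kelly_wt = kelly_wt / 2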

Risk parity

The fact that the previous 15 years have been characterized by two major crises in the global equity markets, a consistently upward-sloping yield curve, and a general decline in interest rates made risk parity look like a particularly compelling option. Many institutions carved out strategic allocations to risk parity to further diversify their portfolios.

A simple implementation of risk parity allocates assets according to the inverse of their variances, ignoring correlations and, in particular, return forecasts:

var = monthly_returns.var()
# Weight each asset by the inverse of its variance, normalized to sum to one
risk_parity_weights = (1 / var) / (1 / var).sum()

The risk parity portfolio is also shown in the efficient frontier diagram at the beginning of this section.

Risk factor investment

An alternative framework for estimating inputs is to work down to the underlying determinants, or factors, that drive the risk and returns of assets. If we understand how the factors influence returns, and we understand the factors themselves, we will be able to construct more robust portfolios.

The concept of factor investing looks beyond asset class labels. It looks to the underlying factor risks that we discussed in the previous chapter on alpha factors to maximize the benefits of diversification. Rather than distinguishing investment vehicles by labels such as hedge funds or private equity, factor investing aims to identify distinct risk-return profiles based on differences in exposure to fundamental risk factors (Ang 2014).

The naive approach to mean-variance investing plugs (artificial) groupings as distinct asset classes into a mean-variance optimizer. Factor investing recognizes that such groupings share many of the same factor risks as traditional asset classes. Diversification benefits can be overstated, as investors discovered during the 2008 crisis when correlations among risky asset classes increased due to exposure to the same underlying factor risks.

In Chapter 7, Linear Models – From Risk Factors to Return Forecasts, we will show how to measure the exposure of a portfolio to various risk factors so that you can either adjust the positions to tune your factor exposure, or hedge accordingly.

Hierarchical risk parity

Mean-variance optimization is very sensitive to the estimates of expected returns and the covariance of these returns. The covariance matrix inversion also becomes more challenging and less accurate when returns are highly correlated, as is often the case in practice. The result has been called the Markowitz curse: when diversification is more important because investments are correlated, conventional portfolio optimizers will likely produce an unstable solution. The benefits of diversification can be more than offset by mistaken estimates. As discussed, even naive, equally weighted portfolios can beat mean-variance and risk-based optimization out of sample.

More robust approaches have incorporated additional constraints (Clarke et al., 2002) or Bayesian priors (Black and Litterman, 1992), or used shrinkage estimators to make the precision matrix more numerically stable (Ledoit and Wolf, 2003), available in scikit-learn (http://scikit-learn.org/stable/modules/generated/sklearn.covariance.LedoitWolf.html).
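
A minimal sketch of the Ledoit-Wolf shrinkage estimator mentioned above, applied to synthetic returns; the shrunk covariance can then replace the sample covariance in the optimizations shown earlier:

import numpy as np
from sklearn.covariance import LedoitWolf

# Synthetic returns (not the book's data): 120 months, 10 assets
rng = np.random.default_rng(7)
returns = rng.normal(0.01, 0.05, size=(120, 10))

lw = LedoitWolf().fit(returns)
shrunk_cov = lw.covariance_          # shrunk covariance estimate
print('shrinkage intensity:', lw.shrinkage_)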

Hierarchical risk parity (HRP), in contrast, leverages unsupervised machine learning to achieve superior out-of-sample portfolio allocations. It is a recent innovation in portfolio optimization that uses graph theory and hierarchical clustering to construct a portfolio in three steps (Lopez de Prado, 2015):

  1. Define a distance metric so that correlated assets are close to each other, and apply single-linkage clustering to identify hierarchical relationships (a minimal sketch of this step follows the list).
  2. Use the hierarchical correlation structure to quasi-diagonalize the covariance matrix.
  3. Apply top-down inverse-variance weighting using a recursive bisectional search to treat clustered assets as complements, rather than substitutes, in portfolio construction and to reduce the number of degrees of freedom.
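
A minimal sketch of step 1, using the standard distance metric $d_{ij} = \sqrt{(1 - \rho_{ij})/2}$ on a small, purely illustrative correlation matrix (the full HRP implementation appears in Chapter 13):

import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import squareform

# Synthetic correlation matrix for four assets (illustrative only)
corr = np.array([[1.0, 0.8, 0.2, 0.1],
                 [0.8, 1.0, 0.3, 0.2],
                 [0.2, 0.3, 1.0, 0.7],
                 [0.1, 0.2, 0.7, 1.0]])

# Distance metric: correlated assets end up close to each other
dist = np.sqrt((1 - corr) / 2)

# Single-linkage clustering on the condensed distance matrix
clusters = linkage(squareform(dist, checks=False), method='single')
print(clusters)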

A related method to construct hierarchical clustering portfolios (HCP) was presented by Raffinot (2016). Conceptually, complex systems such as financial markets tend to have a structure and are often organized in a hierarchical way, while the interaction among elements in the hierarchy shapes the dynamics of the system. Correlation matrices also lack the notion of hierarchy, which allows weights to vary freely and in potentially unintended ways.

Both HRP and HCP have been tested by JP Morgan (2012) on various equity universes. HRP, in particular, produced equal or superior risk-adjusted returns and Sharpe ratios compared to naive diversification, maximum-diversification portfolios, or GMV portfolios.

We will present the Python implementation in Chapter 13, Data-Driven Risk Factors and Asset Allocation with Unsupervised Learning.