Measuring backtest performance with pyfolio
Pyfolio facilitates the analysis of portfolio performance, both in and out of sample, using a rich set of metrics and visualizations. It produces tear sheets that cover the analysis of returns, positions, and transactions, as well as event risk during periods of market stress, using several built-in scenarios. It also includes Bayesian out-of-sample performance analysis.
Pyfolio relies on portfolio returns and position data and can also take into account the transaction costs and slippage losses of trading activity. It uses the empyrical library, which can also be used on a standalone basis to compute performance metrics.
Creating the returns and benchmark inputs
The library is part of the Quantopian ecosystem and is compatible with Zipline and Alphalens. We will first demonstrate how to generate the requisite inputs from Alphalens and then show how to extract them from a Zipline backtest performance DataFrame. The code samples for this section are in the notebook 03_pyfolio_demo.ipynb.
Getting pyfolio input from Alphalens
Pyfolio also integrates with Alphalens directly and permits the creation of pyfolio input data using create_pyfolio_input:
from alphalens.performance import create_pyfolio_input
# factor_data is the factor DataFrame produced by Alphalens
qmin, qmax = (factor_data.factor_quantile.min(),
              factor_data.factor_quantile.max())
input_data = create_pyfolio_input(factor_data,
                                  period='1D',
                                  capital=100000,
                                  long_short=False,
                                  equal_weight=False,
                                  quantiles=[1, 5],
                                  benchmark_period='1D')
returns, positions, benchmark = input_data
There are two options to specify how portfolio weights will be generated:
- long_short: If False, weights will correspond to factor values divided by the sum of their absolute values so that negative factor values generate short positions. If True, factor values are first demeaned so that long and short positions cancel each other out, and the portfolio is market neutral.
- equal_weight: If True and long_short is True, assets will be split into two equal-sized groups, with the top/bottom half making up long/short positions.
Long-short portfolios can also be created for groups if factor_data includes, for example, sector information for each asset.
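The two weighting options described above can be sketched for a single date as follows. This is a simplified illustration rather than the Alphalens implementation: the function factor_weights and the sample factor values are hypothetical, and we assume that equal weighting in the long-short case splits assets around the median factor value.

```python
import numpy as np
import pandas as pd


def factor_weights(factor, long_short=False, equal_weight=False):
    """Turn one date's factor values into weights with gross leverage of 1."""
    values = factor.astype(float)
    if equal_weight:
        if long_short:
            # Assumption: split assets into two groups around the median
            values = values - values.median()
        # Keep only the sign so that every position has the same size
        values = np.sign(values)
    elif long_short:
        # Demean so that long and short weights offset each other
        values = values - values.mean()
    # Scale by the sum of absolute values so that |weights| add up to 1
    return values / values.abs().sum()


# Hypothetical factor values for four assets
factor = pd.Series({'A': 3.0, 'B': 1.0, 'C': -2.0, 'D': 2.0})

directional = factor_weights(factor)                    # net long exposure
neutral = factor_weights(factor, long_short=True)       # weights sum to zero
equal_neutral = factor_weights(factor, long_short=True, equal_weight=True)
```

With long_short=False, only the asset with a negative factor value is shorted, so the portfolio retains net long exposure. Demeaning removes this net exposure.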
Getting pyfolio input from a Zipline backtest
The result of a Zipline backtest can also be converted into the required pyfolio input using extract_rets_pos_txn_from_zipline:
from pyfolio.utils import extract_rets_pos_txn_from_zipline
returns, positions, transactions = extract_rets_pos_txn_from_zipline(backtest)
Walk-forward testing – out-of-sample returns
Testing a trading strategy involves both backtesting and forward testing. The former uses historical data and often refers to the sample period used to fine-tune alpha factor parameters. Forward testing simulates the strategy on new market data to validate that it performs well out of sample and is not too closely tailored to specific historical circumstances.
Pyfolio allows for the designation of an out-of-sample period to simulate walk-forward testing. There are numerous aspects to take into account when testing a strategy to obtain statistically reliable results. We will address these in more detail in Chapter 8, The ML4T Workflow – From Model to Strategy Backtesting.
The plot_rolling_returns function displays cumulative in- and out-of-sample returns against a user-defined benchmark (we are using the S&P 500). Pyfolio computes cumulative returns as the product of simple returns after adding 1 to each:
from pyfolio.plotting import plot_rolling_returns
plot_rolling_returns(returns=returns,
                     factor_returns=benchmark_rets,
                     live_start_date='2016-01-01',
                     cone_std=(1.0, 1.5, 2.0))
The plot in Figure 5.4 includes a cone that shows expanding confidence intervals to indicate when out-of-sample returns appear unlikely, given random-walk assumptions. Here, our toy strategy did not perform particularly well against the S&P 500 benchmark during the simulated 2016 out-of-sample period:
Figure 5.4: Pyfolio cumulative performance plot
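To make the two ideas behind this plot concrete, the following minimal sketch compounds simple returns into cumulative returns and builds a cone whose width grows with the square root of time. The cone is a simplified normal approximation under random-walk assumptions and not pyfolio's own routine; the function names and the sample returns are hypothetical.

```python
import numpy as np


def cumulative_returns(returns):
    """Compound simple returns: product of (1 + r) minus 1."""
    return np.cumprod(1.0 + np.asarray(returns, dtype=float)) - 1.0


def simple_cone(in_sample_returns, n_days, num_std=1.0):
    """Expected path plus/minus num_std bands that widen with sqrt(time).

    Assumption: i.i.d. daily returns with the in-sample mean and std.
    """
    r = np.asarray(in_sample_returns, dtype=float)
    mu, sigma = r.mean(), r.std(ddof=1)
    t = np.arange(1, n_days + 1)
    expected = (1.0 + mu) ** t - 1.0
    width = num_std * sigma * np.sqrt(t)
    return expected - width, expected, expected + width


# Three periods of +10%, -5%, and +2% compound to roughly 6.59%
cum = cumulative_returns([0.10, -0.05, 0.02])

# Cone for four out-of-sample days based on hypothetical in-sample returns
lower, mid, upper = simple_cone([0.01, -0.02, 0.015, 0.005], n_days=4)
```

Out-of-sample cumulative returns that leave the outer bands suggest that the live performance is unlikely under the in-sample return distribution.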
Summary performance statistics
Pyfolio offers several analytic functions and plots. The perf_stats summary displays the annual and cumulative returns, volatility, skew, and kurtosis of returns, and the Sharpe ratio (SR).
The following additional metrics (which can also be calculated individually) are the most important:
- Max drawdown: Highest percentage loss from the previous peak
- Calmar ratio: Annual portfolio return relative to maximal drawdown
- Omega ratio: Probability-weighted ratio of gains versus losses for a return target, which defaults to zero
- Sortino ratio: Excess return relative to downside standard deviation
- Tail ratio: Size of the right tail (gains, the absolute value of the 95th percentile) relative to the size of the left tail (losses, absolute value of the 5th percentile)
- Daily value at risk (VaR): Loss corresponding to a return two standard deviations below the daily mean
- Alpha: Portfolio return unexplained by the benchmark return
- Beta: Exposure to the benchmark
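Several of the metrics listed above follow directly from their definitions. The sketch below uses only NumPy and hypothetical function names; it assumes daily returns, 252 trading days per year, and a zero return target, and its conventions may differ in detail from those of pyfolio and empyrical.

```python
import numpy as np


def max_drawdown(returns):
    """Largest percentage loss of the wealth curve from a previous peak."""
    wealth = np.concatenate(([1.0], np.cumprod(1.0 + np.asarray(returns))))
    running_peak = np.maximum.accumulate(wealth)
    return float(np.min(wealth / running_peak - 1.0))


def calmar_ratio(returns, periods=252):
    """Annualized compound return relative to the maximum drawdown."""
    r = np.asarray(returns, dtype=float)
    annual_return = np.prod(1.0 + r) ** (periods / len(r)) - 1.0
    return float(annual_return / abs(max_drawdown(r)))


def omega_ratio(returns, target=0.0):
    """Sum of gains above the target relative to the sum of losses below it."""
    excess = np.asarray(returns, dtype=float) - target
    return float(excess[excess > 0].sum() / -excess[excess < 0].sum())


def sortino_ratio(returns, target=0.0, periods=252):
    """Annualized excess return relative to downside deviation."""
    excess = np.asarray(returns, dtype=float) - target
    downside = np.sqrt(np.mean(np.minimum(excess, 0.0) ** 2))
    return float(excess.mean() / downside * np.sqrt(periods))


def tail_ratio(returns):
    """|95th percentile| relative to |5th percentile| of returns."""
    r = np.asarray(returns, dtype=float)
    return float(abs(np.percentile(r, 95)) / abs(np.percentile(r, 5)))


def daily_var(returns, num_std=2.0):
    """Return that lies num_std standard deviations below the daily mean."""
    r = np.asarray(returns, dtype=float)
    return float(r.mean() - num_std * r.std(ddof=1))


# Hypothetical returns: the 20% loss in period two is the maximum drawdown
sample = np.array([0.10, -0.20, 0.05, 0.10])
mdd = max_drawdown(sample)
omega = omega_ratio(sample)
sortino = sortino_ratio(sample)
var = daily_var(sample)
```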
The plot_perf_stats function bootstraps estimates of parameter variability and displays the result as a box plot:
Figure 5.5: Pyfolio performance statistic plot
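The idea behind bootstrapping a performance statistic can be sketched as follows: we resample the daily returns with replacement, recompute the metric for each resample, and inspect the spread of the results. This is a minimal illustration with hypothetical names and simulated returns, not pyfolio's own routine, and the simple i.i.d. resampling used here ignores any serial correlation in returns.

```python
import numpy as np


def sharpe_ratio(returns, periods=252):
    """Annualized mean return per unit of volatility (zero risk-free rate)."""
    r = np.asarray(returns, dtype=float)
    return float(r.mean() / r.std(ddof=1) * np.sqrt(periods))


def bootstrap_metric(returns, metric, n_samples=1000, seed=42):
    """Recompute a metric on n_samples resamples drawn with replacement."""
    rng = np.random.default_rng(seed)
    r = np.asarray(returns, dtype=float)
    draws = rng.choice(r, size=(n_samples, len(r)), replace=True)
    return np.array([metric(sample) for sample in draws])


# Simulated daily returns stand in for a strategy's return series
rng = np.random.default_rng(0)
daily_returns = rng.normal(0.0005, 0.01, size=500)

samples = bootstrap_metric(daily_returns, sharpe_ratio)
# The percentiles summarize the variability that a box plot would display
low, median, high = np.percentile(samples, [5, 50, 95])
```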
The show_perf_stats function computes numerous metrics for the entire period, as well as separately, for in- and out-of-sample periods:
from pyfolio.plotting import show_perf_stats
show_perf_stats(returns=returns,
                factor_returns=benchmark_rets,
                positions=positions,
                transactions=transactions,
                live_start_date=oos_date)
For the simulated long-short portfolio derived from the MeanReversion factor, we obtain the following performance statistics:
See the appendix for details on the calculation and interpretation of portfolio risk and return metrics.
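The split into in- and out-of-sample periods amounts to slicing the return series at the live start date and computing the same statistics for each part. The following sketch illustrates this with pandas; the helper names, the simulated returns, and the time zone-naive index are assumptions made for illustration.

```python
import numpy as np
import pandas as pd


def split_sample(returns, live_start_date):
    """Split returns into the periods before and from the live start date."""
    live_start = pd.Timestamp(live_start_date)
    return returns[returns.index < live_start], returns[returns.index >= live_start]


def summarize(returns, periods=252):
    """A few headline statistics for a series of daily returns."""
    total = (1 + returns).prod()
    return pd.Series({
        'annual_return': total ** (periods / len(returns)) - 1,
        'cumulative_return': total - 1,
        'annual_volatility': returns.std() * np.sqrt(periods),
    })


# Simulated daily returns for two years of business days
dates = pd.bdate_range('2015-01-01', '2016-12-30')
rng = np.random.default_rng(1)
returns = pd.Series(rng.normal(0.0004, 0.01, len(dates)), index=dates)

in_sample, out_of_sample = split_sample(returns, '2016-01-01')
stats = pd.DataFrame({'in_sample': summarize(in_sample),
                      'out_of_sample': summarize(out_of_sample),
                      'all': summarize(returns)})
```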
Drawdown periods and factor exposure
The plot_drawdown_periods(returns) function plots the principal drawdown periods for the portfolio, and several other plotting functions show the rolling SR and rolling factor exposures to the market beta or the Fama-French size, growth, and momentum factors:
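import matplotlib.pyplot as plt
from pyfolio.plotting import (plot_drawdown_periods, plot_rolling_beta,
                              plot_drawdown_underwater, plot_rolling_sharpe)
fig, ax = plt.subplots(nrows=2, ncols=2, figsize=(16, 10))
axes = ax.flatten()
plot_drawdown_periods(returns=returns, ax=axes[0])
plot_rolling_beta(returns=returns, factor_returns=benchmark_rets,
                  ax=axes[1])
plot_drawdown_underwater(returns=returns, ax=axes[2])
# Pass the fourth subplot explicitly rather than relying on the current axes
plot_rolling_sharpe(returns=returns, ax=axes[3])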
The plots in Figure 5.6, which highlight a subset of the visualizations contained in the various tear sheets, illustrate how pyfolio allows us to drill down into the performance characteristics and to assess exposure to fundamental drivers of risk and returns:
Figure 5.6: Various pyfolio plots of performance over time
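The quantities behind these plots are straightforward to compute with pandas rolling windows. The sketch below uses hypothetical function names, an assumed six-month window of 126 trading days, and simulated data; pyfolio's defaults may differ.

```python
import numpy as np
import pandas as pd


def underwater(returns):
    """Percentage distance of the wealth curve from its running peak."""
    wealth = (1 + returns).cumprod()
    return wealth / wealth.cummax() - 1


def rolling_beta(returns, benchmark, window=126):
    """Rolling covariance with the benchmark over rolling benchmark variance."""
    return returns.rolling(window).cov(benchmark) / benchmark.rolling(window).var()


def rolling_sharpe(returns, window=126, periods=252):
    """Annualized rolling mean return per unit of rolling volatility."""
    roll = returns.rolling(window)
    return roll.mean() / roll.std() * np.sqrt(periods)


# Simulated benchmark and a portfolio with twice the benchmark exposure
dates = pd.bdate_range('2015-01-01', periods=300)
rng = np.random.default_rng(2)
benchmark = pd.Series(rng.normal(0.0003, 0.01, 300), index=dates)
portfolio = 2 * benchmark

beta = rolling_beta(portfolio, benchmark).dropna()   # close to 2 throughout
drawdowns = underwater(portfolio)                    # zero at each new peak
sharpe = rolling_sharpe(portfolio).dropna()
```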
Modeling event risk
Pyfolio also includes timelines for various events that you can use to compare the performance of a portfolio to a benchmark during these periods. Pyfolio uses the S&P 500 by default, but you can also provide benchmark returns of your choice. The following example compares the performance to the S&P 500 during the fall 2015 selloff:
from pyfolio.timeseries import extract_interesting_date_ranges
interesting_times = extract_interesting_date_ranges(returns=returns)
interesting_times['Fall2015'].to_frame('pf') \
    .join(benchmark_rets) \
    .add(1).cumprod().sub(1) \
    .plot(lw=2, figsize=(14, 6), title='Fall 2015 Selloff')
Figure 5.7 shows the resulting plot:
Figure 5.7: Pyfolio event risk analysis
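The event comparison boils down to selecting the returns within a date range and compounding portfolio and benchmark returns side by side. The following sketch shows this with pandas only; the EVENTS dictionary, the date range assumed for the fall 2015 window, the function name, and the simulated returns are all hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical event calendar; the dates for Fall2015 are an assumption
EVENTS = {'Fall2015': ('2015-08-15', '2015-09-30')}


def event_performance(returns, benchmark, event, events=EVENTS):
    """Cumulative portfolio and benchmark returns during a named event."""
    start, end = (pd.Timestamp(d) for d in events[event])
    both = pd.concat({'pf': returns, 'benchmark': benchmark}, axis=1)
    window = both[(both.index >= start) & (both.index <= end)]
    # Compound simple returns within the event window only
    return window.add(1).cumprod().sub(1)


# Simulated daily returns for one year of business days
dates = pd.bdate_range('2015-01-01', '2015-12-31')
rng = np.random.default_rng(3)
pf_returns = pd.Series(rng.normal(0.0004, 0.01, len(dates)), index=dates)
bm_returns = pd.Series(rng.normal(0.0003, 0.012, len(dates)), index=dates)

fall_2015 = event_performance(pf_returns, bm_returns, 'Fall2015')
```

Calling fall_2015.plot() would then produce a chart comparable to the one shown in the figure.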