Complexity Is a Virtue in Return Prediction

Posted June 18, 2024 at 11:39 am
Tommi Johnsen
Alpha Architect

The article “Complexity Is a Virtue in Return Prediction” first appeared on Alpha Architect blog.

Finance has seen unprecedented growth in the use of artificial intelligence, specifically machine learning models. Applications have included portfolio construction, stock analysis and, in this case, the prediction of stock market returns. This paper discusses the benefits of using complex models, as found in AI, over simple models such as ordinary least squares for predicting market returns. The authors highlight the limitations of traditional simple models and advocate for the adoption of complex, machine learning-based approaches. Traditionally, market return predictions have relied on simple models with only a few parameters, which significantly understate the predictability of stock returns. Complex models, which use more parameters than observations, offer much better predictability for market returns. This is good news for the application of AI in quantitative finance.

This is an excellent article. I believe it will provide much impetus in moving finance and investments in the direction of complexity and away from the very restrictive, simple models we are using currently. A word of warning: the article is heavy on the mathematical and theoretical foundations of ML models. The reward for working through the details is an understanding of the statistical linkages between the large or complex and small or simple models. This summary will only provide a review of the high points, at least for now.

The Virtue of Complexity in Return Prediction

  • Bryan Kelly, Semyon Malamud and Kangying Zhou
  • Journal of Finance
  • A version of this paper can be found here

What are the research questions?

  1. What are the main objectives of the article?
  2. Does the “virtue of complexity” associated with highly parameterized, machine learning models apply to the prediction of market returns?
  3. Are high-complexity, highly parameterized models the same as data mining?

What are the Academic Insights?

  1. The article has three main objectives:
    • First, the authors argue that simple models, which use only a few parameters, significantly understate return predictability compared to complex models with more parameters than observations.
    • Second, they provide theoretical proof that complex models outperform simple models in predicting returns when appropriate shrinkage is applied. Shrinkage is crucial when using complex models for return prediction: it lets these models keep their predictive power without overfitting the training data. Shrinkage means that the coefficients are pulled toward zero relative to the OLS parameter estimates, which reduces the variance of the estimates at the cost of some bias.
    • Third, the empirical evidence from the U.S. equity market supports the theoretical findings, demonstrating the benefits of complex models. The focus is on forecasting the aggregate stock market return using a set of 14 predictor variables and evaluating the market timing strategies derived from these forecasts. The empirical analysis targets the monthly excess return of the CRSP value-weighted index. The information set for prediction consists of predictor variables from Welch and Goyal (2008), available monthly from 1926 to 2020: Dividend-Price Ratio, Dividend Yield, Earnings-Price Ratio, Stock Variance, Book-to-Market Ratio, Net Equity Expansion, Treasury Bill Rate, Long-Term Yield, Long-Term Return, Term Spread, Default Yield Spread, Default Return Spread, Inflation, and one lag of the market return. Returns and predictors are volatility-standardized using backward-looking standard deviations to preserve the out-of-sample nature of the forecasts.
  2. YES. Put simply, a highly parameterized model is one in which the number of parameters is larger than necessary to fit the data. In addition to establishing the virtue of complexity, the authors demonstrate that the out-of-sample R2 of a prediction model is a poor measure of its economic value. A simulated market timing strategy earned large economic profits, as indicated by significant Sharpe ratios and information ratios, even when the R2 was large and negative. See the highlights in Table I for performance measures. Note that the complex (nonlinear) model turns in positive Sharpe and information ratios when compared to the linear model and the market itself. This is strong empirical evidence supporting the theoretical claims about the virtue of complexity in return prediction and market timing strategies. It demonstrates that high-complexity models can achieve substantial economic value regardless of the R2. The results advocate for the inclusion of rich, nonlinear models in empirical finance to leverage the benefits of model complexity. Perhaps researchers should focus more on economic value and less on forecast accuracy.
  3. NO. Data mining can sometimes involve high parameterization, especially when complex models are used to uncover patterns in data. For instance, using a neural network, itself a highly parameterized model, to find patterns in customer behavior sits at the intersection of data mining and high parameterization. But the two are not the same thing. A quick refresher on the meaning of data mining: “discovering historical patterns that are driven by random, not real, relationships and assuming they’ll repeat…a huge concern in many fields” (Asness, 2015). In finance, data mining is especially relevant when researchers are attempting to explain or identify patterns in stock returns. It is difficult to ensure that the results are not “one-time wonders,” especially if the data has been used inappropriately. These practices run a high risk of producing significant results out of random phenomena, which goes a long way toward explaining why predictions about investment strategies fail on a going-forward basis.
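
The second insight above, that a forecast can carry a negative out-of-sample R2 and still make money, can be illustrated with a toy calculation. The sketch below is not the authors' code. It assumes the R2 is benchmarked against a forecast of zero and that the timing strategy holds a market position equal to the forecast. The function names `oos_r2` and `timing_sharpe` and all numbers are made up for illustration. The forecasts have the right sign but are far too large in magnitude, so the squared errors are large while every timing bet still pays off.

```python
import numpy as np

def oos_r2(realized, forecast):
    """Out-of-sample R2, benchmarked against a forecast of zero (an assumption)."""
    realized = np.asarray(realized, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    return 1.0 - np.sum((realized - forecast) ** 2) / np.sum(realized ** 2)

def timing_sharpe(realized, forecast):
    """Per-period Sharpe ratio of a strategy whose position equals the forecast."""
    strategy = np.asarray(forecast, dtype=float) * np.asarray(realized, dtype=float)
    return strategy.mean() / strategy.std(ddof=1)

# Hypothetical monthly excess returns and forecasts (illustrative numbers only).
realized = np.array([0.02, -0.01, 0.03, -0.02, 0.01, -0.03])
# Correct sign every month, but scaled up by 5, so the forecasts overshoot badly.
forecast = 5.0 * np.array([0.01, -0.02, 0.02, -0.01, 0.03, -0.01])

r2 = oos_r2(realized, forecast)
sharpe = timing_sharpe(realized, forecast)
print(f"out-of-sample R2: {r2:.2f}, timing Sharpe ratio: {sharpe:.2f}")
```

Here `r2` is negative while `sharpe` is positive: the R2 penalizes the scale of the forecast, whereas the Sharpe ratio of the timing strategy is unchanged when the position is scaled up or down.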

Why does it matter?

The research presented here establishes the “virtue of complexity” found in ML models and finds that complex models align closely with real-world market behavior, without the bias imposed by simple models or the misuse of statistics. The authors do caution against adding variables to a model on an arbitrary basis but encourage adding them if they are likely to be relevant. They also encourage the use of highly parameterized nonlinear prediction models. A few takeaways: (1) simple models are preferable only if they are specified correctly, and that’s a tall order; (2) complex models are preferable under general conditions; and (3) there is a need to move beyond simple models and consider the benefits of complexity, especially in the context of machine learning, to improve return predictions and portfolio performance.
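
To make "highly parameterized nonlinear prediction model" concrete, here is a minimal sketch of one such model: a ridge regression on random sine and cosine features, with far more parameters than observations. It is an illustration under stated assumptions, not the authors' implementation. The data are synthetic random numbers, and the sample size, the number of features, the `gamma` scale, the shrinkage values and the function names are all hypothetical choices.

```python
import numpy as np

def random_fourier_features(X, n_pairs, gamma=2.0, seed=0):
    """Map T x d predictors to T x (2 * n_pairs) nonlinear features.

    Each random weight vector contributes one sine and one cosine feature.
    """
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((X.shape[1], n_pairs))
    Z = gamma * (X @ W)
    return np.hstack([np.sin(Z), np.cos(Z)])

def ridge_fit(S, y, shrinkage):
    """Ridge coefficients in dual form, which stays cheap when P > T.

    S' (S S' + z I)^{-1} y equals (S' S + z I)^{-1} S' y for any z > 0.
    """
    T = S.shape[0]
    return S.T @ np.linalg.solve(S @ S.T + shrinkage * np.eye(T), y)

rng = np.random.default_rng(42)
T, d = 12, 14                       # toy sizes: 12 observations, 14 raw predictors
X = rng.standard_normal((T, d))     # synthetic predictors, NOT market data
y = rng.standard_normal(T)          # synthetic "returns", NOT market data

S = random_fourier_features(X, n_pairs=500)    # P = 1000 parameters, T = 12
beta_light = ridge_fit(S, y, shrinkage=1e-3)   # almost no shrinkage
beta_heavy = ridge_fit(S, y, shrinkage=1e3)    # strong shrinkage

print("features:", S.shape)
print("coefficient norm, light shrinkage:", np.linalg.norm(beta_light))
print("coefficient norm, heavy shrinkage:", np.linalg.norm(beta_heavy))
print("in-sample error, light shrinkage:", np.linalg.norm(S @ beta_light - y))
print("in-sample error, heavy shrinkage:", np.linalg.norm(S @ beta_heavy - y))
```

With almost no shrinkage the model fits the training sample nearly exactly, which is what having more parameters than observations allows. Heavier shrinkage pulls the coefficients toward zero and gives up some in-sample fit. Because the data here are pure noise, the example says nothing about predictability; it only shows the mechanics.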

The most important chart from the paper

The results are hypothetical results and are NOT an indicator of future results and do NOT represent returns that any investor actually attained.  Indexes are unmanaged and do not reflect management or trading fees, and one cannot invest directly in an index.


Abstract

Much of the extant literature predicts market returns with “simple” models that use only a few parameters. Contrary to conventional wisdom, we theoretically prove that simple models severely understate return predictability compared to “complex” models in which the number of parameters exceeds the number of observations. We empirically document the virtue of complexity in U.S. equity market return prediction. Our findings establish the rationale for modeling expected returns through machine learning.

Disclosure: Alpha Architect

The views and opinions expressed herein are those of the author and do not necessarily reflect the views of Alpha Architect, its affiliates or its employees. Our full disclosures are available here. Definitions of common statistics used in our analysis are available here (towards the bottom).

This site provides NO information on our value ETFs or our momentum ETFs. Please refer to this site.

Disclosure: Interactive Brokers

Information posted on IBKR Campus that is provided by third-parties does NOT constitute a recommendation that you should contract for the services of that third party. Third-party participants who contribute to IBKR Campus are independent of Interactive Brokers and Interactive Brokers does not make any representations or warranties concerning the services offered, their past or future performance, or the accuracy of the information provided by the third party. Past performance is no guarantee of future results.

This material is from Alpha Architect and is being posted with its permission. The views expressed in this material are solely those of the author and/or Alpha Architect and Interactive Brokers is not endorsing or recommending any investment or trading discussed in the material. This material is not and should not be construed as an offer to buy or sell any security. It should not be construed as research or investment advice or a recommendation to buy, sell or hold any security or commodity. This material does not and is not intended to take into account the particular financial conditions, investment objectives or requirements of individual customers. Before acting on this material, you should consider whether it is suitable for your particular circumstances and, as necessary, seek professional advice.
