Researchers have found that AI models can outperform humans in forecasting future company earnings. ChatGPT 4.0 produced more accurate directional earnings forecasts than human analysts after examining anonymized historic balance sheet and income statement data. The researchers also found that investing on the basis of that AI-based analysis would historically have led to stock market outperformance.
The Research
The paper ‘Financial Statement Analysis with Large Language Models’ was published by researchers at the University of Chicago in May 2024. They provided the AI tool with standardized balance sheet and income statement data and used a detailed “chain-of-thought” prompt outlining a range of analytical techniques and metrics common in earnings forecast analysis.
Importantly, the large language model (LLM) did not have any of the industry context that human analysts can benefit from. Nor did the model know specific company detail beyond numerical financial statement data. Yet the results were relatively impressive. The goal was to determine whether earnings would grow or decline in a subsequent period and to indicate the broad magnitude of the change and the model’s confidence in its prediction.
Measured Performance
The model outperformed consensus forecasts. It showed comparable, if not superior, performance to better public industry models. Overall accuracy in predicting whether earnings would increase or decline was approximately 60%.
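To make the accuracy figure concrete, here is a minimal sketch of how directional accuracy can be computed: the share of forecasts whose predicted direction matches what earnings actually did. The function name and the toy data are illustrative assumptions, not the researchers' code or figures.

```python
from typing import Sequence


def directional_accuracy(predicted: Sequence[int], actual: Sequence[int]) -> float:
    """Return the share of forecasts whose direction matches the outcome.

    Directions are encoded as +1 (earnings increase) and -1 (earnings decrease).
    This encoding is an assumption made for the example.
    """
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not predicted:
        raise ValueError("at least one forecast is required")
    hits = sum(1 for p, a in zip(predicted, actual) if p == a)
    return hits / len(predicted)


# Hypothetical toy data, not taken from the paper.
predicted = [1, 1, -1, 1, -1]
actual = [1, -1, -1, 1, 1]
accuracy = directional_accuracy(predicted, actual)
print(f"Directional accuracy: {accuracy:.0%}")
```

A coin flip would score around 50% on a balanced sample, which is the natural yardstick for a figure of roughly 60%.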
The model was also able to produce similar results for 2023 using 2022 data, which suggested that it was not simply recalling actual historic performance from its training data. The researchers could establish this because 2023 results, disclosed by companies in 2024, fell outside ChatGPT 4.0’s training window.
Interestingly, ChatGPT 4.0 produced superior forecasting accuracy to ChatGPT 3.5. Results for Google’s Gemini Pro 1.5, though tested over a more limited sample, were broadly similar to ChatGPT 4.0.
However, as is often the case with LLMs, the researchers are unable to pinpoint exactly what the model is doing that results in forecast accuracy. That said, the researchers did assess the most common descriptors used in the model’s output, finding that terms such as “operating margin” and “current ratio” appeared more often than others across a broad set of terminology.
The researchers also suspect that the combination of human and AI models is likely to result in superior forecasts because humans can bring additional insight that LLMs may not currently have access to, whereas LLMs can avoid common human biases and perform robust and comprehensive analysis.
Alpha Generation
The researchers found that the model could outperform the broader stock market if annual portfolios were formed based on its predictions with performance measured monthly. The Sharpe ratio from this strategy was superior to that of an artificial neural net trained for earnings prediction on an equal-weighted basis, though the artificial neural net outperformed the ChatGPT model in terms of Sharpe ratio on a value-weighted basis.
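For readers less familiar with the terms, the sketch below shows one common way to compute an equal-weighted or value-weighted portfolio return and an annualized Sharpe ratio from monthly returns. The function names, the zero default risk-free rate, the square-root-of-12 annualization and the toy numbers are all assumptions for illustration. The paper's exact methodology may differ.

```python
import math
import statistics
from typing import Optional, Sequence


def portfolio_return(
    stock_returns: Sequence[float],
    market_caps: Optional[Sequence[float]] = None,
) -> float:
    """One-period portfolio return.

    With no market caps, every stock gets the same weight (equal-weighted).
    With market caps, each stock is weighted by its share of total market
    value (value-weighted).
    """
    if not stock_returns:
        raise ValueError("at least one stock return is required")
    if market_caps is None:
        return sum(stock_returns) / len(stock_returns)
    if len(market_caps) != len(stock_returns):
        raise ValueError("market_caps and stock_returns must match in length")
    total_cap = sum(market_caps)
    return sum(r * cap / total_cap for r, cap in zip(stock_returns, market_caps))


def annualized_sharpe(
    monthly_returns: Sequence[float],
    monthly_risk_free: float = 0.0,
) -> float:
    """Mean excess monthly return over its sample standard deviation, times sqrt(12)."""
    excess = [r - monthly_risk_free for r in monthly_returns]
    volatility = statistics.stdev(excess)  # needs at least two observations
    if volatility == 0:
        raise ValueError("returns have zero volatility; Sharpe ratio is undefined")
    return statistics.mean(excess) / volatility * math.sqrt(12)


# Hypothetical toy numbers, not results from the paper.
equal_weighted = portfolio_return([0.10, -0.02])
value_weighted = portfolio_return([0.10, -0.02], market_caps=[3.0, 1.0])
sharpe = annualized_sharpe([0.02, -0.01, 0.03, 0.00])
print(f"Equal-weighted: {equal_weighted:.2%}, value-weighted: {value_weighted:.2%}")
print(f"Annualized Sharpe ratio: {sharpe:.2f}")
```

The same set of stock picks can look better under one weighting than the other, because value weighting lets the largest companies dominate the result while equal weighting gives small companies the same influence.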
The bulk of the model’s returns, especially in recent history, appear to have come from its long positions rather than its short exposure. It also appears that forecasting accuracy has declined somewhat in recent decades, though that is true for other models too, and the results remain generally above those of human consensus forecasts.
What’s Next?
Of course, many useful stock market models are not public because investors who profit from them have little incentive to share them publicly. Therefore, there may be superior models out there that ChatGPT 4.0 is unable to outperform.
Still, the success of ChatGPT 4.0 in predicting earnings directions with relatively limited financial data and, importantly, the considerable improvement in performance relative to ChatGPT 3.5 are impressive. As in many fields, it is likely that LLMs will be increasingly disruptive and effective in areas of financial analysis and prediction.