Machine learning models outperform classical statistical approaches in anticipating bank distress, according to new research by Joel Suss and Henry Treitel, which compares the effectiveness of traditionally used early warning systems with new techniques.
In terms of firm-level financial ratios, the researchers find that a firm’s sensitivity to market risk (ratio of trading book to total assets), capital buffer (difference between regulatory capital and requirements) and net interest margin are the most important factors. Bank supervisors can use these findings to anticipate firm weaknesses and take appropriate mitigating action ahead of time.
The researchers run a ‘horse race’ between candidate early warning systems for UK bank distress, comparing several machine learning techniques (random forest, boosted decision trees, K-nearest neighbours and support vector machines) with the classical statistical approaches traditionally used as early warning systems (logistic regression and random effects logistic regression) for predicting distress one year ahead.
In essence, each of these machine learning and classical techniques, in its own particular way, learns the relationship between the input and output data it is given: in this case, bank financial ratios, balance sheet growth rates and macroeconomic data as inputs, and subjective supervisory assessments of firm risk as the output.
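The set-up can be sketched in a few lines of Python. This is a minimal illustration on synthetic data, not the authors' dataset or code: two of the competing models (logistic regression and a random forest) learn the same input-output mapping and are compared out of sample. The feature construction and model settings here are illustrative assumptions.

```python
# A minimal sketch of the 'horse race': fit two of the competing models on the
# same (synthetic) data and compare out-of-sample performance by AUC.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for firm ratios, growth rates and macro variables;
# distress is rare, so the classes are imbalanced (~10% positives).
X, y = make_classification(n_samples=2000, n_features=10, n_informative=5,
                           weights=[0.9, 0.1], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=0)

models = {
    "logit": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=300, random_state=0),
}
aucs = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    aucs[name] = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
    print(f"{name}: AUC = {aucs[name]:.3f}")
```

On real supervisory data the comparison would of course use the authors' actual features, careful out-of-time validation and the full set of models; this sketch only shows the shared learn-then-evaluate structure.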
The aim of the research is to determine which of these techniques is best suited as an early warning system for bank distress. The authors have two broad evaluation criteria: performance and transparency. Regulators’ jobs are made easier not only by useful and accurate predictions, but also by an understanding of what is driving those predictions.
For example, if it is known that a fall in a firm’s total capital ratio combined with relatively weak net interest margin is responsible for a spike in a firm’s predicted probability of distress, supervisors have a better sense of what mitigating action may be required.
The study finds that machine learning techniques are generally better than the traditional models at predicting bank distress. The overall winner is the random forest algorithm, which outperforms other approaches for a host of performance metrics.
In particular, the study evaluates the performance of each technique based on two different ways it could make mistakes: false negatives (missing actual cases of distress) and false positives (wrongly predicting distress).
From a supervisor’s perspective, false negatives are far more problematic – an early warning system that fails to sound the alarm when it should, particularly for large, systemically important institutions, can have deleterious consequences. Scrutinising a flagged bank that goes on to perform better than predicted, though costly in terms of resources, poses a less serious problem. The random forest’s advantage over the other approaches grows as more weight is placed on avoiding false negatives.
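The asymmetry between the two error types can be made concrete by attaching different costs to them when choosing an alarm threshold. The sketch below is a hedged illustration: the 10:1 cost ratio and the toy labels and scores are made up for the example, not taken from the paper.

```python
# Choosing an alarm threshold when false negatives (missed distress) are
# costlier than false positives (needless scrutiny). Costs are illustrative.
import numpy as np

def expected_cost(y_true, p_distress, threshold, cost_fn=10.0, cost_fp=1.0):
    """Total cost of flagging banks whose predicted probability of
    distress is at or above `threshold`."""
    flagged = p_distress >= threshold
    false_negatives = np.sum((y_true == 1) & ~flagged)  # missed distress
    false_positives = np.sum((y_true == 0) & flagged)   # needless scrutiny
    return cost_fn * false_negatives + cost_fp * false_positives

# Toy true outcomes and model scores for eight banks.
y_true = np.array([1, 0, 0, 1, 0, 0, 0, 1])
p_hat = np.array([0.9, 0.2, 0.4, 0.6, 0.1, 0.3, 0.7, 0.5])

# Pick the threshold with the lowest expected cost: with false negatives
# weighted 10x, the chosen threshold is low enough to catch every distressed
# bank even at the price of an extra false positive.
grid = np.linspace(0.05, 0.95, 19)
best = min(grid, key=lambda t: expected_cost(y_true, p_hat, t))
print(f"best threshold: {best:.2f}")
```

Raising `cost_fn` pushes the chosen threshold lower, which is the mechanism behind the finding that the random forest pulls further ahead as more importance is placed on false negatives.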
Machine learning techniques tend to be opaque relative to traditional models. This study therefore applies a state-of-the-art interpretability technique known as Shapley values to provide a sense of what is driving the random forest’s predictions.
The Shapley analysis shows that, among firm-level financial ratios, a firm’s sensitivity to market risk (the ratio of trading book to total assets), its capital buffer (the difference between regulatory capital and requirements) and its net interest margin are the most important drivers of predicted distress.
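To give a feel for how Shapley values apportion a prediction across features, here is a self-contained brute-force implementation that averages each feature's marginal contribution over all orderings. Real applications (including tree models like the random forest) use fast approximations such as TreeSHAP; and the linear "distress score", its weights and the baseline values below are invented purely for illustration.

```python
# Exact Shapley values for one instance, by averaging each feature's marginal
# contribution over every ordering. Features outside the coalition are held
# at their baseline values. Brute force: fine for 3 features, not for many.
from itertools import permutations
import numpy as np

def shapley_values(predict, x, baseline):
    n = len(x)
    phi = np.zeros(n)
    orderings = list(permutations(range(n)))
    for order in orderings:
        z = baseline.copy()
        prev = predict(z)
        for i in order:
            z[i] = x[i]            # add feature i to the coalition
            curr = predict(z)
            phi[i] += curr - prev  # marginal contribution of i
            prev = curr
    return phi / len(orderings)

# Illustrative linear 'distress score' over three features echoing the paper's
# key ratios: trading-book share, capital buffer, net interest margin.
# The weights are made up for this example.
w = np.array([2.0, -3.0, -1.5])
predict = lambda z: float(w @ z)

x = np.array([0.4, 0.1, 0.02])         # the bank being explained
baseline = np.array([0.2, 0.3, 0.03])  # e.g. hypothetical sector averages
phi = shapley_values(predict, x, baseline)
print(phi)
```

Two properties make this useful to supervisors: the attributions sum exactly to the gap between this bank's score and the baseline score, and for a linear model each `phi[i]` reduces to `w[i] * (x[i] - baseline[i])`, so the sanity check is easy.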
Overall, this study makes important contributions, not least of which is practical: bank supervisors can use the findings to anticipate firm weaknesses and take appropriate mitigating action ahead of time.