Interpretable Company Similarity with Sparse Autoencoders
Marco Molinari, Victor Shao, Luca Imeneo, Mateusz Mikolajczak, Vladimir Tregubiak, Abhimanyu Pandey, Sebastian Kuznetsov Ryder Torres Pereira

TL;DR
This paper uses Sparse Autoencoders to build interpretable clusters of companies from their descriptions. The clusters capture fundamental similarities better than traditional sector codes and embeddings, and they improve trading strategies.
Contribution
The paper demonstrates that Sparse Autoencoders can produce interpretable company clusters that better reflect fundamental characteristics than existing classification methods.
Findings
SAE features outperform SIC and GICS codes in correlation with returns
SAE-based clusters yield higher Sharpe ratios in trading strategies
Clusters are simple and interpretable, aiding high-stakes decision-making
Abstract
Determining company similarity is a vital task in finance, underpinning risk management, hedging, and portfolio diversification. Practitioners often rely on sector and industry classifications such as SIC and GICS codes to gauge similarity, the former being used by the U.S. Securities and Exchange Commission (SEC), and the latter widely used by the investment community. Since these classifications lack granularity and need regular updating, using clusters of embeddings of company descriptions has been proposed as a potential alternative, but the lack of interpretability in token embeddings poses a significant barrier to adoption in high-stakes contexts. Sparse Autoencoders (SAEs) have shown promise in enhancing the interpretability of Large Language Models (LLMs) by decomposing LLM activations into interpretable features. Moreover, SAEs capture an LLM's internal…
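As a rough illustration of the idea in the abstract, the sketch below encodes per-company LLM activations into sparse features and compares companies by cosine similarity over those features. It is a minimal sketch under stated assumptions, not the authors' pipeline: the activations and encoder weights are random stand-ins, the dimensions are arbitrary, the top-k sparsification is one common SAE variant chosen here for illustration, and the names sae_encode and cosine_similarity_matrix are hypothetical.

```python
import numpy as np

def sae_encode(activations, W_enc, b_enc, k):
    """ReLU encoder followed by top-k sparsification.

    Top-k is an assumed sparsity scheme; the source does not say which
    SAE variant is used.
    """
    feats = np.maximum(activations @ W_enc + b_enc, 0.0)
    if k < feats.shape[1]:
        # Indices of all but the k largest features in each row.
        drop = np.argpartition(feats, -k, axis=1)[:, :-k]
        np.put_along_axis(feats, drop, 0.0, axis=1)
    return feats

def cosine_similarity_matrix(X, eps=1e-12):
    """Pairwise cosine similarity between the rows of X."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    Xn = X / np.maximum(norms, eps)
    return Xn @ Xn.T

# Hypothetical sizes and random stand-ins for real LLM activations
# and trained SAE encoder weights.
rng = np.random.default_rng(0)
n_companies, d_model, n_features, k = 5, 16, 64, 8
activations = rng.normal(size=(n_companies, d_model))
W_enc = rng.normal(size=(d_model, n_features)) / np.sqrt(d_model)
b_enc = np.zeros(n_features)

features = sae_encode(activations, W_enc, b_enc, k)   # sparse, non-negative
similarity = cosine_similarity_matrix(features)       # companies x companies
```

Because each company is described by at most k active features, a high similarity score can be traced back to the specific features two companies share, which is the interpretability argument the abstract makes against dense embeddings.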
Taxonomy
Topics: Topic Modeling · Computational and Text Analysis Methods · Stock Market Forecasting Methods
