Finding Stakeholder-Material Information from 10-K Reports using Fine-Tuned BERT and LSTM Models
Victor Zitian Chen

TL;DR
This paper develops fine-tuned BERT and LSTM models to efficiently extract stakeholder-material information from lengthy 10-K reports, significantly outperforming keyword search baselines and enabling better stakeholder impact analysis.
Contribution
It introduces a novel application of fine-tuned BERT and LSTM models for extracting stakeholder-related information from 10-K reports, with extensive evaluation on business expert-labeled data.
Findings
The best model achieved an accuracy of 0.904 and an F1 score of 0.899 on test data.
Fine-tuned BERT outperformed LSTM and baseline models.
Models effectively identified stakeholder information across different groups.
Abstract
All public companies are required by federal securities law to disclose their business and financial activities in their annual 10-K reports. Each report typically spans hundreds of pages, making it difficult for human readers to identify and extract the material information efficiently. To solve the problem, I have fine-tuned BERT models and RNN models with LSTM layers to identify stakeholder-material information, defined as statements that carry information about a company's influence on its stakeholders, including customers, employees, investors, and the community and natural environment. The existing practice uses keyword search to identify such information, which is my baseline model. Using business expert-labeled training data of nearly 6,000 sentences from 62 10-K reports published in 2022, the best model has achieved an accuracy of 0.904 and an F1 score of 0.899 in test data,…
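As a rough illustration of the keyword-search baseline and the two reported metrics, the sketch below flags a sentence as stakeholder-material when it contains any term from a small keyword list, then computes accuracy and F1 against labels. The keyword list, the example sentences, their labels, and the binary framing are all assumptions made for illustration; the paper's actual keywords, data, and evaluation setup are not given in this summary.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical keywords per stakeholder group (not taken from the paper).
STAKEHOLDER_KEYWORDS: Dict[str, List[str]] = {
    "customers": ["customer", "client"],
    "employees": ["employee", "workforce"],
    "investors": ["investor", "shareholder"],
    "community_environment": ["community", "environment", "emission"],
}


def keyword_baseline(sentence: str) -> int:
    """Return 1 if the sentence contains any keyword as a substring, else 0."""
    text = sentence.lower()
    return int(
        any(kw in text for kws in STAKEHOLDER_KEYWORDS.values() for kw in kws)
    )


def accuracy_and_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Compute accuracy and binary F1 for the positive class (label 1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return accuracy, f1


# Made-up sentences with made-up labels (1 = stakeholder-material).
examples = [
    ("We reduced greenhouse gas emissions across our facilities.", 1),
    ("Our workforce grew by ten percent during the year.", 1),
    ("The company was incorporated in Delaware.", 0),
    ("We returned capital through dividends and buybacks.", 1),
    ("See the notes on customer lists acquired in business combinations.", 0),
]

y_true = [label for _, label in examples]
y_pred = [keyword_baseline(sentence) for sentence, _ in examples]
acc, f1 = accuracy_and_f1(y_true, y_pred)
print(f"accuracy={acc:.3f}, f1={f1:.3f}")
```

The fourth and fifth sentences show the two ways such a baseline can fail: a relevant statement that uses none of the keywords is missed, and an irrelevant one that happens to contain a keyword is flagged.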
Taxonomy
Topics: Impact of AI and Big Data on Business and Society · FinTech, Crowdfunding, Digital Finance · Computational and Text Analysis Methods
Methods: Multi-Head Attention · Attention Is All You Need · Adam · Softmax · Linear Layer · Residual Connection · Dense Connections · Dropout · Sigmoid Activation
