Taxonomy-Aligned Risk Extraction from 10-K Filings with Autonomous Improvement Using LLMs
Rian Dolphin, Joe Dursun, Jarrett Blankenship, Katie Adams, Quinton Pike

TL;DR
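This paper introduces an LLM-based method for extracting structured risk factors from 10-K filings that keeps the extracted risks aligned with a predefined hierarchical taxonomy and lets the taxonomy be refined autonomously. The method is evaluated on 10,688 risk factors extracted from S&P 500 companies, where companies in the same industry show more similar risk profiles.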
Contribution
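The paper presents a three-stage extraction pipeline (LLM extraction with supporting quotes, embedding-based mapping to taxonomy categories, and LLM-as-a-judge validation) combined with autonomous taxonomy maintenance, in which an AI agent analyzes evaluation feedback and proposes taxonomy refinements.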
Findings
Extracted 10,688 risk factors from S&P 500 companies.
Achieved a 104.7% improvement in embedding separation through autonomous taxonomy refinement in a case study.
Confirmed that the taxonomy captures economically meaningful structure: companies in the same industry exhibit higher risk similarity.
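The within-industry comparison in the last finding can be illustrated with a minimal sketch. The company names, industries, and risk categories below are made up, and Jaccard similarity over category sets is an assumption chosen for simplicity; the paper's actual similarity measure is not stated in this summary.

```python
from itertools import combinations

# Hypothetical risk profiles: company -> (industry, set of taxonomy categories).
# None of these values come from the paper.
profiles = {
    "BankA": ("Financials", {"Credit", "Interest Rate", "Regulatory"}),
    "BankB": ("Financials", {"Credit", "Interest Rate", "Cybersecurity"}),
    "ChipA": ("Technology", {"Supply Chain", "Cybersecurity", "Competition"}),
    "ChipB": ("Technology", {"Supply Chain", "Competition", "Regulatory"}),
}

def jaccard(a, b):
    """Share of risk categories two companies have in common."""
    return len(a & b) / len(a | b)

# Split all company pairs into same-industry and cross-industry groups.
same, cross = [], []
for (_, (ind1, risks1)), (_, (ind2, risks2)) in combinations(profiles.items(), 2):
    (same if ind1 == ind2 else cross).append(jaccard(risks1, risks2))

mean_same = sum(same) / len(same)
mean_cross = sum(cross) / len(cross)

# Relative uplift of same-industry similarity over cross-industry similarity.
uplift = (mean_same - mean_cross) / mean_cross
print(f"same={mean_same:.2f} cross={mean_cross:.2f} uplift={uplift:.0%}")
```

If the taxonomy reflects real economic structure, `mean_same` should exceed `mean_cross`, as it does on this toy data.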
Abstract
We present a methodology for extracting structured risk factors from corporate 10-K filings while maintaining adherence to a predefined hierarchical taxonomy. Our three-stage pipeline combines LLM extraction with supporting quotes, embedding-based semantic mapping to taxonomy categories, and LLM-as-a-judge validation that filters spurious assignments. To evaluate our approach, we extract 10,688 risk factors from S&P 500 companies and examine risk profile similarity across industry clusters. Beyond extraction, we introduce autonomous taxonomy maintenance where an AI agent analyzes evaluation feedback to identify problematic categories, diagnose failure patterns, and propose refinements, achieving 104.7% improvement in embedding separation in a case study. External validation confirms the taxonomy captures economically meaningful structure: same-industry companies exhibit 63% higher risk…
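The three stages described in the abstract can be sketched roughly as follows. This is an illustrative toy under stated assumptions, not the authors' implementation: the two-category taxonomy, the bag-of-words `embed` function, and the threshold-based `judge` are all hypothetical stand-ins for a real taxonomy, a sentence-embedding model, and an LLM judge.

```python
import math
from collections import Counter

# Hypothetical two-category taxonomy; the descriptions are illustrative only.
TAXONOMY = {
    "Cybersecurity": "data breach cyber attack security systems unauthorized access",
    "Regulatory": "regulation compliance laws government legal requirements",
}

def embed(text):
    """Toy bag-of-words vector; a real system would use an embedding model."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def map_to_taxonomy(risk_text):
    """Stage 2: pick the category whose description is most similar to the risk."""
    vec = embed(risk_text)
    scores = {name: cosine(vec, embed(desc)) for name, desc in TAXONOMY.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]

def judge(risk, category, score, threshold=0.2):
    """Stage 3 stand-in: an LLM judge would go here; this uses a similarity cutoff."""
    return score >= threshold

def run_pipeline(extracted):
    """`extracted` stands in for Stage 1 output: (risk summary, supporting quote) pairs."""
    results = []
    for risk, quote in extracted:
        category, score = map_to_taxonomy(risk)
        if judge(risk, category, score):  # drop spurious assignments
            results.append({"risk": risk, "quote": quote, "category": category})
    return results

# Made-up Stage 1 output for demonstration.
extracted = [
    ("cyber attack could cause data breach", "We may be subject to cyber attacks."),
    ("weather may disrupt harvest", "Adverse weather could reduce yields."),
]
results = run_pipeline(extracted)
print(results)
```

In this toy run the first risk maps to the hypothetical `Cybersecurity` category, while the second matches no category description and is filtered out by the validation step, which is the role the abstract assigns to the LLM-as-a-judge stage.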
Taxonomy
Topics: Topic Modeling · Software Engineering Research · Advanced Text Analysis Techniques
