LLM-based Triplet Extraction from Financial Reports
Dante Wesslund, Ville Stenström, Pontus Linde, Alexander Holmberg

TL;DR
This paper introduces a semi-automated, ontology-driven method for extracting triplets from financial reports using LLMs, together with new proxy metrics and a verification strategy that improve accuracy and mitigate hallucinations.
Contribution
It presents a novel triplet-extraction pipeline for financial texts that combines Ontology Conformance and Faithfulness proxy metrics, automated document-specific ontology induction, and a hybrid verification approach.
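This page does not define the Ontology Conformance metric. A minimal sketch follows. It assumes conformance is the share of extracted triplets whose predicate is declared in the ontology and whose subject and object types match that predicate's declared domain and range. The `Triplet` fields, the dict-based schema, and the example entities ("Acme AB" and so on) are hypothetical illustrations, not the authors' implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Triplet:
    subject: str
    predicate: str
    obj: str
    subject_type: str
    object_type: str


# Hypothetical schema: predicate -> (allowed subject type, allowed object type).
Ontology = dict[str, tuple[str, str]]


def ontology_conformance(triplets: list[Triplet], ontology: Ontology) -> float:
    """Assumed definition: fraction of triplets whose predicate exists in the
    ontology and whose (subject_type, object_type) matches its signature."""
    if not triplets:
        return 0.0  # design choice: an empty extraction scores zero
    conforming = 0
    for t in triplets:
        signature = ontology.get(t.predicate)
        if signature is not None and signature == (t.subject_type, t.object_type):
            conforming += 1
    return conforming / len(triplets)


# Toy example with invented entities.
ontology = {
    "reportsRevenue": ("Company", "MonetaryAmount"),
    "hasSubsidiary": ("Company", "Company"),
}
triplets = [
    Triplet("Acme AB", "reportsRevenue", "SEK 12.4bn", "Company", "MonetaryAmount"),
    Triplet("Acme AB", "hasCEO", "Jane Doe", "Company", "Person"),  # predicate not in schema
]
print(ontology_conformance(triplets, ontology))  # → 0.5
```

Under this reading, a static ontology that the extractor drifts away from (for example, by inventing `hasCEO`) lowers the score. A document-specific induced ontology would instead be built to cover the predicates actually used.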
Findings
The automatically induced ontology achieves 100% schema conformance in all configurations.
Hybrid verification reduces false positives from 65.2% to 1.6%.
Identifies an asymmetry between subject and object hallucinations.
Abstract
Corporate financial reports are a valuable source of structured knowledge for Knowledge Graph construction, but the lack of annotated ground truth in this domain makes evaluation difficult. We present a semi-automated pipeline for Subject-Predicate-Object triplet extraction that uses ontology-driven proxy metrics, specifically Ontology Conformance and Faithfulness, instead of ground-truth-based evaluation. We compare a static, manually engineered ontology against a fully automated, document-specific ontology induction approach across different LLMs and two corporate annual reports. The automatically induced ontology achieves 100% schema conformance in all configurations, eliminating the ontology drift observed with the manual approach. We also propose a hybrid verification strategy that combines regex matching with an LLM-as-a-judge check, reducing apparent subject hallucination rates…
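The abstract does not specify how the regex check and the LLM judge are combined. A minimal sketch follows. It assumes a cheap regex match is tried first and the judge is consulted only when the regex misses, so that paraphrased but supported entities are not counted as hallucinations. `regex_match`, `is_grounded`, `hallucination_rate`, and `toy_judge` are hypothetical names. `toy_judge` is a hand-written alias table standing in for a real LLM-as-a-judge call.

```python
import re
from typing import Callable

# Hypothetical judge signature: (entity, source_text) -> True if the entity is
# judged to be supported by the text. In practice this would wrap an LLM call.
Judge = Callable[[str, str], bool]


def regex_match(entity: str, source: str) -> bool:
    """Case-insensitive literal match that tolerates varying whitespace."""
    tokens = entity.split()
    if not tokens:
        return False
    pattern = r"\s+".join(re.escape(tok) for tok in tokens)
    return re.search(pattern, source, flags=re.IGNORECASE) is not None


def is_grounded(entity: str, source: str, judge: Judge) -> bool:
    """Regex first; fall back to the (more expensive) judge only on a miss."""
    return regex_match(entity, source) or judge(entity, source)


def hallucination_rate(entities: list[str], source: str, judge: Judge) -> float:
    """Share of extracted entities that neither check can ground in the source."""
    if not entities:
        return 0.0
    flagged = sum(not is_grounded(e, source, judge) for e in entities)
    return flagged / len(entities)


def toy_judge(entity: str, source: str) -> bool:
    # Stand-in for an LLM judge: accepts aliases from a hand-written table.
    aliases = {"the company": "Acme AB"}
    target = aliases.get(entity.lower())
    return target is not None and target in source


source = "Acme AB reported net sales of SEK 12.4 billion in 2023."
entities = ["Acme AB", "the company", "Acme Holdings Inc."]

regex_only = hallucination_rate(entities, source, lambda e, s: False)
hybrid = hallucination_rate(entities, source, toy_judge)
print(round(regex_only, 3), round(hybrid, 3))  # → 0.667 0.333
```

In this toy case, regex alone flags the paraphrase "the company" as a hallucination. The judge fallback removes that false positive while still flagging the genuinely unsupported "Acme Holdings Inc.".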
Taxonomy
Topics: Stock Market Forecasting Methods · Financial Reporting and XBRL · Advanced Text Analysis Techniques
