Exploiting Lists of Names for Named Entity Identification of Financial Institutions from Unstructured Documents
Zheng Xu, Douglas Burdick, Louiqa Raschid

TL;DR
This paper presents a rule-based method that leverages lists of financial institution names to improve the extraction and identification of these entities from unstructured financial documents, outperforming general approaches.
Contribution
The paper introduces the first specialized rule-based approach using root and suffix dictionaries for FI name recognition and resolution, tailored to financial documents.
Findings
Specialized dictionaries improve FI name extraction accuracy.
A comparison against general-purpose methods shows the benefit of domain-specific rules.
The paper offers guidelines for tuning and customizing FI name extraction methods.
Abstract
There is a wealth of information about financial systems that is embedded in document collections. In this paper, we focus on a specialized text extraction task for this domain. The objective is to extract mentions of names of financial institutions, or FI names, from financial prospectus documents, and to identify the corresponding real world entities, e.g., by matching against a corpus of such entities. The tasks are Named Entity Recognition (NER) and Entity Resolution (ER); both are well studied in the literature. Our contribution is to develop a rule-based approach that will exploit lists of FI names for both tasks; our solution is labeled Dict-based NER and Rank-based ER. Since the FI names are typically represented by a root, and a suffix that modifies the root, we use these lists of FI names to create specialized root and suffix dictionaries. To evaluate the effectiveness of our…
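The root-and-suffix idea described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the suffix vocabulary `KNOWN_SUFFIXES`, the function names, the example names, and the greedy longest-match strategy are all assumptions made for illustration. The sketch splits each listed FI name into a root and a suffix, collects both into dictionaries, and then scans text for a dictionary root that is optionally followed by a dictionary suffix.

```python
import re

# Hypothetical suffix vocabulary (an assumption for this sketch); the paper
# derives its suffix dictionary from lists of FI names.
KNOWN_SUFFIXES = {"bank", "corp", "corporation", "inc", "llc", "na",
                  "trust", "company", "co"}


def tokenize(text):
    """Lowercase, drop periods (so 'N.A.' becomes 'na'), split into tokens."""
    return re.findall(r"[a-z0-9]+", text.lower().replace(".", ""))


def split_root_suffix(name):
    """Peel known suffix tokens off the end; keep at least one root token."""
    tokens = tokenize(name)
    cut = len(tokens)
    while cut > 1 and tokens[cut - 1] in KNOWN_SUFFIXES:
        cut -= 1
    return tuple(tokens[:cut]), tuple(tokens[cut:])


def build_dictionaries(fi_names):
    """Build a root dictionary and a suffix dictionary from a list of names."""
    roots, suffixes = set(), set()
    for name in fi_names:
        root, suffix = split_root_suffix(name)
        roots.add(root)
        if suffix:
            suffixes.add(suffix)
    return roots, suffixes


def find_mentions(text, roots, suffixes):
    """Greedy longest-match scan: a dictionary root, then an optional suffix."""
    tokens = tokenize(text)
    max_root = max((len(r) for r in roots), default=0)
    max_suffix = max((len(s) for s in suffixes), default=0)
    mentions, i = [], 0
    while i < len(tokens):
        matched = False
        for n in range(min(max_root, len(tokens) - i), 0, -1):
            root = tuple(tokens[i:i + n])
            if root not in roots:
                continue
            j = i + n
            suffix = ()
            for m in range(min(max_suffix, len(tokens) - j), 0, -1):
                candidate = tuple(tokens[j:j + m])
                if candidate in suffixes:
                    suffix = candidate
                    break
            mentions.append((" ".join(root), " ".join(suffix)))
            i = j + len(suffix)
            matched = True
            break
        if not matched:
            i += 1
    return mentions


# Illustrative input only; these names are not taken from the paper's data.
fi_names = ["Wells Fargo Bank, N.A.", "Countrywide Home Loans, Inc."]
roots, suffixes = build_dictionaries(fi_names)
sample = ("The servicer is Wells Fargo Bank, N.A. and the originator is "
          "Countrywide Home Loans, Inc.")
mentions = find_mentions(sample, roots, suffixes)
```

Keeping roots and suffixes in separate dictionaries means a root seen with one suffix in the list can still be recognized when it appears in a document with a different listed suffix, or with none at all. A ranking step for entity resolution would sit on top of these matches and is not sketched here.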
Taxonomy
Topics: Data Quality and Management · Topic Modeling · Advanced Text Analysis Techniques
