Exploiting Lists of Names for Named Entity Identification of Financial Institutions from Unstructured Documents

Zheng Xu; Douglas Burdick; Louiqa Raschid

arXiv:1602.04427 · cs.CL · June 8, 2016 · 5 cites

TL;DR

This paper presents a rule-based method that leverages lists of financial institution names to improve the extraction and identification of these entities from unstructured financial documents, outperforming general-purpose approaches.

Contribution

The paper introduces the first specialized rule-based approach that uses root and suffix dictionaries for financial institution (FI) name recognition and resolution, tailored to financial documents.
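The root/suffix split behind these dictionaries can be sketched as follows. This is a minimal Python sketch under assumptions that do not come from the paper: names are whitespace-tokenized, and a token is treated as a suffix token if it is the last token of at least `min_count` names in the list. `split_name`, `build_dictionaries`, and the sample names are hypothetical.

```python
from collections import Counter

def split_name(name, suffix_tokens):
    """Split a name into (root, suffix). The suffix is the longest run of
    trailing tokens found in suffix_tokens; at least one token stays in the root."""
    tokens = name.split()
    i = len(tokens)
    while i > 1 and tokens[i - 1].lower() in suffix_tokens:
        i -= 1
    return " ".join(tokens[:i]), " ".join(tokens[i:])

def build_dictionaries(names, min_count=2):
    """Hypothetical heuristic: a (lowercased) token is a suffix token if it is
    the last token of at least min_count names in the list."""
    last_tokens = Counter(n.split()[-1].lower() for n in names if n.split())
    suffix_tokens = {t for t, c in last_tokens.items() if c >= min_count}
    roots, suffixes = set(), set()
    for name in names:
        if not name.split():
            continue  # skip blank entries
        root, suffix = split_name(name, suffix_tokens)
        roots.add(root)
        if suffix:
            suffixes.add(suffix)
    return roots, suffixes, suffix_tokens

# Hypothetical sample list of FI names.
names = [
    "Wells Fargo Bank",
    "Citibank N.A.",
    "Bank of America N.A.",
    "Deutsche Bank",
    "JPMorgan Chase Bank N.A.",
]
roots, suffixes, suffix_tokens = build_dictionaries(names)
```

With this sample list, `bank` and `n.a.` become suffix tokens, so "JPMorgan Chase Bank N.A." splits into the root "JPMorgan Chase" and the suffix "Bank N.A.". The same rule also reduces "Deutsche Bank" to the root "Deutsche", which illustrates why such heuristics need tuning; the paper's own dictionary construction may differ.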

Findings

1. Specialized dictionaries improve the accuracy of FI name extraction.
2. A comparison shows the benefits of domain-specific rules over general-purpose methods.
3. Guidelines are given for tuning and customizing FI name extraction methods.

Abstract

There is a wealth of information about financial systems embedded in document collections. In this paper, we focus on a specialized text extraction task for this domain. The objective is to extract mentions of names of financial institutions, or FI names, from financial prospectus documents, and to identify the corresponding real-world entities, e.g., by matching against a corpus of such entities. The tasks are Named Entity Recognition (NER) and Entity Resolution (ER); both are well studied in the literature. Our contribution is to develop a rule-based approach that exploits lists of FI names for both tasks; our solution is labeled Dict-based NER and Rank-based ER. Since FI names are typically represented by a root and a suffix that modifies the root, we use these lists of FI names to create specialized root and suffix dictionaries. To evaluate the effectiveness of our…
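A rough sketch of how root and suffix dictionaries could drive both tasks follows. It is a hypothetical stand-in, not the authors' Dict-based NER or Rank-based ER: `find_mentions` does a greedy longest match of a dictionary root optionally followed by a dictionary suffix, and `rank_candidates` keeps corpus entities that share the mention's root and orders them by Jaccard overlap of suffix tokens. The dictionaries, entity IDs, and sentence are made-up sample data.

```python
def find_mentions(text, roots, suffixes):
    """Greedy, case-sensitive longest match: a mention is a dictionary root,
    optionally followed by a dictionary suffix. Returns (root, suffix) pairs."""
    # Strip simple punctuation but keep periods, which occur inside "N.A.".
    tokens = [t.strip(",;:()") for t in text.split()]
    max_root = max((len(r.split()) for r in roots), default=1)
    max_suffix = max((len(s.split()) for s in suffixes), default=1)
    mentions, i = [], 0
    while i < len(tokens):
        root, j = None, i
        for n in range(min(max_root, len(tokens) - i), 0, -1):
            candidate = " ".join(tokens[i:i + n])
            if candidate in roots:
                root, j = candidate, i + n
                break
        if root is None:
            i += 1
            continue
        # Attach the longest dictionary suffix that directly follows the root.
        suffix, end = "", j
        for m in range(min(max_suffix, len(tokens) - j), 0, -1):
            candidate = " ".join(tokens[j:j + m])
            if candidate in suffixes:
                suffix, end = candidate, j + m
                break
        mentions.append((root, suffix))
        i = end
    return mentions

def rank_candidates(mention, entities):
    """Keep entities whose root equals the mention's root (case-insensitive),
    ranked by Jaccard overlap of suffix tokens, ties broken by entity ID."""
    root, suffix = mention
    m_suf = set(suffix.lower().split())
    scored = []
    for ent_id, (e_root, e_suffix) in entities.items():
        if e_root.lower() != root.lower():
            continue
        e_suf = set(e_suffix.lower().split())
        union = m_suf | e_suf
        score = len(m_suf & e_suf) / len(union) if union else 1.0
        scored.append((score, ent_id))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [ent_id for _, ent_id in scored]

# Hypothetical dictionaries and entity corpus (ID -> (root, suffix)).
ROOTS = {"Wells Fargo", "Citibank", "Deutsche"}
SUFFIXES = {"Bank", "N.A.", "Bank N.A.", "Securities LLC"}
ENTITIES = {
    "E1": ("Wells Fargo", "Bank N.A."),
    "E2": ("Wells Fargo", "Securities LLC"),
    "E3": ("Citibank", "N.A."),
}
text = "The trustee is Wells Fargo Bank N.A., and Citibank N.A. acts as custodian."
for mention in find_mentions(text, ROOTS, SUFFIXES):
    print(mention, rank_candidates(mention, ENTITIES))
```

Here the sentence yields the mentions ("Wells Fargo", "Bank N.A.") and ("Citibank", "N.A."); the first ranks E1 above E2 because its suffix matches exactly. Filtering on the root and ranking on the suffix mirrors the root/suffix view of FI names described above, but the scoring function is an illustrative choice.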

Taxonomy

Topics: Data Quality and Management · Topic Modeling · Advanced Text Analysis Techniques