Beyond Traditional Algorithms: Leveraging LLMs for Accurate Cross-Border Entity Identification
Andres Azqueta-Gavaldón, Joaquin Ramos Cosgrove

TL;DR
This paper investigates the use of Large Language Models to improve the accuracy of cross-border entity identification, addressing limitations of traditional string-matching algorithms in complex, multilingual financial contexts.
Contribution
It introduces LLM-based methods as an alternative for entity matching and reports higher accuracy and fewer false positives than traditional string-matching techniques.
Findings
Traditional methods achieve over 92% accuracy but have high false positive rates.
LLM-based approaches exceed 93% accuracy and a 96% F1 score.
Interface-based LLMs significantly reduce false positives compared to traditional algorithms.
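The findings contrast accuracy, F1, and false positives, which can diverge. Assuming each candidate pair is labeled match or non-match and the metrics follow standard binary-classification definitions (this summary does not state the paper's exact evaluation protocol), they could be computed as in the Python sketch below; `evaluate` and `MatchMetrics` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class MatchMetrics:
    accuracy: float
    precision: float
    recall: float
    f1: float
    false_positives: int


def evaluate(gold: list[bool], predicted: list[bool]) -> MatchMetrics:
    """Standard binary metrics for entity-matching decisions (True = "same entity")."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    pairs = list(zip(gold, predicted))
    tp = sum(g and p for g, p in pairs)            # correctly accepted matches
    fp = sum((not g) and p for g, p in pairs)      # accepted pairs that are different entities
    fn = sum(g and (not p) for g, p in pairs)      # true matches that were rejected
    tn = len(pairs) - tp - fp - fn                 # correctly rejected non-matches
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return MatchMetrics(accuracy, precision, recall, f1, fp)


# Hypothetical example: a matcher that accepts every candidate pair.
metrics = evaluate([True] * 8 + [False] * 2, [True] * 10)
print(metrics)
```

In this toy case the matcher reaches 0.8 accuracy and about 0.89 F1 while still accepting both non-matching pairs, which illustrates how headline accuracy or F1 can coexist with a false-positive problem.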
Abstract
The growing prevalence of cross-border financial activities in global markets has underscored the necessity of accurately identifying and classifying foreign entities. This practice is essential within the Spanish financial system for ensuring robust risk management, regulatory adherence, and the prevention of financial misconduct. This process involves a labor-intensive entity-matching task, where entities need to be validated against available reference sources. Challenges arise from linguistic variations, special characters, outdated names, and changes in legal forms, complicating traditional matching algorithms like Jaccard, cosine, and Levenshtein distances. These methods struggle with contextual nuances and semantic relationships, leading to mismatches. To address these limitations, we explore Large Language Models (LLMs) as a flexible alternative. LLMs leverage extensive training…
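The abstract names Jaccard, cosine, and Levenshtein distances as the traditional baselines. The Python sketch below implements textbook versions of each for entity names. The normalization step, the use of word tokens for Jaccard and character trigrams for cosine, and the example company names are assumptions for illustration, not the authors' configuration.

```python
import math
import re
from collections import Counter


def normalize(name: str) -> str:
    # Assumed preprocessing: lowercase, turn punctuation into spaces, collapse whitespace.
    return " ".join(re.sub(r"[^\w\s]", " ", name.lower()).split())


def jaccard(a: str, b: str) -> float:
    # |A ∩ B| / |A ∪ B| over sets of word tokens.
    ta, tb = set(normalize(a).split()), set(normalize(b).split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def char_ngrams(name: str, n: int = 3) -> Counter:
    # Character n-gram counts of the normalized name.
    s = normalize(name)
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))


def cosine(a: str, b: str, n: int = 3) -> float:
    # Cosine similarity between character n-gram count vectors.
    va, vb = char_ngrams(a, n), char_ngrams(b, n)
    dot = sum(va[g] * vb[g] for g in va.keys() & vb.keys())
    norm = math.sqrt(sum(v * v for v in va.values())) * math.sqrt(sum(v * v for v in vb.values()))
    return dot / norm if norm else 0.0


def levenshtein(a: str, b: str) -> int:
    # Dynamic-programming edit distance (insertions, deletions, substitutions).
    a, b = normalize(a), normalize(b)
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                 # delete ca
                            curr[j - 1] + 1,             # insert cb
                            prev[j - 1] + (ca != cb)))   # substitute or match
        prev = curr
    return prev[-1]


# A legal-form change (same entity) versus two different banks sharing words.
print(jaccard("Deutsche Bank AG", "Deutsche Bank Aktiengesellschaft"))
print(jaccard("Banco Santander SA", "Banco Sabadell SA"))
print(levenshtein("Deutsche Bank AG", "Deutsche Bank Aktiengesellschaft"))
```

Both pairs receive the same Jaccard score of 0.5, and the true match sits 16 edits apart, so a fixed threshold either rejects the legal-form change or accepts the unrelated bank. This is the kind of contextual blindness the abstract attributes to purely string-based metrics.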
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Data Quality and Management
