Large scale deduplication based on fingerprints
Jean Aymar Biyiha Nlend, Ibrahim Moukouop Nguena, Thomas Bouetou Bouetou

TL;DR
This paper introduces a novel fingerprint deduplication algorithm that significantly reduces computational complexity from quadratic to linear time, enabling large-scale databases to be processed efficiently on a single computer.
Contribution
The paper presents a new fingerprint indexing method using a 5x5 matrix that allows for fast clustering and deduplication, improving scalability and speed over existing algorithms.
Findings
Achieves less than 1% penetration rate in deduplication.
Performs deduplication of 10 million fingerprints in under two hours.
Reduces computational complexity from O(n^2) to O(n) for large datasets.
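To put these figures in perspective, the sketch below counts the comparisons an exhaustive all-against-all deduplication would need and computes a penetration rate for a bucketed index. It assumes the usual biometric-indexing definition of penetration rate (the average fraction of the database that has to be searched per probe); this summary does not give the paper's exact definition. The function names and the example bucket sizes are hypothetical.

```python
def pairwise_comparisons(n: int) -> int:
    """Number of distinct pairs in an exhaustive all-vs-all comparison."""
    return n * (n - 1) // 2


def penetration_rate(bucket_sizes: list[int]) -> float:
    """Average fraction of the database searched per probe.

    Assumption: every stored record is used once as a probe and is
    compared only against the records that share its bucket.
    """
    n = sum(bucket_sizes)
    if n == 0:
        return 0.0
    # A probe falling in a bucket of size s searches s of the n records.
    return sum(s * s for s in bucket_sizes) / (n * n)


if __name__ == "__main__":
    # Exhaustive comparison of 10 million fingerprints.
    print(pairwise_comparisons(10_000_000))  # 49999995000000
    # Hypothetical index splitting 1,000 records into 200 equal buckets.
    print(penetration_rate([5] * 200))       # 0.005
```

Under this definition, a penetration rate below 1% means each fingerprint is matched against fewer than one in a hundred stored records instead of all of them.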
Abstract
In fingerprint-based systems, database size grows considerably with the population. In developing countries, where a central system is difficult to use during voter enrollment, several regional voter databases are often created and then merged into a central database. A deduplication process then removes duplicates so that each voter appears only once. Until now, companies specializing in biometrics have relied on several costly computing servers to perform large-scale fingerprint deduplication. Their algorithms take considerable time because their complexity is O(n^2), where n is the size of the database. This article presents an algorithm that performs the operation in O(2n) on a single computer. It is based on an index computed for each fingerprint using a 5 × 5 matrix. This index makes it possible to…
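The abstract is cut off before it explains how the index is built, so the following is only a minimal sketch of the general idea behind index-based deduplication: assign each record to a bucket in a single pass, then run the expensive matcher only against records already in that bucket. The `index_key` and `is_match` callables are hypothetical stand-ins supplied by the caller; they are not the paper's 5 × 5 matrix index or its fingerprint matcher.

```python
from collections import defaultdict
from typing import Callable, Hashable, Iterable, TypeVar

T = TypeVar("T")


def deduplicate(
    records: Iterable[T],
    index_key: Callable[[T], Hashable],  # hypothetical: maps a record to its bucket
    is_match: Callable[[T, T], bool],    # hypothetical: expensive pairwise matcher
) -> list[T]:
    """Keep the first occurrence of each record, comparing within buckets only."""
    buckets: dict[Hashable, list[T]] = defaultdict(list)
    unique: list[T] = []
    for record in records:
        candidates = buckets[index_key(record)]
        # Only records sharing the same index key are ever compared.
        if not any(is_match(record, kept) for kept in candidates):
            candidates.append(record)
            unique.append(record)
    return unique


if __name__ == "__main__":
    # Toy data: strings stand in for fingerprints.
    names = ["ann", "Ann", "bob", "ANN", "Bob"]
    result = deduplicate(
        names,
        index_key=lambda s: s[0].lower(),
        is_match=lambda a, b: a.lower() == b.lower(),
    )
    print(result)  # ['ann', 'bob']
```

In a scheme like this, total work grows roughly linearly with the database only as long as bucket sizes stay bounded; a key that puts most records in one bucket falls back to quadratic behavior.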
Taxonomy
Topics: Biometric Identification and Security · Advanced Steganography and Watermarking Techniques · Advanced Malware Detection Techniques
