HYCEDIS: HYbrid Confidence Engine for Deep Document Intelligence System
Bao-Sinh Nguyen, Quang-Bach Tran, Tuan-Anh Nguyen Dang, Duc Nguyen, Hung Le

TL;DR
This paper introduces HYCEDIS, a novel hybrid confidence estimation system for deep document information extraction, combining conformal prediction and anomaly detection to provide reliable confidence scores without modifying existing models.
Contribution
The paper presents a new architecture that accurately estimates confidence in deep learning-based document information extraction without altering the original models.
Findings
Outperforms existing confidence estimators significantly
Demonstrates strong generalization to out-of-distribution data
Effective on real-world scanned document datasets
Abstract
Measuring the confidence of AI models is critical for safely deploying AI in real-world industrial systems. One important application of confidence measurement is information extraction from scanned documents. However, there exists no solution that provides reliable confidence scores for current state-of-the-art deep-learning-based information extractors. In this paper, we propose a complete and novel architecture to measure the confidence of current deep learning models in the document information extraction task. Our architecture consists of a Multi-modal Conformal Predictor and a Variational Cluster-oriented Anomaly Detector, trained to faithfully estimate confidence in the host model's outputs without requiring any modification of the host model. We evaluate our architecture on real-world datasets, not only outperforming competing confidence estimators by a huge margin but also demonstrating generalization…
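The conformal-prediction idea named in the abstract can be illustrated with a generic split conformal p-value. This is a minimal sketch under stated assumptions, not the paper's Multi-modal Conformal Predictor: the function names, the choice of nonconformity score (one minus the host model's probability for its predicted field value), and the toy calibration numbers are all hypothetical. What it shows is how a confidence score can be computed from a frozen model's outputs plus a held-out calibration set, without touching the model itself.

```python
from typing import Sequence


def nonconformity(prob_of_prediction: float) -> float:
    """Assumed nonconformity score: lower model probability means a stranger example."""
    return 1.0 - prob_of_prediction


def conformal_p_value(calibration_scores: Sequence[float], test_score: float) -> float:
    """Split conformal p-value.

    Share of calibration examples that are at least as nonconforming as the
    test example, with the usual +1 correction that counts the test example.
    """
    n_at_least = sum(1 for s in calibration_scores if s >= test_score)
    return (n_at_least + 1) / (len(calibration_scores) + 1)


# Hypothetical probabilities the host extractor gave to correct predictions
# on a held-out calibration set.
calibration = [nonconformity(p) for p in (0.99, 0.95, 0.90, 0.80, 0.60)]

# Confidence for two new extractions, using only the model's output probability.
confident = conformal_p_value(calibration, nonconformity(0.97))
doubtful = conformal_p_value(calibration, nonconformity(0.50))

print(f"confident extraction: {confident:.3f}")
print(f"doubtful extraction:  {doubtful:.3f}")
```

A higher p-value means the new extraction looks like the calibration examples the model got right, so it can be read as a confidence score. The anomaly-detection component described in the abstract is not sketched here.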
Taxonomy
Topics: Anomaly Detection Techniques and Applications · Machine Learning and Data Classification · Topic Modeling
