TL;DR
This paper extends a stopping method for video text recognition, based on modelling the next integrated recognition result, to string results with per-character alternatives, and reports higher accuracy than previous clustering-based methods.
Contribution
It extends the next integrated result modelling approach to handle string recognition with per-character alternatives, demonstrating superior accuracy in stopping decisions.
Findings
Achieves higher accuracy than previous clustering-based methods.
Effective in determining optimal stopping time for recognition.
Validated on the MIDV-500 dataset.
Abstract
In document analysis and recognition with mobile capture devices, and in object recognition in a video stream, an important problem is determining when the capturing process should be stopped. Efficient stopping influences not only the total time spent on recognition and data entry, but also the expected accuracy of the result. This paper aims to extend the stopping method based on next integrated recognition result modelling so that it can be used within a string recognition result model with per-character alternatives. The stopping method and notes on its extension are described, and an experimental evaluation is performed on the open dataset MIDV-500. The method was compared with previously published methods based on clustering of input observations. The obtained results indicate that the stopping method based on the next…
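The stopping idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the position-wise averaging used for integration, the total-variation distance, the equal-length simplification, and all function names (`integrate`, `distance`, `expected_next_distance`, `should_stop`) are assumptions chosen for illustration. The sketch reuses each past observation as a stand-in for the unseen next frame, estimates how far the integrated result would move, and stops once that estimate falls below a threshold.

```python
from typing import Dict, List

# One recognition result: a list of per-character alternatives,
# each mapping a candidate character to its estimated probability.
CharAlternatives = Dict[str, float]
StringResult = List[CharAlternatives]


def integrate(results: List[StringResult]) -> StringResult:
    """Combine per-frame results by averaging alternatives position-wise.

    Simplifying assumption (not from the paper): all results have equal length.
    """
    length = len(results[0])
    combined: StringResult = []
    for pos in range(length):
        totals: Dict[str, float] = {}
        for result in results:
            for char, prob in result[pos].items():
                totals[char] = totals.get(char, 0.0) + prob
        combined.append({c: p / len(results) for c, p in totals.items()})
    return combined


def distance(a: StringResult, b: StringResult) -> float:
    """Mean per-position total variation distance (an assumed metric)."""
    total = 0.0
    for alt_a, alt_b in zip(a, b):
        chars = set(alt_a) | set(alt_b)
        total += 0.5 * sum(abs(alt_a.get(c, 0.0) - alt_b.get(c, 0.0)) for c in chars)
    return total / len(a)


def expected_next_distance(observations: List[StringResult]) -> float:
    """Estimate how far the integrated result would move after one more frame.

    Each past observation is reused as a candidate for the next frame.
    """
    current = integrate(observations)
    modelled = [
        distance(current, integrate(observations + [obs])) for obs in observations
    ]
    return sum(modelled) / len(modelled)


def should_stop(observations: List[StringResult], threshold: float) -> bool:
    """Stop capturing when the modelled change drops to the threshold or below."""
    return expected_next_distance(observations) <= threshold
```

With consistent per-frame results the modelled change is close to zero and the rule fires early; with conflicting results the estimate stays high and capture continues. The threshold trades capture time against expected accuracy and would need tuning on data.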
