Mixed Text Recognition with Efficient Parameter Fine-Tuning and Transformer
Da Chang, Yu Li

TL;DR
This paper introduces DLoRA-TrOCR, a parameter-efficient hybrid OCR model that improves mixed-scene text recognition accuracy and generalization while significantly reducing the number of trainable parameters and the computational resources required.
Contribution
It proposes a hybrid fine-tuning approach that embeds a weight-decomposed DoRA module in the image encoder and a LoRA module in the text decoder of a pre-trained OCR Transformer, achieving state-of-the-art results with minimal trainable parameters.
Findings
Achieves a character error rate (CER) of 4.02 on the IAM dataset
Attains an F1 score of 94.29 on the SROIE dataset
Reaches a word accuracy rate (WAR) of 86.70 on the STR benchmark
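For readers unfamiliar with the metrics above, the following is a minimal sketch of how a character error rate and a word-level accuracy are commonly computed. It assumes the scores are reported as percentages, that CER is the Levenshtein edit distance divided by the reference length, and that WAR counts exact word matches; the paper's exact evaluation protocol (normalization, case handling) is not given here, and the function names are illustrative only.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two strings (insert, delete, substitute)."""
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # deleting i reference characters to reach an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(references, hypotheses):
    """Character error rate in percent: total edits / total reference characters."""
    total_edits = sum(edit_distance(r, h) for r, h in zip(references, hypotheses))
    total_chars = sum(len(r) for r in references)
    return 100.0 * total_edits / total_chars


def word_accuracy(references, hypotheses):
    """Word-level accuracy in percent: share of predictions that match exactly."""
    correct = sum(r == h for r, h in zip(references, hypotheses))
    return 100.0 * correct / len(references)


# Toy usage with made-up strings (not data from the paper).
refs = ["hello", "world"]
hyps = ["hallo", "world"]
print(cer(refs, hyps))            # one substitution over ten reference characters
print(word_accuracy(refs, hyps))  # one of two words is exactly right
```

Lower is better for CER, while higher is better for F1 and WAR, which is why the three findings are not directly comparable to each other.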
Abstract
With the rapid development of OCR technology, mixed-scene text recognition has become a key technical challenge. Although deep learning models have achieved significant results in specific scenarios, their generality and stability still need improvement, and their high demand for computing resources limits flexibility. To address these issues, this paper proposes DLoRA-TrOCR, a parameter-efficient hybrid text spotting method based on a pre-trained OCR Transformer. By embedding a weight-decomposed DoRA module in the image encoder and a LoRA module in the text decoder, the method can be fine-tuned efficiently on various downstream tasks. It trains no more than 0.7% of the model's parameters, which not only speeds up training but also significantly improves the recognition accuracy and cross-dataset generalization of the OCR system in mixed-text scenes.…
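The two adapter types named in the abstract can be illustrated with a small pure-Python sketch. It is not the authors' implementation: it assumes the standard LoRA update W0 + BA and the DoRA reparameterization m · (W0 + BA) / ||W0 + BA||, with the norm taken per column as in the DoRA paper. The matrix values, the rank, and the 768-dimensional layer used for the parameter count are hypothetical choices for illustration.

```python
import math


def matmul(x, y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))] for i in range(len(x))]


def column_norms(w):
    """Euclidean norm of each column of w."""
    return [math.sqrt(sum(row[j] ** 2 for row in w)) for j in range(len(w[0]))]


def lora_weight(w0, b, a, scale=1.0):
    """LoRA: frozen W0 (d_out x d_in) plus low-rank update B (d_out x r) @ A (r x d_in)."""
    delta = matmul(b, a)
    return [[w + scale * d for w, d in zip(rw, rd)] for rw, rd in zip(w0, delta)]


def dora_weight(w0, b, a, m):
    """DoRA: the LoRA-updated direction, renormalized per column and rescaled by m."""
    v = lora_weight(w0, b, a)
    norms = column_norms(v)
    return [[m[j] * v[i][j] / norms[j] for j in range(len(norms))]
            for i in range(len(v))]


def lora_param_count(d_out, d_in, rank):
    """Trainable parameters that LoRA adds to one d_out x d_in weight matrix."""
    return rank * (d_in + d_out)


# Toy 2x2 example with rank 1 (hypothetical values).
w0 = [[3.0, 0.0], [4.0, 1.0]]
a = [[0.5, -0.5]]
m = column_norms(w0)  # DoRA initializes the magnitudes to the column norms of W0

# With B = 0 (the usual initialization) both adapters reproduce W0.
at_init = dora_weight(w0, [[0.0], [0.0]], a, m)

# After B changes, DoRA alters column directions while m fixes their lengths.
tuned = dora_weight(w0, [[1.0], [2.0]], a, m)

# Hypothetical 768 x 768 layer at rank 8: adapter size versus the frozen matrix.
print(lora_param_count(768, 768, 8), 768 * 768)
```

Because only A, B, and (for DoRA) the magnitude vector m are trained while W0 stays frozen, the trainable share stays small, which is the mechanism behind the low trainable-parameter budget the abstract reports.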
Taxonomy
Topics: Handwritten Text Recognition Techniques
Methods: Attention Is All You Need · Position-Wise Feed-Forward Layer · Byte Pair Encoding · Absolute Position Encodings · Dropout · Dense Connections · Label Smoothing · Residual Connection · Softmax · Adam
