DistilDoc: Knowledge Distillation for Visually-Rich Document Applications
Jordy Van Landeghem, Subhajit Maity, Ayan Banerjee, Matthew Blaschko, Marie-Francine Moens, Josep Lladós, Sanket Biswas

TL;DR
This paper investigates knowledge distillation techniques to create efficient, high-performing models for visually-rich document understanding tasks, emphasizing the importance of model compression and robustness in downstream applications.
Contribution
It introduces a KD experimentation methodology for document understanding, compares various strategies across architectures, and evaluates their robustness in downstream tasks like DocVQA.
Findings
Some KD methods (tuned vanilla KD, MSE, SimKD with an apt projector) consistently outperform supervised student training.
Distilled models show a significant knowledge gap affecting downstream robustness.
Robustness in downstream tasks does not directly correlate with knowledge gap size.
Abstract
This work explores knowledge distillation (KD) for visually-rich document (VRD) applications such as document layout analysis (DLA) and document image classification (DIC). While VRD research is dependent on increasingly sophisticated and cumbersome models, the field has neglected to study efficiency via model compression. Here, we design a KD experimentation methodology for more lean, performant models on document understanding (DU) tasks that are integral within larger task pipelines. We carefully selected KD strategies (response-based, feature-based) for distilling knowledge to and from backbones with different architectures (ResNet, ViT, DiT) and capacities (base, small, tiny). We study what affects the teacher-student knowledge gap and find that some methods (tuned vanilla KD, MSE, SimKD with an apt projector) can consistently outperform supervised student training. Furthermore, we…
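To make the response-based strategies named in the abstract more concrete, here is a minimal sketch of two common logit-level distillation losses: the classic temperature-scaled "vanilla" KD loss and a plain MSE loss between student and teacher logits. This is an illustration under stated assumptions, not the paper's implementation: the function names, the default `temperature` and `alpha` values, and the pure-Python formulation are all hypothetical choices made for readability.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits, with temperature scaling."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def vanilla_kd_loss(student_logits, teacher_logits, label,
                    temperature=2.0, alpha=0.5):
    """Weighted sum of hard-label cross-entropy and softened KL(teacher || student).

    `temperature` and `alpha` are illustrative defaults (assumptions),
    not values taken from the paper.
    """
    # Hard-label cross-entropy on the student's unscaled predictions.
    ce = -math.log(softmax(student_logits)[label])

    # KL divergence between temperature-softened teacher and student distributions.
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(pt * math.log(pt / ps)
             for pt, ps in zip(p_teacher, p_student) if pt > 0)

    # The T^2 factor keeps the soft-target gradient scale comparable across temperatures.
    return alpha * ce + (1 - alpha) * temperature ** 2 * kl


def mse_logit_loss(student_logits, teacher_logits):
    """Mean squared error between student and teacher logits."""
    return sum((s - t) ** 2
               for s, t in zip(student_logits, teacher_logits)) / len(student_logits)
```

With `alpha=0.0` only the soft-target term remains, so a student whose logits match the teacher's incurs zero loss; raising `alpha` shifts weight toward the ground-truth label. Feature-based methods such as SimKD operate on intermediate representations instead and are not covered by this sketch.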
Taxonomy
Topics: Semantic Web and Ontologies · Digital Humanities and Scholarship
Methods: Deep Layer Aggregation · Knowledge Distillation
