On-Device Document Classification using multimodal features

Sugam Garg; Harichandana; Sumit Kumar

arXiv:2101.01880·cs.CV·January 7, 2021


TL;DR

This paper presents a novel, optimized multimodal model pipeline for on-device document classification that preserves user privacy and achieves competitive results with significant model compression.

Contribution

The paper introduces a new on-device multimodal classification pipeline combining OCR and a novel model architecture, optimized for size and privacy.
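As a rough illustration of how text recovered by OCR and image content can feed a single classifier, the sketch below concatenates two feature vectors and applies a linear softmax layer. It is not the paper's architecture: the hashing text encoder, the grey-level histogram, the function names, and the weights are all hypothetical stand-ins chosen only to keep the example self-contained.

```python
import math
from typing import List, Sequence


def text_features(ocr_text: str, dim: int = 8) -> List[float]:
    """Hash OCR tokens into a fixed-size, normalised bag-of-words vector.

    Hypothetical stand-in for a learned text encoder.
    """
    vec = [0.0] * dim
    for token in ocr_text.lower().split():
        # Deterministic bucket: sum of the token's UTF-8 bytes modulo dim.
        vec[sum(token.encode("utf-8")) % dim] += 1.0
    total = sum(vec)
    return [v / total for v in vec] if total else vec


def image_features(pixels: Sequence[int], bins: int = 4) -> List[float]:
    """Normalised histogram of 0-255 grey levels.

    Hypothetical stand-in for a learned image encoder.
    """
    hist = [0.0] * bins
    for p in pixels:
        hist[min(p * bins // 256, bins - 1)] += 1.0
    total = sum(hist)
    return [h / total for h in hist] if total else hist


def fuse(text_vec: Sequence[float], image_vec: Sequence[float]) -> List[float]:
    """Combine both modalities by simple concatenation."""
    return list(text_vec) + list(image_vec)


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over class scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    norm = sum(exps)
    return [e / norm for e in exps]


def classify(fused: Sequence[float],
             weights: Sequence[Sequence[float]],
             biases: Sequence[float]) -> List[float]:
    """Linear layer followed by softmax; one weight row per class."""
    scores = [sum(w * x for w, x in zip(row, fused)) + b
              for row, b in zip(weights, biases)]
    return softmax(scores)


# Toy usage with made-up OCR text, pixels, and weights (assumptions only).
fused = fuse(text_features("invoice total amount due"),
             image_features([0, 64, 128, 255]))
weights = [[0.1] * len(fused), [-0.1] * len(fused)]
probs = classify(fused, weights, [0.0, 0.0])
```

Concatenation is only one way to combine modalities; it is used here because it is the simplest to read, not because the source states that the paper uses it.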

Findings

01

Achieved 30% model compression while maintaining accuracy.

02

Demonstrated effectiveness on the FOOD-101 dataset.

03

Enabled private on-device document classification.

Abstract

From small screenshots to large videos, documents take up the bulk of the storage space on a modern smartphone. Documents on a phone accumulate from various sources, and with the high storage capacity of mobile devices, hundreds of documents can pile up in a short period. However, searching or managing documents remains an onerous task, since most search methods depend on meta-information or only on the text in a document. In this paper, we show that a single modality is insufficient for classification and present a novel pipeline that classifies documents on-device, thus preventing any transfer of private user data to a server. For this task, we integrate an open-source library for Optical Character Recognition (OCR) and our novel model architecture in the pipeline. We optimise the model for size, a necessary metric for on-device inference. We benchmark our classification model with a standard multimodal…
