Multimodal Side-Tuning for Document Classification

Stefano Pio Zingaro; Giuseppe Lisanti; Maurizio Gabbrielli

arXiv:2301.07502·cs.LG·January 24, 2023

TL;DR

This paper introduces a multimodal document classification method based on side-tuning. It combines different data sources, such as text and images, and reports document classification accuracy above the prior state of the art.

Contribution

It applies the side-tuning framework to multimodal data by combining a base model with a tandem of two side networks, which avoids the model rigidity and catastrophic forgetting associated with transfer learning by fine-tuning.

Findings

01 Achieves higher accuracy than existing methods

02 Successfully combines text and image data for classification

03 Demonstrates effectiveness of side-tuning in multimodal settings

Abstract

In this paper, we propose to exploit the side-tuning framework for multimodal document classification. Side-tuning is a methodology for network adaptation recently introduced to solve some of the problems related to previous approaches. Thanks to this technique, it is possible to overcome the model rigidity and catastrophic forgetting that affect transfer learning by fine-tuning. The proposed solution uses off-the-shelf deep learning architectures and leverages the side-tuning framework to combine a base model with a tandem of two side networks. We show that side-tuning can also be successfully employed when different data sources are considered, e.g. text and images in document classification. The experimental results show that this approach further pushes the limit of document classification accuracy with respect to the state of the art.
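The combination of a base model with two side networks can be illustrated with a small sketch. It is not taken from the paper or its repository. It assumes that each network emits a feature vector of the same size and that the vectors are merged by a convex combination (weights that sum to 1), which is one common way to realize side-tuning. The names `blend`, `base_feats`, `side_image_feats`, and `side_text_feats`, as well as the weight values, are hypothetical.

```python
from typing import List, Sequence

Vector = List[float]  # plain lists keep the sketch dependency-free


def blend(features: Sequence[Vector], weights: Sequence[float]) -> Vector:
    """Merge same-sized feature vectors with one weight per network.

    Assumption: the weights form a convex combination (they sum to 1),
    so the result stays on the same scale as the individual features.
    """
    if len(features) != len(weights):
        raise ValueError("need exactly one weight per feature vector")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    dim = len(features[0])
    if any(len(f) != dim for f in features):
        raise ValueError("all feature vectors must have the same length")
    return [sum(w * f[i] for f, w in zip(features, weights)) for i in range(dim)]


# Hypothetical outputs of the three networks for one document.
base_feats = [1.0, 0.0]        # frozen, pretrained base model (image input)
side_image_feats = [0.0, 1.0]  # trainable side network on the image
side_text_feats = [1.0, 1.0]   # trainable side network on the text

# Hypothetical weights: half base model, a quarter for each side network.
fused = blend([base_feats, side_image_feats, side_text_feats], [0.5, 0.25, 0.25])
```

In this kind of setup only the side networks (and possibly the weights) would be trained while the base model stays frozen, which is what lets the base model keep its pretrained knowledge. The fused vector would then be passed to a classification layer.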

Code & Models

Repositories

thezingaro/multimodal-side-tuning (PyTorch, official)

Taxonomy

Methods: Residual Connection · Depthwise Convolution · Pointwise Convolution · Batch Normalization · Depthwise Separable Convolution · Max Pooling · Global Average Pooling · Bottleneck Residual Block · Residual Block