MultiSChuBERT: Effective Multimodal Fusion for Scholarly Document Quality Prediction

Gideon Maillette de Buy Wenniger; Thomas van Dongen; Lambert Schomaker

arXiv:2308.07971 · cs.CL · August 17, 2023 · 1 citation

Open Access

TL;DR

This paper introduces MultiSChuBERT, a multimodal model combining text and visual features to improve scholarly document quality prediction, demonstrating significant performance gains over text-only models across multiple datasets and embeddings.

Contribution

The paper presents a multimodal fusion approach for scholarly document quality prediction (SDQP) and shows that both the choice of text embedding and training strategies such as gradual unfreezing affect performance.

Findings

01

Multimodal fusion improves SDQP accuracy.

02

Gradual unfreezing reduces overfitting of visual models.

03

Advanced embeddings like SPECTER2.0 enhance prediction results.
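
The second finding, gradual unfreezing, can be illustrated with a minimal scheduling sketch. It assumes one common variant in which the layers closest to the output become trainable first and one more layer is released per stage; the function name `trainable_layers`, the parameter `epochs_per_stage`, and the layer names are hypothetical and are not taken from the paper, which may use a different schedule.

```python
from typing import List, Sequence


def trainable_layers(
    layer_names: Sequence[str],
    epoch: int,
    epochs_per_stage: int = 1,
) -> List[str]:
    """Return the layers that are unfrozen (trainable) at a given epoch.

    Assumption: layers are ordered from input to output, and unfreezing
    proceeds from the output side toward the input side, one layer per stage.
    """
    if epoch < 0:
        raise ValueError("epoch must be non-negative")
    if epochs_per_stage <= 0:
        raise ValueError("epochs_per_stage must be positive")
    # Stage 0 unfreezes one layer, stage 1 two layers, and so on,
    # capped at the total number of layers.
    n_unfrozen = min(len(layer_names), epoch // epochs_per_stage + 1)
    return list(layer_names[len(layer_names) - n_unfrozen:])


# Hypothetical layer names for a visual sub-model.
layers = ["stem", "block1", "block2", "head"]
for epoch in range(5):
    print(epoch, trainable_layers(layers, epoch))
```

In a training loop, the returned names would decide which parameter groups receive gradient updates in that epoch, so the pretrained early layers are only adjusted late in training.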

Abstract

Automatic assessment of the quality of scholarly documents is a difficult task with high potential impact. Multimodality, in particular the addition of visual information next to text, has been shown to improve the performance on scholarly document quality prediction (SDQP) tasks. We propose the multimodal predictive model MultiSChuBERT. It combines a textual model based on chunking full paper text and aggregating computed BERT chunk-encodings (SChuBERT), with a visual model based on Inception V3. Our work contributes to the current state-of-the-art in SDQP in three ways. First, we show that the method of combining visual and textual embeddings can substantially influence the results. Second, we demonstrate that gradual unfreezing of the weights of the visual sub-model reduces its tendency to overfit the data, improving results. Third, we show the retained benefit of multimodality when…
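
The data flow described in the abstract (chunk the full text, encode each chunk, aggregate the chunk encodings, then combine with a visual embedding) might be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: `toy_chunk_encoder` is a hypothetical stand-in for a BERT encoder, mean pooling is an assumed aggregation, and concatenation is only one of several possible fusion methods, which the abstract says can substantially influence results.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def chunk_tokens(tokens: Sequence[str], chunk_size: int) -> List[List[str]]:
    """Split a token sequence into consecutive chunks of at most chunk_size."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [list(tokens[i:i + chunk_size]) for i in range(0, len(tokens), chunk_size)]


def toy_chunk_encoder(chunk: Sequence[str]) -> Vector:
    """Hypothetical placeholder for a BERT chunk encoder.

    Returns [number of tokens, mean token length] so the example runs
    without any model weights.
    """
    n = len(chunk)
    return [float(n), sum(len(t) for t in chunk) / n]


def mean_pool(vectors: Sequence[Vector]) -> Vector:
    """Aggregate chunk encodings by element-wise mean (an assumed choice)."""
    if not vectors:
        raise ValueError("need at least one vector to pool")
    dim = len(vectors[0])
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]


def fuse_by_concatenation(text_vec: Vector, visual_vec: Vector) -> Vector:
    """Combine the two modalities by simple concatenation (an assumed choice)."""
    return list(text_vec) + list(visual_vec)


def embed_document(
    tokens: Sequence[str],
    visual_vec: Vector,
    encoder: Callable[[Sequence[str]], Vector] = toy_chunk_encoder,
    chunk_size: int = 4,
) -> Vector:
    """Chunk the text, encode and pool the chunks, then fuse with the visual vector."""
    chunks = chunk_tokens(tokens, chunk_size)
    text_vec = mean_pool([encoder(c) for c in chunks])
    return fuse_by_concatenation(text_vec, visual_vec)


tokens = "a bb ccc dddd ee f".split()
print(embed_document(tokens, visual_vec=[0.5, 0.25]))
```

In a real system the fused vector would feed a prediction head, and the visual vector would come from an image model applied to rendered pages rather than being supplied by hand.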

Taxonomy

Topics: Video Analysis and Summarization · Image Retrieval and Classification Techniques

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Layer Normalization · Softmax · Adam · Attention Dropout · Residual Connection · Dense Connections