Self-Supervised Learning for Visual Summary Identification in Scientific Publications

Shintaro Yamamoto; Anne Lauscher; Simone Paolo Ponzetto; Goran Glavaš; Shigeo Morishima

arXiv:2012.11213 · cs.IR · January 15, 2021 · 1 cites

Open Access

TL;DR

This paper introduces a self-supervised learning method and a new dataset for automatically identifying key figures as visual summaries in scientific publications, across multiple domains, without needing annotated data.

Contribution

It presents a novel self-supervised approach for figure selection and a new benchmark dataset covering several computer science domains, with experiments in both the biomedical and computer science domains.

Findings

01 Outperforms state-of-the-art methods in figure selection

02 Effective across multiple scientific domains

03 Does not require annotated training data

Abstract

Providing visual summaries of scientific publications can increase information access for readers and thereby help deal with the exponential growth in the number of scientific publications. Nonetheless, efforts in providing visual publication summaries have been few and far between, focusing primarily on the biomedical domain. This is mainly because of the limited availability of annotated gold standards, which hampers the application of robust and high-performing supervised learning techniques. To address these problems, we create a new benchmark dataset for selecting figures to serve as visual summaries of publications based on their abstracts, covering several domains in computer science. Moreover, we develop a self-supervised learning approach, based on heuristic matching of inline references to figures with figure captions. Experiments in both biomedical and computer science…
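The abstract does not spell out the matching heuristic, so the following is only a minimal sketch of how inline figure references might be paired with captions to obtain training labels without annotation. The regular expression, the function name `build_training_pairs`, the passage-level granularity, and the use of unreferenced captions as negatives are all assumptions made for illustration, not the authors' implementation.

```python
import re
from typing import Dict, List, Tuple

# Assumed pattern for inline references such as "Figure 2", "Fig. 2", or "fig 2".
FIG_REF = re.compile(r"\bfig(?:ure|\.)?\s*(\d+)", re.IGNORECASE)


def build_training_pairs(
    passages: List[str], captions: Dict[int, str]
) -> List[Tuple[str, str, int]]:
    """Build (passage, caption, label) triples from body text and captions.

    A passage that mentions figure k is paired with the caption of figure k
    as a positive (label 1) and with every other caption as a negative
    (label 0). Passages that mention no known figure are skipped.
    """
    pairs: List[Tuple[str, str, int]] = []
    for passage in passages:
        # Figure numbers referenced in this passage that have a known caption.
        referenced = {
            int(num) for num in FIG_REF.findall(passage) if int(num) in captions
        }
        if not referenced:
            continue
        for fig_id, caption in captions.items():
            label = 1 if fig_id in referenced else 0
            pairs.append((passage, caption, label))
    return pairs
```

Under these assumptions, a text-to-caption matching model trained on such pairs could then score each figure's caption against the abstract at inference time and pick the highest-scoring figure as the visual summary.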

Taxonomy

Topics: Topic Modeling · Advanced Text Analysis Techniques · Biomedical Text Mining and Ontologies