Multimodal Approach for Metadata Extraction from German Scientific Publications
Azeddine Bouabdallah, Jorge Gavilan, Jennifer Gerbl, Prayuth Patumcharoenpol

TL;DR
This paper presents a multimodal deep learning method combining NLP and vision techniques to improve metadata extraction accuracy from German scientific papers with diverse layouts.
Contribution
It introduces a novel multimodal approach that leverages both spatial and contextual features. The model is trained on 8800 documents and achieves an F1-score of 0.923 for metadata extraction.
Findings
Achieved an F1-score of 0.923 on the dataset.
Outperformed existing state-of-the-art methods.
Effectively handles diverse German scientific paper layouts.
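For readers unfamiliar with the metric, the sketch below shows how an F1-score is computed from raw counts for a single metadata field. The counts are hypothetical and are not taken from the paper, and this summary does not say whether the reported 0.923 is a micro or macro average, so the function covers only the per-class case.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """Return the harmonic mean of precision and recall from raw counts."""
    if tp == 0:
        # No true positives: precision and recall are both zero (or undefined).
        return 0.0
    precision = tp / (tp + fp)  # share of predicted labels that are correct
    recall = tp / (tp + fn)     # share of true labels that were found
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts for one field (e.g. "title"); illustration only.
example = f1_score(tp=80, fp=20, fn=20)
```

Because F1 is a harmonic mean, it stays low when either precision or recall is low, which is why it is a common single-number summary for extraction tasks.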
Abstract
Nowadays, metadata is often provided by the authors themselves upon submission. However, a significant share of existing research papers has missing or incomplete metadata. German scientific papers come in a large variety of layouts, which makes metadata extraction a non-trivial task that requires a precise way to classify the metadata extracted from the documents. In this paper, we propose a multimodal deep learning approach for metadata extraction from scientific papers in the German language. We consider multiple types of input data by combining natural language processing and image vision processing. This model aims to increase the overall accuracy of metadata extraction compared to other state-of-the-art approaches. It enables the utilization of both spatial and contextual features in order to achieve a more reliable extraction. Our model for this…
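The idea of combining contextual and spatial features can be illustrated with a generic late-fusion sketch. This is not the authors' architecture, which this summary does not describe. It assumes a text encoder has already produced a feature vector for a text block, and it uses normalized bounding-box coordinates as a stand-in for the visual input. The label set, function names, weights, and feature sizes are all hypothetical.

```python
import math

# Hypothetical label set; the paper's actual metadata classes are not listed here.
LABELS = ["title", "author", "abstract", "other"]


def fuse(text_features, box, page_width, page_height):
    """Concatenate contextual (text) features with normalized spatial features."""
    x0, y0, x1, y1 = box
    spatial = [x0 / page_width, y0 / page_height,
               x1 / page_width, y1 / page_height]
    return list(text_features) + spatial


def softmax(scores):
    """Convert raw scores into probabilities in a numerically stable way."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify(features, weights, biases):
    """Apply one linear layer plus softmax and return (label, probabilities)."""
    scores = [sum(w * f for w, f in zip(row, features)) + b
              for row, b in zip(weights, biases)]
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[best], probs


# Illustration only: a 2-dimensional text embedding and a box near the page top.
fused = fuse([0.2, 0.7], box=(50, 40, 550, 90), page_width=600, page_height=800)
weights = [[0.0] * len(fused) for _ in LABELS]  # untrained placeholder weights
biases = [1.0, 0.0, 0.0, 0.0]                   # placeholder bias favouring "title"
label, probs = classify(fused, weights, biases)
```

In a real system the weights would be learned and the spatial branch would typically be a vision model over the page image. The sketch only shows how the two feature types end up in one classifier input.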
Taxonomy
Topics: Semantic Web and Ontologies · Natural Language Processing Techniques · Image Processing and 3D Reconstruction
