XL-HeadTags: Leveraging Multimodal Retrieval Augmentation for the Multilingual Generation of News Headlines and Tags
Faisal Tareque Shohan, Mir Tafseer Nayeem, Samsul Islam, Abu Ubaida Akash, Shafiq Joty

TL;DR
This paper introduces XL-HeadTags, a multimodal retrieval and instruction tuning approach for generating multilingual news headlines and tags, leveraging images and captions to improve content relevance and accessibility.
Contribution
It presents a novel dataset and method for multimodal, multilingual headline and tag generation, addressing the gap in tag generation research and enhancing content selection strategies.
Findings
Multimodal retrievers improve headline and tag quality.
The dataset covers 20 languages across 6 language families.
The authors also developed a suite of tools for processing and evaluating multilingual texts.
Abstract
Millions of news articles published online daily can overwhelm readers. Headlines and entity (topic) tags are essential for guiding readers to decide if the content is worth their time. While headline generation has been extensively studied, tag generation remains largely unexplored, yet it offers readers better access to topics of interest. The need for conciseness in capturing readers' attention necessitates improved content selection strategies for identifying salient and relevant segments within lengthy articles, thereby guiding language models effectively. To address this, we propose to leverage auxiliary information such as images and captions embedded in the articles to retrieve relevant sentences and utilize instruction tuning with variations to generate both headlines and tags for news articles in a multilingual context. To make use of the auxiliary information, we have…
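The content selection step described in the abstract (using a caption to pick salient sentences, then prompting a model with an instruction) can be illustrated with a minimal sketch. This is not the authors' retriever: it assumes a caption-only, bag-of-words cosine similarity as a stand-in for the multimodal retrieval the paper describes, and the names `retrieve_sentences`, `build_prompt`, and the instruction wording are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercased word tokens; \w is Unicode-aware in Python 3.
    return re.findall(r"\w+", text.lower())


def cosine(a, b):
    # Cosine similarity between two bag-of-words Counters.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve_sentences(sentences, caption, top_k=2):
    # Assumption: lexical overlap with the caption approximates relevance.
    # Score every sentence, keep the top_k, then restore article order.
    query = Counter(tokenize(caption))
    scored = [(cosine(Counter(tokenize(s)), query), i)
              for i, s in enumerate(sentences)]
    best = sorted(scored, key=lambda p: (-p[0], p[1]))[:top_k]
    return [sentences[i] for _, i in sorted(best, key=lambda p: p[1])]


def build_prompt(selected, task="headline"):
    # Hypothetical instruction template; not the paper's wording.
    return f"Generate a {task} for the following news article:\n" + " ".join(selected)
```

Calling `retrieve_sentences(article_sentences, caption)` and passing the result to `build_prompt(..., task="tags")` yields a shortened input for a generation model. Lexical overlap works only within one language and for whitespace-delimited scripts, so a multilingual setting like the paper's would need a learned multilingual (and image-aware) encoder in place of `cosine`.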
Taxonomy
Topics: Natural Language Processing Techniques · Web Data Mining and Analysis · Topic Modeling
