Automated Journalistic Questions: A New Method for Extracting 5W1H in French
Maxence Verhaverbeke, Julie A. Gramaccia, Richard Khoury

TL;DR
This paper introduces the first automated method for extracting 5W1H information (who, what, when, where, why and how) from French news articles, supporting journalistic tasks like summarization and clustering.
Contribution
It presents a novel pipeline for extracting 5W1H information from French news articles and a new annotated corpus for evaluation.
Findings
The pipeline performs as well as GPT-4o on 5W1H extraction.
The authors created a corpus of 250 Quebec news articles with 5W1H answers marked by four human annotators.
Demonstrated effectiveness of the method in journalistic tasks.
Abstract
The 5W1H questions -- who, what, when, where, why and how -- are commonly used in journalism to ensure that an article describes events clearly and systematically. Answering them is a crucial prerequisite for tasks such as summarization, clustering, and news aggregation. In this paper, we design the first automated extraction pipeline to get 5W1H information from French news articles. To evaluate the performance of our algorithm, we also create a corpus of 250 Quebec news articles with 5W1H answers marked by four human annotators. Our results demonstrate that our pipeline performs as well in this task as the large language model GPT-4o.
Taxonomy
Topics: Natural Language Processing Techniques · Speech and dialogue systems · Digital Communication and Language
