FreSaDa: A French Satire Data Set for Cross-Domain Satire Detection

Radu Tudor Ionescu; Adrian Gabriel Chifu

arXiv:2104.04828·cs.CL·May 18, 2021

TL;DR

This paper introduces FreSaDa, a French satire dataset designed for cross-domain satire detection, and evaluates baseline classification methods along with an unsupervised domain adaptation approach that improves performance.

Contribution

The paper presents a new French satire dataset with a cross-domain setup and proposes an unsupervised domain adaptation method that enhances satire detection accuracy.

Findings

01. Domain-specific features improve classification performance

02. Unsupervised domain adaptation significantly boosts results

03. Cross-source evaluation reveals challenges in satire detection

Abstract

In this paper, we introduce FreSaDa, a French Satire Data Set, which is composed of 11,570 articles from the news domain. In order to avoid reporting unreasonably high accuracy rates due to the learning of characteristics specific to publication sources, we divided our samples into training, validation and test, such that the training publication sources are distinct from the validation and test publication sources. This gives rise to a cross-domain (cross-source) satire detection task. We employ two classification methods as baselines for our new data set, one based on low-level features (character n-grams) and one based on high-level features (average of CamemBERT word embeddings). As an additional contribution, we present an unsupervised domain adaptation method based on regarding the pairwise similarities (given by the dot product) between the training samples and the validation…
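The cross-source split described in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the function name `split_by_source`, the source names (`site_a`, ...), the record layout, and the even division of held-out articles between validation and test are all assumptions made for illustration. The only property taken from the abstract is that training sources never appear in validation or test.

```python
import random


def split_by_source(articles, held_out_sources, seed=0):
    """Keep training sources disjoint from validation/test sources.

    Articles from `held_out_sources` are shuffled and divided evenly
    between validation and test (the 50/50 ratio is an assumption).
    """
    train = [a for a in articles if a["source"] not in held_out_sources]
    held_out = [a for a in articles if a["source"] in held_out_sources]
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    rng.shuffle(held_out)
    half = len(held_out) // 2
    return {
        "train": train,
        "validation": held_out[:half],
        "test": held_out[half:],
    }


def sources_in(split):
    """Return the set of publication sources present in a split."""
    return {a["source"] for a in split}


# Toy records with hypothetical source names and placeholder texts.
articles = [
    {"source": "site_a", "label": "satire", "text": "texte 1"},
    {"source": "site_a", "label": "satire", "text": "texte 2"},
    {"source": "site_b", "label": "regular", "text": "texte 3"},
    {"source": "site_b", "label": "regular", "text": "texte 4"},
    {"source": "site_c", "label": "satire", "text": "texte 5"},
    {"source": "site_c", "label": "satire", "text": "texte 6"},
    {"source": "site_d", "label": "regular", "text": "texte 7"},
    {"source": "site_d", "label": "regular", "text": "texte 8"},
]

splits = split_by_source(articles, held_out_sources={"site_c", "site_d"})

seen = sources_in(splits["train"])
unseen = sources_in(splits["validation"]) | sources_in(splits["test"])

# No publication source is shared between training and evaluation data.
assert seen.isdisjoint(unseen)
```

Splitting by source rather than by article means a classifier cannot score well merely by memorizing a publication's house style, which is the failure mode the abstract sets out to avoid.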

Code & Models

Repositories

adrianchifu/FreSaDa (official)

Datasets

FrancophonIA/FreSaDa