Talking to oneself in CMC: a study of self replies in Wikipedia talk pages

Ludovic Tanguy (CLLE); Céline Poudat; Lydia-Mai Ho-Dac (CLLE)

arXiv:2411.19007·cs.CL·December 2, 2024

Open Access

TL;DR

This paper investigates self-replies in Wikipedia talk pages: it examines the lexical features of the second message, groups self-replies into categories, and compares how well human annotators and LLMs apply those categories.

Contribution

It introduces a typology of self-replies, applies it to English and French samples, and compares human and AI annotation effectiveness.

Findings

01

Self-replies, where the first two messages of a discussion are written by the same user, occur in more than 10% of threads with two messages or more.

02

Human annotators reach a reasonable overall efficiency, while instruction-tuned LLMs have significant difficulties with several categories.

03

A seven-category typology is proposed and used to annotate two reference samples (English and French) of 100 threads each.

Abstract

This study proposes a qualitative analysis of self replies in Wikipedia talk pages, more precisely of cases where the first two messages of a discussion are written by the same user. This specific pattern occurs in more than 10% of threads with two messages or more and can be explained by a number of reasons. After a first examination of the lexical specificities of second messages, we propose a seven-category typology and use it to annotate two reference samples (English and French) of 100 threads each. Finally, we analyse and compare the performance of human annotators (who reach a reasonable overall efficiency) and instruction-tuned LLMs (which encounter significant difficulties with several categories).
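The pattern defined in the abstract (first two messages of a thread written by the same user) can be illustrated with a minimal Python sketch. This is not the authors' code: the `Message` class, the function names, and the toy threads are all hypothetical, and the sketch assumes each thread is already available as an ordered list of messages with a known author.

```python
from dataclasses import dataclass


@dataclass
class Message:
    # Hypothetical minimal representation of a talk-page message.
    author: str
    text: str


def is_self_reply(thread):
    """True when the thread has at least two messages and the first two share an author."""
    return len(thread) >= 2 and thread[0].author == thread[1].author


def self_reply_rate(threads):
    """Share of self-reply threads among threads with two messages or more."""
    eligible = [t for t in threads if len(t) >= 2]
    if not eligible:
        return 0.0
    return sum(is_self_reply(t) for t in eligible) / len(eligible)


# Toy data, invented for illustration only.
threads = [
    [Message("Alice", "Is this source reliable?"), Message("Alice", "Never mind, found it.")],
    [Message("Bob", "Should we merge these?"), Message("Carol", "Yes, I agree.")],
    [Message("Dave", "Typo in the lead section.")],
    [Message("Erin", "Proposal"), Message("Erin", "Addendum"), Message("Frank", "Support")],
]

rate = self_reply_rate(threads)
```

Single-message threads are excluded from the denominator, which matches the abstract's wording ("threads with two messages or more"). Assigning one of the seven categories to each detected thread is a separate annotation step that this sketch does not attempt.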

Taxonomy

Topics: Wikis in Education and Collaboration · Cancer-related gene regulation · Natural Language Processing Techniques