Guidelines for the Creation of an Annotated Corpus
Bahdja Boudoua, Nadia Guiffant, Mathieu Roche, Maguelonne Teisseire, Annelise Tran

TL;DR
This paper presents a comprehensive methodology for creating, annotating, and sharing textual corpora, including guidelines, storage, and valorization strategies, supported by definitions and illustrative examples.
Contribution
It introduces a generic framework for developing annotation guidelines and corpora, integrating methodological, storage, sharing, and valorization aspects.
Findings
Provides a detailed methodology for corpus annotation
Includes definitions and examples to clarify each step
Addresses data sharing and valorization strategies
Abstract
This document, based on feedback from UMR TETIS members and the scientific literature, provides a generic methodology for creating annotation guidelines and annotated textual datasets (corpora). It covers methodological aspects, as well as storage, sharing, and valorization of the data. It includes definitions and examples to clearly illustrate each step of the process, thus providing a comprehensive framework to support the creation and use of corpora in various research contexts.
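The abstract describes annotated corpora and their storage and sharing only in general terms. The following is a minimal Python sketch of one common way to represent a span-annotated document and serialize it as JSON Lines. It is not the format prescribed by the guidelines. The class names, the fields (character offsets, label, annotator), and the example sentence are all illustrative assumptions.

```python
import json
from dataclasses import dataclass, asdict, field

# Hypothetical schema for illustration; not the format prescribed by the guidelines.


@dataclass
class Annotation:
    """A labelled span located by character offsets (end is exclusive)."""
    start: int
    end: int
    label: str
    annotator: str


@dataclass
class AnnotatedDocument:
    doc_id: str
    text: str
    annotations: list[Annotation] = field(default_factory=list)

    def add(self, start: int, end: int, label: str, annotator: str) -> None:
        # Reject empty spans and spans that fall outside the text.
        if not (0 <= start < end <= len(self.text)):
            raise ValueError(f"invalid span {start}-{end} for {self.doc_id}")
        self.annotations.append(Annotation(start, end, label, annotator))

    def spans(self) -> list[tuple[str, str]]:
        # (surface text, label) pairs, handy for manual checks of the annotation.
        return [(self.text[a.start:a.end], a.label) for a in self.annotations]


def to_jsonl(docs: list[AnnotatedDocument]) -> str:
    # One JSON object per line: a simple, widely readable format for sharing.
    return "\n".join(json.dumps(asdict(d), ensure_ascii=False) for d in docs)


def from_jsonl(data: str) -> list[AnnotatedDocument]:
    # Rebuild documents (and their nested annotations) from JSON Lines.
    docs = []
    for line in data.splitlines():
        raw = json.loads(line)
        anns = [Annotation(**a) for a in raw["annotations"]]
        docs.append(AnnotatedDocument(raw["doc_id"], raw["text"], anns))
    return docs


doc = AnnotatedDocument("d1", "Avian influenza was reported in Montpellier.")
doc.add(0, 15, "DISEASE", "annotator_A")
doc.add(32, 43, "LOCATION", "annotator_A")
print(doc.spans())  # [('Avian influenza', 'DISEASE'), ('Montpellier', 'LOCATION')]
```

Storing offsets rather than copied substrings keeps each annotation tied to the exact source text, and recording the annotator makes it possible to compare annotations from several people later.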
Taxonomy
Topics: Natural Language Processing Techniques · Computational and Text Analysis Methods · Text Readability and Simplification
