Quotations, Coreference Resolution, and Sentiment Annotations in Croatian News Articles: An Exploratory Study
Jelena Sarajlić, Gaurish Thakkar, Diego Alves, Nives Mikelić Preradović

TL;DR
This study introduces a Croatian news corpus annotated for direct-speech extraction, co-reference resolution, and sentiment analysis. It highlights language-specific challenges and provides a resource for NLP tasks.
Contribution
It presents a new annotated Croatian news corpus focusing on quotations, co-reference, and sentiment, with analysis of language-specific differences from English.
Findings
Identified language-specific annotation challenges
Created a Croatian annotated corpus for NLP tasks
Analyzed differences between Croatian and English annotations
Abstract
This paper presents a corpus annotated for the task of direct-speech extraction in Croatian. The paper focuses on the annotation of quotations, co-reference resolution, and sentiment in the Croatian portion of the SETimes news corpus, and on the analysis of its language-specific differences compared to English. From this analysis, a list of the phenomena that require special attention when performing these annotations is derived. The resulting corpus, annotated with quotation features, can be used for multiple tasks in the field of Natural Language Processing.
Taxonomy
Topics: Topic Modeling · Advanced Text Analysis Techniques · Sentiment Analysis and Opinion Mining
