Creating a contemporary corpus of similes in Serbian by using natural language processing

Nikola Milosevic; Goran Nenadic

arXiv:1811.10422 · cs.CL · November 27, 2018 · 1 citation



Open Access

TL;DR

This paper develops a semi-automated method that uses text mining and machine learning to collect Serbian similes from the web. Together with crowdsourcing, it expands an existing corpus to 787 unique similes.

Contribution

It introduces a methodology that combines text mining, machine learning, and crowdsourcing to build a corpus of Serbian similes.

Findings

01. Expanded the Serbian simile corpus to 787 entries.

02. Demonstrated effectiveness of semi-automated collection methods.

03. Integrated crowdsourcing to enhance data collection.

Abstract

A simile is a figure of speech that compares two things by means of connecting words, where the comparison is not intended to be taken literally. Similes are often used in everyday communication, but they are also a part of linguistic cultural heritage. In this paper we present a methodology for semi-automated collection of similes from the World Wide Web using text mining and machine learning techniques. We expanded an existing corpus, collected by Vuk Stefanovic Karadzic and containing 333 similes, by adding 442 similes collected from the internet. We also introduce crowdsourcing to the collection of figures of speech, which helped us build a corpus containing 787 unique similes.
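As a rough illustration of what pattern-based candidate collection could look like, the sketch below scans text for phrases built around the Serbian connecting word "kao" ("as", "like") and keeps those that recur. It is not the authors' pipeline: the single-word pattern, the function names `extract_candidates` and `frequent_candidates`, the `min_count` threshold, and the sample sentences are all assumptions made for this example, and the machine learning step mentioned in the abstract is not modeled here.

```python
import re
from collections import Counter

# Assumed pattern: one word, the connective "kao", then one word,
# e.g. "hladan kao led" ("cold as ice"). Real similes can be longer.
SIMILE_PATTERN = re.compile(r"\b(\w+)\s+kao\s+(\w+)\b", re.IGNORECASE)


def extract_candidates(texts):
    """Count lowercase 'X kao Y' candidate phrases across all texts."""
    counts = Counter()
    for text in texts:
        for left, right in SIMILE_PATTERN.findall(text):
            counts[f"{left.lower()} kao {right.lower()}"] += 1
    return counts


def frequent_candidates(texts, min_count=2):
    """Return candidates seen at least min_count times, sorted alphabetically."""
    counts = extract_candidates(texts)
    return sorted(phrase for phrase, n in counts.items() if n >= min_count)


if __name__ == "__main__":
    # Hypothetical sample sentences, written for this example only.
    sample = [
        "Bio je hladan kao led.",
        "Zimi je vazduh hladan kao led!",
        "Radi kao inženjer.",
    ]
    print(frequent_candidates(sample))
```

A frequency threshold like this only narrows the list of candidates. Literal comparisons such as "radi kao inženjer" ("works as an engineer") also match the pattern, which is why a manual or learned filtering step would still be needed before anything is added to a corpus.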


Taxonomy

Topics: Natural Language Processing Techniques · Lexicography and Language Studies · Linguistics and Terminology Studies