As Cool as a Cucumber: Towards a Corpus of Contemporary Similes in Serbian

Nikola Milosevic; Goran Nenadic

arXiv:1605.06319·cs.CL·May 23, 2016


TL;DR

This paper introduces a semi-automated method for collecting contemporary Serbian similes from the web, expanding an existing corpus and exploring crowdsourcing for further curation.

Contribution

It presents a novel methodology combining text mining and crowdsourcing to build and update a corpus of modern Serbian similes.

Findings

01. Expanded the simile corpus from 333 to 779 expressions.
02. Showed that text mining can extract similes from web text, yielding 446 additional expressions.
03. Explored crowdsourcing as a tool for extracting and curating new similes.

Abstract

Similes are natural language expressions used to compare unlikely things, where the comparison is not taken literally. They are often used in everyday communication and are an important part of cultural heritage. Keeping a corpus of similes up to date is challenging, as they are constantly coined and/or adapted to contemporary times. In this paper we present a methodology for semi-automated collection of similes from the World Wide Web using text mining techniques. We expanded an existing corpus of traditional similes (containing 333 similes) by collecting 446 additional expressions. We also explore how crowdsourcing can be used to extract and curate new similes.
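
The semi-automated collection step described in the abstract can be illustrated with a minimal sketch. It assumes, purely for illustration, that candidate similes are matched with a single surface pattern built around the Serbian comparison word "kao" ("as"/"like"), counted across a set of web texts, and filtered against the already known corpus before manual curation. The pattern, the function name `extract_candidates`, and the `min_count` threshold are hypothetical and are not taken from the paper or its repository.

```python
import re
from collections import Counter

# Assumed pattern: "<word> kao <word>", e.g. "hladan kao led" ("cold as ice").
# The paper's actual pattern set may differ; this is only an illustration.
SIMILE_PATTERN = re.compile(r"\b(\w+)\s+kao\s+(\w+)\b", re.IGNORECASE)


def extract_candidates(texts, known_similes, min_count=2):
    """Return (simile, count) pairs that occur at least `min_count` times
    and are not already present in the known corpus."""
    counts = Counter()
    for text in texts:
        for left, right in SIMILE_PATTERN.findall(text):
            # Normalise case so repeated mentions are counted together.
            counts[f"{left.lower()} kao {right.lower()}"] += 1
    known = {s.lower() for s in known_similes}
    return [
        (simile, n)
        for simile, n in counts.most_common()
        if n >= min_count and simile not in known
    ]


# Toy input standing in for crawled web text.
texts = [
    "Bio je hladan kao led.",
    "Voda je hladna kao led, a on hladan kao led.",
    "Brz kao munja!",
    "brz kao munja i spor kao puž",
]
known = ["spor kao puž"]
candidates = extract_candidates(texts, known, min_count=2)
```

The frequency threshold is one simple way to reduce noise before a human reviews the list. Inflected variants such as "hladna kao led" are counted separately here, so a real pipeline would likely need lemmatisation or manual merging.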

Code & Models

Repositories

nikolamilosevic86/SerbianComparisonExtractor (Official)

Taxonomy

Topics: Natural Language Processing Techniques · Translation Studies and Practices · Topic Modeling