Building Tamil Treebanks

Kengatharaiyer Sarveswaran

arXiv:2409.14657·cs.CL·September 24, 2024

TL;DR

This paper explores methods for creating Tamil treebanks, highlighting manual, grammar-based, and machine learning approaches, and discusses the challenges faced in developing these linguistic resources for NLP applications.

Contribution

It presents a comprehensive overview of three different approaches to building Tamil treebanks and discusses associated challenges and solutions.

Findings

1. Manual annotation ensures high-quality linguistic data.

2. Machine learning approaches enable large-scale automated annotation.

3. Challenges include data quality, linguistic complexity, and resource availability.

Abstract

Treebanks are important linguistic resources: structured corpora with rich linguistic annotations. These resources are used in Natural Language Processing (NLP) applications, support linguistic analyses, and are essential for training and evaluating various computational models. This paper discusses the creation of Tamil treebanks using three distinct approaches: manual annotation, computational grammars, and machine learning techniques. Manual annotation, though time-consuming and requiring linguistic expertise, ensures high-quality and rich syntactic and semantic information. Computational deep grammars, such as Lexical Functional Grammar (LFG), offer deep linguistic analyses but necessitate significant knowledge of the formalism. Machine learning approaches, utilising off-the-shelf frameworks and tools like Stanza, UDPipe, and UUParser, facilitate the…
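To make "structured and annotated corpora" concrete, the following is a minimal sketch of how one dependency-annotated sentence might be represented and sanity-checked. It is not taken from the paper. The five-column, tab-separated layout is a simplification of the ten-column CoNLL-U format, and the sample sentence, its annotation, and the names `Token`, `parse_rows`, and `is_valid_tree` are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class Token:
    idx: int      # 1-based position in the sentence
    form: str     # surface word form
    upos: str     # coarse part-of-speech tag
    head: int     # index of the syntactic head (0 = root)
    deprel: str   # dependency relation to the head


# Illustrative sentence ("I read a book"); simplified columns:
# index, form, POS tag, head index, dependency relation.
SAMPLE = """\
1\tநான்\tPRON\t3\tnsubj
2\tபுத்தகம்\tNOUN\t3\tobj
3\tபடித்தேன்\tVERB\t0\troot
"""


def parse_rows(text):
    """Parse simplified tab-separated rows into Token objects."""
    tokens = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        idx, form, upos, head, deprel = line.split("\t")
        tokens.append(Token(int(idx), form, upos, int(head), deprel))
    return tokens


def is_valid_tree(tokens):
    """Check for exactly one root, known heads, and no cycles."""
    heads = {t.idx: t.head for t in tokens}
    if sum(1 for h in heads.values() if h == 0) != 1:
        return False
    if any(h != 0 and h not in heads for h in heads.values()):
        return False
    for start in heads:
        seen = set()
        cur = start
        while cur != 0:          # walk up towards the root
            if cur in seen:      # revisiting a node means a cycle
                return False
            seen.add(cur)
            cur = heads[cur]
    return True


if __name__ == "__main__":
    sentence = parse_rows(SAMPLE)
    for tok in sentence:
        print(tok.idx, tok.form, tok.upos, tok.head, tok.deprel)
    print("well-formed tree:", is_valid_tree(sentence))
```

A check of this kind is the sort of validation that could be run over annotations regardless of whether they were produced manually, by a grammar, or by a trained parser.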

Taxonomy

Topics: Language, Linguistics, Cultural Analysis · African history and culture analysis