Pre-processing of Domain Ontology Graph Generation System in Punjabi

Rajveer Kaur; Saurabh Sharma

arXiv:1411.5796 · cs.CL · November 24, 2014

TL;DR

This paper details the pre-processing steps for generating ontology graphs from Punjabi text, including text normalization, symbol removal, duplicate elimination, and term extraction using dictionaries.
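The input-restriction and symbol-removal steps can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. Here "input restriction" is read as requiring Gurmukhi text (Unicode block U+0A00 to U+0A7F). Any character that is neither Gurmukhi nor whitespace is treated as a special symbol or punctuation mark, including the danda (।, U+0964). The function names check_input and remove_symbols are hypothetical.

```python
import re

# Assumption: "input restriction" is read as "the document must contain
# Gurmukhi text". Characters are checked against the Unicode Gurmukhi
# block (U+0A00-U+0A7F).
GURMUKHI = re.compile(r"[\u0A00-\u0A7F]")

# Anything that is neither a Gurmukhi character nor whitespace is treated
# as a special symbol or punctuation mark. This also catches the danda
# (U+0964) and double danda (U+0965), which lie outside the Gurmukhi block.
NON_GURMUKHI = re.compile(r"[^\u0A00-\u0A7F\s]")


def check_input(text: str) -> bool:
    """Hypothetical restriction: accept text containing at least one Gurmukhi character."""
    return bool(GURMUKHI.search(text))


def remove_symbols(text: str) -> str:
    """Replace symbols and punctuation with spaces, then collapse runs of whitespace."""
    cleaned = NON_GURMUKHI.sub(" ", text)
    return " ".join(cleaned.split())
```

Because this filter keeps only Gurmukhi characters, Latin letters and ASCII digits are dropped along with punctuation, while Gurmukhi digits (U+0A66 to U+0A6F) survive. A production system may need a finer-grained policy.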

Contribution

It introduces a pre-processing framework tailored to Punjabi text documents to support ontology graph generation.

Findings

01. Effective removal of special symbols and stop words
02. Improved accuracy in term extraction from Punjabi texts
03. Streamlined pre-processing pipeline for Punjabi ontology graph generation
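Stop word removal and duplicate elimination both operate on the token list left after symbol removal. The sketch below assumes simple whitespace tokenization. The stop word set is a small illustrative sample of common Punjabi function words, not the list used in the paper, and the function names are hypothetical.

```python
from collections.abc import Container

# Illustrative sample of common Punjabi function words (postpositions,
# a conjunction, the copula). This is NOT the paper's stop word list.
STOP_WORDS = frozenset({"ਦਾ", "ਦੇ", "ਦੀ", "ਨੂੰ", "ਤੋਂ", "ਵਿੱਚ", "ਅਤੇ", "ਹੈ"})


def tokenize(text: str) -> list[str]:
    """Split cleaned text on whitespace (assumed tokenization scheme)."""
    return text.split()


def remove_duplicates(tokens: list[str]) -> list[str]:
    """Drop repeated terms, keeping the first occurrence of each.

    dict.fromkeys preserves insertion order (guaranteed since Python 3.7).
    """
    return list(dict.fromkeys(tokens))


def remove_stop_words(tokens: list[str],
                      stop_words: Container[str] = STOP_WORDS) -> list[str]:
    """Filter out tokens that appear in the stop word collection."""
    return [t for t in tokens if t not in stop_words]
```

Applying remove_duplicates and then remove_stop_words to a tokenized sentence keeps each content word once and drops function words such as ਦੀ. The order of the two steps does not change the final set of terms.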

Abstract

This paper describes the pre-processing phase of an ontology graph generation system for Punjabi text documents from different domains. Pre-processing converts the input text into a structured representation. In this system it consists of applying input restrictions to the text, removing special symbols and punctuation marks, removing duplicate terms, removing stop words, and extracting terms by matching the input terms against dictionary and gazetteer list entries.
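The final step, term extraction, keeps only tokens that match an entry in the dictionary or in one of the gazetteer lists. A minimal sketch follows, assuming exact single-token matching. The lexicon contents, the gazetteer category names ("place", "person"), and the function name extract_terms are hypothetical placeholders, not the resources described in the paper.

```python
from collections.abc import Iterable, Mapping, Set

# Hypothetical placeholder lexicons. The paper's dictionary and gazetteer
# lists are not reproduced here.
DICTIONARY = frozenset({"ਕਣਕ", "ਫ਼ਸਲ", "ਖੇਤੀ"})
GAZETTEERS = {
    "place": frozenset({"ਪੰਜਾਬ", "ਲੁਧਿਆਣਾ", "ਅੰਮ੍ਰਿਤਸਰ"}),
    "person": frozenset({"ਰਣਜੀਤ"}),
}


def extract_terms(tokens: Iterable[str],
                  dictionary: Set[str],
                  gazetteers: Mapping[str, Set[str]]) -> list[tuple[str, str]]:
    """Return (term, source) pairs for every token found in a lexicon.

    Tokens that match no lexicon are discarded. A token listed in several
    lexicons yields one pair per matching source.
    """
    terms: list[tuple[str, str]] = []
    for token in tokens:
        if token in dictionary:
            terms.append((token, "dictionary"))
        for name, entries in gazetteers.items():
            if token in entries:
                terms.append((token, name))
    return terms
```

Because matching here is exact and token-based, multi-word gazetteer entries (such as full names) and inflected word forms would need extra handling, for example n-gram lookup or stemming.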
