Semi-Automatic Indexing of Multilingual Documents

Ulrich Schiel; Ianna M. Sodre Ferreira de Souza; Edberto Ferneda

arXiv:cs/9902022·cs.DL·May 23, 2007

TL;DR

This paper introduces a semi-automatic method for indexing multilingual electronic documents, utilizing dictionaries and user input to resolve ambiguities, build a multilingual thesaurus, and improve information retrieval across languages.

Contribution

It presents a novel semi-automatic approach for creating and updating a multilingual thesaurus to enhance document indexing and retrieval in digital libraries.

Findings

01 Effective handling of multilingual document indexing

02 Incremental updating of the multilingual thesaurus

03 Improved retrieval accuracy across languages

Abstract

With the growing significance of digital libraries and the Internet, more and more electronic texts are becoming accessible to a wide and geographically dispersed public. This requires adequate tools to facilitate the indexing, storage, and retrieval of documents written in different languages. We present a method for semi-automatic indexing of electronic documents and construction of a multilingual thesaurus, which can be used for query formulation and information retrieval. We use special dictionaries and user interaction to resolve ambiguities and to find adequate canonical terms in each document's language together with adequate abstract, language-independent terms. The abstract thesaurus is updated incrementally as new documents are indexed and is used to search the document base for documents matching the terms of a query.
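The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the dictionary contents, the concept identifiers (`C_LIBRARY` and so on), the `Thesaurus` class, and the `choose` callback that stands in for user interaction are all hypothetical, and matching whole lowercase words is a simplifying assumption.

```python
from collections import defaultdict

# Hypothetical per-language dictionaries: surface word -> candidate canonical terms.
CANONICAL = {
    "en": {
        "library": ["library"],
        "libraries": ["library"],
        "bank": ["bank (money)", "bank (river)"],
    },
    "pt": {
        "biblioteca": ["biblioteca"],
        "bibliotecas": ["biblioteca"],
        "banco": ["banco (dinheiro)", "banco (assento)"],
    },
}

# Hypothetical mapping: (language, canonical term) -> language-independent concept.
CONCEPT = {
    ("en", "library"): "C_LIBRARY",
    ("pt", "biblioteca"): "C_LIBRARY",
    ("en", "bank (money)"): "C_BANK_MONEY",
    ("pt", "banco (dinheiro)"): "C_BANK_MONEY",
    ("en", "bank (river)"): "C_RIVERBANK",
    ("pt", "banco (assento)"): "C_BENCH",
}


class Thesaurus:
    """Abstract thesaurus that grows as documents are indexed."""

    def __init__(self):
        self.postings = defaultdict(set)  # concept -> ids of documents using it
        self.labels = defaultdict(dict)   # concept -> {language: canonical term}

    def _concepts(self, lang, text, choose):
        # Map each known word to a concept; unknown words are skipped.
        for word in text.lower().split():
            candidates = CANONICAL[lang].get(word)
            if not candidates:
                continue
            # The "semi-automatic" step: ask the user only when ambiguous.
            term = candidates[0] if len(candidates) == 1 else choose(word, candidates)
            yield term, CONCEPT[(lang, term)]

    def index(self, doc_id, lang, text, choose):
        # Incremental update: each new document extends postings and labels.
        for term, concept in self._concepts(lang, text, choose):
            self.labels[concept][lang] = term
            self.postings[concept].add(doc_id)

    def search(self, lang, query, choose):
        # Return ids of documents sharing at least one concept with the query.
        hits = set()
        for _, concept in self._concepts(lang, query, choose):
            hits |= self.postings[concept]
        return sorted(hits)


def pick_first(word, candidates):
    # Stand-in for the interactive prompt: always take the first sense.
    return candidates[0]


thesaurus = Thesaurus()
thesaurus.index("d1", "en", "digital libraries", pick_first)
thesaurus.index("d2", "pt", "bibliotecas digitais", pick_first)
print(thesaurus.search("en", "library", pick_first))
```

Because both documents are stored under the same language-independent concept, an English query also retrieves the Portuguese document. In an interactive setting, `choose` would present the candidate senses to the person doing the indexing instead of picking one automatically.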

Taxonomy

Topics: Semantic Web and Ontologies · Advanced Text Analysis Techniques · Natural Language Processing Techniques