The ALVIS Format for Linguistically Annotated Documents
Adeline Nazarenko (LIPN), Erick Alphonse (LIPN), Julien Derivière (LIPN), Thierry Hamon (LIPN), Guillaume Vauvert (LIPN), Davy Weissenbacher (LIPN)

TL;DR
The paper introduces the ALVIS annotation format, a standardized XML-based stand-off annotation system for linguistically annotated documents, exemplified on biological domain texts for improved topic-specific search engine indexing.
Contribution
It presents a novel, standardized annotation format tailored for large document collections, emphasizing stand-off annotations and XML encoding for enhanced search engine integration.
Findings
Effective indexing of biological documents demonstrated
Stand-off annotations facilitate flexible linguistic analysis
XML encoding supports interoperability and scalability
Abstract
This paper describes the ALVIS annotation format, designed for indexing large collections of documents in topic-specific search engines. The format is illustrated on the biological domain using MedLine abstracts, since developing a specialized search engine for biologists is one of the ALVIS case studies. The ALVIS approach to linguistic annotation builds on existing work and proposed standards. We chose stand-off annotations rather than inserted mark-up. Annotations are encoded as XML elements that form the linguistic subsection of the document record.
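To make the stand-off idea concrete, here is a minimal sketch in Python. It keeps the document text untouched and stores each annotation as a separate XML element that points into the text by character offset. The element and attribute names (`documentRecord`, `canonicalText`, `linguisticAnalysis`, `annotation`, `from`, `length`) and the sample sentence are illustrative assumptions, not the actual ALVIS schema.

```python
import xml.etree.ElementTree as ET

# Sample text; it is never modified by the annotation process.
text = "Bacillus subtilis regulates sporulation."

# Hypothetical stand-off annotations: (id, type, start offset, length).
annotations = [
    ("t1", "token", 0, 8),
    ("t2", "token", 9, 8),
    ("e1", "named_entity", 0, 17),
]


def build_record(text, annotations):
    """Build a document record whose annotations live beside the text."""
    record = ET.Element("documentRecord")  # assumed element name
    ET.SubElement(record, "canonicalText").text = text
    ling = ET.SubElement(record, "linguisticAnalysis")  # assumed element name
    for ann_id, ann_type, start, length in annotations:
        ann = ET.SubElement(ling, "annotation", id=ann_id, type=ann_type)
        ET.SubElement(ann, "from").text = str(start)
        ET.SubElement(ann, "length").text = str(length)
    return record


def resolve(record, ann_id):
    """Return the text span that a stand-off annotation points to."""
    text = record.findtext("canonicalText")
    for ann in record.iter("annotation"):
        if ann.get("id") == ann_id:
            start = int(ann.findtext("from"))
            length = int(ann.findtext("length"))
            return text[start:start + length]
    raise KeyError(ann_id)


record = build_record(text, annotations)
print(ET.tostring(record, encoding="unicode"))
print(resolve(record, "e1"))
```

Because annotations only reference offsets, overlapping layers (here a token and a named entity covering the same characters) can coexist without nesting conflicts, which inserted mark-up would have to resolve inside the text itself.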
Taxonomy
Topics: Biomedical Text Mining and Ontologies · Natural Language Processing Techniques · Semantic Web and Ontologies
