Enhancing Content-And-Structure Information Retrieval using a Native XML Database
Jovan Pehcevski (RMIT), James A. Thom (RMIT), Anne-Marie Vercoustre

TL;DR
This paper presents a hybrid XML retrieval system that combines full-text search with native XML database querying to improve content-and-structure retrieval effectiveness, and that ranks the final answer elements with a novel model based on their structural relationships.
Contribution
The paper introduces a hybrid retrieval approach and a new ranking model based on structural relationships, enhancing native XML database retrieval effectiveness for content-and-structure tasks.
Findings
Hybrid system outperforms Zettair by 1.8 times
Hybrid system outperforms eXist by 3 times
Structural relationship-based ranking improves retrieval accuracy
Abstract
Three approaches to content-and-structure XML retrieval are analysed in this paper: first, using Zettair, a full-text information retrieval system; second, using eXist, a native XML database; and third, using a hybrid XML retrieval system that uses eXist to produce the final answers from likely relevant articles retrieved by Zettair. INEX 2003 content-and-structure topics can be classified into two categories: the first retrieves full articles as final answers, and the second retrieves more specific elements within articles as final answers. We show that for both topic categories our initial hybrid system improves the retrieval effectiveness of a native XML database. For ranking the final answer elements, we propose and evaluate a novel retrieval model that utilises the structural relationships between the answer elements of a native XML database and retrieves Coherent Retrieval…
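The two-stage hybrid pipeline described in the abstract (a full-text system picks likely relevant articles, then a structural engine extracts answer elements from them) can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation. `full_text_rank` is a hypothetical stand-in for Zettair that scores articles by plain query-term frequency; `structural_answers` is a hypothetical stand-in for an eXist query that returns elements of a target tag containing a query term. The rule that merges two or more sibling answers into their shared parent is an illustrative structural heuristic only; it is not the paper's Coherent Retrieval model, whose details are truncated above. All function and variable names are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def full_text_rank(query_terms, articles, k):
    """Hypothetical stage-1 stand-in for Zettair: rank article ids by
    raw query-term frequency and keep the top k (ties broken by id)."""
    scores = {}
    for art_id, xml_text in articles.items():
        words = " ".join(ET.fromstring(xml_text).itertext()).lower().split()
        score = sum(words.count(t) for t in query_terms)
        if score > 0:
            scores[art_id] = score
    return sorted(scores, key=lambda a: (-scores[a], a))[:k]


def element_path(elem, parents):
    """Build an XPath-like path such as /article[1]/sec[2]/p[1]."""
    steps = []
    while elem is not None:
        parent = parents.get(elem)
        if parent is None:
            steps.append(f"{elem.tag}[1]")
        else:
            same_tag = [c for c in parent if c.tag == elem.tag]
            steps.append(f"{elem.tag}[{same_tag.index(elem) + 1}]")
        elem = parent
    return "/" + "/".join(reversed(steps))


def structural_answers(article_xml, target_tag, query_terms):
    """Hypothetical stage-2 stand-in for eXist: find target elements that
    contain a query term, then (illustrative rule) replace two or more
    hits sharing one parent by that parent, scored by the hit count."""
    root = ET.fromstring(article_xml)
    # Child -> parent map, since ElementTree elements have no parent pointer.
    parents = {child: parent for parent in root.iter() for child in parent}
    hits = [e for e in root.iter(target_tag)
            if set(" ".join(e.itertext()).lower().split()) & set(query_terms)]
    by_parent = defaultdict(list)
    for e in hits:
        by_parent[parents.get(e)].append(e)
    answers = []
    for parent, children in by_parent.items():
        if parent is not None and len(children) >= 2:
            answers.append((len(children), element_path(parent, parents)))
        else:
            answers.extend((1, element_path(c, parents)) for c in children)
    return answers


def hybrid_search(query_terms, target_tag, articles, k=2):
    """Run stage 2 only on the top-k articles from stage 1; order answers
    by article rank first, then by structural score (higher first)."""
    results = []
    for rank, art_id in enumerate(full_text_rank(query_terms, articles, k)):
        for score, path in structural_answers(articles[art_id], target_tag, query_terms):
            results.append((rank, -score, art_id, path))
    results.sort()
    return [(art_id, path, -neg) for _, neg, art_id, path in results]


# Toy collection used only to illustrate the sketch.
SAMPLE_ARTICLES = {
    "a1": "<article><sec><p>xml retrieval</p><p>xml ranking</p></sec><sec><p>other</p></sec></article>",
    "a2": "<article><sec><p>database</p><p>xml</p></sec></article>",
    "a3": "<article><sec><p>unrelated</p></sec></article>",
}

if __name__ == "__main__":
    for answer in hybrid_search(["xml"], "p", SAMPLE_ARTICLES):
        print(answer)
```

On the toy collection, article `a1` ranks first in stage 1, and its two matching paragraphs share a section, so the sketch returns that section as a single answer ahead of the lone matching paragraph in `a2`. Restricting stage 2 to the top-k articles reflects the hybrid idea of letting the full-text system choose articles and the structural engine choose elements within them; `k=2` is an arbitrary choice for the example.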
Taxonomy
Topics: Advanced Database Systems and Queries · Semantic Web and Ontologies · Data Management and Algorithms
