Hybrid XML Retrieval: Combining Information Retrieval and a Native XML Database

Jovan Pehcevski (RMIT); James A. Thom (RMIT); Anne-Marie Vercoustre

arXiv:cs/0507070·cs.IR·May 23, 2007

TL;DR

This paper presents a hybrid XML retrieval system that combines a full-text search engine (Zettair) with a native XML database (eXist), and reports significantly better retrieval effectiveness than either system alone on the INEX 2003 benchmark.

Contribution

The paper introduces a novel hybrid retrieval approach that dynamically selects retrieval units called 'Coherent Retrieval Elements' for improved XML document retrieval.

Findings

01 Hybrid system outperforms individual systems in effectiveness.

02 Dynamic retrieval unit selection enhances retrieval robustness.

03 System shows significant improvements on INEX 2003 data.

Abstract

This paper investigates the impact of three approaches to XML retrieval: using Zettair, a full-text information retrieval system; using eXist, a native XML database; and using a hybrid system that takes full article answers from Zettair and uses eXist to extract elements from those articles. For the content-only topics, we undertake a preliminary analysis of the INEX 2003 relevance assessments in order to identify the types of highly relevant document components. Further analysis identifies two complementary sub-cases of relevance assessments ("General" and "Specific") and two categories of topics ("Broad" and "Narrow"). We develop a novel retrieval module that for a content-only topic utilises the information from the resulting answer list of a native XML database and dynamically determines the preferable units of retrieval, which we call "Coherent Retrieval Elements". The results of…
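
The two-stage flow described in the abstract (whole articles from a full-text engine, then elements extracted from those articles) can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: `rank_articles` is a toy term-overlap ranker standing in for Zettair, `extract_elements` uses the standard library's `xml.etree.ElementTree` as a stand-in for eXist, and the two sample articles are invented. It does not use either system's real API and does not attempt to model Coherent Retrieval Elements.

```python
import xml.etree.ElementTree as ET

# Hypothetical toy collection (article id -> XML source); not INEX data.
ARTICLES = {
    "a1": "<article><sec><p>XML retrieval with a native XML database</p></sec>"
          "<sec><p>Unrelated notes on compilers</p></sec></article>",
    "a2": "<article><sec><p>Cooking pasta at home</p></sec></article>",
}


def terms(text):
    """Lowercase whitespace tokenisation; a deliberately crude assumption."""
    return set(text.lower().split())


def rank_articles(query, articles, top_k=1):
    """Stand-in for the full-text stage: rank whole articles by term overlap."""
    q = terms(query)
    scored = []
    for doc_id, xml in articles.items():
        full_text = " ".join(ET.fromstring(xml).itertext())
        scored.append((len(q & terms(full_text)), doc_id))
    # Highest overlap first; ties broken by article id for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for score, doc_id in scored[:top_k] if score > 0]


def extract_elements(query, xml):
    """Stand-in for the XML database stage: leaf elements sharing a query term."""
    q = terms(query)
    hits = []
    for elem in ET.fromstring(xml).iter():
        is_leaf = len(elem) == 0
        if is_leaf and elem.text and q & terms(elem.text):
            hits.append((elem.tag, elem.text))
    return hits


def hybrid_search(query, articles, top_k=1):
    """Stage 1 picks articles; stage 2 extracts elements only from those."""
    results = []
    for doc_id in rank_articles(query, articles, top_k):
        for tag, text in extract_elements(query, articles[doc_id]):
            results.append((doc_id, tag, text))
    return results


print(hybrid_search("XML database", ARTICLES))
```

The point of the split is that element extraction only runs over the articles the first stage already ranked highly, so the element-level stage never has to search the whole collection.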
