STELLA: Self-Reflective Terminology-Aware Framework for Building an Aerospace Information Retrieval Benchmark
Bongmin Kim

TL;DR
The paper introduces STELLA, an aerospace-specific information retrieval (IR) benchmark built with a systematic pipeline. Its distinct query types allow the lexical and semantic retrieval capabilities of models to be evaluated separately.
Contribution
The paper presents an aerospace-specific IR benchmark, a systematic pipeline for constructing the dataset, and multiple query types that support disentangled evaluation of retrieval models.
Findings
Large decoder-based models excel in semantic understanding.
Lexical matching methods like BM25 are highly effective for exact term retrieval.
The benchmark facilitates reliable evaluation and improvement of IR models in aerospace.
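To illustrate why lexical matching does well when the query repeats a term exactly, here is a minimal sketch of Okapi BM25 scoring in plain Python. It is not the paper's evaluation code: the whitespace tokenizer, the parameter values k1=1.5 and b=0.75, and the three toy passages are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenization is enough for this toy example.
    return text.lower().split()


def bm25_scores(query, passages, k1=1.5, b=0.75):
    """Score each passage against the query with Okapi BM25 (illustrative sketch)."""
    docs = [tokenize(p) for p in passages]
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n

    # Document frequency: number of passages containing each term.
    df = Counter()
    for d in docs:
        df.update(set(d))

    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue  # terms absent from the passage contribute nothing
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(d) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


# Hypothetical passages, not taken from the benchmark.
passages = [
    "the ablative heat shield protects the capsule during reentry",
    "thermal protection keeps the vehicle cool during descent",
    "the solar array supplies power to the spacecraft bus",
]
scores = bm25_scores("ablative heat shield", passages)
best = max(range(len(passages)), key=scores.__getitem__)
```

In this toy setup only the first passage shares tokens with the query, so it is the only one with a nonzero score. The second passage covers a related idea in different words and scores zero, which is the kind of gap that semantic retrievers are meant to close.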
Abstract
Tasks in the aerospace industry heavily rely on searching and reusing large volumes of technical documents, yet there is no public information retrieval (IR) benchmark that reflects the terminology- and query-intent characteristics of this domain. To address this gap, this paper proposes the STELLA (Self-Reflective TErminoLogy-Aware Framework for BuiLding an Aerospace Information Retrieval Benchmark) framework. Using this framework, we introduce the STELLA benchmark, an aerospace-specific IR evaluation set constructed from NASA Technical Reports Server (NTRS) documents via a systematic pipeline that comprises document layout detection, passage chunking, terminology dictionary construction, synthetic query generation, and cross-lingual extension. The framework generates two types of queries: the Terminology Concordant Query (TCQ), which includes the terminology verbatim to evaluate…
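The abstract describes a Terminology Concordant Query as one that includes the terminology verbatim. A minimal sketch of how such a property could be checked for a generated query follows. The function name, the case-insensitive comparison, and the whole-word boundary rule are assumptions for illustration, not details confirmed by the paper.

```python
import re


def contains_term_verbatim(query, term):
    """Return True if `term` appears in `query` as a whole word or phrase.

    Assumptions (not from the paper): matching is case-insensitive, and the
    term must not be glued to adjacent word characters on either side.
    """
    pattern = r"(?<!\w)" + re.escape(term) + r"(?!\w)"
    return re.search(pattern, query, flags=re.IGNORECASE) is not None


# Hypothetical dictionary term and queries, for illustration only.
term = "ablative heat shield"
concordant = "What limits the mass of an ablative heat shield?"
paraphrased = "What limits the mass of the thermal protection layer?"

is_concordant = contains_term_verbatim(concordant, term)
is_paraphrase_concordant = contains_term_verbatim(paraphrased, term)
```

A stricter or looser rule (for example, allowing plural forms) would change which queries count as concordant, so the boundary rule here should be read as one possible design choice.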
Taxonomy
Topics: Topic Modeling · Information Retrieval and Search Behavior · Advanced Graph Neural Networks
