# Trace-LogVector-Based Relational Retrieval for Conversational System Log Analysis

**Authors:** Sun-Chul Park, Young-Han Kim

PMC · DOI: 10.3390/s26061806 · Sensors (Basel, Switzerland) · 2026-03-12

## TL;DR

A new log representation called Trace-LogVector improves retrieval accuracy in system log analysis by preserving execution contexts and entity relationships.

## Contribution

The paper introduces Trace-LogVector (TLV), a relational log representation that outperforms single-chunk representations in retrieval-augmented generation (RAG) for system logs.

## Key findings

- Multi-chunk Trace-LogVector (TLV) achieves a Hit@5 of 1.000 and an MRR@5 of 0.900 in system log retrieval.
- Relational log representations improve retrieval performance by preserving execution flow and entity relationships.
- Multi-chunk TLV outperforms single-chunk representations across all evaluation queries.
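
The two metrics named above can be made concrete with a short sketch. This summary does not include the paper's evaluation script, so the code below uses the common textbook definitions: Hit@k is the fraction of queries with at least one relevant item in the top k, and MRR@k is the mean of 1/rank of the first relevant item in the top k (0 when there is none). The function names and the `toy_runs` data are hypothetical and are not the paper's queries or results.

```python
from typing import List, Sequence, Set, Tuple

# One evaluation run per query: (ranked result ids, set of relevant ids).
Run = Tuple[Sequence[str], Set[str]]


def first_relevant_rank(ranked_ids: Sequence[str], relevant: Set[str], k: int) -> int:
    """Return the 1-based rank of the first relevant id in the top k, or 0 if none."""
    for rank, item_id in enumerate(ranked_ids[:k], start=1):
        if item_id in relevant:
            return rank
    return 0


def hit_at_k(runs: List[Run], k: int = 5) -> float:
    """Fraction of queries with at least one relevant id in the top k."""
    hits = sum(1 for ranked, rel in runs if first_relevant_rank(ranked, rel, k) > 0)
    return hits / len(runs)


def mrr_at_k(runs: List[Run], k: int = 5) -> float:
    """Mean reciprocal rank of the first relevant id, counting misses as 0."""
    total = 0.0
    for ranked, rel in runs:
        rank = first_relevant_rank(ranked, rel, k)
        if rank:
            total += 1.0 / rank
    return total / len(runs)


# Made-up rankings (assumption): four queries hit at rank 1, one at rank 2.
toy_runs: List[Run] = [
    (["a", "b", "c", "d", "e"], {"a"}),
    (["f", "g", "h", "i", "j"], {"f"}),
    (["k", "l", "m", "n", "o"], {"k"}),
    (["p", "q", "r", "s", "t"], {"p"}),
    (["u", "v", "w", "x", "y"], {"v"}),
]

print(f"Hit@5 = {hit_at_k(toy_runs):.3f}, MRR@5 = {mrr_at_k(toy_runs):.3f}")
```

In the toy data every query has a relevant result in the top five, but one query finds it only at rank 2. This shows how a perfect Hit@5 can coexist with an MRR@5 below 1. It is one of many rank patterns with that property, not a reconstruction of the paper's per-query results.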

## Abstract

What are the main findings?

- A relational log representation, Trace-LogVector (TLV), consistently improves retrieval accuracy in RAG-based system log analysis compared to single-chunk representations.
- Multi-chunk construction based on the Chunk as Relational Data (CARD) principle preserves execution contexts and entity relationships, leading to substantial gains in Hit@5 and MRR@5.

What are the implications of the main findings?

- Retrieval performance in conversational system log analysis is strongly influenced by the granularity and structure of log representations, not solely by the embedding model or retrieval algorithm.
- Relational log representations provide an effective design principle for applying RAG to sensor-driven and cloud-based system analysis tasks.

System logs generated in IoT-based and sensor-driven cloud environments encode execution traces and complex relationships among services, functions, and data stores. In many IoT deployments, telemetry is pre-processed at the edge and then integrated into backend services (e.g., application servers and databases) for analytics and operations. During this integration, service executions record relational dependencies (e.g., function-to-data-store interactions) as operational logs (or aggregated statistics), which constitute key evidence for operating sensor-driven services. We therefore evaluate TLV using publicly reproducible backend execution logs as a representative backend model and discuss the generality and limitations of this choice.

However, most existing retrieval-augmented generation (RAG) approaches remain document-centric, representing logs as flat textual chunks that fail to preserve execution flow and entity relationships, which are critical for diagnosing complex service execution pipelines in sensor-driven cloud backends. In this study, we propose Trace-LogVector (TLV), a relational log representation that transforms system logs into trace-level retrieval units while explicitly preserving execution order and entity interactions. TLV is constructed based on the Chunk as Relational Data (CARD) design principle, which represents execution flows using entity-centric multi-chunk structures rather than single aggregated text chunks.

To evaluate the impact of relational log representation, we conduct controlled experiments comparing single-chunk and CARD-based multi-chunk TLV under identical embedding and retrieval settings. Retrieval performance is quantitatively assessed using Hit@5 and Mean Reciprocal Rank at 5 (MRR@5). Experimental results show that the proposed multi-chunk TLV achieves a Hit@5 of 1.000 and an MRR@5 of 0.900, consistently outperforming the single-chunk baseline across all evaluation queries. These findings demonstrate that preserving execution contexts and entity relationships as relational retrieval units is a key factor in improving RAG-based system log analysis for monitoring and diagnosing large-scale sensor networks and cloud systems.
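
To illustrate the contrast between a single aggregated chunk and entity-centric multi-chunk structures, here is a minimal sketch. The abstract does not specify the CARD chunk schema, so everything here is an assumption made for illustration: the `LogEvent` fields, the `Chunk` layout, the `kind:name` entity labels, and the sample events. It should not be read as the authors' implementation.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class LogEvent:
    """Hypothetical parsed log record; field names are assumptions."""
    trace_id: str
    seq: int          # position of the event within its trace
    service: str
    function: str
    data_store: str   # empty string when no data store is touched
    message: str


@dataclass(frozen=True)
class Chunk:
    """Hypothetical retrieval unit: text to embed plus its trace and entity."""
    trace_id: str
    entity: str       # "trace" for the single-chunk case, else "kind:name"
    text: str


def _render(e: LogEvent) -> str:
    target = f" -> {e.data_store}" if e.data_store else ""
    return f"[{e.seq}] {e.service}.{e.function}{target}: {e.message}"


def single_chunk(events: List[LogEvent]) -> Chunk:
    """Baseline: the whole trace flattened into one text chunk, in order."""
    ordered = sorted(events, key=lambda e: e.seq)
    text = "\n".join(_render(e) for e in ordered)
    return Chunk(ordered[0].trace_id, "trace", text)


def multi_chunk(events: List[LogEvent]) -> List[Chunk]:
    """Illustrative multi-chunk form: one ordered chunk per entity in the trace."""
    ordered = sorted(events, key=lambda e: e.seq)
    by_entity: Dict[str, List[LogEvent]] = defaultdict(list)
    for e in ordered:
        by_entity[f"service:{e.service}"].append(e)
        by_entity[f"function:{e.function}"].append(e)
        if e.data_store:
            by_entity[f"data_store:{e.data_store}"].append(e)
    trace_id = ordered[0].trace_id
    return [
        Chunk(trace_id, entity,
              f"{entity} in trace {trace_id}\n" + "\n".join(_render(e) for e in evs))
        for entity, evs in by_entity.items()
    ]


# Made-up events for a single trace (assumption, not data from the paper).
events = [
    LogEvent("tr-1", 1, "api", "create_order", "", "request received"),
    LogEvent("tr-1", 2, "orders", "insert_order", "orders_db", "row written"),
    LogEvent("tr-1", 3, "orders", "publish_event", "event_queue", "order.created sent"),
]

baseline = single_chunk(events)
relational = multi_chunk(events)
```

In this sketch every chunk keeps its `trace_id`, so a match on an entity-level chunk can still be mapped back to the trace it came from. Each chunk also lists its events in `seq` order, so execution order within an entity's view is kept. Both representations could then be embedded and searched with the same model and retriever, which mirrors the controlled comparison described in the abstract.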

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC13030721/full.md

## Figures

3 figures with captions in the complete paper: https://tomesphere.com/paper/PMC13030721/full.md

## References

22 references — full list in the complete paper: https://tomesphere.com/paper/PMC13030721/full.md

---
Source: https://tomesphere.com/paper/PMC13030721