Parametric Knowledge and Retrieval Behavior in RAG Fine-Tuning for Electronic Design Automation

Julian Oestreich; Maximilian Bley; Frank Binder; Lydia Müller; Maksym Sydorenko; André Alcalde

arXiv:2603.23047 · cs.CL · March 25, 2026

TL;DR

This paper evaluates RAG fine-tuning for long-form text generation in electronic design automation, introducing a new evaluation pipeline and metric that better capture factual accuracy and internalized knowledge.

Contribution

The paper presents TriFEX, a human-validated, triple-based evaluation pipeline, and Parametric Knowledge Precision (PKP), a metric that isolates internalized knowledge. Together, they reveal limitations of existing metrics.

Findings

01 ROUGE and BERTScore fail to detect factual differences.
02 PKP effectively measures the accuracy of internalized knowledge.
03 Smaller models outperform larger baselines on key metrics.
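The first finding can be made concrete with a toy example. This sketch is illustrative only and is not taken from the paper. It uses a simplified ROUGE-1 F1 (unigram overlap on lowercased whitespace tokens, no stemming) instead of an official ROUGE implementation. Its triples are also written by hand, whereas TriFEX extracts and validates them in its own pipeline. The hypothetical sentences state the usual static timing convention (setup checked at the slow corner, hold at the fast corner) alongside a version with the corners swapped.

```python
from __future__ import annotations

from collections import Counter


def rouge1_f1(candidate: str, reference: str) -> float:
    """Simplified ROUGE-1 F1: unigram overlap on lowercased whitespace tokens."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())  # clipped unigram matches
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


def triple_precision(candidate: set[tuple[str, str, str]],
                     reference: set[tuple[str, str, str]]) -> float:
    """Share of candidate (subject, relation, object) triples found in the reference."""
    if not candidate:
        return 0.0
    return len(candidate & reference) / len(candidate)


# Hypothetical EDA statement and a factually wrong paraphrase (corners swapped).
reference = "setup checks use the slow corner and hold checks use the fast corner"
swapped = "setup checks use the fast corner and hold checks use the slow corner"

# Hand-written triples for the same two sentences (hypothetical, not TriFEX output).
reference_triples = {("setup checks", "use", "slow corner"),
                     ("hold checks", "use", "fast corner")}
swapped_triples = {("setup checks", "use", "fast corner"),
                   ("hold checks", "use", "slow corner")}

print(rouge1_f1(swapped, reference))                         # 1.0
print(triple_precision(swapped_triples, reference_triples))  # 0.0
```

Both sentences contain exactly the same words, so unigram overlap is maximal even though both claims in the swapped sentence are wrong. Comparing claims as triples exposes the error. BERTScore is not reproduced here.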

Abstract

Retrieval-Augmented Generation (RAG) fine-tuning has shown substantial improvements over vanilla RAG, yet most studies target document question answering and often rely on standard NLP metrics that can obscure factual differences. We evaluate RAG fine-tuning for long-form text generation in electronic design automation, adapting a 7B model under five context augmentation strategies with varying retrieval conditions. We introduce TriFEX, a human-validated, triple-based evaluation pipeline that attributes generated claims to their origin (user query, context, and reference), and we propose Parametric Knowledge Precision (PKP), which isolates internalized knowledge by filtering out claims leaked in the prompt. We show that ROUGE and BERTScore fail to detect factual differences that our triple-based evaluation reveals. Additionally, we demonstrate that an existing metric for knowledge…
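To illustrate the idea behind PKP, here is a minimal sketch. The abstract does not give the exact formula, so the definition is an assumption. The sketch assumes each generated claim has already been attributed (as TriFEX does) using hypothetical boolean flags `in_query`, `in_context`, and `in_reference`. Any claim found in the query or retrieved context is treated as leaked from the prompt, and precision is the share of the remaining claims that the reference supports. The `Claim` class and the `parametric_knowledge_precision` function are illustrative names, not the authors' code.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Claim:
    """A generated claim as a (subject, relation, object) triple with hypothetical attribution flags."""
    triple: tuple[str, str, str]
    in_query: bool      # assumed: claim already stated in the user query
    in_context: bool    # assumed: claim already present in the retrieved context
    in_reference: bool  # assumed: claim supported by the reference text


def parametric_knowledge_precision(claims: list[Claim]) -> Optional[float]:
    """Assumed PKP: precision over claims that were NOT leaked through the prompt."""
    # Drop claims the model could have copied from the prompt (query or context).
    parametric = [c for c in claims if not (c.in_query or c.in_context)]
    if not parametric:
        return None  # undefined when there are no non-leaked claims
    correct = sum(1 for c in parametric if c.in_reference)
    return correct / len(parametric)


# Placeholder triples; only the attribution flags matter for the metric.
claims = [
    Claim(("s1", "r", "o1"), in_query=True, in_context=False, in_reference=True),    # leaked via query
    Claim(("s2", "r", "o2"), in_query=False, in_context=True, in_reference=True),    # leaked via context
    Claim(("s3", "r", "o3"), in_query=False, in_context=False, in_reference=True),   # parametric, correct
    Claim(("s4", "r", "o4"), in_query=False, in_context=False, in_reference=False),  # parametric, wrong
]

print(parametric_knowledge_precision(claims))  # 0.5
```

Under this reading, copying correct facts from the prompt does not raise PKP. That is what separates it from plain precision over all claims, which would be 3/4 for the same four claims.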

Taxonomy

Topics: Topic Modeling · Information Retrieval and Search Behavior · Software Engineering Research