Parametric Knowledge and Retrieval Behavior in RAG Fine-Tuning for Electronic Design Automation
Julian Oestreich, Maximilian Bley, Frank Binder, Lydia Müller, Maksym Sydorenko, André Alcalde

TL;DR
This paper evaluates RAG fine-tuning for long-form text generation in electronic design automation, introducing a new evaluation pipeline and metric that better capture factual accuracy and internalized knowledge.
Contribution
It presents TriFEX, a triple-based human-validated evaluation pipeline, and Parametric Knowledge Precision (PKP), a metric isolating internalized knowledge, revealing limitations of existing metrics.
Findings
ROUGE and BERTScore fail to detect factual differences.
PKP effectively measures internalized knowledge accuracy.
Smaller models outperform larger baselines on key metrics.
Abstract
Retrieval-Augmented Generation (RAG) fine-tuning has shown substantial improvements over vanilla RAG, yet most studies target document question answering and often rely on standard NLP metrics that can obscure factual differences. We evaluate RAG fine-tuning for long-form text generation in electronic design automation, adapting a 7B model under five context augmentation strategies with varying retrieval conditions. We introduce TriFEX, a human-validated, triple-based evaluation pipeline that attributes generated claims to their origin (user query, context, and reference), and propose Parametric Knowledge Precision (PKP), which isolates internalized knowledge by filtering out claims leaked in the prompt. We show that ROUGE and BERTScore fail to detect factual differences that our triple-based evaluation reveals. Additionally, we demonstrate that an existing metric for knowledge…
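The abstract does not give a formula for PKP, so the following is only a rough sketch of one possible reading: drop generated triples that also appear in the prompt (user query plus retrieved context), then measure what fraction of the remaining triples are supported by the reference. The function name, the exact-match comparison after lowercasing, and the example triples are all illustrative assumptions, not the paper's implementation.

```python
from typing import Iterable, Optional, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


def _normalize(triple: Triple) -> Triple:
    """Lowercase and strip each element so trivially different triples match.

    Assumption: exact match after normalization. A real pipeline would likely
    need fuzzier or human-validated matching.
    """
    subject, predicate, obj = triple
    return (subject.strip().lower(), predicate.strip().lower(), obj.strip().lower())


def parametric_knowledge_precision(
    generated: Iterable[Triple],
    prompt: Iterable[Triple],
    reference: Iterable[Triple],
) -> Optional[float]:
    """Hypothetical PKP: precision over generated triples not leaked by the prompt.

    Returns None when every generated triple was already present in the prompt,
    because precision is undefined in that case.
    """
    generated_set: Set[Triple] = {_normalize(t) for t in generated}
    prompt_set: Set[Triple] = {_normalize(t) for t in prompt}
    reference_set: Set[Triple] = {_normalize(t) for t in reference}

    # Keep only claims the model could not have copied from the query or context.
    parametric = generated_set - prompt_set
    if not parametric:
        return None

    # Fraction of the remaining claims that the reference supports.
    return len(parametric & reference_set) / len(parametric)


# Toy example with made-up triples.
generated = [
    ("FIFO", "has_depth", "16"),
    ("clk", "drives", "FIFO"),
    ("reset", "is", "active-low"),
]
prompt = [("clk", "drives", "fifo")]  # leaked via retrieved context
reference = [("FIFO", "has_depth", "16"), ("clk", "drives", "FIFO")]

pkp = parametric_knowledge_precision(generated, prompt, reference)
```

In this toy case the `clk` triple is filtered out as leaked, and one of the two remaining triples is supported by the reference. A plain precision over all three generated triples would also credit the leaked claim, which is the effect the filtering step is meant to remove.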
Taxonomy
Topics: Topic Modeling · Information Retrieval and Search Behavior · Software Engineering Research
