Meet Your New Client: Writing Reports for AI -- Benchmarking Information Loss in Market Research Deliverables
Paul F. Simmering, Benedikt Schulz, Oliver Tabino, Georg Wittenburg

TL;DR
This paper benchmarks information loss in market research reports when used by retrieval-augmented generation systems, highlighting the need for AI-native formats to preserve complex data like charts and diagrams.
Contribution
It introduces an end-to-end benchmark for evaluating information loss in PDF and PPTX reports converted to Markdown for AI question-answering.
Findings
Text is reliably extracted from reports.
Significant information is lost from complex objects such as charts and diagrams.
AI-native report formats are needed to preserve insights.
Abstract
As organizations adopt retrieval-augmented generation (RAG) for their knowledge management systems (KMS), traditional market research deliverables face new functional demands. While PDF reports and slides have long served human readers, they are now also "read" by AI systems to answer user questions. To future-proof reports being delivered today, this study evaluates information loss during their ingestion into RAG systems. It compares how well PDF and PowerPoint (PPTX) documents converted to Markdown can be used by an LLM to answer factual questions in an end-to-end benchmark. Findings show that while text is reliably extracted, significant information is lost from complex objects like charts and diagrams. This suggests a need for specialized, AI-native deliverables to ensure research insights are not lost in translation.
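To make the idea of measuring information loss concrete, here is a minimal sketch of how graded benchmark answers could be aggregated by source format and by the kind of object the fact came from. It is an illustration under stated assumptions, not the authors' implementation: the `GradedAnswer` record, the `object_type` labels, and the definition of loss as one minus accuracy are all hypothetical choices made for this example.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class GradedAnswer:
    """One benchmark question after grading (hypothetical record layout)."""
    source_format: str  # assumed labels, e.g. "pdf" or "pptx"
    object_type: str    # assumed labels, e.g. "text", "chart", "diagram"
    correct: bool       # whether the LLM's answer matched the reference


def accuracy_by_group(answers):
    """Return the share of correct answers per (source_format, object_type)."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for answer in answers:
        key = (answer.source_format, answer.object_type)
        totals[key] += 1
        hits[key] += int(answer.correct)
    return {key: hits[key] / totals[key] for key in totals}


def information_loss(accuracy):
    """Assumed definition: loss is the share of questions answered wrongly."""
    return {key: 1.0 - value for key, value in accuracy.items()}
```

Grouping by object type is what would let such a benchmark separate reliably extracted text from facts that exist only in charts or diagrams.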
