GAICo: A Deployed and Extensible Framework for Evaluating Diverse and Multimodal Generative AI Outputs

Nitin Gupta; Pallav Koppisetti; Kausik Lakkaraju; Biplav Srivastava

arXiv:2508.16753 · cs.CL · March 24, 2026

TL;DR

GAICo is an open-source framework that standardizes and streamlines the evaluation of diverse, multimodal Generative AI outputs, enhancing reproducibility and development efficiency.

Contribution

The paper introduces a comprehensive, extensible Python library that supports multimodal, reference-based metrics for standardized evaluation of GenAI outputs.

Findings

01

GAICo has been downloaded over 16K times, indicating strong community adoption.

02

GAICo enables detailed comparison and debugging of complex multimodal AI systems.

03

The paper demonstrates GAICo's utility through a case study on AI Travel Assistant pipelines.

Abstract

The rapid proliferation of Generative AI (GenAI) into diverse, high-stakes domains necessitates robust and reproducible evaluation methods. However, practitioners often resort to ad-hoc, non-standardized scripts, as common metrics are often unsuitable for specialized, structured outputs (e.g., automated plans, time-series) or holistic comparison across modalities (e.g., text, audio, and image). This fragmentation hinders comparability and slows AI system development. To address this challenge, we present GAICo (Generative AI Comparator): a deployed, open-source Python library that streamlines and standardizes GenAI output comparison. GAICo provides a unified, extensible framework supporting a comprehensive suite of reference-based metrics for unstructured text, specialized structured data formats, and multimedia (images, audio). Its architecture features a high-level API for rapid,…
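
To make the idea of an extensible, reference-based comparison concrete, here is a minimal sketch in plain Python. It is not GAICo's API: the names `ReferenceMetric`, `JaccardTokens`, `ExactMatch`, and `compare` are hypothetical and chosen only for illustration, and the two metrics are simple stand-ins for the library's actual suite.

```python
from __future__ import annotations

from abc import ABC, abstractmethod


class ReferenceMetric(ABC):
    """Hypothetical base class: scores one generated output against a reference."""

    @abstractmethod
    def score(self, generated: str, reference: str) -> float:
        """Return a similarity in [0, 1]; higher means closer to the reference."""


class JaccardTokens(ReferenceMetric):
    """Token-set overlap: |A intersect B| / |A union B| on lowercased whitespace tokens."""

    def score(self, generated: str, reference: str) -> float:
        gen = set(generated.lower().split())
        ref = set(reference.lower().split())
        if not gen and not ref:
            return 1.0  # two empty strings are treated as identical
        return len(gen & ref) / len(gen | ref)


class ExactMatch(ReferenceMetric):
    """1.0 if the stripped strings are identical, else 0.0."""

    def score(self, generated: str, reference: str) -> float:
        return 1.0 if generated.strip() == reference.strip() else 0.0


def compare(
    outputs: dict[str, str],
    reference: str,
    metrics: dict[str, ReferenceMetric],
) -> dict[str, dict[str, float]]:
    """Score every model output against the same reference with every metric."""
    return {
        model: {name: metric.score(text, reference) for name, metric in metrics.items()}
        for model, text in outputs.items()
    }


if __name__ == "__main__":
    results = compare(
        outputs={"model_a": "the cat sat", "model_b": "a dog ran"},
        reference="the cat sat",
        metrics={"jaccard": JaccardTokens(), "exact": ExactMatch()},
    )
    for model, scores in results.items():
        print(model, scores)
```

In this sketch, supporting a new output type (for example a plan or a time series) would mean adding another `ReferenceMetric` subclass, while `compare` stays unchanged.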