iGAiVA: Integrated Generative AI and Visual Analytics in a Machine Learning Workflow for Text Classification

Yuanzhe Jin, Adrian Carrasco-Revilla, and Min Chen

arXiv:2409.15848 · cs.LG · March 28, 2025





TL;DR

This paper introduces iGAiVA, a software tool that combines visual analytics and generative AI to improve text classification models by guiding targeted synthetic data generation based on identified data deficiencies.

Contribution

The paper presents iGAiVA, a novel integrated tool that combines visual analytics with generative AI to enhance data synthesis and model accuracy in text classification workflows.

Findings

1. Targeted data synthesis improves model accuracy.
2. Visual analytics helps identify data deficiencies.
3. The integrated tool streamlines the ML workflow for text classification.

Abstract

In developing machine learning (ML) models for text classification, one common challenge is that the collected data is often not ideally distributed, especially when new classes are introduced in response to changes in data and tasks. In this paper, we present a solution for using visual analytics (VA) to guide the generation of synthetic data using large language models. As VA enables model developers to identify data-related deficiencies, data synthesis can be targeted to address them. We discuss different types of data deficiency, describe different VA techniques for supporting their identification, and demonstrate the effectiveness of targeted data synthesis in improving model accuracy. In addition, we present a software tool, iGAiVA, which maps four groups of ML tasks into four VA views, integrating generative AI and VA into an ML workflow for developing and improving text…
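To make the idea of targeted synthesis concrete, here is a minimal sketch of one deficiency type mentioned above: a newly introduced class that is under-represented. It is not the authors' method. The paper identifies deficiencies through interactive VA views, whereas this sketch substitutes a fixed numeric threshold for that visual inspection. The function names `plan_synthesis` and `build_prompt`, the 0.5 ratio, and the prompt wording are all hypothetical, and no LLM is actually called.

```python
import math
from collections import Counter


def plan_synthesis(labels, ratio=0.5):
    """Return {class: number of synthetic samples to generate}.

    Hypothetical criterion: a class is treated as deficient when it has fewer
    than ceil(ratio * size of the largest class) samples; the plan tops it up
    to exactly that target.
    """
    counts = Counter(labels)
    if not counts:
        return {}
    target = math.ceil(ratio * max(counts.values()))
    return {label: target - n for label, n in counts.items() if n < target}


def build_prompt(label, n, examples, max_examples=3):
    """Build a hypothetical few-shot prompt asking an LLM for n texts of one class.

    Only the prompt string is produced; sending it to a model is left out.
    """
    shown = "\n".join(f"- {text}" for text in examples[:max_examples])
    return (
        f"Write {n} short, varied texts that belong to the class '{label}'.\n"
        f"Existing examples of this class:\n{shown}"
    )


# Usage: a newly introduced class ("health") has far fewer samples than the others.
labels = ["sports"] * 100 + ["politics"] * 80 + ["health"] * 10
plan = plan_synthesis(labels)
print(plan)  # {'health': 40}
print(build_prompt("health", plan["health"], ["New flu vaccine approved."]))
```

Only the deficient class receives a generation request, which is the sense in which synthesis is "targeted". In a VA-driven workflow, the threshold would be replaced by the developer's judgment from the visual views.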


Code & Models

Repositories

mattjin19/rbf (Official)


Taxonomy

Topics: Data Visualization and Analytics

Methods: Visual Analytics