# CapTune: Adapting Non-Speech Captions With Anchored Generative Models

**Authors:** Jeremy Zhengqi Huang, Caluã de Lacerda Pataca, Liang-Yuan Wu, Dhruv Jain

arXiv: 2508.19971 · 2025-08-28

## TL;DR

CapTune personalizes non-speech video captions for deaf and hard of hearing (DHH) viewers. It balances creator intent with viewer preferences through customizable transformations, and in the authors' evaluation it enhanced viewers' emotional engagement with content.

## Contribution

We introduce CapTune, a system that lets viewers personalize non-speech captions within safe transformation spaces that caption authors define using concrete examples.

## Key findings

- Supports creative control for caption authors
- Enhances emotional engagement for DHH viewers
- Reveals trade-offs between information richness and cognitive load

## Abstract

Non-speech captions are essential to the video experience of deaf and hard of hearing (DHH) viewers, yet conventional approaches often overlook the diversity of their preferences. We present CapTune, a system that enables customization of non-speech captions based on DHH viewers' needs while preserving creator intent. CapTune allows caption authors to define safe transformation spaces using concrete examples and empowers viewers to personalize captions across four dimensions: level of detail, expressiveness, sound representation method, and genre alignment. Evaluations with seven caption creators and twelve DHH participants showed that CapTune supported creators' creative control while enhancing viewers' emotional engagement with content. Our findings also reveal trade-offs between information richness and cognitive load, tensions between interpretive and descriptive representations of sound, and the context-dependent nature of caption preferences.
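The abstract does not say how safe transformation spaces are represented, so the following is only an illustrative sketch and not the authors' implementation. It assumes each of the four dimensions can be placed on a hypothetical 0.0 to 1.0 scale, and that the author's concrete examples reduce to a low and high bound per dimension. The names `SafeRange` and `resolve_preferences` are invented for this example. The idea shown is that a viewer's requested setting is clamped into the author-approved range before any caption is generated.

```python
from dataclasses import dataclass

# Dimension names come from the abstract; the numeric scale is an assumption.
DIMENSIONS = (
    "level_of_detail",
    "expressiveness",
    "sound_representation",
    "genre_alignment",
)


@dataclass(frozen=True)
class SafeRange:
    """Hypothetical author-approved interval on one dimension (0.0 to 1.0)."""
    low: float
    high: float

    def clamp(self, value: float) -> float:
        # Pull the requested value back inside the author's bounds.
        return max(self.low, min(self.high, value))


def resolve_preferences(
    viewer: dict[str, float],
    safe_space: dict[str, SafeRange],
) -> dict[str, float]:
    """Clamp each viewer preference into the author's safe range."""
    resolved = {}
    for dim in DIMENSIONS:
        bounds = safe_space[dim]
        # A missing preference falls back to the midpoint of the safe range.
        default = (bounds.low + bounds.high) / 2
        resolved[dim] = bounds.clamp(viewer.get(dim, default))
    return resolved


# Example: the author allows only the middle half of every dimension.
safe_space = {dim: SafeRange(0.25, 0.75) for dim in DIMENSIONS}
viewer = {"level_of_detail": 1.0, "expressiveness": 0.5}
resolved = resolve_preferences(viewer, safe_space)
```

In this example the viewer asks for maximum detail, but the resolved value stops at the author's upper bound, while dimensions the viewer did not set default to the middle of the allowed range. A real system would likely treat sound representation method as a categorical choice instead of a number; the single numeric scale here is a simplification.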

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/2508.19971/full.md

## Figures

16 figures with captions in the complete paper: https://tomesphere.com/paper/2508.19971/full.md

## References

71 references — full list in the complete paper: https://tomesphere.com/paper/2508.19971/full.md

---
Source: https://tomesphere.com/paper/2508.19971