Captioning Daily Activity Images in Early Childhood Education: Benchmark and Algorithm

Sixing Li, Zhibin Gu, Ziqi Zhang, Weiguo Pan, Bing Li, Ying Wang, and Hongzhe Liu

arXiv:2604.01941·cs.CV·April 3, 2026

TL;DR

This paper introduces ECAC, a large-scale dataset for early childhood education image captioning, and proposes RSRS, a hybrid training framework, to improve fine-grained, professional object description in educational images.

Contribution

The paper presents ECAC, a new domain-specific benchmark dataset, and RSRS, a novel training method, enabling more accurate and professional image captioning in early childhood education scenarios.

Findings

01. ECAC contains 256,121 annotated images with expert captions.

02. RSRS improves training stability and fine-grained recognition performance.

03. The developed model achieves a TTS of 51.06, surpassing baselines.

Abstract

Image captioning for Early Childhood Education (ECE) is essential for automated activity understanding and educational assessment. However, existing methods face two key challenges. First, the lack of large-scale, domain-specific datasets limits the model's ability to capture fine-grained semantic concepts unique to ECE scenarios, resulting in generic and imprecise descriptions. Second, conventional training paradigms exhibit limitations in enhancing professional object description capability, as supervised learning tends to favor high-frequency expressions, while reinforcement learning may suffer from unstable optimization on difficult samples. To address these limitations, we introduce ECAC, a large-scale benchmark for ECE daily activity image captioning, comprising 256,121 real-world images annotated with expert-level captions and fine-grained labels. ECAC is further equipped with…
