PhotoChat: A Human-Human Dialogue Dataset with Photo Sharing Behavior for Joint Image-Text Modeling
Xiaoxue Zang, Lijuan Liu, Maria Wang, Yang Song, Hao Zhang, Jindong Chen

TL;DR
PhotoChat is a new dataset of human-human dialogues with shared photos. It supports research on joint image-text modeling through two tasks, photo-sharing intent prediction and photo retrieval, and provides baseline results for both.
Contribution
The paper introduces the first dataset focusing on photo sharing in online messaging, along with two novel tasks and baseline models for image-text joint modeling.
Findings
Best photo retrieval model achieves 10.4% recall@1 (out of 1000 candidates)
Best photo-sharing intent prediction model achieves 58.1% F1 score
Dataset presents challenging real-world photo sharing scenarios
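To make the retrieval figure concrete, here is a minimal sketch of recall@k under the setup the abstract describes. The dialogue context scores every candidate photo, and a query counts as a hit if the photo actually shared ranks in the top k. The function name, the score-matrix layout, and the tie handling are illustrative assumptions, not the authors' evaluation code.

```python
from typing import Sequence


def recall_at_k(scores: Sequence[Sequence[float]], gold: Sequence[int], k: int = 1) -> float:
    """Fraction of dialogues whose shared photo ranks within the top k candidates.

    Hypothetical layout: scores[i][j] is the model score of candidate photo j
    for dialogue i (higher is better); gold[i] is the index of the photo that
    was actually shared in dialogue i.
    """
    if len(scores) != len(gold):
        raise ValueError("scores and gold must have the same length")
    hits = 0
    for row, g in zip(scores, gold):
        # 0-based rank of the gold photo = number of candidates scored strictly
        # higher (ties are resolved in the gold photo's favour).
        rank = sum(1 for s in row if s > row[g])
        if rank < k:
            hits += 1
    return hits / len(gold)


# Toy example: 3 dialogues with 4 candidates each (PhotoChat reports results with 1000).
scores = [
    [0.9, 0.1, 0.3, 0.2],   # gold photo 0 ranked first  -> hit at k=1
    [0.2, 0.8, 0.7, 0.1],   # gold photo 2 ranked second -> miss at k=1, hit at k=2
    [0.1, 0.2, 0.3, 0.95],  # gold photo 3 ranked first  -> hit at k=1
]
gold = [0, 2, 3]
print(recall_at_k(scores, gold, k=1))  # 0.6666666666666666
print(recall_at_k(scores, gold, k=2))  # 1.0
```

For scale, a uniformly random ranking over 1000 candidates has an expected recall@1 of 1/1000 = 0.1%. The reported 10.4% is therefore about 100 times chance while still missing the shared photo in most dialogues.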
Abstract
We present a new human-human dialogue dataset, PhotoChat, the first dataset that casts light on photo-sharing behavior in online messaging. PhotoChat contains 12k dialogues, each paired with a user photo that is shared during the conversation. Based on this dataset, we propose two tasks to facilitate research on image-text modeling: a photo-sharing intent prediction task that predicts whether one intends to share a photo in the next conversation turn, and a photo retrieval task that retrieves the most relevant photo according to the dialogue context. In addition, for both tasks, we provide baseline models built on state-of-the-art models and report their benchmark performance. The best image retrieval model achieves 10.4% recall@1 (out of 1000 candidates) and the best photo intent prediction model achieves 58.1% F1 score, indicating that the dataset presents…
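The intent prediction task is a binary decision per dialogue context: will a photo be shared in the next turn? Below is a minimal sketch of the F1 score for that setting. It assumes F1 is computed on the positive class (label 1 = "a photo is shared next"). The abstract does not state whether positive-class, macro, or another averaging is used, so treat this as an illustration of the metric rather than the paper's exact protocol.

```python
from typing import Sequence


def binary_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Positive-class F1, where 1 = a photo is shared in the next turn (assumed labeling)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # correctly predicted sharing
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # predicted sharing, none happened
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # missed a sharing turn
    if tp == 0:
        return 0.0  # precision and recall are both 0 (or undefined)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy example: six dialogue contexts, three of which are followed by a shared photo.
y_true = [0, 0, 1, 0, 1, 1]
y_pred = [0, 1, 1, 0, 0, 1]
print(binary_f1(y_true, y_pred))  # 0.6666666666666666 (precision = recall = 2/3)
```

Because sharing turns are a small fraction of all turns in a conversation, F1 on the positive class is more informative here than plain accuracy, which a model could inflate by always predicting "no photo".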
Taxonomy
Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning
