Field Studies with Multimedia Big Data: Opportunities and Challenges (Extended Version)
Mario Michael Krell, Julia Bernd, Yifan Li, Daniel Ma, Jaeyoung Choi, Michael Ellsworth, Damian Borth, Gerald Friedland

TL;DR
This paper presents a new open-source framework for conducting field studies with large-scale multimedia data, specifically the Yahoo Flickr Creative Commons 100 Million (YFCC100M) dataset, to support scientific research across disciplines.
Contribution
It introduces a comprehensive, iterative framework and a user-friendly interface for extracting and refining targeted multimedia subcorpora from big data for scientific field studies.
Findings
Framework enables large-scale, targeted data extraction
Supports iterative refinement with statistical summaries
Facilitates cross-disciplinary multimedia research
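The extract-then-refine loop named in the findings can be illustrated with a minimal sketch. This is not the authors' framework or its interface: the record fields (`tags`, `year`, `kind`), the sample data, and the helper names `extract` and `summarize` are all hypothetical and do not reflect the actual YFCC100M metadata schema. It only shows the general idea of filtering a corpus by tag, checking summary statistics, and then narrowing the query.

```python
from collections import Counter

# Hypothetical metadata records; the field names are illustrative only
# and are NOT the real YFCC100M schema.
RECORDS = [
    {"id": 1, "tags": {"bird", "snow"}, "year": 2010, "kind": "photo"},
    {"id": 2, "tags": {"bird", "beach"}, "year": 2012, "kind": "video"},
    {"id": 3, "tags": {"car"}, "year": 2012, "kind": "photo"},
    {"id": 4, "tags": {"bird", "snow"}, "year": 2012, "kind": "photo"},
]


def extract(records, required_tags):
    """Keep only the records that carry every tag in required_tags."""
    required = set(required_tags)
    return [r for r in records if required <= r["tags"]]


def summarize(records):
    """Return simple counts that help decide how to refine the query."""
    return {
        "total": len(records),
        "by_year": dict(Counter(r["year"] for r in records)),
        "by_kind": dict(Counter(r["kind"] for r in records)),
    }


# First pass: a broad query, followed by a look at the summary.
broad = extract(RECORDS, ["bird"])
print(summarize(broad))

# Second pass: refine the subcorpus with an additional tag.
narrow = extract(broad, ["snow"])
print(summarize(narrow))
```

Each pass returns a smaller subcorpus together with counts by year and media type, which is the kind of feedback a researcher might use to decide whether the query is too broad or too narrow before running the next iteration.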
Abstract
Social multimedia users are increasingly sharing all kinds of data about the world. They do this for their own reasons, not to provide data for field studies, but the trend presents a great opportunity for scientists. The Yahoo Flickr Creative Commons 100 Million (YFCC100M) dataset comprises 99 million images and nearly 800 thousand videos from Flickr, all shared under Creative Commons licenses. To enable scientists to leverage these media records for field studies, we propose a new framework that extracts targeted subcorpora from the YFCC100M, in a format usable by researchers who are not experts in big data retrieval and processing. This paper discusses a number of examples from the literature, as well as some entirely new ideas, of natural and social science field studies that could be piloted, supplemented, replicated, or conducted using YFCC100M data. These examples illustrate the…
Taxonomy
Topics: Video Analysis and Summarization · Image Retrieval and Classification Techniques · Music and Audio Processing
