Clustering Without Knowing How To: Application and Evaluation
Daniil Likhobaba, Daniil Fedulov, Dmitry Ustalov

TL;DR
This paper presents a crowdsourcing-based system for image clustering that achieves meaningful results without machine learning, demonstrated on fashion image datasets.
Contribution
It introduces a crowdsourced system that clusters images by under-specified criteria without training a machine learning model, validated through experiments on two real-world image datasets.
Findings
Meaningful clusters achieved without machine learning.
Crowdsourcing effectively solves under-specified clustering problems.
System code is publicly available for reproducibility.
Abstract
Crowdsourcing allows running simple human intelligence tasks on a large crowd of workers, making it possible to solve problems for which it is difficult to formulate an algorithm or train a machine learning model in reasonable time. One such problem is data clustering by an under-specified criterion that is simple for humans but difficult for machines. In this demonstration paper, we build a crowdsourced system for image clustering and release its code under a free license at https://github.com/Toloka/crowdclustering. Our experiments on two different image datasets, dresses from Zalando's FEIDEGGER and shoes from the Toloka Shoes Dataset, confirm that one can obtain meaningful clusters purely with crowdsourcing, with no machine learning algorithms.
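The abstract does not say how worker answers are turned into clusters, so the following is only an illustrative sketch and not the authors' method. It assumes a hypothetical task design in which each worker sees a pair of images and answers whether they belong together. Pairs are then merged by majority vote using a union-find structure. The function name `cluster_from_votes` and the vote format are assumptions made for this example.

```python
from collections import defaultdict


def cluster_from_votes(items, votes):
    """Group items from pairwise crowd answers (hypothetical task design).

    items: list of item identifiers.
    votes: iterable of (item_a, item_b, same) tuples, one per worker answer,
           where `same` is True if the worker judged the pair as similar.
    """
    # Net score per unordered pair: +1 for "same", -1 for "different".
    tally = defaultdict(int)
    for a, b, same in votes:
        key = (a, b) if a <= b else (b, a)
        tally[key] += 1 if same else -1

    # Union-find over items; each item starts in its own cluster.
    parent = {item: item for item in items}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # Merge a pair only when a strict majority of workers said "same".
    for (a, b), score in tally.items():
        if score > 0:
            parent[find(a)] = find(b)

    groups = defaultdict(list)
    for item in items:
        groups[find(item)].append(item)
    return sorted(sorted(group) for group in groups.values())


if __name__ == "__main__":
    items = ["dress1", "dress2", "dress3", "shoe1"]
    votes = [
        ("dress1", "dress2", True),
        ("dress1", "dress2", True),
        ("dress1", "dress2", False),
        ("dress2", "dress3", True),
        ("dress3", "shoe1", True),
        ("dress3", "shoe1", False),  # tie: the pair is not merged
    ]
    print(cluster_from_votes(items, votes))
```

Majority voting with transitive merging is the simplest possible aggregation. A real system would likely need to account for worker reliability and for the fact that only a fraction of all pairs can be shown to workers.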
Taxonomy
Topics: Mobile Crowdsensing and Crowdsourcing · Data Stream Mining Techniques · Anomaly Detection Techniques and Applications
