Clustering Without Knowing How To: Application and Evaluation

Daniil Likhobaba; Daniil Fedulov; Dmitry Ustalov

arXiv:2209.10267·cs.HC·June 5, 2023

TL;DR

This demonstration paper presents a crowdsourced system for image clustering that produces meaningful clusters without any machine learning, evaluated on two fashion image datasets (dresses and shoes).

Contribution

It introduces a crowdsourcing approach to clustering data by under-specified criteria that are simple for humans but hard to formalize as an algorithm, validated through experiments on two real-world image datasets.

Findings

01. Meaningful clusters achieved without machine learning.

02. Crowdsourcing effectively solves under-specified clustering problems.

03. System code is publicly available for reproducibility.

Abstract

Crowdsourcing allows running simple human intelligence tasks on a large crowd of workers, which makes it possible to solve problems for which it is difficult to formulate an algorithm or train a machine learning model in reasonable time. One such problem is data clustering by an under-specified criterion that is simple for humans but difficult for machines. In this demonstration paper, we build a crowdsourced system for image clustering and release its code under a free license at https://github.com/Toloka/crowdclustering. Our experiments on two different image datasets, dresses from Zalando's FEIDEGGER and shoes from the Toloka Shoes Dataset, confirm that one can obtain meaningful clusters purely with crowdsourcing, with no machine learning algorithms.
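To make the idea of clustering purely from crowd judgments concrete, here is a minimal sketch of one possible aggregation step. It is not the method used in the paper or in the released repository: it assumes, hypothetically, that each worker partitions a small batch of items into groups, and it merges items that a majority of workers placed together. The function name `aggregate_crowd_clusters`, the `threshold` parameter, and the input layout are all assumptions made for illustration.

```python
from collections import defaultdict
from itertools import combinations


def aggregate_crowd_clusters(worker_groupings, threshold=0.5):
    """Merge per-task worker groupings into global clusters (illustrative only).

    worker_groupings: list of tasks; each task is one worker's partition of a
    small batch, given as a list of groups, each group a list of item ids.
    Two items are linked when the fraction of shared tasks in which workers
    put them in the same group exceeds `threshold` (an assumed rule).
    """
    together = defaultdict(int)  # times a pair was placed in the same group
    seen = defaultdict(int)      # times a pair appeared in the same task
    items = set()

    for grouping in worker_groupings:
        label = {}
        for group_id, group in enumerate(grouping):
            for item in group:
                label[item] = group_id
        items.update(label)
        for a, b in combinations(sorted(label), 2):
            seen[(a, b)] += 1
            if label[a] == label[b]:
                together[(a, b)] += 1

    # Union-find over items: linked pairs end up in one connected component.
    parent = {item: item for item in items}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for (a, b), n in seen.items():
        if together[(a, b)] / n > threshold:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_b] = root_a

    clusters = defaultdict(list)
    for item in sorted(items):
        clusters[find(item)].append(item)
    return sorted(clusters.values())


# Hypothetical worker answers over four images "a".."d".
answers = [
    [["a", "b"], ["c"]],
    [["a", "b"], ["c"]],
    [["a"], ["b", "c"]],
    [["c", "d"], ["a"]],
]
result = aggregate_crowd_clusters(answers)
print(result)
```

Majority voting over pairs followed by connected components is only one simple choice; it tolerates a minority of disagreeing workers but can chain loosely related items together, so a probabilistic aggregation model may behave differently.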

Code & Models

Repositories

toloka/crowdclustering (official): https://github.com/Toloka/crowdclustering

Taxonomy

Topics: Mobile Crowdsensing and Crowdsourcing · Data Stream Mining Techniques · Anomaly Detection Techniques and Applications