Dataset Structural Index: Leveraging a machine's perspective towards visual data
Dishant Parikh

TL;DR
This paper introduces the Dataset Structural Index (DSI), a novel approach to understanding visual datasets from a machine's perspective, enabling data optimization and better model selection.
Contribution
The paper presents the Dataset Structural Index (DSI), including the Variety contribution ratio and Similarity matrix, as new meta-values to analyze and optimize visual datasets.
Findings
DSI can identify dataset diversity and similarity effectively.
Using DSI, models achieve similar accuracy with less data.
DSI aids in selecting suitable architectures based on dataset structure.
Abstract
With advances in vision and perception architectures, we have realized that working with data is as crucial as the algorithms, if not more so. Until now, we have trained machines based on our own knowledge and perspective of the world. The concept of the Dataset Structural Index (DSI) revolves around understanding a machine's perspective of the dataset. With DSI, I present two meta-values that give more information about a visual dataset and can be used to optimize data, create better architectures, and estimate which model would work best. These two values are the Variety contribution ratio and the Similarity matrix. In the paper, I show many applications of DSI, one of which is how the same level of accuracy can be achieved with the same model architectures trained on less data.
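As a rough illustration of what a class-level similarity matrix over a visual dataset could look like, the sketch below computes cosine similarity between per-class mean feature vectors. This is not the paper's definition of the Similarity matrix, which the abstract does not spell out. The function name `class_similarity_matrix`, the use of mean embeddings, and the random stand-in features are all assumptions made for illustration; in practice the features would likely come from a pretrained vision model.

```python
import numpy as np

def class_similarity_matrix(features, labels):
    """Cosine similarity between per-class mean feature vectors.

    Hypothetical helper: one plausible way to compare classes,
    not the construction used in the DSI paper.
    """
    classes = np.unique(labels)
    # One mean embedding per class, stacked into (n_classes, n_dims).
    means = np.stack([features[labels == c].mean(axis=0) for c in classes])
    # Normalize rows to unit length, guarding against zero vectors.
    norms = np.linalg.norm(means, axis=1, keepdims=True)
    unit = means / np.clip(norms, 1e-12, None)
    # Dot products of unit vectors are cosine similarities.
    return classes, unit @ unit.T

# Random stand-in data: 3 classes, 20 samples each, 8-dim features.
rng = np.random.default_rng(0)
features = rng.normal(size=(60, 8))
labels = np.repeat(np.arange(3), 20)

classes, sim = class_similarity_matrix(features, labels)
print(classes)
print(np.round(sim, 3))
```

Under these assumptions, off-diagonal entries close to 1 would flag class pairs whose images look alike to the feature extractor, which is the kind of signal the abstract suggests using to prune data or choose an architecture.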
Taxonomy
Topics: Image Retrieval and Classification Techniques · Remote-Sensing Image Classification · Data Visualization and Analytics
