A Tool for Automatically Cataloguing and Selecting Pre-Trained Models and Datasets for Software Engineering

Alexandra González; Oscar Cerezo; Xavier Franch; Silverio Martínez-Fernández

arXiv:2601.13460 · cs.SE · January 21, 2026

Open Access

TL;DR

MLAssetSelection is a web tool that automates the discovery, ranking, and updating of machine learning models and datasets tailored for software engineering tasks, simplifying asset selection for engineers.

Contribution

The paper introduces a web application that automatically extracts, ranks, and updates SE-specific ML assets, making asset selection more efficient and more personalized.

Findings

01 Provides real-time updates of SE assets

02 Enables requirement-based model and dataset selection

03 Offers a configurable leaderboard for benchmarking

Abstract

The rapid growth of machine learning assets has made it increasingly difficult for software engineers to identify models and datasets that match their specific needs. Browsing large registries, such as Hugging Face, is time-consuming, error-prone, and rarely tailored to Software Engineering (SE) tasks. We present MLAssetSelection, a web application that automatically extracts SE assets and supports four key functionalities: (i) a configurable leaderboard for ranking models across multiple benchmarks and metrics; (ii) requirements-based selection of models and datasets; (iii) real-time automated updates through scheduled jobs that keep asset information current; and (iv) user-centric features including login, personalized asset lists, and configurable alert notifications. A demonstration video is available at https://youtu.be/t6CJ6P9asV4.
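To make functionalities (i) and (ii) concrete, here is a minimal Python sketch of requirements-based filtering followed by a configurable, weighted leaderboard. It is an illustration only and not the MLAssetSelection implementation: the `Asset` fields, the `select` and `leaderboard` functions, the weighted-mean scoring rule, and all model and benchmark names are assumptions made for this example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple


@dataclass
class Asset:
    """Hypothetical catalogue entry; field names are assumed, not taken from the tool."""
    name: str
    task: str
    license: str
    scores: Dict[str, float] = field(default_factory=dict)  # benchmark -> score


def select(assets: List[Asset],
           task: Optional[str] = None,
           licenses: Optional[Set[str]] = None) -> List[Asset]:
    """Keep only assets that satisfy every stated requirement (None means 'any')."""
    return [
        a for a in assets
        if (task is None or a.task == task)
        and (licenses is None or a.license in licenses)
    ]


def leaderboard(assets: List[Asset],
                weights: Dict[str, float]) -> List[Tuple[str, float]]:
    """Rank assets by the weighted mean of their scores on the chosen benchmarks.

    Assets missing a score for any chosen benchmark are left out of the ranking.
    """
    total = sum(weights.values())
    rows = []
    for a in assets:
        if all(b in a.scores for b in weights):
            score = sum(a.scores[b] * w for b, w in weights.items()) / total
            rows.append((a.name, score))
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Made-up catalogue: names, tasks, licenses, and scores are placeholders.
catalog = [
    Asset("model-x", "code-generation", "apache-2.0", {"bench_a": 0.5, "bench_b": 0.75}),
    Asset("model-y", "code-generation", "mit", {"bench_a": 0.25, "bench_b": 0.25}),
    Asset("model-z", "code-summarization", "mit", {"bench_a": 1.0}),
]

candidates = select(catalog, task="code-generation", licenses={"apache-2.0", "mit"})
for name, score in leaderboard(candidates, {"bench_a": 1.0, "bench_b": 1.0}):
    print(name, score)
```

Changing the `weights` dictionary changes which benchmarks count and by how much, which is one simple way to read "configurable" here; the actual tool may aggregate benchmarks and metrics differently.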

Taxonomy

Topics: Software Engineering Research · Software System Performance and Reliability · Machine Learning and Data Classification