SEMODS: A Validated Dataset of Open-Source Software Engineering Models

Alexandra González; Xavier Franch; Silverio Martínez-Fernández

arXiv:2601.00635 · cs.SE · January 5, 2026

TL;DR

SEMODS is a curated, validated dataset of over 3,400 open-source models tailored for Software Engineering tasks, enabling better discovery, benchmarking, and analysis of AI models in SE.

Contribution

This paper introduces SEMODS, a validated dataset of SE models extracted from Hugging Face that links each model to SE tasks and lifecycle activities and provides a standardized representation of its evaluation results.

Findings

01. Dataset contains 3,427 models with task annotations

02. Links models to software development lifecycle activities

03. Supports multiple applications like benchmarking and model discovery

Abstract

Integrating Artificial Intelligence into Software Engineering (SE) requires having a curated collection of models suited to SE tasks. With millions of models hosted on Hugging Face (HF) and new ones continuously being created, it is infeasible to identify SE models without a dedicated catalogue. To address this gap, we present SEMODS: an SE-focused dataset of 3,427 models extracted from HF, combining automated collection with rigorous validation through manual annotation and large language model assistance. Our dataset links models to SE tasks and activities from the software development lifecycle, offering a standardized representation of their evaluation results, and supporting multiple applications such as data analysis, model discovery, benchmarking, and model adaptation.
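
To make the abstract's description concrete, the following is a minimal sketch of how a catalogue entry that links a model to SE tasks, lifecycle activities, and standardized evaluation results might be represented and queried. It is not the SEMODS schema: every class, field, and function name here (`ModelEntry`, `EvalResult`, `find_models`, `best_by_metric`) is a hypothetical chosen for illustration, and the ranking assumes a metric where higher values are better.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class EvalResult:
    """One evaluation result in a uniform shape (hypothetical fields)."""
    dataset: str   # benchmark or dataset the model was evaluated on
    metric: str    # metric name, e.g. "accuracy"
    value: float   # reported score


@dataclass
class ModelEntry:
    """One catalogue entry (hypothetical, not the actual SEMODS schema)."""
    model_id: str                 # Hugging Face style identifier
    se_tasks: List[str]           # SE tasks the model is linked to
    sdlc_activities: List[str]    # lifecycle activities those tasks belong to
    results: List[EvalResult] = field(default_factory=list)


def find_models(catalogue: List[ModelEntry], task: str) -> List[ModelEntry]:
    """Model discovery: return the entries annotated with the given SE task."""
    return [m for m in catalogue if task in m.se_tasks]


def best_by_metric(
    catalogue: List[ModelEntry], task: str, metric: str
) -> Optional[ModelEntry]:
    """Benchmarking: return the entry with the highest value of `metric`
    among the models linked to `task`, or None if none reports it.
    Assumes higher is better for the chosen metric."""
    scored = [
        (r.value, m)
        for m in find_models(catalogue, task)
        for r in m.results
        if r.metric == metric
    ]
    if not scored:
        return None
    return max(scored, key=lambda pair: pair[0])[1]
```

With a list of such entries, `find_models(catalogue, "code generation")` would correspond to the model discovery use case, and `best_by_metric` to a simple benchmarking query over the standardized results.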

Taxonomy

Topics: Software Engineering Research · Software Engineering Techniques and Practices · Model-Driven Software Engineering Techniques