Linux Kernel Configurations at Scale: A Dataset for Performance and Evolution Analysis

Heraldo Borges; Juliana Alves Pereira; Djamel Eddine Khelladi; Mathieu Acher

arXiv:2505.07487·cs.SE·May 14, 2025

TL;DR

This paper introduces LinuxData, a large-scale dataset of Linux kernel configurations across multiple versions, enabling advanced research in kernel configuration analysis, prediction, and evolution modeling.

Contribution

The paper provides the first comprehensive, publicly accessible dataset of Linux kernel configurations with detailed measurements, supporting machine learning and transfer learning research.

Findings

01 The dataset includes over 240,000 configurations from kernel versions 4.13 to 5.8.

02 It enables research in feature selection and prediction models.

03 It facilitates reproducibility and new insights into kernel configuration evolution.

Abstract

Configuring the Linux kernel to meet specific requirements, such as binary size, is highly challenging due to its immense complexity, with over 15,000 interdependent options evolving rapidly across different versions. Although several studies have explored sampling strategies and machine learning methods to understand and predict the impact of configuration options, the literature still lacks a comprehensive and large-scale dataset encompassing multiple kernel versions along with detailed quantitative measurements. To bridge this gap, we introduce LinuxData, an accessible collection of kernel configurations spanning several kernel releases, specifically from versions 4.13 to 5.8. This dataset, gathered through automated tools and build processes, comprises over 240,000 kernel configurations systematically labeled with compilation outcomes and binary sizes. By providing detailed records…
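
To make the idea of a kernel configuration as a set of options concrete, here is a minimal sketch in Python that turns the text of a standard kernel `.config` file into a dictionary and then into a numeric feature vector. It relies only on the usual `.config` line syntax (`CONFIG_X=y` and `# CONFIG_X is not set`). The function names `parse_kconfig` and `encode_tristate`, the sample text, and the 0/1/2 encoding are illustrative assumptions. They are not taken from the paper, and the actual file layout of LinuxData may differ.

```python
import re

# Lines such as CONFIG_SMP=y, CONFIG_HZ=250, CONFIG_CMDLINE="quiet"
_SET = re.compile(r'^(CONFIG_\w+)=(.*)$')
# Lines such as "# CONFIG_DEBUG_INFO is not set"
_UNSET = re.compile(r'^# (CONFIG_\w+) is not set$')


def parse_kconfig(text):
    """Parse .config text into {option: value}; unset options map to 'n'."""
    options = {}
    for line in text.splitlines():
        line = line.strip()
        m = _SET.match(line)
        if m:
            # Drop surrounding quotes from string-valued options.
            options[m.group(1)] = m.group(2).strip('"')
            continue
        m = _UNSET.match(line)
        if m:
            options[m.group(1)] = "n"
    return options


def encode_tristate(options, names):
    """Encode the named options as n/absent -> 0, m -> 1, y -> 2.

    Simplification (assumption): non-tristate values such as numbers
    or strings are also encoded as 0 here.
    """
    mapping = {"n": 0, "m": 1, "y": 2}
    return [mapping.get(options.get(name, "n"), 0) for name in names]


# Hypothetical sample; not an excerpt from the dataset.
SAMPLE = """\
# CONFIG_DEBUG_INFO is not set
CONFIG_SMP=y
CONFIG_EXT4_FS=m
CONFIG_HZ=250
"""

opts = parse_kconfig(SAMPLE)
features = encode_tristate(
    opts, ["CONFIG_SMP", "CONFIG_EXT4_FS", "CONFIG_DEBUG_INFO", "CONFIG_MISSING"]
)
```

A vector like `features`, paired with a label such as binary size or compilation outcome, is the kind of input that feature selection and prediction models would typically consume.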

Code & Models

Repositories

heraldoborges/tuxkconfig (official)
