A Static Analysis of Popular C Packages in Linux
Jukka Ruohonen, Mubashrah Saddiqa, Krzysztof Sierszecki

TL;DR
This paper empirically analyzes 3,538 C-based Linux packages using GCC's static analyzer, revealing common security issues, warning distribution, and implications for software quality and security practices.
Contribution
It provides the first large-scale empirical study of Linux packages using GCC's static analyzer, highlighting prevalent issues and warning patterns.
Findings
Uninitialized variables and NULL pointer dereferences are the most common issues.
Warnings follow a long-tailed distribution across packages.
Most packages (89%) have no warnings.
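To make the two most common warning classes concrete, here is a minimal sketch in C. It is not taken from the paper or from any analyzed package; the function names and the `SHOW_BUGGY` macro are hypothetical and exist only for illustration. Compiling with `gcc -fanalyzer -DSHOW_BUGGY` on a recent GCC would be expected to produce diagnostics along the lines of `-Wanalyzer-use-of-uninitialized-value` for the first buggy function and `-Wanalyzer-possible-null-dereference` for the second, although the exact output depends on the GCC version.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#ifdef SHOW_BUGGY /* hypothetical macro: enables the defective variants */
/* Defect 1: 'total' is read before any value has been assigned to it. */
int sum_buggy(const int *v, size_t n)
{
    int total; /* never initialized */
    for (size_t i = 0; i < n; i++)
        total += v[i];
    return total;
}

/* Defect 2: malloc() may return NULL, yet the result is used unchecked. */
char *copy_buggy(const char *s)
{
    char *p = malloc(strlen(s) + 1);
    strcpy(p, s); /* writes through a possibly NULL pointer */
    return p;
}
#endif

/* Fix 1: initialize the accumulator before the loop reads it. */
int sum_fixed(const int *v, size_t n)
{
    int total = 0;
    for (size_t i = 0; i < n; i++)
        total += v[i];
    return total;
}

/* Fix 2: check the allocation and report failure to the caller. */
char *copy_fixed(const char *s)
{
    size_t len = strlen(s) + 1; /* include the terminating NUL */
    char *p = malloc(len);
    if (p == NULL)
        return NULL;
    memcpy(p, s, len);
    return p;
}

/* Small usage example exercising only the fixed variants. */
void check_fixed_variants(void)
{
    const int v[] = {1, 2, 3};
    assert(sum_fixed(v, 3) == 6);

    char *c = copy_fixed("gentoo");
    if (c != NULL) {
        assert(strcmp(c, "gentoo") == 0);
        free(c);
    }
}
```

The defective variants sit behind the macro so that a default build contains only well-defined code, while the analyzer can still be pointed at the buggy paths on demand.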
Abstract
Static analysis is a classical technique for improving software security and software quality in general. Fairly recently, a new static analyzer was implemented in the GNU Compiler Collection (GCC). The present paper uses GCC's analyzer to empirically examine popular Linux packages. The dataset is based on those packages in the Gentoo Linux distribution that are either written in C or contain C code. In total, 3,538 such packages are covered. According to the results, uninitialized variables and NULL pointer dereferences are the most common problems reported by the analyzer. Classical memory management issues are relatively rare. The warnings also follow a long-tailed probability distribution across the packages; a few packages are highly warning-prone, whereas no warnings are present for as much as 89% of the packages. Furthermore, the warnings do not vary across…
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Distributed and Parallel Computing Systems
