Detecting Multi-Parameter Constraint Inconsistencies in Python Data Science Libraries
Xiufeng Xu, Fuman Xie, Chenguang Zhu, Guangdong Bai, Sarfraz Khurshid, Yi Li

TL;DR
This paper introduces MPDetector, a tool that uses symbolic execution and large language models to detect inconsistencies between code and documentation in data science libraries, focusing on multi-parameter constraints. It achieves 92.8% precision.
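To make "multi-parameter constraint inconsistency" concrete, here is a minimal sketch under stated assumptions. The function `fit`, its parameters `penalty` and `solver`, and the documented rule are all hypothetical, not taken from any real library or from the paper. MPDetector explores execution paths symbolically; this sketch instead uses brute-force enumeration over a tiny finite parameter domain, which only works when every parameter has a handful of possible values.

```python
from itertools import product


# Hypothetical API (not from any real library): the code rejects one
# specific combination of two parameters.
def fit(penalty: str = "l2", solver: str = "exact") -> str:
    if penalty == "l1" and solver == "fast":
        raise ValueError("solver='fast' does not support penalty='l1'")
    return f"fitted with penalty={penalty}, solver={solver}"


# Hypothetical documentation constraint, as it might be extracted from the
# docstring: "penalty='elasticnet' is only supported by solver='exact'".
def doc_allows(penalty: str, solver: str) -> bool:
    return not (penalty == "elasticnet" and solver != "exact")


# Code-level behaviour, observed by actually calling the function.
def code_allows(penalty: str, solver: str) -> bool:
    try:
        fit(penalty=penalty, solver=solver)
    except ValueError:
        return False
    return True


PENALTIES = ["l1", "l2", "elasticnet"]
SOLVERS = ["exact", "fast"]


def find_inconsistencies() -> list[tuple[str, str, bool, bool]]:
    """Enumerate every parameter combination and keep those where code and docs disagree."""
    mismatches = []
    for penalty, solver in product(PENALTIES, SOLVERS):
        code_ok = code_allows(penalty, solver)
        doc_ok = doc_allows(penalty, solver)
        if code_ok != doc_ok:
            mismatches.append((penalty, solver, code_ok, doc_ok))
    return mismatches


if __name__ == "__main__":
    for penalty, solver, code_ok, doc_ok in find_inconsistencies():
        print(f"penalty={penalty!r}, solver={solver!r}: code allows={code_ok}, docs allow={doc_ok}")
```

In this toy setup the mismatch runs both ways: the code rejects `penalty="l1"` with `solver="fast"`, which the documentation never mentions, while the documentation forbids `penalty="elasticnet"` with `solver="fast"`, which the code silently accepts.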
Contribution
The paper presents MPDetector, a novel approach combining symbolic execution and LLMs with fuzzy logic to identify multi-parameter constraint inconsistencies in APIs.
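The summary does not say how fuzzy logic enters the pipeline. One plausible reading, sketched here purely as an assumption, treats a documentation constraint that an LLM extracts over several sampled runs as a fuzzy proposition whose truth degree is the share of runs that assert it. That degree is then combined with the crisp code-level result using the standard Zadeh operators (minimum for AND, 1 - x for NOT). The helper names `vote_degree`, `inconsistency_degree`, and `should_report`, as well as the 0.5 threshold, are hypothetical and are not the paper's formulation.

```python
from typing import Sequence


def fuzzy_and(a: float, b: float) -> float:
    """Zadeh conjunction: the minimum of two truth degrees."""
    return min(a, b)


def fuzzy_not(a: float) -> float:
    """Zadeh negation: the complement of a truth degree."""
    return 1.0 - a


def vote_degree(votes: Sequence[bool]) -> float:
    """Hypothetical membership: share of LLM samples asserting the doc constraint."""
    if not votes:
        raise ValueError("need at least one LLM sample")
    return sum(votes) / len(votes)


def inconsistency_degree(code_rejects: bool, doc_forbids_votes: Sequence[bool]) -> float:
    """Degree to which 'the code rejects the call AND the docs permit it' holds."""
    docs_permit = fuzzy_not(vote_degree(doc_forbids_votes))
    return fuzzy_and(float(code_rejects), docs_permit)


def should_report(code_rejects: bool, doc_forbids_votes: Sequence[bool], threshold: float = 0.5) -> bool:
    """Report an inconsistency only when its fuzzy degree exceeds the (assumed) threshold."""
    return inconsistency_degree(code_rejects, doc_forbids_votes) > threshold
```

Suppose the code rejects a parameter combination and only one of five LLM samples reads the documentation as forbidding it. The degree is then min(1.0, 0.8) = 0.8, so the mismatch is reported. If four of five samples read a prohibition, the degree drops to about 0.2, and the mismatch is suppressed as a likely disagreement in extraction rather than a real inconsistency. This sketch scores only one direction (code rejects, docs permit); the reverse direction would be handled symmetrically.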
Findings
Achieves 92.8% precision in detecting inconsistencies.
Successfully identified 14 issues, with 11 confirmed by developers.
Constructed datasets from four popular data science libraries.
Abstract
Modern AI- and data-intensive software systems rely heavily on data science and machine learning libraries that provide essential algorithmic implementations and computational frameworks. These libraries expose complex APIs whose correct usage must satisfy constraints among multiple interdependent parameters. Developers using these APIs are expected to learn about these constraints from the provided documentation, and any discrepancy may lead to unexpected behavior. However, maintaining correct and consistent multi-parameter constraints in API documentation remains a significant challenge for API compatibility and reliability. To address this challenge, we propose MPDetector, a tool for detecting inconsistencies between code and documentation, specifically focusing on multi-parameter constraints. MPDetector identifies these constraints at the code level by exploring execution paths through…
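As a rough illustration of identifying constraints "at the code level by exploring execution paths", the following sketch uses Python's standard `ast` module (Python 3.9+ for `ast.unparse`). For each `raise`, it collects the conjunction of the `if` conditions that guard it. This is a purely syntactic approximation, not symbolic execution and not MPDetector's implementation. It does not track variable reassignment, early returns, loops, or exception handlers, so the collected conditions only approximate the true path conditions. The analyzed `fit` source is hypothetical.

```python
import ast
import textwrap

# Hypothetical library function to analyze (not from any real library).
SOURCE = textwrap.dedent('''
    def fit(penalty="l2", solver="exact", l1_ratio=None):
        if penalty == "l1":
            if solver == "fast":
                raise ValueError("solver fast does not support penalty l1")
        if penalty == "elasticnet" and l1_ratio is None:
            raise ValueError("l1_ratio is required for elasticnet")
        return "ok"
''')


def raise_conditions(source: str) -> list[str]:
    """For each `raise`, return the conjunction of the `if` tests guarding it."""
    results: list[str] = []

    def visit(node: ast.AST, guards: list[str]) -> None:
        if isinstance(node, ast.Raise):
            results.append(" and ".join(guards) if guards else "True")
            return
        if isinstance(node, ast.If):
            cond = ast.unparse(node.test)
            for child in node.body:  # then-branch: the test held
                visit(child, guards + [f"({cond})"])
            for child in node.orelse:  # else/elif-branch: the test failed
                visit(child, guards + [f"not ({cond})"])
            return
        for child in ast.iter_child_nodes(node):
            visit(child, guards)

    visit(ast.parse(source), [])
    return results


if __name__ == "__main__":
    for condition in raise_conditions(SOURCE):
        print(condition)
```

Each returned string is a candidate multi-parameter constraint. For example, `(penalty == 'l1') and (solver == 'fast')` marks a combination the code rejects. Candidates like this could then be compared against constraints extracted from the documentation.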
Taxonomy
Topics: Scientific Computing and Data Management · Computational Physics and Python Applications · Time Series Analysis and Forecasting
