FailureSensorIQ: A Multi-Choice QA Dataset for Understanding Sensor Relationships and Failure Modes
Christodoulos Constantinides, Dhaval Patel, Shuxin Lin, Claudio Guerrero, Sunil Dagajirao Patil, Jayant Kalagnanam

TL;DR
FailureSensorIQ introduces a multi-choice QA benchmark to evaluate large language models' ability to understand and reason about sensor data and failure modes in industrial settings, revealing strengths and weaknesses of current models.
Contribution
This work presents a novel MCQA benchmark, FailureSensorIQ, for assessing LLMs' reasoning about industrial sensor data and failure modes, along with analysis tools and a feature selection pipeline.
Findings
Closed-source models like GPT-4 approach expert-level performance.
Models show performance drops when faced with perturbations and distractions.
Current LLMs show significant knowledge gaps and fragile reasoning.
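The perturbation finding is easier to see with a concrete case. Below is a minimal sketch of one simple perturbation for multi-choice QA: cyclically rotating the answer options and checking whether accuracy holds. A model that answers by content is unaffected, while a position-biased one collapses. The items, both toy models, and the `rotate_options` helper are hypothetical illustrations. They are not the benchmark's actual perturbation suite or scoring code.

```python
from typing import Callable, List, Tuple

# Hypothetical MCQA item: (question, answer options, index of the correct option).
Item = Tuple[str, List[str], int]
Model = Callable[[str, List[str]], int]


def rotate_options(item: Item, k: int) -> Item:
    """Cyclically rotate the options left by k positions and remap the gold index."""
    question, options, gold = item
    n = len(options)
    k %= n
    rotated = options[k:] + options[:k]
    return question, rotated, (gold - k) % n


def accuracy(model: Model, items: List[Item]) -> float:
    """Fraction of items on which the model selects the gold option."""
    hits = sum(model(q, opts) == gold for q, opts, gold in items)
    return hits / len(items)


def position_biased_model(question: str, options: List[str]) -> int:
    # Hypothetical degenerate model: always answers the first option.
    return 0


KNOWN_ANSWERS = {"Vibration", "Discharge temperature"}


def content_model(question: str, options: List[str]) -> int:
    # Hypothetical model that picks by content, so option order should not matter.
    for i, opt in enumerate(options):
        if opt in KNOWN_ANSWERS:
            return i
    return 0


ITEMS: List[Item] = [
    ("Which sensor best tracks bearing wear in an electric motor?",
     ["Vibration", "Door status", "Paint color", "Serial number"], 0),
    ("Which sensor best indicates compressor overheating?",
     ["Discharge temperature", "Door status", "Paint color", "Serial number"], 0),
]

if __name__ == "__main__":
    # Compare both toy models under every rotation of the answer options.
    for k in range(4):
        perturbed = [rotate_options(it, k) for it in ITEMS]
        print(k, accuracy(position_biased_model, perturbed), accuracy(content_model, perturbed))
```

Rotation is a deterministic stand-in for random option shuffling. Because every item is scored under all positions of the gold answer, pure position bias cannot pass as knowledge.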
Abstract
We introduce FailureSensorIQ, a novel Multi-Choice Question-Answering (MCQA) benchmarking system designed to assess the ability of Large Language Models (LLMs) to reason about and understand complex, domain-specific scenarios in Industry 4.0. Unlike traditional QA benchmarks, our system focuses on multiple aspects of reasoning through failure modes, sensor data, and the relationships between them across various industrial assets. Through this work, we envision a paradigm shift in which modeling decisions are not only data-driven, using statistical tools like correlation analysis and significance tests, but also domain-driven, informed by specialized LLMs that can reason about the key contributors and useful patterns that feature engineering can capture. We evaluate the industrial knowledge of over a dozen LLMs, including GPT-4, Llama, and Mistral, on FailureSensorIQ from different lenses using…
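The abstract contrasts domain-driven reasoning with data-driven feature selection based on correlation analysis and significance tests. Below is a minimal sketch of that data-driven side. It assumes a single binary failure label and a few numeric sensor columns, ranks each sensor by its Pearson (point-biserial) correlation with the label, and attaches a two-sided permutation-test p-value. The sensor names, the synthetic readings, and the `rank_sensors` helper are hypothetical. This is not the paper's feature selection pipeline.

```python
import math
import random
from typing import Dict, List, Tuple


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / (sx * sy)


def permutation_p_value(x: List[float], y: List[float],
                        n_perm: int = 2000, seed: int = 0) -> float:
    """Two-sided permutation test of the null hypothesis of zero correlation."""
    rng = random.Random(seed)
    observed = abs(pearson(x, y))
    y_perm = list(y)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)  # break any real sensor/label association
        if abs(pearson(x, y_perm)) >= observed:
            extreme += 1
    return (extreme + 1) / (n_perm + 1)  # add-one correction avoids p = 0


def rank_sensors(readings: Dict[str, List[float]],
                 failed: List[int]) -> List[Tuple[str, float, float]]:
    """Return (sensor, r, p) rows sorted by |r| with the failure label, descending."""
    label = [float(v) for v in failed]
    rows = [(name, pearson(vals, label), permutation_p_value(vals, label))
            for name, vals in readings.items()]
    return sorted(rows, key=lambda row: abs(row[1]), reverse=True)


# Synthetic, hypothetical readings: one row per observation window.
READINGS = {
    "vibration_rms": [0.20, 0.22, 0.21, 0.80, 0.85, 0.90, 0.19, 0.88],
    "ambient_temp": [21.0, 22.5, 20.8, 21.7, 22.1, 20.9, 21.4, 22.0],
}
FAILED = [0, 0, 0, 1, 1, 1, 0, 1]

if __name__ == "__main__":
    for name, r, p in rank_sensors(READINGS, FAILED):
        print(f"{name}: r={r:.3f}, p={p:.3f}")
```

A purely statistical ranking like this can only surface associations that are present in the data. The domain-driven view described above would add knowledge, such as which sensors physically relate to a given failure mode, that the data alone may not reveal.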
Taxonomy
Topics: Text Readability and Simplification · Topic Modeling · Machine Learning in Materials Science
Methods: Absolute Position Encodings · Layer Normalization · Byte Pair Encoding · Label Smoothing · Softmax · Dropout · Dense Connections · Transformer · GPT-4 · Feature Selection
