The FIX Benchmark: Extracting Features Interpretable to eXperts
Helen Jin, Shreya Havaldar, Chaehyeon Kim, Anton Xue, Weiqiu You, Helen Qu, Marco Gatti, Daniel A Hashimoto, Bhuvnesh Jain, Amin Madani, Masao Sako, Lyle Ungar, Eric Wong

TL;DR
The paper introduces FIX, a benchmark and scoring method to evaluate how well automatically extracted features align with expert knowledge across various domains and data types.
Contribution
It proposes FIXScore, a new measure for assessing feature interpretability aligned with expert understanding, and demonstrates the limitations of existing explanation methods.
Findings
Popular explanation methods show poor alignment with expert knowledge.
FIXScore effectively measures the interpretability of feature collections.
The benchmark spans multiple domains and data modalities.
Abstract
Feature-based methods are commonly used to explain model predictions, but these methods often implicitly assume that interpretable features are readily available. However, this is often not the case for high-dimensional data, and it can be hard even for domain experts to mathematically specify which features are important. Can we instead automatically extract collections or groups of features that are aligned with expert knowledge? To address this gap, we present FIX (Features Interpretable to eXperts), a benchmark for measuring how well a collection of features aligns with expert knowledge. In collaboration with domain experts, we propose FIXScore, a unified expert alignment measure applicable to diverse real-world settings across cosmology, psychology, and medicine domains in vision, language, and time series data modalities. With FIXScore, we find that popular feature-based…
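The abstract does not spell out how FIXScore is computed, so the following is only an illustrative sketch of one way a per-group expert alignment score could be aggregated into a single number for a collection of feature groups. Everything in it is an assumption made for illustration, not the authors' definition: the function names `fix_score_sketch` and `jaccard_align`, the choice to average over the groups containing each feature, and the use of Jaccard overlap as a stand-in for an expert alignment measure.

```python
from typing import Callable, List, Set


def jaccard_align(group: Set[int], expert_group: Set[int]) -> float:
    """Hypothetical alignment measure: Jaccard overlap between a proposed
    feature group and an expert-annotated group (an assumption, not the
    paper's per-dataset metric)."""
    union = group | expert_group
    if not union:
        return 0.0
    return len(group & expert_group) / len(union)


def fix_score_sketch(
    groups: List[Set[int]],
    num_features: int,
    expert_align: Callable[[Set[int]], float],
) -> float:
    """Aggregate per-group alignment into one score (illustrative only).

    For each feature index, average the alignment of every proposed group
    that contains it, then average over all features. Features covered by
    no group contribute 0.
    """
    if num_features <= 0:
        raise ValueError("num_features must be positive")
    per_feature = []
    for i in range(num_features):
        containing = [g for g in groups if i in g]
        if not containing:
            per_feature.append(0.0)
            continue
        per_feature.append(sum(expert_align(g) for g in containing) / len(containing))
    return sum(per_feature) / num_features


# Toy usage: four features, two proposed groups, one expert-annotated group.
expert = {0, 1}
proposed = [{0, 1}, {2, 3}]
score = fix_score_sketch(proposed, 4, lambda g: jaccard_align(g, expert))
```

In this toy case the first proposed group matches the expert group exactly and the second does not overlap it at all, so half of the features receive full alignment and half receive none.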
Peer Reviews
Decision: ICLR 2025 Conference Withdrawn Submission
- The paper presents a novel benchmark to measure the alignment of machine-extracted features with expert knowledge, which is valuable for guiding future research on general-purpose, automated expert feature extraction.
- The paper employs datasets from various domains, including tabular, text, and vision data, and evaluates various feature extraction methods, including both domain-specific and domain-agnostic techniques.
- Although the paper uses five datasets for experimentation, it lacks a clear rationale for selecting these specific datasets.
- The structure of the paper could be improved: the results and their discussion are very brief in the main paper and lack interpretability. The paper could condense the dataset introduction and add more discussion of the results to better demonstrate the insights.
- There is an inconsistency in line 478, where the paper mentions “three segme
- The problem the paper tries to address is important.
- The datasets cover different domains.
- The formulation of the ExpertAlign metric for various datasets appears somewhat ad hoc, lacking clear justification for the chosen metrics. Why were these particular metrics selected over other potential designs, and how do they relate to Formula 2?
- If the proposed metrics were part of a broader methodological paper as a way to evaluate the primary method, they would likely be suitable. However, as a standalone contribution, they seem insufficient.
1. The proposed framework provides an important benchmark encompassing six diverse real-world settings and three different data modalities, and unifies them using the introduced expert alignment measure, FIXScore.
2. The authors provide a detailed account in the Appendix of how they calculate the scores for each dataset.
3. The authors raise an important problem of grounding feature attribution in explainable artificial intelligence to knowledge from domain experts.
1. *"We propose FIXScore, a unified expert alignment measure ... With FIXScore, we find that popular feature-based explanation methods have poor alignment with expert-specified knowledge, highlighting the need for new methods to better identify features interpretable to experts."* -- Are the authors calling on the community to develop new feature attribution explanation methods? If yes, how do FIXScore or expert features help in this regard? Most classification models are trained on certa
1. The evaluation of feature-based explanations with respect to interpretability is an important topic, where human-annotated ground-truth explanations specifically tailored to a given task and domain are an important tool to evaluate approaches.
2. The proposed datasets cover multiple domains, and the proposal of a single score for each task is good for comparing methods. The applications seem interesting, although I cannot judge the quality of any of the proposed expert features.
3. The pap
1. **Lack of popular feature attribution methods:** The paper proposes a benchmark that has the potential to improve explanations based on groups of features. However, the benchmarking of methods is significantly under-developed. The paper claims (e.g. in the abstract) that feature-based explanations fall short in identifying the given "expert features". This claim is not sufficiently covered by the experiments. The authors barely evaluate feature-based explanations. In the experiments, from
Taxonomy
Topics: Software Engineering Research
