Interpretable DNA Sequence Classification via Dynamic Feature Generation in Decision Trees

Nicolas Huynh; Krzysztof Kacprzyk; Ryan Sheridan; David Bentley; Mihaela van der Schaar

arXiv:2604.12060·cs.LG·April 15, 2026

TL;DR

DEFT is a new decision tree framework that uses language models to generate interpretable, high-level DNA features, improving predictive accuracy and interpretability in genomic analysis.

Contribution

DEFT uses language models to generate adaptive, biologically informed features during decision tree construction, improving both interpretability and predictive performance.
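To make "feature generation during tree construction" more concrete, here is a minimal Python sketch of a single split step. It rests on several assumptions: candidate features are boolean tests on the whole sequence, splits are scored by weighted Gini impurity, and `propose_features` is a hypothetical stand-in that returns a fixed list instead of querying a language model. It is not DEFT's actual procedure.

```python
from collections import Counter
from typing import Callable

# Hypothetical feature representation: a readable name plus a boolean test on a sequence.
Feature = tuple[str, Callable[[str], bool]]


def gini(labels: list[int]) -> float:
    """Gini impurity of a list of class labels (0.0 for an empty list)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def split_impurity(seqs: list[str], labels: list[int], rule: Callable[[str], bool]) -> float:
    """Size-weighted Gini impurity of the two children produced by a boolean test."""
    left = [y for s, y in zip(seqs, labels) if rule(s)]
    right = [y for s, y in zip(seqs, labels) if not rule(s)]
    n = len(labels)
    return len(left) / n * gini(left) + len(right) / n * gini(right)


def propose_features(seqs: list[str]) -> list[Feature]:
    """Hypothetical stand-in for the language-model proposal step.

    A real proposer would condition on the sequences reaching this node;
    this sketch ignores them and returns fixed, human-readable candidates.
    """
    return [
        ("contains TATAAA", lambda s: "TATAAA" in s),
        ("GC content > 0.5", lambda s: (s.count("G") + s.count("C")) / len(s) > 0.5),
        ("starts with A", lambda s: s.startswith("A")),
    ]


def best_split(seqs: list[str], labels: list[int]) -> Feature:
    """Choose the proposed feature whose split yields the lowest weighted impurity."""
    candidates = propose_features(seqs)
    return min(candidates, key=lambda feat: split_impurity(seqs, labels, feat[1]))


# Usage on a tiny hypothetical node: label 1 marks sequences carrying the motif.
node_seqs = ["GGTATAAACC", "ATATAAAGCG", "GCGCGCATGC", "ATGCATGCAT"]
node_labels = [1, 1, 0, 0]
name, rule = best_split(node_seqs, node_labels)
print(name)  # contains TATAAA
```

Because each chosen split carries a readable name, the resulting tree can be read as a sequence of biological questions rather than as tests on individual nucleotide positions.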

Findings

1. DEFT discovers human-interpretable sequence features.
2. DEFT achieves high predictive accuracy across genomic tasks.
3. DEFT reduces tree depth compared to traditional methods.

Abstract

The analysis of DNA sequences has become critical in numerous fields, from evolutionary biology to understanding gene regulation and disease mechanisms. While deep neural networks can achieve remarkable predictive performance, they typically operate as black boxes. In contrast, axis-aligned decision trees offer a promising direction for interpretable DNA sequence analysis, yet they suffer from a fundamental limitation: considering individual raw features in isolation at each split limits their expressivity, which results in prohibitive tree depths that hinder both interpretability and generalization performance. We address this challenge by introducing DEFT, a novel framework that adaptively generates high-level sequence features during tree construction. DEFT leverages large language models to propose biologically informed features tailored to the local sequence…
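The expressivity limitation of axis-aligned splits can be illustrated with a toy example constructed purely for illustration (the sequences, the motif TATA, and the Gini criterion are assumptions, not taken from the paper). When the label depends on whether a motif occurs anywhere in the sequence, no single split on a raw one-hot feature ("base b at position i") separates the classes, whereas one high-level feature does.

```python
from collections import Counter
from itertools import product


def gini(labels: list[int]) -> float:
    """Gini impurity of a list of class labels (0.0 for an empty list)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def weighted_impurity(mask: list[bool], labels: list[int]) -> float:
    """Size-weighted Gini impurity of the two children defined by a boolean mask."""
    left = [y for m, y in zip(mask, labels) if m]
    right = [y for m, y in zip(mask, labels) if not m]
    n = len(labels)
    return len(left) / n * gini(left) + len(right) / n * gini(right)


def best_raw_split(seqs: list[str], labels: list[int]) -> tuple[float, int, str]:
    """Best single axis-aligned split over one-hot features 'base b at position i'."""
    best = None
    for i, b in product(range(len(seqs[0])), "ACGT"):
        score = weighted_impurity([s[i] == b for s in seqs], labels)
        if best is None or score < best[0]:
            best = (score, i, b)
    return best


# Toy data (hypothetical): positives contain TATA at a different offset each time.
positives = ["TATAGCGC", "GTATACGC", "GCTATAGC", "GCGTATAC", "GCGCTATA"]
negatives = ["GCGCGCGC", "CGCGCGCG", "ATATGCGC", "GCGCATAT", "TTAAGCGC"]
seqs = positives + negatives
labels = [1] * len(positives) + [0] * len(negatives)

raw_score, pos, base = best_raw_split(seqs, labels)
motif_score = weighted_impurity(["TATA" in s for s in seqs], labels)

print(f"best raw split: base {base} at position {pos}, impurity {raw_score:.3f}")
print(f"'contains TATA' split: impurity {motif_score:.3f}")  # impurity 0.000
```

Because the motif can sit at any offset, a tree restricted to raw positional features must stack several splits to approximate what the single "contains TATA" test expresses, which is the kind of depth growth the abstract describes.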