# Detecting statistical interactions in immune receptor data: a comparative study

**Authors:** Thomas Minotto, Ingrid Hobæk Haff, Enrico Riccardi, Geir K. Sandve

PMC · DOI: 10.1080/02664763.2025.2533483 · 2025-07-27

## TL;DR

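This paper compares logistic lasso, logic regression, random forest and neural network methods for detecting amino acid interactions in simulated and experimental immune receptor data. Pairwise interactions were retrieved from as few as 1000 sequences, and logic regression and random forest based methods performed best on higher-order interactions.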

## Contribution

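The study reviews interaction detection methods based on logistic lasso, logic regression, random forests and neural networks, and compares them on simulated immune receptor data. It evaluates how detection performance depends on the order of the interactions, their strength relative to the main effects, their frequency of occurrence and the size of the data, and it also reports running times.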

## Key findings

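- In simulated data, pairwise interactions were retrieved from as few as 1000 sequences, with optimal detection at an implantation rate of around 20%.
- Higher-order interactions were best detected by logic regression and random forest based methods.
- The running time of the neural network-based method was several orders of magnitude lower than that of the other methods, followed by the lasso-based methods.
- On an experimental dataset, the methods identified several pairwise interactions and a three-way interaction, which improved the accuracy of prediction models.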

## Abstract

Statistical interactions are part of numerous data generating processes and several methods have been developed to detect them. We here study immune receptors binding to antigens, where advanced machine learning techniques have proved useful for binding prediction, suggesting significant intra amino acid chain interactions. We reviewed detection methods based on logistic lasso, logic regression, random forests and neural networks. We compared detection performance in simulated immune data, and how it is affected by the order of interactions, their strength related to the main effects, their frequency of occurrence and the size of the data. Interactions were implanted as motifs of amino acids that determined the binding status of sequences through a logistic regression model. Results show that pairwise interactions were retrieved from just 1000 sequences in the dataset, and optimal detection happened for an implantation rate of around 20 percent. For higher-order interactions, the best performance was obtained by logic regression and random forest based methods. The running time for the neural network-based method was several orders of magnitude lower, followed by the lasso-based methods. We applied the methods on an experimental dataset and identified several pairwise interactions as well as a three-way interaction, enhancing the accuracy of prediction models.
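The simulation design described in the abstract (motifs implanted into sequences, with binding status drawn from a logistic regression model) can be sketched roughly as below. This is a minimal illustration and not the authors' code: the sequence length, motif positions and residues, coefficient values, and the function names `has_motif` and `simulate_binding_data` are all assumptions made for the example. Only the 1000-sequence dataset size and the 20% implantation rate echo figures reported in the abstract.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Hypothetical pairwise motif: residue "A" at position 2 and "W" at position 5.
DEFAULT_MOTIF = ((2, "A"), (5, "W"))


def has_motif(sequence, motif):
    """Return True if every (position, residue) pair of the motif is present."""
    return all(sequence[pos] == aa for pos, aa in motif)


def simulate_binding_data(n_sequences=1000, length=10, motif=DEFAULT_MOTIF,
                          implant_rate=0.2, intercept=-2.0, main_effect=0.5,
                          interaction_effect=3.0, seed=0):
    """Simulate sequences and binary binding labels.

    All coefficient values are illustrative assumptions. The label is drawn
    from a logistic model with one main effect per motif residue present and
    an extra interaction term when the full motif is present.
    """
    rng = random.Random(seed)
    sequences, labels = [], []
    for _ in range(n_sequences):
        # Start from a uniformly random amino acid sequence.
        seq = [rng.choice(AMINO_ACIDS) for _ in range(length)]
        # Implant the motif in a fraction of the sequences.
        if rng.random() < implant_rate:
            for pos, aa in motif:
                seq[pos] = aa
        seq = "".join(seq)

        # Linear predictor: intercept + main effects + interaction term.
        n_present = sum(seq[pos] == aa for pos, aa in motif)
        logit = intercept + main_effect * n_present
        if has_motif(seq, motif):
            logit += interaction_effect

        # Binding status is a Bernoulli draw from the logistic probability.
        p_bind = 1.0 / (1.0 + math.exp(-logit))
        labels.append(1 if rng.random() < p_bind else 0)
        sequences.append(seq)
    return sequences, labels


if __name__ == "__main__":
    seqs, labels = simulate_binding_data()
    carriers = [y for s, y in zip(seqs, labels) if has_motif(s, DEFAULT_MOTIF)]
    print("sequences carrying the motif:", len(carriers))
    print("binding rate among carriers:", sum(carriers) / len(carriers))
```

In a setup like this, the ratio of `interaction_effect` to `main_effect` plays the role of the interaction strength relative to the main effects, while `implant_rate` and `n_sequences` correspond to the frequency of occurrence and the data size that the study varies.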

## Figures

50 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12981274/full.md

---
Source: https://tomesphere.com/paper/PMC12981274