# Multi-task feature integration and interactive active learning for scene image resizing

**Authors:** Ludan Shi, Xianhua Yan, Sen Wang

PMC · DOI: 10.1038/s41598-025-98917-w · 2025-05-19

## TL;DR

This paper introduces an AI method for content-aware resizing (retargeting) of complex scene images that combines multi-channel visual features with modeled human gaze shift paths.

## Contribution

A novel approach that combines multi-task feature integration with locality-preserved, interactive active learning (LIAL) for adaptive scene image retargeting.

## Key findings

- In a user study, the proposed method outperforms five competing retargeting techniques.
- Compared with 17 popular visual recognizers, its precision exceeds that of the second-best performer by 3%.
- Its testing time is only 49.8% of that of the second-best performer.

## Abstract

In artificial intelligence (AI), recomposing the semantic segments of intricate scenes is a pivotal task. This study combines multi-channel perceptual visual features for the adaptive retargeting of images with complex spatial configurations. The key to our approach is an in-depth hierarchical model that precisely captures human gaze dynamics. Using the BING objectness measure, we quickly and accurately extract semantically and visually significant patches from each scene by detecting objects and their parts at multiple scales. We then introduce a multi-task feature selector that dynamically integrates multi-channel features across the different scene patches.

To capture how humans perceive and recognize the critical patches of a scene, we introduce locality-preserved and interactive active learning (LIAL), which incrementally constructs a gaze shift path (GSP) for each scene. LIAL has two main advantages: it efficiently preserves the local coherence of diverse scenes, and it lets human interaction shape the active selection process. Using LIAL, we derive a GSP for every scene and compute its deep features with a multi-layer aggregation algorithm. The deeply learned GSP representations are then encoded into a Gaussian mixture model (GMM), which serves as the basis for scene image retargeting.

Our empirical analyses confirm the effectiveness of the proposed method. In our user study, our retargeting outperformed the five competing methods. In addition, compared with 17 other popular visual recognizers, our method's precision exceeds that of the second-best performer by 3%, while its testing time is only 49.8% of that performer's.
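The abstract names the BING objectness measure as the patch extractor. In BING (Binarized Normed Gradients), each candidate window is resized to 8x8 and scored by a linear template applied to its 64-D normed-gradient (NG) feature, min(|g_x| + |g_y|, 255). The sketch below shows only that NG scoring step over a few scales, with simplifications stated up front: square scales only, nearest-neighbor resizing, no binary approximation of the template or features (the part that makes BING fast), and no per-scale score calibration. The template is a hand-made stand-in for BING's SVM-learned one, and `normed_gradient`, `resize_nearest`, and `propose_patches` are hypothetical names, not the authors' code.

```python
import numpy as np


def normed_gradient(gray: np.ndarray) -> np.ndarray:
    """BING-style normed gradient map: min(|gx| + |gy|, 255), [-1, 0, 1] filter."""
    g = gray.astype(np.float64)
    gx = np.zeros_like(g)
    gy = np.zeros_like(g)
    gx[:, 1:-1] = g[:, 2:] - g[:, :-2]
    gy[1:-1, :] = g[2:, :] - g[:-2, :]
    return np.minimum(np.abs(gx) + np.abs(gy), 255.0)


def resize_nearest(img: np.ndarray, out_h: int, out_w: int) -> np.ndarray:
    """Nearest-neighbor resize (a stand-in for a proper image resize)."""
    rows = (np.arange(out_h) * img.shape[0] / out_h).astype(int)
    cols = (np.arange(out_w) * img.shape[1] / out_w).astype(int)
    return img[rows[:, None], cols[None, :]]


def propose_patches(gray, template, sizes=(16, 32, 64), top_k=5):
    """Score every 8x8 NG window at several scales and return the top boxes.

    Each 8x8 window in an image resized to s x s corresponds to a larger box
    in the original image, so small s yields large patches and vice versa.
    Returns [(score, (x1, y1, x2, y2)), ...] sorted by descending score.
    """
    H, W = gray.shape
    candidates = []
    for s in sizes:
        ng = normed_gradient(resize_nearest(gray, s, s))
        windows = np.lib.stride_tricks.sliding_window_view(ng, (8, 8))
        scores = np.einsum("ijkl,kl->ij", windows, template)  # linear template score
        sy, sx = H / s, W / s
        flat_best = np.argsort(scores, axis=None)[::-1][:top_k]
        for i, j in zip(*np.unravel_index(flat_best, scores.shape)):
            box = (int(j * sx), int(i * sy), int((j + 8) * sx), int((i + 8) * sy))
            candidates.append((float(scores[i, j]), box))
    candidates.sort(key=lambda c: c[0], reverse=True)
    return candidates[:top_k]


# Hypothetical hand-made template: rewards gradient on the window border
# (a closed boundary) and penalizes gradient in the interior.
template = np.full((8, 8), -0.5)
template[0, :] = template[-1, :] = template[:, 0] = template[:, -1] = 1.0

# Synthetic 64x64 scene with one bright square "object".
img = np.zeros((64, 64))
img[24:40, 24:40] = 255.0
boxes = propose_patches(img, template)
for score, box in boxes:
    print(f"{score:8.1f}  {box}")
```

Scanning a small resized image yields large patches in the original, and scanning a large one yields small patches; this is how a single 8x8 template can cover both whole objects and object parts across scales.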
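The abstract does not specify how the multi-task feature selector works. One common formulation of multi-task feature selection, used here purely as an illustrative assumption, is l2,1-norm regularized least squares: each column of Y is one task, and the penalty keeps or discards each feature (a row of W) jointly across all tasks. The sketch below solves it by iterative reweighting; `l21_feature_selection` and the synthetic data are hypothetical and should not be read as the authors' selector.

```python
import numpy as np


def l21_feature_selection(X, Y, gamma=1.0, iters=50, eps=1e-8):
    """Minimize ||X W - Y||_F^2 + gamma * ||W||_{2,1} by iterative reweighting.

    X: (n_samples, n_features); Y: (n_samples, n_tasks), one column per task.
    The l2,1 norm sums the l2 norms of the rows of W, so it drives whole rows
    (features) to zero jointly across all tasks.
    """
    W = np.linalg.lstsq(X, Y, rcond=None)[0]  # least-squares warm start
    for _ in range(iters):
        row_norms = np.linalg.norm(W, axis=1) + eps
        D = np.diag(1.0 / (2.0 * row_norms))
        # Stationarity condition of the reweighted problem: (X^T X + gamma D) W = X^T Y
        W = np.linalg.solve(X.T @ X + gamma * D, X.T @ Y)
    return W


# Hypothetical data: 10 feature channels, 3 tasks, only channels 0-2 are informative.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
B = np.array([[1.5, -1.0, 2.0],
              [-2.0, 1.0, 0.5],
              [1.0, 2.0, -1.5]])
Y = X[:, :3] @ B + 0.1 * rng.normal(size=(200, 3))

W = l21_feature_selection(X, Y, gamma=5.0)
ranking = np.argsort(-np.linalg.norm(W, axis=1))
selected = sorted(int(i) for i in ranking[:3])
print("selected feature channels:", selected)
```

Ranking features by the row norms of W gives a single ordering shared by all tasks, which is the property that distinguishes multi-task selection from running a separate selector per task.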
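The abstract says the deeply learned GSP representations are encoded into a GMM that serves as the basis for retargeting, without giving details. A minimal sketch, assuming each GSP has already been reduced to a fixed-length feature vector and that retargeting candidates are ranked by the likelihood of their GSP features under a GMM fitted on reference scenes: it fits a diagonal-covariance GMM with EM in NumPy and picks the most likely candidate. The ranking rule, the diagonal covariance, the data, and all function names are assumptions for illustration.

```python
import numpy as np


def component_log_density(X, means, variances):
    """log N(x | mu_j, diag(var_j)) for every row x of X and component j -> (n, k)."""
    diff = X[:, None, :] - means[None, :, :]
    mahal = np.sum(diff**2 / variances[None, :, :], axis=2)
    log_det = np.sum(np.log(2.0 * np.pi * variances), axis=1)
    return -0.5 * (mahal + log_det[None, :])


def logsumexp_rows(a):
    """Numerically stable log(sum(exp(a), axis=1))."""
    m = a.max(axis=1, keepdims=True)
    return (m + np.log(np.exp(a - m).sum(axis=1, keepdims=True))).ravel()


def fit_diag_gmm(X, k, iters=100, seed=0, eps=1e-6):
    """Fit a diagonal-covariance GMM with EM; returns (weights, means, variances)."""
    rng = np.random.default_rng(seed)
    n, _ = X.shape
    means = X[rng.choice(n, size=k, replace=False)].copy()
    variances = np.tile(X.var(axis=0) + eps, (k, 1))
    weights = np.full(k, 1.0 / k)
    for _ in range(iters):
        # E-step: posterior responsibility of each component for each sample.
        log_p = component_log_density(X, means, variances) + np.log(weights)
        resp = np.exp(log_p - logsumexp_rows(log_p)[:, None])
        # M-step: re-estimate weights, means and per-dimension variances.
        nk = resp.sum(axis=0) + eps
        weights = nk / n
        means = (resp.T @ X) / nk[:, None]
        variances = np.maximum((resp.T @ X**2) / nk[:, None] - means**2, 0.0) + eps
    return weights, means, variances


def gmm_log_likelihood(X, weights, means, variances):
    """Per-row log-likelihood log p(x) under the fitted mixture."""
    return logsumexp_rows(component_log_density(X, means, variances) + np.log(weights))


# Hypothetical 4-D GSP feature vectors from reference scenes (two clusters).
rng = np.random.default_rng(42)
train = np.vstack([rng.normal(0.0, 1.0, (100, 4)), rng.normal(5.0, 1.0, (100, 4))])
params = fit_diag_gmm(train, k=2)

# GSP features of two hypothetical retargeting candidates for one scene.
candidates = np.array([[0.1, -0.2, 0.0, 0.3],      # close to a reference mode
                       [20.0, 20.0, 20.0, 20.0]])  # far from both modes
scores = gmm_log_likelihood(candidates, *params)
best = int(np.argmax(scores))
print(scores, "-> pick candidate", best)
```

Diagonal covariances keep each EM step cheap and stable for moderately high-dimensional features; a full-covariance mixture (for example scikit-learn's `GaussianMixture`) would be a natural drop-in replacement.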

## Full-text entities

- **Species:** Homo sapiens (human, species) [taxon 9606]

## Figures

8 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12089501/full.md

---
Source: https://tomesphere.com/paper/PMC12089501