# Images Versus Videos in Contrast-Enhanced Ultrasound for Computer-Aided Diagnosis

**Authors:** Marina Adriana Mercioni, Cătălin Daniel Căleanu, Mihai-Eronim-Octavian Ursan

PMC · DOI: 10.3390/s25196247 · Sensors (Basel, Switzerland) · 2025-10-09

## TL;DR

This paper compares image-based and video-based analysis of contrast-enhanced ultrasound for diagnosing focal liver lesions, with a focus on Transformer-based models.

## Contribution

The study introduces a Hybrid Transformer Neural Network (HTNN) for image-based diagnosis and evaluates three video-based models (a custom spatio-temporal CNN, a CNN with LSTM, and a Video Vision Transformer) for automated liver lesion classification.

## Key findings

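- The HTNN achieved 97.77% accuracy in classifying focal liver lesions from images, but required manual region-of-interest selection.
- The video-based models achieved 83% (spatio-temporal CNN), 88% (CNN with LSTM), and 88% (ViViT) accuracy without requiring manual region-of-interest selection.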
- Transformer-based models show potential for automated diagnosis by capturing subtle differences between lesion types.

## Abstract

Diagnosing focal liver lesions (FLLs) with contrast-enhanced ultrasound (CEUS) relies on integrating spatial and temporal information. Traditional computer-aided diagnosis (CAD) systems predominantly rely on static images, which limits the characterization of lesion dynamics. This study assesses the effectiveness of Transformer-based architectures in enhancing CAD performance for liver pathology. The methodology involved a systematic comparison of deep learning models for the analysis of CEUS images and videos. For image-based classification, a Hybrid Transformer Neural Network (HTNN) was employed, which combines Vision Transformer (ViT) modules with lightweight convolutional features. For video-based tasks, we evaluated a custom spatio-temporal Convolutional Neural Network (CNN), a CNN with Long Short-Term Memory (LSTM), and a Video Vision Transformer (ViViT). The experimental results show that the HTNN achieved an accuracy of 97.77% in classifying various types of FLLs, although it required manual selection of the region of interest (ROI). The video-based models produced accuracies of 83%, 88%, and 88%, respectively, without the need for ROI selection. The findings indicate that Transformer-based models exhibit high accuracy in CEUS-based liver diagnosis. This study highlights the potential of attention mechanisms to identify subtle inter-class differences, thereby reducing the reliance on manual intervention.
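
The practical difference between the two input pathways described in the abstract can be illustrated with a minimal sketch. It is not the authors' preprocessing pipeline: the function names `crop_roi` and `sample_frames`, the `(top, left, height, width)` ROI convention, and the uniform frame-sampling strategy are all assumptions made for illustration, and frames are represented as plain nested lists rather than real CEUS data.

```python
from typing import List, Tuple

# Hypothetical representation: one grayscale frame as rows of pixel intensities.
Frame = List[List[float]]


def crop_roi(frame: Frame, roi: Tuple[int, int, int, int]) -> Frame:
    """Image pathway (assumed): crop a manually chosen ROI from one frame.

    roi is (top, left, height, width) in pixels.
    """
    top, left, height, width = roi
    if top < 0 or left < 0 or height <= 0 or width <= 0:
        raise ValueError("ROI must have a non-negative origin and positive size")
    if top + height > len(frame) or left + width > len(frame[0]):
        raise ValueError("ROI extends beyond the frame")
    return [row[left:left + width] for row in frame[top:top + height]]


def sample_frames(video: List[Frame], num_frames: int) -> List[Frame]:
    """Video pathway (assumed): keep evenly spaced full frames, no ROI needed."""
    if num_frames <= 0 or num_frames > len(video):
        raise ValueError("num_frames must be between 1 and len(video)")
    if num_frames == 1:
        return [video[0]]
    # Spread the selected indices from the first frame to the last one.
    step = (len(video) - 1) / (num_frames - 1)
    return [video[round(i * step)] for i in range(num_frames)]
```

In this sketch the image pathway depends on an operator supplying `roi`, whereas the video pathway consumes whole frames and only needs a frame count, which mirrors the trade-off the abstract reports between the higher image-based accuracy and the ROI-free video-based models.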

## Full-text entities

- **Diseases:** FLLs (MESH:D008107)

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12527036/full.md

## Figures

11 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12527036/full.md

## References

96 references — full list in the complete paper: https://tomesphere.com/paper/PMC12527036/full.md

---
Source: https://tomesphere.com/paper/PMC12527036