# Topic-Modeling Guided Semantic Clustering for Enhancing CNN-Based Image Classification Using Scale-Invariant Feature Transform and Block Gabor Filtering

**Authors:** Natthaphong Suthamno, Jessada Tanthanuch

PMC · DOI: 10.3390/jimaging12020070 · 2026-02-09

## TL;DR

This paper proposes an image classification framework that groups images into semantic clusters with topic modeling before CNN training, trains a specialized CNN for each cluster, and routes each test image to the matching model.

## Contribution

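The framework applies Latent Dirichlet Allocation to Bag of Words histograms built from SIFT or BGF descriptors, trains cluster-specific CNNs on the resulting image clusters, and introduces two topic-guided strategies (MPT and WPT) for assigning test images to those models.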

## Key findings

- Semantic clustering via LDA topic modeling improves CNN classification accuracy over non-clustered CNN models.
- The SIFT pipeline reaches the highest accuracy, 95.24%, with the MPT strategy, outperforming both the non-clustered CNNs and the Incremental PCA + K-means baseline.
- The BGF pipeline reaches 93.76% with the WPT strategy, also surpassing the non-clustered models.
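
The two routing strategies named above, Maximum Proportion Topic (MPT) and Weight Proportion Topic (WPT), can be illustrated with a small sketch. The abstract does not give their formulas, so the code below is an assumed reading based on the names alone: MPT sends the image to the single cluster model whose topic has the largest proportion, while WPT blends all cluster models, weighting each by its topic proportion. The function names and the toy numbers are hypothetical.

```python
def mpt_predict(topic_props, cluster_probs):
    """Assumed MPT: use only the model of the topic with the largest proportion."""
    best_topic = max(range(len(topic_props)), key=lambda t: topic_props[t])
    probs = cluster_probs[best_topic]
    return max(range(len(probs)), key=lambda c: probs[c])


def wpt_predict(topic_props, cluster_probs):
    """Assumed WPT: blend every cluster model, weighted by its topic proportion."""
    n_classes = len(cluster_probs[0])
    blended = [
        sum(w * probs[c] for w, probs in zip(topic_props, cluster_probs))
        for c in range(n_classes)
    ]
    return max(range(n_classes), key=lambda c: blended[c])


# Toy example: topic proportions of one test image (e.g. from LDA) ...
topic_props = [0.45, 0.35, 0.20]
# ... and the class probabilities each cluster-specific CNN would output.
cluster_probs = [
    [0.55, 0.45, 0.00],  # model trained on cluster/topic 0
    [0.10, 0.90, 0.00],  # model trained on cluster/topic 1
    [0.10, 0.80, 0.10],  # model trained on cluster/topic 2
]

mpt_class = mpt_predict(topic_props, cluster_probs)
wpt_class = wpt_predict(topic_props, cluster_probs)
```

With these toy numbers the two strategies disagree: the dominant topic's model narrowly prefers class 0, whereas the weighted blend prefers class 1. Under this reading, that is how one strategy can suit a pipeline better than the other.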

## Abstract

This study proposes a topic-modeling guided framework that enhances image classification by introducing semantic clustering prior to CNN training. Images are processed through two key-point extraction pipelines: Scale-Invariant Feature Transform (SIFT) with Sobel edge detection and Block Gabor Filtering (BGF), to obtain local feature descriptors. These descriptors are clustered using K-means to build a visual vocabulary. Bag of Words histograms then represent each image as a visual document. Latent Dirichlet Allocation is applied to uncover latent semantic topics, generating coherent image clusters. Cluster-specific CNN models, including AlexNet, GoogLeNet, and several ResNet variants, are trained under identical conditions to identify the most suitable architecture for each cluster. Two topic guided integration strategies, the Maximum Proportion Topic (MPT) and the Weight Proportion Topic (WPT), are then used to assign test images to the corresponding specialized model. Experimental results show that both the SIFT-based and BGF-based pipelines outperform non-clustered CNN models and a baseline method using Incremental PCA, K-means, Same-Cluster Prediction, and unweighted Ensemble Voting. The SIFT pipeline achieves the highest accuracy of 95.24% with the MPT strategy, while the BGF pipeline achieves 93.76% with the WPT strategy. These findings confirm that semantic structure introduced through topic modeling substantially improves CNN classification performance.
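
The vocabulary-building and Bag of Words steps described in the abstract can be sketched in a few lines. This is a minimal, dependency-free illustration and not the authors' implementation: it uses a plain Lloyd's K-means, 2-D toy points in place of real SIFT or BGF descriptors, and hypothetical function names (`kmeans`, `bow_histogram`).

```python
import random


def nearest(vec, centroids):
    """Index of the centroid closest to vec (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda k: sum((a - b) ** 2 for a, b in zip(vec, centroids[k])),
    )


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's algorithm; the k centroids act as the visual vocabulary."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        buckets = [[] for _ in range(k)]
        for p in points:
            buckets[nearest(p, centroids)].append(p)
        for j, members in enumerate(buckets):
            if members:  # keep the old centroid if a cluster goes empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def bow_histogram(descriptors, centroids):
    """Count how many descriptors of one image map to each visual word."""
    hist = [0] * len(centroids)
    for d in descriptors:
        hist[nearest(d, centroids)] += 1
    return hist


# Toy stand-in for local descriptors: 2-D points instead of real feature vectors.
rng = random.Random(1)


def blob(cx, cy, n):
    return [(cx + rng.gauss(0, 0.1), cy + rng.gauss(0, 0.1)) for _ in range(n)]


image_a = blob(0, 0, 30) + blob(5, 5, 10)
image_b = blob(5, 5, 25) + blob(0, 5, 15)

vocab = kmeans(image_a + image_b, k=3)   # visual vocabulary from all descriptors
hist_a = bow_histogram(image_a, vocab)   # "visual document" for image A
hist_b = bow_histogram(image_b, vocab)   # "visual document" for image B
```

Each histogram plays the role of a word-count vector for one image. A topic model such as scikit-learn's `LatentDirichletAllocation` could then be fitted on the stacked histograms to obtain per-image topic proportions, which is the quantity the MPT and WPT strategies consume.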

## Full-text entities

- **Diseases:** glioma (MESH:D005910), injury to (MESH:D014947), tumor (MESH:D009369), pituitary (P) tumor (MESH:D010911), meningioma (MESH:D008579), Brain Tumors (MESH:D001932)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Figures

14 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12941444/full.md

---
Source: https://tomesphere.com/paper/PMC12941444