# Adapting Foundation Model for Dental Caries Detection with Dual-View Co-Training

**Authors:** Tao Luo, Han Wu, Tong Yang, Dinggang Shen, Zhiming Cui

arXiv: 2508.20813 · 2025-08-29

## TL;DR

This paper introduces DVCTNet, a dual-view co-training framework that combines global panoramic X-ray analysis with detailed tooth-level inspection for improved dental caries detection accuracy.

## Contribution

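The work presents a novel dual-view co-training network with a Gated Cross-View Attention (GCV-Atten) module that fuses global (panoramic X-ray) and local (cropped tooth) features for caries detection. It is validated on two datasets: a public one and a newly curated one annotated with both intra-oral images and panoramic X-rays.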

## Key findings

- DVCTNet outperforms existing state-of-the-art (SOTA) methods on both a public dataset and a newly curated, doubly verified dataset.
- The dual-view approach improves detection accuracy by combining global and local information.
- The Gated Cross-View Attention module effectively fuses features from both views.
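
To make the idea of gated fusion concrete, here is a minimal sketch in plain Python. It is not the authors' GCV-Atten module, whose exact design is not given in this summary. It assumes a generic pattern: a global feature vector attends over local tooth-patch tokens (single-head scaled dot-product attention), and a scalar sigmoid gate decides how much of the attended local signal is added back to the global feature. The names `cross_view_attend`, `gated_fuse`, `gate_weights`, and `gate_bias` are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_view_attend(global_feat: Vector, local_tokens: List[Vector]) -> Vector:
    """Use the global feature as a query over local tokens (keys = values).

    Assumption: single-head scaled dot-product attention without learned
    projections, chosen only to keep the sketch short.
    """
    scale = math.sqrt(len(global_feat))
    weights = softmax([dot(global_feat, tok) / scale for tok in local_tokens])
    dim = len(global_feat)
    return [sum(w * tok[d] for w, tok in zip(weights, local_tokens)) for d in range(dim)]


def gated_fuse(
    global_feat: Vector,
    local_tokens: List[Vector],
    gate_weights: Vector,
    gate_bias: float = 0.0,
) -> Vector:
    """Add the attended local feature to the global one, scaled by a gate.

    Assumption: the gate is a scalar sigmoid over the concatenation
    [global_feat; attended]; `gate_weights` has length 2 * len(global_feat).
    """
    attended = cross_view_attend(global_feat, local_tokens)
    logit = dot(gate_weights, global_feat + attended) + gate_bias
    gate = 1.0 / (1.0 + math.exp(-logit))
    return [g + gate * a for g, a in zip(global_feat, attended)]
```

With a strongly negative `gate_bias` the gate closes and the output stays close to the global feature; with a zero logit the gate is 0.5 and half of the attended local feature is added. In a trained model the gate parameters would be learned, so the network can suppress the local view where it is uninformative.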

## Abstract

Accurate dental caries detection from panoramic X-rays plays a pivotal role in preventing lesion progression. However, current detection methods often yield suboptimal accuracy due to subtle contrast variations and diverse lesion morphology of dental caries. In this work, inspired by the clinical workflow where dentists systematically combine whole-image screening with detailed tooth-level inspection, we present DVCTNet, a novel Dual-View Co-Training network for accurate dental caries detection. Our DVCTNet starts with employing automated tooth detection to establish two complementary views: a global view from panoramic X-ray images and a local view from cropped tooth images. We then pretrain two vision foundation models separately on the two views. The global-view foundation model serves as the detection backbone, generating region proposals and global features, while the local-view model extracts detailed features from corresponding cropped tooth patches matched by the region proposals. To effectively integrate information from both views, we introduce a Gated Cross-View Attention (GCV-Atten) module that dynamically fuses dual-view features, enhancing the detection pipeline by integrating the fused features back into the detection model for final caries detection. To rigorously evaluate our DVCTNet, we test it on a public dataset and further validate its performance on a newly curated, high-precision dental caries detection dataset, annotated using both intra-oral images and panoramic X-rays for double verification. Experimental results demonstrate DVCTNet's superior performance against existing state-of-the-art (SOTA) methods on both datasets, indicating the clinical applicability of our method. Our code and labeled dataset are available at https://github.com/ShanghaiTech-IMPACT/DVCTNet.
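
The abstract states that cropped tooth patches are "matched by the region proposals" but does not say how. One plausible reading, sketched below as an assumption rather than the paper's procedure, is to assign each proposal to the detected tooth box that contains the largest share of the proposal's area. The function names and the `min_overlap` threshold are hypothetical; boxes are `(x1, y1, x2, y2)` in panoramic-image pixel coordinates.

```python
from typing import List, Optional, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def overlap_fraction(proposal: Box, tooth: Box) -> float:
    """Fraction of the proposal's area that lies inside the tooth box."""
    ix1, iy1 = max(proposal[0], tooth[0]), max(proposal[1], tooth[1])
    ix2, iy2 = min(proposal[2], tooth[2]), min(proposal[3], tooth[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area = (proposal[2] - proposal[0]) * (proposal[3] - proposal[1])
    return inter / area if area > 0 else 0.0


def match_proposals_to_teeth(
    proposals: List[Box],
    tooth_boxes: List[Box],
    min_overlap: float = 0.0,
) -> List[Optional[int]]:
    """Return, for each proposal, the index of its best tooth box or None.

    Assumption: containment of the proposal (not IoU) is the matching score,
    since a caries region is usually much smaller than the tooth around it.
    """
    matches: List[Optional[int]] = []
    for proposal in proposals:
        best_index: Optional[int] = None
        best_score = min_overlap
        for index, tooth in enumerate(tooth_boxes):
            score = overlap_fraction(proposal, tooth)
            if score > best_score:  # strict: zero overlap never matches
                best_index, best_score = index, score
        matches.append(best_index)
    return matches


teeth = [(0.0, 0.0, 10.0, 20.0), (10.0, 0.0, 20.0, 20.0)]
proposals = [(2.0, 5.0, 6.0, 9.0), (12.0, 3.0, 15.0, 6.0), (50.0, 50.0, 55.0, 55.0)]
matches = match_proposals_to_teeth(proposals, teeth)
```

Here the first two proposals fall inside the first and second tooth boxes respectively, and the third overlaps no tooth, so it is left unmatched. A matched index would then select which cropped tooth patch is passed to the local-view model.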

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/2508.20813/full.md

## Figures

3 figures with captions in the complete paper: https://tomesphere.com/paper/2508.20813/full.md

## References

23 references — full list in the complete paper: https://tomesphere.com/paper/2508.20813/full.md

---
Source: https://tomesphere.com/paper/2508.20813