DepthTCM: High Efficient Depth Compression via Physics-aware Transformer-CNN Mixed Architecture

Young-Seo Chang; Yatong An; Jae-Sang Hyun

arXiv:2603.21233 · cs.CV · March 24, 2026

TL;DR

DepthTCM introduces a physics-aware, end-to-end depth map compression framework that converts high-bit depth maps into a 3-channel image, then encodes and compresses it using a Transformer-CNN hybrid neural network, achieving high fidelity at low bitrates.

Contribution

The paper presents a novel physics-inspired depth encoding method combined with a Transformer-CNN architecture for efficient depth map compression, demonstrating superior performance and scalability.

Findings

01

Achieves 0.307 bpp on Middlebury 2014 with 99.38% accuracy.

02

Reduces bitrate by 66% with 4-bit quantization while maintaining quality.

03

Transformer-CNN blocks improve PSNR by up to 0.75 dB over CNN-only models.
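
The bitrate figures above are stated in bits per pixel (bpp) and as a relative reduction. As a minimal sketch of how such numbers are usually computed (the function names and the sizes in the usage note are hypothetical, not taken from the paper):

```python
def bits_per_pixel(num_bytes, height, width):
    """Bits per pixel of a compressed image: total bits / pixel count."""
    return 8.0 * num_bytes / (height * width)


def bitrate_reduction(bpp_before, bpp_after):
    """Relative bitrate saving, e.g. 0.5 means the bitrate was halved."""
    return 1.0 - bpp_after / bpp_before
```

For example, a hypothetical 1000-byte bitstream for a 100 x 80 depth map gives `bits_per_pixel(1000, 100, 80)`, and comparing two such values with `bitrate_reduction` yields a fraction that can be read as a percentage.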

Abstract

We propose DepthTCM, a physics-aware end-to-end framework for depth map compression. In DepthTCM, the high-bit depth map is first converted losslessly to a conventional 3-channel image representation using a method inspired by a physical, sinusoidal fringe-pattern-based profilometry system; the 3-channel color image is then encoded and decoded by a recently developed Transformer-CNN mixed neural network architecture. Specifically, DepthTCM maps depth to a smooth 3-channel image using multiwavelength depth (MWD) encoding, globally quantizes the MWD-encoded representation to 4 bits per channel to reduce entropy, and finally compresses it using a learned codec that combines convolutional and Transformer layers. Experimental results demonstrate the advantage of our proposed method. On Middlebury 2014, DepthTCM reaches 0.307 bpp while preserving 99.38% accuracy, a level of…
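
The abstract does not give the encoding formulas, so the following is only a minimal sketch of one common fringe-style formulation of multiwavelength depth encoding followed by uniform 4-bit quantization. The sine/cosine/coarse-depth channel layout, the function names, and the `z_range` and `period` values are all assumptions made for illustration, and the learned Transformer-CNN codec is not modeled here.

```python
import math


def mwd_encode(z, z_range, period):
    """Map one depth value z in [0, z_range] to three channels in [0, 1].

    Assumed layout: two sinusoidal fringe channels plus one coarse depth channel.
    """
    phase = 2.0 * math.pi * z / period
    r = 0.5 + 0.5 * math.sin(phase)  # sine fringe
    g = 0.5 + 0.5 * math.cos(phase)  # cosine fringe
    b = z / z_range                  # coarse depth, used to unwrap the phase
    return r, g, b


def quantize(v, bits=4):
    """Uniformly quantize v in [0, 1] to an integer level in [0, 2**bits - 1]."""
    levels = (1 << bits) - 1
    return min(levels, max(0, round(v * levels)))


def dequantize(q, bits=4):
    """Map an integer level back to [0, 1]."""
    return q / ((1 << bits) - 1)


def mwd_decode(r, g, b, z_range, period):
    """Recover depth from the wrapped fringe phase plus the coarse channel."""
    two_pi = 2.0 * math.pi
    phase = math.atan2(r - 0.5, g - 0.5) % two_pi  # wrapped phase in [0, 2*pi)
    coarse = b * z_range                           # low-precision depth estimate
    k = round((two_pi * coarse / period - phase) / two_pi)  # fringe order
    return (phase + two_pi * k) * period / two_pi


def roundtrip(z, z_range=1000.0, period=250.0, bits=4):
    """Encode, quantize every channel, dequantize, and decode one depth value."""
    channels = mwd_encode(z, z_range, period)
    restored = [dequantize(quantize(c, bits), bits) for c in channels]
    return mwd_decode(*restored, z_range, period)
```

In this sketch the coarse channel only has to be accurate enough to pick the right fringe order `k`; the fine depth comes from the phase of the two sinusoidal channels, which is why the reconstruction can be more precise than the coarse channel's own 4-bit step.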

Taxonomy

Topics: Advanced Data Compression Techniques · Video Coding and Compression Technologies · Advanced Vision and Imaging