DepthTCM: High Efficient Depth Compression via Physics-aware Transformer-CNN Mixed Architecture
Young-Seo Chang, Yatong An, Jae-Sang Hyun

TL;DR
DepthTCM introduces a physics-aware, end-to-end depth map compression framework that converts high-bit depth maps into a 3-channel image, then encodes and compresses it using a Transformer-CNN hybrid neural network, achieving high fidelity at low bitrates.
Contribution
The paper presents a novel physics-inspired depth encoding method combined with a Transformer-CNN architecture for efficient depth map compression, demonstrating superior performance and scalability.
Findings
Achieves 0.307 bpp on Middlebury 2014 with 99.38% accuracy.
Reduces bitrate by 66% with 4-bit quantization while maintaining quality.
Transformer-CNN blocks improve PSNR by up to 0.75 dB over CNN-only models.
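The bitrate figures above are stated in bits per pixel (bpp) and as a relative reduction. As a minimal sketch of the arithmetic behind such numbers, the helpers below compute both quantities. The function names, the file size, and the image dimensions are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
def bits_per_pixel(num_bytes, width, height):
    """Bitrate of a compressed image: total bits divided by pixel count."""
    return 8 * num_bytes / (width * height)


def percent_reduction(baseline_bpp, new_bpp):
    """Relative bitrate saving of new_bpp against baseline_bpp, in percent."""
    return 100.0 * (1.0 - new_bpp / baseline_bpp)


# Hypothetical example: a 1000 x 1000 depth map stored in 38,375 bytes.
example_bpp = bits_per_pixel(38375, 1000, 1000)

# Hypothetical example: going from 3.0 bpp to 1.02 bpp.
example_saving = percent_reduction(3.0, 1.02)
```

Under these assumed sizes, `example_bpp` comes out at about 0.307 and `example_saving` at about 66, which matches the scale of the reported figures without implying anything about the actual image sizes or baselines used in the experiments.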
Abstract
We propose DepthTCM, a physics-aware end-to-end framework for depth map compression. In DepthTCM, the high-bit depth map is first converted losslessly to a conventional 3-channel image representation using a method inspired by physical sinusoidal fringe-pattern profilometry systems; the 3-channel color image is then encoded and decoded by a recently developed Transformer-CNN mixed neural network architecture. Specifically, DepthTCM maps depth to a smooth 3-channel image using multiwavelength depth (MWD) encoding, globally quantizes the MWD-encoded representation to 4 bits per channel to reduce entropy, and finally compresses it using a learned codec that combines convolutional and Transformer layers. Experimental results demonstrate the advantage of the proposed method. On Middlebury 2014, DepthTCM reaches 0.307 bpp while preserving 99.38% accuracy, a level of…
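The encode, quantize, and decode steps described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The abstract does not give the encoding formula, so the sketch assumes one common fringe-inspired form of MWD encoding: a sine channel, a cosine channel, and a normalized-depth channel. The function names and the parameters `z_min`, `z_max`, and `period` are hypothetical. The learned Transformer-CNN codec is omitted entirely.

```python
import math


def mwd_encode(z, z_min, z_max, period):
    """Map one depth value to three channels in [0, 1] (assumed MWD form)."""
    phase = 2.0 * math.pi * (z - z_min) / period
    r = 0.5 + 0.5 * math.sin(phase)        # fine detail, sine fringe
    g = 0.5 + 0.5 * math.cos(phase)        # fine detail, cosine fringe
    b = (z - z_min) / (z_max - z_min)      # coarse normalized depth
    return r, g, b


def quantize(c, bits=4):
    """Uniformly quantize a channel value in [0, 1] to an integer code."""
    return round(c * ((1 << bits) - 1))


def dequantize(q, bits=4):
    """Map an integer code back to a channel value in [0, 1]."""
    return q / ((1 << bits) - 1)


def mwd_decode(r, g, b, z_min, z_max, period):
    """Recover depth: wrapped phase gives fine depth, b picks the fringe order."""
    phase = math.atan2(r - 0.5, g - 0.5) % (2.0 * math.pi)
    fine = phase / (2.0 * math.pi) * period
    coarse = b * (z_max - z_min)
    k = round((coarse - fine) / period)    # integer fringe order
    return z_min + k * period + fine


def roundtrip(z, z_min=0.0, z_max=1000.0, period=250.0, bits=4):
    """Encode, quantize each channel, dequantize, and decode one depth value."""
    channels = mwd_encode(z, z_min, z_max, period)
    restored = [dequantize(quantize(c, bits), bits) for c in channels]
    return mwd_decode(*restored, z_min, z_max, period)
```

In this sketch the coarse channel only has to identify which fringe period a pixel falls in, so quantizing it heavily costs little accuracy as long as its error stays below half a period; the sine and cosine channels then supply the fine position within that period. With the assumed defaults (range 1000, period 250, 4 bits), `roundtrip(400.0)` returns a value within about one depth unit of 400.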
Taxonomy
Topics: Advanced Data Compression Techniques · Video Coding and Compression Technologies · Advanced Vision and Imaging
