DeepSeek-OCR 2: Visual Causal Flow
Haoran Wei, Yaofeng Sun, Yukun Li

TL;DR
DeepSeek-OCR 2 introduces a novel encoder capable of dynamically reordering visual tokens based on semantic and causal reasoning, aiming to improve 2D image understanding by mimicking human visual perception.
Contribution
It proposes a new encoder architecture with causal reasoning to reorder visual tokens, enhancing 2D image understanding in vision-language models.
Findings
- Enables causally informed token reordering for complex layouts
- Demonstrates improved semantic coherence in visual token processing
- Provides publicly accessible code and models
Abstract
We present DeepSeek-OCR 2 to investigate the feasibility of a novel encoder, DeepEncoder V2, capable of dynamically reordering visual tokens based on image semantics. Conventional vision-language models (VLMs) invariably process visual tokens in a rigid raster-scan order (top-left to bottom-right) with fixed positional encoding when fed into LLMs. However, this contradicts human visual perception, which follows flexible yet semantically coherent scanning patterns driven by inherent logical structures. Particularly for images with complex layouts, human vision exhibits causally informed sequential processing. Inspired by this cognitive mechanism, DeepEncoder V2 is designed to endow the encoder with causal reasoning capabilities, enabling it to intelligently reorder visual tokens prior to LLM-based content interpretation. This work explores a novel paradigm: whether 2D image understanding can…
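To make the contrast between raster-scan order and a layout-aware order concrete, here is a minimal toy sketch. It is not the DeepEncoder V2 mechanism, which the abstract describes as learned and driven by image semantics. It only shows what "reordering visual tokens" means as a permutation. The function names, the two-column page, and the hand-written reading order are all assumptions made for illustration.

```python
from typing import List, Sequence


def raster_order(rows: int, cols: int) -> List[int]:
    """Indices of a rows x cols patch grid in raster-scan order."""
    return list(range(rows * cols))


def column_reading_order(rows: int, cols: int, split: int) -> List[int]:
    """Hypothetical reading order for a two-column page.

    Visits every patch in grid columns [0, split) row by row, then every
    patch in grid columns [split, cols). This rule is hand-written for the
    example; a learned encoder would have to infer such an order itself.
    """
    left = [r * cols + c for r in range(rows) for c in range(split)]
    right = [r * cols + c for r in range(rows) for c in range(split, cols)]
    return left + right


def reorder(tokens: Sequence[str], order: Sequence[int]) -> List[str]:
    """Apply a permutation of indices to a token sequence."""
    if sorted(order) != list(range(len(tokens))):
        raise ValueError("order must be a permutation of token indices")
    return [tokens[i] for i in order]


# A 2 x 4 patch grid over a page with text columns A (left) and B (right).
# Raster scan interleaves the two columns: A1 A2 B1 B2 A3 A4 B3 B4.
tokens = ["A1", "A2", "B1", "B2",
          "A3", "A4", "B3", "B4"]

order = column_reading_order(rows=2, cols=4, split=2)
reordered = reorder(tokens, order)
print(reordered)
```

In raster order the language model would see fragments of column A and column B interleaved. After the permutation, all of column A precedes column B, which is closer to how a person would read the page.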
Code & Models
- 🤗 deepseek-ai/DeepSeek-OCR-2 (model) · 1.3M downloads · ♡ 889
- 🤗 alvinfn/DeepSeek-OCR-2 (model) · 7 downloads
- 🤗 JLFaller/DeepSeek-OCR-2 (model) · 12 downloads
- 🤗 CrossNow/DeepSeek-OCR-2 (model) · 30 downloads
- 🤗 MurphyA/DeepSeek-OCR-2 (model) · 26 downloads
- 🤗 Scottjj1199/DeepSeek-OCR-2 (model) · 24 downloads
- 🤗 S7351CUN/DeepSeek-OCR-2 (model) · 27 downloads
- 🤗 kingabzpro/deepseek-ocr-2-urdu-ocr-1m-lora (model) · ♡ 1
- 🤗 Basha001/DeepSeek-OCR-2 (model) · 13 downloads
- 🤗 thisisiron/DeepSeek-OCR-2-hf (model) · 78 downloads
Taxonomy
Topics: Multimodal Machine Learning Applications · Generative Adversarial Networks and Image Synthesis · Ferroelectric and Negative Capacitance Devices
