Falcon Perception
Aviraj Bevli, Sofian Chaybouti, Yasser Dahou, Hakim Hacid, Ngoc Dung Huynh, Phuc H. Le Khac, Sanath Narayan, Wamiq Reyaz Para, Ankit Singh

TL;DR
Falcon Perception introduces a unified dense Transformer that processes images and text jointly, simplifying perception systems and achieving superior performance on dense prediction and OCR benchmarks.
Contribution
A novel single-architecture Transformer model that combines perception and task modeling in a unified framework, reducing complexity and improving performance.
Findings
Improves mask quality to 68.0 Macro-F1 on SA-Co
Achieves 80.3% on olmOCR with a 300M-parameter model
Outperforms prior methods on the PBench benchmark
Abstract
Perception-centric systems are typically implemented with a modular encoder-decoder pipeline: a vision backbone for feature extraction and a separate decoder (or late-fusion module) for task prediction. This raises a central question: is this architectural separation essential or can a single early-fusion stack do both perception and task modeling at scale? We introduce Falcon Perception, a unified dense Transformer that processes image patches and text tokens in a shared parameter space from the first layer, using a hybrid attention pattern (bidirectional among image tokens, causal for prediction tokens) to combine global visual context with autoregressive, variable-length instance generation. To keep dense outputs practical, Falcon Perception retains a lightweight token interface and decodes continuous spatial outputs with specialized heads, enabling parallel high-resolution mask…
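The hybrid attention pattern described in the abstract can be illustrated with a small mask-construction sketch. This is not the authors' implementation: it assumes the image tokens come first in the sequence, that every token after them belongs to the causal segment, and that image tokens do not attend to later tokens. The function name `hybrid_attention_mask` and its parameters are hypothetical, chosen only for illustration.

```python
import numpy as np


def hybrid_attention_mask(num_image_tokens: int, num_prediction_tokens: int) -> np.ndarray:
    """Build a boolean mask where mask[q, k] is True if query q may attend to key k.

    Assumed layout (hypothetical): [image tokens | prediction tokens].
    """
    total = num_image_tokens + num_prediction_tokens
    # Start from a standard causal mask: each token sees itself and earlier tokens.
    mask = np.tril(np.ones((total, total), dtype=bool))
    # Make the image block fully bidirectional: every image token sees every image token.
    mask[:num_image_tokens, :num_image_tokens] = True
    return mask


mask = hybrid_attention_mask(num_image_tokens=3, num_prediction_tokens=2)
print(mask.astype(int))
```

With three image tokens and two prediction tokens, the first three rows share the same visible set (all image tokens, no prediction tokens), while each prediction row sees the full image plus the prediction tokens up to and including itself. That is one way to combine global visual context with autoregressive generation in a single stack.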
Code & Models
- 🤗 tiiuae/Falcon-OCR · model · 11k dl · ♡ 101
- 🤗 tiiuae/Falcon-Perception · model · 9.4k dl · ♡ 123
- 🤗 tiiuae/Falcon-Perception-300M · model · 1.3k dl · ♡ 11
- 🤗 beaupi/Falcon-OCR-oQ6 · model · 10 dl
- 🤗 beaupi/Falcon-OCR-oQ8 · model · 6 dl
- 🤗 dummy9996/Falcon-OCR-fp8 · model · 41 dl
- 🤗 introvoyz041/Falcon-Perception · model · 13 dl
