Efficient Vision-Language Models by Summarizing Visual Tokens into Compact Registers