Speak While Watching: Unleashing TRUE Real-Time Video Understanding Capability of Multimodal Large Language Models