STER-VLM: Spatio-Temporal With Enhanced Reference Vision-Language Models