Loading paper
From Semantics, Scene to Instance-awareness: Distilling Foundation Model for Grounded Open-vocabulary Situation Recognition | Tomesphere