COM Kitchens: An Unedited Overhead-view Video Dataset as a Vision-Language Benchmark