CapNav: Benchmarking Vision Language Models on Capability-conditioned Indoor Navigation