Loading paper
Vision language models are blind: Failing to translate detailed visual features into words | Tomesphere