Leveraging Large Vision-Language Model as User Intent-aware Encoder for Composed Image Retrieval