Structuring GUI Elements through Vision Language Models: Towards Action Space Generation