On the Importance of Karaka Framework in Multi-modal Grounding
Sai Kiran Gorthi, Radhika Mamidi

TL;DR
This paper explores the potential benefits of the Karaka Framework, based on the Computational Paninian Grammar model, for improving multi-modal grounding in vision-language navigation tasks, an area with limited prior study.
Contribution
It introduces a novel investigation into the application of the CPG dependency scheme in multi-modal vision-language tasks, highlighting its potential advantages and challenges.
Findings
Potential for more semantically aligned dependency relations
Insights into the applicability of CPG in multi-modal tasks
Foundation for future empirical evaluation
Abstract
The Computational Paninian Grammar (CPG) model helps decode a natural language expression as a series of modifier-modified relations, and thereby facilitates identifying dependency relations that are closer to language (context) semantics than the commonly used Stanford dependency relations. However, the importance of the CPG dependency scheme has not been studied in the context of multi-modal vision and language applications. At IIIT Hyderabad, we plan to perform a novel study exploring the potential advantages and disadvantages of the CPG framework in a vision-language navigation setting, a popular and challenging multi-modal grounding task.
Taxonomy
Topics: Multimodal Machine Learning Applications · Speech and dialogue systems · Natural Language Processing Techniques
