Exploring External Knowledge for Accurate Modeling of Visual and Language Problems
Xuewen Yang

TL;DR
This dissertation investigates how external knowledge sources can be integrated into vision and language models to improve their performance on challenging tasks such as image captioning and machine translation.
Contribution
It introduces a methodology for extracting and integrating external knowledge into existing models, significantly improving their accuracy on multiple tasks.
Findings
Enhanced image captioning performance
Improved machine translation accuracy
Effective use of external knowledge sources
Abstract
Interest in Artificial Intelligence (AI) and its applications has seen unprecedented growth in the last few years. This success can be partly attributed to advances in deep neural networks in sub-fields of AI such as Computer Vision (CV) and Natural Language Processing (NLP). The promising research area that this dissertation focuses on is visual and language understanding, which involves many challenging tasks, e.g., classification, detection, segmentation, machine translation, and captioning. State-of-the-art methods for solving these problems usually involve only two parts, source data and target labels, which is rather insufficient, especially when the dataset is small. Meanwhile, many external tools or sources can provide extra useful information (external knowledge) that can help improve the performance of these methods. For example, a detection model…
Taxonomy
Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Advanced Neural Network Applications
Methods: Residual Connection · Batch Normalization · 1x1 Convolution · Kaiming Initialization · Max Pooling · Average Pooling · Convolution · Residual Block · Bottleneck Residual Block
