Context-based Deep Learning Architecture with Optimal Integration Layer for Image Parsing
Ranju Mandal, Basim Azam, and Brijesh Verma

TL;DR
This paper introduces a three-layer deep learning architecture that explicitly integrates visual and contextual information for improved image parsing, utilizing genetic algorithms for optimal fusion of features.
Contribution
It presents a novel three-layer architecture with an optimal fusion layer using genetic algorithms, enhancing the integration of visual and contextual data in image parsing.
Findings
Improved accuracy on benchmark datasets
Stable predictions with optimized network weights
Effective integration of context and visual features
Abstract
Deep learning models have recently performed well on image parsing tasks. However, they are not fully capable of exploiting visual and contextual information simultaneously. The proposed three-layer context-based deep architecture integrates context explicitly with visual information. The novel idea is to have a visual layer that learns visual characteristics from binary class-based learners, a contextual layer that learns context, and an integration layer that learns from both via genetic algorithm-based optimal fusion to produce a final decision. Experimental results on benchmark datasets are promising. Further analysis shows that optimized network weights can improve performance and yield stable predictions.
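The integration step described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the chromosome encoding, fitness function, or genetic operators, so everything below is an assumption made for illustration: the per-class fusion weight vector, the accuracy-based fitness, the elitist selection with uniform crossover and Gaussian mutation, and the names `fuse`, `fitness`, and `genetic_search` are all hypothetical and are not the authors' implementation.

```python
import numpy as np


def fuse(visual, context, w):
    """Blend visual and contextual class scores with per-class weights w in [0, 1].

    visual, context: arrays of shape (n_samples, n_classes).
    Returns the predicted class index for each sample.
    """
    return np.argmax(w * visual + (1.0 - w) * context, axis=1)


def fitness(w, visual, context, labels):
    """Assumed fitness: classification accuracy of the fused prediction."""
    return float(np.mean(fuse(visual, context, w) == labels))


def genetic_search(visual, context, labels, pop_size=20, generations=30,
                   mutation_std=0.1, seed=0):
    """Search for fusion weights with a simple elitist genetic algorithm."""
    rng = np.random.default_rng(seed)
    n_classes = visual.shape[1]

    # Random initial population; one individual is the equal-weight baseline.
    pop = rng.random((pop_size, n_classes))
    pop[0] = 0.5

    n_parents = pop_size // 2
    n_children = pop_size - n_parents
    for _ in range(generations):
        scores = np.array([fitness(w, visual, context, labels) for w in pop])
        # Elitism: the best half survives unchanged into the next generation.
        parents = pop[np.argsort(scores)[::-1][:n_parents]]

        # Uniform crossover between randomly paired parents.
        a = parents[rng.integers(0, n_parents, n_children)]
        b = parents[rng.integers(0, n_parents, n_children)]
        mask = rng.random((n_children, n_classes)) < 0.5
        children = np.where(mask, a, b)

        # Gaussian mutation, clipped back to the valid weight range.
        children = np.clip(
            children + rng.normal(0.0, mutation_std, children.shape), 0.0, 1.0
        )
        pop = np.vstack([parents, children])

    scores = np.array([fitness(w, visual, context, labels) for w in pop])
    best = int(np.argmax(scores))
    return pop[best], float(scores[best])
```

Because the best individuals are carried over unchanged and the equal-weight vector is part of the initial population, the returned accuracy on the data used for the search is never lower than that of plain averaging. In practice the weights would be selected on held-out data to avoid overfitting the fusion to the training set.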
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Multimodal Machine Learning Applications · Image Retrieval and Classification Techniques
