Deep Collaborative Learning for Visual Recognition
Yan Wang, Lingxi Xie, Ya Zhang, Wenjun Zhang, Alan Yuille

TL;DR
This paper introduces Deep Collaborative Learning (DCL), a novel method that reduces model complexity in deep neural networks for visual recognition by efficiently learning compositional visual concepts, leading to improved accuracy and fewer parameters.
Contribution
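DCL replaces a large convolutional layer with a two-stage module: several smaller convolutional layers are built independently and then fused at each spatial position to model feature co-occurrence. The effective visual vocabulary grows exponentially while model complexity grows only linearly.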
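The growth claim can be illustrated with a small counting sketch. It is not the authors' implementation: the helper names (`conv_weights`, `single_layer`, `two_stage`) and the channel sizes are hypothetical, the fusion step is assumed to add no weights, and every combination of one filter per branch is assumed to count as a distinct vocabulary entry.

```python
import math


def conv_weights(in_channels, num_filters, kernel_size=1):
    """Weights in one convolutional layer (biases ignored)."""
    return in_channels * num_filters * kernel_size * kernel_size


def single_layer(in_channels, num_filters, kernel_size=1):
    """One large layer: each filter is one vocabulary entry."""
    return {
        "weights": conv_weights(in_channels, num_filters, kernel_size),
        "vocabulary": num_filters,
    }


def two_stage(in_channels, branch_filters, kernel_size=1):
    """Several small branches fused per spatial position.

    Assumption: fusion is weight-free and every combination of one
    filter per branch is a distinct compositional concept.
    """
    weights = sum(conv_weights(in_channels, k, kernel_size) for k in branch_filters)
    vocabulary = math.prod(branch_filters)  # multiplicative in branch sizes
    return {"weights": weights, "vocabulary": vocabulary}


# Hypothetical sizes chosen only for illustration.
large = single_layer(in_channels=256, num_filters=4096)
small = two_stage(in_channels=256, branch_filters=[64, 64])

print("single layer:", large)
print("two branches:", small)
print("weight ratio:", large["weights"] / small["weights"])
```

Under these assumptions, adding a branch adds its weights to the total but multiplies the number of combinations, which is the linear-versus-exponential trade-off described above.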
Findings
DCL improves recognition accuracy across multiple datasets.
DCL reduces model parameters significantly, e.g., 16.82% fewer in AlexNet.
DCL is effective on various network architectures.
Abstract
Deep neural networks are playing an important role in state-of-the-art visual recognition. To represent high-level visual concepts, modern networks are equipped with large convolutional layers, which use a large number of filters and contribute significantly to model complexity. For example, more than half of the weights of AlexNet are stored in the first fully-connected layer (4,096 filters). We formulate the function of a convolutional layer as learning a large visual vocabulary, and propose an alternative way, namely Deep Collaborative Learning (DCL), to reduce the computational complexity. We replace a convolutional layer with a two-stage DCL module, in which we first construct a couple of smaller convolutional layers individually, and then fuse them at each spatial position to consider feature co-occurrence. In mathematics, DCL can be explained as an efficient way of learning…
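The AlexNet figure quoted in the abstract can be checked with simple arithmetic. The sketch below assumes the commonly cited layer shapes of the original two-group AlexNet (these shapes are not given in this summary), counts weights only, and treats the first fully-connected layer as 4,096 filters over a 256 x 6 x 6 input.

```python
# (name, num_filters, input_channels_per_filter, kernel_height, kernel_width)
# Assumed shapes of the original grouped AlexNet; conv2, conv4 and conv5 use
# two groups, so each filter sees only half of the input channels.
ALEXNET_LAYERS = [
    ("conv1", 96, 3, 11, 11),
    ("conv2", 256, 48, 5, 5),
    ("conv3", 384, 256, 3, 3),
    ("conv4", 384, 192, 3, 3),
    ("conv5", 256, 192, 3, 3),
    ("fc6", 4096, 256, 6, 6),  # first fully-connected layer
    ("fc7", 4096, 4096, 1, 1),
    ("fc8", 1000, 4096, 1, 1),
]


def weight_counts(layers):
    """Return a dict mapping layer name to its weight count (biases ignored)."""
    return {
        name: filters * channels * kh * kw
        for name, filters, channels, kh, kw in layers
    }


counts = weight_counts(ALEXNET_LAYERS)
total_weights = sum(counts.values())
fc6_share = counts["fc6"] / total_weights

for name, n in counts.items():
    print(f"{name}: {n:,}")
print(f"total: {total_weights:,}")
print(f"fc6 share: {fc6_share:.1%}")
```

With these shapes the first fully-connected layer holds roughly 62% of all weights, consistent with the "more than half" statement.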
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Advanced Image and Video Retrieval Techniques · Multimodal Machine Learning Applications
Methods: 1x1 Convolution · Convolution · Local Response Normalization · Grouped Convolution · Dropout · Dense Connections · Max Pooling · Softmax
