Deep Learning Based Multi-modal Addressee Recognition in Visual Scenes with Utterances