A Fast Attention Network for Joint Intent Detection and Slot Filling on Edge Devices

Liang Huang; Senjie Liang; Feiyang Ye; Nan Gao

arXiv:2205.07646 · cs.CL · May 17, 2022

TL;DR

This paper introduces a Fast Attention Network (FAN) that efficiently performs joint intent detection and slot filling on edge devices, balancing accuracy and low latency for real-time dialogue systems.

Contribution

The paper proposes a novel attention module and a flexible model design that maintains high accuracy while significantly reducing inference latency on edge hardware.
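The summary does not describe how FAN's attention module is built. As a rough illustration of what "information exchange between intent and slot" can mean in a joint model, here is a minimal, hypothetical sketch in plain Python. It is a generic two-way exchange and not FAN's actual module. The function `intent_slot_exchange`, the learned `query` vector, and the additive fusion step are all assumptions chosen for clarity. The intent vector is built by attention-pooling the token states (slot to intent). It is then added back to every token state (intent to slot).

```python
import math
from typing import List, Tuple

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def intent_slot_exchange(
    tokens: List[Vector], query: Vector
) -> Tuple[Vector, List[float], List[Vector]]:
    """Hypothetical two-way intent/slot exchange (NOT the FAN module).

    tokens: per-token hidden states from some encoder, each of size d.
    query:  an assumed learned query vector of size d used for pooling.
    Returns (intent vector, attention weights, intent-aware slot features).
    """
    d = len(query)
    # Slot -> intent: scaled dot-product scores, then attention pooling.
    scores = [dot(h, query) / math.sqrt(d) for h in tokens]
    weights = softmax(scores)
    intent_vec = [sum(w * h[j] for w, h in zip(weights, tokens)) for j in range(d)]
    # Intent -> slot: broadcast the intent summary back onto every token.
    slot_feats = [[h[j] + intent_vec[j] for j in range(d)] for h in tokens]
    return intent_vec, weights, slot_feats
```

For example, `intent_slot_exchange([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])` gives equal weights `[0.5, 0.5]`, an intent vector `[0.5, 0.5]`, and slot features `[[1.5, 0.5], [0.5, 1.5]]`. In a real model, intent and slot classifiers would sit on top of these two outputs.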

Findings

1. FAN improves semantic accuracy by more than 2%.

2. FAN processes 15 utterances per second on a Jetson Nano.

3. FAN maintains accuracy, with only a minimal drop, across various speed levels.
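The throughput figure can be turned into an average per-utterance latency with simple arithmetic. This is a small sketch that assumes utterances are processed one at a time (batch size 1). Under that assumption, 15 utterances per second works out to 1000 / 15 ≈ 66.7 ms each. The helper names and the 100 ms budget in the usage note are hypothetical and are not taken from the paper.

```python
def per_item_latency_ms(items_per_second: float) -> float:
    """Average time per item in milliseconds, assuming sequential processing."""
    if items_per_second <= 0:
        raise ValueError("throughput must be positive")
    return 1000.0 / items_per_second


def meets_budget(items_per_second: float, budget_ms: float) -> bool:
    """True if the average per-item latency fits within a latency budget."""
    return per_item_latency_ms(items_per_second) <= budget_ms
```

For instance, `meets_budget(15, 100.0)` is `True` and `meets_budget(15, 50.0)` is `False`. With batching, throughput and per-request latency decouple, so this conversion only holds for the one-at-a-time case.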

Abstract

Intent detection and slot filling are two main tasks in natural language understanding and play an essential role in task-oriented dialogue systems. Joint learning of both tasks can improve inference accuracy and has become popular in recent work. However, most joint models ignore inference latency and cannot meet the requirements of deploying dialogue systems at the edge. In this paper, we propose a Fast Attention Network (FAN) for joint intent detection and slot filling that guarantees both accuracy and low latency. Specifically, we introduce a clean, parameter-refined attention module that enhances the information exchange between intent and slot, improving semantic accuracy by more than 2%. FAN can be implemented on different encoders and delivers more accurate models at every speed level. Our experiments on the Jetson Nano platform show that FAN infers fifteen utterances per second with…
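"Semantic accuracy" in joint intent/slot work usually means sentence-level accuracy. Under that definition, an utterance counts as correct only if the predicted intent and the entire slot-tag sequence both match the gold labels. This sketch assumes the paper uses that standard definition, which the summary does not confirm. The intent and slot labels in the usage note are made up for illustration.

```python
from typing import List, Sequence, Tuple

# (intent label, per-token slot tags), e.g. BIO tags.
Example = Tuple[str, Sequence[str]]


def semantic_accuracy(gold: List[Example], pred: List[Example]) -> float:
    """Fraction of utterances whose intent AND full slot sequence are correct."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    if not gold:
        return 0.0
    correct = sum(
        1
        for (g_intent, g_slots), (p_intent, p_slots) in zip(gold, pred)
        if g_intent == p_intent and list(g_slots) == list(p_slots)
    )
    return correct / len(gold)
```

With `gold = [("play_music", ["O", "B-artist"]), ("get_weather", ["O", "B-city"])]`, suppose the prediction gets both intents right but tags the second utterance as `["O", "O"]`. Semantic accuracy is then 0.5, even though intent accuracy alone would be 1.0. This is why the metric is stricter than either per-task score.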

Taxonomy

Topics: Topic Modeling · Speech and dialogue systems · Natural Language Processing Techniques