Neck-Learn: Attention-Based Multiple Instance Learning and Ensemble Framework for Ecological Momentary Assessment
Ahsan Jamal Cheema

TL;DR
This paper presents a hybrid deep learning framework combining CNN-based multiple instance learning and gradient-boosted trees to improve ambulatory detection of vocal hyperfunction from daily neck-surface accelerometer data.
Contribution
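It introduces a hybrid architecture that pairs gradient-boosted trees on day-level distributional features with a CNN-based multiple instance learning model that preserves within-day temporal dynamics, exceeding the challenge baselines for vocal hyperfunction detection.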
Findings
Achieved AUCs of 0.879 for PVH (Rank 5) and 0.848 for NPVH (Rank 3) on the held-out test set
Exceeded the challenge baselines (AUC 0.82 for PVH, 0.77 for NPVH)
Provided clinically relevant insights into vocal hyperfunction
Abstract
Vocal hyperfunction (VH) is a prevalent voice disorder whose ambulatory detection remains challenging despite extensive daily voice data. Prior approaches capture week-long neck-surface accelerometer recordings but collapse them into fixed-length subject-level feature vectors, discarding within-day temporal dynamics encoding nuanced voicing feature interactions. We introduce a novel hybrid architecture combining gradient-boosted trees on day-level distributional features with a CNN-based multiple instance learning (MIL) framework that preserves and learns from temporal dynamics throughout each day. On the held-out test set, our model exceeds the challenge baselines (AUC: 0.82 PVH, 0.77 NPVH), achieving AUCs of 0.879 for PVH (Rank 5) and 0.848 for NPVH (Rank 3), while also providing insights into clinically relevant information about both pathologies.
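The abstract does not spell out how the attention-based MIL model named in the title aggregates a day's data. As a rough illustration only, the sketch below shows one common form of attention pooling for multiple instance learning: each instance embedding gets a score w^T tanh(V h), the scores are normalized with a softmax, and the bag embedding is the weighted sum of the instances. Treating a day as a "bag" of window embeddings is an assumption here, and the function name `attention_pool` and the parameters `V` and `w` are hypothetical, not taken from the paper.

```python
import math
from typing import List, Sequence, Tuple


def attention_pool(
    instances: Sequence[Sequence[float]],  # K instance embeddings, each of size D
    V: Sequence[Sequence[float]],          # hypothetical projection matrix, L x D
    w: Sequence[float],                    # hypothetical scoring vector, size L
) -> Tuple[List[float], List[float]]:
    """Return (bag embedding, attention weights) for one bag of instances."""
    # Score each instance: w^T tanh(V h)
    scores = []
    for h in instances:
        hidden = [math.tanh(sum(v * x for v, x in zip(row, h))) for row in V]
        scores.append(sum(wl * u for wl, u in zip(w, hidden)))

    # Numerically stable softmax over the instances in the bag
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Bag embedding: attention-weighted sum of the instance embeddings
    dim = len(instances[0])
    pooled = [sum(a * h[d] for a, h in zip(weights, instances)) for d in range(dim)]
    return pooled, weights


# Toy bag: three 2-dimensional "window" embeddings standing in for CNN outputs
bag = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
pooled, weights = attention_pool(bag, V=[[1.0, -1.0]], w=[1.0])
print(pooled, weights)
```

In a trained model `V` and `w` would be learned jointly with the CNN encoder, and the resulting weights indicate which parts of the day contributed most to the prediction. How the paper combines the MIL output with the gradient-boosted trees is not stated in the abstract, so that step is left out of the sketch.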
