Coin_flipper at eHealth-KD Challenge 2019: Voting LSTMs for Key Phrases and Semantic Relation Identification Applied to Spanish eHealth Texts
Neus Català, Mario Martin

TL;DR
This paper presents a bi-LSTM-based system for key phrase extraction and semantic relation identification in Spanish eHealth texts. It trains with a surrogate F1 loss, predicts by ensemble majority vote, and ranked second in the main task of the eHealth-KD 2019 challenge.
Contribution
The approach introduces a surrogate F1 loss function and ensemble voting with standard bi-LSTM models for eHealth text processing tasks.
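The idea of a surrogate F1 loss can be illustrated with a minimal sketch. This summary does not give the paper's exact formulation, so the code below uses a common "soft F1" construction as an assumption: hard counts of true positives, false positives, and false negatives are replaced by sums over predicted probabilities, which makes the quantity differentiable. The function name `soft_f1_loss` and the `eps` parameter are hypothetical.

```python
def soft_f1_loss(probs, targets, eps=1e-8):
    """Return 1 - soft F1 for binary labels.

    probs   -- predicted probabilities of the positive class, each in [0, 1]
    targets -- gold labels, each 0 or 1
    eps     -- small constant to avoid division by zero (assumed value)

    Assumption: this is a generic soft-F1 surrogate, not necessarily
    the formulation used by the paper's authors.
    """
    # Soft counts: probabilities stand in for hard 0/1 decisions.
    tp = sum(p * t for p, t in zip(probs, targets))
    fp = sum(p * (1 - t) for p, t in zip(probs, targets))
    fn = sum((1 - p) * t for p, t in zip(probs, targets))

    # F1 = 2*TP / (2*TP + FP + FN), computed on the soft counts.
    soft_f1 = 2 * tp / (2 * tp + fp + fn + eps)
    return 1.0 - soft_f1
```

Minimizing this value pushes the soft F1 toward 1, so the training objective tracks the evaluation metric more closely than a per-token cross-entropy would. In a real training setup the same arithmetic would be written with the tensor operations of an automatic-differentiation framework.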
Findings
Ranked second in the main task with an F1 score of 62.18% (the winner scored 63.94%)
Effective use of surrogate loss function
Ensemble voting improved prediction accuracy
Abstract
This paper describes our approach to the eHealth-KD 2019 challenge. Our participation aimed to test how far we could go using generic text-processing tools together with common optimization techniques from the field of data mining. The architecture proposed for both tasks of the challenge is a standard stacked 2-layer bi-LSTM. The main particularities of our approach are: (a) the use of a surrogate function of F1 as the loss function, to close the gap between the minimization function and the evaluation metric, and (b) the generation of an ensemble of models that produce predictions by majority vote. Our system ranked second in the main task with an F1 score of 62.18%, narrowly behind the winner, which scored 63.94%.
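Prediction by majority vote over an ensemble can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: each model is assumed to emit one label per token, the tag names in the usage note are hypothetical, and ties are resolved in favor of the label that appears first among the models, a choice the summary does not specify.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-token label sequences from several models.

    predictions -- list of label sequences, one per model, all of the
                   same length (one label per token)

    Returns one label per token: the label chosen by the most models.
    Assumption: ties go to the label seen first among the models.
    """
    if not predictions:
        raise ValueError("need at least one model's predictions")
    length = len(predictions[0])
    if any(len(p) != length for p in predictions):
        raise ValueError("all models must label the same number of tokens")

    voted = []
    # zip(*predictions) groups the labels that each model gave to one token.
    for labels in zip(*predictions):
        voted.append(Counter(labels).most_common(1)[0][0])
    return voted
```

For example, with three models that output `["B", "O", "O"]`, `["B", "I", "O"]`, and `["O", "I", "O"]`, the call `majority_vote(...)` returns `["B", "I", "O"]`. Using an odd number of models reduces the chance of ties when only two labels compete.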
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Advanced Text Analysis Techniques
