SkIn: Skimming-Intensive Long-Text Classification Using BERT for Medical Corpus
Yufeng Zhao, Haiying Che

TL;DR
This paper introduces SkIn, a skimming-based approach that dynamically selects critical information from long medical texts so that BERT-based classification stays efficient, achieving higher accuracy than the baselines while time and memory costs grow linearly with text length.
Contribution
The paper proposes a novel skimming-intensive model (SkIn) that improves long-text classification efficiency and accuracy using BERT in the medical domain.
Findings
SkIn outperforms baseline methods in accuracy on medical long-text datasets.
Time and space complexity of SkIn increase linearly with text length.
SkIn effectively reduces computational costs compared to standard BERT.
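The gap between quadratic and linear scaling can be illustrated with a toy cost model. The sketch below is not taken from the paper: it assumes cost is counted as pairwise token interactions and that the linear case comes from reading the text in fixed-size windows of 512 tokens. The function names and the window size are hypothetical choices for illustration only, and whether SkIn reaches linear scaling in exactly this way is not stated here.

```python
def full_attention_cost(n_tokens: int) -> int:
    """Pairwise token interactions for one self-attention pass over the whole text."""
    return n_tokens * n_tokens


def windowed_cost(n_tokens: int, window: int = 512) -> int:
    """Cost when the text is read in fixed-size windows, each attended independently.

    Assumption: every window is padded to `window` tokens, so the total is
    (number of windows) * window^2, which grows linearly with n_tokens.
    """
    n_windows = -(-n_tokens // window)  # ceiling division
    return n_windows * window * window


if __name__ == "__main__":
    for n in (512, 2048, 8192):
        print(n, full_attention_cost(n), windowed_cost(n))
```

Under this model, doubling the text length quadruples the full-attention cost but only doubles the windowed cost.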
Abstract
BERT is a widely used pre-trained model in natural language processing. However, because the cost of BERT's self-attention grows quadratically with text length, BERT is difficult to apply directly to long-text corpora. In some fields, such as health care, the collected text can be quite long. To apply BERT's pre-trained language knowledge to long text, this paper proposes the Skimming-Intensive Model (SkIn), which imitates the skimming-intensive reading method humans use on long passages. SkIn dynamically selects the critical information in the text so that the input to the BERT-Base model is significantly shortened, which reduces the cost of the classification algorithm. Experiments show that SkIn achieves higher accuracy than the baselines on long-text classification datasets in the medical field, while its time…
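The skim-then-read idea described in the abstract can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' model: the scorer here is a hypothetical keyword-overlap heuristic standing in for whatever skimming component SkIn actually uses, whitespace-separated words are used as a rough proxy for WordPiece tokens, and the names `split_sentences`, `keyword_score`, and `skim_select` are invented for this example.

```python
import re
from typing import Callable, List, Set


def split_sentences(text: str) -> List[str]:
    """Naive sentence splitter: break after ., ! or ? followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def keyword_score(sentence: str, keywords: Set[str]) -> float:
    """Hypothetical stand-in scorer: fraction of words that are keywords."""
    words = re.findall(r"[a-z]+", sentence.lower())
    if not words:
        return 0.0
    return sum(w in keywords for w in words) / len(words)


def skim_select(text: str, scorer: Callable[[str], float], max_tokens: int = 512) -> str:
    """Skim: rank sentences by score, keep the best ones that fit the budget.

    The kept sentences are returned in their original order, ready to be
    passed to a length-limited classifier such as BERT-Base.
    """
    sentences = split_sentences(text)
    ranked = sorted(range(len(sentences)), key=lambda i: scorer(sentences[i]), reverse=True)
    kept, used = [], 0
    for i in ranked:
        n = len(sentences[i].split())  # whitespace words as a rough token count
        if used + n <= max_tokens:
            kept.append(i)
            used += n
    return " ".join(sentences[i] for i in sorted(kept))


if __name__ == "__main__":
    note = (
        "Patient reports mild headache. The weather was nice today. "
        "Blood pressure is elevated and chest pain persists. He likes football."
    )
    clinical = {"headache", "blood", "pressure", "chest", "pain"}
    print(skim_select(note, lambda s: keyword_score(s, clinical), max_tokens=12))
```

The shortened string would then go through the usual tokenizer and classifier. Keeping the selected sentences in document order is a design choice made here so the downstream model sees a coherent excerpt.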
Taxonomy
Topics: Topic Modeling
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · WordPiece · Adam · Softmax · Dropout · Dense Connections · Residual Connection · Weight Decay
