LFTK: Handcrafted Features in Computational Linguistics
Bruce W. Lee, Jason Hyung-Jong Lee

TL;DR
LFTK is an open-source, multilingual system that extracts more than 220 handcrafted linguistic features under a consistent categorization and naming scheme. It addresses the inconsistent implementations and lack of standardization in earlier work, and is intended to support a range of NLP tasks.
Contribution
The paper introduces a categorization scheme for handcrafted linguistic features, a correlation analysis on several task-specific datasets, and an expandable multilingual feature extraction system.
Findings
Collected and categorized more than 220 popular handcrafted features grounded in past literature
Reported correlation results on several task-specific datasets, with potential use cases for each feature
Developed and released the largest open-source feature extraction system
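To make the idea of correlating handcrafted features with task labels concrete, here is a minimal sketch in plain Python. It does not use LFTK or reproduce its API. The two features (type-token ratio and average word length), the naive tokenizer, and every function name below are assumptions chosen for illustration only.

```python
import math
import re


def tokenize(text):
    """Lowercase and keep runs of letters/apostrophes (a deliberately naive tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())


def type_token_ratio(text):
    """Number of unique tokens divided by total tokens; 0.0 for empty input."""
    tokens = tokenize(text)
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def avg_word_length(text):
    """Mean number of characters per token; 0.0 for empty input."""
    tokens = tokenize(text)
    return sum(len(t) for t in tokens) / len(tokens) if tokens else 0.0


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either series is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sd_x == 0 or sd_y == 0:
        return 0.0
    return cov / (sd_x * sd_y)


# Hypothetical feature registry: name -> extractor function.
FEATURES = {
    "type_token_ratio": type_token_ratio,
    "avg_word_length": avg_word_length,
}


def correlate_features(texts, labels):
    """Correlate each feature's values over `texts` with the numeric `labels`."""
    return {
        name: pearson([extract(t) for t in texts], labels)
        for name, extract in FEATURES.items()
    }


if __name__ == "__main__":
    # Toy texts with made-up difficulty labels, purely illustrative.
    texts = [
        "The cat sat. The cat ran.",
        "Birds migrate across continents every autumn.",
        "Epistemological frameworks presuppose ontological commitments.",
    ]
    labels = [1.0, 2.0, 3.0]
    for name, r in correlate_features(texts, labels).items():
        print(f"{name}: r = {r:.3f}")
```

A registry keyed by feature name keeps the correlation step independent of how any single feature is computed, so further features can be added without touching `correlate_features`. Correlations on a toy set this small are not meaningful; the sketch shows only the shape of the analysis.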
Abstract
Past research has identified a rich set of handcrafted linguistic features that can potentially assist various tasks. However, their extensive number makes it difficult to effectively select and utilize existing handcrafted features. Coupled with the problem of inconsistent implementation across research works, there has been no categorization scheme or generally-accepted feature names. This creates unwanted confusion. Also, most existing handcrafted feature extraction libraries are not open-source or not actively maintained. As a result, a researcher often has to build such an extraction system from the ground up. We collect and categorize more than 220 popular handcrafted features grounded on past literature. Then, we conduct a correlation analysis study on several task-specific datasets and report the potential use cases of each feature. Lastly, we devise a multilingual handcrafted…
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Semantic Web and Ontologies
