Classification of descriptions and summary using multiple passes of statistical and natural language toolkits
Saumya Banthia, Anantha Sharma

TL;DR
This paper presents a method for checking the relevance of summaries or definitions to entity names using multiple passes of statistical and natural language processing tools, primarily applied to package descriptions from PyPI.
Contribution
It introduces a name relevance classifier that combines statistical and NLP techniques to assess summary relevance, with potential for integration with other scoring methods.
Findings
Achieved objective relevance scoring on package descriptions
Proposed multi-pass NLP approach enhances relevance detection
Potential for improving automated summarization and classification
Abstract
This document describes a possible approach for checking the relevance of an entity's summary or definition with respect to its name; in other words, it is a name relevance check. The percentage score obtained from this approach can be used on its own or to supplement scores obtained from other metrics to arrive at a final classification. Potential improvements are outlined at the end of the document. The dataset on which this document aims to obtain an objective score is a list of package names and their respective summaries (sourced from pypi.org).
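To make the idea of a percentage name relevance score concrete, the following is a minimal sketch in Python. It is not the authors' classifier: the abstract does not specify the individual passes, so the token-overlap heuristic, the prefix-matching rule, and the names `tokenize`, `name_relevance`, and `blend` are all assumptions made for illustration.

```python
import re


def tokenize(text):
    """Lowercase the text and split on any run of non-alphanumeric characters."""
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]


def name_relevance(name, summary):
    """Return the percentage of name tokens that also occur in the summary.

    Assumed heuristic (not from the paper): a name token matches if it equals
    a summary token, or if it has at least 3 characters and a summary token
    starts with it.
    """
    name_tokens = tokenize(name)
    summary_tokens = tokenize(summary)
    if not name_tokens:
        return 0.0
    matched = 0
    for n in name_tokens:
        if any(n == s or (len(n) >= 3 and s.startswith(n)) for s in summary_tokens):
            matched += 1
    return 100.0 * matched / len(name_tokens)


def blend(name_score, other_score, weight=0.5):
    """Combine the name relevance score with a score from another metric.

    The linear weighting is a hypothetical choice; the abstract only says the
    score can supplement other metrics.
    """
    return weight * name_score + (1.0 - weight) * other_score


if __name__ == "__main__":
    print(name_relevance("json-parser", "A fast parser for JSON documents"))
    print(name_relevance("http-client", "Simple client library"))
    print(name_relevance("foo-bar", "Utilities for dates"))
```

A package whose name tokens all appear in its summary scores 100, while one that shares no tokens scores 0. In practice, additional passes (for example stemming or stop-word removal from an NLP toolkit) would likely be needed to handle abbreviations and compound names.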
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Algorithms and Data Compression
