Classification of descriptions and summary using multiple passes of statistical and natural language toolkits

Saumya Banthia; Anantha Sharma

arXiv:2009.04953 · cs.CL · December 18, 2024 · 1 citation

TL;DR

This paper presents a method for checking the relevance of summaries or definitions to entity names using multiple passes of statistical and natural language processing tools, primarily applied to package descriptions from PyPI.

Contribution

It introduces a name relevance classifier that combines statistical and NLP techniques to assess summary relevance, with potential for integration with other scoring methods.
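As an illustration only, and not the authors' implementation, the following Python sketch shows how a multi-pass name relevance check could turn a package name and its summary into a percentage score. The helper names (`split_name`, `tokenize`, `shared_prefix_len`, `name_relevance`), the three passes, the half-credit rule and the `min_prefix=4` cut-off are all assumptions. The prefix pass is a crude stand-in for the stemming or lemmatization a natural language toolkit would provide.

```python
import re


def split_name(name: str) -> list[str]:
    """Split a package name into lowercase sub-tokens.

    Handles the '-', '_' and '.' separators common in PyPI names, plus camelCase.
    """
    # Put a space at lower/digit -> upper boundaries ("DataFrame" -> "Data Frame").
    spaced = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", name)
    return [t.lower() for t in re.split(r"[^A-Za-z0-9]+", spaced) if t]


def tokenize(text: str) -> list[str]:
    """Lowercase alphanumeric word tokens of a summary."""
    return [t.lower() for t in re.findall(r"[A-Za-z0-9]+", text)]


def shared_prefix_len(a: str, b: str) -> int:
    """Number of leading characters two strings have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def name_relevance(name: str, summary: str, min_prefix: int = 4) -> float:
    """Hypothetical multi-pass name relevance score in [0, 100]."""
    words = tokenize(summary)
    parts = split_name(name)
    if not words or not parts:
        return 0.0

    # Pass 1: the whole name (separators removed) occurs as a word.
    if "".join(parts) in words:
        return 100.0

    vocab = set(words)
    credit = 0.0
    for part in parts:
        # Pass 2: an exact sub-token match earns full credit.
        if part in vocab:
            credit += 1.0
        # Pass 3 (assumed rule): a long shared prefix ("learn" ~ "learning")
        # earns half credit, standing in for stemming/lemmatization.
        elif any(shared_prefix_len(part, w) >= min_prefix for w in vocab):
            credit += 0.5
    return round(100.0 * credit / len(parts), 2)
```

For example, `name_relevance("scikit-learn", "A set of python modules for machine learning and data mining")` returns `25.0`: `learn` only partially matches `learning`, and `scikit` does not appear at all. The sketch uses only the standard library so it stays self-contained; a fuller pipeline would more likely replace the prefix pass with a proper stemmer or lemmatizer.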

Findings

1. Achieved objective relevance scoring on package descriptions
2. Proposed multi-pass NLP approach enhances relevance detection
3. Potential for improving automated summarization and classification

Abstract

This document describes a possible approach for checking the relevance of a summary / definition of an entity with respect to its name. The classifier focuses on how relevant an entity's name is to its summary / definition; in other words, it is a name relevance check. The percentage score obtained from this approach can be used either on its own or to supplement scores obtained from other metrics to arrive at a final classification. Potential improvements are outlined at the end of the document. The dataset on which this document focuses for obtaining an objective score is a list of package names and their respective summaries (sourced from pypi.org).
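The abstract notes that the percentage score can stand alone or be combined with scores from other metrics. A minimal sketch of one way to do that, assuming a weighted mean of percentage scores and an arbitrary 50% decision threshold (both hypothetical choices, as are the function and metric names), is:

```python
from typing import Mapping


def combine_scores(scores: Mapping[str, float], weights: Mapping[str, float]) -> float:
    """Weighted mean of per-metric percentage scores (each in [0, 100]).

    Hypothetical: the paper does not specify how scores are combined.
    """
    total_weight = sum(weights[m] for m in scores)
    if total_weight <= 0:
        raise ValueError("weights for the supplied metrics must sum to a positive value")
    return sum(scores[m] * weights[m] for m in scores) / total_weight


def classify(
    scores: Mapping[str, float],
    weights: Mapping[str, float],
    threshold: float = 50.0,  # assumed cut-off, not taken from the paper
) -> str:
    """Map the combined score to a final label."""
    return "relevant" if combine_scores(scores, weights) >= threshold else "not relevant"


if __name__ == "__main__":
    # "other_metric" is a placeholder for any additional scoring method.
    scores = {"name_relevance": 25.0, "other_metric": 80.0}
    weights = {"name_relevance": 1.0, "other_metric": 1.0}
    print(combine_scores(scores, weights), classify(scores, weights))
```

With equal weights, a name relevance score of 25.0 and a placeholder second metric of 80.0 combine to 52.5, which clears the assumed threshold, whereas the name relevance score used on its own would not. A weighted mean keeps the combined value on the same 0 to 100 scale as its inputs, so the same threshold applies whether one metric or several are supplied.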

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Algorithms and Data Compression