Two Steps Feature Selection and Neural Network Classification for the TREC-8 Routing
Mathieu Stricker, Frantz Vichot, Gerard Dreyfus, Francis Wolinski

TL;DR
This paper combines a two-step, topic-dependent feature selection with neural network classification to filter relevant documents for the TREC-8 routing task, so that both the choice of terms and the document representation are determined automatically.
Contribution
It introduces an automatic, topic-dependent term selection method that provides the input representation for a per-topic neural network classifier used for document filtering in TREC-8.
Findings
Effective term selection improves classification accuracy.
Automatic vector length optimization enhances relevance detection.
Top-ranked documents are accurately identified for TREC submission.
Abstract
For the TREC-8 routing, one specific filter is built for each topic. Each filter is a classifier trained to recognize the documents that are relevant to the topic. When presented with a document, each classifier estimates the probability that the document is relevant to the topic for which it has been trained. Since the procedure for building a filter is topic-independent, the system is fully automatic. Using a sample of documents previously judged relevant or not relevant to a particular topic, the system performs term selection and trains a neural network. Each document is represented by a vector of the frequencies of a list of selected terms. This list depends on the topic to be filtered and is constructed in two steps. The first step defines the characteristic words used in the relevant documents of the corpus; the second one chooses, among the…
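The abstract describes the pipeline only at a high level and is cut off before the second selection step is explained. The following is a minimal Python sketch of one topic's filter under stated assumptions, not the authors' implementation. The scoring rule in the hypothetical `characteristic_words` function (ratio of a word's relative frequency in relevant documents to its relative frequency in the whole sample) is an assumed stand-in for the first step. The single sigmoid neuron stands in for the paper's neural network. The tokenizer, all function names, and the toy documents are illustrative inventions. The second selection step is not sketched, because the abstract does not say how it chooses among the candidates.

```python
import math
from collections import Counter


def tokenize(text):
    # Hypothetical toy tokenizer: lowercase and split on whitespace.
    return text.lower().split()


def characteristic_words(relevant_docs, other_docs, min_count=2, top_k=50):
    """Assumed stand-in for step 1: rank words by how over-represented they
    are in the relevant documents compared with the whole training sample."""
    rel = Counter(w for d in relevant_docs for w in tokenize(d))
    tot = rel + Counter(w for d in other_docs for w in tokenize(d))
    n_rel, n_tot = sum(rel.values()), sum(tot.values())
    scores = {w: (c / n_rel) / (tot[w] / n_tot)
              for w, c in rel.items() if c >= min_count}
    # Highest ratio first; ties broken alphabetically.
    return sorted(scores, key=lambda w: (-scores[w], w))[:top_k]


def to_vector(doc, terms):
    # Document -> relative frequency of each selected term.
    toks = tokenize(doc)
    counts = Counter(toks)
    n = len(toks) or 1
    return [counts[t] / n for t in terms]


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_filter(vectors, labels, epochs=1000, lr=1.0):
    # One sigmoid neuron trained by per-example gradient descent on the
    # cross-entropy loss; its output is read as P(relevant | document).
    w, b = [0.0] * len(vectors[0]), 0.0
    for _ in range(epochs):
        for x, y in zip(vectors, labels):
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            g = p - y  # derivative of the loss w.r.t. the pre-activation
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
            b -= lr * g
    return w, b


def relevance_probability(doc, terms, w, b):
    x = to_vector(doc, terms)
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


# Made-up training sample for one hypothetical topic.
relevant = ["oil prices rise as opec cuts oil output",
            "opec meeting sets oil output quota",
            "crude oil prices fall after opec talks"]
not_relevant = ["football team wins league title",
                "new film tops box office this week",
                "stock market closes higher on tech gains"]

terms = characteristic_words(relevant, not_relevant)
docs = relevant + not_relevant
labels = [1] * len(relevant) + [0] * len(not_relevant)
w, b = train_filter([to_vector(d, terms) for d in docs], labels)

print(terms)
print(round(relevance_probability("opec raises oil output", terms, w, b), 3))
print(round(relevance_probability("team wins league title", terms, w, b), 3))
```

Because the neuron is trained on the cross-entropy loss and its output lies in (0, 1), the score can be read as an estimate of the probability of relevance, which matches the role the abstract gives each per-topic classifier. The real system's network architecture and its second selection step are not reproduced here.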
Taxonomy
Topics: Text and Document Classification Technologies · Natural Language Processing Techniques · Topic Modeling
