Valence extraction using EM selection and co-occurrence matrices

Łukasz Dębowski

arXiv:0711.4475·cs.CL·March 11, 2020

TL;DR

This paper introduces two new methods for extracting verb valences from raw text, applied to Polish: an EM selection algorithm for unsupervised disambiguation of candidate valence frames, and co-occurrence matrices for filtering out incorrect frames.

Contribution

It applies EM selection and co-occurrence matrices to unsupervised verb valence extraction for Polish and reports better accuracy than standard filtering methods.

Findings

01

Achieved an F-score of 45% with the new method.

02

Outperformed standard binomial hypothesis test (BHT) filtering, which reached an F-score of 39%.

03

Demonstrated effectiveness of co-occurrence matrices in argument filtering.

Abstract

This paper discusses two new procedures for extracting verb valences from raw texts, with an application to the Polish language. The first novel technique, the EM selection algorithm, performs unsupervised disambiguation of valence frame forests, obtained by applying a non-probabilistic deep grammar parser and some post-processing to the text. The second new idea concerns filtering of incorrect frames detected in the parsed text and is motivated by an observation that verbs which take similar arguments tend to have similar frames. This phenomenon is described in terms of newly introduced co-occurrence matrices. Using co-occurrence matrices, we split filtering into two steps. The list of valid arguments is first determined for each verb, whereas the pattern according to which the arguments are combined into frames is computed in the following stage. Our best extracted dictionary reaches…
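The disambiguation step described in the abstract can be illustrated with a small expectation-maximization loop. The sketch below is not the paper's implementation: it assumes each parsed sentence has already been reduced to a set of alternative frame labels, and it models the data with a single categorical distribution over those labels. The function name `em_selection`, the frame labels, and the iteration count are all hypothetical choices made for illustration.

```python
from collections import defaultdict


def em_selection(candidate_sets, iterations=50):
    """Pick one candidate per sentence with a simple EM loop.

    candidate_sets: list of non-empty sets of hashable, sortable labels
    (an assumed simplification of the paper's valence frame forests).
    Returns (prob, selected): the estimated probability of each label and
    the most probable label for each sentence.
    """
    candidates = set().union(*candidate_sets)
    # Start from a uniform distribution over every label seen anywhere.
    prob = {c: 1.0 / len(candidates) for c in candidates}
    n = len(candidate_sets)

    for _ in range(iterations):
        expected = defaultdict(float)
        # E-step: split one unit of count per sentence among its
        # alternatives, in proportion to their current probabilities.
        for options in candidate_sets:
            total = sum(prob[c] for c in options)
            for c in options:
                expected[c] += prob[c] / total
        # M-step: renormalize expected counts into probabilities.
        prob = {c: expected[c] / n for c in candidates}

    # Selection: keep the most probable alternative in each sentence
    # (sorting first makes tie-breaking deterministic).
    selected = [max(sorted(options), key=lambda c: prob[c])
                for options in candidate_sets]
    return prob, selected


# Hypothetical toy input: three sentences, two of them ambiguous.
sentences = [
    {"np(acc)"},
    {"np(acc)", "np(dat)"},
    {"np(acc)", "pp(na)"},
]
prob, selected = em_selection(sentences)
```

In this toy input the unambiguous first sentence pulls probability mass toward `np(acc)`, so the two ambiguous sentences resolve to the same label. Real frame forests would be far larger and would need the parsing and post-processing the abstract mentions.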
