Valence extraction using EM selection and co-occurrence matrices
Łukasz Dębowski

TL;DR
This paper introduces two novel methods for extracting verb valences from raw texts, utilizing an EM selection algorithm for disambiguation and co-occurrence matrices for filtering, specifically applied to Polish language processing.
Contribution
It introduces EM selection and co-occurrence matrices for unsupervised verb valence extraction, applies them to Polish, and reports improved accuracy over standard filtering methods.
Findings
Achieved an F-score of 45% with the new method.
Outperformed standard binomial hypothesis test (BHT) filtering, which had an F-score of 39%.
Demonstrated effectiveness of co-occurrence matrices in argument filtering.
Abstract
This paper discusses two new procedures for extracting verb valences from raw texts, with an application to the Polish language. The first novel technique, the EM selection algorithm, performs unsupervised disambiguation of valence frame forests, obtained by applying a non-probabilistic deep grammar parser and some post-processing to the text. The second new idea concerns filtering of incorrect frames detected in the parsed text and is motivated by an observation that verbs which take similar arguments tend to have similar frames. This phenomenon is described in terms of newly introduced co-occurrence matrices. Using co-occurrence matrices, we split filtering into two steps. The list of valid arguments is first determined for each verb, whereas the pattern according to which the arguments are combined into frames is computed in the following stage. Our best extracted dictionary reaches…
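To make the disambiguation step more concrete, here is a minimal sketch of a generic EM procedure for choosing among alternatives. It assumes that each parsed clause comes with a set of candidate frames, only one of which is correct, and that a single probability is estimated per frame. The function names, the frame labels (`np_acc`, `np_dat`, `np_inf`), and the toy data are hypothetical and are not taken from the paper; the paper's actual EM selection algorithm operates on frame forests and may differ in its details.

```python
from collections import defaultdict


def em_selection(alternatives, iterations=50):
    """Estimate one probability per candidate frame (illustrative sketch).

    alternatives: list of non-empty sets of candidate frames, one set per
    parsed clause. Exactly one frame per set is assumed to be correct.
    """
    frames = sorted(set().union(*alternatives))
    # Start from a uniform distribution over all candidate frames.
    p = {f: 1.0 / len(frames) for f in frames}
    for _ in range(iterations):
        expected = defaultdict(float)
        # E-step: share one unit of count per clause among its candidates,
        # in proportion to the current probabilities.
        for alts in alternatives:
            z = sum(p[f] for f in alts)
            for f in alts:
                expected[f] += p[f] / z
        # M-step: renormalise the expected counts.
        p = {f: expected[f] / len(alternatives) for f in frames}
    return p


def select(alternatives, p):
    """Pick the most probable candidate for each clause (ties broken by name)."""
    return [max(sorted(alts), key=lambda f: p[f]) for alts in alternatives]


# Toy data (hypothetical): six clauses of one verb, three of them ambiguous.
clauses = [
    {"np_acc"},
    {"np_acc"},
    {"np_acc"},
    {"np_acc", "np_dat"},
    {"np_dat", "np_inf"},
    {"np_inf"},
]
p = em_selection(clauses)
chosen = select(clauses, p)
```

In this toy run, `np_dat` never occurs unambiguously, so its probability shrinks with each iteration and the ambiguous clauses are resolved in favour of the frames that are also observed on their own.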
