TL;DR
This paper introduces an efficient algorithm for computing higher-order derivatives of the normalization constant of weighted finite-state machines, which speeds up the computation of second-order expectations used in NLP applications.
Contribution
The authors present the first general algorithm for evaluating derivatives of all orders of the normalization constant of weighted finite-state machines, together with an optimized scheme for the second-order case.
Findings
Algorithm runs in $\text{O}(A^2 N^4)$ time for second derivatives
Significantly faster than previous methods
Enables efficient computation of second-order expectations
Abstract
Weighted finite-state machines are a fundamental building block of NLP systems. They have withstood the test of time, from their early use in noisy channel models in the 1990s up to modern-day neurally parameterized conditional random fields. This work examines the computation of higher-order derivatives with respect to the normalization constant for weighted finite-state machines. We provide a general algorithm for evaluating derivatives of all orders, which has not been previously described in the literature. In the case of second-order derivatives, our scheme runs in the optimal $\text{O}(A^2 N^4)$ time, where $A$ is the alphabet size and $N$ is the number of states. Our algorithm is significantly faster than prior algorithms. Additionally, our approach leads to a significantly faster algorithm for computing second-order expectations, such as covariance matrices and gradients of…
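To make the quantities in the abstract concrete, here is a minimal sketch. It is not the authors' algorithm. It assumes a toy machine over the real (sum-product) semiring whose normalization constant can be written as $Z = \alpha^\top (I - \sum_a W^{(a)})^{-1} \beta$, with $\alpha$ and $\beta$ as initial and final weight vectors and $W^{(a)}$ as the $N \times N$ transition matrix for symbol $a$. All names (`closure`, `normalizer`, `hessian`) and the example weights are hypothetical. Under this assumption the Hessian of $Z$ has $A^2 N^4$ entries, so any algorithm that writes it out needs at least that much time, and each entry has a closed form in terms of forward weights, backward weights, and the path-sum matrix.

```python
# Toy WFSM sketch (assumed formulation, not taken from the paper):
# Z = alpha^T (I - M)^{-1} beta with M = sum_a W[a]; requires the path sum to converge.

def mat_inverse(m):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(m)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def closure(W):
    """K = (I - sum_a W[a])^{-1}: total weight of all paths between each state pair."""
    n = len(W[0])
    M = [[sum(Wa[i][j] for Wa in W) for j in range(n)] for i in range(n)]
    return mat_inverse([[(1.0 if i == j else 0.0) - M[i][j] for j in range(n)]
                        for i in range(n)])


def normalizer(alpha, W, beta):
    """Normalization constant Z = alpha^T K beta."""
    K = closure(W)
    n = len(alpha)
    return sum(alpha[i] * K[i][j] * beta[j] for i in range(n) for j in range(n))


def hessian(alpha, W, beta):
    """All d^2 Z / dW[a][i][j] dW[c][k][l], keyed by (a, i, j, c, k, l)."""
    K = closure(W)
    n, A = len(alpha), len(W)
    f = [sum(alpha[s] * K[s][i] for s in range(n)) for i in range(n)]  # forward weights
    b = [sum(K[j][t] * beta[t] for t in range(n)) for j in range(n)]   # backward weights
    H = {}
    for a in range(A):
        for i in range(n):
            for j in range(n):
                for c in range(A):
                    for k in range(n):
                        for l in range(n):
                            # Product rule applied to dZ/dW[a][i][j] = f[i] * b[j].
                            H[a, i, j, c, k, l] = (f[i] * K[j][k] * b[l]
                                                   + f[k] * K[l][i] * b[j])
    return H


# Hypothetical machine with N = 2 states and A = 2 symbols.
alpha = [1.0, 0.0]
beta = [0.0, 1.0]
W = [
    [[0.1, 0.2], [0.0, 0.1]],
    [[0.2, 0.1], [0.1, 0.2]],
]
H = hessian(alpha, W, beta)
print(len(H))  # A^2 * N^4 = 4 * 16 = 64 entries
```

After one $\text{O}(N^3)$ matrix inversion, each of the $A^2 N^4$ entries costs constant time in this dense toy setting. In this sketch the entries do not depend on the symbol indices, because every $W^{(a)}$ enters $Z$ only through the sum over symbols.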
