Combining Natural Gradient with Hessian Free Methods for Sequence Training
Adnan Haider, P.C. Woodland

TL;DR
This paper introduces an optimization method that combines Natural Gradient (NG) and Hessian Free (HF) techniques for sequence training of neural network acoustic models in speech recognition. For the same number of updates, it achieves larger word error rate (WER) reductions than either NG or HF alone.
Contribution
It presents a new optimization approach, derived from Information Geometry principles, that combines the Natural Gradient direction with local curvature information inside a Hessian Free style framework for sequence training of DNNs.
Findings
Achieves larger WER reductions than both NG and HF for the same number of updates.
Outperforms standard stochastic gradient descent in sequence training.
Addresses over-fitting issues due to training criterion mismatch.
Abstract
This paper presents a new optimisation approach to train Deep Neural Networks (DNNs) with discriminative sequence criteria. At each iteration, the method combines information from the Natural Gradient (NG) direction with local curvature information of the error surface that enables better paths on the parameter manifold to be traversed. The method is derived using an alternative derivation of Taylor's theorem using the concepts of manifolds, tangent vectors and directional derivatives from the perspective of Information Geometry. The efficacy of the method is shown within a Hessian Free (HF) style optimisation framework to sequence train both standard fully-connected DNNs and Time Delay Neural Networks as speech recognition acoustic models. It is shown that for the same number of updates the proposed approach achieves larger reductions in the word error rate (WER) than both NG and HF,…
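The abstract describes working inside an HF-style framework, where an update direction is typically obtained by solving a linear system with conjugate gradient (CG) using only matrix-vector products. The sketch below illustrates that general idea on a toy logistic regression problem by solving a damped Fisher system for a natural-gradient-like direction. It is not the paper's method: the model, the data, the damping value, and the names `conjugate_gradient` and `fisher_vec` are all assumptions made for illustration, and the paper's combination of NG with curvature information is not reproduced here.

```python
import numpy as np

def conjugate_gradient(matvec, b, iters=50, tol=1e-20):
    """Solve A x = b for symmetric positive-definite A, given only v -> A v."""
    x = np.zeros_like(b)
    r = b.copy()   # residual b - A x, starting from x = 0
    p = r.copy()   # current search direction
    rs = r @ r     # squared residual norm
    for _ in range(iters):
        if rs < tol:
            break
        Ap = matvec(p)
        alpha = rs / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

# Toy data and model (hypothetical, for illustration only).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = (rng.random(200) < 0.5).astype(float)
w = 0.1 * rng.normal(size=5)

p_hat = 1.0 / (1.0 + np.exp(-(X @ w)))   # model probabilities
grad = X.T @ (p_hat - y) / len(y)        # gradient of the mean cross-entropy

def fisher_vec(v, damping=1e-3):
    """Damped Fisher-vector product (F + damping * I) v, without forming F."""
    s = p_hat * (1.0 - p_hat)
    return X.T @ (s * (X @ v)) / len(y) + damping * v

# Direction d that approximately satisfies (F + damping * I) d = grad.
ng_direction = conjugate_gradient(fisher_vec, grad)
w_new = w - ng_direction
```

For logistic regression the Fisher matrix coincides with the Gauss-Newton matrix, so this toy case only shows the matrix-free CG solve and not any difference between NG and HF directions.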
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing
