Sensitivity Analysis on Transferred Neural Architectures of BERT and GPT-2 for Financial Sentiment Analysis
Tracy Qian, Andy Xie, Camille Bruckmann

TL;DR
This paper evaluates the fine-tuning sensitivity of BERT and GPT-2 models for financial sentiment analysis, revealing differences in stability and the importance of early layers for preserving word patterns.
Contribution
It provides a comparative analysis of the sensitivity of BERT and GPT-2 during fine-tuning for financial NLP tasks, highlighting stability differences and layer importance.
Findings
BERT is hypersensitive to stochasticity during fine-tuning.
GPT-2 exhibits more stable fine-tuning behavior.
Early layers of both models contain crucial word pattern information.
Abstract
The explosion in novel NLP word embedding and deep learning techniques has prompted significant work on potential applications, one of which is the financial sector. Although state-of-the-art models like GPT and BERT have been studied extensively, relatively few works examine how well these pre-trained models perform after fine-tuning, or how sensitive their parameters are. We investigate the performance and sensitivity of transferred neural architectures from pre-trained GPT-2 and BERT models. We test fine-tuning performance while varying which transformer layers are frozen, the batch size, and the learning rate. We find that the parameters of BERT are hypersensitive to stochasticity in fine-tuning and that GPT-2 is more stable under the same procedure. It is also clear that the earlier layers of GPT-2 and BERT contain essential word pattern information that…
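The sweep described in the abstract (frozen transformer layers, batch size, learning rate) could be organized roughly as in the sketch below. This is not the authors' code. The grid values, the seed list, the choice to freeze the first layers of the stack, and the helper names `trainable_layers`, `build_grid`, and `seed_spread` are all assumptions made for illustration. Sensitivity to stochasticity is approximated here as the standard deviation of a score across runs that differ only in random seed.

```python
from itertools import product
from statistics import mean, stdev

# Placeholder values for illustration only; the grid used in the paper
# is not stated in this summary.
FROZEN_LAYER_COUNTS = [0, 4, 8]
BATCH_SIZES = [16, 32]
LEARNING_RATES = [1e-5, 5e-5]
SEEDS = [0, 1, 2]


def trainable_layers(num_layers, num_frozen):
    """Return the indices of layers left trainable.

    Assumption: the first `num_frozen` layers (those closest to the input)
    are frozen and every later layer is fine-tuned.
    """
    if not 0 <= num_frozen <= num_layers:
        raise ValueError("num_frozen must be between 0 and num_layers")
    return list(range(num_frozen, num_layers))


def build_grid():
    """Enumerate every (frozen layers, batch size, learning rate, seed) run."""
    return [
        {"num_frozen": f, "batch_size": b, "learning_rate": lr, "seed": s}
        for f, b, lr, s in product(
            FROZEN_LAYER_COUNTS, BATCH_SIZES, LEARNING_RATES, SEEDS
        )
    ]


def seed_spread(scores):
    """Summarize scores from runs that differ only in random seed.

    A larger standard deviation means the configuration is more sensitive
    to stochasticity. Needs at least two scores.
    """
    return {"mean": mean(scores), "std": stdev(scores)}
```

Each entry returned by `build_grid()` would be handed to a fine-tuning routine (not shown), and the resulting scores grouped by configuration before calling `seed_spread`. Comparing the spread for BERT and GPT-2 under identical configurations is one way to make the stability claim measurable.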
Taxonomy
Topics: Stock Market Forecasting Methods · Energy Load and Power Forecasting · Topic Modeling
Methods: INFO: An Efficient Optimization Algorithm based on Weighted Mean of Vectors · Multi-Head Attention · Attention Is All You Need · Test · Linear Layer · Linear Warmup With Linear Decay · Byte Pair Encoding · Layer Normalization · Weight Decay · WordPiece
