Sensitivity Analysis on Transferred Neural Architectures of BERT and GPT-2 for Financial Sentiment Analysis

Tracy Qian; Andy Xie; Camille Bruckmann

arXiv:2207.03037·cs.CL·July 8, 2022·5 cites

TL;DR

This paper evaluates the fine-tuning sensitivity of BERT and GPT-2 models for financial sentiment analysis, revealing differences in stability and the importance of early layers for preserving word patterns.

Contribution

It provides a comparative analysis of the sensitivity of BERT and GPT-2 during fine-tuning for financial NLP tasks, highlighting stability differences and layer importance.

Findings

01. BERT is hypersensitive to stochasticity during fine-tuning.

02. GPT-2 exhibits more stable fine-tuning behavior.

03. Early layers of both models contain crucial word pattern information.

Abstract

The explosion in novel NLP word embedding and deep learning techniques has spurred significant effort toward potential applications, one of which is the financial sector. Although much work has been done on state-of-the-art models like GPT and BERT, relatively few studies examine how well these models perform when fine-tuned after pre-training, or how sensitive their parameters are. We investigate the performance and sensitivity of transferred neural architectures from pre-trained GPT-2 and BERT models. We test fine-tuning performance while varying the frozen transformer layers, batch size, and learning rate. We find that the parameters of BERT are hypersensitive to stochasticity in fine-tuning and that GPT-2 is more stable under the same procedure. It is also clear that the earlier layers of GPT-2 and BERT contain essential word pattern information that…
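The experimental setup described in the abstract (freezing transformer layers while varying batch size and learning rate) can be sketched as follows. This is a minimal illustration, not the authors' code. The parameter-name patterns follow the naming commonly seen in Hugging Face checkpoints and are an assumption here, as are the function names `split_frozen` and `sweep`. The sketch leaves the embeddings and classifier head trainable, which is a choice of this sketch and not a detail confirmed by the source.

```python
import re
from itertools import product

# Assumed layer-name patterns (modelled on Hugging Face parameter naming).
# Verify against model.named_parameters() for the checkpoint actually used.
LAYER_PATTERNS = {
    "bert": re.compile(r"encoder\.layer\.(\d+)\."),
    "gpt2": re.compile(r"(?:^|\.)h\.(\d+)\."),
}


def split_frozen(param_names, model_type, num_frozen):
    """Split parameter names into (frozen, trainable).

    Parameters belonging to the first `num_frozen` transformer blocks are
    frozen. Names that match no block pattern (embeddings, classifier head)
    stay trainable in this sketch.
    """
    pattern = LAYER_PATTERNS[model_type]
    frozen, trainable = [], []
    for name in param_names:
        match = pattern.search(name)
        if match and int(match.group(1)) < num_frozen:
            frozen.append(name)
        else:
            trainable.append(name)
    return frozen, trainable


def sweep(frozen_options, batch_sizes, learning_rates):
    """Enumerate every combination of the three factors as a config dict."""
    return [
        {"num_frozen": f, "batch_size": b, "learning_rate": lr}
        for f, b, lr in product(frozen_options, batch_sizes, learning_rates)
    ]


if __name__ == "__main__":
    # Hypothetical parameter names and grid values, for illustration only.
    names = [
        "bert.embeddings.word_embeddings.weight",
        "bert.encoder.layer.0.attention.self.query.weight",
        "bert.encoder.layer.5.output.dense.weight",
        "classifier.weight",
    ]
    frozen, trainable = split_frozen(names, "bert", num_frozen=2)
    print("frozen:", frozen)
    print("trainable:", trainable)
    for config in sweep([0, 6], [16, 32], [2e-5, 5e-5]):
        print(config)
```

In a PyTorch training script, the frozen names would typically be applied by setting `requires_grad = False` on the matching parameters before building the optimizer. Repeating each configuration over several random seeds is one way to observe the kind of run-to-run sensitivity the abstract reports.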

Taxonomy

Topics: Stock Market Forecasting Methods · Energy Load and Power Forecasting · Topic Modeling

Methods: INFO: An Efficient Optimization Algorithm based on Weighted Mean of Vectors · Multi-Head Attention · Attention Is All You Need · Test · Linear Layer · Linear Warmup With Linear Decay · Byte Pair Encoding · Layer Normalization · Weight Decay · WordPiece