AstBERT: Enabling Language Model for Financial Code Understanding with Abstract Syntax Trees

Rong Liang; Tiehua Zhang; Yujie Lu; Yuze Liu; Zhen Huang; Xin Chen

arXiv:2201.07984 · cs.AI · October 12, 2022 · 1 citation

TL;DR

AstBERT is a specialized pre-trained language model that leverages abstract syntax trees to improve understanding of financial source code, enhancing performance in code-related tasks.

Contribution

The paper introduces AstBERT, a novel model integrating AST information into pre-trained language models for better financial code understanding.

Findings

1. AstBERT outperforms baseline models on code question answering.
2. AstBERT achieves higher accuracy in code clone detection.
3. AstBERT shows promising results in code refinement tasks.

Abstract

Using pre-trained language models to understand source code has attracted increasing attention from financial institutions because of its potential to uncover financial risks. However, applying these language models directly to programming-language problems poses several challenges. For instance, the shift in domain knowledge between natural language (NL) and programming language (PL) requires understanding the semantic and syntactic information in the data from different perspectives. To this end, we propose AstBERT, a pre-trained PL model that aims to better understand financial code using the abstract syntax tree (AST). Specifically, we collect a large volume of source code (both Java and Python) from the Alipay code repository and incorporate both syntactic and semantic code knowledge into our model with the help of code parsers, in…
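As a rough illustration of what "syntactic knowledge from a code parser" can look like, the sketch below parses a small Python snippet and lists its AST node types in depth-first order. It uses Python's standard `ast` module as a stand-in; the parser AstBERT actually uses, and the way AST information is fed into the model, are not specified in this excerpt. The function name `ast_node_sequence` and the sample snippet are hypothetical.

```python
import ast


def ast_node_sequence(source: str) -> list[str]:
    """Return the AST node type names of `source` in depth-first pre-order.

    Hypothetical helper for illustration only; not the paper's pipeline.
    """
    tree = ast.parse(source)
    names: list[str] = []

    def visit(node: ast.AST) -> None:
        # Record this node's type, then descend into its children.
        names.append(type(node).__name__)
        for child in ast.iter_child_nodes(node):
            visit(child)

    visit(tree)
    return names


# Made-up sample of "financial" code, used only to exercise the helper.
snippet = "def add_fee(amount, rate):\n    return amount * (1 + rate)\n"
node_types = ast_node_sequence(snippet)
print(node_types)
```

A sequence like this exposes structure (function definition, arguments, nested binary operations) that a flat token stream hides, which is the kind of signal an AST-aware model could draw on alongside the raw code tokens.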

Taxonomy

Topics: Software Engineering Research · Stock Market Forecasting Methods · Online Learning and Analytics