AstBERT: Enabling Language Model for Financial Code Understanding with Abstract Syntax Trees
Rong Liang, Tiehua Zhang, Yujie Lu, Yuze Liu, Zhen Huang, Xin Chen

TL;DR
AstBERT is a specialized pre-trained language model that leverages abstract syntax trees to improve understanding of financial source code, enhancing performance in code-related tasks.
Contribution
The paper introduces AstBERT, a novel model integrating AST information into pre-trained language models for better financial code understanding.
Findings
AstBERT outperforms baseline models on code question answering.
AstBERT achieves higher accuracy in code clone detection.
AstBERT shows promising results in code refinement tasks.
Abstract
Using pre-trained language models to understand source code has attracted increasing attention from financial institutions owing to its great potential to uncover financial risks. However, applying these language models directly to programming-language problems poses several challenges. For instance, the shift of domain knowledge between natural language (NL) and programming language (PL) requires understanding the semantic and syntactic information in the data from different perspectives. To this end, we propose AstBERT, a pre-trained PL model that aims to better understand financial code using the abstract syntax tree (AST). Specifically, we collect a large amount of source code (both Java and Python) from the Alipay code repository and incorporate both syntactic and semantic code knowledge into our model with the help of code parsers, in…
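To make the idea of obtaining syntactic structure from a code parser concrete, here is a minimal sketch using Python's standard `ast` module. It is an illustration only: the excerpt above does not say which parsers AstBERT uses or how AST information is fed to the model, and the `SOURCE` snippet and the helper names `ast_node_types` and `ast_edges` are hypothetical.

```python
import ast

# Hypothetical finance-flavoured snippet, used only for illustration.
SOURCE = """
def transfer(balance, amount):
    if amount > balance:
        raise ValueError("insufficient funds")
    return balance - amount
"""


def ast_node_types(source: str) -> list[str]:
    """Return the AST node type names in the order ast.walk visits them."""
    tree = ast.parse(source)
    return [type(node).__name__ for node in ast.walk(tree)]


def ast_edges(source: str) -> list[tuple[str, str]]:
    """Return (parent type, child type) pairs describing the tree structure."""
    tree = ast.parse(source)
    edges = []
    for parent in ast.walk(tree):
        for child in ast.iter_child_nodes(parent):
            edges.append((type(parent).__name__, type(child).__name__))
    return edges


if __name__ == "__main__":
    print(ast_node_types(SOURCE))
    for parent, child in ast_edges(SOURCE):
        print(f"{parent} -> {child}")
```

The parent-child pairs expose structure (for example, that the `raise` statement sits inside the `if` branch) that a flat token sequence does not state explicitly. This is the kind of syntactic signal an AST-aware model could draw on.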
Taxonomy
Topics: Software Engineering Research · Stock Market Forecasting Methods · Online Learning and Analytics
