Domain-Adaptive Small Language Models for Structured Tax Code Prediction
Souvik Nath, Sumit Wadhwa, Luis Perez

TL;DR
This paper introduces a domain-adaptive encoder-decoder small language model for accurately predicting hierarchical tax codes from unstructured data, outperforming flat classifiers and other architectures in structured sequence prediction tasks.
Contribution
It presents a novel encoder-decoder small language model tailored for hierarchical tax code prediction, demonstrating superior performance over existing models in this domain.
Findings
Encoder-decoder SLMs outperform flat classifiers in tax code prediction.
The approach effectively captures hierarchical dependencies in structured tax codes.
The model is shown to scale to other government-mandated tax codes.
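To make the contrast with a flat classifier concrete, the sketch below treats a tax code as a sequence of level tokens and decodes it one level at a time, only allowing continuations that exist in a known code list. This is an illustration, not the authors' implementation: the fixed two-digit level width, the trie-constrained greedy decoding, the `score` callback (a stand-in for a decoder's per-step score), and the sample digit strings are all assumptions made for the example.

```python
from typing import Callable, Dict, List


def code_to_levels(code: str, width: int = 2) -> List[str]:
    """Split a digit string into fixed-width level tokens (assumed 2 digits per level)."""
    if not code.isdigit() or len(code) % width != 0:
        raise ValueError(f"expected a digit string with length divisible by {width}: {code!r}")
    return [code[i:i + width] for i in range(0, len(code), width)]


def build_trie(codes: List[str]) -> Dict[str, dict]:
    """Build a nested-dict trie whose root-to-leaf paths are the valid codes."""
    trie: Dict[str, dict] = {}
    for code in codes:
        node = trie
        for token in code_to_levels(code):
            node = node.setdefault(token, {})
    return trie


def constrained_greedy_decode(
    trie: Dict[str, dict],
    score: Callable[[List[str], str], float],
) -> str:
    """Greedily pick the best-scoring child at each level, staying inside the trie.

    `score(prefix, token)` is a hypothetical stand-in for a decoder's score of
    `token` given the levels generated so far.
    """
    prefix: List[str] = []
    node = trie
    while node:  # an empty dict marks the end of a complete code
        best = max(sorted(node), key=lambda token: score(prefix, token))
        prefix.append(best)
        node = node[best]
    return "".join(prefix)


# Illustrative digit strings only; not claimed to be real tax code entries.
valid_codes = ["84713010", "84714110", "85171300"]
trie = build_trie(valid_codes)


def toy_score(prefix: List[str], token: str) -> float:
    """Toy scorer that prefers '84' at the first level and '41' at the third."""
    return {"84": 1.0, "41": 1.0}.get(token, 0.0)


predicted = constrained_greedy_decode(trie, toy_score)
print(code_to_levels(predicted), predicted)
```

Because each level is chosen conditioned on the levels before it, the decoder can never emit a lower-level token under the wrong parent, which a flat classifier over whole codes does not enforce by construction.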
Abstract
Every day, multinational firms process thousands of transactions, each of which must adhere to tax regulations that vary by jurisdiction and are often nuanced. Determining product and service tax codes, such as HSN or SAC, is a major use case in tax compliance. Accurate determination of these codes is imperative to avoid tax penalties. This paper proposes a domain-adaptive small language model (SLM) with an encoder-decoder architecture for improved prediction of product and service tax codes. We address the problem of predicting hierarchical tax code sequences from unstructured product and service data. We employ an encoder-decoder SLM because it generates tax codes sequentially, which captures the hierarchical dependencies within the codes. Our experiments demonstrate that encoder-decoder SLMs can be…
Taxonomy
Topics: Text and Document Classification Technologies · Topic Modeling · Imbalanced Data Classification Techniques
