Taxon: Hierarchical Tax Code Prediction with Semantically Aligned LLM Expert Guidance
Jihang Li, Qing Liu, Zulong Chen, Jing Wang, Wei Wang, Chuanfei Xu, Zeyi Wen

TL;DR
Taxon is a framework that combines semantic alignment with LLM expert guidance to improve hierarchical tax code prediction for large-scale e-commerce; it has been deployed in production at Alibaba.
Contribution
Introduces a semantically aligned, expert-guided hierarchical tax code prediction model integrating multi-modal features and large language model distillation.
Findings
Achieves state-of-the-art performance on proprietary and public benchmarks.
Significantly improves structural consistency in tax code predictions.
Successfully deployed in Alibaba's production system, handling over 500,000 queries daily.
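To make "structural consistency" concrete, here is a minimal sketch under one common reading of the term: the codes predicted at successive taxonomy levels should form a valid parent-to-child chain. The toy taxonomy, the code strings, and the function names are all hypothetical, and the paper's actual metric may be defined differently.

```python
from typing import Dict, List, Optional

# Hypothetical toy taxonomy: child code -> parent code (None marks a top-level node).
PARENT: Dict[str, Optional[str]] = {
    "1": None,
    "2": None,
    "101": "1",
    "102": "1",
    "201": "2",
    "10101": "101",
    "10201": "102",
    "20101": "201",
}


def is_consistent(path: List[str], parent: Dict[str, Optional[str]]) -> bool:
    """Return True if the per-level predictions form a valid chain from a top-level node."""
    # The first prediction must exist in the taxonomy and be a top-level node.
    if not path or path[0] not in parent or parent[path[0]] is not None:
        return False
    # Every later prediction must be a direct child of the prediction one level up.
    for upper, lower in zip(path, path[1:]):
        if parent.get(lower) != upper:
            return False
    return True


def consistency_rate(paths: List[List[str]], parent: Dict[str, Optional[str]]) -> float:
    """Fraction of predicted paths that are structurally consistent."""
    if not paths:
        return 0.0
    return sum(is_consistent(p, parent) for p in paths) / len(paths)
```

Under this reading, a prediction such as `["1", "201", "20101"]` counts as inconsistent even though each code exists on its own, because `201` is not a child of `1`.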
Abstract
Tax code prediction is a crucial yet underexplored task in automating invoicing and compliance management for large-scale e-commerce platforms. Each product must be accurately mapped to a node within a multi-level taxonomic hierarchy defined by national standards, where errors lead to financial inconsistencies and regulatory risks. This paper presents Taxon, a semantically aligned and expert-guided framework for hierarchical tax code prediction. Taxon integrates (i) a feature-gating mixture-of-experts architecture that adaptively routes multi-modal features across taxonomy levels, and (ii) a semantic consistency model distilled from large language models acting as domain experts to verify alignment between product titles and official tax definitions. To address noisy supervision in real business records, we design a multi-source training pipeline that combines curated tax databases,…
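The feature-gating idea in (i) can be illustrated with a generic softmax-gated mixture, in which each taxonomy level weights the same set of expert outputs differently. This is only a sketch of the general technique, not the architecture from the paper: the expert outputs, the fixed gate logits, and all names below are assumptions made for illustration, and in a trained model the gate logits would be computed from the input rather than hard-coded.

```python
import math
from typing import Dict, List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of gate logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def gate_and_mix(
    expert_outputs: Sequence[Sequence[float]],
    gate_logits: Sequence[float],
) -> List[float]:
    """Combine expert output vectors using softmax gate weights."""
    if len(expert_outputs) != len(gate_logits):
        raise ValueError("need exactly one gate logit per expert")
    weights = softmax(gate_logits)
    dim = len(expert_outputs[0])
    return [
        sum(w * out[i] for w, out in zip(weights, expert_outputs))
        for i in range(dim)
    ]


# Hypothetical expert outputs, one vector per feature source
# (for example: title text, category, attributes).
EXPERT_OUTPUTS = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]

# Hypothetical gate logits: each taxonomy level favors a different expert.
LEVEL_GATE_LOGITS: Dict[str, List[float]] = {
    "level_1": [2.0, 0.0, 0.0],
    "level_2": [0.0, 2.0, 0.0],
}

# One mixed representation per taxonomy level.
mixed = {
    level: gate_and_mix(EXPERT_OUTPUTS, logits)
    for level, logits in LEVEL_GATE_LOGITS.items()
}
```

Because the gate is applied per level, the same experts can contribute mostly coarse signals at the top of the hierarchy and mostly fine-grained signals near the leaves.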
