Foundational Study on Authorship Attribution of Japanese Web Reviews for Actor Analysis

Hiroshi Matsubara; Shingo Matsugaya; Taichi Aoki; Masaki Hashimoto

arXiv:2604.16376 · cs.CL · April 21, 2026

TL;DR

This paper explores authorship attribution of Japanese web reviews using stylistic features. It compares methods such as BERT fine-tuning and TF-IDF with logistic regression, with implications for actor analysis in threat intelligence.

Contribution

It evaluates four stylistic-feature-based methods for authorship attribution of Japanese reviews and highlights their strengths and limitations for threat intelligence applications.

Findings

01

BERT fine-tuning achieved the highest accuracy, but its training became unstable as the number of authors grew to several hundred.

02

With several hundred authors, TF-IDF with logistic regression proved superior in accuracy, stability, and computational cost.

03

Error analysis identified boilerplate, topic dependency, and short texts as main misclassification factors.
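The TF-IDF+LR baseline in finding 02 can be sketched as follows. This is a minimal, dependency-free illustration of the general technique, not the authors' implementation. It assumes character 1- to 3-gram features (a common choice for Japanese, which is written without spaces between words), smoothed IDF with L2 normalization, and multinomial logistic regression trained by full-batch gradient descent. The class names `CharTfidf` and `SoftmaxRegression`, the hyperparameters, and the toy reviews are all hypothetical. Production code would more likely use scikit-learn's `TfidfVectorizer` and `LogisticRegression`.

```python
import math
from collections import Counter


def char_ngrams(text, n_min=1, n_max=3):
    """Return all character n-grams of length n_min..n_max (hypothetical feature choice)."""
    grams = []
    for n in range(n_min, n_max + 1):
        grams.extend(text[i:i + n] for i in range(len(text) - n + 1))
    return grams


class CharTfidf:
    """Character n-gram TF-IDF: raw counts times smoothed IDF, then L2-normalized."""

    def fit(self, docs):
        df = Counter()
        for doc in docs:
            df.update(set(char_ngrams(doc)))  # document frequency per n-gram
        n_docs = len(docs)
        self.idf = {g: math.log((1 + n_docs) / (1 + c)) + 1 for g, c in df.items()}
        return self

    def transform(self, doc):
        # Unseen n-grams are dropped; the result is a sparse dict {n-gram: weight}.
        counts = Counter(g for g in char_ngrams(doc) if g in self.idf)
        vec = {g: c * self.idf[g] for g, c in counts.items()}
        norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
        return {g: v / norm for g, v in vec.items()}


class SoftmaxRegression:
    """Multinomial logistic regression with an L2 penalty, full-batch gradient descent."""

    def __init__(self, lr=0.5, epochs=200, l2=1e-3):
        self.lr, self.epochs, self.l2 = lr, epochs, l2

    def proba(self, x):
        scores = {a: self.b[a] + sum(w.get(g, 0.0) * v for g, v in x.items())
                  for a, w in self.W.items()}
        m = max(scores.values())  # subtract the max for numerical stability
        exp = {a: math.exp(s - m) for a, s in scores.items()}
        z = sum(exp.values())
        return {a: e / z for a, e in exp.items()}

    def fit(self, X, y):
        self.W = {a: {} for a in sorted(set(y))}  # one sparse weight vector per author
        self.b = {a: 0.0 for a in self.W}
        n = len(X)
        for _ in range(self.epochs):
            grad_W = {a: Counter() for a in self.W}
            grad_b = Counter()
            for x, label in zip(X, y):
                p = self.proba(x)
                for a in self.W:
                    err = p[a] - (1.0 if a == label else 0.0)  # d(cross-entropy)/d(score)
                    grad_b[a] += err
                    for g, v in x.items():
                        grad_W[a][g] += err * v
            for a, w in self.W.items():
                for g in set(w) | set(grad_W[a]):
                    w_g = w.get(g, 0.0)
                    w[g] = w_g - self.lr * (grad_W[a][g] / n + self.l2 * w_g)
                self.b[a] -= self.lr * grad_b[a] / n
        return self

    def predict(self, x):
        p = self.proba(x)
        return max(p, key=p.get)


# Toy reviews from two hypothetical authors with different sentence endings.
train = [
    ("とても良かったです！また買いたいです！", "A"),
    ("配送が早くて嬉しいです！おすすめです！", "A"),
    ("味が良かったです！リピートします！", "A"),
    ("普通だな。値段相応って感じ。", "B"),
    ("まあまあだな。梱包は雑って感じ。", "B"),
    ("期待外れだな。二度と買わない感じ。", "B"),
]
vec = CharTfidf().fit([text for text, _ in train])
clf = SoftmaxRegression().fit([vec.transform(t) for t, _ in train], [a for _, a in train])
for review in ("サイズがちょうど良かったです！", "色は微妙だな。安っぽい感じ。"):
    print(review, "->", clf.predict(vec.transform(review)))
```

Because every n-gram weight in a linear model like this can be inspected directly, it can also make it easier to see when boilerplate phrases or topic words, rather than style, are driving a prediction.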

Abstract

This study investigates the applicability of authorship attribution based on stylistic features to support actor analysis in threat intelligence. As a foundational step toward future application to dark web forums, we conducted experiments using Japanese review data from clear web sources. We constructed datasets from Rakuten Ichiba reviews and compared four methods: TF-IDF with logistic regression (TF-IDF+LR), BERT embeddings with logistic regression (BERT-Emb+LR), BERT fine-tuning (BERT-FT), and metric learning with $k$-nearest neighbors (Metric+kNN). Results showed that BERT-FT achieved the best performance; however, training became unstable as the number of authors scaled to several hundred, where TF-IDF+LR proved superior in terms of accuracy, stability, and computational cost. Furthermore, Top-$k$ evaluation demonstrated the utility of candidate screening, and error analysis…
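Top-$k$ evaluation treats an attribution model as a candidate screener. A prediction counts as a hit if the true author appears anywhere among the $k$ highest-scoring candidates. The sketch below computes this metric from per-document score dictionaries. It works with any of the four methods, provided they output a score per author. The function names, scores, author labels, and $k$ values are hypothetical placeholders, not results from the paper, and breaking ties alphabetically is an arbitrary assumption.

```python
def top_k_candidates(scores, k):
    """Return the k highest-scoring authors, best first; ties broken by author name."""
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [author for author, _ in ranked[:k]]


def top_k_accuracy(all_scores, true_authors, k):
    """Fraction of documents whose true author is among the top-k candidates."""
    if len(all_scores) != len(true_authors) or not true_authors:
        raise ValueError("all_scores and true_authors must be non-empty and the same length")
    hits = sum(true in top_k_candidates(scores, k)
               for scores, true in zip(all_scores, true_authors))
    return hits / len(true_authors)


# Hypothetical per-author scores for four test reviews (not data from the paper).
all_scores = [
    {"A": 0.7, "B": 0.2, "C": 0.1},  # true author A: ranked 1st
    {"A": 0.5, "B": 0.4, "C": 0.1},  # true author B: ranked 2nd
    {"A": 0.3, "B": 0.3, "C": 0.4},  # true author A: ranked 2nd (tie with B, broken by name)
    {"A": 0.6, "B": 0.3, "C": 0.1},  # true author C: ranked 3rd
]
true_authors = ["A", "B", "A", "C"]
print([top_k_accuracy(all_scores, true_authors, k) for k in (1, 2, 3)])  # [0.25, 0.75, 1.0]
```

Raising $k$ trades precision for recall. An analyst who reviews three candidates instead of one finds more true authors but must check more candidates by hand.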
