Detecting Syntactic Features of Translated Chinese

Hai Hu; Wen Li; Sandra Kübler

arXiv:1804.08756 · cs.CL · April 25, 2018 · 1 citation

Open Access

TL;DR

This study shows that syntactic features alone, without lexical information, can distinguish translated Chinese from originally written Chinese with an F-measure above 90%, and it identifies syntactic patterns characteristic of translated Chinese.

Contribution

It introduces an SVM-based method that identifies translated Chinese from syntactic features (constituent parse trees and dependency triples), achieving high accuracy and interpretable results without lexical cues.

Findings

01

Syntactic features achieve over 90% F-measure in classification.

02

Translated Chinese shows increased use of determiners and subject-position pronouns.

03

Syntactic features align with previous translation studies on Chinese.

Abstract

We present a machine learning approach to distinguish texts translated into Chinese (by humans) from texts originally written in Chinese, with a focus on a wide range of syntactic features. Using Support Vector Machines (SVMs) as the classifier on a genre-balanced corpus in translation studies of Chinese, we find that constituent parse trees and dependency triples as features without lexical information perform very well on the task, with an F-measure above 90%, close to the results of lexical n-gram features, without the risk of learning topic information rather than translation features. Thus, we claim syntactic features alone can accurately distinguish translated from original Chinese. Translated Chinese exhibits an increased use of determiners, subject-position pronouns, NP + 'de' as NP modifiers, multiple NPs or VPs conjoined by a Chinese-specific punctuation, among other structures. We…
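
As a rough illustration of the kind of feature the abstract describes, the sketch below turns a dependency parse into delexicalized triples (head POS, relation, dependent POS) and computes a balanced F-measure. It is a minimal sketch under stated assumptions, not the authors' pipeline: the toy sentence, the tag and relation names, and the helper names `delexicalized_triples`, `feature_vector`, and `f_measure` are all hypothetical, and the page does not say which F-measure variant or averaging the paper reports.

```python
from collections import Counter

# Hypothetical toy parse: (word, POS, head_index, relation) per token.
# head_index is 1-based; 0 marks the root. Tags and relation names are
# illustrative only, not the paper's actual inventory.
SENTENCE = [
    ("这", "DT", 2, "det"),
    ("书", "NN", 3, "nsubj"),
    ("好", "VA", 0, "root"),
]

def delexicalized_triples(sentence):
    """Return (head POS, relation, dependent POS) triples, dropping all words."""
    triples = []
    for _word, pos, head, rel in sentence:
        head_pos = "ROOT" if head == 0 else sentence[head - 1][1]
        triples.append((head_pos, rel, pos))
    return triples

def feature_vector(sentences):
    """Relative frequency of each delexicalized triple in a chunk of text."""
    counts = Counter(t for s in sentences for t in delexicalized_triples(s))
    total = sum(counts.values())
    return {t: n / total for t, n in counts.items()} if total else {}

def f_measure(gold, predicted, positive="translated"):
    """Balanced F-measure (F1) for one class; assumed variant, see lead-in."""
    pairs = list(zip(gold, predicted))
    tp = sum(g == positive and p == positive for g, p in pairs)
    fp = sum(g != positive and p == positive for g, p in pairs)
    fn = sum(g == positive and p != positive for g, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

if __name__ == "__main__":
    print(feature_vector([SENTENCE]))
    print(f_measure(["translated", "original"], ["translated", "original"]))
```

Because the words themselves are discarded, a classifier trained on such vectors cannot pick up topic vocabulary, which is the risk the abstract attributes to lexical n-grams. Feature dictionaries of this shape could be passed to any linear classifier; the SVM setup used in the paper is not reproduced here.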

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Translation Studies and Practices