Comparison between the Structures of Word Co-occurrence and Word Similarity Networks for Ill-formed and Well-formed Texts in Taiwan Mandarin
Po-Hsuan Huang, Hsuan-Lei Shao

TL;DR
This paper compares the structural properties of word co-occurrence and word similarity networks in Taiwan Mandarin, examining differences between ill-formed internet texts and well-formed judicial texts to assess universality across languages and network types.
Contribution
It investigates and compares the structural characteristics of word co-occurrence and similarity networks in Taiwan Mandarin, focusing on differences between ill-formed and well-formed texts.
Findings
Word co-occurrence networks are small-world and disassortative in both text types.
Networks built from ill-formed texts are scale-free and follow a power-law distribution.
Structural properties are consistent across languages and network types.
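The three properties named above (small-world, disassortative, power-law degree distribution) are all read off simple graph statistics. The following is a minimal sketch of how such statistics could be computed for a word co-occurrence network, using only the Python standard library. It is not the authors' pipeline: the adjacency window (`window=2`), the function names, and the toy token lists are assumptions made for illustration, and real Taiwan Mandarin input would first need word segmentation.

```python
from collections import defaultdict, deque
from itertools import combinations


def build_cooccurrence(sentences, window=2):
    """Link two distinct words if they occur within `window` tokens of each other.

    `window=2` (adjacent words only) is an assumed setting, not the paper's.
    """
    adj = defaultdict(set)
    for tokens in sentences:
        for i, w in enumerate(tokens):
            for v in tokens[i + 1 : i + window]:
                if v != w:
                    adj[w].add(v)
                    adj[v].add(w)
    return dict(adj)


def average_clustering(adj):
    """Mean local clustering coefficient; nodes with degree < 2 count as 0."""
    total = 0.0
    for nbrs in adj.values():
        k = len(nbrs)
        if k < 2:
            continue
        links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        total += 2 * links / (k * (k - 1))
    return total / len(adj)


def average_path_length(adj):
    """Mean shortest-path length over all reachable ordered pairs (BFS)."""
    total, pairs = 0, 0
    for src in adj:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs


def degree_assortativity(adj):
    """Pearson correlation of degrees at the two ends of each edge.

    Negative values mean hubs tend to attach to low-degree words
    (disassortative). Undefined (division by zero) if all degrees are equal.
    """
    xs, ys = [], []
    for u, nbrs in adj.items():
        for v in nbrs:  # each undirected edge is visited in both directions
            xs.append(len(adj[u]))
            ys.append(len(adj[v]))
    mean = sum(xs) / len(xs)  # xs and ys share the same mean by symmetry
    cov = sum((x - mean) * (y - mean) for x, y in zip(xs, ys))
    var = sum((x - mean) ** 2 for x in xs)
    return cov / var


# Hypothetical toy corpus: one hub word linked to three others (a star graph).
toy = [["a", "the"], ["b", "the"], ["c", "the"]]
g = build_cooccurrence(toy)
print(average_clustering(g), average_path_length(g), degree_assortativity(g))
# 0.0 1.5 -1.0
```

The star graph is only a sanity check for the functions: it is maximally disassortative (-1.0) but has no clustering, so it is not small-world. A small-world network is usually identified by a short average path length combined with clustering well above that of a random graph of the same size, and a scale-free network by a degree histogram that is roughly linear on log-log axes.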
Abstract
The study of word co-occurrence networks has attracted the attention of researchers because of their potential significance and applications. Understanding the structure of word co-occurrence networks is therefore important for fully realizing their significance and uses. In past studies, word co-occurrence networks built on well-formed texts have been found to possess certain characteristics, including being small-world, following a two-regime power law distribution, and being generally disassortative. Conversely, past studies have found that word co-occurrence networks built from ill-formed texts such as microblog posts may behave differently from those built from well-formed documents. While both kinds of word co-occurrence networks are small-world and disassortative, word co-occurrence networks built from ill-formed texts are scale-free and follow a power law distribution…
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Speech and dialogue systems
Methods: Softmax · Attention Is All You Need · FLIP
