Comparing Computational Architectures for Automated Journalism

Yan V. Sym; João Gabriel M. Campos; Marcos M. José; Fabio G. Cozman

arXiv:2210.04107·cs.CL·October 11, 2022

Open Access

TL;DR

This paper compares traditional template and pipeline architectures with neural end-to-end models for data-to-text generation in Brazilian Portuguese, finding that explicit intermediate steps improve text quality and reduce hallucination.

Contribution

It provides a comparative analysis of different generation architectures for Brazilian Portuguese, highlighting the advantages of explicit intermediate representations over neural end-to-end models.

Findings

01

Explicit intermediate steps produce higher-quality texts.

02

End-to-end neural models tend to hallucinate data.

03

Traditional architectures generalize better to unseen inputs.
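To make finding 02 concrete, "hallucinating data" means the generated text states values that are not supported by the input record. The sketch below is a deliberately naive illustration of that idea, not the evaluation protocol used in the paper: it flags any number that appears in the output text but in no field of the input. The record, the field names and the helper names (`numbers_in`, `unsupported_numbers`) are hypothetical, and the example text is in English for readability even though the paper targets Brazilian Portuguese.

```python
import re


def numbers_in(text: str) -> set[str]:
    """Return every integer or decimal literal in the text, e.g. '12' or '3.5'."""
    # Naive: accepts '.' or ',' as the decimal separator, ignores units and words.
    return set(re.findall(r"\d+(?:[.,]\d+)?", text))


def unsupported_numbers(record: dict, text: str) -> set[str]:
    """Numbers mentioned in the generated text that occur in no field of the record."""
    source: set[str] = set()
    for value in record.values():
        source |= numbers_in(str(value))
    return numbers_in(text) - source


# Hypothetical input record and generated sentence.
record = {"city": "Campinas", "temperature_max": 31, "rain_mm": 12}
print(unsupported_numbers(record, "Campinas will reach 31 degrees with 40 mm of rain."))
```

Here the output mentions 40 mm of rain although the record says 12, so the check flags `"40"`. A literal string match like this misses paraphrased or rounded values and spelled-out numbers, which is why it is only a rough proxy for the kind of unfaithfulness the finding describes.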

Abstract

The majority of NLG systems have been designed following either a template-based or a pipeline-based architecture. Recent neural models for data-to-text generation have instead been proposed in an end-to-end deep learning fashion, verbalizing non-linguistic input as natural language without explicit intermediate representations. This study compares the most commonly employed methods for generating Brazilian Portuguese texts from structured data. Results suggest that explicit intermediate steps in the generation process produce better texts than those generated by neural end-to-end architectures, avoiding data hallucination while generalizing better to unseen inputs. Code and corpus are publicly available.
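The difference between a template and a pipeline with explicit intermediate steps can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the systems evaluated in the paper: the input record, the field names, the `Message` type and the three stages (content selection, sentence planning, surface realization) are hypothetical simplifications of the classic staged NLG pipeline, and the output is English rather than Portuguese for readability. An end-to-end neural model would map the record to text in a single learned step, with no inspectable `Message` list or sentence plan in between.

```python
from dataclasses import dataclass


def template_generate(record: dict) -> str:
    """Template-based: one fixed string whose slots are filled straight from the data."""
    return (f"In {record['city']}, the maximum temperature will be "
            f"{record['temperature_max']} degrees, with {record['rain_mm']} mm of rain.")


@dataclass
class Message:
    """Hypothetical intermediate representation: one fact chosen for verbalization."""
    predicate: str
    value: object


def select_content(record: dict) -> list[Message]:
    """Content selection: decide which facts to say (here, omit zero rainfall)."""
    messages = [Message("temperature_max", record["temperature_max"])]
    if record["rain_mm"] > 0:
        messages.append(Message("rain_mm", record["rain_mm"]))
    return messages


def plan_sentences(messages: list[Message]) -> list[list[Message]]:
    """Sentence planning: group messages into sentences (here, a single sentence)."""
    return [messages]


# Hypothetical lexicalization table: one clause pattern per predicate.
PHRASES = {
    "temperature_max": "the maximum temperature will be {} degrees",
    "rain_mm": "{} mm of rain are expected",
}


def realize(city: str, plan: list[list[Message]]) -> str:
    """Surface realization: turn each planned sentence into a string."""
    sentences = []
    for group in plan:
        clauses = [PHRASES[m.predicate].format(m.value) for m in group]
        sentences.append(f"In {city}, " + " and ".join(clauses) + ".")
    return " ".join(sentences)


def pipeline_generate(record: dict) -> str:
    """Run the three explicit stages in order."""
    return realize(record["city"], plan_sentences(select_content(record)))


dry_day = {"city": "Campinas", "temperature_max": 31, "rain_mm": 0}
print(template_generate(dry_day))  # In Campinas, the maximum temperature will be 31 degrees, with 0 mm of rain.
print(pipeline_generate(dry_day))  # In Campinas, the maximum temperature will be 31 degrees.
```

Because each stage produces an explicit, inspectable output, a wrong sentence can be traced to the stage that introduced it, and the text can only mention facts that survived content selection. This is one intuition for why such architectures avoid hallucinating data, although the sketch itself makes no claim about the paper's measured results.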

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Computational Physics and Python Applications