Normalizador Neural de Datas e Endereços
Gustavo Plensack, Paulo Finardi

TL;DR
This paper introduces a T5-based deep neural network model that normalizes diverse date and address formats in text, reaching over 90% accuracy in some cases and coping with unexpected and noisy inputs.
Contribution
The authors propose a novel neural network approach that generalizes date and address normalization beyond rigid pattern matching methods.
Findings
Achieved over 90% accuracy in some cases when normalizing varied date and address formats.
Handled noisy data simulating real-world errors effectively.
Provided a flexible deep learning solution surpassing traditional rule-based tools.
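The summary does not say how the noisy data was produced. As a purely illustrative sketch, character-level noise of the kind found in typed or OCR-scanned text could be simulated as below. The function name `add_char_noise`, the noise rate, and the mix of deletions, substitutions, and insertions are all assumptions for illustration, not the authors' procedure.

```python
import random


def add_char_noise(text: str, rate: float = 0.1, seed: int = 0) -> str:
    """Randomly delete, substitute, or insert characters (hypothetical noise model).

    Each character is corrupted with probability `rate`, split evenly
    between the three kinds of error. A fixed seed keeps runs repeatable.
    """
    rng = random.Random(seed)
    alphabet = "abcdefghijklmnopqrstuvwxyz0123456789"
    out = []
    for ch in text:
        r = rng.random()
        if r < rate / 3:
            continue  # deletion: drop the character
        if r < 2 * rate / 3:
            out.append(rng.choice(alphabet))  # substitution: replace it
        elif r < rate:
            out.append(ch)
            out.append(rng.choice(alphabet))  # insertion: add a stray character
        else:
            out.append(ch)  # leave the character untouched
    return "".join(out)


clean = "25 de dezembro de 2020"
noisy = add_char_noise(clean, rate=0.15, seed=42)
```

Pairing each `noisy` string with the normalized form of its `clean` source would give training or evaluation examples that test robustness to such errors.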
Abstract
Documents of any kind present a wide variety of date and address formats; dates, for example, may be written out entirely in words or use different separators. Addresses are even less regular, because streets, neighborhoods, cities, and states can appear in many different orders. In natural language processing, problems of this nature are usually handled by rigid tools such as regular expressions (regex) or DateParser, which work well as long as the expected input format has been configured in advance. When these tools are given an unexpected format, they produce errors and unwanted outputs. To circumvent this challenge, we present a solution based on T5, a state-of-the-art deep neural network, that handles date and address formats not configured in advance, with accuracy above 90% in some cases. With this model, our proposal brings generalization to the task of normalizing dates and addresses.…
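The rigidity of pattern-based tools can be seen in a minimal sketch. The pattern and the function name `normalize_date_rigid` below are hypothetical and are not taken from the paper; the rule accepts only the DD/MM/YYYY layout and returns nothing for any other way of writing the same date.

```python
import re
from typing import Optional

# Hypothetical rule: accepts only the DD/MM/YYYY layout.
_DATE_RE = re.compile(r"^(\d{2})/(\d{2})/(\d{4})$")


def normalize_date_rigid(text: str) -> Optional[str]:
    """Return an ISO 8601 date (YYYY-MM-DD) if `text` is DD/MM/YYYY, else None."""
    match = _DATE_RE.match(text.strip())
    if match is None:
        return None  # any layout that was not configured falls through
    day, month, year = match.groups()
    return f"{year}-{month}-{day}"


examples = ["25/12/2020", "25-12-2020", "25 de dezembro de 2020"]
results = {s: normalize_date_rigid(s) for s in examples}
```

Only the first example is normalized. Supporting the other two would require a new rule for each format, which is the maintenance burden that a learned text-to-text model is meant to avoid.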
Taxonomy
Topics: Neural Networks and Applications · Fuzzy Logic and Control Systems · Time Series Analysis and Forecasting
Methods: Linear Layer · Gated Linear Unit · Attention Dropout · Softmax · Inverse Square Root Schedule · Dense Connections · Dropout · Byte Pair Encoding · SentencePiece
