Mining Spatio-temporal Data on Industrialization from Historical Registries

David Berenbaum; Dwyer Deighan; Thomas Marlow; Ashley Lee; Scott Frickel; Mark Howison

arXiv:1612.00992·cs.CV·April 21, 2021

TL;DR

This paper presents a novel data-mining pipeline that combines layout analysis and OCR to extract structured, geocoded spatio-temporal data from printed registries, enabling detailed analysis of historical industrialization.

Contribution

The authors develop an integrated method for digitizing and structuring printed socioenvironmental data, facilitating new insights into historical industrial land use patterns.

Findings

01. Manufacturing dispersed from Providence's urban core along I-95.

02. High-resolution spatio-temporal data enable detailed socioenvironmental analysis.

03. The method successfully extracts structured data from scanned printed directories.

Abstract

Despite the growing availability of big data in many fields, historical data on socioenvironmental phenomena are often not available due to a lack of automated and scalable approaches for collecting, digitizing, and assembling them. We have developed a data-mining method for extracting tabulated, geocoded data from printed directories. While scanning and optical character recognition (OCR) can digitize printed text, these methods alone do not capture the structure of the underlying data. Our pipeline integrates both page layout analysis and OCR to extract tabular, geocoded data from structured text. We demonstrate the utility of this method by applying it to scanned manufacturing registries from Rhode Island that record 41 years of industrial land use. The resulting spatio-temporal data can be used for socioenvironmental analyses of industrialization at a resolution that was not…
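To make the idea of combining layout analysis with OCR concrete, here is a minimal sketch of how OCR word boxes might be regrouped into table rows and cells using only their page positions. It is not the authors' implementation and does not use the georeg code: the `Word` type, the function names, and the pixel thresholds are all assumptions chosen for illustration, and the sample entries in the usage note are made up. Geocoding the extracted addresses would be a separate later step and is not shown.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Word:
    """One OCR token with a simplified bounding box (hypothetical structure)."""
    text: str
    x0: float  # left edge, in pixels
    x1: float  # right edge, in pixels
    y: float   # top edge, in pixels


def group_into_lines(words: List[Word], y_tol: float = 5.0) -> List[List[Word]]:
    """Group words into text lines: a word joins the current line when its
    top edge is within y_tol pixels of the first word of that line."""
    lines: List[List[Word]] = []
    for w in sorted(words, key=lambda w: (w.y, w.x0)):
        if lines and abs(w.y - lines[-1][0].y) <= y_tol:
            lines[-1].append(w)
        else:
            lines.append([w])
    # Restore left-to-right reading order inside each line.
    return [sorted(line, key=lambda w: w.x0) for line in lines]


def split_cells(line: List[Word], min_gap: float = 30.0) -> List[str]:
    """Split one line into table cells wherever the horizontal whitespace
    between consecutive words exceeds min_gap pixels."""
    cells = [[line[0]]]
    for prev, cur in zip(line, line[1:]):
        if cur.x0 - prev.x1 > min_gap:
            cells.append([cur])
        else:
            cells[-1].append(cur)
    return [" ".join(w.text for w in cell) for cell in cells]


def extract_rows(words: List[Word]) -> List[List[str]]:
    """Turn a flat list of OCR words into rows of cell strings."""
    return [split_cells(line) for line in group_into_lines(words)]
```

As a usage example, passing words for two made-up directory entries (a name column near the left margin and an address column starting around x = 200) to `extract_rows` yields one list per entry, each holding a name cell and an address cell. Fixed thresholds like these are fragile on skewed or noisy scans, so a real pipeline would likely need to estimate line and column boundaries from the page itself.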

Code & Models

Repositories

https://bitbucket.org/brown-data-science/georeg (official)
