# Towards the Improvement of Automated Scientific Document Categorization by Deep Learning

**Authors:** Thomas Krause

arXiv: 1706.05719 · 2017-06-20

## TL;DR

This thesis applies deep learning to automated scientific document categorization, reporting very good multi-class results and showing that a CNN-based classifier can be integrated into a REST API.

## Contribution

It introduces a CNN-based classifier for scientific document categorization and develops a reusable REST API for integration into existing software systems.
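To make the idea of a CNN-based text classifier concrete, here is a minimal sketch of one forward pass: filters slide over a sequence of token embeddings, each filter keeps its strongest activation (max-over-time pooling), and a softmax layer turns the pooled features into class probabilities. This is a generic illustration and not the architecture from the thesis; the dimensions, the random weights, and the function names (`conv1d_max_pool`, `classify`) are all assumptions made for the example.

```python
import math
import random


def conv1d_max_pool(embeddings, filt):
    """Slide one filter over the token sequence and keep its largest activation."""
    width = len(filt)
    best = 0.0  # acts as a ReLU floor: negative activations are clipped to zero
    for start in range(len(embeddings) - width + 1):
        window = embeddings[start:start + width]
        activation = sum(
            w * x
            for filt_row, emb_row in zip(filt, window)
            for w, x in zip(filt_row, emb_row)
        )
        best = max(best, activation)
    return best


def softmax(scores):
    """Convert raw scores into probabilities (shifted by the max for stability)."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify(embeddings, filters, out_weights):
    """One pooled feature per filter, then a linear layer and softmax."""
    features = [conv1d_max_pool(embeddings, f) for f in filters]
    scores = [sum(w * x for w, x in zip(row, features)) for row in out_weights]
    return softmax(scores)


# Hypothetical sizes and untrained random weights, for illustration only.
rng = random.Random(0)
EMBED_DIM, NUM_FILTERS, FILTER_WIDTH, NUM_CLASSES = 4, 3, 2, 5
filters = [
    [[rng.uniform(-1, 1) for _ in range(EMBED_DIM)] for _ in range(FILTER_WIDTH)]
    for _ in range(NUM_FILTERS)
]
out_weights = [
    [rng.uniform(-1, 1) for _ in range(NUM_FILTERS)] for _ in range(NUM_CLASSES)
]
document = [[rng.uniform(-1, 1) for _ in range(EMBED_DIM)] for _ in range(6)]

probs = classify(document, filters, out_weights)
```

In a real system the weights would be learned from labeled documents and the embeddings would come from a trained vocabulary; the sketch only shows how the pieces connect.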

## Key findings

- The deep learning classifier gives very good results in multi-class document categorization.
- Integrating a CNN-based classifier into a larger ecosystem is feasible using REST-based services.
- A reusable API allows classification tasks to be automated in existing software.

## Abstract

This master's thesis describes an algorithm for automated categorization of scientific documents using deep learning techniques and compares its results to those of existing classification algorithms. As an additional goal, a reusable API is developed that allows classification tasks to be automated in existing software. The proposed design uses a convolutional neural network as the classifier and integrates it into a REST-based API. This design serves as the basis for a proof-of-concept implementation, which is also presented in the thesis. The thesis shows that the deep learning classifier provides very good results in multi-class document categorization and that it is feasible to integrate such classifiers into a larger ecosystem using REST-based services.
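The REST integration described in the abstract could look roughly like the following sketch, which exposes a classifier behind a single JSON endpoint using only the Python standard library. Everything specific here is an assumption for illustration: the `/classify` route, the request and response fields, the category names, and the keyword-based `predict` function, which merely stands in for a trained CNN. None of these are taken from the thesis.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical label set; the categories used in the thesis are not given here.
CATEGORIES = ["physics", "computer science", "biology"]

# Hypothetical keyword lists that stand in for a trained model.
KEYWORDS = {
    "physics": ["quantum", "particle"],
    "computer science": ["algorithm", "network"],
    "biology": ["cell", "protein"],
}


def predict(text):
    """Placeholder classifier: pick the category with the most keyword hits."""
    lowered = text.lower()
    scores = {c: sum(lowered.count(k) for k in KEYWORDS[c]) for c in CATEGORIES}
    return max(CATEGORIES, key=lambda c: scores[c])


def handle_classify(raw_body):
    """Parse a JSON request body and return (HTTP status, response dict)."""
    try:
        text = json.loads(raw_body)["text"]
    except (ValueError, KeyError, TypeError):
        return 400, {"error": "expected a JSON object with a 'text' field"}
    if not isinstance(text, str):
        return 400, {"error": "'text' must be a string"}
    return 200, {"category": predict(text)}


class ClassifyHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/classify":
            status, body = 404, {"error": "unknown route"}
        else:
            length = int(self.headers.get("Content-Length", 0))
            status, body = handle_classify(self.rfile.read(length))
        data = json.dumps(body).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)


def serve(port=8000):
    """Start the server; blocks until interrupted. Not called automatically."""
    HTTPServer(("127.0.0.1", port), ClassifyHandler).serve_forever()
```

Keeping the request logic in `handle_classify`, separate from the HTTP handler, lets the classification path be exercised without opening a socket, and a trained model could replace `predict` without touching the endpoint.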

## Figures

23 figures with captions in the complete paper: https://tomesphere.com/paper/1706.05719/full.md

---
Source: https://tomesphere.com/paper/1706.05719