How to perform research in Hadoop environment not losing mental equilibrium - case study
Piotr Jan Dendek, Artur Czeczko, Mateusz Fedoryszak, Adam Kawa, Piotr Wendykier, Lukasz Bolikowski

TL;DR
This paper provides guidelines and best practices for conducting efficient and cost-effective research in the Apache Hadoop environment, illustrated through the Content Analysis System case study.
Contribution
It introduces a set of practical guidelines and tools for research in Hadoop, based on the development and use of the CoAnSys environment.
Findings
Established best practices for Hadoop-based research
Demonstrated cost reduction strategies in software engineering
Provided a case study of CoAnSys implementation
Abstract
Conducting research in an efficient, repeatable, and evaluable way that is also convenient in terms of development has always been a challenge. To satisfy those requirements in the long term while minimizing the cost of the software engineering process, one has to follow a certain set of guidelines. This article describes such guidelines, based on the research environment called Content Analysis System (CoAnSys), created in the Center for Open Science (CeON). It presents best practices and tools for working in the Apache Hadoop environment, as well as the process by which these rules were established.
Taxonomy
Topics: Scientific Computing and Data Management · Cloud Computing and Resource Management · Big Data Technologies and Applications
