# A Study on Trends In Information Technologies using Big Data Analytics

**Authors:** Mahmut Ali Ozkuran

arXiv: 1703.09664 · 2017-03-29

## TL;DR

This paper analyzes big data from StackExchange and GitHub to identify and predict trends in information technologies, providing insights for decision makers and IT professionals.

## Contribution

It introduces a comprehensive methodology combining data mining, preprocessing, clustering, and time series forecasting to analyze IT trends from large-scale data sources.
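
The time series step of this methodology can be illustrated with a minimal sketch. It assumes hypothetical records of `(creation date, tags)` pairs rather than the actual StackExchange or GitHub schema, and the helper names (`monthly_counts`, `to_series`, `chronological_split`) are illustrative, not taken from the paper. It counts monthly mentions of one keyword and splits the resulting series chronologically into training and testing parts.

```python
from collections import Counter
from datetime import date

# Hypothetical sample records: (creation date, tags). Not real data.
posts = [
    (date(2016, 1, 5), ["python", "pandas"]),
    (date(2016, 1, 20), ["java"]),
    (date(2016, 2, 3), ["python"]),
    (date(2016, 2, 14), ["python", "sql"]),
    (date(2016, 3, 9), ["java", "sql"]),
    (date(2016, 4, 1), ["python"]),
]


def monthly_counts(records, keyword):
    """Count records tagged with `keyword`, keyed by (year, month)."""
    counts = Counter()
    for created, tags in records:
        if keyword in tags:
            counts[(created.year, created.month)] += 1
    return counts


def to_series(counts, start, end):
    """Return one count per month from `start` to `end` inclusive, zero-filled."""
    series = []
    year, month = start
    while (year, month) <= end:
        series.append(counts.get((year, month), 0))
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return series


def chronological_split(series, train_fraction=0.75):
    """Split a series in time order; the 0.75 default is an arbitrary assumption."""
    cut = int(len(series) * train_fraction)
    return series[:cut], series[cut:]


python_series = to_series(monthly_counts(posts, "python"), (2016, 1), (2016, 4))
train, test = chronological_split(python_series)
```

The split is chronological rather than random because a forecast should only be evaluated on observations that come after the data it was fitted on.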

## Key findings

- Analyzed trends across four knowledge areas: 20 programming languages, 8 database applications, 4 cloud services, and 3 mobile operating systems.
- Built usage forecasts with the R forecast library and tested their accuracy using Mean Magnitude of Relative Error and Median Magnitude of Relative Error.
- Produced cluster graphs in Gephi for exploratory analysis, intended to help decision makers and IT professionals decide where to invest their time and money.

## Abstract

We are living in an information era: from Twitter to Fitocracy, every episode of people's lives is converted to numbers. That abundance of data is also available in information technologies. From Stack Overflow to GitHub, many big data sources are available about trends in information technologies. The aim of this research is to study information technology trends and compile useful information about those technologies using the big data sources mentioned above. The collected information might help decision makers or information technology professionals decide where to invest their time and money. In this research we mined and analyzed StackExchange and GitHub data to create meaningful predictions about information technologies. Initially, StackExchange and GitHub data were imported into local data repositories. After the import, cleaning and preprocessing techniques such as tokenization, stemming, and dimensionality reduction were applied to the data. After preprocessing and cleaning, keywords and their relations were extracted from the data. Using the keyword data, four main knowledge areas and their variations, i.e., 20 programming languages, 8 database applications, 4 cloud services, and 3 mobile operating systems, were selected for trend analysis. After the keywords were selected, the extracted patterns were used for cluster analysis in Gephi. The resulting graphs were used for exploratory analysis of the programming languages data. After the exploratory analysis, time series of usage were created for the selected keywords. Those time series were used as training and testing data for forecasts created with the R forecast library. The accuracy of the forecasts was then tested using the Mean Magnitude of Relative Error and the Median Magnitude of Relative Error.
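
The two accuracy measures named at the end of the abstract can be sketched as follows. This assumes the commonly used definition of the magnitude of relative error for one observation, |actual − forecast| / |actual|, with the mean and median then taken over the test period. The function names and the sample numbers are hypothetical, and the abstract does not show how the paper itself computed these values.

```python
from statistics import mean, median


def relative_errors(actual, forecast):
    """Magnitude of relative error per observation: |a - f| / |a|."""
    if len(actual) != len(forecast):
        raise ValueError("actual and forecast must have the same length")
    if any(a == 0 for a in actual):
        raise ValueError("relative error is undefined when an actual value is 0")
    return [abs(a - f) / abs(a) for a, f in zip(actual, forecast)]


def mmre(actual, forecast):
    """Mean Magnitude of Relative Error."""
    return mean(relative_errors(actual, forecast))


def mdmre(actual, forecast):
    """Median Magnitude of Relative Error."""
    return median(relative_errors(actual, forecast))


# Hypothetical held-out usage counts and forecasts (not results from the paper).
actual = [100.0, 200.0, 400.0]
forecast = [110.0, 180.0, 400.0]

print(f"MMRE:  {mmre(actual, forecast):.4f}")
print(f"MdMRE: {mdmre(actual, forecast):.4f}")
```

Reporting the median alongside the mean is useful because a single badly missed month can inflate the mean while leaving the median largely unchanged.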

---
Source: https://tomesphere.com/paper/1703.09664