Data4UrbanMobility: Towards Holistic Data Analytics for Mobility Applications in Urban Regions
Nicolas Tempelmeier, Yannick Rietz, Iryna Lishchuk, Tina Kruegel, Olaf Mumm, Vanessa Miriam Carlow, Stefan Dietze, Elena Demidova

TL;DR
This paper introduces the Data4UrbanMobility platform and tools for integrated urban mobility data analytics, aiming to support holistic understanding and planning of multi-modal transportation in cities.
Contribution
It presents a novel platform and a citizen science app for integrating diverse mobility data sources and analyzing intermodal journeys in urban environments.
Findings
Development of the D4UM platform for holistic data analytics
Introduction of the MiC app for citizen-driven intermodal mobility data collection
Demonstration of use cases showing potential for improved urban mobility insights
Table 1. Overview of the data sources.

| Source | Description | Granularity / Size | Timespan | License | Format |
|---|---|---|---|---|---|
| **Traffic Data** | | | | | |
| Traffic flow | Average car traffic speed records in Lower Saxony, Germany. | Average traffic speed per road segment and time interval (15 min) | September 2017 - January 2019 | Commercial | CSV |
| Traffic feeds | Traffic warnings and incidents (e.g. in Lower Saxony, Germany: http://www.vmz-niedersachsen.de/). | 25000 notifications | June 2017 - January 2019 | Provider-specific | RSS feeds / XML |
| **Public Transport Information** | | | | | |
| Public transportation query logs | Query logs for public transportation routes and timetables. | queries per month | October 2016 - January 2019 | Commercial | CSV |
| General Transit Feed Specification (GTFS) data | Public transportation timetable information for Lower Saxony, Germany. | 8800 stops, 2600 routes, stop times | until January 2019 | Open Data | CSV |
| **City Data** | | | | | |
| Rainfall data | Volume of rain in a region. | 1 record / hour / km² | 2005 - 2019 | Open Data | CSV |
| **Social Media** | | | | | |
| Twitter | Event- and location-centric tweets from Twitter API; traffic information channels (e.g. from Hannover police and Üstra, a public transportation provider: https://twitter.com/Polizei_H, https://twitter.com/uestra). | German tweets from Twitter API | May 2017 - January 2019 | Twitter API license | JSON |
| **Web** | | | | | |
| Event-centric Web markup | Annotated Web pages, e.g. using schema.org. | Web Data Commons event subset: facts | until November 2017 | Common Crawl ToU | RDFa, Microdata |
| Focused crawls | Event-centric crawls, news (collected using the iCrawl toolbox: http://icrawl.l3s.uni-hannover.de/). | 22000 events located in Hannover, Germany | Crawl-specific | Provider-specific | HTML |
| EventKG | Multilingual event-centric temporal knowledge graph. | events and temporal relations (V1.1) | until today | Creative Commons Attribution-ShareAlike 4.0 | RDF |
| OpenStreetMap | Geometries (points, lines, polygons) annotated with properties; subset for parts of Lower Saxony. | facts | until January 2019 | ODbL | XML |
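
Table 2. Median duration and accuracy of transportation mode recognition in the closed test, per mode of transportation; the total row summarises all recorded trips.

| Mode | Number of trips | Median Duration | Accuracy |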
|---|---|---|---|
| Bicycle | 16 | 10 min. | 100% |
| Car | 14 | 9.5 min. | 92.9% |
| Tram | 13 | 7 min. | 76.9% |
| Bus | 15 | 12 min. | 73.3% |
| Total | 58 | 11 min. | 86.2% |

Table 3. Statistics of the data captured by the MiC app during the first public test phase.

| Property | Value |
|---|---|
| Number of Users | 78 |
| Number of Recorded Trips | 218 |
| Average Trip Duration | 69 min. |
| Number of Captured GPS Points | 92,550 |
L3S Research Center, Leibniz Universität Hannover
Institute for Sustainable Urbanism, TU Braunschweig
PROJEKTIONISTEN GmbH
Institute for Legal Informatics, Leibniz Universität Hannover
GESIS - Leibniz Institute for the Social Sciences
Abstract.
With the increasing availability of mobility-related data, such as GPS traces, Web queries and climate conditions, there is a growing demand to use this data to better understand and support urban mobility needs. However, the data available from individual actors, such as providers of information, navigation and transportation systems, is mostly restricted to isolated mobility modes, whereas holistic data analytics over integrated data sources is not sufficiently supported. In this paper we present our ongoing research on holistic data analytics to support urban mobility applications in the Data4UrbanMobility (D4UM) project. First, we discuss challenges in urban mobility analytics and present the available data sources together with the D4UM platform, which we are currently developing to facilitate holistic urban data analytics over integrated heterogeneous data sources. Second, we present the MiC app, a tool we developed to complement the available datasets with intermodal mobility data (i.e. data about journeys that involve more than one mode of mobility) using a citizen science approach. Finally, we present selected use cases and discuss our future work.
1. Introduction
Contemporary urban mobility behavior is undergoing a rapid transition and paradigm shift, and it is affected by a wide range of short-, medium- and long-term factors. These range from immediate climate conditions, regional construction sites or ephemeral events to long-term trends such as increasing negative environmental impacts and the widespread adoption of intermodal mobility chains (i.e. journeys involving several means of transportation and mobility such as walking, cycling and public transportation). Moreover, the new lifestyle- and population-induced demand for mobility leads to region-specific ecological and traffic-related problems in growing metropolitan areas and is a limiting factor for urban development.
Traditional urban traffic planning relies on complex models combining trip generation, trip distribution, mode choice and route choice (McNally, 2008) to simulate travel demand. The data foundations of these models essentially consist of traffic census, socio-demographic, infrastructural and service data. Large-scale, accurate data about actual mobility behavior, in particular in the context of intermodality, is costly to obtain, and the aforementioned contextual factors are largely ignored.
Throughout the last decade, the widespread use of Web and mobile applications has led to an increasing availability of data, which captures both actual mobility needs and usage as well as contextual information about, for instance, traffic incidents, city events and weather conditions. This data has the potential to complement existing data sources and information systems currently used for handling urban mobility processes. In particular, within densely populated areas, the correlation of mobility behavior with data obtainable from mobility apps, public transportation websites and social media streams can uncover more complex dependencies and aid the development of supervised models which consider a wide variety of features and enable predictions of future needs.
Despite an overall increasing availability of datasets related to mobility behaviour, this data typically focuses on one mode of transportation, most prominently covering individual traffic and, to some extent, public transportation services. Other modes of mobility that become particularly important in modern cities, e.g. cycling and walking, are typically not captured at scale. Furthermore, data sources coming from particular mobility services and transportation providers do not adequately capture intermodal mobility sequences. However, such data is of utmost importance to better understand and predict the actual demand in mobility and associated services.
The challenges related to the collection and analysis of such data are manifold. They include the provision of tools and methods to capture and analyse intermodal mobility, the design of incentives for city inhabitants to share their mobility data, the protection of the collected personal data in accordance with the legal framework, as well as data analytics methods and models that provide added value for individual participants, mobility service providers and city authorities.
In this work, we present our ongoing research within the Data4UrbanMobility project (D4UM, http://data4urbanmobility.l3s.uni-hannover.de/). This research aims at gathering, augmenting and analysing mobility data in urban regions to address the problems presented above. We introduce the D4UM platform, built to facilitate the long-term collection and analysis of mobility-related data from a variety of sources. In the project, we focus in particular on the datasets available in the urban regions of Hannover, Wolfsburg and Brunswick (Germany), while the results will be transferable to other urban regions. The overall contributions of the project include a large annotated data catalogue of regionally and globally relevant mobility-related data sources, models built upon these data sources, and analytic results for the particular regions.
Furthermore, the paper introduces the MiC app, a human-centric data tool developed within the project to capture individual movements (tracks and modes) and thereby complement existing sources in the context of intermodal urban mobility. Combined with other available data sources, the data collected through the MiC app and the underlying D4UM platform facilitate capturing and analysing data to better understand the growing demand for intermodal mobility in urban regions.
2. Problem Description
The aim of the D4UM project is to facilitate efficient analytics and data-driven estimates of mobility demand and evaluation of mobility network quality, addressing the needs of different stakeholders. The stakeholders of the project include individual citizens, city administrations, mobility service providers and urban traffic planners. The aims of these stakeholders can be categorised with respect to the different time horizons, as follows:
- Short Term: Facilitate efficient access to mobility services, taking into account temporary high load (citizens).
- Medium Term: Provide new and optimise existing mobility services (mobility service providers).
- Long Term: Facilitate data-driven integrated urban planning processes (urban traffic planners, city administrations).
To address these needs, the project develops the D4UM platform, which facilitates the collection and analysis of relevant data sources. The research questions that can be addressed using this platform include, for example:
- Which external conditions, such as climate or spatial structures, influence the typical movement patterns of city inhabitants?
- How are urban movement patterns influenced by external factors, such as weather conditions, given specific urban contexts and mobility options?
- How can an increased demand for mobility services be determined and addressed?
3. D4UM Platform
This section describes the architecture of the D4UM platform. This platform enables integrated analytics of heterogeneous data sources for different stakeholders in the context of urban mobility. Figure 1 provides an overview of the platform layers and its individual components.
The input layer of the D4UM platform consists of heterogeneous data sources that cover various aspects of urban mobility. These data sources are described in Section 4 in more detail.
The data aggregation and integration layer conducts all necessary pre-processing and transforms these data sources to comply with the D4UM data model. This data model formally specifies an integrated schema and establishes spatial, temporal and contextual connections across these sources, thus facilitating integrated analytics going across the dataset boundaries. For example, traffic speed records are aligned with the street segments obtained from the OpenStreetMap data.
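The alignment of traffic speed records with OpenStreetMap street segments can be illustrated by a minimal nearest-segment matching sketch. It assumes that each speed record carries a WGS84 position and that OSM ways have already been split into straight segments with hypothetical IDs; the actual D4UM data model is not specified here, and a production pipeline would typically use a full map-matching algorithm that also considers heading and road topology.

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_M = 6_371_000.0


@dataclass(frozen=True)
class Segment:
    """A straight piece of an OSM way; the ID scheme is a hypothetical example."""
    segment_id: str
    start: tuple[float, float]  # (lat, lon)
    end: tuple[float, float]    # (lat, lon)


def _to_xy(lat, lon, ref_lat):
    """Project WGS84 coordinates to local metres (equirectangular approximation)."""
    x = math.radians(lon) * EARTH_RADIUS_M * math.cos(math.radians(ref_lat))
    y = math.radians(lat) * EARTH_RADIUS_M
    return x, y


def point_segment_distance_m(point, seg):
    """Distance in metres from a (lat, lon) point to the closest point of a segment."""
    ref_lat = point[0]
    px, py = _to_xy(point[0], point[1], ref_lat)
    ax, ay = _to_xy(seg.start[0], seg.start[1], ref_lat)
    bx, by = _to_xy(seg.end[0], seg.end[1], ref_lat)
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    # Position of the orthogonal projection along the segment, clamped to [0, 1].
    t = 0.0
    if length_sq > 0:
        t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / length_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def align_record(point, segments, max_distance_m=30.0):
    """Return the ID of the nearest segment within max_distance_m, or None.

    The 30 m tolerance is an illustrative assumption, not a D4UM parameter.
    """
    best = min(segments, key=lambda s: point_segment_distance_m(point, s), default=None)
    if best is None or point_segment_distance_m(point, best) > max_distance_m:
        return None
    return best.segment_id
```

Records that cannot be matched within the tolerance return `None`, so they can be flagged for inspection instead of being silently attached to a distant road.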
Within the long-term data collection component, the D4UM platform continuously collects mobility-related data from data streams to provide an overview of relevant mobility data over longer time periods (i.e. months or years). In particular, recording dedicated streaming sources, such as traffic warnings published as RSS feeds or data extracted from social media channels (e.g. police channels on Twitter), enables long-term analytics of patterns and temporal fluctuations that would otherwise not be possible. Data enrichment facilitates the collection of supplemental data by employing Web mining (e.g. identifying events in social media streams) and citizen science tools (e.g. collecting data with the MiC app presented in Section 5).
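Recording a streaming source such as an RSS feed over months mainly requires polling it repeatedly and keeping every item exactly once. The sketch below assumes RSS 2.0 documents, leaves out the HTTP fetching and scheduling, and uses an in-memory list as a stand-in for the platform's actual storage; the class and field names are hypothetical.

```python
import xml.etree.ElementTree as ET


def parse_rss_items(xml_text):
    """Extract the items of an RSS 2.0 document as dictionaries."""
    root = ET.fromstring(xml_text)
    items = []
    for item in root.iter("item"):
        items.append({
            # Prefer <guid>; fall back to <link> when the feed omits it.
            "guid": item.findtext("guid") or item.findtext("link"),
            "title": item.findtext("title", default=""),
            "pub_date": item.findtext("pubDate", default=""),
            "description": item.findtext("description", default=""),
        })
    return items


class FeedArchive:
    """Append-only archive keeping every item seen across polling rounds (sketch)."""

    def __init__(self):
        self.seen_ids = set()
        self.records = []

    def ingest(self, xml_text):
        """Store items not seen before and return how many were new."""
        new = 0
        for item in parse_rss_items(xml_text):
            # Items without guid or link are identified by title and date instead.
            key = item["guid"] or (item["title"], item["pub_date"])
            if key in self.seen_ids:
                continue
            self.seen_ids.add(key)
            self.records.append(item)
            new += 1
        return new
```

Because a feed typically shows only its most recent items, polling more often than items expire and deduplicating is what turns a short-lived stream into a long-term collection.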
The data analytics layer introduces models that build on the integrated data. For example, analysis of long-term data collections can be employed to identify typical urban movement patterns. For instance, spatio-temporal dependencies between traffic conditions on different roads can reveal structural problems within the road network (Feuerhake et al., 2018; Liang et al., 2017). Another example is the analysis of the impact of external factors (e.g. heavy rain or snowfall) on mobility behaviour (Soua et al., 2016). Data-based forecasts can make use of long-term data collections to predict urban mobility patterns in the future. For example, in the presence of planned special events (e.g. football matches or concerts), an increased load on the mobility infrastructure such as roads and public transportation capacity can be estimated (Zhou et al., 2016; Rodrigues et al., 2017).
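One simple way to probe spatio-temporal dependencies between roads is to correlate the speed time series of two segments at different time lags: a strong correlation at a positive lag suggests that slowdowns on one segment tend to be followed by slowdowns on the other. The sketch below is a generic lagged Pearson correlation over equally spaced intervals (e.g. the 15-minute intervals of the traffic flow data); it is not the method of the works cited above, and the function names are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equally long sequences (NaN if one is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return float("nan")
    return cov / (sx * sy)


def lagged_correlations(upstream, downstream, max_lag):
    """Correlate upstream[t] with downstream[t + lag] for lag = 0..max_lag."""
    result = {}
    for lag in range(max_lag + 1):
        xs = upstream[: len(upstream) - lag]
        ys = downstream[lag:]
        n = min(len(xs), len(ys))
        if n >= 3:  # too few overlapping intervals make the estimate meaningless
            result[lag] = pearson(xs[:n], ys[:n])
    return result


def best_lag(upstream, downstream, max_lag):
    """Lag (in intervals) with the highest correlation, or None if undefined."""
    corr = lagged_correlations(upstream, downstream, max_lag)
    valid = {lag: c for lag, c in corr.items() if not math.isnan(c)}
    return max(valid, key=valid.get) if valid else None
```

Correlation alone does not establish causation; in practice such a screen would be combined with road topology, e.g. only comparing segments that are connected in the OSM network.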
Dedicated interfaces such as APIs and graphical user interfaces provide access to the data analytics results. APIs serve as a conceptual abstraction from the complex models and provide information in a machine-readable form, such as the GeoJSON format (https://tools.ietf.org/html/rfc7946). Graphical interfaces such as map layers present the analysis results visually and allow maps to be embedded into Web pages.
Finally, the services, apps and pilots layer makes the analysis results available to end users. In particular, we develop a dashboard Web application that city planners can use to analyse traffic patterns over the long term, while citizens can use the MiC app to gain insights into their own mobility behaviour and voluntarily contribute data to the D4UM system.
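As an illustration of such machine-readable output, the sketch below serialises per-segment analysis results as a GeoJSON FeatureCollection. The input layout and the property names (`segment_id`, `avg_speed_kmh`) are hypothetical; the parts taken from RFC 7946 are the FeatureCollection/Feature/LineString structure and the longitude-latitude order of positions.

```python
import json


def segments_to_geojson(segments):
    """Serialise analysis results as a GeoJSON FeatureCollection string.

    segments: iterable of dicts with 'id', 'coords' [(lat, lon), ...] and
    'avg_speed_kmh' (a hypothetical input layout).
    """
    features = []
    for seg in segments:
        features.append({
            "type": "Feature",
            "geometry": {
                "type": "LineString",
                # RFC 7946 positions are [longitude, latitude], so swap the order.
                "coordinates": [[lon, lat] for lat, lon in seg["coords"]],
            },
            "properties": {
                "segment_id": seg["id"],
                "avg_speed_kmh": seg["avg_speed_kmh"],
            },
        })
    return json.dumps({"type": "FeatureCollection", "features": features})
```

The coordinate swap is the most common source of errors here, since many data sources, including GPS logs, list latitude first.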
4. Integration of Web-based Mobility Data
Table 1 presents an overview of the existing data sources from which we currently extract regional and other relevant information in the context of urban mobility. Most of these data sources (all except the traffic flow information) are Web-based. Traffic data sources include traffic flow information and traffic feeds. Traffic flow data, provided as aggregated Floating Car Data (FCD), reflects the average speed of road traffic on individual road segments, whereas traffic feeds contain traffic warnings and accident notifications provided as RSS feeds. Public transport information includes public transportation query logs and GTFS data. We obtain query log data from the EFA system (https://www.efa.de), the official Web service for routing and timetable information of public transportation in the region of Hannover. In this context, a query is a request for a public transportation route with a specified origin, destination and departure time, issued via a Web interface or a mobile application. Moreover, we consider General Transit Feed Specification (GTFS) data (https://developers.google.com/transit/gtfs/), which provides timetable information for public transportation such as departure times and routes. This data is complemented with regional weather conditions, which can potentially influence the choice of transportation mode and route.
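To give an impression of how GTFS timetable data can feed into the analytics, the sketch below counts scheduled departures per stop and hour of day from the content of a GTFS `stop_times.txt` file, a simple indicator of service supply. It relies on two properties of the GTFS format: times are written as HH:MM:SS and may exceed 24:00:00 for trips running past midnight, and times may be left empty for stops that are not timepoints. As a simplifying assumption, `calendar.txt` is ignored, so departures of all service days are pooled.

```python
import csv
import io
from collections import Counter


def departures_per_stop_hour(stop_times_csv):
    """Count scheduled departures per (stop_id, hour of day) in stop_times.txt content.

    Simplification: calendar.txt / calendar_dates.txt are ignored, so the
    departures of all service patterns are pooled together.
    """
    counts = Counter()
    for row in csv.DictReader(io.StringIO(stop_times_csv)):
        departure = row["departure_time"].strip()
        if not departure:
            continue  # empty times are allowed for non-timepoint stops
        hour = int(departure.split(":")[0]) % 24  # "25:10:00" is 01:10 on the next day
        counts[(row["stop_id"], hour)] += 1
    return counts
```

Comparing such supply counts with the query logs for the same stop and hour is one possible way to spot periods where demand and scheduled service diverge.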
In addition to such directly mobility-related data, we consider information obtained from social media and the Web. Social media data is obtained from the Twitter streaming API (https://developer.twitter.com/en/docs/tweets/filter-realtime/overview) and potentially contains information about regional events as well as traffic incidents. Web data includes event-centric Web markup, which is prevalent in Web pages through standards such as RDFa (RDFa W3C recommendation: http://www.w3.org/TR/xhtml-rdfa-primer/) and Microdata (http://www.w3.org/TR/microdata). In particular, we currently investigate data complying with schema.org, the most established markup vocabulary on the Web so far (Meusel et al., 2014). In our previous work, we developed methods to infer missing categorical information in noisy and sparse Web markup data (Tempelmeier et al., 2018), increasing the usefulness of this data for event-centric applications. Furthermore, we consider event-centric focused crawls from the Web (Gossen et al., 2015) and Web archives (Gossen et al., 2017), (Gossen et al., 2018), as well as Twitter data regarding events and traffic. Another source of event-centric information is the recently proposed EventKG knowledge graph (Gottschalk and Demidova, 2018, 2019). Finally, this information is complemented with geographic data (e.g. street networks, locations of event venues) obtained from OpenStreetMap (http://www.openstreetmap.org/). The information contained in these sources is highly complementary.
These data sources can be used to address parts of the research questions introduced above. For instance, in the context of events, traffic flow and public transportation query data can be used to determine typical patterns, while the Web, Web archives, knowledge graphs and social media sources can provide information about planned special events such as concerts or football matches. Another example is the computation of the average load on the roads with respect to external factors, e.g. weather. These approaches require the integration of data originating from several data sources, where semantic data descriptions, methods of dataset profiling (Ellefi et al., 2018), (Dietze et al., 2019) and data quality analytics play an important role.
However, some data required for a comprehensive analysis of urban mobility is still missing. In particular, walking, bicycle rides and intermodal trips are not captured by any of the existing data sources, although these modes are becoming increasingly popular and thus important for urban mobility applications. To this end, the project introduces the MiC app, which can be used to collect the required data.
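To illustrate what event-centric Web markup looks like to a consumer, the sketch below extracts schema.org events from HTML annotated with Microdata, using only Python's standard `html.parser`. It is deliberately simplified and rests on assumptions: it reads `itemprop` values from `content`, `datetime`, `href` or `src` attributes or else from element text, treats any schema.org type ending in "Event" as an event, and ignores `itemref` and `itemid`. A complete Microdata parser handles many more cases.

```python
from html.parser import HTMLParser

VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


def _is_event_type(itemtype):
    # Accept schema.org Event and subtypes such as SportsEvent or MusicEvent.
    return "schema.org/" in itemtype and itemtype.endswith("Event")


class EventMicrodataParser(HTMLParser):
    """Simplified Microdata reader for schema.org events (sketch, not a full parser)."""

    def __init__(self):
        super().__init__()
        self.events = []
        self.stack = []  # open elements: {"tag", "item", "text_prop", "text"}

    def _current_item(self):
        for entry in reversed(self.stack):
            if entry["item"] is not None:
                return entry["item"]
        return None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        parent = self._current_item()  # item that an itemprop here belongs to
        entry = {"tag": tag, "item": None, "text_prop": None, "text": []}
        if "itemscope" in attrs:
            entry["item"] = {"type": attrs.get("itemtype") or "", "props": {}}
            if _is_event_type(entry["item"]["type"]):
                self.events.append(entry["item"])
        prop = attrs.get("itemprop")
        if prop and parent is not None:
            if entry["item"] is not None:
                parent["props"][prop] = entry["item"]  # nested item, e.g. a Place
            else:
                value = (attrs.get("content") or attrs.get("datetime")
                         or attrs.get("href") or attrs.get("src"))
                if value is not None:
                    parent["props"][prop] = value
                else:
                    entry["text_prop"] = (parent, prop)  # collect text until the end tag
        if tag not in VOID_TAGS:
            self.stack.append(entry)

    def handle_data(self, data):
        for entry in self.stack:
            if entry["text_prop"] is not None:
                entry["text"].append(data)

    def handle_endtag(self, tag):
        if not any(e["tag"] == tag for e in self.stack):
            return  # void element or stray closing tag
        while self.stack:
            entry = self.stack.pop()
            if entry["text_prop"] is not None:
                item, prop = entry["text_prop"]
                item["props"][prop] = " ".join("".join(entry["text"]).split())
            if entry["tag"] == tag:
                break


def extract_events(html):
    parser = EventMicrodataParser()
    parser.feed(html)
    parser.close()
    return parser.events
```

The nested `location` item shows why event markup is useful for mobility analytics: once the venue is resolved, for example against OpenStreetMap, the event can be placed on the road and public transportation network.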
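The second example, the average road load with respect to weather, amounts to a temporal join of traffic records with rainfall records followed by a grouped average. The sketch below uses average speed as a proxy for load and assumes 15-minute speed records and hourly rainfall for a single area (cf. the hourly granularity of the rainfall data); the 0.1 mm threshold separating "dry" and "rain" hours and the record layouts are illustrative assumptions.

```python
from collections import defaultdict
from datetime import datetime

RAIN_THRESHOLD_MM = 0.1  # illustrative cut-off between "dry" and "rain" hours


def mean_speed_by_weather(
    speed_records: list[tuple[datetime, str, float]],
    hourly_rain_mm: dict[datetime, float],
) -> dict[str, dict[str, float]]:
    """Average speed per segment, split by the rainfall of the hour of each record.

    speed_records: (timestamp, segment_id, speed_kmh) tuples.
    hourly_rain_mm: rainfall per hour, keyed by the timestamp truncated to the hour.
    """
    sums = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(lambda: defaultdict(int))
    for ts, segment_id, speed in speed_records:
        hour = ts.replace(minute=0, second=0, microsecond=0)
        if hour not in hourly_rain_mm:
            continue  # no weather observation for this hour
        condition = "rain" if hourly_rain_mm[hour] >= RAIN_THRESHOLD_MM else "dry"
        sums[segment_id][condition] += speed
        counts[segment_id][condition] += 1
    return {
        seg: {cond: sums[seg][cond] / counts[seg][cond] for cond in counts[seg]}
        for seg in counts
    }
```

A real analysis would additionally control for time of day and weekday, since rainy and dry hours are not evenly distributed over rush hours.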
5. MiC app
To address the lack of data regarding walking, bicycle rides and intermodal trips, the project introduces the Move in the City (MiC) app. The app follows a citizen-science-inspired approach to gathering movement data: users capture GPS traces of their trips with their mobile devices and voluntarily contribute the data to the D4UM project. In return, users are provided with individual movement statistics. Moreover, the increased amount of available mobility data enables novel analytics of mobility behaviour. Based on these analytics, previously unseen problems can be identified and addressed, creating further benefits for the users and for the mobility ecosystem as a whole.
Movement data captured by the MiC app undergoes a preprocessing routine tailored to the specific needs of urban mobility analytics. Different modes of mobility such as walking, cycling, driving and the use of public transportation are automatically distinguished. Furthermore, if public transportation is used, the data is enriched with additional information such as entry stop, exit stop and the public transportation line used. This way, the data captured by the MiC app represents a valuable data source for further mobility analytics approaches.
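A basic building block of this preprocessing is splitting a recorded trace into single-mode legs. The sketch below assumes each GPS fix has already been labelled with the activity reported by the operating system's activity recognition (hypothetical labels such as "walking", "cycling", "in_vehicle"); it groups consecutive fixes with the same label and merges very short runs into the preceding leg to suppress label flicker. The minimum of three fixes per leg is an illustrative choice, and telling car, bus and tram apart within "in_vehicle" would require the public transportation matching described above.

```python
from itertools import groupby


def split_into_legs(fixes, min_fixes=3):
    """Split a chronological trace into single-mode legs.

    fixes: list of (timestamp, lat, lon, activity) tuples.
    Returns a list of {"activity": str, "fixes": [...]} dictionaries.
    """
    legs = []
    for activity, group in groupby(fixes, key=lambda fix: fix[3]):
        run = list(group)
        if legs and len(run) < min_fixes:
            # Too short to be a real mode change: attach it to the previous leg.
            legs[-1]["fixes"].extend(run)
        elif legs and legs[-1]["activity"] == activity:
            # The same mode resumes after a merged flicker: continue that leg.
            legs[-1]["fixes"].extend(run)
        else:
            legs.append({"activity": activity, "fixes": run})
    return legs
```

Merging short runs trades a little temporal precision at mode boundaries for far fewer spurious legs, which matters because every leg later triggers a stop and line lookup.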
5.1. Architecture
The architecture of the MiC app aims at keeping the computational effort on the user device low in order to reduce the app's energy consumption. Furthermore, the architecture is platform-independent so that as many users as possible can use the app. To this end, MiC makes use of established, flexible Web technologies such as Web-based user interfaces and the MQTT protocol (http://mqtt.org/). To further ensure platform independence, the app does not directly access the sensor data of the mobile devices. Instead, it uses the relatively new, built-in activity recognition APIs of the mobile operating systems. In addition to the activity recognition data, the app records fine-grained location data.
Figure 2 provides an overview of the data flow. Data recorded by the mobile client is sent to a Web-based endpoint, which stores the recorded data in a database. The data is then enriched with information about public transportation, obtained by querying the local information system for public transportation. Finally, a Web-based endpoint provides an API that makes the data accessible to data analytics applications.
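To illustrate how the client can stay lightweight, the sketch below only buffers raw observations (a location fix plus the activity label delivered by the operating system) and serialises them into compact JSON batches, leaving all further processing to the server. The topic layout `mic/<pseudonym>/trace`, the field names and the batch size are hypothetical, and the publish call of an actual MQTT client library is deliberately left out.

```python
import json


class TraceBuffer:
    """Buffers observations on the device and emits batched MQTT payloads (sketch)."""

    def __init__(self, device_pseudonym, batch_size=20):
        self.topic = f"mic/{device_pseudonym}/trace"  # hypothetical topic layout
        self.batch_size = batch_size
        self.pending = []

    def add(self, timestamp, lat, lon, activity):
        """Store one observation; return (topic, payload) once a batch is full, else None."""
        self.pending.append({"t": timestamp, "lat": lat, "lon": lon, "act": activity})
        if len(self.pending) >= self.batch_size:
            return self.flush()
        return None

    def flush(self):
        """Serialise all pending observations into one compact payload."""
        if not self.pending:
            return None
        payload = json.dumps(self.pending, separators=(",", ":")).encode("utf-8")
        self.pending = []
        # The (topic, payload) pair would be handed to an MQTT client's publish call.
        return self.topic, payload
```

Sending observations in batches rather than one message per fix is one common way to limit how often the radio has to wake up; the batch size trades energy against how quickly data reaches the server.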
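The enrichment step can be approximated, for illustration, by snapping the first and last fix of a bus or tram leg to the nearest stop. This is a simplified stand-in: the text states that MiC queries the local public transportation information system, which can also identify the line; the in-memory stop list, the 150 m search radius and the function names below are assumptions.

```python
import math


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two WGS84 points."""
    r = 6_371_000.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = p2 - p1, math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def nearest_stop(lat, lon, stops, max_distance_m=150.0):
    """stops: {stop_id: (lat, lon)}. Closest stop within the radius, else None."""
    best_id, best_d = None, max_distance_m
    for stop_id, (s_lat, s_lon) in stops.items():
        d = haversine_m(lat, lon, s_lat, s_lon)
        if d <= best_d:
            best_id, best_d = stop_id, d
    return best_id


def enrich_transit_leg(leg_fixes, stops):
    """leg_fixes: chronological [(timestamp, lat, lon), ...] of one bus or tram leg."""
    _, start_lat, start_lon = leg_fixes[0]
    _, end_lat, end_lon = leg_fixes[-1]
    return {
        "entry_stop": nearest_stop(start_lat, start_lon, stops),
        "exit_stop": nearest_stop(end_lat, end_lon, stops),
    }
```

Nearest-stop snapping alone cannot determine the line when several lines serve the same stops; matching the leg's timestamps against timetable data is what resolves that ambiguity.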
5.2. App Design
MiC aims at motivating users through an appealing and easy-to-use interface. Figure 3 presents exemplary screenshots of the MiC app. The start view is shown in Figure 3(a). We achieve a lightweight user interaction by requiring users to press only one button to start or stop the recording of their movement. Figure 3(b) depicts the view presenting the statistics of the individual user. This view presents the share of each transportation mode used, providing absolute and relative numbers as well as a graphical visualisation. Ideas for further user statistics include a visualisation of the environmental impact of the user's mobility behavior or of the user's overall contribution to the urban mobility network. Figure 3(c) presents the map view, where users can visualise captured trip data, i.e. GPS traces, on a map.
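The statistics view reports absolute and relative numbers per transportation mode. A minimal sketch of that computation, assuming legs have already been classified and assuming that the absolute number is the total time per mode (the app could equally count trips or distance):

```python
from collections import defaultdict


def mode_shares(legs):
    """legs: iterable of (mode, duration_minutes).

    Returns {mode: (total_minutes, share_percent)}; the time-based share is an
    illustrative assumption, not necessarily the metric shown in the app.
    """
    totals = defaultdict(float)
    for mode, minutes in legs:
        totals[mode] += minutes
    overall = sum(totals.values())
    if overall == 0:
        return {}
    return {mode: (minutes, round(100 * minutes / overall, 1)) for mode, minutes in totals.items()}
```

For example, 45 minutes by bicycle, 15 by bus and 30 on foot yield shares of 50.0%, 16.7% and 33.3%.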
5.3. Piloting
The first closed test of the system was conducted by five participants who made 41 trips over five days; the median duration of the recorded trips was 24 minutes. The participants kept detailed travel diaries for these trips, which were then used as ground truth to assess the accuracy of the travel mode classification. Due to limited GPS signal, trips in underground trains were excluded from the study.
Table 2 presents the median duration and the accuracy of transportation mode recognition for the recorded trips per mode of transportation, where the total row summarises all recorded trips. Trips that include multiple modes of transportation were divided into individual trips with only one mode of transportation each. In addition to the correct transportation mode, tram and bus trips were only considered correctly recognised if the system also identified the correct entry stop, exit stop and public transportation line.
In total, 86.2% of the trips were recognised correctly by the system. The highest accuracy was achieved for bicycle trips (100%). This is due to the relatively clear movement signature of riding a bicycle, which is well recognised by the activity recognition of the smartphones. The second highest accuracy was achieved for cars (92.9%). Although trams and buses achieve the lowest accuracy, it is still at least 73%. The reduced accuracy for these two classes is likely caused by the additional constraints, i.e. the recognition of the entry stop, exit stop and public transportation line.
The first public test phase of the system is currently ongoing. The app is distributed via the beta testing platforms of Apple and Google. Users were recruited at universities and expositions. Table 3 presents statistics about the data captured by the MiC app so far. To date, 78 participants have been recruited to test the system under real-world conditions.
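The evaluation protocol above can be written down as a small scoring function: a single-mode trip counts as correct if the predicted mode matches the travel diary, and for tram and bus trips only if entry stop, exit stop and line match as well. The record layout is hypothetical; the criterion follows the text.

```python
from collections import defaultdict

TRANSIT_MODES = {"tram", "bus"}


def is_correct(predicted, truth):
    """predicted/truth: dicts with 'mode' and, for transit trips, 'entry', 'exit', 'line'."""
    if predicted["mode"] != truth["mode"]:
        return False
    if truth["mode"] in TRANSIT_MODES:
        # Stricter criterion for public transportation trips.
        return all(predicted.get(k) == truth[k] for k in ("entry", "exit", "line"))
    return True


def accuracy_per_mode(pairs):
    """pairs: [(predicted, truth)] per single-mode trip.

    Returns {mode: accuracy, ..., "total": accuracy over all trips}.
    """
    correct, total = defaultdict(int), defaultdict(int)
    for predicted, truth in pairs:
        total[truth["mode"]] += 1
        correct[truth["mode"]] += is_correct(predicted, truth)
    if not total:
        return {}
    result = {mode: correct[mode] / total[mode] for mode in total}
    result["total"] = sum(correct.values()) / sum(total.values())
    return result
```

Accuracy is grouped by the diary mode, so a bus trip misclassified as a car lowers the bus accuracy rather than the car accuracy.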
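The total accuracy in Table 2 is consistent with pooling all 58 single-mode trips (a micro-average) rather than averaging the four per-mode accuracies. The counts of correctly recognised trips used below (16, 13, 10 and 11) are not reported directly but are implied by the per-mode percentages, e.g. 13/14 ≈ 92.9% for cars and 10/13 ≈ 76.9% for trams:

```latex
\mathrm{Acc}_{\mathrm{total}}
  = \frac{16 + 13 + 10 + 11}{16 + 14 + 13 + 15}
  = \frac{50}{58} \approx 86.2\,\%,
\qquad
\frac{100 + 92.9 + 76.9 + 73.3}{4}\,\% \approx 85.8\,\%
```

The unweighted mean on the right would be slightly lower, because the bus and tram classes, with the lowest accuracies, would then count as much as the bicycle class regardless of the number of trips.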
5.4. Data Protection
To ensure compliance with current data protection regulations (e.g. the General Data Protection Regulation (GDPR) of the European Union), a concept was developed jointly with researchers from legal informatics. It includes the following components:
- The consent of app users (Article 6 GDPR);
- A privacy policy (Article 13 GDPR);
- Data pseudonymisation (Article 89 GDPR);
- The agreement between joint controllers (Article 26 GDPR).
The MiC app has an integrated privacy policy and consent form that provide app users with information about the data processing, their data rights and the possibility to consent. They contain details about the types of data the MiC app collects, the collection procedure, the research purposes, data storage and data sharing within the D4UM project, the data rights of the users (among others, access, deletion and withdrawal of consent) and the contact details of the app providers.
The geo-referenced data collected by the MiC app is processed in pseudonymised form, i.e. the data is purged of all personal identifiers, such as name, e-mail and IMEI number (a cellphone serial number). Pseudonymisation was chosen as a de-identification measure (in contrast to anonymisation or irreversible de-identification) to enable the enforcement of the data subjects' rights to access and deletion of their data upon request. This is part of an integrated and innovative citizen science approach that enables everyone to actively participate in research and planning processes, with a strong effort to establish new standards for the data sovereignty of citizens and cities.
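Keyed hashing is one common way to realise pseudonymisation with the property described here: direct identifiers are removed and the user ID is replaced by a pseudonym, yet whoever holds the separately stored secret key can recompute a user's pseudonym and locate their records to serve access or deletion requests. The paper does not state which technique MiC uses; the HMAC-SHA256 choice, the field names and the key handling below are assumptions.

```python
import hashlib
import hmac

DIRECT_IDENTIFIERS = ("name", "email", "imei")  # hypothetical field names


def pseudonym(user_id, secret_key):
    """Deterministic pseudonym: the same user and key always give the same value."""
    return hmac.new(secret_key, user_id.encode("utf-8"), hashlib.sha256).hexdigest()


def pseudonymise_record(record, secret_key):
    """Drop direct identifiers and replace the user ID with its pseudonym."""
    cleaned = {k: v for k, v in record.items()
               if k not in DIRECT_IDENTIFIERS and k != "user_id"}
    cleaned["pseudonym"] = pseudonym(record["user_id"], secret_key)
    return cleaned


def records_of_user(user_id, records, secret_key):
    """Serve an access or deletion request: find all records of one data subject."""
    target = pseudonym(user_id, secret_key)
    return [r for r in records if r["pseudonym"] == target]
```

Keeping the key apart from the trace database is what separates this from anonymisation: without the key, pseudonyms cannot be linked back to users, while with it, data subject requests can still be honoured. Note that GPS traces themselves can remain identifying (e.g. home locations), which is why pseudonymised data is still personal data under the GDPR.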
6. Use Cases
In this section we discuss two exemplary use cases for the D4UM platform and the MiC app.
6.1. Urban Mobility in Presence of Planned Special Events
Large-scale planned special events such as football games or concerts are known to have an impact on both road traffic (Kwoczek et al., 2014) and public transportation services (Rodrigues et al., 2017). Analysing mobility behaviour in such situations is particularly challenging due to the large number of influencing factors (e.g. target audience, weather conditions, public transportation infrastructure). Moreover, large-scale events typically affect all modes of transportation, which are mutually dependent. Thus, a holistic approach that considers data about all modes of transportation as well as external factors is required to gain a better understanding of mobility behaviour in the presence of planned special events.
We illustrate the complexity of the problem using the example of a football game that took place in Hannover (Germany) on December 17th, 2017 at 3:30pm. Figure 4 presents the conditions around the football stadium in Hannover for both road traffic and public transportation services. Figure 4(a) depicts the road traffic conditions around the stadium half an hour before the start of the game. As we can observe, both roads with a high load (red) and roads with good traffic conditions (green) are present near the stadium. This illustrates that the impact of planned special events can be complex: it is not necessarily spread evenly around the event venue, but may be subject to other factors such as road topology or the availability of parking spaces. Figure 4(b) presents the number of queries to the public transportation information system that were issued for the bus stop near the stadium on the day of the football game. We observe a relatively low number of queries during the morning hours. At 9am, the number of queries starts to rise and reaches its maximum at 2:30pm, one hour before the game starts. We assume that the increased number of queries in the temporal proximity of the event is likely related to the football game. Overall, we observe that a single factor, i.e. the occurrence of a large-scale event, can impact both road traffic and public transportation simultaneously. Therefore, the analysis of these effects should take both mobility modes into account. To this end, the D4UM platform presented in this paper aims at holistic analysis of urban mobility data.
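The query curve in Figure 4(b) is essentially a histogram of routing requests involving one stop over the course of a day. A sketch of that aggregation, assuming each log entry holds the time the query was issued plus its origin and destination stop IDs (the actual EFA log schema is not described here), with 30-minute bins as an illustrative choice:

```python
from collections import Counter
from datetime import datetime


def queries_per_bin(log: list[tuple[datetime, str, str]], stop_id: str, bin_minutes: int = 30) -> Counter:
    """Count queries involving stop_id per time-of-day bin, keyed by bin start "HH:MM".

    log: (issued_at, origin_stop, destination_stop) tuples (hypothetical layout).
    """
    counts = Counter()
    for issued_at, origin, destination in log:
        if stop_id not in (origin, destination):
            continue
        minute_of_day = issued_at.hour * 60 + issued_at.minute
        start = minute_of_day - minute_of_day % bin_minutes
        counts[f"{start // 60:02d}:{start % 60:02d}"] += 1
    return counts


def peak_bin(counts: Counter):
    """Return the bin with the most queries, or None if there are no queries."""
    return max(counts, key=counts.get) if counts else None
```

Comparing such a curve for an event day with the curves of comparable days without events is one way to separate event-related demand from the stop's usual daily profile.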
6.2. Analytics of Urban Mobility Infrastructure
The rapid growth of cities has led to an increased demand for suitable urban mobility infrastructure. New services such as car sharing, (e-)bike sharing and ride sharing have emerged and are beginning to gain importance within the mobility services ecosystem. However, data about these modes of transportation is typically only sporadically available. Moreover, intermodal data, which captures different modes of transportation on a single journey, is even rarer.
The interdependencies between individual and public transportation increase the complexity of the problem. For instance, poor coverage of public transportation services might lead to an increased load on the road infrastructure. In turn, an overloaded road system can lead to an increased demand for public transportation services or bicycle tracks.
Therefore, it is crucial for city planners and mobility service providers to conduct holistic analyses that take all modes of transportation into account. Such analyses should include the identification of:
- typical mobility patterns
- demand for mobility services
- interdependencies between transportation modes
- coverage of public transportation services
The D4UM platform enables the holistic analysis of urban mobility data in these scenarios. Furthermore, the MiC app presented in this paper is a valuable approach to capturing rarely available data about urban mobility behaviour, including bicycle rides and intermodal trips, and to complementing available data sources.
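For the last item, one simple way to approximate the coverage of public transportation services is the share of locations (e.g. addresses or grid cells) that lie within walking distance of a stop. A minimal sketch, assuming a 400 m catchment radius and point locations; both are illustrative choices rather than D4UM definitions, and a network-based walking distance would be more accurate than the straight-line distance used here.

```python
import math


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two WGS84 points."""
    r = 6_371_000.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = p2 - p1, math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def coverage_share(locations, stops, radius_m=400.0):
    """Fraction of (lat, lon) locations with at least one stop within radius_m."""
    if not locations:
        return 0.0
    covered = sum(
        1
        for lat, lon in locations
        if any(haversine_m(lat, lon, s_lat, s_lon) <= radius_m for s_lat, s_lon in stops)
    )
    return covered / len(locations)
```

Combined with the GTFS departure counts, the same catchment idea can be weighted by service frequency, so that a stop served once an hour counts less than one served every few minutes.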
7. Related Work
In this section we discuss related work in the area of smart city mobility systems as well as approaches to predict individual aspects of urban mobility.
7.1. Smart City Mobility Systems
(Moustaka et al., 2018) conducted a systematic review of smart city data analysis approaches and provide taxonomies for data sources, data analytic methods and smart city services. Urban CPS (Zhang et al., 2015) is a cyber-physical system that integrates floating car data, cellphone data and public transportation data to make predictions about real-time traffic speeds. (Lécué et al., 2014) employ semantic technologies to develop STAR-CITY, a system for traffic prediction and reasoning, used for spatio-temporal analysis of the traffic status as well as for the exploration of contextual information such as nearby events. (Berlingerio et al., 2013) present a system that makes use of cellphone data to analyse the demand for public transportation services within a city. While these approaches focus on predictions for only one mode of mobility and mainly use a single dataset as their primary data source, we also consider other modes of transportation, e.g. bicycle rides or public transportation. (Anthopoulos and Fitsilis, 2010) provide a summary of common enterprise, logical and physical architectures for digital smart city applications. While the authors consider architectures for general smart city applications, we focus on the mobility domain. (Wang et al., 2008) proposed the Compressed Start-End Tree (CSE-tree), a spatio-temporal index structure that can be used to effectively index and retrieve temporal GPS data, which is particularly relevant for smart city mobility applications.
7.2. Urban Mobility Analytics
Recently, a number of studies have addressed individual prediction tasks at the interface of urban infrastructure and mobility. We consider these approaches as potential use cases for the D4UM platform.
Traffic Forecasting: Short-term forecasting of urban road traffic has been the focus of numerous studies, in which data sparsity is a particular challenge. (Wang et al., 2014) tackle this problem by using sparse FCD (floating car data) and a context-aware tensor decomposition approach to estimate travel times for road segments for which no FCD is available. This information is then used to estimate the travel time required for a given route. (Meng et al., 2017) propose a framework for the city-wide inference of traffic volume. They make use of a semi-supervised learning algorithm that works with sparse loop detector data as well as taxi GPS data. Similarly, (Wang et al., 2016) employ a hidden Markov model to estimate traffic speeds in a road network based on sparse FCD, where the speed to be estimated on a single road is modelled as a hidden state.
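To make the hidden Markov model formulation more concrete, the sketch below decodes the most likely sequence of discretised speed levels on a single road from sparse, noisy FCD readings using the Viterbi algorithm. It is not the model of (Wang et al., 2016): the two hypothetical states (`free_flow`, `congested`), all probabilities, and the convention that a missing reading (`None`) contributes no emission evidence are assumptions chosen purely for illustration.

```python
import math
from typing import Dict, List, Optional, Sequence


def viterbi(
    observations: Sequence[Optional[str]],
    states: Sequence[str],
    start_p: Dict[str, float],
    trans_p: Dict[str, Dict[str, float]],
    emit_p: Dict[str, Dict[str, float]],
) -> List[str]:
    """Most likely hidden state sequence (log-space Viterbi).

    All probabilities are assumed to be strictly positive.
    """
    if not observations:
        return []

    def emission(state: str, obs: Optional[str]) -> float:
        # A missing FCD reading (None) carries no evidence: log(1) = 0
        return 0.0 if obs is None else math.log(emit_p[state][obs])

    # best[t][s]: log-probability of the best state path ending in s at step t
    best = [{s: math.log(start_p[s]) + emission(s, observations[0]) for s in states}]
    back: List[Dict[str, str]] = [{}]
    for t in range(1, len(observations)):
        best.append({})
        back.append({})
        for s in states:
            # Best predecessor state for s at step t
            prev = max(states, key=lambda p: best[t - 1][p] + math.log(trans_p[p][s]))
            best[t][s] = best[t - 1][prev] + math.log(trans_p[prev][s]) + emission(s, observations[t])
            back[t][s] = prev
    # Backtrack from the most likely final state
    path = [max(states, key=lambda s: best[-1][s])]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return path[::-1]


# Hypothetical two-state model of one road segment (values for illustration only)
STATES = ["free_flow", "congested"]
START = {"free_flow": 0.7, "congested": 0.3}
TRANS = {
    "free_flow": {"free_flow": 0.8, "congested": 0.2},
    "congested": {"free_flow": 0.3, "congested": 0.7},
}
# Probability that a probe vehicle reports a "fast" or "slow" speed in each state
EMIT = {
    "free_flow": {"fast": 0.9, "slow": 0.1},
    "congested": {"fast": 0.2, "slow": 0.8},
}
# Discretised FCD readings per time interval; None = no probe vehicle observed
OBS = ["fast", None, "slow", "slow", None, "fast"]
print(viterbi(OBS, STATES, START, TRANS, EMIT))
# ['free_flow', 'free_flow', 'congested', 'congested', 'free_flow', 'free_flow']
```

The two intervals without any reading are filled in from the transition model and the neighbouring observations, which is the property that makes the HMM view attractive under data sparsity.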
Social Network Data: Further approaches utilize location-based social network (LBSN) data. (Song et al., 2017) leverage LBSN data to identify functional urban regions. They employ latent Dirichlet allocation and unsupervised machine learning algorithms to determine the regions. (Li et al., 2018) investigate the general predictability of LBSN data. They conduct a case study on Foursquare datasets, in which users indicate their geographic location, e.g. that they are at a certain event venue. The authors do not focus on a specialised prediction task, but provide general insights into working with such data. (Yin et al., 2017) make use of LBSN data to infer the boundaries of functional regions in urban environments. They construct a mobility network from spatial user interactions and delineate boundaries by identifying strongly connected communities within the network. (Ni et al., 2017) detect events from social media using a hashtag-based algorithm. The event information extracted from social media is then used to predict public transportation flow.
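The idea of delineating functional regions by finding densely connected communities in a mobility network can be illustrated on a toy graph whose nodes are regions and whose edge weights count user movements between them. The sketch below uses weighted label propagation, a simple community detection heuristic swapped in for illustration; it is not the algorithm of (Yin et al., 2017), and the region names and edge weights are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Edge = Tuple[str, str, float]  # (region_a, region_b, number of user movements)


def label_propagation(edges: List[Edge], max_iter: int = 100) -> Dict[str, str]:
    """Weighted label propagation: each region repeatedly adopts the label with
    the largest total edge weight among its neighbours until no label changes.

    Regions are visited in sorted order and ties go to the smallest label,
    which makes the result deterministic.
    """
    neighbours: Dict[str, Dict[str, float]] = defaultdict(dict)
    for a, b, w in edges:
        # Treat the network as undirected: movements in either direction count
        neighbours[a][b] = neighbours[a].get(b, 0.0) + w
        neighbours[b][a] = neighbours[b].get(a, 0.0) + w
    nodes = sorted(neighbours)
    labels = {n: n for n in nodes}  # every region starts in its own community
    for _ in range(max_iter):
        changed = False
        for n in nodes:
            weight_per_label: Dict[str, float] = defaultdict(float)
            for m, w in neighbours[n].items():
                weight_per_label[labels[m]] += w
            top = max(weight_per_label.values())
            best = min(lab for lab, w in weight_per_label.items() if w == top)
            if best != labels[n]:
                labels[n] = best
                changed = True
        if not changed:
            break
    return labels


# Hypothetical mobility network: two groups of regions joined by one weak link
EDGES = [
    ("A1", "A2", 10), ("A2", "A3", 8), ("A1", "A3", 9),
    ("B1", "B2", 7), ("B2", "B3", 12), ("B1", "B3", 6),
    ("A3", "B1", 1),
]
labels = label_propagation(EDGES)
print(labels)
```

With these weights the three `A` regions end up sharing one label and the three `B` regions another, so the weak `A3`–`B1` link becomes a boundary between two functional regions.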
Structural Analysis: Another class of approaches focuses on discovering structural dependencies within cities by analysing urban mobility data. (Anwar et al., 2016) propose a method to track the evolution of congestion in urban road networks in order to identify unstable road segments. (Jin et al., 2016) make use of context-aware tensor decomposition to identify so-called urban black holes, i.e. traffic anomalies with a greater inflow than outflow. (Hong et al., 2015) detect urban black holes by using a grid-based index structure that is built on top of a spatio-temporal graph representing the road network. They extract candidate cells from the index, which are then used to determine the exact subgraphs that constitute urban black holes. (Liang et al., 2017) identify cascading patterns on congested roads. They propose a generative probabilistic model that maximises the likelihood of a cascade being present with respect to the observed traffic data. (Kempinska et al., 2018) employ topic modelling to analyse urban street networks. They propose the concept of interactional regions, i.e. regions that commonly bound routes within the street network. (Wang and Li, 2017) leverage taxi flow data to learn vector representations of city regions. The representations are then used to make predictions about the regions, e.g. crime rate, average income or average house prices. (Ma et al., 2015) employ deep learning techniques, i.e. a combination of restricted Boltzmann machines and recurrent neural networks, to learn high-dimensional congestion patterns from taxi GPS data. (Pan et al., 2013) make use of taxi GPS trajectories to classify the land use of urban areas. They propose an iterative DBSCAN algorithm that clusters regions with respect to the frequency with which passengers are picked up or dropped off, and then use the same information to classify the land use of the regions, e.g. hospitals or commercial districts.
(Zhang et al., 2018) propose the infinite urbanization process model, which employs a topic modelling approach to simultaneously discover the function of an urban region (i.e. the distribution of shops, restaurants, etc. present in it) and estimate the popularity of the region (e.g. in terms of real estate prices). Similarly, (Sun et al., 2018) extract urban regions of interest from online map search queries. They propose a spatio-temporal latent factor model that identifies travel patterns influencing points of interest.
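The inflow/outflow criterion behind urban black holes can be illustrated with a deliberately simplified sketch: trip origins and destinations are mapped to a uniform square grid, and a cell is flagged when at least `min_excess` more trips end in it than start in it. Unlike (Hong et al., 2015) and (Jin et al., 2016), this ignores the road-network graph, time windows and the search for connected subgraphs; the helper names, grid size, threshold and trip coordinates are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple

Point = Tuple[float, float]  # (x, y) in metres in a projected coordinate system
Cell = Tuple[int, int]


def to_cell(p: Point, cell_size: float) -> Cell:
    """Map a projected coordinate to the index of its square grid cell."""
    return (int(p[0] // cell_size), int(p[1] // cell_size))


def black_hole_cells(
    trips: List[Tuple[Point, Point]], cell_size: float, min_excess: int
) -> Dict[Cell, int]:
    """Return {cell: inflow - outflow} for cells whose inflow exceeds outflow by >= min_excess."""
    inflow: Counter = Counter()
    outflow: Counter = Counter()
    for origin, destination in trips:
        o, d = to_cell(origin, cell_size), to_cell(destination, cell_size)
        if o == d:
            continue  # a trip inside one cell neither enters nor leaves it
        outflow[o] += 1
        inflow[d] += 1
    return {c: inflow[c] - outflow[c] for c in inflow if inflow[c] - outflow[c] >= min_excess}


# Hypothetical (origin, destination) pairs; cell (1, 1) could contain an event venue
TRIPS = [
    ((100, 100), (600, 700)),    # (0, 0) -> (1, 1)
    ((1200, 300), (700, 900)),   # (2, 0) -> (1, 1)
    ((300, 1400), (550, 520)),   # (0, 2) -> (1, 1)
    ((800, 800), (1300, 1300)),  # (1, 1) -> (2, 2)
    ((1100, 1100), (200, 200)),  # (2, 2) -> (0, 0)
]
print(black_hole_cells(TRIPS, cell_size=500.0, min_excess=2))  # {(1, 1): 2}
```

Cell `(1, 1)` receives three trips but emits only one, so it is the only cell flagged; in a realistic setting the counts would be computed per time window and compared against a baseline rather than a fixed threshold.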
Special Traffic Conditions: Several approaches target the identification of problematic segments and areas of urban networks under specific conditions (e.g. planned special events). (Kwoczek et al., 2015) propose the use of an artificial neural network to identify road segments that are typically affected by planned special events taking place at a particular venue. (Zhou et al., 2016) propose an approach to detect events from traffic data. They use a two-dimensional grid to partition the space. The authors then infer a graph-based representation of the grid that captures the flow of vehicles between the individual cells, where the root of the graph is located at the event's location. Finally, (Rodrigues et al., 2017) investigate the effect of public events on the public transportation network. They propose a Bayesian additive model that can be employed to understand public transportation demand in the presence of events; in particular, the model can predict the number of public transportation trips to the venue where the respective event takes place.
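The grid-based construction described for (Zhou et al., 2016) can be sketched roughly as follows: GPS points of vehicle trajectories are mapped to square grid cells, transitions between consecutive distinct cells are counted as directed flows, and a breadth-first search against the flow direction yields the cells that feed traffic into the event cell, organised as a tree rooted at that cell. This is an illustrative simplification, not the authors' event detection method; the helper names, cell size, flow threshold and trajectories are hypothetical.

```python
from collections import Counter, deque
from typing import Dict, List, Tuple

Point = Tuple[float, float]  # projected (x, y) coordinates in metres
Cell = Tuple[int, int]


def to_cell(p: Point, cell_size: float) -> Cell:
    """Map a projected coordinate to the index of its square grid cell."""
    return (int(p[0] // cell_size), int(p[1] // cell_size))


def cell_flows(trajectories: List[List[Point]], cell_size: float) -> Counter:
    """Count vehicle transitions between consecutive, distinct grid cells."""
    flows: Counter = Counter()
    for trajectory in trajectories:
        cells = [to_cell(p, cell_size) for p in trajectory]
        for a, b in zip(cells, cells[1:]):
            if a != b:
                flows[(a, b)] += 1
    return flows


def upstream_tree(flows: Counter, root: Cell, min_flow: int = 1) -> Dict[Cell, int]:
    """Breadth-first search against the flow direction, starting at the event cell.

    Returns the hop distance to the root for every cell whose traffic reaches
    the root via edges carrying at least min_flow vehicles.
    """
    incoming: Dict[Cell, List[Cell]] = {}
    for (a, b), count in flows.items():
        if count >= min_flow:
            incoming.setdefault(b, []).append(a)
    depth = {root: 0}
    queue = deque([root])
    while queue:
        cell = queue.popleft()
        for prev in incoming.get(cell, []):
            if prev not in depth:
                depth[prev] = depth[cell] + 1
                queue.append(prev)
    return depth


# Hypothetical trajectories; the event venue is assumed to lie in cell (1, 1)
TRAJECTORIES = [
    [(100, 100), (600, 100), (700, 700)],    # (0, 0) -> (1, 0) -> (1, 1)
    [(150, 250), (650, 200), (800, 900)],    # (0, 0) -> (1, 0) -> (1, 1)
    [(1600, 600), (1100, 600), (900, 800)],  # (3, 1) -> (2, 1) -> (1, 1)
    [(700, 700), (700, 1200)],               # (1, 1) -> (1, 2), leaving the venue
]
flows = cell_flows(TRAJECTORIES, cell_size=500.0)
print(upstream_tree(flows, root=(1, 1)))
print(upstream_tree(flows, root=(1, 1), min_flow=2))
```

Raising `min_flow` prunes weakly used approach paths, so only the dominant corridors towards the venue remain in the tree; cell `(1, 2)` never appears because traffic only leaves the venue towards it.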
8. Conclusions & Future Work
In this paper, we presented our ongoing work towards holistic urban data analytics conducted in the context of the Data4UrbanMobility project. We presented the D4UM platform, which facilitates seamless long-term analytics of heterogeneous mobility-related data sources, including but not limited to floating car data, weather conditions, traffic warnings and Web queries. Furthermore, we presented the MiC app, a citizen science application that makes it possible to complement this data with the intermodal mobility patterns of city inhabitants. In future work, we intend to further increase the MiC app user base and to implement further use cases on top of the integrated D4UM platform.
9. Acknowledgements
This work was partially funded by the Federal Ministry of Education and Research (BMBF), Germany, “Data4UrbanMobility” project, grant ID 02K15A040.
References
- L. Anthopoulos and P. Fitsilis. 2010. From Digital to Ubiquitous Cities: Defining a Common Architecture for Urban Development. In Proceedings of the 2010 Sixth International Conference on Intelligent Environments. 301–306.
- Tarique Anwar, Chengfei Liu, Hai L. Vu, and Md. Saiful Islam. 2016. Tracking the Evolution of Congestion in Dynamic Urban Road Networks. In Proceedings of ACM CIKM 2016.
- Michele Berlingerio, Francesco Calabrese, Giusy Di Lorenzo, Rahul Nair, Fabio Pinelli, and Marco Luca Sbodio. 2013. All Aboard: A System for Exploring Urban Mobility and Optimizing Public Transport Using Cellphone Data. In Machine Learning and Knowledge Discovery in Databases, Hendrik Blockeel, Kristian Kersting, Siegfried Nijssen, and Filip Železný (Eds.). 663–666.
- Stefan Dietze, Elena Demidova, and Konstantin Todorov. 2019. RDF Dataset Profiling. In Encyclopedia of Big Data Technologies.
- Mohamed Ben Ellefi, Zohra Bellahsene, John G. Breslin, Elena Demidova, Stefan Dietze, Julian Szymanski, and Konstantin Todorov. 2018. RDF dataset profiling - a survey of features, methods, vocabularies and applications. Semantic Web 9, 5 (2018), 677–705.
- U. Feuerhake, O. Wage, M. Sester, N. Tempelmeier, W. Nejdl, and E. Demidova. 2018. Identification of similarities and prediction of unknown features in an urban street network. ISPRS - International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences XLII-4 (2018), 185–192.
- Gerhard Gossen, Elena Demidova, and Thomas Risse. 2015. iCrawl: Improving the Freshness of Web Collections by Integrating Social Web and Focused Web Crawling. In Proceedings of JCDL’15.
