Grids versus Graphs: Partitioning Space for Improved Taxi Demand-Supply Forecasts
Neema Davis, Gaurav Raina, Krishna Jagannathan

TL;DR
This paper compares spatial partitioning methods for taxi demand-supply forecasting, proposing a GraphLSTM approach for variable-sized Voronoi partitions and demonstrating its effectiveness against ConvLSTM with ensemble techniques.
Contribution
Introduces GraphLSTM for variable-sized Voronoi partitions and combines it with ConvLSTM using ensemble learning for improved forecasting performance.
Findings
GraphLSTM performs competitively with ConvLSTM at lower computational cost.
Voronoi tessellation can be effectively modeled using GraphLSTM.
Ensemble of GraphLSTM and ConvLSTM yields superior forecasting accuracy.
Abstract
Accurate taxi demand-supply forecasting is a challenging application of ITS (Intelligent Transportation Systems), due to the complex spatial and temporal patterns. We investigate the impact of different spatial partitioning techniques on the prediction performance of an LSTM (Long Short-Term Memory) network, in the context of taxi demand-supply forecasting. We consider two tessellation schemes: (i) the variable-sized Voronoi tessellation, and (ii) the fixed-sized Geohash tessellation. While the widely employed ConvLSTM (Convolutional LSTM) can model fixed-sized Geohash partitions, the standard convolutional filters cannot be applied on the variable-sized Voronoi partitions. To explore the Voronoi tessellation scheme, we propose the use of GraphLSTM (Graph-based LSTM), by representing the Voronoi spatial partitions as nodes on an arbitrarily structured graph. The GraphLSTM offers…
| | | | |
|---|---|---|---|
| Minimum | 0 | 0 | 0 |
| Maximum | 630 | 2913 | 1582 |
| Mean | 14.1 | 10.1 | 19.5 |
| Median | 13 | 7 | 16 |
| Skewness | 0.77 | 5.78 | 1.08 |
| Kurtosis | 1.31 | 114.5 | 3.56 |
| Standard Deviation | 10.1 | 9.07 | 14.9 |
| Periodicity (in time steps) | 12, 24 | 12, 24, 168 | 12, 24, 168 |

| Range of Hyper-parameters that are fed to TPE-BO |
|---|
| Number of layers, L = [1, 2] |
| Number of neurons, n = [10, 20, 50, 100] |
| Dropout, D = Uniform (0,0.5) |
| Activation = [Sigmoid, Relu, Linear] |
| Optimizer = [Adam, Stochastic Gradient, RMSprop] |
| Learning Rate = [ |
| Batch Size = [64, 128] |

| Spatio-Temporal Models | | Bengaluru Demand | | | Bengaluru Supply | | | NYC Demand | | |
|---|---|---|---|---|---|---|---|---|---|---|
| | | MASE | SMAPE | RMSE | MASE | SMAPE | RMSE | MASE | SMAPE | RMSE |
| ARIMA | G | 1.31 | 48.5 | 12.36 | 18.0 | 191.1 | 57.3 | 1213.6 | 169.8 | 447.2 |
| | V | 1.13 | 43.5 | 8.14 | 4.68 | 95.4 | 15.8 | 5.01 | 125.5 | 27.6 |
| ARIMAX | G | | | | | | | | | |
| | V | | | | | | | | | |
| LSTM | G | | | | | | | | | |
| | V | | | | | | | | | |
| ConvLSTM | G | 0.37 ± 1.99 | 9.1 ± 4.8 | 2.16 ± 1.52 | 1.51 ± 2.23 | 35.2 ± 12.7 | 7.73 ± 2.03 | 40.5 ± 0.05 | 12.8 ± 15.4 | 36.8 ± 10.4 |
| GraphLSTM | V | 0.72 ± 0.15 | 16.1 ± 3.66 | 4.99 ± 2.35 | 0.93 ± 0.50 | 21.8 ± 4.77 | 6.15 ± 4.96 | 0.68 ± 0.16 | 17.4 ± 4.32 | 6.33 ± 4.51 |
| | G | 0.73 ± 0.20 | 15.6 ± 5.1 | 6.2 ± 4.75 | 0.92 ± 0.38 | 20.7 ± 5.9 | 6.79 ± 5.91 | 0.71 ± 0.68 | 11.9 ± 4.56 | 48.2 ± 29.3 |
| P | | 0.32 ± 0.90 | 8.5 ± 3.7 | 2.25 ± 1.90 | 0.81 ± 0.96 | 17.7 ± 4.51 | 4.61 ± 2.73 | 0.43 ± 0.17 | 8.90 ± 3.6 | 6.48 ± 5.5 |

Neema Davis, Gaurav Raina, Krishna Jagannathan
N. Davis, G. Raina and K. Jagannathan are with the Department of Electrical Engineering, Indian Institute of Technology Madras, Chennai 600 036, India. E-mail: {ee14d212, gaurav, krishnaj}@ee.iitm.ac.in
Abstract
Accurate taxi demand-supply forecasting is a challenging application of ITS (Intelligent Transportation Systems), due to the complex spatial and temporal patterns. We investigate the impact of different spatial partitioning techniques on the prediction performance of an LSTM (Long Short-Term Memory) network, in the context of taxi demand-supply forecasting. We consider two tessellation schemes: (i) the variable-sized Voronoi tessellation, and (ii) the fixed-sized Geohash tessellation. While the widely employed ConvLSTM (Convolutional LSTM) can model fixed-sized Geohash partitions, the standard convolutional filters cannot be applied on the variable-sized Voronoi partitions. To explore the Voronoi tessellation scheme, we propose the use of GraphLSTM (Graph-based LSTM), by representing the Voronoi spatial partitions as nodes on an arbitrarily structured graph. The GraphLSTM offers competitive performance against ConvLSTM, at lower computational complexity, across three real-world large-scale taxi demand-supply data sets, with different performance metrics. To ensure superior performance across diverse settings, a HEDGE based ensemble learning algorithm is applied over the ConvLSTM and the GraphLSTM networks.
Index Terms:
Taxi Demand-Supply, Spatial Tessellation, Time-series Forecasting, ConvLSTM, Graph LSTM.
I Introduction
Spatio-temporal forecasting has a wide range of applications, from epidemic detection [1] and energy management [2] to cellular traffic [3], among others. Location-based taxi demand and supply forecasting, one of the key components of ITS (Intelligent Transportation Systems), also relies heavily on accurate spatio-temporal forecasting. Mobility-on-Demand services such as e-hailing taxis, which have gained tremendous popularity in recent years, often face taxi demand-supply imbalances. During peak and off-peak hours, mismatches occur between the spatial distributions of the taxi demand and the available drivers, resulting in either scarcity or abundance of vacant taxis. For example, Fig. 1 presents a case of demand-supply mismatch averaged over all Mondays near the city center in Bengaluru, India. We see that during the day hours, the high demand for taxis is met with inadequate supply. On the other hand, there is a surplus of vacant taxis during night hours, against low customer demand. Accurate demand-supply forecasts can mitigate this imbalance, thereby improving the efficiency of these taxi services. Information regarding the expected future demand and supply in a region can be used to re-route vacant cruising taxis, dynamically adjust the taxi fares, and recommend popular pick-up locations to the drivers.
I-A Related works
The recent popularity of e-hailing taxi services has generated substantial interest in developing efficient taxi demand-supply prediction algorithms [5, 6, 7, 8]. In the past, taxi demand-supply prediction was mainly formulated as a classical time-series forecasting problem. The ARMA (Auto Regressive and Moving Average) family of time-series models was applied to taxi demand prediction problems with satisfactory results [9]. However, these time-series models rely on the stationarity assumption, which is often violated by real-world data. The capability of such classical methods to deal with high dimensional, complex, and dynamic time-series data is also limited. Meanwhile, the generalization capabilities of NNs (Neural Networks) inspired transportation researchers to leverage this tool in the traffic forecasting domain with promising results [10, 11]. Traditional NNs lack the ability to learn temporal dependencies, which led to the design of models that are better suited to sequence data, such as the RNN (Recurrent Neural Network) and its variants, namely the LSTM (Long Short-Term Memory) and the GRU (Gated Recurrent Unit) [12, 13]. In fact, the RNN has emerged as the preferred machine learning tool to solve many traditional sequence learning problems such as speech recognition [14], text recognition [15] and cellular communication [16]. Since real-world data often exhibit both temporal and spatial variations, several spatio-temporal extensions of RNNs have been proposed. A widely employed extension used in taxi data forecasting, known as the ConvLSTM (Convolutional LSTM), embeds convolution operations within the LSTM framework [32]. The standard convolutional layer can be applied only to a grid-structured input and learns localized rectangular filters. This limits the application of the conventional ConvLSTM to a city space partitioned into fixed-sized grids. Hence, a fixed-sized equally-spaced partitioning is often adopted in the spatio-temporal NN-based models for location-based taxi demand or supply forecasts [17, 18, 8, 19, 20].
In our previous work [4], we explored a variable-sized partitioning scheme in addition to a fixed-sized scheme for taxi demand forecasting. Using classical time-series regression models, we observed a visible enhancement in the prediction performance with a variable-sized tessellation scheme in several scenarios. The real-world data often has a heterogeneous spatial distribution, which may not be captured faithfully with a fixed-sized partitioning scheme that is based on spatial homogeneity assumption. While the generalization capabilities of the RNNs make them powerful tools for spatio-temporal modeling, assuming that the data is homogeneously distributed may limit their modeling capabilities. Hence, it is imperative to explore a variable-sized partitioning scheme in an RNN-based spatio-temporal modeling framework. Most of the currently popular spatio-temporal RNN models are based on the ConvLSTM networks that are incapable of modeling a variable-sized partitioned space. Motivated by this observation, in this work, we develop an LSTM framework that can extract the potential of variable-sized spatial partitions.
While dividing the city space into variable-sized Voronoi tessellations, we take note of the fact that arbitrarily spaced tessellations can be represented using graphs. That is, while the variable-sized partitions cannot be represented as equally-spaced fixed-sized grids, they can be visualized in the form of a graph. The demand aggregated in each Voronoi partition can form a node in an arbitrarily structured graph. Therefore, a Graph-based RNN holds great potential in our scenario. In the last couple of years, there has been substantial interest in devising Graph NNs, by extending the convolution operator to suit more general graph-structured data [21]. In the context of traffic forecasting, Graph CNNs (Convolutional Neural Networks) have been applied to predict flows at traffic sensors [22]. By considering a road network as a graph and traffic sensors as nodes, Graph CNNs have been combined with RNNs to capture the spatial relationships between nodes [22, 23, 24]. However, there is limited research on incorporating graph RNNs in location-based taxi demand or supply forecasting. In [25], the authors do apply graphs to model non-Euclidean correlations for ride-hailing demand forecasting, but the models are based on equally-spaced fixed-sized grid partitions. In summary, the existing transportation literature on Graph NNs either considers traffic sensors as the graph nodes or learns graph-based correlations in a grid-partitioned space.
Our modeling framework deviates significantly from the existing literature as we employ a GraphLSTM (Graph-based LSTM) [23] model to learn an arbitrarily structured graph, where each node corresponds to the aggregated demand in a spatial Voronoi partition. To the best of our knowledge, in the context of spatial partitioning, Graph-based RNNs have not been explored in the literature. Another important contribution of this paper lies in understanding the impact of different spatial partitioning schemes on the predictive performance of RNNs. To that end, we perform a comparison of the Geohash-based ConvLSTM and the Voronoi-based GraphLSTM. These features set our work apart from the existing literature.¹
¹ A part of this work was presented as a poster at the NIPS Workshop on Machine Learning in Intelligent Transportation Systems, 2018 [26].
I-B Our contributions
After the city is divided into fixed-sized rectangular cells and variable-sized polygon cells, we employ the standard ConvLSTM to model the equally-spaced Geohash tessellated city and the GraphLSTM to model the unequally-spaced Voronoi tessellated city. We compare the results with three baselines: (i) the vanilla LSTM based on Voronoi and Geohash schemes, (ii) the ARIMA (Auto Regressive Integrated Moving Average) model, and (iii) the ARIMAX (ARIMA with eXogenous inputs) model. When evaluated across three real-world data sets, the GraphLSTM exhibits competitive prediction performance against the established baseline models, at a lower computational complexity. Interestingly, we see that the prediction models exhibit non-stationary behavior, in addition to dependencies on the choice of data set and performance metric. To tackle this issue, we perform ensemble learning on the time-shifting models using an online non-stationary expert combining dHEDGE algorithm [27]. By using a combination of prediction models, the algorithm picks the best model for each time step in the forecasting horizon. The main contributions of this paper are the following:
- This work is the first to demonstrate the potential of Graph RNNs within a location-based spatial partitioning and forecasting framework.
- The GraphLSTM offers competitive prediction performance against ConvLSTM at a lower computational complexity, across data sets using different performance metrics.
- The Voronoi-based GraphLSTM outperforms Geohash-based GraphLSTM and ConvLSTM in data scarce locations.
- Prediction accuracy of irregular graph based GraphLSTM is at least as good as that of regular graph based GraphLSTM, highlighting the potential of irregular graphs in location-based forecasting.
- Applying the dHEDGE algorithm in conjunction with the ConvLSTM and GraphLSTM models ensures consistently superior prediction accuracy, across all scenarios considered.
The rest of the paper is organized as follows. Section II defines the problem statement. The spatial tessellation schemes are explained in Section III, along with a brief description of the data sets used in this study. In Section IV, the spatio-temporal LSTM, ConvLSTM, GraphLSTM and the baseline models are discussed. The experimental settings and results are elaborated in Section V, followed by a description of the dHEDGE algorithm in Section VI. We conclude our results in Section VII.
II Problem setting
We formulate our problem as follows. For location-based forecasting, the city space is tessellated into $N$ regions. The set of regions $R = \{r_1, r_2, \ldots, r_N\}$ can be fixed-sized grids, variable-sized polygons, zip codes, etc. We employ (i) fixed-sized rectangular grids called geohashes and (ii) variable-sized polygon partitions called Voronoi cells. Let the demand and supply aggregated in every $r_i \in R$ form the sets $\mathcal{D} = \{d_1, d_2, \ldots, d_N\}$ and $\mathcal{S} = \{s_1, s_2, \ldots, s_N\}$. We assume that the data in the region of interest is related to its historical data and the data in its first-order neighboring regions. In this work, our objectives are two-fold. First, given the set of all geohashes, our goal is to learn a function $f(\cdot)$, mapping the demand (or supply) data in any geohash to its temporal and spatial neighbors. For the Geohash-based fixed-sized equally-spaced spatial structure, we use ConvLSTM to learn this function as:
$$[X_{t+1}, \ldots, X_{t+h}] = f\big([X_1, \ldots, X_t]\big),$$
where $X_t$ denotes the demand (or supply) observed at time step $t$ and $h$ is the forecast horizon. Second, for exploring the Voronoi-based variable-sized unequally-spaced spatial structure, we represent the tessellated city space by an undirected graph $G$, where $G = (V, E, A)$. The regions form a graph with $|V| = N$ vertices and $|E|$ edges. The connectivity between nodes is represented by an adjacency matrix $A \in \mathbb{R}^{N \times N}$. The adjacency matrix is defined as follows:
$$A_{ij} = \begin{cases} 1, & \text{if regions } r_i \text{ and } r_j \text{ share a common boundary}, \\ 0, & \text{otherwise.} \end{cases}$$
By default, $A_{ii} = 0$. For the graph learning task, our choice of modeling tool is the GraphLSTM. Here, the taxi demand-supply forecasting problem can be represented as learning the mapping function $F(\cdot)$ that maps the historical demand (or supply) to future predictions, given a graph depicting the Voronoi partitions:
$$[X_{t+1}, \ldots, X_{t+h}] = F\big([X_1, \ldots, X_t];\, G(V, E, A)\big).$$
The predictive performance of the models is compared using three error metrics, namely Symmetric Mean Absolute Percentage Error (SMAPE), Mean Absolute Scaled Error (MASE) and Root Mean Square Error (RMSE). These metrics are defined and discussed in Section V. Fig. 2 shows the flow chart of the study conducted in this paper.
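The adjacency construction described in this section can be sketched as follows. This is a minimal illustration, not the authors' code: the `neighbors` mapping (region index to the indices of regions sharing a boundary) and the function name `build_adjacency` are assumptions introduced here.

```python
def build_adjacency(neighbors, n):
    """Return an n x n 0/1 adjacency matrix from a region -> neighbor-list map."""
    A = [[0] * n for _ in range(n)]
    for i, nbrs in neighbors.items():
        for j in nbrs:
            if i != j:        # no self-loops by default, so A[i][i] stays 0
                A[i][j] = 1
                A[j][i] = 1   # undirected graph, so A is symmetric
    return A


# Hypothetical 4-region layout: region 0 borders 1 and 2; region 2 borders 3.
neighbors = {0: [1, 2], 1: [0], 2: [0, 3], 3: [2]}
A = build_adjacency(neighbors, 4)
```

Because the Voronoi cells have a varying number of neighbors, the row sums of such a matrix differ from node to node, which is what distinguishes this graph from the regular Geohash grid.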
III Spatial Tessellation schemes
Three real-world data sets are considered for our study. We use the taxi demand and supply data sets from the city of Bengaluru, India, and a publicly available taxi demand data set from the city of New York, USA.
III-A Description of the data sets
The Bengaluru taxi demand and driver supply data sets are acquired from a leading Indian e-hailing taxi service provider. The demand data contains GPS traces of taxi passengers booking a taxi by logging into their mobile application. The supply data contains GPS traces of fresh log-ins of taxi drivers, representing available supply. The data sets are available for a period of two months, from 1st of January 2016 to 29th of February 2016. The data sets contain latitude-longitude coordinates of the passenger/driver, session duration and time stamp. The latitude and longitude coordinates of the city are 12.9716° N, 77.5946° E, with an area of approximately 740 km². The publicly available New York yellow taxi data set [28] contains GPS traces of a street hailing yellow taxi service. For our study, we extract the pick-up locations and time stamps from the period of January-February 2016. The latitude and longitude coordinates of New York City are 40.7128° N, 74.0059° W, with an area of approximately 780 km². Some key statistical properties of the data sets are given in Table I.
III-B Voronoi tessellation
A prerequisite for the Voronoi tessellation is a set of generating sites that can be used to define the Voronoi cells. For this purpose, we use the K-Means clustering algorithm [29]. The algorithm has a linear memory and time complexity, which is ideal for our very large data sets, and performs reasonably well in comparison with other clustering algorithms [30]. The data is classified into a predefined set of clusters, and the centroids of these clusters act as generating sites for the Voronoi tessellation.
The K-Means algorithm aims to minimize the squared error function given as:
$$J = \sum_{j=1}^{k} \sum_{i=1}^{n} \left\| x_i^{(j)} - c_j \right\|^2,$$
where $\| x_i^{(j)} - c_j \|$ is the Euclidean distance between a data point $x_i^{(j)}$ and its center $c_j$, $k$ is the number of clusters, and $n$ is the total number of data points. For efficient rerouting of vacant taxi drivers, we partition Bengaluru into 740 clusters and New York City into 780 clusters so that the average cluster area remains close to 1 km². Note that instead of applying a separate K-Means algorithm on the Bengaluru supply data set, we associate each supply data point with its nearest demand centroid. This enables a comparative analysis of the demand and supply patterns associated with every demand region of interest. Then, the Centroidal Voronoi tessellation divides the space according to the nearest-neighbor rule, based on the K-Means centroids. Based on the closeness of centroids, this tessellation strategy produces polygon partitions of varying areas, with a time complexity of $O(n \log n)$.
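The two steps described above (K-Means to obtain generating sites, then the nearest-neighbor rule that defines the Voronoi cells) can be sketched in plain Python. This is only an illustration under stated assumptions: it uses Lloyd's iterations with random initialization, and the function names, the toy points, and the fixed iteration count are all introduced here rather than taken from the paper.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two 2-D points."""
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2


def nearest(point, centroids):
    """Index of the nearest centroid, i.e. the Voronoi cell containing `point`."""
    return min(range(len(centroids)), key=lambda j: sq_dist(point, centroids[j]))


def kmeans(points, k, iters=20, seed=0):
    """Lloyd's algorithm: alternate assignment and centroid update."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        for j, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = (sum(x for x, _ in members) / len(members),
                                sum(y for _, y in members) / len(members))
    return centroids


def objective(points, centroids):
    """Sum of squared distances from each point to its nearest centroid."""
    return sum(sq_dist(p, centroids[nearest(p, centroids)]) for p in points)


# Toy pick-up locations forming two well-separated groups.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
centroids = kmeans(points, k=2)
supply_cell = nearest((9, 9), centroids)  # a supply point mapped to a demand centroid
```

The last line mirrors the choice made in the text of attaching each supply point to its nearest demand centroid instead of clustering the supply data separately.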
III-C Geohash tessellation
Geohash tessellation is an extension of the basic grid partitioning technique with a naming convention. Each latitude-longitude coordinate is encoded into an alphanumeric string, where the length of the string denotes the level of the geohash. All latitude and longitude coordinates mapped to a specific string will form a unique fixed-sized rectangular grid. For example, a 5-level geohash spans an area of 4.9 km × 4.9 km and a 6-level geohash covers an area of 1.2 km × 0.6 km. For consistent comparison with Voronoi cells of average area 1 km², we employ 6-level geohashes for our study. Regarding time complexity, this algorithm is $O(1)$.
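As an illustration of the encoding step, the following sketch implements the standard public Geohash scheme (bits alternate between longitude and latitude, and every five bits map to one base-32 character). The function name `geohash_encode` is an assumption introduced here; the paper does not state which implementation was used.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # Geohash alphabet (no a, i, l, o)


def geohash_encode(lat, lon, precision=6):
    """Encode a latitude-longitude pair into a geohash string of given length."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    bits, n_bits = 0, 0
    use_lon = True  # the first bit always refines longitude
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits, lon_lo = bits * 2 + 1, mid
            else:
                bits, lon_hi = bits * 2, mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits, lat_lo = bits * 2 + 1, mid
            else:
                bits, lat_hi = bits * 2, mid
        use_lon = not use_lon
        n_bits += 1
        if n_bits == 5:  # five bits make one base-32 character
            chars.append(BASE32[bits])
            bits, n_bits = 0, 0
    return "".join(chars)


cell = geohash_encode(57.64911, 10.40744, precision=6)
```

Two points fall in the same level-6 cell exactly when their six-character strings are equal, so aggregating demand per cell reduces to grouping by this string.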
The Voronoi and Geohash heat maps are plotted in Fig. 3, where the scale denotes the data volume in each cell. The Geohash strategy produces rectangular grids of fixed area, resulting in several demand-dense and demand-scarce cells. The Voronoi strategy tends to distribute data uniformly and produces polygons of variable area.
IV Spatio-Temporal Models
In the previous section, we saw that the spatial distribution of the data in each partition varies significantly with the partitioning technique employed (Fig. 3). In this section, we examine whether this variation in spatial distribution has an impact on the performance of the prediction models employed in these partitions. Both the ConvLSTM and GraphLSTM models derive heavily from the LSTM network [31]. The RNN cell, from which the LSTM is developed, considers its present input and the output of the RNN cells preceding it, for its present output. The LSTM network overcomes several shortcomings of the plain RNN and learns long-term temporal dependencies. This property makes it a suitable candidate for time-series analysis. An LSTM cell has four NN units, called gates, that interact with each other. See the equations below:
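$$f_t = \sigma_g(W_f \cdot x_t + U_f \cdot h_{t-1} + b_f) \tag{2}$$
$$i_t = \sigma_g(W_i \cdot x_t + U_i \cdot h_{t-1} + b_i) \tag{3}$$
$$o_t = \sigma_g(W_o \cdot x_t + U_o \cdot h_{t-1} + b_o) \tag{4}$$
$$\tilde{C}_t = \sigma_c(W_C \cdot x_t + U_C \cdot h_{t-1} + b_C) \tag{5}$$
$$C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t \tag{6}$$
$$h_t = o_t \odot \sigma_h(C_t) \tag{7}$$
where $x_t$ is the input and $W$, $U$, and $b$ represent the input weights, recurrent weights and bias of the gate respectively. The forget gate, given by Eqn. (2), decides the amount of historical information to be discarded. The input gate in Eqn. (3) decides the values to be updated in the cell state, and the output gate in Eqn. (4) decides which parts of the cell state are output. The nonlinear activation functions $\sigma_g$, $\sigma_c$ and $\sigma_h$ squash the outputs to recommended ranges, which are usually [0,1] or [-1,1]. Matrix multiplication and element-wise product operations are denoted by the $\cdot$ and $\odot$ operators respectively. Eqn. (5) calculates a set of new candidate values $\tilde{C}_t$ to be added to the present cell state $C_t$. After the cell state is updated using Eqn. (6), the new hidden state output $h_t$ is given by Eqn. (7).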
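The gate interactions described above can be traced with a single forward step written in plain Python. This is a sketch under assumptions: the gates use a sigmoid and the candidate and output squashing use tanh (a common choice; the text only states the output ranges), and the `params` layout and function names are introduced here for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def matvec(M, v):
    """Matrix-vector product for list-of-lists matrices."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def gate(W, U, b, x, h_prev, act):
    """act(W.x + U.h_prev + b), applied element-wise."""
    wx, uh = matvec(W, x), matvec(U, h_prev)
    return [act(a + c + d) for a, c, d in zip(wx, uh, b)]


def lstm_step(x, h_prev, c_prev, params):
    """One LSTM step; params maps 'f', 'i', 'o', 'c' to (W, U, b) triples."""
    f = gate(*params["f"], x, h_prev, sigmoid)          # forget gate
    i = gate(*params["i"], x, h_prev, sigmoid)          # input gate
    o = gate(*params["o"], x, h_prev, sigmoid)          # output gate
    c_tilde = gate(*params["c"], x, h_prev, math.tanh)  # candidate values
    c = [ft * cp + it * ct for ft, cp, it, ct in zip(f, c_prev, i, c_tilde)]
    h = [ot * math.tanh(ct) for ot, ct in zip(o, c)]
    return h, c


# Hypothetical 1-input, 1-unit cell with all weights and biases set to zero.
zero = ([[0.0]], [[0.0]], [0.0])
params = {"f": zero, "i": zero, "o": zero, "c": zero}
h, c = lstm_step([3.0], [0.0], [2.0], params)
```

With zero weights every gate evaluates to 0.5 and the candidate to 0, so the step simply halves the previous cell state, which makes the role of the forget gate easy to see.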
IV-A Geohash-based ConvLSTM
The Convolutional LSTM (ConvLSTM) network [32] combines the aspects of both CNN and LSTM. It extends a fully connected LSTM network to have convolutional structures in both input-to-state and state-to-state transitions, to learn spatial dependencies. The key equations are as follows:
$$f_t = \sigma_g(W_f * X_t + U_f * H_{t-1} + b_f) \tag{8}$$
$$i_t = \sigma_g(W_i * X_t + U_i * H_{t-1} + b_i) \tag{9}$$
$$o_t = \sigma_g(W_o * X_t + U_o * H_{t-1} + b_o) \tag{10}$$
$$\tilde{C}_t = \sigma_c(W_C * X_t + U_C * H_{t-1} + b_C) \tag{11}$$
$$C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t \tag{12}$$
$$H_t = o_t \odot \sigma_h(C_t) \tag{13}$$
where the convolution operator is denoted by $*$. In mathematical terms, a ConvLSTM replaces the matrix multiplication operations in the feed-forward equations of the vanilla LSTM with convolution operations. If we consider the centroid of each partition as a node on a graph, the entire city can be represented by an undirected graph $G$. Each node represents a partition and will hold a value equal to the aggregated demand or supply for that partition. See Fig. 4 for cross-sections of the graphs obtained using the Voronoi and Geohash schemes. The Voronoi tessellated city will generate an irregular graph, where all the nodes need not have the same number of neighbors. We note that the number of neighbors varies from 3 to 10 for each Voronoi partition. On the other hand, the Geohash tessellated city forms a highly regular graph where each node has 8 neighbors at fixed relative positions. This fixed structure of a Geohash-based graph allows us to apply standard convolution operations and hence, it can be modeled using a standard ConvLSTM. For an arbitrarily structured graph like the Voronoi-based graph, localized rectangular filter operations cannot be applied. A graph-based LSTM that utilizes the adjacency matrix to depict the structure of a graph can be employed in such scenarios.
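To make the role of the fixed grid structure concrete, here is a small sketch of the localized rectangular filter operation on a 3 × 3 frame (a geohash of interest and its 8 neighbors). It is written as a cross-correlation without padding, which is how deep-learning libraries usually implement "convolution"; the function name and the toy values are assumptions introduced here.

```python
def correlate2d_valid(frame, kernel):
    """Slide `kernel` over `frame` without padding and sum the products."""
    fh, fw = len(frame), len(frame[0])
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for r in range(fh - kh + 1):
        row = []
        for c in range(fw - kw + 1):
            row.append(sum(frame[r + i][c + j] * kernel[i][j]
                           for i in range(kh) for j in range(kw)))
        out.append(row)
    return out


# Toy demand counts: the center cell (value 5) and its 8 grid neighbors.
frame = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
center_only = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]   # ignores all neighbors
mean_kernel = [[1 / 9] * 3 for _ in range(3)]     # averages the neighborhood

center_value = correlate2d_valid(frame, center_only)
neighborhood_mean = correlate2d_valid(frame, mean_kernel)
```

The kernel indexes neighbors by fixed row and column offsets, which is only meaningful when every cell has the same neighbor layout; a Voronoi cell with, say, 5 neighbors has no such offsets, hence the need for an adjacency-based formulation.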
IV-B Voronoi-based GraphLSTM
The primary step of a GraphLSTM framework [23] is to define the neighborhood. A $k$-hop neighborhood can be used to gather information from nodes that are $k$ hops away from a node of interest. In this study, we gather spatial information from the first-order neighbors, i.e., the neighbors that share a common boundary with the partition of interest. Hence, a 1-hop neighborhood is considered for the implementation of our GraphLSTM. The 1-hop neighborhood matrix for any graph $G$ is the same as its adjacency matrix $A$. To make the nodes self-accessible in the graph, the identity matrix $I$ is added to $A$, to form $\tilde{A} = A + I$. Then, the 1-hop graph convolution at time $t$ can be defined as follows:
$$GC_t = (W_{gc} \odot \tilde{A}) \cdot x_t, \tag{14}$$
where $W_{gc} \in \mathbb{R}^{N \times N}$ is the 1-hop weight matrix for the 1-hop adjacency matrix, and $x_t \in \mathbb{R}^{N}$ is the demand or supply at time $t$, where $N$ is the number of Voronoi partitions. The features extracted from the graph convolution are fed to the LSTM network. We see that the structures of the forget gate $f_t$, the input gate $i_t$, the output gate $o_t$, and the input cell state $\tilde{C}_t$ at time $t$ are similar to the vanilla LSTM. The input $x_t$ is replaced by the graph convolution features $GC_t$. A new cell state $C^{*}_{t-1}$ to incorporate the contributions of neighboring cell states is added to the framework, where $W_N$ is the corresponding weight matrix. The main equations are as follows:
$$f_t = \sigma_g(W_f \cdot GC_t + U_f \cdot h_{t-1} + b_f) \tag{15}$$
$$i_t = \sigma_g(W_i \cdot GC_t + U_i \cdot h_{t-1} + b_i) \tag{16}$$
$$o_t = \sigma_g(W_o \cdot GC_t + U_o \cdot h_{t-1} + b_o) \tag{17}$$
$$\tilde{C}_t = \sigma_c(W_C \cdot GC_t + U_C \cdot h_{t-1} + b_C) \tag{18}$$
$$C^{*}_{t-1} = W_N \odot (\tilde{A} \cdot C_{t-1}) \tag{19}$$
$$C_t = f_t \odot C^{*}_{t-1} + i_t \odot \tilde{C}_t \tag{20}$$
$$h_t = o_t \odot \sigma_h(C_t) \tag{21}$$
With the addition of $C^{*}_{t-1}$ in Eqn. (20), the influence of the neighboring cell states will be considered during the recurrent updates of the cell state. We then compare the ConvLSTM and GraphLSTM networks against LSTM networks modeled using Voronoi and Geohash features. The ARIMA and ARIMAX models are also used as baselines to explore the assumption that the relationship between the data in adjacent partitions is linear.
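The 1-hop graph convolution described in this subsection can be illustrated with a short sketch. It assumes the form "element-wise product of a weight matrix with the self-loop-augmented adjacency matrix, applied to the node signal"; the function name, the toy path graph, and the all-ones weights are assumptions made for the example.

```python
def graph_conv_1hop(A, W, x):
    """Compute (W * (A + I)) x, where * is the element-wise product."""
    n = len(A)
    # Add the identity so that each node also sees its own value.
    A_tilde = [[A[i][j] + (1 if i == j else 0) for j in range(n)] for i in range(n)]
    return [sum(W[i][j] * A_tilde[i][j] * x[j] for j in range(n)) for i in range(n)]


# Hypothetical path graph 0 - 1 - 2 with demand values at the three nodes.
A = [[0, 1, 0],
     [1, 0, 1],
     [0, 1, 0]]
W = [[1.0] * 3 for _ in range(3)]
x = [1.0, 2.0, 4.0]
features = graph_conv_1hop(A, W, x)

# A weight between non-adjacent nodes is masked out by the adjacency matrix.
W_masked = [row[:] for row in W]
W_masked[0][2] = 100.0
features_masked = graph_conv_1hop(A, W_masked, x)
```

The masking is the point of the element-wise product: only weights on existing edges (and self-loops) can influence the output, so the number of effective parameters follows the irregular neighbor structure of the Voronoi graph.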
IV-C LSTM
For a region $r_i$ (Voronoi cell or geohash), we first feed the demand/supply from $r_i$ alone to the LSTM network. Then, to analyze the effect of spatial neighbors, we feed data from $r_i$ and its first-order neighbors to the LSTM network. We vary the number of first-order neighbors to arrive at the best spatial configuration.
IV-D ARIMA and ARIMAX
For linear modeling, we consider two models: (i) ARIMA (Auto Regressive Integrated Moving Average), and (ii) ARIMAX (ARIMA with eXogenous inputs). After examining several regression models, we observed that ARIMA is a satisfactory fit for a majority of Voronoi cells and geohashes. Hence, we aim to fit a single ARIMA and ARIMAX model for the entire city. The ARIMA model belongs to the class of statistical modeling, with an Auto Regressive (AR) part to model the changing variable as a regression on its own lagged values, an Integrated (I) part to produce stationary series, and a Moving Average (MA) part to incorporate the dependency between an observation and the residual errors obtained from a moving average model applied to lagged observations. The model is represented as:
$$y'_t = \sum_{i=1}^{p} \phi_i\, y'_{t-i} + \sum_{j=1}^{q} \theta_j\, \varepsilon_{t-j} + \varepsilon_t,$$
where $y_t$ is the demand/supply to be predicted at time $t$, $y'_t$ is the differenced form of $y_t$, $p$ and $\phi_i$ are the order and parameters of the AR process, $q$ and $\theta_j$ are the order and parameters of the MA process, and $\varepsilon_t$ is the forecast error. Historical information from the variable of interest alone is taken into consideration for the standard ARIMA model. The ARIMAX model is an extension of ARIMA that provides a framework to include information from the neighboring regions (i.e., covariates). We employ the ARIMAX to analyze the extent of spatial information captured with different tessellation schemes. The ARIMAX model is defined as:
$$y'_t = \beta\, x_t + \sum_{i=1}^{p} \phi_i\, y'_{t-i} + \sum_{j=1}^{q} \theta_j\, \varepsilon_{t-j} + \varepsilon_t,$$
where $x_t$ is the covariate at time $t$, and the parameter $\beta$ includes the lagged versions of the covariate. The time-sequences from the positively correlated first-order neighboring regions serve as covariates in our study.
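For intuition, a one-step-ahead prediction from already-fitted coefficients can be sketched as below. This is not a fitting procedure and not the authors' implementation: the coefficient values, the single covariate, first-order differencing, and the function names are all assumptions chosen for the example.

```python
def difference(series):
    """First-order differencing, used to remove a trend."""
    return [b - a for a, b in zip(series, series[1:])]


def arimax_one_step(y_diff, residuals, phi, theta, beta=0.0, x_t=0.0):
    """Predict the next differenced value from AR, MA and covariate terms."""
    ar = sum(p * y_diff[-i] for i, p in enumerate(phi, start=1))
    ma = sum(q * residuals[-j] for j, q in enumerate(theta, start=1))
    return ar + ma + beta * x_t


# Hypothetical hourly demand and previously observed forecast errors.
y = [10.0, 12.0, 15.0, 19.0]
y_diff = difference(y)               # [2.0, 3.0, 4.0]
residuals = [0.0, 0.0, 1.0]

# ARIMA(1,1,1): no covariate term.
step_arima = arimax_one_step(y_diff, residuals, phi=[0.5], theta=[0.25])
# ARIMAX: add a neighbor's demand as covariate with coefficient beta.
step_arimax = arimax_one_step(y_diff, residuals, phi=[0.5], theta=[0.25],
                              beta=0.5, x_t=3.0)
forecast_level = y[-1] + step_arimax  # undo the differencing
```

Setting `beta` to zero recovers the plain ARIMA prediction, which is why comparing the two models isolates the contribution of the neighboring regions.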
V Experiments
In this section, we discuss the hyper-parameter optimization techniques for the models, evaluation metrics, and inferences obtained on performing the comparison study.
V-A Experimental settings
Before modeling the data using any NN model, it is necessary to set the optimal hyper-parameters. Hyper-parameters are the model-specific properties that are to be fixed before the training phase of the model. They define the high-level properties of the model such as the time complexity or the learning rate. Out of the various algorithms available for hyper-parameter optimization, Bayesian Optimization is widely used in the recent machine learning literature [33]. In this study, we use the Tree-structured Parzen Estimator Bayesian Optimization (TPE-BO) [34] approach for tuning the hyper-parameters. This algorithm uses Parzen estimators to model the error distribution as non-parametric densities. The range of the hyper-parameters fed to the TPE algorithm is given in Table II. Additionally, for ConvLSTM, we vary the number of filters from 16 to 258. The RMSprop is shortlisted as the optimization function for our data sets. The choice of the activation function at the output dense layer is Relu. The Relu activation is recommended for data sets such as passenger count and taxi supply as it allows the output to vary linearly, with a minimum at zero. That is, the output is zero if the input to the activation function is less than zero. In case the input is greater than zero, the output is equal to the input. The dropout values, number of layers and learning rates are optimized for each data set, tessellation technique, and NN model. In addition to the hyper-parameter values suggested by the TPE-BO, we manually tune the parameters to arrive at the best prediction accuracy.
The K-Means clustering algorithm generates a set of $N$ regions of interest. The variable $N$ takes values 740 and 780 for Bengaluru and New York City respectively. For each region $n \in \{1, \ldots, N\}$, two time-sequences are generated using the Voronoi and Geohash tessellation strategies. The data is aggregated over 60 minutes for 60 days, generating time-sequences of length $T = 1440$ time steps. To implement GraphLSTM for Voronoi tessellation, we pick the 1-hop neighbors for every node $n$. The GraphLSTM receives inputs of the form $X \in \mathbb{R}^{T \times N}$, along with an adjacency matrix $A \in \mathbb{R}^{N \times N}$. The adjacency matrix encapsulates information from the first-order neighbors. For the GraphLSTM, we consider a hidden layer with dimension equal to the number of nodes in the graph. For ConvLSTM, we consider frames of size 3 × 3. This particular configuration allows us to capture information from a 6-level geohash of interest (the center pixel) and 8 first-order neighbors. The ConvLSTM framework receives inputs of the form $X \in \mathbb{R}^{T \times 3 \times 3 \times 1}$, where frame sizes are of dimension 3 × 3, along a single channel. For the LSTM network, the inputs are of the form $X \in \mathbb{R}^{T \times S}$, where $S$ is the number of spatial neighbors. Note that while a geohash has a fixed number (8) of first-order neighbors, a Voronoi cell has a variable number of first-order neighbors. This corresponds to an $S$ value of 8 for the Geohash LSTM. For consistent comparison, while training the Voronoi input based LSTM, we pick features from the top 8 positively correlated Voronoi neighbors. For Voronoi cells with fewer than 8 neighbors, we compensate for the lack of features by introducing invalid feature vectors to differentiate them from useful information.
For ARIMA and ARIMAX models, we vary the AR and MA orders over the range [0, 5], and the time-sequences are differenced whenever non-stationary behavior is encountered. While fitting the LSTM and ARIMAX models to the city, we varied the number of spatial Voronoi and Geohash features included in the model, to find the best spatial configurations for each data set. All the NN models are trained with a batch size of 64 for 500 epochs, and training is repeated 5 times to compensate for the random initialization of network weights. MinMax scaling is applied to the inputs before they are fed to the various networks. An early stopping mechanism is employed to prevent over-fitting.
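The preprocessing described in this subsection (aggregating events into 60-minute bins and applying MinMax scaling) can be sketched as follows. The timestamp representation in seconds, the function names, and the handling of a constant series are assumptions made for this illustration.

```python
from collections import Counter


def hourly_counts(timestamps_s, start_s, n_steps, step_s=3600):
    """Count events per time bin; bins with no events get a count of 0."""
    counts = Counter((ts - start_s) // step_s for ts in timestamps_s)
    return [counts.get(k, 0) for k in range(n_steps)]


def minmax_scale(series):
    """Rescale a series linearly to the range [0, 1]."""
    lo, hi = min(series), max(series)
    if hi == lo:  # constant series: map everything to 0.0 (an assumption)
        return [0.0] * len(series)
    return [(v - lo) / (hi - lo) for v in series]


# Hypothetical booking times (seconds since the start of the data set).
bookings = [0, 10, 3600, 7199, 7200]
demand = hourly_counts(bookings, start_s=0, n_steps=3)
scaled = minmax_scale(demand)

steps_per_region = 24 * 60  # 24 hourly bins per day over 60 days
```

In practice the scaling bounds would be computed on the training portion only and reused for the test day, so that no information from the test set leaks into training.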
V-B Evaluation metrics
For each data set, data from the first 59 days is used for training the models. The models are then tested on the 60th day. We keep aside 10% of the training data for validation purposes. We employ three widely used performance metrics to evaluate the models:
1. Symmetric Mean Absolute Percentage Error (SMAPE):
$$\mathrm{SMAPE} = \frac{100}{k} \sum_{t=1}^{k} \frac{|\hat{y}_t - y_t|}{\hat{y}_t + y_t + 1}$$
2. Mean Absolute Scaled Error (MASE):
$$\mathrm{MASE} = \frac{1}{k} \sum_{t=1}^{k} \frac{|\hat{y}_t - y_t|}{\frac{1}{T_{\mathrm{tr}} - m} \sum_{j=m+1}^{T_{\mathrm{tr}}} |y_j - y_{j-m}|}$$
3. Root Mean Square Error (RMSE):
$$\mathrm{RMSE} = \sqrt{\frac{1}{k} \sum_{t=1}^{k} (\hat{y}_t - y_t)^2}$$
where $k$ is the forecast horizon, $m$ is the seasonal period, $y_t$ is the actual demand, $T_{\mathrm{tr}}$ is the length of the training set, and $\hat{y}_t$ is the forecast at time $t$. The RMSE gives relatively high weights to large errors. The SMAPE is an accuracy measure based on percentage errors. Both RMSE and SMAPE are scale dependent errors. The MASE compares the forecast errors of the test set with the in-sample forecast errors from the standard Naïve model, making it scale independent. Since these performance metrics are based on the $L_1$-norm and $L_2$-norm errors, we define the loss function over which the optimization is performed as:
$$\mathcal{L} = \mathrm{MSE} + \mathrm{MAE}$$
Since the loss function is the sum of the Mean Squared Error (MSE) and Mean Absolute Error (MAE), the models will optimize for both $L_1$ and $L_2$ errors.
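For concreteness, the three metrics and the combined loss can be sketched in plain Python as below. The function names and the toy numbers are illustrative assumptions. In particular, the SMAPE here uses a denominator of actual + forecast + 1, an assumed variant that stays finite when a cell has zero demand, and the MASE scales by the in-sample error of a seasonal naive forecast with an assumed period `m`.

```python
import math
from typing import Sequence


def smape(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """Percentage error with an assumed '+1' guard against zero denominators."""
    k = len(actual)
    return 100.0 / k * sum(abs(f - a) / (a + f + 1.0) for a, f in zip(actual, forecast))


def mase(actual: Sequence[float], forecast: Sequence[float],
         train: Sequence[float], m: int = 1) -> float:
    """Test MAE divided by the in-sample MAE of the seasonal naive forecast."""
    if len(train) <= m:
        raise ValueError("training series must be longer than the seasonal period")
    naive_mae = sum(abs(train[j] - train[j - m]) for j in range(m, len(train))) / (len(train) - m)
    if naive_mae == 0:
        raise ValueError("naive in-sample error is zero; MASE is undefined")
    mae = sum(abs(f - a) for a, f in zip(actual, forecast)) / len(actual)
    return mae / naive_mae


def rmse(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """Square root of the mean squared forecast error."""
    return math.sqrt(sum((f - a) ** 2 for a, f in zip(actual, forecast)) / len(actual))


def combined_loss(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """Training objective: MSE plus MAE, covering both L2 and L1 errors."""
    k = len(actual)
    mse = sum((f - a) ** 2 for a, f in zip(actual, forecast)) / k
    mae = sum(abs(f - a) for a, f in zip(actual, forecast)) / k
    return mse + mae


# Toy values, chosen only to make the arithmetic easy to follow.
y_true = [10.0, 20.0]
y_pred = [12.0, 18.0]
history = [1.0, 2.0, 3.0, 4.0, 5.0]
```

On the toy values, both forecast errors equal 2, so the MAE is 2, the MSE is 4, and the combined loss is 6.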
V-C Experimental results
Table III summarizes the numerical results of the comparison study. The results are obtained by applying the regression and NN models to the three data sets aggregated using the Voronoi and Geohash tessellation schemes. The standard deviation factor accounts for the variability across different locations and multiple runs. The main inferences drawn from the study are as follows:
1. Even though the overall performance of the linear regression models is inferior to that of the non-linear neural models, their high computational speed and comparable performance in certain scenarios are to be noted.
2. The entire set of first-order neighbors may not be necessary to achieve the best spatial model configuration.
3. The prediction performance of the GraphLSTM is better than that of the standard ConvLSTM on the majority of test cases, and this was achieved with lower computational complexity.
4. Across data sets and metrics, the irregular Voronoi graph based GraphLSTM performs comparably to or better than the regular Geohash graph based GraphLSTM, suggesting the potential of irregular graphs in location-based forecasting.
5. The lack of a universal winning tessellation strategy is noted.
The ARIMA model achieves better prediction accuracy with Voronoi tessellation based input features than with Geohash features. The overall accuracy improved on incorporating spatial information through ARIMAX models. However, we notice that the ARIMAX models resulted in performance deterioration for the Bengaluru demand data set. To investigate this behavior further, we modeled the top 50 demand scarce and demand dense regions in Bengaluru using independent ARIMAX models and saw clear improvements in accuracy. This points towards the inability of a linear regression model to satisfactorily capture spatial information using a single model for the entire city. Hence, with regression models, we do not recommend modeling the entire city with a single model. Voronoi and Geohash based LSTM models show consistent improvements in accuracy on incorporating information from spatial neighbors. We notice that not all of the first-order neighbors are necessarily required to arrive at the best spatio-temporal model configuration.
The ConvLSTM model based on the Geohash strategy achieves good prediction accuracy on the Bengaluru data set but fails to perform well on the New York City data set. This trend is seen across deeper neural layers and different filter depths. On the other hand, the Voronoi-based GraphLSTM exhibits consistently high prediction performance across multiple scenarios for both cities. The poor performance of the Geohash-based ConvLSTM on the NYC demand data can be attributed to the highly skewed spatial data distribution: 90% of the total data is concentrated around the Manhattan borough, leaving the other four boroughs with 10% of the total demand. Employing a fixed-sized partitioning scheme in a space where the data is non-uniformly distributed is not an efficient model setting. Geohash partitioning results in a large number of demand scarce cells in some boroughs, affecting model performance. Meanwhile, K-Means based Voronoi tessellation attempts to distribute the data uniformly across the partitions, resulting in a lower number of demand scarce cells. A GraphLSTM based on such an efficient model setting achieves high prediction performance.
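The effect of the partitioning scheme on demand scarce cells can be illustrated with a toy example. The sketch below counts points per cell under a fixed grid and under a Voronoi partition (nearest-centroid assignment). Everything in it is hypothetical: the ten points, the four hand-picked centroids standing in for K-Means output, and the scarcity threshold of 2 are chosen only to make the contrast visible, and do not come from the data sets in this study.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def voronoi_counts(points: Sequence[Point], centroids: Sequence[Point]) -> List[int]:
    """Count points per Voronoi cell; a point belongs to its nearest centroid."""
    counts = [0] * len(centroids)
    for x, y in points:
        nearest = min(
            range(len(centroids)),
            key=lambda i: (x - centroids[i][0]) ** 2 + (y - centroids[i][1]) ** 2,
        )
        counts[nearest] += 1
    return counts


def grid_counts(points: Sequence[Point], cell_size: float, cells_per_side: int) -> List[int]:
    """Count points per fixed-size square cell, returned in row-major order."""
    counts = [[0] * cells_per_side for _ in range(cells_per_side)]
    for x, y in points:
        col = min(int(x // cell_size), cells_per_side - 1)
        row = min(int(y // cell_size), cells_per_side - 1)
        counts[row][col] += 1
    return [c for row in counts for c in row]


def scarce_cells(counts: Sequence[int], threshold: int) -> int:
    """Number of cells holding fewer than `threshold` points."""
    return sum(1 for c in counts if c < threshold)


# Skewed toy demand in the unit square: eight pickups in one corner, two elsewhere.
pickups = [(0.1, 0.1), (0.2, 0.1), (0.1, 0.2), (0.2, 0.2), (0.3, 0.3),
           (0.4, 0.4), (0.3, 0.4), (0.4, 0.3), (0.9, 0.9), (0.6, 0.2)]
# Hypothetical centroids, denser where the demand is denser.
centers = [(0.15, 0.15), (0.3, 0.35), (0.4, 0.35), (0.75, 0.55)]

per_grid_cell = grid_counts(pickups, cell_size=0.5, cells_per_side=2)
per_voronoi_cell = voronoi_counts(pickups, centers)
```

With the same number of cells, the grid leaves three of its four cells below the threshold, while the Voronoi partition leaves one.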
For further validation of this observation, we represent the Geohash partitions as nodes on a regular graph and conduct Geohash-based GraphLSTM modeling. We find that the Geohash-based GraphLSTM is also unable to model the data satisfactorily, resulting in high variability in the RMSE and MASE. This highlights the importance of choosing an appropriate tessellation strategy, irrespective of the modeling technique used. In this case, the right choice of the partitioning technique resulted in an 80% improvement in RMSE. The Bengaluru demand and supply data sets have a uniform spatial distribution, and hence, a Geohash partitioning scheme based model works sufficiently well. Therefore, we conclude that the choice of the spatial partitioning technique depends on the data distribution. Even then, the Voronoi partitioning scheme based model exhibits competitive prediction performance at a lower computational cost. This shows that an appropriate tessellation strategy can reduce the complexity of the network to be built, without compromising on the prediction accuracy. The GraphLSTM has roughly the computational complexity of the vanilla LSTM, which is much lower than that of the ConvLSTM. While the ConvLSTM has additional convolutional layers that increase the number of matrix operations, the GraphLSTM builds on the vanilla LSTM architecture, with one additional gate and some changes to the input. Note that we measure the computational complexity in terms of the matrix operations to be performed. The lower error variance of the GraphLSTM in comparison with that of the ConvLSTM suggests that the consistency in predictions is also maintained across various locations in the city. To summarize, the consistent performance of the GraphLSTM across scenarios, combined with its low computational complexity, merits attention and further exploration in the context of location-based passenger demand and driver supply forecasting.
While we stress the significance of selecting an appropriate tessellation strategy while modeling, we observe that the best tessellation strategy, and hence the prediction model, varies with the choice of data set and performance metric. While the GraphLSTM has an overall favorable performance across data sets, there are instances where the ConvLSTM outperforms the GraphLSTM (e.g., on the Bengaluru demand data). To ensure good prediction performance across all scenarios, we explore ensemble learning in the next section. Fig. 5 shows the GraphLSTM based demand-supply predictions plotted against the actual patterns in a top Voronoi cell in Bengaluru. We see that the supply predicted from the historical data is a better match than the actual supply for the existing demand patterns in that region. Hence, a rerouting decision based on the predicted supply may reduce the demand-supply mismatch.
VI Combining models with dHEDGE algorithm
The applicability of ensemble learning in a non-stationary environment that involves tessellation strategies was put forward in [4], where we applied the dHEDGE ensemble learning algorithm to combine the tessellation strategies. We observed that the regression models based on any one of the tessellation strategies were unable to yield optimal results for the entire forecast horizon. On exploring various LSTM networks, we note that this observation extends to the performance of RNN models as well, thereby strengthening the claim put forward in [4]. In Fig. 6, we see that the best prediction technique varies with the time of the day. The best strategy switches between the Voronoi and Geohash tessellation based techniques throughout the forecast horizon. We find that irrespective of the modeling tool used, there is no universally superior tessellation strategy. This is in addition to the dependency of the strategies on the data set and performance metric (Table III).
To compensate for the lack of a winning strategy, we use the dHEDGE algorithm [27], a variant of the well-known HEDGE algorithm [35] suited to non-stationary environments. By exponentially reducing the weights associated with each expert (i.e., prediction model), dHEDGE takes into account the non-stationarity of the process. In our case, we have three experts: the Voronoi and Geohash based GraphLSTM models, and the ConvLSTM model. The weight initialization can be performed either uniformly or based on some prior knowledge about the experts. We initialize the weights based on a holdout validation set. The weights are updated based on the previous weights, a discounting factor $\gamma$, a learning factor $\beta$, and a loss function $l$. The loss function is based on the prediction errors observed by the experts at time $t$. Thus, for each time step $t$, the weights are updated for the $i$-th expert as:
$$w_i[t+1] = w_i[t]^{\gamma} \cdot \beta^{\, l_i[t]}$$
The discounting and learning factors are chosen based on the validation set. The performance of the algorithm can be seen in Fig. 6. At each time step in the forecasting horizon, we pick the expert with the highest weight and use its predictions. We find that the algorithm tracks the best shifting expert by giving more weight to the behavior of each expert in the recent past. The interested reader can refer to [4] for a detailed analysis of the algorithm. The prediction accuracy after applying the dHEDGE algorithm on the three experts can be seen in Table III. The algorithm consistently results in an accuracy close to that of the best expert in the pool. To demonstrate the flexibility of our algorithm in adapting to various scenarios, we evaluate its performance on the New York demand data set. While the Voronoi GraphLSTM achieves good prediction accuracy, the two Geohash-based models perform poorly. When dHEDGE is applied on these three experts, it achieves an accuracy close to that of the Voronoi GraphLSTM and is unaffected by the poor performance of the other two experts. Further, in some use cases, we note that combining the experts produces accuracy levels better than that of any of the individual experts. This behavior is attributed to the time-dependent behavior of the models. In conclusion, our algorithm provides consistent performance across data sets and performance metrics, eliminating various dependencies of the prediction models.
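The expert-selection loop described above can be sketched as follows. This is an illustrative approximation rather than the algorithm as specified in [4] or [27]: the update form (previous weight raised to a discounting factor `gamma`, multiplied by a learning factor `beta` raised to the loss), the per-step normalization, the scaling of absolute errors into [0, 1], and the parameter values are all assumptions made for the example.

```python
from typing import List, Sequence, Tuple


def dhedge_update(weights: Sequence[float], losses: Sequence[float],
                  gamma: float, beta: float) -> List[float]:
    """One assumed discounted-HEDGE step: w_i <- w_i**gamma * beta**loss_i, normalized."""
    raw = [(w ** gamma) * (beta ** loss) for w, loss in zip(weights, losses)]
    total = sum(raw)
    return [r / total for r in raw]


def run_dhedge(expert_preds: Sequence[Sequence[float]], actual: Sequence[float],
               init_weights: Sequence[float], gamma: float = 0.5,
               beta: float = 0.1) -> Tuple[List[int], List[float]]:
    """Pick the top-weighted expert at each step, then update on the observed errors."""
    total = sum(init_weights)
    weights = [w / total for w in init_weights]
    chosen = []
    for t, y in enumerate(actual):
        # Decide before the outcome at time t is revealed.
        chosen.append(max(range(len(weights)), key=lambda i: weights[i]))
        errors = [abs(preds[t] - y) for preds in expert_preds]
        scale = max(errors) or 1.0  # assumed scaling of losses into [0, 1]
        weights = dhedge_update(weights, [e / scale for e in errors], gamma, beta)
    return chosen, weights


# Two hypothetical experts: the first is accurate early on, the second later.
demand = [10.0, 10.0, 10.0, 10.0]
expert_a = [10.0, 10.0, 20.0, 20.0]
expert_b = [20.0, 20.0, 10.0, 10.0]
picks, final_weights = run_dhedge([expert_a, expert_b], demand, init_weights=[1.0, 1.0])
```

Because the exponent `gamma` shrinks the influence of older weights, the selection switches to the second expert one step after it starts to outperform the first; a `gamma` closer to 1 would make the switch slower.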
VII Concluding Remarks
In the context of e-hailing taxi services, generating accurate demand-supply forecasts is instrumental in minimizing customer wait times and maximizing driver utilization. Neural Network-based taxi demand or supply forecasting commonly uses a fixed-sized equally-spaced spatial partitioning scheme. In this paper, we explored the impact of different spatial partitioning schemes on the predictive performance of LSTM (Long Short-Term Memory) models. By comparing ConvLSTM (Convolutional LSTM) and GraphLSTM (Graph-based LSTM), we draw attention to the potential of learning the partitioned city structure as a graph and applying Graph-based Neural Networks.
When evaluated on three large-scale real-world data sets, GraphLSTM emerged as a promising candidate for location-based taxi demand-supply forecasting. The GraphLSTM offered competitive prediction performance against ConvLSTM at a much lower computational complexity. The comparison between GraphLSTM models based on regular and irregular graphs revealed the potential of irregular graphs in the context of location-based forecasting.
In addition to the proposal to use irregular graph based GraphLSTM for taxi demand-supply forecasting, the findings in this paper recommend exploration and selection of a suitable tessellation strategy prior to fitting a Neural Network model. The choice of a suitable prediction model was found to depend on the properties of the data set and the performance metric employed.
To achieve superior performance across all scenarios, we recommend the dHEDGE based ensemble learning algorithm. By employing dHEDGE in conjunction with the GraphLSTM and ConvLSTM models, we consistently achieved a prediction accuracy close to the best model at each time instant across the data sets considered, with different performance metrics.
VII-A Avenues for further research
This paper was directed towards accurate forecasting of taxi demand and supply, where we highlighted the potential of Graph-based LSTM networks. A detailed analysis of GraphLSTM can be performed, using more real-world data sets. Further, we note that demand-supply mismatches also occur when there are unexpected spikes in demand. In our future work, we aim to detect and include such anomalous events in the prediction model to achieve better predictions.
References
- [1] K. L. Colborn, E. Giorgi, A. J. Monaghan, E. Gudo, B. Candrinho, T. J. Marrufo, and J. M. Colborn, “Spatio-temporal modelling of weekly malaria incidence in children under 5 for early epidemic detection in Mozambique,” Scientific Reports, vol. 8, p. 9238, 2018.
- [2] A. A. Ezzat, M. Jun, and Y. Ding, “Spatio-temporal asymmetry of local wind fields and its impact on short-term wind forecasting,” IEEE Transactions on Sustainable Energy, vol. 9, pp. 1437–1447, 2018.
- [3] X. Wang, Z. Zhou, F. Xiao, K. Xing, Z. Yang, Y. Liu, and C. Peng, “Spatio-temporal analysis and prediction of cellular traffic in metropolis,” IEEE Transactions on Mobile Computing (Early Access), 2018.
- [4] N. Davis, G. Raina, and K. Jagannathan, “Taxi demand forecasting: A HEDGE-based tessellation strategy for improved accuracy,” IEEE Transactions on Intelligent Transportation Systems, vol. 19, pp. 3686–3697, 2018.
- [5] C. Kamga, M. A. Yazici, and A. Singhal, “Analysis of taxi demand and supply in New York City: implications of recent taxi regulations,” Transportation Planning and Technology, vol. 38, pp. 601–625, 2015.
- [6] B. Jäger, M. Wittmann, and M. Lienkamp, “Analyzing and modeling a city’s spatiotemporal taxi supply and demand: A case study for Munich,” Journal of Traffic and Logistics Engineering, vol. 4, pp. 147–153, 2016.
- [7] Y. Tong, Y. Chen, Z. Zhou, L. Chen, J. Wang, Q. Yang, J. Ye, and W. Lv, “The simpler the better: a unified approach to predicting original taxi demands based on large-scale online platforms,” in Proc. of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1653–1662, 2017.
- [8] J. Ke, H. Zheng, H. Yang, and X. M. Chen, “Short-term forecasting of passenger demand under on-demand ride services: A spatio-temporal deep learning approach,” Transportation Research Part C: Emerging Technologies, vol. 85, pp. 591–608, 2017.
