Approximating Optimisation Solutions for Travelling Officer Problem with Customised Deep Learning Network
Wei Shao, Flora D. Salim, Jeffrey Chan, Sean Morrison, Fabio Zambetta

TL;DR
This paper proposes a novel deep learning approach to approximate solutions for the Travelling Officer Problem, transforming it into a classification task and demonstrating its effectiveness on real-world data.
Contribution
It introduces a customized deep neural network architecture for the Travelling Officer Problem and analyzes key architectural factors influencing performance.
Findings
The network effectively approximates traditional solutions.
Architectural components significantly impact performance.
Demonstrated on real-world parking violation data.
Abstract
Deep learning has been extended to a number of new domains with critical success, though some traditional orienteering problems such as the Travelling Salesman Problem (TSP) and its variants are not commonly solved using such techniques. Deep neural networks (DNNs) are a potentially promising and under-explored solution to solve these problems due to their powerful function approximation abilities, and their fast feed-forward computation. In this paper, we outline a method for converting an orienteering problem into a classification problem, and design a customised multi-layer deep learning network to approximate traditional optimisation solutions to this problem. We test the performance of the network on a real-world parking violation dataset, and conduct a generic study that empirically shows the critical architectural components that affect network performance for this problem.
| #Nodes | Greedy | DNN-Greedy | ACO | DNN-ACO |
| 10 | 3.12s | 0.24s | 74.89s | 0.27s |
| 20 | 6.99s | 0.57s | 152.51s | 0.45s |
| 30 | 9.23s | 1.04s | 231.50s | 0.89s |
| 40 | 11.99s | 1.33s | 323.39s | 1.21s |
| 50 | 16.10s | 1.72s | 405.44s | 1.53s |
Wei Shao1 (Contact Author)
Flora D. Salim2
Jeffrey Chan2,3
Sean Morrison4
1RMIT University
1 Introduction
The travelling officer problem (TOP) is a variant of the travelling salesman problem (TSP) Shao et al. (2017) that additionally makes use of contextual and historical data. Parking violations have become a prominent challenge for administrations in most big cities. Parking officers need to place infringement notices on violating cars before they leave the parking zone, which is difficult for two reasons. Firstly, the majority of infringing vehicles leave within a short period. Secondly, many violation events occur at the same time across a large area. As shown in Figure 1, parking officers must balance the travel time from their current location to each infringing vehicle against the probability that the vehicle will have left by the time they arrive.
Shao et al. (2017) previously defined this problem and demonstrated two heuristic solutions for generating paths using spatio-temporal data collected from on-ground sensors in parking spaces. The paths generated by these optimisation methods (e.g. the red line in Figure 1) were shown to be better than the First-Come-First-Serve solutions (e.g. the yellow line in Figure 1), and achieved a higher return on parking fines when evaluated on the real-world parking dataset Shao et al. (2017). Despite this, these optimisation solutions are not efficient enough for real-time application, and the short-lived nature of parking violations makes slower traditional methods difficult to apply.
Deep learning models can achieve real-time performance since training and inference are two distinct sessions. Training can be done offline, and inference is a single feed-forward pass, making neural networks well suited to problems like the TOP where fast evaluation is desired. However, most current classification approaches solve supervised learning problems for which the solution is known and provided as a training dataset. In addition, the TOP is a typical optimisation problem, and deep learning models are not typically used to solve optimisation problems directly. To leverage deep learning models, the TOP needs to be transformed into a supervised classification problem, which in turn requires good candidate paths to serve as labels.
To overcome the above challenges, we use solutions generated by optimisation methods as labels for training. We propose a spatio-temporal data segmentation approach to transform the optimisation problem into a classification problem, and design a deep feed-forward neural network to approximate optimisation solutions. Training deep neural networks to replace optimisation has many advantages in this case: the computationally expensive optimisation problem can be solved as part of the training session, and once this is done, the test session can roll out trajectories that approximate those of the original optimiser with simple feed-forward computation. Moreover, neural networks scale very well, allowing such a technique to take advantage of a huge amount of additional contextual and temporal information and explore the unclear structure of this data.
We choose deep neural networks (DNNs) rather than traditional machine learning classifiers such as Random Forest Breiman (2001) and SVM Suykens and Vandewalle (1999) because DNNs can be customised to our purpose. To approximate a specific optimisation solution, we can customise a DNN so that it learns each operation of that optimisation method.
There are challenges in both the transformation and learning tasks. Firstly, traditional orienteering problems focus only on the spatial domain, usually in the form of a static 2D graph. The TOP also considers the temporal domain, which consists of many temporal views of this same spatial graph, so we need a method that works with both spatial and temporal information. Secondly, classification methods need input features and corresponding labels, but the TOP does not provide any existing solutions or features, and it is non-trivial to integrate the optimisation solution with classification to solve this problem. Finally, it is unclear how to use deep neural networks effectively to approximate the optimisation solution in orienteering problems. As a result, an appropriate deep learning architecture needs to be designed for this type of problem.
This paper explores the use of deep learning techniques for the solution of the TOP, and makes the following contributions:
- We propose a generic framework to solve the travelling officer problem that incorporates optimisation approaches and deep neural networks.
- We propose a novel segmentation method to transform a spatio-temporal graph into a sequence of features.
- We are the first group to customise a neural network to approximate the greedy algorithm.
- We validate our claim that the TOP can be solved using a combination of both optimisation solutions and neural network classifiers through extensive experiments with a large real-world dataset, including a comparison with traditional machine learning methods.
The paper is organised as follows: Section 2 discusses the related work; some preliminary studies are shown in Section 3; Section 4 presents the problems and corresponding methodologies in both data representation and deep learning architecture; Section 5 shows the experiments and comparison studies; Section 6 discusses the limitations of this method, and future work; Section 7 concludes the paper.
2 Related Work
Related work in this area falls into two categories: 1) Neural network solutions for the TSP, and 2) general work in the intersection of deep learning and optimisation.
There is an extensive body of research in applying neural networks to TSP variants going back to Hopfield and Tank (1985), though the networks used in these studies typically fall into the categories of Hopfield networks and self-organising feature map networks (see La Maire and Mladenov (2012); Abdel-Moetty (2010); Potvin (1992)). Hopfield networks are fully recurrent, and memorise training examples by minimising an energy cost function. However, many recent advances in deep learning have been made with feed-forward neural networks, on the back of better optimisation algorithms and the ability to train on large datasets. Though Hopfield networks have been used for classification tasks, their performance is not as good as that of modern deep learning techniques. On combinatorial tasks, the number of neurons required by a Hopfield network scales with $n^2$ (where $n$ is the number of nodes in the graph), which can be problematic for larger graphs. Our study differs from these previous works in two ways. Firstly, we approximate a solution to a more difficult, time-dependent version of the TSP (the TOP). Our problem is focused on maximising a temporally-dependent reward, rather than navigating a geographically-fixed set of nodes. Using different temporal views of the data to generate training samples can dramatically increase the size of the training set (this is inspired by Liu et al. (2016)). Secondly, rather than using Hopfield networks, we propose re-framing the problem as a supervised learning task for classification. Under this framework, we use an optimisation algorithm to generate the training set for a classifier, which is then trained to generate a trajectory through a given graph.
Additionally, there are numerous recent works that effectively combine optimisation with deep learning. Fischetti and Jo modelled deep neural networks as a 0-1 mixed integer linear program Fischetti and Jo (2017). Galassi et al. used a deep neural net to learn the structure of a combinatorial problem, and mentioned that such research is still at an early stage Galassi et al. (2018). Our work makes a small contribution to this area.
3 Background
3.1 Travelling Officer Problem
The Travelling Officer Problem describes a parking officer traversing a fully connected graph to maximise a cumulative reward (in this case, parking violations caught). There is a time cost associated with travelling from node $v_i$ to node $v_j$ (that is, from parking lot $i$ to parking lot $j$, where $v_i, v_j \in V$ and $V$ denotes all nodes in the graph, i.e. all parking lots in the area) that depends on the officer's walking speed (we do not assume that the officer stops at intermediary nodes).
At each step the officer must weigh the potential reward of catching a parking violation at a given node against the probability that the violation no longer exists by the time he/she arrives, and against the opportunity cost of not travelling to other nodes that contain parking violations. A solution of the TOP is a path that maximises the number of valid nodes visited, subject to a time limit (e.g. working hours) and the time-varying state of each node. A node $v_i$ is valid when the car at parking lot $i$ is in a state of violation.
Let $T_{max}$ be the total travelling budget, and let $x(v_i, t)$ denote whether there is an infringement at node $v_i$ at time $t$. We denote a solution as the path $S = ((v_1, t_1), \dots, (v_k, t_k))$ travelling over $k$ nodes, where $v_i$ denotes the $i$-th node in the path and $t_i$ denotes the time when the officer arrives at node $v_i$ (time is measured in whatever unit or division is in use). Because in the TOP $t_i$ is determined by the path of nodes visited, we can infer $t_i$ from the visited nodes alone, and we simplify our path to $S = (v_1, \dots, v_k)$. Let $f$ denote the infringement fine amount (assuming each infringement costs the same). In this paper, we assume $f$ is a constant value.
The TOP is then to find a path $S$ that maximises the total return while satisfying the total travelling time budget. A formal definition of this problem is as follows:
[TABLE]
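Only a placeholder remains where the formal definition stood. As a hedged sketch, the objective described in the preceding paragraphs could be written as below. The notation is assumed for illustration and is not the paper's own equations: $x(v_i, t_i) \in \{0, 1\}$ indicates an infringement at node $v_i$ on arrival, $f$ is the constant fine, $c(v_i, v_{i+1})$ is the travel time between consecutive nodes, and $T_{max}$ is the travel budget.

```latex
\begin{align}
\max_{S = (v_1, \dots, v_k)} \quad & \sum_{i=1}^{k} f \cdot x(v_i, t_i) \\
\text{subject to} \quad & \sum_{i=1}^{k-1} c(v_i, v_{i+1}) \le T_{max}, \\
& t_{i+1} = t_i + c(v_i, v_{i+1}), \quad i = 1, \dots, k-1 .
\end{align}
```

The last constraint expresses the remark that arrival times follow deterministically from the sequence of visited nodes.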
3.2 Heuristic Optimisation
Previously, Shao et al. (2017) discussed two heuristic optimisation methods, a greedy algorithm and ant colony optimisation (ACO), to solve the Travelling Officer Problem; it was shown that both algorithms performed well at the task of collecting parking violation fines. In order to take advantage of temporal information in the TOP, the authors proposed a dynamic temporal probability model and integrated it with traditional optimisation methods. The proposed greedy algorithm can be formalised as a single function as follows:
[TABLE]
Here, Eq. 4 is defined in terms of four quantities: the overstayed time of the car at each node, the route distance between that node and the current position of the parking officer, a constant denoting the speed of the parking officer, and a parameter set by historical data analysis. The proposed greedy algorithm selects the node given by Eq. 4 as the next position of the parking officer.
In the ACO algorithm, each ant decides its next node according to the pheromone distribution left by previous ants. The probability of each choice was modelled as proportional to a term based on the probability that the node is invalid by the time the officer arrives. The greedy algorithm used a similar dynamic probability model to estimate the most promising node, and greedily selected the best one. The details of both algorithms are given in Shao et al. (2017).
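To make the greedy step concrete, the following is a minimal Python sketch. Eq. 4 is not available here, so the scoring function is an assumption: the chance that a car is still present is taken to decay exponentially with its overstayed time plus the officer's travel time. The names `capture_probability`, `greedy_next_node`, `speed_mps` and `decay` are hypothetical and the numeric values are illustrative only.

```python
import math

def capture_probability(overstay_s, distance_m, speed_mps, decay):
    """Assumed model: chance the car is still there when the officer arrives.

    The exponential form is an illustrative assumption, not the paper's Eq. 4.
    """
    travel_s = distance_m / speed_mps
    return math.exp(-decay * (overstay_s + travel_s))

def greedy_next_node(overstay, distance, speed_mps=1.5, decay=0.001):
    """Return the index of the violating node with the highest score.

    overstay[i] is None when node i has no active violation.
    Returns None when no node is in violation.
    """
    best_node, best_score = None, -1.0
    for i, (o, d) in enumerate(zip(overstay, distance)):
        if o is None:
            continue  # nothing to catch at this node
        score = capture_probability(o, d, speed_mps, decay)
        if score > best_score:
            best_node, best_score = i, score
    return best_node

# Four nodes: overstayed seconds (None = no violation) and route distance in metres.
overstay = [None, 600.0, 60.0, 300.0]
distance = [0.0, 150.0, 900.0, 150.0]
next_node = greedy_next_node(overstay, distance)
```

Under these assumed values the officer heads for node 3, which is close and has a moderate overstay, rather than for the more recent but distant violation at node 2.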
4 Methodology
In this section, we propose a framework to transform the TOP, a spatio-temporal orienteering problem, into a classification problem that can be solved by a customised deep neural network incorporating optimisation solutions. Figure 2 gives an overview of our framework. It consists of three main parts: spatio-temporal feature extraction, optimisation-based search, and a customised neural network. Spatio-temporal feature extraction extracts features and builds a training dataset from the public historical parking violation dataset. Classification labels are generated by using optimisation to find a path on any given map. The DNN then uses the state and relative distance as input features, and the optimisation solution as the ground-truth label, to learn the optimisation algorithm. We outline the details of each component in the following subsections.
4.1 Spatio-temporal Feature Extraction Method
A significant difference between traditional sub-path planning problems and the TOP is temporal dependence. Traditional sub-path problems generally consist of spatial data such as the locations of the vertices or the edges of the graph. For such static graphs, information is limited; time, however, is continuous, so once the temporal dimension is included, ever finer-grained slices can be taken to generate further data points. Each time frame becomes a static 2-D graph at timestamp $t$, corresponding to one data sample.
As outlined in Figure 3, we extract a state vector $\mathbf{s}^t = (s^t_1, \dots, s^t_n)$, where $n$ is the number of parking lots (the number of nodes in the graph), from the parking violation events at time $t$, slicing a new time frame every $\Delta t$.
For example, at node $v_i$ there are violation events $e_{i,1}, \dots, e_{i,K}$, where $a_{i,k}$ is the start time and $b_{i,k}$ is the end time of the $k$-th violation interval at node $v_i$. For any time $t$, there are only two cases: 1) $t$ lies within one of the violation intervals, that is, $a_{i,k} \le t$ and $t \le b_{i,k}$ for some $k$, or 2) $t$ does not lie within any violation interval. In the first case, the entry $s^t_i$ is set to mark the active violation; in the second case, it is set to the value that marks no violation for parking lot $i$ at time $t$. The number of time slots is $m = T_{max} / \Delta t$, where $T_{max}$ is the maximum time constraint and $\Delta t$ denotes the time step chosen for slicing the temporal domain.
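The slicing described above can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the authors' implementation: an active violation is encoded as 1 and its absence as 0, and the helper names `state_vector` and `time_slices` are hypothetical.

```python
def state_vector(violations, t):
    """Build one state entry per node at time t.

    violations[i] is a list of (start, end) violation intervals for node i.
    Encoding an active violation as 1 and no violation as 0 is an assumption.
    """
    return [
        1 if any(start <= t <= end for start, end in intervals) else 0
        for intervals in violations
    ]

def time_slices(t_max, step):
    """Timestamps 0, step, 2*step, ... strictly below t_max."""
    return list(range(0, t_max, step))

violations = [
    [(0, 120)],              # node 0: one violation in the first two minutes
    [],                      # node 1: never in violation
    [(60, 90), (200, 400)],  # node 2: two separate violations
]

# One training sample (state vector) per time slice.
samples = [state_vector(violations, t) for t in time_slices(300, 100)]
```

Halving `step` doubles the number of samples drawn from the same events, which is the mechanism by which finer time slices enlarge the training set.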
In addition to temporal features, we also extract spatial features. The first component in the second row of Figure 2 shows an officer position matrix, where the position filled in red in each row denotes the current location of the parking officer at time $t$. For simplicity, the officer does not change course while travelling between two nodes, so the possible officer positions are the same as the parking lot positions. The relative distance matrix shown in the bottom row of Figure 2 stores the distance between the officer's current location and every other node. Therefore, for any time $t$, we obtain a relative distance vector $\mathbf{d}^t = (d^t_1, \dots, d^t_n)$, where $d^t_i$ measures the route distance between the parking officer and node $v_i$.
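A relative distance vector of this kind might be computed as in the sketch below. Straight-line distance is used as a stand-in for route distance, which is an assumption; the functions `relative_distances` and `feature_vector` and the coordinates are hypothetical.

```python
import math

def relative_distances(positions, officer_node):
    """Distance from the officer's current node to every node.

    Straight-line distance stands in for route distance here (an assumption);
    a real system would look up walking routes on the street network.
    """
    ox, oy = positions[officer_node]
    return [math.hypot(x - ox, y - oy) for x, y in positions]

def feature_vector(distances, states):
    """Concatenate the distance and state vectors into one input vector."""
    return list(distances) + list(states)

positions = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
d = relative_distances(positions, officer_node=1)
x = feature_vector(d, [1, 0, 1])
```

Because the officer is assumed to stand on a node, the entry for the officer's own node is always zero.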
4.2 Optimisation
The TOP is an NP-hard problem that can be solved by traditional optimisation methods under an objective function and constraints. As mentioned previously, an optimisation solution is a series of nodes $S^t = (v_1, \dots, v_k)$ computed at time $t$. However, a complete path must be broken into individual decisions to transform the optimisation problem into a classification problem. That is, at each time $t$, we run the optimisation algorithm and select only the first node in the path $S^t$. As a result, we obtain a series of labels $y^1, \dots, y^m$, where $y^t$ is the first node of the path $S^t$ given by the optimisation method at time $t$.
The classification problem aims to learn a categorical likelihood over a set of classes from a training set. In this case, if we regard each node as a class label and use state matrix and relative distance matrix as the features for the training, the classifier should learn to choose the next node.
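The labelling procedure can be sketched as follows: run an optimiser on each time slice and keep only the first node of its path as the class label. The optimiser below is a toy stand-in (nearest violating node first), not the greedy or ACO method of Shao et al. (2017), and all function names are hypothetical.

```python
def make_training_set(snapshots, optimiser):
    """Turn each time slice into one (features, label) pair.

    snapshots: list of (distances, states) tuples, one per time slice.
    optimiser: callable returning a full path (list of node indices).
    Only the first node of each path is kept as the class label.
    """
    features, labels = [], []
    for distances, states in snapshots:
        path = optimiser(distances, states)
        if not path:
            continue  # no violation to chase, so no label for this slice
        features.append(list(distances) + list(states))
        labels.append(path[0])
    return features, labels

def nearest_violation_first(distances, states):
    """Toy stand-in optimiser: visit violating nodes from nearest to farthest."""
    active = [i for i, s in enumerate(states) if s == 1]
    return sorted(active, key=lambda i: distances[i])

snapshots = [
    ([0.0, 40.0, 10.0], [0, 1, 1]),
    ([40.0, 0.0, 30.0], [1, 0, 0]),
    ([10.0, 30.0, 0.0], [0, 0, 0]),
]
X, y = make_training_set(snapshots, nearest_violation_first)
```

Skipping slices with no violation is a design choice made for this sketch; an alternative would be to add a dedicated "stay" class.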
4.3 Customised Neural Networks
To achieve performance similar to that of the optimisation methods, an existing general-purpose neural network is not enough, because such networks are sensitive to their hyper-parameters and architecture. Therefore, we propose a customised neural network designed around each operation of the greedy algorithm in the TOP. Figure 4 shows the complete architecture of the neural network designed to replace the greedy algorithm in the TOP. The first layer is the input layer, formed by concatenating two vectors: the relative distance vector $\mathbf{d}^t$ and the state vector $\mathbf{s}^t$ calculated in Section 4.1. The first hidden layer aims to learn a linear combination of $\mathbf{d}^t$ and $\mathbf{s}^t$; looking at Eq. 4, this layer can approximate the function applied to these two inputs there. The universal approximation theorem Csáji (2001) states that a feed-forward network with a single hidden layer containing a finite number of neurons can approximate continuous functions on compact subsets of $\mathbb{R}^n$. The function in question is continuous on a real-valued dataset, so we design a hidden layer to approximate it. We then use a sigmoid, a non-linear activation function, to approximate the exponential function, which is also non-linear Ito (1991), Ferrari and Stengel (2005). This layer now holds $n$ probabilities, each denoting the chance that the parking officer captures the violation at node $v_i$. Since selecting the maximum is not a continuous function, we cannot use a hidden layer for it. Fortunately, the softmax function is well suited to picking out the largest of these values Tokic and Palm (2011). Therefore, the output of the network is the next node that the officer should travel to.
For other components of the neural network, we used Adam Kingma and Ba (2014) for optimisation, and early stopping for regularisation. We used dropout at each layer to prevent overfitting Srivastava et al. (2014).
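The layer sequence described above (concatenated input, a linear hidden layer, a sigmoid, then a softmax over nodes) can be illustrated with a forward pass in plain Python. This is a sketch, not the trained network: the weights are set by hand for three nodes, and training with Adam, dropout and early stopping is omitted. All names and values are hypothetical.

```python
import math

def linear(x, weights, bias):
    """Dense layer: one output per row of weights."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]

def sigmoid(z):
    return [1.0 / (1.0 + math.exp(-v)) for v in z]

def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def forward(distances, states, weights, bias):
    """Input -> linear hidden layer -> sigmoid -> softmax over nodes."""
    x = list(distances) + list(states)
    hidden = sigmoid(linear(x, weights, bias))
    return softmax(hidden)

# Hand-set weights for 3 nodes: each unit rewards a violation at its own node
# and penalises that node's distance. A trained network would learn these.
n = 3
weights = [[0.0] * (2 * n) for _ in range(n)]
for i in range(n):
    weights[i][i] = -0.01     # distance to node i
    weights[i][n + i] = 2.0   # state of node i
bias = [-1.0] * n

probs = forward([50.0, 10.0, 200.0], [1, 1, 0], weights, bias)
predicted = max(range(n), key=lambda i: probs[i])
```

With these hand-set weights the network prefers node 1, the nearer of the two violating nodes, which mirrors the trade-off the greedy algorithm makes.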
5 Experiments and Results
In this section, we evaluate the performance of our proposed model, and compare it with traditional classification methods and optimisation-only solutions on a real-world dataset. The rules and assumptions are outlined in Shao et al. (2017).
5.1 Dataset
We tested the proposed model on the Melbourne parking event dataset, published by the Melbourne City Council and used previously in Shao et al. (2016), Shao et al. (2017). A detailed description of this dataset is included in Shao et al. (2017). We took time slices throughout the week for training, and tested the classifier on different time slices so that the test and training sets were distinct but drawn from the same distribution. For reward evaluation, we randomly chose a week's worth of data from a year-long dataset. For other experiments, we randomly selected a single day. For the rewards study, we chose a fixed number of nodes from all vertices in the graph, and sampled the data at a fixed interval measured in seconds.
5.2 Evaluation Metric
We use two criteria to evaluate the performance of our proposed method: rewards, and classification accuracy. The definition of rewards is given in Shao et al. (2017). It denotes how many cars in violation can be caught by parking officers. Since we use the optimisation solution as the ground truth, we also use the classification accuracy to measure the degree to which the classifier learns the optimisation algorithm.
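Classification accuracy against the optimiser's labels can be computed as in the short sketch below; the function name and the example labels are hypothetical.

```python
def classification_accuracy(predicted, reference):
    """Fraction of time slices where the classifier picks the optimiser's node."""
    if len(predicted) != len(reference):
        raise ValueError("label lists must have the same length")
    if not reference:
        return 0.0
    matches = sum(p == r for p, r in zip(predicted, reference))
    return matches / len(reference)

optimiser_labels = [2, 0, 1, 1, 3]   # first node chosen by the optimiser per slice
classifier_labels = [2, 0, 1, 3, 3]  # node chosen by the classifier per slice
accuracy = classification_accuracy(classifier_labels, optimiser_labels)
```

A high accuracy means the classifier imitates the optimiser closely; it does not by itself guarantee high rewards, since a single wrong choice can change every later decision along the path.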
5.3 Experimental settings
We applied both the ACO and greedy algorithms used in Shao et al. (2017) to the dataset, and trained our customised DNN, a support vector machine (SVM) Smola and Schölkopf (2004) and a Random Forest (RF) Breiman (2001) to learn the optimisation solutions.
5.4 Classification Model Comparison
In the first experiment, we evaluate all classifiers over a week by rewards and categorical accuracy. There are two sub-sets of experiments. The first compares the greedy algorithm with classifiers that learn from the greedy algorithm. The second compares ACO with classifiers that learn from ACO.
Figure 5 shows the weekly rewards obtained by the optimisation solutions, greedy and ACO. It also shows the rewards achieved by the different classification methods that learned from the optimisation solutions. The customised DNN outperformed the other classification techniques. Interestingly, we found that classification was more accurate on weekends than on weekdays, because weekends average fewer violations than weekdays, largely due to less stringent parking rules. Overall, the DNN achieves similar performance to the greedy algorithm on this problem, as expected.
5.5 Evaluation of Model Components
We also studied the effect of varying the parameter settings of the problem, as measured by classification accuracy and rewards achieved:
- Number of nodes: the number of nodes indicates the depth of the search space in orienteering problems. In this case, the number of nodes is also associated with the number of rewards and the size of the training set;
- Minimum time step: we extract training samples from the dataset at a fixed time step. At each step, we extract a temporal image from the dataset and add it to our training set.
We evaluated our classifiers on graphs of varying size. Figure 6 shows that the classification accuracy drops significantly as the size of the graph increases, which suggests that classification methods imitate optimisation solutions well in smaller search spaces. It is possible that in larger search spaces, the limited number of training examples prevents the classifiers from learning a better approximation of the optimisation routine. Though the rewards increase with the number of nodes in the problem, this is likely the result of greater potential rewards due to the presence of more nodes in the graph. Notably, the gap in performance between the optimisation solution and the classification solutions becomes larger for these larger problems. The DNN outperformed both the SVM and RF solutions, increasing its total reward along with the graph size, whereas SVM and RF performance dropped under the same conditions. Interestingly, we find that all classifiers achieve higher accuracy when they learn from greedy than from ACO. This phenomenon may be caused by the complexity of the optimisation solution; we leave it for future study.
We also evaluated the model on data from a single day with varying time step sizes (in seconds). Figure 7 shows that smaller time steps resulted in better overall accuracy on the validation set, but this did not necessarily translate to better rewards. The accuracy gain is potentially because smaller time steps provided more training data; it does not, however, imply that testing accuracy is higher.
5.6 Computational Complexity Analysis
Finally, we evaluated the computational efficiency of both the optimisation methods and the neural network by varying the number of nodes from 10 to 50 and measuring the execution time of the program. The running-time table (Table 1) shows that the customised DNN runs much faster than the optimisation algorithms once training time is excluded. We count only the test session as the running time of the DNN because the training session can be done offline: historical data can be used to train the DNN before it is deployed. Optimisation methods, by contrast, cannot be run until the real scenario is known. It is therefore fair to compare the testing time of the DNN with the running time of the optimisation methods.
The testing time is significantly shorter than the running time of the other algorithms and stays below 2 seconds for every graph size tested, against up to 16.10 seconds for greedy and 405.44 seconds for ACO. The DNN-based model is therefore much more efficient than the optimisation solutions in a real-world scenario.
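The size of the gap can be read directly off the reported running times. The short Python calculation below copies the values from the running-time table and computes the ratio of optimiser time to network inference time; the helper name `speedups` is hypothetical.

```python
# Running times in seconds, copied from the running-time table.
nodes = [10, 20, 30, 40, 50]
greedy = [3.12, 6.99, 9.23, 11.99, 16.10]
dnn_greedy = [0.24, 0.57, 1.04, 1.33, 1.72]
aco = [74.89, 152.51, 231.50, 323.39, 405.44]
dnn_aco = [0.27, 0.45, 0.89, 1.21, 1.53]

def speedups(baseline, model):
    """Ratio of optimiser time to network inference time, per graph size."""
    return [round(b / m, 1) for b, m in zip(baseline, model)]

greedy_speedup = speedups(greedy, dnn_greedy)
aco_speedup = speedups(aco, dnn_aco)
```

On these figures the network is roughly 9 to 13 times faster than the greedy algorithm and more than 250 times faster than ACO at every graph size, with training time excluded.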
6 Discussion and Future Work
The neural network performed reasonably consistently across all days, though in general it fared worse than the greedy algorithm during the week. Notably, we found that when the performance of the greedy algorithm was low, the gap between the neural network and the optimisation algorithm was very small.
In this paper, we only design a customised DNN for the greedy algorithm, which is one of the simplest optimisation algorithms; this design may not be suitable for ACO and other optimisation algorithms. Our next goal is to design different neural network architectures for different optimisation methods and to generalise the current DNN to more existing optimisation problems.
7 Conclusion
A technique was shown for reformulating an orienteering problem as a classification problem, using conventional optimisation to generate labels for training. We took finer time slices of the dataset to increase the amount of training data, and this was shown to improve accuracy. We are the first group to design a neural network to approximate the greedy algorithm in the TOP. We evaluated on a large real-world dataset, sampling at a different time interval to generate a distinct test set. It was shown that our customised DNN could be used to approximate the greedy algorithm in the TOP.
References
- 1. Abdel-Moetty [2010] SM Abdel-Moetty. Traveling salesman problem using neural network techniques. In Informatics and Systems (INFOS), 2010 The 7th International Conference on, pages 1–6. IEEE, 2010.
- 2. Breiman [2001] Leo Breiman. Random forests. Machine Learning, 45(1):5–32, 2001.
- 3. Csáji [2001] Balázs Csanád Csáji. Approximation with artificial neural networks. Faculty of Sciences, Eötvös Loránd University, Hungary, 24:48, 2001.
- 4. Ferrari and Stengel [2005] Silvia Ferrari and Robert F Stengel. Smooth function approximation using neural networks. IEEE Transactions on Neural Networks, 16(1):24–38, 2005.
- 5. Fischetti and Jo [2017] Matteo Fischetti and Jason Jo. Deep neural networks as 0-1 mixed integer linear programs: A feasibility study. arXiv preprint arXiv:1712.06174, 2017.
- 6. Galassi et al. [2018] Andrea Galassi, Michele Lombardi, Paola Mello, and Michela Milano. Model agnostic solution of CSPs via deep learning: A preliminary study. In International Conference on the Integration of Constraint Programming, Artificial Intelligence, and Operations Research, pages 254–262. Springer, 2018.
- 7. Hopfield and Tank [1985] John J Hopfield and David W Tank. Neural computation of decisions in optimization problems. Biological Cybernetics, 52(3):141–152, 1985.
- 8. Ito [1991] Yoshifusa Ito. Approximation of functions on a compact set by finite sums of a sigmoid function without scaling. Neural Networks, 4(6):817–826, 1991.
