Complete-Coverage Path-Planning Algorithm Based on Transition Probability and Learning Perturbation Operator
Xia Wang, Gongshuo Han, Jianing Tang, Zhongbin Dai

TL;DR
This paper introduces a new algorithm for robotic path planning that reduces path length and repetition by using transition probabilities and learning perturbation.
Contribution
The novel CCPP-TPLP algorithm combines transition probability and learning perturbation to improve complete coverage path planning efficiency and quality.
Findings
CCPP-TPLP reduces total path length and repetition rate in robotic coverage tasks.
The algorithm improves planning efficiency and quality compared to five other algorithms.
It performs well in various map environments, including those with height information.
Abstract
To achieve shorter path length and lower repetition rate for robotic complete coverage path planning, a complete-coverage path-planning algorithm based on transition probability and learning perturbation operator (CCPP-TPLP) is proposed. Firstly, according to the adjacency information between nodes, the distance matrix and transition probability matrix of the accessible grid are established, and the optimal initialization path is generated by applying greedy strategy on the transition probability matrix. Secondly, the population is divided into four subgroups, and different degrees of learning perturbation operations are carried out on subgroups to update each path in the population. CCPP-TPLP was tested against five algorithms in different map environments and in the working map environment of electric tractors with height information The results show that CCPP-TPLP can optimize the…
Genes, proteins, chemicals, diseases, species, mutations and cell lines named across the full text — each resolved to its canonical identifier and authoritative record.
Click any figure to enlarge with its caption.
Figure 1
Figure 2
Figure 3
Figure 4
Figure 5
Figure 6
Figure 7
Figure 8
Figure 9
Figure 10
Figure 11
Figure 12
Figure 13
Figure 14
Figure 15
Figure 16
Figure 17
Figure 18- —the Basic research project of Science and Technology Department of Yunnan Province
- —Open Subject of Yunnan Key Laboratory of Unmanned Autonomous Systems
- —the National Natural Science Foundation of China
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsRobotic Path Planning Algorithms · Soil Mechanics and Vehicle Dynamics · Control and Dynamics of Mobile Robots
1. Introduction
In recent years, with the rapid development of intelligent technology and the continuous promotion of industrial wisdom upgrading, intelligent mobile robots have been widely used in more and more fields, such as cargo handling, intelligent production, intelligent life, abnormal environment detection, underwater operation, space exploration and so on [1]. Complete coverage path planning (CCPP) is a key research area in modern robotics, aiming to enable the mobile robot find a collision-free shortest path in a specific environment, meanwhile, it should traverse the entire accessible working area to form a continuous path that encompasses all accessible areas. In CCPP research, based on environmental characteristics, it can be categorized into CCPP problems in static environments and that in dynamic scenarios. Although dynamic scenarios are more common in real-world environments, there are also numerous practical applications in static scenarios. Examples include industrial robotic arms planning coverage paths on fixed workpieces for uniform welding or spraying tasks, automated guided vehicles (AGVs) optimizing paths in warehouses with fixed shelf layouts to cover all cargo access areas, and robots following fixed orchard tree distribution paths for pruning, harvesting, or monitoring. In these scenarios, the environment is typically fixed. Research on static CCPP methods can optimize efficiency and reduce time and energy consumption. Additionally, studies on static environments serve as a foundation for dynamic environment research. Many algorithms for dynamic environments require first addressing basic path planning in static environments before incorporating obstacle avoidance and real-time adjustment modules. If algorithms for static environments are inefficient, their performance in dynamic environments will also be affected.
CCPP algorithms can be categorized into three categories, classical algorithms, heuristic-based algorithms and hybrid algorithm. Classical algorithms are mainly as follows: the random walk algorithm [2], chaotic coverage path planner [3], the boustrophedon coverage algorithm [4], the interior screw algorithm [5], and the domain decomposition algorithm [6]. These algorithms are easy to operate and the calculation speed is fast; however, when it comes to practical application, these algorithms have poor efficiency and a high coverage repetition ratio for the planned path [7]. To solve the above problems, many scholars have improved the classical algorithms. Hou et al. [8] divide the target coverage area into boundary and center regions for separate coverage. The boundary region is covered using a spiral path, which do not generate overlapping; in the center region, two steering paths with low overlap rates are designed, to solve the problem of existing spiral CCPP algorithms not meeting the kinematic constraints of robots and have a high overlap rate in the center region. Chaotic path planning is a prospective modern approach that use chaotic dynamical systems to generate trajectories throughout an environment. Moysis et al. [9] proposed a chaotic path planning method for 3D area coverage using modified Logistic map and a modulo tactic. An area-coverage path planning which is controlled by a novel memristor-based double-stroll chaotic system has been proposed in [10] to generate higher unpredictability, more even and faster scanning path. Chaotic CPP ensures high coverage efficiency in the entire workspace in terms of the robot’s trajectory, guaranteeing faster coverage in the working space because the motion is pre-determined [11].
With the rapid advancement of technologies associated with artificial intelligence, heuristic algorithms have been increasingly and extensively applied in CCPP. By integrating intelligent optimization strategies, this type of algorithms can effectively address the challenge of CCPP in complex environments. It particularly demonstrates substantial advantages in scenarios characterized by dynamic environments, multiple constraints, and high real-time requirements. It mainly includes A* algorithm [12], D* algorithm [13], Theta* algorithm [14], evolutionary algorithms, swarm intelligence algorithms, the biologically inspired neural network algorithms and reinforcement learning, etc. In [15], the objective function is optimized based on evolutionary algorithms such as the genetic algorithm (GA) and ant colony optimization (ACO) of the traveling salesman problem (TSP) and estimates the shortest path that connects all waypoints. Considering full-load travel distance, unloading position distribution and the area of the turning area simultaneously, a full coverage path planning methods of harvesting robot with multi-objective constraints is proposed by using the shuttle method and the improved ACO algorithm [16]. The biologically inspired neural network algorithms do not require training the model, nor do they need to predict environmental information, and have a relatively fast computing speed. Therefore, it has been widely studied and applied by many scholars. In this algorithm, the map is shown as a grid map, and each grid is considered as a neuron. The adjacent neurons are connected, and the robot’s mobile path is chosen by calculating the neuronal activity value. Muthugala et al. [17] put forward an online path planning approach based on the Glasius Bio-inspired Neural network to enhance the energy efficiency and coverage effect of hull maintenance robots. By introducing a comprehensive energy model and a dynamic tracking method, this approach not only boosts the adaptability and energy efficiency in a dynamic environment but also strengthens the global coverage capability. Zhang et al. [18] brought in the domain neuron status criterion, which increased the biologically inspired path planning efficiency near isolated island obstacles. Li et al. [19] raised a 2D adaptive cell decomposition method, which strengthens the positive correlation between the obstacle density and the closure of the map grid. They incorporated the consideration of obstacle distribution and grid level into the bypass equation of the biologically informed neural networks (BINN) algorithm, thereby enhancing the efficiency and effectiveness of the CCPP for the bulldozer. On the other hand, the rapid development of reinforcement learning and deep learning brings a whole new solution to path planning. Elfoly et al. [20] proposed to use a modified Q-learning geometric learning algorithm to generate an optimal path by iteratively exploring and exploiting the state-action space. An improved deep reinforcement learning algorithm, the re-DQN algorithm, is proposed by Wang et al. [21] to solve the traversal order of each region. The reinforcement learning method possesses a strong ability for decision making and can find the global optimal solution; however, it costs much more time during the learning process than a generic algorithm, so its application has not been widely used [7].
Recent trends include the use of hybrid algorithm methods, combination of classical algorithms and heuristic algorithms, or combination of both heuristic algorithms to optimize CCPP solutions. Although the hybrid algorithm incurs a high computation time, it delivers better coverage efficiency. A convolutional neural network (CNN) with long short term memory (LSTM) layer was trained using the actor-critic experience replay (ACER) reinforcement learning algorithm in [22], and a complete area coverage planning module for the modified hTrihex, a honeycomb-shaped tiling robot, based on the deep reinforcement learning technique is proposed. Xu et al. [23] proposed a complete coverage neural network (CCNN) algorithm, which simplified the calculation process of neural activity. It also combined a modified A* algorithm to improve the covering efficiency. Tan et al. [7] presented a biologically inspired neural network algorithm based on Q-learning, which proceeds with path planning based on the biologically inspired neural network algorithm, and when it comes to the position change of a reachable path point, processes path optimization by adopting Q-learning to avoid falling into a locally optimal solution. Cheng et al. [24] proposed an improved PSO combined gray wolf optimization (IPSO-GWO) algorithm with chaos and a new adaptive inertial weight. Qian et al. [25] combined the interior screw search and the biologically inspired neural network algorithm, which reduced the calculating time of the path repetition ratio. Kvitko et al. [26] presented a path-planning algorithm based on the chaotic behavior of the Courbage–Nekorkin neuron model with a coverage control parameter to reduce the number of iterations required to cover the chosen investigated area.
Having said all of above, the current CCPP methods exhibit diverse characteristics when responding to the demands of different scenarios. Classical algorithms are known for their fast computing speed, but they are difficult to meet the requirements of efficient coverage due to the high path repetition rate. The A* algorithm and the D* algorithm are relatively sensitive to heuristic functions, and the computational overhead of the D* algorithm is relatively large. Evolutionary algorithms and swarm intelligence algorithms maintain the excellent global exploration performance of the Monte Carlo method and have a strong local exploitation capability. However, balancing exploration and development is a challenge, and there is a hidden danger of falling into local optimum. The self-learning and parallelism characteristics of the neural network algorithm can improve the coverage efficiency of the CCPP algorithm, while it can realize automatic obstacle avoidance and escape from the deadlocks [27]. however, it is more computationally intensive [19]. Moreover, there is no optimal value for parameters of the neural network model, which can only be determined by repeated experiments [28]. The continuous motion of chaotic CCPP enables the robot to move in searching and finding the target effectively with a more uniform coverage density. However, the existing literature only highlighted the coverage rate, ignoring the cost of coverage time. The unpredictable trajectory is also hugely dependent on the kinematic motion of the robot subjected to kinematic constraint, and it needs to be studied [11]. In addition, the notable dependence of the space coverage rate on the starting point of the autonomous robot’s path is the key shortcoming of chaos-based path-planning methods [26]. The hybrid algorithm attempts to balance efficiency and quality, but the increase in complexity and scene dependence have become new bottlenecks. Therefore, this paper proposes a complete-coverage path-planning algorithm based on transition probability and learning perturbation operator (CCPP-TPLP) to achieve efficient path planning within a concise algorithm framework. In the initialization stage of the algorithm, a greedy initialization strategy based on transition probability is proposed to generate high-quality initial solutions. The creation process of each solution takes both the adjacent properties of the grid and the requirement of the shortest path distance into account, ensuring that each initial solution is approximate optimal and laying a solid foundation for the subsequent optimization of the algorithm. A learning perturbation operator is proposed for algorithm iteration, thereby enhancing the diversity of the population and the convergence speed. Simulation results show that compared with five representative optimization algorithms, CCPP-TPLP achieves the best effect of complete coverage path planning under various obstacle densities and map scales. In the CCPP problem of an electric tractor in complex agricultural terrain, the complete coverage optimal path with the lowest energy consumption, shortest path length, and lowest repetition rate can be acquired by CCPP-TPLP.
This paper is structured as follows: Section 1 presents the problem description of CCPP; Section 2 introduces the related distance calculation method; Section 3 describes the complete coverage path planning method based on transfer probability and learning perturbation operator; Section 4 validates the effectiveness of the proposed algorithm in this paper through the ablation experiments, algorithmic comparison experiments, and algorithmic simulation experiments in 3D maps of motorized tractors; and Section 5 concludes with the summary and discussion.
2. Description of the Problem
CCPP pertains to obtaining the shortest path that traverses all areas except for obstacles within a closed region. The commonly employed map construction methods include the grid method [29], the topological method [30], and the geometric method [31]. Among them, grid maps possess duality and can represent the occupancy of obstacles in the workspace, thus being widely utilized in robot path planning. In this paper, the grid method is employed to quantize the robot’s working environment into several grids. The obstacles are represented by regular and complete grids, and two state values of 0 and 1 are used to represent the open space grid and the obstacle grid respectively.
In the grid map, the size of the grid is set to the size of the robot’s own coverage area. It is assumed that once the robot passes through the center of the grid, the grid is regarded as having been covered. The grid map consists of rows and columns. The row and column coordinates corresponding to the grid are denoted by , where represents the sequence number of the grid. Then the transformation relationship between and grid coordinates is presented in Equation (1).
where represents the number of grids contained in a column of the map. Let denote the set of all grid indices, and represents the maximum number of grid. denote the set of subsets formed by picking s elements from the set with replacement, which . If the set , then the number of elements in is , and the total number of complete permutations for all elements in is . Let be a complete permutation of all elements in . It is known that . represents a path formed by connecting the grids in sequentially. Then, the mathematical model of the CCPP problem can be expressed as shown in Equation (2).
where denotes the path length of the path . Since already contains all grid indices, and is the set of subsets formed by picking s elements from with replacement, in other words, every element of has already appeared once in , thus, the smaller is, the fewer overlapping elements exist between and are, the less the number of repeated grids in the path is, and the lower repetition rate of the nodes in the path is.
3. Basic Theory
At present, the frequently employed distance calculation methods in the CCPP problem comprise Euclidean algorithm [32], Manhattan distance algorithm [33] nearest neighbor algorithm [34], Dijkstra algorithm [35], Floyd algorithm [36], and so forth. Among them, Euclidean algorithm and nearest neighbor algorithm are employed to calculate the straight-line distance between two points, regardless of the existence of obstacles. However, they do not give thought to the existence of obstacles. Manhattan distance algorithm merely reflects on the actual total moving distance from one point to another in the horizontal and vertical directions, without considering diagonal movement and the presence of obstacles. Dijkstra algorithm addresses the single-source shortest path problem, where each execution calculates the shortest paths from one starting node to all other nodes in the graph. While Floyd algorithm solves the all-pairs shortest path problem, computing the shortest paths between all node pairs in a single run. In the CCPP problem for grid maps, assuming the total number of grids is , it is necessary to obtain the shortest distances between all nodes. If Dijkstra algorithm is used, the time complexity for a single execution is . Calculate the shortest distances between all nodes requires running Dijkstra algorithm once for each node as the starting point, resulting in a total time complexity of . In contrast, Floyd algorithm computes all-pairs shortest paths in a single run with a fixed time complexity of . Since Dijkstra algorithm exhibits significantly reduced computational efficiency in grid maps with a large number of nodes and may lead to local optima in complex scenarios, this work adopts Floyd algorithm to calculate the shortest distances between grids.
Floyd algorithm is a classic algorithm for addressing the shortest path problem in weighted networks, and its core lies in an idea of gradual approximation. By gradually incorporating intermediate nodes between any two nodes, the initial direct path distance between the two nodes is gradually extending to the path distance connected by multiple intermediate nodes, and the shortest path is chosen and retained throughout the process, so as to obtain the shortest distance between the two nodes. The basic steps are as follows.
(1)Initialize the distance matrix
The distance matrix is constructed, where is the number of nodes in the graph, and denotes the distance from grid to grid . If there is a direct edge connection between and , represents the weight of the edge. If there is no weight value, then . If there is no direct edge connecting and , then . Specifically, if , then .
(2)Update the distance matrix
For each pair of grids and , and for each potential intermediate grid , let iterates from to to update the shortest distance between grids and . When the shortest distance between and can be attained via the intermediate node , then is updated, and the update formula is Equation (3).
After the traversal and iteration of all nodes, the distance matrix is updated, and represents the shortest path from node to node .
4. CCPP Method Based on Transition Probability and Learning Perturbation Operator
4.1. Greedy Initialization Strategy Based on Transition Probability
In the path planning problem, if the population is randomly initialized, that is, a complete coverage path is randomly generated, the length of the initial path may be long, and too many nodes are visited repeatedly. Utilizing such a path as the initial one to participate in the subsequent iteration will significantly reduce the search efficiency of the algorithm. If the length of the initial path is short, the quality of the initial population can be improved. To this end, this paper proposes a greedy initialization strategy based on transition probability. At first, the map is transformed into a grid map. Then for all accessible grids, the shortest distance between any two grids is calculated by Floyd algorithm. Finally, during the path initialization process, the subsequent accessible grids are selected one by one from the starting grid. The grids closer to the current grid have a higher probability of being selected, thereby obtaining a short initial path.
The specific implementation procedures of the greedy initialization strategy based on transition probability are as follows.
(1)State matrix and adjacency matrix
The map is converted into a grid map, and it is assumed that the processed map contains N × M grids, as depicted in Figure 1a, where the black grids represent obstacles and the white grids represent accessible grids. The state matrix of the map is constructed based on whether there is obstacle in the grid, as shown in Figure 1b.
represents the element located in row and column of the state matrix , , . The state matrix is constructed by Equation (4).
Based on the state matrix , the adjacency matrix of the accessible grids is further constructed. Suppose that there are elements with a value of 0 in , that is, there are accessible grids. The adjacency matrix with size of needs to be established, where the element located in row and column of is denoted as and represents the adjacent relationship between grid and grid . Suppose that the position indexes of grid and grid in the state matrix are and respectively, the Manhattan distance between grid and grid is calculated by Equation (5) to determine their adjacent relationship, then the adjacency matrix is constructed.
(2)Distance matrix and transition probability matrix
For all accessible grids, the adjacency matrix and Floyd algorithm are employed to calculate the shortest distance between any two grids and obtain the distance matrix , as shown in Equation (6).
Here, represents the shortest distance between grid and grid . Based on the distance matrix , the transition probability matrix is defined as presented in Equation (7).
where represents the probability of transfer from grid to grid , which is calculated in accordance with Equation (8).
where is defined as a greedy factor. The smaller is, the larger is, and the higher the probability that grid will be selected when the next transfer grid is chosen from grid .
(3)Population initialization of greedy strategy
Since there are accessible grids, the individual of creating population is , which represents a path covering all accessible grids, and its arbitrary dimension variable is the sequence number of accessible grids. is the sequence number of specified starting grid. To complete the individual initialization, set the degree of greediness and generate the random number . If , grid is selected as in accordance with Equation (9).
Otherwise, the cumulative sum probability of each row in is computed, and a accessible grid is chosen as through the roulette—wheel selection method.
According to the aforesaid method, the subsequent grid numbers are determined in turn to complete the initialization of the individual. Populations are created based on the population size and the dimensions of decision variables. For each individual initialization, random number is generated. If , an individual is generated based on the starting point and transition probability matrix , using the above-mentioned greedy strategy. Otherwise, each decision variable is randomly generated to construct a random individual. By setting the degree of greediness , the algorithm can strike a balance between the greedy strategy and the random strategy. If is low, the algorithm is more inclined towards the greedy strategy, which is conducive to finding high-quality solutions rapidly. While is high, the algorithm prefers random selection, which is conducive to increasing the diversity of the population and avoiding premature convergence.
Through the greedy initialization strategy based on transition probability, a high-quality initial population can be constructed simply and effectively. The creation process of each individual takes into account the adjacent characteristics of grids and the requirement that the path length must be the shortest, ensuring that each individual is an approximately optimal solution.
4.2. The Operation of Learning and Perturbation
To update the population, this paper proposes a learning perturbation operator to enhance the diversity and convergence of the population and assist the algorithm in escaping the local optimum. The objective function of CCPP is the shortest path length, and the main factor influencing the path length is the selected sequence of accessible grids. When the selection order of the accessible grids varies, it will result in obvious differences in the path length. The individuals are classified into different subgroups based on their fitness. Subsequently, different degrees of learning or perturbation strategies are employed for individuals in different subgroups to alter the sequence of the accessible grid, thereby influencing the fitness value of each individual. The specific implementation procedures are as follows.
(1)Hierarchical division of the population
The individuals within the population are managed in a hierarchical manner based on their fitness values. This hierarchy can assist the algorithm in exploring the solution space more effectively, enhance the search efficiency, and facilitate the maintenance of population diversity. In this paper, the individuals within the population are sorted based on the fitness value from small to large, and the individual possessing the best fitness value is classified into the first subgroup, namely, .The remaining individuals are categorized into three subgroups, namely , and . The proportion of the hierarchy can be set according to the need. Different update strategies will be employed for individuals in different subgroups. Figure 2 presents a schematic illustration of the hierarchical division of the population, with different colors signifying different subgroups.
(2)Perturbation operation
Perturbation operation includes strong perturbation and weak perturbation. The aim of strong perturbation is to cause individuals to undergo significant random changes, thereby encouraging them to escape from the local optimal state as much as possible and explore a new solution space. While the weak perturbation is intended to decelerate the convergence process of the individual to the current optimum, thereby expecting to discover new and better solutions.
For , because of its optimal fitness value, it has a strong guiding effect on the entire population. To prevent the population from falling into the local optimum, the order of the accessible grid of the individual in is randomly adjusted to cause a significant change in its fitness value, which is defined as the strong perturbation operation. Take Figure 3 as an example. Suppose that the current individual is x, and the accessible grid index to be operated is k. Firstly, random number rand3 within the range of [1,K] is generated. Then the accessible grid indexed by rand3 in x and the accessible indexed by k are swapped to complete the strong perturbation operation and obtain a new individual X_new_. As depicted in Figure 3, the calculation of the fitness value of x encompasses the shortest distance between grid 8 and grid 15, between grid 15 and grid 12, between grid 14 and grid 7, as well as between grid 7 and grid 10. However, for the perturbed individual X_new_, the above-mentioned four distances have transformed into the shortest distance between grid 8 and grid 7, between grid 7 and grid 12, between grid 14 and grid 15, and between grid 15 and grid 10. It can be observed that the strong perturbation operation alters the order of the accessible grids in the original individual to the greatest extent, and it will affect the shortest path among the four grids, thus having the greatest impact on the individual’s fitness value.
For , which is the farthest from the current optimum, in order to slow down its convergence towards the current optimum and thereby have the opportunity to discover new and more optimal solutions, it is operated with weak perturbation. Take Figure 4 as an example. Suppose that the current individual is , and the accessible grid index to be operated is . Generate a random number between , all the accessible grids between the position of the index in and the position of the index are reordered in reverse order to complete the weak perturbation operation, so as to obtain a new individual . In Figure 4, the fitness value calculation of encompasses the shortest distances between grid 8 and grid 15, as well as between grid 7 and grid 10. However, for , the above two distances turn into the shortest distance between grid 8 and grid 7, and between grid 15 and grid 10. It can be observed that the change in the order of the accessible grid in the original individual resulting from the weak perturbation operation is less significant than that from the strong perturbation operation, for it only leads to the alteration of two distances. Therefore, the impact on the individual fitness value is minor.
(3)Learning operation
Learning operation includes learning from the optimal individual and learning from the random individual. For , to better maintain excellent genes, it should learn from the best individual . However, for , to slow down its convergence to the current optimum, it is made to learn from random individuals within the population. Learning is accomplished through gene fragment learning, the specific learning operation is depicted in Figure 5. Suppose that the current individual is , the individual being learned is , the index of the accessible grid to be operated in is , whose grid sequence number is . In , locate index , whose grid sequence number is also , select the grid sequence number , whose index is in , and then return to the to find the index , whose grid sequence number is in . Finally, all the accessible grids in between the position of the index and the position of the index are reordered in reverse order, such that and become adjacent accessible grids in . It can be observed that through the learning operation, the individual can acquire part of the gene fragment from the learning object. If the learning object is , the learned gene fragment might be an outstanding gene combination. And if the learning object is a random individual, the diversity of the individual can be enhanced.
4.3. Steps of the Algorithm
The implementation steps of the complete-coverage path-planning algorithm based on transition probability and learning perturbation operator are as follows.
Step 1: Define the iteration number , the maximum iteration number , the population size , the degree of greediness , and the greedy factor . Obtain the map information, set the starting and ending points of the path, and calculate the state matrix of the map by means of Equation (4). The adjacency matrix and distance matrix of the map are respectively calculated by Equation (5) and Equation (6). The transition probability matrix for each accessible grid transferring to other accessible grids is computed by Equation (7).
Step 2: Create an individual , and utilize the specified starting grid number as the first element of , i.e., . Generate the random number . If , then employ the greedy initialization strategy based on transition probability to generate the individual and proceed to Step 3. Otherwise, the individual is randomly generated and proceed to Step 4.
Step 3: Generate the random number . If , the next node in is generated according to the transition probability matrix and Equation (9). Otherwise, the cumulative sum probability of each row of is calculated, and an accessible grid is selected as the next node in by using the roulette—wheel selection method.
Step 4: Determine whether the individual encompasses all accessible grids. If yes, go to Step 5. Otherwise, execute Step 3.
Step 5: If the population size reaches , Repair population to guarantee that the starting grids and end grids of all individuals meet the path requirements. Then Step 6 will be executed. If not, Step 2 will be executed and new individuals will be generated.
Step 6: If , go to Step 9. Otherwise, go to Step 7.
Step 7: Arrange the fitness of all individuals in descending order and divide the population into four subgroups according to this order. Different learning or perturbation operations are carried out for individuals in different subgroups to generate a new population.
Step 8: Repair population to guarantee that the starting grids and end grids of all individuals meet the path requirements. Increase the number of iterations by one and return to Step 6.
Step 9: Output the optimal solution of the population.
The repair operation mentioned in Steps 5 and 8 is designed to ensure that the starting grids and ending grids of all individuals comply with the path requirement. For each individual in the population, it is necessary to guarantee that the path begins at the designated starting grid and terminates at the specified ending grid. As illustrated in the figure below, assuming the map’s starting point is grid 1 and the endpoint is grid 10, a generated path shown in Figure 6 does not start at grid 1 nor end at grid 10. This requires repair operations, which is swapping the position of grid 1 with the first node in the path and exchanging grid 10 with the last node in the path, as demonstrated in Figure 7. Through these repair operations, the starting grid is placed at the beginning, and the ending grid is positioned at the end of the path. Figure 8 presents the flowchart of the CCPP-TPLP algorithm.
5. Simulation Experiments and Analysis
In this section, the experimental setup is initially presented, including the experimental environment, the selected comparison algorithm, the settings of algorithm parameters, and the evaluation metrics. Then, the greedy initialization strategy based on the transition probability and the learning perturbation operator are verified independently. Finally, the algorithm is compared with representative optimization algorithms such as ant colony optimization (ACO) [37], grey wolf optimizer (GWO) [38], student psychology based optimization (SPBO) [39], discrete just another yet another (DJAYA) [40], and discrete tree seed algorithm (DTSA) [41] under different environment to verify their optimization effect on CCPP problem. In addition, to verify the algorithm’s capacity in solving practical issues, the CCPP problem of an electric tractor is addressed, and the solution outcomes of each algorithm are compared and analyzed.
5.1. Experimental Setting
(1)Experimental Environment
All experiments in this paper are performed on a PC with AMD Ryzen 7 5800H with Radeon Graphics, 3.20 GHz CPU, 16 GB RAM, and Windows 11 MATLAB R2021b.
(2)Comparing algorithm parameter settings
The parameter settings of the five algorithms selected in the comparative experiment are shown in Table 1.
- (3)Evaluation indicators
The performance of the algorithm is evaluated by means of four indicators: coverage rate, coverage repeat rate, path length, and Nic.
- Coverage rate
The coverage rate depicts the percentage of the area covered by the path. Since the entire map includes accessible area and obstacle area, the algorithm aims to achieve complete coverage of the accessible grid. The entire accessible area, the area covered by the path, and the obstacle area are respectively denoted as , , and , and the coverage rate ( ) is defined by Equation (10).
Coverage repeat rate
The coverage repeat rate is the ratio of the area repeatedly covered in the path to the area of the accessible region. The definition formula of the regional repetition rate is given in Equation (11).
Path length
The path length is also the objective function defined in Section 1.
The number of iterations to convergence (Nic)
Nic refers to the number of iterations when the algorithm converges.
(4)Map environment
To comprehensively assess the performance of the CCPP-TPLP in diverse environments, four grid maps with different sizes and complexities are devised, whose complexity is characterized by three aspects: the map size, the proportion of accessible area, and the number of discrete obstacles. The proportion of accessible area represents the ratio of the number of accessible grids to the total number of grids in the entire map. The quantity of discrete obstacles represents the number of independent obstacles shown on the map. The specific parameters of the map environment are presented in Table 2. Parameters of four map environments, and the schematic diagrams of the maps are shown in Figure 9.
5.2. Validation and Analysis of the Ablation Experiments
5.2.1. Verification of the Greedy Initialization Strategy Based on Transition Probability
In order to verify the effectiveness of the greedy initialization strategy based on transition probability, the random initialization strategy and the initialization strategy of ACO are selected for comparison experiments. Set up a 20 m × 20 m two-dimensional map with 42 grids serving as obstacles for the experiment. The initialization paths are respectively generated by the random initialization strategy, the initialization strategy of ACO, and the greedy initialization strategy based on transition probability. The population size is set at 50, and each strategy is independently repeated 30 times for the experiment. In each experiment, the length of the shortest path in the initial population generated by each strategy is recorded. The 30 path results for each strategy are averaged to obtain the results in Table 3. The optimal paths generated by the three strategies are respectively plotted in Figure 10 where the green circle indicates the starting grid, the red circle represents the ending grid, the red arrow indicates the moving direction, the white grid represents the accessible grid, the yellow grid represents the repeated passing grids, and the black grid represents the obstacle grid. The more yellow grids there are, the higher the node repetition rate of the path is. From Figure 10 and Table 3, the following conclusions can be drawn:
(1)As can be seen from Figure 10 that the paths generated by the random initialization strategy and the initialization strategy of ACO contain numerous repeated passing grids, while the paths obtained by the greedy initialization strategy based on transition probability have relatively few repeated passing grids. This is because the random initialization strategy fails to take into account the relationship between grids, and the path constructed by it has strong randomness. Although ACO employs pheromones to guide the search process, in the initialization phase, the pheromone concentration on all paths is initialized to the same value, leading to a lack of guidance in the initial search. The greedy initialization strategy based on transition probability presented in this paper takes into account both the adjacent properties of grids and the requirement of the shortest path distance when selecting accessible grids, ensuring that the generated path length is shorter.(2)It is observable from Table 3 that the initial paths of the three strategies can achieve 100% coverage. Among them, the greedy initialization strategy based on transition probability performs best in terms of path length and , and can obtain initial paths with higher quality. Moreover, the standard deviation of the 30 paths obtained by this strategy is the smallest, indicating that this strategy demonstrates higher stability and reliability in multiple experiments.
In summary, through the comparative analysis of the three initialization strategies, the greedy initialization strategy based on transition probability demonstrates superior performance in terms of path length and . This strategy effectively utilizes the distance and adjacency information between grids, significantly optimizes the selection of grids, reduces the repetition of paths, and thereby enhances the efficiency and quality of initial path planning.
5.2.2. Validation of the Learning Perturbation Operation
In order to validate the effectiveness of the learning perturbation operation, ACO and SPBO algorithms are chosen for comparative experiments. Set up a 20 m × 20 m two-dimensional map with 42 grids serving as obstacles for the experiment. The initial population is generated by the ACO algorithm, and the population size is 50. For the same initial population, ACO, SPBO, and learning perturbation operation strategy are respectively employed to update the population, and the number of update iterations is 100 as the same. Each strategy is independently repeated 20 times for the experiment. In each experiment, the length of the shortest path in the population updated by each strategy is recorded. The 20 path results for each strategy are averaged to obtain the results in Table 4. The optimal paths obtained by the three strategies are plotted in Figure 11 respectively.
Figure 11 and Table 4 display that the proposed learning perturbation operation has more advantages for population renewal. For the same initial population, through the update of the learning perturbation operation, the population with lowest and path length can be obtained, and the standard deviation results of the above two indicators are the smallest, indicating that the stability of this strategy is also better.
5.2.3. Impact of Parameters on Algorithm Performance
In order to verify the impact of the greedy factor , the degree of greediness and different perturbation strategies on the performance of the CCPP-TPLP algorithm, set up a 20 m × 20 m two-dimensional map with 42 grids and a 30 m × 30 m two-dimensional map with 90 grids serving as obstacles for the experiment. Four perturbation strategies are defined as ① Both and adopt strong perturbation, ② Both and adopt weak perturbation, ③ adopts weak perturbation and adopts strong perturbation, and ④ adopts strong perturbation and adopts weak perturbation. The population size is 50, and each experiment of parameter value is independently repeated 10 times. The results of the optimal path found by the CCPP-TPLP algorithm under different parameter values or strategies are recorded and the 10 path results obtained under each parameter value or strategy are averaged to obtain the results in Table 5, Table 6 and Table 7.
It can be seen from Table 5, in Environment I, when is set to 5, the algorithm achieves optimal mean values of , path length and Nic with the smallest standard deviations, indicating that the CCPP-TPLP algorithm attains the best performance in optimality, stability, and convergence. In Environment II, while the mean values of , path length and Nic remain optimal when = 5, the standard deviations of and path length are not minimized, suggesting that although the algorithm maintains the best performance in optimality and convergence, its stability is slightly compromised. Overall, the comprehensive performance of the CCPP-TPLP algorithm is optimal when is set to 5. Therefore, the value of the greedy factor is set to 5 in the subsequent experiments.
It can be known from Table 6, in Environment I, when is set to 1, the algorithm’s , path length and Nic achieve the optimal mean values with the smallest standard deviations, indicating that the CCPP-TPLP algorithm reaches the best performance in optimality, stability and convergence. In Environment II, while the mean values of , Path length and Nic remain optimal at = 1, the standard deviations of and path length are not the smallest, suggesting that the algorithm maintains the best performance in optimality and convergence, but experiences a slight decline in stability. Overall, the CCPP-TPLP algorithm exhibits the best comprehensive performance when is set to 1. Therefore, the value of is set to 1 in the subsequent experiments.
As shown in Table 7, applying different perturbation strategies to and affects the algorithm’s performance. In Environment I, perturbation strategy ④ achieves the optimal mean values of , path length and Nic, but the standard deviations of and path length are not the smallest. This indicates that the CCPP-TPLP algorithm maintains the best performance in optimality and convergence, though its stability is slightly compromised. In Environment II, perturbation strategy ④ yields optimal mean values and standard deviations of and path length, demonstrating that the algorithm achieves the best performance in both optimality and stability. However, Nic result is suboptimal, suggesting a slight impact on convergence speed. Therefore, the perturbation strategy ④ is adopted in the subsequent experiments.
The t-test is carried on the ablation study results regarding the greedy factor , the degree of greediness and perturbation strategy. Specifically, comparative analyses are performed between results of each parameter group (excluding the optimal results) and the optimal results to examine the significance difference under various parameter configurations. The corresponding test outcomes are presented in Table 8, Table 9 and Table 10.
According to Table 8, in Environment I, the p-values of Nic are all greater than 0.05, indicating that has no significant influence on the convergence efficiency of the algorithm However, the p-values of path length and are mostly below 0.05 across different values of , suggesting that significantly influences path length and . In Environment II, the p-values for path length and exceed 0.05, implying a weaker impact of on these metrics. Conversely, the p-value of Nic drops below 0.05, demonstrating that significantly affects Nic in this scenario. Overall, the significant influence law of on path length, and Nic across environments is not obvious.
It can be seen from Table 9, in Environment I and Environment II, the p-values of path length and are all below 0.05, indicating that the value of has a significant impact on path length and , while no significant effect on convergence speed is observed.
It can be known from Table 10, compared with perturbation strategy ① and ③, perturbation strategy ④ demonstrates a significant influence on algorithm performance. However, when compared to perturbation strategy ②, the differences caused by perturbation strategy ④ are not statistically significant.
5.2.4. Impact of Population Hierarchical Division Ratio on Algorithm Performance
In order to verify the impact of the population hierarchical division ratio on CCPP-TPLP’s performance, set up a 20 m × 20 m two-dimensional map with 42 grids serving as obstacles for the experiment, and the population size is 50 and the number of iterations is 600. The population is divided into four subgroups, among which the first subgroup is the optimal individual and is a single individual. Thus, the influence of the division ratios of the other three subgroups on the performance of the algorithm is mainly examined. The population is divided by adopting different division ratios respectively, and the experiment is independently repeated 10 times for each division ratio. The results of the optimal path found by the CCPP-TPLP algorithm under different division ratios are recorded and the 10 path results obtained under each division ratio are averaged to obtain the results in Table 11.
It can be observed from Table 11 that the population hierarchical division ratio has a minor influence on the path length. When the division ratio is 2:2:1, is the lowest and the path length is also the lowest. In the subsequent experiments, the population hierarchical division ratio of the three subgroups is set as 2:2:1.
5.3. Comparative Experimental Results and Analysis of CCPP-TPLP
In order to validate the performance of the CCPP-TPLP algorithm proposed in this paper in path planning, the CCPP-TPLP algorithm along with ACO, GWO, SPBO, DJAYA, and DTSA algorithms are respectively tested in four grid maps. The population size is 50, and the number of iterations is 600. Each experiment of different algorithms is independently repeated 10 times. Figure 12, Figure 13, Figure 14 and Figure 15 respectively present the optimal path obtained by each algorithm in environment I, environment II, environment III, and environment IV. The convergence curves of each algorithm in the four map environments are shown in Figure 16.
From Figure 12, Figure 13, Figure 14 and Figure 15 and Table 12, the following conclusions can be drawn:
(1)The CCPP-TPLP algorithm demonstrates significant advantages in the two indicators of and path length. Specifically, in comparison with other algorithms, of the CCPP-TPLP algorithm is reduced by 1.1416.26%, and the path length is reduced by 1.1815.75% in environment I. In environment II, of the CCPP-TPLP algorithm is decreased by 3.2522.99%, and the path length is reduced by 3.1620.29%. In environment III, of the CCPP-TPLP algorithm is reduced by 4.0524.68%, and the path length is decreased by 4.0222.70%. In environment IV, of the CCPP-TPLP algorithm is decreased by 4.3523.07%, and the path length is reduced by 4.5422.23%. As the complexity of the environment increases, the performance advantages of the CCPP-TPLP algorithm become more evident.(2)As shown in Figure 16, both the convergence speed and convergence accuracy of the CCPP-TPLP algorithm are superior to those of the other five comparison algorithms.
5.4. CCPP Problem of Electric Tractor
Literature [42] takes the three-dimensional path space as the research object, uses the grid method to store the height information of the three-dimensional path, and combines the rectangular coordinate method and the sequence number method to construct the 2.5-dimensional working environment model of agricultural electric tractor. The composite fitness function is constructed by weighting the flat driving path length and the total height difference. For this model, the CCPP-TPLP algorithm and the five algorithms, i.e., ACO, GWO, SPBO, DJAYA, and DTSA, are respectively employed for optimization.
5.4.1. Settings of Height Information in the Map
Any point on the tractor’s operation path can be expressed as discrete coordinates , where contains height information. In the rasterized two-dimensional plane model of the tractor working environment, height information is set for each grid and stored in a corresponding two-dimensional matrix. The tractor working environment generated in literature [42] is adopted, as shown in Figure 17.
5.4.2. Fitness Function
The fitness function fully considers the energy consumption constraint, which is mainly related to the flat driving path length, the number of turns, and the total total elevation difference. The details are as follows.
The relationship between the length of the flat driving path length and energy consumption component is characterized as Equation (12),
Among them, the definitions of each variable are presented in Equation (13), Equation (14) and Equation (15).
where represents the mass of the tractor, indicates the gravitational acceleration, signifies the slope, expresses the friction coefficient, denotes the height value of the grid , and symbolizes the coordinate distance between two adjacent grids.
The relationship between the total height difference and the energy consumption component is defined as Equation (16).
During the turning process of the tractor, the turning time can be used to indicate the turning consumption of the tractor, and the turning time is directly proportional to the number of turns. Therefore, the tractor’s turning energy consumption can be indicated by the number of turns. The relationship between the number of turns and the energy consumption component can be expressed by Equation (17).
Among them, is the energy consumed in one turn, which is calculated by Equations (18) and (19).
where represents the time of one turn, indicates the voltage of the tractor, indicates the current consumed by the tractor in one turn, and represents the speed of the tractor during turning.
In summary, the total energy consumption of tractor driving can be expressed as Equation (20).
On this basis, the fitness function of the optimization model can be gained by Equation (21).
Among them, , it can be observed that the greater the value of the fitness function is, the lower the total energy consumption of the tractor will be.
5.4.3. Experimental Results and Analysis
In the experiment, the population size is 50 and the number of iterations is 600, , [43], , , , , [44], , , [45]. Each experiment of different algorithms is independently repeated 20 times. The tractor is capable of traveling only in the four directions: forward, backward, left, and right. The results of 20 repeated independent experiments are statistically presented in Table 13, and the optimal paths obtained by each algorithm are shown in Figure 18.
It can be seen from Table 13 and Figure 18. that the CCPP-TPLP algorithm proposed in this paper has achieved the best results in terms of , path length and fitness value. Particularly in terms of , the CCPP-TPLP algorithm is significantly lower than other algorithms, only 4.74%, which is reduced by 1.05~12.91% compared with other algorithms. The experimental results indicate that the CCPP-TPLP algorithm can acquire the optimal complete coverage path with the lowest energy consumption, the shortest flat driving path length, and the lowest in the CCPP problems of electric tractors.
6. Summary and Discussion
In this paper, a complete-coverage path-planning algorithm based on transition probability and learning perturbation operator (CCPP-TPLP) is proposed. Its innovation is demonstrated in two aspects: One is the greedy initialization strategy based on transition probability, and the other is the learning perturbation operation for achieving population update. The shortest distance between each pair of accessible grids in the map is computed, and the grid closer to the current grid is selected as the next path node by employing the greedy strategy, thereby generating higher-quality initial path. On the basis of fitness value ranking, the individuals are classified into different subgroups, and different learning or perturbation operations are adopted for different subgroup to accomplish the update of population. Through ablation experiments, it is confirmed that the greedy initialization strategy based on transition probability enhances the efficiency and quality of initial path planning in terms of path length and . Through the update of learning perturbation operation, a population with a lower and shorter path length can be obtained. Compare with five representative algorithms, CCPP-TPLP demonstrates obvious advantages in the path planning in various map environments. Tests are carried out on the multi-objective weighted CCPP problem of electric tractor, and the findings indicate that CCPP-TPLP could obtain the optimal complete coverage path with the lowest energy consumption, the shortest path length, and the lowest repetition rate. A future research direction lies in exploring the potential of the proposed algorithm in multi-robot cooperative path planning issues, with the aim of achieving parallel computing in a multi-map environment to simultaneously find the optimal path for multiple robots.
The reference list from the paper itself. Each links out to its DOI / PubMed record.
- 1Zhou C. Huang B. Fränti P. A review of motion planning algorithms for intelligent robots J. Intell. Manuf.20223338742410.1007/s 10845-021-01867-z · doi ↗
- 2Kegeleirs M. Garzón Ramos D. Birattari M. Random walk exploration for swarm mapping Proceedings of the Annual Conference Towards Autonomous Robotic Systems London, UK 3–5 July 2019 Springer International Publishing Cham, Switzerland 2019211222
- 3Ahuraka F. Mcnamee P. Wang Q. Ahmadabadi Z.N. Hudack J. Chaotic motion planning for mobile robots: Progress, challenges, and opportunities IEEE Access 20231113491713493910.1109/ACCESS.2023.3337371 · doi ↗
- 4Viet H.H. Dang V.H. Laskar M.N.U. Chung T. BA*: An online complete coverage algorithm for cleaning robots Appl. Intell.20133921723510.1007/s 10489-012-0406-4 · doi ↗
- 5Zheng R. Liu Z.P. Yi B. Yang Y. Adaptive spiral path planning method for additive manufacturing Comput. Integr. Manuf. Syst.20212720162022
- 6Zhou L. Wang Y. Zhang X. Yang C.Y. Complete coverage path planning of mobile robot on abandoned mine land Chin. J. Eng.20204212201228
- 7Tan X. Han L. Gong H. Wu Q. Biologically inspired complete coverage path planning algorithm based on Q-learning Sensors 202323464710.3390/s 2310464737430561 PMC 10222496 · doi ↗ · pubmed ↗
- 8Hou T. Li J. Pei X. Wang H. Liu T. A spiral coverage path planning algorithm for nonomnidirectional robots J. Field Robot.202510.1002/rob.22516 · doi ↗
