Basic Pathfinding Explained With Python

How do we find a way through a maze? What’s the shortest drive from our place to the nearest pizzeria? Can we move the game character to the exit without walking through a wall?

Pathfinding is a common programming challenge with a wide range of uses. We know it mainly from navigation and games. However, once you know the core algorithms, you'll find that they apply to more abstract optimization and sequencing problems.

In this tutorial, we'll look at a basic pathfinding algorithm, based on Dijkstra's algorithm. Also known as a best-first search algorithm, the core logic is shared with many algorithms, such as A*, flood filling, and Voronoi diagrams.

Here, we consider a practical application. You only need basic programming and Python knowledge to follow along.

Python Setup

The code for this tutorial is located in the path-finding repository. You should clone that repository and switch to the tutorial_1 branch. Also install the pygame package, which is required for the graphics.

python3 -m pip install -U pygame
git clone git@github.com:mortoray/path-finding.git
cd path-finding
git checkout path_finding_1

To verify you're set up correctly:

python3 find-basic.py

You should see a window with boxes and numbers in it. You can close this window now. We'll get back to it later.

The Maze

Pathfinding is about getting from location A to location B. This might be a person walking through a park, a car driving through a city, or a game character tracking the player. Let's reduce all these environments to an abstract and call it a maze.

In this tutorial, we’ll create an Euclidean maze, which is a two-dimensional grid of cells. In the maze, we can only move in four directions to the immediately neighboring cells. We will need to navigate from a start point to an end point.

In the diagram, the starting point is marked with "0" and a yellow box. All valid moves are marked with a "1". We must navigate to the green box, which is our destination.

Starting the code

The code provides a function that creates this basic maze for us. We can get into how we can use path-finding to generate more exciting mazes in a future article, but for now, let’s call the create_wall_maze function.

import mortoray_path_finding as mpf

maze = mpf.create_wall_maze( 20, 12 )

We’ve created a maze of size 20x12. The maze.board field is a grid of Cell objects. We need a way to display our maze.

finder = mpf.Finder()
finder.set_board(maze.board)
finder.run()

Source File: tutorial_1_1.py

The Finder is a utility that displays the maze for us. The gray squares are open, meaning a path can go through. The brown squares are walls, which means a path cannot go through. Each cell also has a zero in it, this is the Cell.count value for that position. We'll use that to do pathfinding.

Measuring distance

A lot of path-finding comes from Dijkstra's original algorithm. There are many variations, such as Floyd Warshall’s algorithm or B*. They share a similar approach by using lists of nodes and distance counts. Various techniques can be mixed to address a variety of situations.

The algorithm we’re currently looking at works by measuring the distance from a starting location. As we visit cells, we'll track how many steps it took to get there, with each iteration of the algorithm calculating the distance. Eventually, we'll either find the distance to the destination cell, or have measured the distance of all cells in the grid.

Below is the general process. Each step is interdependent with the others, so it can be hard to understand each step on their own. You may have to refer back to this list a few times when reading before seeing how all the steps fit together.

Get the node with the lowest distance from the open node list
Calculate the distance to each neighboring node
If the neighbor has a lower distance, add it to the open node list

Terms

Let’s look at a few of the terms first, as they may be new to you.

A “node” is a generic term that applies to all graph types. These can be the intersections of roads on a map, or in our case, the cells of a maze. They have a location and a path to nearby nodes. They are also called “vertex” in graph theory.

The nodes connected to each other by a path are neighbors. The path between neighbors has a distance, which is called a “cost” in more generic terms. The path is called an “edge” in graph theory.

An “open node list” is literally a list of nodes. The “open” means the algorithm still needs to check them. For example, you’re looking for christmas decorations in boxes in your basement, you start with a list of all the boxes and as you look in each box, you tick it off the list. If you find more boxes inside the larger boxes, you can add them to the list.

Algorithm

The algorithm implemented in the function is called fill_shortest_path. It's helpful to have that code open while reading this explanation. This function doesn't directly find the shortest path, but rather, measures the distance from a starting location to other cells in the maze. We'll see how this information is used to generate the path later.

def fill_shortest_path(board, start, end, max_distance = math.inf):
  nboard = board.clone()
  nboard.clear_count(math.inf)

The "open node list" is a vector of positions in the grid. It contains all the locations we need to search for a path. We’ll initialize the list with the maze's starting location. We’ll also set the count to 0, since the start is a distance of zero.

nboard.at( start ).count = 0
    
open_list = [ start ]

That's enough to start the algorithm. We loop until the open_list is empty.

  while open_list:
    	cur_pos = open_list.pop(0)
    	cur_cell = nboard.at( cur_pos )

Wait, doesn't the algorithm say to take the node with the lowest distance? This code simply takes the next node on the list. The algorithm optimizes the node with the lowest distance node. It works even if you take a random node, even if it has a much higher time complexity. In our code it makes no difference, as the next node happens to be the one with the lowest distance. We'll see why that is later.

For each node, we look at all of its neighbors.

# (x,y) offsets from current cell
neighbours = [ [-1,0], [1,0], [0,-1], [0,1] ]
for neighbour in neighbours:
  ncell_pos = mpf.add_point(cur_pos, neighbour)
  if not nboard.is_valid_point(ncell_pos):
continue
  
  cell = nboard.at( ncell_pos )

Each item in neighbors is the Euclidean offset from the current cell to a neighbor. For example, [-1,0] is the neighbor to the left. If the cur_pos is [ 5, 7 ] then adding [-1, 0] to it yields [4, 7], which is one to the left. Likewise, [1,0] is moving in the positive x direction, which is one cell to the right. [0,-1] affects only y and is one position up, while [0,1] is one below.

By using a numeric abstraction of the directions, we can handle everything the same way in the algorithm. We can also add more directions, such as [1,1], which moves diagonally up and right. We must take the edges of the grid into consideration though, which is what is_valid_point does. For example, if we’re at the right edge of the grid, then the offset [1,0], which moves one to the right, is no longer on the graph, so we’ll skip that.

We’ll also skip any cells that isn’t empty, as these are the walls in the maze and we can't walk through them.

if cell.type != mpf.CellType.Empty:
  continue

The distance calculation comes next.

dist = cur_cell.count + 1

As we are always moving in a straight line, one cell away, you'll see references of the "Manhattan distance," which is the distance between two points when you’re only allowed to move in either x or y, and never both at the same time. The name comes from the city streets of its namesake, which are arranged in a grid-like fashion. You can't walk diagonally through buildings but are forced to stick to the streets. The Manhattan distance applies to Euclidean geometry, like the grid we have.

If this distance is less than the one from its neighbor, we’ll update the neighbor and add it to the open list.

if cell.count > dist:
  cell.count = dist
  cell.path_from = cur_cell
  open_list.append(ncell_pos)

We use the generic cell.count field to measure the distance, and will also update the path_from field, which indicates which path we took to get here. We'll talk about this later, as it's not technically of interest to the algorithm right now.

We said before, that the first node in the open_list is always the one with the lowest count value. It should be possible to see why that is now as each neighbor is exactly 1 away. The starting node has a count of 0, so we add several nodes with 1 to the open list. Now, because we process these in order, we're to add several 2s to the list next, until all the 1s are consumed. That leaves us with only 2s in the list. We’ll iterate over these and append 3s to the list.

This simple effect ensures an efficient solution. But as we expand on distance calculations and add heuristics, we will not be able to rely on this. This will be discussed in a future article. Don't worry though, the core algorithm doesn't change much. We end up using a priority queue, rather than a vector for open_list.

Visualize it

Let's take a moment to visualize this. We'll call fill_shortest_path from the distributed code.

import mortoray_path_finding as mpf
    
maze = mpf.maze.create_wall_maze( 20, 12 )
filled = mpf.tutorial_1.fill_shortest_path(maze.board, maze.start, maze.end)

finder = mpf.draw.Finder()
finder.set_board(filled)
finder.run()

Source File: tutorial_1_2.py

You'll see a window like below. The start cell is outlined in yellow and has a zero in it. The numbers increase, in increments of one, away from that spot. All cells in the grid are labeled with their Manhatten distance from the starting point. Locate the destination, which is outlined in green, and see how far away it is from the start.

Getting the path

We don't have the shortest path yet, but there are a couple of ways to get this.

We can find a path back to the start from the destination node by scanning the neighbors and picking the one with the lowest number. From that node, repeat the process until you get to the start. As the numbers are the distance to the start, following the lowest number is the shortest path back.

There might multiple options at several points. For example, you might be at 10 and have two 9's beside it. There are many equally valid paths.

An alternate approach is tracking the path as we fill in the distances. This is the code we ignored before.

if cell.count > dist:
  cell.count = dist
  cell.path_from = cur_cell
  open_list.append(ncell_pos)

Whenever we add a node to the open list, we also set cell.path_from. This keeps a record of the shortest incoming path, which makes backtracking from the destination easy.

def backtrack_to_start(board, end):
  """ Returns the path to the end, assuming the board has been filled in via fill_shortest_path """
  cell = board.at( end )
  path = []
  while cell != None:
    	path.append(cell)
    	cell = cell.path_from
   	 
  return path

We can add a call to this function in our main code so the window will display the path from start to finish.

import mortoray_path_finding as mpf
    
maze = mpf.maze.create_wall_maze( 20, 12 )
filled = mpf.tutorial_1.fill_shortest_path(maze.board, maze.start, maze.end)
path = mpf.tutorial_1.backtrack_to_start(filled, maze.end)

finder = mpf.draw.Finder()
finder.set_board(filled)
finder.set_path(path)
finder.run()

Source File: tutorial_1_3.py

A curious point here: the fill_shortest_path function calculates the distance from the start for every cell, not only the destination. We can use the same backtracking code to find the shortest path to any of the nodes. You'll find that many uses of pathfinding benefit from having this complete knowledge. However, if optimization was our priority, we'd adapt the algorithm to stop looking once it finds the destination. There's also an algorithm called A* that uses a heuristic to avoid scanning the entire map.

Conclusion

In this tutorial, we looked at how to find a path through a basic two-dimensional maze. The core algorithm tracks an open node list, measuring the distance to neighbors and updating shorter routes. This core logic is a flexible search algorithm. It is adaptable to many related problems, such as heuristic pathfinding, flood filling, and Voronoi diagrams.