Chapter 8. What is motion planning?

With the tools developed so far, we have developed a good understanding of robots at static poses, but we have not yet developed an understanding of how they move. Later we will cover the topic of dynamics and control, which will seek theories to model and regulate the velocities, accelerations, and forces acting on the robot to generate motion. But for the moment, we can still develop a great deal of understanding about robot motion using the kinematics we have developed so far. In this section of the book, we will consider the central question of motion planning, loosely defined as follows.

Motion planning: the use of computational methods to generate a robot's motion to achieve a specified task.

Motion planning is a critical component of any autonomous system that exhibits intelligent behavior. It plays a central role in the ecosystem of components of a robot, since the inputs to a planner include the robot’s state (coming from localization), environmental conditions (coming from perception), dynamic predictive models (coming from calibration and simulation), and task specification (coming from systems engineers and human supervisors), and the output of a motion planner is a low-level motion representation (sent to the controller). In short, motion planning governs how all of the knowledge available to the system influences all actuation decisions.


Figure 1. The role of planning is to consume mission objectives, perception outputs, and prior information to generate a representation of the robot's motion. The motion will be executed by the controller.

The three inputs into a motion planner are the system state, the system model and the task specification. For example, in the context of autonomous vehicles, the system state may specify the vehicle’s state, the map around it, and the state and intentions of other agents. The system model predicts how the system state changes in the future, conditioned on the robot’s decisions. The task specification includes cost function(s) that measure the quality of candidate motions as well as constraints that the motion must meet. For example, an autonomous vehicle may wish to maximize ride comfort while also remaining collision-free, staying within a lane, etc.

In Chapters 9 and 10, we will focus mainly on the problem of kinematic motion planning (aka path planning), defined as follows:

Kinematic motion planning (aka Path planning): the use of computational methods generate a continuous path between two robot configurations while respecting joint limits and avoiding obstacles.

This is purely a geometric problem that sidesteps the complexity of timing and dynamics. The resulting path is much like a route planned on a map, which instructs you the sequence of routes and turns to take to reach the goal, but not how quickly to drive. Although this doesn't solve the motion planning problem in its entirety, the resulting algorithms are still quite useful for robot manipulators and high-level routes for mobile robots. Indeed, the techniques studied for this simplified problems are often used in practice because it is relatively straightforward to construct schemes for some robots (e.g., industrial robots and differential drive mobile robots) to track a geometric path. Also, by studying the kinematic problem we introduce computational representations, algorithms, and complexity theorems used ubiquitously in robot decision making.

Highly dynamic robots like autonomous cars and agile drones will require reasoning about dynamics upfront during the planning procedure. We will discuss how to plan for these systems in Chapter 11. Later will revisit the problem of dynamics and uncertainty in Section IV.

In the final chapter of this section we discuss specialized applications of planning to multi-agent systems, planning for manipulation and legged locomotion, and integrating motion planning with high-level reasoning.

Why is planning hard?

Because it is natural for humans to navigate their environments (at least outside of hedge mazes!) the motion planning problem at first glance may appear to be somewhat straightforward. Our brains quite naturally build mental maps of the space of possible movements when looking at obstacles around us. However, this spatial reasoning is nontrivial for robots, which must break down the navigation task into a sequence of low-level actions.

Initial attempts to solving this problem might look something like a script: move forward toward the goal until it is reached or an obstacle is encountered; if an obstacle is reached, then move left, then turn right once the obstacle is cleared, and repeat from step 1, etc. Although scripting behaviors may work with very simple obstacles, this approach becomes fraught with brittleness once obstacles become more varied or complex.

More sophisticated deliberative reasoning is needed to successfully navigate in more complex environments, like indoor navigation or outdoor terrain. In order to do so, the robot must be able to hypothesize various future states that it might be in, and actions it might take. Planning is somewhat like playing a game of chess, in that you imagine possible future states of the board as a result of various actions that you might take. This cannot be done efficiently in a naive fashion. If $n$ steps were needed to reach a goal, and $m$ actions were available at each step, then there would be $m^n$ possible future action sequences! This can rapidly become huge, even for small values of $m$ and $n$ (not to mention the fact that robots can perform a continuously infinite number of actions.) As a result, we must be more careful about how to structure the deliberative reasoning process.

Stating the motion planning problem

Let us assume the robot has configuration space (C-space) $\mathcal{C}$. A motion planning problem is specified with the following inputs:

  • A kinematic and geometric description of the robot

  • A geometric description of the robot's environment

  • Constraints on how the robot is allowed to move

  • The robot's start configuration

  • A description of the robot's goal (usually a configuration, a point in task-space, or a subset of configuration space)

  • Optionally, an optimality criterion that should be minimized or maximized (such as minimum path length, maximum clearance from obstacles, or minimum energy expended)

A feasible path is a continuous curve in C-space $y(s):[0,1]\rightarrow \mathcal{C}$ that does not violate any constraints. A solution path is a feasible path beginning at the start configuration and ending at the goal. If an optimality criterion is provided, then a solution path should also be optimal (e.g., minimize a cost function) or at least close to optimal.

There are numerous types of motion constraints that may be specified, which can be categorized by the manner in which they constrain the robot's path:

  • Local (kinematic): each point along the path must satisfy some condition, e.g., avoid collision with obstacles.

  • Differential (dynamic): the derivative along the path must satisfy some condition, e.g., bounded curvature.

  • Global: the entire path must satisfy some condition, e.g., path length must be within some bound.

When both kinematic and dynamic constraints are present, this is known as a kinodynamic planning problem.

(a) Local (kinematic) (b) Differential (dynamic) Global
fig:PlanningConstraints1 fig:PlanningConstraints1 fig:PlanningConstraints1
Figure 2. Illustrating the three classes of constraints. Local constraints include collisions with obstacles. Differential constraints limit the derivative of the robot's path. Global constraints are imposed on the entire path, e.g., the robot must return to its base before its battery is exhausted.

Point-to-point path planning

In the most basic path planning problem, which is the topic of most of the next chapters, we consider problems that are point-to-point, feasible, and kinematic. In other words, the motion constraints include only obstacle avoidance and joint limit avoidance, the robot's goal is a single configuration, and there is no optimality criterion.

In path planning we can speak of a subset of C-space that contains all configurations that do not violate any motion constraints. This is known as the free space $\mathcal{F} \subseteq \mathcal{C}$ defined such that $$\mathcal{F} = \{ q \in \mathcal{C} \,|\, q_{min} \leq q \leq q_{max} \text{ and } R(q) \cap E = \emptyset \}.$$ Here, $q_{min}$ and $q_{max}$ are the lower and upper joint limits, respectively, $R(q) \subset \mathbb{R}^2$ or $\mathbb{R}^3$ is the subset of workspace occupied by the robot, and $E$ is the subset of the workspace occupied by the environment.

The complement of the free space is known as the forbidden space $\mathcal{C} \setminus \mathcal{F}$. Our problem can be restated as finding a continuous curve $y(s)$ such that:

  • $y(s) \in \mathcal{F}$ for all $s \in [0,1]$ (feasibility)

  • $y(0) = q_I$ (initial configuration)

  • $y(1) = q_G$ (goal configuration).

It may be somewhat difficult for a computer to reason about all possible curves, since a curve is a collection of an infinite number of points. Instead, it will be necessary to represent paths in a finite number of parameters. We will describe various path representations below.

Completeness and optimality

A motion planner is an algorithm that solves a class of motion planning problems. Motion planners are characterized by the class of problems addressed, as well as their performance in solving them.

The first question to consider is that of completeness: "will the planner find a path (if one exists)?" Specifically:

Complete planner. A planner is complete if it finds a path in finite time if one exists, and it terminates with "no path" when no path exists.

We shall see in the next section that it is difficult to develop complete planners outside of relatively simple scenarios. A relaxed version of completeness is resolution-completeness, in which the planner is guaranteed to find a path if one exists as some resolution parameter $h$, say, the size of a grid cell, approaches 0. However, if the planner returns "no path" for a given value of $h$, then we cannot tell whether there truly exists no path, or $h$ was not set small enough.

Resolution-complete planner. A planner is resolution-complete if it accepts a resolution parameter $h>0$, such that if a path exists, it will find one in finite time given a value of $h$ made sufficiently small.

Another relaxed version is probabilistic completeness, in which the likelihood of failing to find a path, when one exists, approaches 0 as more time is spent planning. If such a planner returns "no path" after some amount of time, then we cannot tell whether there truly exists no path, or not enough time was spent planning to yield a high chance of finding a path.

Probabilistically-complete planner. A planner is probabilistically complete if, when a path exists, then the probability that it fails to find one decreases toward 0 as more time is spent planning.

The other property to consider that of optimality: "will the planner find an optimal path?"

Optimal planner. An optimal planner terminates in finite time and computes the optimal path, if one exists.

Hence, optimal planners must also be complete, but the converse is not necessarily true. Given the difficulty of developing complete planners, we also consider relaxed versions of optimality.

An approximately-optimal planner will terminate with a path whose cost is no more than $(1+\epsilon)$ times the optimal cost, where $\epsilon > 0$ is some constant. An asymptotically-optimal planner will, given enough time, find a path whose cost is within a factor of $(1+\epsilon)$ times the optimal cost for any specified value $\epsilon > 0$. In other words, given more time, the planner will produce progressively better paths.

Explicit vs implicit plans (paths vs policies)

The classical form of planning described above computes a single path (or declares "no path found"). The planned path is then sent to a controller (described in more detail in Section IV) which will then try to execute the plan by generating actuator commands to follow it.
Over the course of execution, the robot's state will trace out an executed path, which may or may not be close to the original plan. For some systems and situations, everything goes "according to plan" and the robot will follow the path faithfully, and predictions about the world's behavior are accurate (Fig. 3). But in many settings, trusting the plan is risky.

(a) Imperfect plan execution (b) Imperfect prediction about other agents in the world
fig:ImperfectExecution fig:ImperfectPrediction
Figure 3. A planned path may differ from an executed path. (a) The robot's execution (dotted line) does not track the plan (solid line) well, triggering a replan that requires it to back up to pass through the obstacles. (b) The robot executes its path perfectly, but its initial plan predicts that the pedestrian will continue walking (dotted line). When the pedestrian stops, the robot must detect that the plan is invalid, and take an alternate path.

We shall cover two methods for planning under imperfect execution / prediction in Chapter 11. The first idea addresses the problem at the level of the sense-plan-act loop by frequently replanning. Replanning can happen either on a fixed schedule or when the robot deviates from the plan; if it can be done fast enough, then the robot can respond to changes in the environment before a catastrophic failure.

The second idea is to modifying the planning problem so that it generates a policy, which is a sensor-based process that is designed to respond to changes in the robot's state or the environment. We will discuss policies in more detail in Chapter 11) and when we cover control in Section IV. To plan a policy, the robot must be able to anticipate when and how execution can go wrong, and determine contingency plans for all possible outcomes. Suffice to say, finding complete / optimal policies is at least as hard as finding complete / optimal paths.

Heuristic methods

As a counterpoint to the more principled approaches that we will begin to study in the next chapter and beyond, it is worth mentioning two heuristic methods that are often used in practice. By "heuristic" it is meant that these methods are often useful to achieve satisfactory performance, but offer no guarantees about optimality or success.

Behavior scripting

In behavior scripting, the control signal sent to the robot's actuators is a direct function of its input sensor signals, which is programmed as a body of procedural code; e.g., a set of arithmetic operations and if-else statements. This programmed logic becomes the body of the "planner", which does not actually explicitly plan anything. Behavior scripting has the appeal in that it provides novices with an easy pathway to start experimenting with robot behaviors. For example, to follow a black line on a sheet of white paper, two brightness sensors can be used, one to the left and the other to the right of the line. If the left sensor reads "black", turn the wheel more to the left and reduce the forward velocity. If the right sensor reads "black", turn the wheel more to the right and reduce the forward velocity. Otherwise, turn the wheel toward center, and increase the forward velocity. Properly tuned, this behavior will do a reasonable job at following lines that are not too curved at a slow speed.

A more complex behavior script can include some notion of memory that is maintained from step to step, such as a list of objects seen or past actions taken. A common form of memory is used in a state machine, where the behavior switches between discrete modes depending on the sensor input or past actions taken. In a pure state machine, the only thing that needs to be remembered is the index of the current state. State machines will be discussed in more detail in Chapter 16.

With a bit of ingenuity (and a great deal of testing and tuning), engineers can produce behavior scripts that encode sophisticated behavior. There is some degree of pride in crafting such behaviors, with the intelligence of the engineer manifesting itself in the appearance of intelligent behavior. However, the challenge of this approach is that as the problem domain becomes more complex, the complexity of the program's logic (and the corresponding difficulty of programming good behavior) tends to grow exponentially.


A straightforward way of performing motion planning is a three step process:

  1. Generate a set of $N$ paths $p_1,\ldots,p_N$, each of which starts from the robot's current state $x$.
  2. Score each path using a score function $S(p)$. Let $p_i$ be the path with minimum score.
  3. Execute the first part of $p_i$, and then repeat from step 1.

Unlike behavior scripting, this generate-and-score approach is indeed a true planner, because the Generate step predicts how the robot will behave and the Score step can perform collision checks and handle other constraints.

A huge range of behavior can be obtained by tweaking the path generation and scoring methods. As an example, a random-path generation approach simply generates $N$ paths at random, and the scoring function will assign infinite cost to any infeasible path. For feasible paths, the score will measure a combination of the length of the path and the distance from the end of the path to the goal. What we can observe with this approach is that its success is highly problem-dependent. If the robot is traveling in a wide open space or around small obstacles, then it works fairly well, although there is a little jittering.

This jittering can be reduced by applying some smoothing in postprocessing. Or, rather than generating random paths, we can use a library of motion primitives that are the same at each iteration, slightly modified to start at the robot's current state. For example, a mobile robot can move forward, move backward, turn right, turn left, or perform swerving maneuvers left and right.


Figure 4. A generate-and-score approach for a mobile robot may use a set of local motion primitives, defined in the robot's local frame (egocentric coordinates). The scoring function will check collisions with each primitive, and will choose to execute the primitive that makes the most progress toward the target (restricted to its collision-free subpath).

This approach can be very computationally light, and even effective at solving some simpler problems. However, because the generated paths usually do not reach all the way to the goal, this approach assumes that the problem requires minimal amounts of long-range deliberation. For example, consider the problem where the robot must exit a room before making progress to the goal. Because the robot must sacrifice progress before making progress, it will be incredibly unlikely to find a path out of the first room, either using random paths or motion primitives. This is known as a local minimum, which will be a reoccurring issue throughout robotics.

Representations of paths

A path is a continuous curve

$$y : [0,1] \rightarrow \mathcal{C}$$

interpolating between initial and goal configurations $q_I = y(0)$ and $q_G = y(1)$. There are as many ways to represent paths as there are ways to encode functions, but in robotics we typically use a handful of representations. The most common variants are described here.

Piecewise-linear paths

For most of the motion planning chapters, we assume that a path is represented as a piecewise-linear curve, or for non-Euclidean configuration spaces, a piecewise-geodesic curve in C-space.

This representation is a set of $n$ connected line segments, simply given a list of $n$ configurations $q_0,\ldots,q_n$ satisfying $q_0 = q_I$ and $q_n = q_G$. These configurations are often known as milestones or waypoints. The interpolating curve is given by

$$y(s) = interp(q_i,q_{i+1},sn-i)$$

where $i = \lfloor sn\rfloor$ is the zero-based index of the segment addressed by the parameter $s$.

This representation is quite versatile, as all paths can be approximated arbitrarily well by a piecewise linear path with sufficiently large $n$. Also, path evaluation takes $O(1)$ time. However, derivatives at each milestone are undefined, which means potentially jerky motions and difficulty handling system dynamics.

In [2]:
#Code for the below figures on piecewise linear curves↔


Another common representation is a degree $d$ polynomial

$$ y(s) = poly(\mathbf{c}_0,\ldots,\mathbf{c}_d;s) = \sum_{i=0}^d \mathbf{c}_i s^i $$

where $\mathbf{c}_0,\ldots,\mathbf{c}_d \in \mathcal{C}$ are the polynomial coefficients. This is a parametric representation in which each component of the motion is represented by an independent 1D function. Hence, in 2D the y component of a curve may not be, in general, a proper function of x, even though both x and y are proper functions of u.

In [4]:
#Code for the figure below, showing the parametric nature of multivariate polynomial curves↔

A polynomial is infinitely smooth, but in order to represent complex paths, the polynomial must be of high degree. High degree polynomials often lead to numerical instability, in which small changes to the highest degree terms cause the curve to change wildly. The coefficients are also difficult to choose, as they can be fairly unintuitive.


Splines are a cross between polynomials and piecewise linear curves that gives the "best of both worlds" in terms of the flexibility of piecewise linear curves but also the smoothness qualities of polynomials. They consist of a sequence of piecewise polynomials, all of low degree. The representation of the path is given by a set of control points which control the shape of the curve. From these points, polynomial coefficients are derived in order to enforce continuity.

The general form of a spline with $n$ segments and degree $d$ is given as follows

$$y(s) = poly(C_i;sn-i)$$

where $i=\lfloor sn \rfloor$ is the segment index and $C_i=(\mathbf{c}_{0,i},...,\mathbf{c}_{d,i})$ is the set of polynomial coefficients for the $i$th segment. We do not typically store the polynomial coefficients themselves, but instead represent a list of control points $\mathbf{p}_1,\ldots,\mathbf{p}_k$ from which the coefficients of each polynomial are derived. This is because it is difficult to set up coefficients manually such that the spline is continuous and smooth. Modifying control points is far easier and more intuitive. The method for converting control points to coefficients is the interpolation scheme.

Typical splines are degree 3 or 5, which are known as cubic or quintic splines, respectively. (Degree 1 splines are actually piecewise linear curves.) These degrees are popular because they allow control point interpolation schemes to dictate the order of the spline's smoothness (derivative continuity).

Cubic Hermite splines

The simplest spline is the cubic Hermite spline, which defines the control points to include milestones and the derivative of the spline at each milestone. The Hermite interpolation scheme solves for the polynomials to enforce continuity and derivative continuity (C1 continuity).

In [7]:
# Code for the figure below, showing a parametric cubic Hermite curve↔

Specifically, suppose the segment travels from milestone $\mathbf{m}_0$ to $\mathbf{m}_1$ as the interpolation parameter travels from 0 to 1. We wish to find a cubic polynomial segment that respects the initial tangent $\mathbf{t}_0$ and the terminal tangent $\mathbf{t}_1$. In other words, the terminal conditions are

$$\begin{split} \mathbf{m}_0 &= poly(C_i;0) \\ \mathbf{m}_1 &= poly(C_i;1) \\ \mathbf{t}_0 &= \left. \frac{d}{du} poly(C_i;u) \right|_{u=0} \\ \mathbf{t}_1 &= \left. \frac{d}{du} poly(C_i;u) \right|_{u=1} \end{split} $$

Letting the coefficients $C_i =(\mathbf{a},\mathbf{b},\mathbf{c},\mathbf{t})$, we have

$$ poly(C_i;u) = \mathbf{a} + u \mathbf{b} + u^2 \mathbf{c} + u^3 \mathbf{t} $$$$ \frac{d}{du} poly(C_i;u) = \mathbf{b} + 2 u \mathbf{c} + 3 u^2 \mathbf{t} $$

so immediately, we see that $\mathbf{a} = \mathbf{m}_0$ and $\mathbf{b} = \mathbf{t}_0$. We now need to solve for $\mathbf{c}$ and $\mathbf{t}$ such that the linear equations

$$\mathbf{m}_1 = \mathbf{m}_0 + \mathbf{t}_0 + \mathbf{c} + \mathbf{t}$$


$$\mathbf{t}_1 = \mathbf{t}_0 + 2 \mathbf{c} + 3 \mathbf{t}$$

hold. To do so we rearrange the above into the matrix equation

$$\begin{bmatrix}\mathbf{m}_1 - \mathbf{m}_0 - \mathbf{t}_0 \\ \mathbf{t}_1 - \mathbf{t}_{0} \end{bmatrix} = \begin{bmatrix} 1 & 1 \\ 2 & 3 \end{bmatrix} \begin{bmatrix} \mathbf{c} \\ \mathbf{t} \end{bmatrix} $$

Which has the solution via 2x2 matrix inversion

$$\begin{bmatrix} \mathbf{c} \\ \mathbf{t} \end{bmatrix} = \begin{bmatrix} 1 & 1 \\ 2 & 3 \end{bmatrix}^{-1} \begin{bmatrix}\mathbf{m}_1 - \mathbf{m}_0 - \mathbf{t}_0 \\ \mathbf{t}_1 - \mathbf{t}_{0} \end{bmatrix} = \begin{bmatrix} 3 & -1 \\ -2 & 1 \end{bmatrix} \begin{bmatrix}\mathbf{m}_1 - \mathbf{m}_0 - \mathbf{t}_0 \\ \mathbf{t}_1 - \mathbf{t}_{0} \end{bmatrix} . $$

(Check that the inverse is correct by inspection.) In other words, $\mathbf{c} = - 3\mathbf{m}_0 + 3 \mathbf{m}_1 - 2\mathbf{t}_0 - \mathbf{t}_1$ and $\mathbf{t} = 2\mathbf{m}_0 -2 \mathbf{m}_1 + \mathbf{t}_0 + \mathbf{t}_1$.

Another convenient way of expressing the segment is:

$$poly(C_i;u) = (2u^3 - 3u^2 + 1) \mathbf{m}_0 + (-2u^3 + 3u^3) \mathbf{m}_1 + (u^3 - 2u^2 + u)\mathbf{t}_0 + (u^3 - u^2) \mathbf{t}_1.$$

Quintic Hermite splines

The quintic Hermite spline defines control points to include milestones and their 1st and second derivatives. The interpolation scheme enforce continuity of first and second derivatives (C2 continuity).


  1. There is a crucial difference between planning to exit a maze and actually escaping a maze. What is it? (Hint: consider being blindfolded, kidnapped, and then dropped in the middle of a maze. The blindfold is now removed. How does this differ from the information available to a planner?)
  2. Consider a coffee delivery robot, whose goal is to move a coffee cup from a counter to a patron's table. Does this problem require deliberative planning, or could it be solved using a behavior scripting technique? What assumptions could you build in about your environment to help a behavior script? If those assumptions were not built in, that is, the robot is expected to act in a bustling coffee shop, how much forward prediction does it need to perform?
  3. Earlier, we said that if a planner is asymptotically optimal, then "given more time, it will produce progressively better paths." While this statement is true, is the converse also true? Is there a case in which a planner could produce progressively better paths, but not satisfy the conditions of asymptotically optimality?
In [ ]: