Chapter 2. Anatomy of a Robot

A robot can be thought of as a computer on wheels: it is distinguished from a traditional computer by its physical, movable "body", and is distinguished from a traditional machine by its computational "brain."

The body of a robot moves in some 2D or 3D space by applying forces to its own body or to its environment. The body can take a wide range of forms, from a ground vehicle with wheels to a complex humanoid robot. Most robots are made of metal or some other rigid material, with rigid sections known as links connected by movable joints at their articulation points.

Link. A rigid component of a robot, often attached to other links via joints.

Joint. An articulation point between two robot links.

The "brain" of a robot consists of one or more computers running software that processes inputs from its sensors and computes outputs that drive the robot's actuators. Communication between the processor, sensors, and actuators is accomplished via wires or sometimes wireless communication.

Each of these components is analogous to some component of a biological system. Links are analogous to bones; joints are analogous to, well, joints; sensors are analogous to sensory organs; actuators are analogous to muscles; and wires are analogous to nerves. Just as there are millions of biological species with diverse, sometimes exotic external and internal structures, the range of possible robot designs is endless.

(a) Pioneer mobile robot

  • 1 link (body), 0 joints

  • Actuated by wheels (differential drive)

  • Sensors: SICK laser, sonar, wheel encoders

(b) Honda Asimo humanoid robot

  • 72 links (incl. fingers), 26 joints

  • Actuated by electrical motors + speaker

  • Sensors: stereo cameras, joint encoders, gyroscope, inertial measurement unit, foot pressure sensors

Figure 1. Two robots with very different "anatomy".

The capabilities of robots have been changing fluidly over the years, as researchers invent new materials, fabrication strategies, circuits, computers, sensors, and actuation devices. Indeed, keeping up with the state-of-the-art can be exhausting!

Like the bones of an animal, a robot's links offer structural support and protection of internal parts. Most often, some sort of metal is used to provide strong structural support, although some robots use other materials like wood, plastic, or composites. Surfaces that are meant to interact with the environment should be shaped appropriately for the desired task, and may be coated in rubber or another high-friction material to ensure consistency of contact.

Joints in robots typically allow either rotation or translation. Rotation usually occurs about a single hinge-joint axis. Ball-and-socket joints, like those in the human hip or shoulder, are much rarer due to the difficulty of fabrication, lubrication, and actuation. Translational joints are fairly common in the form of linear slides used in gantries and Cartesian positioning stages, and in pistons used in many hydraulic and pneumatic robots. To ensure precise movement and durable operation, a joint should be constructed from high-strength materials using high-precision manufacturing. Friction along joints is usually reduced using lubrication, bearings, or bushings.

The links and joints of a robot also make up its outward appearance, and for some applications, significant effort is devoted to designing a pleasing appearance. To an engineer this may seem like a frivolous concern, but the success of a consumer product often rests on the appeal of its design! Robots that are designed for direct human interaction, such as haptic devices or exoskeletons, should also be designed with ergonomic concerns in mind.

It should be noted that although links are mostly rigid, all materials do flex and resonate to some extent. This can be problematic for high-precision applications. Flexibility can also be exploited for certain applications, such as impact-tolerant end-effectors. Soft rubber is often used in robot fingertips to encourage a larger surface area to be in contact. Moreover, some researchers have developed soft robots whose bodies are designed explicitly to be flexible, like octopuses or elephant trunks. For flexible links and soft robots, the traditional link-and-joint modeling framework that we will cover in depth in this book does not directly apply. Instead, more complex continuum mechanics models should be used to analyze and predict their behavior.


Sensors are devices that measure some physical quantities of the world and encode the signals into electrical (typically digital) form. There are many kinds of sensors, and more are being invented every year. Common sensor types in robotics are given below.

  • Proprioceptive. Examples: joint encoders (relative or absolute), joint limit contact switches. Purpose: measure the position of a link relative to another.

  • Inertial. Examples: gyro, accelerometer, IMU. Purpose: measure the acceleration, velocity, or position of a link.

  • Tactile. Examples: contact switch, force/torque sensor (load cell), scale, pressure sensor, tactile array. Purpose: detect external forces on the robot's body.

  • Visual. Examples: monocular camera, stereo camera, laser rangefinder, depth sensor. Purpose: measure light intensity from the robot's surroundings. Some sensors estimate distance from the device to objects in the world via triangulation or time-of-flight.

  • Other. Examples: GPS, sonar, temperature (especially motor temperature), gas sensors, battery voltage.
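To make the two ranging principles mentioned above concrete, here is a minimal Python sketch. It assumes an idealized time-of-flight sensor (a pulse travels to the object and back at a known wave speed) and an idealized, rectified stereo pair (depth follows from the disparity between the two images). The function names and parameters are illustrative and not taken from any particular sensor driver.

```python
SPEED_OF_LIGHT = 299_792_458.0  # m/s, used for laser time-of-flight


def time_of_flight_range(round_trip_seconds, wave_speed=SPEED_OF_LIGHT):
    """Distance to an object from the round-trip time of an emitted pulse.

    The pulse covers the distance twice (out and back), hence the factor 0.5.
    Pass wave_speed=343.0 (approximate speed of sound in air) for sonar.
    """
    return 0.5 * wave_speed * round_trip_seconds


def stereo_depth(focal_length_px, baseline_m, disparity_px):
    """Depth from triangulation with an idealized, rectified stereo pair.

    disparity_px is the horizontal shift of the same point between the
    left and right images; zero disparity means the point is at infinity.
    """
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a finite depth")
    return focal_length_px * baseline_m / disparity_px


# A sonar echo that returns after 10 ms, and a stereo match with 25 px disparity.
print(time_of_flight_range(0.01, wave_speed=343.0))
print(stereo_depth(focal_length_px=500.0, baseline_m=0.1, disparity_px=25.0))
```

Real sensors add noise, calibration offsets, and outliers on top of these idealized relations, which is part of why perception algorithms are needed.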

Most of the time, sensors are mounted to a link of the robot, and move while the robot moves. The raw data produced by the sensor is often not particularly useful in itself (e.g., cameras will give you pixel colors rather than object identities), and perception algorithms must process this data into meaningful quantities. It is important to build accurate models of the physics behind the sensor (and methods for "inverting" these physics) in order to help a robot translate sensor signals into an understanding of the physical world.

The position and orientation of the sensor on the link is extremely important to be able to make sense of sensor data. A laser sensor may be saying that an obstacle is 1m away, but is the obstacle ahead, behind, to the left, or above the robot? How close is too close before the robot ends up colliding? We will look into this issue further when we discuss calibration.
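As a small illustration of why the mounting pose matters, the following Python sketch converts a planar range reading into a point expressed in the robot's frame. It assumes a 2D world, a robot frame with x forward and y to the left, and a known mounting pose (mount_x, mount_y, mount_yaw) of the sensor. All names here are hypothetical, chosen only for this example.

```python
import math


def sensor_to_robot(range_m, bearing_rad, mount_x, mount_y, mount_yaw):
    """Convert a planar range/bearing reading into a point in the robot frame.

    (mount_x, mount_y, mount_yaw) is the assumed pose of the sensor in the
    robot frame: x forward, y left, yaw measured counterclockwise.
    """
    # Point as seen in the sensor's own frame.
    xs = range_m * math.cos(bearing_rad)
    ys = range_m * math.sin(bearing_rad)
    # Rotate by the mounting yaw, then translate by the mounting offset.
    xr = mount_x + math.cos(mount_yaw) * xs - math.sin(mount_yaw) * ys
    yr = mount_y + math.sin(mount_yaw) * xs + math.cos(mount_yaw) * ys
    return xr, yr


# The same "1 m away" reading means different things for different mounts.
front = sensor_to_robot(1.0, 0.0, mount_x=0.2, mount_y=0.0, mount_yaw=0.0)
left = sensor_to_robot(1.0, 0.0, mount_x=0.0, mount_y=0.1, mount_yaw=math.pi / 2)
print("forward-facing sensor:", front)
print("left-facing sensor:", left)
```

With the forward-facing mount the obstacle lies ahead of the robot, while the identical reading from the left-facing mount places it off to the robot's side. An error in the assumed mounting pose shifts every such point, which is one reason calibration matters.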

Actuators and Transmissions

An actuator is a mechanism that generates a force / torque given electrical signals; a transmission is a mechanism that applies an actuator's forces / torques to the robot's links. Most commonly on industrial robots, actuators are electric motors while the transmissions are some sort of gearing system (e.g., gearbox). On mobile robots, the transmissions from actuators to the ground are actually the wheels (or tracks). It is also common to think of (and purchase) the actuator and transmission as a unit, such as with servomotors.

Common actuator types include:

  • Electric motors: stepper motors, brushed / brushless DC motors, AC motors.

  • Pneumatic actuators. High variation in size and strength.

  • Hydraulic actuators. Very strong.

  • Chemical (combustion) actuators: internal combustion engines in automobiles and prop planes; turbines in jet planes.

  • Piezoelectric actuators.

Actuators can be rotary (revolving around an axis) or linear (translating along an axis). Each type and model of actuator has many subtle differences that should be taken into account when fine-tuning a robot's design, e.g., in the power source, power / weight ratio, control bandwidth, maximum speed, stroke length, continuous / peak torque, complexity of electronics and maintenance, etc. Most often, a roboticist will choose between electric, pneumatic, and hydraulic actuators. Electric motors come in a plethora of models with high variation in size, strength, and speed, and are convenient to use. Pneumatic actuators have a higher power-to-weight ratio, and lightweight pistons can be placed on extremities far from heavy pumps using air lines, but they require maintaining complex pump / valve systems. Hydraulic actuators are stronger and heavier, and have similar drawbacks.

There are also a variety of problems that arise when an actuator is pushed beyond its safe operating limits. Electric motors burn out when the heat produced by the motor cannot be dissipated quickly enough and burns through the insulation on the motor's wiring. Pneumatic and hydraulic systems can blow seals when the internal pressure overcomes the strength of the seals and gaskets on the pneumatic / hydraulic lines or cylinders. In hydraulic systems, leaks can be quite messy!

Transmission systems can be quite diverse. Common forms of transmission include:

  • Gearing: gearboxes, lead screws, harmonic drives. Can convert rotary to linear motion and vice versa. Must be located directly at the joint.

  • Cables: like tendons in biological systems, cables can connect a motor to a far-off link. Usually outputs linear motion, and can only pull.

  • Wheels / tracks: used for mobile robots. Can be arranged in differential drive, skid-steer, and Ackermann steering arrangements. Omnidirectional (Mecanum) wheels are available on some platforms.

  • Pneumatic / Hydraulic lines: connect a pump / valve system to a pneumatic / hydraulic cylinder.

  • 4-bar linkages: rigid bars connected by joints, used to convert rotary to linear motion and vice versa.

  • Flywheels: use changes of angular momentum to control orientation, used primarily in space robotics.

  • Propellers, ailerons, and control surfaces: deflect a surrounding gaseous / liquid medium to propel the robot, used primarily for aerial and water vehicles.

  • Thrusters: eject mass at high speed to exert a reaction force according to Newton's third law, used primarily in space robotics.

  • Feet, hands, wings, etc: bioinspired methods to touch the environment / deflect the surrounding medium to propel the robot.
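To illustrate how wheels act as a transmission, here is a minimal Python sketch of the differential drive arrangement mentioned in the list above. It uses the standard idealized kinematic relation, assuming two wheels of equal radius that roll without slipping. The function name and the parameters wheel_radius and track_width (the distance between the two wheels) are illustrative.

```python
def differential_drive_velocity(omega_left, omega_right, wheel_radius, track_width):
    """Map wheel angular speeds (rad/s) to body forward speed and turn rate.

    Assumes ideal rolling contact (no slip) and equal wheel radii.
    Returns (forward speed in m/s, turn rate in rad/s, counterclockwise positive).
    """
    v_left = wheel_radius * omega_left    # ground speed of the left wheel
    v_right = wheel_radius * omega_right  # ground speed of the right wheel
    forward = 0.5 * (v_left + v_right)
    turn_rate = (v_right - v_left) / track_width
    return forward, turn_rate


# Equal wheel speeds drive straight; opposite speeds spin in place.
print(differential_drive_velocity(10.0, 10.0, wheel_radius=0.1, track_width=0.4))
print(differential_drive_velocity(-10.0, 10.0, wheel_radius=0.1, track_width=0.4))
```

On a real robot, wheel slip and uneven terrain make the true motion deviate from this model.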

When designing or choosing a transmission system, a designer must be cognizant of the:

  • Gear reduction: force multiplier from input to output.

  • Backlash: a dead-band in which force is not productively generated when the direction of motion is switched.

  • Internal friction: losses of motive force.

  • Backdrivability: the ability for forces on the output shaft to be reflected on the input shaft. High backdrivability implies low backlash and low internal friction.

  • Longevity and durability: cheap gear teeth may wear down, materials in cables may exhibit creep (stretch), and tires wear down over time. Transmission systems are often the first parts of robots to break!
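The trade-off behind gear reduction can be sketched numerically. The Python snippet below assumes an idealized gearbox in which a reduction of N:1 multiplies torque by N and divides speed by N, and lumps internal friction into a single efficiency factor between 0 and 1. That lumped efficiency is a simplifying assumption, and the function name is hypothetical.

```python
def geared_output(motor_torque, motor_speed, reduction, efficiency=1.0):
    """Output torque and speed of an idealized gearbox.

    reduction:  gear ratio N for an N:1 reduction (N > 1 trades speed for torque).
    efficiency: assumed fraction of torque that survives internal friction.
    """
    output_torque = motor_torque * reduction * efficiency
    output_speed = motor_speed / reduction
    return output_torque, output_speed


# A small motor (0.5 N·m at 300 rad/s) behind a 100:1 gearbox at 80% efficiency.
torque, speed = geared_output(0.5, 300.0, reduction=100.0, efficiency=0.8)
print(torque, speed)
```

High reductions make a small motor strong but slow. In practice they also tend to amplify the friction and inertia felt at the output, which reduces backdrivability.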

Cognitive Architectures

The title of this section is a bit misleading: robots don't perform "cognition" as humans and animals do. However, they do indeed need to process sensor information and make decisions about how to act. This processing is accomplished via one or more computers, rather than biological brains. The nature of a robot's computing hardware is relatively unimportant to its behavior, as long as there are sufficient computing resources available. Instead, what has greater impact is the form and function of information processing elements, which is referred to as a cognitive architecture.

(The terminology used in this chapter is highly influenced by Chapter 2 of Stuart Russell and Peter Norvig's excellent textbook Artificial Intelligence: A Modern Approach.)

Sense-plan-act framework

A cognitive architecture resides in the robot's mind, which controls the robot's body, which interacts with the outside world. The body and external world form the information processing environment of the mind. (Note that the mind cannot alter the outside world directly: in other words, no telepathy is allowed!) The role of a robot's mind is to receive sensor information (sense), do some processing (plan), and output commands to the body's actuators (act). This perspective is also known as the sense-plan-act framework, and is illustrated in Fig. 2.


Figure 2. The sense-plan-act framework structures how the robot's "mind" processes information from its sensors (Sense), calculates an action (Plan), and then performs the action using its body (Act).
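The sense-plan-act cycle can be sketched as a simple loop. The Python example below is a toy illustration under stated assumptions: the "world" is a single number (the robot's position on a line), the sensor reads it exactly, and the actuator sets a velocity for one short time step. The names ToyWorld, plan_towards_origin, and run_sense_plan_act are invented for this sketch.

```python
def run_sense_plan_act(sense, plan, act, steps):
    """Repeat the sense-plan-act cycle a fixed number of times."""
    for _ in range(steps):
        observation = sense()        # Sense: read the sensors
        command = plan(observation)  # Plan: decide what to do
        act(command)                 # Act: send the command to the actuators


class ToyWorld:
    """Body and environment: a robot on a line, commanded by velocity."""

    def __init__(self, position):
        self.position = position

    def sense(self):
        return self.position

    def act(self, velocity, dt=0.1):
        self.position += velocity * dt


def plan_towards_origin(position, gain=1.0):
    """The 'mind': it sees only the observation and returns only a command."""
    return -gain * position


world = ToyWorld(position=2.0)
run_sense_plan_act(world.sense, plan_towards_origin, world.act, steps=50)
print(world.position)  # close to 0 after 50 cycles
```

Note that the planning function never touches the world's state directly. Its only channels are the observation it receives and the command it returns, which mirrors the "no telepathy" rule.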

Imagining one's brain severed from one's body at the spinal cord, the sense-plan-act framework views the brain's role as processing the input and output that pass through the nerves at the spinal cord. The brain is a highly developed organ that processes information, makes decisions, learns, and adapts in such a way as to generate the infinite richness of human behavior. When we program a robot's software, we likewise control the contents and functions of its mind. The question is, how do we construct a mind so that the robot behaves intelligently (or at least, usefully)?

The notion of "useful task" is defined external to the robot itself, and is selected by the robot's designer. We will discuss techniques and issues with measuring performance in later chapters on system engineering; for now we shall assume there is some existing quantifiable performance metric, like tracking a curve as closely as possible, or having a low risk of colliding with pedestrians, etc.

First, we will consider a simple reflexive architecture, in which the inputs are converted to outputs via a set of fixed operations (rules). Examples of this architecture include servomotors, automatic doors, some factory automation components, and other simple electromechanical devices. The input processing can be sophisticated, such as those found in camera autofocus systems. We will cover other sensor processing methods in Section V. Often, simple reflexive machines are not even thought of as "robots" because they operate only on the data that is immediately provided to them, and do not derive any richer understanding.
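A simple reflexive architecture amounts to a stateless function from current inputs to outputs. As a minimal sketch, consider a hypothetical automatic door controller in Python. The inputs, outputs, and rules below are invented for illustration and do not describe any real product.

```python
def automatic_door_rule(motion_detected, door_fully_open):
    """Fixed rules mapping the current sensor readings to a door command.

    No memory and no model: the same inputs always give the same output.
    """
    if motion_detected and not door_fully_open:
        return "open"
    if motion_detected:
        return "hold"
    return "close"


for reading in [(False, False), (True, False), (True, True), (False, True)]:
    print(reading, "->", automatic_door_rule(*reading))
```

Because the controller keeps no state, it cannot, for example, remember that someone walked into the doorway a moment ago if the motion sensor no longer sees them.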

In the model-based reflexive architecture, the robot estimates the current state of its environment using history. State represents an understanding of the robot's body and world, which may include information outside of what is currently observed. To implement such an architecture, the robot needs both memory of previous state(s) and a model of how its environment changes: that is, how the world behaves on its own and how its body interacts with the world. The architecture contains a state estimator that performs continual state updates. However, it still chooses its actions by fixed rules as a function of its state. Nevertheless, high-performance behavior can be obtained with 1) an accurate model, 2) an accurate state estimator, and 3) carefully designed rules. Examples of this architecture include automatic stabilization for quadcopters, aircraft autopilots, and driver assistance systems in modern automobiles (e.g., automatic braking and lane keeping). We will cover dynamic models in detail in Section IV and state estimation in Section VI.
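The following Python sketch illustrates the two ingredients of a model-based reflexive architecture for a toy automatic-braking scenario: a state estimator that combines memory, a motion model, and new measurements, followed by a fixed rule applied to the estimated state. The estimator here is a simple fixed-gain filter chosen for brevity, not a Kalman filter or any method prescribed by this book, and all names and thresholds are assumptions.

```python
class DistanceEstimator:
    """Tracks the distance to an obstacle ahead using memory plus a model."""

    def __init__(self, initial_distance, gain=0.3):
        self.distance = initial_distance  # memory of the previous state
        self.gain = gain                  # how much to trust new measurements

    def update(self, measured_distance, speed, dt):
        # Model: driving forward at `speed` shrinks the gap by speed * dt.
        predicted = self.distance - speed * dt
        # Correction: move the prediction part of the way toward the measurement.
        self.distance = predicted + self.gain * (measured_distance - predicted)
        return self.distance


def braking_rule(distance, speed, min_time_to_collision=2.0):
    """Fixed rule applied to the estimated state, not to the raw measurement."""
    if speed <= 0.0:
        return "cruise"
    return "brake" if distance / speed < min_time_to_collision else "cruise"


estimator = DistanceEstimator(initial_distance=30.0)
speed, dt = 10.0, 0.1
for measurement in [29.1, 27.8, 45.0, 26.2]:  # 45.0 is a spurious reading
    estimate = estimator.update(measurement, speed, dt)
    print(round(estimate, 2), braking_rule(estimate, speed))
```

Because the estimate blends the model's prediction with each measurement, a single spurious reading shifts the state only partially instead of being taken at face value.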

The next level of architecture complexity adds prediction and deliberation. A deliberative architecture uses a predictive model to estimate how its future actions alter the environment as well as a performance metric to choose which course of action is the best. To generate possible future actions in a systematic way, a planner computes many potential options and their outcomes. By maximizing performance, a deliberative architecture is designed explicitly to achieve some notion of optimality. (Contrast this with a reflexive device, which can perform well only by coincidence or shrewdly designed rules.) These techniques can generate excellent behaviors if 1) the predictive model is accurate, 2) the planner discovers a high-quality sequence of actions quickly, and 3) the metric encoded by the robot is truly a good measure of performance (as judged by human observers). Section III covers planning in detail.
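The three ingredients of a deliberative architecture (a predictive model, a set of candidate actions, and a performance metric) can be sketched in a few lines of Python. This toy planner considers only constant-velocity actions for a robot on a line and scores each by its predicted distance to a goal. Real planners search over far richer action sequences, and every name below is illustrative.

```python
def predict(position, velocity, dt=0.5):
    """Predictive model: where the robot ends up after one time step."""
    return position + velocity * dt


def choose_action(position, goal, candidate_velocities, horizon=4):
    """Simulate each candidate action and return the one with the lowest cost."""
    best_velocity, best_cost = None, float("inf")
    for velocity in candidate_velocities:
        predicted = position
        for _ in range(horizon):  # roll the model forward over the horizon
            predicted = predict(predicted, velocity)
        cost = abs(goal - predicted)  # performance metric: distance to goal
        if cost < best_cost:
            best_velocity, best_cost = velocity, cost
    return best_velocity


candidates = [-1.0, -0.5, 0.0, 0.5, 1.0]
print(choose_action(position=0.0, goal=1.0, candidate_velocities=candidates))
```

If the model in predict were wrong, or if the cost ignored something that matters (say, passenger comfort), the planner would still confidently return the "best" action under its flawed assumptions. This is why all three conditions listed above must hold.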

Finally, a learning architecture incorporates sensor data over time to automatically adapt components of its predictive models, state estimators, and/or planners. The amount of learning can be relatively small, such as auto-calibration of a robot's sensors to correct for sensor drift, or large, such as reinforcement learning to adapt the robot's behavior according to observed outcomes. Section VI discusses machine learning, calibration, and reinforcement learning techniques to be used in such architectures.
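At the small end of the learning spectrum, auto-calibration can be as simple as a running estimate of a sensor's bias. The Python sketch below assumes a gyro whose reading equals the true turn rate plus a slowly drifting bias, and assumes the robot can tell when it is stationary (for instance, because its wheel encoders report no motion). The class and method names are hypothetical.

```python
class GyroBiasCalibrator:
    """Adapts a gyro bias estimate from data gathered while stationary."""

    def __init__(self, learning_rate=0.05):
        self.bias = 0.0
        self.learning_rate = learning_rate

    def observe_stationary(self, raw_rate):
        # While stationary the true rate is zero, so the reading is bias plus
        # noise. Nudge the estimate toward it (exponential moving average).
        self.bias += self.learning_rate * (raw_rate - self.bias)

    def correct(self, raw_rate):
        """Apply the learned correction to a new reading."""
        return raw_rate - self.bias


calibrator = GyroBiasCalibrator()
for reading in [0.021, 0.019, 0.020, 0.022, 0.018] * 40:  # stationary samples
    calibrator.observe_stationary(reading)
print(calibrator.bias)           # approaches roughly 0.02 rad/s
print(calibrator.correct(0.52))  # corrected reading while turning
```

The learning rate trades off how quickly the estimate follows drift against how much it is disturbed by noise in individual readings.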

Perception, planning, and control; hierarchical architectures

The Plan component of the Sense-Plan-Act framework above can typically be broken down into three components:

  • Perception. From sensor data and memory of the past, construct a representation of all the knowledge the robot has available about itself and its surroundings. When there is uncertainty, either construct a best guess, or explicitly model how much uncertainty is present.

  • Planning. Use the robot's knowledge representation and dynamics models to decide what the robot should do now and in the future. Outputs reference motions or feedback policies.

  • Control. At a high feedback rate, execute the reference motions generated by planning by outputting actuator signals. Use immediate sensor feedback to correct for minor disturbances.

In a complex intelligent robot, such as an autonomous vehicle, these components are individually quite sophisticated. To optimize the behavior of each component requires deep knowledge, skills, and effort on the part of its engineering team.

Although these components are described as though they would be separate stages of a pipeline, they actually interact at multiple levels based on the timing requirements of behavioral decisions. The planning component is usually broken down so that mission-level decisions are made at the highest layer, which are then broken into subtasks that should be achieved on the order of seconds to minutes, which are then finally broken down into continuous reference motions at sub-second granularity. In some sense, the control component is the lowest level of the planning hierarchy.
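The layering by time scale can be sketched as a single loop in which slower layers run only every so many ticks of the fastest one. The Python example below is a toy under stated assumptions: a robot on a line, a mission layer that picks waypoints at 1 Hz, a motion layer that sets a nearby reference at 10 Hz, and a proportional controller at 100 Hz. The rates, gains, and names are invented for illustration.

```python
def run_layers(waypoints, ticks, control_hz=100, plan_hz=10, mission_hz=1):
    """Simulate three layers running at different rates on a 1D robot."""
    position, reference = 0.0, 0.0
    index, goal = 0, waypoints[0]
    dt = 1.0 / control_hz
    calls = {"mission": 0, "plan": 0, "control": 0}
    for tick in range(ticks):
        if tick % (control_hz // mission_hz) == 0:
            # Mission layer (slow): advance to the next waypoint once close.
            if abs(goal - position) < 0.05 and index + 1 < len(waypoints):
                index += 1
            goal = waypoints[index]
            calls["mission"] += 1
        if tick % (control_hz // plan_hz) == 0:
            # Motion layer (medium): reference a bounded step toward the goal.
            step = max(-0.1, min(0.1, goal - position))
            reference = position + step
            calls["plan"] += 1
        # Control layer (fast): proportional tracking of the reference.
        velocity = 5.0 * (reference - position)
        position += velocity * dt
        calls["control"] += 1
    return position, calls


final_position, calls = run_layers(waypoints=[1.0, 0.5], ticks=2000)
print(final_position, calls)
```

Each layer consumes the output of the layer above it and runs roughly ten times faster, so the controller can react to disturbances long before the mission layer revisits its decision.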

The perception component may also be hierarchical. Some sensor signals, like joint encoder values, are processed minimally to be used by the controller. Others, like vision data, are processed with computationally expensive, relatively slow operations like map-building and object recognition so that mission planning can make significant decisions about the robot's behavior.

The end-to-end approach

In contrast to the perception-planning-control pipeline, the philosophy of end-to-end behavior attempts to compress the entire Plan component into a single monolithic function. Because robot behavior is usually too complex to tune by hand, this philosophy is usually approached via the perspective of end-to-end learning, where the robot itself adjusts the behavior of the Plan component to achieve better outcomes. Recently, this has become a more plausible approach given the success of deep neural networks in computer vision and related machine learning fields.

However, there is still a long way to go before end-to-end robots are practical, and in fact there are no commercially significant robots in the world today that are developed using the end-to-end approach. Why might this be the case? One reason may be that human scientists and engineers have developed exceptionally precise models for robot behavior that can't be matched using model-free approaches. Another reason may be that the development process for the classical pipeline does a better job of exploiting the ingenuity of human engineering teams by imposing clear roles, responsibilities, and performance metrics for each component. Indeed, it is very hard to determine why a system built entirely on machine learning made a mistake, which makes debugging and development much more difficult. A final reason could be that the machine learning community simply does not yet have the right model for robotic cognition, and some breakthrough awaits just around the corner! (Personally speaking, I enjoy the robot engineering process, and hope for the sake of myself, my colleagues, and my students that this last scenario does not occur any time soon.)