Kinematics Dynamics and Control
Homework 2
Benjamin Stephens
Objective
Implement feedback control in simulation of a Segway-like robot using manual gain tuning, linear quadratic regulators, and dynamic programming.
Model
The Segway robot consists of two rigid bodies: a wheel and a vertical bar, shown in the figure below. There is a single input torque at the joint between the two bodies that applies a counterclockwise torque to the wheel and a reaction torque in the clockwise direction to the bar. The equations of motion are:

These dynamics create a nonlinear discrete state transition function,
$x_{k+1} = f(x_k, u_k)$

Manual Control Tuning
I would do this but it would be a waste of time. Basically, there are four gains to tune, the same ones produced by LQR. Hand-tuning the gains is time consuming and is unlikely to produce better results than LQR, as far as optimal control goes. However, hand-tuning could be used if specific overshoot specifications are required.
LQR Design
LQR is an optimal control method for linear systems. Our system is nonlinear, but because it usually remains upright, we can linearize it about that point. The linearized discrete state transition function has the form
$x_{k+1} = A x_k + B u_k$
For a Segway with m1=1kg, m2=1kg, r=0.5m, L=1m and time-step DT=1e-2, the linearized matrices are
and 
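Matrices of this kind can be obtained numerically by taking central finite differences of the discrete step function about the upright point. The following is a minimal sketch of that idea, not the code used for this assignment. The names `linearize` and `toy_step` are hypothetical, and `toy_step` is a pendulum-like stand-in rather than the Segway equations of motion.

```python
import numpy as np

DT = 1e-2  # time step, matching the value quoted above


def linearize(step, x0, u0, eps=1e-6):
    """Central-difference Jacobians of x_next = step(x, u) at (x0, u0)."""
    x0 = np.asarray(x0, dtype=float)
    u0 = np.asarray(u0, dtype=float)
    n, m = x0.size, u0.size
    A = np.zeros((n, n))
    B = np.zeros((n, m))
    for i in range(n):
        dx = np.zeros(n)
        dx[i] = eps
        A[:, i] = (step(x0 + dx, u0) - step(x0 - dx, u0)) / (2 * eps)
    for j in range(m):
        du = np.zeros(m)
        du[j] = eps
        B[:, j] = (step(x0, u0 + du) - step(x0, u0 - du)) / (2 * eps)
    return A, B


def toy_step(x, u):
    """Hypothetical stand-in: Euler step of an inverted-pendulum-like system.

    State is [angle, angular rate]; this is NOT the Segway model.
    """
    angle, rate = x
    accel = 9.81 * np.sin(angle) + u[0]
    return np.array([angle + DT * rate, rate + DT * accel])


# Linearize about the upright equilibrium with zero input.
A, B = linearize(toy_step, [0.0, 0.0], [0.0])
```

For the real system, `toy_step` would be replaced by the Segway's own state transition function and the state would have four components.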
The optimality of the control is determined by a cost function that is quadratic in the state and actions,
$J = \sum_k \left( x_k^T Q x_k + u_k^T R u_k \right)$
By choosing values for Q and R, the behavior of the optimal control is changed. For the Segway, the most important property is that the bar be held as close to upright as possible at all times, because it represents a person. The Q and R matrices used and resulting K are
and
and ![]()
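For reference, a gain matrix K can be computed from A, B, Q and R by iterating the discrete-time Riccati recursion until it converges. The sketch below shows one way to do this, under the assumption that the control law is u = -K x. The function name `dlqr` and the double-integrator matrices are hypothetical placeholders, not the Segway values.

```python
import numpy as np


def dlqr(A, B, Q, R, iters=10000, tol=1e-10):
    """Iterate the discrete Riccati recursion; return (K, P) for u = -K x."""
    P = Q.copy()
    for _ in range(iters):
        # Gain for the current cost-to-go matrix P.
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        # Riccati update: P = Q + A'P(A - BK).
        P_next = Q + A.T @ P @ (A - B @ K)
        converged = np.max(np.abs(P_next - P)) < tol
        P = P_next
        if converged:
            break
    K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
    return K, P


# Hypothetical example system (a double integrator), NOT the Segway matrices.
DT = 1e-2
A = np.array([[1.0, DT], [0.0, 1.0]])
B = np.array([[0.0], [DT]])
Q = np.diag([1.0, 0.1])   # state weights
R = np.array([[0.01]])    # action weight

K, P = dlqr(A, B, Q, R)
closed_loop_eigs = np.linalg.eigvals(A - B @ K)  # should lie inside the unit circle
```

Raising the entry of Q that weights the bar angle relative to the others is how the "keep the bar upright" priority would be expressed in this setup.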
The figure below shows the Segway following a square wave between x=±1.

Dynamic Programming
Using dynamic programming, we don’t need to linearize the system. The value function determines the optimal control according to a given cost function,
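$V(x) = \min_u \left[ L(x, u) + V(f(x, u)) \right]$
where $L(x, u)$ is the one-step cost and $f(x, u)$ is the state transition function.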
The value function has stored values at discrete points in the state space. During calculation, we use continuous actions and states and use multi-linear interpolation to approximate the value function. The cost function used is identical to the cost function used by the LQR design above.
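The idea of storing values on a grid and interpolating between them can be illustrated in one dimension, where multi-linear interpolation reduces to ordinary linear interpolation. This is a minimal sketch under several assumptions: the dynamics are a hypothetical one-dimensional toy rather than the four-state Segway, and the actions come from a fixed candidate set so that the result is repeatable.

```python
import numpy as np

DT = 0.1
xs = np.linspace(-2.0, 2.0, 41)   # grid points where the value function is stored
us = np.linspace(-1.0, 1.0, 21)   # candidate actions (hypothetical discretization)


def step(x, u):
    """Hypothetical 1-D dynamics: the action sets the velocity directly."""
    return np.clip(x + DT * u, xs[0], xs[-1])


def cost(x, u):
    """One-step quadratic cost, the same form as an LQR cost."""
    return DT * (x**2 + 0.1 * u**2)


def value_iteration(sweeps=500, tol=1e-9):
    """Sweep the grid, backing up V through interpolated successor states."""
    V = np.zeros_like(xs)
    X, U = np.meshgrid(xs, us, indexing="ij")   # every (state, action) pair
    for _ in range(sweeps):
        # Successor states fall between grid points, so V is interpolated there.
        Qsa = cost(X, U) + np.interp(step(X, U), xs, V)
        V_new = Qsa.min(axis=1)
        converged = np.max(np.abs(V_new - V)) < tol
        V = V_new
        if converged:
            break
    policy = us[Qsa.argmin(axis=1)]   # greedy action at each grid point
    return V, policy


V, policy = value_iteration()
```

In four dimensions the same backup applies, but each interpolation blends the 16 surrounding grid values and the number of grid points grows with the fourth power of the resolution.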
Controlling Motion from Point to Point
This controller uses a mixture of the controllers described above. First, the Segway starts from rest and transitions to a small tilt with a forward velocity. Then, it moves at this constant tilt and forward velocity until it reaches the goal. At this point, it transitions to a stop at rest. These three controllers are described in more detail below.
Find the Optimal Tilt Angle
Before this controller can be constructed, the desired tilt angle, δ, must be known. This is found through an optimization. The Segway is simulated for 1 second at a range of tilt angles using an LQR gain matrix. Whichever tilt angle has the lowest cost for the desired forward velocity is used. The graph below shows how this angle varies with different forward velocities.
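The search itself is a simple sweep: simulate each candidate tilt for 1 second, accumulate the cost, and keep the cheapest. The sketch below shows only that sweep structure. The dynamics in `toy_step` (tilt accelerates, linear drag decelerates) and the drag coefficient are hypothetical stand-ins chosen so the example runs. It omits the LQR feedback and is not the Segway model.

```python
DT = 1e-2
STEPS = 100        # 1 second of simulation at DT = 1e-2
V_DESIRED = 1.0    # desired forward velocity in m/s


def toy_step(v, tilt):
    """Hypothetical stand-in dynamics: tilt accelerates, linear drag decelerates."""
    return v + DT * (9.81 * tilt - 2.0 * v)


def rollout_cost(tilt):
    """Accumulated squared velocity error over a 1 s rollout at a fixed tilt."""
    v, total = V_DESIRED, 0.0
    for _ in range(STEPS):
        total += DT * (v - V_DESIRED) ** 2
        v = toy_step(v, tilt)
    return total


def best_tilt(candidates):
    """Return the candidate tilt with the lowest rollout cost, plus all costs."""
    costs = [rollout_cost(t) for t in candidates]
    best = min(range(len(candidates)), key=lambda i: costs[i])
    return candidates[best], costs


candidates = [i * 0.01 for i in range(41)]   # 0 to 0.40 rad in 0.01 rad steps
tilt, costs = best_tilt(candidates)
```

Repeating the sweep for several values of `V_DESIRED` would give a curve of best tilt against forward velocity.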

Transition between Rest and Tilt
Now dynamic programming is used to find an optimal policy for this transition. Transitioning from rest to tilt and from tilt to rest are really the same problem if it is assumed that the boundary conditions are the same. For calculations, the goal is defined as the resting state. The policy that is generated applies to the tilt-to-rest transition. The policy for the rest-to-tilt transition is just the negative of this policy. The figure below shows a result.

Cruising
Now the Segway moves from its starting state to a goal state using an LQR controller. The figure below shows this controller keeping the Segway moving at a forward velocity of 1 m/s at its optimal tilt angle.

Conclusions
This assignment has shown the value of LQR control and dynamic programming for feedback control of nonlinear systems. Controlling the Segway was easily achieved using LQR. Dynamic programming proved to be difficult due to the “curse of dimensionality.” Even with randomly sampled actions and interpolation, computation time, convergence, and state resolution are problems. The results for the rest-to-tilt and tilt-to-rest transitions suggest that the optimal trajectory is under-damped and oscillates around the goal. However, the LQR controller produces an almost over-damped response, with much lower rise and settling times. The figure below shows what happens when the two controllers are summed together. The dashed lines represent the LQR controller alone.
The code for this assignment can be downloaded here: segwaycode.zip
