Kinematics Dynamics and Control

Homework 2

 

Benjamin Stephens

 

Objective

 

Implement feedback control in simulation of a Segway-like robot using manual gain tuning, linear quadratic regulators, and dynamic programming.

 

Model

 

The Segway robot consists of two rigid bodies: a wheel and a vertical bar, shown in the figure below.  There is a single input torque at the joint between the two bodies that applies a counterclockwise torque to the wheel and a reaction torque in the clockwise direction to the bar.  The equations of motion are:

          

 

These dynamics create a nonlinear discrete state transition function,
$$x_{k+1} = f(x_k, u_k)$$
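As a rough illustration of how continuous dynamics become a discrete transition function, the sketch below takes one forward-Euler step. The pendulum model is only a stand-in, not the Segway equations of motion, and forward Euler is an assumption; the report does not say which integrator was used.

```python
import math

DT = 1e-2  # time step; same value as the DT quoted in the LQR section


def pendulum_dynamics(x, u, g=9.81, length=1.0, mass=1.0):
    """Continuous-time derivative of a stand-in inverted pendulum.

    x = (theta, theta_dot), with theta measured from upright.
    This is a placeholder model, NOT the Segway equations of motion.
    """
    theta, theta_dot = x
    theta_ddot = (g / length) * math.sin(theta) + u / (mass * length ** 2)
    return (theta_dot, theta_ddot)


def step(x, u, dynamics=pendulum_dynamics, dt=DT):
    """One forward-Euler step: x_next = x + dt * dynamics(x, u)."""
    dx = dynamics(x, u)
    return tuple(xi + dt * dxi for xi, dxi in zip(x, dx))
```

Calling `step` repeatedly with a control law supplying `u` gives a simulation loop; swapping in the real equations of motion only requires replacing `pendulum_dynamics`.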

Manual Control Tuning

 

I did not do this because it would be a waste of time.  There are four gains to tune, the same four that LQR produces.  Hand-tuning the gains is time-consuming and is unlikely to produce better results than LQR, at least as far as optimal control goes.  However, hand-tuning could be useful if specific overshoot specifications are required.

 

LQR Design

 

LQR is an optimal control method for linear systems.  Our system is nonlinear, but because it usually remains close to upright, we can linearize it about the upright equilibrium.  The linearized discrete state transition function has the form

$$x_{k+1} = A x_k + B u_k$$

 

 
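One way to obtain the linearized matrices numerically is to difference the discrete step function about the operating point. The sketch below does this with central differences for a single-input system. The `toy_step` pendulum is a stand-in for the Segway step function, and the report does not say whether the linearization was done analytically or numerically, so treat this as one possible approach.

```python
import math


def toy_step(x, u, dt=1e-2, g=9.81, length=1.0):
    """Stand-in discrete step (Euler pendulum about upright), not the Segway model."""
    theta, omega = x
    return [theta + dt * omega,
            omega + dt * ((g / length) * math.sin(theta) + u)]


def linearize(step, x0, u0, eps=1e-6):
    """Central-difference Jacobians of x_next = step(x, u) about (x0, u0).

    Returns (A, B): A is an n-by-n list of rows, B is a length-n list
    (one column, because the input u is a scalar).
    """
    n = len(x0)
    A = [[0.0] * n for _ in range(n)]
    for j in range(n):
        x_plus = list(x0)
        x_minus = list(x0)
        x_plus[j] += eps
        x_minus[j] -= eps
        f_plus = step(x_plus, u0)
        f_minus = step(x_minus, u0)
        for i in range(n):
            A[i][j] = (f_plus[i] - f_minus[i]) / (2 * eps)
    f_plus = step(x0, u0 + eps)
    f_minus = step(x0, u0 - eps)
    B = [(f_plus[i] - f_minus[i]) / (2 * eps) for i in range(n)]
    return A, B


A, B = linearize(toy_step, [0.0, 0.0], 0.0)
```

For the toy pendulum, `A` comes out close to `[[1, dt], [dt * g / length, 1]]` and `B` close to `[0, dt]`, which is a quick sanity check on the differencing.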

For a Segway with m1 = 1 kg, m2 = 1 kg, r = 0.5 m, L = 1 m, and time step DT = 1e-2, the linearized matrices are

 

[linearized state and input matrices not preserved]

 

 

The optimality of the control is measured by a cost function, $J$, that is quadratic in the state and actions,

$$J = \sum_k \left( x_k^T Q x_k + u_k^T R u_k \right)$$

Choosing different values for Q and R changes the behavior of the optimal control.  For the Segway, the most important property is that the bar be held as close to upright as possible at all times, because it represents a person.  The Q and R matrices used and the resulting K are

 

[values of Q, R, and K not preserved]

 
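For a single-input system, the LQR gain can be computed by iterating the discrete Riccati recursion until it converges. The sketch below does this in plain Python. It assumes the standard discrete-time formulation with state weight Q and a scalar input weight `r`, and it uses a two-state double integrator in the usage note rather than the four-state Segway, whose matrices are not available here.

```python
def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


def transpose(X):
    return [list(col) for col in zip(*X)]


def dlqr_single_input(A, B, Q, r, iters=5000, tol=1e-10):
    """Gain K for u = -K x, with x_next = A x + B u and scalar u.

    A, Q: n-by-n lists of rows.  B: length-n list.  r: positive scalar.
    Iterates P <- Q + A'PA - A'PB (r + B'PB)^-1 B'PA starting from P = Q.
    """
    n = len(A)
    At = transpose(A)
    B_col = [[b] for b in B]
    P = [row[:] for row in Q]
    K = [0.0] * n
    for _ in range(iters):
        PA = matmul(P, A)
        PB = matmul(P, B_col)
        s = r + sum(B[i] * PB[i][0] for i in range(n))       # r + B'PB
        BtPA = [sum(B[i] * PA[i][j] for i in range(n)) for j in range(n)]
        K = [v / s for v in BtPA]                            # gain from current P
        AtPA = matmul(At, PA)
        # P stays symmetric, so A'PB equals the transpose of B'PA.
        P_new = [[Q[i][j] + AtPA[i][j] - BtPA[i] * K[j] for j in range(n)]
                 for i in range(n)]
        change = max(abs(P_new[i][j] - P[i][j])
                     for i in range(n) for j in range(n))
        P = P_new
        if change < tol:
            break
    return K
```

As a usage example, `dlqr_single_input([[1.0, 0.01], [0.0, 1.0]], [0.0, 0.01], [[1.0, 0.0], [0.0, 1.0]], 1.0)` returns a two-element gain for a double integrator with a 0.01 time step. Raising an entry of Q relative to `r` makes the controller work harder to keep that state small, which is the trade-off described above.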

The figure below shows the Segway following a square wave between x=±1.

 
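Tracking a reference such as the square wave can be sketched by applying the gain to the error from the target state instead of to the state itself. In the sketch below, the double-integrator plant, the gain values, and the 10-second period are placeholders chosen for illustration; none of them come from the report.

```python
def square_wave(t, amplitude=1.0, period=10.0):
    """Target position: +amplitude in the first half of each period, -amplitude in the second."""
    return amplitude if (t % period) < period / 2 else -amplitude


def track(K, x0, t_end, dt=1e-2):
    """Simulate a stand-in double integrator under u = -K (x - x_ref).

    K = (position gain, velocity gain); x0 = (position, velocity).
    Returns a list of (time, reference, position) samples.
    """
    pos, vel = x0
    history = []
    for k in range(int(round(t_end / dt))):
        t = k * dt
        ref = square_wave(t)
        # The reference velocity is zero, so only the position error is shifted.
        u = -(K[0] * (pos - ref) + K[1] * vel)
        pos, vel = pos + dt * vel, vel + dt * u
        history.append((t, ref, pos))
    return history


history = track((1.0, 1.7), (0.0, 0.0), 10.0)
```

Plotting the reference and position columns of `history` against time gives the same kind of picture as the figure: the position settles toward each new target after every switch.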

Dynamic Programming

 

Using dynamic programming, we don’t need to linearize the system.  The value function determines the optimal control according to a given cost function,

 

 

The value function has stored values at discrete points in the state space.  During calculation, we use continuous actions and states and use multi-linear interpolation to approximate the value function.  The cost function used is identical to the cost function used by the LQR design above.
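The grid-plus-interpolation idea can be sketched on a two-dimensional state space, where multi-linear interpolation reduces to bilinear interpolation. Each sweep applies the backup V(x) = min over u of [cost(x, u) + V(f(x, u))] at every grid point. The uniform grid, the fixed candidate action set, and the fixed number of sweeps are simplifying assumptions; the report's four-dimensional implementation is not shown here.

```python
def locate(grid, x):
    """Cell index and fractional offset of x in a uniform grid (clamped to the grid)."""
    x = min(max(x, grid[0]), grid[-1])
    h = grid[1] - grid[0]
    i = min(int((x - grid[0]) / h), len(grid) - 2)
    return i, (x - grid[i]) / h


def interp2(V, grid0, grid1, x0, x1):
    """Bilinear interpolation of V, where V[i][j] is stored at (grid0[i], grid1[j])."""
    i, a = locate(grid0, x0)
    j, b = locate(grid1, x1)
    return ((1 - a) * (1 - b) * V[i][j] + a * (1 - b) * V[i + 1][j]
            + (1 - a) * b * V[i][j + 1] + a * b * V[i + 1][j + 1])


def value_iteration(step, cost, grid0, grid1, actions, sweeps=30):
    """Repeated Bellman backups on the grid; successor states are evaluated by interpolation."""
    V = [[0.0] * len(grid1) for _ in grid0]
    for _ in range(sweeps):
        V = [[min(cost(x0, x1, u) + interp2(V, grid0, grid1, *step(x0, x1, u))
                  for u in actions)
              for x1 in grid1]
             for x0 in grid0]
    return V


def greedy_action(V, step, cost, grid0, grid1, actions, x0, x1):
    """Action that minimizes one-step cost plus interpolated value at a continuous state."""
    return min(actions,
               key=lambda u: cost(x0, x1, u)
               + interp2(V, grid0, grid1, *step(x0, x1, u)))
```

Here `step(x0, x1, u)` returns the next state as a pair and `cost(x0, x1, u)` returns the one-step cost; both are supplied by the caller. The work per sweep grows with the number of grid points times the number of actions, which is why adding state dimensions quickly becomes expensive.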

 

Controlling Motion from Point to Point

 

This controller uses a mixture of the controllers described above.  First, the Segway starts from rest and transitions to a small tilt with a forward velocity.  Then, it moves at this constant tilt and forward velocity until it reaches the goal.  At this point, it transitions to a stop at rest.  These three controllers are described in more detail below.
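The sequencing of the three phases can be sketched as a small state machine. The switching conditions, the phase names, and the state layout (position first, velocity second) are assumptions made for illustration; the report does not state how the hand-off between controllers was triggered.

```python
def select_phase(phase, x, goal_position, cruise_velocity,
                 stop_distance=0.0, vel_tol=0.05):
    """Advance through 'accelerate' -> 'cruise' -> 'stop'.

    x[0] is position and x[1] is forward velocity (assumed layout).
    'accelerate' hands off to 'cruise' once the velocity is within vel_tol
    of cruise_velocity; 'cruise' hands off to 'stop' once the remaining
    distance to the goal is at most stop_distance.
    """
    position, velocity = x[0], x[1]
    if phase == "accelerate" and abs(velocity - cruise_velocity) < vel_tol:
        return "cruise"
    if phase == "cruise" and goal_position - position <= stop_distance:
        return "stop"
    return phase
```

In a simulation loop, the returned phase would select which controller computes the torque at that step: the rest-to-tilt policy, the cruising LQR gain, or the tilt-to-rest policy.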

 

Find the Optimal Tilt Angle

 

Before this controller can be constructed, the desired tilt angle, δ, must be known.  This is found through an optimization.  The Segway is simulated for 1 second at a range of tilt angles using an LQR gain matrix.  Whichever tilt angle has the lowest cost for the desired forward velocity is used.  The graph below shows how this angle varies with different forward velocities.

 

 
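The search described above amounts to scoring each candidate tilt angle with a short simulated rollout and keeping the cheapest one. The sketch below shows that structure only. The `step` and `policy` callables, the diagonal state weights, and the candidate list are hypothetical placeholders supplied by the caller; the Segway model and the gain matrix from the report are not reproduced.

```python
def rollout_cost(step, policy, x_start, x_goal, weights, r,
                 duration=1.0, dt=1e-2):
    """Accumulate sum(w_i * e_i^2) + r * u^2 over a simulated rollout.

    step(x, u) returns the next state; policy(err) returns the scalar input
    for the current error e = x - x_goal.  weights holds the diagonal of Q.
    """
    x = list(x_start)
    total = 0.0
    for _ in range(int(round(duration / dt))):
        err = [xi - gi for xi, gi in zip(x, x_goal)]
        u = policy(err)
        total += sum(w * e * e for w, e in zip(weights, err)) + r * u * u
        x = step(x, u)
    return total


def best_tilt(cost_for_tilt, candidates):
    """Return the candidate tilt angle with the lowest cost."""
    return min(candidates, key=cost_for_tilt)
```

In use, `cost_for_tilt` would wrap `rollout_cost` with a goal state built from the candidate tilt and the desired forward velocity, and `best_tilt` would be called once per velocity to trace out a curve like the one in the graph.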

Transition between Rest and Tilt

 

Now dynamic programming is used to find an optimal policy for this transition.  Transitioning from rest to tilt and from tilt to rest are really the same problem if the boundary conditions are assumed to be the same.  For calculations, the goal is defined as the resting state.  The policy that is generated applies to the tilt-to-rest transition.  The policy for the rest-to-tilt transition is just the negative of this policy.  The figure below shows a result.

 

 

Cruising

 

Now the Segway moves from its starting state to a goal state using an LQR controller.  The figure below shows this controller keeping the Segway moving at a forward velocity of 1 m/s at its optimal tilt angle.

 

 

Conclusions

 

This assignment has shown the value of LQR control and dynamic programming for feedback control of nonlinear systems.  Controlling the Segway was easily achieved using LQR.  Dynamic programming proved to be difficult due to the “curse of dimensionality.”  Even with randomly sampled actions and interpolation, computation time, convergence, and state resolution are problems.  The results for the rest-to-tilt and tilt-to-rest transitions suggest that the optimal trajectory appears under-damped and oscillates around the goal.  However, using the LQR controller results in an almost over-damped response, with much lower rise and settling times.  The figure below shows what happens when the two controllers are summed together.  The dashed lines represent the LQR controller alone.

 

The code for this assignment can be downloaded here: segwaycode.zip