Blag Weeks 6, 7: A Hill Too Steep to Climb

Sometimes models don’t generalize well. The team’s path met a gradient too steep to climb: LWPR could not generalize its local models (built with receptive fields) to broader regions of the state space for use in the controller. In hindsight, this makes sense: LWPR gains its efficiency in high dimensions by fitting strictly local models within each receptive field, so it has little ability to extrapolate into regions it has not yet seen. It performed worse than regularized linear regression on the forward model identification task, both with and without additional basis functions. The new plan is to use Gaussian Process Regression (GPR) for local function approximation and a more global approach for regions that have not yet been explored. Subhash is currently working on the GPR implementation.
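To give a feel for the GPR side of that plan, here is a minimal sketch using scikit-learn on toy data. The data shapes, kernel choice, and variable names are assumptions for illustration, not the team’s actual setup:

```python
# A minimal sketch of GPR for a forward model, assuming scikit-learn and
# hypothetical arrays X (state-action pairs) and Y (next-state targets).
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

# Toy stand-in data: columns are [theta, theta_dot, u]; target is a scalar.
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(200, 3))
Y = np.sin(X[:, 0]) - 0.1 * X[:, 1] + X[:, 2] + 0.01 * rng.standard_normal(200)

# RBF kernel gives smooth local interpolation; WhiteKernel absorbs noise.
kernel = 1.0 * RBF(length_scale=1.0) + WhiteKernel(noise_level=1e-2)
gpr = GaussianProcessRegressor(kernel=kernel, normalize_y=True)
gpr.fit(X, Y)

# The predictive variance flags unexplored regions, where the more global
# fallback model (or further exploration) should take over.
mean, std = gpr.predict(rng.uniform(-2.0, 2.0, size=(5, 3)), return_std=True)
```

The predictive variance is the attraction here: it indicates where the local model can be trusted and where it cannot.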

Ideally, an InfoMax controller would drive the system into regions that have not yet been explored, so that the model improves everywhere. If information gain is quantified as a negative cost, for example via KL divergence, exploratory behavior will emerge, indirectly adding a learning component to the controller. However, this must be balanced against the controller’s normal behavior: driving the state to the most task-relevant configurations, which may in turn inform the model in the most useful ways. So the benefits of directed information pursuit must be weighed against the cost of deviating from the task. Mike is reading up on different information pursuit procedures, and just finished restructuring the initial SysID and example control code for inclusion in the interlab repository.
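As a rough illustration of the trade-off (not the team’s actual formulation: the weighting, the names, and the use of predictive entropy as a stand-in for expected information gain are all assumptions), a combined cost might look something like this:

```python
# Hypothetical cost that rewards visiting uncertain regions of the model.
import numpy as np

def exploratory_cost(x_target, x_pred, sigma_pred, info_weight=0.1):
    """Task cost minus an information bonus.

    x_target, x_pred : desired and predicted states (arrays)
    sigma_pred       : GP predictive std at the visited state-action pair
    info_weight      : how strongly exploration is rewarded
    """
    task_cost = np.sum((x_pred - x_target) ** 2)
    # Entropy of the Gaussian prediction: a crude proxy for expected
    # information gain, larger wherever the model is uncertain.
    info_gain = 0.5 * np.sum(np.log(2.0 * np.pi * np.e * sigma_pred ** 2))
    return task_cost - info_weight * info_gain
```

With info_weight set to zero this reduces to pure task pursuit; increasing it tilts the controller toward exploration.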

ROS and Gazebo communication with Python and MATLAB has been implemented. Scripts repeatedly capture and save the state variables of the system. Because we do not expect our SysID and control algorithms to run in real time, at least at first, we’ll have the simulation repeatedly pause and pass the logged data to SysID in MATLAB, which passes the identified model to the controller, which in turn creates an open-loop control signal based on that model. This control signal then drives the system in simulation for the next period. Hopefully, the learned model and controller are accurate enough to achieve the task (in the first case, pendulum swing-up). Integration between the subsystems is now complete; what remains is improving the individual pieces. Deniz is now doing preliminary work on ROS-ARDrone communication and refactoring the communication code for export to the other labs in the Lifelong Learning project.
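The pause-identify-control loop could look roughly like the following sketch, assuming the standard gazebo_ros pause/unpause services; the helper functions are hypothetical stubs standing in for the actual logging, MATLAB SysID, and controller pieces:

```python
# A minimal sketch of the pause / identify / control loop.
import rospy
from std_srvs.srv import Empty

def collect_states():            # placeholder: read the logged state variables
    return []

def run_sysid_in_matlab(data):   # placeholder: hand data to the MATLAB SysID
    return None

def plan_open_loop(model):       # placeholder: open-loop control from the model
    return []

def apply_controls(u_sequence):  # placeholder: queue commands for next period
    pass

rospy.init_node('sysid_control_loop')
pause = rospy.ServiceProxy('/gazebo/pause_physics', Empty)
unpause = rospy.ServiceProxy('/gazebo/unpause_physics', Empty)

while not rospy.is_shutdown():
    unpause()
    rospy.sleep(2.0)   # let the simulation evolve for one control period
    pause()            # freeze physics while the model and controller update
    model = run_sysid_in_matlab(collect_states())
    apply_controls(plan_open_loop(model))
```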

With greater structure in the SysID code and the integration complete, we should be able to turn around an analysis of the efficacy of GPR much faster than before.