Blag Weeks 8-11: I Heart Gauss

The team gained a greater appreciation for Johann Carl Friedrich Gauss while implementing Gaussian Process Regression (GPR) with basis functions. GPR was chosen to replace LWPR because of LWPR's failure to generalize outside the region of the state space already explored. Including basis functions in GPR allows a global semi-parametric model of the system dynamics, with Gaussian Processes modeling the local residuals. We adopted the code and textbook from Gaussian Processes for Machine Learning (GPML) by Carl Edward Rasmussen and Chris Williams.

Subhash and Mike have been working on the implementation and evaluation of GPR for system identification.

“Definition 2.1 A Gaussian process is a collection of random variables, any finite number of which have a joint Gaussian distribution.” A Gaussian process is described entirely by its mean and covariance functions (GPML, p. 13). Gaussian Process Regression is not a new algorithm, and it has been used successfully for dynamics modeling in robotics. A Gaussian Process can be thought of as defining a distribution over functions, with inference occurring in function-space rather than in the weight-space of standard parametric regression (GPML, p. 7). Predictions are made by Bayesian inference, with the hyperparameters tuned to maximize the log marginal likelihood.

The theoretical basis for GPR with basis functions is presented in Ch. 2.7 of GPML. However, the book does not describe the optimization process in the presence of basis functions, nor does the accompanying package implement them. Mike derived the gradient of the log marginal likelihood with respect to the parameterization of the basis-function covariance matrix, and is within striking distance of completing the integration of basis functions into the GPML package. Subhash implemented GPR with basis functions separately from the package, and verified that GPR gives an acceptable model for the single-pendulum swing-up, generalizing outside the sampled space, which was the needed improvement over LWPR.
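The semi-parametric prediction from GPML Ch. 2.7 (eq. 2.41, with a vague prior on the basis weights) can be sketched in a few lines of NumPy. The squared-exponential kernel, the constant-plus-linear basis, and the hyperparameter values below are illustrative assumptions, not the team's actual settings:

```python
import numpy as np

def rbf_kernel(A, B, ell=1.0, sf=1.0):
    # Squared-exponential covariance between the rows of A and B.
    d2 = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    return sf**2 * np.exp(-0.5 * d2 / ell**2)

def basis(X):
    # Hypothetical fixed basis H(x): constant + linear terms, shape (m, n).
    return np.vstack([np.ones(len(X)), X.T])

def gp_basis_predict(X, y, Xs, noise=0.1):
    """Semi-parametric GP mean prediction (GPML eq. 2.41, vague prior on beta):
    the basis captures the global trend, the GP models the residuals."""
    K = rbf_kernel(X, X) + noise**2 * np.eye(len(X))
    Ks = rbf_kernel(X, Xs)
    H, Hs = basis(X), basis(Xs)
    Kinv_y = np.linalg.solve(K, y)
    Kinv_H = np.linalg.solve(K, H.T)           # K^{-1} H^T
    A = H @ Kinv_H                             # H K^{-1} H^T
    beta = np.linalg.solve(A, H @ Kinv_y)      # estimated basis weights
    R = Hs - H @ np.linalg.solve(K, Ks)        # basis residual at test inputs
    mean_gp = Ks.T @ Kinv_y                    # zero-mean GP contribution
    return mean_gp + R.T @ beta                # GP residual + global trend
```

Because the prediction reduces to `Ks.T @ K^{-1} (y - H.T beta) + Hs.T @ beta`, training data that lie exactly on the basis (e.g. y = 2 + 3x) are extrapolated along the global trend even far outside the sampled inputs, which is the generalization behavior noted above.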

The control theory aspect of the project was not on the critical path at this stage, so the only update there is verification that the model learned by ridge regression was sufficient for iLQG to complete a single-pendulum swing-up.
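For reference, a closed-form ridge regression fit of the kind used for the dynamics model can be sketched as follows; the feature matrix `Phi` and regularizer `lam` are illustrative placeholders, since the team's actual feature construction is not specified here:

```python
import numpy as np

def ridge_fit(Phi, Y, lam=1e-3):
    """Closed-form ridge regression: W = (Phi^T Phi + lam*I)^{-1} Phi^T Y."""
    n_feat = Phi.shape[1]
    return np.linalg.solve(Phi.T @ Phi + lam * np.eye(n_feat), Phi.T @ Y)

def predict(Phi, W):
    # Linear-in-features dynamics prediction, e.g. next state or acceleration.
    return Phi @ W
```

The regularizer keeps the normal equations well conditioned when features are correlated, at the cost of a small bias in the recovered weights.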

Deniz continued to refine the MATLAB-Python-ROS-Gazebo communication pipeline and improved the experimental pipeline. With procedural generation of ROS files to create simulations, we can now sweep the parameter space of the system dynamics, with clear implications for multi-task and lifelong learning. For instance, the link masses, link lengths, and pivot friction of a double pendulum can all be varied. Simulations with SysID can then be run to recover the system dynamics across a wide range of underlying parameters. These models can inform a multi-task or lifelong learning system based on the relationship between the parameter values and the dynamics models returned.
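The sweep generation might look roughly like the toy sketch below; the XML fragment and parameter names are hypothetical stand-ins for the actual ROS/Gazebo description files the pipeline emits:

```python
import itertools

# Hypothetical fragment of a model description; the real pipeline generates
# complete ROS/Gazebo files (URDF/SDF, launch files) for each configuration.
TEMPLATE = """<model name="double_pendulum_{idx}">
  <link name="link1" mass="{mass}" length="{length}"/>
  <link name="link2" mass="{mass}" length="{length}"/>
  <pivot friction="{friction}"/>
</model>"""

def generate_sweep(masses, lengths, frictions):
    """One model file per point in the Cartesian product of the parameters."""
    sweep = itertools.product(masses, lengths, frictions)
    return [TEMPLATE.format(idx=i, mass=m, length=l, friction=mu)
            for i, (m, l, mu) in enumerate(sweep)]
```

Each generated file then drives one simulation run, so a SysID pass over the whole sweep yields a family of dynamics models indexed by the underlying parameters.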

The plan for the upcoming semester is to integrate GPR with basis functions into ELLA. SysID and transfer of learned models will be the theoretical focus, supported by ROS simulations and possibly physical experiments.

Overall, this summer the research team gained the foundational skills and knowledge, and built the software architecture and experimental pipeline, needed to make advances in the Lifelong Learning space with applications to Fault Tolerant Control.