Blag Week 2: Define the Machine

“I was born not knowing and have had only a little time to change that here and there.” - Richard Feynman

Inspired by Paul Ruvolo’s work with “Diego San”, the three feckless “researchers” set off down the path of control theory and trajectory optimization, with the eventual goal of extending the lifelong learning framework to fault-tolerant control.

Research collaborators at Washington State University and the University of Pennsylvania have expressed interest in working with robotic systems, Turtlebots in particular. Shared learning among agents, homogeneous or heterogeneous, within dynamic environments would put ELLA to the test and could build an experimental foundation for the lifelong learning approach.

It may be useful to sketch a hypothetical experiment. Several Turtlebots have search-and-navigation tasks, and each controller has parameters to learn so that its robot adapts its search patterns to the particularities of its environment: dense indoor environments (rooms with a lot of furniture), sparse indoor environments, indoor environments with many subpaths (hallways), open outdoor or indoor spaces, and so on. The controller could treat explored space as a reward in its own right, separate from finding the actual objective, and its parameters could encode the decision process for choosing which path to explore. To incorporate something like ELLA or PG-ELLA, there would be a finite number of parameters corresponding to the dimensions of the shared knowledge basis.

A task could be an entire search mission, or a single decision between branching paths. The new state, perhaps everything observed up to the next fork, would quantify the information gained from searching that space; given some relative metric for gained information, each decision becomes a labeled data point. With many of these labeled points grouped into tasks, the shared knowledge basis is updated, and thus trained. Different environments might yield different realized values for the same decision, yet still share an underlying structure as previously discussed, which would make this a good fit for ELLA’s lifelong learning formulation. Obviously this is only a sketch, but it should give the reader an idea of how an experiment might be designed.
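To make the shared-basis idea a bit more concrete, here is a minimal Python sketch of how per-task controller parameters could be tied together through a common latent basis, in the spirit of ELLA. Everything here is illustrative: the dimensions, the ridge-regularized encoding (ELLA proper uses an L1-penalized, Hessian-weighted fit), and the simple gradient update on the basis are placeholder choices of ours, not the algorithm’s actual update equations.

```python
import numpy as np

# Toy sketch of the shared-basis idea behind ELLA: each task t's controller
# parameters theta_t are modeled as a sparse combination of k shared latent
# components, theta_t ~= L @ s_t. All names and dimensions are illustrative.
d, k = 8, 3                    # parameter dimension, basis size (assumptions)
rng = np.random.default_rng(0)
L = rng.normal(size=(d, k))    # shared knowledge basis, refined across tasks

def encode_task(theta_t, L, lam=0.1):
    """Ridge-regularized code s_t for one task's fitted parameters.
    (ELLA proper uses an L1 penalty and Hessian-weighted reconstruction;
    ridge keeps this sketch short.)"""
    return np.linalg.solve(L.T @ L + lam * np.eye(L.shape[1]),
                           L.T @ theta_t)

def update_basis(L, thetas, codes, lr=0.01):
    """One gradient step pulling L toward reconstructing all task
    parameters seen so far: the 'shared knowledge' update."""
    residual = thetas - L @ codes      # d x T reconstruction error
    return L + lr * residual @ codes.T

# Each completed search decision yields a labeled point; fitting those
# points per task gives theta_t, which then trains the shared basis.
thetas = rng.normal(size=(d, 5))       # stand-in for 5 fitted task models
codes = np.column_stack([encode_task(thetas[:, t], L) for t in range(5)])
L = update_basis(L, thetas, codes)
```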

On the control theory side, the team walked through an implementation of a Linear-Quadratic Regulator and discussed some of the theory behind the optimality of its solution. Other topics included the Kalman filter, the HJB and Riccati equations, Pontryagin’s maximum principle, and the control Hamiltonian, among others. Last week, in an effort to build a stronger ML foundation, the team also discussed PAC bounds and encoding hyperbias through the choice of hypothesis spaces and families.
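For the curious, a compact version of that LQR walkthrough looks something like the following. The discrete double-integrator system and the cost weights are illustrative choices, not the exact system we used; computing the gain from the discrete algebraic Riccati equation is the standard recipe.

```python
import numpy as np
from scipy.linalg import solve_discrete_are

# Hypothetical discrete-time double integrator: state [position, velocity],
# control is acceleration. A, B, Q, R are illustrative values.
dt = 0.1
A = np.array([[1.0, dt],
              [0.0, 1.0]])
B = np.array([[0.5 * dt**2],
              [dt]])
Q = np.diag([1.0, 0.1])   # penalize position error more than velocity
R = np.array([[0.01]])    # cheap control

# Solve the discrete algebraic Riccati equation for the cost-to-go matrix P,
# then form the optimal state-feedback gain K (u = -K x).
P = solve_discrete_are(A, B, Q, R)
K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)

# Simulate the closed loop from an initial offset.
x = np.array([[1.0], [0.0]])
for _ in range(100):
    u = -K @ x
    x = A @ x + B @ u
print("final state:", x.ravel())  # should be driven near the origin
```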

Fault-tolerant control work could include treating partial system failures as learning tasks and attempting to learn, in real time, how to refine a controller to compensate for a fault such as the loss of a quadcopter’s rotor mid-flight, which changes the system dynamics and, by extension, the control laws. After deciding that this was a sufficiently amazing application, we set it as one of the main potential goals. To cover more ground and speed our approach to the problem now that we have defined the machine, Mike is reading more about iterative Linear-Quadratic-Gaussian control and trajectory optimization with approximate inference, Subhash is reading about system identification theory, and Deniz is setting up the necessary hardware and the ROS simulation software with Gazebo.
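As a toy illustration of that idea (emphatically not a quadcopter model), suppose an actuator loses effectiveness, which we model here as scaling the input matrix B. Once the faulted B has been identified, the regulator gain from the LQR example above can simply be re-solved against the new dynamics:

```python
import numpy as np
from scipy.linalg import solve_discrete_are

# Hedged sketch of the fault-tolerance idea: when an actuator degrades,
# re-identify the input matrix and re-solve for the regulator gain.
def lqr_gain(A, B, Q, R):
    P = solve_discrete_are(A, B, Q, R)
    return np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)

A = np.array([[1.0, 0.1], [0.0, 1.0]])
B_nominal = np.array([[0.0], [0.1]])
Q, R = np.eye(2), np.array([[0.1]])

K_nominal = lqr_gain(A, B_nominal, Q, R)

# Fault: the actuator now delivers only 40% of commanded effort. Once the
# new B is identified (e.g., by a system-ID step), recompute the gain.
B_faulty = 0.4 * B_nominal
K_fault = lqr_gain(A, B_faulty, Q, R)
print("nominal gain:", K_nominal, "\npost-fault gain:", K_fault)
```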

Next steps are to implement more control algorithms (especially for stochastic systems), run control simulations, and design simple system identification tasks, in the spirit of the sketch below.
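As a flavor of what a “simple system identification task” might look like, here is a least-squares sketch that recovers the matrices of an unknown linear system from a recorded trajectory. The “true” system and noise level are made up for illustration.

```python
import numpy as np

# Minimal system identification: recover the discrete dynamics matrices
# (A, B) of an unknown linear system from state/input data via least squares.
rng = np.random.default_rng(1)
A_true = np.array([[0.95, 0.10], [0.00, 0.90]])
B_true = np.array([[0.00], [0.10]])

# Collect a trajectory under random excitation, plus small process noise.
T = 200
X = np.zeros((2, T + 1))
U = rng.normal(size=(1, T))
for t in range(T):
    X[:, t + 1] = A_true @ X[:, t] + B_true @ U[:, t] \
                  + 0.01 * rng.normal(size=2)

# Stack [x_t; u_t] and solve x_{t+1} ~= [A B] [x_t; u_t] in least squares.
Z = np.vstack([X[:, :T], U])                          # 3 x T regressors
Theta, *_ = np.linalg.lstsq(Z.T, X[:, 1:].T, rcond=None)
A_hat, B_hat = Theta.T[:, :2], Theta.T[:, 2:]
print("A_hat:\n", A_hat, "\nB_hat:\n", B_hat)
```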