Reinforcement learning (RL) is a general framework for adaptive control (e.g., autonomous driving, intelligent tutoring, or robotics). In this setting, an agent learns through trial and error by interacting with an environment. Recently, the combination of deep learning and reinforcement learning (called deep RL) has proved to be extremely powerful. Using such techniques, an agent can learn to play video games from visual inputs, or to play the game of Go at a superhuman level. Research on RL and deep RL is now very active in the machine learning community, mainly because of the potential of this approach. Ongoing research notably focuses on making these techniques more practical and efficient so that they can be applied to more diverse domains.
The proposed project is the continuation of an exploratory project that started as a collaboration with Huawei. The goal of that collaboration was to combine reinforcement learning methods and reasoning techniques to learn decision-making policies in the form of first-order logic programs. That research was conducted under the assumption that the input to the deep reinforcement learning agent was already available in logical form. In the proposed project, we aim to learn object-centric representations from visual inputs so that deep reinforcement learning agents can learn from higher-level, more compact representations. The potential benefits include faster deep RL training, more robust solutions, and interpretable policies.
A PhD student in my team has already started working in this direction and has designed a method based on visual transformers (paper under review). The undergraduate students joining this project will collaborate with my PhD student to further improve the current method and run more experiments.
The expected work, by quarter, is as follows:
- Q1: Learn the basics of deep reinforcement learning and deep learning; start performing some experiments with the current code written by my PhD student
- Q2: Improve the current method to make it more generic and to extract object-centric information from images; perform further experiments with the new method
- Q3: Analyze the experimental data and tune the new method
- Q4: Demonstrate the effectiveness of the final method on various domains; write a research paper describing it