DeepMind just published a mind-blowing paper: PathNet
Potentially describing what general artificial intelligence will look like.
Since scientists started building and training neural networks, Transfer Learning has been the main bottleneck. Transfer Learning is the ability of an AI to learn from different tasks and apply its pre-learned knowledge to a completely new task. It is implicit that with this precedent knowledge, the AI will perform better and train faster than de novo neural networks on the new task.
DeepMind is on the path of solving this with PathNet. PathNet is a network of neural networks, trained using both stochastic gradient descent and a genetic selection method.
PathNet is composed of layers of modules. Each module is a Neural Network of any type, it could be convolutional, recurrent, feedforward and whatnot.
Each of those nine boxes is the PathNet at a different iteration. In this case, PathNet was trained on two different games using a Advantage Actor-critic or A3C. Although Pong and Alien seem very different at first, we observe a positive transfer learning using PathNet (take a look at the score graph).
How does it train
First of all, we need to define the modules. Let L be the number of layers and N be the maximum number of modules per layer (the paper indicates that N is typically 3 or 4). The last layer is dense and not shared between the different tasks. Using A3c, this last layer represents the value function and policy evaluation.
After defining those modules, P genotypes (=pathways) are generated in the network. Due to the asynchronous nature of A3c, multiple workers are spawned to evaluate each genotype. After T episodes, a worker selects a couple of other pathways to compare to, if any of those pathways have a better fitness, it adopts it and continues training with that new pathway. If not, the worker continues evaluating the fitness of its pathway.
Pathways are trained using stochastic gradient descent with back-propagation throughout a single path at a time. This guarantees a manageable training time.
After learning a task, the network fixes all parameters on the optimal path. All other parameters must be reinitialized, otherwise PathNet will perform poorly on a new task.
Using A3C, the optimal path of the previous task is not modified by the back-propagation pass on the PathNet for a new task. This can be viewed as a safety net to not erase previous knowledge
PathNet does not work on every pair of game (blue cells equal negative transfer). But the important takeaway here is that PathNet actually works for some sets of games, which is already an immense step towards better transfer learning.
We can imagine that in the future, we will have giant AIs trained on thousands of tasks and able to generalize. In short, General Artificial Intelligence.
Link to the paper: https://arxiv.org/pdf/1701.08734.pdf
This post was originally posted by the author here.