Two recently published research papers from DeepMind outline the development of new machine-learning architectures with the ability to imagine, reason, and plan for optimal behaviors within an uncontrolled environment. Unlike AlphaGo and other machine-learning models that operate within environments with clear-cut rules and no unforeseen obstacles, these new architectures are designed to plan for and adapt to the unpredictability of the real world.
Currently, model-free deep neural networks exist that can map raw data to values or actions, but this type of reinforcement learning requires massive amounts of training data and the execution of many incorrect predictions before it can yield successful results. In addition, these networks lack the imagination to put the insights gleaned from this data to use in similar tasks within the same environment.
DeepMind’s newly developed Imagination-Augmented Agents (I2A) architectures overcome the limitations of these machine-learning models by interpreting predictions and using them as additional context in deep policy networks. In other words, an I2A uses imagination and past experience to model its environment and reason about it, rather than optimizing decisions purely through trial and error. This allows the architectures to achieve greater data efficiency and make fewer mistakes, producing models that can operate in environments where decisions are irreversible or carry significant consequences.
These imaginative capabilities are the product of a neural network that can extract information useful for future decisions while ignoring irrelevant details. In addition, the imagination encoder can interpret the dynamics of an imperfect environment by extracting useful information beyond immediate rewards, adapt the number of imagined trajectories to suit each problem, and learn a variety of planning strategies with differing accuracies and computational costs.
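The overall data flow described above can be sketched in miniature: a learned environment model "imagines" short rollouts, an encoder summarizes each rollout into a feature vector, and the policy combines those summaries with the model-free path through the raw state. The sketch below is an illustrative toy, not DeepMind's implementation; all weights are fixed random matrices standing in for learned parameters, and the dimensions, rollout policy, and function names are assumptions made for clarity.

```python
import numpy as np

rng = np.random.default_rng(0)
S, A, H = 8, 4, 16   # state dim, number of actions, encoder hidden size (assumed)
T = 3                # imagined rollout length (assumed)

# Random weights stand in for learned parameters in this toy sketch.
W_model = rng.normal(0, 0.1, (S + A, S + 1))  # env model: (state, action) -> (next state, reward)
W_enc   = rng.normal(0, 0.1, (S + 1 + H, H))  # rollout encoder, a simple RNN cell
W_pi    = rng.normal(0, 0.1, (S + A * H, A))  # policy head over aggregated features

def one_hot(i, n):
    v = np.zeros(n)
    v[i] = 1.0
    return v

def imagine_step(state, action):
    """Learned environment model: predict the next state and reward."""
    out = np.tanh(np.concatenate([state, one_hot(action, A)]) @ W_model)
    return out[:S], out[S]

def encode_rollout(state, first_action):
    """Imagine T steps forward, then summarize the trajectory with the encoder."""
    h = np.zeros(H)
    s, a = state, first_action
    for _ in range(T):
        s, r = imagine_step(s, a)
        h = np.tanh(np.concatenate([s, [r], h]) @ W_enc)
        a = 0  # hypothetical fixed rollout policy after the first step
    return h

def policy(state):
    """Combine one imagined rollout per action with the model-free state path."""
    codes = np.concatenate([encode_rollout(state, a) for a in range(A)])
    logits = np.concatenate([state, codes]) @ W_pi
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()  # action probabilities

probs = policy(rng.normal(size=S))
```

The key design point the sketch mirrors is that the imagined rollouts are fed to the policy as *additional context* alongside the raw state, so a flawed environment model degrades gracefully rather than dictating the plan.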
DeepMind is currently testing these models on two computer games that require planning and reasoning and contain irreversible actions. In addition, DeepMind has limited the trial-and-error opportunities by allowing each agent to attempt each level only once, rewarding imagination and planning rather than repeated real-world testing that can result in mistakes. The results of these tests, along with those reported in DeepMind’s research, demonstrate vastly improved data efficiency and performance in imperfect environments compared to other machine-learning models.