First, go and read this OpenAI blog post. Read it? Good!
In the next 10 minutes, I’ll write as much as I can about my thoughts on the claims made in that post.
I have a slight cognitive dissonance. I got used to thinking that RL is very good, and that the results obtained on the Atari games, for example, are extremely impressive. However, Evolution Strategies (ES), like any other “local search” method, are so generic and simple that they should be the lowest bar for any machine learning algorithm to clear.
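To make “generic and simple” concrete, here is a minimal sketch of an ES loop: perturb the parameters with Gaussian noise, score each perturbation, and move the parameters toward the perturbations that scored well. The toy quadratic reward, the names `evolution_strategy`, `toy_reward` and `TARGET`, and the hyperparameters are all assumptions chosen for illustration, not code taken from the post.

```python
import random
import statistics

# Hypothetical toy problem: the "policy" is three numbers, and the
# reward is highest when they match TARGET.
TARGET = [0.5, 0.1, -0.3]


def toy_reward(w):
    """Negative squared distance to TARGET (higher is better)."""
    return -sum((wi - ti) ** 2 for wi, ti in zip(w, TARGET))


def evolution_strategy(reward, theta, sigma=0.1, alpha=0.001,
                       population=50, iterations=300, seed=0):
    """Black-box optimization that only ever calls reward(params)."""
    rng = random.Random(seed)
    theta = list(theta)
    dim = len(theta)
    for _ in range(iterations):
        # Sample one Gaussian perturbation per population member.
        noise = [[rng.gauss(0.0, 1.0) for _ in range(dim)]
                 for _ in range(population)]
        # Evaluate each perturbed parameter vector.
        rewards = [reward([t + sigma * e for t, e in zip(theta, eps)])
                   for eps in noise]
        # Standardize the rewards so the step size does not depend on
        # the scale of the reward.
        mean = sum(rewards) / len(rewards)
        std = statistics.pstdev(rewards) + 1e-8
        scores = [(r - mean) / std for r in rewards]
        # Step toward the noise directions that scored above average.
        for j in range(dim):
            direction = sum(eps[j] * s for eps, s in zip(noise, scores))
            theta[j] += alpha / (population * sigma) * direction
    return theta


solution = evolution_strategy(toy_reward, [0.0, 0.0, 0.0])
```

Nothing in the loop knows anything about the problem: it needs no gradients and no value function, only a way to score a parameter vector. Swapping `toy_reward` for “run an episode and return the total score” would turn the same loop into a policy search.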
Is the right takeaway that RL overall is just not very good, and that its success is mostly a story of fast supercomputers?
OpenAI mentions that these kinds of local search methods are not good for supervised learning. This means that we do have some tools that are much better than local search, but that they do not transfer easily to RL.
A different explanation could simply be that the Atari games and OpenAI Gym-type games are specific cases where RL algorithms do not work well. Maybe because of their small action spaces?