Philip Tetlock, co-author of “Superforecasting” and a leading researcher in the psychology of human prediction, classified forecasters into two types – hedgehogs and foxes. The archetypal descriptions are as follows:
- Hedgehogs – people with a specific and highly detailed world view on a topic, who explain everything through a single theory. They are experts in their field, and their theory can be used to explain any phenomenon.
- Foxes – people who maintain many models of the world and hold on to a lot of data, even when it contradicts their working model. They answer questions about the world by combining all of their data and models, weighted by probabilities.
We see that machine learning today can be thought of as a foxy way of obtaining solutions, as opposed to many earlier efforts in AI history that tried a more hedgehog-like approach. This makes a lot of sense, as ML is mainly successful at, and is measured by, its ability to make statistical predictions, where (as is true in humans) foxes do a much better job.
However, we do see a benefit in maintaining expert hedgehogs in our society. This may be because of the way humans think, which is verbal and focused on consequences, so a simple (confabulated) explanation for why some phenomenon should occur can help humans solve more problems, as it adds to their arsenal of ideas.
So it may be the case that, in order to achieve more human-like intelligence in machines, we should build confabulatory systems with a hedgehog mind.
In this post I collect some disorganized notes on the topic of the security of machine learning algorithms.
These notes are a result of skimming some papers while preparing a lecture on the topic, as well as some random accompanying thoughts.
In this post I’ll collect some initial thoughts regarding the security of Google’s Federated Learning, a method for learning a model on a server where the clients do not send their data; instead, each client sends an updated model trained on its device with its own data. The main points are:
- Knowing a client’s update can give information about that client’s training data.
- Knowing the average of some updates is likely to give information about each user’s update.
- If an attacker can send many updates, they can get information about a specific client.
The first two points are acknowledged briefly in the article.
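The second and third points combine in the most extreme case, which is easy to sketch. The following is a toy illustration, not Google's protocol: it assumes plain unweighted averaging, that the attacker observes the averaged update, and that every participant in the round other than the victim is controlled by the attacker. The function names are made up for this example.

```python
import numpy as np

def federated_average(updates):
    """Server-side step (simplified): unweighted mean of the client updates."""
    return np.mean(updates, axis=0)

def recover_victim_update(average, attacker_updates):
    """Solve average = (victim + sum(attacker_updates)) / n for the victim's update."""
    n = len(attacker_updates) + 1
    return n * average - np.sum(attacker_updates, axis=0)

rng = np.random.default_rng(0)
victim_update = rng.normal(size=5)           # unknown to the attacker
attacker_updates = rng.normal(size=(9, 5))   # chosen by the attacker's fake clients

# The attacker only sees the average, yet can subtract its own contributions.
average = federated_average(np.vstack([victim_update, attacker_updates]))
recovered = recover_victim_update(average, attacker_updates)
print(np.allclose(recovered, victim_update))
```

When the attacker controls only some of the other clients, the same subtraction leaves the sum of the remaining honest updates rather than a single one, so the leak is weaker but not gone.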
This paper presents a cunning adversarial example attack on an unknown DNN model, using only a small number of black-box calls to the model (which happen before the input to be perturbed is given). The algorithm is basically to build a different model, an adversarial DNN, with some arbitrary choice of architecture and hyperparameters, and to learn its parameters on a data set labeled by oracle calls to the model. The inputs to the oracle are chosen iteratively, by taking the inputs from the previous iteration and choosing nearby points that are closest to the decision boundary of the last learned adversarial DNN.
I think it may be possible to improve the choice of the new inputs. The best choices for a new input are those that have a big impact on the decision boundary, weighted by the probability distribution of possible inputs.
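To make the loop concrete, here is a minimal sketch under heavy simplifications that are mine, not the paper's: the adversarial model is a logistic regression rather than a DNN, the oracle is a hidden linear classifier, and "nearby points closest to the decision boundary" is read as "move each point a fixed step straight toward the current boundary". All function names are hypothetical.

```python
import numpy as np

def train_substitute(X, y, epochs=500, lr=0.5):
    """Fit a logistic-regression substitute by full-batch gradient descent."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # predicted P(label = 1)
        err = p - y                              # log-loss gradient w.r.t. the logit
        w -= lr * (X.T @ err) / len(y)
        b -= lr * err.mean()
    return w, b

def points_toward_boundary(X, w, b, step=0.3):
    """Move each point a distance `step` straight toward the substitute's boundary."""
    direction = w / (np.linalg.norm(w) + 1e-12)
    side = np.sign(X @ w + b)                    # which side of the boundary each point is on
    return X - step * side[:, None] * direction

def fit_substitute(oracle, X0, rounds=3):
    """Alternate between training the substitute and querying the oracle on new points."""
    X, y = X0, oracle(X0)
    queries = len(X0)
    for r in range(rounds):
        w, b = train_substitute(X, y)
        if r == rounds - 1:
            break
        X_new = points_toward_boundary(X, w, b)
        y = np.concatenate([y, oracle(X_new)])   # only the new points cost oracle calls
        X = np.vstack([X, X_new])
        queries += len(X_new)
    return w, b, queries

# Toy black box: a linear classifier whose parameters the attacker never sees.
w_true, b_true = np.array([1.0, -1.0]), 0.2
oracle = lambda X: (X @ w_true + b_true > 0).astype(float)

rng = np.random.default_rng(0)
w, b, queries = fit_substitute(oracle, rng.normal(size=(30, 2)))
print(queries)
```

The data set doubles every round, so the number of rounds directly controls the oracle budget.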
Several thoughts regarding “big impact on the decision boundary”:
- The work is done entirely during preprocessing, as the (adversarial) model is known.
- Points near (at) the decision boundary are very good.
- A point on the decision boundary can be approximated in log-time.
- It may be possible to find good measures of the extent to which a new input has changed the decision boundary.
- For example, maybe a form of regularization that rewards changing as many parameters by as much as possible is good enough. (I guess not, but it is very simple to test.)
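The log-time remark in the list above can be illustrated with plain bisection. This is a sketch assuming we already hold two inputs that the known adversarial model labels differently; `boundary_point` and the toy linear classifier are illustrative names, not taken from the paper.

```python
import numpy as np

def boundary_point(predict, x_a, x_b, tol=1e-6):
    """Bisect the segment from x_a to x_b until the label flip is located within tol."""
    label_a = predict(x_a)
    if label_a == predict(x_b):
        raise ValueError("endpoints must receive different labels")
    lo, hi, steps = 0.0, 1.0, 0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if predict((1 - mid) * x_a + mid * x_b) == label_a:
            lo = mid   # still on x_a's side, so the flip is further along
        else:
            hi = mid   # already flipped, so the flip is earlier
        steps += 1
    t = (lo + hi) / 2
    return (1 - t) * x_a + t * x_b, steps

# Toy stand-in for the adversarial model: a linear classifier.
w, b = np.array([1.0, -2.0]), 0.5
predict = lambda x: int(w @ x + b > 0)

point, steps = boundary_point(predict, np.array([-3.0, 1.0]), np.array([4.0, 1.0]))
print(steps, abs(w @ point + b))
```

Each model evaluation halves the interval, so the number of evaluations grows like log2(1/tol). If the labels flip more than once along the segment, this finds one of the crossings, not necessarily the nearest.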
Several thoughts regarding the probability distribution of possible inputs:
- It seems like a very important concept to understand deeply.
- It is probably heavily researched.
- If there is an available training set, it may be possible to approximate the manifold of the probable inputs.
- Maybe GANs can help with this problem.
First, go and read this OpenAI blog post. Read it? Good!
In the next 10 minutes, I’ll write as much as I can on my thoughts regarding the claims made in the above-mentioned post.
I have a slight cognitive dissonance. I got used to thinking that RL is very good, and that the results obtained on the Atari games, for example, are extremely strong. However, it seems that Evolution Strategies (ES), like any type of “local search” method, are so generic and simple that they should be the lowest standard for any machine learning algorithm.
Is it correct to take away from this that overall RL is just not very good, but that its success is mostly a story of fast supercomputers?
OpenAI mentions that these kinds of local search methods are not good for supervised learning. This means that we do have some tools which are much better than local search, but that they are not easily transferable.
A different explanation could simply be that the Atari games and OpenAI Gym-type games are specific examples where RL algorithms do not work well. Maybe due to their small action space?
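To show just how generic this kind of local search is, here is a minimal sketch of a basic ES loop: perturb the parameters with Gaussian noise, score each perturbation, and move toward the ones that scored better. It is my own toy version on a made-up quadratic reward, not OpenAI's implementation, and the hyperparameters are arbitrary assumptions.

```python
import numpy as np

def evolution_strategy(reward, theta, iterations=300, population=50,
                       sigma=0.1, lr=0.001, seed=0):
    """Basic ES: estimate a search direction from rewards of random perturbations."""
    rng = np.random.default_rng(seed)
    for _ in range(iterations):
        noise = rng.normal(size=(population, theta.size))
        rewards = np.array([reward(theta + sigma * eps) for eps in noise])
        # Standardize rewards so the step size does not depend on the reward scale.
        advantages = (rewards - rewards.mean()) / (rewards.std() + 1e-8)
        # Weighted sum of perturbations; the reward is treated as a black box throughout.
        theta = theta + lr / (population * sigma) * (noise.T @ advantages)
    return theta

# Hypothetical toy problem: the reward peaks when theta equals `target`.
target = np.array([0.5, 0.1, -0.3])
reward = lambda th: -np.sum((th - target) ** 2)

theta = evolution_strategy(reward, np.zeros(3))
print(np.linalg.norm(theta - target))
```

Nothing in the loop uses gradients of the reward or any structure of the problem, which is exactly why it feels like it should be a baseline.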