A possible improvement for black-box adversarial example attacks

This paper presents a cunning adversarial example attack on an unknown DNN model, using only a small number of black-box calls to the model (which happen before the input to be perturbed is given). The algorithm builds a different model, an adversarial DNN, with an arbitrary choice of architecture and hyperparameters, and learns its parameters on a data set labeled by oracle calls to the target model. The inputs to the oracle are chosen iteratively: take the inputs from the previous iteration and add nearby points that lie closest to the decision boundary of the most recently learned adversarial DNN.
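
As a rough illustration of the loop described above, here is a minimal sketch in Python with NumPy. Everything in it is an assumption made for illustration: `oracle` is a toy stand-in for the unknown model, a logistic regression replaces the adversarial DNN, and `augment` is a hypothetical helper that moves each point a small step toward the current decision boundary. It is not the paper's exact procedure.

```python
import numpy as np

def oracle(X):
    # Hypothetical black-box target: only labels are visible to the attacker.
    return (X[:, 0] + 2.0 * X[:, 1] > 0.5).astype(float)

def fit_logistic(X, y, steps=500, lr=0.5):
    # Gradient descent on the logistic loss; stands in for training the adversarial DNN.
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
        w -= lr * X.T @ (p - y) / len(y)
        b -= lr * np.mean(p - y)
    return w, b

def augment(X, w, b, step=0.1):
    # New points: each old point moved a small step toward the current boundary.
    toward = -np.sign(X @ w + b)[:, None] * np.sign(w)[None, :]
    return X + step * toward

def train_substitute(X0, rounds=4):
    X, y = X0, oracle(X0)
    for r in range(rounds):
        w, b = fit_logistic(X, y)
        if r < rounds - 1:
            X_new = augment(X, w, b)
            X = np.vstack([X, X_new])
            y = np.concatenate([y, oracle(X_new)])  # one oracle call per new point
    return w, b, X, y

rng = np.random.default_rng(0)
w, b, X, y = train_substitute(rng.normal(size=(20, 2)))
X_test = rng.normal(size=(1000, 2))
agreement = np.mean((X_test @ w + b > 0) == (oracle(X_test) > 0.5))
print(f"oracle queries: {len(y)}, agreement with oracle: {agreement:.2f}")
```

The data set doubles every round, so the number of rounds directly controls the oracle budget; that is why the choice of new points matters so much.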

I think it may be possible to improve the choice of new inputs. The best new inputs are those expected to have a big impact on the decision boundary, weighted by the probability distribution of possible inputs; that is, a change to the boundary matters more in regions where inputs are likely to occur.
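
One way to make this criterion concrete, sketched here under my own assumptions, is a Monte Carlo estimate: draw sample points `Z` from the input distribution and count how many of them change label once a candidate point is added to the training set. The function names are hypothetical, and a 1-nearest-neighbour classifier stands in for retraining the adversarial model so that the example stays short.

```python
import numpy as np

def nearest_label(X_train, y_train, Z):
    # 1-nearest-neighbour prediction; a cheap stand-in for retraining the model.
    d = ((Z[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
    return y_train[d.argmin(axis=1)]

def impact(X_train, y_train, x_new, y_new, Z):
    """Fraction of samples in Z whose predicted label changes after adding (x_new, y_new).

    Z is assumed to be drawn from the distribution of possible inputs, so the
    mean is implicitly weighted by that distribution.
    """
    before = nearest_label(X_train, y_train, Z)
    after = nearest_label(np.vstack([X_train, x_new]), np.append(y_train, y_new), Z)
    return float(np.mean(before != after))

X_train = np.array([[0.0, 0.0], [1.0, 0.0]])
y_train = np.array([0, 1])
Z = np.array([[0.1, 0.0], [0.4, 0.0], [0.45, 0.0], [0.9, 0.0]])

# A candidate near the current boundary moves it; a faraway one does not.
print(impact(X_train, y_train, np.array([0.5, 0.0]), 1, Z))
print(impact(X_train, y_train, np.array([2.0, 0.0]), 1, Z))
```

With a real model, each evaluation of `impact` would require a retraining step, so this is mainly useful as a yardstick for cheaper heuristics.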

Several thoughts regarding “big impact on the decision boundary”:

  1. All of this work can be done during preprocessing, since the (adversarial) model is known.
  2. Points near (at) the decision boundary are very good.
  3. A point on the decision boundary can be approximated in logarithmic time, for example by bisection between two points with different labels.
  4. It may be possible to find good measures of the extent to which a new input has changed the decision boundary.
    1. For example, maybe a form of regularization that encourages changing as many parameters by as much as possible is good enough. (I guess not, but it is very simple to test.)
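
The logarithmic-time claim in the list above can be sketched with plain bisection, assuming two points with different labels under the (known) adversarial model are available. `bisect_to_boundary` and the toy `predict` function are hypothetical names used only for this illustration.

```python
import numpy as np

def bisect_to_boundary(predict, x_a, x_b, tol=1e-6):
    """Find a point within tol of a decision boundary on the segment from x_a to x_b.

    Returns the point and the number of model evaluations inside the loop,
    which grows like log2(|x_b - x_a| / tol).
    """
    label_a = predict(x_a)
    if label_a == predict(x_b):
        raise ValueError("endpoints must have different labels")
    lo, hi = 0.0, 1.0  # invariant: label at lo equals label_a, label at hi differs
    length = np.linalg.norm(x_b - x_a)
    iters = 0
    while (hi - lo) * length > tol:
        mid = 0.5 * (lo + hi)
        if predict(x_a + mid * (x_b - x_a)) == label_a:
            lo = mid
        else:
            hi = mid
        iters += 1
    t = 0.5 * (lo + hi)
    return x_a + t * (x_b - x_a), iters

# Toy adversarial model with a known linear boundary x0 + 2*x1 = 0.5.
predict = lambda x: int(x[0] + 2.0 * x[1] > 0.5)
point, iters = bisect_to_boundary(predict, np.array([0.0, 0.0]), np.array([1.0, 1.0]))
print(point, iters)
```

Each iteration costs one evaluation of the adversarial model and no oracle call, which fits the observation that this work happens entirely in preprocessing.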

Several thoughts regarding the probability distribution of possible inputs:

  1. It seems like a very important concept to understand deeply.
  2. It is probably heavily researched.
  3. If a training set is available, it may be possible to approximate the manifold of probable inputs.
    1. Maybe GANs can help with this problem.