In this post I’ll collect some initial thoughts on the security of Google’s Federated Learning, a method for training a model on a server without the clients sending their data: instead, each client sends back a model updated on-device with its own data. The main points are:

- Knowing a client’s update can reveal information about their training data.
- Knowing the average of several updates is likely to reveal information about each individual update.
- An attacker who can send many updates can extract information about a specific client.

The first two points are acknowledged briefly in the article.

# Learning data from update

I think that it is possible to get a lot of information on the training set from the update round, similar to model inversion attacks. Due to a lack of time, I’ll just write some heuristic comments and maybe return to analyze it thoroughly someday.

We will neglect the “stochastic” part of SGD, but it should be possible to obtain information anyway. We shall also assume that the model is a simple fully connected NN with ReLU layers, designed for classification. The loss function will be of the form $L(w) = \frac{1}{n}\sum_{i=1}^{n}\ell(x_i, y_i; w)$, where $\ell$ is the loss function for a single sample, $n$ is the number of classified sample pairs $(x_i, y_i)$, and $w$ are the weights (parameters) of the model.

An update round works by computing the gradient of the loss function (over the training set) with respect to the model parameters, and moving the parameters in the direction opposite to the gradient, by a step whose length is proportional to the gradient’s norm (the constant of proportionality being the learning rate). Thus, by comparing the two models (the original and the updated one), we can compute the gradient.
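This observation can be sketched in a few lines of numpy. Everything below is a toy: the flat parameter vectors, the learning rate `lr`, and the variable names are all illustrative, and we assume plain (non-stochastic) gradient descent with a known learning rate.

```python
import numpy as np

lr = 0.1  # learning rate, assumed known to the observer

# Toy "model": a flat parameter vector before and after one client update.
w_old = np.array([0.5, -1.2, 3.0])
grad_true = np.array([0.2, -0.4, 1.0])  # the client's (secret) gradient
w_new = w_old - lr * grad_true          # the update the server receives

# Anyone who sees both snapshots and knows lr recovers the gradient exactly:
grad_recovered = (w_old - w_new) / lr
```

If the learning rate is unknown, the observer still recovers the gradient up to a positive scalar, which is enough for the sign-based arguments below.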

First, let’s assume that there is a single data point $(x, y)$. In this case, a short computation shows that in the final layer, the gradient components feeding the correct label $y$ all have one sign, while those feeding the other labels have the opposite sign (with softmax cross-entropy and nonnegative ReLU activations, the correct label’s components are the negative ones), and thus we can recover the correct label. Furthermore, viewing the gradient as a function of $x$ (and $y$), we see that since the dimension of the parameter space exceeds the dimension of the feature space, this mapping will generically be one-to-one. Using non-convex optimization, we can approximate $x$. It may actually be easy to compute the correct $x$ directly.
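The label-recovery claim is easy to check numerically. The sketch below assumes softmax cross-entropy and a ReLU layer feeding the final linear layer; the sizes, weights, and names are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_hidden, n_classes = 8, 4

a = np.abs(rng.normal(size=n_hidden))      # ReLU activations: nonnegative
W = rng.normal(size=(n_classes, n_hidden)) # final-layer weights
b = rng.normal(size=n_classes)             # final-layer biases
y = 2                                      # the (secret) true label

# Forward pass, then the standard cross-entropy gradient w.r.t. the logits.
z = W @ a + b
p = np.exp(z - z.max()); p /= p.sum()      # softmax probabilities
dz = p.copy(); dz[y] -= 1.0                # dL/dz = p - onehot(y)

# Final-layer weight gradient: outer product of dz with the activations.
# Only the correct label's row is entirely non-positive, so the label leaks:
dW = np.outer(dz, a)
recovered = int(np.argmin(dW.sum(axis=1)))
```

The same sign pattern appears directly in the bias gradient `dz`, so even the biases alone leak the label.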

If there is more than one sample, the same method will work as long as the number of samples is small enough. If it is large, we expect many possible solutions, so the only way out is to use the fact that the distribution of the samples is not random and carries much less entropy.

# Recovering information from average gradient

It may be possible to obtain information on specific gradients if their distribution is known. For example, if one gradient out of all the gradients being averaged belongs to a person who is extremely interested in elephants, then it may be possible to recover information about their elephant-related searches.

A variation of the above idea could perhaps let the attacker learn whether one of the individuals involved had trained on something specific and personal. For example, if I want to know whether Trump is googling for information about climate change (hopefully he is..), and the classifier classifies personal Google search preferences, then the statistics of how often the gradient for “climate change” is positive in batches that contain Trump’s model updates will give me some information.
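A toy simulation of this averaging leak: we track a single gradient coordinate (standing in for “climate change”), assume the target user adds a fixed bias `target_signal` to it, and compare batch averages with and without the target over many rounds. All numbers and names are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
rounds, batch = 2000, 10
target_signal = 0.5  # the target's (assumed) bias on this coordinate

with_target, without_target = [], []
for _ in range(rounds):
    grads = rng.normal(size=batch)   # one coordinate, for ordinary users
    without_target.append(grads.mean())
    grads[0] += target_signal        # swap one user for the target
    with_target.append(grads.mean())

# The gap between the two means recovers target_signal / batch:
gap = np.mean(with_target) - np.mean(without_target)
```

The per-user signal is diluted by the batch size, but it is not destroyed; enough rounds make it statistically visible.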

# Active attack for recovery of a single gradient

If I want to bypass the averaging and recover a specific user’s gradient, it may be possible in some situations to do so. Whenever the targeted user sends a model update to the server, the attacker simultaneously sends many updates of his own that keep the model the same (or not the same; it does not really matter, as long as the attacker knows them), so that the training batch consists only of the attacker’s updates and the victim’s. Observing how the model changes as a result lets the attacker deduce the victim’s gradient.
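The arithmetic behind this is just solving the average for the one unknown term. A sketch, again with made-up flat parameter vectors and no real FL framework assumed:

```python
import numpy as np

rng = np.random.default_rng(1)
d = 5

victim = rng.normal(size=d)                     # the victim's (secret) update
attacker = [rng.normal(size=d) for _ in range(9)]  # updates the attacker knows

# What the server computes: the average over the whole batch.
batch = attacker + [victim]
avg = np.mean(batch, axis=0)

# Knowing his own updates and the batch size, the attacker solves for the rest:
recovered = len(batch) * avg - np.sum(attacker, axis=0)
```

Note the attacker’s updates need not be zero; they only need to be known to him.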

This attack is especially easy for the server itself to carry out. That is arguably the worse case, since the scheme is designed above all to protect users’ privacy from the server.