Marketing Failure

The amount of money spent worldwide on advertisements is astronomical: about 550 billion dollars annually, and it is expected to keep growing rapidly. In 2016, each of the 20 companies with the biggest advertising budgets spent between 2.7 and 8.3 billion dollars.

Is this a result of a race-to-the-bottom-type market failure? If so, can we solve it?

Solving the Cyber-Security Bubble

There seems to be a big bubble in cyber security: there are many awful products on the market, and many bad startups raise funds easily. I believe this problem needs to be addressed, either by government regulation or by independent companies. In this brief post I lay out the problem and what has been done so far to mitigate it.

Effective Global Scientific Research

Scientific advancement is one of the greatest drivers of improvement in the quality of life of everyone on the globe. In this post I’ll present an overview of an idea aimed at improving the allocation of resources to scientific research, so that it corresponds better to what is important for society.

We’ll start with a bit of background, and then move on to the actual idea. The idea is simply to construct a graph of the important objectives in science and of how each problem relates to other problems.

Security of Google’s Federated Learning

In this post I’ll collect some initial thoughts regarding the security of Google’s Federated Learning, which is a method for learning a model on a server where the clients do not send their data, but instead send an updated model trained on their device with their own data. The main points are:

  1. Knowing a client’s update can reveal information about that client’s training data.
  2. Knowing the average of several updates is likely to reveal information about each user’s update.
  3. An attacker who can send many updates can obtain information about a specific client.

The first two points are acknowledged briefly in the article.
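
As a toy illustration of the third point, here is a minimal sketch under strong assumptions that are mine and not taken from the article: the server computes a plain unweighted average, the attacker sees that average, and the attacker supplied every other update in the round. The function names are hypothetical, and the real system may well aggregate differently.

```python
from typing import List

Vector = List[float]


def average(updates: List[Vector]) -> Vector:
    """Coordinate-wise mean, standing in for the server's aggregation step (assumed)."""
    n = len(updates)
    return [sum(column) / n for column in zip(*updates)]


def recover_victim_update(avg: Vector, attacker_updates: List[Vector]) -> Vector:
    """Solve avg = (victim + sum(attacker_updates)) / n for the victim's update."""
    n = len(attacker_updates) + 1
    attacker_sums = [sum(column) for column in zip(*attacker_updates)]
    return [n * a - s for a, s in zip(avg, attacker_sums)]


# One honest client and three updates chosen by the attacker.
victim = [0.5, -1.25, 2.0]
attacker = [[1.0, 0.0, -1.0], [0.25, 0.5, 0.75], [-2.0, 1.5, 0.0]]

avg = average([victim] + attacker)
recovered = recover_victim_update(avg, attacker)
```

In this idealized setting the victim's update is recovered exactly, which then feeds into the first point: the update itself may say something about the training data.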

A possible improvement for black-box adversarial example attack

This paper presents a cunning adversarial-example attack on an unknown DNN model that uses only a small number of black-box calls to the model (all of which happen before the input to be perturbed is given). The algorithm basically builds a different model, an adversarial DNN, with some arbitrary choice of architecture and hyperparameters, and learns its parameters on a data set labeled by oracle calls to the target model. The inputs to the oracle are chosen iteratively: take the inputs from the previous iteration and choose nearby points that are closest to the decision boundary of the last learned adversarial DNN.

I think it may be possible to improve the choice of the new inputs. The best new inputs are those expected to have a big impact on the decision boundary, weighted by the probability distribution of possible inputs.

Several thoughts regarding “big impact on the decision boundary”:

  1. The work is done entirely during preprocessing, as the (adversarial) model is known.
  2. Points near (at) the decision boundary are very good.
  3. A point on the decision boundary can be approximated in log-time.
  4. It may be possible to find good measures of the extent to which a new input has changed the decision boundary.
    1. For example, maybe a form of regularization that encourages changing as many parameters by as much as possible is good enough. (I guess not, but it is very simple to test.)
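
One way to read the log-time remark is bisection along a segment whose endpoints get different labels. The sketch below is my own illustration, not something from the paper: `boundary_point` and `toy_classifier` are hypothetical names, and the linear toy classifier only stands in for the learned adversarial DNN.

```python
from typing import Callable, List

Point = List[float]


def boundary_point(
    classify: Callable[[Point], int], a: Point, b: Point, tol: float = 1e-6
) -> Point:
    """Bisect the segment from a to b until a label change is bracketed within tol.

    tol is measured as a fraction of the segment length, so the number of
    classifier calls grows like log2(1 / tol).
    """
    label_a = classify(a)
    if label_a == classify(b):
        raise ValueError("endpoints must receive different labels")

    # Invariant: the point at fraction lo has label_a, the point at hi does not.
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        candidate = [x + mid * (y - x) for x, y in zip(a, b)]
        if classify(candidate) == label_a:
            lo = mid
        else:
            hi = mid

    t = (lo + hi) / 2
    return [x + t * (y - x) for x, y in zip(a, b)]


def toy_classifier(p: Point) -> int:
    """Stand-in model: the decision boundary is the line x + y = 1."""
    return 1 if p[0] + p[1] > 1.0 else 0


p = boundary_point(toy_classifier, [0.0, 0.0], [2.0, 2.0])
```

With a non-convex boundary the segment may cross it several times; bisection then finds one of the crossings, which should still be a usable query point.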

Several thoughts regarding the probability distribution of possible inputs:

  1. It seems like a very important concept to understand deeply.
  2. It is probably heavily researched.
  3. If there is an available training set, it may be possible to approximate the manifold of the probable inputs.
    1. Maybe GANs can help with this problem.

Single-use code for 3D printing

Once 3D printers are capable and cheap enough, they could bring about an enormous economic change. In this post I discuss the main reasons for this economic change, and ponder some technological concepts which may restrict it. I am not sure if this restriction is beneficial or not, as we’ll discuss in the summary.

Digitization ⇒ duplicability

If the information of the product is entirely digital, then there are two main consequences:

  • It will be easy to share the product p2p. We see this today in many areas, such as music, film or electronic books, where downloaded copies can be shared freely as torrents or on file-sharing sites.
  • It will be easy to “use” the product more than once. We usually take it for granted that this has to be the case, as music, books and the like can be used repeatedly once owned. Note that it is not a necessity, and in fact there are many alternatives such as leasing or radio.

Economic implications

The impact of digitization is obviously huge, as can be seen in the case of the music industry. The analysis here is important and must be data driven, so it requires more careful research, which I will postpone.

A relevant question which is not analogous to the case of the music industry is “what are the implications of being able to generate an object more than once?” I’ll leave it open as well.

Single use code

The challenge is to find a way for users to download a design online and use it immediately to print the object, but in such a way that the majority of users cannot print the design again.

If the printer is stateless (that is, has no intrinsic memory), then sending the same packet to the printer will result in the same action by the printer. Hence, even if the printer driver acts in different ways, a simple way to print the same thing many times is to sniff the communication during the first “legal” print and replay it for the next prints. This can be automated somewhat easily, and the program for doing so can be made simple enough that many users will use it. Thus, we need some level of sophistication in the driver-printer protocol to avoid this attack. It is also clear that the printer’s code and internal state need to be unmalleable.

The naive idea is to have the printer remember which models it has already printed (say by storing their hash values) and refuse to print the same model again. This is not good enough, as it is easy to make minor changes to the model so that it no longer matches a stored hash while the printed object stays essentially the same. Even if the printer had a clever algorithm which could tell whether two models are the same, which is very hard to do efficiently, these kinds of protections can always be overcome.
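
To make the weakness concrete, here is a small sketch of the hash-based check. The G-code-like byte strings and the choice of SHA-256 are assumptions of mine for illustration; any exact hash of the model data behaves the same way.

```python
import hashlib


def model_fingerprint(model_bytes: bytes) -> str:
    """Exact fingerprint of a model file, as the naive printer might store it."""
    return hashlib.sha256(model_bytes).hexdigest()


# Two toolpaths that differ by one micrometre in a single coordinate.
original = b"G1 X10.000 Y20.000 Z0.200\nG1 X15.000 Y20.000 Z0.200\n"
tweaked = b"G1 X10.000 Y20.000 Z0.200\nG1 X15.001 Y20.000 Z0.200\n"

# The printer has printed the original once and remembers its hash.
printed_before = {model_fingerprint(original)}

is_blocked_original = model_fingerprint(original) in printed_before
is_blocked_tweaked = model_fingerprint(tweaked) in printed_before
```

The original is refused on a second attempt, while the tweaked copy goes through even though the resulting object would be practically identical.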

We can try to use cryptography to make sure that the printer will not use the same code twice. Assume that the printer has a secret key shared with the printing company. Then whoever wants to publish their design for a unique printing sends it to the printing company, which has a platform for selling designs, and anyone who buys the design gets it encrypted and signed so that only their printer can decrypt and authenticate the code for the model. In this case, the model cannot be shared, and the hashing solution above can protect from duplication. This solution assumes that the vast majority of users will not open their printers and obtain the secret key (which can be made extremely complicated). Another version is to sign the model together with the printer ID using public key cryptography, and have the printer only print what is verified as coming from the company and has the correct ID. This version is problematic, as the code itself will be visible.

The main technical problem with the above solution is that it does not allow for printing of free models, or home generated ones, and here is where it gets interesting. Just allowing unencrypted models to be printed has the inherent problem that it only takes one person who manages to recover their own key to be able to spread the model. However, it would still cost money, so it can still be quite good. Another problem is the management of the keys, but it should be fine.
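
A rough sketch of the shared-key version follows. It only covers the authentication and the duplicate check; encryption is left out because the Python standard library has no suitable cipher. Using HMAC-SHA256 as the "signature", the message layout, and the names `issue_license` and `Printer` are all assumptions of mine rather than a worked-out protocol.

```python
import hashlib
import hmac


def issue_license(shared_key: bytes, printer_id: str, design: bytes) -> bytes:
    """Company side: authenticate one design for one specific printer."""
    message = printer_id.encode() + b"\x00" + hashlib.sha256(design).digest()
    return hmac.new(shared_key, message, hashlib.sha256).digest()


class Printer:
    def __init__(self, printer_id: str, shared_key: bytes) -> None:
        self.printer_id = printer_id
        self._key = shared_key  # assumed to be hard to extract from the device
        self._printed = set()   # hashes of designs that were already printed

    def print_design(self, design: bytes, tag: bytes) -> bool:
        """Print only if the tag is valid for this printer and the design is new."""
        expected = issue_license(self._key, self.printer_id, design)
        if not hmac.compare_digest(expected, tag):
            return False
        digest = hashlib.sha256(design).digest()
        if digest in self._printed:
            return False
        self._printed.add(digest)
        return True


design = b"example design bytes"
printer_a = Printer("printer-A", b"key shared with printer A")
printer_b = Printer("printer-B", b"key shared with printer B")
tag_for_a = issue_license(b"key shared with printer A", "printer-A", design)

first_print = printer_a.print_design(design, tag_for_a)
second_print = printer_a.print_design(design, tag_for_a)
other_printer = printer_b.print_design(design, tag_for_a)
tweaked_design = printer_a.print_design(design + b" ", tag_for_a)
```

Here the exact hash is enough to stop duplication, because any change to the design invalidates the tag, so the minor-changes trick no longer helps. The printer still has to keep state across prints, and everything rests on the key staying inside the device.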

Conclusion

The above scheme is probably fine, but I think a better solution is possible. Ultimately, the biggest problem for any such solution is that the printer manufacturer and the platform for the unique printing of models need to work together, and create a large enough community of buyers and sellers so that new people will choose to buy these specific printers.

Interactive Biometric Identification

Intro

Today, we have a problem with the internet: it is terribly difficult to validate another person’s identity. Even to figure out whether an online entity is an actual person can be difficult. Someday I will analyze the importance of identification (maybe as opposed to anonymity, though they are not mutually exclusive), but for now let’s take it for granted that it is worthwhile discussing. It is also important to define what we mean by identity, which I also won’t do now.

The motivation for this idea came out of listening to David Birch’s lecture on How to use Identity and the Blockchain, where he gave one possible definition of identity and talked about how it relates to the internet and Blockchain technologies.

The idea

The idea tries to solve the validation problem without having the users remember (or store) a secret password, using biometric data instead, but avoiding some inherent security problems.

One possible solution could have been to send a photo of the face of the person whose identity is to be validated, or a fingerprint or a voice sample. This data could be validated by an image recognition model owned by the authority that’s making the validation. One problem with this approach is that it may be easy to obtain the person’s biometric data, say by viewing her Facebook profile, so we cannot completely trust the biometric data sent.

My solution to the above problem relies on a challenge-response mechanism, such that the biometric data being sent is dependent on the challenge given by the validating server. For example, the server might send a random sentence which the person will need to say to be validated. Then the server checks both that the voice comes from the correct person and that the words spoken correspond to the challenge initiated.

Two other ideas are to use the flashlight or the vibration of the smartphone. Let’s say the server sends Challenge = 0001010111101111000101010111110111001010, and then the user takes a video of himself such that the light (or vibration) is turned on in every frame corresponding to a 1 bit and turned off otherwise. In this way the video can be validated as happening in real time and not duplicated, while still enabling biometric identification.
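
The server side of the flashlight variant could look roughly like the sketch below. It assumes some separate, hypothetical video-analysis step has already turned the recording into one on/off value per challenge bit; `make_challenge`, `matches_challenge` and the `max_errors` tolerance are names and choices of mine. The biometric check itself is not shown.

```python
import secrets
from typing import List


def make_challenge(n_bits: int = 40) -> str:
    """Fresh random bit string; 40 bits matches the length of the example above."""
    return "".join(secrets.choice("01") for _ in range(n_bits))


def matches_challenge(challenge: str, observed: List[bool], max_errors: int = 0) -> bool:
    """Check that the light was on exactly in the frames that correspond to 1 bits.

    observed[i] is True if the light was detected as on for bit i.
    max_errors allows for a few misdetected frames.
    """
    if len(observed) != len(challenge):
        return False
    errors = sum(1 for bit, lit in zip(challenge, observed) if (bit == "1") != lit)
    return errors <= max_errors


challenge = make_challenge()

# A live user follows the challenge; an inverted pattern stands in for a stale video.
honest_frames = [bit == "1" for bit in challenge]
stale_frames = [not lit for lit in honest_frames]

honest_ok = matches_challenge(challenge, honest_frames)
stale_ok = matches_challenge(challenge, stale_frames)
```

A recording made for an earlier challenge would match a fresh 40-bit challenge only by chance, which is what ties the video to the current session.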

Pros and Cons

The pros have been more or less laid out. Let’s list the many cons which make this idea disadvantageous:

  • Computational difficulty of validation. Facial and vocal recognition still have a high error rate. Maybe using fingerprints is easier.
  • Big bandwidth usage. Uploading videos takes a lot of bandwidth.
  • Possibility of attacks. It still may be possible to simulate the flashlight on top of an existing video. I also know of attempts to make computer-generated audio that sounds like specific individuals. Also, in many cases it is possible to make fake inputs to machine-learning models that result in a specified classification (see this for example).
  • Inconvenient usage. Just imagine doing a selfie with the flashlight randomly on or off…

Conclusion

While there may be a good idea along these lines, the options laid out have many underlying problems. Because of this I do not think that this idea, as applied to my specific problem, is a very good one. It has been an interesting thought experiment, anyway.