Last month in his keynote talk at the NVidia GPU Technology Conference, Andrew Ng (Baidu Chief Scientist, Stanford Professor, Google Brain team founder) described face comparison technology he and his team at Baidu developed. If you haven’t taken his Coursera machine learning course, that’s ok, this post doesn’t assume technical knowledge about machine learning.
The challenge is to compare two pictures containing faces and decide whether they are pictures of the same person or two different people. In the experiment, 6000 pairs of images were examined, both by humans and by algorithms from various teams around the world. Three teams, including Baidu, Google and the Chinese University of Hong Kong achieved better-than-human recognition performance. The Baidu team made just 9 errors out of the 6000 examples.
Here is a slide from Professor Ng’s talk with images of the ones they got wrong:
You may notice that the top-left image pair is of movie actress Nicole Kidman, which the Baidu system incorrectly classified as two different people. What may be less obvious, is that the second image of Ms. Kidman was taken from her film, “The Hours” in which she is wearing a prosthetic nose.
Here are the overall results from implementations done by teams throughout the world. The dashed line indicates human performance at the same task. People typically think of recognizing faces as a very innate human task, but once again, we can be surprised that machine learning algorithms are now capable of equalling or outperforming humans.
In his talk, Professor Ng credits two major factors as contributing to the vast improvements in machine learning technology over the past five years. As an analogy, think of launching a rocket into orbit: we need 1) both giant rocket engines and 2) lots of rocket fuel.
The rocket engine in his scenario is the incredible computational performance improvements brought to us by GPU technology. The fuel then, is access to huge amounts of data, coming to us from the prevalence of internet connected sensors, online services, and our society’s march toward digitization.
To perform this astounding face comparison judgment, algorithms are trained on facial data. The flow chart in the image above shows that if the algorithm is not performing well on the training data, we need a more powerful rocket engine.
The second part of the flowchart above illustrates that if the algorithm is learning the training data quite well, but performing poorly when presented with new examples, then perhaps more “rocket fuel” is required, so gathering more training data would be the logical approach to improve the system.
Professor Ng compares the benefits of high performance computing with cloud computing and states that better training performance may be achievable say with 32 GPUs in a co-located rack than by a larger number of servers running in the cloud. His reasoning is that communication latency, server downtime and other failures are more prevalent in cloud-based systems because they are spread out across more machines with more network connections.
Machine learning has been demonstrated to be good at many things. The recent improvements are not limited to face comparisons, in fact in this same talk improvements to Baidu’s speech recognition system were shown to perform well in noisy environments.