Nicole Kidman’s Prosthesis Fools Better-Than-Human Face Recognition Algorithm

Last month in his keynote talk at the NVIDIA GPU Technology Conference, Andrew Ng (Baidu Chief Scientist, Stanford Professor, Google Brain team founder) described face comparison technology he and his team at Baidu developed. If you haven’t taken his Coursera machine learning course, that’s OK; this post doesn’t assume technical knowledge about machine learning.

The challenge is to compare two pictures containing faces and decide whether they show the same person or two different people. In the experiment, 6000 pairs of images were examined, both by humans and by algorithms from teams around the world. Three teams (Baidu, Google, and the Chinese University of Hong Kong) achieved better-than-human recognition performance. The Baidu team made just 9 errors out of the 6000 examples.

Here is a slide from Professor Ng’s talk with images of the ones they got wrong:

Face Recognition

You may notice that the top-left image pair is of movie actress Nicole Kidman, which the Baidu system incorrectly classified as two different people. What may be less obvious is that the second image of Ms. Kidman was taken from her film “The Hours,” in which she is wearing a prosthetic nose.

Nicole Kidman in “The Hours”

Here are the overall results from implementations by teams throughout the world. The dashed line indicates human performance at the same task. People typically think of recognizing faces as an innately human skill, but once again, we can be surprised that machine learning algorithms are now capable of equaling or outperforming humans.

More Human than Human

In his talk, Professor Ng credits two major factors for the vast improvements in machine learning technology over the past five years. As an analogy, think of launching a rocket into orbit: we need both 1) a giant rocket engine and 2) lots of rocket fuel.

The rocket engine in his scenario is the incredible computational performance improvement brought to us by GPU technology. The fuel, then, is access to huge amounts of data, coming from the prevalence of internet-connected sensors, online services, and our society’s march toward digitization.

To perform this astounding face comparison judgment, algorithms are trained on facial data. The flow chart in the image above shows that if the algorithm is not performing well on the training data, we need a more powerful rocket engine.

High Performance Computing
The Talon 2.0 high performance computing system


The second part of the flowchart illustrates that if the algorithm learns the training data quite well but performs poorly when presented with new examples, then perhaps more “rocket fuel” is required: gathering more training data is the logical way to improve the system.
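The two branches of the flow chart can be sketched as a simple diagnostic rule. This is only an illustration of the idea; the error values and the threshold below are made up:

```ruby
# A toy sketch of the flow chart's logic. Thresholds are illustrative.
def diagnose(train_error, test_error, acceptable: 0.05)
  if train_error > acceptable
    :bigger_engine  # underfitting: more compute, a bigger model
  elsif test_error > acceptable
    :more_fuel      # overfitting on training data: gather more data
  else
    :done           # performing well on both
  end
end

diagnose(0.20, 0.25)  # => :bigger_engine (poor even on training data)
diagnose(0.01, 0.15)  # => :more_fuel (good on training, poor on new examples)
```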

Professor Ng compares the benefits of high performance computing with cloud computing and states that better training performance may be achievable with, say, 32 GPUs in a co-located rack than with a larger number of servers running in the cloud. His reasoning is that communication latency, server downtime, and other failures are more prevalent in cloud-based systems because they are spread across more machines with more network connections.

Machine learning has been demonstrated to be good at many things, and the recent improvements are not limited to face comparisons. In the same talk, Professor Ng showed that Baidu’s speech recognition system also performs well in noisy environments.




Apple’s ResearchKit Puts Clinical Trials in Your Pocket

Building HIPAA-compliant software has never been easy. Modern apps served from the cloud and enabled for mobile devices present even greater challenges. But imagine the potential for medical research, given the hundreds of millions of smartphones deployed globally, each equipped with dozens of sensors.

Last year, when Apple introduced HealthKit for developers, the iPhone suddenly leapt into the ranks of integrated health trackers, along the lines of the Fitbit and Jawbone activity trackers. But the iPhone has one major advantage over most other health tracking devices: built-in internet connectivity.

With Fitbit, Jawbone, Nike Plus, wifi-enabled scales, blood pressure monitors, and similar devices, users need to complete a multi-step setup process. The iPhone, by contrast, is ready to send useful data about the number of steps walked or run, flights climbed, and many other sensor events straight to the cloud.

The Fitbit Ultra requires additional software installation.


By providing the iOS Health app for free as part of iOS 8, Apple has given consumers a powerful new toolkit for tracking health data. The only problem is that this data has been unavailable to researchers. There has been no way for researchers, doctors, hospitals, or health administrators to access health data collected via HealthKit, even if a patient were willing to give consent. Until now…

The iOS Health App

ResearchKit, officially launching next month, provides a simplified, streamlined user interface framework for health apps to perform HIPAA-compliant clinical trial consent. According to Apple’s ResearchKit website, “With a user’s consent, ResearchKit can seamlessly tap into the pool of useful data generated by HealthKit — like daily step counts, calorie use, and heart rates — making it accessible to medical researchers.”

Apple has partnered with some impressive names in medical research, listing these on its website: The American Heart Association, Army of Women, Avon Foundation for Women, Dana-Farber Cancer Institute, Massachusetts General Hospital, Michael J. Fox Foundation for Parkinson’s Research, Icahn School of Medicine at Mount Sinai, Penn Medicine, University of Oxford, University of Rochester Medical School, Sage Bionetworks, Stanford Medicine, Susan G. Komen, UCLA Jonsson Comprehensive Cancer Center, Weill Cornell Medical College and Xuanwu Hospital Capital Medical University.

So what can ResearchKit do for the researcher? The ResearchKit developer framework is divided into three primary modules: Surveys, Informed Consent, and Active Tasks. A touch-based signature panel allows an app user to perform informed consent right on their mobile device. The survey module provides a builder tool to specify types of questions and answers akin to SurveyMonkey, Google Forms or Wufoo, etc. The Active Tasks module is where active data collection begins.

ResearchKit Signature Panel and Activity Completion

With an active task, ResearchKit allows the user to complete a physical task while the iPhone’s sensors collect data. This data can then be securely transmitted to the cloud for inclusion in the study. For example, Stanford’s MyHeart Counts app has enrolled tens of thousands of participants in just the short time since its launch in March, a recruiting pace unequaled by traditional clinical trials.

This is just the beginning. Data collection will not be limited to the sensors native to the iPhone. External devices, communicating over Bluetooth for example, can provide additional data such as heart rate, temperature, and weight.

According to VentureBeat, “Google also announced last year that it is developing a contact lens that can measure glucose levels in a person’s tears and transmit these data via an antenna thinner than a human hair.” The New York Times also reports this device is being developed by Google in partnership with Novartis.

Glucose Monitoring Smart Contact Lens


How to Deploy a Rails Application Anywhere with Chef

What Can You Do with Chef?

Here are a few things we did (and you can do) with Chef:

  • Deploy a Rails application to EC2
  • Quickly set up a Jenkins Continuous Integration Server
  • Allow all developers on your team to work with the same virtual machine for development and testing

In this article we will show how to create your first Chef cookbook and use it to deploy a Ruby on Rails application.

What is Chef?

Chef is a configuration management tool. It automates machine configuration and integration into your infrastructure. You define your infrastructure in configuration files, and Chef takes care of setting up individual machines and linking them together. You can read more about what Chef is here.

Chef architecture

Hosted/Private Chef

Hosted/Private Chef architecture consists of several parts:

  • Chef repository contains all your Chef artifacts. It’s recommended to keep it in your version control system.
  • Developer machine issues knife commands. knife allows you to push Chef artifacts to the Chef server or query information about your infrastructure from it. You can also use knife to execute commands manually on nodes in your infrastructure.
  • Chef server is the central point of the Chef architecture. It holds all your cookbooks and settings and tracks information about every node in your infrastructure.
  • Nodes are machines managed by Chef. Nodes pull cookbooks and configuration from the Chef server.

Chef Solo

Chef Solo is a lightweight Chef solution that doesn’t require a Chef server. However, it’s not designed to manage multiple machines. In Chef Solo mode you need to have your Chef repository on the node you’re going to set up. Later in this article we show how to use Chef Solo to test your Chef recipes and set up a Vagrant virtual machine. You can also use Chef Solo to set up a development environment on your own computer.

Chef artifacts


Cookbooks

Cookbooks are the most important Chef artifacts. They contain default configuration, configuration file templates, resource providers, helper scripts, files, and recipes. The most interesting part of a cookbook is its recipes. A recipe is a set of instructions that performs some kind of procedure – usually installing and configuring some service, but not necessarily.
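As a small illustration, a recipe describes desired state using Chef resources. The package and file names below are hypothetical, not part of our example application:

```ruby
# recipes/default.rb -- a minimal hypothetical recipe
# install the nginx package from the system package manager
package 'nginx'

# render a configuration file from a template shipped in the cookbook
template '/etc/nginx/nginx.conf' do
  source 'nginx.conf.erb'
  mode '0644'
  # reload nginx whenever this file changes
  notifies :reload, 'service[nginx]'
end

# make sure the service is enabled at boot and running
service 'nginx' do
  action [:enable, :start]
end
```

When Chef runs the recipe, each resource is converged in order, and actions like the reload notification fire only if the resource actually changed.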

Data Bags

Imagine you were setting up a brand new Linux instance by hand. You’d probably create some user accounts – you could store the usernames and passwords in your Chef Data Bag. You’d probably pull your application code from a git repository – you could store the repository URL in your Chef Data Bag. You might add an SSH private key – you could store that in your Chef Data Bag too. Think of the Data Bag as a key-value store containing parameters needed when you set up your new machine or its applications. If you’re familiar with Heroku, anything that goes into your Heroku configuration is a likely candidate for your Chef Data Bag.

In practice, Chef Data Bags are stored on the Chef server and are available for all nodes to access. Chef also provides encrypted data bags so your passwords and access keys stay secure.
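As a concrete sketch, a data bag item is just a JSON document, and a recipe can read it with Chef’s data_bag_item method. The bag and item names below (‘users’, ‘deployer’) are illustrative, not part of our example application:

```ruby
# data_bags/users/deployer.json might contain:
#   { "id": "deployer", "shell": "/bin/bash" }

# In a recipe, look the item up by bag name and item id:
deployer = data_bag_item('users', 'deployer')

# ...and use its values like a plain Ruby hash:
user deployer['id'] do
  shell deployer['shell']
  home  "/home/#{deployer['id']}"
end
```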


Roles

Chef roles define the types of nodes in your infrastructure. A role usually corresponds to the service a node runs. You can use roles to group nodes, and a single node can have multiple roles. A typical Rails application deployment infrastructure consists of the following roles:

  • Database server
  • Memcache/Redis server
  • Application server
  • Load balancer
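As an illustration (the role, recipe, and attribute names here are hypothetical), a role is defined in a small Ruby file in the roles folder of your Chef repository:

```ruby
# roles/app_server.rb -- a hypothetical application-server role
name 'app_server'
description 'Runs the Rails application'

# every node with this role gets these recipes applied, in order
run_list 'recipe[apt]', 'recipe[example_application]'

# attributes set here override cookbook defaults for nodes in this role
default_attributes 'app' => { 'worker_processes' => 2 }
```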

Prepare your kitchen

Before we begin you should have git and Ruby set up on your machine.

First, clone the Opscode Chef repository:

  git clone git://

We use a tool called Librarian-Chef. If you’re familiar with Ruby development, it’s a kind of Bundler for Chef cookbooks. It downloads and manages community cookbooks that you specify in a Cheffile.

Install Librarian-Chef by running (you might need sudo if you use the system Ruby):

  gem install librarian-chef

Then initialize Librarian in your Chef repository with

  librarian-chef init

This command will create a Cheffile in your Chef repository. We will specify all our dependencies in that file. To deploy our example application, your Cheffile should look like this:

  site ''
  cookbook 'application_ruby'
  cookbook 'apt'
  cookbook 'user'
  cookbook 'ruby_build'

Now pull these community cookbooks with Librarian

  librarian-chef install

Librarian will pull the specified cookbooks and their dependencies into the cookbooks folder and create a Cheffile.lock file. You should commit both Cheffile and Cheffile.lock to your repository. You don’t need to commit the cookbooks folder, because anyone can issue the install command and Librarian will pull the exact same cookbook versions. You should not touch the cookbooks folder; let Librarian manage it for you. Librarian will overwrite any changes you make inside that folder.

Create a folder, for example ‘my-cookbooks’, for new cookbooks.

The Application Cookbook

Now we are going to create our first cookbook. Let’s start by creating a new folder called ‘example_application’ inside the ‘my-cookbooks’ folder. We only need a recipes folder inside it. We are going to create a really simple cookbook with just a single recipe.

Inside our new cookbook we create a recipe to deploy our Ruby on Rails application. We will use the community cookbooks application and application_ruby, which define a lot of useful resources. It’s worth taking a look at the source code of those two cookbooks to get a better sense of how Chef deploys applications.

There are a lot of different ways to install Ruby on your server, and each has pros and cons. In this example we choose to build Ruby from source, but here are some options and our opinions on their merits:

  • RVM/rbenv system-wide or per-user install. While these tools are great for development environments because they let you manage multiple Ruby installations, they are not as good on the server side. Both RVM and rbenv require you to set up your shell environment so they can manage Rubies for you. That’s simple on your development machine – just customize the shell you use. On a server, however, processes run as different users and can be invoked from different scripts. For example, you may have a monit/runit or init.d script that runs your app server (usually started as root), tasks invoked by cron as a different user, and your deploy user running tasks like db:migrate. You need to make sure everything is RVM/rbenv aware.
  • apt/yum package – installs Ruby system-wide. Distribution packages are usually pretty old, so you cannot use the latest Ruby versions. It’s possible to have multiple Rubies installed, but you can only change the Ruby version system-wide.
  • build from source yourself – this option has the same limitations as package installation but lets you install the latest Ruby versions.

We choose to build Ruby from source because it keeps things simple to set up, saves time, and avoids errors. We might need to switch to a different approach when we need to host multiple applications on the same server.

Put this recipe in the default.rb file under the recipes folder:

  # ensure the local APT package cache is up to date
  include_recipe 'apt'

  # install the ruby_build tool which we will use to build Ruby
  include_recipe 'ruby_build'

  ruby_build_ruby '1.9.3-p362' do
    prefix_path '/usr/local/'
    environment 'CFLAGS' => '-g -O2'
    action :install
  end

  gem_package 'bundler' do
    version '1.2.3'
    gem_binary '/usr/local/bin/gem'
    options '--no-ri --no-rdoc'
  end

  # create a new user that will run our application server
  user_account 'deployer' do
    create_group true
    ssh_keygen false
  end

  # define our application using the application resource
  # provided by the application cookbook
  application 'app' do
    owner 'deployer'
    group 'deployer'
    path '/home/deployer/app'
    revision 'chef_demo'
    repository 'git://'
    rails do
      bundler true
    end
    unicorn do
      worker_processes 2
    end
  end

The application_ruby cookbook provides various resources that we can configure for our application. The application definition is pretty much self-explanatory at a high level. Dive into the application and application_ruby source code if you’re interested in how it works.

Deploy to a Vagrant virtual machine

Vagrant is a wrapper around Oracle’s VirtualBox that allows you to create and manage headless virtual machines. Its main purpose is to help you create reproducible development environments, and for that purpose Vagrant lets you use various provisioning tools like Puppet, Chef Solo, or Chef Client. It’s also a great way to test your Chef cookbooks. You can install Vagrant from here.

Once you have Vagrant installed, add a base image by running the “vagrant box add <name> <image url>” command, which downloads and registers a new base image. In this case we use an Ubuntu 12.04 image named precise64.

  vagrant box add precise64

Create a new Vagrant environment by running “vagrant init <image name>”. This will create initial Vagrantfile with default environment settings.

  vagrant init precise64

Configure Chef Solo for your virtual machine like this:

  config.vm.provision :chef_solo do |chef|
    chef.cookbooks_path = ["cookbooks", "my-cookbooks"]
    chef.roles_path = "roles"
    chef.data_bags_path = "data_bags"
    chef.add_recipe "example_application"
  end

Start your virtual machine by running

  vagrant up

When this command finishes we will have a virtual machine with our application deployed on it.

Deploy to your local machine using Chef Solo

I recommend using Vagrant to test your Chef recipes. However, it’s good to know how to use Chef Solo to run recipes on a local machine. If you don’t have a hosted/private Chef setup, you can use this method to set up remote servers too. I use Chef Solo to set up my own machine.

You need to create a Chef configuration file, let’s call it solo.rb:

  root = File.absolute_path(File.dirname(__FILE__))
  file_cache_path root
  cookbook_path [root + '/cookbooks', root + '/my-cookbooks']
  role_path root + '/roles'

You need another file with configuration JSON, let’s call it solo.json:

  { "run_list": ["recipe[example_application]"] }

Now you can run Chef with a command:

  sudo chef-solo -j solo.json -c solo.rb

Deploy to a new Amazon EC2 instance

(You need to set up hosted/private Chef and knife tool for this)

We found the Knife EC2 plugin really useful for setting up new Amazon EC2 instances. It’s a Ruby gem that you can install by running

  gem install knife-ec2

Now it’s just a matter of a single command to create and set up a new Amazon EC2 instance with our application:

  # -I: the Amazon Machine Image (AMI) id
  # -x: username to log in with
  # -d: bootstrap template for a vanilla OS
  # -f: EC2 instance type (flavor)
  # -r: run list for the new node
  knife ec2 server create \
    -A 'Your AWS Access Key ID' \
    -K 'Your AWS Secret Access Key' \
    -S 'The AWS SSH key id' \
    -I ami-bb4f69fe \
    -x ubuntu \
    -d ubuntu12.04-gems \
    -f m1.small \
    -r 'recipe[example_application]'

Wrap up

Congratulations! If you were following along, you now have a recipe you can use to deploy our example Ruby on Rails application on any machine. We hope it helps you create easily reproducible infrastructure for your applications, allowing you to move to a different cloud provider or to your own hardware at any time.