In 2012, Google’s famous Google X laboratory announced a breakthrough that stunned the world: after much hard work and focused attention, their artificial intelligence models had learned to correctly identify cats on YouTube. It was an example of just how far AI models had come and, having studied roughly ten million unlabeled YouTube thumbnails, how readily these models could now acquire new skills. But it also built upon decades of prior development of facial recognition technology.
The quest to correctly identify human faces, both their existence in a frame and the specific individual to which they belong, was already well underway.
A Type of Biometric Identification
Facial recognition software has historically worked much like other forms of “biometric” identification, such as voice, iris, or fingerprint identification: a computer analyzes a particular photograph or other piece of biometric data and looks for a very specific set of markers within it. Comparing aspects of a face in this way is conceptually similar to comparing lines in a fingerprint, though far, far more complicated. If the program finds a critical threshold of similarity between the sample and the example patterns, a match is declared. Simple as that.
That worked well enough for relatively simple jobs, like figuring out where faces are within a photo, but to actually identify that face as matching another photograph of the same person? That has turned out to be far more difficult. There are, however, a number of methods that have emerged to make it easier.
Facial Recognition Software and Algorithms
One is to essentially replace the image with a version that accentuates the most relevant details to facial identification; in the case of gradients, this involves replacing every pixel with a representation of how that pixel’s brightness compares to the pixels around it. This relative measure of pixel brightness makes it much easier to recognize the same face as being the same face across multiple different lighting situations. Relative lighting attributes tend to hold true between shots, while objective lighting is much more variable — but even with this and other techniques, widely varying lighting conditions are still a point of difficulty for many modern facial recognition systems. (They also present difficulties for human judgement of faces, it should be noted.)
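To make the gradient idea concrete, here is a minimal sketch in plain Python. It is an illustration, not the method of any particular product: the function name brightness_gradients and the tiny 3x3 "photo" are made up for the example, and real systems typically go further (for instance, by summarizing gradient directions over small patches). It replaces each pixel with the brightness difference between its neighbors, then shows that a uniformly brighter copy of the same image yields identical gradients.

```python
def brightness_gradients(image):
    """Return per-pixel (dx, dy) brightness differences for a 2D grayscale image.

    `image` is a list of rows of numbers. At the edges, the missing neighbor
    is replaced by the pixel itself, giving a one-sided difference.
    """
    height, width = len(image), len(image[0])
    gradients = []
    for y in range(height):
        row = []
        for x in range(width):
            left = image[y][max(x - 1, 0)]
            right = image[y][min(x + 1, width - 1)]
            up = image[max(y - 1, 0)][x]
            down = image[min(y + 1, height - 1)][x]
            row.append((right - left, down - up))
        gradients.append(row)
    return gradients


# A made-up 3x3 "photo" and the same scene under uniformly brighter lighting.
dim = [[10, 20, 30],
       [10, 50, 30],
       [10, 20, 30]]
bright = [[value + 100 for value in row] for row in dim]

# The uniform brightness shift cancels out of every difference.
same_gradients = brightness_gradients(dim) == brightness_gradients(bright)
print(same_gradients)
```

Only a uniform shift cancels out this neatly. Shadows, glare, and colored light change brightness unevenly across the face, which is part of why lighting remains a hard case.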
Another approach has to do with so-called “projection” of a 2D photo onto a 3D model, such as a cylinder. Wrapping a face around a third dimension can often reveal forms of symmetry and distinguishing characteristics that are much harder to find in a flat and static image.
Once all this preparation of the image has been completed, the system finally “encodes” the face, or collapses its most distinguishing characteristics and patterns to a smaller, simplified file that exists solely to do cross-checking with other encoded faces. Thus, when shown a photograph of Leonardo DiCaprio, this sort of system would first warp and analyze the photo in various ways to generate an encoded version, then compare that encoded face against a collection of encoded faces on file. It’s these stored faces that are the basis of comparison for finding facial matches, and it’s these stored files that can be pre-associated with information like names and addresses.
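The cross-checking step can be sketched in a few lines of Python. Everything specific here is an assumption made for illustration: the three-number encodings, the names person_a and person_b, the find_match helper, and the 0.6 distance threshold are all hypothetical, and real encodings are much longer lists of numbers. The sketch measures the straight-line distance between a new encoding and each stored one, and declares a match only if the closest stored face is within the threshold.

```python
import math


def distance(a, b):
    """Euclidean (straight-line) distance between two equal-length encodings."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def find_match(query, known_faces, threshold=0.6):
    """Return the name of the closest stored face, or None if none is close enough.

    `threshold` is a hypothetical cutoff chosen for this toy data.
    """
    best_name, best_distance = None, float("inf")
    for name, encoding in known_faces.items():
        d = distance(query, encoding)
        if d < best_distance:
            best_name, best_distance = name, d
    return best_name if best_distance <= threshold else None


# Made-up stored encodings, each pre-associated with a name.
known_faces = {
    "person_a": [0.10, 0.80, 0.30],
    "person_b": [0.90, 0.20, 0.50],
}

new_photo = [0.12, 0.79, 0.33]   # very close to person_a's stored encoding
stranger = [0.50, 0.50, 0.95]    # not close to anyone on file

print(find_match(new_photo, known_faces))
print(find_match(stranger, known_faces))
```

The threshold is the "critical threshold of similarity" in miniature: lowering it makes false matches rarer but misses more genuine ones, and raising it does the reverse.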
Using Deep Learning for Facial Recognition
Even with tricks like encoding, though, human software engineers have been incapable of creating sufficiently fast and accurate processes for comparing two encoded faces and determining whether they are similar enough to be deemed the same person. That’s because developers, being human beings, have no idea of how it is they process raw images into sensible visual information; it’s their brains that do that job, and the developer of their brains was evolution.
So, the field of facial recognition and identification didn’t really take off until developers stopped trying to design the perfect matching algorithm themselves, and instead embraced the then-brand-new field of machine learning to evolve that algorithm all over again.
Machine learning is, at heart, trial and error at scale, and that comes with a requirement: you have to not only undertake a lot of trials, but also be able to judge which of those trials were errors. To achieve this, we need a labeled machine learning dataset: a curated and annotated collection of examples that can be used by a machine learning system to provide trial-and-error feedback, and foster productive learning.
(Source: Microsoft Azure Face Identification Demo)
So, a facial recognition dataset might be a collection of photos of human faces — along with some photos of animal faces and face-like objects that are not faces at all. Each of the photos in the dataset will be appended with metadata that specifies the real contents of the photo, and that metadata is used to (in)validate the guesses of a learning facial recognition algorithm. Compiling the datasets to be used by a machine learning system is often far more time-consuming and expensive than actually using those datasets to train the system itself.
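As a rough illustration of how that metadata gets used, the sketch below scores a set of guesses against a tiny labeled dataset. The file names, the label vocabulary, and the score_guesses helper are all hypothetical. A real dataset would hold many thousands of examples and usually richer annotations, but the bookkeeping is the same.

```python
# Made-up dataset: each photo is paired with metadata naming its real contents.
labeled_dataset = [
    {"file": "img_001.jpg", "label": "human_face"},
    {"file": "img_002.jpg", "label": "animal_face"},
    {"file": "img_003.jpg", "label": "not_a_face"},
    {"file": "img_004.jpg", "label": "human_face"},
]


def score_guesses(dataset, guesses):
    """Compare guesses to labels; return (accuracy, files that were guessed wrong)."""
    errors = [
        example["file"]
        for example in dataset
        if guesses.get(example["file"]) != example["label"]
    ]
    accuracy = 1 - len(errors) / len(dataset)
    return accuracy, errors


# Hypothetical output of a still-learning program: it mistakes the animal for a human.
guesses = {
    "img_001.jpg": "human_face",
    "img_002.jpg": "human_face",
    "img_003.jpg": "not_a_face",
    "img_004.jpg": "human_face",
}

accuracy, errors = score_guesses(labeled_dataset, guesses)
print(accuracy, errors)
```

Without the labels there would be nothing to compare the guesses against, which is why the annotation work tends to dominate the cost.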
There are a number of different algorithms used to turn the guesses of a still-learning facial recognition program into “learned” modifications to the program itself, but the most basic principle is that the program should repeat successes and not mistakes. Correct guesses very slightly increase the likelihood that the approach that led to the correct guess will be used again in future runs, while incorrect guesses slightly decrease the same. “Deep” learning is a more elaborate approach to this system of implementing trial outcomes as processing changes, one that can find seemingly hidden, multi-step solutions.
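One of the simplest ways to turn guesses into learned modifications is the classic perceptron rule, sketched below as a stand-in for the far larger systems used in practice. Note one simplification: the perceptron leaves its settings alone after a correct guess and only nudges them after a mistake, so it captures the "not mistakes" half of the principle more directly than the "repeat successes" half. The two numeric features per image, the learning rate, and the function names are made up for the example.

```python
def predict(weights, bias, features):
    """Guess 1 ("face") if the weighted sum of features is positive, else 0."""
    total = bias + sum(w * x for w, x in zip(weights, features))
    return 1 if total > 0 else 0


def train(examples, epochs=20, learning_rate=0.1):
    """Perceptron rule: after each wrong guess, nudge the weights toward the label."""
    weights = [0.0] * len(examples[0][0])
    bias = 0.0
    for _ in range(epochs):
        for features, label in examples:
            error = label - predict(weights, bias, features)  # 0 when the guess was right
            weights = [w + learning_rate * error * x for w, x in zip(weights, features)]
            bias += learning_rate * error
    return weights, bias


# Made-up labeled examples: (two hypothetical image features, 1 = face / 0 = not a face).
examples = [
    ([0.9, 0.8], 1),
    ([0.8, 0.9], 1),
    ([0.2, 0.1], 0),
    ([0.1, 0.3], 0),
]

weights, bias = train(examples)
predictions = [predict(weights, bias, features) for features, _ in examples]
print(predictions)
```

Deep learning stacks many such adjustable units into layers and uses a more refined update rule, but the feedback loop of guess, check against the label, and adjust is the same.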
These deep learning solutions have brought facial recognition into the 21st century. Today, advanced facial recognition technology is working its way into crucial security processes at banks, and the less-crucial ones in consumer mobile phones. When your phone unlocks because it recognized your face staring down at it, it’s using a basic approach to image analysis that was first invented via deep learning. The market now uses a mixture of local facial recognition processes that run on a device itself and remote ones that require the sort of computing horsepower that’s usually only available via the cloud.
Where is Facial Recognition Technology Today?
That explosion in uses for facial recognition has sparked a real need for large and comprehensive new image and video datasets with which to train the machine learning systems that meet the incredible demand for AI products. In the dash to build these databases, some companies are starting to go with the lowest bidder, and running into issues like poor image quality from rushed collection, which can dramatically impact learning efficiency. Poor-quality datasets can also introduce biases to the final product; if a facial identification system is trained on racially homogeneous pictures, it will end up being worse at identifying people of those races it has yet to see.
Facial recognition technology is already being used to help with searches for criminal suspects and with judgement of job interviewees; Microsoft even has an easy-to-use middleware solution for AI emotion analysis, so just about anyone can work advanced sentiment analysis into their projects.
(Source: Microsoft Azure Emotion API)
The sheer power of these machine learning products is beginning to put privacy advocates on edge, raising questions about the potentially abusive uses of technology that could passively identify any person from a grainy security camera feed. In a fairly small number of years, this technology has progressed to the point that its continued development is necessitating updates to law and urgent public conversations.
… and Where Are We Headed?
This is the case with all revolutionary new technologies: they create, or threaten to create, a collection of new and unforeseeable problems, which leads to justified cultural anxiety. From the printing press to the online store, what has kept these inventions from having the feared apocalyptic impact has always been continued development directed by a robust public conversation. Facial identification technology is seeing both, in spades.
So, the future is looking bright for a technology with more than adequate levels of both funding and public interest. With constantly evolving abilities and limitations, there is no telling where it could be in a decade’s time, but it will be exciting to see how it gets there.