SALT LAKE CITY — While facial recognition software has advanced dramatically over the last decade, a new report highlights some areas where the new technology has still not been perfected.

Chief among the findings is evidence that most, though not all, of the facial recognition technology developed in the U.S. showed its highest error rates when assessing the faces of Asians, African Americans and members of native groups.


The National Institute of Standards and Technology is a science-focused division of the U.S. Department of Commerce that specializes in advancing measurement science, standards and technology to promote domestic innovation and industrial competitiveness.

A just-released report by the agency assessed 189 mostly commercial facial recognition algorithms from 99 developers, a group that represents most of the current industry.

Patrick Grother, a National Institute of Standards and Technology computer scientist and the report’s primary author, said the impetus behind the study was to help illuminate the relative limits of the technology for lawmakers, tech developers and those who may be considering the implementation of facial recognition tools.

“While it is usually incorrect to make statements across algorithms, we found empirical evidence for the existence of demographic differentials in the majority of the face recognition algorithms we studied,” Grother said in a statement. “While we do not explore what might cause these differentials, this data will be valuable to policymakers, developers and end users in thinking about the limitations and appropriate use of these algorithms.”

Unlike a pair of widely referenced reports from Georgetown University and the Massachusetts Institute of Technology, the institute’s assessment looked specifically at the performance of algorithms — the “brains” behind facial recognition systems — while also tabulating both false positive and false negative results. The analysis also measured the relative performance of the algorithms in making one-to-one matches and one-to-many operations, where a system looks for a match for a single person among a large grouping of images.

The report details that a false positive means the software wrongly considered photos of two different individuals to show the same person, while a false negative means the software failed to match two photos that, in fact, do show the same person.
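In software terms, both outcomes come down to where a similarity score falls relative to a decision threshold. The Python sketch below is purely illustrative and assumes a hypothetical pipeline that has already converted each photo into a numeric embedding; it is not drawn from any of the algorithms NIST tested.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Similarity score between two face embeddings (numeric feature vectors)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def same_person(embedding_a: np.ndarray, embedding_b: np.ndarray, threshold: float = 0.6) -> bool:
    """One-to-one verification: decide whether two photos show the same person."""
    return cosine_similarity(embedding_a, embedding_b) >= threshold

# A false positive: same_person() returns True for photos of two different people.
# A false negative: same_person() returns False for two photos of the same person.
# Raising the threshold trades false positives for false negatives, and vice versa.
```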

Making these distinctions, according to the institute’s authors, is important because the class of error and the search type can carry vastly different consequences depending on the real-world application. Being falsely identified in, say, a law enforcement search for a homicide suspect carries much more serious potential consequences than being locked out of a device by a failed biometric match.

“In a one-to-one search, a false negative might be merely an inconvenience — you can’t get into your phone, but the issue can usually be remediated by a second attempt,” Grother said. “But a false positive in a one-to-many search puts an incorrect match on a list of candidates that warrant further scrutiny.”
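Grother’s one-to-many scenario can be sketched the same way: every identity in an image gallery whose similarity score clears the threshold lands on the candidate list, so a non-matching identity that clears it becomes exactly the kind of false positive he describes. The example below is again a simplified illustration, with made-up function names and an arbitrary threshold, not the method of any algorithm in the study.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def identify(probe_embedding: np.ndarray, gallery: dict, threshold: float = 0.6) -> list:
    """One-to-many search: rank gallery identities by similarity to the probe photo.

    `gallery` maps identity names to stored face embeddings. Every identity whose
    score clears the threshold is placed on the candidate list for further scrutiny.
    """
    scores = [(name, cosine_similarity(probe_embedding, emb)) for name, emb in gallery.items()]
    candidates = [(name, score) for name, score in scores if score >= threshold]
    return sorted(candidates, key=lambda pair: pair[1], reverse=True)
```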

Report findings also support growing evidence that the most accurate and least biased facial recognition systems use diverse data sets to “teach” algorithms how to recognize and match digital maps of faces.

Last year, the National Institute of Standards and Technology reported that the accuracy of facial recognition systems had increased some 20-fold between 2014 and 2018, thanks in large part to evolving techniques that improve the way the software analyzes visual images, like photographs of faces.

So-called convolutional neural networks are used in the best-performing facial recognition systems and were designed to mimic how biological image processing happens in the visual cortex.
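As a rough illustration of that design, the PyTorch sketch below stacks convolution and pooling layers that extract local visual features before mapping them to a compact face embedding. The layer sizes are arbitrary and chosen for brevity; commercial systems are far deeper and are trained on millions of labeled face images.

```python
import torch
from torch import nn

class TinyFaceNet(nn.Module):
    """A toy convolutional network: stacked convolution and pooling layers pick out
    local visual features (edges, textures), loosely echoing early stages of the
    visual cortex, before a final layer maps them to a compact face embedding."""

    def __init__(self, embedding_dim: int = 128):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.embed = nn.Linear(32 * 28 * 28, embedding_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.features(x)              # (batch, 32, 28, 28) for 112x112 RGB inputs
        return self.embed(x.flatten(1))   # one embedding vector per face image
```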


The institute’s researchers noted in the latest report that some error biases seemed to be related to the origin of the algorithm’s development.

While systems created by U.S.-based developers showed, overall, a penchant for high false positives when assessing faces of Asians versus faces of Caucasians, systems developed in Asian countries reflected no such bias.

Grother said that while the institute’s study does not explore the relationship between cause and effect, one possible connection, and area for research, is the relationship between an algorithm’s performance and the data used to train it.

“These results are an encouraging sign that more diverse training data may produce more equitable outcomes, should it be possible for developers to use such data,” he said.
