While the National Institute of Standards and Technology has spent a lot of time advancing the technology behind forensics, the agency can only go so far. Despite all of the ways people can be identified, researchers still lack the data they need to advance existing technology.
To overcome that hurdle, NIST has been working on a catalog that would help the agency, academics and other interested parties discover data sets that will allow researchers to further their work. The Biometric and Forensic Research Database Catalog aims to be a one-stop shop for those looking to gather enough data or find better-quality data for their projects.
A representative from NIST presented the online catalog at a forensics symposium at the agency’s headquarters Monday, explaining that the agency is still trying to find data to include as the website grows.
NIST, with the help of the National Institute of Justice, is in the process of examining publicly available data sets to be included in forthcoming versions of the catalog. Shannan Williams, a project manager with NIST’s Forensic Science Research Program, said the agency goes through a three-phase system before including data: collecting every data point it can, categorizing that data based on taxonomy and evaluating what can be added to the catalog.
As of January, the catalog has 221 different categories, allowing users to search by modality (face, iris, palm print), data type (image, audio, video), capture method (mobile, fixed) or other miscellaneous characteristics (subject type, demographic, post mortem).
While the catalog lists only 165 separate databases, forensic experts said this data is crucial given how hard it is to find quality data.
Anil Jain, a computer scientist at Michigan State University, said he often has trouble finding sound data because different forensic measures have different levels of complexity. On top of that, he often has to reach out to state law enforcement, which can be problematic.
“[Police] are extremely busy with their caseload, and they don’t have full-time system managers to handle data,” Jain said.
Elham Tabassi, an electronics engineer at NIST, said every type of data used in forensics research has its own limitations, whether it is real forensic collections, known as operational data, or synthetic data generated by software.
Tabassi said operational data is often low quality, isn’t very diverse and often cannot promote reproducible research. While synthetic data is traceable, can be re-used and doesn’t present privacy issues, it is often hard to use for real-world research because of the complexities that come with real-world forensics.
These problems are what drive the agency to continue developing the site, knowing the power a rich set of data can bring to those looking to advance what forensics can do.
“We are using all the information we can collect,” she said.