Researcher releases facial recognition software to identify Civil War soldiers
Kurt Luther, Virginia Tech assistant professor of computer science, has developed a free software platform that uses crowdsourcing to significantly increase the ability of algorithms to identify faces in photos.
Through the software platform, called Photo Sleuth, Luther seeks to uncover the mysteries of the nearly 4 million Civil War-era photographs that may exist in the historical record.
Luther will present his research surrounding the Photo Sleuth platform on March 19 at the Association for Computing Machinery’s Intelligent User Interfaces conference in Los Angeles, California. He will also demonstrate Photo Sleuth at the grand opening of the expanded American Civil War Museum, in Richmond, Virginia, on May 4, 2019.
Luther, a history buff himself, was inspired to develop the software for Civil War Photo Sleuth in 2013 while visiting the Heinz History Center’s exhibit called “Pennsylvania’s Civil War” in Pittsburgh, Pennsylvania. There he stumbled upon a Civil War-era portrait of Oliver Croxton, his great-great-great uncle, clad in a corporal’s uniform. Croxton served in Company E of the 134th Pennsylvania.
“Seeing my distant relative staring back at me was like traveling through time,” said Luther. “Historical photos can tell us a lot about not only our own familial history but also inform the historical record of the time more broadly than just reading about the event in a history book.”
The Civil War Photo Sleuth project, funded primarily by the National Science Foundation, officially launched as a web-based platform at the National Archives in Washington, D.C., on Aug. 1, 2018. It allows users to upload photos, tag them with visual clues, and connect them to profiles of Civil War soldiers with detailed records of military history. Photo Sleuth’s initial reference database contained more than 15,000 identified Civil War soldier portraits from public domain sources like the U.S. Military History Institute, as well as private collections.
Prior to the project’s official launch in August, the software platform won the $25,000 Microsoft Cloud AI Research Challenge and the Best Demo Award at the Human Computation and Crowdsourcing 2018 conference in Zurich, Switzerland. Luther’s team includes academic and historical collaborators, the Virginia Center for Civil War Studies, and Military Images magazine.
According to Luther, the key to the site’s post-launch success has been the ability to build a strong user community. More than 600 users contributed more than 2,000 Civil War photos to the website in the first month after the launch, and roughly half of those photos were unidentified. Over 100 of these unknown photos were linked to specific soldiers, and an expert analysis found that over 85 percent of these proposed identifications were probably or definitely correct. The site has since grown to over 4,000 registered users and more than 8,000 photos.
“Typically, crowdsourced research such as this is challenging for novices if users don’t have specific knowledge of the subject area,” said Luther. “The step-by-step process of tagging visual clues and applying search filters linked to military service records makes this detective work more accessible, even for those that may not have a deeper knowledge of Civil War military history.”
Person identification becomes harder as the candidate pool grows because the risk of false positives rises with it. The novel approach behind Civil War Photo Sleuth is based on the analogy of finding a needle in a haystack. The data pipeline has three haystack-related components: building the haystack, narrowing down the haystack, and finding the needle in the haystack. When combined, they allow users to identify unknown soldiers while reducing the risk of false positives.
Building the haystack is done by incentivizing users to upload scanned images of the fronts and backs of Civil War photos. Any time a user uploads a photo to identify it, the photo gets added to the site’s digital archive or “haystack,” making it available for future searches.
Following upload, the user tags metadata related to the photograph, such as photo format or inscriptions, as well as visual clues, such as coat color, chevrons, shoulder straps, collar insignia, and hat insignia. These tags are linked to search filters to prioritize the most likely matches. For example, tagging a photo with the “hunting horn” hat insignia prompts the site to suggest potential matches who served in the infantry, while hiding results from the cavalry or artillery. Next, the site uses state-of-the-art face recognition technology to eliminate very different-looking faces and sort the remaining ones by similarity. Both the tagging and face recognition steps narrow down the haystack.
Finally, users find the needle in the haystack by exploring the highest-probability matches in more detail. A comparison tool with pan and zoom controls helps users carefully inspect a possible match and, if they decide it’s a match, link the previously unknown photo to its new identity and biographical details.
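The "narrowing down the haystack" step described above can be illustrated with a minimal Python sketch. It is not the Photo Sleuth implementation: the `Candidate` record, the `TAG_TO_BRANCH` mapping, the two-number face embeddings, the cosine similarity measure, and the `min_similarity` cutoff are all assumptions chosen for illustration. The sketch first filters candidates by the service branch implied by a visual tag, then drops very dissimilar faces and sorts the rest by similarity.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass
class Candidate:
    name: str
    branch: str             # "infantry", "cavalry", or "artillery"
    embedding: list[float]  # hypothetical face embedding


# Hypothetical mapping from a visual tag to the service branch it implies.
TAG_TO_BRANCH = {"hunting horn": "infantry"}


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def narrow_haystack(query_embedding, tags, candidates, min_similarity=0.5):
    """Filter by tag-implied branch, then rank the rest by face similarity."""
    # Step 1: tags that imply a branch hide candidates from other branches.
    branches = {TAG_TO_BRANCH[t] for t in tags if t in TAG_TO_BRANCH}
    if branches:
        candidates = [c for c in candidates if c.branch in branches]

    # Step 2: score each remaining face and drop very different-looking ones.
    scored = [(cosine_similarity(query_embedding, c.embedding), c)
              for c in candidates]
    kept = [(s, c) for s, c in scored if s >= min_similarity]

    # Step 3: most similar faces first.
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return [c.name for _, c in kept]


# Made-up reference records for demonstration only.
candidates = [
    Candidate("Soldier A", "infantry", [1.0, 0.0]),
    Candidate("Soldier B", "infantry", [0.8, 0.6]),
    Candidate("Soldier C", "cavalry", [1.0, 0.0]),
    Candidate("Soldier D", "infantry", [0.0, 1.0]),
]
result = narrow_haystack([1.0, 0.1], ["hunting horn"], candidates)
print(result)
```

In this toy data, the cavalry record is hidden by the tag filter even though its face matches closely, and the dissimilar infantry face falls below the cutoff. The short ranked list that remains is what a human reviewer would then inspect, which reflects the article's point that tagging and face recognition complement each other.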
The military records used by the filters come from myriad public sources, including the National Park Service Soldiers and Sailors Database.
The approach behind Photo Sleuth also has broad applications beyond identifying historical photos. The software has the potential to generate new ways to think about building person identification systems that look beyond face recognition and leverage the complementary strengths of both human and artificial intelligence.