Virginia Tech leading project to make web archives more valuable to researchers

The Institute of Museum and Library Services recently awarded a $248,451 grant for a collaborative two-year project, Continuing Education to Advance Web Archiving, that will create materials to teach librarians and archivists around the world how to collect, extract, and analyze archived information from the World Wide Web.

Zhiwu Xie, director of digital library development for the University Libraries at Virginia Tech, is leading a team of library and archive experts in creating a curriculum that covers the technology of web archiving and the challenges archivists and librarians face in gathering the most useful information from archived websites and social media.

“The web is the most prominent channel of communication we have today, and web sites change all the time. The web doesn’t have a memory, so a history of time is hard to construct,” said Xie. “Web archiving is about recording that memory.”

Project team member and Virginia Tech professor of computer science Ed Fox believes in providing individuals and libraries the tools to better access and analyze the massive amount of information already archived.

“I view information as a fundamental need of humans,” said Fox, who also serves as the director for the Digital Library Research Laboratory. “The most visible information is what’s available over the World Wide Web, and over time, in its archive. This information is invaluable for researchers studying areas such as trends in elections, technology, and the environment.”

Memory institutions have collected and archived tens of petabytes of web content. All of the project collaborators, including Xie, Fox, Martin Klein from Los Alamos National Laboratory, Michael Nelson from Old Dominion University, Justin Littman from George Washington University, Ian Milligan from the University of Waterloo, and Jefferson Bailey from the nonprofit archiving organization Internet Archive, are pioneers in web archiving technology and infrastructure.

“Collectively, we have done a lot of work in creating tools for web archiving; we want to put our work to use and make an impact on society,” said Xie.

“By creating training materials for some of the most innovative and complex tools used in web archiving, it can help lower barriers for institutions wanting to run these technologies locally, either for collecting, or especially, for researcher and user support,” said Bailey, who serves as director of web archiving.

“Suites of open source tools are available to assist researchers conducting analyses and extracting knowledge,” said Xie. “However, these tools require the user to be proficient in big-data processing and analysis. Very few librarians or archivists have been trained to understand, utilize, maintain, and manage these tools.”

By the end of the project, the collaborators will provide a collection of educational resources, a series of in-person and online training workshops, and cyberinfrastructure, including source code, for deploying tools to support the curriculum and workshops.

“The curriculum will include project-based learning because people learn better by doing,” said Fox. “During the training, participants will solve problems like they would face while helping patrons. The curriculum will be need-oriented as opposed to system or technology oriented. All of the training and tools will be free to the user.”

“By educating more people and organizations on the technologies of web archiving, the project can contribute to allowing more organizations to build collections of web-published materials,” said Bailey. “This benefits society by ensuring a greater portion of web-published historical documentation is preserved and accessible into the future.”

“Equipped with these skills, library and archive professionals will be able to go beyond their traditional role as information providers or pointers and form deeper alliances with researchers,” said Xie. “This will continue to transform libraries and archives from information repositories to knowledge producers.”
