Digital Library Internship: Metadata Related Article

Guenther, Rebecca and Leslie Myrick. "Archiving Web Sites for Preservation and Access: MODS, METS and MINERVA." Journal of Archival Organization 4, No. 1/2: 145 - 170.

Guenther and Myrick's article begins by stating, "Born-digital material such as archived Web site provides unique challenges in ensuring access and preservation." What follows in the article are the many questions, challenges, and developments that have come out of pursuing the creation of a Web archive.

Before delving into the various questions relating to capturing and storing Web-based material, we are given a brief background of the issues, including the idea of Web material as a "moving target" and the forty-four day average life span of a Web page. Some of the questions mentioned include: What is the best way to capture and maintain Web-based material for preservation and access? How do we collect Web sites before they disappear or change? How do we define the data? How do we manage the data?

After a discussion of the questions and issues, the authors provide an overview of how Web archiving evolved. The earliest, very basic Web archiving began with crawlers and agents like those used by Yahoo. These evolved into the indexing of Web sites by Google, though the first archival Web crawler came from the National Deposit Libraries. From there, the progression to current Web archives, such as MINERVA, began.

The article moves on to discuss the technical and harvesting issues associated with Web capture. This includes a discussion of large- and small-scale crawlers and their functions, how crawlers collect their data, how a Web site mirror functions, and what is included in the data collection.
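As a rough illustration of how a small-scale crawler collects its data (a sketch that is not taken from the article), the Python below follows links breadth-first from a seed page and keeps only pages on the same host. The names `crawl`, `LinkCollector`, and the `fetch` callback are hypothetical, and the example reads from an in-memory dictionary of made-up pages rather than the live Web.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(seed, fetch, max_pages=100):
    """Breadth-first capture of pages on the seed's host.

    `fetch` is a hypothetical callback that returns the HTML for a URL,
    or None if the page cannot be retrieved.
    """
    host = urlparse(seed).netloc
    seen = {seed}
    queue = deque([seed])
    captured = {}
    while queue and len(captured) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        captured[url] = html
        parser = LinkCollector()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop any #fragment.
            target = urljoin(url, href).split("#")[0]
            # Stay inside the site: skip links to other hosts.
            if urlparse(target).netloc == host and target not in seen:
                seen.add(target)
                queue.append(target)
    return captured


# Made-up pages standing in for a live site.
pages = {
    "http://example.org/": '<a href="/about">About</a> <a href="http://other.example/">Elsewhere</a>',
    "http://example.org/about": '<a href="/">Home</a>',
}
captured = crawl("http://example.org/", pages.get)
```

Treating the host name as the site boundary is a simplifying assumption made only for this sketch; real Web sites often span several hosts.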

At the start of the article, the authors mention a number of current archiving projects, one of which is the Library of Congress's MINERVA. The remainder of the article is dedicated to discussing the development of the MINERVA project. MINERVA was started to "collect and preserve 'born-digital' materials, specifically open-access primary source material on the World Wide Web."

Guenther and Myrick state, "We argue that, among the proliferation of the schemas available for packaging and managing complex digital objects ... METS is uniquely suited to encapsulate a Web site object." They use MINERVA to demonstrate this argument. We are shown the progression of the MINERVA project through its use of MARC, MODS, and eventually the addition of METS. The authors describe METS as both strong and flexible, and note that two separate catalogers, given the same simple item, would most likely create two very different METS instances. The use of METS as a transfer syntax is also discussed at length.
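To make the idea of METS encapsulating a Web site object a little more concrete, here is a minimal sketch in Python that builds a skeletal METS document: a MODS title wrapped in a descriptive metadata section, a file section listing captured pages, and a structural map tying them together. This is not the MINERVA profile or the authors' template. The function name `build_mets`, the `TYPE` values, the IDs, and the example URLs are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

METS = "http://www.loc.gov/METS/"
MODS = "http://www.loc.gov/mods/v3"
XLINK = "http://www.w3.org/1999/xlink"

for prefix, uri in (("mets", METS), ("mods", MODS), ("xlink", XLINK)):
    ET.register_namespace(prefix, uri)


def q(ns, tag):
    """Return a namespace-qualified tag name for ElementTree."""
    return f"{{{ns}}}{tag}"


def build_mets(title, files):
    """Build a skeletal METS tree.

    `files` is a list of (file_id, url) pairs for hypothetical captured pages.
    """
    root = ET.Element(q(METS, "mets"))

    # Descriptive metadata: a MODS record wrapped inside METS.
    dmd = ET.SubElement(root, q(METS, "dmdSec"), ID="DMD1")
    wrap = ET.SubElement(dmd, q(METS, "mdWrap"), MDTYPE="MODS")
    xml_data = ET.SubElement(wrap, q(METS, "xmlData"))
    mods = ET.SubElement(xml_data, q(MODS, "mods"))
    title_info = ET.SubElement(mods, q(MODS, "titleInfo"))
    ET.SubElement(title_info, q(MODS, "title")).text = title

    # File inventory, followed by the structural map that organizes it.
    file_sec = ET.SubElement(root, q(METS, "fileSec"))
    group = ET.SubElement(file_sec, q(METS, "fileGrp"))
    struct = ET.SubElement(root, q(METS, "structMap"))
    site = ET.SubElement(struct, q(METS, "div"), TYPE="website", DMDID="DMD1")

    for file_id, url in files:
        f = ET.SubElement(group, q(METS, "file"), ID=file_id)
        ET.SubElement(f, q(METS, "FLocat"),
                      {"LOCTYPE": "URL", q(XLINK, "href"): url})
        page = ET.SubElement(site, q(METS, "div"), TYPE="page")
        ET.SubElement(page, q(METS, "fptr"), FILEID=file_id)
    return root


root = build_mets("Example Web site",
                  [("F1", "http://example.org/"),
                   ("F2", "http://example.org/about")])
print(ET.tostring(root, encoding="unicode"))
```

Even in a skeleton this small there are choices to make, such as one flat list of pages versus nested divisions. That hints at why two catalogers could produce very different METS instances for the same item.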

The remainder of the article is a highly technical discussion of METS, including creating models and templates, its uses in XML, and the development of a METS profile. Due to the technical nature of this section, I found it a bit more difficult to follow. There was, however, a very interesting discussion of the challenges presented by archiving Web sites: knowing where Web site boundaries lie, the complex nature of the site itself, how much of the structure should be recorded, and the lack of control over creation. We are also provided with a brief introduction to PREMIS (PREservation Metadata: Implementation Strategies).

The article concludes not with answers, but with the very same questions it began with.
