Digital Library Internship

Day 23 - Last Day

Friday, August 3rd 9-4pm

Today is the final day of my internship. Most of my day will be spent doing wrap-up things like finishing my last few record conversions, making sure the authority file is accurate, and drafting a couple of final templates. I also need to make sure that everything is available on Bertha and that I have updated the project documentation file to accurately represent the work I have done.

This afternoon I am meeting with Elaine and Nancy to run through the creation of a record. To save a bit of time I have created the record ahead of time. Reading through the entire record for subject headings is the most time-consuming part. Also, both Elaine and Nancy are much more experienced at subject heading creation than I am, so they really don't need me to go through that part.

I expect that all of these activities will take up most of the day. If I have any time remaining I will work on more records, but I doubt I will have the time.

Day 22

Wednesday, August 1st 9:30-5:30pm

I am still working on converting the metadata records so that they will be compatible for upload to CONTENTdm. I hope to finish most of the records today so that on Friday I can concentrate on other things. Since Friday is my last day I hope to have everything in order and explained so that others can easily follow what I have been doing. I would also like to have final (or almost final) drafts for the CONTENTdm metadata templates for tables and images set up before I leave.

Accomplishment

I am almost finished converting the existing records into the CONTENTdm format. I should easily finish up Friday morning. I have also finished the IDeA/CONTENTdm templates (all in one excel file) for both the issue and article level. I should have the same done for illustrations and tables/statistics by the time I leave on Friday. The authority file is also up to date. I will make sure that all of this is on Bertha and ready to go by the end of the day on Friday.

I also have a practice issue-level record to go through with Elaine and Nancy Friday afternoon. I plan to go through the process of creating both the IDeA and CONTENTdm records, though they are very similar.

Day 21

Monday, July 30th 9-5pm

I again will be spending the day converting our metadata from the IDeA template to the new CONTENTdm template. It is a rather slow process due to the horizontal nature of the template. I will also be, once again, going back through already created records and doing some updating. It has been decided that the title, at the issue level, should include vol and no (instead of just Monthly Bulletin).

Metadata Related Article

Milstead, J., & Feldman, S. (1999). Metadata projects and standards. Online (January 1999).

This reading was more or less an overview of many of the existing metadata standards. Though I was aware that different standards existed, this article gave me a whole new perspective on just how many standards are in use. Standards covered included: ISO, ANSI, PICS, RDF, DC, ROADS, CSDGM, and more.

A considerable amount of space was given to the description of Dublin Core. This included sections on its simplicity, the 15 elements, and changes.

There was also discussion of several projects, such as work on the Content Standard for Digital Geospatial Metadata (CSDGM), which I have read about previously and find very interesting.

Another topic addressed was changing web addresses, which the authors described as the "now you see it now you don't phenomenon": web addresses change randomly and constantly, making it difficult to revisit sites. This also included a discussion of the development of the Handle system, which gave sites unique IDs.

Digital Cataloging Related Article

Levy, D. M. (1995). Cataloguing in the digital order: Paper regarding the future of cataloguing, from the Digital Libraries 95 conference.

This article is an overview of digital age cataloging. It begins with an introduction to the general idea of cataloging. This includes various definitions along with the contents of a typical cataloging record (title, author, publisher, etc.). Also included are descriptions of both descriptive and subject cataloging along with the distinction between cataloging and bibliographies.

The next section of the article, entitled "Cataloging as order-making," describes just that: how materials are maintained and, most importantly, made available. It also includes a more in-depth discussion of what cataloging can truly involve, such as discussions of standards.

The remainder of the article delves into how cataloging is being affected and changed by the new digital world. Since digital collections, like any other, need to be organized, maintained, and made available, it logically follows that some type of cataloging will be required. Though little space is given to how these new collections will be cataloged, the article does pose important questions: what standards will be used, what type of training will be needed for this cataloging, and will the final digital catalog be universal?

Day 20

Friday, July 27th 10-5pm

I will be spending all of my time today taking the records that have been created for IDeA and converting them for CONTENTdm. This will also give me a chance to do a bit of quality control on the articles I have already created.

Day 19

Wednesday, July 25th 11-8:30

Today I am continuing to work on the new metadata template for CONTENTdm. I sent out a rough draft Monday afternoon and received a few questions/comments/suggestions. So far today I have made the changes and read through my Indiana Memory metadata information to answer a few questions. I then sent out the updated template and the answers I found. I am currently working on finding a logical order for the elements to appear within CONTENTdm. The most important are the first three, since they will be displayed in the simple record.

Metadata Related Reading

Since the Monthly Bulletin project has gotten funding from Indiana Memory, the digital project will now be housed within Indiana Memory. Like IDeA, Indiana Memory uses Dublin Core, though, because they are within CONTENTdm, they have their own specific set of standards. The paper I will summarize briefly below is the Dublin Core Metadata Guide: Indiana Memory Project.

The paper began with a brief introduction to the ideas of required, recommended, and optional elements. It also included links to other useful sites. Following this introduction the guide jumped right into the definitions of the required elements.

The required elements were: title, subject, item type, technical metadata, item ID, usage statement, date.original, and date.digital. The section on each of the elements included DC mapping, definition, comments, notes on cataloging, and examples. Some element definitions contained other information, such as subject, which contained a list of recommended thesauri and links for thesauri.

The definitions for recommended and optional elements contained the same information. Recommended elements were: creator, publisher, description, and language. Optional elements included: ordering information, transcript, and local item ID.

Following the section on elements the paper also included an FAQ section. This covered a wide variety of topics ranging from "What is Indiana Memory?" to "What are some other (not DC) descriptive metadata schemas?" Though useful and interesting information could be found in the FAQ section, most of it was either information I already had, or information that did not relate to our specific project.

Day 18

Monday, July 23rd 9-5pm

My main goal today is to go over the information from the CONTENTdm workshop. Specifically I want to go over the Dublin Core metadata standards within CONTENTdm. From what I learned at the seminar, I believe that we create our own labels for the metadata (e.g. Table Of Contents) and then map them to the appropriate DC field (e.g. Description). I am hoping to find more specific information with the extra literature provided.
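The label-to-field mapping described above can be pictured with a minimal sketch. Only the "Table Of Contents" to "Description" pair comes from the seminar example; the other labels, the dictionary name, and the function name are assumptions made up for illustration, not anything CONTENTdm itself defines.

```python
# Hypothetical local field labels mapped to Dublin Core elements.
# Only "Table Of Contents" -> "Description" comes from the entry above;
# the other pairs are illustrative assumptions.
LABEL_TO_DC = {
    "Title": "Title",
    "Table Of Contents": "Description",
    "Subject Headings": "Subject",
}


def to_dublin_core(record):
    """Group a locally labelled record's values under their DC elements."""
    dc = {}
    for label, value in record.items():
        element = LABEL_TO_DC.get(label)
        if element is None:
            # An unmapped label is treated as an error in this sketch.
            raise KeyError(f"No Dublin Core mapping for label: {label!r}")
        dc.setdefault(element, []).append(value)
    return dc
```

Values are collected in lists because several local labels may map to the same Dublin Core element.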

Accomplishments:
I managed to create a rough draft for our new metadata template which will be compatible with CONTENTdm. It will still need to be arranged into a correct order.

Day 17

Wednesday, July 18th 8:30-4pm

CONTENTdm training seminar. It was very interesting and informative, especially since I have no experience with CONTENTdm. I will have to make some adjustments to the way I am organizing the records. I plan to discuss this more after I have had a chance to go back over the material.

Useful Links From Seminar
metast.pdf
imgst.pdf

Day 16

Friday, July 13th 11-5pm

Today is another afternoon of working on records. I started the afternoon by reviewing a couple I did earlier this week. Upon this inspection I found a reoccurring error that I had made in the contributor field. I have spent my first hour and a half going back and checking for and correcting the error in past records. I think I am going to spend a bit of time every week checking for errors. The more error-free my records are the easier it will be for anyone following my work.

Day 15

Wednesday, July 11th 11-5pm

Today is yet another day of working on creating records. As I speculated in my last entry, authored articles within the issues are becoming more frequent, causing each individual issue to take more time.

Day 14

Monday, July 9th 9-1pm

Today I am simply continuing to work on creating records. Changes have been made to the necessary pdf files so that I can complete the records for issue 7. Then I will continue on with the rest of the volume 4 records. The volumes have been taking longer since long articles with authors are appearing more frequently, requiring multiple records for issues.

Day 13

Tuesday, July 3rd 9-2pm 3-8pm

There are several things I plan to accomplish today. The first is to spend some time reading articles related to Metadata/Digital Libraries/Digital Publications. I have found several interesting and pertinent articles. Summaries of these articles will be included in later journal entries.

I also plan to take some time to look at the pdf files, which have had OCR run, to see if the statistical charts can be easily searched. I am hoping to determine how many subject headings will be needed if full text searching is not an option.

I also plan to create records for the authored articles within the issues I have already created records for. The most difficult part of these article records is deciding on subject headings. As stated in previous entries, I plan to use PHIN headings, but I have also been cross-referencing them with MESH, since the MESH database has clearer definitions of the subject headings and their position in the hierarchy.

Day 12

Wednesday, June 27th 8:30-5pm

I have updated all of the required subject terms for the completed records and have uploaded these new records to Bertha. I am continuing to build our authority file as I use more subject terms.

I spent some time on Monday working with MESH and trying to understand it. So far I have not had to use MESH terms, but I think that may change as we get into later volumes where I will have to extract the individual articles.

Today again I am working on creating records. I have finished up volume 3 and will be starting on volume 4 today. While creating these records I plan to check both PHIN and MESH for subject terms for comparison's sake. I may also spend some time finding articles for next week, since I only have two at the moment.

Day 11

Monday, June 25th 10-8pm

Today's work mostly concerns subject headings and authorities. I am meeting with Nancy and Elaine this morning so that I can ask questions and get feedback on the subject headings I have been using thus far. We are also going to do a run-through of working with MESH subject headings, since I have no experience with them at all.

My guess is that the meeting and MESH tutorial will take up the entire morning. I then plan to spend the afternoon revising subject headings as needed within the records I have already created. I am also going to take this opportunity to review the records and check for any errors I made. Once I have completed the revising and review I will continue on with more records.

Another issue I may run into this afternoon is the fact that I have almost caught up to where we are in the pdf file creation process, though I am still a couple of volumes behind the scanning. It is very difficult to create records without the pages of the issue being together in a complete pdf file. Opening one page at a time on the Dell laptop is difficult and having multiple pages open at one time is almost impossible.

All Issue Records Will contain the following PHIN subject terms

Indiana
Descriptive Statistics
Age-specific death rate
Cause-specific mortality rate
Infant mortality rate
Homicide
Suicide

Day 10

Wednesday, June 20th 9am-6pm

Today, again, I will continue to create metadata records. I hope to at some point sit down with Elaine and Nancy to discuss my use of subject headings and see if there are any corrections they would like me to make.

Thus far, there has only been one article for which I have needed to create a separate metadata record. I would guess that, as I am now into volume 3, more of these articles will occur. I would also expect to begin seeing more pictures in the issues.

I may also take some time today to find a few articles on CONTENTdm, medical digital libraries, statistical tables in digital libraries, or perhaps just recent publications on digital libraries or institutional repositories.

Questions:

When is an article (with author) long enough to warrant its own record?

There are fewer and fewer bolded article headings. Should longer non-bolded article headings be included in the table of contents?

Subject headings? Still having trouble deciding on what to use. Many issues only have "Indiana" and "Descriptive Statistics" since articles are so varied in topic. Should I try to find more to include?

Subject headings for individual article records? How many is enough? Too many?

Having trouble finding appropriate subject headings in PHIN to match article topics. Suggestions?

Day 9

Monday, June 18th 9am-6pm

The main plan for today is to work on completing the list for geographic authorities. Once that is set, I plan to go back into the completed records and make changes as needed. I am also hoping to run the subject headings I have used so far past Elaine and Nancy to make sure we are all on the same page.

Other than that, I plan to spend the rest of my time working on creating records.

Subject Headings:

We will be using PHIN geographic subject headings for the records. These headings come with both a descriptive name (e.g. Indiana) and a numeric name (e.g. A0009896). I am still in the process of figuring out if the numbers are going to be used in addition to the descriptive name and, if so, how this is going to be displayed within the record. I am also still having a difficult time deciding what subject headings to include on the issue level. I have Indiana and Descriptive Statistics for each issue. I have been adding other subject headings when more than one article on a specific topic appears or a topic takes up more than one page of the issue. I am having trouble finding the happy medium between being overly simplistic and including an excessive number of subjects.

Subject Heading Decision: It has been decided that we will use only the descriptive name in the records. We will, however, include the numeric name in the authority record. Records have been updated to reflect this new standard.
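As a rough illustration of this decision, here is a minimal sketch of an authority file that stores both names while records show only the descriptive one. The class, dictionary, and function names are hypothetical; the actual authority file is a spreadsheet, and only the Indiana pair comes from the entry above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AuthorityEntry:
    descriptive_name: str  # the form shown in the metadata record
    numeric_name: str      # kept only in the authority file


# The Indiana pair is the example quoted in this entry; a real
# authority file would hold many more headings.
AUTHORITY_FILE = {
    "Indiana": AuthorityEntry("Indiana", "A0009896"),
}


def record_heading(term):
    """Return the heading as it appears in a record (descriptive name only)."""
    return AUTHORITY_FILE[term].descriptive_name


def numeric_name(term):
    """Look up the numeric name, which lives only in the authority file."""
    return AUTHORITY_FILE[term].numeric_name
```

Looking headings up through the authority file, rather than typing them into each record, means a term that is not yet in the file fails loudly with a KeyError instead of slipping in unnoticed.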

Copyright Statement: The copyright statement to be included in the records is as follows:

"Although these works may be freely accessible on the World Wide Web and may not include any statement about copyright, the U.S. Copyright Act nevertheless provides that such works are protected by copyright. Users must assume that works are protected by copyright until they learn otherwise."

Metadata Related Article

David Mimno, Alison Jones, and Gregory Crane. "Finding a Catalog: Generating Analytical Catalog Records from Well Structured Digital Texts." Proceedings of the 2005 Joint Conference on Digital Libraries, Denver, CO, June 7-11, 2005.

On the whole, I found Mimno's article fascinating. The idea of automatic metadata generation intrigues me greatly, though I will admit to still being rather skeptical, even after reading this article.

One section that I found particularly interesting was when they were discussing the use of statistics and probability in the metadata generation. They use the example of Washington within the Civil War-Era documents and discuss how they use the placement, frequency, and words appearing before and after Washington to determine if it is the place (and if so, which Washington) or the person.

I thought that the article was presented very well. It was very understandable while discussing the various steps in the extraction of different types of information. I also found it helpful that they explained exactly which MODS fields the metadata would be entered into (e.g. the title area of the XML into titleInfo).

I also found their reasoning for choosing MODS very sound. It works well within the constraints of traditional cataloging and mirrors traditional context more easily than Dublin Core, for example (an issue I have been getting to know well at my internship this summer). They also discussed the importance of MODS being in an XML format. This makes MODS compatible with many different kinds of software for "editing, searching and formatting metadata records."

Metadata Related Article

Missingham, R. (2004). Reengineering a national resource discovery service: MODS down under. D-Lib Magazine, 10(9).

I found this article very interesting. The idea of a national bibliographic database, such as the one being constructed in Australia, seems to be an excellent and logical idea. The service provided by Kinetica, allowing any Australian library to contribute to the national online catalog, seems to be a logical progression in digital library resources.

The goals of digitally archiving publications and resource discovery through Kinetica are also important goals discussed in the article. It makes sense, while creating a catalog to provide accessibility to records from many libraries, to also preserve items digitally.

I also found the discussion of the rationalization for MODS as an intermediary for Dublin Core records and MARC records to be interesting. The reasoning, I thought, made complete sense. Dublin Core to MODS is a rather basic conversion. MODS is specifically designed to be compatible with MARC. Therefore, as stated in the article, it is the clear choice for an intermediary.
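To make the "rather basic conversion" from Dublin Core to MODS a little more concrete, here is a minimal crosswalk sketch. The target paths follow the general shape of the Library of Congress Dublin Core to MODS mapping, but the table is abbreviated and should be read as an assumption for illustration, not as the mapping Kinetica actually uses; the function and variable names are hypothetical.

```python
# Simplified Dublin Core -> MODS crosswalk. This table is an abbreviated
# assumption for illustration, not a complete or authoritative mapping.
DC_TO_MODS = {
    "title": "titleInfo/title",
    "creator": "name/namePart",
    "subject": "subject/topic",
    "description": "abstract",
    "publisher": "originInfo/publisher",
    "language": "language/languageTerm",
    "identifier": "identifier",
    "rights": "accessCondition",
}


def crosswalk(dc_record):
    """Return (mods_path, value) pairs plus any DC elements left unmapped."""
    mapped, unmapped = [], []
    for element, values in dc_record.items():
        path = DC_TO_MODS.get(element.lower())
        if path is None:
            # Keep track of elements this sketch does not know how to place.
            unmapped.append(element)
            continue
        for value in values:
            mapped.append((path, value))
    return mapped, unmapped
```

Reporting the unmapped elements separately, instead of silently dropping them, makes it easier to see what information a conversion would lose.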

I do have one question that is not directly related to the discussions in this article: how much of the world uses Library of Congress standards and formats such as MARC and AACR2? I noticed several references to the Library of Congress in this article, which surprised me, since it was dealing with an Australian project.

Metadata Related Article

IN Harmony project documentation: Metadata creation guidelines

The readings this week surrounded the IN Harmony, Sheet Music from Indiana, project documentation. We were provided with the IN Harmony cataloging guidelines, which I plan to discuss here, as well as the accompanying information of fields summary and an XML template.

The guidelines walk through the elements for the sheet music records detailing requirements, whether or not an element may be repeated, authority control, and a description of what should be included in that element.

The only required element for each record is the Title, which is meant to be the title proper. Unlike some other elements this does not have an official authority control, though the description states that AACR2 can be used.

Elements that require authority control include Uniform Title, Title of Larger Work, Series Title, Composer, Arranger, Performer, Topical Subject, and Genre. The authority control is stated as "Provide list of previously used values; can add to list" which is addressed in more detail in each description section for the element.

One element with very specific authority control is that of Date. In order to make the date machine readable, it is required to be in the format of YYYY-MM-DD and should include all known digits.

It was interesting going through these guidelines after reviewing the Aquifer MODS guidelines. Though both require a title, the IN Harmony guidelines have no other required elements. The IN Harmony guidelines are more of a descriptive template, whereas the Aquifer MODS guidelines are very specific. I think that the slightly more open guidelines are preferable because they give the institution more flexibility to adapt them to its own needs.
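A date rule like this is easy to check by machine. The sketch below is a minimal validator that reads "all known digits" as allowing a year alone or a year and month when the rest is unknown; that reading, the pattern, and the function name are assumptions for illustration, not part of the IN Harmony documentation.

```python
import re
from datetime import date

# YYYY, YYYY-MM, or YYYY-MM-DD. Accepting the shorter forms is an
# assumption about what "all known digits" means.
_DATE_PATTERN = re.compile(r"(\d{4})(?:-(\d{2})(?:-(\d{2}))?)?")


def is_valid_date(value):
    """Return True if value is a well-formed, real calendar date or prefix."""
    match = _DATE_PATTERN.fullmatch(value)
    if match is None:
        return False
    year, month, day = match.groups()
    try:
        # Missing parts default to 1 so the calendar check still runs.
        date(int(year), int(month or 1), int(day or 1))
    except ValueError:
        return False
    return True
```

For instance, is_valid_date("1907-03-15") and is_valid_date("1907") both pass, while "03/15/1907" and "1907-02-30" are rejected.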

Metadata Related Article

Guenther, Rebecca and Leslie Myrick. "Archiving Web Sites for Preservation and Access: MODS, METS and MINERVA." Journal of Archival Organization 4, No. 1/2: 145 - 170.

Guenther and Myrick's article begins by stating, "Born-digital material such as archived Web site provides unique challenges in ensuring access and preservation." What follows in the article are the many questions, challenges, and developments that have come out of pursuing the creation of a Web Archive.

Before delving into the various questions relating to capturing and storing Web-based material, we are given a brief background of the issues, including the idea of web material as a "moving target," and the forty-four day average life span of a Web page. Some questions that are mentioned include: What is the best way to capture and maintain Web-based material for preservation and access? How do we collect Web sites before they disappear or change? How do we define the data? How do we manage the data?

After a discussion of the questions and issues, the authors provide an overview of the Web Archiving evolutionary process. The first, very basic, Web Archiving began with crawlers and agents like those used by Yahoo. These evolved into the indexing of Web sites by Google, though the first archival Web crawler came from the National Deposit Libraries. From there the progression to current Web Archives, such as MINERVA, began.

The article moves on to discuss the technical and harvesting issues associated with Web capture. This includes a discussion of large and small scale crawlers and their functions, how crawlers collect their data, how a web site mirror functions, and what is included in the data collection.

At the start of the article, the authors mention a number of current archiving projects, one of which is the Library of Congress's MINERVA. The remainder of the article is dedicated to discussing the development of the MINERVA project. MINERVA was started to "collect and preserve 'born-digital' materials, specifically open-access primary source material on the World Wide Web."

Guenther and Myrick state, "We argue that, among the proliferation of the schemas available for packaging and managing complex digital objects ... METS is uniquely suited to encapsulate a Web site object." They use MINERVA to demonstrate this argument. We are shown a progression of the MINERVA project through its use of MARC, MODS, and eventually the addition of METS. The authors describe METS as both strong and flexible, and describe how two separate catalogers would most likely create two very different METS instances, given the same simple item. The use of METS as a transfer syntax is also discussed at length.

The remainder of the article is a very technical discussion of METS, including creating models and templates, its uses in XML, and the development of a METS profile. Due to the technical nature of this section, I found it a bit more difficult to follow. There was, however, a very interesting discussion of challenges presented by archiving Web sites: knowing where web site boundaries lie, the complex nature of the site itself, what of the structure should be recorded, and lack of control over creation. We are also provided with a brief introduction to PREMIS (preservation metadata: implementation strategies).

The article concludes, not with answers, but the very same questions it began with.

Metadata Related Article

Guenther, R. S. (2004). "Using the Metadata Object Description Schema (MODS) for resource description: guidelines and applications." Library Hi Tech 22(1): 89-98.

According to Guenther's article, Using the metadata object description schema (MODS) for resource description, MODS is a reaction to the need for metadata which is richer than Dublin Core but more simplified and user friendly than MARC. This article focuses mainly on the user guidelines and applications, specifically the MINERVA project to collect and preserve materials from the Web.

The discussion of the user guidelines included a great deal of information about the relationship between MODS and MARC21. This specifically included sections on conversion (information that may be lost, etc.), elements that exist in MODS but not MARC, and fields that map from MARC to MODS. It also included information on the guidelines relating to aspects of MODS records such as punctuation, description of elements, notes, identifiers, etc.

Guenther's article included some examples of situations where MODS is being used. The majority of the examples involved some type of digital project, such as the LC Digital Audio-visual Preservation Prototyping Project, which is intuitive since one of the strengths of MODS is its ability to describe digital works.

The information on the MINERVA project was also very interesting. This section did a brief walk through of the project, explaining the metadata which was to be created and where that information was taken from. This helped to give a picture of using MODS that was more concrete and less abstract than other examples and descriptions I have encountered thus far.

Metadata Related Article

McCallum, Sally H. (2004). "An introduction to the Metadata Object Description Schema (MODS)." Library Hi Tech 22(1): 82-88.

This article was a very useful introduction to MODS for someone, like myself, with little to no experience with Metadata Object Description Schema. The relationship between MODS, MARC21, and the use of XML was clearly laid out in the introduction. McCallum's article gave an excellent introduction to the XML environment, including the brief history and current uses for XML. I found this background information particularly useful in giving me a handle on understanding XML.

Prior to reading McCallum's article (and starting the workshop) I did not clearly understand the close link between MODS and MARC21. There was very clear explanation of the way MARC21 can, through steps, be converted into MODS.

I also found the presentation of the MODS features very useful, though slightly more difficult to follow in parts. The information covered the user-friendly tags used by MODS, which are actual English language words, as well as the grouping of the elements. Also included was the discussion of whether or not attributes should be used in MODS. The decision was to define attributes in places where they were deemed useful. They listed two cases in which elements must be used for the information instead of attributes: if the information is repeatable or structured.

Overall, I found McCallum's article to be a good introduction to MODS. Though parts were a bit too technical for a complete beginner, the majority of the article was clear and well put together.

Day 8

Thursday, June 7th 9am-8pm

Today I spent my first hour or so getting organized and continuing to create item records based on the template I had created. Following this, I met with Nancy Eckerman and Elaine Skopelja to discuss subject headings, geographical headings and name authorities.

Elaine provided me with an Excel file containing PHIN subject headings (medical terms, chemicals, diseases, etc.) to be used in the item records. As for names we are using name authority records when possible. However since most individuals will not have name authority files, our local standard will be to use the names as they appear in the bulletins. We are also having difficulties with geographical headings. The records will need to include one or more of the following: state, region, county, township, or city. The LC standard is more complex than we would like to use for web searching purposes and is also not entirely consistent. Nancy is going to continue looking into this issue, though we may be creating our own local authorities.

My job for the moment is to insert appropriate subject headings into the records I have already created. Along with that I will be creating our local authority file. Next week I will be meeting with Nancy and Elaine again to discuss what subjects I have used and why. At that time we are hoping to have a definitive plan for geographical headings and get a more concrete idea of the number of subject headings that should be included in an item record.

I also spent some time at the end of the day researching CONTENTdm, which I am not very familiar with. I began at http://www.oclc.org/contentdm/ and expanded my search from there.
