Sunday, June 24, 2012

Digital Preservation Interest Group @ #ala12

I probably won't post notes from every session I went to today, but I definitely will from the Digital Preservation Interest Group because it was the most relevant and interesting for me and my work. (We who work in state libraries are a lonely lot and love to hear from and meet people who work in other state libraries.)

Once again - the notes are very sketchy and very minimally edited. I apologize for any errors in my reporting.

Digital Preservation Interest Group
Sunday, June 24 @ 8 am

"Collecting born-digital materials from the web: It's a CINCH!" Lisa Gregory @ North Carolina State Library
Funded by an IMLS grant
CINCH = Capture INgest CHecksum, a tool to automate transfer of online content to a repository
Grabs online content, authenticates it, extracts metadata, prepares it for the repository
Modular, flexible, easy to use, repository-neutral, open source
Focused on small and mid-size institutions
Why another digi pres tool?
State library shall be the official, complete and permanent depository of state publications
Collecting born-digital since 2004, plus an extensive digitization program
Get material by download, email or CD/hard drive
Very curated and manual process (4 staff members work on this)
Also crawl the web using Archive-It (periodically dumped onto hard drives)
They do it every other month for most sites
Drawbacks of manual collection: not getting it all; staff could be doing something more value-added with the collection; ingested objects may not be "authentic"; we have to badger (er, encourage) contributors.
Drawbacks of website archiving: a web archive is hard for users to understand, it's harder to provide continuity from print to digital, and you have tons of data
"How can we extract, use and preserve publications in an automated and preservation-responsive way?"
User uploads a file list, e.g. one generated from IA's Archive-It report (or from a site map generator)
CINCH checks for duplicates of files downloaded previously. File size limit is .4 GB; a checksum is calculated
CINCH grabs the files and runs a virus scan. Last modified date/time is verified, metadata is extracted, a checksum is calculated, and duplicates within the current downloads are checked
Final steps: creates a ZIP file, emails the user, user downloads the ZIP file
Produces list of problem files, metadata and audit trail, and files
Pulls in PDF metadata from file properties, plus some natural language processing for title, keywords, subjects
Can get it from
In the process of moving from Digital Archive to DuraCloud!!!!!
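To get my own head around the capture steps (size check, checksum, duplicate check, ZIP with an audit trail), I sketched them out roughly below. This is NOT CINCH's code: it's a minimal Python illustration that assumes the files have already been downloaded into memory. The function names, the SHA-256 checksum and the 400 MiB cap standing in for the ".4 GB" limit are my own assumptions, and the virus scan, last-modified check and metadata extraction steps are left out.

```python
import hashlib
import io
import json
import zipfile
from datetime import datetime, timezone

# Hypothetical size cap standing in for the ".4 GB" limit mentioned in the talk.
MAX_BYTES = 400 * 1024 * 1024


def sha256_hex(data: bytes) -> str:
    """Fixity checksum for one downloaded file (SHA-256 is an assumed choice)."""
    return hashlib.sha256(data).hexdigest()


def package(files: dict[str, bytes], seen_checksums: set[str]) -> tuple[bytes, list[dict]]:
    """Skip oversize files and duplicates, then bundle the rest plus an
    audit trail into an in-memory ZIP archive.

    files: hypothetical mapping of source URL -> downloaded bytes
    seen_checksums: checksums from earlier runs; updated in place so that
    duplicates within the current batch are caught too
    """
    audit = []
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        for i, (url, data) in enumerate(files.items()):
            entry = {"url": url, "bytes": len(data),
                     "checked_at": datetime.now(timezone.utc).isoformat()}
            if len(data) > MAX_BYTES:
                entry["status"] = "problem: over size limit"
            else:
                digest = sha256_hex(data)
                entry["sha256"] = digest
                if digest in seen_checksums:
                    entry["status"] = "skipped: duplicate"
                else:
                    seen_checksums.add(digest)
                    # Prefix with a counter so files with the same name don't collide.
                    name = f"files/{i:04d}_{url.rsplit('/', 1)[-1] or 'index'}"
                    zf.writestr(name, data)
                    entry["status"] = "packaged"
                    entry["zip_path"] = name
            audit.append(entry)
        # The audit trail travels inside the ZIP alongside the files.
        zf.writestr("audit_trail.json", json.dumps(audit, indent=2))
    return buf.getvalue(), audit


if __name__ == "__main__":
    zip_bytes, audit = package({"https://example.gov/a.pdf": b"%PDF-1.4 report",
                                "https://example.gov/b.pdf": b"%PDF-1.4 report"}, set())
    print([e["status"] for e in audit])  # ['packaged', 'skipped: duplicate']
```

Checking the checksum before adding anything to the ZIP is what lets one set of checksums handle both "already have it from last time" and "same file linked twice this time."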

Lori Donovan from Internet Archive (UM grad)
"The Web is a Mess, or How I Learned to Stop Worrying and Love Web Archiving"
Internet Archive - a digital library whose motto is "Universal access to all knowledge"
The goal of web archiving is to document changes to resources over time, archive them, and make them accessible.
Not just screenshots: links are archived too
95% of gov't info is born digital. (State of the Federal Web Report)
Libraries have task/duty/role to collect documents/records of the time - which most often are web only
Archive-It - created in 2006.
Web-based application that allows users to create, manage and preserve web content
Archived content includes HTML, video, audio, PDF, social networking sites
Archived content accessible within 24 hours
205 institutions using Archive-It
How/why do they use it? Institutional mandate, augmenting the physical collection, topical or event-based web archives; can be used with records retention policies
Examples of collections: Stanford & New York - harvest and preserve Iranian blogs
UT @ Austin - Latin American gov't docs archive
Electronic Literature Org - archive born digital literature
NC State Library - pushed for the need to archive social networking sites
Access to Collections - can use restricted/login approach
Provide enhanced access - landing pages on their own website. Some partners host content on their own servers
(my question which was answered before I could ask - What about discovery - can agency add metadata to enhance access?)
Answer: users can add collection-level, seed-level and document-level metadata
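The three metadata levels nest inside each other, which is easier to see as a data structure. Here's a minimal Python sketch; the class names, fields and the merge rule (the more specific level wins when a landing page needs one combined record) are my own assumptions for illustration, not Archive-It's actual data model or behavior.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    url: str
    metadata: dict[str, str] = field(default_factory=dict)


@dataclass
class Seed:
    url: str
    metadata: dict[str, str] = field(default_factory=dict)
    documents: list[Document] = field(default_factory=list)


@dataclass
class Collection:
    name: str
    metadata: dict[str, str] = field(default_factory=dict)
    seeds: list[Seed] = field(default_factory=list)


def effective_metadata(coll: Collection, seed: Seed, doc: Document) -> dict[str, str]:
    """Combine the three levels for display; seed overrides collection and
    document overrides seed (an assumed rule, not documented Archive-It behavior)."""
    merged = dict(coll.metadata)
    merged.update(seed.metadata)
    merged.update(doc.metadata)
    return merged


if __name__ == "__main__":
    doc = Document("https://example.gov/report.pdf", {"Title": "Annual Report"})
    seed = Seed("https://example.gov/", {"Creator": "Example Agency"}, [doc])
    coll = Collection("State Agency Websites",
                      {"Subject": "Government publications", "Creator": "Various"}, [seed])
    print(effective_metadata(coll, seed, doc))
    # {'Subject': 'Government publications', 'Creator': 'Example Agency', 'Title': 'Annual Report'}
```

Describing the collection once and adding detail only where it matters (a key seed, an important document) seems like the practical payoff of having three levels.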

Digital Preservation of Dynamic Reference Works: Where do we go from here?
Heather Ruland Staines, from Springer (publisher)
Reference works - both separate titles, and database type content
How does Springer preserve our eBooks? PDF & metadata, or XML/ePub
Database content is preserved if individual books are preserved
SpringerReference - living reference work - wiki-like for content providers - tracking versions and updates is important
With dynamic reference works - what is it we are trying to preserve?
content, organizational structure, user experience, or concepts
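Springer didn't say how SpringerReference tracks versions, but to picture what "tracking versions and updates" might mean for preservation, here's a minimal Python sketch of an append-only history for one entry, with a fixity checksum per version and a lookup of the text as it stood on a given date. Everything here (class name, fields, the chronological-order rule) is hypothetical.

```python
import bisect
import hashlib
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional


@dataclass
class EntryHistory:
    """Append-only version log for one entry in a living reference work (hypothetical)."""
    entry_id: str
    timestamps: list[datetime] = field(default_factory=list)
    texts: list[str] = field(default_factory=list)
    checksums: list[str] = field(default_factory=list)

    def add_version(self, when: datetime, text: str) -> None:
        # Versions must arrive in chronological order so date lookups stay valid.
        if self.timestamps and when <= self.timestamps[-1]:
            raise ValueError("versions must be added in chronological order")
        self.timestamps.append(when)
        self.texts.append(text)
        # A checksum per version lets a later audit confirm nothing was altered.
        self.checksums.append(hashlib.sha256(text.encode("utf-8")).hexdigest())

    def as_of(self, when: datetime) -> Optional[str]:
        """Return the text as it stood at `when`, or None if the entry didn't exist yet."""
        i = bisect.bisect_right(self.timestamps, when)
        return self.texts[i - 1] if i else None


if __name__ == "__main__":
    h = EntryHistory("entry-42")
    h.add_version(datetime(2012, 1, 10), "First draft")
    h.add_version(datetime(2012, 5, 2), "Revised after peer review")
    print(h.as_of(datetime(2012, 3, 1)))    # First draft
    print(h.as_of(datetime(2011, 12, 31)))  # None
```

Note this only preserves content; organizational structure and user experience, the other candidates on the list, would need much more than a text log.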

1 comment:

  1. Thanks for taking the time to take these notes. They are good to read through.

    I wanted to make it down there but you know how it just got in the way.

    However, glad to see ALA is in Seattle this winter. Yay for not having to fly to attend a conference! :)