Library collections often extend far beyond just books and journals, and today’s digital collections offer free access to all types of multimedia. Online collections from the Library of Congress include photos/prints, manuscripts, video, audio, maps, and even archived websites. One of my favorite types of digital collection is historic images in the fields of science and medicine. It can be fascinating to see catalog images for intricate machines from a century ago, infographics from the 1950s about medical careers, or beautifully detailed watercolors of plants. Here are a few of my favorite places to look for historic science and medicine image collections:
This only scratches the surface of online image collections…if you have a lot of time to kill, visit the British Library Flickr page, which offers over a million public domain images scanned from old books.
Perhaps the most important lesson is the reminder that in a networked information world, preserving a single object in isolation may not actually preserve it if it consists of links to other resources which are lost.
The content of the web changes every second, and a website can be taken down at any time. If a Tweet links to a website that’s no longer available, how useful is an archived version of that Tweet? And that’s just the tip of the iceberg…the Library of Congress (LOC) has been trying to figure out how to create a usable archive of Tweets since 2010.
Twitter promised to hand over all the tweets posted since the company’s launch in 2006, as well as a regular feed of new submissions. In return, the library agreed to embargo the data for six months and ensure that private and deleted tweets were not exposed.
The Library of Congress has the raw data, but it struggles with the ever-growing size and complexity of the Tweets archive. With 500 million Tweets added a day (as of 2012) and the added metadata of embedded images, videos, and conversation threads, the archive of Tweets has become nearly unsearchable with the technology currently available to the LOC. The Atlantic article quotes an LOC blog post from 2013 that describes how “executing a single search of just the fixed 2006-2010 archive on the Library’s systems could take 24 hours.” Researchers desperately want free access to the Twitter archives, but the sheer volume, variety, and velocity of this big data make it extremely difficult to create an easily searchable portal. Even if the LOC does create a searchable portal for the Twitter archives, how useful will those preserved Tweets really be without the context of working links?
Preserving the Internet: The Internet Archive
Twitter is just a single social media platform…how can we possibly preserve all versions of all websites ever available on the web? Many well-written articles have already pondered this question:
One thread uniting these articles is their mention of the Internet Archive, which describes itself as “a non-profit library of millions of free books, movies, software, music, websites, and more.” You can search everything from copyright records to TV clips of President Trump on the Internet Archive, but the crowning achievement of the site is the Wayback Machine, which allows users to explore more than 299 billion web pages saved over the past two decades.
For example, if I want to explore all archived versions of the MedlinePlus homepage, I can just search by the URL and view 3,551 versions of the page, saved between April 7, 2000 and July 21, 2017. Some links on the archived pages will take you to similar archived versions of the linked webpages (although the captures of the linked pages may have a different time stamp). Many of the images and drop-down menus are also preserved, so you get a relatively accurate feel for what the webpage looked like during that time. The Wayback Machine is a fascinating tool for cultural and historical research, and it’s even used for more creative purposes like patent searching and improving search engine optimization (SEO).
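If you would rather script this kind of lookup than click through the calendar view, here is a minimal Python sketch of how the URL-based search could be reproduced. It only builds the web addresses and does not fetch anything. The address pattern and the CDX query parameters (url, from, to, output, fl) reflect my understanding of the Wayback Machine's public interface, the function names are my own, and "medlineplus.gov" is an assumed stand-in for whichever homepage URL you want to look up.

```python
from urllib.parse import urlencode

# Public Wayback Machine endpoints (as I understand them; not taken from the post).
WAYBACK_BASE = "https://web.archive.org/web"
CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def snapshot_url(page_url: str, timestamp: str) -> str:
    """Build the address of an archived page.

    timestamp uses the YYYYMMDDhhmmss form; the Wayback Machine
    redirects to the capture closest to that moment.
    """
    return f"{WAYBACK_BASE}/{timestamp}/{page_url}"


def capture_list_query(page_url: str, start: str, end: str) -> str:
    """Build a CDX query that lists captures of page_url between two dates."""
    params = {
        "url": page_url,
        "from": start,                 # earliest capture date, YYYYMMDD
        "to": end,                     # latest capture date, YYYYMMDD
        "output": "json",              # ask for JSON rather than plain text
        "fl": "timestamp,original",    # only return these two fields
    }
    return f"{CDX_ENDPOINT}?{urlencode(params)}"


if __name__ == "__main__":
    # "medlineplus.gov" is an illustrative URL, not one confirmed by the post.
    print(snapshot_url("medlineplus.gov", "20000407000000"))
    print(capture_list_query("medlineplus.gov", "20000407", "20170721"))
```

Opening the first printed address in a browser should land on the capture nearest April 7, 2000, and the second should return a list of capture timestamps for the date range mentioned above. Keep in mind that the number of rows returned may not match the count shown on the calendar page.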
Exhibiting the Internet: The Library of Congress
Although the Library of Congress has yet to release a usable Twitter archive, the LOC still offers plenty of smaller online content archives which provide valuable insights into web culture. The LOC recently announced the release of the Webcomics Web Archive and the Web Cultures Web Archive. The Webcomics archive focuses on “award-winning comics as well as webcomics that are significant for their longevity, reputation or subject matter”, while the Web Cultures archive includes “a representative sampling of websites documenting the creation and sharing of emergent cultural traditions on the web such as GIFs, memes and emoji.”
Each archived website includes a metadata page with a representative screenshot and bibliographic data about the website (including a summary and description of the site). The archived website page also links to a timeline of all captured versions of the site. For example, the Cute Overload! 😉 archived website page links to 122 captures of the Cute Overload homepage between October 3, 2006 and June 1, 2016.
While the Internet Archive aims for quantity and preserving as many webpage captures as possible, the Library of Congress online collections aim for a representative sample of high-quality sites. The Library of Congress collections also include helpful metadata for each archived website, so they are easily discoverable. The LOC collection can be used as an internet history museum, while the Internet Archive Wayback Machine is the closest thing we currently have to an actual archive of the internet. Hopefully we’ll eventually also have access to a full Twitter archive from LOC, but that may be years down the road.
Copyright is an incredibly important form of intellectual property in the US that protects “original works of authorship fixed in a tangible medium of expression”, ranging from artwork and novels to computer software and architecture. Copyright can also be an enormous pain to search, especially if you’re looking for pre-1978 copyright registrations. You very well may need to search for pre-1978 copyright registrations, since works originally copyrighted after 1922 and renewed before 1978 “have been automatically extended to last for a total term of 95 years” (learn more about copyright duration here). Basically, a work published in 1923 could still have an active copyright today.
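To make the arithmetic behind that last sentence concrete, here is a small Python sketch. It assumes the work was published with proper notice and renewed on time, so that the 95-year total term quoted above applies, and it assumes the term runs through the end of the calendar year. The function names are my own, and real copyright determinations depend on details this sketch ignores.

```python
TERM_YEARS = 95  # total term quoted above for renewed pre-1978 works


def last_protected_year(publication_year: int) -> int:
    """Return the final calendar year of protection.

    Assumes the work was renewed and that the term runs through
    December 31 of the 95th year after publication.
    """
    return publication_year + TERM_YEARS


def is_still_protected(publication_year: int, current_year: int) -> bool:
    """True if current_year falls within the assumed 95-year term."""
    return current_year <= last_protected_year(publication_year)


if __name__ == "__main__":
    print(last_protected_year(1923))       # 1923 + 95
    print(is_still_protected(1923, 2017))  # checking from the year 2017
```

Under these assumptions, a renewed work from 1923 stays protected through the end of 2018, which is why a copyright that old can still be active.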
If you’re searching for a post-1978 copyright registration, you can check the online Copyright Catalog. The search interface doesn’t have a lot of bells and whistles, but you can at least search by keyword, title, claimant, organization, etc. and quickly browse through lists of results.
You won’t have nearly as much luck if you need to search pre-1978 registrations. Here are the options that I’m aware of:
Search the copyright card catalog (which contains approximately 45 million cards covering 1870 through 1977) onsite in the Copyright Public Records Reading Room at the Library of Congress. If you don’t live near Washington DC, this may be tricky.
Try browsing digitized versions of the Catalog of Copyright Entries (CCE). The University of Pennsylvania has an excellent guide on locating digitized historic registration records. The Internet Archive has a collection of digitized Catalogs of Copyright Entries from July 1891 through December 1977. You can keyword search within individual volumes thanks to OCR’ed text, but I couldn’t find a way to keyword search across the text of all volumes at once. (Note: The Copyright Office states “The CCE does not contain all registration updates and does not contain entries for recorded documents, including assignments, and should not be used as the only reference.”)
Thankfully, the US Copyright Office is in the midst of a massive digitization project that will eventually “provide web-access to the pre-1978 Copyright registration records.” The Project Goals page gives an update on the current status of the project:
In 2014-2015 the Copyright Office completed the digitization of pre-1978 records for preservation. The Office is now capturing pre-1978 digital content and is moving towards integrating the content and card images into the existing online record.
There’s no estimated completion date for the project, and knowing the speed at which government works, it may be a few years before we see the pre-1978 records integrated into the online Copyright Catalog. At least the project is moving along (although it does concern me that the Project Blog link no longer works!). Kudos to the Library of Congress and US Copyright Office for undertaking this enormous task, and hopefully the project will help librarians more easily identify copyright status of older works in the future.