Exploring Strange (and Amazing) Collections on the Internet Archive

I love exploring digital collections, so it’s probably no surprise that I’m an enormous fan of the Internet Archive.  The Internet Archive is a non-profit library that hosts digital versions of billions of archived webpages and millions of books, texts, audio recordings, videos, images, and even software programs.  Many people are familiar with the Internet Archive due to its Wayback Machine collection of archived webpages (about the closest we currently get to preserving the internet), but the other collections on the Internet Archive also deserve attention for the wonderful, educational, and sometimes bizarre text and media artifacts they contain.

Searching the Internet Archive

The Internet Archive offers both a simple search form, accessible in the upper right corner of the page (which allows you to search across metadata, full text of books, TV captions, or archived websites), and an advanced search with fielded search forms or the option to search with Lucene query syntax.

Simple search form on Internet Archive.
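
For readers who would rather script a search than click through the forms, a fielded, Lucene-style query can also be sent to the advanced search page as a URL.  The sketch below only builds that URL and does not contact the site.  It assumes the advancedsearch.php endpoint with q, fl[], rows, and output parameters, and metadata fields such as mediatype and subject; the helper names build_query and build_search_url are made up for illustration, so check the Internet Archive's own documentation before relying on any of it.

```python
from urllib.parse import urlencode

# Assumed endpoint for the Internet Archive's advanced search.
ADVANCED_SEARCH_URL = "https://archive.org/advancedsearch.php"


def build_query(text=None, mediatype=None, subject=None, creator=None, language=None):
    """Combine optional filters into one Lucene-style query string."""
    clauses = []
    if text:
        clauses.append(text)
    filters = (
        ("mediatype", mediatype),
        ("subject", subject),
        ("creator", creator),
        ("language", language),
    )
    for field, value in filters:
        if value:
            # Quote each value so multi-word phrases stay together.
            clauses.append(f'{field}:"{value}"')
    return " AND ".join(clauses)


def build_search_url(query, fields=("identifier", "title"), rows=25):
    """Return a search URL that asks for JSON output (hypothetical helper)."""
    params = [("q", query)]
    params += [("fl[]", field) for field in fields]  # one fl[] entry per returned field
    params += [("rows", rows), ("output", "json")]
    return f"{ADVANCED_SEARCH_URL}?{urlencode(params)}"


if __name__ == "__main__":
    query = build_query(mediatype="audio", subject="old time radio")
    print(build_search_url(query))
```

Pasting the printed URL into a browser is a quick way to see whether the query matches what the fielded search form would have returned.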

When exploring the collections, I personally prefer to just select the icons for web, text, video, audio, software, or images in the upper left corner of the screen and then choose to view all items for that specific media type (like All Video).

Video collections on the Internet Archive.

I’m then able to use the side filtering options to narrow my search by criteria like subject, collection, creator, or language.  I can also search across the metadata within that specific collection or media type.

Filtering video collections on the Internet Archive.

Strange Collections: Text, Video and Audio

I’m just going to focus on three media types in this post (text, video, and audio collections), but I hope to explore software, image, and web collections in a future post.  Here is just a quick sampling of some of the interesting collections to explore on the Internet Archive:

This is just scratching the surface of the Internet Archive’s digital collections.  Be careful about beginning to explore the Internet Archive, since once you get started, you may go down a rabbit hole that will take hours to find your way out of (like spending 3 hours listening to old-time radio shows).

Big Data at the Library of Congress

Health science libraries are learning how to assist healthcare professionals, researchers, and patients with managing the “Big Data” generated through clinical trials, electronic health records, wearable technology, and a variety of other sources, to help patients get more individualized and evidence-based care, to visualize and identify health disparities by location, and many, many other applications I can’t begin to list or even imagine yet.  Health science librarians aren’t the only librarians wrestling with big data, though.

The Library of Congress (LOC) is busy with the creation and dissemination of enormous data sets, as I learned last Saturday at the National Book Festival when attending the presentation at the Library of Congress Town Square, LC for Robots! Mining the Library’s Digital Collections.  Library of Congress Innovation Specialist Jaime Mears discussed a few examples of how the LOC is promoting the dissemination and re-use of its data sets:

Library of Congress data sets on Data.gov.
  • Hack-to-Learn at the LOC: In May 2017, the Library of Congress offered a two-day “hackathon” training to teach librarians how to mine digital collections.  The 61 attendees at the training were taught to use “low or no-cost computational tools to explore four library collection as data sets”, including the MARC record data set.

The Library of Congress is using open-access data sets, contests to encourage creative use and mining of the data sets, and training librarians in computer and data science fundamentals to transform itself into a true 21st Century library, with innovative applications of Big Data and digital collections leading the way.  I know I’m going to keep an eye on the Meetings and Events and Training calendars hosted by the Digital Preservation section of LOC, since they seem to offer interesting trainings on data science topics (like the hackathon) and meetings (like Collections as Data 2016 and 2017).

Browser Extensions for Link Checks, Accessibility, and Research

Browser extensions can help with all sorts of daily tasks, speeding up mundane work like link checking or finding research articles.  James Day at Library Technology Launchpad describes 6 Chrome Browser Extensions Every Librarian Needs, such as DOI Resolver or Google Scholar Button for research, Grammarly to proofread online writing, or Wayback Machine for viewing archived webpages.  I regularly use browser extensions myself (usually in Google Chrome) for link checking, and I’ve learned about a few interesting extensions for checking accessibility and for locating and organizing research articles.  Here’s a quick rundown:

Viewing valid, redirected, and broken links through Check My Links.
  • Link checking: I do periodic manual link checks for some online resources, and usually I’ll start the check by running the Check My Links extension to highlight which links on the page are live, redirecting, or dead.  I always try to manually click through all links (since sometimes a valid redirect may still lead to a page where the desired content is no longer available), but opening every link one at a time is an enormous pain.  Thankfully, there are browser extensions like Linky or Linkclump, where you can highlight or select a section of a webpage and automatically open all links within that selected area in separate tabs.  This can save a lot of time.
Accessibility testing for a webpage using WAVE.
  • Accessibility testing: When sharing online material from a government resource, the content needs to meet Section 508 requirements for accessibility.  The content needs to be equally accessible to anyone with disabilities (visual, auditory, cognitive, etc.), which means that content creators need to keep a number of guidelines in mind to make sure their content is fully compliant.  The browser extension WAVE can be used to evaluate accessibility of web content, and it will highlight any errors or alerts for accessibility issues on a webpage.  It will even flag color contrast that may be too low for users with visual limitations to read.  See the Medium article Free web accessibility tools round-up by Carlin Scuderi for a great list of accessibility check tools (including a few more Google Chrome extensions).
Options menu for Unpaywall.
  • Research tools: 
    • Unpaywall: One browser extension I keep hearing about on library listservs, Twitter, and blogs is Unpaywall, which automatically searches for open access versions of paywalled journal articles.  When viewing an article on a publisher website, the extension automatically searches across “thousands of open-access repositories worldwide” to find full text (and legally uploaded) versions of the article (check their FAQ section to learn more). Unpaywall sounds like a very helpful tool for a librarian or researcher who needs an article from a journal that their institution doesn’t subscribe to, but who doesn’t have the time to wait to receive the article through inter-library loan.
    • Refigure: I recently learned about this tool from INFOdocket.  This extension seems more geared towards scientific researchers than librarians, but it was just too interesting not to mention. Refigure “aggregates and organizes different scientific figures amongst users”, which sounds like an innovative way for researchers to collaborate, organize, and share a more visual type of research data that may be overlooked in traditional databases.
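
For anyone curious how a link check like the one described above might be scripted, here is a minimal sketch in Python.  It is not how Check My Links works internally (I haven’t seen its code); it simply pulls the links out of a page’s HTML and sorts HTTP status codes into the same three buckets of valid, redirected, and broken.  The function names and the choice to treat every non-2xx, non-3xx response as broken are my own assumptions.

```python
import urllib.request
from html.parser import HTMLParser
from urllib.parse import urljoin


class _LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def extract_links(html, base_url):
    """Return absolute http(s) links found in the HTML, in page order."""
    collector = _LinkCollector()
    collector.feed(html)
    absolute = (urljoin(base_url, href) for href in collector.hrefs)
    return [url for url in absolute if url.startswith(("http://", "https://"))]


class _NoRedirect(urllib.request.HTTPRedirectHandler):
    def redirect_request(self, req, fp, code, msg, headers, newurl):
        # Returning None stops urllib from following the redirect,
        # so the 3xx status surfaces as an HTTPError instead.
        return None


def fetch_status(url, timeout=10):
    """Return the HTTP status for a URL, or None if the request fails."""
    opener = urllib.request.build_opener(_NoRedirect)
    # Some servers reject HEAD requests; a GET fallback may be needed.
    request = urllib.request.Request(url, method="HEAD")
    try:
        with opener.open(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code
    except OSError:  # DNS failures, timeouts, refused connections
        return None


def classify_status(status):
    """Bucket a status code as 'valid', 'redirected', or 'broken'."""
    if status is not None and 200 <= status < 300:
        return "valid"
    if status is not None and 300 <= status < 400:
        return "redirected"
    return "broken"
```

A script like this still can’t tell you whether a valid or redirected link leads to the content you actually wanted, so the manual click-through remains worth doing.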

There are so many browser extensions available (just in the Chrome web store alone!), it can be difficult to separate the useful from the useless.  That’s why I’m grateful for librarians on Twitter, library news resources, and listservs (like MEDLIB-L) for the many helpful recommendations on new extensions to try.
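
One last note on the color-contrast check mentioned under accessibility testing: the usual yardstick is the contrast ratio defined in the Web Content Accessibility Guidelines (WCAG) 2.x, where level AA asks for at least 4.5:1 for normal text and 3:1 for large text.  The sketch below computes that ratio for two hex colors.  Whether WAVE implements the check in exactly this way is an assumption on my part, and the function names are only illustrative.

```python
def _channel_to_linear(value):
    """Convert one 0-255 sRGB channel to its linear-light value (WCAG 2.x)."""
    c = value / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(hex_color):
    """Relative luminance of a color written as '#rrggbb'."""
    hex_color = hex_color.lstrip("#")
    red, green, blue = (int(hex_color[i:i + 2], 16) for i in (0, 2, 4))
    return (
        0.2126 * _channel_to_linear(red)
        + 0.7152 * _channel_to_linear(green)
        + 0.0722 * _channel_to_linear(blue)
    )


def contrast_ratio(foreground, background):
    """Contrast ratio between two colors, from 1.0 (identical) up to 21.0."""
    first = relative_luminance(foreground)
    second = relative_luminance(background)
    lighter, darker = max(first, second), min(first, second)
    return (lighter + 0.05) / (darker + 0.05)


def passes_aa(foreground, background, large_text=False):
    """Check the WCAG AA threshold: 4.5:1 for normal text, 3:1 for large text."""
    threshold = 3.0 if large_text else 4.5
    return contrast_ratio(foreground, background) >= threshold


if __name__ == "__main__":
    print(round(contrast_ratio("#767676", "#ffffff"), 2))
    print(passes_aa("#777777", "#ffffff"))
```

Gray text on a white background is a common place for this to go wrong, since two nearly identical grays can land on opposite sides of the 4.5:1 line.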

From Submarine Blueprints to Intricate Fruit: Digital Collections of Historic Images, Science and Medicine

Library collections often extend far beyond just books and journals, and today’s digital collections offer free access to all types of multimedia.  Online collections from the Library of Congress include photos/prints, manuscripts, video, audio, maps, and even archived websites. One of my favorite types of digital collection is historic images in the science and medicine fields.  It can be fascinating to see catalog images for intricate machines from a century ago, infographics from the 1950s about medical careers, or beautifully detailed watercolors of plants.  Here are a few of my favorite places to look for historic science and medicine image collections:

Library of Congress Digital Collections (Science and Technology) – View 19 collections, such as Architecture, Design & Engineering Drawings. This collection “covers about 40,000 drawings (described in more than 3,900 catalog records), spanning 1600 to 1989” and includes a wide range of architectural and engineering designs, such as a submarine design from 1806.

[Submarine (“Submarine Vessel, Submarine Bombs and Mode of Attack”) for the United States government. Cock cavity and wheel details for “plunging boat”]
National Library of Medicine Digital Collections – I recommend exploring the almost 70,000 images within the Images from the History of Medicine collection.  Browse health-related advertisements, educational material, images of patients and healthcare professionals, medical illustrations, etc. from before 1600 to the present.  For example, check out this infographic from 1957 about the growing field of health service occupations.

Health service occupations: a growing field of employment for both men and women

Smithsonian Libraries Digital Collections – One of my favorite collections, which I first became familiar with when hunting for online trade literature collections for patent searches, is the Instruments for Science, 1800-1914 collection.  This collection lets you browse through catalogs for scientific instruments and machinery from over a century ago.  Here’s an instrument called a “Moist Chamber” from an 1899 catalog, which was used to “keep a muscle and nerve preparation damp during the experiment” (yikes).

Moist Chamber (pg 29)

United States Department of Agriculture Special Collections – Some science images are absolute works of art, like the watercolors of fruits and nuts from the USDA Pomological Watercolor Collection.  This painting of strawberries from 1914 is one beautiful example.

Fragaria: Pine Apple

This is only just scratching the surface of online image collections…if you have a lot of time to kill, visit the British Library Flickr page, which offers over a million public domain images scanned from old books.

Legacy of Beall’s List: Ongoing Efforts to Identify Predatory Journals

The sudden disappearance of Beall’s List of potential, possible, or probable predatory scholarly open-access publishers was one of the more dramatic sagas I’ve come across in the scientific publishing/librarian fields.  Here’s a bare-bones timeline of the story (cobbled together from No More ‘Beall’s List’ by Carl Straumsheim, Beall’s article What I learned from predatory publishers, and the Wikipedia page on predatory open-access publishing):

  • Between 2008 and 2010: Jeffrey Beall, librarian and researcher at University of Colorado Denver, first posted his list of predatory publishers on the Posterous blog platform.
  • January 2012: Beall launched a blog called Scholarly Open Access that listed predatory publishers/journals and also offered criticism of scholarly open-access publishing.
  • August 2012: Beall posted his criteria for evaluating publishers.
  • February 2013: Beall added a process for a publisher to appeal their inclusion in the list.
  • 2013: OMICS publishing group threatened to sue Beall for $1 billion for including them on the list (this threat obviously wasn’t successful, since the list lived on for another 4 years).
  • January 17, 2017: The list was taken offline.  Beall describes his reasons for taking down the list in his article What I learned from predatory publishers:

In January 2017, facing intense pressure from my employer, the University of Colorado Denver, and fearing for my job, I shut down the blog and removed all its content from the blog platform.

So that’s the story in a nutshell.  Beall’s List was highly controversial (angering both open-access publishers included on the list and some open-access advocates), but it was also incredibly useful, with many researchers and librarians using the list as an authoritative resource to identify predatory journals to avoid, both as places to publish and as sources for research.

Thankfully, there are still ongoing efforts to identify predatory journals and guide researchers towards high-quality, reputable journals for publishing and research.  Many of these efforts utilize or build on Beall’s work.  Here are a few ways Beall’s legacy lives on:

  • Archived versions of Beall’s List: Some LibGuides and blogs link to or post archived versions of the list. A site called Beall’s List of Predatory Journals and Publishers (hosted on Weebly) also appears to be built from archived versions of Beall’s List.
  • Active updates of Beall’s List: The website Stop Predatory Journals seeks to continue updating Beall’s List through a collaborative community effort.  It’s unclear if this page is still regularly updated, though, since the last post on the homepage is from February 10, 2017.
  • Cabell’s Predatory Journal Blacklist (subscription tool): A Nature.com article titled Pay-to-view blacklist of predatory journals set to launch describes the new subscription-based predatory journal blacklist from scholarly-services firm Cabell’s International. Rick Anderson at The Scholarly Kitchen offers a detailed review of Cabell’s list.
  • Guidelines for Avoiding Predatory Journals: The website Thinkchecksubmit.org provides guidelines and resources for researchers to help them identify reputable journals where they can safely publish their work.

Are there other ways researchers and librarians are working to identify and avoid predatory publishers?  Let me know in the comments or on Twitter!

Takeaways from MLA 2017

I just attended my first Medical Library Association (MLA) Annual Meeting (this year in Seattle, WA), and I came away with a lot of great new ideas, resources, and news from the health sciences information field.  I’m still trying to absorb everything I’ve seen and learned over the past few days, but here’s a quick list of some of my most interesting takeaways from the conference:

  • Open Access Biomedical Journals – The vendor hall offered me the opportunity to explore the online tools and publications available from a variety of biomedical publishers, and I checked around for any open access resources they offered.  A few open access publications and resources I came across include:
  • Data Resources – 
New data resources portal from NNLM.
  • LibGuides to Explore – I find LibGuides very useful, so I kept an eye out during the poster sessions for any interesting projects related to LibGuides. Two fantastic LibGuides I learned about:
    • Mobile Resources for Health from the University of Florida – Learn about health-related apps, ranging from apps for healthcare professionals (clinical apps, administrative/productivity apps, E-journal and literature database apps, etc.) to apps for patient education.  The LibGuide is mobile-friendly, so learn about healthcare apps on your phone!
    • Disability Resource Guide from University of Illinois – Learn about a variety of physical and mental disabilities, including depictions of the disability in popular literature and media, web/reference/academic resources, and common assistive technologies related to the disability.
  • New Online Learning Portal for MLA – The Medical Library Association recently launched MEDLIB-ED, an online education portal for health information professionals where users can “find, complete, track, and claim credit for educational activities.”  A free competencies self-assessment is available where users can learn about the newly revised MLA Competencies for Lifelong Learning and Professional Success, rate their skills, and use the ratings to plan professional development.
  • Product Updates from National Library of Medicine (NLM) – The NLM provided updates about a number of their free online tools, including:

These are just a few of my favorite highlights, but check Twitter for #MLAnet2017 for more updates and insights on the conference!

Finding Open Access Institutional Repositories

The NLM Technical Bulletin recently published a post describing how PubMed now includes links to full text of articles available through institutional repositories.  This is fantastic news, since this feature expands the possible open access resources for locating full text of indexed articles on PubMed beyond PubMed Central and publisher websites.  Institutional repositories are often overlooked treasures brimming with open access resources, including full-text journal articles (often preprint), theses, and other research output published by students and faculty at the institutions.

OpenScholarship.org defines institutional repositories as:

Digital collections of the outputs created within a university or research institution. Whilst the purposes of repositories may vary (for example, some universities have teaching/learning repositories for educational materials), in most cases they are established to provide Open Access to the institution’s research output.

So how can you find institutional repositories?  My two favorite resources are:

Search for repositories on OpenDOAR.
Homepage of ROAR.

OpenDOAR and ROAR have similar search and browsing features, but ROAR seems to have a larger collection of repository listings to search through.  I also prefer ROAR because it uses Library of Congress Classification to categorize repositories in its collection by subject.

ROAR uses LOC classification; how can a librarian resist?

If you want to learn more about open access repositories, check out the academic LibGuide Open Access Repositories – UC Santa Barbara Library.  Repository66.org also has a neat visualization of repositories on a global map.  If anyone knows any additional useful institutional repository resources, please share!