Health science libraries are learning how to help healthcare professionals, researchers, and patients manage the “Big Data” generated by clinical trials, electronic health records, wearable technology, and a variety of other sources. The goals include helping patients get more individualized, evidence-based care and visualizing health disparities by location, along with many, many other applications I can’t begin to list or even imagine yet. Health science librarians aren’t the only librarians wrestling with big data, though.
The Library of Congress (LOC) is busy creating and disseminating enormous data sets, as I learned last Saturday at the National Book Festival, where I attended the presentation “LC for Robots! Mining the Library’s Digital Collections” at the Library of Congress Town Square. Library of Congress Innovation Specialist Jaime Mears discussed a few examples of how the LOC is promoting the dissemination and re-use of its data sets:
- Chronicling America: Chronicling America is a digitized, full-text searchable collection of America’s historic newspaper pages from 1789 to 1925, sponsored jointly by the National Endowment for the Humanities (NEH) and the LOC. Users can access and search the data through APIs or through bulk data download. In 2016, the NEH hosted a contest called the Chronicling America Data Challenge, which “challenged members of the public to produce creative web-based projects using data pulled from Chronicling America, the digital repository of historic U.S. newspapers.” Winners included projects like America’s Public Bible: Biblical Quotations in U.S. Newspapers and American Lynching: Uncovering a Cultural Narrative.
- MARC Open-Access: In May 2017, the LOC announced that it was “making 25 million records in its online catalog available for free bulk download.” The bibliographic records had previously only been available through individual viewing or through a paid subscription for bulk access. The records can be downloaded through the MARC Distribution Services page on the LOC website or at Data.gov.
- Hack-to-Learn at the LOC: In May 2017, the Library of Congress offered a two-day “hackathon” training to teach librarians how to mine digital collections. The 61 attendees were taught to use “low or no-cost computational tools to explore four library collections as data sets,” including the MARC record data set.
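For readers curious what the API access to Chronicling America mentioned above might look like in practice, here is a minimal Python sketch. Treat it as an illustration only: the endpoint URL and the query parameter names (`andtext`, `date1`, `date2`, `dateFilterType`, `format`, `page`) are my assumptions based on the site's public search interface and may have changed, and the function names are hypothetical. The response field names used in the last function (`items`, `date`, `title`) are assumptions too.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed search endpoint for Chronicling America; check the current
# API documentation before relying on it.
SEARCH_URL = "https://chroniclingamerica.loc.gov/search/pages/results/"


def build_search_url(text, start_year=1789, end_year=1925, page=1):
    """Build a full-text search URL that asks for JSON results.

    All parameter names below are assumptions, not confirmed by this post.
    """
    params = {
        "andtext": text,                # words that must appear on the page
        "date1": start_year,            # first year of the date range
        "date2": end_year,              # last year of the date range
        "dateFilterType": "yearRange",  # interpret date1/date2 as years
        "format": "json",               # request JSON instead of HTML
        "page": page,                   # page of results to return
    }
    return SEARCH_URL + "?" + urlencode(params)


def search_pages(text, **kwargs):
    """Fetch one page of search results and return the decoded JSON."""
    with urlopen(build_search_url(text, **kwargs), timeout=30) as response:
        return json.load(response)


def print_first_hits(text, limit=5):
    """Print the date and newspaper title of the first few matches."""
    results = search_pages(text)
    # "items", "date", and "title" are assumed field names.
    for item in results.get("items", [])[:limit]:
        print(item.get("date"), item.get("title"))
```

Calling `print_first_hits("public bible")` would, if the assumptions hold, list a handful of matching newspaper pages. For anything larger than a quick exploration, the bulk data download is probably the better fit.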
The Library of Congress is using open-access data sets, contests that encourage creative use and mining of those data sets, and librarian training in computer and data science fundamentals to transform itself into a true 21st-century library, with innovative applications of Big Data and digital collections leading the way. I know I’m going to keep an eye on the Meetings and Events and Training calendars hosted by the Digital Preservation section of the LOC, since they seem to offer interesting trainings on data science topics (like the hackathon) and meetings (like Collections as Data 2016 and 2017).