Takeaways from MLA 2017

I just attended my first Medical Library Association (MLA) Annual Meeting (this year in Seattle, WA), and I came away with a lot of great new ideas, resources, and news from the health sciences information field.  I’m still trying to absorb everything I’ve seen and learned over the past few days, but here’s a quick list of some of my most interesting takeaways from the conference:

  • Open Access Biomedical Journals – The vendor hall offered me the opportunity to explore the online tools and publications available from a variety of biomedical publishers, and I checked around for any open access resources they offered.  A few open access publications and resources I came across include:
  • Data Resources – New data resources portal from NNLM.
  • LibGuides to Explore – I find LibGuides very useful, so I kept an eye out during the poster sessions for any interesting projects related to LibGuides. Two fantastic LibGuides I learned about:
    • Mobile Resources for Health from the University of Florida – Learn about health-related apps, ranging from apps for healthcare professionals (clinical apps, administrative/productivity apps, E-journal and literature database apps, etc.) to apps for patient education.  The LibGuide is mobile-friendly, so learn about healthcare apps on your phone!
    • Disability Resource Guide from the University of Illinois – Learn about a variety of physical and mental disabilities, including depictions of the disability in popular literature and media, web/reference/academic resources, and common assistive technologies related to the disability.
  • New Online Learning Portal for MLA – The Medical Library Association recently launched MEDLIB-ED, an online education portal for health information professionals where users can “find, complete, track, and claim credit for educational activities.”  A free competencies self-assessment is available where users can learn about the newly revised MLA Competencies for Lifelong Learning and Professional Success, rate their skills, and use the ratings to plan professional development.
  • Product Updates from National Library of Medicine (NLM) – The NLM provided updates about a number of their free online tools, including:

These are just a few of my favorite highlights, but check Twitter for #MLAnet2017 for more updates and insights on the conference!

Visualizing Library Data in Socrata and Tableau

I decided it was time to experiment with Tableau again, and what better way to practice than using data from my local public library system, Montgomery County Public Libraries?  Locating MCPL data was almost as fun as using Tableau, since I was able to learn about and experiment with another data sharing and visualization tool called Socrata.

Socrata is a cloud-based platform that government organizations can use to host and share public data sets.  Montgomery County uses Socrata to power dataMontgomery, where I found a data set called Gov Stat MCPL Spreadsheet, listing Montgomery County Public Library performance measures.  The Socrata platform offers tools for filtering, sorting, visualizing, and exporting data sets, so I was able to filter and visualize the data in charts (like actual and projected numbers of “attendance of library programs” by fiscal year, displayed in a line graph).

Data visualization in Socrata (actual and projected numbers of “attendance of library programs” by fiscal year).

I was also able to export the full data set to a CSV file in Socrata, which I then saved to Excel and uploaded to Tableau to practice creating a dashboard.  In my first Tableau viz I used the Story format (basically, a slide show of graphs and charts).  For my second viz, I decided to try the Dashboard format, where I can organize multiple charts on a single screen.  I created four charts but was only able to fit two of the charts comfortably on the dashboard screen (“Actual and Projected Attendance” and “Use of Library Services and Website”).  Here’s the completed viz, Service Usage and Attendance at Montgomery County Public Libraries (MCPL).

My second Tableau viz.
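
For anyone who would rather script this step than use drag-and-drop, here is a rough sketch of how the exported CSV could be filtered down to one performance measure and grouped by fiscal year. The column names (`fiscal_year`, `measure`, `kind`, `value`) and every number in the sample are made up for illustration; they are assumptions, not the actual layout or figures of the Gov Stat MCPL Spreadsheet.

```python
import csv
import io
from collections import defaultdict

# Hypothetical stand-in for the exported CSV. Column names and values are
# invented for illustration and are NOT real MCPL figures.
SAMPLE_CSV = """fiscal_year,measure,kind,value
FY14,Attendance of library programs,actual,100
FY15,Attendance of library programs,actual,120
FY16,Attendance of library programs,projected,130
FY15,Website visits,actual,900
"""


def series_by_year(csv_text, measure):
    """Return {kind: {fiscal_year: value}} for one performance measure."""
    series = defaultdict(dict)
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Keep only the rows for the requested measure.
        if row["measure"] == measure:
            series[row["kind"]][row["fiscal_year"]] = int(row["value"])
    return dict(series)


attendance = series_by_year(SAMPLE_CSV, "Attendance of library programs")
for kind, points in attendance.items():
    # One line of output per series (actual, projected), sorted by year.
    print(kind, sorted(points.items()))
```

With a real export, the sample string would be replaced by the contents of the downloaded file, and the column names adjusted to match whatever headers the data set actually uses.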

I love experimenting with Tableau, but the best part of this exercise was learning about the data sharing and visualization capabilities of Socrata.  A quick Google search for “Socrata government data” shows that many local and state governments use Socrata to share data sets with the public (for example, Baltimore and Hawaii).  Federal government institutions also use Socrata to share data sets, like the open data catalog for the Institute of Museum and Library Services or NASA’s open data portal.  It’s a promising sign that both local and federal governments are making it a priority to openly share data with researchers and the general public, so anyone can use the data in new and creative ways.

How can health science librarians get involved in big data?

The following reflection was written for the online class Big Data in Healthcare: Exploring Emerging Roles, a fantastic free course provided by the National Network of Libraries of Medicine (NNLM).

Enormous data sets containing a broad variety of information produced at high velocity are transforming the healthcare field.  This “big data” is being used for clinical research, patient diagnosis and treatment, analysis of public health trends, and in many other innovative ways to move healthcare into a new era of highly personalized medicine.  Patients provide the health data, programmers and data scientists create new tools to manipulate the data, and clinicians and other healthcare professionals consult and analyze the data.  Health science librarians may wonder what roles they can play in this daunting but incredibly important new domain.  Librarians can use their specialized skills to fill three key roles in the big data field: they can act as liaisons between healthcare professionals and programmers, they can act as advocates for patients, and they can act as educators for patients and healthcare professionals.

Librarians regularly perform reference interviews and user needs assessments to determine the information and programming needs of their patrons, and these skills can help librarians become effective liaisons between healthcare professionals and programmers who create tools to manipulate big data.  In the presentation The Triple Aim at the Front Lines: Lessons from a VA Experience in using data to drive change, Dr. Nick Meo describes how in order to create more effective data tools for physicians, programmers need to know how frontline physicians are using these tools in their everyday practice.  Librarians can be the intermediaries in this situation.  After performing reference interviews, focus groups, and other forms of needs assessments with healthcare professionals, the librarian can then work with programmers to create data tools that fit the information needs and diagnostic/treatment processes of the healthcare team.

Librarians can also act as advocates for patients, by learning about patient concerns related to use of their personal health data and communicating these concerns to both the programmers and healthcare professionals.  In the article A ‘green button’ for using aggregate patient data at the point of care, Christopher Longhurst, Robert Harrington, and Nigam Shah suggest a change to HIPAA, so that it will be “acceptable for front-line clinicians to use aggregate patient data, even if identified, for the purpose of treating a similar patient under their care” (1233).  This idea may make aggregated patient data more easily accessible to clinicians, but how would patients feel about their personal health data being used in this manner?  Librarians can work with patients to gain their viewpoints on possible new uses for health data like the suggested “green button”, and patients may reveal ethical, privacy, or security concerns that programmers and healthcare professionals had not previously considered.

Finally, librarians can act as educators for both healthcare professionals and patients to demonstrate the value of utilizing big data in healthcare. Harlan Krumholz describes in the article Big data and new knowledge in medicine: the thinking, training, and tools needed for a learning health system how healthcare professionals will need to change their viewpoints about best practices for research in order to fully embrace big data.  Librarians can begin changing viewpoints by presenting healthcare professionals with concrete examples of how big data has been used to improve patient care, as well as training resources for learning more about data science.  Librarians can also promote participation for patients within big data initiatives, by explaining how the projects will benefit public health.  For instance, librarians can explain to patients and the general public how participation in the All of Us Research Program may improve personalized medicine for current and future generations.

Health science librarians don’t need advanced programming skills or a medical degree as a prerequisite to work with big data.  Librarians already possess valuable communication and training skills which will make them effective liaisons between patients, healthcare professionals, and programmers who contribute to generating, analyzing, and creating tools for big data.

Learning Tableau

Creating interactive visualizations of data to tell a story is a great skill to have, but what if you don’t have programming skills?  I fall in the non-programmer boat (although hopefully I can fix that knowledge gap this year by learning R), but fortunately there are a ton of free online visualization tools, many of which don’t require programming knowledge.  Tableau is one option for creating free or low-cost interactive visualizations of large data sets using a drag-and-drop interface.

What is Tableau?

Tableau is data visualization software that includes both subscription and free versions.  The free version of the software is called Tableau Public.  Through Tableau Public, users can download the Tableau Desktop Public Edition app, upload and clean data, create visualizations, and then save and store visualizations (called “vizzes”) to their Tableau Public profile.  Each profile comes with 10GB of space, and vizzes can be shared and embedded on websites and blogs.

How Can Libraries Use Tableau?

A quick search of Tableau Public shows some academic libraries using Tableau to create dashboards of library usage statistics (see Library Assessment for UMass Amherst Libraries or LibraryViz@OSU for Ohio State University Libraries).  Public libraries (like Brooklyn Public Libraries) may use a subscription version of the tool for in-depth usage analytics and decision making.

Getting Started

For my first Tableau visualization, I decided to use a relatively large data set downloaded from PillBox.  PillBox is a database from the National Library of Medicine and can be used to identify unknown pills and capsules by visual indicators like color, shape, and size.  I wanted to explore the pill shapes, colors, and distributors for pills containing the active ingredient Acetaminophen.

My first (rather rough) attempt at a Tableau visualization can be found here.

Visit the full visualization.
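
For comparison, the same kind of exploration (which shapes and colors show up for pills with a given active ingredient) could be sketched in a few lines of code. This is only an illustration under stated assumptions: the column names (`ingredient`, `shape`, `color`, `distributor`) and the sample rows are hypothetical and do not reflect the real PillBox file layout or its contents.

```python
import csv
import io
from collections import Counter

# Hypothetical sample rows. Column names and values are invented for
# illustration and are NOT taken from the real PillBox data set.
SAMPLE_CSV = """ingredient,shape,color,distributor
ACETAMINOPHEN,ROUND,WHITE,Distributor A
ACETAMINOPHEN,CAPSULE,RED,Distributor B
ACETAMINOPHEN,ROUND,WHITE,Distributor B
IBUPROFEN,ROUND,BROWN,Distributor A
"""


def count_by(csv_text, ingredient, column):
    """Count the values in `column` for rows matching `ingredient`."""
    rows = csv.DictReader(io.StringIO(csv_text))
    wanted = ingredient.upper()
    # Case-insensitive match on the active ingredient.
    return Counter(
        row[column] for row in rows if row["ingredient"].upper() == wanted
    )


shapes = count_by(SAMPLE_CSV, "acetaminophen", "shape")
colors = count_by(SAMPLE_CSV, "acetaminophen", "color")
print(shapes.most_common())
print(colors.most_common())
```

The counts are the numbers a bar chart of shapes or colors would be built from; a tool like Tableau does this grouping behind the scenes when a field is dragged onto a chart.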

I mostly just figured out how to use the interface through trial and error and Googling any questions I had about the tool (there’s a large and active user base for the software, thankfully).  The Tableau website also offers some basic tutorial videos.

A few random thoughts:

  • I had trouble using the Tableau Desktop Public Edition app on my Dell, since the Dell Backup and Recovery program interfered with the app.  I had to uninstall Dell Backup and Recovery to get Tableau to work.
  • Now that Google has gotten in the data visualization game with Google Data Studio, Tableau had better up its game.  Here’s a comparison of Google Data Studio and Tableau I found interesting.

 

Thoughts on Crowdsourcing: Patents, Health, and Libraries

While thinking about the hype surrounding “Big Data”, I started to think about another term that I’ve also been hearing tossed around a lot in the patent, health, and library fields: crowdsourcing. The Merriam-Webster Dictionary defines “crowdsourcing” as:

The practice of obtaining needed services, ideas, or content by soliciting contributions from a large group of people and especially from the online community rather than from traditional employees or suppliers.

Crowdsourcing has come of age in a digital environment, especially over social media, where individuals or organizations can solicit help from thousands of people instantly.  Everyone seems to be hopping on the crowdsourcing bandwagon, due to the low cost and speed of implementation and the sometimes highly creative responses provided by participants.  The main downside to crowdsourcing seems to be the need to sift through large numbers of contributions of highly varying quality (which may be an instance where “Big Data” skills and tools can come in handy).

Here are just a few examples of crowdsourcing in the patent, health, and library fields:

  • Patent Searching – A few companies, like Patexia and Article One Partners, successfully crowdsource patent searches by offering prizes and rewards to any participants who successfully locate and submit highly relevant patent prior art.  Fledgling searchers can use free online patent databases, like Google Patents, Espacenet, and WIPO PATENTSCOPE (Patexia also offers free patent search tools), and participants receive guidance and support through online communities created on the company websites.
  • Challenges from the National Institutes of Health (NIH) – NIH has offered over a dozen contests where researchers can submit solutions to various challenges, like “A Wearable Alcohol Biosensor” or “Innovation in Breast Cancer Genetics Epidemiology.” Government agencies like NIH can use the Challenge.gov site to post “a problem or question to the public and ‘solvers’ respond and submit solutions. An agency pays only for those solutions that meet the criteria and are chosen as winners.”
Recent challenges from the National Institutes of Health on Challenge.gov.
  • Libraries Embrace Crowdsourcing – I can’t even pick a single example.  A multitude of blog posts, opinion pieces, and articles describe how institutional and public libraries use crowdsourcing for a variety of projects:
    • Library of Congress makes catalog corrections and enhancements to photographs in their collection, based on comments from users on Flickr.
    • New York Public Library is asking “citizen volunteers to provide identification, transcription, tagging and more” for a digitized collection of “bond and mortgage records from The Emigrant Savings Bank during the years 1841–1933.”
    • The Biodiversity Heritage Library, a consortium of natural history museums and botanical garden libraries, is “testing the effectiveness of gaming for crowdsourcing OCR text correction” and also “using crowds to verify the accuracy of semantic markup of text that was done by automated algorithms.”
    • The British Library has created a portal called LibCrowds, which lists challenges like “help create a catalogue of Lord Chamberlain’s Plays and Correspondence.”

I haven’t even discussed one of the greatest crowdsourcing achievements, Wikipedia, which has created a vast free online encyclopedia of modern human knowledge.

Crowdsourcing has drawbacks, but there are so many possible applications that result in brilliant discoveries and new tools based on harnessing the online knowledge base that I can’t even begin to list them all.

Where to Start with Big Data?

Librarians have to sink or swim in the constantly shifting waters of the information field, and the latest wave sweeping over information sciences is Big Data. I started learning about the importance of data analysis and visualization while working with patents, where analysis of large patent portfolios could be used for competitive intelligence, planning acquisitions, spotting trends in a technology sector, and much more.

Now working in the health field, I’m truly beginning to see why everyone calls it “Big Data.”  The amount of data generated through general healthcare services and biomedical research is truly staggering, ranging from data in electronic health records to genomic data generated through human genome sequencing.  How do we make this data searchable and reusable, so researchers can discover new innovations from existing data sets?  How do we also protect personal information, especially with data generated from electronic health records?  Can researchers retain intellectual property rights to their data while still making their data searchable and reusable?  There are so many thorny issues to consider and new concepts to learn surrounding Big Data and data science in general, and it can be a daunting task trying to find a place to start.

Here are a few resources which are helping me wrap my mind around basic data science concepts and the current state of Big Data:

  1. To get an overview of how data science is impacting the healthcare field, I’m taking the National Network of Libraries of Medicine (NNLM) online course Big Data in Healthcare: Exploring Emerging Roles.  (I highly recommend checking the NNLM Upcoming Classes list for other free courses and webinars you can sign up for.)
  2. Check out this recording of a webinar called Data Science 101: An Introduction for Librarians (also from NNLM), which provides a quick overview of data science concepts like the data science pipeline, machine learning, supervised learning, unsupervised learning, natural language processing, etc.
  3. IBM produced a great infographic called The Four V’s of Big Data, which describes how big data can be broken down into four dimensions: volume, velocity, variety, and veracity of the data.
  4. Learn about the FAIR Data Principles, which suggest that all data sets should be findable, accessible, interoperable, and re-usable.  A recent article in Nature gives a detailed overview of the FAIR Data Principles.
  5. I found the blog post Is Big Data Still a Thing? (The 2016 Big Data Landscape) by Matt Turck to be a useful overview of the current state of Big Data, especially the infographic included in the post which illustrates many of the major players in the field.