How can health science librarians get involved in big data?

The following reflection was written for the online class Big Data in Healthcare: Exploring Emerging Roles, a fantastic free course provided by the National Network of Libraries of Medicine (NNLM).

Enormous data sets containing a broad variety of information produced at high velocity are transforming the healthcare field.  This “big data” is being used for clinical research, patient diagnosis and treatment, analysis of public health trends, and in many other innovative ways to move healthcare into a new era of highly personalized medicine.  Patients provide the health data, programmers and data scientists create new tools to manipulate the data, and clinicians and other healthcare professionals consult and analyze the data.  Health science librarians may wonder what roles they can play in this daunting but incredibly important new domain.  Librarians can use their specialized skills to fill three key roles in the big data field: they can act as liaisons between healthcare professionals and programmers, they can act as advocates for patients, and they can act as educators for patients and healthcare professionals.

Librarians regularly perform reference interviews and user needs assessments to determine the information and programming needs of their patrons, and these skills can help librarians become effective liaisons between healthcare professionals and the programmers who create tools to manipulate big data.  In the presentation The Triple Aim at the Front Lines: Lessons from a VA Experience in using data to drive change, Dr. Nick Meo explains that to create more effective data tools for physicians, programmers need to know how frontline physicians use these tools in their everyday practice.  Librarians can be the intermediaries in this situation.  After conducting reference interviews, focus groups, and other forms of needs assessment with healthcare professionals, librarians can work with programmers to create data tools that fit the information needs and diagnostic/treatment processes of the healthcare team.

Librarians can also act as advocates for patients by learning about patient concerns related to the use of their personal health data and communicating these concerns to both programmers and healthcare professionals.  In the article A ‘green button’ for using aggregate patient data at the point of care, Christopher Longhurst, Robert Harrington, and Nigam Shah suggest a change to HIPAA, so that it will be “acceptable for front-line clinicians to use aggregate patient data, even if identified, for the purpose of treating a similar patient under their care” (1233).  This idea may make aggregated patient data more easily accessible to clinicians, but how would patients feel about their personal health data being used in this manner?  Librarians can work with patients to gather their viewpoints on possible new uses for health data like the suggested “green button”, and patients may reveal ethical, privacy, or security concerns that programmers and healthcare professionals had not previously considered.

Finally, librarians can act as educators for both healthcare professionals and patients to demonstrate the value of utilizing big data in healthcare.  In the article Big data and new knowledge in medicine: the thinking, training, and tools needed for a learning health system, Harlan Krumholz describes how healthcare professionals will need to change their viewpoints about best practices for research in order to fully embrace big data.  Librarians can begin changing viewpoints by presenting healthcare professionals with concrete examples of how big data has been used to improve patient care, as well as training resources for learning more about data science.  Librarians can also promote patient participation in big data initiatives by explaining how the projects will benefit public health.  For instance, librarians can explain to patients and the general public how participation in the All of Us Research Program may improve personalized medicine for current and future generations.

Health science librarians don’t need advanced programming skills or a medical degree as a prerequisite to work with big data.  Librarians already possess valuable communication and training skills that will make them effective liaisons between the patients, healthcare professionals, and programmers who generate big data, analyze it, and create tools for working with it.

Where to Start with Big Data?

Librarians have to sink or swim in the constantly shifting waters of the information field, and the latest wave sweeping over information sciences is Big Data. I started learning about the importance of data analysis and visualization while working with patents, where analysis of large patent portfolios could be used for competitive intelligence, planning acquisitions, spotting trends in a technology sector, and much more.

Now that I’m working in the health field, I’m beginning to see why everyone calls it “Big Data.”  The amount of data generated through general healthcare services and biomedical research is truly staggering, ranging from data in electronic health records to genomic data generated through human genome sequencing.  How do we make this data searchable and reusable, so researchers can make new discoveries from existing data sets?  How do we also protect personal information, especially with data generated from electronic health records?  Can researchers retain intellectual property rights to their data while still making their data searchable and reusable?  There are so many thorny issues to consider and new concepts to learn surrounding Big Data and data science in general, and it can be a daunting task trying to find a place to start.

Here are a few resources which are helping me wrap my mind around basic data science concepts and the current state of Big Data:

  1. To get an overview of how data science is impacting the healthcare field, I’m taking the National Network of Libraries of Medicine (NNLM) online course Big Data in Healthcare: Exploring Emerging Roles.  (I highly recommend checking the NNLM Upcoming Classes list for other free courses and webinars you can sign up for.)
  2. Check out this recording of a webinar called Data Science 101: An Introduction for Librarians (also from NNLM), which provides a quick overview of data science concepts such as the data science pipeline, machine learning, supervised learning, unsupervised learning, and natural language processing.
  3. IBM produced a great infographic called The Four V’s of Big Data, which describes how big data can be broken down into four dimensions: volume, velocity, variety, and veracity of the data.
  4. Learn about the FAIR Data Principles, which suggest that all data sets should be findable, accessible, interoperable, and reusable.  A recent article in Nature gives a detailed overview of the FAIR Data Principles.
  5. I found the blog post Is Big Data Still a Thing? (The 2016 Big Data Landscape) by Matt Turck to be a useful overview of the current state of Big Data, especially the infographic included in the post which illustrates many of the major players in the field.