Librarians have to sink or swim in the constantly shifting waters of the information field, and the latest wave sweeping over information science is Big Data. I started learning about the importance of data analysis and visualization while working with patents, where analysis of large patent portfolios could be used for competitive intelligence, planning acquisitions, spotting trends in a technology sector, and much more.
Now that I work in the health field, I'm beginning to see why everyone calls it "Big Data." The amount of data generated through general healthcare services and biomedical research is staggering, ranging from data in electronic health records to genomic data generated through human genome sequencing. How do we make this data searchable and reusable, so researchers can make new discoveries from existing data sets? How do we also protect personal information, especially in data drawn from electronic health records? Can researchers retain intellectual property rights to their data while still making it searchable and reusable? There are so many thorny issues to consider and new concepts to learn surrounding Big Data and data science in general that it can be daunting just to find a place to start.
Here are a few resources that are helping me wrap my mind around basic data science concepts and the current state of Big Data:
- To get an overview of how data science is impacting the healthcare field, I’m taking the National Network of Libraries of Medicine (NNLM) online course Big Data in Healthcare: Emerging Roles. (I highly recommend checking the NNLM Upcoming Classes list for other free courses and webinars you can sign up for.)
- Check out this recording of a webinar called Data Science 101: An Introduction for Librarians (also from NNLM), which provides a quick overview of data science concepts such as the data science pipeline, machine learning, supervised and unsupervised learning, and natural language processing.
- IBM produced a great infographic called The Four V's of Big Data, which describes how Big Data can be broken down into four dimensions: the volume, velocity, variety, and veracity of the data.
- Learn about the FAIR Data Principles, which propose that data sets should be findable, accessible, interoperable, and reusable. A recent article in Nature gives a detailed overview of the FAIR Data Principles.
- I found the blog post Is Big Data Still a Thing? (The 2016 Big Data Landscape) by Matt Turck to be a useful overview of the current state of Big Data, especially its infographic, which illustrates many of the major players in the field.