Tools to Visualize Local Health Data

Have you ever wondered which issues have the biggest impact on public health in your community, or how your county’s public health ranks in comparison to other counties in your state?  Here are two helpful tools for visualizing and comparing county-level health data, found through the list of County and Local Health Data tools at PHPartners.org (I originally learned about these tools through the free NNLM class Health and Wellness @ the Library: The Essentials of Providing Consumer Health Services).

CHSI 2015

CHSI 2015 (created by the Centers for Disease Control and Prevention) describes itself as “an interactive web application that produces health profiles for all 3,143 counties in the United States.”  Select a state and county to view an “at-a-glance” summary (under the “Summary Comparison Report” section) on “how the selected county compares with peer counties” (better, moderate or worse) “on the full set of Primary Indicators” (arranged under categories Mortality, Morbidity, Healthcare Access and Quality, Health Behaviors, Social Factors, and Physical Environment).

The Summary Comparison Report for Montgomery County, MD at CHSI 2015.

CHSI 2015 also allows you to view county demographics data and county-level data for specific Primary Indicators.  For instance, the age-adjusted Alzheimer’s disease death rate for Montgomery County, MD is 13.3 per 100,000 residents, while the US median rate is 27.3.

Data on Alzheimer’s disease death rate for Montgomery County, MD at CHSI 2015.
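To put those two rates side by side, here is a minimal Python sketch of the comparison.  The two figures come from the CHSI 2015 example above; the function and variable names are my own and are not part of CHSI.

```python
def rate_ratio(county_rate: float, reference_rate: float) -> float:
    """Return the county rate as a fraction of the reference rate."""
    return county_rate / reference_rate


# Age-adjusted Alzheimer's disease death rates per 100,000 residents,
# as quoted in the text above.
montgomery_rate = 13.3
us_median_rate = 27.3

ratio = rate_ratio(montgomery_rate, us_median_rate)
difference = us_median_rate - montgomery_rate

print(f"County rate is {ratio:.0%} of the US median")
print(f"Difference: {difference:.1f} deaths per 100,000 residents")
```

In other words, the county's rate is roughly half of the US median.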

County Health Rankings and Roadmaps

County Health Rankings and Roadmaps (created by the Robert Wood Johnson Foundation (RWJF) and the University of Wisconsin) measure “the health of nearly all counties in the nation and rank them within states” using “county-level measures from a variety of national and state data sources.”  Check their Our Approach page for more information on their data sources and ranking methods.

Try searching by state from the County Health Rankings homepage, and then choose a county to view its Rankings data (shown alongside overall state-level data and the county’s rank among other counties in the state) under categories including Health Outcomes (Length of Life and Quality of Life) and Health Factors (Health Behaviors, Clinical Care, Social and Economic Factors, and Physical Environment).  Choose the “Show areas of strength” checkbox at the top of the screen to highlight public health factors where the county has a strong ranking, or choose “Show areas to explore” to highlight categories where the county has a weaker ranking.

Viewing County Health Rankings data for Montgomery County, MD.

Choose the “Compare Counties” option to create charts comparing the public health data of two or more counties (including counties in different states).  For instance, the screenshot below shows a chart comparing County Health Rankings data for Calvert, MD, Fairfax, VA, and Montgomery, MD.

Comparing County Health Rankings data for three different counties.

I also want to highlight a website specifically for my local county (Montgomery County, Maryland) called Healthy Montgomery, which allows users to create customized health dashboards for their local zip code.

From the Healthy Montgomery homepage, choose the Community Health Dashboards option under the Find Data drop-down menu.  You can then choose to view county health dashboards based on a variety of health indicator measurements (like Healthy People 2020 or Maryland SHIP 2017).  You can also build a custom dashboard and filter to view only specific indicators, view data for a specific location (zip codes within Montgomery County), filter by comparisons (like Healthy People 2020 or Maryland SHIP), filter by subgroups (like age, gender, or race), or filter by data source.

The dashboards include helpful icons beside measurement data to indicate whether the measurement is higher or lower than the county/US average, or whether the measurement is trending upward or downward compared to prior values.

Customized dashboard from Healthy Montgomery for Silver Spring, MD (zip code 20910).

While the county-level health data tools like CHSI 2015 and County Health Rankings are useful for getting a general idea about public health in larger communities, I hope all counties will eventually have websites like Healthy Montgomery available to view health status (and local health disparities) at a more granular, neighborhood-based level.

Preserving the Internet: Library of Congress and the Internet Archive

Preserving internet content seems to be a Sisyphean task, especially content from social media.  A recent article from Forbes.com on Why We Need To Archive The Web In Order To Preserve Twitter by Kalev Leetaru made a great point about the challenges of preserving online content:

Perhaps the most important lesson is the reminder that in a networked information world, preserving a single object in isolation may not actually preserve it if it consists of links to other resources which are lost.

The content of the web changes every second, and a website can be taken down at any time.  If a Tweet links to a website that’s no longer available, how useful is an archived version of that Tweet?  And that’s just the tip of the iceberg…the Library of Congress (LOC) has been trying to figure out how to create a usable archive of Tweets since 2010.

Obstacles in Archiving Twitter

An article last year from The Atlantic, Can Twitter Fit Inside the Library of Congress? by Andrew McGill, provides an overview of the agreement the Library of Congress made with Twitter in April 2010:

Twitter promised to hand over all the tweets posted since the company’s launch in 2006, as well as a regular feed of new submissions. In return, the library agreed to embargo the data for six months and ensure that private and deleted tweets were not exposed.

The Library of Congress has the raw data, but it struggles with the ever-growing size and complexity of the Tweets archive.  With 500 million Tweets added a day (in 2012) and the added metadata of embedded images, videos, and conversation threads, the archive of Tweets has become nearly unsearchable with the technology currently available to the LOC.  The Atlantic article quotes an LOC blog post from 2013 that describes how “executing a single search of just the fixed 2006-2010 archive on the Library’s systems could take 24 hours.” Researchers desperately want free access to the Twitter archives, but the sheer volume, variety, and velocity of this big data make it extremely difficult to create an easily searchable portal.  Even if the LOC does create a searchable portal for the Twitter archives, how useful will those preserved Tweets really be without the context of working links?

Preserving the Internet: The Internet Archive

Twitter is just a single social media platform…how can we possibly preserve all versions of all websites ever available on the web?  Many well-written articles have already pondered this question:

One thread uniting these articles is their mention of the Internet Archive, which describes itself as “a non-profit library of millions of free books, movies, software, music, websites, and more.”  You can search everything from copyright records to TV clips of President Trump on the Internet Archive, but the crowning achievement of the site is the Wayback Machine, which allows users to explore more than 299 billion web pages saved over the past two decades.

For example, if I want to explore all archived versions of the MedlinePlus homepage, I can just search by the URL and view 3,551 versions of the page, saved between April 7, 2000 and July 21, 2017.  Some links on the archived pages will take you to similar archived versions of the linked webpages (although the captures of the linked pages may have a different time stamp). Many of the images and drop-down menus are also preserved, so you get a relatively accurate feel for what the webpage looked like during that time. The Wayback Machine is a fascinating tool for cultural and historical research, and it’s even used for more creative purposes like patent searching and improving search engine optimization (SEO).

Capture of the MedlinePlus homepage from April 7, 2000 on the Wayback Machine.
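For anyone who would rather script this kind of lookup than click through the calendar view, here is a minimal Python sketch that builds Wayback Machine links for a page and date.  It assumes the commonly used `web.archive.org/web/<timestamp>/<url>` link pattern and the Internet Archive's availability endpoint; the address `medlineplus.gov` is my assumption for the homepage URL (the 2000 capture may sit under a different address), and the function names are hypothetical.  The sketch only builds the links and does not contact the Internet Archive.

```python
from datetime import date
from urllib.parse import urlencode

# Assumed URL patterns; check the Internet Archive documentation before relying on them.
WAYBACK_PREFIX = "https://web.archive.org/web"
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def wayback_timestamp(day: date) -> str:
    """Format a date as the YYYYMMDD timestamp used in Wayback Machine links."""
    return day.strftime("%Y%m%d")


def capture_url(page_url: str, day: date) -> str:
    """Build a link asking for the capture of page_url on or near the given day."""
    return f"{WAYBACK_PREFIX}/{wayback_timestamp(day)}/{page_url}"


def availability_query(page_url: str, day: date) -> str:
    """Build a query URL asking whether a capture exists near the given day."""
    params = urlencode({"url": page_url, "timestamp": wayback_timestamp(day)})
    return f"{AVAILABILITY_ENDPOINT}?{params}"


# "medlineplus.gov" is an assumed address, used only for illustration.
first_capture_day = date(2000, 4, 7)
print(capture_url("medlineplus.gov", first_capture_day))
print(availability_query("medlineplus.gov", first_capture_day))
```

If no capture exists for the exact date requested, the Wayback Machine generally shows the closest one it has, which is why the time stamps on linked pages can differ.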

Exhibiting the Internet: The Library of Congress

Although the Library of Congress has yet to release a usable Twitter archive, the LOC still offers plenty of smaller online content archives which provide valuable insights into web culture.  The LOC recently announced the release of the Webcomics Web Archive and the Web Cultures Web Archive. The Webcomics archive focuses on “award-winning comics as well as webcomics that are significant for their longevity, reputation or subject matter”, while the Web Cultures archive includes “a representative sampling of websites documenting the creation and sharing of emergent cultural traditions on the web such as GIFs, memes and emoji.”

Each archived website includes a metadata page with a representative screenshot and bibliographic data about the website (including a summary and description of the site).  The archived website page also links to a timeline of all captured versions of the site.  For example, the Cute Overload! 😉 archived website page links to 122 captures of the Cute Overload homepage between October 3, 2006 and June 1, 2016.

Archived Website page for Cute Overload on the Library of Congress website.

While the Internet Archive aims for quantity and preserving as many webpage captures as possible, the Library of Congress online collections aim for a representative sample of high-quality sites.  The Library of Congress collections also include helpful metadata for each archived website, so they are easily discoverable.  The LOC collection can be used as an internet history museum, while the Internet Archive Wayback Machine is the closest thing we currently have to an actual archive of the internet.  Hopefully we’ll eventually also have access to a full Twitter archive from LOC, but that may be years down the road.

Takeaways from MLA 2017

I just attended my first Medical Library Association (MLA) Annual Meeting (this year in Seattle, WA), and I came away with a lot of great new ideas, resources, and news from the health sciences information field.  I’m still trying to absorb everything I’ve seen and learned over the past few days, but here’s a quick list of some of my most interesting takeaways from the conference:

  • Open Access Biomedical Journals – The vendor hall offered me the opportunity to explore the online tools and publications available from a variety of biomedical publishers, and I checked around for any open access resources they offered.  A few open access publications and resources I came across include:
  • Data Resources – 
New data resources portal from NNLM.
  • LibGuides to Explore – I find LibGuides very useful, so I kept an eye out during the poster sessions for any interesting projects related to LibGuides. Two fantastic LibGuides I learned about:
    • Mobile Resources for Health from the University of Florida – Learn about health-related apps, ranging from apps for healthcare professionals (clinical apps, administrative/productivity apps, E-journal and literature database apps, etc.) to apps for patient education.  The LibGuide is mobile-friendly, so learn about healthcare apps on your phone!
    • Disability Resource Guide from the University of Illinois – Learn about a variety of physical and mental disabilities, including depictions of the disability in popular literature and media, web/reference/academic resources, and common assistive technologies related to the disability.
  • New Online Learning Portal for MLA – The Medical Library Association recently launched MEDLIB-ED, an online education portal for health information professionals where users can “find, complete, track, and claim credit for educational activities.”  A free competencies self-assessment is available where users can learn about the newly revised MLA Competencies for Lifelong Learning and Professional Success, rate their skills, and use the ratings to plan professional development.
  • Product Updates from National Library of Medicine (NLM) – The NLM provided updates about a number of their free online tools, including:

These are just a few of my favorite highlights, but check Twitter for #MLAnet2017 for more updates and insights on the conference!

Visualizing Library Data in Socrata and Tableau

I decided it was time to experiment with Tableau again, and what better way to practice than using data from my local public library system, Montgomery County Public Libraries?  Locating MCPL data was almost as fun as using Tableau, since I was able to learn about and experiment with another data sharing and visualization tool called Socrata.

Socrata is a cloud-based platform that government organizations can use to host and share public data sets.  Montgomery County uses Socrata to power dataMontgomery, where I found a data set called Gov Stat MCPL Spreadsheet, listing Montgomery County Public Library performance measures.  The Socrata platform offers tools for filtering, sorting, visualizing, and exporting data sets, so I was able to filter and visualize the data in charts (like actual and projected numbers of “attendance of library programs” by fiscal year, displayed in a line graph).

Data visualization in Socrata (actual and projected numbers of “attendance of library programs” by fiscal year).

I was also able to export the full data set to a CSV file in Socrata, which I then saved to Excel and uploaded to Tableau to practice creating a dashboard.  In my first Tableau viz I used the Story format (basically, a slide show of graphs and charts).  For my second viz, I decided to try the Dashboard format, where I can organize multiple charts on a single screen.  I created four charts but was only able to fit two of the charts comfortably on the dashboard screen (“Actual and Projected Attendance” and “Use of Library Services and Website”).  Here’s the completed viz, Service Usage and Attendance at Montgomery County Public Libraries (MCPL).

My second Tableau viz.
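For readers who want to peek at an exported CSV before loading it into Tableau, here is a minimal Python sketch that pulls one performance measure out of the file.  The column names ("Fiscal Year", "Measure", "Actual", "Projected") and every value in the sample are placeholders I made up for illustration; the real dataMontgomery export may be laid out differently.

```python
import csv
import io

# Placeholder rows standing in for the exported file; not real MCPL figures.
SAMPLE_CSV = """Fiscal Year,Measure,Actual,Projected
FY1,Attendance of library programs,100,
FY2,Attendance of library programs,120,
FY3,Attendance of library programs,,130
FY1,Another measure,5,
"""


def to_int(value: str):
    """Convert a CSV cell to an int, treating an empty cell as missing."""
    return int(value) if value else None


def series_for_measure(csv_file, measure: str):
    """Return (fiscal year, actual, projected) tuples for one measure."""
    rows = []
    for row in csv.DictReader(csv_file):
        if row["Measure"] == measure:
            rows.append(
                (row["Fiscal Year"], to_int(row["Actual"]), to_int(row["Projected"]))
            )
    return rows


series = series_for_measure(io.StringIO(SAMPLE_CSV), "Attendance of library programs")
for fiscal_year, actual, projected in series:
    print(fiscal_year, actual, projected)
```

To run this against a real export, replace `io.StringIO(SAMPLE_CSV)` with `open("export.csv", newline="")` (a hypothetical file name) and adjust the column names to match the file.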

I love experimenting with Tableau, but the best part of this exercise was learning about the data sharing and visualization capabilities of Socrata.  A quick Google search for “Socrata government data” shows that many local and state governments use Socrata to share data sets with the public (for example, Baltimore and Hawaii).  Federal government institutions also use Socrata to share data sets, like the open data catalog for the Institute of Museum and Library Services or NASA’s open data portal.  It’s a promising sign that both local and federal governments are making it a priority to openly share data with researchers and the general public, so anyone can use the data in new and creative ways.

How can health science librarians get involved in big data?

The following reflection was written for the online class Big Data in Healthcare: Exploring Emerging Roles, a fantastic free course provided by the National Network of Libraries of Medicine (NNLM).

Enormous data sets containing a broad variety of information produced at high velocity are transforming the healthcare field.  This “big data” is being used for clinical research, patient diagnosis and treatment, analysis of public health trends, and in many other innovative ways to move healthcare into a new era of highly personalized medicine.  Patients provide the health data, programmers and data scientists create new tools to manipulate the data, and clinicians and other healthcare professionals consult and analyze the data.  Health science librarians may wonder what roles they can play in this daunting but incredibly important new domain.  Librarians can use their specialized skills to fill three key roles in the big data field: they can act as liaisons between healthcare professionals and programmers, they can act as advocates for patients, and they can act as educators for patients and healthcare professionals.

Librarians regularly perform reference interviews and user needs assessments to determine the information and programming needs of their patrons, and these skills can help librarians become effective liaisons between healthcare professionals and programmers who create tools to manipulate big data.  In the presentation The Triple Aim at the Front Lines: Lessons from a VA Experience in using data to drive change, Dr. Nick Meo describes how in order to create more effective data tools for physicians, programmers need to know how frontline physicians are using these tools in their everyday practice.  Librarians can be the intermediaries in this situation.  After performing reference interviews, focus groups, and other forms of needs assessments with healthcare professionals, the librarian can then work with programmers to create data tools that fit the information needs and diagnostic/treatment processes of the healthcare team.

Librarians can also act as advocates for patients, by learning about patient concerns related to use of their personal health data and communicating these concerns to both the programmers and healthcare professionals.  In the article A ‘green button’ for using aggregate patient data at the point of care, Christopher Longhurst, Robert Harrington, and Nigam Shah suggest a change to HIPAA, so that it will be “acceptable for front-line clinicians to use aggregate patient data, even if identified, for the purpose of treating a similar patient under their care” (1233).  This idea may make aggregated patient data more easily accessible to clinicians, but how would patients feel about their personal health data being used in this manner?  Librarians can work with patients to gain their viewpoints on possible new uses for health data like the suggested “green button”, and patients may reveal ethical, privacy, or security concerns that programmers and healthcare professionals had not previously considered.

Finally, librarians can act as educators for both healthcare professionals and patients to demonstrate the value of utilizing big data in healthcare. In the article Big data and new knowledge in medicine: the thinking, training, and tools needed for a learning health system, Harlan Krumholz describes how healthcare professionals will need to change their viewpoints about best practices for research in order to fully embrace big data.  Librarians can begin changing viewpoints by presenting healthcare professionals with concrete examples of how big data has been used to improve patient care, as well as training resources for learning more about data science.  Librarians can also promote patient participation in big data initiatives by explaining how the projects will benefit public health.  For instance, librarians can explain to patients and the general public how participation in the All of Us Research Program may improve personalized medicine for current and future generations.

Health science librarians don’t need advanced programming skills or a medical degree as a prerequisite to work with big data.  Librarians already possess valuable communication and training skills which will make them effective liaisons between patients, healthcare professionals, and programmers who contribute to generating, analyzing, and creating tools for big data.

Learning Tableau

Creating interactive visualizations of data to tell a story is a great skill to have, but what if you don’t have programming skills?  I fall in the non-programmer boat (although hopefully I can fix that knowledge gap this year by learning R), but fortunately there are a ton of free online visualization tools, many of which don’t require programming knowledge.  Tableau is one option for creating free or low-cost interactive visualizations of large data sets using a drag-and-drop interface.

What is Tableau?

Tableau is data visualization software that includes both subscription and free versions.  The free version of the software is called Tableau Public.  Through Tableau Public, users can download the Tableau Desktop Public Edition app, upload and clean data, create visualizations, and then save and store visualizations (called “vizzes”) to their Tableau Public Profile.  You get 10GB of space in your Public Profile, and vizzes can be shared and embedded on websites and blogs.

How Can Libraries Use Tableau?

A quick search of Tableau Public shows some academic libraries using Tableau to create dashboards of library usage statistics (see Library Assessment for UMass Amherst Libraries or LibraryViz@OSU for Ohio State University Libraries).  Public libraries (like Brooklyn Public Library) may use a subscription version of the tool for in-depth usage analytics and decision making.

Getting Started

For my first Tableau visualization, I decided to use a relatively large data set downloaded from PillBox.  PillBox is a database from the National Library of Medicine and can be used to identify unknown pills and capsules by visual indicators like color, shape, and size.  I wanted to explore the pill shapes, colors, and distributors for pills containing the active ingredient Acetaminophen.

My first (rather rough) attempt at a Tableau visualization can be found here.

Visit the full visualization.

I mostly just figured out how to use the interface through trial and error and Googling any questions I had about the tool (there’s a large and active user base for the software, thankfully).  The Tableau website also offers some basic tutorial videos.

A few random thoughts:

  • I had trouble using the Tableau Desktop Public Edition app on my Dell, since the Dell Backup and Recovery program interfered with the app.  I had to uninstall Dell Backup and Recovery to get Tableau to work.
  • Now that Google has gotten into the data visualization game with Google Data Studio, Tableau had better up its game.  Here’s a comparison of Google Data Studio and Tableau I found interesting.

Thoughts on Crowdsourcing: Patents, Health, and Libraries

While thinking about the hype surrounding “Big Data”, I started to think about another term that I’ve also been hearing tossed around a lot in the patent, health, and library fields: crowdsourcing. The Merriam-Webster Dictionary defines “crowdsourcing” as:

The practice of obtaining needed services, ideas, or content by soliciting contributions from a large group of people and especially from the online community rather than from traditional employees or suppliers.

Crowdsourcing has come of age in a digital environment, especially over social media, where individuals or organizations can solicit help from thousands of people instantly.  Everyone seems to be hopping on the crowdsourcing bandwagon, due to the low cost and speed of implementation and the sometimes highly creative responses provided by participants.  The main downside to crowdsourcing seems to be the need to sift through large amounts of contributions of highly varying quality (which may be an instance where “Big Data” skills and tools can come in handy).

Here are just a few examples of crowdsourcing in the patent, health, and library fields:

  • Patent Searching – A few companies, like Patexia and Article One Partners, successfully crowdsource patent searches by offering prizes and rewards to any participants who locate and submit highly relevant patent prior art.  Fledgling searchers can use free online patent databases, like Google Patents, Espacenet, and WIPO PATENTSCOPE (Patexia also offers free patent search tools), and participants receive guidance and support through online communities created on the company websites.
  • Challenges from the National Institutes of Health (NIH) – NIH has offered over a dozen contests where researchers can submit solutions to various challenges, like “A Wearable Alcohol Biosensor” or “Innovation in Breast Cancer Genetics Epidemiology.” Government agencies like NIH can use the Challenge.gov site to post “a problem or question to the public and ‘solvers’ respond and submit solutions. An agency pays only for those solutions that meet the criteria and are chosen as winners.”
Recent challenges from the National Institutes of Health on Challenge.gov.
  • Libraries Embrace Crowdsourcing – I can’t even pick a single example.  A multitude of blog posts, opinion pieces, and articles describe how institutional and public libraries use crowdsourcing for a variety of projects:
    • Library of Congress makes catalog corrections and enhancements to photographs in their collection, based on comments from users on Flickr.
    • New York Public Library is asking “citizen volunteers to provide identification, transcription, tagging and more” for a digitized collection of “bond and mortgage records from The Emigrant Savings Bank during the years 1841–1933.”
    • The Biodiversity Heritage Library, a consortium of natural history museums and botanical garden libraries, is “testing the effectiveness of gaming for crowdsourcing OCR text correction” and also “using crowds to verify the accuracy of semantic markup of text that was done by automated algorithms.”
    • The British Library has created a portal called LibCrowds, which lists challenges like “help create a catalogue of Lord Chamberlain’s Plays and Correspondence.”

I haven’t even discussed one of the greatest crowdsourcing achievements, Wikipedia, which has created a vast free online encyclopedia of modern human knowledge.

Crowdsourcing has drawbacks, but so many of its possible applications result in brilliant discoveries and new tools that harness the online knowledge base that I can’t even begin to list them all.