Unboxing Averages: Analysis of Descriptions of Adverse Events Related to Cardiovascular Devices | Epstein Becker & Green
While this column typically uses data visualizations that you’ve probably seen before, I want to introduce one that you might not have. It comes from the field of text analysis. In FDA data, the most interesting information is often found not in an easily quantifiable data field but in narrative text. Take, for example, medical device adverse event reports, or “MDRs”. While we can do a statistical analysis of MDRs showing, for example, which product categories have the most reports, the really interesting information is in the descriptions of the events.
Why should you care? Those who focus on product quality want to learn from these stories, and what better way than to learn from everyone’s stories, not just their own? We can use MDR data to explore experiences with broad or narrow product categories. In 2020, for example, there were over 1.5 million such reports.
In this month’s column, I’m going to focus on cardiovascular equipment. I could just as well have focused on software, implants, braces, or any other term that covers a wide range of products. Because this technique isn’t common, I’ll explain the methodology first and then present the visualization.
I use what is known in data science as topic modeling to extract information from the event descriptions in MDRs. Topic modeling is an approach that gives us an overview of a long list of documents. Essentially, it seeks to identify the common topics covered in a corpus of documents.
While the output looks somewhat like English, it isn’t. The algorithm searches for words that are frequently used together and orders them by their statistical significance, not as an English writer would arrange them. Additionally, to help the algorithm recognize that the words “implant”, “implanted”, “implants” and “implantation” are all closely related, we reduce each word to its root. In the table you will see all variations of the word valve reduced to “valv”. It takes some getting used to.
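To make the root-reduction step concrete, here is a minimal sketch in Python. It is a toy suffix stripper written only for illustration; the function name `toy_stem` and its suffix list are assumptions, and the column does not say which stemmer was actually used. A production stemmer handles far more cases.

```python
# Toy stemmer for illustration only; NOT the stemmer used in the column.
# Suffixes are tried in this order and at most one is removed per word.
SUFFIXES = ("ation", "ing", "ed", "es", "s", "e")

def toy_stem(word):
    """Strip one common English suffix, keeping a root of at least 3 letters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

implant_forms = ["implant", "implanted", "implants", "implantation"]
implant_roots = {toy_stem(w) for w in implant_forms}   # all forms share one root
valve_roots = {toy_stem(w) for w in ["valve", "valves"]}

print(implant_roots)
print(valve_roots)
```

Collapsing word forms this way means the counts for one concept are pooled rather than split across several spellings, which is why roots such as “valv” show up in the output.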
Also, since context is usually important for understanding how words are used, we look for phrases, in this case pairs of words (bigrams) that frequently appear together. We could use phrases of any length, but longer phrases require far more computing resources. In this case, two-word phrases seem to work fine.
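The two-word phrase step can be sketched as follows. This is a simplified illustration using only the Python standard library: it joins adjacent word pairs that occur at least `min_count` times. The function names, the threshold, and the sample tokens are all assumptions made up for this example, not the column's actual code or data.

```python
from collections import Counter

def find_frequent_bigrams(docs, min_count=2):
    """Return adjacent word pairs seen at least min_count times in the corpus."""
    counts = Counter()
    for tokens in docs:
        counts.update(zip(tokens, tokens[1:]))
    return {pair for pair, n in counts.items() if n >= min_count}

def merge_bigrams(tokens, bigrams):
    """Join each frequent pair into a single token such as 'balloon_ruptur'."""
    merged, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) in bigrams:
            merged.append(tokens[i] + "_" + tokens[i + 1])
            i += 2  # skip the second word of the merged pair
        else:
            merged.append(tokens[i])
            i += 1
    return merged

# Invented, already-stemmed event descriptions (hypothetical data).
docs = [
    ["balloon", "ruptur", "during", "inflat"],
    ["guidewir", "tip", "separ"],
    ["balloon", "ruptur", "report"],
    ["guidewir", "tip", "detach"],
]
bigrams = find_frequent_bigrams(docs)
merged_docs = [merge_bigrams(d, bigrams) for d in docs]
print(merged_docs)
```

Extending this to three-word phrases would mean counting every adjacent triple as well, which illustrates why longer phrases cost more to compute.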
This exercise is a mixture of art and science, and one of the judgments to be made is the number of topics to look for. We usually make this decision on the basis of what we call coherence, a statistical measure of a common-sense idea: how well the words in a topic fit together. We want to find topics that are meaningful to humans. Another place where judgment comes in is removing words that don’t interest us because they are too common and uninformative. I deleted words like “complaint” and “patient” because they appeared in many event descriptions and didn’t add any particularly useful information.
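Both judgment calls can be illustrated with a short sketch. The first function drops the uninformative words named above. The second computes one common measure of how well a topic's words fit together, usually called UMass coherence, which rewards word pairs that appear in the same documents. The column does not say which measure was used, so treat the choice of UMass, the function names, and the sample documents as assumptions.

```python
import math
from itertools import combinations

# The two words named in the column; a real list would be longer.
CUSTOM_STOPWORDS = {"complaint", "patient"}

def remove_stopwords(tokens, stopwords=CUSTOM_STOPWORDS):
    """Drop words judged too common to be informative."""
    return [t for t in tokens if t not in stopwords]

def umass_coherence(top_words, docs):
    """UMass-style coherence: sum of log((D(w_i, w_j) + 1) / D(w_i)) over
    pairs where w_i ranks above w_j. Assumes every top word occurs in docs."""
    doc_sets = [set(d) for d in docs]

    def doc_freq(*words):
        # Number of documents containing all of the given words.
        return sum(1 for s in doc_sets if all(w in s for w in words))

    score = 0.0
    for i, j in combinations(range(len(top_words)), 2):
        higher, lower = top_words[i], top_words[j]
        score += math.log((doc_freq(higher, lower) + 1) / doc_freq(higher))
    return score

# Invented, already-stemmed event descriptions (hypothetical data).
raw_docs = [
    ["patient", "balloon", "ruptur", "inflat"],
    ["complaint", "balloon", "ruptur", "leak"],
    ["patient", "batteri", "deplet", "earli"],
    ["complaint", "batteri", "deplet", "replac"],
]
clean_docs = [remove_stopwords(d) for d in raw_docs]

coherent_score = umass_coherence(["balloon", "ruptur"], clean_docs)
mixed_score = umass_coherence(["balloon", "batteri"], clean_docs)
print(coherent_score, mixed_score)
```

A topic whose top words share documents scores higher than one that mixes unrelated words. In practice, one would fit models with several different topic counts and prefer the count whose topics score best on average, while still checking that the topics make sense to a human reader.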
From a technical standpoint, for those interested, I use a specific technique called Latent Dirichlet Allocation, a form of unsupervised learning, implemented via the Python library gensim.
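For readers who want to see the mechanics, below is a minimal sketch of Latent Dirichlet Allocation written in plain Python using collapsed Gibbs sampling. It is deliberately not the library implementation the column relies on; it is a small teaching version, and the function names, the hyperparameters `alpha` and `beta`, the iteration count, and the sample documents are all assumptions chosen for illustration.

```python
import random

def lda_gibbs(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Fit a tiny LDA model by collapsed Gibbs sampling.
    Returns (vocab, phi), where phi[k][v] is P(word v | topic k)."""
    rng = random.Random(seed)
    vocab = sorted({w for d in docs for w in d})
    V = len(vocab)
    word_id = {w: i for i, w in enumerate(vocab)}

    doc_topic = [[0] * num_topics for _ in docs]        # words in doc d assigned to topic k
    topic_word = [[0] * V for _ in range(num_topics)]   # times word v is assigned to topic k
    topic_total = [0] * num_topics                      # total words assigned to topic k
    assignments = []

    # Start from a random topic assignment for every word occurrence.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_doc.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(z_doc)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for n, w in enumerate(doc):
                v, k = word_id[w], assignments[d][n]
                # Remove this word's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                # Resample its topic in proportion to how well each topic
                # fits both the document and the word.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][v] + beta) / (topic_total[t] + V * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][n] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1

    phi = [
        [(topic_word[k][v] + beta) / (topic_total[k] + V * beta) for v in range(V)]
        for k in range(num_topics)
    ]
    return vocab, phi

def top_words(vocab, phi_row, n=3):
    """Return the n highest-probability words for one topic."""
    ranked = sorted(range(len(vocab)), key=lambda v: phi_row[v], reverse=True)
    return [vocab[v] for v in ranked[:n]]

# Invented, already-stemmed event descriptions (hypothetical data).
docs = [
    ["balloon", "ruptur", "inflat", "balloon"],
    ["balloon", "ruptur", "leak"],
    ["batteri", "deplet", "earli", "batteri"],
    ["batteri", "deplet", "replac"],
]
vocab, phi = lda_gibbs(docs, num_topics=2)
for k, row in enumerate(phi):
    print(k, top_words(vocab, row))
```

The per-topic word probabilities in `phi` are the kind of values a heat map of topics can display, with the highest-probability words shaded darkest. On a corpus of tens of thousands of reports, a tested library is the practical choice; this sketch is only meant to show what “unsupervised” means here: no one labels the reports, and the topics emerge from word co-occurrence alone.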
For cardiovascular equipment, I thought it might be interesting to compare the topics found in MDRs from 2010 with those from 2020. I wanted to see how much had changed over time. In 2010, there were over 45,000 MDRs for cardiovascular equipment, and in 2020, there were almost 85,000.
A word about reading these tables. They are called heat maps. Colors correspond to the intensity or value of a particular word, in this case within a topic. The darker the color, the more important the word is in characterizing the meaning of the topic.
A cardiovascular expert will no doubt be better able to understand the significance of this data, but it is interesting to note that the topics the software uncovered in 2010 and in 2020 turned out to be very similar. The order has changed, but many of the topics are much the same. For professionals, this can be a bit depressing, as it suggests a lack of progress in solving common problems.
Ruptured balloons always seem to be a problem. Guidewire tips always seem to be a problem. Battery issues always seem to be a problem. But we also see new issues, such as problems with a surface cooling device (Arctic Sun®) used for therapeutic hypothermia after cardiac arrest.
This topic modeling technique can be applied as broadly or as narrowly as we want. It can be very useful for analyzing trends when the valuable underlying data is found in large volumes of text rather than in structured data. In the next few columns, I’ll dig into other sources of regulatory text to see what information we can extract.