
Over the past two years, health disparities, including inequalities in economic status, access to healthcare, housing stability, English language skills, and education, have translated into a disproportionate COVID-19 toll on underserved individuals and communities. It has long been recognized that the conditions under which we are born, live, learn, work, play, worship, and age, commonly referred to as the Social Determinants of Health (SDOH), have a tremendous impact on our health. Yet exactly how SDOH and the resulting health disparities exacerbate COVID-19 risk, and how to counteract them effectively, is complex. For example, limited internet access has been a persistent barrier to scheduling COVID-19 vaccination appointments, compounding other challenges such as limited access to transportation, childcare, and time off from work. Surveys have shown, and our experience mining caller interactions from our COVID-19 vaccination scheduling hotlines has confirmed, that a lack of information and uncertainty about logistics and eligibility make it much harder for underserved individuals to access vaccinations. Confusion about eligibility, insurance and documentation requirements, and clinic locations, together with inflexible appointment times, kept many otherwise willing people from quickly getting their shots. Unfortunately, representative data to better quantify these impacts and compare them with the many other health disparities are still lacking. As this example shows, SDOH and the resulting health disparities are often complex and intertwined, and data to pinpoint strategies for breaking them down remain scarce. The nation’s public health systems will need investment and modernization to close this gap.

Systematically breaking down health disparities: artificial intelligence unlocks SDOH data.

Data on patients’ SDOH, and on the health disparities experienced as a result, are still not systematically available. For instance, although SDOH data can now be recorded in a structured, standardized, and easy-to-analyze way in Electronic Health Records (EHRs), clinicians still rarely use this option: it appears in fewer than 2% of patient records, according to one recent study. Even when captured, the information recorded this way represents only a few SDOH dimensions, predominantly homelessness, disappearance and death of a family member, problems related to living alone or in residential institutions, or conflict in a relationship with a spouse or partner. Instead, clinicians still record most of the information relevant to patients’ SDOH and resulting health disparities as hard-to-analyze, unstandardized, and unstructured text in clinician notes or other free-text EHR fields.

AI and Natural Language Processing (NLP) tools can be directed to extract and standardize this SDOH data, finally making it available for analysis. In our experience, these NLP tools provide a versatile, highly automatable, cost-effective, and quite reliable way to extract relevant information from unstructured data, consistent with findings in the literature. For instance, a recent review article identified 42 studies that used NLP tools in this way. The SDOH and health disparities analyzed varied widely: from identifying homelessness and housing instability to extracting information about a patient’s lifestyle, identifying social risk factors such as living alone or having a poor social support network, and characterizing a patient’s socioeconomic status.

The AI tools used also varied widely. Several studies used ‘rules-based’ methods to search for predefined key terms. In contrast, others used unsupervised learning approaches that mined the data to determine, rather than presuppose, the relevant key terms. For example, one of the more advanced unsupervised methods used topic modeling to group words by underlying topics such as drug use, housing stability, or hospital care: broader concepts that may be described by a range of different words and phrases that nonetheless point to the same underlying issue. Another method relies on deep neural network models that incorporate contextual information about where words appear in the text to better characterize their meaning, for instance to describe lack of social support, educational attainment, or indicators of a history of self-harm or violence.
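As a rough illustration of the rules-based approach described above, the following minimal Python sketch searches a free-text note for predefined key terms and groups the matches by SDOH category. The category names, the term lists, and the sample note are hypothetical and are not taken from any of the studies mentioned; a real lexicon would need clinical validation.

```python
import re

# Hypothetical key-term lexicon (illustrative only, not clinically validated).
SDOH_TERMS = {
    "housing_instability": ["homeless", "eviction", "shelter", "couch surfing"],
    "social_isolation": ["lives alone", "no family support", "socially isolated"],
    "transportation": ["no transportation", "lacks transportation"],
}


def compile_patterns(terms):
    """Build one case-insensitive regex per SDOH category."""
    return {
        category: re.compile(
            # \b anchors the start of a term; \w* also accepts simple
            # suffixes such as "homelessness" or "shelters".
            r"\b(?:" + "|".join(re.escape(p) for p in phrases) + r")\w*",
            re.IGNORECASE,
        )
        for category, phrases in terms.items()
    }


def extract_sdoh(note, patterns):
    """Return {category: [matched terms]} for every category found in the note."""
    found = {}
    for category, pattern in patterns.items():
        matches = [m.group(0).lower() for m in pattern.finditer(note)]
        if matches:
            found[category] = matches
    return found


if __name__ == "__main__":
    patterns = compile_patterns(SDOH_TERMS)
    note = "Patient is currently homeless and lives alone. Denies tobacco use."
    print(extract_sdoh(note, patterns))
```

A sketch like this does not handle negation ("denies homelessness" would still match) or misspellings, which is one reason some studies turn to the unsupervised and neural approaches described above.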

Artificial Intelligence can pinpoint where to address health disparities.

AI models can untangle deeply intertwined risk factors and offer valuable insights into who will likely suffer the most significant public health impacts and why. We have successfully leveraged AI tools to support public health for this purpose since 2017, and the literature further supports their value. For instance, a recent review described a variety of Machine Learning (ML) models that reliably predicted the risk of cardiovascular disease as a function of multiple SDOH risk factors. However, the availability of SDOH data, particularly data related to the environment, was a key challenge for the models. As another case in point, a recent study used an ensemble of ML models to predict COVID-19 cases in Tennessee and found that socioeconomic and environmental factors, including access to healthcare and transportation, were increasingly important risk factors for COVID-19 illness as the pandemic progressed. At the same time, the relative impact of age, race, and ethnicity decreased as more people were vaccinated and their vaccination status was included in the model. Finally, in another study, a neural-network AI model based on SDOH data successfully identified Medicare Part D beneficiaries experiencing barriers to using an automated medication refill tool, flagging them for additional outreach to improve refill adherence.

Unfortunately, AI can also inadvertently harm health equity.

One of AI’s biggest strengths, and perhaps its Achilles heel, is the ability to use “unsupervised” learning algorithms to identify patterns in the data and build complex predictive models from them without any explicit assumptions about underlying root causes. Unfortunately, the population health data available to train these models are often not representative. For instance, data are often limited to a primarily white population that is too homogeneous in age, health status, disabilities, socioeconomic status, and other SDOH risk factors. The resulting models often apply poorly to minority or otherwise disadvantaged populations, harming health equity and perpetuating racial bias. In addition, ML algorithms themselves can be plagued by algorithmic bias. Finally, persistent workforce, skill, and experience gaps, as well as eroding trust in the equity of AI models, can hinder equitable benefits from SDOH insights, even when the AI models are well built and well trained.
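One simple way to check whether a model applies poorly to some populations is to compare its performance across subgroups. The sketch below is only an illustration under stated assumptions: the group labels and records are made up, and recall (the share of true cases the model catches) is just one of many possible fairness metrics. It computes recall per group and the gap between the best- and worst-served groups.

```python
from collections import defaultdict


def recall_by_group(records):
    """Compute recall per subgroup.

    records: iterable of (group, y_true, y_pred) tuples with 0/1 labels.
    Groups without any positive cases are omitted from the result.
    """
    true_pos = defaultdict(int)
    actual_pos = defaultdict(int)
    for group, y_true, y_pred in records:
        if y_true == 1:
            actual_pos[group] += 1
            if y_pred == 1:
                true_pos[group] += 1
    return {group: true_pos[group] / n for group, n in actual_pos.items()}


def recall_gap(recalls):
    """Difference between the best- and worst-served groups."""
    return max(recalls.values()) - min(recalls.values())


if __name__ == "__main__":
    # Hypothetical predictions for two made-up groups, "A" and "B".
    records = [
        ("A", 1, 1), ("A", 1, 1), ("A", 1, 0), ("A", 0, 0),
        ("B", 1, 1), ("B", 1, 0), ("B", 1, 0), ("B", 1, 0),
    ]
    recalls = recall_by_group(records)
    print(recalls, recall_gap(recalls))
```

A large gap does not by itself explain why a model underperforms for a group, but it can flag where unrepresentative training data or algorithmic bias may be affecting results.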

Where to go from here?

Limited access to complex SDOH data and to sufficiently powerful tools for interpreting those data remains a significant barrier to addressing health disparities. As our own experience and a review of the relevant literature show, AI can be a valuable tool for addressing these challenges and helping to prioritize action. Still, it remains a tool that must be used with care: AI can help break down persistent health disparities or further cement them. Successful examples where we and others have leveraged AI tools to improve population health and reduce health disparities include mining interactions with government benefit program beneficiaries to improve access to services, strengthening public health surveillance and risk prevention, and proactively identifying and counteracting health threats. But unleashing the power of AI for better, more equitable population health holds some potential perils, as outlined recently in HHS’s Trustworthy AI Playbook. Ultimately, it will be up to all of us to make sure AI is trustworthy as we work to improve our insights into public health. To maximize AI’s promise while guarding against its perils, we need our use of AI to foster public trust and confidence while protecting privacy, civil rights, and liberties and upholding the law and our American values.