Machine-Learning Algorithms to Code Public Health Spending Accounts

Title: Machine-Learning Algorithms to Code Public Health Spending Accounts
Publication Type: Journal Article
Year of Publication: 2017
Authors: Brady, ES, Leider, JP, Resnick, BA, Alfonso, YN, Bishai, D
Journal: Public Health Rep
Volume: 132
Pagination: 350-356
Date Published: May/Jun
ISBN Number: 0033-3549
Accession Number: 28363034
Keywords: health finance, machine learning, Public Health
Abstract

OBJECTIVES: Government public health expenditure data sets require time- and labor-intensive manipulation to summarize results that public health policy makers can use. Our objective was to compare the performances of machine-learning algorithms with manual classification of public health expenditures to determine if machines could provide a faster, cheaper alternative to manual classification.

METHODS: We used machine-learning algorithms to replicate the process of manually classifying state public health expenditures, using the standardized public health spending categories from the Foundational Public Health Services model and a large data set from the US Census Bureau. We obtained a data set of 1.9 million individual expenditure items from 2000 to 2013. We collapsed these data into 147,280 summary expenditure records, and we followed a standardized method of manually classifying each expenditure record as public health, maybe public health, or not public health. We then trained 9 machine-learning algorithms to replicate the manual process. We calculated recall, precision, and coverage rates to measure the performance of individual and ensembled algorithms.

RESULTS: Compared with manual classification, the machine-learning random forests algorithm produced 84% recall and 91% precision. With algorithm ensembling, we achieved our target criterion of 90% recall by using a consensus ensemble of ≥6 algorithms while still retaining 93% coverage, leaving only 7% of the summary expenditure records unclassified.

CONCLUSIONS: Machine learning can be a time- and cost-saving tool for estimating public health spending in the United States. It can be used with standardized public health spending categories based on the Foundational Public Health Services model to help parse public health expenditure information from other types of health-related spending, provide data that are more comparable across public health organizations, and evaluate the impact of evidence-based public health resource allocation.
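As an illustration of the consensus ensembling and the three reported metrics, the following is a minimal Python sketch, not the authors' implementation. It assumes that each of the 9 algorithms casts one vote per expenditure record, that a record is classified only when at least 6 votes agree, and that recall and precision are computed for the "public health" class over the classified records. The function names, the toy votes, and these metric definitions are hypothetical; the abstract does not specify them.

```python
from collections import Counter

def consensus_label(votes, min_agree=6):
    """Return the label that at least `min_agree` voters chose, else None.

    With 9 voters and min_agree=6, at most one label can reach the threshold.
    """
    label, count = Counter(votes).most_common(1)[0]
    return label if count >= min_agree else None

def evaluate(all_votes, manual_labels, positive="public health", min_agree=6):
    """Compute coverage, recall, and precision against manual labels.

    Assumed definitions (not taken from the paper):
    coverage  = share of records that reach consensus;
    recall and precision are for the `positive` class on covered records only.
    """
    predictions = [consensus_label(v, min_agree) for v in all_votes]
    covered = [(p, m) for p, m in zip(predictions, manual_labels) if p is not None]
    coverage = len(covered) / len(predictions)
    tp = sum(1 for p, m in covered if p == positive and m == positive)
    fp = sum(1 for p, m in covered if p == positive and m != positive)
    fn = sum(1 for p, m in covered if p != positive and m == positive)
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    return {"coverage": coverage, "recall": recall, "precision": precision}

# Toy data (invented for illustration): 4 records, 9 votes each.
PH, MAYBE, NOT = "public health", "maybe public health", "not public health"
toy_votes = [
    [PH] * 7 + [NOT] * 2,    # consensus: public health
    [PH] * 5 + [NOT] * 4,    # no consensus, left unclassified
    [NOT] * 9,               # consensus: not public health
    [PH] * 6 + [MAYBE] * 3,  # consensus: public health
]
toy_manual = [PH, PH, NOT, NOT]

result = evaluate(toy_votes, toy_manual)
print(result)
```

Raising `min_agree` in this sketch leaves more records unclassified in exchange for stricter agreement on the ones that remain, which is the kind of trade-off between coverage and accuracy that the abstract reports.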

PMCID: PMC5415260