Summary
Sexually transmitted infections (STIs) continue to pose a substantial public health challenge in the United States (US). Surveillance, a cornerstone of disease control and prevention, can be strengthened to promote more timely, efficient, and equitable practices by incorporating health information exchange (HIE) and other large-scale health data sources into reporting. New York City patient-level electronic health record data between January 1, 2018 and June 30, 2023 were obtained from Healthix, the largest US public HIE. Healthix data were linked to neighborhood-level information from the American Community Survey. In this case-control study, chlamydia, gonorrhea, and HIV-positive cases were compared to controls to estimate the odds of receiving a specific laboratory test or positive result using generalized estimating equations with logit function and robust standard errors. Among 1,519,121 tests performed for chlamydia, 1,574,772 for gonorrhea, and 1,200,560 for HIV, 2%, 0.6% and 0.3% were positive for chlamydia, gonorrhea, and HIV, respectively. Chlamydia and gonorrhea co-occurred in 1,854 cases (7% of chlamydia and 21% of gonorrhea total cases). Testing behavior was often incongruent with geographic and sociodemographic patterns of positive cases. For example, people living in areas with the highest levels of poverty were less likely to test for gonorrhea but almost twice as likely to test positive compared to those in low poverty areas. Regional HIE enabled review of testing and cases using granular and complementary data not typically available given existing reporting practices. Enhanced surveillance spotlights potential incongruencies between testing patterns and STI risk in certain populations, signaling potential under- and over-testing. These and future insights derived from HIE data may be used to continuously inform public health practice and drive further improvements in provision and evaluation of services and programs.
INTRODUCTION
Sexually transmitted infections (STIs) continue to pose a substantial public health challenge. The Centers for Disease Control and Prevention (CDC) estimate there are about 67.6 million prevalent and 26.2 million incident cases of STIs in the United States.1 These statistics largely result from steady increases in rates of common nationally notifiable STIs such as chlamydia and gonorrhea over the last decade. During this same period, longstanding disparities in human immunodeficiency virus (HIV) and other STIs persist unabated among historically minoritized populations. For example, people who identify as Black or African American comprise 40% of new HIV diagnoses despite representing only 14% of the population.2
Traditionally, prevalence and incidence estimates are based on STI surveillance systems that rely heavily on direct notification from healthcare providers and laboratories. Due to this, only positive cases are typically reported to local health agencies as part of standard clinical practice. Lack of reporting on testing (e.g., number conducted) results in lack of information on positivity rates, limiting understanding of differences in population-level risk for public health services and programs. In addition, delays in case reporting and long lag time required for data aggregation and preparation of surveillance reports (often at least 9 months following the review period) substantially hinder real-time response to potential outbreaks.3 Furthermore, fulfilling this vital mandate of public health agencies is often hampered by an ailing and highly fragmented data infrastructure precluding reliable, timely, and accurate data.4
Numerous agencies have prioritized addressing health inequity through enhanced analytical capacity.3,5 Surveillance may be strengthened by incorporating rich, complementary data sources such as readily available public datasets (e.g., national survey data) and regional health information exchanges (HIEs).6 HIEs share electronic health record (EHR) data across multiple health systems and EHR platforms within a defined geographic area, allowing for aggregation of data in a fashion that closely approximates regional surveillance performed by departments of public health. Utilization of HIE data also affords the added advantage of granular, timely, and comprehensive clinical and demographic information typically unavailable using existing reporting practices.7,8 Prior studies involving HIE use have been associated with both cost and time savings on demographic and treatment requests made by public health staff,9 decrease in unnecessary diagnostic testing in emergency departments,10 improved continuity of care and patient outcomes,11–14 and a reduction in racial and ethnic disparities.15 Large publicly available datasets offer additional context when data linkages are made based on common geography, an important dimension of social determinants of health (SDOH).16,17
The objective of this study was to demonstrate how comprehensive, near real-time HIE data and a complementary, large public data source might be leveraged to promote timely and equitable surveillance of HIV and other STIs in support of existing public health reporting. Specifically, we sought to: 1) quantify the number of laboratory tests conducted and percent of tests that resulted in confirmed positive cases of chlamydia, gonorrhea, and HIV; 2) examine trends in concurrent testing for and co-occurring cases of chlamydia, gonorrhea, and HIV; and 3) characterize population-level differences between testing and positivity rates for chlamydia, gonorrhea, and HIV along geographic and sociodemographic dimensions.
METHODS
Study design, setting, and population
We performed a retrospective observational case-control study of adults residing and receiving care in New York City (NYC) between January 2018 and June 2023. We examined testing and test results for chlamydia, gonorrhea, and HIV. Laboratory testing included nucleic acid amplification for chlamydia and gonorrhea and serology testing for HIV. We defined cases as patients who received a test or had a positive test for a given STI and controls as patients who were untested or tested negative, respectively. This study was approved by the institutional review board of Columbia University Irving Medical Center (Protocol AAAT1774) and reporting follows the Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) guideline.
Data
Our primary data source was Healthix, the largest public health information exchange (HIE) in the United States, which serves the NYC metropolitan area and Long Island, New York. Funded by the New York State Department of Health, Healthix collects EHR and administrative data from more than 8,000 healthcare facilities for over 20 million patients. Healthix uses a proprietary probabilistic matching system paired with human intelligence to accurately match data collected for the same patient across multiple care sites.18 Healthix provided a limited dataset via secure file transfer protocol (SFTP) with patient information including demographics, diagnoses, encounters, procedures, lab results, and medications. For each patient in this extract, we requested all chlamydia, gonorrhea, HIV screening tests available. All data files were stored on a secure Linux server compliant with Health Insurance Portability and Accountability Act (HIPAA) standards.
We used the Observational Health Data Sciences and Informatics (OHDSI) Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM) to standardize EHR and laboratory testing center data from different health systems and facilities.19 Source values for patient sex may have represented sex assigned at birth or the administrative sex used for billing purposes (which may reflect gender identity). Race and ethnicity were also often conflated in the source data, in particular with respect to persons identifying as Hispanic, Latina, Latino, Latine, or Latinx. For brevity, we use “Hispanic or Latino.” HIV cases were classified using the Recommended Laboratory HIV Testing Algorithm for Serum or Plasma Specimens.20 Laboratory tests for previously known positive cases and HIV viral load monitoring were removed from the dataset.
For each patient, we derived United Hospital Fund (UHF) neighborhood designations based on grouping of contiguous residential ZIP Code Tabulation Areas (Table S1).21 The NYC Department of Health and Mental Hygiene (DOHMH) commonly uses UHF neighborhood designations in STI surveillance reports. We also converted patient residential ZIP code to ZIP Code Tabulation Area (ZCTA) to identify area-based poverty level (ABPL) based on 2020 American Community Survey (ACS) 5-Year Estimates using the Census Bureau’s Application Programming Interface (API).22 ABPL, defined as the percentage of people earning below the Federal Poverty Threshold (FPT) within a ZCTA, is a standard measure of socioeconomic status (SES) used by the NYC DOHMH to describe and monitor disparities in public health data analyses, including STI reporting.23,24 In accordance with NYC DOHMH standard cut-points, we categorized neighborhood-level poverty as low, medium, high, and very high (<10%, 10% to <20%, 20% to <30%, 30% to 100% of residents in ZCTA living below the FPT, respectively). Patient ages were grouped into categories based on intervals used in recent NYC DOHMH reports (e.g., 20-24, 55-59, 65+).25
Outcomes and Variables
The primary outcomes were 1) occurrence of a laboratory test for chlamydia, gonorrhea, or HIV and 2) determination of a positive test result. For each STI of interest, we examined trends overall and differences based on factors such as patient sex, race, ethnicity, age, residential neighborhood, borough, and ABPL.
Statistical Analysis and Data Visualization
We computed descriptive statistics, including assessment of concurrent testing and co-occurring positive cases, using counts and percentages. We compared proportions using the χ2 test and means using the Student’s t-test. To assess neighborhood-level clustering of rates, we computed Moran’s I statistics. To estimate the odds of testing for a given STI or obtaining a positive test result, we fit unadjusted and adjusted logistic regression models using generalized estimating equations with an exchangeable correlation structure to account for clustering of multiple tests for a given patient. We computed unadjusted and adjusted odds ratios and 95% confidence intervals (CIs) using robust standard errors. Models examined patient sex, race, ethnicity, age, borough of residence, and ABPL. To compute absolute difference between case and test percentages, we divided the number of tests or cases in each UHF neighborhood by citywide totals, then subtracted the percentage of tests performed from the percentage of cases identified (e.g., a positive difference indicates more citywide cases than tests in the neighborhood). We conducted all statistical analyses with p<0.05 indicating statistical significance and generated chloropleth maps using the R programming environment, version 4.1.1 (R Foundation for Statistical Computing, Vienna, Austria).
Role of the funding source
The funders of the study had no role in study design, data collection, data analysis, data interpretation, or writing of the report.
RESULTS
The HIE dataset comprised 4,767,322 patients with a mean age of 46 (standard deviation [SD], 18) years; 61% were identified as female and 38% as male (Table 1). Among patients, 14% were identified as Asian or Pacific Islander, 17% as Black or African American, 18% as Hispanic or Latino, 21% as White; 11% were labeled as “Other” in source data, and 19% were of unknown race. Generally, one to three percent of patients resided in each of the 42 neighborhoods in NYC (Fig. 1). Approximately 66% of patients lived in areas of low or medium ABPL and 33% in areas of high or very high ABPL (Table 1). Sociodemographic differences in the proportion of persons tested compared to those untested and persons with positive test results versus those with negative results were statistically significant (p<0.001 for all comparisons).
Diagnostic Testing and Positivity Rates
During the review period, patients received 1,519,121 tests for chlamydia, 1,574,772 tests for gonorrhea, and 1,200,560 for HIV (Fig. 2). Among tests conducted, 28,251 (2%) were positive for chlamydia (1,860 positive per 100,000 tests), 8,873 (0.6%) for gonorrhea (563 positive per 100,000 tests), and 3,374 (0.3%) for HIV (281 positive per 100,000 tests).
Concurrent Testing and Co-infection
Chlamydia and gonorrhea testing were largely concurrent (1,489,256 shared tests) comprising 98% of chlamydia tests and 95% of gonorrhea tests (Fig. 2). HIV testing in combination with testing for both chlamydia and gonorrhea (n=530,855) represented 44% of HIV tests (35% and 34% of chlamydia and gonorrhea tests, respectively). Dual testing for HIV and gonorrhea (n=27,376) was more common than dual testing for HIV and chlamydia (n=3,250). Single testing for HIV comprised 53% of all HIV testing. Testing was positive for both chlamydia and gonorrhea in 1,854 tests (7% of chlamydia and 21% of gonorrhea total cases). Results were positive for HIV and chlamydia in 23 tests (0.7% of HIV and 0.1% of chlamydia total cases), positive for both HIV and gonorrhea in 22 tests (0.6% of HIV and 0.3% of gonorrhea total cases), and positive for all three STIs in seven tests. One-third of positive chlamydia cases (34%) had an HIV test performed during the same visit (9,659 of 28,251 total cases). Nearly one-fifth (22%) of positive gonorrhea cases had an HIV test performed during the same visit (1,954 of 8,873 total cases).
Testing and Positivity by Population
We identified population-level trends concerning testing behavior and positive cases (Fig. 3, Tables S2-3). Men were nearly half as likely to test for chlamydia (adjusted odds ratio [aOR]:0.62, 95% CI:0.62-0.63) or gonorrhea (aOR:0.63, 95%CI:0.63-0.63) and more likely to test for HIV as women (aOR:1.16, 95%CI:1.15-1.17). In contrast, men were slightly more likely to receive a positive result for chlamydia (aOR:1.09, 95%CI:1.05-1.12), three times as likely to test positive for gonorrhea (aOR:3.28, 95%CI:3.11-3.45), and five times as likely to receive a positive test result for HIV (aOR:5.18, 95%CI:4.79-5.61).
Patients who were Asian or Pacific Islander were nearly as likely to test for gonorrhea or HIV as those who were White (aOR:0.95, 95%CI:0.94-0.95 and aOR:1.03, 95%CI:1.02-1.04, respectively) but far less likely to test positive (aOR:0.50, 95%CI:0.43-0.58 and aOR:0.33, 95%CI:0.26-0.42, respectively). Participants who were Black or African American were nearly as likely to get tested for chlamydia, gonorrhea, or HIV but their odds of receiving a positive result were three to four times as high as their odds of taking a test (e.g., aOR:4.08, 95%CI:3.73-4.46 for positive gonorrhea test compared to aOR:1.06, 95%CI:1.05-1.07 for gonorrhea testing). A similar but smaller difference was also notable in chlamydia testing among patients identified as Hispanic or Latino (aOR:1.13, 95%CI:1.12-1.14 for positive result versus aOR:2.32, 95%CI:2.19-2.46 for testing).
Notably, as age increased, odds of testing and odds of receiving a positive test result for chlamydia and gonorrhea decreased. Likelihood of testing for HIV decreased with age as well, though odds of testing positive did not follow this same trend. Compared to patients in the 18-19 years old category, patients who were 30-34 years old had the highest odds of testing positive (aOR:3.94, 95%CI:2.88-5.39) followed by patients who were 35-39 years old (aOR:3.49, 95%CI:2.54-4.78).
Differences in odds of testing and positivity were also found among patients given the poverty level of the area in which they lived. Compared to patients living in areas with low levels of poverty, those residing in areas with high levels of poverty were about as likely to get tested but were more likely to test positive for chlamydia (aOR:1.36, 95%CI:1.29-1.43), gonorrhea (aOR:1.61, 95%CI:1.45-1.77), and HIV (aOR:1.49, 95%CI:1.30-1.71). In contrast, patients living in areas with very high levels of poverty were less likely to test for either chlamydia (aOR:0.94, 95% CI:0.93-0.95) or gonorrhea (aOR:0.90, 95%CI:0.89-0.91) and more likely to test positive for all three STIs (aOR:1.41, 95%CI:1.33-1.50 for chlamydia; aOR:1.91, 95%CI:1.72-2.12 for gonorrhea; and aOR:1.79, 95%CI:1.55-2.07 for HIV).
Distribution of STIs by Neighborhood
Testing and cases exhibited marked spatial heterogeneity (Fig. 4, Table S4) with positive spatial autocorrelation indicating statistically significant levels of clustering in both testing and cases (Table S5). The largest percentage of chlamydia, gonorrhea, and HIV tests were conducted in the West Queens neighborhood (5.9%, 5.8%, and 6.0%, respectively). By comparison, the highest percentage of positive cases for chlamydia, gonorrhea, and HIV (5.9%, 7.6%, and 7.8%, respectively) were in the Bedford Stuyvesant-Crown Heights neighborhood of Brooklyn (compared to 5.1%, 3.5%, and 3.9%, respectively, in West Queens).
The largest positive absolute differences between case and test percentages for chlamydia, gonorrhea, and HIV were consistently found in the Bedford Stuyvesant-Crown Heights neighborhood of Brooklyn (Table S5). Chlamydia comprised 1.4% more citywide positive cases than the percentage of citywide tests and 3.0% more gonorrhea and HIV citywide cases than tests. The largest negative absolute difference between case and test percentages for chlamydia was in the Borough Park neighborhood of Brooklyn (-1.4%). For gonorrhea, the largest negative absolute difference was in West Queens (-2.3%). Flushing, Queens had the largest negative absolute difference (-2.4%) for HIV.
DISCUSSION
By leveraging data from a regional HIE with detailed, near real-time EHR data standardized across platforms and health systems, we established a robust, scalable data infrastructure to enhance STI surveillance, enable examination of population-level trends, and promote more effective and equitable testing and intervention. Our findings provide valuable information on laboratory-confirmed cases for chlamydia, gonorrhea, and HIV, in particular as they compare to underlying testing patterns and differ across populations and geographies.
The high burden of STIs, as evidenced by the large number of cases, underscores the importance of effective, timely surveillance and targeted interventions. Traditional surveillance heavily relies on direct notification from providers and laboratories, which limits understanding of population-level risk and the efficacy and equity of public services and programs. By incorporating data from HIEs and public datasets with neighborhood-level information relevant to SDOH, we were able to overcome these limitations and gain more comprehensive insights into STI testing and positivity rates.
Unsurprisingly, most patients had concurrent testing for chlamydia and gonorrhea as the most common nucleic acid amplification test performed is capable of identifying both diseases simultaneously. We did not, however, note similarly substantial concurrent testing for HIV in tandem with tests for chlamydia and gonorrhea. This underscores the need for implementing and strengthening integrated strategies that simultaneously screen for multiple STIs, as co-infections can have serious health consequences and contribute to ongoing transmission.26–28 By identifying the occurrence of concurrent testing and co-occurring infections, public health agencies can enhance their strategies for STI prevention and control by addressing missed opportunities for testing and implementing interventions to reduce repeated visits to seek care and the burden of multiple infections.
Enhanced surveillance also revealed notable disparities with respect to both distribution in and alignment between STI testing and positive cases among different socio-demographic groups. Consistent with previous studies, we found that members of certain populations were disproportionately affected by STIs. These findings highlight the urgent need for targeted interventions and strengthening of efforts to promote health equity.
Differences were also found when comparing testing and positivity rates based on patient sex, race, ethnicity, age, neighborhood, and socioeconomic status. Men were less likely to undergo testing for chlamydia and gonorrhea but had higher odds of receiving a positive result. This suggests potential missed opportunities for testing among men. This does not, however, lead us to recommend women receive less testing, as STIs are more commonly asymptomatic in women. Similarly, patients who were identified as Asian or Pacific Islander were almost as likely to test for HIV but far less likely to test positive compared to people who were identified as non-Hispanic white. People identified as Black, African American, Hispanic, or Latino were more likely to receive certain tests and test positive. Nonetheless, relative differences between testing and case rates signal possible undertesting. While STI testing generally decreased as age increased, odds of testing positive for HIV did not following this trend. Differences suggest missed opportunities to test. These findings highlight the complex interplay of socio-demographic factors in STI testing and positivity rates, emphasizing the importance of tailored interventions and culturally competent care. By utilizing HIE data, public health agencies may better identify and address disparities and tailor interventions to specific needs of different populations.
Geospatial analysis demonstrated marked heterogeneity in testing behavior and cases across neighborhoods and clustering of testing and cases. This suggests geography was a key factor among patterns of infection, diagnosis, and treatment. Unsurprisingly, this highlights the importance of the well-established practice of considering geographic in addition to demographic factors for STI prevention and control strategies. Our primary aim in presenting these expected results is to demonstrate the utility of HIE as a data source in characterizing spatial distribution of STIs to help identify high-risk areas, distinguish between sub-populations, and allocate resources more effectively and equitably (e.g., targeting interventions to specific areas via mobile testing sites).
It is widely reported that fragmented data collection systems at state and local levels have been underfunded for years and it has been estimated that modernizing non-federal public health reporting systems would require $8 billion per year for 10 years.4 In the era of big data, existing platforms like HIEs present opportunities to further empower public health professionals to conduct their work.6 This approach enables identification of emerging trends, prediction of outbreaks, and optimization of resource allocation for more targeted, effective population-based interventions. By continuously learning from real-world evidence, public health agencies may adapt strategies and responses to evolving STI dynamics more quickly, ultimately improving outcomes and reducing burden on communities.
Most notably, this data-driven approach aligns closely with a learning health system (LHS).29 The LHS framework can extend to public health systems which primarily operate at a population instead of an individual-service level.30 Indeed, a learning public health system (LPHS) might encompass multiple LHS and traditional health systems and routinely analyze large-scale datasets, extract insights regarding population health, and inform public health practice to enable continual improvement.
Limitations
We acknowledge several limitations to our study in need of address to achieve an equitable LPHS. First, Healthix data did not capture all testing and cases within the NYC metropolitan area. Without a comprehensive list of non-Healthix facilities, we were unable to determine which sites were missing and whether certain clinic types or locations were over/underrepresented. Populations less well represented may have received testing at non-participating facilities. We also used residential ZIP code to perform geospatial analyses. While, our limited data set only allowed for spatial resolution to the ZIP code level, full residential addresses are available to the HIE. Thus, public health agencies may be able to access such information if needed to ensure more precise, targeted intervention. Notably, source data for patient sex and gender as well as race and ethnicity were often conflated in our dataset, which may introduce bias. Nearly 20% of patients were also missing data on race and/or ethnicity. An additional 10% were only assigned a value of “Other” in the source data. While these concerns are commonly found among EHR data, they remain a challenge if we are to support elucidation of disparities and offer actionable findings to inform intervention. In addition, patients on pre-exposure prophylaxis (PrEP) for HIV prevention often undergo regular STI testing. Although our modeling accounts for repeated measures from the same patient, our approach does not fully address how this may influence findings. Lastly, our analysis evaluated patterns related to chlamydia, gonorrhea, and HIV. Future work would be necessary to establish that other STIs (e.g., syphilis) reflect similar trends.
Conclusion
Our study demonstrates the value of leveraging HIE and public data to enhance surveillance and inform interventions. By integrating detailed, near real-time data, public health agencies may gain a more nuanced understanding of testing patterns and positive cases, as well as identify disparities among different socio-demographic groups and geographic regions. This knowledge can guide development of targeted interventions, allocation of resources, and implementation of an equitable learning public health system. Moving forward, further research and collaboration between public health agencies, healthcare providers, and HIEs are needed to fully realize the potential of data-driven approaches in reducing the burden of STIs and improving community health outcomes.
Data Availability
The electronic health record and administrative data underlying this article are not available due to restrictions to preserve patient confidentiality.
Competing Interests Statement
The authors have no competing interests to declare.
Funding Statement
This work was supported by grants from the National Institute of Allergy and Infectious Diseases and National Library of Medicine at the National Institutes of Health (HRN: T15-LM007079; JZ: K23-AI150378, UM1AI069470) and a Computational and Data Science Fellowship from the Association for Computing Machinery Special Interest Group in High Performance Computing (HRN).
Contributor Statement
All authors had full access to the data in the study. They also take responsibility for the integrity of the data and accuracy of the analysis, and have approved the final manuscript.
Concept and design: Reyes Nieva, Elhadad, Zucker
Acquisition, analysis, or interpretation of data: All authors
Drafting of the manuscript: Reyes Nieva
Critical revision of the manuscript for important intellectual content: All authors
Statistical analysis: Reyes Nieva
Obtained funding: Elhadad, Zucker
Administrative, technical, or material support: Reyes Nieva, Tucker
Supervision: Elhadad, Zucker
Data Availability Statement
The electronic health record and administrative data underlying this article are not available due to restrictions to preserve patient confidentiality.
Supplemental Appendix
Appendix A. Supplementary Figures
Appendix B. Supplementary Tables
Acknowledgements
The authors thank Healthix for providing the data necessary to perform study analyses. We also thank Kayla Schiffer and Lindsay Grant for their involvement early in the project. Preliminary data from this study were presented at the STI & HIV 2023 World Congress in Chicago, IL on July 25, 2023 and the American Medical Informatics Association 2023 Annual Symposium in New Orleans, LA on November 12, 2023.