ABSTRACT
Introduction Left ventricular ejection fraction (EF) is an important factor for treatment decisions for heart failure. The EF is unavailable in administrative claims. We sought to evaluate the predictive accuracy of claims diagnoses for classifying heart failure with reduced ejection fraction (HFrEF) versus heart failure with preserved ejection fraction (HFpEF) with International Classification of Disease-Tenth Revision codes.
Methods We identified HF diagnoses for VA patients between 2017-2019 and extracted the EF from clinical notes and imaging reports using a VA natural language processing algorithm. We classified sets of codes as HFrEF-related, HFpEF-related, or non-specific based on the closest EF within 180 days. We selected a random heart failure diagnosis for each patient and tested the predictive accuracy of various algorithms for identifying HFrEF using the last 1 year of heart failure diagnoses. We performed sensitivity analyses on the EF thresholds, the cohort, and the diagnoses used.
Results Between 2017-2019, we identified 358,172 patients and 1,671,084 diagnoses with an EF recording within 180 days. After dividing diagnoses into HFrEF-related, HFpEF-related, or non-specific, we found using the proportion of specific diagnoses classified as HFrEF-related had an AUC of 0.76 for predicting EF≤40% and 0.80 for predicting EF<50%. However, 23.3% of patients could not be classified due to only having non-specific codes. Predictive accuracy increased among patients with ≥4 HF diagnoses over the preceding year.
Discussion In a VA cohort, administrative claims with ICD-10 codes had moderate accuracy for identifying reduced ejection fraction. This level of specificity is likely inadequate for performance measures. Administrative claims need to better align terminology with relevant clinical definitions.
Left ventricular ejection fraction (EF) is a critical factor in determining guideline-directed therapy for patients with heart failure (HF). However, EF is unavailable in administrative claims, limiting use of these data for quality measures or clinical research. Previous analyses of inpatient HF using International Classification of Disease-Ninth Revision (ICD-9) codes demonstrated low sensitivity for identifying HF patients with reduced (rEF) or preserved ejection fraction (pEF).1-3 We used the Veteran’s Affairs (VA) natural language processing algorithm for EF to evaluate the predictive accuracy of ICD-10 HF codes.
METHODS
We identified HF diagnoses for VA patients between 2017-2019 from VA, non-VA fee care, and Medicare administrative claims. We leveraged a validated natural language processing algorithm with >95% precision to extract EF from clinical notes and imaging reports.4 We excluded EF with ranges exceeding 10% as potential errors.
For each diagnosis, we identified the closest EF within 180 days. We then determined the proportion with EF≤40%, 40-50%, or ≥50% across codes (Table 1). We classified codes as HFrEF-related if over half had EF≤40% and HFpEF-related if over half had EF≥50%. We termed codes that met neither criterion or had total count <1,000 as non-specific.
Performance of HF Diagnosis Claims for Classifying Ejection Fraction
To test EF classification using multiple diagnoses, we identified a random diagnosis between 2018-2019 for each patient and all HF diagnoses in the prior year. We evaluated two patient-level predictors: (1) the proportion of specific HF diagnoses classified as HFrEF-related and (2) the proportion of all HF diagnoses classified as HFrEF-related. We then assessed three thresholds for identifying HFrEF: >0 (i.e., any HFrEF diagnosis), ≥0.5, and 1 (i.e., all HFrEF diagnoses). We calculated the area under the curve (AUC), sensitivity, and positive predictive value (PPV) for identifying EF≤40%, ≤45%, and <50%.
We performed multiple sensitivity analyses: (1) only clinician evaluation and management or inpatient principal diagnoses, (2) inpatient principal diagnosis alone, (3) EF within 30 days of diagnosis, (4) patients with ≥4 diagnoses, and (5) diagnoses within prior 90 days.
RESULTS
Between 2017-2019, we identified 11,817,035 HF diagnoses across 993,408 individuals. There were 358,172 patients and 1,671,084 diagnoses with an EF recording within 180 days. This included 398,650 (23.9%) VA outpatient, 652,716 (39.1%) VA inpatient, 279,729 (16.7%) non-VA outpatient, and 339,989 (20.3%) non-VA inpatient diagnoses. The median absolute time between diagnosis and EF recording was 1 day (IQR: 1-14 days). The median EF was 43% (IQR: 30-55%).
Table 1 lists the EF subgroup breakdown (EF≤40%, 40-50%, ≥50%) for each HF code. Among the 523,718 diagnoses classified as HFrEF, 67.6% had EF≤40% compared with 16.6% with EF≥50%. Among the 287,916 diagnoses classified as HFpEF, 77.6% had EF≥50% compared with 13.5% with EF≤40%. There were 859,450 non-specific diagnoses.
We identified a random diagnosis for 274,202 patients between 2018-2019 with an average age of 74.0 (SD: 10.5) and only 2.6% being women. The median number of total HF diagnoses in the prior year was 2 (IQR: 2-4).
Table 1 displays the performance of our predictors. Predictor 1 used the proportion of specific diagnoses classified as HFrEF-related. The AUC for predicting EF≤40% was 0.76, which increased to 0.80 for EF<50%. 23.3% of patients had only non-specific HF diagnoses and were not characterized. Using the proportion of all diagnoses classified as HFrEF-related (predictor 2) enabled predictions across the cohort but with decreased AUC of 0.73. Predictor 1 performed better among patients with ≥4 diagnoses in the prior year (AUC 0.79 for EF≤40%). At a threshold of >0 (at least 1 HFrEF diagnosis), sensitivity was 94.9% and PPV was 61.8%. Requiring all diagnoses to be classified as HFrEF increased specificity to 72.1% but decreased sensitivity to 77.7%.
DISCUSSION
Across a large VA patient cohort, administrative claims provided moderate accuracy (AUC 0.76) at identifying HFrEF using the proportion of specific HF diagnoses classified as HFrEF-related. However, a quarter of patients could not be analyzed due to only having non-specific diagnoses.
HFrEF classification improved with an EF threshold of <50% because clinicians frequently use systolic dysfunction codes for mid-range EF. However, clinical evidence and performance measures focus on EF≤40%. This limits the utility of administrative codes for identifying performance measure populations as over 1 in 3 patients classified as HFrEF have an EF >40%.
Desai and colleagues developed a claims-based EF classification model based on a cohort of 7,001 patients spanning ICD-9/ICD-10.1 Their algorithm predicted EF≤45% with a sensitivity of 32% and a PPV of 0.72.4 We achieved higher sensitivity by using multiple diagnoses over a 1-year period and including the I50.4X codes, which they classified as unspecified HF. Incorporating other patient-level variables may also improve classification. However, an algorithm that incorporates prior treatment or comorbidities may influence its validity for evaluating quality of care.
Current HF diagnosis codes likely remain inadequate for defining populations for quality measures. Research using administrative data should be cognizant of misclassification risk. Administrative coding requires better alignment with clinical definitions to maximize quality improvement and research. Fortunately, ICD-11 designates specific diagnoses for EF subgroups. Until then, studying heart failure with claims data will remain challenging.
Data Availability
The data is available from the Veteran's Affairs (VA) health system for VA researchers.
Footnotes
Funding: ATS is supported by a grant from the NHLBI (1K23HL151672-01).