Variations in Using Diagnosis Codes for Defining Age-Related Macular Degeneration Cohorts
Fritz Gerald Paguiligan Kalaw, Jimmy S. Chen, Sally L. Baxter

TL;DR
This study finds that there is significant variation in how researchers use medical codes to identify patients with age-related macular degeneration, leading to inconsistent results.
Contribution
The study reveals a lack of standardization in using ICD codes for AMD, which affects cohort accuracy and reproducibility.
Findings
Only 7% of studies using ICD-9/9-CM correctly defined AMD codes, compared to 78% using ICD-10.
72% of cohort definitions had missing or incomplete AMD codes.
35% of articles included ICD codes outside the scope of AMD diagnosis.
Abstract
Data harmonization is vital for secondary electronic health record data analysis, especially when combining data from multiple sources. Currently, there is a gap in knowledge as to how studies identify cohorts of patients with age-related macular degeneration (AMD), a leading cause of blindness. We hypothesize that there is variation in using medical condition codes to define cohorts of AMD patients that can lead to either the under- or overrepresentation of such cohorts. This study identified articles studying AMD using the International Classification of Diseases (ICD-9, ICD-9-CM, ICD-10, and ICD-10-CM). The data elements reviewed included the year of publication; dataset origin (Veterans Affairs, registry, national or commercial claims database, and institutional EHR); total number of subjects; and ICD codes used. A total of thirty-seven articles were reviewed. Six (16%) articles…
Introduction
Age-related macular degeneration (AMD) is a progressive degenerative retinal disease that affects the macula and is one of the leading causes of blindness among adults aged 55 years and older in Western societies [1]. Its development is multifactorial, involving interactions among retinal microvasculature, metabolic, environmental, and genetic factors [1,2]. The Beckman Initiative for Macular Research Classification Committee classifies it into early, intermediate, and late AMD [3]. Late AMD is subdivided into neovascular AMD and geographic atrophy. Neovascular AMD is characterized by the formation of new blood vessels within the macula, which may cause fluid or blood to accumulate in the intraretinal space, in the subretinal space, or beneath the retinal pigment epithelium (RPE) [4]. Geographic atrophy, on the other hand, is characterized by atrophic lesions of the outer retina caused by the loss of photoreceptors and RPE [5]. Because of the complex nature of the disease process, numerous studies have sought to better understand the pathophysiology and management of this common yet blinding disease.
The widespread adoption of electronic health records (EHRs) has facilitated the availability of observational health data for clinical use or research. With this, several clinical registries in ophthalmology have been established and were noted to have grown significantly in the past decades, which could help in quality improvement and research [6]. Some examples of this are nationwide registries such as the American Academy of Ophthalmology Intelligent Research In Sight (IRIS®) Registry [7] and the National Institutes of Health (NIH) All of Us Research Program [8]. These data sources have integrated structured EHR data into large datasets that can be used for retrospective studies. For observational studies that entail a secondary analysis of EHR data, investigators often use standardized diagnosis codes to identify a cohort of patients relevant to their study question.
The World Health Organization established the International Classification of Diseases (ICD) as a standardized coding system for human diseases, based on data reported globally. The clinical terms coded in the ICD are the main basis for recording diseases and are used for health records, statistics, and death certificates [9]. The ICD has been updated regularly through several iterations. The ICD-9 was initially published in 1977 and the ICD-10 in 1994. ICD-9 uses four to five digits to categorize a specific diagnosis or pathology. In ICD-10, alphanumeric codes can reach as many as seven characters to provide further diagnostic granularity. Additional provisions and modifications have been provided throughout the years [10]. In the United States, modifications of the ICD-9 and ICD-10 called Clinical Modifications (CMs) were developed to ensure the clinical accuracy and utility of disease codes [11]. The latest revision of the ICD (ICD-11) was adopted in 2019 and came into effect in early 2022, although a CM version for use in the United States has not yet been developed or widely implemented [12]. AMD diagnosis codes are available in ICD-9 and more extensively in ICD-10/ICD-10-CM (Table S1).
Despite the availability of the diagnosis codes for AMD, they may not necessarily be used consistently in observational studies involving EHR data. Lack of standardization in cohort definitions is a common challenge in observational research and can limit generalizability and reproducibility across studies if study cohorts are defined differently. Here, we conducted a review of observational studies using ICD codes to define cohorts of AMD patients to understand the current usage, variations, and opportunities for future improvement.
Methods
This study did not entail a direct analysis of health data and focused on reviewing published literature, which does not entail human subject research. It adhered to the tenets of the Declaration of Helsinki.
2.1. Article Search and Review
All articles published before the search date (12 November 2023) were identified in PubMed using the following search string: “macular degeneration AND (ICD OR diagnosis codes OR billing codes)”. Articles indexed in Web of Science were identified using the search term “macular degeneration ICD”. The authors manually reviewed each article and included those that met the following eligibility criteria: (1) the study analyzed retrospective data from electronic health records from clinical institutions, registries, or national or commercial claims databases; (2) it used and listed diagnosis codes from ICD-9, ICD-9-CM, ICD-10, or ICD-10-CM; (3) it provided the total number of subjects identified in the AMD cohort defined by those codes; and (4) the full text was available in English. Articles that fulfilled the eligibility criteria were parsed, recorded, and analyzed.
2.2. Article Parsing
For each included article, the following data were extracted: study dataset (e.g., Veterans Affairs, registry, national or commercial claims database, and institutional EHR); year of publication; ICD terminology used; type of AMD the investigators aimed to study (e.g., all AMD patients, neovascular AMD, or non-neovascular AMD); the set of ICD codes the study investigators used to comprise their cohort definition; and the total number of subjects in the AMD cohort. For the purpose of comparison, diagnoses were categorized using synonymous terms. For example, “non-neovascular AMD” was used to encompass dry or non-exudative AMD, and “neovascular AMD” was used to include studies regarding wet or exudative AMD. The ICD codes were cross-checked for appropriateness against the diagnosis of AMD, for example, by checking whether the included codes related to neovascular AMD, non-neovascular AMD, or both (defined in our table as “AMD”).
2.3. Statistical Analysis
All data elements were tabulated, analyzed, and represented using Microsoft Excel and PowerPoint version 16.58 (Microsoft Corporation, Redmond, WA, USA). First, we analyzed the distribution of the data sources (e.g., Medicare, Veterans Affairs, institutional EHRs, etc.) by publication year. Next, we analyzed the extent of alignment between the codes used in each individual study and the set of relevant ICD codes for each terminology and cohort group. For example, to define neovascular AMD in ICD-9 terminology, the following code was deemed appropriate for the cohort definition: [362.52]. If a study defined a cohort of neovascular AMD patients using ICD-9, we evaluated whether the set of codes used for its cohort definition exactly matched our gold standard cohort definition. If there was not an exact match, we evaluated whether too many codes were included (for example, non-neovascular AMD codes or non-AMD codes entirely) or too few codes were included (for example, omitting some of the relevant codes for neovascular AMD). These were tabulated to calculate the proportion of studies with correct coding matches for each version of ICD terminology. See Supplemental Table S1 for a list of our standardized cohort definitions. We used Fisher’s exact test to evaluate whether there was a significant difference in the proportion of correctly matched cohort definitions between studies using the ICD-9 and ICD-10 terminologies. We also generated a Sankey diagram to illustrate the distribution of exact matches in codes, excess codes, and missing codes by ICD terminology. Finally, we conducted a bibliometric analysis of co-authorship networks and created visualizations of these networks using VOSViewer v1.6.20 (Centre for Science and Technology Studies, Leiden University, The Netherlands, www.vosviewer.com, accessed on 10 April 2024), free software for creating maps based on network data.
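The comparison of a study's code set against a reference definition can be sketched with set operations. This is a minimal illustration, not the authors' workflow (the analysis was done in Microsoft Excel). The function name `compare_definition`, the label strings, and the example study are hypothetical; the reference sets contain only codes named in this article, and the full definitions are in Supplemental Table S1.

```python
# Reference code sets restricted to codes named in the article text.
# The complete standardized definitions live in Supplemental Table S1.
REFERENCE_SETS = {
    ("ICD-9", "AMD"): {"362.5", "362.50", "362.51", "362.52"},
    ("ICD-9", "neovascular AMD"): {"362.52"},
    ("ICD-10", "AMD"): {"H35.3"},
}


def compare_definition(study_codes, terminology, cohort):
    """Classify a study's cohort definition against the reference set.

    Returns the label plus the sorted lists of missing and excess codes.
    """
    reference = REFERENCE_SETS[(terminology, cohort)]
    study = set(study_codes)
    missing = sorted(reference - study)  # relevant codes the study left out
    excess = sorted(study - reference)   # codes outside the reference set

    if not missing and not excess:
        label = "exact match"
    elif missing and excess:
        label = "missing and excess codes"
    elif missing:
        label = "missing codes"
    else:
        label = "excess codes"
    return {"label": label, "missing": missing, "excess": excess}


# Hypothetical study that targets all AMD but lists three codes,
# one of which (362.57, drusen) is outside the reference set.
result = compare_definition({"362.51", "362.52", "362.57"}, "ICD-9", "AMD")
print(result)
```

Treating codes as strings rather than numbers keeps 362.5 and 362.50 distinct, which matters because they are separate entries in ICD-9-CM.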
Results
The initial query of PubMed and Web of Science yielded 250 articles. Two hundred and thirteen articles did not meet the eligibility criteria; hence, 37 articles were parsed and analyzed (Figure 1).
A total of 8,398,072 subjects were studied among the eligible articles. Article publications ranged from 2003 to 2023, with the majority (22/37, 59%) published within the last decade. The largest proportion of the studies obtained their cohort from national claims databases (Medicare) (14, 38%). This was followed by commercial claims databases (9, 24%). Table 1 and Figure 2 show the distribution of dataset origin per year. They indicate consistent use of the Medicare database over the past two decades, a rise in the use of institutional EHRs within the last few years, and published data from a variety of dataset origins in the last year.
Table 2 presents the AMD cohort definition used for each article, while Figure 3 summarizes how well the AMD cohort definitions align with the set of appropriate codes for the cohort of interest. Six (16%) articles used cohort definitions from two ICD terminologies. ICD-9 and ICD-9-CM were used in 12 (32%) and 13 (35%) articles, respectively, whereas ICD-10 and ICD-10-CM were used in 5 (14%) and 1 (3%) article, respectively, and combined ICD-9 and ICD-10 in 4 articles (11%). Of the cohort definitions that used ICD-9 and ICD-9-CM terminologies, only 2 (out of 30, 7%) included all four appropriate AMD codes (362.5, 362.50, 362.51, and 362.52); on average, two AMD codes were missing per article. Most of the missing codes were either 362.5 or 362.50 for ICD-9/9-CM. Of the cohort definitions that used ICD-10 terminologies, seven (out of nine, 78%) defined the AMD code correctly (H35.3), while two used different codes (H35.31 and H35.32). Based on our review, only two studies used ICD-10-CM terminologies; one defined all AMD diagnosis codes, and the other did not. Using Fisher’s exact test, our analysis showed that studies using ICD-10 terminology were significantly more likely to have an exact match with the appropriate set of codes compared to those using ICD-9 terminology (p = 0.0001).
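As a rough check of the comparison above, the sketch below computes a two-sided Fisher's exact test from the counts stated in this paragraph (2 of 30 exact matches for ICD-9/9-CM, 7 of 9 for ICD-10). The 2×2 table is our reconstruction from those counts, and the pure-Python function is illustrative; it is not the authors' procedure, and their software for the test is not specified beyond Microsoft Excel.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table that shares the
    observed margins and is no more likely than the observed table.
    """
    row1 = a + b
    col1 = a + c
    n = a + b + c + d
    denom = comb(n, row1)

    def pmf(k):
        # Probability that the first row holds k of the col1 "successes".
        return comb(col1, k) * comb(n - col1, row1 - k) / denom

    k_min = max(0, row1 - (n - col1))
    k_max = min(row1, col1)
    p_observed = pmf(a)
    tolerance = 1 + 1e-9  # guards against floating-point ties
    return sum(
        pmf(k) for k in range(k_min, k_max + 1)
        if pmf(k) <= p_observed * tolerance
    )


# Rows: ICD-9/9-CM, ICD-10. Columns: exact match, not an exact match.
p_value = fisher_exact_two_sided(2, 28, 7, 2)
print(f"p = {p_value:.6f}")  # about 0.000075, i.e. 0.0001 to four decimals
```

Under these assumptions the result agrees with the reported p = 0.0001 after rounding.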
Moreover, 13 articles included ICD codes that were outside the scope of the diagnosis of AMD (Table 3). These included diagnoses such as cystoid macular degeneration of the retina, drusen of the retina, serous detachment of the retinal pigment epithelium, and hemorrhagic detachment of the retinal pigment epithelium.
We used VOSViewer to map co-author networks, as shown in Figure 4. This co-authorship analysis refers to the relatedness or link of items based on the number of co-authored documents. We used the co-authorship network to determine the group of co-authors and the links between these co-authors who studied AMD using controlled terminologies such as ICD. Our analysis revealed that authors clustered differently based on the cohort definitions of AMD, using ICD-9, ICD-9-CM, ICD-10, and ICD-10-CM. Out of the 37 reviewed articles, we found 27 clustered groups that used cohort definitions from different ICD terminologies.
Discussion
The present study systematically reviewed 37 published articles that used different definitions of AMD based on ICD-9 and ICD-10 terminologies in defining cohorts for their studies. It uncovered the following findings: (1) national databases serve as an important tool for extracting big data, with institutional EHRs increasingly used in the last few years to capture patient data and relevant information; (2) AMD diagnosis codes have been underutilized, which may lead to underestimating a cohort; and (3) non-AMD diagnosis codes have been used, which may lead to overestimating a cohort.
The first revision of the ICD (ICD-1) was established over a century ago, and the classification has been revised periodically since. The ninth and tenth revisions (ICD-9 and ICD-10) have been implemented since 1979 and 1999, respectively [50]. Medicare is a US federal health insurance program, generally for individuals over 65 years of age [51], and had over 65 million beneficiaries as of March 2023 [52]. Studies using Medicare administrative claims were first published in 1979, and their number has grown since [51]. Since AMD commonly affects older adults, Medicare claims are well suited to studying AMD. This was reflected in our review, as Medicare databases accounted for the highest proportion of the observational studies reviewed. Although national registries and commercial claims-based data provide heterogeneous and robust patient data, limitations such as limited generalizability, coverage restrictions, lack of billing codes, and difficulty accessing, using, and understanding the data may hinder sampling [53–56]. The passage of the Health Information Technology for Economic and Clinical Health (HITECH) Act of 2009 paved the way for advancing EHR use [57]. Potential advantages of EHRs include an improved ability to conduct research and ease of access [58]. In ophthalmology, one advantage of using EHR data is the availability of specialty-specific information that can be linked and integrated into the patient data, such as multimodal retinal imaging data like fundus images, optical coherence tomographic scans, and visual fields. The use of institutional EHR data in studying AMD has also been increasing, and as seen in our review, this increase has been notable within the past decade.
The second key finding of the present study was the underutilization of AMD diagnosis codes. ICD-9-CM has four AMD condition codes. Two of them, 362.5 (degeneration of macula and posterior pole) and 362.50 (macular degeneration [senile], unspecified), are distinct from each other but can be mistaken for one another because they differ only by a fifth digit (the number 0). This can confuse clinicians or investigators when inputting codes and can lead to an underestimated cohort in research. The other two (362.51 [nonexudative senile macular degeneration] and 362.52 [exudative senile macular degeneration]) have been the most commonly used codes in each cohort. Studies targeting an “AMD” cohort omitted two codes on average, which may underrepresent the population. A recent study on AMD condition coding reported underreporting of geographic atrophy, an advanced form of AMD, because an incorrect seventh digit in the ICD-10-CM code recorded it as intermediate dry AMD [59]. Regarding the use of ICD-10, which provides only a single code for AMD (H35.3, degeneration of macula and posterior pole), nearly all studies captured the proper code. The ICD-10-CM coding for AMD has become more specific, adding subclassifications to the disease classification [60]. It has more data granularity, including laterality, disease classification, and clinical activity (Table S1). In terms of the use of ICD-10-CM terminology, one study identified 16 codes for neovascular AMD, and the other targeted only 2 out of the 47 codes for AMD in general. However, transitioning from aggregate (ICD-9-CM) to granular (ICD-10-CM) data poses some challenges. The complexity of coding makes it difficult for physicians to participate in encoding to ensure an appropriate diagnosis [61]. As seen from the present results, most studies used fewer codes than are available to define the AMD cohort, which may lead to underrepresenting the targeted population. Our diagram illustrates this: only 21% had utilized the correct diagnosis codes.
The third key finding was the use of non-AMD diagnosis codes from the ICD terminologies. Thirteen (35%) of the reviewed articles included additional codes unrelated to the diagnosis of AMD, even though the stated patient population of interest was AMD. These included the following: serous detachment of the retinal pigment epithelium (362.42), hemorrhagic detachment of the retinal pigment epithelium (362.43), cystoid macular degeneration of the retina (362.53), and drusen (degenerative) of the retina (362.57). Although the first three diagnoses can be a consequence of AMD, they are not specific to AMD. Including these codes may dilute the target population and may even inadvertently include other primary causes of such diagnoses. The clinical hallmark of non-neovascular AMD is drusen, which are yellowish deposits at the level of the retinal pigment epithelium [62]. According to the clinical classification of AMD [3], early AMD is defined by the presence of drusen larger than 63 μm and no larger than 125 μm. Drusen alone are not considered a class of AMD, since normal aging changes can present with drusen [3]. Nine of the thirteen articles incorporated drusen (362.57) as an inclusion code to define their AMD cohort, which may again dilute the results, since the prevalence of drusen can be as high as 91% in the normal population [63].
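The two failure modes discussed above can be illustrated with a small sketch that selects ICD-9-CM codes in two ways. Both selection strategies are hypothetical illustrations, not methods attributed to any reviewed study: exact matching on a short code list, and prefix matching on "362.5". The code descriptions are those given in this article, and only codes named in the text are included.

```python
# ICD-9-CM codes named in the article, with the descriptions given there.
CODES = {
    "362.42": "serous detachment of the retinal pigment epithelium",
    "362.43": "hemorrhagic detachment of the retinal pigment epithelium",
    "362.5": "degeneration of macula and posterior pole",
    "362.50": "macular degeneration (senile), unspecified",
    "362.51": "nonexudative senile macular degeneration",
    "362.52": "exudative senile macular degeneration",
    "362.53": "cystoid macular degeneration of the retina",
    "362.57": "drusen (degenerative) of the retina",
}

# The four codes the article treats as the appropriate ICD-9/9-CM AMD set.
AMD_ICD9 = {"362.5", "362.50", "362.51", "362.52"}


def select_exact(recorded, wanted):
    """Keep only codes that appear verbatim in the wanted list."""
    return sorted(code for code in recorded if code in wanted)


def select_prefix(recorded, prefix):
    """Keep every code that starts with the given prefix."""
    return sorted(code for code in recorded if code.startswith(prefix))


recorded = list(CODES)

# Strategy 1 (hypothetical): list only the two most commonly used codes.
narrow = select_exact(recorded, {"362.51", "362.52"})
missed = sorted(AMD_ICD9 - set(narrow))

# Strategy 2 (hypothetical): take everything under the 362.5 category.
broad = select_prefix(recorded, "362.5")
out_of_scope = sorted(set(broad) - AMD_ICD9)

print("missed by exact list:", missed)
print("pulled in by prefix match:", out_of_scope)
```

In this sketch the short exact list misses 362.5 and 362.50, while the prefix match pulls in 362.53 and 362.57, which mirrors the under- and overrepresentation described in the second and third findings.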
In ophthalmology, and specifically in vitreoretinal diseases, efforts to improve the standardized representation of diseases are ongoing. A recent report by Kalaw and colleagues [64] found that several important retinal diagnoses are not represented in the Systematized Nomenclature of Medicine (SNOMED). In one of the articles reviewed in this study [28], polypoidal choroidal vasculopathy, considered a pachychoroid disorder, and idiopathic choroidal neovascularization were defined as AMD, even though these diagnoses warrant separate coding because of the nature of the disorders and their distinct pathophysiology. A study by Tavakoli and colleagues [65] reported that some ophthalmic infectious and traumatic diagnoses do not accurately match the ICD-10-CM diagnosis and are considered a wide match. Lastly, a study by Cai and colleagues [66] noted gaps in diagnosis codes and eye exam data elements. Future collaborative studies may be needed to supply the missing elements and concepts in ophthalmology.
The present study has limitations. It obtained peer-reviewed articles only from PubMed and Web of Science. Other biomedical literature databases, such as Google Scholar or Scopus, may provide additional relevant articles. Additionally, the study focused on variations in the use of ICD terminologies. Further variations may be present when using SNOMED or other standardized terminologies.
Conclusions
In summary, there is substantial variation in the use of ICD diagnosis codes for identifying cohorts of AMD subjects, with possible implications of under-sampling, oversampling, and a lack of reproducibility across studies. This could affect ongoing efforts to understand and treat one of the most common diseases in ophthalmology. Healthcare professionals, especially ophthalmologists, should be aware of the appropriate and specific codes and apply them in practice. Cohort definitions should be standardized to provide reproducible results.
References
1. Kalaw FGP; Alex V; Walker E; Bartsch DU; Freeman WR; Borooah S. Inner Retinal Thickness and Vasculature in Patients with Reticular Pseudodrusen. Ophthalmic Res. 2023, 66, 873–879. doi:10.1159/000530799; PMID 37271137.
2. Cicinelli MV; Rabiolo A; Sacconi R; Carnevali A; Querques L; Bandello F; Querques G. Optical coherence tomography angiography in dry age-related macular degeneration. Surv. Ophthalmol. 2018, 63, 236–244. doi:10.1016/j.survophthal.2017.06.005; PMID 28648383.
3. Ferris FL; Wilkinson CP; Bird A; Chakravarthy U; Chew E; Csaky K; Sadda SR. Clinical Classification of Age-related Macular Degeneration. Ophthalmology 2013, 120, 844–851. doi:10.1016/j.ophtha.2012.10.036; PMID 23332590; PMCID PMC11551519.
4. Finocchio L; Zeppieri M; Gabai A; Toneatto G; Spadea L; Salati C. Recent Developments in Gene Therapy for Neovascular Age-Related Macular Degeneration: A Review. Biomedicines 2023, 11, 3221. doi:10.3390/biomedicines11123221; PMID 38137442; PMCID PMC10740940.
5. Nadeem A; Malik IA; Shariq F; Afridi EK; Taha M; Raufi N; Naveed AK; Iqbal J; Habte A. Advancements in the treatment of geographic atrophy: Focus on pegcetacoplan in age-related macular degeneration. Ann. Med. Surg. 2023, 85, 6067–6077. doi:10.1097/MS9.0000000000001466; PMID 38098608; PMCID PMC10718344.
6. Tan JCK; Ferdi AC; Gillies MC; Watson SL. Clinical Registries in Ophthalmology. Ophthalmology 2019, 126, 655–662. doi:10.1016/j.ophtha.2018.12.030; PMID 30572076.
7. Chiang MF; Sommer A; Rich WL; Lum F; Parke DW. The 2016 American Academy of Ophthalmology IRIS® Registry (Intelligent Research in Sight) Database. Ophthalmology 2018, 125, 1143–1148. doi:10.1016/j.ophtha.2017.12.001; PMID 29342435.
8. The All of Us Research Program Investigators. The “All of Us” Research Program. N. Engl. J. Med. 2019, 381, 668–676. doi:10.1056/NEJMsr1809937; PMID 31412182; PMCID PMC8291101.
