Validation of Trøndelag Apnoea Score Proxy for Obstructive Sleep Apnoea in the General Population of Norway: The HUNT Study

James Filosa; Petter Moe Omland; Knut Hagen; Knut Langsrud; Morten Engstrøm; Trond Sand

PMC · DOI:10.1155/2024/1242505·June 6, 2024

Open Access

TL;DR

A new seven-item Trøndelag Apnoea Score (TASC) was validated as a useful tool for identifying obstructive sleep apnoea in Norway's general population.

Contribution

The study introduces and validates a novel seven-item proxy score for obstructive sleep apnoea in a general population setting.

Findings

01

TASC showed 65% sensitivity and 87% specificity for detecting OSA with AHI ≥ 15.

02

Validity was higher in men and individuals over 50 years of age.

03

OSA prevalence estimates varied significantly based on AHI thresholds and scoring criteria.

Abstract

The aim was to validate a new seven-item “TASC” (Trøndelag Apnoea Score) proxy for obstructive sleep apnoea (OSA) against polysomnography in the general population. Objectives included validation against different polysomnographic criteria, stratification by age and gender, and estimation of OSA prevalence. From the fourth wave of the Trøndelag Health Study (HUNT4), 1,201 participants were randomly invited to a substudy focusing on sleep and headaches, of whom 232 accepted and 84 (64% women, mean age 55.0 years, and standard deviation 11.5 years) underwent polysomnography. The TASC proxy sums seven binary items for snoring, observed breathing pauses, restricted daytime activities, hypertension, body mass index (≥30 kg/m2), age (≥50 years), and gender (male). A single night of ambulatory (home) polysomnography was analysed using both the recommended and optional hypopnoea criteria of the…


Taxonomy

Topics: Obstructive Sleep Apnea Research · Neuroscience of respiration and sleep · Gastroesophageal reflux and treatments

Full text

1. Introduction

Obstructive sleep apnoea (OSA) is linked to a reduced quality of life, with unrefreshing sleep, sleepiness, fatigue, and depressive mood as potential mediators [1, 2]. It is strongly associated with adverse health conditions including atrial fibrillation, heart failure, stroke and coronary heart disease [3], and motor vehicle accidents [4]. It is therefore important to develop and validate proxy diagnoses for OSA in epidemiological studies.

The third edition of the International Classification of Sleep Disorders (ICSD-3), by the American Academy of Sleep Medicine (AASM), defines OSA by the number of predominantly obstructive, respiratory events per hour (apnoea-hypopnoea index (AHI), preferably by polysomnography (PSG)), symptoms, and comorbidities [2]. The criteria are met by AHI ≥ 15 alone but also by AHI ≥ 5 plus at least one of the following: snoring, breathing pauses, daytime symptoms of poor sleep (sleepiness, nonrestorative sleep, fatigue, or insomnia symptoms), or a diagnosis of a listed comorbidity (including hypertension). However, researchers typically operationalise the lone AHI cut-offs of 5, 15, and 30, referred to as “mild,” “moderate,” and “severe” OSA.

The prevalence of mild and moderate-to-severe OSA in the general population ranges from 9% to 38% and from 6% to 17%, respectively [5]. Prevalence is known to increase with body mass index (BMI), age, and male gender [2, 6], but it also doubles with the use of the recommended 2012 AASM criteria compared with 2007 AASM criteria [5, 7, 8]. Accordingly, there are concerns that both the symptoms and comorbidities accepted by the ICSD-3 and the latest AASM criteria for OSA are too inclusive. Two population-based studies among men above 40 years of age estimated the prevalence of ICSD-3 OSA at 52.2% (AHI ≥ 10, 2007 AASM criteria) and 74.4% (AHI ≥ 5, 2012 AASM criteria) [9, 10]. It is therefore of interest to estimate OSA prevalence by different AHI cut-offs and old versus new AASM criteria for hypopnoea. Also, to our knowledge, OSA proxies have not previously been comparatively validated against the old and new AASM criteria.

Although useful in epidemiological and some clinical settings, OSA proxies may never completely replace objective sleep testing [11]. The popular, eight-item STOP-Bang questionnaire was developed for surgical populations to rapidly evaluate the risk of OSA, being associated with perioperative complications [12]. It appeared superior to pre-existing questionnaires [13, 14] and has since been validated in numerous populations [15]. Since neck circumference measurements are rarely available in large epidemiological studies relying on questionnaires, including the fourth wave of the Trøndelag Health Study (HUNT4), it is of great interest to develop a STOP-Bang-inspired proxy without neck circumference.

While the STOP-Bang accepts any of tiredness, fatigue, or sleepiness as daytime symptoms of OSA, it is of interest, for simplicity, to settle on one daytime symptom of OSA. A recent large cohort study found only a weak association between the Epworth Sleepiness Scale (≥11) and the AHI [16], and excessive daytime sleepiness is less typical among women [17]. Meanwhile, elderly cases may have fewer symptoms altogether [18]. These differences may impact validity and necessitate gender- and age-specific proxy cut-offs [19]. Finally, the ICSD-3 states no minimum frequency of daytime symptoms. It is therefore of interest to validate a new proxy incorporating different daytime symptoms, of different frequencies, specified in the fifth edition of the Diagnostic and Statistical Manual of Mental Disorders (DSM-5) and the ICSD-3 (e.g., insomnia), in age and gender strata.

The general aim of this population-based study was to validate a new seven-item proxy for OSA, named the Trøndelag Apnoea Score (TASC), using items for snoring, breathing pauses, daytime symptoms, hypertension, BMI, age, and gender, against a PSG-based diagnosis, in an adult general population subsample from HUNT4 in Norway. The main objective was to validate several cut-offs of the TASC against AHI ≥ 5, AHI ≥ 15, and AHI ≥ 30, using the recommended AASM criteria. Secondary objectives were to study validity by the choice of daytime symptom (restricted daytime activities, sleepiness, or tiredness), study validity by the choice of minimum frequency of symptoms, stratify validity by age and gender, validate the TASC against the optional AASM criteria, and estimate the PSG- and proxy-based prevalence of OSA.

2. Methods

2.1. Participants

HUNT4 took place between August 2017 and February 2019. All residents above 20 years of age (meeting the World Health Organization criteria of adulthood) of the defunct Nord-Trøndelag county were invited to two questionnaires, a structured interview and a clinical examination [20], which 56,078 out of 96,469 residents (58%) underwent. Next, HUNT4 participants from Stjørdal municipality were invited by postal mail to an approved HUNT4 substudy named Sleep and Pain. Stjørdal municipality is a 938 km^2^ agricultural area with a small town centre and 23,165 inhabitants, sufficiently representative of Trøndelag county. Out of 1,201 randomly selected HUNT4 participants, 232 (19%) agreed by telephone and were scheduled appointments at the interview site in Stjørdal in November 2017. They completed a waiting room questionnaire about sleep and health pending a face-to-face interview about sleep, health, headache, and pain. Finally, the participants of the Sleep and Pain study were invited to a nerve conduction study (included for differential diagnostics of restless leg syndrome) and a single-night ambulatory PSG, which was initially accepted by 87 participants. However, two participants withdrew after the nerve conduction study, and the polysomnogram of another was omitted because of technical issues, resulting in 84 participants for the current PSG study.

The HUNT4 organisation performed the randomisation, and the regional ethics committee approved the invitation letter. The median time delay between questionnaire completion in the Sleep and Pain substudy and the PSG study was 11 months (range 3–22 months, 94% within 15 months). We documented clinical findings including advice for clinical follow-up in the electronic patient journal, sending copies to the participant and their general practitioner. Participants with previously diagnosed OSA (10 out of 232: 4%; 4 out of 84 with PSG: 5%) or any other disease were included. The validity of questionnaire-based diagnoses for insomnia, primary headaches, and restless leg syndrome has been published for this sample [21–23].

2.2. Questionnaires and Other Health-Related Data

In the Sleep and Pain substudy, participants completed the Karolinska Sleep Questionnaire (KSQ) [24] which includes items for snoring, apnoeas, and presumed daytime symptoms of sleep disturbances (Supplementary Table 1). Other questionnaires included the Epworth Sleepiness Scale [25], Insomnia Severity Index (ISI) [26], and the Hospital Anxiety and Depression Scale (HADS) [27]. Participants listed their health conditions, which were supplemented by their medication list. On the day of the ambulatory PSG, the height and weight of participants were measured using a wall-mounted height measuring tape and a bathroom-type body weight scale (SECA®, Hamburg, Germany), respectively.

2.3. The TASC Proxy

Inspired by the STOP-Bang [12], the TASC proxy dichotomises and sums seven OSA-relevant items. From the KSQ, loud and embarrassing snoring (according to others), breathing pauses during the night (according to others), and restricted daytime activities (spare time, school, or job) were each scored if reported “mostly/at least three times a week” (Supplementary Table 1). The remaining four items were hypertension (by questionnaire or list of medications), BMI above 30 kg/m^2^, age above 50 years, and male gender.
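The scoring rule described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the class and function names are hypothetical, and the inclusive thresholds (BMI ≥ 30, age ≥ 50) follow the wording of the abstract, which is an assumption given that this section says "above".

```python
from dataclasses import dataclass


@dataclass
class TascItems:
    """Hypothetical container for the seven TASC inputs."""
    snoring: bool                        # reported at least three times a week
    breathing_pauses: bool               # reported at least three times a week
    restricted_daytime_activities: bool  # reported at least three times a week
    hypertension: bool                   # by questionnaire or medication list
    bmi: float                           # kg/m^2
    age: float                           # years
    male: bool


def tasc_score(items: TascItems, bmi_cutoff: float = 30.0, age_cutoff: float = 50.0) -> int:
    """Sum the seven dichotomised items into a score from 0 to 7."""
    return sum([
        items.snoring,
        items.breathing_pauses,
        items.restricted_daytime_activities,
        items.hypertension,
        items.bmi >= bmi_cutoff,  # assumed inclusive threshold
        items.age >= age_cutoff,  # assumed inclusive threshold
        items.male,
    ])


# A symptom-free man aged 62 with hypertension and a BMI of 31 scores 4.
example = TascItems(False, False, False, True, 31.0, 62, True)
print(tasc_score(example))
```

Keeping the thresholds as parameters makes it straightforward to try alternatives such as the STOP-Bang BMI cut-off of 35.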

The main TASC proxy used restricted daytime activities as its daytime symptom of OSA because of its inclusion in the main HUNT4 questionnaire and its close resemblance to the DSM-5 insomnia diagnosis (criterion B). However, we also incorporated bothersome daytime sleepiness and bothersome daytime tiredness/fatigue into alternative proxies: “TASC-sleepy” and “TASC-tired.” Finally, a more liberal “TASC-monthly” allowed the three KSQ items to be reported “sometimes/at least once a month.”

2.4. PSG: Setup and OSA Diagnoses

The PSG equipment for the single night of unattended, ambulatory PSG was mounted at St. Olavs Hospital, Trondheim University Hospital, at 12:00 the preceding day. Participants were instructed to avoid alcohol, hypnotic drugs, and napping after dinner (unless done routinely), to go to sleep between 22:00 and 00:00 under undisturbed conditions (as normal), and to document the lights off and lights on time plus any awakenings (e.g., visits to the toilet). The equipment was dismantled at 08:00 the following morning.

The PSG was recorded using SOMNOscreen plus PSG equipment (SOMNOmedics GmbH®, Randersacker, Germany). Six electroencephalography (EEG) electrodes were placed according to the International 10-20 system: F3, F4, C3, C4, O1, and O2. Two electrooculographic electrodes were placed: 1 cm laterally and 2 cm above the right eye canthus and 1 cm laterally and 2 cm below the left eye canthus. Mastoid M1 and M2 were reference electrodes for electrooculographic and the contralateral EEG electrodes. Surface electromyography was registered from the submental and bilateral anterior tibial muscles. Nasal flow and naso-oral thermistor were affixed above the upper lip. Thoracic and abdominal piezoelectric respiratory effort belts were applied. Pulse oximetry was recorded from the index finger. The PSG was analysed using DOMINO® (version 3.0.2, SOMNOmedics).

As per the latest wording of the AASM manual (February 2023) [28], we used both the recommended (1a) and the optional (1b) hypopnoea criteria to calculate the AHI. The recommended hypopnoea criteria score a hypopnoea if there is an airflow signal drop of ≥30% for ≥10 seconds, associated with either an EEG arousal or a 3% oxygen desaturation. The more conservative, optional hypopnoea criteria instead require the hypopnoea to be associated with a 4% oxygen desaturation, disregarding any arousal. Our senior sleep expert calculated the AHI manually for each polysomnogram, first using the optional criteria and then (several months later) using the recommended criteria on raw PSG data, without any previously scored markers but not formally blinded to the initial scoring. For validation purposes, we used AHI cut-offs 5, 15, and 30. For prevalence estimation, we additionally assigned the ICSD-3 diagnosis to participants with AHI ≥ 5 plus at least one of the following: AHI ≥ 15, a complaint of snoring, breathing pauses, a daytime symptom (restricted activities, sleepiness, or tiredness; Table 1) or any insomnia symptom (difficulty falling asleep, falling back asleep, or waking up too early), at least three times a week, or any self-reported comorbidity listed by the ICSD-3 [2].
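The ICSD-3 assignment rule used for prevalence estimation can be written as a small predicate. The sketch below is only an illustration of the logic stated above; the function and argument names are hypothetical, and each symptom flag is assumed to already encode the "at least three times a week" frequency requirement.

```python
def icsd3_osa(
    ahi: float,
    snoring: bool = False,
    breathing_pauses: bool = False,
    daytime_symptom: bool = False,   # restricted activities, sleepiness, or tiredness
    insomnia_symptom: bool = False,  # any of the three insomnia complaints
    comorbidity: bool = False,       # any self-reported ICSD-3 listed comorbidity
) -> bool:
    """Return True if the ICSD-3 OSA criteria, as operationalised here, are met."""
    if ahi < 5:
        return False
    # AHI >= 15 is sufficient alone; otherwise one supporting feature is needed.
    return ahi >= 15 or any(
        [snoring, breathing_pauses, daytime_symptom, insomnia_symptom, comorbidity]
    )


print(icsd3_osa(7.0))                # no supporting feature
print(icsd3_osa(7.0, snoring=True))  # AHI 5 to 15 plus snoring
```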

2.5. Statistics

Sensitivity, specificity, predictive values, and Cohen's kappa (κ) statistic [29] were calculated from two-by-two cross-tabulations of proxy cut-offs versus AHI cut-offs (recommended AASM criteria). The TASC cut-offs ≥2, ≥3, and ≥4 were explored in the main analysis, the results of which guided further analyses. Cohen's κ was interpreted as poor (κ ≤ 0.20), acceptable (0.20 ≤ κ ≤ 0.40), good (0.40 ≤ κ ≤ 0.60), very good (0.60 ≤ κ ≤ 0.80), or excellent (κ ≥ 0.80) overall validity [30]. Validity was stratified by age (below or above 50 years) and gender (women, men) and additionally calculated for the optional AASM criteria. Ninety-five percent confidence intervals (95% CI) for sensitivity, specificity, predictive values, and prevalence were calculated using the exact Clopper-Pearson method for binomial proportions [31]. The 95% CI for Cohen's κ used the asymptotic standard error generated by IBM SPSS®. We also produced receiver operating characteristic (ROC) curves for the main analyses. We used Microsoft Office Excel 2016 and IBM SPSS® version 28 to analyse the data.
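For readers who want to reproduce such statistics outside SPSS, the following standard-library Python sketch computes the validity measures from a two-by-two table and an exact Clopper-Pearson interval by bisection on the binomial tail. It is an illustration under stated assumptions, not the authors' workflow: the function names are hypothetical, and the example counts are chosen only to be consistent with the reported percentages for TASC ≥ 3 against AHI ≥ 15, not taken from Table 2.

```python
from math import comb


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def _bisect_increasing(f, target: float, iters: int = 60) -> float:
    """Solve f(p) = target on [0, 1] for a function f that increases with p."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(k: int, n: int, alpha: float = 0.05):
    """Exact two-sided confidence interval for a binomial proportion k/n."""
    lower = 0.0 if k == 0 else _bisect_increasing(
        lambda p: 1 - binom_cdf(k - 1, n, p), alpha / 2)      # P(X >= k) = alpha/2
    upper = 1.0 if k == n else _bisect_increasing(
        lambda p: 1 - binom_cdf(k, n, p), 1 - alpha / 2)      # P(X <= k) = alpha/2
    return lower, upper


def validity(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Sensitivity, specificity, predictive values, and Cohen's kappa."""
    n = tp + fp + fn + tn
    observed = (tp + tn) / n
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n**2
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "kappa": (observed - expected) / (1 - expected),
    }


# Illustrative counts only (assumed, n = 84).
v = validity(tp=20, fp=7, fn=11, tn=46)
print({name: round(value, 2) for name, value in v.items()})
print(clopper_pearson(20, 31))  # interval for the sensitivity estimate
```

The sketch does not guard against empty margins (for example, no proxy-positive participants), which would need handling in real data.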

Out of the 84 participants with complete PSG analyses, response rates for the three KSQ items regarding daytime symptoms were 100%. However, 10 participants (12%) failed to answer the item regarding breathing pauses, of whom four (5%) also failed to answer the item about loud and embarrassing snoring. For the primary analysis, the response option “never” was imputed for these 14 blank responses so as to include all 84 participants. In a supplementary sensitivity analysis, we treated the blank responses to these questions as missing observations, initially leaving only 74 participants with determined proxy scores. For many of the proxy cut-offs, however, these 10 participants could be classified as either definite positives or definite negatives owing to the score from the remaining five TASC items. Hence, the final sample size in the supplementary analysis varied between 78 and 82.
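The "definite positive or definite negative" logic of the supplementary analysis can be illustrated as follows. This is a hedged sketch with hypothetical names: each item is `True`, `False`, or `None` (blank), and a participant is classified only when the blanks cannot change which side of the cut-off the score falls on.

```python
from typing import Optional, Sequence


def classify_with_missing(items: Sequence[Optional[bool]], cutoff: int) -> Optional[bool]:
    """Classify against a proxy cut-off while tolerating blank items.

    Returns True or False when the answer does not depend on the blanks,
    and None when the blank items could tip the score either way.
    """
    known_score = sum(1 for item in items if item is True)
    n_missing = sum(1 for item in items if item is None)
    if known_score >= cutoff:
        return True   # positive even if every blank were "never"
    if known_score + n_missing < cutoff:
        return False  # negative even if every blank were scored
    return None       # undetermined, excluded at this cut-off


# Blank snoring and breathing pauses, three other items scored, cut-off of 3.
print(classify_with_missing([None, None, False, True, True, True, False], cutoff=3))
```

Because the outcome depends on the cut-off, the number of classifiable participants differs between cut-offs, which is why the supplementary sample size is not constant.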

3. Results

3.1. Population Characteristics

Out of a total of 84 participants with complete PSG analyses, 55 (65%) were women and 60 (71%) were above 50 years of age (Table 1). The mean age of the total sample was 55.0 years, and the mean BMI was 27.2. Twenty-four percent of participants had a BMI above 30, 21% had hypertension, 21% were medicated for other cardiovascular diseases or risk factors, and 32% had DSM-5 insomnia by a diagnostic interview. Other comorbidities were less frequent (e.g., 5% diabetes, 2% asthma, 1% cancer history, and 8% polyneuropathy). The mean AHI of the total sample was 14.9 and 7.9 using the recommended and optional AASM criteria, respectively, being higher among men and participants above 50 years of age (Table 1).

The proportion of participants reporting KSQ items mostly/at least three times a week was 25% for loud and embarrassing snoring, 11% for breathing pauses, 11% for restricted daytime activities, 20% for daytime sleepiness, and 25% for daytime tiredness. Men reported more snoring and breathing pauses (Table 1). Three participants were omitted due to incomplete PSG analyses: two women and one man, all above 50 years of age. Their TASC scores were 1, 1, and 4, respectively. A further 10 participants had blank responses to questions concerning snoring or breathing pauses, of whom 10 had AHI ≥ 5, six had AHI ≥ 15, and one had AHI ≥ 30 (recommended AASM criteria).

3.2. Validity of the TASC Proxy (Recommended AASM Criteria)

The optimal TASC cut-off was ≥2 against AHI ≥ 5, ≥3 against AHI ≥ 15, and ≥4 against AHI ≥ 30 (Table 2). However, TASC ≥ 2 showed only acceptable validity against AHI ≥ 5 (sensitivity = 67%, specificity = 74%, Cohen's κ = 0.35) while TASC ≥ 3 showed good validity against AHI ≥ 15 (sensitivity = 65%, specificity = 87%, Cohen's κ = 0.53), as did TASC ≥ 4 against AHI ≥ 30 (sensitivity = 54%, specificity = 93%, Cohen's κ = 0.48). Higher proxy cut-offs favoured specificity whereas higher AHI cut-offs favoured sensitivity. See Tables 2, 3, and 4 for predictive values. See Figure 1 for the associated ROC curves.

3.3. Validity by Daytime Symptom and Symptom Frequency (Recommended AASM Criteria)

The major difference between the alternative proxies was the higher sensitivity, but lower specificity, of the more liberal TASC-monthly (Table 3). TASC-sleepy and TASC-tired were only slightly more sensitive and less specific, compared to TASC. Still, all four proxies showed acceptable validity using proxy cut-off ≥ 2 against AHI ≥ 5 (Cohen's κ 0.30–0.35), good validity using proxy cut-off ≥ 3 against AHI ≥ 15 (Cohen's κ 0.45–0.53), and good validity using proxy cut-off ≥ 4 against AHI ≥ 30 (Cohen's κ 0.43–0.51). See Figure 2 for the associated ROC curves.

3.4. Gender- and Age-Stratified Validity (Recommended AASM Criteria)

Against AHI ≥ 15, the optimal TASC cut-off was ≥2 among those below 50 years of age and women and ≥3 among those above 50 years of age and men (Table 4). Using these stratum-specific proxy cut-offs, TASC was slightly more sensitive for those above, versus below 50 years of age (69% vs. 60%), while similarly specific (82% vs. 84%), yielding slightly higher validity in the older age group (Cohen's κ = 0.52 vs. 0.41). TASC was also more sensitive among men, compared with women (88% vs. 73%), while similarly specific (69% vs. 75%), resulting in higher validity among men (Cohen's κ = 0.58 vs. 0.43).

3.5. Prevalence of OSA: ICSD-3, AHI Categories, and the TASC Proxy

Using the recommended AASM criteria, the prevalence of ICSD-3 OSA was 61% for the total sample, 53% among women, 76% among men, 42% among those below 50 years, and 68% among those above 50 years of age (Table 5). Using the recommended AASM criteria in the total sample, the prevalence of AHI ≥ 5, AHI ≥ 15, and AHI ≥ 30 was 73%, 37%, and 15%, respectively. The corresponding estimates using the optional AASM criteria were 46%, 18%, and 5%.

The prevalence of TASC ≥ 2, ≥3, and ≥4 was 56%, 32%, and 14%, respectively, in the total sample. Although TASC ≥ 2 produced the closest estimate to the ICSD-3 in the total sample, it overestimated prevalence among men (90% vs. 76%) and underestimated prevalence among women (38% vs. 53%) and those below 50 years of age (25% vs. 42%).

3.6. Validity against Optional AASM Criteria

Using the optional AASM criteria instead, the optimal cut-off for TASC was 3 against AHI ≥ 5 and AHI ≥ 15 and 4 against AHI ≥ 30 (Supplementary Table 2). Using TASC ≥ 3 against AHI ≥ 15, sensitivity was higher (73% vs. 65%), but specificity was lower (77% vs. 87%), than with the use of recommended AASM criteria, and validity was only acceptable (Cohen's κ 0.38 vs. 0.53).

3.7. Validity of the TASC Proxy, excluding Blank Respondents to Snoring and Breathing Pauses (Recommended AASM Criteria)

Excluding blank respondents to snoring and breathing pauses (according to others), instead of imputing missing responses, yielded slightly higher sensitivity and validity but similar validity overall (Cohen's κ 0.33−0.58, Supplementary Table 3).

4. Discussion

In this population-based sample, we found good validity (65% sensitivity, 87% specificity, Cohen's κ = 0.53) of a seven-item STOP-Bang-inspired proxy for OSA (TASC), using the cut-off ≥ 3, against PSG-based AHI ≥ 15 (recommended AASM scoring criteria). Validity was similar against AHI ≥ 30, but mostly acceptable against AHI ≥ 5. There were minimal differences when incorporating different, alternative daytime symptoms into the TASC proxy. Sensitivity and overall validity were higher among men compared with women and in those above versus below 50 years of age. Validity was only acceptable using the conservative optional AASM criteria. Using the recommended AASM criteria, the prevalence of AHI ≥ 5, AHI ≥ 15, and AHI ≥ 30 was 73%, 37%, and 15%, versus 46%, 18%, and 5%, using the more conservative optional criteria. The prevalence of ICSD-3 OSA was 61% with the recommended and 37% with the optional AASM criteria. TASC ≥ 3 was reasonably prevalent in this sample, at 32% overall.

4.1. Comparisons with Other Validation Studies

In a recent systematic review and meta-analysis, Chen et al. [32] identified five validation studies of the STOP-Bang in the general population [14, 33–36]. Against AHI ≥ 5, AHI ≥ 15, and AHI ≥ 30, the authors estimated pooled sensitivity at 73%, 88%, and 92% and pooled specificity at 66%, 42%, and 38%, respectively. Given the pooled prevalence rates, this corresponds to Cohen's κ estimates of merely 0.39 against AHI ≥ 5, 0.17 against AHI ≥ 15, and 0.07 against AHI ≥ 30. While we found similar estimates of sensitivity and specificity, we found considerably higher validity overall (Cohen's κ = 0.41−0.53). However, one should note that the STOP-Bang cut-off was fixed at ≥3 while we let the TASC cut-off vary between ≥2 and ≥4. Considering that three out of the five identified studies [14, 33, 35] used a 4% desaturation threshold only to score hypopnoea, the pooled results should perhaps be compared to our results against the optional AASM criteria instead, which indicated lower validity (Cohen's κ = 0.33−0.38).
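Converting a published sensitivity, specificity, and reference prevalence into Cohen's κ, as done for the pooled estimates above, only requires rebuilding the cell proportions of the two-by-two table. The sketch below shows one way to do this; the function name is hypothetical, and because the pooled prevalence rates from Chen et al. are not restated here, the example uses this study's own figures for TASC ≥ 3 against AHI ≥ 15.

```python
def kappa_from_rates(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Cohen's kappa implied by sensitivity, specificity, and reference prevalence."""
    tp = sensitivity * prevalence              # proportion of true positives
    fn = (1 - sensitivity) * prevalence        # proportion of false negatives
    tn = specificity * (1 - prevalence)        # proportion of true negatives
    fp = (1 - specificity) * (1 - prevalence)  # proportion of false positives
    observed = tp + tn
    proxy_positive = tp + fp
    expected = (proxy_positive * prevalence
                + (1 - proxy_positive) * (1 - prevalence))
    return (observed - expected) / (1 - expected)


# Sensitivity 65%, specificity 87%, AHI >= 15 prevalence 37% (this study).
print(round(kappa_from_rates(0.65, 0.87, 0.37), 2))
```

With rounded inputs the result agrees with the reported κ of 0.53 to two decimals; small discrepancies are expected in general because published percentages are rounded.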

Beyond the proxy cut-off, comparisons with the pooled results are hindered by the use of type 3 devices (without sleep and EEG arousal scoring) in two studies [33, 36], which have roughly 90% sensitivity and specificity against the gold-standard PSG [37]. Given an underlying relationship between questionnaire scores and the PSG, the use of type 3 devices introduces nondifferential misclassification of cases and noncases which will weaken the observed STOP-Bang validity. In all, our present TASC scores appear more valid than the STOP-Bang although there are few population-based studies using the gold-standard PSG [14, 34].

Whereas the STOP-Bang only requires the symptom frequency “often” for daytime tiredness, fatigue, or sleepiness (T) [12], the current KSQ-based TASC proxy specifies a symptom frequency of “mostly/at least three times a week” for snoring, breathing pauses, and daytime symptoms. We found no advantage of the more relaxed frequency criteria “sometimes/at least once a month” on overall validity, against any AHI cut-off, using any set of AASM criteria. Considering the focus on frequency criteria, the current proxies may also be likened to the Berlin Questionnaire [38]. It focuses on the “STOP” items (particularly snoring and tiredness) and requires a frequency of “nearly every day” or “3−4 times a week” for five out of 10 items, congruent with our main KSQ response option of “mostly/at least three times a week.” In a systematic review, Senaratna et al. [39] identified two validation studies of the Berlin Questionnaire in the general population. Hrubos-Strøm et al. [40] found only 37% sensitivity and 84% specificity (Cohen's κ = 0.20) against AHI ≥ 5 and 43% sensitivity and 80% specificity (Cohen's κ = 0.13) against AHI ≥ 15, using a 4% desaturation threshold for hypopnoea scoring. Meanwhile, Kang et al. [41] found 69% sensitivity and 83% specificity (Cohen's κ = 0.48) against AHI ≥ 5 and 89% sensitivity and 63% specificity (Cohen's κ = 0.40) against AHI ≥ 15, using a 3% threshold to score hypopnoeas. Hence, from the few available studies in the general population, the explicit use of a minimum symptom frequency (Berlin Questionnaire and the current TASC) may be an improvement on the vaguer wording of the STOP-Bang.

Marti-Soler et al. [34] derived and optimised a five-item score called NoSAS in a large population-based sample. Against AHI ≥ 20, the NoSAS outperformed both the STOP-Bang and Berlin Questionnaire in two separate cohorts (Cohen's κ = 0.37−0.39 vs. 0.15−0.22). This difference may not be completely attributed to the use of predefined cut-offs for the STOP-Bang and Berlin Questionnaire, as the NoSAS also had a greater area under the ROC curve (0.74−0.81 vs. 0.63−0.68).

The validity of the current TASC ≥ 3 may also be compared with that of proxies for interview-verified headache and sleep disorder diagnoses in the same sample [21–23]. As judged by Cohen's κ, TASC ≥ 3 performed similarly to proxies for headache suffering, migraine, insomnia, and unspecified restless leg syndrome (Cohen's κ = 0.45−0.57) and better than the proxy for tension-type headache (Cohen's κ = 0.33).

4.2. Items and Strata

There were minimal differences in validity between the different daytime symptoms. The agreement proportion between any two TASC proxies (≥3) with different daytime symptoms (restricted activities, sleepiness, or tiredness) was at least 96% (Cohen's κ ≥ 0.92, not tabulated), partially because 72% to 89% of participants with TASC ≥ 3 did not report the targeted symptom at least three times a week. The choice of daytime symptom may then seem insignificant, but the agreement proportion of the three daytime symptoms was comparatively low, at 86% (Cohen's κ 0.46−0.59). Hence, the choice of daytime symptom may have a larger effect on simpler proxies that are less reliant on other items and on proxies that require a lower minimum frequency. We advocate restricted daytime activities as the most clinically relevant daytime symptom of OSA, given both its concordance with the DSM-5 diagnosis of insomnia and recent studies reporting weak associations between OSA and daytime sleepiness [16, 42]. Restricted daytime activities may also be partially viewed as the result of daytime tiredness, sleepiness, or fatigue.

Altogether, 63% to 72% of participants with OSA (depending on the proxy) reported at least one symptom (tiredness, snoring, or breathing pauses) at least three times a week. This contrasts with another population-based study, among middle-aged participants, in which only a minority of participants with moderate-to-severe OSA reported symptoms [42]. The discrepancy may be due to the authors' use of the Epworth Sleepiness Scale, known to correlate poorly with the AHI [16], and highlights the need to standardise symptom evaluation by OSA proxies. Note that proxy-positive participants include asymptomatic cases, as a TASC score of four can be obtained from hypertension, BMI, age, and gender. The fact that asymptomatic cases do not necessarily benefit from treatment [43] makes such proxy scores more suitable for epidemiological studies than for clinical decision-making.

While the STOP-Bang uses BMI ≥ 35, we chose BMI ≥ 30 for better suitability to our population-based sample with a mean BMI of 27.2. Similarly, previous studies have found the optimal BMI cut-off to depend on ethnicity and gender, down to 30 for women [19] and as low as 27.5 in certain populations [44]. However, there is a growing concern that BMI fails to capture adiposity in all demographics. Some studies have used the waist-to-hip ratio as an alternative among women [45]. Perhaps the limitations of the BMI partially explain why our TASC proxy was more valid among men than women. Using health care use as a clinical endpoint, a large longitudinal study among persons aged between 45 and 85 years (age range of 85% of our sample) found similar risks whether obesity was defined by BMI, waist circumference, waist-hip-ratio, or body fat percentage [46]. The correlation between obesity and health care use was however weaker in the higher age categories [46].

Snoring and breathing pauses according to a bed partner may need special consideration since many participants lack a bed partner. As the 10 blank respondents had a higher AHI overall, we found slightly higher validity when not imputing these responses to “never.” Perhaps the lack of a bed partner should be viewed as a marker of poorer health, including greater risk of OSA. In a supplementary analysis (not tabulated), we explored six-item proxies by successively removing one of the seven TASC items. Notably, the removal of breathing pauses slightly increased the estimate of validity (Cohen's κ = 0.55), while the removal of snoring or BMI slightly decreased the estimate of validity (Cohen's κ 0.48 and 0.45, respectively).

We found higher validity among those above (vs. below) 50 years of age. By comparison, we previously found lower validity of questionnaire-based diagnoses for insomnia and restless leg syndrome among elderly participants in the same population, proposing age-dependent decreases in reading comprehension or increases in competing causes of symptoms as potential mechanisms [21, 22]. Both these factors may have been weakened by the current inclusion of nonquestionnaire items and by the lower age cut-off in the current study (≥50 vs. ≥65 years). We also found the TASC to be more sensitive and more valid overall, among men compared with women, similarly to Bauters et al. [33]. The positive relation between validity and both age and male gender is evident from the stratified summary of the PSG (Table 1), in which older participants and men (in particular) show more obstructive sleep than their counterparts.

4.3. The Choice of Proxy Cut-Off

While we let Cohen's κ compare the overall validity of different proxy cut-offs, the intended application must also be taken into account. TASC ≥ 3 (sensitivity = 65%, specificity = 87%, Cohen's κ = 0.53) may be optimal for correlation studies wherein specificity is key, so as not to dilute identified cases with false positives. Conversely, TASC ≥ 2 (sensitivity = 87%, specificity = 62%, Cohen's κ = 0.45) may be more suitable in screening settings, although OSA proxies are deemed unfit to replace objective sleep testing in the clinical setting [11]. Regarding prevalence estimation, one may be tempted to choose the cut-off that produces the closest prevalence estimate to the gold-standard reference in the validation study. However, Diggle [47] has shown that the proxy prevalence (and its closeness to the gold-standard) depends on the interplay between sensitivity, specificity, and the gold-standard prevalence itself. Hence, the optimal proxy cut-off for prevalence estimation should be based on overall validity rather than on the closeness in prevalence between proxy and gold-standard in a given study. In our study, TASC ≥ 2 produced the closest prevalence estimate to the ICSD-3 diagnosis and AHI ≥ 5, while TASC ≥ 3 was the closest to AHI ≥ 15 (recommended AASM criteria).
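The dependence of proxy prevalence on sensitivity, specificity, and the gold-standard prevalence follows from the standard identity that the proxy-positive proportion equals true positives plus false positives. The short sketch below illustrates this with the figures quoted in this section; the function name is hypothetical, and the identity is a general property of binary tests rather than a restatement of the cited derivation.

```python
def apparent_prevalence(sensitivity: float, specificity: float, true_prevalence: float) -> float:
    """Expected proportion of proxy positives: true positives plus false positives."""
    return sensitivity * true_prevalence + (1 - specificity) * (1 - true_prevalence)


# Against AHI >= 15 (prevalence 37% with the recommended criteria):
print(round(apparent_prevalence(0.65, 0.87, 0.37), 2))  # TASC >= 3
print(round(apparent_prevalence(0.87, 0.62, 0.37), 2))  # TASC >= 2
```

The two results match the reported TASC ≥ 3 and TASC ≥ 2 prevalences of 32% and 56%. A proxy can therefore land close to the reference prevalence simply because false positives and false negatives happen to balance at that prevalence, which is why closeness in one sample is a weak basis for choosing a cut-off.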

4.4. The Choice of AHI Cut-Off

The prevalence of OSA varied greatly with the choice of AASM (or ICSD-3) criteria and AHI cut-off and between gender and age categories. The particularly high prevalence of AHI ≥ 5 (and ICSD-3 OSA), at 73% with recommended AASM criteria, raises questions about its clinical and epidemiological relevance, in this population at least. The issue partly remains for AHI ≥ 5 using the more conservative, optional AASM criteria (46% prevalence). We therefore suggest greater relevance of AHI ≥ 15 than AHI ≥ 5, using the latest recommended AASM criteria.

The choice of AHI cut-off also affected the balance between sensitivity and specificity of our OSA proxies (at a fixed proxy cut-off). Compared with AHI ≥ 15, the proxies were more specific (less sensitive) against AHI ≥ 5 and more sensitive (less specific) against AHI ≥ 30, a trend also seen in previous validation studies [32]. Although a formal mathematical proof of this relation is beyond the scope of this study, one should note that changes in the AHI cut-off and the proxy cut-off have opposite effects on the balance between sensitivity and specificity.
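The opposite effects of the two cut-offs can be seen in a small toy example. The (AHI, score) pairs below are invented purely for illustration and are not study data; the helper name `sens_spec` is likewise hypothetical. The sketch shows one data set in which the pattern holds and is not a proof that it holds in general.

```python
# Hypothetical (AHI, proxy score) pairs, invented purely for illustration.
toy = [(2, 0), (3, 1), (4, 2), (6, 1), (8, 2), (12, 3),
       (16, 2), (18, 3), (22, 4), (35, 3), (40, 5), (55, 6)]


def sens_spec(data, ahi_cutoff, proxy_cutoff):
    """Sensitivity and specificity of `score >= proxy_cutoff` against `AHI >= ahi_cutoff`."""
    tp = fn = tn = fp = 0
    for ahi, score in data:
        is_case = ahi >= ahi_cutoff
        flagged = score >= proxy_cutoff
        if is_case and flagged:
            tp += 1
        elif is_case:
            fn += 1
        elif flagged:
            fp += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp)


# Raising the AHI cut-off at a fixed proxy cut-off of 3.
for ahi_cutoff in (5, 15, 30):
    se, sp = sens_spec(toy, ahi_cutoff, proxy_cutoff=3)
    print(f"AHI >= {ahi_cutoff:2d}, score >= 3: sensitivity={se:.2f}, specificity={sp:.2f}")

# Raising the proxy cut-off at a fixed AHI cut-off of 15.
for proxy_cutoff in (2, 3, 4):
    se, sp = sens_spec(toy, 15, proxy_cutoff)
    print(f"AHI >= 15, score >= {proxy_cutoff}: sensitivity={se:.2f}, specificity={sp:.2f}")
```

In this toy data, a higher AHI cut-off leaves only the more severe cases, which the proxy flags more readily (sensitivity rises), while milder cases that the proxy still flags are reclassified as non-cases (specificity falls). A higher proxy cut-off does the reverse.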

4.5. Strengths and Limitations

A major strength of this study is its population-based recruitment of 1,201 HUNT4 participants. However, the sequential recruitment of participants via those who underwent the interview [21–23], and the joint invitation to the PSG study and a nerve conduction study, may have lowered the participation rate to 7% of the initial 1,201 HUNT4 participants. By comparison, we achieved an 18% participation rate for ambulatory PSG alone in the HUNT3 PSG study [48]. Selection bias is most evident in the over-representation of women, the elderly, and persons with insomnia (interview focus). On the other hand, an enrichment of subjects with sleep health issues ensured an adequate number of OSA cases from the 84 PSGs.

Regarding the PSG procedure itself, one major strength was the analysis using both the recommended and the optional AASM criteria. While there was a considerable delay between questionnaire completion and the PSG for many participants, sleep questionnaires like the Pittsburgh Sleep Quality Index seem to be reliable across several months [49], hypertension and untreated OSA can be considered stable traits (OSA prevalence increasing very slowly until age 65) [5], and BMI was calculated during the PSG setup. Although the AHI is known to vary between consecutive nights, suggesting repeated PSGs for a clinical diagnosis [50], estimated AHI night-to-night reliability is high in most studies [51]. Using a single night of PSG might be considered a weakness, but the so-called “first night effect” appears minimal for ambulatory PSG recordings [52, 53].

5. Conclusion

In this population-based sample, we found good validity of a new seven-item TASC proxy for OSA against AHI ≥ 15, using a PSG-based gold-standard with the recommended AASM criteria. Validity was similar against AHI ≥ 30, but lower against AHI ≥ 5 and against the more conservative, optional AASM criteria.

Sensitivity and overall validity were higher among men compared with women and in those above versus below 50 years of age. A seven-item TASC proxy for OSA should accordingly be useful in epidemiological studies. Researchers and clinicians should note how sensitivity, specificity, and validity vary by cut-off, polysomnographic criteria, and demographic strata.

Bibliography

1. Lee W., Lee S. A., Ryu H. U., Chung Y. S., Kim W. S. Quality of life in patients with obstructive sleep apnea: relationship with daytime sleepiness, sleep quality, depression, and apnea severity. Chronic Respiratory Disease. 2016;13(1):33–39. doi:10.1177/1479972315606312. PMID: 26396158. PMCID: PMC5720196.
2. American Academy of Sleep Medicine. Sleep-related breathing disorders. In: International Classification of Sleep Disorders. 3rd ed. Darien, IL: American Academy of Sleep Medicine; 2023:63–86.
3. Drager L. F., McEvoy R. D., Barbe F., Lorenzi-Filho G., Redline S., INCOSACT Initiative (International Collaboration of Sleep Apnea Cardiovascular Trialists). Sleep apnea and cardiovascular disease: lessons from recent trials and need for team science. Circulation. 2017;136(19):1840–1850. doi:10.1161/CIRCULATIONAHA.117.029400. PMID: 29109195. PMCID: PMC5689452.
4. Tregear S., Reston J., Schoelles K., Phillips B. Obstructive sleep apnea and risk of motor vehicle crash: systematic review and meta-analysis. Journal of Clinical Sleep Medicine. 2009;5(6):573–581. doi:10.5664/jcsm.27662. PMID: 20465027. PMCID: PMC2792976.
5. Senaratna C. V., Perret J. L., Lodge C. J., et al. Prevalence of obstructive sleep apnea in the general population: a systematic review. Sleep Medicine Reviews. 2017;34:70–81. doi:10.1016/j.smrv.2016.07.002. PMID: 27568340.
6. Young T., Palta M., Dempsey J., Skatrud J., Weber S., Badr S. The occurrence of sleep-disordered breathing among middle-aged adults. The New England Journal of Medicine. 1993;328(17):1230–1235. doi:10.1056/NEJM199304293281704. PMID: 8464434.
7. Duce B., Milosavljevic J., Hukins C. The 2012 AASM respiratory event criteria increase the incidence of hypopneas in an adult sleep center population. Journal of Clinical Sleep Medicine. 2015;11(12):1425–1431. doi:10.5664/jcsm.5280. PMID: 26285111. PMCID: PMC4661335.
8. Hirotsu C., Haba-Rubio J., Andries D., et al. Effect of three hypopnea scoring criteria on OSA prevalence and associated comorbidities in the general population. Journal of Clinical Sleep Medicine. 2019;15(2):183–194. doi:10.5664/jcsm.7612. PMID: 30736872. PMCID: PMC6374086.