The 83 symptoms of tinnitus: Content overlap of commonly used scales for tinnitus burden
Milena Engelke, Jorge Piano Simões, Berthold Langguth, Winfried Schlee, Laura Basso, Paul Delano

TL;DR
This study compares eight questionnaires used to assess tinnitus burden and finds they measure different symptoms, suggesting a lack of consistency in how tinnitus is evaluated.
Contribution
The study provides a comprehensive analysis of content overlap among tinnitus PROMs, revealing high heterogeneity and limited consistency.
Findings
83 distinct symptoms were identified across the eight PROMs.
The TQ had the highest number of unique symptoms, while the THI had the fewest.
The TFI showed the highest mean overlap with other PROMs.
Abstract
Clinical management of tinnitus remains challenging due to unclear etiology and diverse phenotypic manifestations. To quantify its associated burden, a variety of patient-reported outcome measures (PROMs) are used. This study aims to comprehensively evaluate the content overlap of items between eight PROMs commonly used in tinnitus research. A two-stage, blinded multi-rater process was used to analyze the content of all 199 items from the International Tinnitus Inventory (ITI), Subjective Tinnitus Severity Scale (STSS), Tinnitus Functional Index (TFI), Tinnitus Handicap Inventory (THI), Tinnitus Handicap Questionnaire (THQ), Tinnitus Primary Function Questionnaire (TPFQ), Tinnitus Questionnaire (TQ), and Tinnitus Reaction Questionnaire (TRQ). The Jaccard Index was used to measure pairwise content overlap between scales. The analysis revealed 83 distinct symptoms. “Concentration” was…
Introduction
Tinnitus is a condition characterized by the perception of sounds without a corresponding external stimulus. The condition affects 14.4% of the world population, and 2.3% of the population is debilitated by the condition [1]. The etiology of tinnitus remains incompletely understood, and although several risk factors have been associated with it, the evidence for these factors is considered low. However, hearing loss, either due to presbycusis or insult to the auditory system, is considered the main risk factor [2,3]. There are several neurophysiological models that explain the origin and maintenance of tinnitus [4]. A main challenge in disentangling the etiological underpinnings of tinnitus remains a better understanding of its heterogeneity. Given the variable clinical presentation of tinnitus, transitioning from understanding its manifestations in an individual patient to assessing its impact on his/her daily life is crucial for effective management.
Tinnitus heterogeneity has been implicated not only in etiological variability, but also in its diverse clinical manifestation. For instance, tinnitus can be accompanied by several comorbidities, all of which may require clinical care to mitigate disease burden and psychological distress [2,5–7]. As there is no established biomarker, researchers and clinicians usually resort to patient-reported outcome measures (PROMs) to quantify tinnitus burden. A recent systematic literature review identified 33 tinnitus-related PROMs, spanning constructs like coping, acceptance, catastrophizing and hearing problems [8]. Of these, eight measured the construct “tinnitus burden”, also operationalized as tinnitus distress, tinnitus severity, negative impact of tinnitus, tinnitus handicap and tinnitus-related complaints. Next to tinnitus loudness, tinnitus distress is the primary outcome domain most often reported in clinical trials [9], and therefore a crucial component to measure psychological burden and to evaluate the effectiveness of clinical interventions. Commonly used instruments in tinnitus research and clinical practice to measure tinnitus-related burden include the Tinnitus Handicap Inventory (THI) [10], the Tinnitus Functional Index (TFI) [11], the Tinnitus Questionnaire (TQ) [12] and the Tinnitus Reaction Questionnaire (TRQ) [13]. Thus, different instruments are applied for the quantification of tinnitus-related burden, with the THI and TFI being the most widely used instruments [9].
The lack of a gold-standard instrument can be problematic as authors seldom justify why a specific questionnaire was chosen to be administered. Legacy data, historical gold standards in a subfield, the country/region where the study was conducted, language availability, copyright access, time frame of the study and type of intervention are practical aspects which may play a role in employing certain questionnaires [9], although the justification for choosing a specific questionnaire is not reported in most cases. This gives the impression of an implicit, albeit untested, assumption that PROMs claiming to measure the same construct can be used interchangeably; not only in the tinnitus field, but across health research in general [14,15]. However, there is mounting evidence that PROMs used in psychiatric and psychosomatic research do not capture similar constructs, as indicated by non-overlapping classifications of severity [16] or different factor structures [17].
Previous research has shown that total scores of PROMs measuring tinnitus-related burden are strongly correlated (TFI, THI, THQ [Tinnitus Handicap Questionnaire] [18], TQ, TRQ) [19]. Boecking and colleagues showed that the intraclass correlation of total scores of the TQ, TFI, and THI ranges between 0.72 and 0.83 [20]. However, even if total scores of PROMs correlate, it remains unclear whether they measure the same construct. As shown by Fried, high correlation between total scores can be achieved between scales with disparate items, highlighting the relevance of also evaluating the content of individual items [15].
In this study, we investigated the extent to which item content overlaps between tinnitus-burden PROMs. To do so, we utilized the same framework proposed by Fried [15]. Thus, we posit that PROMs measuring tinnitus burden are only interchangeable to the extent that their item content exhibits overlap.
Methods
Data
A previous systematic review identifying questionnaires in otology was consulted to pick all relevant PROMs measuring the construct “tinnitus burden”, also operationalized as tinnitus distress, tinnitus severity, negative impact of tinnitus, tinnitus handicap and tinnitus-related complaints [8], which resulted in the inclusion of the following eight questionnaires: The International Tinnitus Inventory (ITI) [21], the Subjective Tinnitus Severity Scale (STSS) [22], the Tinnitus Functional Index (TFI) [11], the Tinnitus Handicap Inventory (THI) [10], the Tinnitus Handicap Questionnaire (THQ) [18], the Tinnitus Primary Function Questionnaire (TPFQ) [23], the Tinnitus Questionnaire (TQ) [12], and the Tinnitus Reaction Questionnaire (TRQ) [13]. Throughout this manuscript, the terms PROM, questionnaire and scale are used interchangeably.
Questionnaires
The 8-item ITI was designed as a streamlined instrument for use in clinical settings to shed light on the predominant complaints of tinnitus [21]. The STSS consists of 16 items and was developed to quantify tinnitus severity within a single score [22]. The 25-item TFI measures severity and negative impact of tinnitus and has been developed with a specific focus on responsiveness to treatment effects [11]. The 25-item THI is one of the most widespread tinnitus PROMs with three subscales reflecting functional, emotional and catastrophic responses to tinnitus [10]. The THQ has 27 items with three underlying factors addressing the patients’ physical, emotional and social health, their hearing ability and their view on tinnitus [18]. The TPFQ is a 20-item questionnaire that queries impairment due to tinnitus in the domains emotion, hearing, sleep and concentration [23]. The TQ is the earliest questionnaire in this series and also the lengthiest one, with 52 items measuring complaints resulting from tinnitus [12]. The TRQ consists of 26 items and was designed to assess psychological stress associated with tinnitus [13]. All questionnaires have been psychometrically validated [19–24].
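The item counts given above can be checked against the totals used elsewhere in the paper. The following is a minimal Python sketch (the original analysis was done in R); the dictionary name `ITEM_COUNTS` is an illustrative choice, while the numbers are taken directly from the questionnaire descriptions.

```python
# Item counts per questionnaire, as stated in the descriptions above.
ITEM_COUNTS = {
    "ITI": 8,
    "STSS": 16,
    "TFI": 25,
    "THI": 25,
    "THQ": 27,
    "TPFQ": 20,
    "TQ": 52,
    "TRQ": 26,
}

# Total number of items rated across all eight questionnaires.
total_items = sum(ITEM_COUNTS.values())

# Mean length of the seven questionnaires other than the TQ.
other_counts = [n for name, n in ITEM_COUNTS.items() if name != "TQ"]
mean_other_length = sum(other_counts) / len(other_counts)

print(total_items)        # 199
print(mean_other_length)  # 21.0
```

The total matches the 199 items analyzed, and the mean of 21 items matches the figure the Discussion uses when comparing the TQ with the other seven PROMs.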
Procedure
After the relevant questionnaires had been identified, their item content was assessed in a two-stage process. We used the English version of all questionnaires in the subsequent analysis. In the first stage, the three raters [JPS], [ME] and [LB] individually labelled each of the 199 items with keywords best describing its content. The raters were blinded to each other’s labels and were instructed to identify short keywords to describe each item, and to review their labels after at least 24 hours. Individual ratings took place between October and December 2023. The second stage of labelling took place immediately after the first stage: all three raters compared their labels and reached consensus where necessary. Within this second stage, labels were also systematically compared within and between questionnaires to ensure the use of uniform labels for analogous items, adhering to the conservative methodology outlined by Fried [15]. To give some examples, items were labelled uniformly if they were reverse-coded (hearing clearly vs. hearing difficulty), belonged to the same category (job vs. household responsibilities) or were weighted differently (feeling ill vs. feeling terribly diseased). This consensus version was used for the final analysis. All individual rater labels, as well as the final labels, are publicly available (https://doi.org/10.5281/zenodo.17854750). For consistency with the protocol developed by Fried [15], we employ the term “symptom” to designate the label of the items.
Further, to regroup the identified symptoms, we implemented a semi-automated approach: First, a large language model was applied three times to categorize the symptoms. Second, a research team consisting of tinnitus experts and psychologists (ME, JPS, LB) reviewed the three versions to develop the final categories. Please note that it was the same team that rated the items in the first stage, as familiarity with the content of the questionnaires was helpful when regrouping. For the first step, we prompted ChatGPT (version 3.5) three times on 2 January 2024 to categorize the symptoms by content criteria (prompt: “Here’s a list of keywords, each presented at a different row: [list of symptoms]. Please cluster them according to similar topics. All words should be assigned to only one topic”). The generated categories (first round: emotional well-being, emotional distress, attitude and perception, health and physical symptoms, communication and support; second round: sleep and sleep-related issues, auditory challenges, emotional well-being, general health issues, coping mechanisms, social and relationship impact, existential and emotional struggles, enjoyment and activities, personal development and well-being; third round: sleep-related issues, auditory disturbances, coping mechanisms, emotional distress, quality of life impact, personal perception and attitude, enjoyment and relaxation) were reviewed and synthesized by the authors and used as the basis for the final categorization, with additional consideration of categories commonly used in the literature. The established categories were then used to visualize the overlapping latent structure of the questionnaires.
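The prompt quoted above asks that every keyword be assigned to exactly one topic. As a hedged illustration only, the Python sketch below builds the prompt text from a list of symptoms and checks a returned clustering against that constraint. The helper names (`build_prompt`, `assignment_problems`), the row formatting, and the example clustering are assumptions for illustration; the paper does not describe how the list was formatted or how the model output was checked, and no model is called here.

```python
from collections import Counter

# Prompt wording as quoted in the text; the placeholder is filled with the symptom list.
PROMPT_TEMPLATE = (
    "Here’s a list of keywords, each presented at a different row: {symptoms}. "
    "Please cluster them according to similar topics. "
    "All words should be assigned to only one topic"
)


def build_prompt(symptoms):
    """Insert the symptoms, one per row, into the prompt (row layout is an assumption)."""
    rows = "\n".join(symptoms)
    return PROMPT_TEMPLATE.format(symptoms="\n" + rows + "\n")


def assignment_problems(symptoms, clusters):
    """Report symptoms that are missing, assigned more than once, or not in the input list."""
    counts = Counter(label for members in clusters.values() for label in members)
    known = set(symptoms)
    return {
        "missing": sorted(s for s in known if counts[s] == 0),
        "duplicated": sorted(s for s in known if counts[s] > 1),
        "unknown": sorted(label for label in counts if label not in known),
    }


# Hypothetical example: a small symptom list and an imperfect clustering.
symptoms = ["Concentration", "Annoyed", "Headache", "Relaxing"]
clusters = {
    "Emotional distress": ["Annoyed", "Relaxing"],
    "Enjoyment and relaxation": ["Relaxing"],
    "Health and physical symptoms": ["Headache", "Dizziness"],
}
problems = assignment_problems(symptoms, clusters)
print(problems)
```

A check of this kind only flags violations of the single-assignment instruction; deciding on the final categories remained an expert judgment in the study.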
Statistical methods
Following the item rating, the Jaccard Index was used to measure pairwise content overlap between items from the questionnaires. This index is commonly used for binary data and ranges from 0 to 1, with 0 indicating no overlap, and 1 indicating complete overlap between items. It can be calculated with the following formula:
$$J = \frac{s}{u_1 + u_2 + s}$$
Here, $s$ represents the number of items that two questionnaires share, while $u_1$ and $u_2$ represent the number of items that are unique to each questionnaire. According to Fried [15], the following categorization of the Jaccard Index is adopted: very weak 0–0.19, weak 0.20–0.39, moderate 0.40–0.59, strong 0.60–0.79, and very strong 0.80–1 [25].
All the analyses were performed in R [version 4.2.2] [26] and based on the script provided by Fried [15].
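For readers who want to reproduce the index outside R, the following is a minimal Python sketch of the Jaccard Index and the strength categories described above. It is not the script by Fried [15]; the function names and the two example scales are hypothetical, and treating the category bounds as half-open intervals (for example, weak as 0.20 up to but not including 0.40) is an assumption.

```python
def jaccard_index(symptoms_a, symptoms_b):
    """Jaccard Index s / (u1 + u2 + s) for two collections of symptom labels."""
    a, b = set(symptoms_a), set(symptoms_b)
    shared = len(a & b)    # s: symptoms present in both scales
    unique_a = len(a - b)  # u1: symptoms only in the first scale
    unique_b = len(b - a)  # u2: symptoms only in the second scale
    total = shared + unique_a + unique_b
    return shared / total if total else 0.0


def overlap_strength(j):
    """Map a Jaccard Index to the verbal categories adopted in the text."""
    if not 0.0 <= j <= 1.0:
        raise ValueError("Jaccard Index must lie between 0 and 1")
    if j < 0.20:
        return "very weak"
    if j < 0.40:
        return "weak"
    if j < 0.60:
        return "moderate"
    if j < 0.80:
        return "strong"
    return "very strong"


# Hypothetical example scales (not the real questionnaire contents).
scale_a = {"Concentration", "Annoyed", "Relaxing", "Depression"}
scale_b = {"Concentration", "Annoyed", "Headache"}

j = jaccard_index(scale_a, scale_b)  # s = 2, u1 = 2, u2 = 1
print(j, overlap_strength(j))
```

Working on sets of consensus labels means that analogous items within one questionnaire count only once, which mirrors the adjusted scale lengths used in the analysis.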
Results
The rating procedure of the content of the 199 items from eight PROMs measuring tinnitus burden resulted in 83 different symptoms (see Fig 1). Labeling of items revealed analogous items within the same questionnaire for five questionnaires (STSS, THQ, TPFQ, TQ, TRQ), which resulted in a total length of 174 adjusted items (see Table 1). Comparing the item labels between questionnaires resulted in a final symptom list of 83 symptoms. Of those, 41 were idiosyncratic and appeared in only one scale. In relation to the scale length, the THI had the fewest idiosyncratic symptoms (N = 1; 4%), while the TQ had the most idiosyncratic symptoms (N = 21; 52.5%).
Table 1: Distribution of symptoms per scale.
Fig 1: Occurrence of 83 tinnitus burden symptoms across eight questionnaires. The symptoms were regrouped and labeled according to a semi-automated approach to aid visualization. ITI (International Tinnitus Inventory), STSS (Subjective Tinnitus Severity Scale), TFI (Tinnitus Functional Index), THI (Tinnitus Handicap Inventory), THQ (Tinnitus Handicap Questionnaire), TPFQ (Tinnitus Primary Function Questionnaire), TQ (Tinnitus Questionnaire), TRQ (Tinnitus Reaction Questionnaire). Auditory D. = Auditory Disturbances.
From a symptom-level perspective, a symptom was measured on average in two scales (Median = 2, SD = 1.5). 41 symptoms (49.4%) occurred only in one scale, while one symptom (1.2%) appeared in seven scales (Concentration) and three symptoms (3.6%) were identified in six scales (Annoyed, Falling/staying asleep and Enjoying life). Anxiety/worry, Depression and Relaxing appeared in five scales. There was no symptom that was featured across all eight scales (see Table 2).
Table 2: Number of symptoms that appeared across number of scales.
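The symptom-level counts reported above (adjusted scale length, number of scales per symptom, idiosyncratic symptoms) can be derived from the consensus labels with a few set operations. The Python sketch below uses hypothetical toy data with three made-up scales; the variable names are illustrative and the labels are not the published ratings.

```python
from collections import Counter

# Hypothetical consensus labels per item; a repeated label marks analogous items.
scale_items = {
    "Scale A": ["Concentration", "Annoyed", "Hearing difficulty", "Hearing difficulty"],
    "Scale B": ["Concentration", "Headache"],
    "Scale C": ["Concentration", "Annoyed", "Acceptance"],
}

# Collapsing analogous items within a scale gives its adjusted length.
scale_symptoms = {scale: set(labels) for scale, labels in scale_items.items()}
adjusted_lengths = {scale: len(syms) for scale, syms in scale_symptoms.items()}

# Number of scales in which each symptom appears.
scales_per_symptom = Counter(
    sym for syms in scale_symptoms.values() for sym in syms
)

# Idiosyncratic symptoms appear in exactly one scale.
idiosyncratic = sorted(sym for sym, n in scales_per_symptom.items() if n == 1)

# Share of idiosyncratic symptoms relative to each scale's adjusted length.
idiosyncratic_share = {
    scale: sum(scales_per_symptom[sym] == 1 for sym in syms) / len(syms)
    for scale, syms in scale_symptoms.items()
}

print(adjusted_lengths)
print(scales_per_symptom.most_common(1))
print(idiosyncratic)
print(idiosyncratic_share)
```

In this toy example, "Concentration" appears in all three scales, while three symptoms are idiosyncratic; the same tabulation applied to the published labels would yield the distribution summarized in Table 2.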
According to the Jaccard Index, the scale overlap between the relevant PROMs ranges from very weak to weak (0.02–0.35; see Table 3), with the highest overlap between the TFI and THI (0.35) and the lowest overlap between the ITI and TQ (0.02). The TFI had the highest mean overlap with other questionnaires (0.26), followed by the THI (0.23). The TQ showed the lowest mean overlap with other questionnaires (0.11). On average, the investigated PROMs had a Jaccard Index of 0.18 (SD = 0.05), which corresponds to a very weak mean overlap of the scales.
Table 3: Scale overlap.
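The pairwise overlaps and the mean overlap per scale reported above follow from applying the Jaccard Index to every pair of scales. The Python sketch below illustrates that computation on hypothetical toy data; the scale names and symptom sets are made up, and the original analysis used an R script rather than this code.

```python
from itertools import combinations
from statistics import mean


def jaccard(a, b):
    """Jaccard Index of two symptom sets: shared symptoms divided by all distinct symptoms."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical symptom sets for three made-up scales.
scale_symptoms = {
    "Scale A": {"Concentration", "Annoyed", "Hearing difficulty"},
    "Scale B": {"Concentration", "Headache"},
    "Scale C": {"Concentration", "Annoyed", "Acceptance"},
}

# Overlap for every unordered pair of scales.
pairwise = {
    (a, b): jaccard(scale_symptoms[a], scale_symptoms[b])
    for a, b in combinations(sorted(scale_symptoms), 2)
}

# Mean overlap of each scale with all other scales.
mean_overlap = {
    scale: mean([j for pair, j in pairwise.items() if scale in pair])
    for scale in scale_symptoms
}

# Mean overlap across all pairs.
overall_mean = mean(pairwise.values())

print(pairwise)
print(mean_overlap)
print(round(overall_mean, 2))
```

A scale with a high mean overlap, such as the TFI in the reported results, shares comparatively many symptoms with the other scales, whereas a low value indicates largely scale-specific content.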
The following 10 higher-level categories were used to group the symptoms: Emotional Distress, Auditory Disturbances, Sleep, Cognitive Disturbances, Personal Perception/Attitude, Coping, Health/Physical Symptoms, Enjoyment/Relaxation, Impact on Daily Life, Communication/Support. The allocation of symptoms and questionnaires to those categories is shown in Fig 1. Overall, the categories Emotional Distress (24 symptoms, captured by eight scales) and Personal Perception/Attitude (19 symptoms, captured by seven scales) were covered the most, while the categories Coping (three symptoms, captured by four scales) and Communication/Support (three symptoms, captured by three scales) were covered the least; see Table 4. The categories Emotional Distress, Enjoyment/Relaxation, and Sleep were the only ones captured by all eight questionnaires.
Table 4: Coverage of higher-level categories by symptoms and scales.
Discussion
In this study, we investigated the content overlap of eight questionnaires measuring tinnitus burden. We identified 83 symptoms from a total of 199 items, indicating high symptom heterogeneity and little content overlap. These findings are in line with those previously reported for depression [15], sleep disorder [27], mental health [14], trauma [28], mental pain [29], neurological soft signs [30], obsessive compulsive disorder [31], mania [32], anxiety [33], and romantic relationships [34].
The highest, yet still weak, overlap between questionnaires based on the Jaccard Index was observed between the TFI and THI (0.35), and the lowest overlap between the ITI and TQ (0.02). The TFI had the highest mean overlap with the other questionnaires (0.26), while the TQ showed the lowest mean overlap with the other questionnaires (0.11). The highest mean overlap suggests that the TFI comes closest to the content measured by all other PROMs. However, we would like to stress that this is not a direct measure of content validity, as we cannot exclude the possibility that important aspects of subjective suffering are not depicted in any of the questionnaires.
The most featured symptoms across the eight questionnaires were: concentration (present in 7 PROMs), enjoying life, falling/staying asleep, annoyed (present in 6 PROMs), anxiety/worry, depression, relaxing (present in 5 PROMs). Notably, five of these symptoms were identified as core outcome domains of interest in a previous Delphi study (“concentration”, “quality of sleep”, “tinnitus intrusiveness”, “negative thoughts/beliefs”, “mood”) [35]. Almost half of the 83 symptoms (49.4%), including, e.g., acceptance, torment, and avoiding social situations, were only featured in one PROM. The TQ contained the most idiosyncratic symptoms (21 symptoms, i.e., 52.5% of the questionnaire), which can be partially explained by its length (52 items compared to the average of 21 items among the other 7 PROMs included in the analysis). This implies that the TQ assesses many symptoms that are not featured by other PROMs.
Notably, we found surprisingly few symptoms related to somatic complaints. Of the investigated PROMs, only the TQ featured a few somatic symptoms (Pain in ear/head, Headache, Muscle tension). The relation of somatic symptoms with tinnitus burden is well established in the literature, with a previous study reporting 42% of patients with somatic symptom disorder also suffering from tinnitus [36]. For instance, dizziness and hyperacusis are commonly occurring somatic comorbidities which, despite their clinical relevance [37,38], are not part of any of the PROMs investigated. Likewise, the relationship between tinnitus and pain has been previously established in empirical [39] and theoretical [40,41] works. Moreover, patients with hyperacusis and chronic pain were found to have higher TFI scores [42], highlighting the impact of somatic symptoms on tinnitus burden even without being directly measured. However, it remains an open question whether somatic symptoms directly characterize tinnitus burden and should thus be incorporated into a PROM assessing tinnitus burden, or whether these instruments should focus strictly on tinnitus-specific burden and not assess any related (somatic) comorbidities.
Our finding, indicating that tinnitus burden is not measured as a unitary construct but often encompasses various idiosyncratic symptoms, carries another significant implication. Previous research investigated tinnitus heterogeneity in terms of the diverse acoustic presentation, the broad spectrum of comorbidities or unique sociodemographic risk factors [43,44]. In addition to the interindividual variability in tinnitus-related symptoms, the use of different outcome measures might lead to different study results. This may contribute to the empirical findings of low consistency between responder rates among different PROMs [20,45], although there may be additional explanations. A practical consequence of this finding is that tinnitus-burden PROMs should not be assumed to be interchangeable.
Based on our analysis, the overlap among the PROMs under investigation ranges from weak to very weak. Consequently, comparing results across trials that employ different outcome measures should either be avoided or approached with great caution. Currently, the THI and the TFI are the most frequently utilized PROMs in clinical tinnitus trials [9]. Standardizing the use of these PROMs in future studies could enhance the comparability of results. Additionally, when evaluating interventions targeting specific symptom domains, it may be advantageous to incorporate supplementary questionnaires that are designed to assess those particular domains more comprehensively.
The interpretation of this analysis should take into account the subjective nature of the rating process, acknowledging that different raters may have arrived at alternative conclusions. Nevertheless, adhering to the methodology outlined by Fried [15], we maintained a conservative approach, suggesting that the number of symptoms collected by the analyzed PROMs is likely underestimated. Simultaneously, a different set of PROMs analyzed would have led to another result. The selection of PROMs was based on their inclusion in a comprehensive review, and we posit that the chosen PROMs effectively represent the questionnaires commonly employed for measuring tinnitus burden. In addition, the items were not checked for completeness, i.e., whether they comprehensively measure the construct tinnitus burden, but only according to their frequency and overlap in PROMs. It’s essential to also recognize that questionnaire data provide only snapshots in time, failing to capture the dynamic and fluctuating nature of tinnitus symptoms along with their associated burden [46]. Moreover, identified symptoms were clustered by topic in a bottom-up manner (see Fig. 1). Future research could compare these data-driven categories with conceptual categories from PROMs.
This work is further limited by the lack of comparison with empirically reported symptoms of tinnitus burden. A recent study labeled and categorized answers from 678 patients to the question “Why is tinnitus a problem?” [47]. The authors identified 18 problem domains, of which “Reduced quality of life” was most frequently represented. While this symptom is not directly covered by the PROMs analyzed here, seven items related to the impact on daily life were identified across seven PROMs. Following that, “Fear,” “Constant Awareness”, “Annoyance,” and “Inability to Concentrate” accounted for the majority of reports. Notably, while concentration is collected by almost every tinnitus burden PROM, awareness is only incorporated in three out of eight questionnaires. Further, a patient survey found loudness reduction to be most relevant from the patients’ perspective, which was only featured across three PROMs (TFI, TQ, STSS) [48]. A systematic comparison between symptoms sampled by PROMs and those reported by patients could establish external validity, offering valuable insights for stakeholders in the selection of PROMs. We believe that the patients’ perspective is crucial for the development of relevant outcome measurements.
Conclusion
As demonstrated, we found considerable symptom heterogeneity and limited content overlap across tinnitus burden PROMs. The highest overlap, albeit weak, was found between the TFI and the THI. This finding has important practical implications: tinnitus-burden PROMs should not be assumed to be interchangeable. Consequently, we strongly encourage researchers and clinicians to make informed, domain-based and patient-centered decisions when selecting PROMs.
References
- 1. Jarach CM, Lugo A, Scala M, van den Brandt PA, Cederroth CR, Odone A, et al. Global Prevalence and Incidence of Tinnitus: A Systematic Review and Meta-analysis. JAMA Neurol. 2022;79(9):888–900. doi: 10.1001/jamaneurol.2022.2189. PMID: 35939312; PMCID: PMC9361184.
- 2. Baguley D, McFerran D, Hall D. Tinnitus. Lancet. 2013;382(9904):1600–7. doi: 10.1016/S0140-6736(13)60142-7. PMID: 23827090.
- 3. Biswas R, Genitsaridi E, Trpchevska N, Lugo A, Schlee W, Cederroth CR, et al. Low Evidence for Tinnitus Risk Factors: A Systematic Review and Meta-analysis. J Assoc Res Otolaryngol. 2023;24(1):81–94. doi: 10.1007/s10162-022-00874-y. PMID: 36380120; PMCID: PMC9971395.
- 4. Tyler RS. Neurophysiological models, psychological models, and treatments for tinnitus. Tinnitus treatment: clinical protocols. New York: Thieme. 2006:1–22.
- 5. De Ridder D, Schlee W, Vanneste S, Londero A, Weisz N, Kleinjung T, et al. Tinnitus and tinnitus disorder: Theoretical and operational definitions (an international multidisciplinary proposal). Prog Brain Res. 2021;260:1–25. doi: 10.1016/bs.pbr.2020.12.002. PMID: 33637213.
- 6. Fuller T, Cima R, Langguth B, Mazurek B, Vlaeyen JW, Hoare DJ. Cognitive behavioural therapy for tinnitus. Cochrane Database Syst Rev. 2020;1(1):CD012614. doi: 10.1002/14651858.CD012614.pub2. PMID: 31912887; PMCID: PMC6956618.
- 7. Tyler R, Perreauf A, Mohr A-M, Ji H, Mancini PC. An Exploratory Step Toward Measuring the “Meaning of Life” in Patients with Tinnitus and in Cochlear Implant Users. J Am Acad Audiol. 2020;31(4):277–85. doi: 10.3766/jaaa.19022. PMID: 31580805.
- 8. Viergever K, Kraak JT, Bruinewoud EM, Ket JCF, Kramer SE, Merkus P. Questionnaires in otology: a systematic mapping review. Syst Rev. 2021;10(1):119. doi: 10.1186/s13643-021-01659-9. PMID: 33879248; PMCID: PMC8059288.
