The Effectiveness of Ultrasound Visual Biofeedback in Articulation Therapy for Children and Adolescents With Speech Sound Disorders: A Systematic Review and Meta‐Analysis
Oi Yan Yiu, Wilfred Hing Sang Wong, Felicia Pokta, Agnes Yun Ting Fong, Winnie Wan Yee Tso, Patrick Ip

TL;DR
This study reviews and analyzes how well ultrasound visual biofeedback helps children and teens with speech disorders improve their articulation.
Contribution
The paper provides a novel meta-analysis of U-VBF effectiveness in SSD therapy, offering a quantitative benchmark for future research.
Findings
A pooled effect size of 1.32 indicates substantial improvement in articulation accuracy with U-VBF.
High heterogeneity and potential publication bias suggest variability in study outcomes.
SSD subtype-specific patterns and intensity influences were identified, but limited by small sample sizes.
Abstract
Speech sound disorders (SSD) may compromise speech intelligibility in children and adolescents, impacting communication, social interactions and academic performance. Ultrasound visual biofeedback (U‐VBF) has emerged as a promising tool for articulation therapy, providing real‐time tongue movement visualization, yet its evidence base requires updating given methodological advancements. This systematic review and meta‐analysis updates the evidence for the effectiveness of the use of U‐VBF in improving articulation accuracy for paediatric SSD, synthesizing recent studies with robust designs to quantify outcomes and inform clinical adoption. Following PRISMA guidelines, a preregistered review (PROSPERO: CRD42024627408) involved searching PubMed, Web of Science, CINAHL Plus, Cochrane Library and Linguistics and Language Behaviour Abstracts (January 2000–July 2025) for studies on U‐VBF in…
Genes, proteins, chemicals, diseases, species, mutations and cell lines named across the full text — each resolved to its canonical identifier and authoritative record.
Click any figure to enlarge with its caption.
FIGURE 1
FIGURE 2
FIGURE 3| Study (Country) | Study design | Participants | Treatment details | |
|---|---|---|---|---|
|
| Approaches, target | Intensity | ||
| Bernhardt et al. | Single‐group pre‐post case series |
3M, 1F |
U‐VBF, EPG /s/, /ʃ/, /ɹ/, /l/, /i/, /ɪ/, /u/, /ʊ/ |
|
| Bernhardt et al. | Individual case study design |
3M, 1F |
U‐VBF, EPG /s/, /ʃ/, /ɹ/, /l/, /i/, /ɪ/, /u/, /ʊ/ |
|
| Adler‐Bock et al. | Individual case study design |
2M |
U‐VBF with traditional therapy /r/ |
|
| Bacsfalvi et al. | Single‐group pre‐post case series |
2M, 1F |
U‐VBF, EPG Feedback Therapy /i/, /ɪ/, /u/, /ʊ/ |
|
| Bernhardt et al. | Case series (BCB research design) |
8M, 5F |
U‐VBF consultation, traditional treatment /ɹ/, sibilants |
|
| Modha et al. | SCED—Alternating treatment design |
|
U‐VBF, traditional therapy /ɹ/ |
Dose: Trials not reported, 30–45 min session Dose frequency: 1 time/week Total intervention duration: 9 weeks (9 sessions) |
| Bacsfalvi | SCED—Multiple baseline design |
1M, 2F |
U‐VBF /r/ |
|
| Lipetz and Bernhardt | SCED—Single‐subject two‐phase case study |
|
U‐VBF with traditional therapy hierarchies, U‐VBF with palatrography and voice training /s/, /z/, /ʃ/, /ʧ/ |
|
| Preston et al. | SCED—Multiple baseline design |
6M |
U‐VBF with traditional approaches sound sequences |
|
| Byun et al. | SCED—Multiple baseline design |
2M, 2F |
U‐VBF rhotics |
|
| Byun et al. | SCED—Multiple baseline design |
2M, 2F |
U‐VBF rhotics |
|
| Cleland et al. | Case series |
5M, 2F |
U‐VBF with articulation hierarchy or motor‐based approach /k/, /g/, /r/, /ʃ/, /t/ |
|
| Hitchcock and Byun | Single‐case study design |
|
U‐VBF guided by CPF /r/ |
|
| Lee et al. | Case study |
|
U‐VBF /r/ |
|
| Bressmann et al. | Randomized controlled trial |
5M, 1F |
U‐VBF, articulation therapy /ɹ/ |
|
| Heng et al. | SCED—Multiple baseline design |
1M, 1F |
U‐VBF /k/, /g/ |
|
| Preston et al. | Case series |
3M |
U‐VBF with chaining, auditory perception training /r/, /s/, /tʃ/ |
|
| Preston et al. | SCED—Multiple baseline design |
3M |
U‐VBF /ɹ/ |
|
| Roxburgh et al. | Single‐case study design |
|
U‐VBF with articulation hierarchy or motor‐based approach /n/, velars |
|
| Sjolie et al. | SCED—Single‐subject randomized block design |
not reported |
U‐VBF with chaining procedures /ɹ/ |
|
| Preston et al. | SCED—Multiple baseline design |
8M, 4F |
U‐VBF with principles of motor learning (PML) /ɹ/ |
|
| Preston et al. | SCED—Alternating‐treatments, single‐subject experimental design |
4M, 2F |
U‐VBF, Auditory Perception Training /ɹ/, /s/ |
|
| Preston et al. | SCED—Multiple baseline design |
12M |
U‐VBF guided by CPF Vocalic /ɹ/ |
|
| Cleland et al. | Case series |
12M, 3F |
U‐VBF with motor‐based therapy approach /k/, /ɛ/, /ʃ/, /s/ |
|
| Preston et al. | SCED—Multiple baseline design |
9M, 3F |
U‐VBF guided by CPF. /ɹ/ |
|
| Preston et al. | Randomized controlled trial |
Randomized: 23M, 15F Analyzed: n = 36, not reported |
U‐VBF guided by CPP. /ɹ/ |
|
| Benway et al. | SCED—Single‐case randomization design |
3M, 4F |
U‐VBF with visual‐acoustic biofeedback, traditional visual and auditory placement cues /ɹ/ |
|
| Cleland and Scobbie | Single‐group pre‐post case series |
3M, 2F |
U‐VBF with traditional visual and auditory placement cues /k/ |
|
| Gibson and Lee | SCED—Single‐subject experimental, multiple baseline design |
|
U‐VBF, Motor‐phonetic articulation approach /k/, /g/ |
|
| Raaz et al. | Case report |
|
U‐VBF guided by SAILS and CPP /ɹ/ |
|
| McAllister et al. | Single‐group pretest‐posttest design |
not reported |
U‐VBF with traditional treatment guided by CPP /ɹ/ |
|
| McCabe et al. | Randomized controlled trial |
Enrolled:
11M, 3F Analysed:
|
U‐VBF with motor chaining procedures, ReST. Not reported |
|
| Spencer et al. | Control group study |
Tx group:
10 M, 6F TD group:
9M, 8F |
U‐VBF guided by CPF /ɹ/ |
|
| Preston et al. | Randomized controlled trial |
Enrolled:
39M, 17F Analysed:
|
U‐VBF with motor chaining Not reported |
|
| Hashemi Hosseinabad and Xing | SCED—Single‐subject multiple baseline experiment |
2M, 3F |
U‐VBF with motor‐based therapy /r/, /s/, /sh/ |
|
| Study | Outcomes | Quality rating | Meta‐Analysis | |
|---|---|---|---|---|
| eligible (Y/N) | ||||
| Outcome measures | Treatment effect | |||
| Bernhardt et al. | TSA for treated and untreated consonants in single words | TSA improved significantly for treated (36%) versus untreated (15%) consonants across participants. No significant difference in effectiveness between EPG and ultrasound technologies. | 10/30 | N |
| Bernhardt et al. | Perceptual ratings on consonants and vowels in nonsense word samples | Significantly improved ratings noted in some participants or sounds. | 9/30 | N |
| Adler‐Bock et al. | TSA for targeted words at word and phrase levels | TSA increased at phrase level from pre‐ to post‐treatment for both participants. | 9/30 | N |
| Bacsfalvi et al. | Transcription ratings for trained and untrained vowels in CVC words | Vowel production improved toward adult English targets, with changes in transcription ratings for some vowels. | 7/30 | N |
| Bernhardt et al. | TSA for single words of standardized test containing target sounds | TSA improved in eleven participants. Greater gains were observed in participants having more consultation time with ultrasound. | 8/30 | N |
| Modha et al. | TSA for targeted consonants in probes of various context, pre‐, mid‐, and post‐treatment | TSA improved throughout treatment, with or without ultrasound. | 7/30 | N |
| Bacsfalvi | TSA for targeted consonants in word and syllable probes | Significant /r/ TSA improvement at word level observed in one participant only. | 8/30 | Y |
| Lipetz and Bernhardt | TSA of target words in tasks at different linguistic level from single word to spontaneous narrative production | Similar production accuracy after Phase I, while improvement noted after Phase II and mostly maintained in 4‐months follow‐up time. | 8/30 | Y |
| Preston et al. | TSA for treated and untreated sound sequences and PPC, before and within treatment and 2‐month follow‐up | All participants achieved 80% accuracy for 2 consecutive sessions for some treated sound sequences, maintained at 2 months. PPC increased significantly from pre‐treatment to follow‐up. | 12/30 | Y |
| Byun et al. | TSA for treated and untreated words with targeted sounds, baseline, treatment, and maintenance | Minimal treatment effects; half of participants showed no clinically meaningful effect size. | 18/30 | Y |
| Byun et al. | TSA for treated and untreated words with targeted sounds, baseline, treatment, and maintenance | Generalization to untreated words without biofeedback observed, with large to very large effect sizes for all participants. | 18/30 | Y |
| Cleland et al. | TSA for untreated word probes, baseline, mid‐, and post‐therapy | TSA increased by 95% (range 70–100%) from baseline to 6 weeks post‐therapy. 100% non‐overlapping data across participants indicated high effectiveness. | 13/30 | N |
| Hitchcock and Byun | TSA for /r/ in word and sentence probes, baseline, treatment, maintenance, and follow‐up | /r/ TSA reached ∼100% at word and sentence levels, maintained through follow‐up. Effect sizes ranged from ‐0.82 to 37.6 for vocalic/consonantal /r/. | 16/30 | N |
| Lee et al. | TSA of target sound and non‐treatment target in one‐syllable word | Increase in production accuracy of targets across treatment and maintained for 2 weeks after treatment. | 8/30 | N |
| Bressmann et al. | Speech acceptability judgments; sound classification (perceived as target or not) | Significant improvement in final and follow‐up assessments versus intake. No significant difference between biofeedback and non‐biofeedback groups. | Some concerns | N |
| Heng et al. | TSA for treated and untreated sounds at syllable and word levels, acquisition, retention, and generalization | One participant showed acquisition and retention; no improvement in another participant. | 10/30 | Y |
| Preston et al. | TSA for targeted sounds in onset and rhyme positions | All participants showed acquisition (effect sizes 0.4–16.1); generalization and retention results were mixed across participants. | 18/30 | N |
| Preston et al. | TSA for treated and untreated words during treatment training | TSA increased in two participants during practice trials, but no generalization to untreated words observed. | 14/30 | Y |
| Roxburgh et al. | Listener judgments (perceived as closer to target); TSA for targeted consonants | Significant TSA improvement for both participants in VAM and U‐VBF therapies from baseline to maintenance. One participant showed possible deterioration in U‐VBF. | 12/30 | N |
| Sjolie et al. | TSA for targeted sounds in acquisition, retention, and generalization probes | Two participants showed acquisition; one had significant ultrasound advantage (d = 0.78). No significant retention or generalization differences between ultrasound and non‐ultrasound sessions. | 18/30 | Y |
| Preston et al. | TSA for generalization probes and sentence imitation, pre‐ and 2‐month follow‐up | TSA increased by 29.7% (PML+US) and 33.2% (PML+No US), with effect sizes of 4.9 and 6.16. Retention observed at 2‐month follow‐up (16% to 45%). | 17/30 | Y |
| Preston et al. | TSA for untrained word‐level generalization probes | TSA increased by 38.1% (prosodic) and 31.0% (non‐prosodic), with effect sizes of 14.52 and 8.31, respectively. | 19/30 | Y |
| Preston et al. | TSA for untreated word probes, baseline, treatment, and maintenance | TSA improved across all participants (median effect size 6.16). High‐frequency (HF) UVF outperformed low‐frequency (LF) UVF; treatment order (HF‐LF vs. LF‐HF) impacted gains. | 15/30 | Y |
| Cleland et al. | TSA for untreated wordlists; ICS | TSA showed variable generalization; five children had no effect, others had small to large effects. ICS increased significantly from baseline to maintenance in some children. | 19/30 | N |
| Preston et al. | TSA for untreated word probes | U‐VBF outperformed traditional treatment (effect sizes ‐2.4 to 21.75 vs. ‐0.39 to 8.78). Early U‐VBF yielded better outcomes; individual responses varied. | 17/30 | Y |
| Preston et al. | TSA for untrained word probes, pre‐, mid‐, post‐treatment, and 2‐month follow‐up | No significant difference between U‐VBF and P+U‐VBF groups; both showed 34% TSA increase, retained at 2 months. | Low risk of bias | N |
| Benway et al. | TSA for untrained word probes | Three participants showed clinically significant generalization (effect sizes 1.71–3.11) to untreated words post‐treatment. | 16/30 | Y |
| Cleland and Scobbie | TSA for treated word probes | All participants showed positive TSA changes from pre‐ to post‐intervention, with maintenance in three participants at 6 weeks post‐treatment. | 14/30 | N |
| Gibson and Lee | TSA for treated and untreated sound probes, baseline, treatment, and post‐intervention | TSA reached 80% for target sounds, with maintenance at 2 weeks and generalization at 2 months post‐intervention. | 12/30 | Y |
| Raaz et al. | TSA for trained and untrained liquid phoneme probes, baseline, mid‐, and post‐treatment | Perceptual accuracy for /ɹ/ and /l/ improved substantially, indicating generalization to untreated contexts. | 13/30 | N |
| McAllister et al. | TSA for untreated word probes with target sounds | TSA increased by 38% on average; four participants showed clinically significant responses. Low‐stimulability cluster responded strongly to ultrasound; high‐stimulability cluster gained in both treatments. | 16/30 | N |
| McCabe et al. | PPC and prosodic severity in untreated word and sentence probes, pre‐, post‐, and 1‐month follow‐up | Large PPC improvements from pre‐ to post‐treatment with ReST and U‐VBF, retained at 1 month. Prosodic severity also improved significantly. | Low risk of bias | N |
| Spencer et al. | TSA for untreated word probes with target sounds | Participants showed varied responses to U‐VBF, with mean effect size of 5.25 indicating clinically significant generalization. | 12/30 | Y |
| Preston et al. | TSA for untreated phrase probes at 5, 10, and 15 weeks | U‐VBF showed significant TSA gains at 5 weeks, not sustained at 10 weeks. Intensive treatment enhanced generalization at all time points. | Low risk of bias | N |
| Hashemi Hosseinabad and Xing | TSA for untreated word probes, baseline, mid‐, post‐therapy, and maintenance; ICS | TSA increased significantly (effect sizes 1.11–3.17); generalization varied. ICS improved post‐treatment and remained stable during maintenance. | 17/30 | Y |
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsPhonetics and Phonology Research · Language Development and Disorders · Stuttering Research and Treatment
Introduction
1
Speech sound disorders (SSD) may severely compromise one's ability to produce intelligible speech, which are associated with limitations in communication, social interactions and academic performance (McCormack et al. 2011, 2009, Foster et al. 2023). Socioemotional difficulties have been reported in school children with speech errors, with greater impact observed in older children (Hitchcock et al. 2015; Wren et al. 2023). With SSD forming a substantial part of speech therapy caseloads globally, surveys report that although speech‐language pathologists (SLPs) acknowledge the potential efficacy of instrumental methods such as ultrasound and electropalatography (EPG), their application remains minimal (McLeod and Baker 2014). Recent studies reveal growing interest in technology‐supported tools, despite concerns about over‐reliance on device screens and potential distractions from non‐therapy app features (Furlong et al. 2021), clinicians have expressed increasingly favourable attitudes toward ultrasound biofeedback, notwithstanding ongoing accessibility barriers (Dugan et al. 2023). In light of this shifting landscape, the use of ultrasound visual biofeedback (U‐VBF) has become an attractive tool that helps clients and clinicians overcome these hurdles by providing real‐time visual feedback on tongue movement and supplementary data, enabling precise cueing and adjustment and presenting a new paradigm for augmenting therapy in children with SSD.
U‐VBF constitutes an evolution in speech therapy for the management of articulation, building upon earlier visual biofeedback methods such as EPG. EPG is effective in tracking tongue–palate contact during speech production but requires custom‐fitted palates, limiting accessibility. In contrast, advancements in ultrasound technology have introduced systems with fast frame rates which are more accessible (Cleland et al. 2022), providing dynamic images of tongue movements which allow clinicians and children to visualize and adjust articulatory patterns in real time. Both techniques have unique merits—EPG excels in precise contact data, while U‐VBF facilitates broader tongue shape observation without custom hardware—yet the lower cost and ease of use of U‐VBF have contributed to its growing prominence in clinical and research settings. This is particularly valuable for paediatric populations, where clinical observations suggest that children may find abstract verbal cues challenging to implement, due to their developmental stage. There is increasing evidence supporting the use and effectiveness of U‐VBF in achieving improved speech outcomes across various subtypes of SSDs (Sugden et al. 2019). As the use of U‐VBF becomes increasingly available, the potential of this approach to transform clinical practice continues to grow, warranting a comprehensive synthesis of the latest evidence.
The need to update the evidence base for U‐VBF stems from considerable advancements since the foundational systematic review by Sugden et al. (2019) was published. That review collated early studies on U‐VBF for developmental SSD, which indicated its potential to be an effective adjunct for persistent speech errors. However, the evidence was limited by small sample sizes, the largest being 13 and predominantly lower‐strength designs, including only one randomized controlled trial (RCT). Its findings were further limited by the lack of effect size reporting in the included studies. Since that time, more recent studies have reported RCTs with larger sample sizes (McAllister et al. 2020, Preston et al. 2020, Cleland et al. 2022, McCabe et al. 2023) that have advanced the evidence base with more robust designs, while other studies (e.g. Cleland et al. 2019, Benway et al. 2021, Spencer et al. 2023) have provided data allowing proper calculation of effect size for meta‐analysis, further strengthening the quantitative evidence base through statistical power.
This systematic review and meta‐analysis aims to update the evidence for U‐VBF in paediatric SSD, extending the work of the published review by combining data from recent studies with robust designs and to evaluate effectiveness using pooled effect sizes of articulation accuracy outcomes. The findings are intended to equip SLPs with evidence‐based insights to inform the adoption of U‐VBF in clinical practice and direct future research toward individualized interventions for children and adolescents with SSD that enhance cross‐study comparability and build a more robust evidence base.
Methods
2
Protocol Registration
2.1
The systematic review was conducted in accordance with established methodological guidelines (Higgins et al. 2019) and reported following the Preferred Reporting Items for Systematic Reviews and Meta‐Analyses (PRISMA) (Page et al. 2021). It was registered on the international systematic review registry PROSPERO (CRD42024627408).
Eligibility Criteria
2.2
Studies were included based on the following PICO criteria: Population—children and adolescents aged under 18 with SSD; Intervention—U‐VBF aimed at articulation; Comparator—speech or articulation therapy (as an active comparison condition) or a no‐treatment control group; Outcome—articulation accuracy (e.g. percentage of consonants correct [PCC], speech intelligibility or patient‐reported speech outcomes). All study designs were eligible, including RCTs, cohort studies, case‐control studies and single‐case studies. Studies were excluded if they were not treatment studies, involved adults or reported outcomes unrelated to speech sound production.
Search Strategy
2.3
A systematic literature search was conducted using the following databases: PubMed, Web of Science, CINAHL Plus, Cochrane Library and Linguistics and Language Behaviour Abstracts, from January 2000 to July 2025. Compared to Sugden et al. (2019), this protocol adopts a post‐2000 cutoff to focus on contemporary ultrasound advancements and excludes studies with adult participants for a stricter paediatric emphasis, while expanding database coverage. Search terms targeted four concepts: ultrasound, speech therapy, speech disorders and child. Medical Subject Headings (MeSH) were used in PubMed search, for example, ‘ultrasonography’, ‘speech therapy’, ‘speech disorders’ and ‘child’, with keyword variations adapted for other databases, with Boolean operators (AND) to combine terms. No language restrictions were applied. The full search strategy is provided in Table S1 in supplementary material.
Study Selection and Data Extraction
2.4
Study selection and data extraction were conducted using Covidence, an online systematic review tool. Three SLPs participated as reviewers; two independently screened titles and abstracts, reviewed full texts to determine inclusion and extracted data from eligible articles. Discrepancies were resolved by consensus during videoconferences. Extracted data included study characteristics (e.g. country of origin, study design), participant details (e.g. demographics, medical background), intervention specifics (e.g. dose, dose frequency, total intervention duration) and speech outcome measures (e.g. PCC, time point of assessment). For meta‐analysis, Busk and Serlin's d2 effect sizes, Cohen's d, Hedges’ g or data enabling their calculation (e.g. means of pre‐ and post‐treatment data, standard deviations) were extracted, to evaluate articulation accuracy outcomes.
Risk of Bias Assessment
2.5
Risk of bias was evaluated using standard tools within Covidence. Two reviewers independently rated each study and reached a consensus when discrepancies arose. For non‐RCTs, the Risk of Bias in N‐of‐1 Trials (RoBiNT) Scale (Tate et al. 2013) was employed, comprising 15 items related to internal and external validity (e.g. design, blinding, treatment adherence, data analysis). These items are rated as 0, 1 or 2 points where higher scores reflect lower bias, with a maximum of 30 points. For RCTs, the Risk of Bias 2 (RoB 2) tool (Sterne et al. 2019) was used to assess five domains: randomization process, deviations from intended interventions, missing outcome data, measurement of the outcome and selection of reported result, yielding judgments of low risk of bias, some concerns or high risk of bias. Bias ratings informed meta‐analysis sensitivity analyses, which test primary result robustness by excluding high risk studies (defined as RoBiNT scores <10/30 or RoB 2 ‘high risk of bias’).
Data Synthesis and Meta‐Analysis
2.6
To prepare for meta‐analysis, Busk and Serlin's d2 effect size, Cohen's d or Hedges’ g were extracted directly from included studies if available. For studies that reported sufficient data (e.g. means and standard deviations at pre‐ and immediate post‐treatment, as other follow‐up time points varied widely across studies and lacked consistency for pooled analysis), individual‐level d2 (for single‐case experimental designs [SCEDs]) or Cohen's d/ Hedges’ g (for groups) were calculated based on study type (e.g. d2 as the mean difference divided by the pooled standard deviation; Busk and Serlin (1992)). For multi‐participant studies, individual d2s were then aggregated to study‐level d2 via weighted summarization (inverse variance using standard errors [SEs], approximated using proxies such as the reported number of trials in probe lists where SEs were unreported (Borenstein et al. 2009)). Busk and Serlin's d2 was adopted to report the effect size of SCEDs, while Cohen's d or Hedges’ g was used for group studies. Studies of case study/case series, or with insufficient data for deriving effect size would not be included in meta‐analysis.
A random‐effects meta‐analysis was conducted for eligible studies, prioritizing d2 for SCEDs, with narrative synthesis for group studies where pooling was infeasible due to limited data. The DerSimonian‐Laird method was used to pool effect sizes in Jeffrey's Amazing Statistics Program (JASP), given the anticipated heterogeneity in study designs and populations (DerSimonian and Laird 2015). This model accounts for both within‐study sampling error and between‐study variability by assuming that the true effect size differs across studies due to factors such as population differences or methodological variations, rather than a single fixed effect, providing a more conservative estimate suitable for heterogeneous data. Heterogeneity was assessed using Cochran's Q test. Sensitivity analyses excluded studies with high risk of bias—defined as scores below 10 points on RoBINT or high risk of bias on RoB 2, to test robustness of results by evaluating changes in heterogeneity (Q), effect size and predictive interval. Forest plot showing effect sizes and confidence intervals for individual studies alongside the pooled overall effect, and funnel plot of effect sizes against study precision measure to detect asymmetry indicative of publication bias, were generated in JASP.
Results
3
The study selection process for this systematic review is illustrated in a PRISMA flow chart (Figure 1). From 545 references imported, 77 duplicates were removed and 203 were excluded as irrelevant by automated screening tools, leaving 265 for title and abstract review. Of the 265 studies screened by title and abstract, 208 were excluded. Fifty‐seven studies underwent full‐text review, with 22 excluded for reasons stated in the chart. Thirty‐five studies, reported in 34 articles (including one article reporting two studies), were included for analysis.
Preferred reporting items for systematic reviews and meta‐analyses flowchart of the process of study selection.
Study Characteristics
3.1
The 35 studies spanned four countries: the USA (n = 20, 57.1%), Canada (n = 9, 25.7%), the UK (n = 4, 11.4%) and Australia (n = 2, 5.7%). This distribution highlights a significant Anglophone bias, as English is the official or primary national language in all included countries. These studies included 13 case studies and case series, 16 SCEDs and six group designs. Among the 16 SCEDs, 11 were of multiple baseline design (68.8%). Other SCED types included alternating treatment designs with randomization (n = 1, 6.3%), randomized block design (n = 1, 6.3%), alternating treatment designs (n = 2, 12.5%), as well as a two‐phase design (n = 1, 6.3%). The six group designs (17.1%) consisted of RCT (n = 4, 66.7%) and two quasi‐experimental designs of single‐group pretest‐posttest design (n = 1, 16.7%) and a control group design (n = 1, 16.7%).
Participant Characteristics
3.2
The studies enrolled a total of 234 participants, including 17 in control groups who did not receive U‐VBF therapy. After excluding controls, accounting for 13 participants who appeared in multiple included studies (Bernhardt et al. 2005, Bacsfalvi et al. 2007, Cleland and Scobbie 2021, Raaz et al. 2021) and removing 12 who withdrew or were lost to follow‐up (including participants from McCabe et al. (2023) and Preston et al. (2024)), 192 unique individuals who received U‐VBF therapy were included in this review. Their ages ranged from 4 to 18 years, with the majority aged 6–16 years (see Table 1 for details). Gender, specific SSD details and comorbid diagnoses were not fully reported across all studies. The most identified SSD subtypes included childhood apraxia of speech (CAS) and residual speech sound disorders. Other reported comorbidities potentially related to SSD encompassed hearing loss, cleft palate, auditory processing disorders and 22q deletion syndrome. Most studies (n = 28, 80.0%) indicated that their participants had received prior speech therapy. Excluding the duplicated participants mentioned earlier, two individuals had previously undergone speech therapy with ultrasound techniques (Bacsfalvi 2010, Hitchcock and Byun 2015). Some authors described the prior speech therapy as traditional without biofeedback techniques, while others did not explicitly report on this aspect. Details of participants’ gender and description are reported in Table S2 in supplementary material.
Intervention Details
3.3
Participants received interventions targeting sounds that were either individualized or consistent within studies (e.g. English /ɹ/ across all participants). Approaches predominantly featured U‐VBF, often combined with or alternated with methods such as traditional therapy (n = 8, 22.9%), motor‐based approaches (n = 8, 22.9%; e.g. phonetic placement or dynamic cueing techniques emphasizing articulatory movement, which often overlap with traditional therapy but were reported distinctly in studies), EPG (n = 3, 8.6%), auditory perception training (n = 3, 8.6%), with conditions ranging from sequential use to combined or optional applications of U‐VBF. Regarding treatment intensity, most studies (n = 33, 94.3%) reported session durations, which ranged from 30 to 101 min, though exact trial doses were not always specified. More recent studies, however, tended to document trial doses (n = 23, 65.7%), reporting targeted attempts or the average number of productions achieved within a session or across the entire treatment period, ranging from 50 to 300 attempts per session. Dose frequency varied widely, from three times per month to an intensive schedule of 10 times per week, with 1–2 times per week being the most common (n = 28, 80.0%). The shortest intervention duration was 10 days, while the longest lasted 22 weeks. Interventions were primarily delivered by SLPs or supervised SLP trainees (e.g. graduate students). Details on treatment targets, approaches and intensity are reported in Table 1. Some common components in treatment included pre‐practice or elicitation (e.g. initial modelling and cueing to elicit target sounds before structured practice; n = 12, 34.3%), structured practice (n = 13, 37.1%) and the use of hierarchical approaches, such as linguistic levels (n = 11, 31.4%) and the challenge point framework (n = 6, 17.1%). Additional details on personnel, conditions and session components are provided in Table S2 in supplementary material.
Outcome Measures and Treatment Effect
3.4
All 35 studies incorporated perceptual evaluations of speech production, with outcomes detailed in Table 2. The majority (82.9%) assessed target sound accuracy across treated and untreated words or consonants, utilizing word probes across linguistic levels at different time points, enabling some studies to report treatment, generalization and maintenance effect. Other than binary choice in rating sound accuracy, some studies adopted measures including speech acceptability judgments, perceptual ratings on the degree of ‘on‐target’, transcription ratings, collectively indicating any enhanced speech production aligned with target sounds. A relatively lower percentage of studies (5.7%) utilized PCC measure and the Intelligibility in Context Scale (ICS) to assess overall speech accuracy and how understandable speech is to listeners. Treatment effects across the 35 included studies were predominantly positive, with improved speech accuracy and intelligibility reported in 26 studies (74.3%). These included substantial gains in target sound production (e.g. target sound accuracy increases of 30.0%–95.0%), generalization to untreated items and maintenance at follow‐up. However, the remaining nine studies (25.7%) showed mixed or limited results, such as variable participant responses or non‐sustained effects. These patterns varied by design, with positive effects observed in 69.2% of case studies/case series (9/13) and 66.7% of group studies (4/6), compared to 81.3% (13/16) for SCEDs—whose pooled effect is addressed via meta‐analysis later.
Quality Ratings
3.5
Quality evaluation of the 35 studies was performed via two methods, with each study's quality ratings presented in Table 2. Thirty‐one were rated with the RoBiNT on a 30‐point scale, with scores ranging between 7/30 and 19/30 (median: 13), suggesting moderate quality overall. External validity generally scored higher than internal validity, with studies commonly having deficits in study design, randomization and blinding. Conversely, four studies were assessed using RoB 2 because of their RCT design, with three being ‘low risk of bias’ and one ‘some concerns’. Higher scores or ratings indicated more robust designs, although variability across studies was apparent. Following completion of outcomes and quality evaluation, meta‐analysis was conducted.
Meta‐Analysis
3.6
Eligibility for meta‐analysis of each study is detailed in Table 2. Of the 35 studies, SCEDs and group studies were further assessed on their eligibility to be included in meta‐analysis. The majority of the SCEDs include calculable effect sizes (n = 15). Among the six group studies, only two of them provide adequate data for Cohen's d and Hedges’ g calculation (Bressmann et al. 2016, Preston et al. 2020). Another study (Spencer et al. 2023) reported individual d2 of each participant in one single group and is comparable to other SCEDs. No meta‐analysis was implemented for group studies due to the small sample size (n < 5). The 15 SCEDs and 1 group study with single‐group effect size reporting were put under meta‐analysis, with the detailed data extracted for calculation listed in Table S3 in Supplementary material.
Cochran's Q test revealed significant heterogeneity among the studies (Q = 494.36, p < 0.001), with I ^2^ = 97.0% indicating considerable between‐study variability (Higgins et al. 2003). The estimated between‐study variance (Tau^2^) was 0.665 [95% CI, 1.016, 5.877], which indicates moderate‐to‐high variability in true effect sizes across studies. Figure 2 shows the forest plot for speech accuracy outcomes, with most effect sizes positive, ranging from −0.26 to 4.78. The pooled effect size was 1.32 (95% CI [0.54, 2.10], p < 0.001). The predictive interval was −0.59 to 3.22. The funnel plot (Figure 3) showed rightward asymmetry, confirmed with the meta‐regression test (z = 2.478, p = 0.013), suggesting potential publication bias (Egger's test threshold: |z| > 1.96, Harrer et al. (2021)), consistent with patterns in small‐sample speech therapy reviews where null results from underpowered studies may be underrepresented. To further assess result robustness, two studies at high risk of bais indicated by RoBiNT scores less than 10/30 were excluded from the meta‐analysis for sensitivity analyses. These analyses revealed decreased but still significant heterogeneity (Q = 491.63, p < 0.001), a larger effect size and a predictive interval that continued to include zero. Therefore, all studies were retained in order to preserve statistical power with a larger sample.
Forest plot of study‐specific and pooled d2 effect sizes (n = 16 Studies).
Funnel plot of study effect sizes (d2) against standard error.
Discussion
4
This systematic review extends the foundational work of Sugden et al. (2019) by updating the evidence for the effectiveness of U‐VBF for SSD in the paediatric population. Of an initial pool of 545 references, 35 studies were included (13 new since Sugden et al. (2019)), encompassing 192 unique participants aged 4–18 years, adding to the evidence base with a larger dataset focusing on the paediatric population. Furthermore, a quantitative synthesis via meta‐analysis of 16 studies—primarily SCEDs plus one single‐group reported in a control group study—was conducted to complement the prior review's qualitative approach. This broader dataset, incorporating stronger designs such as additional RCTs and more SCEDs eligible for meta‐analysis and a large pooled effect size in the quantitative synthesis, contributes to a more comprehensive assessment of U‐VBF's emerging position in research and clinical practice.
Developments in Evidence and Methodology Since 2019
4.1
Since Sugden et al. (2019)’s review, this study indicates improvement in the geographical representation, study designs and reporting practices that strengthen the quality of U‐VBF research for paediatric SSD. Geographical distribution is still biased toward North America, yet the UK and Australia have become emerging contributors. These findings may indicate a gradual diversification potentially due to wider access to ultrasound technology or research grants. This shift contrasts with the previous review's almost exclusive focus on North America, though the persistent Anglophone and WEIRD (Western, Educated, Industrialized, Rich, Democratic) skew represents a systemic limitation in the field, restricting insights into diverse linguistic typologies and professional contexts. This limited global representation highlights the urgent need for initiatives to support research in underrepresented regions such as Asia and broaden the evidence base beyond non‐English languages.
Study designs have diversified, with SCEDs still prevailing (16 of 35 studies, 45.7%), but the inclusion of five group designs comprising four RCTs represents a change from the scarcity of RCTs before 2019. This shift has enhanced reliability by minimizing bias and increased sample sizes, especially with half of the post‐2019 studies featuring cohorts larger than the maximum of 13 participants noted by Sugden et al. (2019), although the continued dominance of SCEDs restricts generalizability. Despite methodological maturity, quality ratings were similar, with the median RoBiNT score being 13 in this review compared to 12 in Sugden et al. (2019) in which challenges in randomization and blinding continue. Reporting has also improved, with 66.0% of studies now reporting dose (targeted or actual trials per session) versus 37.9% in Sugden et al. (2019), demonstrating better treatment intensity description in accordance with Warren et al. (2007)’s framework, which includes dose, dose form, dose frequency, total intervention duration and cumulative intervention intensity. Preliminary evidence from visual biofeedback specific reviews supports the value of this progress, revealing small but significant relationships between treatment intensity and outcomes (Hitchcock et al. 2019). This improvement, complemented by regular reporting of session durations (94.3%) and frequency (80.0% at 1–2 times/week), highlights a growing awareness of the role of intensity in therapy outcomes. These advances enable more accurate study replication, support the development of individualized interventions and provide a basis for comparing outcomes across studies according to various treatment intensities.
Treatment Effect and Its Implications
4.2
The meta‐analysis of eligible studies yielded a moderate‐to‐large pooled effect on speech accuracy in children and adolescents with SSD, but with wide variability in individual study effect sizes and a predictive interval crossing zero (Figure 2). This variability could be related to several influencing factors, such as participant factors (SSD subtype, comorbidities, age) and treatment intensity (dose, dose frequency and duration). Within studies on specific disorders such as CAS, effect sizes ranged widely from 0.24 to 2.65, illustrating intra‐group variability potentially due to participant age or prior therapy. Effect sizes of individual study also varied across disorders (e.g. 0.49 and 4.78 for hearing loss, 1.95 for cleft palate with various comorbidities, as compared to CAS just mentioned), suggesting subtype‐specific responsiveness to U‐VBF. Similarly, treatment intensity influenced outcomes: Two studies with comparable protocols (162 trials per session) yielded similar effect sizes (d2 = 1.15 and 1.25), while another with slightly reduced number of trials per session (∼149) and sessions yielded a much lower effect (d2 = −0.24), highlighting how multiple factors—including intensity, subtype and comorbidities—underlie the observed effect size variability. However, with only 16 studies in the meta‐analysis, formal subgroup analyses were not feasible, precluding confirmation of these hypotheses and underscoring the need for larger datasets. These patterns contribute to the overall heterogeneity observed.
The high heterogeneity and predictive interval are indicative of inconsistent effects among studies, a common phenomenon in meta‐analyses given variations in study design, sample size and intervention protocol. Heterogeneity related to study design may stem from discrepancies among SCEDs including sample size, as well as differences in intervention intensity (e.g. dose frequency and duration, reported variably across studies). In contrast, heterogeneity tied to individual differences may arise from participant variability, including SSD subtypes and comorbidities as discussed earlier, as well as their ages, different targeted speech sounds, levels of complexity or prior therapy exposure with or without U‐VBF.
The slight asymmetry of funnel plot suggests publication bias, which may distort findings in the direction of positive effects, a concern corroborated by the review's prioritization of peer‐reviewed studies, which may limit the inclusion of gray literature. However, this asymmetry could also be explained by genuine variance in populations and designs, such as a smaller number of group studies contributing to outliers. The substantial between‐study variance estimate (Tau^2^) reflects considerable differences among studies, attributable to both treatment‐related factors (e.g. U‐VBF protocols) and subject‐related factors (e.g. age), as discussed above. Taken together, these results concur with Sugden et al. (2019)’s positive outcomes; however, the present meta‐analysis quantifies this effect amid notable variability.
Strengths, Limitations and Future Directions
4.3
This review has several strengths that add to its contribution to the U‐VBF literature. The fact that it includes 35 studies with a greater number of participants, for example, provides a wider evidence base than the earlier review by Sugden et al. (2019), while the meta‐analysis based on 16 of these studies gives a quantitative grounding to the measurement of effectiveness. The incorporation of a variety of study designs represents a methodological improvement over the previous review, thereby improving result reliability.
However, several limitations temper these strengths. The pooled effect size may be subject to flawed or overestimated due to calculation approximations in some studies, such as lacking full participant SDs (requiring use of available reported data for d2 estimation) or missing SEs (leading to weighted summarization via proxies such as estimated number of trials to approximate participant‐level SEs before aggregating to study‐level d2), warranting caution in interpretation as these methods may introduce imprecision. Additionally, while the meta‐analysis covers nearly half the studies (16/35) and is representative of SCED‐dominant evidence, it primarily reflects single‐case designs and excludes most group studies, potentially underrepresenting more robust designs and biasing the pooled effect toward SCED‐specific positives. The focus on SSD in children and adolescents with exclusion of studies with mixed adult‐child cohorts may result in incomplete representation of relevant paediatric evidence and potentially neglect valuable insights from adult U‐VBF applications that might inform paediatric approaches. In addition, the reliance on peer‐reviewed publication reinforces possible publication bias, as suggested by the funnel plot asymmetry, which may limit the inclusion of negative or null results from unpublished studies and inflate the pooled effect size. Finally, the predominance of English‐speaking participants, consistent with the findings of Sugden et al. (2019), restricts cross‐linguistic generalizability and hinders applicability to non‐English phonologies. This limitation is not confined to phonology and phonetics but also extends to broader social and healthcare contexts, such as varying speech‐language therapy models in non‐Western settings.
Clinically, the real‐time visualization provided by U‐VBF supports therapists in providing specific feedback and complements client self‐correction. Building on these mechanisms, the enhanced reporting of treatment intensity facilitates personalized U‐VBF protocols, including dose or dose frequency titration for particular SSD subtype (e.g. higher doses for CAS than residual speech errors). Future research should follow up on the representation of RCTs within the 35 studies to increase generalizability beyond the prevalent single‐case designs, standardize reporting of participant variables and treatment intensity to detect differential responses, and investigate U‐VBF effectiveness in non‐English‐speaking populations and non‐Western contexts to increase global applicability, possibly using the effect size range to inform power calculations for future trials. To enable stronger future meta‐analyses and cross‐study comparisons, improved standardization of reporting—such as consistent documentation of intensity metrics and participant‐level data—is essential, as preliminary reviews highlight small but significant links between treatment intensity and efficacy while noting ongoing inconsistencies (Hitchcock et al. 2019).
Conclusion
5
This systematic review and meta‐analysis of U‐VBF for children and adolescents with SSD, updating Sugden et al. (2019), provides evidence of significant speech accuracy improvements despite substantial heterogeneity and wide variability in effect sizes. The inclusion of RCTs and better reporting of treatment intensity suggest methodological improvement, in addition to the widened evidence base with 192 unique participants, although generalizability is limited by publication bias, the predominance of English‐speaking participants and the exclusion of mixed‐age studies. Nonetheless, the large pooled effect and enhanced intensity documentation provide a robust foundation for clinical implementation. U‐VBF provides real‐time tongue visualization that aids clients in self‐correction and therapists in providing targeted feedback, as seen in the protocols described in the included studies. These findings support individualized interventions, potentially transforming SSD therapy by addressing persistent errors more effectively than traditional methods alone. To extend these insights, future treatment studies should prioritize RCTs and investigate non‐English populations, to strengthen the evidence base of U‐VBF for the optimization of therapy in children with SSD. Future reviews should incorporate gray literature to mitigate bias.
Conflicts of Interest
The authors declare no conflicts of interest.
Supporting information
Supporting File: jlcd70209‐supp‐0001‐Appendices.docx
The reference list from the paper itself. Each links out to its DOI / PubMed record.
- 1References marked with an asterisk (*) indicate studies included in the review.
- 2* Adler‐Bock, M. , B. Barbara May , B. Gick , and P. Bacsfalvi . 2007. “The Use of Ultrasound in Remediation of North American English /R/in 2 Adolescents.” American Journal of Speech—Language Pathology 16, no. 2: 128–139.17456891 10.1044/1058-0360(2007/017) · doi ↗ · pubmed ↗
- 3* Bacsfalvi, P. 2010. “Attaining the Lingual Components of /R/With Ultrasound for Three Adolescents With Cochlear Implants.” Canadian Journal of Speech‐Language Pathology and Audiology/Revue canadienne d 'orthophonie et d'audiologie 34, no. 3: 206–217.
- 4* Bacsfalvi, P. , B. M. Bernhardt , and B. Gick . 2007. “Electropalatography and Ultrasound in Vowel Remediation for Adolescents With Hearing Impairment.” Advances in Speech‐Language Pathology 9, no. 1: 36–45.
- 5* Benway, N. R. , E. R. Hitchcock , T. Mc Allister , G. T. Feeny , J. Hill , and J. L. Preston . 2021. “Comparing Biofeedback Types for Children with Residual /ɹ/Errors in American English: A Single‐Case Randomization Design.” American Journal of Speech‐Language Pathology 30: 1819–1845.34232693 10.1044/2021_AJSLP-20-00216 PMC 8702873 · doi ↗ · pubmed ↗
- 6* Bernhardt, B. , P. Bacsfalvi , B. Gick , B. Radanov , and R. Williams . 2005. “Exploring the Use of Electropalatography and Ultrasound in Speech Habilitation.” Journal of speech‐language pathology and audiology 29, no. 4: 169–182.
- 7* Bernhardt, B. , B. Gick , P. Bacsfalvi , and J. Ashdown . 2003. “Speech Habilitation of Hard of Hearing Adolescents Using Electropalatography and Ultrasound as Evaluated by Trained Listeners.” Clinical Linguistics & Phonetics 17, no. 3: 199–216.12858839 10.1080/0269920031000071451 · doi ↗ · pubmed ↗
- 8* Bernhardt, M. B. , P. Bacsfalvi , M. Adler‐Bock , et al. 2008. “Ultrasound as Visual Feedback in Speech Habilitation: Exploring Consultative Use in Rural British Columbia, Canada.” Clinical linguistics & phonetics 22, no. 2: 149–162.18253873 10.1080/02699200701801225 · doi ↗ · pubmed ↗
