# Stability Amidst Change in the Measurement of Implementation Fidelity Over Time

**Authors:** Sydni A. J. Basha, Qiyue Cai, Melanie M. Domenech Rodriguez, Abigail H. Gewirtz, Margrét Sigmarsdóttir, David S. DeGarmo, Melissa Uribe, Marion S. Forgatch

PMC · DOI: 10.1007/s11121-025-01864-1 · 2026-01-06

## TL;DR

This study shows that a tool for measuring how faithfully a parenting program is delivered produced consistent ratings of the same session videos when they were coded 17 years apart with different versions of its manual.

## Contribution

The study demonstrates that revisions to the FIMP system have not compromised the reliability of its implementation fidelity ratings or their comparability over time.

## Key findings

- Therapist differences accounted for the largest variance in fidelity ratings (38.1%).
- Test–retest ICCs for FIMP domains ranged from 0.73 to 0.92, indicating acceptable-to-excellent reliability.
- FIMP revisions have not undermined earlier fidelity metrics, supporting the comparability of historical and current ratings.
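The reported test–retest ICCs summarize how closely the same video segments were scored on the two coding occasions. This summary does not state which ICC form the authors used, so the sketch below rests on an assumption: it computes the two-way, single-measure forms from Shrout and Fleiss (1979), ICC(2,1) for absolute agreement and ICC(3,1) for consistency, from ANOVA mean squares. The function name `two_way_icc` is hypothetical, and the example table is Shrout and Fleiss's published 6 × 4 illustration rather than FIMP data.

```python
from statistics import fmean


def two_way_icc(scores):
    """Single-measure ICCs for an n x k table (rows = targets, columns = occasions).

    Assumption: the two-way ANOVA forms of Shrout & Fleiss (1979) are used for
    illustration; the study's exact ICC form is not stated in this summary.
    Returns (ICC(2,1), ICC(3,1)).
    """
    n = len(scores)
    k = len(scores[0]) if scores else 0
    if n < 2 or k < 2 or any(len(row) != k for row in scores):
        raise ValueError("need a rectangular table with >= 2 rows and >= 2 columns")

    grand = fmean(x for row in scores for x in row)
    row_means = [fmean(row) for row in scores]
    col_means = [fmean(row[j] for row in scores) for j in range(k)]

    # Sums of squares for targets (rows), occasions (columns), and the total.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    # ICC(2,1): two-way random effects, absolute agreement, single measure.
    icc2 = (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )
    # ICC(3,1): two-way mixed effects, consistency, single measure.
    icc3 = (ms_rows - ms_error) / (ms_rows + (k - 1) * ms_error)
    return icc2, icc3


if __name__ == "__main__":
    # Shrout & Fleiss (1979) example: 6 targets rated by 4 judges.
    table = [
        [9, 2, 5, 8],
        [6, 1, 3, 2],
        [8, 4, 6, 8],
        [7, 1, 2, 6],
        [10, 5, 6, 9],
        [6, 2, 4, 7],
    ]
    icc2, icc3 = two_way_icc(table)
    print(f"{icc2:.2f} {icc3:.2f}")  # prints 0.29 0.71
```

For a two-occasion recoding such as 2004 versus 2021, each row would hold one segment's score in a given FIMP domain and the two columns the two occasions (k = 2). How the study accounted for its seven coders within these ICCs is not described in this summary, so the mapping is only indicative.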

## Abstract

Children’s mental health disorders are rising, underscoring the need to implement behavioral parent training (BPT) programs. However, wide variability in BPT effectiveness often reflects inconsistencies in implementation fidelity. This study examines test–retest reliability of the GenerationPMTO model’s Fidelity of Implementation Rating System (FIMP) over a 17-year period. Seven coders provided ratings of 34 video segments from families participating in the Marriage and Parenting in Stepfamilies (MAPS) intervention, coded at two time points (2004, 2021) using first and third iterations of the FIMP manual. Variance decomposition analyses determined how much variability in scores was attributable to the interventionist, the observational coder, the session, and the year the data were coded. Test–retest intraclass correlation coefficients (ICCs) examined reliability across FIMP domains (knowledge, structure, teaching, process, and overall). Therapist differences accounted for the largest variance (38.1%), followed by coders (14.1%) and session (10.7%). Year did not significantly contribute, indicating that FIMP revisions have not undermined earlier fidelity metrics. Reliability analyses showed acceptable-to-excellent ICCs (range = 0.73–0.92), supporting the comparability of historical and current ratings. These findings indicate that GenerationPMTO’s FIMP refinements maintain core fidelity metrics. By demonstrating stable fidelity data over time, the study bolsters confidence in both historical results and current coding practices. These outcomes reinforce the utility of long-standing training materials and support the use of stable fidelity tools in ongoing implementation and training contexts. Such synergy between fidelity measurement and adaptation fosters sustained program effectiveness across service contexts, allowing providers to align newer fidelity protocols with established best practices.
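The variance decomposition described above apportions score variance among therapist, coder, session, and year. As a simplified illustration only, the sketch below estimates variance components for a single grouping factor (for example, therapist) with the classical one-way random-effects ANOVA (method-of-moments) estimator on a balanced design. The study's model with several factors would need a mixed-model fit that this sketch does not attempt; the function name `one_way_variance_components` and the toy scores are hypothetical.

```python
from statistics import fmean


def one_way_variance_components(groups):
    """Method-of-moments variance components for a balanced one-way design.

    groups: equal-length lists, one per level of the grouping factor
    (hypothetically, one therapist's session scores each).
    Returns (between_variance, within_variance, share_between).
    """
    a = len(groups)                   # number of groups
    n = len(groups[0]) if groups else 0  # scores per group (balanced)
    if a < 2 or n < 2 or any(len(g) != n for g in groups):
        raise ValueError("need >= 2 balanced groups with >= 2 scores each")

    grand = fmean(x for g in groups for x in g)
    means = [fmean(g) for g in groups]

    ms_between = n * sum((m - grand) ** 2 for m in means) / (a - 1)
    ms_within = sum(
        (x - m) ** 2 for g, m in zip(groups, means) for x in g
    ) / (a * (n - 1))

    # E[MS_between] = sigma_within^2 + n * sigma_between^2, so solve for
    # sigma_between^2 and truncate negative estimates at zero (a common convention).
    var_between = max((ms_between - ms_within) / n, 0.0)
    var_within = ms_within
    total = var_between + var_within
    share = var_between / total if total > 0 else 0.0
    return var_between, var_within, share


if __name__ == "__main__":
    # Hypothetical fidelity scores: three therapists, three sessions each.
    scores = [[4.0, 4.5, 5.0], [6.0, 6.5, 7.0], [5.0, 5.5, 5.0]]
    vb, vw, share = one_way_variance_components(scores)
    print(f"between={vb:.3f} within={vw:.3f} share={share:.1%}")
```

A component's share of the total variance, like `share_between` here, is the kind of quantity reported as a percentage (such as 38.1% for therapists), although the study's figures come from a richer model with several factors estimated jointly.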

The online version contains supplementary material available at 10.1007/s11121-025-01864-1.


## Figures

The complete paper includes one captioned figure: https://tomesphere.com/paper/PMC12804281/full.md

---
Source: https://tomesphere.com/paper/PMC12804281