A Physics-Aware Diffusion Framework for Robust ECG Synthesis Using Mesoscopic Lattice Boltzmann Constraints
Xi Qiu, Hailin Cao, Li Yang, Hui Wang

TL;DR
A new AI system uses physics rules to convert wearable pulse data into accurate ECGs, avoiding incorrect or impossible results.
Contribution
Introduces PhysDiff-LBM, a physics-aware diffusion model with Lattice Boltzmann constraints for robust ECG synthesis.
Findings
The physics-guided AI system generates accurate and medically valid ECGs from pulse data.
Incorporating fluid dynamics constraints improves signal fidelity and clinical applicability.
The model outperforms data-driven approaches by adhering to hemodynamic conservation laws.
Abstract
Smartwatches and fitness bands track our pulse using simple optical sensors, but diagnosing true heart disease requires an electrocardiogram (ECG), an electrical recording typically made in hospitals with sticky patches and wires. Artificial intelligence (AI) has been used to translate simple wrist pulse data into clinical ECGs, but standard AI often “guesses” the waveforms, creating medically incorrect or physically impossible results. To overcome this, we developed a new AI system that directly embeds the natural physical rules of blood circulation into its learning process. Instead of just learning from data patterns, we taught the AI the actual physical laws of how blood pumps from the heart and flows through blood vessels. Constrained by these natural laws of fluid dynamics, our AI is prevented from making impossible physiological guesses. Our tests show this physics-guided approach…
Funding: National Key Research and Development Program of China
Taxonomy
Topics: Lattice Boltzmann Simulation Studies · Advanced Sensor and Energy Harvesting Materials · Generative Adversarial Networks and Image Synthesis
1. Introduction
Cardiovascular diseases (CVDs) remain the leading cause of mortality globally, necessitating continuous and ubiquitous cardiac monitoring [1]. While the electrocardiogram (ECG) stands as the clinical gold standard for diagnosing cardiac arrhythmias and myocardial anomalies [2,3], its acquisition typically requires adhesive electrodes and professional operation, limiting its utility for long-term home monitoring. In contrast, photoplethysmography (PPG) has become ubiquitous in wearable devices due to its non-invasive and low-cost nature [4,5]. Consequently, the cross-modal synthesis of ECG signals from PPG, which effectively translates peripheral blood volume changes back to cardiac electrical activity, has emerged as a pivotal research frontier in AI-driven healthcare [6,7,8]. Fundamentally, the feasibility of such estimation is rooted in the physiological principle of cardiac electromechanical coupling [9,10,11]. The electrical excitation of the heart (captured by ECG) triggers the mechanical contraction of the myocardium, which subsequently generates a pulsatile pressure wave propagating through the vascular tree (captured by PPG). The PPG signal therefore inherently embeds latent information regarding the cardiac electrical cycle. However, retrieving the ECG from PPG is non-trivial; the vascular system acts as a biological low-pass filter, dampening the high-frequency components associated with sharp electrical transitions [12,13,14]. Despite this, the intrinsic correlation between these modalities suggests that with advanced modeling, wearable PPG can serve as a viable surrogate for reconstructing high-fidelity ECGs, thereby significantly expanding the clinical utility of consumer-grade devices for ambulatory cardiac monitoring [15].
Technically, reconstructing ECG from PPG is an ill-posed inverse problem, as the mapping from peripheral hemodynamic signals to cardiac electrical signals is complex, non-linear, and subject to individual variability. Early approaches primarily relied on signal processing techniques or shallow machine learning models, which struggled to capture the intricate temporal dependencies of physiological signals [16,17]. With the advent of deep learning, generative models such as Generative Adversarial Networks (GANs) and standard Variational Autoencoders (VAEs) have demonstrated improved performance [18,19]. More recently, Denoising Diffusion Probabilistic Models (DDPMs) have achieved remarkable success in various generative tasks due to their stable training dynamics and high sample quality. These attributes position them as a promising candidate to overcome the limitations of GANs and VAEs, offering immense potential for capturing the complex, non-linear mappings essential for high-fidelity physiological signal reconstruction [20,21].
However, as illustrated in Figure 1, existing data-driven approaches, including standard diffusion models, face significant limitations in biomedical domains. They tend to prioritize statistical correlation over physiological plausibility [22,23]. Without explicit physical constraints, these models often suffer from “hallucinations”—generating waveforms that may appear visually realistic but violate fundamental hemodynamic principles or lack clinical consistency in critical regions, such as distorted QRS durations or physically impossible repolarization patterns. Furthermore, pure data-driven models often struggle to balance the reconstruction of low-frequency trends governed by hemodynamics and high-frequency details reflecting electrophysiological nuances [24,25].
To mitigate these issues, we argue that the generation process should be guided by the underlying physics of blood flow. In this paper, we propose PhysDiff-LBM, a novel physics-aware framework that rigidly incorporates hemodynamic constraints into a conditional diffusion model. Our approach augments the time-series prediction of pulse wave propagation by integrating a mesoscopic particle streaming and collision process governed by the Lattice Boltzmann Method (LBM).
Specifically, we design a unique dual-stream architecture. The first stream employs a cross-attention-guided diffusion model equipped with region-wise adaptability. This mechanism allows the model to adaptively focus on and refine high-frequency details in critical cardiac phases while maintaining global coherence. The second stream integrates a differentiable LBM solver that simulates the fluid dynamics of pulse propagation, ensuring the generated signals adhere to conservation laws. These two components are synergistically coupled to enforce physical consistency between the electrical and hemodynamic domains, effectively suppressing non-physiological artifacts. The main contributions of this work are summarized as follows:
(1) We propose PhysDiff-LBM, the first framework to our knowledge that integrates Lattice Boltzmann hemodynamic constraints with diffusion models for ECG synthesis, effectively bridging the gap between data-driven generation and physical modeling.
(2) We introduce a region-aware cross-attention mechanism within the diffusion backbone, enabling the model to capture high-frequency morphological details with local adaptability, significantly enhancing the fidelity of critical waveform segments.
(3) We develop a differentiable LBM solver to impose physics-informed constraints, thereby enforcing hemodynamic plausibility and reducing the occurrence of hallucinations common in pure deep learning approaches.
(4) Extensive experiments, comprising quantitative benchmarks against leading competitive baselines, qualitative visual analysis, and downstream clinical evaluations, demonstrate the superiority of our framework. Our method not only achieves significantly improved accuracy in reconstruction metrics but also preserves high-fidelity morphological details and exhibits robust performance in practical cardiac diagnostic tasks.
2. Method
2.1. Problem Formulation
The primary objective of this study is to reconstruct high-fidelity Electrocardiogram (ECG) signals from single-lead Photoplethysmography (PPG) recordings (see Figure 2). Let $\mathbf{x}_{\mathrm{ppg}} \in \mathbb{R}^{L}$ denote the observed input PPG sequence of length L, and $\mathbf{x}_{\mathrm{ecg}} \in \mathbb{R}^{L}$ denote the corresponding ground-truth ECG signal. The relationship between PPG and ECG is governed by complex physiological coupling, specifically the interaction between cardiac electrical excitation and peripheral hemodynamic propagation, which renders the inverse mapping a highly ill-posed problem with non-unique solutions.
We formulate this reconstruction task as a conditional generative modeling problem. Our goal is to learn the conditional distribution $p(\mathbf{x}_{\mathrm{ecg}} \mid \mathbf{x}_{\mathrm{ppg}})$, enabling the synthesis of ECG waveforms that are not only morphologically accurate but also physically consistent with the underlying hemodynamic principles extracted from the PPG modality.
To address the ill-posed nature of this cross-domain inverse problem, we propose PhysDiff-LBM, a physics-informed generative framework illustrated in Figure 3. The system pipelines three synergistic components to enforce physiological fidelity across domains. First, the Implicit LBM Physics Encoder (Section 2.2) lifts macroscopic PPG signals into a mesoscopic phase space, simulating blood flow dynamics to extract robust hemodynamic invariants. This physical context then conditions the Region-Disentangled Diffusion Backbone (Section 2.3), a dual-stream architecture that synthesizes the ECG signal by jointly optimizing for fine-grained morphological geometry and global topological rhythm. Finally, to rigorously bind the generation to physical laws, our Physics-Informed Generative Learning strategy (Section 2.4) employs Tweedie’s formula to integrate fluid dynamic constraints directly into the diffusion training objective, ensuring the synthesized waveforms maintain strict hemodynamic consistency.
2.2. Implicit Lattice Boltzmann Physics Encoding
2.2.1. Bridging Hemodynamics and Kinetic Theory
The physiological link between PPG and ECG is fundamentally mediated by the cardiovascular system’s hemodynamic response. The heart’s electrical activation triggers a mechanical pressure wave that propagates through the vascular tree. Classically, these blood flow dynamics are governed by the conservation laws of mass and momentum, formalized as the Navier–Stokes equations for incompressible fluid [26]:
$$\nabla \cdot \mathbf{u} = 0, \qquad \rho \left( \frac{\partial \mathbf{u}}{\partial t} + (\mathbf{u} \cdot \nabla)\,\mathbf{u} \right) = -\nabla p + \mu \nabla^{2} \mathbf{u} + \mathbf{f} \tag{1}$$
where $\rho$ denotes blood density, $\mathbf{u}$ is the flow velocity vector, p is the pressure, $\mu$ represents dynamic viscosity, and $\mathbf{f}$ denotes external body forces. Additionally, ∇ and $\nabla^{2}$ denote the gradient and Laplace operators, respectively.
While Equation (1) provides a rigorous macroscopic description, directly embedding it as a prior in generative modeling imposes significant limitations. The N-S formulation models fluid motion through averaged macroscopic variables, effectively assuming local thermodynamic equilibrium [27]. In the context of deep generative learning, enforcing these macroscopic constraints acts as an aggressive low-pass filter. The neural network is forced to prioritize the smoothness of the pressure field to satisfy the partial differential operators, often at the expense of high-frequency signal components. Consequently, N-S based approaches tend to generate overly smoothed waveforms, failing to capture the stochastic, sharp morphological details that are critical for clinical diagnosis.
To overcome this “spectral bias,” we resort to Kinetic Theory. Specifically, we adopt the Lattice Boltzmann Method (LBM), which describes the fluid not by macroscopic variables, but by the particle distribution function $f(\mathbf{x}, \boldsymbol{\xi}, t)$, representing the probability of finding a particle with continuous microscopic velocity $\boldsymbol{\xi}$ at position $\mathbf{x}$ and time t. In the discrete LBM formulation, the continuous velocity is reduced to a finite set of Q lattice vectors $\{\mathbf{e}_k\}_{k=0}^{Q-1}$, effectively absorbing the velocity parameter into the subscript to form the discrete distribution $f_k(\mathbf{x}, t)$. The evolution of $f_k$ is governed by the discrete Boltzmann equation with the BGK collision operator:
$$f_k(\mathbf{x} + \mathbf{e}_k \Delta t,\; t + \Delta t) - f_k(\mathbf{x}, t) = -\frac{1}{\tau} \left[ f_k(\mathbf{x}, t) - f_k^{\mathrm{eq}}(\mathbf{x}, t) \right], \qquad k = 0, \dots, Q-1 \tag{2}$$
where $f_k$ are the discrete distributions along lattice velocities $\mathbf{e}_k$, Q represents the total number of discrete velocities, $\tau$ is the relaxation time related to viscosity, and $f_k^{\mathrm{eq}}$ is the equilibrium distribution. By lifting the dynamics into this mesoscopic phase space, LBM captures non-equilibrium kinetic states that store detailed morphological information, thereby enabling the generation of high-fidelity signals without suffering from the over-smoothing characteristic of pure N-S constraints.
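To make the collide-and-stream structure of Equation (2) concrete, the following is a minimal sketch of a classical (non-learned) D1Q3 BGK update. It assumes the textbook D1Q3 weights (1/6, 2/3, 1/6), a second-order equilibrium, and periodic boundaries; the function names `equilibrium` and `bgk_step` are illustrative and are not taken from the paper's implementation.

```python
# Classical D1Q3 lattice: velocities -1, 0, +1 with textbook weights (assumed here).
E = (-1, 0, 1)
W = (1 / 6, 2 / 3, 1 / 6)
CS2 = 1 / 3  # squared lattice speed of sound for D1Q3


def equilibrium(rho, u):
    """Second-order equilibrium populations at one site for density rho, velocity u."""
    return [
        w * rho * (1 + e * u / CS2 + (e * u) ** 2 / (2 * CS2 ** 2) - u * u / (2 * CS2))
        for e, w in zip(E, W)
    ]


def bgk_step(f, tau):
    """One collide-and-stream update; f[k][x] is the population with velocity E[k] at site x."""
    n = len(f[0])
    post = [[0.0] * n for _ in E]
    for x in range(n):
        rho = sum(f[k][x] for k in range(3))             # zeroth moment
        u = sum(E[k] * f[k][x] for k in range(3)) / rho  # first moment / density
        feq = equilibrium(rho, u)
        for k in range(3):
            # BGK collision: relax each population toward equilibrium at rate 1/tau
            post[k][x] = f[k][x] - (f[k][x] - feq[k]) / tau
    # Streaming: the population moving with E[k] arrives at x from x - E[k] (periodic wrap)
    return [[post[k][(x - E[k]) % n] for x in range(n)] for k in range(3)]
```

Because the equilibrium shares the local density and momentum of the populations it replaces, both quantities are conserved by construction, which is the property the learned collision operator in Section 2.2.2 has to restore explicitly.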
2.2.2. Neural Mesoscopic Discretization
As illustrated in Figure 4, we propose a Neural LBM module that internalizes the physics of Equation (2) into differentiable layers. This module lifts the scalar PPG input into a high-dimensional phase space, evolves it according to kinetic rules, and projects it back to extract hemodynamic invariants.
Phase Space Lifting (Macroscopic to Mesoscopic). Treating the 1D temporal signal as our spatial domain, we adopt the classic D1Q3 discrete velocity configuration ($Q = 3$). The normalized velocity set is $\{e_k\} = \{-1, 0, +1\}$, representing backward, stationary, and forward wave propagation, respectively.
We define a learnable lifting operator to map the observed scalar PPG signal $\mathbf{x}_{\mathrm{ppg}}$ to the initial particle distribution functions $\mathbf{f}^{(0)} = \{f_k^{(0)}\}_{k=0}^{Q-1}$. This initializes the computational domain by distributing the macroscopic pressure energy into Q virtual kinetic modes:
$$\mathbf{f}^{(0)} = \mathrm{Softplus}\left( \mathbf{W}\,\mathbf{x}_{\mathrm{ppg}} + \mathbf{b} \right) \tag{3}$$
Here, $\mathbf{W}$ and $\mathbf{b}$ denote the unified notation for the learnable weight matrices and bias vectors of the projection layer, respectively. The Softplus activation ensures the non-negativity of the distribution functions, adhering to the physical requirement that particle density cannot be negative.
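The lifting step can be sketched as follows. This is an illustrative stand-in only: it assumes the projection reduces to one scalar weight and bias per velocity channel (the hypothetical `weights` and `biases` arguments), whereas the actual projection layer is learned and may be richer.

```python
import math


def softplus(z):
    """Numerically stable log(1 + exp(z)); the result is never negative."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def lift(ppg, weights, biases):
    """Map a scalar PPG sequence to Q non-negative population channels.

    weights/biases are hypothetical per-channel scalars standing in for
    the learnable projection layer described in the text.
    """
    return [[softplus(w * x + b) for x in ppg] for w, b in zip(weights, biases)]
```

For a D1Q3 lattice, `lift(ppg, weights, biases)` with three weights and three biases returns three channels of the same length as `ppg`, each entry non-negative regardless of the sign of the projected value.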
Learnable Collision (Rheological Modeling). The collision term in Equation (2) dictates how distributions relax towards equilibrium, implicitly modeling the fluid’s viscosity. Since real blood exhibits non-Newtonian behavior (shear-thinning) where $\tau$ varies dynamically, a fixed collision parameter is insufficient. We implement a Generalized Neural Collision Operator using a channel-mixing convolution layer:
To strictly conserve mass and momentum, we apply a Moment Projection that subtracts the raw mass (zeroth-moment) and momentum (first-moment) residuals from the neural output:
The post-collision distribution is then updated as follows:
By learning the collision kernel from data under these conservation constraints, this layer adaptively models the complex viscoelastic interactions between the blood and the arterial wall, which are difficult to derive analytically.
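One way to realize such a conservative correction on a D1Q3 lattice is sketched below. The specific choice made here, removing the residuals with the smallest (least-squares) correction of the form a + b·e_k, is an assumption for illustration; the paper's exact projection may distribute the residuals differently. The names `project_conservative` and `collide` are hypothetical.

```python
E = (-1, 0, 1)  # D1Q3 lattice velocities


def project_conservative(omega):
    """Strip net mass and momentum from a raw collision update at one site.

    omega[k] is the unconstrained change proposed for population k.
    The correction a + b*E[k] is chosen so that both residuals vanish.
    """
    d_mass = sum(omega)                          # zeroth-moment residual
    d_mom = sum(e * o for e, o in zip(E, omega))  # first-moment residual
    a = d_mass / 3   # sum_k 1      = 3 for D1Q3
    b = d_mom / 2    # sum_k E[k]^2 = 2 for D1Q3
    return [o - a - b * e for e, o in zip(E, omega)]


def collide(f, omega):
    """Post-collision populations: pre-collision state plus the projected update."""
    return [fk + ck for fk, ck in zip(f, project_conservative(omega))]
```

After the projection, the update sums to zero and carries zero net momentum, so density and momentum are unchanged by the collision. Note that this sketch does not by itself guarantee non-negative populations.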
Streaming as Temporal Advection. The left-hand side of Equation (2) represents the streaming step, describing the exact advection of particles. In our neural formulation, we implement this as a deterministic shift operation. For each latent velocity channel k, the features are shifted temporally by a stride proportional to the lattice velocity vector $e_k$:
Unlike standard convolution which mixes local information, this structured shifting strictly enforces the causality of wave propagation, mimicking the physical travel of the pulse wave along the vessel without numerical dissipation.
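The shift-based streaming step can be sketched as below. Zero-filling at the window edges and the `stride` parameter are assumptions made for illustration; the text only states that the shift is proportional to the lattice velocity.

```python
E = (-1, 0, 1)  # D1Q3 lattice velocities: backward, stationary, forward


def stream(f, stride=1):
    """Shift channel k by E[k] * stride samples along time.

    f[k][i] is population k at time index i. Samples shifted in from
    outside the window are zero-filled (an assumed boundary treatment).
    """
    n = len(f[0])
    out = []
    for e, chan in zip(E, f):
        s = e * stride
        out.append([chan[i - s] if 0 <= i - s < n else 0.0 for i in range(n)])
    return out
```

The stationary channel is returned unchanged, the forward channel moves one step later in time, and the backward channel one step earlier. No values are blended, which is why the operation introduces no numerical diffusion.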
Macroscopic Moment Projection. Finally, we recover the macroscopic hemodynamic features by computing the statistical moments of the evolved distributions, as dictated by Kinetic Theory. The density (zeroth moment) and momentum (first moment) are computed as follows:
$$\rho = \sum_{k=0}^{Q-1} f_k, \qquad j = \rho u = \sum_{k=0}^{Q-1} e_k f_k \tag{8}$$
We concatenate these physically derived moments to form the final condition embedding $\mathbf{c}_{\mathrm{phy}} = [\rho;\, j]$. This provides the diffusion model with rigorous descriptors of the blood flow state, such as local pressure and wall shear stress trends, ensuring the synthesized ECG is physiologically grounded.
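A minimal sketch of the moment readout follows. Stacking the two moment sequences as a two-channel embedding is an assumption about how the concatenation is laid out; `moments` and `condition_embedding` are illustrative names.

```python
E = (-1, 0, 1)  # D1Q3 lattice velocities


def moments(f):
    """Density (zeroth moment) and momentum (first moment) per time index."""
    n = len(f[0])
    rho = [sum(f[k][i] for k in range(3)) for i in range(n)]
    mom = [sum(E[k] * f[k][i] for k in range(3)) for i in range(n)]
    return rho, mom


def condition_embedding(f):
    """Stack the moments as a 2-channel condition (assumed layout: [density, momentum])."""
    rho, mom = moments(f)
    return [rho, mom]
```

With populations `[[2, 3], [5, 6], [0, 9]]`, the density is `[7, 18]` and the momentum is `[-2, 6]`: the momentum is positive wherever the forward-moving channel dominates.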
2.3. Region-Disentangled Diffusion Backbone
2.3.1. Dual-Branch Architecture for Structural-Morphological Synergy
To synthesize high-fidelity ECG signals, the generative model must simultaneously capture fine-grained morphological details and global topological rhythms. Standard U-Net architectures, however, often struggle to balance these conflicting objectives within a single output stream. To address this, we propose a Region-Disentangled U-Net that decouples the reconstruction task into two specialized pathways sharing a common feature extractor.
The backbone network processes the noisy latent state $\mathbf{x}_t$, combined with the embedding of the timestep t and the physical condition $\mathbf{c}_{\mathrm{phy}}$. A shared encoder first extracts a multi-scale latent representation $\mathbf{h}$. Subsequently, the architecture bifurcates into a Geometry Branch and a Topology Branch. The Geometry Branch focuses on local signal reconstruction by predicting the noise residual $\boldsymbol{\epsilon}_{\theta}$. It projects the latent features back to the signal space via a convolutional head:
where ∗ denotes the convolution operation, and GN represents Group Normalization. This output is primarily responsible for recovering the high-frequency waveform details required for the reverse diffusion process.
Simultaneously, the Topology Branch functions as an auxiliary semantic segmentation module. Instead of predicting signal values, it estimates a probability mask $\hat{\mathbf{r}}$ that highlights the Regions of Interest (ROI), specifically the QRS complexes. This is achieved through a separate projection head followed by a sigmoid activation:
By jointly optimizing this branch, we enforce a strong inductive bias that compels the shared encoder to learn representations that are not only effective for denoising but also semantically aware of the underlying cardiac cycle phases, ensuring global rhythmic coherence.
2.3.2. Physically Guided Cross-Attention Mechanism
To rigorously condition the generative process on hemodynamic laws, we introduce a Physically Guided Cross-Attention mechanism embedded at the bottleneck of the U-Net. This layer acts as a dynamic interface, aligning the fluid domain features extracted by the LBM encoder with the electrical domain latents of the diffusion model.
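Let $\mathbf{H}_{\mathrm{ecg}}$ denote the intermediate feature map of the noisy ECG in the U-Net bottleneck, and $\mathbf{H}_{\mathrm{phy}}$ denote the downsampled hemodynamic features. We formulate the interaction by projecting these inputs into query, key, and value subspaces:
$$\mathbf{Q} = \mathbf{H}_{\mathrm{ecg}} \mathbf{W}_Q, \qquad \mathbf{K} = \mathbf{H}_{\mathrm{phy}} \mathbf{W}_K, \qquad \mathbf{V} = \mathbf{H}_{\mathrm{phy}} \mathbf{W}_V \tag{11}$$
where $\mathbf{W}_Q$, $\mathbf{W}_K$, and $\mathbf{W}_V$ are learnable projection matrices. The attention-driven feature injection is then computed as follows:
$$\mathrm{Attn}(\mathbf{Q}, \mathbf{K}, \mathbf{V}) = \mathrm{softmax}\left( \frac{\mathbf{Q} \mathbf{K}^{\top}}{\sqrt{d}} \right) \mathbf{V} \tag{12}$$
where d is the feature dimension of the keys.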
In this formulation, the attention matrix effectively models the physiological transfer function between the two modalities. It computes the temporal correlation between specific hemodynamic events and electrical triggers. Unlike simple concatenation, this mechanism allows the model to adaptively compensate for the Pulse Transit Time (PTT), the variable delay between electrical activation and mechanical pulse, thereby ensuring precise temporal synchronization in the reconstructed signal (Figure 5).
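A single-head version of this cross-attention can be sketched in plain Python. It assumes standard scaled dot-product attention with queries from the ECG latents and keys and values from the hemodynamic features; multi-head splitting, residual connections, and normalization, which a practical layer would likely include, are omitted. All function names are illustrative.

```python
import math


def matmul(a, b):
    """Dense matrix product for lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Softmax over one row, shifted by the row maximum for stability."""
    m = max(row)
    ex = [math.exp(v - m) for v in row]
    s = sum(ex)
    return [v / s for v in ex]


def cross_attention(h_ecg, h_phy, w_q, w_k, w_v):
    """ECG time steps (queries) attend over hemodynamic time steps (keys/values)."""
    q = matmul(h_ecg, w_q)
    k = matmul(h_phy, w_k)
    v = matmul(h_phy, w_v)
    d = len(q[0])
    scores = [[sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d) for kj in k] for qi in q]
    attn = [softmax(r) for r in scores]  # attn[i][j]: weight of PPG step j for ECG step i
    return matmul(attn, v), attn
```

Each row of `attn` is a distribution over hemodynamic time steps, so an ECG position is free to place its weight on a later PPG position, which is how a variable delay such as the PTT can be absorbed.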
2.4. Physics-Informed Generative Learning
2.4.1. Conditional Diffusion Dynamics
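We formulate the ECG synthesis as a conditional diffusion process involving a forward trajectory that corrupts the data and a learnable reverse trajectory that reconstructs it. The forward diffusion process is defined as a fixed Markov chain that gradually adds Gaussian noise to the ground-truth signal $\mathbf{x}_0$ according to a variance schedule $\{\beta_t\}_{t=1}^{T}$. For each step t, the transition probability is given by $q(\mathbf{x}_t \mid \mathbf{x}_{t-1}) = \mathcal{N}\left(\mathbf{x}_t;\, \sqrt{1 - \beta_t}\,\mathbf{x}_{t-1},\, \beta_t \mathbf{I}\right)$. By defining $\alpha_t = 1 - \beta_t$ and $\bar{\alpha}_t = \prod_{s=1}^{t} \alpha_s$, we can derive the marginal distribution of $\mathbf{x}_t$ conditioned on $\mathbf{x}_0$ in closed form. This allows us to sample $\mathbf{x}_t$ at any arbitrary timestep directly using the reparameterization trick:
$$\mathbf{x}_t = \sqrt{\bar{\alpha}_t}\,\mathbf{x}_0 + \sqrt{1 - \bar{\alpha}_t}\,\boldsymbol{\epsilon}, \qquad \boldsymbol{\epsilon} \sim \mathcal{N}(\mathbf{0}, \mathbf{I}) \tag{13}$$
The generative reverse process aims to invert this diffusion chain to recover $\mathbf{x}_0$ from a standard Gaussian prior $\mathbf{x}_T \sim \mathcal{N}(\mathbf{0}, \mathbf{I})$, explicitly conditioned on the hemodynamic features $\mathbf{c}_{\mathrm{phy}}$. Since the exact posterior $q(\mathbf{x}_{t-1} \mid \mathbf{x}_t)$ is intractable, we approximate it with a parameterized Gaussian distribution
$$p_{\theta}(\mathbf{x}_{t-1} \mid \mathbf{x}_t, \mathbf{c}_{\mathrm{phy}}) = \mathcal{N}\left(\mathbf{x}_{t-1};\, \boldsymbol{\mu}_{\theta}(\mathbf{x}_t, t, \mathbf{c}_{\mathrm{phy}}),\, \sigma_t^{2} \mathbf{I}\right) \tag{14}$$
The variance $\sigma_t^{2}$ is typically fixed to $\beta_t$ or $\tilde{\beta}_t = \frac{1 - \bar{\alpha}_{t-1}}{1 - \bar{\alpha}_t}\,\beta_t$. The core learning objective lies in estimating the mean $\boldsymbol{\mu}_{\theta}$. By matching the generative distribution to the true posterior $q(\mathbf{x}_{t-1} \mid \mathbf{x}_t, \mathbf{x}_0)$, the optimal mean can be parameterized as a linear combination of the current noisy state and a predicted noise component:
$$\boldsymbol{\mu}_{\theta}(\mathbf{x}_t, t, \mathbf{c}_{\mathrm{phy}}) = \frac{1}{\sqrt{\alpha_t}} \left( \mathbf{x}_t - \frac{\beta_t}{\sqrt{1 - \bar{\alpha}_t}}\,\boldsymbol{\epsilon}_{\theta}(\mathbf{x}_t, t, \mathbf{c}_{\mathrm{phy}}) \right) \tag{15}$$
Here, $\boldsymbol{\epsilon}_{\theta}$ is the function approximated by our Geometry Branch, which learns to predict the noise present in $\mathbf{x}_t$ given the physical condition.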
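The forward sampling and the reverse-mean parameterization can be illustrated with a short sketch. It assumes a linear variance schedule with commonly used DDPM endpoints (1e-4 to 0.02), which the text does not specify, and it uses 0-based step indices; `make_schedule`, `q_sample`, and `reverse_mean` are illustrative names.

```python
import math
import random


def make_schedule(T, beta_start=1e-4, beta_end=0.02):
    """Linear beta schedule (assumed) and cumulative products of alpha = 1 - beta. Needs T >= 2."""
    betas = [beta_start + (beta_end - beta_start) * t / (T - 1) for t in range(T)]
    alpha_bars, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b
        alpha_bars.append(prod)
    return betas, alpha_bars


def q_sample(x0, t, alpha_bars, rng):
    """Draw x_t directly from x_0 via the closed-form forward marginal."""
    ab = alpha_bars[t]
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    xt = [math.sqrt(ab) * x + math.sqrt(1.0 - ab) * e for x, e in zip(x0, eps)]
    return xt, eps


def reverse_mean(xt, eps_pred, t, betas, alpha_bars):
    """Mean of the reverse transition given a noise prediction for step t."""
    coef = betas[t] / math.sqrt(1.0 - alpha_bars[t])
    return [(x - coef * e) / math.sqrt(1.0 - betas[t]) for x, e in zip(xt, eps_pred)]
```

As a sanity check on the parameterization, at the first diffusion step the reverse mean computed with the true noise reproduces the clean signal, since the cumulative and per-step noise levels coincide there.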
2.4.2. Physics-Consistent Estimation via Tweedie’s Formula
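A fundamental challenge in incorporating physical laws into diffusion models is that constraints such as signal continuity and derivatives are defined on the clean signal manifold $\mathbf{x}_0$, whereas the network operates on the noisy latent space $\mathbf{x}_t$. Applying physical regularization directly to $\mathbf{x}_t$ is ineffective due to the dominance of high-frequency Gaussian noise. To resolve this, we employ Tweedie’s formula to analytically estimate the denoised signal $\hat{\mathbf{x}}_0$ from the current noisy state $\mathbf{x}_t$ and the predicted noise $\boldsymbol{\epsilon}_{\theta}$. By rearranging the forward marginal equation, we obtain the projection operator:
$$\hat{\mathbf{x}}_0 = \frac{\mathbf{x}_t - \sqrt{1 - \bar{\alpha}_t}\,\boldsymbol{\epsilon}_{\theta}(\mathbf{x}_t, t, \mathbf{c}_{\mathrm{phy}})}{\sqrt{\bar{\alpha}_t}} \tag{16}$$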
This formulation acts as a single-step denoising approximation. It enables us to enforce the physics-based loss directly on the estimated clean manifold at every training step t, ensuring that the gradients backpropagated to the network effectively guide the generation towards hemodynamically consistent waveforms.
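The single-step estimate amounts to inverting the forward marginal for the clean signal. A minimal sketch, with `tweedie_x0` as an illustrative name and the cumulative noise level passed in directly:

```python
import math


def tweedie_x0(xt, eps_pred, alpha_bar_t):
    """Single-step estimate of the clean signal from x_t and predicted noise.

    alpha_bar_t is the cumulative product of (1 - beta) up to step t.
    """
    scale = math.sqrt(alpha_bar_t)
    noise_scale = math.sqrt(1.0 - alpha_bar_t)
    return [(x - noise_scale * e) / scale for x, e in zip(xt, eps_pred)]
```

When the noise prediction is exact the estimate recovers the clean signal exactly; with an imperfect prediction the error is amplified by the ratio of the noise scale to the signal scale, so the estimate is least reliable at large t, where the cumulative product is small.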
2.4.3. Multi-Objective Optimization
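The training of PhysDiff-LBM is governed by a composite objective function designed to jointly satisfy signal fidelity, structural alignment, and physical consistency requirements. The total loss is constructed as a weighted sum of three complementary terms:
$$\mathcal{L}_{\mathrm{total}} = \lambda_{1}\,\mathcal{L}_{\mathrm{denoise}} + \lambda_{2}\,\mathcal{L}_{\mathrm{topo}} + \lambda_{3}\,\mathcal{L}_{\mathrm{phy}} \tag{17}$$
where $\lambda_{1}$, $\lambda_{2}$, and $\lambda_{3}$ are hyperparameters balancing the optimization landscape.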
The primary reconstruction capability is driven by the denoising loss $\mathcal{L}_{\mathrm{denoise}}$, which minimizes the variational upper bound on the negative log-likelihood. Following the reparameterization of the reverse process mean, this simplifies to minimizing the distance between the sampled Gaussian noise and the noise residual predicted by the network:
$$\mathcal{L}_{\mathrm{denoise}} = \mathbb{E}_{\mathbf{x}_0,\, \boldsymbol{\epsilon},\, t} \left[ \left\| \boldsymbol{\epsilon} - \boldsymbol{\epsilon}_{\theta}(\mathbf{x}_t, t, \mathbf{c}_{\mathrm{phy}}) \right\|_2^{2} \right] \tag{18}$$
While this term ensures point-wise accuracy, standard MSE optimization often prioritizes local smoothness at the expense of global structural coherence. To mitigate this, we simultaneously optimize a topology loss $\mathcal{L}_{\mathrm{topo}}$ through the auxiliary Topology Branch. By employing a Binary Cross-Entropy objective against the ground-truth QRS mask r, formulated as $\mathcal{L}_{\mathrm{topo}} = \mathrm{BCE}(\hat{\mathbf{r}}, \mathbf{r})$ with $\hat{\mathbf{r}}$ the predicted mask, we compel the shared encoder to explicitly capture the cardiac rhythm, ensuring the generated waveforms are topologically aligned with the physiological cycle.
Crucially, to ensure the synthesis adheres to fluid dynamic laws, we incorporate a hemodynamic consistency loss $\mathcal{L}_{\mathrm{phy}}$. Leveraging the Tweedie projection derived in Equation (16), we explicitly penalize discrepancies between the electrical evolution of the estimated signal $\hat{\mathbf{x}}_0$ and the fluid momentum extracted by the LBM encoder:
Here, $\partial_t$ represents the temporal derivative operator. This term enforces the constraint that the gradient of the generated ECG must correspond to the hemodynamic momentum flux $j$, thereby effectively suppressing non-physiological hallucinations and ensuring strict cross-domain physical coupling.
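The three-term objective can be sketched as below. Several choices are assumptions made for illustration and are not confirmed by the text: the physics term is taken as a mean squared error between a forward-difference derivative of the estimated ECG and the momentum sequence, no learned mapping or rescaling is applied between the two domains, and the default weights are placeholders.

```python
import math


def mse(a, b):
    """Mean squared error between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def bce(prob, target, tiny=1e-7):
    """Mean binary cross-entropy; probabilities are clipped away from 0 and 1."""
    total = 0.0
    for p, r in zip(prob, target):
        p = min(max(p, tiny), 1.0 - tiny)
        total -= r * math.log(p) + (1 - r) * math.log(1.0 - p)
    return total / len(prob)


def time_derivative(x):
    """Forward difference as a discrete temporal derivative (one sample shorter)."""
    return [b - a for a, b in zip(x, x[1:])]


def composite_loss(eps, eps_pred, mask_prob, mask, x0_hat, momentum,
                   weights=(1.0, 1.0, 1.0)):
    """Weighted sum of denoising, topology, and (assumed-form) physics terms."""
    l_den = mse(eps, eps_pred)
    l_topo = bce(mask_prob, mask)
    # Assumption: compare d(x0_hat)/dt with momentum at the matching time indices
    l_phy = mse(time_derivative(x0_hat), momentum[1:])
    w1, w2, w3 = weights
    return w1 * l_den + w2 * l_topo + w3 * l_phy
```

Here `x0_hat` would be the single-step estimate of Section 2.4.2 rather than the noisy state, so that the derivative is taken on an approximately clean signal.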
3. Experiments
3.1. Experimental Setup
3.1.1. Datasets and Protocol
We evaluate PhysDiff-LBM using four datasets, divided into reconstruction benchmarks and downstream clinical tasks:
Training and Reconstruction Benchmarks:
- MIMIC-III Waveform Database: A large-scale ICU dataset from which we curated more than 5000 subjects. It represents complex, noisy hemodynamic conditions, serving as the primary training corpus [3,28].
- VitalDB: High-fidelity intraoperative recordings capturing hemodynamics under anesthesia. This dataset is used to assess model generalization across distinct physiological states outside the ICU domain [29].
Downstream Clinical Tasks:
- MIMIC PERform AF Dataset: This binary classification task focuses on distinguishing Atrial Fibrillation from Normal Sinus Rhythm. Since AF is characterized by the absence of P-waves and irregular R-R intervals, this task explicitly tests the model’s ability to capture high-frequency atrial depolarization details and temporal rhythm consistency [30,31].
- PhysioNet Challenge 2015 Dataset: We performed a 5-class arrhythmia classification task covering Asystole, Bradycardia, Tachycardia, Ventricular Tachycardia (VTA), and Ventricular Fibrillation (VFB). This challenging scenario requires the reconstructed signals to reflect diverse morphological anomalies, ranging from extreme rate variations to complete waveform disorganization [32].
3.1.2. Preprocessing
All signals were resampled to 128 Hz and segmented into non-overlapping 4-s windows ($L = 512$ samples). We utilized NeuroKit2 for signal conditioning, applying standard bandpass filtering for PPG and the Pan-Tompkins algorithm for ECG cleaning to remove artifacts. To stabilize the diffusion process, we applied instance-wise Min-Max normalization to scale each window x to a fixed range:
Furthermore, to supervise the topological branch, we generated binary Region-of-Interest (ROI) masks $\mathbf{r} \in \{0, 1\}^{L}$. Using R-peaks detected via the Pan-Tompkins method, we defined the QRS complex region as a 32-sample window centered on each peak index $p_j$:
$$r[n] = \mathbb{1}\left[\, \exists\, j : -16 \le n - p_j < 16 \,\right]$$
where $\mathbb{1}[\cdot]$ is the indicator function, effectively highlighting the structural cardiac events.
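The preprocessing steps of this subsection can be sketched in plain Python. Filtering and R-peak detection are assumed to have been done upstream (for example with NeuroKit2) and are not reproduced. The normalization target range is exposed as parameters `lo` and `hi` because its value is an assumption here, and the half-open window around each peak is one possible reading of "32-sample window centered on each peak".

```python
def segment(signal, fs=128, seconds=4):
    """Split a signal into non-overlapping windows of fs * seconds samples; drop the remainder."""
    n = fs * seconds
    return [signal[i:i + n] for i in range(0, len(signal) - n + 1, n)]


def min_max(window, lo=0.0, hi=1.0):
    """Instance-wise min-max scaling of one window to [lo, hi] (range is an assumed parameter)."""
    mn, mx = min(window), max(window)
    if mx == mn:  # flat window: avoid division by zero
        return [lo] * len(window)
    return [lo + (hi - lo) * (v - mn) / (mx - mn) for v in window]


def qrs_mask(r_peaks, length, width=32):
    """Binary ROI mask: 1 inside a width-sample window around each R-peak index."""
    half = width // 2
    mask = [0] * length
    for p in r_peaks:
        for i in range(max(0, p - half), min(length, p + half)):
            mask[i] = 1
    return mask
```

At 128 Hz a 32-sample window spans 250 ms, which comfortably covers a normal QRS complex; windows that would extend past the segment boundary are truncated.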
3.1.3. Baselines
We compare PhysDiff-LBM against three representative state-of-the-art approaches:
- CardioGAN: An attention-based SOTA GAN framework utilizing dual discriminators for ECG synthesis. It serves as the primary adversarial baseline to evaluate the trade-off between perceptual quality and training stability [33].
- PhysDiff-NS: A PINN-based variant of our model where the LBM encoder is replaced by explicit 1D Navier–Stokes residual regularization. This baseline is designed to benchmark our implicit kinetic formulation against traditional macroscopic fluid constraints with constant viscosity.
- RecQSR: A specialized deterministic framework focusing on fine-grained QRS complex reconstruction. It represents the state-of-the-art in non-generative, regression-based deep learning approaches [34].
3.1.4. Implementation Details and Environment
To ensure the reproducibility of our study, we detail the computational environment and software frameworks utilized for model development and evaluation. All neural network architectures, including the implicit LBM encoder and the region-disentangled diffusion backbone, were implemented using the PyTorch deep learning framework (version 2.6.0) in Python 3.10. For physiological signal preprocessing, filtering, and R-peak detection, we leveraged the NeuroKit2 and SciPy libraries. All model training, hyperparameter tuning, and performance evaluations were accelerated using CUDA 11.7 and conducted on a high-performance Linux workstation equipped with a single NVIDIA GeForce RTX 4090 GPU (NVIDIA Corporation, Santa Clara, CA, USA, 24 GB VRAM).
3.2. Comparative Experiment
3.2.1. Evaluation Metrics
To assess the performance of PhysDiff-LBM from morphological, clinical, and distributional perspectives, we employ three key metrics:
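- Waveform Fidelity (RMSE): Measures point-wise reconstruction accuracy: $\mathrm{RMSE} = \sqrt{\frac{1}{L} \sum_{n=1}^{L} \left( x[n] - \hat{x}[n] \right)^{2}}$, where x is the reference ECG window and $\hat{x}$ its reconstruction.
- Clinical Accuracy (HR-MAE): Evaluates diagnostic validity by calculating the absolute deviation in Heart Rate (BPM) derived via the Pan-Tompkins algorithm: $\mathrm{HR\text{-}MAE} = \frac{1}{N} \sum_{i=1}^{N} \left| \mathrm{HR}_{i}^{\mathrm{real}} - \mathrm{HR}_{i}^{\mathrm{gen}} \right|$ over N evaluation windows.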
- Distributional Consistency (FD): Fréchet Distance measures the Wasserstein-2 distance between the empirical distributions of the raw generated and real ECG signals, treating each one-dimensional signal window directly as a high-dimensional vector, to assess structural realism and mode coverage.
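The first two metrics can be sketched as follows. R-peak indices are assumed to be supplied by an external detector such as Pan-Tompkins, and heart rate is taken as 60 divided by the mean R-R interval within a window, which is one common convention and an assumption here. The Fréchet Distance is omitted because it requires a matrix square root beyond the scope of a stdlib sketch.

```python
import math


def rmse(ref, gen):
    """Root mean squared error between a reference and a generated window."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(ref, gen)) / len(ref))


def heart_rate_bpm(r_peaks, fs=128):
    """Mean heart rate in BPM from R-peak sample indices (needs at least two peaks)."""
    rr = [(b - a) / fs for a, b in zip(r_peaks, r_peaks[1:])]  # R-R intervals in seconds
    return 60.0 / (sum(rr) / len(rr))


def hr_mae(ref_peaks, gen_peaks, fs=128):
    """Mean absolute heart-rate error over paired windows of R-peak indices."""
    errs = [abs(heart_rate_bpm(r, fs) - heart_rate_bpm(g, fs))
            for r, g in zip(ref_peaks, gen_peaks)]
    return sum(errs) / len(errs)
```

Windows with fewer than two detected peaks have no defined heart rate under this convention and would need to be excluded or handled separately before calling `hr_mae`.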
3.2.2. Results
We evaluate all methods on the two reconstruction benchmarks (MIMIC-III Waveform Database and VitalDB) using the three aforementioned metrics, with quantitative results summarized in Figure 6, Figure 7 and Figure 8.
3.2.3. Overall Performance Analysis
As illustrated in Figure 6, PhysDiff-LBM consistently achieves superior performance across all evaluation metrics on both benchmarks. The improvements are particularly pronounced in distributional consistency and clinical accuracy, indicating that our physics-constrained framework not only reconstructs morphologically accurate waveforms but also preserves the underlying physiological semantics essential for clinical interpretation. CardioGAN exhibits the weakest performance among all compared methods, particularly in terms of distributional alignment. This degradation can be attributed to the inherent mode collapse and training instability issues associated with adversarial training paradigms. While GANs excel at generating perceptually plausible samples, they often fail to capture the full diversity of cardiac morphologies, leading to hallucinated features that deviate from physiologically valid patterns. PhysDiff-NS, which employs explicit Navier–Stokes residual constraints, demonstrates competitive reconstruction fidelity but falls short of PhysDiff-LBM in capturing fine-grained morphological details. This performance gap highlights the limitations of macroscopic fluid equations with constant viscosity assumptions, which cannot adequately model the spatially heterogeneous and nonlinear dynamics of cardiovascular systems. RecQSR, as a regression-based deterministic approach, achieves reasonable waveform fidelity but exhibits limited capacity in preserving distributional characteristics, tending to produce over-smoothed outputs that average out subtle morphological variations.
The representative waveform comparison in Figure 8 provides intuitive evidence of the qualitative differences among methods. PhysDiff-LBM faithfully reconstructs the characteristic morphological features including sharp R-peaks, appropriate QRS complex duration, and physiologically consistent ST-T wave transitions. In contrast, baseline methods exhibit various artifacts: CardioGAN produces irregular baseline wandering and spurious oscillations; PhysDiff-NS generates overly smooth transitions that blur the boundaries between waveform components; RecQSR tends to miss subtle inflection points in the P-wave and T-wave regions. Furthermore, the t-SNE projections in Figure 7 reveal distinct clustering behaviors across methods. PhysDiff-LBM achieves the highest degree of overlap between real and generated distributions, indicating that our model successfully captures the intrinsic manifold structure of ECG signals. The compact and well-aligned clusters suggest that the LBM-encoded physics constraints effectively regularize the generation process, preventing the model from drifting into physiologically implausible regions of the feature space.
Notably, PhysDiff-LBM maintains robust performance on the out-of-distribution VitalDB dataset, which captures hemodynamics under anesthesia-induced physiological perturbations. While all methods exhibit some degree of performance degradation when transferring from ICU to intraoperative settings, our approach demonstrates the smallest generalization gap. This robustness stems from the physics-informed inductive bias embedded in our LBM encoder, which captures domain-invariant hemodynamic principles rather than dataset-specific statistical correlations. In contrast to the constant-viscosity Navier–Stokes constraint of PhysDiff-NS, our LBM-based formulation operates at the mesoscopic scale, enabling adaptive viscosity encoding through learned collision operators that better reflect the complex rheological properties of blood flow, thereby achieving superior cross-domain transferability and clinical reliability.
3.3. Interpretability and Mesoscopic Physics Analysis
To investigate the underlying mechanism of PhysDiff-LBM, we visualize the inference dynamics for several representative samples in Figure 9. This qualitative breakdown reveals how the mesoscopic constraints bridge the domain gap between hemodynamic boundary conditions, represented by the PPG, and the electrophysiological responses of the ECG.
The visualization of the latent manifold, depicted in the second row of Figure 9, provides empirical evidence that our model learns explicit fluid dynamic descriptors rather than abstract statistical features. The activation patterns in the “Velocity” and “Kinetic Energy” channels are not random but exhibit high-intensity responses corresponding precisely to the maximum systolic upstroke of the input PPG. From a fluid mechanics perspective, these activations represent the temporal gradient of the blood volume signal and the associated momentum flux. Crucially, observing the alignment with the gray dashed lines reveals a consistent physiological phenomenon where the peak activation of these mesoscopic physical features lags slightly behind the electrical R-peaks. This latency corresponds to the Pulse Transit Time, or PTT, which is the interval required for the pressure wave generated by ventricular contraction to propagate to the peripheral measurement site.
This observation suggests that PhysDiff-LBM has successfully modeled the inverse hemodynamic transfer function. Instead of simply translating waveform textures, the diffusion process is guided by the collision operator to locate the precise moment of maximum momentum generation associated with cardiac ejection, effectively back-tracing the preceding electrical depolarization event. This physics-based reasoning explains the superior temporal precision of the model observed in quantitative results. Even in the presence of irregular heartbeats or baseline wander, the rigid coupling between mechanical flow captured by LBM and the electrical trigger of the ECG forces the generated QRS complexes to align strictly with the ground truth. This mechanism effectively prevents the phase shifts commonly seen in pure end-to-end regression baselines.
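The lag between each R-peak and the following maximum PPG upstroke, discussed above, can be measured directly on a paired recording. The sketch below is only an illustration, not the authors' implementation: the function name `max_upstroke_lags`, the 0.5 s search window, and the assumption that R-peak indices are already available are all hypothetical. When measured from the R-peak, this delay is often called pulse arrival time, since it also includes the pre-ejection period.

```python
def max_upstroke_lags(ppg, r_peaks, fs, search_s=0.5):
    """For each R-peak index, return the delay in seconds to the steepest
    PPG rise found within the next `search_s` seconds (assumed window)."""
    # First difference approximates the PPG slope (a velocity-like feature).
    slope = [ppg[i + 1] - ppg[i] for i in range(len(ppg) - 1)]
    window = int(search_s * fs)
    lags = []
    for r in r_peaks:
        end = min(r + window, len(slope))
        if r >= end:
            continue  # R-peak too close to the end of the record
        # Index of the largest slope inside the search window.
        k = max(range(r, end), key=lambda i: slope[i])
        lags.append((k - r) / fs)
    return lags
```

Averaging the returned lags over a clean segment gives a per-subject estimate that can be compared against the offset visible between the gray dashed lines and the activation peaks.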
Furthermore, the “Force” channel in the heatmap acts as a second-order regularization term. By highlighting regions of high acceleration, defined as rapid changes in flow velocity, it enforces structural sharpness in the reconstructed signal. This is evident in the reconstructed ECG shown in the bottom row, where the QRS complexes maintain high-frequency fidelity without the over-smoothing artifacts typical of MSE-based approaches. Consequently, the LBM module acts as a dynamic filter that allows the diffusion model to distinguish between genuine high-frequency cardiac features possessing corresponding hemodynamic acceleration signatures and random Gaussian noise lacking physical momentum support, thereby ensuring robustness across diverse physiological states.
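As a rough analogy for the three channels discussed in this subsection, first- and second-order finite differences of the PPG give velocity-like, kinetic-energy-like, and force-like signals. This is a sketch under stated assumptions: the learned LBM encoder is not a finite-difference operator, unit density is assumed for the energy term, and the name `hemodynamic_proxies` is hypothetical.

```python
def hemodynamic_proxies(ppg, fs):
    """Return (velocity, kinetic_energy, force) proxies, each len(ppg) long.
    Boundary samples are left at zero."""
    dt = 1.0 / fs
    n = len(ppg)
    velocity = [0.0] * n
    force = [0.0] * n
    for i in range(1, n - 1):
        # Central first difference: rate of change of the volume signal.
        velocity[i] = (ppg[i + 1] - ppg[i - 1]) / (2 * dt)
        # Central second difference: acceleration, the second-order term.
        force[i] = (ppg[i + 1] - 2 * ppg[i] + ppg[i - 1]) / (dt * dt)
    # Kinetic-energy proxy with unit density (assumption): 0.5 * v^2.
    kinetic = [0.5 * v * v for v in velocity]
    return velocity, kinetic, force
```

Smooth baseline drift produces small second differences, whereas a sharp systolic upstroke produces large ones, which matches the intuition that the force-like signal highlights where sharp structure is physically supported.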
3.4. Ablation Study
To systematically evaluate the contribution of each proposed component, we conduct comprehensive ablation experiments on both MIMIC-III and VitalDB benchmarks. We design four ablation variants by progressively removing or replacing key modules from the full PhysDiff-LBM framework, as summarized in Table 1.
The first variant (M1) replaces the entire implicit LBM physics encoding module with a standard 1D convolutional encoder, assessing the contribution of mesoscopic physical modeling compared to purely data-driven feature extraction. The second variant (M2) substitutes the dilated convolutions in the streaming operator with standard convolutions of unit dilation rate, examining the role of multi-scale feature propagation in preserving pulse wave phase information. The third variant (M3) eliminates the region detection head along with the auxiliary segmentation objective, investigating the contribution of explicit cardiac rhythm supervision. The fourth variant (M4) replaces the physically guided cross-attention mechanism with simple channel-wise concatenation, evaluating the necessity of dynamic temporal alignment for Pulse Transit Time compensation.
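The effect of the M2 substitution on temporal context can be illustrated with the standard receptive-field formula for stacked stride-1 convolutions, RF = 1 + Σ (k − 1)·d. The kernel size of 3 and the dilation schedule 1, 2, 4, 8 below are illustrative assumptions, not values reported in the paper.

```python
def receptive_field(kernel_size, dilations):
    """Receptive field in samples of stacked stride-1 1D convolutions,
    one layer per entry in `dilations`."""
    rf = 1
    for d in dilations:
        rf += (kernel_size - 1) * d
    return rf

# Hypothetical four-layer stacks with kernel size 3.
dilated = receptive_field(3, [1, 2, 4, 8])   # 31 samples
standard = receptive_field(3, [1, 1, 1, 1])  # 9 samples
```

With the same depth and parameter count, the unit-dilation stack covers far fewer samples, which is consistent with M2 losing long-range context across the cardiac cycle.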
Table 2 presents the quantitative performance of all ablation variants across both benchmarks. The complete PhysDiff-LBM framework (M0) consistently achieves the best performance across all metrics, validating the effectiveness of our integrated design.
The ablation results reveal several important observations regarding the contribution of each component. Most notably, the removal of the LBM encoder (M1) leads to catastrophic performance degradation across all metrics. This substantial gap confirms that the mesoscopic physics-informed encoding constitutes the core contribution of our framework. Without the implicit hemodynamic constraints embedded in the LBM formulation, the model degenerates into a purely data-driven paradigm that fails to capture the underlying fluid dynamic principles governing cardiovascular wave propagation.
Interestingly, we observe distinct degradation patterns across different ablation variants, suggesting that each component addresses complementary aspects of the reconstruction problem. The removal of dilated streaming (M2) causes the most severe degradation in FD, while maintaining relatively modest HR-MAE increase. This asymmetric pattern indicates that multi-scale feature propagation primarily affects the global distributional consistency of generated waveforms rather than local peak detection accuracy. The dilated convolutions enable the model to capture long-range temporal dependencies that are essential for preserving the characteristic shape of the entire cardiac cycle.
In contrast, the topology branch removal (M3) exhibits an opposite degradation pattern, with HR-MAE surging dramatically while FD remains relatively controlled. This observation validates our hypothesis that explicit cardiac rhythm supervision through the auxiliary segmentation task provides crucial inductive bias for learning the temporal structure of ECG signals. Without direct guidance on QRS complex localization, the model loses awareness of the underlying topological organization, resulting in rhythmic hallucinations where cardiac cycles may exhibit plausible morphology but incorrect temporal placement.
The cross-attention mechanism (M4) demonstrates the most balanced contribution across metrics, with moderate degradation in both HR-MAE and FD. This suggests that the dynamic temporal alignment provided by attention weights benefits both local and global reconstruction quality by adaptively compensating for the variable Pulse Transit Time between cardiac electrical activation and peripheral pulse arrival.
Across both datasets, the ranking of variants remains largely consistent, with M1 performing worst, followed by M2 and M3 with complementary weakness profiles, and M4 exhibiting the smallest overall degradation. The consistent performance gap between M0 and all ablation variants validates that all proposed components contribute synergistically to the final reconstruction quality, with the LBM physics encoding serving as the foundational element upon which other components build.
3.5. Downstream Clinical Validation
Beyond morphological reconstruction fidelity, the ultimate criterion for the generated ECGs is their clinical utility—specifically, whether they preserve the pathological biomarkers required for automated diagnosis. To evaluate this, we trained a standard VGG-19 [35,36] classifier on ground-truth real ECG signals to serve as a fixed diagnostic evaluator. This pre-trained network was then used to predict cardiac pathologies from three types of input signals: the ground-truth Real ECG, the reconstructed ECG generated by each method (Gen ECG), and the raw PPG signal. High classification performance on generated ECGs indicates that the generative model successfully recovers the subtle diagnostic features rather than merely fitting low-frequency trends.
We conducted evaluations on two distinct diagnostic tasks (Table 3): binary Atrial Fibrillation detection and five-class arrhythmia diagnosis.
3.5.1. Evaluation Metrics
To comprehensively assess the diagnostic fidelity of reconstructed ECG signals, we employ a suite of classification metrics that capture both overall performance and class-specific diagnostic accuracy. Let $TP$, $TN$, $FP$, and $FN$ denote the number of true positives, true negatives, false positives, and false negatives, respectively.
Accuracy represents the proportion of correctly classified samples among all predictions, providing an overall measure of classification correctness:
$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$
Precision is defined as the ratio of true positive predictions to all positive predictions, indicating the reliability of positive classifications:
$$\mathrm{Precision} = \frac{TP}{TP + FP}$$
Recall, also known as sensitivity, represents the ratio of true positive predictions to all actual positive samples, reflecting the model’s ability to detect pathological conditions:
$$\mathrm{Recall} = \frac{TP}{TP + FN}$$
The F1-Score is the harmonic mean of precision and recall, providing a more balanced performance measure when class distributions are imbalanced:
$$\mathrm{F1} = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$$
The Area Under the ROC Curve (AUC) is a threshold-independent metric that quantifies the discriminative capability across all classification thresholds by computing the area under the Receiver Operating Characteristic curve. AUC values range from $0$ to $1$, where $0.5$ indicates random guessing and $1$ represents perfect classification [37].
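For intuition, AUC equals the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one, with ties counted as one half. The sketch below computes it from that pairwise definition; the function name `roc_auc` is hypothetical, and in practice an established library routine would normally be used instead.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly,
    counting ties as one half (Mann-Whitney U formulation)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

A classifier that assigns the same score to every sample scores 0.5 under this definition, matching the random-guessing reference value.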
For the multi-class arrhythmia diagnosis task, we extend these metrics using macro-averaging. Let $K$ denote the total number of classes, and let $P_k$ and $R_k$ represent the precision and recall for class $k$, respectively. The macro-averaged precision and recall are defined as:
$$P_{\mathrm{macro}} = \frac{1}{K}\sum_{k=1}^{K} P_k, \qquad R_{\mathrm{macro}} = \frac{1}{K}\sum_{k=1}^{K} R_k$$
Macro-averaging ensures equal weighting across all five arrhythmia categories, preventing dominant classes from masking performance on rare but clinically critical conditions such as Ventricular Fibrillation.
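The metric definitions in this subsection can be checked numerically from a confusion matrix. The following is a minimal sketch, assuming a square matrix indexed as `cm[true][pred]`; the helper names are hypothetical, and undefined ratios are mapped to 0.0 as a convention chosen here rather than one stated in the paper.

```python
def per_class_counts(cm, k):
    """Return (TP, FP, FN, TN) for class k from cm[true][pred]."""
    n = len(cm)
    tp = cm[k][k]
    fn = sum(cm[k][j] for j in range(n)) - tp   # true k, predicted other
    fp = sum(cm[i][k] for i in range(n)) - tp   # true other, predicted k
    tn = sum(sum(row) for row in cm) - tp - fn - fp
    return tp, fp, fn, tn

def safe_div(a, b):
    """Division that returns 0.0 when the denominator is zero."""
    return a / b if b else 0.0

def macro_metrics(cm):
    """Overall accuracy plus macro-averaged precision and recall."""
    n = len(cm)
    precisions, recalls = [], []
    for k in range(n):
        tp, fp, fn, _ = per_class_counts(cm, k)
        precisions.append(safe_div(tp, tp + fp))
        recalls.append(safe_div(tp, tp + fn))
    total = sum(sum(row) for row in cm)
    return {
        "accuracy": safe_div(sum(cm[k][k] for k in range(n)), total),
        "macro_precision": sum(precisions) / n,
        "macro_recall": sum(recalls) / n,
    }
```

Because each class contributes equally to the macro averages, a rare class with poor recall lowers `macro_recall` as much as a common one would, even when overall accuracy barely changes.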
To establish a comprehensive benchmark, we evaluate and compare three types of signals: (1) Real ECG, the ground-truth ECG recordings serving as the upper-bound reference; (2) Gen ECG, the ECG waveforms reconstructed by PhysDiff-LBM and baseline methods; and (3) PPG, the raw photoplethysmography input signal serving as the lower-bound reference. The diagnostic gap between Gen ECG and Real ECG quantifies the information loss during reconstruction, while the improvement from PPG to Gen ECG demonstrates the value added by the translation process. A successful reconstruction method should achieve Gen ECG performance approaching that of Real ECG while substantially exceeding PPG performance.
3.5.2. Clinical Validation on Atrial Fibrillation
Before quantifying the diagnostic performance, we first visually inspect the reconstruction quality across different cardiac rhythms. Figure 10 illustrates the waveforms of Raw PPG, Real ECG, and our generated Gen ECG for both Normal Sinus Rhythm (NoAF) and Atrial Fibrillation (AF). As observed, the Raw PPG signals (top row) exhibit smoothed peaks, where the fine-grained timing of depolarization is often obscured. In contrast, the Gen ECG (bottom row) sharpens these features, strictly aligning with the Real ECG (middle row). Crucially, in the AF scenario, our model captures the characteristic irregularity of the rhythm, a biomarker essential for clinical diagnosis that is less distinct in the PPG domain.
To validate the clinical utility, we utilized the pre-trained VGG-19 network to classify cardiac pathologies. Table 4 summarizes the performance metrics with 95% confidence intervals (CIs), while Figure 11 provides a detailed visualization of the classification boundaries and error distribution.
We computed 95% CIs via 1000-resample bootstrapping. Statistical tests (paired permutation for F1 and DeLong’s for AUC) confirm that our Gen ECG significantly outperforms the Raw PPG baseline.
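The resampling procedures named above can be sketched as follows. This is an illustrative percentile bootstrap and a sign-swap paired permutation test for F1, assuming binary labels and predictions stored in plain lists; the function names, seeds, and the percentile-interval choice are assumptions rather than the authors' exact protocol, and DeLong's test for AUC is not covered.

```python
import random

def f1_score(y_true, y_pred):
    """Binary F1 for labels in {0, 1}; returns 0.0 if undefined."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def bootstrap_ci(y_true, y_pred, metric, n_resamples=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval: resample cases with replacement."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(metric([y_true[i] for i in idx], [y_pred[i] for i in idx]))
    stats.sort()
    lo_idx = int(round(alpha / 2 * n_resamples))
    hi_idx = min(n_resamples - 1, int(round((1 - alpha / 2) * n_resamples)) - 1)
    return stats[lo_idx], stats[hi_idx]

def paired_permutation_p(y_true, pred_a, pred_b, metric, n_perm=1000, seed=0):
    """Two-sided paired test: randomly swap the two systems' predictions
    per sample and compare the metric gap with the observed gap."""
    rng = random.Random(seed)
    observed = abs(metric(y_true, pred_a) - metric(y_true, pred_b))
    count = 0
    for _ in range(n_perm):
        a, b = [], []
        for pa, pb in zip(pred_a, pred_b):
            if rng.random() < 0.5:
                pa, pb = pb, pa
            a.append(pa)
            b.append(pb)
        if abs(metric(y_true, a) - metric(y_true, b)) >= observed:
            count += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return (count + 1) / (n_perm + 1)
```

Resampling whole cases keeps each prediction paired with its label, and swapping predictions per sample preserves the pairing between the two signal modalities evaluated on the same recordings.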
The quantitative results reveal distinct performance characteristics inherent to each signal modality. The Raw PPG baseline demonstrates a notable imbalance between sensitivity (Recall) and reliability (Precision). While the optical signal successfully captures the majority of pathological events, its comparatively lower Precision suggests a tendency toward false positives. This limitation likely stems from the susceptibility of PPG to motion artifacts, which can mimic the irregularities of arrhythmia.
The transition from PPG to Gen ECG addresses this critical bottleneck. As visualized in the confusion matrices in Figure 11, the Gen ECG significantly tightens the classification distribution. Comparing the off-diagonal elements between Figure 11a,b, we observe that the generative process effectively suppresses the ambiguity that leads to misclassification in the PPG domain. By enforcing electrophysiological constraints, the model acts as a semantic filter, reducing the noise that classifiers typically confound with pathological features.
Furthermore, the performance of the generated signals closely converges with that of the ground-truth Real ECG. The minimal gap across all metrics (and overlapping confidence intervals in certain metrics) indicates that the information loss during the cross-domain translation is negligible for diagnostic purposes. The high alignment in both the statistical metrics (Table 4) and the ROC space (Figure 11d) confirms that the reconstructed ECGs retain the subtle, clinically relevant biomarkers necessary for automated decision-making.
3.5.3. Clinical Validation on Multi-Class Arrhythmia Diagnosis
Beyond binary classification, we further evaluate the diagnostic fidelity of generated ECGs on a more challenging multi-class arrhythmia task using the PhysioNet Challenge 2015 dataset. This benchmark encompasses five distinct cardiac conditions: Asystole, Bradycardia, Tachycardia, Ventricular Tachycardia (VT), and Ventricular Fibrillation (VF/VFib), each presenting unique electrophysiological signatures that must be preserved through the cross-domain translation process.
Table 5 summarizes the per-class and macro-averaged classification metrics across the three signal modalities. To maintain clarity while ensuring statistical rigor, we report the 95% confidence intervals for the overall Macro Average metrics. Figure 12 provides the corresponding ROC curves and normalized confusion matrices for detailed performance visualization.
The multi-class classification results further substantiate the clinical utility of our generated ECGs. Across all arrhythmia categories, the Gen ECG consistently outperforms the Raw PPG baseline in terms of macro-averaged metrics, with particularly pronounced improvements observed for life-threatening conditions such as Asystole and Ventricular Tachycardia. This enhanced sensitivity for critical cardiac events underscores the capacity of the generative model to recover diagnostically relevant features that are otherwise attenuated in the optical domain.
Notably, the Gen ECG achieves performance that not only approaches but in certain cases surpasses that of the Real ECG reference, especially for rate-related arrhythmias including Bradycardia and Tachycardia. This counterintuitive observation suggests that the generative process may function as an implicit denoising mechanism, yielding cleaner electrophysiological representations that enhance classifier discriminability for conditions primarily characterized by rhythm alterations rather than subtle morphological abnormalities.
The Raw PPG confusion matrix in Figure 12 exhibits considerable inter-class confusion, particularly between conditions sharing similar heart rate profiles but differing in their underlying electrical substrates, such as Asystole versus VF/VFib, and Bradycardia versus VT. These misclassifications are physiologically expected, as the optical signal inherently lacks the resolution to capture morphological distinctions encoded in the electrical domain. By recovering the QRS complex morphology and rhythm regularity, the Gen ECG effectively disambiguates these clinically distinct conditions, achieving diagonal dominance comparable to the Real ECG reference.
The ROC curves further demonstrate that the Gen ECG maintains robust discriminative performance across varying classification thresholds. The curves for Gen ECG closely track those of the Real ECG across most arrhythmia categories while consistently dominating the PPG curves, confirming that the reconstructed signals preserve the essential pathological biomarkers required for reliable automated diagnosis. The marginal performance gap observed in complex conditions such as Ventricular Fibrillation likely reflects the inherent challenge of reconstructing high-frequency chaotic waveforms from bandwidth-limited PPG signals, representing a potential direction for future methodological refinement.
4. Discussion
The reconstruction of high-fidelity ECG signals from photoplethysmography represents a fundamental challenge in ubiquitous health monitoring, primarily due to the complex non-linear mapping between the mechanical pulsatile flow and the underlying electrical cardiac activity. Our results demonstrate that PhysDiff-LBM effectively addresses this ill-posed inverse problem by integrating mesoscopic fluid dynamics into a generative diffusion framework. Unlike traditional deep learning approaches that rely solely on statistical correlations, our method leverages the Lattice Boltzmann Method to impose explicit hemodynamic constraints, effectively grounding the generative process in physical reality. This structural coupling ensures that the reconstructed waveforms are not only texturally realistic but also physiologically compliant with the input boundary conditions.
A comparative analysis with state-of-the-art baselines elucidates the specific advantages of our mesoscopic formulation. While adversarial frameworks like CardioGAN excel at generating sharp waveforms, they frequently suffer from mode collapse and “rhythmic hallucination,” where the generated beats fail to align temporally with the input PPG during irregular rhythms. In contrast, PhysDiff-LBM utilizes the kinetic energy and momentum features derived from the PPG to rigidly anchor the temporal alignment, preventing phase shifts even in the presence of arrhythmias. Furthermore, compared to deterministic regression models (e.g., RecQSR), which tend to produce over-smoothed signals by averaging out high-frequency details, our diffusion-based approach preserves the spectral richness of the ECG, capturing subtle morphological variations in the QRS complex and T-wave. Crucially, our method also outperforms macroscopic physics-informed baselines (PhysDiff-NS). We attribute this to the fact that standard Navier–Stokes formulations often assume constant viscosity, which oversimplifies the complex, non-Newtonian behavior of blood flow in microvascular beds. By operating at the mesoscopic scale with a learnable collision operator, PhysDiff-LBM better captures these heterogeneous rheological properties, leading to superior reconstruction fidelity.
The clinical significance of these findings is underscored by the model’s performance in downstream diagnostic tasks. The substantial improvement in Atrial Fibrillation detection and multi-class arrhythmia classification indicates that the generated signals retain the critical diagnostic features necessary for automated screening. By accurately recovering the “irregularly irregular” rhythm characteristic of AF without introducing generation artifacts, PhysDiff-LBM effectively functions as a software-defined sensor enhancement, potentially upgrading standard consumer wearables into medical-grade diagnostic tools. Moreover, the robustness demonstrated on the out-of-distribution VitalDB dataset suggests that our physics-constrained inductive bias facilitates strong generalization across diverse physiological states, ranging from intensive care units to intraoperative anesthesia settings.
A critical factor in realizing this wearable diagnostic potential is the model’s robustness to motion artifacts. While our preprocessing pipeline utilizes standard bandpass filtering (via NeuroKit2) to mitigate basic sensor noise, severe motion artifacts in real-world ambulation often overlap with physiological frequency bands, rendering simple linear filtering insufficient. Fortunately, PhysDiff-LBM provides an inherent, two-fold defense against such disturbances without requiring explicit artifact-simulation training. First, motion artifacts typically present as non-physiological fluctuations that violate cardiovascular fluid dynamics. Because our LBM encoder rigorously enforces mass and momentum conservation, these unphysical noise components are naturally suppressed during the macroscopic moment projection step. Second, the region-disentangled diffusion backbone operates as a strong generative prior. Rather than deterministically mapping noisy inputs to outputs, it guides the reverse diffusion process toward a learned manifold of clean ECG signals. Guided by the topological branch (supervised by the structural ROI masks defined during preprocessing), the model effectively bridges artifact-induced gaps and repairs distortions, preserving both the morphological fidelity and the underlying cardiac rhythm.
Despite these promising advances, several avenues for future research remain. The primary limitation of the current framework lies in the computational cost associated with the iterative sampling process of diffusion models, which poses a challenge for real-time deployment on resource-constrained edge devices. Future work could explore consistency distillation or latent diffusion techniques to accelerate inference speeds without compromising reconstruction quality. Additionally, while this study focuses on single-lead ECG synthesis, extending the LBM-guided framework to reconstruct full 12-lead ECGs from single-point PPG signals remains an open and transformative direction. Such an extension would require incorporating vectorcardiographic constraints to model the spatial propagation of electrical potential, further closing the gap between wearable monitoring and comprehensive clinical cardiology.
5. Conclusions
In this paper, we presented PhysDiff-LBM, a novel physics-informed generative framework designed to reconstruct medical-grade ECG signals from ubiquitous PPG recordings. By embedding a differentiable Lattice Boltzmann Method module within a diffusion probabilistic model, we successfully introduced mesoscopic kinetic constraints into the generation process, effectively bridging the domain gap between hemodynamic mechanics and cardiac electrophysiology.
Our extensive experimental evaluation on the MIMIC-III and VitalDB datasets demonstrates that PhysDiff-LBM establishes a new state-of-the-art in reconstruction fidelity, significantly outperforming existing adversarial and regression-based baselines. Crucially, the integration of fluid dynamic principles enables the model to capture the complex non-linear mappings of the cardiovascular system, ensuring precise temporal alignment and morphological consistency even in challenging arrhythmia scenarios. The superior performance observed in downstream clinical tasks, particularly in atrial fibrillation detection, validates the practical utility of our generated signals for automated diagnosis.
Ultimately, this work highlights the potential of synergizing physical laws with generative artificial intelligence. By moving beyond black-box learning to a physics-aware paradigm, PhysDiff-LBM provides a robust and interpretable solution for non-invasive cardiac monitoring, paving the way for accessible, continuous, and reliable cardiovascular healthcare through standard wearable devices.
References
- 1. World Health Organization. Cardiovascular Diseases (CVDs). Available online: https://www.who.int/news-room/fact-sheets/detail/cardiovascular-diseases-(cvds) (accessed on 10 January 2026).
- 2. Clifford G.D., Liu C., Moody B., Lehman L.-W., Silva I., Li Q., Johnson A.E.W., Mark R.G. AF classification from a short single lead ECG recording: The PhysioNet/Computing in Cardiology Challenge 2017. Comput. Cardiol. 2017, 44, 1–4. doi:10.22489/CinC.2017.065-469. PMCID: PMC5978770. PMID: 29862307.
- 3. Goldberger A.L., Amaral L.A.N., Glass L., Hausdorff J.M., Ivanov P.C., Mark R.G., Mietus J.E., Moody G.B., Peng C.-K., Stanley H.E. PhysioBank, PhysioToolkit, and PhysioNet: Components of a new research resource for complex physiologic signals. Circulation 2000, 101, e215–e220. doi:10.1161/01.CIR.101.23.e215. PMID: 10851218.
- 4. Allen J. Photoplethysmography and its application in clinical physiological measurement. Physiol. Meas. 2007, 28, R1–R39. doi:10.1088/0967-3334/28/3/R01. PMID: 17322588.
- 5. Charlton P.H., Kyriacou P.A., Mant J., Marozas V., Chowienczyk P., Alastruey J. Wearable Photoplethysmography for Cardiovascular Monitoring. Proc. IEEE 2022, 110, 355–381. doi:10.1109/JPROC.2022.3149785. PMCID: PMC7612541. PMID: 35356509.
- 6. Pinto R.A., De Oliveira H.S., Souto E., Giusti R., Veras R. Inferring ECG Waveforms from PPG Signals with a Modified U-Net. Sensors 2024, 24, 6046. doi:10.3390/s24186046. PMCID: PMC11436109. PMID: 39338791.
- 7. Fang X., Jin J., Wang H., Liu C., Cai J., Nie G., Li J., Hong S. PPGFlowECG: Latent Rectified Flow with Cross-Modal Encoding for PPG-Guided ECG Generation and Cardiovascular Disease Detection. arXiv 2025, arXiv:2509.19774.
- 8. Vo K., El-Khamy M., Choi Y. PPG-to-ECG Signal Translation for Continuous Atrial Fibrillation Detection via Attention-Based Deep State-Space Modeling. arXiv 2023, arXiv:2309.15375. doi:10.1109/EMBC53108.2024.10781630. PMID: 40039489.
