Medical Data over Sound—CardiaWhisper Concept

Radovan Stojanović; Jovan Đurković; Mihailo Vukmirović; Blagoje Babić; Vesna Miranović; Andrej Škraba

PMC · DOI: 10.3390/s25154573 · July 24, 2025


Open Access

TL;DR

CardiaWhisper is a medical system that uses sound to transmit vital signs from wearable sensors to devices like smartphones, offering a low-power alternative to traditional wireless methods.

Contribution

CardiaWhisper introduces a novel medical data-over-sound framework for transmitting biomedical signals using acoustic transmission.

Findings

1. CardiaWhisper successfully transmits ECG, PPG, RR, and ACC signals via sound to nearby devices.
2. The system is evaluated for performance metrics like SNR, power consumption, and noise immunity in realistic scenarios.
3. CardiaWhisper offers a low-power, eco-friendly alternative to RF or Bluetooth-based medical wearables.

Abstract

Data over sound (DoS) is an established technique that has experienced a resurgence in recent years, finding applications in areas such as contactless payments, device pairing, authentication, presence detection, toys, and offline data transfer. This study introduces CardiaWhisper, a system that extends the DoS concept to the medical domain by using a medical data-over-sound (MDoS) framework. CardiaWhisper integrates wearable biomedical sensors with home care systems, edge or IoT gateways, and telemedical networks or cloud platforms. Using a transmitter device, vital signs such as ECG (electrocardiogram) signals, PPG (photoplethysmogram) signals, RR (respiratory rate), and ACC (acceleration/movement) are sensed, conditioned, encoded, and acoustically transmitted to a nearby receiver—typically a smartphone, tablet, or other gadget—and can be further relayed to edge and cloud…


Funding

—Innovation Fund of Montenegro
—European Union Interreg VI ARCA project
—Slovenian Research and Innovation Agency
—Ministry of Higher Education, Science, and Innovation of the Republic of Slovenia

Keywords

medical wearables; Data Over Sound (DoS); IoT in healthcare; edge computing; real-time signal processing; modulation and demodulation; JavaScript; near-ultrasound communication


Taxonomy

Topics: Wireless Body Area Networks · Context-Aware Activity Recognition Systems · ECG Monitoring and Analysis

Full text

1. Introduction

Health is humanity’s greatest wealth, and regular monitoring is essential for timely intervention and effective treatment. Today, a wide range of health parameters can be self-monitored using wearable medical devices, often referred to as healthcare wearables or self-meters. These include blood pressure monitors, glucose meters, temperature sensors, oxygen saturation monitors, smartwatches, Holter monitors, loop recorders, pulse and rhythm analyzers, fall detectors, and more. Recent advances in wearable sensing technologies have significantly enhanced the accuracy and functionality of these devices, bringing their performance closer to that of clinical instruments [1,2]. The healthcare wearables market is expanding rapidly and is projected to reach EUR 140 billion by 2030, with one in three adults expected to incorporate these devices into their daily lives. Furthermore, the COVID-19 pandemic has accelerated the adoption and integration of wearables within broader telemedicine networks [3,4].

Despite these advancements, several research and innovation challenges persist in the design of healthcare wearables. Key issues include sensor accuracy, noise reduction, advanced signal processing, reliable communication in both near-field and far-field scenarios, data compression and storage, power efficiency, feature extraction and classification, AI-based assistance, and seamless integration with telemedicine and edge computing systems [5,6]. Additional limitations include high costs for consumers in developing countries, setup difficulties, limited memory for long-term recordings, and compatibility issues across various mobile operating systems (Android, iOS, Windows). Despite the widespread use of wireless technologies such as Bluetooth, Wi-Fi, and ZigBee in wearable healthcare systems, these protocols face critical limitations in medical environments, including electromagnetic interference (EMI) with sensitive equipment and high power consumption [7,8]. Studies have shown that EMI can cause hazardous incidents in medical devices and disrupt deep brain stimulation systems, underscoring the need for alternative communication methods—such as near-ultrasound acoustic communication—that are both low-power and resistant to EMI.

Data over sound (DoS) is a well-established technique for encoding and transmitting data using audible or inaudible sound waves. Originating with the Morse telegraph, DoS has found applications in telecommunications, touch-tone systems (DTMF, dual-tone multi-frequency signaling), and modem communications. Sound-based communication is also widely used underwater, enabling data transfer between submarines and underwater robots over distances ranging from tens to hundreds of kilometers. Data can be transmitted acoustically through air, wires, water, or even solid objects [9,10,11,12]. Audible frequencies typically range from 20 Hz to 20 kHz, while inaudible communication utilizes near-ultrasonic and ultrasonic frequencies above 20 kHz. The basic operation of DoS involves three main steps: (1) encoding, where analog or digital data are converted into a sound signal; (2) transmission, where the sound is emitted by a speaker or other sound-wave generator; and (3) reception, where a microphone captures the sound and decodes it back into data. Various modulation techniques—such as frequency-shift keying (FSK), amplitude-shift keying (ASK), phase-shift keying (PSK), and orthogonal frequency-division multiplexing (OFDM)—are commonly employed, with modulation and demodulation typically performed in software.

Beyond historical and underwater uses, DoS is now employed in a variety of real-world scenarios, including contactless payments, device pairing (e.g., TVs, smart speakers), authentication and presence detection, marketing and retail (audio beacons), toys, and offline data transfer in environments lacking Bluetooth or Wi-Fi connectivity. DoS offers several advantages: it works on most devices equipped with a speaker and microphone, does not require Internet or Bluetooth, can operate through walls (depending on frequency), requires minimal setup, consumes little power, supports a wide range of platforms, enables secure and private data transfer, integrates seamlessly without special tools, and is scalable and robust across extreme environments. It also supports offline, peer-to-peer, and one-to-many communication.

However, DoS has certain limitations. Data rates are generally slower than Bluetooth or Wi-Fi; it is more susceptible to noise and interference and has a limited range and bandwidth; and transmission can be attenuated by certain building materials. In terms of health risks, electromagnetic (EM)-based wireless communication has been extensively studied and is generally considered safe at low exposure levels, though concerns about long-term RF exposure remain. In contrast, DoS does not emit electromagnetic radiation and is not associated with cancer risk, but it can cause annoyance, stress, or hearing discomfort if misused (e.g., at high volumes)—a concern that is minimized in near-field communication. Overall, current evidence suggests that EM-based wireless technologies carry a low-to-moderate health risk, while DoS remains a low-risk alternative [13,14].

In this study, we propose the “CardiaWhisper” concept, which extends the data over sound (DoS) paradigm into the field of medical data over sound (MDoS). While microphones, smart speakers, and other acoustic sensors are already used in medicine for applications such as monitoring breathing patterns, snoring, coughing, sneezing, fall detection, distress sounds, voice biomarkers, probe health, on-body communication, and near-field implant communication, the direct application of MDoS remains largely unexplored. We contend that MDoS has the potential to address several outstanding challenges in medical wearables, including local connectivity, gateway integration, reduced power consumption, low cost, compatibility with widely available consumer devices, and support for haptic interfaces and smart clothing.

Section 2 presents the architecture, operational principles, and signal processing concepts underlying CardiaWhisper. Section 3 describes the real-time signal processing implementation. Section 4 provides preliminary test results, evaluating key parameters and discussing design challenges, limitations, and potential applications. The paper concludes with a summary and an overview of the relevant literature.

2. Methodology

Figure 1 illustrates the principle of medical signal transmission over sound. As a case study, the electrocardiogram (ECG) signal is used, captured via three electrodes in Einthoven’s triangle configuration placed on the patient’s chest. The signal is amplified, preprocessed, and transmitted by the CardiaWhisper transmitter (TX) device. Instead of relying on traditional electromagnetic radio waves, the TX employs a piezo speaker to transmit the ECG signal in the audio range of 16 kHz to 20 kHz. The receiver (RX), which may be a mobile phone, desktop computer, or tablet, captures the acoustic signal through its built-in microphone, processes the data, and can perform additional tasks such as visualization, decision support, logging, and networking.

Multiple receivers can simultaneously capture the transmitted signal, each using its own microphone. This capability ensures both redundancy and flexibility. For example, a doctor can view the data on a laptop, while a family member, the patient, or a nurse can monitor it on a mobile phone or tablet. The system design is scalable, allowing any number of compatible devices to join the network without the need for additional transmitters or specialized receivers. In addition, receivers can serve as gateways to edge computing platforms, other IoT devices, or data collectors. Local and remote staff, as well as the patient, can monitor the data in real time and take appropriate action as needed.

The hardware/software (HW/SW) architecture of the CardiaWhisper system is shown in Figure 2, which illustrates the sequential chain of the sensor, transmitter, and receiver.

2.1. Transmitter

The CardiaWhisper transmitter, illustrated in Figure 2a, enables the transformation of signals as follows:

[eqn]

where $[eqn]$ is the analog electrocardiogram signal, $[eqn]$ is a digital input, and $[eqn]$ is the voltage-modulated output.

The transmitter consists of a signal amplifier (AMP), encoder (ENC), speaker driver (SD), speaker (SC), and power management (PM) module. AMP is a biomedical signal amplifier, specifically an ECG amplifier, which amplifies small electrical signals generated by the heart (typically 0.1 mV to 1 mV) and captured by electrodes. This results in a voltage signal $[eqn]$ at a level suitable for further processing (0 V to 5 V).

The signal $[eqn]$ is then encoded into a form suitable for transmission over air or wire. Analog and digital signals, such as a character $[eqn]$ , a string, or any message, can be encoded for transmission. In the CardiaWhisper system, encoding is achieved through frequency modulation (FM), implemented either by a voltage-controlled oscillator (VCO) in a custom-designed chip or by a general-purpose microcontroller (MC).

The output from the VCO or MC is fed to the speaker driver (SD), which drives the piezo speaker (SC) with the FM-modulated signal $[eqn]$ . The instantaneous transmission frequency is given by

[eqn]

where $[eqn]$ and $[eqn]$ are the frequencies corresponding to the minimum ( $[eqn]$ ) and maximum ( $[eqn]$ ) values of $[eqn]$ , respectively. The frequency response of the speaker (SC) should be flat in the range $[eqn]$ .

A piezo speaker is selected for its low power consumption and high impedance. In our implementation, $[eqn]$ and $[eqn]$ lie in the near-ultrasonic, largely inaudible range of 16–20 kHz. For microcontroller-based implementations, FM modulation is performed in firmware by configuring the built-in timers to operate in frequency generation mode. Although MC-based modulation has lower resolution, it provides greater flexibility and simplifies the encoding of both analog and digital signals.
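The mapping from signal level to tone frequency can be illustrated with a short JavaScript sketch. It assumes a linear relation between the conditioned voltage and the instantaneous frequency, which is consistent with the description above but is not taken from the authors' firmware. The function names, the 0–5 V input span, and the 48 kHz rate used in the usage note are illustrative assumptions.

```javascript
// Assumed linear voltage-to-frequency mapping: vMin -> fMin, vMax -> fMax.
// Inputs outside [vMin, vMax] are clamped so the tone stays inside the band.
function voltageToFrequency(v, vMin, vMax, fMin, fMax) {
  const clamped = Math.min(Math.max(v, vMin), vMax);
  return fMin + (clamped - vMin) * (fMax - fMin) / (vMax - vMin);
}

// Synthesize an FM tone from a sequence of voltage samples.
// Phase is accumulated sample by sample so that frequency changes
// do not introduce phase jumps in the emitted waveform.
function synthesizeFm(voltages, fs, vMin, vMax, fMin, fMax) {
  const out = new Float64Array(voltages.length);
  let phase = 0;
  for (let n = 0; n < voltages.length; n++) {
    out[n] = Math.sin(phase);
    const f = voltageToFrequency(voltages[n], vMin, vMax, fMin, fMax);
    phase = (phase + 2 * Math.PI * f / fs) % (2 * Math.PI);
  }
  return out;
}
```

For example, `voltageToFrequency(2.5, 0, 5, 16000, 20000)` places a mid-scale input at the centre of the band, 18 kHz, and `synthesizeFm(samples, 48000, 0, 5, 16000, 20000)` produces the corresponding waveform for a simulated transmitter.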

The power supply consists of a 9 V battery (BAT), with supply voltages $[eqn]$ and $[eqn]$ .

2.2. Receiver

In its basic configuration, the receiver (Figure 2b) consists of a microphone (MIC) and its amplifier, which are integral parts of the Audio Stack/Interface (AS) within any device, such as a mobile phone or tablet. The modules for acquisition, filtering, demodulation, and visualization are implemented in software. A microphone (MIC) captures the sound wave, in which the modulated ECG signal is embedded, and converts it into an electrical signal. This signal is then amplified and digitized by the AS, resulting in $[eqn]$ , the digital equivalent of the transmitter’s analog signal $[eqn]$ . Here, $[eqn]$ represents a discrete-time vector of audio samples, commonly denoted as $[eqn]$ .

The sampling frequency of the AS is $[eqn]$ , typically $[eqn]$ or $[eqn]$ . In addition to the sampling circuit, the AS includes several additional components, such as gain control, a limiter, echo cancellation, and more. The resolution of the AS generally ranges from 16 to 24 bits. The signal $[eqn]$ is then forwarded to the software signal processing block for further processing.

2.3. Signal Processing Approach

This section covers the software-based signal processing performed on the receiver side (see Figure 2b), specifically the decoding of medical signals embedded in sound. The algorithms are illustrated in MATLAB® 24.1.0 [15], which allows both methodological analysis and observation of their effects.

The process of demodulating the $[eqn]$ signal begins with its Band-Pass Filtering (BPF) within the range $[eqn]$ . The BPF can be implemented in both the time and frequency domains. In the time domain, the BPF is realized as an N-th order IIR Butterworth filter, represented by the following difference equation:

[eqn]

where $[eqn]$ and $[eqn]$ are the feedback and feedforward coefficients, respectively, calculated from the filter design. An order of $[eqn]$ is sufficient for satisfactory filtering.
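A generic time-domain IIR filter of this kind can be sketched in JavaScript as follows. The sketch assumes the usual convention in which the feedforward coefficients `b` weight past inputs and the feedback coefficients `a` (with `a[0]` as the normalizing term) weight past outputs. The coefficient values themselves would come from a separate Butterworth design step that is not shown, and the function name is illustrative.

```javascript
// Direct-form evaluation of an IIR difference equation:
//   a[0]*y[n] = sum_k b[k]*x[n-k] - sum_{k>=1} a[k]*y[n-k]
// Samples before the start of the signal are treated as zero.
function iirFilter(b, a, x) {
  const y = new Float64Array(x.length);
  for (let n = 0; n < x.length; n++) {
    let acc = 0;
    for (let k = 0; k < b.length; k++) {
      if (n - k >= 0) acc += b[k] * x[n - k];   // feedforward part
    }
    for (let k = 1; k < a.length; k++) {
      if (n - k >= 0) acc -= a[k] * y[n - k];   // feedback part
    }
    y[n] = acc / a[0];
  }
  return y;
}
```

As a quick check, a one-pole smoother with `b = [0.5]` and `a = [1, -0.5]` turns a unit impulse into the decaying sequence 0.5, 0.25, 0.125, and so on.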

In the spectral domain, the BPF1 filter is implemented as

[eqn]

[eqn]

where $[eqn]$ and $[eqn]$ are the operators of the forward and inverse Fast Fourier Transform (FFT), respectively; $[eqn]$ is the frequency response of the window filter; and $[eqn]$ denotes element-wise multiplication (Hadamard product).

Figure 3 shows the time representation of the modulated signal $[eqn]$ , its FFT spectrum, and the technique of spectrum-based filtering implemented with a window $[eqn]$ .
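The spectrum-based variant can be sketched in JavaScript as a transform, an element-wise multiplication with a window, and an inverse transform. This is only an illustration under stated assumptions: it uses a plain O(N²) DFT instead of an FFT to stay self-contained, the window is an ideal rectangular one, and the function names are hypothetical.

```javascript
// Plain discrete Fourier transform (forward or inverse) on complex data.
function dft(re, im, inverse) {
  const N = re.length;
  const outRe = new Float64Array(N);
  const outIm = new Float64Array(N);
  const sign = inverse ? 1 : -1;
  for (let k = 0; k < N; k++) {
    for (let n = 0; n < N; n++) {
      const ang = sign * 2 * Math.PI * k * n / N;
      const c = Math.cos(ang);
      const s = Math.sin(ang);
      outRe[k] += re[n] * c - im[n] * s;
      outIm[k] += re[n] * s + im[n] * c;
    }
    if (inverse) { outRe[k] /= N; outIm[k] /= N; }
  }
  return { re: outRe, im: outIm };
}

// Band-pass filtering in the spectral domain: zero every bin whose
// frequency lies outside [fLow, fHigh], then transform back.
function spectralBandPass(x, fs, fLow, fHigh) {
  const N = x.length;
  const X = dft(Float64Array.from(x), new Float64Array(N), false);
  for (let k = 0; k < N; k++) {
    // Bins above N/2 hold negative frequencies; use their magnitude.
    const f = (k <= N / 2 ? k : N - k) * fs / N;
    const w = (f >= fLow && f <= fHigh) ? 1 : 0;   // rectangular window
    X.re[k] *= w;
    X.im[k] *= w;
  }
  return dft(X.re, X.im, true).re;
}
```

In a receiver operating at 48 kHz, the call would take the form `spectralBandPass(frame, 48000, 16000, 20000)`, with an FFT substituted for the DFT when frames are long.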

Several FM demodulation (FM DEM) methods—in fact, estimators of instantaneous frequency—are tested in order to select the most appropriate for application in the CardiaWhisper system: the Hilbert transform-based estimator (HIL), implemented in versions HIL1 and HIL2; the derivative-based estimator (DIFF); and the zero crossing (ZC)-based estimator.

FM demodulation using the Hilbert transform is a technique that exploits the relationship between the phase and frequency of the FM signal. By extracting the instantaneous phase, differentiating it to obtain the frequency, and then recovering the message signal, this method provides a non-coherent and efficient way to demodulate FM signals.

Two implementations of Hilbert-transform-based FM demodulation were evaluated, denoted as HIL1 and HIL2, with MATLAB-style pseudocode provided below:

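HIL1

y = hilbert(V1);                      % analytic signal of the band-passed input V1
yd = y(2:end) .* conj(y(1:end-1));    % each sample times the conjugate of its predecessor
V2 = angle(yd);                       % phase increment per sample (rad), proportional to frequency

HIL2

y = hilbert(V1);                      % analytic signal of the band-passed input V1
yd = unwrap(angle(y));                % continuous instantaneous phase (rad)
V2 = diff(yd) / (2*pi*(1/Fs));        % instantaneous frequency (Hz)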

Here, hilbert denotes the Hilbert transform operator, angle returns the phase of the complex analytic signal, and unwrap removes phase discontinuities. In the HIL1 equation, $[eqn]$ is obtained by element-wise multiplication of $[eqn]$ (the analytic signal starting at the second sample) with the complex conjugate of $[eqn]$ . $[eqn]$ is the sampling frequency. $[eqn]$ is then filtered by BPF2 to obtain the final demodulated signal:

[eqn]

BPF2 is an Nth-order Butterworth filter, as in Equation (2), and additionally performs DC blocking. Satisfactory results are obtained with $[eqn]$ . The outputs of HIL1 and HIL2 are illustrated in Figure 3 as “blue” and “magenta” plots, respectively.
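For readers working outside MATLAB, the HIL1 idea can be sketched in JavaScript. The sketch builds the analytic signal with a plain O(N²) DFT (an assumption made for brevity; an FFT would be used in practice), then takes the angle of each sample multiplied by the conjugate of its predecessor. Unlike the HIL1 pseudocode, the result is scaled to hertz, and the BPF2 stage is omitted. Function names are illustrative.

```javascript
// Analytic signal: transform, keep DC (and Nyquist for even N) once,
// double the positive frequencies, zero the negative ones, transform back.
function analyticSignal(x) {
  const N = x.length;
  const Xre = new Float64Array(N);
  const Xim = new Float64Array(N);
  for (let k = 0; k < N; k++) {
    for (let n = 0; n < N; n++) {
      const a = -2 * Math.PI * k * n / N;
      Xre[k] += x[n] * Math.cos(a);
      Xim[k] += x[n] * Math.sin(a);
    }
  }
  for (let k = 0; k < N; k++) {
    let h = 0;
    if (k === 0 || (N % 2 === 0 && k === N / 2)) h = 1;
    else if (k < N / 2) h = 2;
    Xre[k] *= h;
    Xim[k] *= h;
  }
  const re = new Float64Array(N);
  const im = new Float64Array(N);
  for (let n = 0; n < N; n++) {
    for (let k = 0; k < N; k++) {
      const a = 2 * Math.PI * k * n / N;
      const c = Math.cos(a);
      const s = Math.sin(a);
      re[n] += (Xre[k] * c - Xim[k] * s) / N;
      im[n] += (Xre[k] * s + Xim[k] * c) / N;
    }
  }
  return { re, im };
}

// HIL1-style estimator: angle of y[n] * conj(y[n-1]), scaled to Hz.
function hilbertFmDemod(x, fs) {
  const { re, im } = analyticSignal(x);
  const f = new Float64Array(x.length - 1);
  for (let n = 1; n < x.length; n++) {
    const pr = re[n] * re[n - 1] + im[n] * im[n - 1];   // real part of product
    const pi = im[n] * re[n - 1] - re[n] * im[n - 1];   // imaginary part of product
    f[n - 1] = Math.atan2(pi, pr) * fs / (2 * Math.PI);
  }
  return f;
}
```

Because the angle is taken of a product of neighbouring samples, no phase unwrapping is needed, which is the practical difference between this form and the HIL2 variant.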

The derivative-based estimator computes the difference of the filtered signal $[eqn]$ to produce an amplitude-modulated signal $[eqn]$ , which is then filtered with BPF2 to recover the original message:

[eqn]

In our practical implementation, the band-pass filter (BPF1) used to isolate the modulated FM carrier within the 16–20 kHz band is realized as a digital infinite impulse response (IIR) Butterworth band-pass filter, implemented in the time domain.

Specifically, we use a 4th-order (or, in some tests, 6th-order) digital Butterworth band-pass filter, designed using the standard bilinear transform approach. The Butterworth design was chosen for its maximally flat frequency response in the passband and for moderate computational complexity, making it suitable for real-time processing in both MATLAB and JavaScript/Web Audio API environments. The filter is implemented as a cascade of biquad (second-order) sections for numerical stability, with the following specifications:

Sampling Frequency: 48 kHz.
Passband: 16–20 kHz.
Filter Type: IIR Butterworth, order 4 or 6.
Design Method: Bilinear transform with digital frequency pre-warping to preserve the sharpness of the band edges near the Nyquist frequency.
Causality: The filter is fully causal, implemented in the standard direct form II transposed structure for efficiency.
Zero-Phase Filtering: For offline MATLAB analysis (i.e., to demonstrate ideal performance), we sometimes apply zero-phase filtering using filtfilt to eliminate group delay; however, for real-time and browser-based applications, the filter is run in causal, single-pass mode.
Transition Bands: With a 4 kHz wide passband at such a high frequency, filter order is a critical trade-off. We found that a 4th-order design provides acceptable attenuation (>30 dB) outside the 16–20 kHz band, but a 6th-order filter offers steeper roll-off if increased selectivity is required.
Stability and Real-Time Suitability: The chosen implementation does not introduce instability or excessive group delay in the passband. Group delay at the upper edge (20 kHz) is moderate, and pre-detection of signal edges is not required for our application.

Given the close proximity of the upper band edge (20 kHz) to the Nyquist frequency (24 kHz for a 48 kHz sample rate), care was taken to avoid aliasing and to ensure that filter coefficients were designed with appropriate pre-warping.

Due to inherent limitations of digital filters operating at high frequencies near the Nyquist limit, practical implementations necessarily differ from theoretical models. Nevertheless, the implemented algorithm provides sufficient isolation of the modulated carrier to enable reliable demodulation with acceptable levels of distortion.
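To make the biquad building block concrete, the following JavaScript sketch computes one second-order band-pass section and evaluates its magnitude response. It is a simplification and not the Butterworth design described above: it uses the constant-peak-gain band-pass form from the widely used Audio EQ Cookbook, with the centre frequency taken as the geometric mean of the band edges. A 4th- or 6th-order Butterworth filter would cascade two or three sections with properly designed poles. All names are illustrative.

```javascript
// One band-pass biquad (constant 0 dB peak gain, Audio EQ Cookbook form).
// Returns normalized coefficients with a[0] = 1 and the centre frequency f0.
function bandpassBiquad(fs, fLow, fHigh) {
  const f0 = Math.sqrt(fLow * fHigh);        // geometric centre of the band
  const q = f0 / (fHigh - fLow);             // quality factor from bandwidth
  const w0 = 2 * Math.PI * f0 / fs;
  const alpha = Math.sin(w0) / (2 * q);
  const a0 = 1 + alpha;
  return {
    b: [alpha / a0, 0, -alpha / a0],
    a: [1, -2 * Math.cos(w0) / a0, (1 - alpha) / a0],
    f0: f0
  };
}

// Magnitude of H(e^{jw}) = B(e^{-jw}) / A(e^{-jw}) at frequency f.
function magnitudeAt(coef, f, fs) {
  const w = 2 * Math.PI * f / fs;
  const evalPoly = (p) => {
    let re = 0;
    let im = 0;
    p.forEach((c, k) => {
      re += c * Math.cos(k * w);
      im -= c * Math.sin(k * w);
    });
    return Math.hypot(re, im);
  };
  return evalPoly(coef.b) / evalPoly(coef.a);
}
```

With `bandpassBiquad(48000, 16000, 20000)`, the gain is unity at the centre frequency of about 17.9 kHz and falls to a small fraction of that at 1 kHz, which gives a feel for how a single section rejects speech-band noise before any cascading.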

The demodulated signal obtained via the DIFF method is shown as the “green” plot in Figure 3.
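The principle behind the DIFF estimator can be sketched in JavaScript. Differentiating a constant-amplitude FM tone yields a signal whose amplitude grows with the instantaneous frequency, so an envelope of the difference tracks the message. The sketch below is an assumption-laden illustration: it rectifies the first difference and smooths it with a moving average that stands in for BPF2, which may differ from the authors' exact processing.

```javascript
// Derivative-based FM discriminator (illustrative):
// 1) first difference converts frequency variation into amplitude variation,
// 2) rectification exposes the envelope,
// 3) a moving average of length smoothLen stands in for the BPF2 stage.
function diffDemod(x, smoothLen) {
  const rect = new Float64Array(x.length);
  for (let n = 1; n < x.length; n++) {
    rect[n] = Math.abs(x[n] - x[n - 1]);
  }
  const out = new Float64Array(x.length);
  let sum = 0;
  for (let n = 0; n < x.length; n++) {
    sum += rect[n];
    if (n >= smoothLen) sum -= rect[n - smoothLen];
    out[n] = sum / Math.min(n + 1, smoothLen);
  }
  return out;
}
```

For a unit-amplitude tone sampled at 48 kHz, the smoothed output is larger at 20 kHz than at 16 kHz, so the output rises and falls with the transmitted signal. The same dependence on amplitude also explains why this estimator is more sensitive to level changes and artifacts than the others.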

In the case of the zero-crossing (ZC) estimator/discriminator, the time $[eqn]$ is defined as the difference between the current and previous zero crossing:

[eqn]

where

[eqn]

The value $[eqn]$ is then passed through the BPF2 filter to obtain the final demodulated signal:

[eqn]

where c is a proportional constant. The ZC-demodulated signal is illustrated in Figure 3 (red plot). As can be seen, the ZC technique is the simplest to implement while still providing satisfactory results. Figure 4 illustrates the demodulation processing steps for this method.
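A minimal JavaScript sketch of a zero-crossing estimator follows. It assumes that only rising crossings are used and that the crossing instant is refined by linear interpolation between the two neighbouring samples. Both choices are illustrative, and the BPF2 stage that would smooth the estimates is omitted.

```javascript
// Zero-crossing frequency estimator.
// Returns one frequency estimate (Hz) per pair of consecutive rising crossings.
function zeroCrossingDemod(x, fs) {
  const freqs = [];
  let prevCross = null;                       // time of previous crossing (s)
  for (let n = 1; n < x.length; n++) {
    if (x[n - 1] < 0 && x[n] >= 0) {
      // Linear interpolation of the crossing position between n-1 and n.
      const frac = -x[n - 1] / (x[n] - x[n - 1]);
      const t = (n - 1 + frac) / fs;
      if (prevCross !== null) freqs.push(1 / (t - prevCross));
      prevCross = t;
    }
  }
  return freqs;
}
```

Near the Nyquist limit there are fewer than three samples per period, so individual estimates jitter noticeably around the true frequency. Their average remains close to it, which is why a low-pass or band-pass stage after the discriminator is essential.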

There are several additional methods for software demodulation of FM signals, such as demodulation using the FFT spectrum. As the FM signal varies, the maximum values (i.e., the highest peaks) in the FFT spectrum shift accordingly. The positions of these peaks are proportional to the instantaneous frequency of the FM signal. By tracking the peak frequency over time—specifically, by identifying the maximum value in each FFT frame—it is possible to reconstruct the instantaneous frequency.

Phase-Locked Loop (PLL)-based FM demodulation is another widely used technique for demodulating frequency-modulated signals, particularly due to its efficiency and simplicity. A PLL is a feedback control system that synchronizes the phase of a local oscillator with the phase of an incoming signal. In FM demodulation, the PLL tracks the instantaneous frequency of the FM signal, and the phase difference between the PLL’s oscillator and the incoming signal yields the message signal. The PLL can be implemented in either a coherent or non-coherent manner, depending on how the system is designed to synchronize with the received signal.

3. Real-Time Signal Processing Implementation

The previously described demodulation blocks—BPF1, FM DEM, and BPF2—were implemented in MATLAB for the purposes of concept development and testing. However, this implementation is not suitable for real-time applications. To enable real-time processing, the appropriate code and algorithms must be ported to a real-time environment, ideally one that is open source and platform-independent. For this reason, a real-time solution using JavaScript, HTML, and CSS was chosen, with the implementation logic illustrated in Figure 5.

Many of the functional blocks, such as AS, BPF1, ScriptProcessorNode, and AudioWorklet, are components of the Web Audio API, a JavaScript API that enables audio manipulation and analysis directly in the browser. The Web Audio API is part of the modern web platform and is supported by most browsers, including Chrome, Firefox, Safari, and Edge.

The functional blocks are utilized from the Web Audio API according to the following signal flow logic [16,17,18]:

source → node1 → node2 → … → nodek → custom audio processing script → destination [19].

The FM DEM, BPF2, and visualization blocks are implemented within a custom audio processing script, which is activated upon receiving an Audio Processing Event (APE) from the Web Audio API. The ScriptProcessorNode and AudioWorklet are employed to enable direct, synchronized access to input data, allowing for real-time, frame-by-frame manipulation with precise timing. The commonly used AnalyserNode from the Web Audio API cannot synchronize the process and is therefore not applicable in this scenario.

It is important to note that ScriptProcessorNode operates on the main thread and may experience glitches if the user interface is busy, making it unsuitable for high-precision, real-time audio processing. In contrast, AudioWorklet runs on its own thread, avoiding such glitches and providing smooth and reliable audio processing. For these reasons, AudioWorklet is strongly recommended for real-time applications requiring high precision [20].
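The following JavaScript sketch shows how a demodulator could be hosted in an AudioWorklet. It is a hypothetical arrangement, not the authors' code: the class name `StreamingZcDemod` and the processor name `cardia-demod` are invented for illustration, and a zero-crossing estimator is used as the per-block logic. The core class keeps its state across blocks and has no browser dependencies, so it can be tested on its own; registration only runs where the worklet globals exist.

```javascript
// Block-wise zero-crossing estimator that carries state between calls,
// as required when audio arrives in small render quanta.
class StreamingZcDemod {
  constructor(fs) {
    this.fs = fs;
    this.prevSample = 0;
    this.index = 0;             // absolute index of the next sample
    this.prevCrossing = null;   // time of the last rising crossing (s)
  }
  processBlock(block) {
    const freqs = [];
    for (let i = 0; i < block.length; i++) {
      const cur = block[i];
      if (this.index > 0 && this.prevSample < 0 && cur >= 0) {
        const frac = -this.prevSample / (cur - this.prevSample);
        const t = (this.index - 1 + frac) / this.fs;
        if (this.prevCrossing !== null) freqs.push(1 / (t - this.prevCrossing));
        this.prevCrossing = t;
      }
      this.prevSample = cur;
      this.index++;
    }
    return freqs;
  }
}

// Inside an AudioWorkletGlobalScope, wrap the estimator in a processor.
if (typeof AudioWorkletProcessor !== 'undefined' && typeof registerProcessor === 'function') {
  class CardiaDemodProcessor extends AudioWorkletProcessor {
    constructor() {
      super();
      this.demod = new StreamingZcDemod(sampleRate);  // sampleRate is a worklet global
    }
    process(inputs) {
      const channel = inputs[0] && inputs[0][0];
      if (channel) {
        const freqs = this.demod.processBlock(channel);
        if (freqs.length) this.port.postMessage(freqs);  // hand results to the main thread
      }
      return true;  // keep the processor alive
    }
  }
  registerProcessor('cardia-demod', CardiaDemodProcessor);
}
```

On the main thread, such a module would be loaded with `audioContext.audioWorklet.addModule(...)` and instantiated as `new AudioWorkletNode(audioContext, 'cardia-demod')`, with the filtering and plotting driven by messages received on the node's `port`.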

Unlike in the MATLAB implementation, the processing blocks BPF1, FM DEM, and BPF2 are implemented entirely in the time domain in the web-based approach. BPF1 utilizes built-in Web Audio API functions, while FM DEM and BPF2 are custom-implemented in JavaScript. Figure 6 and Figure 7 illustrate the real-time methodology for processing medical data over sound using the Web Audio API and JavaScript code.

Figure 6 shows the scenario in which the analog ECG signal, $[eqn]$ , is transmitted over sound. The upper diagram presents an experimental spectrogram recording obtained using the Spectroid application [21], with the following parameters: sampling rate = 48 kHz, FFT size = 1024 bins (47 Hz/bin), decimation = 5 (1.5 Hz/bin at DC), window function = Hamming, desired transform interval = 10 ms (100 Hz), and exponential smoothing factor = 0.1. The lower left panel displays the digitized signal $[eqn]$ in the form of $[eqn]$ after preprocessing with BPF1, implemented in JavaScript using the Web Audio API, with the following filter parameters: BPF1 = [15 kHz, 20 kHz], BPF2 = [0.5, 15] Hz, ZC demodulation, and gain = 100. The lower right panel shows the reconstructed, decoded ECG signal $[eqn]$ .

Figure 7 presents a similar situation, where the $[eqn]$ signal is replaced by a digital value or character—for example, the ASCII character ‘A’ in UART format, with a baud rate of 10 bits/s. The filter parameters are as follows: BPF1 = [15 kHz, 20 kHz], BPF2 = [0.1, 50] Hz, ZC demodulation, and gain = 100. The spectrogram parameters remain the same as described above.
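The digital case can be illustrated with a short JavaScript sketch that frames a character in UART format and maps each bit to a tone frequency. The framing shown here (one start bit, eight data bits sent least significant bit first, one stop bit, no parity) is the common 8N1 convention and is an assumption, as is the choice of sending a logical 1 on the upper frequency. Function names are illustrative.

```javascript
// Build a 10-bit UART frame (8N1 assumed): start bit 0, data LSB first, stop bit 1.
function uartFrame(ch) {
  const code = ch.charCodeAt(0) & 0xff;
  const bits = [0];                                        // start bit
  for (let i = 0; i < 8; i++) bits.push((code >> i) & 1);  // data bits, LSB first
  bits.push(1);                                            // stop bit
  return bits;
}

// Recover the character from a frame; returns null if framing bits are wrong.
function decodeUartFrame(bits) {
  if (bits.length !== 10 || bits[0] !== 0 || bits[9] !== 1) return null;
  let code = 0;
  for (let i = 0; i < 8; i++) code |= bits[i + 1] << i;
  return String.fromCharCode(code);
}

// Map each bit to a tone frequency (assumed polarity: 0 -> fLow, 1 -> fHigh).
function bitsToFrequencies(bits, fLow, fHigh) {
  return bits.map((b) => (b ? fHigh : fLow));
}
```

Under these assumptions, the character 'A' (code 65) becomes the bit sequence 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, and at 10 bits/s the whole frame occupies one second of transmission.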

4. Verification and Testing

In this phase of the development and piloting of the CardiaWhisper device, the design methodology and preliminary results are evaluated according to the following criteria: the functionality of the underlying principle, the selection of suitable software demodulation algorithms for platform-independent real-time applications, power consumption, and the communication range.

4.1. Functionality

The functionality of the system was evaluated using the experimental setup shown in Figure 8. The ECG signal, acquired with a standard three-electrode configuration, is captured by the CardiaWhisper device prototype (1) and transmitted to multiple mobile phones and desktop computers, following the general scenario outlined in Figure 1. The electrodes used are of the standard type found in Holter or external loop recorders, capable of continuous 24–72 h recording. These are disposable, adhesive, gel-based Ag/AgCl (silver/silver chloride) surface electrodes. In some experiments, clamp electrodes for the extremities (hands and legs) were also used.

The CardiaWhisper device consists of an ECG front-end based on the AD8232 chip [22], an FM modulator built with a CD4046 VCO [23], and a piezo speaker with appropriate power and spectral response. The transmitter is powered by a $[eqn]$ battery. Additionally, a transmitter running on Arduino NANO was used in tests, particularly for digital signals and ECG signals with reduced resolution.

A control and monitoring mobile phone (2), running the Spectroid Application [21], displays the FFT and STFT spectra of the transmitted ECG signal. This allows the verification of system parameters and communication quality in both indoor and outdoor environments. A second mobile phone (3) is used to run custom-developed JavaScript software for demodulation and visualization, originally developed in Visual Studio Code Version 1.98.2.

Testing was conducted in environments ranging from quiet rooms (30–40 dB) to urban streets (60–70 dB) and normal restaurants (up to 80 dB). Environments exceeding 80 dB were not considered; however, at close proximity (within 10 cm), the system remains functional even in noise levels up to 100 dB.

Users access the application via a web page hosted on a company server, launching the CardiaWhisper software (JavaScript+HTML+CSS) to receive and display the “over-the-air” ECG signal. The desktop computer (4) simultaneously receives the same ECG signal via its built-in microphone and is used for software development, parameter adjustment, testing, and debugging. This experimental setup fully demonstrates the functionality of the CardiaWhisper approach. The JavaScript+HTML+CSS application operates in real time on modestly equipped phones, including older Android and iPhone models such as the Samsung Galaxy J4. ECG signals were sourced both from a simulator and from a group of volunteers.

Figure 9 presents real-time, JavaScript-based demodulation of an ECG signal from a 58-year-old male volunteer, using the CardiaWhisper device and Google Chrome browser decoding software with the following parameters: BPF1 = [15 kHz–20 kHz], BPF2 = [0.5–15 Hz], ZC demodulation, and gain = 1. Both the modulated and demodulated signals are illustrated.

4.2. Signal Quality and SNR Measurements in Indoor Environments

Signal reconstruction quality depends on several factors, including transmitter power, receiver distance, receiver sensitivity, the presence and type of obstacles, wall reflection, and the performance of the reconstruction (demodulation) algorithm. In this phase of the research, we evaluated signal quality at the receiver location by calculating the signal-to-noise ratio (SNR) as follows:

[eqn]

where $[eqn]$ (Root Mean Square) expresses the effective value of the received signal at the point of measurement.
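As an illustration of this kind of measurement, the JavaScript sketch below computes an RMS value from a block of samples and converts a ratio of RMS amplitudes to decibels. It assumes the conventional amplitude-ratio definition of SNR, with the signal and noise levels measured separately. The function names are illustrative, and the exact measurement procedure used in the experiments may differ.

```javascript
// Root mean square (effective value) of a block of samples.
function rms(x) {
  let sumSquares = 0;
  for (const v of x) sumSquares += v * v;
  return Math.sqrt(sumSquares / x.length);
}

// SNR in dB from RMS amplitudes (assumed definition: 20*log10 of the ratio).
function snrDb(signalRms, noiseRms) {
  return 20 * Math.log10(signalRms / noiseRms);
}
```

For example, a received tone whose RMS amplitude is ten times that of the background noise corresponds to an SNR of 20 dB.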

The experimental setup is depicted in Figure 10a. In the indoor area, the transmitter (TX, 10 mW output) emits a frequency-modulated (FM) sound signal with an 18 kHz carrier and a 1 Hz, 1 Vpp sinusoidal modulating signal. This signal is detected and analyzed by a microphone connected to a Keuwlsoft spectral analyzer application [24]. The receiver (RX) was placed at various locations inside the premises: in direct line of sight, behind an obstacle, behind an ajar door, and outside (with an open door). The distance d represents the straight-line separation between the transmitter and the receiver. Environmental loudness was varied from silent to loud room conditions.

Figure 10b shows the measured SNR as a function of distance for different scenarios. The accepted thresholds for signal quality were defined as follows: VERY GOOD ( $[eqn]$ dB), GOOD ( $[eqn]$ dB), and FAIR ( $[eqn]$ dB).

As can be observed, even under “Loud + obstacles” conditions, VERY GOOD signal quality ( $[eqn]$ dB) is achieved within a radius of $[eqn]$ m from the TX, and GOOD quality is maintained up to $[eqn]$ m. These results demonstrate the robustness of the system in real-world indoor environments.

4.3. Platform-Independent Implementation

After implementation and testing in MATLAB, the code was translated to JavaScript + HTML + CSS, which serves as a platform-independent solution thanks to its cross-browser compatibility and the “write once, run anywhere” (WORA) paradigm. The zero crossing (ZC), derivative-based (DIFF), and Hilbert-transform-based demodulation algorithms were evaluated in terms of signal reproduction quality, noise resistance, and processing speed.

Figure 11 illustrates representative results for both “calm” environments and noisy environments with background speech or music. As shown, ZC demodulation provides satisfactory results in both cases. The DIFF method also performs well but is less immune to signal transitions, artifacts, and noise. The Hilbert-based demodulation, however, yields the highest quality of signal reconstruction, with superior resolution and noise resistance.

In terms of speed, as shown in Table 1, ZC is the fastest method, followed by DIFF, while the Hilbert implementation is significantly slower due to its computational complexity. Speed testing was conducted on an Intel Core i5-5350U CPU (1.80 GHz, two cores, four logical processors) with 8 GB of RAM. The test signal was 10.92 s long and sampled at 48,000 Hz, giving 524,160 samples, and each test was repeated over 30 trials.

Table 2 compares four demodulation techniques, namely HIL1 and HIL2 (both Hilbert-based), ZC (zero crossing), and DIFF (derivative/slope-based), in terms of complexity, implementation suitability, noise immunity, artifact resistance, and real-time suitability.

4.4. Power Consumption

CardiaWhisper is a low-power device. The ECG signal is acquired using a low-cost ECG amplifier based on the AD8232 chip [22] and is subsequently modulated by a micro-power VCO modulator based on the CD4046B [23], with both components powered at $[eqn]$. The microcontroller-based modulator uses the ATMEGA328P chip, operating at $[eqn]$ and powered at $[eqn]$. The required working voltages for the modulators are derived from a $[eqn]$ battery using a low-dropout (LDO) regulator. The overall power consumption of the transmitter for different configurations is summarized in Table 3.

These results are comparable to those of standard Bluetooth Low Energy (BTLE) microcontrollers, which typically consume approximately $[eqn]$ in active mode during real-time signal streaming. The measured consumption is somewhat higher than that of ultra-low-power configurations, such as the nRF52 family.

4.5. Range

CardiaWhisper has been tested primarily in indoor environments. For reference, a sensitivity of $[eqn]$ was measured at a distance of $[eqn]$ from the emitter (speaker), with $[eqn]$ corresponding to a full-scale $[eqn]$ audio signal, which is difficult to achieve in air with an audio codec gain of 1. Satisfactory signal reconstruction was achieved down to $[eqn]$ in indoor use, corresponding to approximately $[eqn]$ in open space and about $[eqn]$ in a furnished environment. Reliable operation was also observed through walls at distances up to $[eqn]$ . It should be noted that the achievable range depends on multiple parameters, and CardiaWhisper is generally not intended for distances greater than $[eqn]$ . The simplest way to increase the sensitivity of CardiaWhisper is to adjust the gain of the microphone amplifier within the audio codec; in some applications, gains of up to 200 were used.

5. Discussion

Through work on the CardiaWhisper project, the authors have gained valuable insights into the use of sound as a medium for transmitting medical data. Key findings include, but are not limited to, the following:

The classic concept of “data over sound” can be extended to include medical data over sound, or indeed any sensor data over sound. Data may be analog, digital, or mixed.
Air, fluids, and wires can all serve as transmission media, although the effective range is typically limited to around $[eqn]$ – $[eqn]$ indoors, depending on transmitter power and receiver sensitivity.
Transmission of analog signals supports 2–3 channels, while digital transmission allows up to 4 channels, albeit at low data rates (tens of bits per second).
One-way communication is preferred, although duplex communication is possible.
Transmitters can be simple piezo speakers powered by basic circuits, and receivers can be inexpensive and based on widely available microphones (e.g., electret). MEMS microphones utilizing pulse density modulation (PDM) are currently very effective and promising.
The system is subject to noise from ambient sound, physical obstacles, movement, and, rarely, electrical interference.
Very low power is required: transmitters operate with only a few milliwatts, and receivers can function with low power consumption—modern MEMS microphones can operate in the tens of microwatts to milliwatt range, depending on the implementation.
Signal encoding is typically implemented via frequency or phase modulation; amplitude modulation is impractical in air but viable over wire. Square waves may be used in place of sinusoids for simplicity.
Receivers can employ software-based filter banks and real-time demodulation methods, ranging from basic zero-crossing detection to more advanced techniques.
The system is safe for health at low power levels.
Applications in medicine are numerous, including but not limited to:
- Passive sensing (e.g., stethoscope, cough detection);
- In-body ultrasound communication;
- Communication for setting or reading implant devices;
- Near-field alerting (e.g., sleep apnea detection, fall alerts);
- Local broadcasting of events (e.g., arrhythmia alerts, rhythm disorder notifications);
- Vital sign visualization (ECG, PPG, ACC) in time, frequency, and time-frequency domains;
- Haptic signaling via smart wearables (e.g., alerting nearby devices using a speaker in clothing);
- Zero-configuration monitoring of vital signs via web browser interfaces, accessible from any device;
- Fall and presence detection.
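The software filter bank mentioned in the list above can be illustrated with a short sketch. It uses the Goertzel algorithm, which is a swapped-in technique not named in this paper, to measure the energy at a set of candidate carrier frequencies so that a receiver could tell which channel is active. The carrier values used in the usage note and the function names are hypothetical.

```python
import math

def goertzel_power(x, fs, f):
    """Squared magnitude of the spectral component of x at frequency f (Hz)."""
    coeff = 2.0 * math.cos(2.0 * math.pi * f / fs)
    s1 = s2 = 0.0
    for v in x:
        s0 = v + coeff * s1 - s2  # second-order resonator tuned to f
        s2, s1 = s1, s0
    return s1 * s1 + s2 * s2 - coeff * s1 * s2

def filter_bank_powers(x, fs, carriers):
    """Map each candidate carrier frequency to the power measured at it."""
    return {f: goertzel_power(x, fs, f) for f in carriers}
```

For example, calling `filter_bank_powers(block, 48000, [17000.0, 18000.0, 19000.0])` on a block of microphone samples returns one power value per assumed channel; the largest value indicates the carrier that is present. Each Goertzel filter costs one multiplication per sample, which suits real-time use on modest hardware.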

While EM-based communication is generally more robust and broadly applicable, sound-based transmission can offer distinct advantages for low-power, short-range, and privacy-sensitive medical use cases, provided that its range and ambient-noise constraints are acceptable.

6. Conclusions

This paper has introduced an alternative methodology for near-field communication in medical sensor devices, utilizing transmission within the near-ultrasound (audio/acoustic) range. Building upon the established concept of data over sound (DoS), which has gained renewed attention across various applications, we extend the paradigm to medical data over sound (MDoS). As a case study, the CardiaWhisper system demonstrates the real-time transmission of cardiological (ECG) data over sound.

The proposed approach encodes analog ECG signals, acquired through standard loop recorder or Holter configurations, and transmits them acoustically to nearby consumer devices—such as mobile phones, tablets, or desktop computers—equipped only with built-in microphones. These signals can then be visualized, decoded, analyzed, and logged using open-source, platform-independent software, with the option to forward data to local or global networks. The system’s architecture—including both transmitter and receiver components—was described, with particular emphasis on the receiver side, where several software-based demodulation algorithms were implemented using HTML, JavaScript, and CSS.

Preliminary results have been presented, compared, and discussed, showing that the CardiaWhisper system achieves reliable data transfer, robust real-time demodulation, and satisfactory signal quality under typical indoor conditions. The advantages and limitations of the medical-data-over-sound methodology have also been discussed, including its low power consumption, platform independence, and EMI resistance, alongside challenges such as limited range and sensitivity to acoustic noise.

The proposed approach opens new possibilities for low-power, privacy-conscious, and easily deployable medical sensing and communication, with potential applications ranging from home health monitoring to telemedicine and implant communication. Future work will focus on optimizing data rates, expanding use cases, and further improving noise immunity and system robustness.

Bibliography


1. Kong F., Zou Y., Li Z., Deng Y. Advances in Portable and Wearable Acoustic Sensing Devices for Human Health Monitoring. Sensors 2024, 24, 5354. doi:10.3390/s24165354. PMID: 39205054. PMC11359461.
2. Perez A.J., Zeadally S. Recent advances in wearable sensing technologies. Sensors 2021, 21, 6828. doi:10.3390/s21206828. PMID: 34696040. PMC8541055.
3. Lee N.K., Kim J.S. Status and Trends of the Digital Healthcare Industry. Healthc. Inform. Res. 2024, 30, 172–183. doi:10.4258/hir.2024.30.3.172. PMID: 39160777. PMC11333813.
4. Stojanović R., Škraba A., Lutovac B. A headset like wearable device to track COVID-19 symptoms. Proceedings of the 2020 9th Mediterranean Conference on Embedded Computing (MECO), Budva, Montenegro, 8–11 June 2020. IEEE, New York, NY, USA, 2020, pp. 1–4.
5. Mukhopadhyay S.C., Suryadevara N.K., Nag A. Wearable sensors for healthcare: Fabrication to application. Sensors 2022, 22, 5137. doi:10.3390/s22145137. PMID: 35890817. PMC9323732.
6. Ambrosanio M., Franceschini S., Grassini G., Baselice F. A multi-channel ultrasound system for non-contact heart rate monitoring. IEEE Sens. J. 2019, 20, 2064–2074. doi:10.1109/JSEN.2019.2949435.
7. Van der Togt R., Van Lieshout E.J., Hensbroek R., Beinat E., Binnekade J., Bakker P. Electromagnetic interference from radio frequency identification inducing potentially hazardous incidents in critical care medical equipment. JAMA 2008, 299, 2884–2890. doi:10.1001/jama.299.24.2884. PMID: 18577733.
8. Rahimpour S., Kiyani M., Hodges S.E., Turner D.A. Deep brain stimulation and electromagnetic interference. Clin. Neurol. Neurosurg. 2021, 203, 106577. doi:10.1016/j.clineuro.2021.106577. PMID: 33662743. PMC8081063.