# Predicting the availability of power line communication nodes using semi-supervised learning algorithms

**Authors:** Kareem Moussa, Khaled Mostafa Elsayed, M. Saeed Darweesh, Abdelmoniem Elbaz, Ahmed Soltan

PMC · DOI: 10.1038/s41598-025-01064-5 · Scientific Reports · 2025-05-21

## TL;DR

This paper uses machine learning to predict whether nodes in a power line communication network are available for data transmission.

## Contribution

The study introduces a semi-supervised learning approach using label spreading to improve prediction accuracy for PLC node availability.
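
To make the idea behind label spreading concrete, the following is a minimal pure-Python sketch of the standard iteration (normalized RBF affinity matrix, then repeated propagation with clamping toward the known labels). It is not the authors' implementation: the function names, the toy feature values, and the `gamma`, `alpha`, and `n_iter` settings are assumptions chosen only for illustration.

```python
import math

def sq_dist(p, q):
    """Squared Euclidean distance between two feature tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def label_spreading(points, labels, n_classes=2, gamma=20.0, alpha=0.2, n_iter=50):
    """Toy label spreading; labels[i] is a class index or None if unlabeled."""
    n = len(points)
    # RBF affinity matrix with a zero diagonal.
    W = [[0.0 if i == j else math.exp(-gamma * sq_dist(points[i], points[j]))
          for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in W]
    # Symmetric normalization: S = D^(-1/2) W D^(-1/2).
    S = [[W[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]
    # One-hot matrix of the known labels; unlabeled rows are all zeros.
    Y = [[1.0 if labels[i] == c else 0.0 for c in range(n_classes)] for i in range(n)]
    F = [row[:] for row in Y]
    for _ in range(n_iter):
        # F <- alpha * S F + (1 - alpha) * Y
        F = [[alpha * sum(S[i][k] * F[k][c] for k in range(n)) + (1 - alpha) * Y[i][c]
              for c in range(n_classes)] for i in range(n)]
    return [max(range(n_classes), key=lambda c: F[i][c]) for i in range(n)]

# Hypothetical, pre-scaled (CINR, SNR, RSSI) readings; 1 = UP, 0 = Down.
toy_points = [(0.90, 0.80, 0.90), (0.80, 0.90, 0.80), (0.85, 0.85, 0.90),
              (0.10, 0.20, 0.10), (0.20, 0.10, 0.20), (0.15, 0.15, 0.10)]
toy_labels = [1, None, None, 0, None, None]  # only one labeled point per class

predicted = label_spreading(toy_points, toy_labels)
print(predicted)
```

Each unlabeled point ends up with the label of the cluster it sits in, because label mass flows mainly along high-affinity edges.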

## Key findings

- Label Spreading achieved 94.67% accuracy in predicting node availability.
- The model required minimal training time (0.018 sec) and low memory (0.99 MB).
- Semi-supervised methods outperformed traditional supervised models like Random Forest and Logistic Regression.
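
This summary does not say how training time and memory consumption were measured. As a rough sketch of one way to obtain comparable figures in Python, the helper below wraps a call with `time.perf_counter` and `tracemalloc`; the `measure` and `dummy_fit` names are hypothetical, and `dummy_fit` merely stands in for a model's training routine.

```python
import time
import tracemalloc

def measure(fn, *args, **kwargs):
    """Run fn and return (result, elapsed seconds, peak traced memory in MB)."""
    tracemalloc.start()
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed_s = time.perf_counter() - start
    _, peak_bytes = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return result, elapsed_s, peak_bytes / 1e6  # MB taken as 10^6 bytes

def dummy_fit(n):
    """Placeholder workload standing in for a training call (assumption)."""
    return [i * 0.5 for i in range(n)]

result, elapsed_s, peak_mb = measure(dummy_fit, 100_000)
print(f"time: {elapsed_s:.3f} s, peak memory: {peak_mb:.2f} MB")
```

Note that `tracemalloc` only tracks allocations made through Python's memory allocator and slows execution while tracing, so time and memory are often measured in separate runs.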

## Abstract

Power Line Communication (PLC) uses power cables to transmit data. Sending data to unavailable nodes is time-consuming, and machine learning addresses this by predicting whether a node has optimum readings. Because models become more accurate the more they learn, and stay up to date when fed the node's continuous availability status, self-training algorithms have been used. A dataset of 2000 instances has been collected from one node of an implemented 500-node PLC network. Each instance consists of three features, CINR (Carrier-to-Interference-plus-Noise Ratio), SNR (Signal-to-Noise Ratio), and RSSI (Received Signal Strength Indicator), and a label indicating whether the node is UP or Down. The dataset has been split into 85% for training and 15% for testing, and 15% of the training data are unlabeled. A self-training classifier has been used to let Light Gradient Boosting Machine (LGBM) and Support Vector Machine (linear and non-linear kernels) learn in a self-training manner, and label propagation and label spreading algorithms have also been trained. Supervised learning algorithms (Random Forest and Logistic Regression) have been trained on the dataset for comparison. The best model is Label Spreading, with an accuracy of 94.67%, an F1-score of 0.947, a precision of 0.946, a recall of 0.947, a training time of 0.018 s, and memory consumption of 0.99 MB.
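
The setup described in the abstract can be sketched with scikit-learn as follows. With 2000 instances, an 85%/15% split gives 1700 training and 300 test instances, and hiding 15% of the training labels leaves 255 instances unlabeled. Everything else here is an assumption: the data are synthetic stand-ins rather than the paper's measurements, and the scaling step, kernel choices, `gamma`, and random seeds are illustrative rather than the authors' settings.

```python
import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.semi_supervised import LabelSpreading, SelfTrainingClassifier
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Synthetic stand-in for the dataset: 3 features (think CINR, SNR, RSSI),
# label 1 = UP, 0 = Down. These are NOT the paper's data.
n = 2000
y = rng.integers(0, 2, size=n)
X = rng.normal(loc=y[:, None] * 2.0, scale=1.0, size=(n, 3))

# 85% training / 15% testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.15, random_state=0, stratify=y
)

# Hide 15% of the training labels; scikit-learn marks unlabeled samples with -1.
n_unlabeled = round(0.15 * len(y_train))
unlabeled_idx = rng.choice(len(y_train), size=n_unlabeled, replace=False)
y_semi = y_train.copy()
y_semi[unlabeled_idx] = -1

# Standardize features (an assumption; the abstract does not mention scaling).
scaler = StandardScaler().fit(X_train)
X_train_s, X_test_s = scaler.transform(X_train), scaler.transform(X_test)

# Graph-based semi-supervised model.
spreading = LabelSpreading(kernel="rbf", gamma=1.0).fit(X_train_s, y_semi)
acc_spreading = accuracy_score(y_test, spreading.predict(X_test_s))

# Self-training wrapper around a non-linear SVM; the wrapper needs predict_proba.
self_training = SelfTrainingClassifier(SVC(kernel="rbf", probability=True, random_state=0))
self_training.fit(X_train_s, y_semi)
acc_self_training = accuracy_score(y_test, self_training.predict(X_test_s))

print(f"Label Spreading accuracy:   {acc_spreading:.3f}")
print(f"Self-training SVM accuracy: {acc_self_training:.3f}")
```

The accuracies printed here reflect only the synthetic data and say nothing about the results reported in the paper.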

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12095814/full.md

## Figures

12 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12095814/full.md

## References

19 references — full list in the complete paper: https://tomesphere.com/paper/PMC12095814/full.md

---
Source: https://tomesphere.com/paper/PMC12095814