EchoCoTr: Estimation of the Left Ventricular Ejection Fraction from Spatiotemporal Echocardiography
Rand Muhtaseb, Mohammad Yaqub

TL;DR
EchoCoTr is a novel hybrid model combining CNNs and vision transformers to accurately estimate left ventricular ejection fraction from echocardiogram videos, outperforming existing methods.
Contribution
The paper introduces EchoCoTr, a new approach that leverages both CNNs and vision transformers to improve LVEF estimation from medical videos.
Findings
Achieves an MAE of 3.95 and an R^2 of 0.82 for LVEF estimation, outperforming prior state-of-the-art methods
Includes extensive ablation studies and comparisons
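For readers unfamiliar with the two metrics reported above, the following is a minimal sketch of how MAE and R^2 are conventionally computed for a regression task such as LVEF estimation. It is not the authors' evaluation code: the function names and the sample values are hypothetical and chosen only for illustration.

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between targets and predictions."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - (residual SS / total SS)."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical ejection fractions in percent (not data from the paper).
true_ef = [55.0, 60.0, 35.0, 70.0]
pred_ef = [52.0, 63.0, 40.0, 68.0]

mae = mean_absolute_error(true_ef, pred_ef)
r2 = r2_score(true_ef, pred_ef)
print(f"MAE = {mae:.2f}, R^2 = {r2:.3f}")
```

MAE is expressed in the same units as the target (percentage points of ejection fraction), so lower is better, while R^2 approaches 1 as predictions explain more of the variance in the true values.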
Abstract
Learning spatiotemporal features is an important task for efficient video understanding, especially in medical images such as echocardiograms. Convolutional neural networks (CNNs) and more recent vision transformers (ViTs) are the most commonly used methods, each with its own limitations. CNNs are good at capturing local context but fail to learn global information across video frames. On the other hand, vision transformers can incorporate global details and long sequences but are computationally expensive and typically require more data to train. In this paper, we propose a method that addresses the limitations we typically face when training on medical video data such as echocardiographic scans. The algorithm we propose (EchoCoTr) utilizes the strength of vision transformers and CNNs to tackle the problem of estimating the left ventricular ejection fraction (LVEF) on ultrasound videos. We…
Taxonomy
Topics: Cardiovascular Function and Risk Factors · Cardiac Imaging and Diagnostics · Hemodynamic Monitoring and Therapy
Methods: Multi-Head Attention · Attention Is All You Need · Masked autoencoder · Linear Layer · WordPiece · Adam · Softmax · Dropout · Dense Connections · Residual Connection
