CTT-Net: A Multi-view Cross-token Transformer for Cataract Postoperative Visual Acuity Prediction
Jinhong Wang, Jingwen Wang, Tingting Chen, Wenhao Zheng, Zhe Xu, Xingdi Wu, Wen Xu, Haochao Ying, Danny Chen, and Jian Wu

TL;DR
This paper introduces CTT-Net, a novel multi-view transformer model that integrates clinical prior knowledge to improve postoperative visual acuity prediction from OCT images, outperforming existing methods.
Contribution
The paper proposes a cross-token attention mechanism and auxiliary classification loss within a transformer framework for enhanced multi-view OCT analysis and VA prediction.
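To make the two ingredients named above more concrete, here is a minimal NumPy sketch, not the authors' implementation. It assumes that "cross-token attention" can be approximated by standard scaled dot-product cross-attention, in which tokens from one OCT view query tokens from another, and that the auxiliary classification loss is a cross-entropy term over binned VA classes added to a regression loss. The names `view_a`, `view_b`, `cross_token_attention`, `joint_loss`, and `aux_weight`, along with all shapes, are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_token_attention(query_tokens, context_tokens, w_q, w_k, w_v):
    """Tokens from one view attend to tokens from another view.

    Assumption: plain scaled dot-product cross-attention stands in for
    the paper's cross-token mechanism, whose details are not given here.
    """
    q = query_tokens @ w_q        # (n_query, d)
    k = context_tokens @ w_k      # (n_context, d)
    v = context_tokens @ w_v      # (n_context, d)
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = softmax(scores, axis=-1)   # each row sums to 1
    return weights @ v, weights

def joint_loss(va_pred, va_true, class_logits, class_true, aux_weight=0.5):
    """Regression loss plus a weighted auxiliary classification loss.

    Assumption: MSE for the VA value and cross-entropy over hypothetical
    VA bins; aux_weight is an illustrative hyperparameter.
    """
    mse = np.mean((va_pred - va_true) ** 2)
    log_probs = np.log(softmax(class_logits, axis=-1))
    ce = -np.mean(log_probs[np.arange(len(class_true)), class_true])
    return mse + aux_weight * ce

# Toy example: 5 tokens from one view attend to 7 tokens from another.
rng = np.random.default_rng(0)
d = 8
view_a = rng.normal(size=(5, d))   # hypothetical token embeddings, view A
view_b = rng.normal(size=(7, d))   # hypothetical token embeddings, view B
w_q, w_k, w_v = (rng.normal(size=(d, d)) for _ in range(3))

fused, weights = cross_token_attention(view_a, view_b, w_q, w_k, w_v)
print(fused.shape, weights.shape)
```

In this sketch each view-A token becomes a weighted mix of view-B value vectors, which is one simple way for information to flow between views. A real model would likely use multiple heads, learned projections, and attention in both directions.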
Findings
CTT-Net achieves superior accuracy over existing methods.
The model effectively fuses multi-view OCT features.
Incorporating preoperative VA improves prediction performance.
Abstract
Surgery is the only viable treatment for cataract patients with visual acuity (VA) impairment. Clinically, to assess the necessity of cataract surgery, it is crucial to accurately predict postoperative VA before surgery by analyzing multi-view optical coherence tomography (OCT) images. Unfortunately, due to complicated fundus conditions, determining postoperative VA remains difficult for medical experts. Deep learning methods for this problem have been developed in recent years. Although effective, these methods still face several issues, such as not efficiently exploring potential relations between multi-view OCT images, neglecting the key role of clinical prior knowledge (e.g., preoperative VA value), and using only regression-based metrics, which lack a reference. In this paper, we propose a novel Cross-token Transformer Network (CTT-Net) for postoperative VA prediction by…
Taxonomy
Topics: Retinal Imaging and Analysis · Retinal Diseases and Treatments · Intraocular Surgery and Lenses
Methods: Multi-Head Attention · Attention Is All You Need · Label Smoothing · Layer Normalization · Dropout · Byte Pair Encoding · Linear Layer · Dense Connections · Position-Wise Feed-Forward Layer · Residual Connection
