Current State and Future of AI Diagnosis in Ophthalmology

Key points at a glance

Ophthalmology is one of the medical fields where AI (artificial intelligence) has advanced the most, and standardized image data such as fundus photos and OCT are well suited to AI learning.
In 2018, the FDA approved the first fully autonomous AI diagnostic system (IDx-DR), and automated diabetic retinopathy screening became practical²⁾.
In an evaluation of an AI chatbot (ChatGPT-4) for ophthalmology knowledge, the overall accuracy was 70%, with differences by field: retina was 77% (highest) and neuro-ophthalmology was 58% (lowest)³⁾.
AI screening for diabetic retinopathy was judged cost-effective in 11 of 15 studies, and the Scottish NHS reported annual savings of about $400,000¹⁾.
AI accuracy still has challenges with image quality, bias in the training data, and handling rare diseases¹⁾.
AI diagnosis is a support tool, and the final diagnosis and treatment plan are determined by an ophthalmology specialist.
Research is advancing on AI that predicts systemic diseases such as cardiovascular risk and dementia from fundus photos⁶⁾.

1. What is ophthalmic AI diagnosis?

AI (artificial intelligence) is a general term for systems that imitate human intelligence, and machine learning is the subset of AI that learns from data. Deep learning (DL) is in turn a subset of machine learning that uses multi-layer neural networks to extract high-level features and make complex judgments¹⁾.

Ophthalmology is one of the medical fields where AI has advanced the most. Fundus photos, OCT (optical coherence tomography), visual field tests, and other image data are standardized, making it easier to secure large amounts of training data. AI is mainly used for the following three purposes.

Improving screening efficiency (reducing the reading burden on ophthalmology specialists)
Improving access in areas with a shortage of specialists
Standardizing diagnostic accuracy (reducing differences between facilities)

In 2018, the FDA approved the first fully autonomous AI diagnostic system (IDx-DR), accelerating the practical use of ophthalmic AI diagnosis²⁾. IDx-DR can be operated by non-ophthalmic staff in internal medicine and primary care settings, and it automatically determines whether referral to an ophthalmology specialist is needed²⁾.

Deep learning systems have shown accuracy comparable to specialists in detecting diabetic retinopathy, glaucoma, and AMD, and the potential of AI diagnosis using fundus photographs has been demonstrated⁸⁾.

Q How is AI used in ophthalmology?

AI automatically analyzes fundus photographs and OCT images to detect diseases such as diabetic retinopathy, glaucoma, and age-related macular degeneration. Screening AI (fully autonomous) can be operated by non-ophthalmologists and is used for primary screening in regions with a shortage of specialists. AI chatbots (such as GPT-4) are also being studied for ophthalmology knowledge assessment and patient education³⁾. In every case, AI is positioned as a support tool, and the final diagnosis is made by an ophthalmology specialist.

2. Types of AI and target diseases

Ophthalmic AI is broadly divided into the following three types according to function and level of autonomy.

Screening AI (fully autonomous)

It automatically analyzes fundus photographs and determines whether referral is unnecessary or needed. It can operate even where ophthalmology specialists are not available, and is applied to the following diseases²⁾.

Diabetic retinopathy (DR): the most researched and most widely implemented
Age-related macular degeneration (AMD): detection of drusen and neovascularization
Glaucoma: automatic analysis of the optic disc and nerve fiber layer
Retinopathy of prematurity (ROP): newborn screening in the neonatal ICU
Retinoblastoma: fundus monitoring in children

Diagnostic support AI (semi-autonomous)

A system that assists physicians in image interpretation. It is used for AMD subtype classification through automatic segmentation of OCT layer structures, and for severity assessment of diabetic macular edema (DME).

AI chatbot (multimodal)

An application of a large language model that analyzes text (history-taking information) and images (fundus photographs and OCT) at the same time. ChatGPT-4’s ophthalmic knowledge and image interpretation ability have been evaluated, and its use for patient education and remote history-taking is being considered³⁾.

AI type	Representative system	Target	Accuracy metric
Screening AI (autonomous)	IDx-DR²⁾	Diabetic retinopathy	Sensitivity 87.2%, specificity 90.7%
Screening AI (autonomous)	i-ROP DL⁵⁾	ROP	Sensitivity 91%, specificity 91%
Screening AI (autonomous)	EyeArt⁴⁾	Diabetic retinopathy	Evaluated and used in the UK NHS
AI chatbot	ChatGPT-4³⁾	Ophthalmology knowledge assessment	Overall accuracy 70%

3. Main AI systems and diagnostic accuracy

IDx-DR (Digital Diagnostics)

IDx-DR is the first fully autonomous AI diagnostic system, approved by the FDA in 2018²⁾. Non-ophthalmic staff take images with a non-mydriatic fundus camera, and the AI automatically analyzes them and decides whether to refer. It is being introduced in primary care settings.

Key performance indicators (Abràmoff et al. 2018 pivotal trial)²⁾:

Sensitivity: 87.2% (detection of moderate or worse diabetic retinopathy)
Specificity: 90.7%
Positive predictive value: 49.7%, negative predictive value: 98.5%
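
How the predictive values relate to sensitivity and specificity can be sketched with Bayes' rule. The example below is illustrative only: the function name `predictive_values` is hypothetical, and the 10% prevalence is an assumed value, not the prevalence in the pivotal trial.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) of a screening test via Bayes' rule."""
    if not all(0.0 <= x <= 1.0 for x in (sensitivity, specificity, prevalence)):
        raise ValueError("all inputs must lie in [0, 1]")
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    true_neg = specificity * (1.0 - prevalence)
    false_neg = (1.0 - sensitivity) * prevalence
    ppv = true_pos / (true_pos + false_pos)   # P(disease | positive result)
    npv = true_neg / (true_neg + false_neg)   # P(no disease | negative result)
    return ppv, npv


# Sensitivity and specificity as reported above; prevalence is an ASSUMED value.
ppv, npv = predictive_values(0.872, 0.907, prevalence=0.10)
print(f"PPV = {ppv:.1%}, NPV = {npv:.1%}")
```

At a fixed sensitivity and specificity, a lower prevalence gives a lower positive predictive value, which is why a screening test can combine a high negative predictive value with a modest positive predictive value.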

IDx-DR has made autonomous DR screening possible in internal medicine and primary care settings, allowing efficient selection of cases that need referral to an ophthalmology specialist²⁾.

Ophthalmic image interpretation by AI chatbot (ChatGPT-4)

The accuracy of GPT-4 on multiple-choice ophthalmology questions has been evaluated³⁾.

Overall accuracy: 70% (299/428 questions)
Ranking of accuracy by field:

Field	Accuracy
Retina	77% (highest)³⁾
Eye tumors	72%³⁾
Pediatric ophthalmology	68%³⁾
Uveitis	67%³⁾
Glaucoma	61%³⁾
Neuro-ophthalmology	58% (lowest)³⁾

Image-based questions: 65%, non-image-based questions: 82% (difference of 17 percentage points, P < .001)³⁾

This difference shows that the chatbot’s image interpretation ability still lags behind its non-image text comprehension. It has been pointed out that proper integration of multimodal chatbots in clinical settings is essential³⁾.

IDx-DR (FDA-approved in 2018)

Target disease: diabetic retinopathy

Accuracy: sensitivity 87.2%, specificity 90.7%

Features: fully autonomous. Can be operated by non-ophthalmologists. Used in internal medicine and primary care²⁾

EyeArt (Eyenuk)

Target disease: diabetic retinopathy

Accuracy: evaluated and put into practical use in the UK NHS

Features: integrated into screening programs⁴⁾

i-ROP DL (2018)

Target disease: retinopathy of prematurity (ROP)

Accuracy: sensitivity 91%, specificity 91%

Feature: automatic detection of plus disease in the NICU⁵⁾

ChatGPT-4 (OpenAI)

Scope: ophthalmology knowledge and image interpretation assessment

Accuracy: overall accuracy 70% (retina 77%, neuro-ophthalmology 58%)

Feature: research stage for applications in patient education and remote consultations³⁾

Q How accurate is AI in eye disease diagnosis?

Diabetic retinopathy screening AI (IDx-DR) achieved 87.2% sensitivity and 90.7% specificity, with accuracy comparable to ophthalmologist interpretation²⁾. AI for retinopathy of prematurity (ROP) (i-ROP DL) also achieved 91% sensitivity and 91% specificity⁵⁾. By contrast, in the ophthalmology knowledge evaluation of the AI chatbot (ChatGPT-4), the overall correct answer rate was 70%, and in neuro-ophthalmology it was lower at 58%³⁾. In all cases, AI is only an assistive tool, and if any abnormality is detected, a detailed examination by an ophthalmology specialist is needed.

4. Cost-effectiveness and health economics

Evidence on the cost-effectiveness of AI-based ophthalmic screening has accumulated across multiple studies¹⁾.

Diabetic retinopathy (DR) screening

In Wu’s systematic review (2021), 11 of 15 studies evaluating the economics of AI-based DR screening found it to be cost-effective¹⁾.

NHS Scotland: annual savings of $403,200
United States (IDx-DR/EyeArt): 23.3% cost reduction per patient
Rural China: AI screening was $34.86 cheaper than human graders and improved QALYs by 0.04

Region / setting	Cost-effectiveness assessment	Source
NHS Scotland	Annual savings of $403,200	Wu 2021¹⁾
U.S. primary care	23.3% cost reduction (per patient)	Wu 2021¹⁾
Rural areas in China	$34.86 cheaper than human graders, +0.04 QALY	Wu 2021¹⁾
Japan (AMD, Tamura et al. 2022)	ICER $99,283/QALY (above the threshold)	Wu 2021¹⁾

Retinopathy of prematurity (ROP) screening

Autonomous AI screening has been reported to be the most cost-effective compared with telemedicine, ophthalmoscopy, and assisted AI¹⁾. At a willingness-to-pay threshold of $7, it was found to be cost-effective compared with assisted screening¹⁾.

Age-related macular degeneration (AMD) screening

In a Japanese cohort simulation of AMD screening (500,000 people aged 40 and over, prevalence 3.85%), the ICER for AI screening every 3 years was $99,283/QALY ($92,890-$99,283)¹⁾. This exceeds Japan’s willingness-to-pay threshold (about $47,286/QALY), so the cost-effectiveness of AI-based AMD screening remains uncertain for now¹⁾. However, advances in AI technology and lower costs may improve this in the future.
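
The comparison above rests on the incremental cost-effectiveness ratio (ICER): the extra cost of a strategy divided by the extra QALYs it yields, judged against a willingness-to-pay threshold. The sketch below is a minimal illustration. The two dollar figures are the ones quoted above, while the function names and the per-person cost and QALY inputs in the second example are hypothetical placeholders, not study data.

```python
def icer(cost_new, cost_old, qaly_new, qaly_old):
    """Incremental cost per QALY gained by the new strategy over the old one."""
    delta_qaly = qaly_new - qaly_old
    if delta_qaly <= 0:
        raise ValueError("new strategy must gain QALYs for the ICER to be meaningful")
    return (cost_new - cost_old) / delta_qaly


def is_cost_effective(icer_value, wtp_threshold):
    """A strategy is deemed cost-effective if its ICER does not exceed the threshold."""
    return icer_value <= wtp_threshold


# Values quoted in the text above (USD per QALY).
REPORTED_ICER = 99_283
JAPAN_WTP_THRESHOLD = 47_286
print(is_cost_effective(REPORTED_ICER, JAPAN_WTP_THRESHOLD))  # False

# HYPOTHETICAL per-person inputs, chosen only to show the arithmetic.
example = icer(cost_new=1_200.0, cost_old=1_000.0, qaly_new=10.004, qaly_old=10.000)
print(f"Hypothetical ICER: ${example:,.0f} per QALY")
```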

5. Challenges and limitations

Technical challenges

Training data bias: In datasets biased toward specific races or age groups, accuracy declines in other populations¹⁾
Image-quality dependence: The quality of fundus photographs (whether the pupil is dilated, media opacity, and shooting conditions) directly affects AI accuracy
Difficulty handling rare diseases: Diseases with little training data do not achieve sufficient accuracy
Black box problem: The basis for AI decisions is not transparent, which makes it hard for clinicians to explain those decisions and be accountable for them¹⁾
Low accuracy in neuro-ophthalmology: ChatGPT-4 had a correct-answer rate of 58% in neuro-ophthalmology, the lowest, showing limits in interpreting complex optic nerve disorders³⁾
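
One common way to look for the training-data bias listed above is to report sensitivity separately for each patient subgroup instead of a single pooled figure. The sketch below assumes a hypothetical record format of `(group, has_disease, ai_flagged)`, and the records are synthetic and invented purely for illustration.

```python
from collections import defaultdict


def sensitivity_by_group(records):
    """Compute sensitivity (true positives / all diseased) per subgroup.

    `records` is an iterable of (group, has_disease, ai_flagged) tuples.
    Non-diseased cases are ignored because they do not affect sensitivity.
    """
    true_pos = defaultdict(int)
    false_neg = defaultdict(int)
    for group, has_disease, ai_flagged in records:
        if not has_disease:
            continue
        if ai_flagged:
            true_pos[group] += 1
        else:
            false_neg[group] += 1
    groups = set(true_pos) | set(false_neg)
    return {g: true_pos[g] / (true_pos[g] + false_neg[g]) for g in groups}


# SYNTHETIC data: 10 diseased eyes per group, with 9 and 6 detected respectively.
records = (
    [("group_A", True, True)] * 9 + [("group_A", True, False)] * 1
    + [("group_B", True, True)] * 6 + [("group_B", True, False)] * 4
    + [("group_B", False, False)] * 5
)
for group, value in sorted(sensitivity_by_group(records).items()):
    print(f"{group}: sensitivity {value:.0%}")
```

A large gap between subgroups in such a breakdown would suggest that the model needs more representative training data or separate validation before it is used in that population.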

Ethical and regulatory challenges

The following are the ethical and legal issues raised by ophthalmic AI¹⁾.

Patient privacy and data security: Establishing regulations for cloud management and international sharing of fundus images
Responsibility in case of misdiagnosis: It is unresolved whether the physician or the AI manufacturer bears responsibility when an AI system misdiagnoses
Regulatory and approval processes: Appropriate evaluation systems for AI medical devices under the FDA (US), the Pharmaceuticals and Medical Devices Act (Japan), and others
Ensuring explainability: The importance of presenting the basis for AI decisions in a way that clinicians and patients can understand

Health economic issues

The initial implementation cost (hardware, software, and staff training) may be high¹⁾
There is a large difference in cost-effectiveness between low-income and high-income countries¹⁾
Systems for insurance reimbursement are being developed in each country, and implementation in Japan is still in progress

Q Is AI eye diagnosis safe?

Systems approved by regulatory authorities such as the FDA (for example, IDx-DR) have undergone rigorous clinical trials, and a certain level of safety has been confirmed²⁾. However, AI diagnosis is an assistive tool, and the final diagnosis and treatment plan should be determined by an ophthalmologist. Self-diagnosis using only an AI chatbot (such as ChatGPT) is not recommended. AI accuracy may decrease with poor image quality, rare diseases, and neuro-ophthalmology cases³⁾, so if an abnormality is suspected, it is important to see an eye doctor promptly.

6. Technical foundation: how deep learning works

Arias-Serrano I, et al. Artificial intelligence based glaucoma and diabetic retinopathy detection using MATLAB — retrained AlexNet convolutional neural network. F1000Research. 2024;12:14. Figure 8. PMCID: PMC11143403. License: CC BY.

Figure: Comparison of Grad-CAM heatmaps from AlexNet, ResNet50, and GoogLeNet overlaid on fundus photographs (left column) of a normal eye (Non_D), suspected glaucoma (Sus_G), and suspected diabetic retinopathy (Sus_R). Red to yellow indicates higher attention, and blue indicates lower attention. In glaucoma cases, strong activation is seen around the optic disc, while in diabetic retinopathy cases, strong activation is seen in the macula to posterior pole region. Grad-CAM is a visualization technique that highlights the image regions that most influenced a convolutional neural network's output.

Convolutional neural network (CNN)

The convolutional neural network (CNN) is the core technology of AI diagnosis in ophthalmology.

It automatically extracts features hierarchically from input fundus and OCT images
Shallow layers recognize low-level features such as outlines and color, while deeper layers recognize abstract features such as vessel patterns, hemorrhage, edema, and optic disc shape
It learns iteratively from large amounts of training data (reference images labeled by specialists)
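
As a rough illustration of what a single shallow-layer filter does, the sketch below slides one hand-written 3×3 kernel over a tiny grayscale grid in pure Python. It is a simplification under stated assumptions: in a real CNN the kernel values are learned rather than fixed, many kernels are stacked in layers, and an optimized library is used. The toy image, the kernel, and the function names are all illustrative.

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (stride 1, no padding) and return the feature map.

    As in deep learning frameworks, the kernel is not flipped (cross-correlation).
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def relu(feature_map):
    """Keep positive responses and zero out the rest."""
    return [[max(0, value) for value in row] for row in feature_map]


# Toy 5x5 "image": dark on the left (0), bright on the right (1).
image = [[0, 0, 1, 1, 1] for _ in range(5)]

# Sobel-like kernel that responds to dark-to-bright vertical edges.
vertical_edge_kernel = [
    [-1, 0, 1],
    [-2, 0, 2],
    [-1, 0, 1],
]

feature_map = relu(conv2d_valid(image, vertical_edge_kernel))
for row in feature_map:
    print(row)  # each row: [4, 4, 0]
```

The response is large where the window straddles the dark-to-bright boundary and zero where the window is uniformly bright, which is the sense in which a shallow filter detects an outline.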

AI learning process

Data collection: Large-scale collection of fundus photographs, OCT, and visual field test data
Annotation: Ophthalmologists assign ground-truth labels (stage and findings) to each image
Training and optimization: Repeatedly adjust network parameters so they move closer to the correct answer
Validation and clinical trials: Performance evaluation in external cohorts and pilot testing in real-world clinical practice
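
The training and optimization step above can be sketched with the smallest possible model: a single logistic unit fitted by gradient descent. This is a stand-in under stated assumptions, not how a production system is trained. A real network has millions of parameters and learns from images, whereas here the one input feature, the labels, and the hyperparameters are invented toy values.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def mean_loss(features, labels, w, b):
    """Average cross-entropy between predictions and ground-truth labels."""
    eps = 1e-12  # guard against log(0)
    total = 0.0
    for x, y in zip(features, labels):
        p = min(max(sigmoid(w * x + b), eps), 1.0 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(features)


def train(features, labels, learning_rate=0.5, epochs=2000):
    """Repeatedly nudge the parameters in the direction that reduces the loss."""
    w, b = 0.0, 0.0
    n = len(features)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(features, labels):
            error = sigmoid(w * x + b) - y  # prediction minus ground truth
            grad_w += error * x
            grad_b += error
        w -= learning_rate * grad_w / n
        b -= learning_rate * grad_b / n
    return w, b


# TOY data: one made-up score per image, labeled 0 (normal) or 1 (disease).
features = [0.1, 0.2, 0.3, 0.7, 0.8, 0.9]
labels = [0, 0, 0, 1, 1, 1]

loss_before = mean_loss(features, labels, 0.0, 0.0)
w, b = train(features, labels)
loss_after = mean_loss(features, labels, w, b)
print(f"loss before: {loss_before:.3f}, loss after: {loss_after:.3f}")
```

Validation on an external cohort, the final step in the list, would correspond to computing the same loss or accuracy on data that played no part in `train`.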

Transfer learning (applying pre-trained models from other domains such as ImageNet to ophthalmic images) is widely used as a method to achieve high accuracy even when training data are limited.

Research is also advancing on using GANs (generative adversarial networks) to generate synthetic images and artificially expand training data for rare diseases.

Multimodal AI

Multimodal AI that processes text (history-taking information) and images (fundus photographs and OCT) at the same time is being applied to ophthalmology as large language models (such as GPT-4) continue to advance³⁾. While it can integrate more diverse information than a single-modality CNN, it has been shown that its ability to interpret images is still weaker than its understanding of text³⁾.

7. Latest research and future prospects

Prediction of systemic disease from fundus photographs

Deep-learning analysis has shown that systemic risk factors such as age, sex, systolic blood pressure, smoking history, and HbA1c may be predictable from fundus photographs alone⁶⁾. Some accuracy has also been reported in predicting future risk of cardiovascular events (myocardial infarction and stroke), drawing attention to the possibility that fundus photographs may serve as a window into overall health. AI models for predicting dementia, kidney disease, and anemia are also still in the research stage⁶⁾.

Integration with smartphone fundus cameras

Using fundus photography with a small clip-on lens attached to a smartphone, together with AI analysis, has been shown to make DR screening practical in patients with diabetes in India⁷⁾. Both sensitivity and specificity have been comparable to those of specialized fundus cameras, and AI screening combined with low-cost general-purpose devices could help spread use in developing countries and rural areas.

Integration of AI and telemedicine

By combining AI screening with telemedicine, improvement in ophthalmic access in remote and developing regions is expected. Even in facilities without an eye specialist, AI can perform initial screening and send only positive cases for remote review by a specialist, allowing more efficient use of medical resources.

Applications in personalized medicine

Research is progressing on AI that can predict in advance treatment response to anti-VEGF therapy (ranibizumab, aflibercept, faricimab, etc.) and suggest the best dosing plan for each patient. Models that predict treatment effect from OCT images may help reduce the number of injections and improve visual prognosis.

Applications of generative AI to patient education and interview support

Large language models (such as GPT-4) are being studied for uses such as explaining diseases to patients, preparing informed consent documents, and assisting with interviews³⁾. However, challenges remain in preventing errors and bias in medical information and in maintaining the doctor-patient relationship. It is not recommended for patients to rely only on chatbots to make decisions about self-diagnosis or self-treatment³⁾.

8. References

1) Wu JH, Liu TYA, Hsu WT, et al. Performance and limitation of machine learning algorithms for diabetic retinopathy screening: meta-analysis. J Med Internet Res. 2021;23(11):e23863.
2) Abràmoff MD, Lavin PT, Birch M, Shah N, Folk JC. Pivotal trial of an autonomous AI-based diagnostic system for detection of diabetic retinopathy in primary care offices. NPJ Digit Med. 2018;1:39. doi:10.1038/s41746-018-0040-6. PMID:31304320; PMCID:PMC6550188.
3) Mihalache A, Popovic MM, Guo MZ, et al. Performance of an upgraded artificial intelligence chatbot for ophthalmic knowledge assessment. JAMA Ophthalmol. 2024;142(3):234-241.
4) Olvera-Barrios A, Heeren TF, Balaskas K, et al. Diagnostic accuracy of diabetic retinopathy grading by an artificial intelligence-enabled algorithm compared with a human standard reference. Diabetologia. 2023;66(5):857-866.
5) Brown JM, Campbell JP, Beers A, et al. Automated diagnosis of plus disease in retinopathy of prematurity using deep convolutional neural networks. JAMA Ophthalmol. 2018;136(7):803-810.
6) Poplin R, Varadarajan AV, Blumer K, Liu Y, McConnell MV, Corrado GS, et al. Prediction of cardiovascular risk factors from retinal fundus photographs via deep learning. Nat Biomed Eng. 2018;2(3):158-164. doi:10.1038/s41551-018-0195-0. PMID:31015713.
7) Rajalakshmi R, Subashini R, Anjana RM, et al. Automated diabetic retinopathy detection in smartphone-based fundus photography using artificial intelligence. Eye. 2018;32(6):1138-1144.
8) Ting DSW, Cheung CY, Lim G, Tan GSW, Quang ND, Gan A, et al. Development and validation of a deep learning system for diabetic retinopathy and related eye diseases using retinal images from multiethnic populations with diabetes. JAMA. 2017;318(22):2211-2223. doi:10.1001/jama.2017.18152. PMID:29234807; PMCID:PMC5820739.
