
The Current State and Future of AI Diagnosis in Ophthalmology

AI (artificial intelligence) is a general term for computer systems that imitate aspects of human intelligence, and machine learning is one of its main branches. Deep learning (DL) is a subset of machine learning that uses multi-layer neural networks to extract high-level features and make complex judgments1).

Ophthalmology is one of the medical fields where AI has advanced the most. Fundus photographs, OCT (optical coherence tomography) scans, visual field tests, and other examination data are standardized, which makes it easier to collect large amounts of training data. AI is mainly used for the following three purposes.

  • Improving screening efficiency (reducing the reading burden on ophthalmology specialists)
  • Improving access in areas with a shortage of specialists
  • Standardizing diagnostic accuracy (reducing differences between facilities)

In 2018, the FDA approved the first fully autonomous AI diagnostic system (IDx-DR), accelerating the practical use of ophthalmic AI diagnosis2). IDx-DR can be operated by non-ophthalmic staff in internal medicine and primary care settings, and it automatically determines whether referral to an ophthalmology specialist is needed2).

Deep learning systems have shown accuracy comparable to specialists in detecting diabetic retinopathy, glaucoma, and age-related macular degeneration (AMD), demonstrating the potential of AI diagnosis using fundus photographs8).

Q: How is AI used in ophthalmology?
A: AI automatically analyzes fundus photographs and OCT images to detect diseases such as diabetic retinopathy, glaucoma, and age-related macular degeneration. Screening AI (fully autonomous) can be operated by non-ophthalmologists and is used for primary screening in regions with a shortage of specialists. AI chatbots (such as GPT-4) are also being studied for assessing ophthalmology knowledge and for patient education3). AI is positioned as a support tool, and the final diagnosis is made by an ophthalmology specialist.

Ophthalmic AI is broadly divided into the following three types according to function and level of autonomy.

Screening AI (fully autonomous)

It automatically analyzes fundus photographs and determines whether referral is needed. Because it can operate where ophthalmology specialists are not available, it is used to screen for diseases such as diabetic retinopathy2) and retinopathy of prematurity5).

Diagnostic support AI (semi-autonomous)

A system that assists physicians in image interpretation. It is used for AMD subtype classification through automatic segmentation of OCT layer structures, and for severity assessment of diabetic macular edema (DME).

AI chatbot (multimodal)

An application of a large language model that analyzes text (history-taking information) and images (fundus photographs and OCT) at the same time. ChatGPT-4’s ophthalmic knowledge and image interpretation ability have been evaluated, and its use for patient education and remote history-taking is being considered3).

AI type | Representative system | Target | Accuracy metric
Screening AI (autonomous) | IDx-DR2) | Diabetic retinopathy | Sensitivity 87.2%, specificity 90.7%
Screening AI (autonomous) | i-ROP DL5) | ROP | Sensitivity 91%, specificity 91%
Screening AI (autonomous) | EyeArt4) | Diabetic retinopathy | Evaluated and used in the UK NHS
AI chatbot | ChatGPT-43) | Ophthalmology knowledge assessment | Overall accuracy 70%

3. Main AI systems and diagnostic accuracy


IDx-DR2) was the first fully autonomous AI diagnostic system approved by the FDA (2018). Non-ophthalmic staff capture images with a non-mydriatic fundus camera, and the AI analyzes them automatically and decides whether referral is needed. It is being introduced in primary care settings.

Key performance indicators (Abràmoff et al. 2018 pivotal trial)2):

  • Sensitivity: 87.2% (detection of moderate or worse diabetic retinopathy)
  • Specificity: 90.7%
  • Positive predictive value: 49.7%, negative predictive value: 98.5%
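The gap between the PPV (49.7%) and the NPV (98.5%) is expected for a screening test: when most people screened do not have the disease, even a small false-positive rate produces many false positives relative to true positives. The sketch below is a minimal illustration, not part of the IDx-DR software. It computes the four metrics from confusion-matrix counts and shows how PPV and NPV shift with prevalence. The names `ConfusionCounts` and `predictive_values` are hypothetical, and the prevalence values are assumed for illustration rather than taken from the trial.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts from comparing AI output with a reference standard."""
    tp: int  # AI positive, reference positive
    fp: int  # AI positive, reference negative
    fn: int  # AI negative, reference positive
    tn: int  # AI negative, reference negative

    @property
    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    @property
    def ppv(self) -> float:
        return self.tp / (self.tp + self.fp)

    @property
    def npv(self) -> float:
        return self.tn / (self.tn + self.fn)


def predictive_values(sensitivity: float, specificity: float,
                      prevalence: float) -> tuple[float, float]:
    """PPV and NPV implied by Bayes' theorem at a given disease prevalence."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    true_neg = specificity * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    return true_pos / (true_pos + false_pos), true_neg / (true_neg + false_neg)


if __name__ == "__main__":
    # Sensitivity and specificity from the text; prevalence values are hypothetical.
    for prevalence in (0.05, 0.10, 0.25):
        ppv, npv = predictive_values(0.872, 0.907, prevalence)
        print(f"prevalence {prevalence:.0%}: PPV {ppv:.1%}, NPV {npv:.1%}")
```

Holding sensitivity and specificity fixed, PPV rises as prevalence rises, while NPV falls only slightly.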

IDx-DR has made autonomous DR screening possible in internal medicine and primary care settings, allowing efficient selection of cases that need referral to an ophthalmology specialist2).

Ophthalmic image interpretation by AI chatbot (ChatGPT-4)


GPT-4 has been evaluated on multiple-choice ophthalmology questions, including questions that require interpreting clinical images3).

  • Overall accuracy: 70% (299/428 questions)3)
  • Accuracy by field, from highest to lowest: retina 77%, eye tumors 72%, pediatric ophthalmology 68%, uveitis 67%, glaucoma 61%, neuro-ophthalmology 58%3)
  • Image-based questions: 65%; non-image-based questions: 82% (a difference of 17 percentage points, P < .001)3)

This gap shows that the chatbot's image interpretation still lags behind its comprehension of text-only questions. It has been pointed out that careful integration of multimodal chatbots into clinical settings is essential3).

IDx-DR (FDA-approved in 2018)

Target disease: diabetic retinopathy

Accuracy: sensitivity 87.2%, specificity 90.7%

Features: fully autonomous. Can be operated by non-ophthalmologists. Used in internal medicine and primary care2)

EyeArt (Eyenuk)

Target disease: diabetic retinopathy

Accuracy: evaluated and put into practical use in the UK NHS

Features: integrated into screening programs4)

i-ROP DL (2018)

Target disease: retinopathy of prematurity (ROP)

Accuracy: sensitivity 91%, specificity 91%

Features: automatic detection of plus disease in the NICU5)

ChatGPT-4 (OpenAI)

Scope: ophthalmology knowledge and image interpretation assessment

Accuracy: overall accuracy 70% (retina 77%, neuro-ophthalmology 58%)

Features: research stage for applications in patient education and remote consultations3)

Q: How accurate is AI in eye disease diagnosis?
A: Diabetic retinopathy screening AI (IDx-DR) achieved 87.2% sensitivity and 90.7% specificity, with accuracy comparable to ophthalmologist interpretation2). AI for retinopathy of prematurity (ROP) (i-ROP DL) also achieved 91% sensitivity and 91% specificity5). By contrast, in the ophthalmology knowledge evaluation of the AI chatbot (ChatGPT-4), the overall correct answer rate was 70%, and in neuro-ophthalmology it was lower at 58%3). In all cases, AI is only an assistive tool, and if any abnormality is detected, a detailed examination by an ophthalmology specialist is needed.

4. Cost-effectiveness and health economics


Evidence on the cost-effectiveness of AI-based ophthalmic screening has accumulated across multiple studies1).

In Wu’s systematic review (2021), 11 of 15 studies evaluating the economics of AI-based DR screening found it to be cost-effective1).

  • NHS Scotland: annual savings of $403,200
  • United States (IDx-DR/EyeArt): 23.3% cost reduction per patient
  • Rural China: AI screening was $34.86 cheaper than human graders and improved QALYs by 0.04
Region / setting | Cost-effectiveness assessment | Source
NHS Scotland | Annual savings of $403,200 | Wu 20211)
U.S. primary care | 23.3% cost reduction (per patient) | Wu 20211)
Rural areas in China | $34.86 cheaper than human graders, +0.04 QALY | Wu 20211)
Japan (AMD, Tamura et al. 2022) | ICER $99,283/QALY (above the threshold) | Wu 20211)

Retinopathy of prematurity (ROP) screening


Autonomous AI screening has been reported to be more cost-effective than telemedicine, ophthalmoscopy, and assisted AI screening1). At a willingness-to-pay threshold of $7, it was found to be cost-effective compared with assisted screening1).
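Comparing several screening strategies at once is commonly done with net monetary benefit (NMB): each strategy's health gain is converted to money at the willingness-to-pay (WTP) threshold, its cost is subtracted, and the strategy with the highest NMB is preferred at that threshold. A minimal sketch, assuming made-up cost and QALY values for the four strategies named above. The numbers do not come from the cited studies; they are chosen only to show that the preferred strategy can change with the threshold.

```python
from typing import NamedTuple


class Strategy(NamedTuple):
    name: str
    cost: float   # expected cost per screened infant (hypothetical values)
    qalys: float  # expected QALYs per screened infant (hypothetical values)


def net_monetary_benefit(strategy: Strategy, wtp: float) -> float:
    """NMB = WTP x QALYs - cost; a higher NMB is better."""
    return wtp * strategy.qalys - strategy.cost


def most_cost_effective(strategies: list[Strategy], wtp: float) -> Strategy:
    """Return the strategy with the largest NMB at the given WTP threshold."""
    return max(strategies, key=lambda s: net_monetary_benefit(s, wtp))


if __name__ == "__main__":
    # Illustrative numbers only.
    options = [
        Strategy("Ophthalmoscopy", cost=500.0, qalys=10.00),
        Strategy("Telemedicine", cost=450.0, qalys=10.00),
        Strategy("Assisted AI", cost=420.0, qalys=10.02),
        Strategy("Autonomous AI", cost=380.0, qalys=10.01),
    ]
    for wtp in (1_000.0, 50_000.0):
        best = most_cost_effective(options, wtp)
        print(f"WTP {wtp:,.0f} per QALY: {best.name}")
```

With these toy values the cheaper autonomous strategy wins at a low threshold, while the slightly more effective assisted strategy wins once each QALY is valued highly enough.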

AMD (age-related macular degeneration) screening

In a Japanese cohort simulation (500,000 people aged 40 and over, prevalence 3.85%), the incremental cost-effectiveness ratio (ICER) of AI screening every 3 years was $99,283/QALY ($92,890-$99,283)1). This exceeds Japan's willingness-to-pay threshold (about $47,286/QALY), so the cost-effectiveness of AI-based AMD screening has not yet been established1). This may change as AI technology advances and costs fall.
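The ICER divides the extra cost of a new strategy by the extra health it produces: ICER = (C_new - C_old) / (QALY_new - QALY_old). A strategy is conventionally considered cost-effective when its ICER is at or below the willingness-to-pay threshold. A minimal sketch under that convention; the function names and the toy cost/QALY inputs are illustrative, while the $99,283 and $47,286 figures are the ones quoted above.

```python
def icer(cost_new: float, cost_old: float, qaly_new: float, qaly_old: float) -> float:
    """Incremental cost-effectiveness ratio: extra cost per extra QALY gained.

    Simplification: only handles the case where the new strategy gains QALYs;
    dominated or QALY-losing strategies need separate handling.
    """
    delta_qaly = qaly_new - qaly_old
    if delta_qaly <= 0:
        raise ValueError("new strategy must gain QALYs for this simple ICER")
    return (cost_new - cost_old) / delta_qaly


def is_cost_effective(icer_value: float, wtp_threshold: float) -> bool:
    """Conventional decision rule: cost-effective if ICER <= WTP threshold."""
    return icer_value <= wtp_threshold


if __name__ == "__main__":
    # Toy inputs: 1,000 extra cost for 0.5 extra QALYs -> 2,000 per QALY.
    print(icer(cost_new=1_200.0, cost_old=200.0, qaly_new=10.5, qaly_old=10.0))
    # AMD screening figures quoted in the text (USD per QALY).
    print(is_cost_effective(99_283, 47_286))
```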

  • Training data bias: In datasets biased toward specific races or age groups, accuracy declines in other populations1)
  • Image-quality dependence: The quality of fundus photographs (pupil dilation, media opacity, and imaging conditions) directly affects AI accuracy
  • Difficulty handling rare diseases: Diseases with little training data do not achieve sufficient accuracy
  • Black box problem: The basis for AI decisions is not transparent, which makes it hard for clinicians to explain and take responsibility for the results1)
  • Low accuracy in neuro-ophthalmology: ChatGPT-4 had a correct-answer rate of 58% in neuro-ophthalmology, the lowest, showing limits in interpreting complex optic nerve disorders3)
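Training-data bias is usually detected by reporting performance separately for each subgroup (for example, ethnicity, age band, or camera model) on an external test set, rather than relying on one overall figure. A minimal sketch, assuming each test image is a record with a hypothetical group label, the reference-standard result, and the AI result. The 0.1 margin used to flag a gap is an arbitrary illustrative choice, not a published cutoff.

```python
from collections import defaultdict


def sensitivity_by_group(records):
    """records: iterable of (group, reference_positive, ai_positive) tuples.

    Sensitivity is computed only over reference-positive cases in each group.
    """
    diseased = defaultdict(int)
    detected = defaultdict(int)
    for group, reference_positive, ai_positive in records:
        if reference_positive:
            diseased[group] += 1
            if ai_positive:
                detected[group] += 1
    return {group: detected[group] / diseased[group] for group in diseased}


def flag_gaps(sensitivities, margin=0.1):
    """Groups whose sensitivity is more than `margin` below the best group."""
    best = max(sensitivities.values())
    return sorted(g for g, s in sensitivities.items() if best - s > margin)


if __name__ == "__main__":
    # Hypothetical records: (group, reference says disease, AI says disease)
    test_set = [
        ("group A", True, True), ("group A", True, True),
        ("group A", True, True), ("group A", False, False),
        ("group B", True, True), ("group B", True, False),
        ("group B", True, False), ("group B", False, True),
    ]
    sens = sensitivity_by_group(test_set)
    print(sens)
    print("possible bias in:", flag_gaps(sens))
```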

The following are the ethical and legal issues raised by ophthalmic AI1).

  • Patient privacy and data security: Establishing regulations for cloud management and international sharing of fundus images
  • Responsibility in case of misdiagnosis: In the event of an AI misdiagnosis, whether the doctor or the AI manufacturer is responsible
  • Regulatory and approval processes: Appropriate evaluation systems for AI medical devices under the FDA (US), the Pharmaceuticals and Medical Devices Act (Japan), and others
  • Ensuring explainability: The importance of presenting the basis for AI decisions in a way that clinicians and patients can understand
  • The initial implementation cost (hardware, software, and staff training) may be high1)
  • There is a large difference in cost-effectiveness between low-income and high-income countries1)
  • Systems for insurance reimbursement are being developed in each country, and implementation in Japan is still in progress

Q: Is AI eye diagnosis safe?
A: Systems approved by regulatory authorities such as the FDA (for example, IDx-DR) have been evaluated in rigorous clinical trials that demonstrated acceptable safety and performance2). However, AI diagnosis is an assistive tool, and the final diagnosis and treatment plan should be determined by an ophthalmologist. Self-diagnosis using only an AI chatbot (such as ChatGPT) is not recommended. AI accuracy may decrease with poor image quality, rare diseases, and neuro-ophthalmology cases3), so if an abnormality is suspected, it is important to see an eye doctor promptly.

6. Technical foundation: how deep learning works

An image with a Grad-CAM heatmap overlaid on a fundus photograph. The areas the AI focuses on are shown by a color scale for three categories: normal eye, suspected glaucoma, and suspected diabetic retinopathy
Arias-Serrano I, et al. Artificial intelligence based glaucoma and diabetic retinopathy detection using MATLAB — retrained AlexNet convolutional neural network. F1000Research. 2024;12:14. Figure 8. PMCID: PMC11143403. License: CC BY.
Comparison figure showing Grad-CAM heatmaps from AlexNet, ResNet50, and GoogLeNet overlaid on fundus photographs (left column) of a normal eye (Non_D), suspected glaucoma (Sus_G), and suspected diabetic retinopathy (Sus_R). Red to yellow indicates higher attention, and blue indicates lower attention. In glaucoma cases, strong activation is seen around the optic disc, while in diabetic retinopathy cases, strong activation is seen from the macula to the posterior pole. This illustrates how Grad-CAM can visualize which image regions a convolutional neural network relies on.
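Grad-CAM, the technique behind the heatmaps in the figure, weights each feature map of a convolutional layer by the average gradient of the class score with respect to that map, sums the weighted maps, and keeps only the positive part. A minimal NumPy sketch, assuming the activations and gradients for one image have already been obtained from a deep-learning framework. This is not the MATLAB code used in the cited study, and the final step of resizing the map to the photograph before overlaying it is omitted.

```python
import numpy as np


def grad_cam(activations: np.ndarray, gradients: np.ndarray) -> np.ndarray:
    """Grad-CAM heatmap from one convolutional layer.

    activations: feature maps A with shape (K, H, W) for one image.
    gradients:   d(class score)/dA with the same shape.
    Returns an (H, W) map scaled to [0, 1].
    """
    if activations.shape != gradients.shape:
        raise ValueError("activations and gradients must have the same shape")
    # Channel weights: global-average-pool the gradients over the spatial axes.
    weights = gradients.mean(axis=(1, 2))  # shape (K,)
    # Weighted sum of feature maps, keeping only positive evidence (ReLU).
    cam = np.maximum(np.tensordot(weights, activations, axes=1), 0.0)  # (H, W)
    peak = cam.max()
    return cam / peak if peak > 0 else cam


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    acts = rng.random((8, 7, 7))        # stand-in for 8 feature maps of size 7x7
    grads = rng.normal(size=(8, 7, 7))  # stand-in for backpropagated gradients
    heatmap = grad_cam(acts, grads)
    print(heatmap.shape, float(heatmap.min()), float(heatmap.max()))
```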

A convolutional neural network (CNN) is the core technology of AI diagnosis in ophthalmology.
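Each CNN layer slides small filters (kernels) across its input and passes the result through a nonlinearity such as ReLU. A minimal NumPy sketch of a single filter, assuming a hand-written vertical-edge kernel as a stand-in for the low-level filters that early layers learn, and a synthetic step-edge image in place of a fundus photograph. Real systems learn many kernels per layer from data and use optimized library routines instead of Python loops.

```python
import numpy as np


def conv2d_valid(image: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Slide a kernel over a 2D image (no padding, stride 1).

    Like most deep-learning libraries, this computes cross-correlation
    (the kernel is not flipped), which CNN layers call "convolution".
    """
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out


def relu(x: np.ndarray) -> np.ndarray:
    return np.maximum(x, 0.0)


# Hand-written vertical-edge kernel (dark-to-bright, left to right).
VERTICAL_EDGE = np.array([[-1.0, 0.0, 1.0],
                          [-1.0, 0.0, 1.0],
                          [-1.0, 0.0, 1.0]])


if __name__ == "__main__":
    # 4x5 synthetic image: dark on the left, bright on the right.
    image = np.array([[0, 0, 1, 1, 1]] * 4, dtype=float)
    feature_map = relu(conv2d_valid(image, VERTICAL_EDGE))
    print(feature_map)  # responds where the window straddles the edge
```

Stacking many such layers lets deeper layers combine these edge responses into the more abstract patterns described below.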

  • Automatically extracts features hierarchically from input fundus and OCT images
  • Shallow layers recognize low-level features such as outlines and color, while deeper layers recognize abstract features such as vessel patterns, hemorrhage, edema, and optic disc shape
  • Learns repeatedly from large amounts of training data (reference images labeled by specialists)

AI diagnostic systems are typically developed in four steps:
  1. Data collection: Large-scale collection of fundus photographs, OCT, and visual field test data
  2. Annotation: Ophthalmologists assign ground-truth labels (stage and findings) to each image
  3. Training and optimization: Repeatedly adjust the network parameters so that the model's outputs move closer to the ground-truth labels
  4. Validation and clinical trials: Performance evaluation in external cohorts and pilot testing in real-world clinical practice
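The training and optimization step is, at its core, iterative gradient-based optimization: compute the error between predictions and ground-truth labels, then move each parameter against the gradient of the loss. CNNs apply the same idea to millions of parameters via backpropagation, typically with optimizers such as SGD or Adam. To keep the loop short, this minimal sketch swaps in logistic regression on pre-extracted features (a deliberate simplification, not a CNN), with toy data and an arbitrary learning rate and epoch count.

```python
import numpy as np


def sigmoid(z: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-z))


def train_logistic(features: np.ndarray, labels: np.ndarray,
                   lr: float = 0.5, epochs: int = 500) -> tuple[np.ndarray, float]:
    """Gradient descent on mean binary cross-entropy.

    features: (n, d) array; labels: (n,) array of 0/1 ground-truth labels.
    """
    n, d = features.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(epochs):
        probs = sigmoid(features @ w + b)    # predicted probability of disease
        error = probs - labels               # d(loss)/d(logit) for each example
        w -= lr * (features.T @ error) / n   # step against the weight gradient
        b -= lr * float(error.mean())        # step against the bias gradient
    return w, b


def predict(features: np.ndarray, w: np.ndarray, b: float,
            threshold: float = 0.5) -> np.ndarray:
    return (sigmoid(features @ w + b) >= threshold).astype(int)


if __name__ == "__main__":
    # Toy 1-feature data: low values labeled "healthy" (0), high values "disease" (1).
    X = np.array([[-1.5], [-0.5], [0.5], [1.5]])
    y = np.array([0, 0, 1, 1])
    w, b = train_logistic(X, y)
    print(w, b, predict(X, w, b))
```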

Transfer learning (reusing a model pre-trained on a large general image dataset such as ImageNet and fine-tuning it on ophthalmic images) is widely used to achieve high accuracy even when training data are limited.

Research is also advancing on using GANs (generative adversarial networks) to generate synthetic images and artificially expand training data for rare diseases.

Multimodal AI that processes text (history-taking information) and images (fundus photographs and OCT) at the same time is being applied to ophthalmology as large language models (such as GPT-4) continue to advance3). While it can integrate more diverse information than a single-modality CNN, it has been shown that its ability to interpret images is still weaker than its understanding of text3).

Prediction of systemic disease from fundus photographs

Section titled “Prediction of systemic disease from fundus photographs”

Deep-learning analysis of fundus photographs has shown that it may be possible to predict systemic risk factors such as age, sex, systolic blood pressure, smoking history, and HbA1c from fundus photographs alone6). Moderate accuracy has also been reported in predicting future cardiovascular events (myocardial infarction and stroke), drawing attention to the possibility that fundus photographs may serve as a window into overall health. AI models for predicting dementia, kidney disease, and anemia are also still in the research stage6).

Integration with smartphone fundus cameras

Section titled “Integration with smartphone fundus cameras”

Using fundus photography with a small clip-on lens attached to a smartphone, together with AI analysis, has been shown to make DR screening practical in patients with diabetes in India7). Both sensitivity and specificity have been comparable to those of specialized fundus cameras, and AI screening combined with low-cost general-purpose devices could help spread use in developing countries and rural areas.

By combining AI screening with telemedicine, improvement in ophthalmic access in remote and developing regions is expected. Even in facilities without an eye specialist, AI can perform initial screening and send only positive cases for remote review by a specialist, allowing more efficient use of medical resources.
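The resource saving from sending only AI-positive cases for remote review can be estimated from sensitivity, specificity, and prevalence: the flagged share is true positives plus false positives, while false negatives are diseased patients the workflow would not send. A minimal sketch, reusing the IDx-DR sensitivity and specificity quoted earlier but assuming a hypothetical 10% prevalence and 10,000 screened patients; real prevalence varies by population, and the function name is illustrative.

```python
def triage_workload(n_screened: int, sensitivity: float,
                    specificity: float, prevalence: float) -> dict[str, int]:
    """Expected counts when only AI-positive cases go to a remote specialist."""
    diseased = n_screened * prevalence
    healthy = n_screened - diseased
    true_pos = sensitivity * diseased
    false_pos = (1 - specificity) * healthy
    false_neg = (1 - sensitivity) * diseased
    return {
        "sent_for_review": round(true_pos + false_pos),          # specialist workload
        "not_sent": round(n_screened - true_pos - false_pos),
        "missed_cases": round(false_neg),                        # diseased but AI-negative
    }


if __name__ == "__main__":
    result = triage_workload(10_000, sensitivity=0.872, specificity=0.907, prevalence=0.10)
    print(result)
```

Under these assumptions, the specialist reviews roughly one in six screened patients instead of all of them, at the cost of the missed cases the sensitivity implies.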

Research is progressing on AI that can predict in advance treatment response to anti-VEGF therapy (ranibizumab, aflibercept, faricimab, etc.) and suggest the best dosing plan for each patient. Models that predict treatment effect from OCT images may help reduce the number of injections and improve visual prognosis.

Applications of generative AI to patient education and interview support

Section titled “Applications of generative AI to patient education and interview support”

Large language models (such as GPT-4) are being studied for uses such as explaining diseases to patients, preparing informed consent documents, and assisting with interviews3). However, challenges remain in preventing errors and bias in medical information and in maintaining the doctor-patient relationship. It is not recommended for patients to rely only on chatbots to make decisions about self-diagnosis or self-treatment3).

  1. Wu JH, Liu TYA, Hsu WT, et al. Performance and limitation of machine learning algorithms for diabetic retinopathy screening: meta-analysis. J Med Internet Res. 2021;23(11):e23863.

  2. Abràmoff MD, Lavin PT, Birch M, Shah N, Folk JC. Pivotal trial of an autonomous AI-based diagnostic system for detection of diabetic retinopathy in primary care offices. NPJ Digit Med. 2018;1:39. doi:10.1038/s41746-018-0040-6. PMID:31304320; PMCID:PMC6550188.

  3. Mihalache A, Popovic MM, Guo MZ, et al. Performance of an upgraded artificial intelligence chatbot for ophthalmic knowledge assessment. JAMA Ophthalmol. 2024;142(3):234-241.

  4. Olvera-Barrios A, Heeren TF, Balaskas K, et al. Diagnostic accuracy of diabetic retinopathy grading by an artificial intelligence-enabled algorithm compared with a human standard reference. Diabetologia. 2023;66(5):857-866.

  5. Brown JM, Campbell JP, Beers A, et al. Automated diagnosis of plus disease in retinopathy of prematurity using deep convolutional neural networks. JAMA Ophthalmol. 2018;136(7):803-810.

  6. Poplin R, Varadarajan AV, Blumer K, Liu Y, McConnell MV, Corrado GS, et al. Prediction of cardiovascular risk factors from retinal fundus photographs via deep learning. Nat Biomed Eng. 2018;2(3):158-164. doi:10.1038/s41551-018-0195-0. PMID:31015713.

  7. Rajalakshmi R, Subashini R, Anjana RM, et al. Automated diabetic retinopathy detection in smartphone-based fundus photography using artificial intelligence. Eye. 2018;32(6):1138-1144.

  8. Ting DSW, Cheung CY, Lim G, Tan GSW, Quang ND, Gan A, et al. Development and Validation of a Deep Learning System for Diabetic Retinopathy and Related Eye Diseases Using Retinal Images From Multiethnic Populations With Diabetes. JAMA. 2017;318(22):2211-2223. doi:10.1001/jama.2017.18152. PMID:29234807; PMCID:PMC5820739.
