Four advanced large language models (LLMs) were shown an image that initially appeared to be a mauve-colored rock. However, it was actually a potential eye tumor, and the models had to determine its location, origin, and extent.
LLaVA-Med incorrectly identified the tumor as being in the inner lining of the cheek, while LLaVA suggested it was in the breast, which was even more inaccurate. GPT-4V provided a lengthy, vague response and failed to identify its location. However, PathChat, a newly developed, pathology-focused LLM, correctly identified the tumor in the eye, explaining that it could be significant and lead to vision loss.
PathChat was developed in the Mahmood Lab at Brigham and Women’s Hospital and signifies a breakthrough in computational pathology. It acts as a consultant for human pathologists, helping to identify, assess, and diagnose tumors and other serious conditions.
PathChat performs significantly better than leading models on multiple-choice diagnostic questions and generates clinically relevant responses to open-ended questions. It is now available through an exclusive license with the Boston-based biomedical AI company, Modella AI.
Richard Chen, Modella’s founding CTO, explained in a demo video that PathChat 2 is a multimodal large language model that understands pathology images and clinically relevant text, enabling it to converse with pathologists.
In developing PathChat, researchers adapted a vision encoder for pathology, combined it with a pre-trained LLM, and fine-tuned the combined model on visual-language instructions and question-answer exchanges. The questions spanned 54 diagnoses across 11 major pathology practices and organ sites, and each was evaluated under two strategies: the image alone with multiple-choice options, and the image accompanied by additional clinical context such as patient sex, age, clinical history, and radiology findings.
PathChat achieved 78% accuracy with image-only prompts and 89.5% accuracy with clinical context. The model could summarize, classify, and caption; describe notable morphological details; and answer questions that typically require background knowledge in pathology and general medicine.
Compared against GPT-4V, the open-source LLaVA model, and the biomedical domain-specific LLaVA-Med, PathChat outperformed all three in both evaluation settings. With image-only prompts, it scored more than 52% better than LLaVA and over 63% better than LLaVA-Med; with clinical context, it performed 39% better than LLaVA and nearly 61% better than LLaVA-Med. Against GPT-4V, it performed over 53% better with image-only prompts and 27% better with clinical context.
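The "X% better" figures above are relative improvements over each baseline's accuracy, not absolute percentage-point gaps. As a hedged illustration, using PathChat's reported 78% image-only accuracy against a made-up baseline (the study's actual baseline accuracies are not given here), the arithmetic works like this:

```python
def relative_improvement(model_acc: float, baseline_acc: float) -> float:
    """Relative improvement of a model over a baseline, as a percentage."""
    return (model_acc - baseline_acc) / baseline_acc * 100

# Illustrative only: 78% accuracy vs a *hypothetical* 51% baseline
print(round(relative_improvement(0.78, 0.51), 1))  # → 52.9
```

A 52.9% relative improvement here corresponds to a 27-point absolute gap, which is why relative figures can look much larger than the underlying accuracy difference.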
Faisal Mahmood, an associate professor of pathology at Harvard Medical School, stated that previous AI models for pathology were typically developed for specific diseases or tasks and couldn’t adapt beyond their training. PathChat, however, moves towards general pathology intelligence, acting as an interactive assistant for pathologists across various areas of pathology, tasks, and scenarios.
In one example, PathChat was presented with an image-only, multiple-choice prompt of a 63-year-old male with chronic cough and weight loss over five months. The model correctly identified the condition as lung adenocarcinoma. In another example, with additional clinical context, it correctly identified a liver tumor as metastasis, indicating the spread of melanoma.
Mahmood highlighted that the most surprising result was the model's ability to adapt to tasks like differential diagnosis and tumor grading, even without labeled training data for those tasks. This marked a significant shift from previous research, which required numerous labeled examples per task to achieve reasonable performance.
PathChat’s practical applications include human-in-the-loop diagnosis, in which an AI-assisted assessment is refined as the pathologist supplies additional context. For instance, the model could analyze histopathology images, describe tissue structure, and flag features of malignancy; the pathologist could then provide more information, receive a differential diagnosis, and order further testing before reaching a final diagnosis.
PathChat could be particularly valuable in complex or resource-limited settings. In research, the AI copilot could help summarize and interpret large datasets of images, enhancing automated quantification and morphological marker interpretation.
While PathChat showcases significant advancements, issues like hallucinations could be addressed with reinforcement learning from human feedback. Continuous model training with updated knowledge is also recommended to keep up with evolving medical terminology and guidelines.
The potential applications of an interactive, multimodal AI copilot for pathology are vast, and LLMs and generative AI are opening new frontiers in computational pathology with an emphasis on natural language and human interaction.
PathChat’s capabilities could extend to other medical imaging specialties and data types such as genomics and proteomics. Researchers plan to gather extensive human feedback to align the model’s behavior with human intent and integrate it with clinical databases for improved patient information retrieval. They also aim to work with expert pathologists to evaluate PathChat comprehensively across diverse disease models and workflows.