Beyond Fluency: A Clinical Benchmark and Anomaly-Enhanced Baseline for Spine MRI Report Generation

Jun 2026

Palau B.

Vogt F.

Laslo D.

Li H.

Konukoglu E.

Maria Monzon

Shared last authorship

Jutzeler C.R.

Shared last authorship



Abstract

Radiology reporting is time-consuming and subject to inter-rater variability, making automated report generation an attractive clinical application for Vision-Language Models (VLMs). We benchmark state-of-the-art VLMs on lumbar spine MRI with a focus on diagnostic accuracy and demonstrate that standard lexical and semantic metrics poorly reflect clinical correctness: fluent, well-structured reports can score highly while containing clinically meaningful diagnostic errors. To address this failure mode, we propose an architecture-agnostic framework that augments VLM inputs with spatially localized, disc-level anomaly heatmaps generated by a semi-supervised U-Net++ model. These heatmaps both improve anatomical sensitivity through explicit visual grounding and provide an independent interpretability output for clinical oversight, moving us closer to diagnostically reliable, visually grounded VLMs for lumbar spine MRI interpretation.
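As a toy illustration of why lexical overlap can miss diagnostic errors, the sketch below computes a simple unigram F1 between two report sentences. The sentences and the `unigram_f1` helper are hypothetical and are not taken from the paper; unigram F1 only stands in for the family of lexical metrics, not for the specific metrics the benchmark evaluates.

```python
from collections import Counter


def unigram_f1(candidate: str, reference: str) -> float:
    """Token-overlap F1 between two texts (a simple stand-in for lexical metrics)."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    # Multiset intersection counts each shared token at most as often as it
    # appears in both texts.
    overlap = sum((Counter(cand) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(cand)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


# Hypothetical report sentences: one word differs, and the finding is inverted.
reference = "severe disc herniation at L4-L5 with nerve root compression"
generated = "no disc herniation at L4-L5 with nerve root compression"

score = unigram_f1(generated, reference)
print(f"unigram F1 = {score:.2f}")  # prints "unigram F1 = 0.89"
```

Eight of the nine tokens match, so the overlap score is high even though the generated sentence states the opposite finding.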

Type

Conference paper

Publication

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops — CV4Clinic

Last updated on Jun 2026

Medical-Imaging, Vision-Language, Spine, Report Generation

Authors

Maria Monzon (she/her)

Computer Vision & Medical AI Researcher

PhD candidate at ETH Zurich developing robust and trustworthy deep learning for medical image analysis — spine and cardiac MRI, multimodal biomedical data, and uncertainty quantification. Previously a computer-vision researcher at BASF, where I deployed models to production in regulated, GLP-certified environments. I care about efficient code and reproducible research.
