Experimental
Computational pathology research lab / 2025-2026
Problem
Prostate biopsy grading works on gigapixel whole-slide images in which small tissue regions can decide the grade, artifacts can distort attention, and predicted ISUP grades need careful evaluation and explainability.
Solution
Store patch coordinates, extract pathology-model embeddings, suppress artifact-like tokens with spatial-variance and Median/MAD gating, train ABMIL/TransMIL classifiers, and project attention into heatmaps.
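The gating step can be illustrated with a minimal, standard-library sketch. It is not the project's CustomGatingLayer v2: the function names, the thresholds `var_threshold` and `k`, and the choice to drop a token when either test fires are all assumptions made for illustration. The only ideas taken from the description above are the two signals, low spatial variance and a Median/MAD outlier test on token norms.

```python
import math
from statistics import median


def mad_outlier_mask(norms, k=3.5):
    """Flag values whose robust z-score |x - median| / (1.4826 * MAD) exceeds k."""
    med = median(norms)
    mad = median(abs(x - med) for x in norms)
    scale = 1.4826 * mad  # makes MAD comparable to a standard deviation for normal data
    if scale == 0:
        # All norms (or at least half of them) are identical: no robust scale, flag nothing.
        return [False] * len(norms)
    return [abs(x - med) / scale > k for x in norms]


def gate_tokens(embeddings, spatial_variance, var_threshold=1e-3, k=3.5):
    """Return a keep-mask over patch tokens.

    Assumption: a token is suppressed if its patch has low pixel variance
    OR its embedding norm is a Median/MAD outlier. The real layer may
    combine the two signals differently.
    """
    norms = [math.sqrt(sum(v * v for v in emb)) for emb in embeddings]
    outlier = mad_outlier_mask(norms, k)
    low_var = [sv < var_threshold for sv in spatial_variance]
    return [not (o or lv) for o, lv in zip(outlier, low_var)]


# Toy example: the fourth patch is nearly flat, the fifth has an extreme norm.
toy_embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.9, 0.1], [50.0, 50.0]]
toy_variance = [0.5, 0.5, 0.5, 0.0001, 0.5]
keep_mask = gate_tokens(toy_embeddings, toy_variance)
```

Median and MAD are used instead of mean and standard deviation because a handful of extreme token norms would otherwise inflate the scale estimate and hide themselves.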
Key Features
OpenSlide preprocessing for SVS/TIFF/NDPI slides with HSV-Otsu tissue segmentation and 256x256 patch tessellation
Coordinate-only patch registry that avoids saving massive patch folders while preserving MPP, coordinates, and spatial variance
GPU feature extraction with Macenko color normalization and frozen HistoEncoder or Virchow2 pathology backbones
CustomGatingLayer v2 that combines low spatial variance with robust Median/MAD token-norm outlier detection
ABMIL and TransMIL slide-level classifiers with focal loss, patch dropout, cosine LR, gradient accumulation, and QWK-based model selection
Best documented run: HistoEncoder + 4-layer TransMIL + gating v2 reaching 0.9151 validation QWK at epoch 41
Cribriform morphology path that clusters high-attention embeddings when predicted ISUP grade suggests Gleason Pattern 4 risk
Explainability path for attention heatmaps, WSI overlays, class probabilities, cribriform fields, and clinical-report JSON
Experiment artifacts for training curves, QWK curves, gating analysis, confusion matrices, configs, and logs
Test suite covering preprocessing, patch extraction, feature extraction, training, cribriform detection, and explainability
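Model selection above relies on quadratic weighted kappa (QWK), which penalizes a prediction by the squared distance between predicted and true grade. The following is a small reference sketch of the metric in plain Python, not the project's evaluation code. It assumes integer labels in `0..num_classes-1`; the default of six classes is an assumption, as is the convention of returning 1.0 when the expected disagreement is zero.

```python
def quadratic_weighted_kappa(y_true, y_pred, num_classes=6):
    """Cohen's kappa with quadratic weights w_ij = (i - j)^2 / (K - 1)^2."""
    n = num_classes
    # Confusion matrix: rows are true grades, columns are predicted grades.
    observed = [[0] * n for _ in range(n)]
    for t, p in zip(y_true, y_pred):
        observed[t][p] += 1

    total = len(y_true)
    hist_true = [sum(row) for row in observed]
    hist_pred = [sum(observed[i][j] for i in range(n)) for j in range(n)]

    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(n):
        for j in range(n):
            weight = ((i - j) ** 2) / ((n - 1) ** 2)
            # Expected count under independence of the two marginals.
            expected = hist_true[i] * hist_pred[j] / total
            weighted_observed += weight * observed[i][j]
            weighted_expected += weight * expected

    if weighted_expected == 0:
        # Degenerate case (all labels in one class); convention chosen here.
        return 1.0
    return 1.0 - weighted_observed / weighted_expected


perfect = quadratic_weighted_kappa([0, 1, 2, 3], [0, 1, 2, 3], num_classes=4)
chance = quadratic_weighted_kappa([0, 0, 1, 1], [0, 1, 0, 1], num_classes=2)
```

Because the weights grow with the squared grade difference, confusing two adjacent grades costs far less than confusing the lowest and highest grades, which suits an ordinal target better than plain accuracy.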
Architecture Map
Mermaid
flowchart TB
    WSI["Raw prostate WSI: SVS/TIFF/NDPI"] --> Reader["OpenSlide WSI reader"]
    Reader --> Tissue["HSV-Otsu tissue segmentation"]
    Tissue --> Registry["Patch registry: coords, MPP, spatial variance"]
    Registry --> FeatureDataset["On-the-fly patch dataset"]
    FeatureDataset --> Macenko["GPU Macenko color normalization"]
    Macenko --> Backbone["Frozen pathology foundation model"]
    Backbone --> Histo["HistoEncoder 384d"]
    Backbone --> Virchow["Virchow2 1280d"]
    Histo --> Embeddings["Slide embeddings .pt"]
    Virchow --> Embeddings
    Embeddings --> Gate["CustomGatingLayer v2: variance + Median/MAD"]
    Gate --> MIL["MIL aggregator: ABMIL or TransMIL"]
    MIL --> Classifier["ISUP classifier head"]
    Classifier --> Metrics["QWK, accuracy, confusion matrices"]
    MIL --> Attention["Attention weights"]
    Attention --> Cribriform["Cribriform clustering on high-attention patches"]
    Attention --> Heatmap["Attention heatmap + WSI overlay"]
    Cribriform --> Report["Clinical JSON report"]
    Heatmap --> Report
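The last branch of the diagram, from attention weights to a heatmap, depends on the patch coordinates kept in the registry. A rough sketch of that projection follows, under several assumptions that are not confirmed by the project: coordinates are top-left pixel positions at the base level, patches lie on a non-overlapping 256-pixel grid, and attention is min-max normalized per slide. The function name and signature are hypothetical.

```python
def attention_heatmap(coords, attention, slide_size, patch_size=256):
    """Project per-patch attention onto a coarse grid, one cell per patch.

    coords      list of (x, y) top-left pixel positions (assumed base level)
    attention   list of attention weights, same order as coords
    slide_size  (width, height) of the slide in pixels
    Cells without a tissue patch stay None so background is distinguishable
    from a tissue patch that received the lowest attention.
    """
    width, height = slide_size
    cols = -(-width // patch_size)   # ceiling division
    rows = -(-height // patch_size)
    grid = [[None] * cols for _ in range(rows)]

    lo, hi = min(attention), max(attention)
    span = hi - lo
    for (x, y), a in zip(coords, attention):
        # Min-max normalization to [0, 1]; a constant map collapses to 1.0.
        value = (a - lo) / span if span else 1.0
        grid[y // patch_size][x // patch_size] = value
    return grid


# Toy 512x512 "slide" with three tissue patches and one background cell.
toy_grid = attention_heatmap(
    coords=[(0, 0), (256, 0), (0, 256)],
    attention=[0.1, 0.3, 0.6],
    slide_size=(512, 512),
)
```

A grid like this can be upsampled and alpha-blended over a slide thumbnail to produce the overlay; that rendering step is omitted here.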