PathOS / Prostate Lesion Detection

Experimental / Computational pathology research lab / 2025-2026

Problem

Prostate biopsy grading works on gigapixel whole-slide images where small tissue regions can decide the grade and artifacts can distort model attention. ISUP grade predictions therefore need careful evaluation and explainable outputs.

Solution

Store patch coordinates, extract pathology-model embeddings, suppress artifact-like tokens with spatial-variance and Median/MAD gating, train ABMIL/TransMIL classifiers, and project attention into heatmaps.
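
The gating step described above can be sketched roughly as follows. This is an illustration only, not the CustomGatingLayer v2 code: the function name `gate_tokens`, the thresholds `var_floor` and `z_max`, and the rule that either criterion alone rejects a token are assumptions made for the example.

```python
import numpy as np

def gate_tokens(embeddings, spatial_variance, var_floor=1e-3, z_max=3.5):
    """Return a boolean keep-mask over patch tokens.

    Hypothetical sketch: var_floor, z_max, and the OR-combination of the
    two rejection rules are assumptions, not documented PathOS settings.
    """
    norms = np.linalg.norm(embeddings, axis=1)       # token L2 norms
    median = np.median(norms)
    mad = np.median(np.abs(norms - median))          # median absolute deviation
    # 0.6745 rescales MAD so the score is comparable to a z-score
    # when the norms are roughly normally distributed.
    robust_z = 0.6745 * (norms - median) / max(mad, 1e-12)
    norm_outlier = np.abs(robust_z) > z_max          # unusually large or small norm
    low_variance = spatial_variance < var_floor      # flat, blank-looking patches
    return ~(norm_outlier | low_variance)
```

The median and MAD are used instead of the mean and standard deviation because a few extreme artifact tokens would otherwise inflate the very statistics meant to detect them.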

Key Features

OpenSlide preprocessing for SVS/TIFF/NDPI slides with HSV-Otsu tissue segmentation and 256x256 patch tessellation

Coordinate-only patch registry that avoids saving massive patch folders while preserving MPP, coordinates, and spatial variance

GPU feature extraction with Macenko color normalization and frozen HistoEncoder or Virchow2 pathology backbones

CustomGatingLayer v2 that combines low spatial variance with robust Median/MAD token-norm outlier detection

ABMIL and TransMIL slide-level classifiers with focal loss, patch dropout, cosine LR, gradient accumulation, and QWK-based model selection

Best documented run: HistoEncoder + 4-layer TransMIL + gating v2 reaching 0.9151 validation QWK at epoch 41

Cribriform morphology path that clusters high-attention embeddings when predicted ISUP grade suggests Gleason Pattern 4 risk

Explainability path for attention heatmaps, WSI overlays, class probabilities, cribriform fields, and clinical-report JSON

Experiment artifacts for training curves, QWK curves, gating analysis, confusion matrices, configs, and logs

Test suite covering preprocessing, patch extraction, feature extraction, training, cribriform detection, and explainability
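
For readers unfamiliar with the model-selection metric named in the feature list, a minimal NumPy sketch of Quadratic Weighted Kappa follows. It is not taken from the PathOS codebase; the function name and the default of six ordinal classes (benign plus ISUP grades 1 to 5) are assumptions.

```python
import numpy as np

def quadratic_weighted_kappa(y_true, y_pred, n_classes=6):
    """Cohen's kappa with quadratic weights for ordinal labels 0..n_classes-1.

    n_classes=6 is an assumed label layout, not a documented PathOS setting.
    """
    y_true = np.asarray(y_true, dtype=int)
    y_pred = np.asarray(y_pred, dtype=int)
    # Observed confusion matrix: rows are true labels, columns are predictions.
    observed = np.zeros((n_classes, n_classes), dtype=float)
    np.add.at(observed, (y_true, y_pred), 1.0)
    # Penalty grows with the squared distance between true and predicted grade.
    idx = np.arange(n_classes)
    weights = (idx[:, None] - idx[None, :]) ** 2 / (n_classes - 1) ** 2
    # Confusion matrix expected by chance, given the two label marginals.
    expected = np.outer(observed.sum(axis=1), observed.sum(axis=0)) / observed.sum()
    denom = (weights * expected).sum()
    if denom == 0:
        return 1.0  # only one class present on both sides: perfect agreement
    return 1.0 - (weights * observed).sum() / denom
```

Because the penalty is quadratic in grade distance, confusing ISUP 1 with ISUP 5 costs far more than confusing adjacent grades, which is why the metric suits ordinal grading better than plain accuracy.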

Architecture Map

Mermaid

flowchart TB
  WSI["Raw prostate WSI: SVS/TIFF/NDPI"] --> Reader["OpenSlide WSI reader"]
  Reader --> Tissue["HSV-Otsu tissue segmentation"]
  Tissue --> Registry["Patch registry: coords, MPP, spatial variance"]
  Registry --> FeatureDataset["On-the-fly patch dataset"]
  FeatureDataset --> Macenko["GPU Macenko color normalization"]
  Macenko --> Backbone["Frozen pathology foundation model"]
  Backbone --> Histo["HistoEncoder 384d"]
  Backbone --> Virchow["Virchow2 1280d"]
  Histo --> Embeddings["Slide embeddings .pt"]
  Virchow --> Embeddings
  Embeddings --> Gate["CustomGatingLayer v2: variance + Median/MAD"]
  Gate --> MIL["MIL aggregator: ABMIL or TransMIL"]
  MIL --> Classifier["ISUP classifier head"]
  Classifier --> Metrics["QWK, accuracy, confusion matrices"]
  MIL --> Attention["Attention weights"]
  Attention --> Cribriform["Cribriform clustering on high-attention patches"]
  Attention --> Heatmap["Attention heatmap + WSI overlay"]
  Cribriform --> Report["Clinical JSON report"]
  Heatmap --> Report
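
The Attention to Heatmap edge in the flowchart relies on the stored patch coordinates. A minimal sketch of that projection is shown below, assuming level-0 top-left `(x, y)` coordinates, non-overlapping 256-pixel patches, and min-max scaling of attention; the function name and these choices are illustrative assumptions rather than the PathOS implementation.

```python
import numpy as np

def attention_to_heatmap(coords, attention, patch_size=256):
    """Scatter per-patch attention onto a coarse grid, one cell per patch.

    coords: (N, 2) level-0 (x, y) top-left patch corners (assumed layout).
    attention: (N,) attention weights for the same patches.
    """
    coords = np.asarray(coords, dtype=int)
    attention = np.asarray(attention, dtype=float)
    cols = coords[:, 0] // patch_size
    rows = coords[:, 1] // patch_size
    # NaN marks grid cells with no tissue patch (background).
    heatmap = np.full((rows.max() + 1, cols.max() + 1), np.nan)
    # Min-max scale to [0, 1] so a colormap can be applied directly.
    lo, hi = attention.min(), attention.max()
    scaled = (attention - lo) / (hi - lo) if hi > lo else np.zeros_like(attention)
    heatmap[rows, cols] = scaled
    return heatmap
```

The resulting grid is one pixel per patch, so an overlay would typically resize it to the slide thumbnail's dimensions before blending.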

PathOS Screenshots


Training and validation Quadratic Weighted Kappa curves for the best PathOS HistoEncoder TransMIL run. — Champion HistoEncoder + 4-layer TransMIL run: validation QWK reaches 0.9151 at epoch 41.

Training and validation focal loss curves for the best PathOS HistoEncoder TransMIL run. — Focal-loss curves for the best gated TransMIL experiment, used to identify the model selection point and later overfitting.

PathOS gating analysis scatter plot comparing spatial variance and token L2 norm. — Custom gating analysis: spatial variance vs token norm separates valid tissue tokens from rejected artifact-like patches.

PathOS token norm histogram before and after Median MAD gating. — Median/MAD gating evidence showing token norm thresholds used to suppress outlier patches before attention pooling.

External evaluation confusion matrix for the PathOS HistoEncoder TransMIL model. — TransMIL confusion matrix from the external evaluation framework, included as research evidence rather than a diagnostic claim.