
PathOS / Prostate Lesion Detection

Whole-slide-image pipeline for ISUP grade prediction with tissue segmentation, patch registries, foundation-model embeddings, artifact gating, MIL aggregation, cribriform analysis, heatmaps, and experiment tracking.

Experimental / Computational pathology research lab / 2025-2026

Problem

Prostate biopsy grading works on gigapixel whole-slide images in which small tissue regions can decide the grade, artifacts can distort model attention, and ordinal ISUP grades demand careful evaluation and explainable predictions.

Solution

Store patch coordinates rather than patch images, extract embeddings from frozen pathology foundation models, suppress artifact-like tokens with spatial-variance and Median/MAD gating, train ABMIL/TransMIL slide-level classifiers, and project attention weights back onto the slide as heatmaps.
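The spatial-variance plus Median/MAD gating idea can be illustrated with a minimal NumPy sketch. It assumes hard rejection (a boolean keep-mask), a robust z-score on the L2 norm of each patch embedding, and hypothetical thresholds `var_min` and `k`; the name `gate_tokens` is illustrative, and the project's CustomGatingLayer v2 may instead down-weight tokens or use different statistics.

```python
import numpy as np


def gate_tokens(embeddings: np.ndarray, spatial_var: np.ndarray,
                var_min: float = 10.0, k: float = 3.5) -> np.ndarray:
    """Return a boolean keep-mask over N patch tokens (hypothetical helper).

    embeddings: (N, D) patch embeddings; spatial_var: (N,) per-patch pixel variance.
    var_min and k are illustrative defaults, not the project's tuned values.
    """
    norms = np.linalg.norm(embeddings, axis=1)
    median = np.median(norms)
    mad = np.median(np.abs(norms - median))
    # 1.4826 * MAD approximates a standard deviation for normally distributed data;
    # the epsilon avoids division by zero when most norms are identical.
    robust_z = np.abs(norms - median) / (1.4826 * mad + 1e-8)
    keep_norm = robust_z <= k            # reject token-norm outliers
    keep_var = spatial_var >= var_min    # reject near-uniform (low-variance) patches
    return keep_norm & keep_var


# Usage: token 5 has an inflated norm, patch 2 is nearly uniform.
emb = np.zeros((6, 2))
emb[:, 0] = [1.0, 1.1, 0.9, 1.05, 0.95, 10.0]
var = np.array([50.0, 50.0, 2.0, 50.0, 50.0, 50.0])
mask = gate_tokens(emb, var)
```

Using the median and MAD instead of mean and standard deviation keeps the threshold from being dragged upward by the very outlier tokens it is meant to catch.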

Key Features

OpenSlide preprocessing for SVS/TIFF/NDPI slides with HSV-Otsu tissue segmentation and 256x256 patch tessellation
Coordinate-only patch registry that records coordinates, MPP, and spatial variance instead of writing large folders of patch images to disk
GPU feature extraction with Macenko color normalization and frozen HistoEncoder or Virchow2 pathology backbones
CustomGatingLayer v2 that combines low spatial variance with robust Median/MAD token-norm outlier detection
ABMIL and TransMIL slide-level classifiers with focal loss, patch dropout, cosine LR, gradient accumulation, and QWK-based model selection
Best documented run: HistoEncoder + 4-layer TransMIL + gating v2 reaching 0.9151 validation QWK at epoch 41
Cribriform morphology path that clusters high-attention embeddings when the predicted ISUP grade suggests Gleason Pattern 4 risk
Explainability path for attention heatmaps, WSI overlays, class probabilities, cribriform fields, and clinical-report JSON
Experiment artifacts for training curves, QWK curves, gating analysis, confusion matrices, configs, and logs
Test suite covering preprocessing, patch extraction, feature extraction, training, cribriform detection, and explainability
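Tissue segmentation and the coordinate-only registry can be sketched on a low-resolution RGB thumbnail (for example one obtained with openslide-python's `OpenSlide.get_thumbnail`). The sketch assumes Otsu thresholding on the HSV saturation channel, an integer thumbnail downsample that divides the 256-pixel patch size, and a hypothetical `min_tissue` fraction; all function names are illustrative, and the per-patch spatial variance the real registry also stores is omitted.

```python
import numpy as np


def hsv_saturation(rgb: np.ndarray) -> np.ndarray:
    """HSV saturation of an (H, W, 3) uint8 RGB image, in [0, 1]."""
    rgb = rgb.astype(np.float64) / 255.0
    cmax = rgb.max(axis=-1)
    cmin = rgb.min(axis=-1)
    return np.where(cmax > 0, (cmax - cmin) / np.where(cmax > 0, cmax, 1.0), 0.0)


def otsu_threshold(values: np.ndarray, bins: int = 256) -> float:
    """Otsu threshold on values in [0, 1]; returns the lower edge of the upper class."""
    hist, edges = np.histogram(values, bins=bins, range=(0.0, 1.0))
    centers = (edges[:-1] + edges[1:]) / 2
    w0 = np.cumsum(hist)                  # count at or below each candidate split
    w1 = w0[-1] - w0                      # count above the split
    s0 = np.cumsum(hist * centers)
    mu0 = s0 / np.maximum(w0, 1)
    mu1 = (s0[-1] - s0) / np.maximum(w1, 1)
    between = w0 * w1 * (mu0 - mu1) ** 2  # between-class variance (unnormalized)
    return float(edges[np.argmax(between) + 1])


def tissue_patch_registry(thumb_rgb, downsample, patch_size=256, min_tissue=0.5, mpp=None):
    """Hypothetical registry: level-0 (x, y) corners of tissue-bearing patches."""
    sat = hsv_saturation(thumb_rgb)
    tissue = sat >= otsu_threshold(sat)
    cell = patch_size // downsample       # patch footprint in thumbnail pixels
    h, w = tissue.shape
    coords = [
        (x * downsample, y * downsample)
        for y in range(0, h - cell + 1, cell)
        for x in range(0, w - cell + 1, cell)
        if tissue[y:y + cell, x:x + cell].mean() >= min_tissue
    ]
    return {"mpp": mpp, "patch_size": patch_size,
            "coords": np.array(coords, dtype=np.int64).reshape(-1, 2)}
```

Because only coordinates are kept, patch pixels can be re-read from the slide on demand when the on-the-fly feature dataset needs them.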
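The focal loss and cosine learning-rate schedule listed among the training features can each be written in a few lines of NumPy. This is a generic sketch: `gamma=2.0` and `min_lr` are common defaults rather than values documented for this project, and patch dropout and gradient accumulation are left out.

```python
import math

import numpy as np


def focal_loss(logits: np.ndarray, targets: np.ndarray, gamma: float = 2.0) -> float:
    """Mean multiclass focal loss, FL = -(1 - p_t)^gamma * log(p_t).

    logits: (B, C) unnormalized class scores; targets: (B,) integer class labels.
    """
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    log_pt = log_probs[np.arange(len(targets)), targets]  # log-prob of the true class
    pt = np.exp(log_pt)
    return float(np.mean(-((1.0 - pt) ** gamma) * log_pt))


def cosine_lr(step: int, total_steps: int, base_lr: float, min_lr: float = 0.0) -> float:
    """Cosine annealing from base_lr at step 0 down to min_lr at total_steps."""
    progress = min(step, total_steps) / total_steps
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))
```

With `gamma=0` the focal loss reduces to ordinary cross-entropy; larger `gamma` shrinks the loss on confidently correct slides so that hard examples dominate the gradient.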

Architecture Map

Mermaid
flowchart TB
  WSI["Raw prostate WSI: SVS/TIFF/NDPI"] --> Reader["OpenSlide WSI reader"]
  Reader --> Tissue["HSV-Otsu tissue segmentation"]
  Tissue --> Registry["Patch registry: coords, MPP, spatial variance"]
  Registry --> FeatureDataset["On-the-fly patch dataset"]
  FeatureDataset --> Macenko["GPU Macenko color normalization"]
  Macenko --> Backbone["Frozen pathology foundation model"]
  Backbone --> Histo["HistoEncoder 384d"]
  Backbone --> Virchow["Virchow2 1280d"]
  Histo --> Embeddings["Slide embeddings .pt"]
  Virchow --> Embeddings
  Embeddings --> Gate["CustomGatingLayer v2: variance + Median/MAD"]
  Gate --> MIL["MIL aggregator: ABMIL or TransMIL"]
  MIL --> Classifier["ISUP classifier head"]
  Classifier --> Metrics["QWK, accuracy, confusion matrices"]
  MIL --> Attention["Attention weights"]
  Attention --> Cribriform["Cribriform clustering on high-attention patches"]
  Attention --> Heatmap["Attention heatmap + WSI overlay"]
  Cribriform --> Report["Clinical JSON report"]
  Heatmap --> Report
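The ABMIL branch of the map (MIL aggregator, attention weights, classifier head) can be sketched as a NumPy forward pass of gated attention pooling in the style of Ilse et al. (2018). Weight shapes, the six-class head, and hard masking of gated-out tokens are illustrative assumptions, and `gated_abmil_forward` is a hypothetical name; the TransMIL variant, which adds self-attention layers across patches, is not shown.

```python
import numpy as np


def gated_abmil_forward(h, V, U, w, W_cls, keep=None):
    """Gated attention MIL pooling (sketch).

    h: (N, D) patch embeddings (e.g. D=384 for HistoEncoder)
    V, U: (L, D) attention projections; w: (L,) attention vector
    W_cls: (C, D) classifier head; keep: optional (N,) boolean gating mask
    Returns (logits of shape (C,), attention of shape (N,)).
    """
    content = np.tanh(h @ V.T)                    # (N, L)
    gate = 1.0 / (1.0 + np.exp(-(h @ U.T)))       # sigmoid gate, (N, L)
    scores = (content * gate) @ w                 # (N,) unnormalized attention
    if keep is not None:
        # Gated-out tokens get zero weight; if every token were masked the
        # softmax would be undefined (NaN), so a real pipeline needs a fallback.
        scores = np.where(keep, scores, -np.inf)
    scores = scores - scores.max()
    attn = np.exp(scores)
    attn = attn / attn.sum()                      # softmax over patches
    slide_vec = attn @ h                          # (D,) attention-weighted mean
    return W_cls @ slide_vec, attn


rng = np.random.default_rng(0)
h = rng.normal(size=(5, 8))
V, U = rng.normal(size=(4, 8)), rng.normal(size=(4, 8))
w, W_cls = rng.normal(size=4), rng.normal(size=(6, 8))
keep = np.array([True, True, False, True, False])
logits, attn = gated_abmil_forward(h, V, U, w, W_cls, keep)
```

Because the attention vector sums to one over the kept tokens, the same weights can feed both the heatmap and any ranking of high-attention patches.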
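Projecting attention back onto the slide only needs the registry coordinates. This sketch assumes level-0 top-left patch corners, a hypothetical output `downsample` of 32, and min-max scaling of attention to [0, 1]; `attention_heatmap` is an illustrative name. Each patch fills its footprint in a coarse grid, and cells without patches stay NaN so they can render as transparent in a WSI overlay.

```python
import math

import numpy as np


def attention_heatmap(coords, attn, slide_wh, patch_size=256, downsample=32):
    """Scatter per-patch attention onto a coarse (H, W) grid.

    coords: (N, 2) level-0 (x, y) patch corners; attn: (N,) attention weights;
    slide_wh: level-0 (width, height) of the slide.
    """
    coords = np.asarray(coords, dtype=int)
    attn = np.asarray(attn, dtype=float)
    width, height = slide_wh
    heat = np.full((math.ceil(height / downsample), math.ceil(width / downsample)), np.nan)
    cell = max(1, patch_size // downsample)       # patch footprint in grid cells
    lo, hi = attn.min(), attn.max()
    scaled = (attn - lo) / (hi - lo) if hi > lo else np.ones_like(attn)
    for (x, y), a in zip(coords, scaled):
        r, c = y // downsample, x // downsample
        heat[r:r + cell, c:c + cell] = a
    return heat


heat = attention_heatmap([(0, 0), (256, 0), (0, 256)], [0.2, 0.5, 0.3], slide_wh=(512, 512))
```

Min-max scaling makes each slide's map use the full color range, which aids visual comparison of regions within a slide but hides absolute attention differences between slides.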

PathOS Screenshots

Champion HistoEncoder + 4-layer TransMIL run: validation QWK reaches 0.9151 at epoch 41.
Focal-loss curves for the best gated TransMIL experiment, used to identify the model selection point and later overfitting.
Custom gating analysis: spatial variance vs token norm separates valid tissue tokens from rejected artifact-like patches.
Median/MAD gating evidence showing token norm thresholds used to suppress outlier patches before attention pooling.
TransMIL confusion matrix from the external evaluation framework, included as research evidence rather than a diagnostic claim.
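Quadratic weighted kappa, the model-selection metric behind the reported validation scores, penalizes each prediction by the squared distance between predicted and true grade, which suits ordinal ISUP labels. A self-contained NumPy version can be written as follows, assuming six classes (0 for benign plus ISUP grade groups 1 to 5); the function name is illustrative, and scikit-learn's `cohen_kappa_score(..., weights="quadratic")` computes the same statistic.

```python
import numpy as np


def quadratic_weighted_kappa(y_true, y_pred, num_classes=6):
    """Cohen's kappa with quadratic disagreement weights (1 = perfect, 0 = chance)."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    observed = np.zeros((num_classes, num_classes))
    np.add.at(observed, (y_true, y_pred), 1)                 # confusion matrix
    idx = np.arange(num_classes)
    weights = (idx[:, None] - idx[None, :]) ** 2 / (num_classes - 1) ** 2
    # Expected counts under independence of true and predicted marginals.
    expected = np.outer(observed.sum(axis=1), observed.sum(axis=0)) / observed.sum()
    return 1.0 - (weights * observed).sum() / (weights * expected).sum()


near = quadratic_weighted_kappa([0, 1, 2, 3, 4, 5], [0, 1, 2, 3, 4, 4])
far = quadratic_weighted_kappa([0, 1, 2, 3, 4, 5], [0, 1, 2, 3, 4, 0])
```

With one error in six slides, calling a grade-5 slide grade 4 scores about 0.97, while calling it benign scores about 0.29, which is why QWK is preferred over plain accuracy for grading.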