Pneumonia Detection Lab

Experimental / Medical imaging ML lab / 2024-2025

Problem

Chest X-ray classification needs consistent preprocessing, and a demo built on it needs careful wording so that it does not imply clinical readiness.

Solution

Build and compare three educational model paths: logistic regression with one combined feature vector, logistic regression with separated Gabor/DCT/Fourier/PHOG features, and a CNN trained directly from image tensors.
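To make the difference between the two logistic regression inputs concrete, here is a minimal sketch of how one image's descriptors might be laid out for each path. The source does not specify descriptor sizes or how the separated columns are summarized, so `DESCRIPTOR_SIZES`, `fake_descriptors`, and the per-family mean are all assumptions for illustration.

```python
import numpy as np

# Hypothetical descriptor lengths; the source does not state the real sizes.
DESCRIPTOR_SIZES = {"gabor": 8, "dct": 16, "fourier": 16, "phog": 12}


def fake_descriptors(rng):
    """Stand-in for real Gabor/DCT/Fourier/PHOG outputs for one image."""
    return {name: rng.random(size) for name, size in DESCRIPTOR_SIZES.items()}


def combined_row(descriptors):
    """Combined path: concatenate every descriptor into one flat vector."""
    return np.concatenate([descriptors[name] for name in DESCRIPTOR_SIZES])


def separated_row(descriptors):
    """Separated path: one column per descriptor family (assumed: the mean)."""
    return {name: float(descriptors[name].mean()) for name in DESCRIPTOR_SIZES}


rng = np.random.default_rng(0)
desc = fake_descriptors(rng)
print(combined_row(desc).shape)     # (52,)
print(sorted(separated_row(desc)))  # ['dct', 'fourier', 'gabor', 'phog']
```

Either layout can be written as one row per image, which is roughly what the two CSV files in the architecture map would hold.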

Key Features

Dataset documented as 5,200 chest X-ray images across Normal and Pneumonia classes

Image preprocessing for grayscale loading, resizing to 128x128, normalization, and notebook-ready tensors

Feature extraction using Gabor filters, DCT, Fourier transform, and PHOG descriptors

Single-feature and multi-feature logistic regression notebooks with saved pickle models

CNN notebook with TensorFlow/Keras training curves and a saved .h5 model artifact

Confusion matrices, accuracy reports, PCA/decision-boundary plots, and training/validation charts

Tkinter inference interface with image upload, model selector, and Normal/Pneumonia lab output

Portfolio wording framed as an educational experiment, not a medical diagnostic product
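The preprocessing step listed above (grayscale, resize to 128x128, normalize, tensor output) could look roughly like the following NumPy-only sketch. It is not the lab's actual code: the nearest-neighbor resize stands in for whatever image library the notebooks use, the 8-bit pixel range is an assumption, and the function names are hypothetical.

```python
import numpy as np

TARGET_SIZE = 128  # the 128x128 size comes from the feature list above


def to_grayscale(image):
    """Collapse an H x W x 3 RGB array to H x W using standard luma weights."""
    if image.ndim == 2:
        return image.astype(np.float32)
    weights = np.array([0.299, 0.587, 0.114], dtype=np.float32)
    return image[..., :3].astype(np.float32) @ weights


def resize_nearest(gray, size=TARGET_SIZE):
    """Nearest-neighbor resize; a stand-in for a library resize call."""
    rows = np.arange(size) * gray.shape[0] // size
    cols = np.arange(size) * gray.shape[1] // size
    return gray[rows[:, None], cols]


def preprocess(image):
    """Return a float32 tensor of shape (128, 128, 1) scaled to [0, 1]."""
    gray = resize_nearest(to_grayscale(image))
    normalized = np.clip(gray / 255.0, 0.0, 1.0)  # assumes 8-bit pixel values
    return normalized[..., np.newaxis].astype(np.float32)


# Synthetic 8-bit RGB image standing in for a loaded X-ray file.
fake_xray = np.random.default_rng(0).integers(
    0, 256, size=(300, 200, 3), dtype=np.uint8
)
tensor = preprocess(fake_xray)
print(tensor.shape, tensor.dtype)
```

Running every image through a single function like this is one way to keep preprocessing consistent across the feature-extraction and CNN paths.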

Architecture Map

Mermaid

flowchart LR
  Dataset["Chest X-ray dataset"] --> Prep["Preprocessing: grayscale, resize 128x128, normalize"]
  Prep --> Features["Feature extraction: Gabor, DCT, Fourier, PHOG"]
  Features --> SingleCsv["single_feature.csv: combined feature vector"]
  Features --> MultiCsv["multi_features.csv: separated feature columns"]
  SingleCsv --> LRSingle["Logistic Regression: LR_1feature"]
  MultiCsv --> LRMulti["Logistic Regression: LR_4features"]
  Prep --> CNN["CNN trained from image tensors"]
  LRSingle --> Metrics["Accuracy, reports, confusion matrices"]
  LRMulti --> Metrics
  CNN --> Metrics
  LRSingle --> App["Tkinter model selector"]
  LRMulti --> App
  CNN --> App
  App --> Result["Lab output: Normal or Pneumonia"]
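The model-selector step at the end of the flowchart can be sketched as a small registry that routes an input to the chosen model and maps its score to a lab label. The registry keys reuse the flowchart's node names, but the predictors here are stubs with fixed scores, and `classify`, `make_stub`, and the 0.5 threshold are assumptions rather than the app's real logic.

```python
from typing import Callable, Dict, Sequence

LABELS = ("Normal", "Pneumonia")
Predictor = Callable[[Sequence[float]], float]


def make_stub(probability: float) -> Predictor:
    """Build a placeholder predictor that ignores its input (illustration only)."""
    return lambda _inputs: probability


# Keys mirror the flowchart node names; the predictors are stubs, not the lab's models.
MODEL_REGISTRY: Dict[str, Predictor] = {
    "LR_1feature": make_stub(0.30),
    "LR_4features": make_stub(0.65),
    "CNN": make_stub(0.80),
}


def classify(model_name: str, inputs: Sequence[float], threshold: float = 0.5) -> str:
    """Route inputs to the selected model and map its score to a lab label."""
    if model_name not in MODEL_REGISTRY:
        raise KeyError(f"Unknown model: {model_name}")
    score = MODEL_REGISTRY[model_name](inputs)
    return LABELS[int(score >= threshold)]


for name in MODEL_REGISTRY:
    print(name, "->", classify(name, [0.0]))
```

In a Tkinter front end, the dropdown's selected value would be passed as `model_name`, and the returned string would be shown as the lab output.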

Pneumonia Lab Screenshots


CNN training and validation accuracy and loss curves, used to review learning behavior across the image-based model.

CNN confusion matrix on a small test set, included as lab evidence rather than a clinical performance claim.

PCA decision-region plot for the logistic regression path that uses separate Gabor, DCT, Fourier, and PHOG features.

Multi-feature logistic regression confusion matrix, used to compare feature-engineered results.

Single-feature logistic regression sigmoid and decision-boundary plot, built from the combined feature vector.

Single-feature logistic regression confusion matrix, used to compare the simpler feature table against the multi-feature setup.
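For readers unfamiliar with the confusion matrices shown in these screenshots, here is a minimal sketch of how one is computed for the two-class case, along with accuracy. The labels below are made up for illustration and are not results from the lab.

```python
import numpy as np


def confusion_matrix(y_true, y_pred, n_classes=2):
    """Rows are true classes, columns are predicted classes."""
    matrix = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        matrix[t, p] += 1
    return matrix


# Made-up labels for illustration only: 0 = Normal, 1 = Pneumonia.
y_true = [0, 0, 1, 1, 1, 0, 1, 1]
y_pred = [0, 1, 1, 1, 0, 0, 1, 1]

cm = confusion_matrix(y_true, y_pred)
accuracy = np.trace(cm) / cm.sum()  # correct predictions lie on the diagonal
print(cm)
print(f"accuracy = {accuracy:.3f}")  # accuracy = 0.750
```

The off-diagonal cells separate the two error types (Normal predicted as Pneumonia, and the reverse), which a single accuracy number hides.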