
Pneumonia Detection Lab

Educational chest X-ray lab comparing feature-engineered logistic regression and CNN models for Normal vs Pneumonia prediction.

Experimental / Medical imaging ML lab / 2024-2025

Problem

Comparing chest X-ray classifiers fairly requires the same preprocessing for every model, and the demo needs careful wording so that it does not imply clinical readiness.

Solution

Build and compare three educational model paths: logistic regression with one combined feature vector, logistic regression with separated Gabor/DCT/Fourier/PHOG features, and a CNN trained directly from image tensors.
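As a rough illustration of the logistic regression paths, the sketch below scores a feature vector with a sigmoid and thresholds the result into one of the two lab labels. The weights, bias, threshold, class order, and function names are all hypothetical. The project's notebooks may well rely on a library implementation rather than hand-written code like this.

```python
import math

LABELS = ("Normal", "Pneumonia")  # class order is an assumption for this sketch


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def predict_proba(features, weights, bias):
    """Return the estimated probability of the class at index 1."""
    if len(features) != len(weights):
        raise ValueError("features and weights must have the same length")
    z = sum(f * w for f, w in zip(features, weights)) + bias
    return sigmoid(z)


def predict_label(features, weights, bias, threshold=0.5):
    """Map the probability to a lab label using a fixed threshold."""
    p = predict_proba(features, weights, bias)
    return LABELS[1] if p >= threshold else LABELS[0]


# Hypothetical weights for a 3-value feature vector (not taken from the project).
example_weights = [0.8, -0.5, 1.2]
example_bias = -0.3
print(predict_label([1.0, 0.2, 0.9], example_weights, example_bias))
```

The same scoring rule applies whether the input is one combined vector or a row of separate descriptor columns. Only the length and layout of `features` would differ.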

Key Features

Dataset documented as 5,200 chest X-ray images across Normal and Pneumonia classes
Image preprocessing for grayscale loading, resizing to 128x128, normalization, and notebook-ready tensors
Feature extraction using Gabor filters, DCT, Fourier transform, and PHOG descriptors
Single-feature and multi-feature logistic regression notebooks with saved pickle models
CNN notebook with TensorFlow/Keras training curves and a saved .h5 model artifact
Confusion matrices, accuracy reports, PCA/decision-boundary plots, and training/validation charts
Tkinter inference interface with image upload, model selector, and Normal/Pneumonia lab output
Portfolio wording framed as an educational experiment, not a medical diagnostic product
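The preprocessing step listed above (grayscale, resize to 128x128, normalize) can be sketched without any dependencies, as below. This is only an illustration. It assumes nearest-neighbour resizing and division by 255 for normalization, neither of which is confirmed by the project, and all function names are hypothetical. A real notebook would more likely use an imaging library.

```python
def to_grayscale(rgb_image):
    """Convert rows of (r, g, b) tuples to luminance values (ITU-R BT.601 weights)."""
    return [[0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row]
            for row in rgb_image]


def resize_nearest(image, out_h=128, out_w=128):
    """Resize a 2D list of pixels with nearest-neighbour sampling."""
    in_h, in_w = len(image), len(image[0])
    return [
        [image[(y * in_h) // out_h][(x * in_w) // out_w] for x in range(out_w)]
        for y in range(out_h)
    ]


def normalize(image, max_value=255.0):
    """Scale pixel values into [0, 1] (assumed normalization scheme)."""
    return [[px / max_value for px in row] for row in image]


def preprocess(rgb_image, size=128):
    """Apply the three steps in the same order for every model path."""
    gray = to_grayscale(rgb_image)
    resized = resize_nearest(gray, size, size)
    return normalize(resized)


# Synthetic 4x6 RGB image standing in for a loaded X-ray file.
fake_image = [[(x * 40, x * 40, x * 40) for x in range(6)] for _ in range(4)]
tensor = preprocess(fake_image)
print(len(tensor), len(tensor[0]))
```

Keeping the steps in one `preprocess` function is one way to make sure the logistic regression and CNN paths see identically prepared inputs.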

Architecture Map

Mermaid
flowchart LR
  Dataset["Chest X-ray dataset"] --> Prep["Preprocessing: grayscale, resize 128x128, normalize"]
  Prep --> Features["Feature extraction: Gabor, DCT, Fourier, PHOG"]
  Features --> SingleCsv["single_feature.csv: combined feature vector"]
  Features --> MultiCsv["multi_features.csv: separated feature columns"]
  SingleCsv --> LRSingle["Logistic Regression: LR_1feature"]
  MultiCsv --> LRMulti["Logistic Regression: LR_4features"]
  Prep --> CNN["CNN trained from image tensors"]
  LRSingle --> Metrics["Accuracy, reports, confusion matrices"]
  LRMulti --> Metrics
  CNN --> Metrics
  LRSingle --> App["Tkinter model selector"]
  LRMulti --> App
  CNN --> App
  App --> Result["Lab output: Normal or Pneumonia"]
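
The routing in the map can be sketched as a small registry that sends a preprocessed image either through feature extraction (the logistic regression paths) or straight to the model (the CNN path). Only the names `LR_1feature`, `LR_4features`, and `CNN` come from the map. The `ModelSelector` class, the placeholder extractor, and the lambda predictors are hypothetical stand-ins, not the project's code or real decision rules.

```python
from typing import Callable, Dict, Tuple

LABELS = ("Normal", "Pneumonia")  # class order is an assumption


class ModelSelector:
    """Routes a preprocessed image to one registered model path."""

    def __init__(self, feature_extractor: Callable[[list], list]):
        self._extract = feature_extractor
        self._paths: Dict[str, Tuple[Callable[[list], int], bool]] = {}

    def register(self, name: str, predictor: Callable[[list], int],
                 needs_features: bool) -> None:
        self._paths[name] = (predictor, needs_features)

    def names(self):
        return sorted(self._paths)

    def predict(self, name: str, image: list) -> str:
        if name not in self._paths:
            raise KeyError(f"unknown model path: {name}")
        predictor, needs_features = self._paths[name]
        # Feature-engineered paths get a feature vector; the CNN gets the image.
        model_input = self._extract(image) if needs_features else image
        return LABELS[predictor(model_input)]


def fake_extractor(image):
    """Placeholder for the Gabor/DCT/Fourier/PHOG stage: mean intensity only."""
    flat = [px for row in image for px in row]
    return [sum(flat) / len(flat)]


selector = ModelSelector(fake_extractor)
# Placeholder predictors returning a class index; real ones would be trained models.
selector.register("LR_1feature", lambda feats: int(feats[0] > 0.5), needs_features=True)
selector.register("LR_4features", lambda feats: int(feats[0] > 0.5), needs_features=True)
selector.register("CNN", lambda img: int(img[0][0] > 0.5), needs_features=False)

image = [[0.9, 0.8], [0.7, 0.6]]
for name in selector.names():
    print(name, "->", selector.predict(name, image))
```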

Pneumonia Lab Screenshots

CNN training and validation curves used to review learning behavior across the image-based model.
CNN test confusion matrix, included as lab evidence rather than a clinical performance claim.
PCA decision-region visualization for the logistic regression path using separate Gabor, DCT, Fourier, and PHOG features.
Multi-feature logistic regression confusion matrix for comparing feature-engineered results.
Single-feature logistic regression decision boundary built from the combined feature vector.
Single-feature logistic regression confusion matrix used to compare the simpler feature table against the multi-feature setup.
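
For readers unfamiliar with the confusion matrices shown in these views, the sketch below builds a two-class matrix and an accuracy figure from label lists. The labels are made up for illustration and are not results from the project. The function names and the row/column convention (rows are true labels, columns are predictions) are assumptions.

```python
def confusion_matrix(y_true, y_pred, labels=("Normal", "Pneumonia")):
    """Count predictions; rows are true labels, columns are predicted labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for true_label, pred_label in zip(y_true, y_pred):
        matrix[index[true_label]][index[pred_label]] += 1
    return matrix


def accuracy(matrix):
    """Fraction of samples on the diagonal (correct predictions)."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total if total else 0.0


# Made-up labels for illustration only; not results from the project.
y_true = ["Normal", "Normal", "Pneumonia", "Pneumonia", "Pneumonia", "Normal"]
y_pred = ["Normal", "Pneumonia", "Pneumonia", "Pneumonia", "Normal", "Normal"]

cm = confusion_matrix(y_true, y_pred)
print(cm)
print(round(accuracy(cm), 3))
```

Computing the same matrix for each of the three model paths on the same test split is what makes the side-by-side comparison meaningful.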