MediScan AI — Medical Image & Report Assistant

AI Computer Vision LLaMA FastAPI PyTorch Hugging Face Medical AI

Medical imaging is one of the highest-stakes domains for AI. A system that can look at a chest X-ray and describe what it sees — in plain clinical language — touches on almost every subdomain of applied AI: computer vision, anomaly detection, natural language generation, and responsible deployment. MediScan AI is my end-to-end prototype that chains all of these together into a single REST endpoint.


What It Does

You POST a scan (JPEG or PNG), and the API responds with three things:

  1. Classification results — top-k labels from a Vision Transformer fine-tuned on chest X-ray data, each with a confidence score.
  2. Anomaly assessment — a risk level (low / medium / high) derived from how far the scan’s feature vector sits from a distribution of “normal” embeddings.
  3. A structured radiology report — generated by LLaMA 3.1 70B, formatted with FINDINGS, IMPRESSION, and RECOMMENDATION sections.

The full pipeline runs in under 10 seconds on CPU (the bottleneck is the LLM API call).


Architecture

The pipeline has five stages, each a clean Python class:

Stage 1 — OpenCV Preprocessing

Raw medical scans arrive with low contrast and shot noise. Before any model sees the image, ImagePreprocessor applies:

  • CLAHE (Contrast Limited Adaptive Histogram Equalization) on the L-channel in LAB colour space — boosts local contrast without washing out global structure, which matters for detecting subtle opacities.
  • Non-local means denoising — preserves edges while smoothing the flat regions typical of CT and MRI backgrounds.

Stage 2 — Hugging Face + PyTorch Feature Extraction

MedicalImageAnalyser runs two passes:

Pass 1 (HF pipeline): nickmuchi/vit-finetuned-chest-xray-pneumonia — a ViT-base-patch16-224 fine-tuned on chest X-ray data. Returns top-k label/score pairs (e.g. PNEUMONIA: 0.87, NORMAL: 0.13).

Pass 2 (ResNet backbone): A ResNet-18 with its classification head removed produces a 512-dimensional feature vector that feeds the anomaly scorer. The HF model gives interpretable labels; the ResNet gives a dense embedding that captures general visual structure.

Stage 3 — Scikit-learn Anomaly Scoring

AnomalyScorer wraps an IsolationForest trained on normal scan embeddings. It normalises the feature vector and maps the decision_function output to a risk level:

score < -0.15 and is_anomaly  →  HIGH
score >= -0.15 and is_anomaly →  MEDIUM
not is_anomaly                →  LOW

Stage 4 — NLTK Prompt Construction

PromptBuilder uses NLTK’s word_tokenize to deduplicate terms from the classification labels and assembles a structured prompt with scan type, confidence scores, and anomaly metrics for the LLM.

Stage 5 — LLaMA 3.1 Report Generation

LLaMAReportGenerator calls NVIDIA’s OpenAI-compatible endpoint for meta/llama-3.1-70b-instruct at temperature 0.3 — low enough for consistent, conservative medical language. The system prompt instructs the model to always flag output as AI-assisted and requiring clinical review.


FastAPI Wrapper

GET  /health   →  status, device, model name
POST /analyze  →  multipart: file + scan_type  →  JSON report

Running It

git clone https://github.com/ik-awais/mediscan-ai.git && cd mediscan-ai
pip install -r requirements.txt

# Create .env with your keys
echo "NVIDIA_API_KEY=your_key_here" >> .env
echo "HF_TOKEN=your_token_here" >> .env

# Open notebook, run all cells — server starts at http://localhost:8000

curl -X POST http://localhost:8000/analyze \
  -F 'file=@chest_xray.jpg' \
  -F 'scan_type=chest X-ray'

Clinical disclaimer: MediScan AI is a research prototype. It is not a medical device and must not be used for clinical decision-making. All outputs require review by a qualified radiologist.

View on GitHub