MediScan AI: Medical Image & Report Assistant

March 2026

AIComputer VisionLLaMAFastAPIPyTorchHugging FaceMedical AIVision TransformerAnomaly DetectionNLP

An end-to-end AI pipeline for medical scan analysis. Upload a raw scan (JPEG or PNG), and the system preprocesses the image, classifies it with a fine-tuned Vision Transformer, scores it for anomalies using a ResNet-18 feature extractor paired with Scikit-learn’s IsolationForest, and generates a structured three-section radiology report using LLaMA 3.1 70B Instruct, all served via a single FastAPI REST endpoint.

The full pipeline runs in under 10 seconds on CPU. The bottleneck is the LLM API call to NVIDIA’s hosted endpoint.

Clinical Disclaimer: MediScan AI is a research and educational prototype. It is not a medical device and must not be used for clinical decision-making. All outputs require review by a qualified radiologist or licensed clinician before any action is taken.

What It Does

A single POST /analyze call with a scan file and scan type returns three structured outputs:

Classification results: top-k labels from a ViT fine-tuned on chest X-ray data, each with a confidence score (e.g. PNEUMONIA: 0.8741, NORMAL: 0.1259)
Anomaly assessment: a risk level of low, medium, or high derived from how far the scan’s 512-dimensional ResNet-18 feature vector sits from a distribution of normal embeddings, plus a raw anomaly score
Structured radiology report: generated by LLaMA 3.1 70B at temperature 0.3, formatted with FINDINGS, IMPRESSION, and RECOMMENDATION sections, with an embedded disclaimer that the output is AI-assisted and requires clinical review

Pipeline Architecture

Five isolated Python classes form the sequential pipeline:

Upload Scan (JPEG / PNG)
        │
        ▼
┌─────────────────────────────────────────┐
│  Stage 1 - ImagePreprocessor            │
│  OpenCV CLAHE contrast enhancement      │
│  + NL-means denoising                   │
└────────────────────┬────────────────────┘
                     │
                     ▼
┌─────────────────────────────────────────┐
│  Stage 2 - MedicalImageAnalyser         │
│  Pass 1: HuggingFace ViT pipeline       │
│    nickmuchi/vit-finetuned-chest-xray   │
│    → top-k labels + confidence scores   │
│  Pass 2: ResNet-18 backbone             │
│    (classification head removed)        │
│    → 512-dim feature vector             │
└────────────────────┬────────────────────┘
                     │
                     ▼
┌─────────────────────────────────────────┐
│  Stage 3 - AnomalyScorer               │
│  Scikit-learn IsolationForest           │
│  Normalise vector → decision_function   │
│  → anomaly score + risk level           │
│    score < -0.15 & is_anomaly → HIGH    │
│    score >= -0.15 & is_anomaly → MEDIUM │
│    not is_anomaly              → LOW    │
└────────────────────┬────────────────────┘
                     │
                     ▼
┌─────────────────────────────────────────┐
│  Stage 4 - PromptBuilder                │
│  NLTK word_tokenize on label strings    │
│  → deduplicated terms                   │
│  → structured prompt with scan type,    │
│     confidence scores, anomaly metrics  │
└────────────────────┬────────────────────┘
                     │
                     ▼
┌─────────────────────────────────────────┐
│  Stage 5 - LLaMAReportGenerator        │
│  NVIDIA OpenAI-compatible endpoint      │
│  meta/llama-3.1-70b-instruct            │
│  temperature=0.3                        │
│  → FINDINGS / IMPRESSION / RECOMM.     │
└────────────────────┬────────────────────┘
                     │
                     ▼
             FastAPI JSON Response
    (labels + anomaly score + full report)

Stage-by-Stage Detail

Stage 1: OpenCV Preprocessing

Raw medical scans typically arrive with low local contrast and shot noise that would impair both the classifier and the feature extractor. ImagePreprocessor applies two operations before any model sees the image:

CLAHE (Contrast Limited Adaptive Histogram Equalization): runs on the L-channel in LAB colour space. Unlike global histogram equalization, CLAHE operates on small tiles and clips the contrast amplification, which boosts subtle opacities and margins without washing out global structure. This is particularly important for detecting subtle infiltrates in chest X-rays.

Non-local means denoising: preserves hard edges (which matter for identifying masses and borders) while smoothing the flat uniform regions typical of CT and MRI backgrounds, reducing false texture signals that could confuse the classifier.

Stage 2: ViT Classification + ResNet Feature Extraction

MedicalImageAnalyser runs two separate forward passes:

Pass 1 (HuggingFace ViT pipeline): nickmuchi/vit-finetuned-chest-xray-pneumonia is a ViT-base-patch16-224 fine-tuned on chest X-ray data. It produces top-k label/score pairs that are directly interpretable (e.g. PNEUMONIA, NORMAL). These labels drive both the anomaly prompt and the report.

Pass 2 (ResNet-18 feature extraction): A ResNet-18 with its final classification head removed produces a 512-dimensional dense embedding. This vector captures general visual structure independently of the ViT’s label vocabulary, giving the anomaly scorer a richer signal than label confidence alone. The two-model approach separates interpretable classification from geometric feature representation.

Stage 3: IsolationForest Anomaly Scoring

AnomalyScorer normalises the 512-dim ResNet embedding and passes it through a Scikit-learn IsolationForest trained on embeddings from normal scans. The decision_function output (negative = more anomalous) is mapped to three risk tiers:

Condition	Risk Level
`decision_function < -0.15` and `predict == -1`	HIGH
`decision_function >= -0.15` and `predict == -1`	MEDIUM
`predict == 1` (inlier)	LOW

The raw score is also returned in the API response, allowing downstream systems to apply their own thresholds.

Stage 4: NLTK Prompt Construction

PromptBuilder tokenises the classification label strings with nltk.word_tokenize, de-duplicates terms, and assembles a structured prompt that includes the scan type, the top-k label/confidence pairs, and the anomaly score with risk level. Structuring the prompt this way, rather than passing raw model outputs, keeps the LLM focused on clinical synthesis rather than score interpretation.

Stage 5: LLaMA 3.1 Report Generation

LLaMAReportGenerator calls NVIDIA’s hosted meta/llama-3.1-70b-instruct endpoint via the OpenAI-compatible SDK at temperature=0.3. Low temperature produces conservative, consistent medical language and reduces hallucinated clinical details. The system prompt instructs the model to always flag output as AI-assisted and requiring clinical review, and to structure the response as three sections: FINDINGS, IMPRESSION, RECOMMENDATION.

Tech Stack

Layer	Tool	Notes
Image preprocessing	OpenCV	CLAHE on LAB L-channel + NL-means denoising
ViT classification	HuggingFace Transformers + PyTorch	`nickmuchi/vit-finetuned-chest-xray-pneumonia`
Feature extraction	ResNet-18 (torchvision)	512-dim embedding, classification head removed
Anomaly detection	Scikit-learn IsolationForest	Trained on normal scan embeddings
Text processing	NLTK	`word_tokenize` for prompt deduplication
Report generation	LLaMA 3.1 70B Instruct	NVIDIA-hosted, OpenAI-compatible API
API server	FastAPI + Uvicorn	REST endpoint with multipart file upload
Config	python-dotenv	API key management via `.env`

Project Structure

mediscan-ai/
├── mediscan_ai.ipynb     # Main notebook - all pipeline stages, step-by-step
├── .env.example          # Template for API keys (never commit .env)
├── requirements.txt      # Pinned dependencies
└── README.md

The entire pipeline lives in the notebook for clarity and reproducibility. The FastAPI server is started from the final notebook cell, making it straightforward to experiment with individual stages before running the full endpoint.

API Reference

`GET /health`

Returns system status, compute device, and active model name.

{
  "status": "ok",
  "device": "cpu",
  "model": "meta/llama-3.1-70b-instruct"
}

`POST /analyze`

Request: multipart form upload:

Field	Type	Description
`file`	file	JPEG or PNG medical scan
`scan_type`	string	Human-readable scan type (e.g. `"chest X-ray"`, `"CT scan"`)

Response:

{
  "scan_type": "chest X-ray",
  "top_labels": [
    {"label": "PNEUMONIA", "score": 0.8741},
    {"label": "NORMAL",    "score": 0.1259}
  ],
  "anomaly": {
    "anomaly_score": -0.1823,
    "risk_level": "high",
    "is_anomaly": true
  },
  "report": "FINDINGS:\n...\n\nIMPRESSION:\n...\n\nRECOMMENDATION:\n..."
}

Setup & Installation

Prerequisites

Python 3.9 or higher
A free NVIDIA API key at build.nvidia.com
A HuggingFace token at huggingface.co/settings/tokens (free)

Steps

1. Clone and install

git clone https://github.com/ik-awais/mediscan-ai.git
cd mediscan-ai
pip install -r requirements.txt

2. Configure API keys

cp .env.example .env

Open .env and add:

NVIDIA_API_KEY=your_nvidia_api_key_here
HF_TOKEN=your_huggingface_token_here

3. Run the notebook and start the server

Open mediscan_ai.ipynb in Jupyter and run all cells top to bottom. The final cell starts the FastAPI server at http://localhost:8000.

4. Send a test request

curl -X POST http://localhost:8000/analyze \
  -F 'file=@/path/to/scan.jpg' \
  -F 'scan_type=chest X-ray'

Design Decisions

Two-model image analysis: The ViT provides interpretable classification labels; the ResNet provides dense geometric embeddings. Using both separates the tasks cleanly: the ViT answers “what does this look like?” and the ResNet gives the anomaly scorer a richer signal that is not constrained by the ViT’s label vocabulary. A scan could be structurally unusual without fitting any of the ViT’s trained categories.

IsolationForest over supervised anomaly detection: IsolationForest is an unsupervised method that requires only normal examples for training. This is practical in medical imaging where labelled anomaly data is scarce, and avoids the class imbalance problems that affect supervised classifiers trained on rare conditions.

Low LLM temperature (0.3): Medical language should be conservative and consistent. High temperature increases output variety but also increases the chance of hallucinated clinical details. At 0.3, the model stays within the evidence the prompt provides.

Notebook-first architecture: The entire pipeline is in a single Jupyter notebook rather than a multi-file Python package. This makes each stage independently explorable and debuggable, which matters during research prototyping where pipeline stages are likely to change. The FastAPI server is a thin wrapper over the same classes, not a separate codebase.

CLAHE over global equalization: Global histogram equalization amplifies noise uniformly across the entire image. CLAHE’s tile-based approach and contrast clipping makes it the standard preprocessing step in medical imaging, where enhancing a subtle consolidation in one region should not blow out the hilum in another.

Roadmap

Replace synthetic IsolationForest training data with real labelled scan embeddings
Add DICOM file support (the native format for clinical scanners)
Multi-modality support: CT, MRI, ultrasound in addition to plain X-ray
Streamlit or React frontend for non-technical clinical users
Docker container for single-command deployment

Document Q&A System

A production-ready RAG system built with LangChain, FAISS, and Streamlit. Upload PDF, DOCX, or TXT documents and get natural language answers with source citations, sharing the Groq LLM backend and the same philosophy of grounded, traceable AI output. → GitHub · Read More

← Back to Projects

MediScan AI: Medical Image & Report Assistant

What It Does

Pipeline Architecture

Stage-by-Stage Detail

Stage 1: OpenCV Preprocessing

Stage 2: ViT Classification + ResNet Feature Extraction

Stage 3: IsolationForest Anomaly Scoring

Stage 4: NLTK Prompt Construction

Stage 5: LLaMA 3.1 Report Generation

Tech Stack

Project Structure

API Reference

`GET /health`

`POST /analyze`

Setup & Installation

Prerequisites

Steps

Design Decisions

Roadmap

Document Q&A System

LectureLens: AI-Powered Study Assistant with RAG & Hybrid Search

AI Research Assistant Agent

You Don’t Need to Be a Programmer to Be a Tech Person

Operating Systems Explained for Normal People

MediScan AI: Medical Image & Report Assistant

What It Does

Pipeline Architecture

Stage-by-Stage Detail

Stage 1: OpenCV Preprocessing

Stage 2: ViT Classification + ResNet Feature Extraction

Stage 3: IsolationForest Anomaly Scoring

Stage 4: NLTK Prompt Construction

Stage 5: LLaMA 3.1 Report Generation

Tech Stack

Project Structure

API Reference

GET /health

POST /analyze

Setup & Installation

Prerequisites

Steps

Design Decisions

Roadmap

Related Projects

Document Q&A System

LectureLens: AI-Powered Study Assistant with RAG & Hybrid Search

AI Research Assistant Agent

Related Blog Posts

You Don’t Need to Be a Programmer to Be a Tech Person

Operating Systems Explained for Normal People

`GET /health`

`POST /analyze`