CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

Project Overview

This is a pathology research project analyzing omentum metastasis detection in gynecological malignancies. The codebase implements statistical pathsampling analysis to determine optimal tissue sampling protocols for detecting microscopic-only metastases in grossly normal omentum.

Key Finding: Sample 4 cassettes from omentum to achieve >95% detection of metastases (validated by 3 independent methods).

Project Structure

Core Analysis Pipeline

  1. Data Preparation: omentum_recode.R
  2. Pathsampling Analysis: omentum_pathsampling_run.R
  3. Report Generation: omentum_pathsampling_analysis.qmd
  4. Validation: compare_jamovi_vs_qmd.R

Data Files

  • Input: omentum_04_11_2025.xlsx (1,098 cases - raw data)
  • Processed:
    • omentum.xlsx - Clean data for Quarto reports
    • omentum_recoded.csv - Full recoded dataset with analysis variables
    • omentum_recoded.RData - R workspace with analysis subsets

Key Analysis Subsets (in RData file)

  • omentum_analysis - Full dataset with derived variables
  • micro_tracked - Microscopic-only cases with detection tracking (n=46)
  • abundant_tracked - Abundant/obvious tumor cases with tracking (n=15)

Development Commands

R Analysis Workflow

# 1. Prepare data (required first)
source("omentum_recode.R")

# 2. Run pathsampling analysis
source("omentum_pathsampling_run.R")

# 3. Generate report
quarto::quarto_render("omentum_pathsampling_analysis.qmd")

# 4. Validate results
source("compare_jamovi_vs_qmd.R")

Quarto Website

# Preview website locally
quarto preview

# Render entire website
quarto render

# Render specific document
quarto render omentum_pathsampling_analysis.qmd

Working with Results

# Load analysis results
result <- readRDS("pathsampling_microscopic_results.rds")

# Load prepared data
load("omentum_recoded.RData")

# Access subsets
str(micro_tracked)  # Primary analysis group
str(abundant_tracked)  # Comparison group

Code Architecture

1. Data Recoding Script (omentum_recode.R)

Purpose: Transform raw Turkish-language pathology data into analysis-ready format

Key Operations:

  • Recodes Turkish variables to English
  • Creates comprehensive tumor classification (tumor_category)
  • Calculates metastatic block distribution metrics
  • Generates analysis subsets

Critical Variables Created:

  • tumor_category - 4-level classification:
    • “Microscopic-Only” - Grossly normal with occult metastases (TARGET GROUP)
    • “Abundant/Obvious” - Visible macroscopic tumor
    • “Small Visible Only” - Only macroscopic tumor
    • “No Tumor” - No tumor found
  • metastasis_proportion - Proportion of cassettes with metastasis
  • tumor_burden - Categorized burden (Single focus/Low/Moderate/High)
  • has_detection_tracking - Flag for cases with sequential detection data

Outputs:

  • omentum.xlsx - Basic recoded data for QMD analysis
  • omentum_recoded.csv - Full dataset with all derived variables
  • omentum_recoded.RData - R workspace with micro_tracked and abundant_tracked subsets

2. Pathsampling Analysis Runner (omentum_pathsampling_run.R)

Purpose: Run ClinicoPath pathsampling function on prepared data

Dependencies:

  • Requires omentum_recoded.RData (run omentum_recode.R first)
  • Requires ClinicoPath jamovi module at: /Users/serdarbalci/Documents/GitHub/ClinicoPathJamoviModule

Workflow:

  1. Loads prepared data
  2. Loads ClinicoPath package from local dev directory
  3. Runs pathsampling on microscopic-only cases
  4. Optionally runs on abundant cases (if n ≥ 5)
  5. Performs comparative analysis

Key Parameters:

  • firstDetection = "first_cassette_tumor_identified" - Sequential detection variable
  • targetConfidence = 0.95 - 95% detection target
  • maxSamples = 15 - Maximum cassettes to evaluate
  • bootstrapIterations = 10000 - For confidence intervals
  • seedValue = 42 - Reproducibility

3. Quarto Analysis Document (omentum_pathsampling_analysis.qmd)

Purpose: Comprehensive reproducible research document with dynamic calculations

Architecture:

  • Loads data directly from omentum.xlsx
  • Performs all recoding inline
  • Implements pathsampling using geometric probability model
  • Generates publication-ready figures and tables

Key Statistical Methods:

  • Geometric MLE: q = 1 / mean(first_detection)
  • Optimized MLE: Uses optimize() with negative log-likelihood
  • Bootstrap CI: 10,000 iterations for robust confidence intervals
  • Cumulative Detection: P(detect ≤ n) = 1 - (1-q)^n
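The two MLE routes listed above can be compared on a small example. This is a minimal sketch, not the code from the QMD file: the first_detection values are made up for illustration, and the helper names neg_loglik and cumulative_detection are hypothetical.

```r
# Hypothetical first-detection positions (cassette index of the first positive block).
# Illustrative values only, not taken from the study data.
first_detection <- c(1, 1, 2, 1, 3, 2, 1, 4, 1, 2)

# Closed-form geometric MLE: q = 1 / mean(first_detection)
q_closed <- 1 / mean(first_detection)

# Negative log-likelihood of P(k) = (1 - q)^(k - 1) * q, summed over cases
neg_loglik <- function(q, k) {
  -sum((k - 1) * log(1 - q) + log(q))
}

# Numerical MLE via optimize(); it should agree with the closed form
# up to the optimizer's tolerance
fit <- optimize(neg_loglik, interval = c(0.001, 0.999), k = first_detection)
q_numeric <- fit$minimum

# Cumulative detection: P(detect <= n) = 1 - (1 - q)^n
cumulative_detection <- function(q, n) 1 - (1 - q)^n

print(c(closed_form = q_closed, numeric = q_numeric))
print(cumulative_detection(q_closed, 1:5))
```

Since the geometric likelihood has a closed-form maximum, the optimize() route mainly serves as a cross-check on the closed-form estimate.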

Important: This QMD file is self-contained and does NOT depend on the R scripts. It loads raw data and performs all analyses independently.

4. Validation Script (compare_jamovi_vs_qmd.R)

Purpose: Compare results from 3 independent analysis methods

Methods Compared:

  1. Jamovi GUI analysis (all 1,098 cases)
  2. R pathsampling function (46 microscopic-only)
  3. QMD manual analysis (46 microscopic-only)

Validation: All methods recommend 4 cassettes for >95% detection

Key Domain Concepts

Pathsampling Analysis

Purpose: Determine optimal number of tissue cassettes to achieve target detection sensitivity

Model: Geometric probability where:

  • q = probability of detecting tumor in any single cassette
  • First detection at cassette k: P(k) = (1-q)^(k-1) × q
  • MLE estimate: q = 1 / mean(first_detection)

Clinical Application:

  • If q = 0.5349 (53.5% per cassette)
  • Then 4 cassettes → 95.3% detection
  • Recommendation: Sample 4 cassettes from omentum
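The arithmetic behind the 4-cassette figure can be reproduced directly from the cumulative detection formula. A short sketch using only the q value reported in this file (the variable names are illustrative):

```r
q <- 0.5349                      # per-cassette detection probability reported above
n <- 1:6                         # candidate numbers of cassettes

# P(detect <= n) = 1 - (1 - q)^n
cum_detect <- 1 - (1 - q)^n

# Percent detection for each n
print(data.frame(cassettes = n, detection_pct = round(100 * cum_detect, 1)))

# Smallest n whose cumulative detection reaches the 95% target
min_cassettes <- min(n[cum_detect >= 0.95])
print(min_cassettes)
```

Three cassettes fall just short of the target (about 89.9%), which is why the recommendation lands on four.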

Microscopic-Only Metastases

Definition: Grossly normal omentum with occult metastases found only microscopically

Clinical Significance:

  • Occurs in 5.5% of cases
  • Not visible at surgery
  • Upstages disease
  • Impacts treatment decisions

Detection Characteristics (from this dataset):

  • Mean first detection: 1.87 cassettes
  • Detection probability: q = 0.5349
  • Mean blocks with metastasis: 3.63
  • Mean proportion involved: 63%

Tumor Categories in Dataset

Classification Logic (macro/micro presence):

  • No Tumor: macro=Absent, micro=Absent
  • Microscopic-Only: macro=Absent, micro=Present (PRIMARY ANALYSIS GROUP)
  • Small Visible Only: macro=Present, micro=Absent
  • Abundant/Obvious: macro=Present, micro=Present
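The classification logic above maps onto a simple two-variable lookup. The following is a sketch under assumptions: the toy data frame and the classify_tumor helper are hypothetical, and the actual implementation in omentum_recode.R may be written differently.

```r
# Hypothetical toy data; column names follow the variables documented in this file
cases <- data.frame(
  macroscopic_tumor = c("Absent", "Absent", "Present", "Present"),
  microscopic_tumor = c("Absent", "Present", "Absent", "Present"),
  stringsAsFactors = FALSE
)

# Map each macro/micro combination to its tumor_category label.
# Any other combination (including NA) yields NA.
classify_tumor <- function(macro, micro) {
  ifelse(macro == "Absent" & micro == "Absent", "No Tumor",
  ifelse(macro == "Absent" & micro == "Present", "Microscopic-Only",
  ifelse(macro == "Present" & micro == "Absent", "Small Visible Only",
  ifelse(macro == "Present" & micro == "Present", "Abundant/Obvious",
         NA_character_))))
}

cases$tumor_category <- classify_tumor(cases$macroscopic_tumor,
                                       cases$microscopic_tumor)
print(cases)
```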

Critical Variables Reference

Detection Variables

  • first_cassette_tumor_identified - Which cassette tumor was first seen (1-15)
  • cassette_number - Total cassettes examined
  • total_cassettes_with_metastasis - Number of blocks with tumor
  • cassettes_with_metastasis - Which specific cassettes have tumor

Classification Variables

  • macroscopic_tumor - Present/Absent (visible at gross exam)
  • microscopic_tumor - Present/Absent (visible microscopically)
  • tumor_category - Comprehensive classification (see above)
  • has_detection_tracking - TRUE if first_cassette_tumor_identified is not NA

Clinical Variables

  • Location - Primary tumor site (Endometrium/Ovary/Synchronous)
  • TumorType - Tumor grade (Low/High/Borderline)
  • metastasis_size_cm - Size of metastasis in cm
  • stage_changed - Yes/No (did omentum finding change stage)

Derived Metrics

  • metastasis_proportion - total_cassettes_with_metastasis / cassette_number
  • tumor_burden - Categorical (None/Single focus/Low/Moderate/High)
  • metastasis_extent - Categorical (Focal/Moderate/Extensive/Diffuse)
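
Two of the derived fields have definitions stated explicitly in this file and can be sketched directly. The toy values below are hypothetical. The cutoffs for tumor_burden and metastasis_extent are not documented here, so they are deliberately left out.

```r
# Hypothetical toy cases; column names follow the variables documented above
toy <- data.frame(
  cassette_number                 = c(6, 8, 5),
  total_cassettes_with_metastasis = c(3, 8, 1),
  first_cassette_tumor_identified = c(2, 1, NA)
)

# metastasis_proportion = total_cassettes_with_metastasis / cassette_number
toy$metastasis_proportion <-
  toy$total_cassettes_with_metastasis / toy$cassette_number

# has_detection_tracking is TRUE when first_cassette_tumor_identified is not NA
toy$has_detection_tracking <- !is.na(toy$first_cassette_tumor_identified)

print(toy)
```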

External Dependencies

ClinicoPath Package

Location: /Users/serdarbalci/Documents/GitHub/ClinicoPathJamoviModule

Loading Pattern:

setwd("/Users/serdarbalci/Documents/GitHub/ClinicoPathJamoviModule")
jmvtools::prepare()
devtools::document()
devtools::load_all()
setwd("/Users/serdarbalci/Desktop/omentum")  # Return to project

Usage: Provides pathsampling() function for statistical analysis

Alternative: QMD file implements pathsampling manually without this dependency

Common Workflows

Adding New Cases to Dataset

  1. Update omentum_04_11_2025.xlsx with new cases

  2. Run complete pipeline:

    source("omentum_recode.R")
    source("omentum_pathsampling_run.R")
    quarto::quarto_render("omentum_pathsampling_analysis.qmd")

Modifying Analysis Parameters

Change target confidence (e.g., 90% instead of 95%):

  • Edit targetConfidence = 0.90 in omentum_pathsampling_run.R
  • Edit target_confidence <- 0.90 in QMD file

Change bootstrap iterations:

  • Edit bootstrapIterations parameter
  • Edit n_boot variable in QMD

Change max cassettes evaluated:

  • Edit maxSamples parameter
  • Edit max_cassettes in QMD cumulative probability calculation
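
Before editing the parameters, it can help to estimate how a new target confidence would shift the recommendation. Under the geometric model the required number of cassettes has a closed form, n = ceiling(log(1 - target) / log(1 - q)). The sketch below uses the q reported in this file; required_cassettes is a hypothetical helper, not a ClinicoPath function.

```r
# Smallest n with 1 - (1 - q)^n >= target, from the geometric model
required_cassettes <- function(q, target) {
  ceiling(log(1 - target) / log(1 - q))
}

q <- 0.5349                       # per-cassette detection probability from this file
targets <- c(0.90, 0.95, 0.99)    # candidate target confidence levels

needed <- required_cassettes(q, targets)
print(data.frame(target = targets, cassettes = needed))
```

With this q, a 90% target still rounds up to 4 cassettes, because three cassettes reach only about 89.9%, while a 99% target requires 7.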

Generating Website

The project uses Quarto website format (defined in _quarto.yml):

project:
  type: website

website:
  title: "Omentum Sampling"
  navbar:
    left:
      - href: index.qmd
        text: Home
      - href: omentum_pathsampling_analysis.qmd
        text: Omentum Path Sampling Analysis
      - href: omentum_analysis_in_jamovi.html
        text: Omentum Analysis in Jamovi

Preview with: quarto preview

Debugging Analysis Issues

If the pathsampling function is not found:

# Ensure ClinicoPath package is loaded
setwd("/Users/serdarbalci/Documents/GitHub/ClinicoPathJamoviModule")
jmvtools::prepare()
devtools::document()
devtools::load_all()
setwd("/Users/serdarbalci/Desktop/omentum")  # Return to project so relative paths resolve

If data subsets are empty:

# Check tumor category distribution
load("omentum_recoded.RData")
table(omentum_analysis$tumor_category)
table(omentum_analysis$has_detection_tracking)

If QMD rendering fails:

  • Check that omentum.xlsx exists in project root
  • Verify all required R packages are installed (tidyverse, readxl, knitr, kableExtra, ggplot2, patchwork)
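
Both checks can be scripted from the R console. This is a minimal sketch using base R only. It assumes the working directory is the project root, and the variable names are illustrative.

```r
# Packages the QMD file is documented to need
required_pkgs <- c("tidyverse", "readxl", "knitr", "kableExtra",
                   "ggplot2", "patchwork")

# requireNamespace() returns FALSE (rather than erroring) for a package
# that is not installed
installed <- vapply(required_pkgs, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- required_pkgs[!installed]

if (length(missing_pkgs) > 0) {
  message("Missing packages: ", paste(missing_pkgs, collapse = ", "))
} else {
  message("All required packages are installed.")
}

# The QMD file reads omentum.xlsx from the project root
if (!file.exists("omentum.xlsx")) {
  message("omentum.xlsx not found in ", getwd())
}
```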

Statistical Reproducibility

Random Seed: All analyses use set.seed(42) for reproducibility

Bootstrap Iterations: 10,000 iterations (consensus across all methods)

Confidence Intervals: 95% CI using 2.5th and 97.5th percentiles of bootstrap distribution
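
The percentile bootstrap described above can be sketched in a few lines. The first_detection values are hypothetical and the code is not copied from the project scripts. It only illustrates the seed, iteration count, and percentile choices listed in this section.

```r
set.seed(42)                      # same seed convention as the project

# Hypothetical first-detection positions; not the study data
first_detection <- c(1, 1, 2, 1, 3, 2, 1, 4, 1, 2)
n_boot <- 10000                   # bootstrap iterations

# Resample cases with replacement and re-estimate q = 1 / mean(first_detection)
boot_q <- replicate(n_boot, {
  resample <- sample(first_detection, replace = TRUE)
  1 / mean(resample)
})

# 95% CI from the 2.5th and 97.5th percentiles of the bootstrap distribution
q_hat <- 1 / mean(first_detection)
ci <- quantile(boot_q, probs = c(0.025, 0.975))
print(q_hat)
print(ci)
```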

Important Notes

  • The project contains both Turkish and English variable names. The recoding script handles the translation.
  • All analyses focus on the “Microscopic-Only” subset (n=46) as the primary target group
  • The Jamovi analysis uses all 1,098 cases and produces a slightly different q (0.5596 vs 0.5349) but the same clinical recommendation (4 cassettes)
  • Detection tracking is available for 76.7% of microscopic-only cases (46/60)
  • The pathsampling method assumes each cassette is an independent sample with the same detection probability q (geometric distribution)

File Size Reference

  • Large RDS files (~1.1MB each): pathsampling_*_results.rds - These contain full bootstrap distributions
  • Small files: CSV summaries, analysis scripts
  • Medium files: Excel/CSV data files (~50-200KB)