CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
Project Overview
This is a pathology research project analyzing omentum metastasis detection in gynecological malignancies. The codebase implements statistical pathsampling analysis to determine optimal tissue sampling protocols for detecting microscopic-only metastases in grossly normal omentum.
Key Finding: Sample 4 cassettes from omentum to achieve >95% detection of metastases (validated by 3 independent methods).
Project Structure
Core Analysis Pipeline
- Data Preparation → omentum_recode.R
- Pathsampling Analysis → omentum_pathsampling_run.R
- Report Generation → omentum_pathsampling_analysis.qmd
- Validation → compare_jamovi_vs_qmd.R
Data Files
- Input: omentum_04_11_2025.xlsx (1,098 cases - raw data)
- Processed:
  - omentum.xlsx - Clean data for Quarto reports
  - omentum_recoded.csv - Full recoded dataset with analysis variables
  - omentum_recoded.RData - R workspace with analysis subsets
Key Analysis Subsets (in RData file)
- omentum_analysis - Full dataset with derived variables
- micro_tracked - Microscopic-only cases with detection tracking (n=46)
- abundant_tracked - Abundant/obvious tumor cases with tracking (n=15)
Development Commands
R Analysis Workflow
# 1. Prepare data (required first)
source("omentum_recode.R")
# 2. Run pathsampling analysis
source("omentum_pathsampling_run.R")
# 3. Generate report
quarto::quarto_render("omentum_pathsampling_analysis.qmd")
# 4. Validate results
source("compare_jamovi_vs_qmd.R")
Quarto Website
# Preview website locally
quarto preview
# Render entire website
quarto render
# Render specific document
quarto render omentum_pathsampling_analysis.qmd
Working with Results
# Load analysis results
result <- readRDS("pathsampling_microscopic_results.rds")
# Load prepared data
load("omentum_recoded.RData")
# Access subsets
str(micro_tracked) # Primary analysis group
str(abundant_tracked) # Comparison group
Code Architecture
1. Data Recoding Script (omentum_recode.R)
Purpose: Transform raw Turkish-language pathology data into analysis-ready format
Key Operations:
- Recodes Turkish variables to English
- Creates comprehensive tumor classification (tumor_category)
- Calculates metastatic block distribution metrics
- Generates analysis subsets
Critical Variables Created:
- tumor_category - 4-level classification:
  - “Microscopic-Only” - Grossly normal with occult metastases (TARGET GROUP)
  - “Abundant/Obvious” - Visible macroscopic tumor
  - “Small Visible Only” - Only macroscopic tumor
  - “No Tumor” - No tumor found
- metastasis_proportion - Proportion of cassettes with metastasis
- tumor_burden - Categorized burden (Single focus/Low/Moderate/High)
- has_detection_tracking - Flag for cases with sequential detection data
Outputs:
- omentum.xlsx - Basic recoded data for QMD analysis
- omentum_recoded.csv - Full dataset with all derived variables
- omentum_recoded.RData - R workspace with micro_tracked and abundant_tracked subsets
2. Pathsampling Analysis Runner (omentum_pathsampling_run.R)
Purpose: Run ClinicoPath pathsampling function on prepared data
Dependencies:
- Requires omentum_recoded.RData (run omentum_recode.R first)
- Requires ClinicoPath jamovi module at: /Users/serdarbalci/Documents/GitHub/ClinicoPathJamoviModule
Workflow:
1. Loads prepared data
2. Loads ClinicoPath package from local dev directory
3. Runs pathsampling on microscopic-only cases
4. Optionally runs on abundant cases (if n ≥ 5)
5. Performs comparative analysis
Key Parameters:
- firstDetection = "first_cassette_tumor_identified" - Sequential detection variable
- targetConfidence = 0.95 - 95% detection target
- maxSamples = 15 - Maximum cassettes to evaluate
- bootstrapIterations = 10000 - For confidence intervals
- seedValue = 42 - Reproducibility
3. Quarto Analysis Document (omentum_pathsampling_analysis.qmd)
Purpose: Comprehensive reproducible research document with dynamic calculations
Architecture:
- Loads data directly from omentum.xlsx
- Performs all recoding inline
- Implements pathsampling using geometric probability model
- Generates publication-ready figures and tables
Key Statistical Methods:
- Geometric MLE: q = 1 / mean(first_detection)
- Optimized MLE: Uses optimize() with negative log-likelihood
- Bootstrap CI: 10,000 iterations for robust confidence intervals
- Cumulative Detection: P(detect ≤ n) = 1 - (1-q)^n
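The estimators listed above can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the QMD's actual code: the `first_detection` vector is made-up example data, and the helper names `neg_loglik` and `cumulative_detection` are hypothetical.

```r
# Hypothetical example data: cassette number at which tumor was first seen
first_detection <- c(1, 1, 2, 1, 3, 2, 1, 4, 1, 2)

# Closed-form geometric MLE: q = 1 / mean(first_detection)
q_closed <- 1 / mean(first_detection)

# Negative log-likelihood of the geometric model P(k) = (1 - q)^(k - 1) * q
neg_loglik <- function(q, k) {
  -sum((k - 1) * log(1 - q) + log(q))
}

# Numerical MLE via optimize(); it should agree with the closed form
# up to the optimizer's tolerance
q_optim <- optimize(neg_loglik, interval = c(0.001, 0.999),
                    k = first_detection)$minimum

# Cumulative detection probability P(detect <= n) = 1 - (1 - q)^n
cumulative_detection <- function(q, n) 1 - (1 - q)^n
```

Running both estimators side by side is a cheap consistency check: a noticeable gap between `q_closed` and `q_optim` would point to a data or likelihood coding problem rather than a modelling difference.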
Important: This QMD file is self-contained and does NOT depend on the R scripts. It loads raw data and performs all analyses independently.
4. Validation Script (compare_jamovi_vs_qmd.R)
Purpose: Compare results from 3 independent analysis methods
Methods Compared:
1. Jamovi GUI analysis (all 1,098 cases)
2. R pathsampling function (46 microscopic-only)
3. QMD manual analysis (46 microscopic-only)
Validation: All methods recommend 4 cassettes for >95% detection
Key Domain Concepts
Pathsampling Analysis
Purpose: Determine optimal number of tissue cassettes to achieve target detection sensitivity
Model: Geometric probability where:
- q = probability of detecting tumor in any single cassette
- First detection at cassette k: P(k) = (1-q)^(k-1) × q
- MLE estimate: q = 1 / mean(first_detection)
Clinical Application:
- If q = 0.5349 (53.5% per cassette)
- Then 4 cassettes → 95.3% detection
- Recommendation: Sample 4 cassettes from omentum
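The arithmetic behind the recommendation can be reproduced with a few lines of base R. This is a sketch only: `q` is the value quoted above, `max_cassettes` mirrors the maxSamples setting, and `min_cassettes` is a hypothetical helper, not a ClinicoPath function.

```r
# Per-cassette detection probability reported for the microscopic-only group
q <- 0.5349

# Cumulative detection probability for 1..max_cassettes cassettes
max_cassettes <- 15
cum_detect <- 1 - (1 - q)^(1:max_cassettes)

# Smallest number of cassettes whose cumulative detection reaches the target
# (hypothetical helper; returns NA if the target is never reached)
min_cassettes <- function(cum_detect, target) {
  which(cum_detect >= target)[1]
}

round(cum_detect[1:4], 3)
min_cassettes(cum_detect, 0.95)
```

With these inputs the helper returns 4 for a 0.95 target, because three cassettes give roughly 89.9% cumulative detection and four give roughly 95.3%.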
Microscopic-Only Metastases
Definition: Grossly normal omentum with occult metastases found only microscopically
Clinical Significance:
- Occurs in 5.5% of cases
- Not visible at surgery
- Upstages disease
- Impacts treatment decisions
Detection Characteristics (from this dataset):
- Mean first detection: 1.87 cassettes
- Detection probability: q = 0.5349
- Mean blocks with metastasis: 3.63
- Mean proportion involved: 63%
Tumor Categories in Dataset
Classification Logic (macro/micro presence):
- No Tumor: macro=Absent, micro=Absent
- Microscopic-Only: macro=Absent, micro=Present (PRIMARY ANALYSIS GROUP)
- Small Visible Only: macro=Present, micro=Absent
- Abundant/Obvious: macro=Present, micro=Present
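The macro/micro mapping above can be sketched in base R as follows. The toy data frame and the `classify_tumor` helper are hypothetical and only illustrate the four-way rule; the actual recoding lives in omentum_recode.R and may be written differently.

```r
# Hypothetical toy data using the recoded Present/Absent variables
cases <- data.frame(
  macroscopic_tumor = c("Absent", "Absent", "Present", "Present"),
  microscopic_tumor = c("Absent", "Present", "Absent", "Present"),
  stringsAsFactors = FALSE
)

# Map each macro/micro combination to the 4-level tumor_category;
# any other combination (including missing values) yields NA
classify_tumor <- function(macro, micro) {
  ifelse(macro == "Absent" & micro == "Absent", "No Tumor",
  ifelse(macro == "Absent" & micro == "Present", "Microscopic-Only",
  ifelse(macro == "Present" & micro == "Absent", "Small Visible Only",
  ifelse(macro == "Present" & micro == "Present", "Abundant/Obvious",
         NA_character_))))
}

cases$tumor_category <- classify_tumor(cases$macroscopic_tumor,
                                       cases$microscopic_tumor)
```

Tabulating the result with `table(cases$tumor_category, useNA = "ifany")` makes unexpected or missing combinations visible immediately.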
Critical Variables Reference
Detection Variables
- first_cassette_tumor_identified - Which cassette tumor was first seen (1-15)
- cassette_number - Total cassettes examined
- total_cassettes_with_metastasis - Number of blocks with tumor
- cassettes_with_metastasis - Which specific cassettes have tumor
Classification Variables
- macroscopic_tumor - Present/Absent (visible at gross exam)
- microscopic_tumor - Present/Absent (visible microscopically)
- tumor_category - Comprehensive classification (see above)
- has_detection_tracking - TRUE if first_cassette_tumor_identified is not NA
Clinical Variables
- Location - Primary tumor site (Endometrium/Ovary/Synchronous)
- TumorType - Tumor grade (Low/High/Borderline)
- metastasis_size_cm - Size of metastasis in cm
- stage_changed - Yes/No (did omentum finding change stage)
Derived Metrics
- metastasis_proportion - total_cassettes_with_metastasis / cassette_number
- tumor_burden - Categorical (None/Single focus/Low/Moderate/High)
- metastasis_extent - Categorical (Focal/Moderate/Extensive/Diffuse)
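Two of the derived variables have definitions stated explicitly in this reference and can be sketched directly. The toy rows below are hypothetical; the cut points for tumor_burden and metastasis_extent are not documented here, so they are deliberately left out.

```r
# Hypothetical toy rows; column names follow the variable reference above
cases <- data.frame(
  cassette_number = c(10, 8, 12),
  total_cassettes_with_metastasis = c(5, 2, 0),
  first_cassette_tumor_identified = c(1, 3, NA)
)

# Proportion of examined cassettes that contain metastasis
cases$metastasis_proportion <-
  cases$total_cassettes_with_metastasis / cases$cassette_number

# TRUE when the sequential detection variable was recorded
cases$has_detection_tracking <- !is.na(cases$first_cassette_tumor_identified)
```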
External Dependencies
ClinicoPath Package
Location: /Users/serdarbalci/Documents/GitHub/ClinicoPathJamoviModule
Loading Pattern:
setwd("/Users/serdarbalci/Documents/GitHub/ClinicoPathJamoviModule")
jmvtools::prepare()
devtools::document()
devtools::load_all()
setwd("/Users/serdarbalci/Desktop/omentum") # Return to project
Usage: Provides pathsampling() function for statistical analysis
Alternative: QMD file implements pathsampling manually without this dependency
Common Workflows
Adding New Cases to Dataset
1. Update omentum_04_11_2025.xlsx with new cases
2. Run complete pipeline:
source("omentum_recode.R")
source("omentum_pathsampling_run.R")
quarto::quarto_render("omentum_pathsampling_analysis.qmd")
Modifying Analysis Parameters
Change target confidence (e.g., 90% instead of 95%):
- Edit targetConfidence = 0.90 in omentum_pathsampling_run.R
- Edit target_confidence <- 0.90 in QMD file
Change bootstrap iterations:
- Edit bootstrapIterations parameter
- Edit n_boot variable in QMD
Change max cassettes evaluated:
- Edit maxSamples parameter
- Edit max_cassettes in QMD cumulative probability calculation
Generating Website
The project uses Quarto website format (defined in _quarto.yml):
project:
  type: website
website:
  title: "Omentum Sampling"
  navbar:
    left:
      - href: index.qmd
        text: Home
      - href: omentum_pathsampling_analysis.qmd
        text: Omentum Path Sampling Analysis
      - href: omentum_analysis_in_jamovi.html
        text: Omentum Analysis in Jamovi
Preview with: quarto preview
Debugging Analysis Issues
If pathsampling function not found:
# Ensure ClinicoPath package is loaded
setwd("/Users/serdarbalci/Documents/GitHub/ClinicoPathJamoviModule")
jmvtools::prepare()
devtools::document()
devtools::load_all()
If data subsets are empty:
# Check tumor category distribution
load("omentum_recoded.RData")
table(omentum_analysis$tumor_category)
table(omentum_analysis$has_detection_tracking)
If QMD rendering fails:
- Check that omentum.xlsx exists in project root
- Verify all required R packages are installed (tidyverse, readxl, knitr, kableExtra, ggplot2, patchwork)
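Both rendering checks can be scripted. The sketch below is one possible way to do it in base R; `missing_packages` is a hypothetical helper, and the package list is copied from the note above.

```r
# Packages the QMD is described as needing
required <- c("tidyverse", "readxl", "knitr", "kableExtra",
              "ggplot2", "patchwork")

# Return the subset of packages that cannot be found in the library paths
# (system.file() returns "" for a package that is not installed)
missing_packages <- function(pkgs) {
  found <- vapply(pkgs, function(p) nzchar(system.file(package = p)),
                  logical(1))
  pkgs[!found]
}

missing_packages(required)    # character(0) means nothing is missing
file.exists("omentum.xlsx")   # run from the project root
```

Anything the helper returns can be passed straight to `install.packages()`.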
Statistical Reproducibility
Random Seed: All analyses use set.seed(42) for reproducibility
Bootstrap Iterations: 10,000 iterations (consensus across all methods)
Confidence Intervals: 95% CI using 2.5th and 97.5th percentiles of bootstrap distribution
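A percentile bootstrap of this kind can be sketched as follows. The data vector is made up for illustration, and the resampling scheme (cases resampled with replacement, q re-estimated as 1 / mean) is an assumption about how the interval is built rather than a copy of the project code; `n_boot` follows the QMD variable name mentioned earlier.

```r
set.seed(42)  # same seed value the project reports using

# Hypothetical example data: cassette number of first detection per case
first_detection <- c(1, 1, 2, 1, 3, 2, 1, 4, 1, 2)
n_boot <- 10000

# Resample cases with replacement and re-estimate q = 1 / mean each time
boot_q <- replicate(n_boot, {
  resample <- sample(first_detection, replace = TRUE)
  1 / mean(resample)
})

# 95% percentile interval: 2.5th and 97.5th percentiles of the bootstrap draws
ci <- quantile(boot_q, probs = c(0.025, 0.975))
```

Because the seed is fixed before resampling, rerunning the block reproduces the same interval, which is what makes results comparable across the three methods.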
Important Notes
- The project contains both Turkish and English variable names. The recoding script handles the translation.
- All analyses focus on the “Microscopic-Only” subset (n=46) as the primary target group.
- The Jamovi analysis uses all 1,098 cases and produces a slightly different q (0.5596 vs 0.5349) but the same clinical recommendation (4 cassettes).
- Detection tracking is available for 76.7% of microscopic-only cases (46/60)
- The pathsampling method assumes independent sampling (geometric distribution)
File Size Reference
- Large RDS files (~1.1MB each): pathsampling_*_results.rds - These contain full bootstrap distributions
- Medium files: Excel/CSV data files (~50-200KB)
- Small files: CSV summaries, analysis scripts