3 Materials and Methods

3.1 Study Cohort

Our study included 1098 patients who underwent omental sampling for a gynecologic malignancy at our department between 2014 and 2023; of these, 1071 had a carcinoma and 27 had a serous borderline tumor. All cases were evaluated for age, primary tumor location, histologic subtype, chemotherapy status, presence of macroscopic omental disease, presence of a microscopically detected (occult) omental focus (defined as <0.5 cm; see Introduction), the number of blocks sampled, and the block number in which the focus was first detected. Survival data were not included in this study.

In all cases, omental sampling was performed using a combination of inspection and palpation. For every case, the omentum submitted for histologic review had been described as grossly unremarkable, without a discrete mass or nodule; in the cases with an occult focus, the archived gross descriptions were re-checked to confirm that no macroscopic lesion had been recorded. For patients with an occult focus (<0.5 cm), the archived slides were retrieved and re-reviewed, and the greatest dimension of the focus was measured on the glass slide.

In addition to histologic subtype, tumors were classified by grade as low-grade (grade 1 and 2 endometrioid type), serous borderline tumor, or high-grade. Because “high-grade” encompasses several histologic types with potentially differing metastatic propensity, the actual histologic diagnoses are reported by primary site in the results. Low-grade serous carcinoma of the ovary was analyzed within the high-grade category for grade-based comparisons and is identified separately where relevant.

Our laboratory records the cassette-by-cassette origin of the omental sections and, when an occult focus is identified, the specific block in which it is first seen (the first-positive block, or “tracking data”); this documentation practice was in place throughout the 2014–2023 study period.

3.2 Statistical Analysis

3.2.1 Descriptive Statistics

Descriptive statistics were presented as mean ± standard deviation for continuous variables and as frequencies and percentages for categorical variables. The first-detection cassette number was compared between endometrial and ovarian primary tumors using the Mann-Whitney U test, because the data were right-skewed and the subgroups were small; the rank-biserial correlation was calculated as the effect size. Differences across tumor burden strata were assessed using the Kruskal-Wallis test. A p-value <0.05 was considered statistically significant.
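
To make the effect size concrete, the following is a minimal Python sketch of the rank-biserial correlation computed from the Mann-Whitney U statistic. It is an illustration only: the study's analyses were run in R, the function names are hypothetical, the cassette numbers are invented for the example, and the p-value calculation is left out.

```python
from itertools import product


def mann_whitney_u(x, y):
    """U statistic for sample x: pairs (x_i, y_j) with x_i > y_j, ties counted as 0.5."""
    u = 0.0
    for xi, yj in product(x, y):
        if xi > yj:
            u += 1.0
        elif xi == yj:
            u += 0.5
    return u


def rank_biserial(x, y):
    """Rank-biserial correlation r = 2U / (n1 * n2) - 1, bounded by -1 and 1."""
    u = mann_whitney_u(x, y)
    return 2.0 * u / (len(x) * len(y)) - 1.0


# Hypothetical first-detection cassette numbers (not study data)
endometrial = [1, 1, 2, 3, 5]
ovarian = [1, 2, 2, 4, 6, 7]

r = rank_biserial(endometrial, ovarian)
print(f"rank-biserial r = {r:.3f}")
```

A negative value here would indicate that the first group tends to have lower first-detection cassette numbers than the second; swapping the two groups flips the sign.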

3.2.2 Detection Probability Estimation

The per-cassette detection probability ($q$) was estimated using two complementary approaches to ensure robustness of our findings:

1. Geometric Maximum Likelihood Estimation (MLE): The primary analysis estimated $q$ as the reciprocal of the mean first-detection cassette number ($\hat{q} = 1/\bar{k}$), where $\bar{k}$ is the mean cassette number at which tumor was first identified. This approach models the first-detection cassette number as a geometric random variable, so that the probability of first detecting tumor in cassette $k$ is:

\[P(k) = (1-q)^{k-1} \, q, \qquad k = 1, 2, \ldots\]

2. Empirical Proportion: As a validation analysis, we calculated the overall proportion of cassettes containing tumor across all examined cassettes (positive cassettes / total cassettes examined). This empirical proportion provides an estimate of $q$ independent of first-detection modeling and uses all available cassette data rather than only first-detection information.
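
The two estimators described above can be sketched in a few lines of Python. This is a hedged illustration rather than the study code (which was written in R): the function names and the per-case counts are assumptions made for the example.

```python
def geometric_mle(first_positive):
    """Geometric MLE of q: reciprocal of the mean first-positive cassette number."""
    mean_k = sum(first_positive) / len(first_positive)
    return 1.0 / mean_k


def empirical_proportion(positive_blocks, total_blocks):
    """Pooled proportion: all positive cassettes divided by all cassettes examined."""
    return sum(positive_blocks) / sum(total_blocks)


def geometric_pmf(k, q):
    """Probability that the first positive cassette is cassette k (k = 1, 2, ...)."""
    return (1.0 - q) ** (k - 1) * q


# Hypothetical per-case data (not study data)
first_positive = [1, 1, 2, 3, 3]    # block in which the focus was first seen
positive_blocks = [2, 1, 3, 1, 1]   # number of positive blocks per case
total_blocks = [4, 4, 5, 4, 3]      # number of blocks examined per case

q_mle = geometric_mle(first_positive)
q_emp = empirical_proportion(positive_blocks, total_blocks)
print(f"geometric MLE q = {q_mle:.3f}, empirical q = {q_emp:.3f}")
```

The first estimator uses only the position of the first positive block, whereas the second pools every examined block, which is why the two can differ on the same cases.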

3.2.3 Cumulative Detection Probability

For both approaches, cumulative detection probabilities for sampling $n$ cassettes were calculated as:

\[P(\text{detect} \leq n) = 1 - (1-q)^n\]

This formula gives the probability of detecting at least one positive cassette when $n$ cassettes are examined, assuming that cassettes are independent and that each has the same detection probability $q$.
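
As a worked illustration of the cumulative formula, the Python sketch below evaluates it for a given $q$ and searches for the smallest number of cassettes that reaches a target detection probability. The function names, the value $q = 0.4$, and the 95% target are assumptions chosen for the example, not study results.

```python
def cumulative_detection(q, n):
    """Probability of at least one positive cassette among n: 1 - (1 - q)^n."""
    return 1.0 - (1.0 - q) ** n


def cassettes_needed(q, target):
    """Smallest n for which the cumulative detection probability reaches target."""
    if not 0.0 < q < 1.0:
        raise ValueError("q must lie strictly between 0 and 1")
    if not 0.0 < target < 1.0:
        raise ValueError("target must lie strictly between 0 and 1")
    n = 1
    while cumulative_detection(q, n) < target:
        n += 1
    return n


# Hypothetical per-cassette detection probability (not a study estimate)
q = 0.4
for n in range(1, 7):
    print(n, round(cumulative_detection(q, n), 4))
print("cassettes needed for 95%:", cassettes_needed(q, 0.95))
```

The search is done by direct iteration rather than by inverting the formula with logarithms, which avoids off-by-one results from floating-point rounding when the target is met exactly.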

3.2.4 Confidence Intervals and Bootstrap Methodology

Bootstrap resampling with 10,000 iterations was performed to estimate 95% confidence intervals for detection probabilities. For the geometric MLE approach, cases were resampled with replacement, and $q$ was recalculated for each bootstrap sample. For the empirical proportion approach, bootstrap samples were used to recalculate the positivity rate and subsequent cumulative detection probabilities. The 2.5th and 97.5th percentiles of the bootstrap distribution defined the confidence intervals using the percentile method. A fixed random seed (42) was used for reproducibility.
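
The case-level percentile bootstrap for the geometric MLE can be sketched as follows. This is a minimal Python illustration under stated assumptions: the function names and the input data are hypothetical, the percentile uses linear interpolation (the interpolation rule is not specified in the text), and a seed of 42 in Python's generator will not reproduce the random stream of the original R analysis.

```python
import math
import random


def percentile(sorted_values, p):
    """Percentile (0-100) of an already sorted list, using linear interpolation."""
    pos = (len(sorted_values) - 1) * p / 100.0
    lower = math.floor(pos)
    upper = math.ceil(pos)
    frac = pos - lower
    return sorted_values[lower] * (1.0 - frac) + sorted_values[upper] * frac


def bootstrap_ci_geometric(first_positive, n_boot=10_000, seed=42, level=0.95):
    """Percentile bootstrap CI for q = 1 / mean(first-positive cassette number)."""
    rng = random.Random(seed)          # fixed seed for reproducibility
    n = len(first_positive)
    estimates = []
    for _ in range(n_boot):
        resample = rng.choices(first_positive, k=n)   # cases drawn with replacement
        estimates.append(n / sum(resample))           # 1 / mean of the resample
    estimates.sort()
    tail = (1.0 - level) / 2.0 * 100.0
    return percentile(estimates, tail), percentile(estimates, 100.0 - tail)


# Hypothetical first-positive cassette numbers (not study data)
first_positive = [1, 1, 2, 3, 3, 1, 2, 5, 1, 4]
q_hat = len(first_positive) / sum(first_positive)
ci_low, ci_high = bootstrap_ci_geometric(first_positive)
print(f"q = {q_hat:.3f}, 95% CI {ci_low:.3f} to {ci_high:.3f}")
```

Resampling whole cases, rather than individual cassettes, keeps each patient's data together, which matches the description of the geometric MLE bootstrap above.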

3.2.5 Heterogeneity Assessment

Heterogeneity across cases was assessed using the coefficient of variation (CV) of per-case positivity rates (number of positive blocks / total blocks examined). CV < 0.30 was considered low heterogeneity, 0.30-0.60 moderate, and >0.60 high heterogeneity.
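
The heterogeneity measure and its cut-offs can be illustrated with a short Python sketch. The function names and the example counts are hypothetical, and the use of the sample standard deviation (n - 1 denominator) is an assumption, since the text does not state which form was used.

```python
import statistics


def coefficient_of_variation(rates):
    """CV = standard deviation / mean (sample SD with n - 1 denominator, assumed)."""
    return statistics.stdev(rates) / statistics.mean(rates)


def classify_heterogeneity(cv):
    """Apply the cut-offs from the text: <0.30 low, 0.30-0.60 moderate, >0.60 high."""
    if cv < 0.30:
        return "low"
    if cv <= 0.60:
        return "moderate"
    return "high"


# Hypothetical per-case counts (not study data)
positive_blocks = [2, 1, 3, 1, 1]
total_blocks = [4, 4, 5, 4, 3]
rates = [p / t for p, t in zip(positive_blocks, total_blocks)]

cv = coefficient_of_variation(rates)
print(f"CV = {cv:.2f} ({classify_heterogeneity(cv)} heterogeneity)")
```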

3.2.6 Stratified Analysis

Cases were stratified by tumor burden, defined as the number of blocks containing tumor: low burden (1 block), medium burden (2-3 blocks), and high burden (4+ blocks). Detection characteristics were compared across strata to assess heterogeneity in detection probability.
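
The burden strata defined above map directly onto a small classification rule. The Python sketch below is illustrative only; the function names and the example cases are hypothetical.

```python
from collections import defaultdict


def burden_stratum(n_positive_blocks):
    """Tumor burden stratum: 1 block = low, 2-3 blocks = medium, 4+ blocks = high."""
    if n_positive_blocks < 1:
        raise ValueError("a case with an occult focus has at least one positive block")
    if n_positive_blocks == 1:
        return "low"
    if n_positive_blocks <= 3:
        return "medium"
    return "high"


def group_by_stratum(cases):
    """Group first-positive block numbers by stratum.

    cases: iterable of (n_positive_blocks, first_positive_block) tuples.
    """
    groups = defaultdict(list)
    for n_positive, first_positive in cases:
        groups[burden_stratum(n_positive)].append(first_positive)
    return dict(groups)


# Hypothetical cases (not study data)
cases = [(1, 3), (1, 5), (2, 1), (3, 2), (4, 1), (6, 1)]
groups = group_by_stratum(cases)
for stratum, first_blocks in groups.items():
    print(stratum, first_blocks)
```

Grouping the first-positive block numbers in this way yields the per-stratum samples that a rank-based comparison across strata would take as input.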

3.2.7 Model Assumptions

The geometric probability model assumes: (1) independence: each cassette represents an independent sample from the omentum; (2) constant probability: all cassettes have the same detection probability $q$; and (3) binary outcome: each cassette either contains detectable tumor or does not. These assumptions are reasonable given our sampling protocol, in which cassettes were obtained from different omental regions. Critically, although the underlying disease distribution within the omentum may not be uniform, the sampling process itself is effectively independent: because the omentum appears grossly normal, the pathologist has no visual cue to preferentially sample specific regions. Each cassette is therefore a blind draw from a macroscopically homogeneous specimen. The independence assumption thus applies to the sampling process, not to the underlying disease distribution.

We assessed the extent of omental involvement by quantifying the proportion of cases with tumor in multiple blocks. As a sensitivity analysis, we also fitted a beta-binomial model using the method of moments to account for inter-case heterogeneity in detection probability.
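
To illustrate how a beta-binomial sensitivity analysis relaxes the constant-probability assumption, the Python sketch below lets $q$ vary between cases according to a Beta($\alpha$, $\beta$) distribution. It is a simplified illustration and not the study's estimator: the function names are hypothetical, and the moment matching is applied directly to per-case positivity rates, ignoring the binomial sampling noise within each case. Under this model the probability of missing tumor in all $n$ cassettes is $\prod_{j=0}^{n-1} (\beta + j)/(\alpha + \beta + j)$.

```python
import statistics


def beta_moments(rates):
    """Moment-matched Beta(alpha, beta) for per-case rates (simplified assumption)."""
    m = statistics.mean(rates)
    v = statistics.variance(rates)
    if v <= 0.0 or v >= m * (1.0 - m):
        raise ValueError("variance is incompatible with a beta distribution")
    s = m * (1.0 - m) / v - 1.0       # s estimates alpha + beta
    return m * s, (1.0 - m) * s


def beta_binomial_detection(alpha, beta, n):
    """P(at least one positive in n cassettes) when q ~ Beta(alpha, beta)."""
    miss = 1.0
    for j in range(n):
        miss *= (beta + j) / (alpha + beta + j)
    return 1.0 - miss


# Hypothetical per-case positivity rates (not study data)
rates = [0.25, 0.75]
alpha, beta = beta_moments(rates)
mean_q = alpha / (alpha + beta)
for n in (1, 3, 5):
    fixed = 1.0 - (1.0 - mean_q) ** n
    mixed = beta_binomial_detection(alpha, beta, n)
    print(n, round(fixed, 4), round(mixed, 4))
```

For the same mean $q$, the heterogeneous model never gives a higher cumulative detection probability than the constant-$q$ formula, so comparing the two shows how much inter-case variation could reduce the detection expected from a given number of cassettes.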

3.2.8 Statistical Software

All statistical analyses were performed using R software (version 4.6.0). Custom functions were developed for pathsampling analysis, bootstrap simulation, and heterogeneity assessment.

3.3 Ethical Approval

This study was approved by the institutional ethics committee (Hacettepe University, SBA 24/523). Due to the retrospective nature of the study, the requirement for informed consent was waived.
