● AI好医生 · Clinical Research Lab

Labs

Labs is the clinical-research lab of AI好医生: a set of fill-in-and-get-results statistical tools that clinicians can use on the spot. All computation runs locally in your browser; your data is never uploaded or stored.

● Labs flagship

ResFlow · end-to-end clinical research →

From question to paper, one statistical thread. The Project workbench lets one dataset run through modelling, internal/external validation, nomogram / survival / calibration, to a paper-ready report. Free, mobile-friendly, data stays on your device.

Research tools

Ready-to-use online calculators · computed locally, never uploaded

Paired χ² — McNemar / Cochran's Q

Compare paired binary outcomes. For two paired conditions (same subjects, two tests or before/after), use McNemar's test (with continuity correction and exact binomial versions); for ≥3 paired conditions, use Cochran's Q. The ordinary chi-square test assumes independent observations and is not valid for paired data.
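
As a rough illustration of the two-condition case, the sketch below computes McNemar's continuity-corrected statistic and the exact binomial p-value from the two discordant-pair counts. The function name `mcnemar` and the inputs `b` and `c` are illustrative assumptions; this is not the tool's own code.

```python
import math

def mcnemar(b: int, c: int) -> dict:
    """b, c: discordant pairs (condition 1 positive only, condition 2 positive only)."""
    n = b + c
    if n == 0:
        raise ValueError("no discordant pairs")
    # Continuity-corrected chi-square statistic, 1 degree of freedom
    chi2 = max(abs(b - c) - 1, 0) ** 2 / n
    # Upper tail of chi-square(1): P(X > x) = erfc(sqrt(x / 2))
    p_chi2 = math.erfc(math.sqrt(chi2 / 2))
    # Exact two-sided binomial test with success probability 0.5
    k = min(b, c)
    tail = sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n
    p_exact = min(1.0, 2 * tail)
    return {"chi2": chi2, "p_chi2": p_chi2, "p_exact": p_exact}

result = mcnemar(10, 2)
```

Only the discordant pairs enter the test; concordant pairs carry no information about a difference between the two conditions.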

Effect Size (d / η² / converter)

Two group means → Cohen's d and Hedges' g (with 95% CI); ANOVA F → η²/partial η²; plus d↔r↔OR conversion. P-values indicate whether a difference is statistically detectable; effect sizes say how large it is, which papers and meta-analyses require.
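
A minimal sketch of these conversions, using the standard textbook formulas. The function names are illustrative, the d→r conversion assumes equal group sizes, and the d→OR conversion uses the logistic approximation ln(OR) = d·π/√3.

```python
import math

def cohens_d(m1, s1, n1, m2, s2, n2):
    # Pooled standard deviation of the two groups
    sp = math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
    return (m1 - m2) / sp

def hedges_g(d, n1, n2):
    # Small-sample bias correction (common approximation)
    return d * (1 - 3 / (4 * (n1 + n2) - 9))

def d_to_r(d):
    # Assumes equal group sizes
    return d / math.sqrt(d * d + 4)

def d_to_or(d):
    # Logistic-distribution approximation
    return math.exp(d * math.pi / math.sqrt(3))

d = cohens_d(10, 2, 20, 8, 2, 20)
g = hedges_g(d, 20, 20)
```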

Equivalence / Non-inferiority (TOST)

Show that two groups are practically equivalent or that one is not inferior: pre-set an equivalence margin ±Δ, then run two one-sided tests (TOST). Gives the TOST p-value, the (1−2α) confidence interval and an equivalence/non-inferiority verdict. A non-significant difference (p > 0.05) does not demonstrate equivalence.

Measurement Error — SEM / MDC

Standard error of measurement (SEM) and minimal detectable change (MDC): how large an individual change must be to exceed measurement error and count as real. From SD+reliability (ICC) or test–retest paired data. Common for scales/imaging measurements and follow-up.
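
For the SD + reliability route, the usual formulas are SEM = SD × √(1 − ICC) and MDC95 = 1.96 × √2 × SEM. The sketch below applies them; the function name `sem_mdc` is an illustrative assumption, and the test-retest route is not shown.

```python
import math

def sem_mdc(sd: float, icc: float, z: float = 1.96):
    """Return (SEM, MDC) from a between-subject SD and a reliability coefficient."""
    if not 0 <= icc <= 1:
        raise ValueError("ICC must lie in [0, 1]")
    sem = sd * math.sqrt(1 - icc)
    # sqrt(2) reflects measurement error in both of the two measurements compared
    mdc = z * math.sqrt(2) * sem
    return sem, mdc

sem, mdc95 = sem_mdc(sd=10, icc=0.91)
```

With SD = 10 and ICC = 0.91, SEM is 3 and MDC95 is roughly 8.3, so an individual change smaller than about 8.3 units cannot be distinguished from measurement noise at the 95% level.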

Cluster-Randomized Sample Size (Design Effect)

For cluster randomization (randomizing by ward/community/hospital), inflate the sample size by the design effect DE = 1 + (m − 1) × ICC, where m is the average cluster size and ICC the intracluster correlation. From the individually randomized sample size (computable here for two proportions or two means), derive the total subjects and number of clusters needed.
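
The inflation step can be sketched as follows, assuming equal cluster sizes. The function name and the rounding choices are illustrative assumptions, not the tool's own behaviour.

```python
import math

def cluster_sample_size(n_individual: int, m: int, icc: float):
    """Inflate an individually randomized sample size by the design effect."""
    de = 1 + (m - 1) * icc
    # Small tolerance guards against floating-point noise before rounding up
    total = math.ceil(n_individual * de - 1e-9)
    clusters = math.ceil(total / m)  # total clusters across all arms
    return de, total, clusters

de, total, clusters = cluster_sample_size(n_individual=200, m=21, icc=0.05)
```

Here 200 individually randomized subjects become 400 with clusters of 21 and ICC = 0.05, which requires 20 clusters in total.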

Incidence Rate & Rate Ratio (IRR)

Outcomes per person-time (person-years/months): one group gives incidence rate with an exact Poisson CI; two groups give the incidence rate ratio IRR (Wald log CI + exact conditional CI). Suits cohorts, adverse-event rates, recurrence rates.
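
A minimal sketch of the two-group case with the Wald interval on the log scale, where SE(ln IRR) = √(1/a + 1/b) for event counts a and b. The function name is illustrative, and the exact Poisson and conditional intervals mentioned above are not reproduced here.

```python
import math

def rate_ratio(events1, time1, events0, time0, z=1.96):
    """Incidence rate ratio with a Wald confidence interval on the log scale."""
    irr = (events1 / time1) / (events0 / time0)
    se_log = math.sqrt(1 / events1 + 1 / events0)
    lo = math.exp(math.log(irr) - z * se_log)
    hi = math.exp(math.log(irr) + z * se_log)
    return irr, lo, hi

# 30 events in 1000 person-years vs 15 events in 1000 person-years
irr, lo, hi = rate_ratio(30, 1000, 15, 1000)
```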

Ready · ⚗️ Frontier

Time-dependent ROC / AUC(t)

Discrimination of a prognostic marker/model across follow-up time: time-dependent AUC(t) using cumulative-case/dynamic-control definitions plus IPCW (inverse probability of censoring weighting) to handle censoring; gives AUC at each time point and the curve.

Ready · ⚗️ Frontier

Diagnostic Test Meta-analysis (SROC)

Pool multiple diagnostic-accuracy studies: random-effects pooled sensitivity/specificity, diagnostic odds ratio DOR, and a Moses–Littenberg SROC curve showing threshold effects. Enter TP/FP/FN/TN per study.

Statistical Charts (incl. 3D)

Paste data and get a figure: scatter/line/bar/histogram/box plots, plus drag-to-rotate 3D scatter and 3D surface. Supports grouped coloring, titles/axis labels, gridlines; export SVG vectors and 300/600 dpi PNGs. General-purpose plotting ready for paper figures.

Ready · ⚗️ Frontier

Prediction-Model Sample Size (Riley)

Minimum sample size to develop a clinical prediction model with a binary outcome: the three Riley 2019 criteria (shrinkage factor S ≥ 0.9, difference between apparent and adjusted Nagelkerke R² ≤ 0.05, precise estimate of the overall outcome risk), taking the maximum; gives required n, events and events per parameter (EPP). Replaces the EPV = 10 rule of thumb.

Ready · ⚗️ Frontier

Win Ratio

Hierarchical composite-endpoint win ratio (Pocock 2012): compare time-to-event endpoints in priority order, then a secondary continuous measure; every treatment-control pair is scored as a win, loss or tie, giving the Win Ratio, Win Odds, Net Benefit and a bootstrap 95% CI. Common in heart-failure/cardiovascular trials.

Ready · ⚗️ Frontier

Bayesian Proportion Inference

Beta–binomial conjugate: one group gives posterior mean/median/95% credible interval and P(p>threshold); two groups give P(group1>group2) plus posterior risk difference/risk ratio/OR. Supports Jeffreys/uniform priors — more intuitive than p-values, suited to early/adaptive trials.

E-value — Unmeasured Confounding

Sensitivity analysis for observational studies: convert RR/OR/HR to the risk-ratio scale and compute how strong an unmeasured confounder (RR) would have to be to explain away the association. Gives an E-value for the point estimate and for the CI limit. Based on VanderWeele & Ding 2017.
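
On the risk-ratio scale the E-value is RR + √(RR × (RR − 1)), applied to 1/RR when RR < 1. The sketch below covers the point estimate and the confidence limit closer to the null; the function names are illustrative, and the OR/HR-to-RR conversions for common outcomes are not shown.

```python
import math

def e_value(rr: float) -> float:
    """E-value for a risk ratio (VanderWeele & Ding formula)."""
    if rr <= 0:
        raise ValueError("risk ratio must be positive")
    if rr < 1:
        rr = 1 / rr  # work on the side above 1
    return rr + math.sqrt(rr * (rr - 1))

def e_value_ci(rr: float, lo: float, hi: float) -> float:
    """E-value for the confidence limit closer to the null."""
    if lo <= 1 <= hi:
        return 1.0  # interval already includes the null
    return e_value(lo if rr > 1 else hi)

point = e_value(2.0)
limit = e_value_ci(2.0, 1.2, 3.3)
```

For RR = 2.0 (95% CI 1.2 to 3.3), the E-values are about 3.41 for the estimate and about 1.69 for the lower limit.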

Gwet's AC1 Agreement

Agreement between two raters on nominal categories. AC1 avoids the kappa paradox (high agreement but low κ) when categories are very imbalanced, and is more robust than Cohen's κ. Reports AC1 and κ side by side with bootstrap 95% CI.
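
The kappa paradox can be seen in a small sketch that computes both coefficients from a rater-by-rater contingency table. The function name `kappa_and_ac1` is an illustrative assumption; confidence intervals are omitted.

```python
def kappa_and_ac1(table):
    """table[i][j]: subjects placed in category i by rater A and j by rater B."""
    k = len(table)
    n = sum(sum(row) for row in table)
    pa = sum(table[i][i] for i in range(k)) / n           # observed agreement
    row = [sum(table[i]) / n for i in range(k)]           # rater A marginals
    col = [sum(table[i][j] for i in range(k)) / n for j in range(k)]  # rater B
    pe_kappa = sum(r * c for r, c in zip(row, col))       # Cohen's chance agreement
    pi = [(r + c) / 2 for r, c in zip(row, col)]          # mean category prevalence
    pe_ac1 = sum(p * (1 - p) for p in pi) / (k - 1)       # Gwet's chance agreement
    kappa = (pa - pe_kappa) / (1 - pe_kappa)
    ac1 = (pa - pe_ac1) / (1 - pe_ac1)
    return kappa, ac1

# 90% raw agreement, but one category dominates
kappa, ac1 = kappa_and_ac1([[90, 5], [5, 0]])
```

In this example the raters agree on 90 of 100 subjects, yet κ is slightly negative (about −0.05) while AC1 is about 0.89.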

Multiple-testing Correction (FDR / Holm)

From a set of p-values, get Bonferroni and Holm (FWER control) adjusted p-values, Benjamini–Hochberg (FDR) and Benjamini–Yekutieli (FDR under dependence) q-values, and the number significant at each α. Essential for multi-marker/omics/multi-subgroup analyses.
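
Two of these procedures, Holm and Benjamini–Hochberg, can be sketched in a few lines. The function names are illustrative; both return adjusted values in the original input order.

```python
def holm(pvals):
    """Holm step-down adjusted p-values (FWER control)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running = 0.0
    for rank, i in enumerate(order):
        # Multiply by the number of hypotheses still in play, keep monotone
        running = max(running, min(1.0, (m - rank) * pvals[i]))
        adjusted[i] = running
    return adjusted

def benjamini_hochberg(pvals):
    """Benjamini-Hochberg step-up adjusted p-values (FDR q-values)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running = 1.0
    for rank in range(m - 1, -1, -1):
        i = order[rank]
        running = min(running, pvals[i] * m / (rank + 1))
        adjusted[i] = running
    return adjusted

p = [0.01, 0.04, 0.03, 0.005]
holm_adj = holm(p)
bh_adj = benjamini_hochberg(p)
```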

Lin's CCC (Concordance Correlation)

Agreement between two measurement methods: CCC combines precision (Pearson r) and accuracy (bias-correction factor Cb), catching systematic bias that r alone misses — complementary to Bland–Altman. Gives CCC, r, Cb with 95% CI.
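
A minimal sketch of the point estimates, using Lin's definition CCC = 2·s_xy / (s_x² + s_y² + (mean_x − mean_y)²) with 1/n moments, so that CCC = r × Cb. The function name is illustrative and the confidence intervals are not shown.

```python
import math

def lins_ccc(x, y):
    """Return (CCC, Pearson r, bias-correction factor Cb)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    # Population (1/n) variances and covariance, as in Lin's original definition
    vx = sum((a - mx) ** 2 for a in x) / n
    vy = sum((b - my) ** 2 for b in y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    ccc = 2 * cov / (vx + vy + (mx - my) ** 2)
    r = cov / math.sqrt(vx * vy)
    cb = ccc / r
    return ccc, r, cb

# Method B reads exactly 1 unit higher than method A
ccc, r, cb = lins_ccc([1, 2, 3, 4, 5], [2, 3, 4, 5, 6])
```

The constant offset leaves r at 1 but pulls CCC down to 0.8, which is the systematic bias that correlation alone misses.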

Restricted Cubic Spline (RCS)

Model the non-linear relationship between a continuous variable and an outcome with restricted cubic splines: a smooth OR/HR curve (with 95% band) and a non-linearity test, for Logistic and Cox, with covariate adjustment, plus one-click R (rms) code to reproduce.

Fine–Gray Subdistribution Hazard

Under competing risks, model covariate effects on the cumulative incidence (CIF) of the target event, giving subdistribution hazard ratios sHR. Putting group as a covariate gives a competing-risks between-group comparison (the regression analogue of Gray's test). Verified to reduce to Cox when there is no competing event.

Propensity Score (PSM / IPTW)

Control confounding in observational studies: logistic regression for the propensity score, then 1:1 caliper matching or inverse-probability weighting to balance covariates. Reports covariate balance (standardized mean difference, SMD) and a Love plot before/after, with optional outcome effect.

Advanced Sample Size (Survival/Diagnostic/Non-inferiority)

Sample size for specialized designs: survival studies (log-rank/Cox, Schoenfeld events needed), diagnostic tests (by target sensitivity/specificity precision), and non-inferiority/equivalence trials (means or rates). For plain means/rates/correlation see the Sample Size calculator.

Competing-risks CIF

When competing events exist (e.g. target endpoint vs death from other causes), 1−KM overestimates the target event's incidence because it treats competing events as censoring. The cumulative incidence function (CIF) accounts for competing events, giving each group's target-event cumulative incidence curve and estimates at time τ.

RMST (Restricted Mean Survival Time)

RMST is the area under the KM curve over [0,τ], i.e. mean survival time within τ. When proportional hazards fails and HR is hard to interpret, the RMST difference is a more robust, intuitive between-group measure; includes KM curves and a difference test.

NRI / IDI Reclassification

Incremental value of a new vs old prediction model: net reclassification improvement (NRI, continuous and categorical) and integrated discrimination improvement (IDI). When adding a new marker barely raises AUC, these quantify the reclassification and discrimination gain.

Fleiss' Kappa (multi-rater)

Agreement among 3+ raters making nominal classifications of multiple subjects. Cohen's κ is for two raters only; for more, use Fleiss' κ. Gives κ, strength of agreement, per-category proportions and expected agreement.

Cochran–Armitage Trend Test

Test whether a binary outcome rate shows a linear trend across ordered groups (e.g. increasing dose/grade). More targeted than chi-square — specifically detects a monotone dose-response — with a grouped-rate trend bar chart.

Normality / Homogeneity of Variance

Assumption checks before t-test or ANOVA: Shapiro–Wilk normality (with Q–Q plot) plus Levene and Bartlett variance-homogeneity tests, to decide parametric vs non-parametric methods. Shapiro–Wilk uses the Royston (1992) algorithm.

Model Calibration (Hosmer–Lemeshow)

Assess prediction-model calibration: Hosmer–Lemeshow goodness-of-fit, calibration plot, calibration slope/intercept (CITL) and Brier score. Discrimination (AUC) is about ranking; calibration is about whether probabilities are accurate — complementary to ROC/DCA.

Meta Funnel Plot + Egger

Assess publication bias in a meta-analysis: draw a funnel plot (effect size vs standard error) and run Egger's regression test for funnel asymmetry; an intercept significantly off zero suggests possible publication bias/small-study effects. Use alongside the Meta-analysis tool.

Fagan Nomogram (post-test probability)

Bedside evidence-based reasoning: pre-test odds × likelihood ratio = post-test odds, converted to and from probabilities, using sensitivity/specificity or a directly entered LR, with the classic Fagan nomogram. Shows how much a test can change diagnostic certainty.
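
The calculation behind the nomogram works on odds rather than probabilities. The sketch below shows it under that standard formulation; the function names are illustrative assumptions.

```python
def likelihood_ratios(sensitivity: float, specificity: float):
    """Return (LR+, LR-) from sensitivity and specificity."""
    lr_pos = sensitivity / (1 - specificity)
    lr_neg = (1 - sensitivity) / specificity
    return lr_pos, lr_neg

def post_test_probability(pre_test_prob: float, lr: float) -> float:
    """Convert probability to odds, apply the likelihood ratio, convert back."""
    pre_odds = pre_test_prob / (1 - pre_test_prob)
    post_odds = pre_odds * lr
    return post_odds / (1 + post_odds)

lr_pos, lr_neg = likelihood_ratios(0.90, 0.80)
after_positive = post_test_probability(0.20, lr_pos)
after_negative = post_test_probability(0.20, lr_neg)
```

With a 20% pre-test probability, sensitivity 0.90 and specificity 0.80, a positive result raises the probability to about 53% and a negative result lowers it to about 3%.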

Statistical Method Advisor

"Which statistical method should I use?" Answer a few questions and the advisor recommends a method and links straight to the matching online tool. Covers group comparison, correlation/regression, diagnostic tests, survival, agreement/reliability and more.

Method Comparison (Passing–Bablok / Deming)

Check whether two measurement methods agree: Passing–Bablok (robust, no error-distribution assumption) and Deming (error in both axes) regression, with slope/intercept and 95% CI plus a scatter plot. A slope CI containing 1 indicates no evidence of proportional bias; an intercept CI containing 0 indicates no evidence of constant bias.

DeLong Test (compare two AUCs)

On the same subjects, test whether two markers/models differ in ROC area under the curve. DeLong handles the paired correlation, giving each AUC, 95% CI, the AUC difference and a p-value.

Cronbach's α (scale reliability)

Internal consistency of a scale/questionnaire: enter a subjects×items score matrix to get Cronbach's α with interpretation, plus α-if-item-deleted to find items dragging reliability down. A standard metric in scale development and validation.
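
As a sketch of the underlying formula, α = k/(k − 1) × (1 − Σ item variances / variance of total scores) for k items. The function name is illustrative, and α-if-item-deleted would simply rerun it with one column removed.

```python
from statistics import variance

def cronbach_alpha(scores):
    """scores: list of rows, one row per subject, one column per item."""
    k = len(scores[0])
    if k < 2:
        raise ValueError("need at least two items")
    # Sample variance of each item across subjects
    item_vars = [variance([row[j] for row in scores]) for j in range(k)]
    # Sample variance of each subject's total score
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

alpha_perfect = cronbach_alpha([[1, 2], [2, 3], [3, 4]])
alpha_partial = cronbach_alpha([[1, 1], [2, 3], [3, 2]])
```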

ANOVA Post-hoc (Tukey/Bonferroni/Holm)

After ANOVA shows a difference, pinpoint which groups differ: one-way ANOVA first, then Tukey–Kramer HSD (with 95% simultaneous CIs), Bonferroni and Holm pairwise comparison p-values.

Decision Curve Analysis (DCA)

Assess the clinical usefulness of a prediction model: net benefit across threshold probabilities, compared with the treat-all and treat-none reference lines, comparing several models at once, with an exportable curve. A standard for prediction-model papers.
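
Net benefit at a threshold probability pt is TP/n − FP/n × pt/(1 − pt). The sketch below evaluates it for a model and for the treat-all strategy (treat-none is always 0); the function names and the toy data are illustrative assumptions.

```python
def net_benefit(y, probs, pt):
    """Net benefit of treating everyone whose predicted risk is at least pt."""
    n = len(y)
    tp = sum(1 for yi, p in zip(y, probs) if p >= pt and yi == 1)
    fp = sum(1 for yi, p in zip(y, probs) if p >= pt and yi == 0)
    return tp / n - fp / n * pt / (1 - pt)

def net_benefit_treat_all(y, pt):
    """Reference line: treat every patient regardless of predicted risk."""
    prevalence = sum(y) / len(y)
    return prevalence - (1 - prevalence) * pt / (1 - pt)

y = [1, 1, 0, 0, 0]                 # observed outcomes
probs = [0.9, 0.4, 0.6, 0.2, 0.1]   # predicted risks (toy values)
nb_model = net_benefit(y, probs, 0.3)
nb_all = net_benefit_treat_all(y, 0.3)
```

Repeating the calculation over a grid of thresholds gives the decision curve.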

Cox Proportional-Hazards Regression

The workhorse of multivariable survival analysis: assess multiple covariates' effects on survival time, outputting each covariate's hazard ratio HR, 95% CI, Wald p-value and the model likelihood-ratio test. Ties handled by Breslow approximation; pairs with Kaplan–Meier.

Table 1 (Baseline Characteristics)

Paste raw data with a header and pick a grouping column to generate a clinical-paper "baseline characteristics" table in one click: continuous variables as mean±SD (t/ANOVA) or median (IQR) (non-parametric), categorical as n(%) (chi-square/Fisher), with between-group p-values auto-computed.

RR / OR / NNT Calculator

Enter a 2×2 table to compute relative risk RR, odds ratio OR, absolute risk difference ARD, relative risk reduction RRR and number needed to treat NNT, with 95% CIs for ratios and the risk difference. Common for interpreting cohort/case-control/RCT outcomes.
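
A minimal sketch of these measures, assuming the table is laid out as a = exposed with event, b = exposed without, c = unexposed with event, d = unexposed without. The function name is illustrative, and only the Wald log interval for RR is shown.

```python
import math

def two_by_two(a, b, c, d, z=1.96):
    """Effect measures from a 2x2 table (rows: exposed/unexposed; cols: event/no event)."""
    risk1 = a / (a + b)
    risk0 = c / (c + d)
    rr = risk1 / risk0
    odds_ratio = (a * d) / (b * c)
    ard = risk1 - risk0                       # absolute risk difference
    rrr = 1 - rr                              # relative risk reduction
    nnt = math.inf if ard == 0 else 1 / abs(ard)
    # Wald interval for RR on the log scale
    se_log_rr = math.sqrt(1 / a - 1 / (a + b) + 1 / c - 1 / (c + d))
    rr_ci = (rr * math.exp(-z * se_log_rr), rr * math.exp(z * se_log_rr))
    return {"RR": rr, "OR": odds_ratio, "ARD": ard, "RRR": rrr,
            "NNT": nnt, "RR_CI": rr_ci}

res = two_by_two(10, 90, 20, 80)
```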

Non-parametric Tests (Mann–Whitney/Kruskal–Wallis)

Compare groups when data are non-normal or ordinal: two independent groups → Mann–Whitney U, paired → Wilcoxon signed-rank, ≥3 groups → Kruskal–Wallis H, with tie correction and medians.

Kappa Agreement

Inter-rater classification agreement: Cohen's Kappa (nominal) plus linear/quadratic weighted Kappa for ordinal grades, with a confusion matrix and agreement interpretation. Common for imaging-grade agreement.

Correlation (Pearson/Spearman)

Correlation between two continuous variables: Pearson product-moment (with Fisher-z 95% CI and p-value) and Spearman rank correlation, for normal/non-normal data.

Chi-square / Fisher's Exact

Contingency-table analysis of categorical variables: Pearson chi-square for any R×C table; for 2×2 tables, also Yates correction and Fisher's exact test, with a warning when expected counts are too small.

t-test / ANOVA

Compare means of two or more groups of a continuous variable: two groups give Student / Welch / paired t-tests, ≥3 groups automatically run one-way ANOVA, with the statistic and p-value.

Nomogram

From a logistic regression, automatically generate a clinical-prediction-model nomogram, turning the regression equation into a visual "read points → total → probability" chart. No R required.

Binary Logistic Regression

Analyze multiple factors' effects on a binary outcome, outputting each factor's OR, 95% CI, significance and model fit indices. A core method for clinical risk-factor analysis and prediction models.

Meta-analysis & Forest Plot

Pool OR or MD across studies with fixed/random-effects models, compute I² and Q heterogeneity and draw a forest plot. Common in systematic reviews.
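
The pooling step can be sketched with inverse-variance weights and the DerSimonian–Laird estimate of between-study variance. That choice of random-effects estimator is an assumption, since the page does not say which one the tool uses. Effects are mean differences or log odds ratios, with their variances.

```python
import math

def pool(effects, variances):
    """Fixed- and random-effects (DerSimonian-Laird) inverse-variance pooling."""
    w = [1 / v for v in variances]
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))  # Cochran's Q
    df = len(effects) - 1
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c)             # between-study variance
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    wr = [1 / (v + tau2) for v in variances]  # random-effects weights
    random = sum(wi * yi for wi, yi in zip(wr, effects)) / sum(wr)
    se_random = math.sqrt(1 / sum(wr))
    return {"fixed": fixed, "random": random, "se_random": se_random,
            "Q": q, "I2": i2, "tau2": tau2}

res = pool([0.0, 4.0], [1.0, 1.0])
```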

Bland–Altman Agreement

Assess agreement between two measurement methods: draw a Bland–Altman plot and compute the mean bias and 95% limits of agreement (LoA).
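
A minimal sketch of the numbers behind the plot: the bias is the mean of the paired differences and the 95% limits of agreement are bias ± 1.96 × SD of the differences. The function name and the sample data are illustrative.

```python
from statistics import mean, stdev

def bland_altman(method_a, method_b, z=1.96):
    """Return (bias, lower LoA, upper LoA) for paired measurements."""
    diffs = [a - b for a, b in zip(method_a, method_b)]
    bias = mean(diffs)
    sd = stdev(diffs)  # sample SD of the differences
    return bias, bias - z * sd, bias + z * sd

bias, loa_low, loa_high = bland_altman([10, 12, 14, 16], [9, 12, 13, 18])
```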

Kaplan–Meier Survival

Enter follow-up time and outcome events to draw survival curves and compute median survival; with groups, a log-rank test runs automatically. Common in oncology prognosis research.

Fetal Biometry Percentile

Enter gestational age and a fetal measurement (BPD/HC/AC/FL/transverse cerebellar diameter) to estimate its percentile, helping judge whether growth deviates from the normal range.

Diagnostic 2×2 Table

Enter true-positive/false-positive/false-negative/true-negative to compute sensitivity, specificity, predictive values, likelihood ratios and the diagnostic odds ratio, with 95% CIs for the proportions.

Regression Fitting & Plot

Enter predictor and response, automatically fit linear / quadratic / cubic models and compare them, outputting the regression equation and a fitted plot.

Sample Size Calculator

Preset scenarios by study type (reference-interval study / diagnostic test / correlation study); enter the parameters to get the required sample size.

Reference Interval

Enter a set of measurements to compute mean, SD, normality test and the P2.5–P97.5 reference interval, producing a reference-interval table ready to drop into a paper.

ROC Curve & Diagnostic Performance

Enter a gold standard and measured values to compute AUC (with 95% CI), draw the ROC curve, and give the optimal cutoff by Youden index with its sensitivity/specificity. A core tool for diagnostic imaging research.

ICC (Intraclass Correlation)

Intraclass correlation coefficient, ICC(2,1) / ICC(2,k), with 95% confidence interval and an agreement grade. Essential for observer-agreement analysis in imaging research.

About Labs

Labs is the clinical-research lab of AI好医生 for practising clinicians: a set of online tools drawn from real research needs, emphasizing scenario fit, ease of use and results you can drop straight into a paper. Tools are added continuously — 55 available so far.

View the Chinese version (Research Labs) →