
Quick group-aware zero-inflation check (Negative Binomial baseline via edgeR)
Source: R/check_zeroinflation.R
Computes a sample-group-aware zero-inflation (ZI) index for each gene, using a negative binomial (NB) baseline fitted with edgeR. Groups are typically experimental conditions (e.g., drug treatments). The function:
- estimates gene-wise (tagwise) dispersions with edgeR, pooling all selected groups;
- for each group, builds NB-expected zero probabilities from TMMwsp-scaled means; and
- returns the per-gene ZI (observed zero fraction minus NB-expected zero fraction) and per-group summaries (e.g., the fraction of genes whose ZI exceeds each cutoff). The ZI cutoffs are user-defined via cutoffs.
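The per-group steps above can be sketched in base R. This is a minimal illustration under stated assumptions, not the package's implementation: it assumes the tagwise dispersions were already estimated (for example with edgeR across all selected groups), that lib_size holds effective library sizes (e.g. TMMwsp-adjusted), and that each gene's expected mean in well j is its pooled group rate times that well's library size. zi_for_group() is a hypothetical helper. Under an NB with mean mu and dispersion phi, P(Y = 0) = (1 + phi * mu)^(-1/phi), which tends to the Poisson value exp(-mu) as phi approaches 0.

```r
# Hypothetical helper (not exported by the package): per-gene ZI for one group.
# counts:   genes x wells matrix of raw counts for this group
# lib_size: effective library size per well (assumed already normalized)
# disp:     per-gene NB dispersion (assumed estimated beforehand)
zi_for_group <- function(counts, lib_size, disp) {
  stopifnot(ncol(counts) == length(lib_size), nrow(counts) == length(disp))
  # Pooled rate per gene, then expected mean per well: mu_ij = rate_i * L_j
  rate <- rowSums(counts) / sum(lib_size)
  mu <- outer(rate, lib_size)
  # Broadcast each gene's dispersion across its row of mu
  phi <- matrix(disp, nrow = nrow(mu), ncol = ncol(mu))
  # NB zero probability; fall back to the Poisson limit when phi is exactly 0
  p0_cell <- ifelse(phi > 0, exp(-log1p(phi * mu) / phi), exp(-mu))
  p0_nb <- rowMeans(p0_cell)
  p0_obs <- rowMeans(counts == 0)
  n <- ncol(counts)
  data.frame(
    gene = rownames(counts),
    p0_obs = p0_obs,
    obs_zeros_num = n * p0_obs,
    p0_nb = p0_nb,
    expected_zeros_num = n * p0_nb,
    ZI = p0_obs - p0_nb,
    row.names = rownames(counts)
  )
}

# Toy usage: three genes in four wells with equal library sizes
counts <- matrix(c(0, 0, 0, 0,
                   5, 6, 4, 5,
                   0, 3, 0, 0), nrow = 3, byrow = TRUE,
                 dimnames = list(c("g1", "g2", "g3"), paste0("w", 1:4)))
res <- zi_for_group(counts, lib_size = rep(1e4, 4), disp = c(0.1, 0.05, 0.2))
```

In this toy case g3 has three zeros in four wells but an NB-expected zero fraction of about 0.50, so its ZI is positive; an all-zero gene such as g1 gets ZI = 0, as in the output below.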
This is intended as a fast screening diagnostic to decide whether standard NB GLM methods (edgeR/DESeq2) are adequate or whether a zero-aware workflow (e.g., ZINB-WaVE) might be warranted.
This function relies on edgeR to estimate dispersions. The current implementation requires at least two groups in the design so that edgeR can stabilize gene-wise dispersion estimates across groups. If you only have a single group and still want a design-aware baseline for expected zeros, fit a Gamma–Poisson (NB) GLM and compute the expected zero probabilities from its fitted means and overdispersion estimates.
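Given a fitted GLM, the expected zero probability uses the same NB formula, but with each sample's fitted mean instead of a pooled one. A minimal sketch, assuming mu is a genes x samples matrix of fitted means and overdispersion is a per-gene vector (Var = mu + overdispersion * mu^2). With glmGamPoi these would plausibly be fit$Mu and fit$overdispersions from glm_gp(), but check the glmGamPoi documentation for your version. expected_zeros_from_fit() is a hypothetical helper.

```r
# Hypothetical helper: expected NB zero fraction per gene from a fitted GLM.
# mu:             genes x samples matrix of fitted (design-aware) means
# overdispersion: per-gene NB overdispersion
expected_zeros_from_fit <- function(mu, overdispersion) {
  phi <- matrix(overdispersion, nrow = nrow(mu), ncol = ncol(mu))
  # Use the Poisson limit exp(-mu) where the overdispersion is exactly zero
  p0 <- ifelse(phi > 0, exp(-log1p(phi * mu) / phi), exp(-mu))
  setNames(rowMeans(p0), rownames(mu))
}

# Toy example: geneA's fitted mean varies across samples, geneB's does not
mu <- rbind(geneA = c(0.5, 1, 2), geneB = c(4, 4, 4))
p0_nb <- expected_zeros_from_fit(mu, overdispersion = c(0.5, 0))
# ZI would then be each gene's observed zero fraction minus p0_nb
```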
Usage
check_zeroinflation(
  data = NULL,
  group_by = NULL,
  samples = NULL,
  batch = 1,
  cutoffs = c(0.1, 0.2)
)
Arguments
- data
Seurat object.
- group_by
Character; the column in data@meta.data that defines groups (default: "combined_id").
- samples
Character vector of group labels or patterns to include. If NULL, or if none match, all groups in group_by are used.
- batch
Optional batch indicator. If it has length 1 (the default, batch = 1), the design is intercept-free, with one indicator (dummy) column per group.
- cutoffs
Numeric vector of user-supplied ZI thresholds for the per-group summary (default c(0.1, 0.2)); each cutoff adds a pct_ZI_gt_<cutoff> column.
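To illustrate the batch argument, the intercept-free group design can be built in base R as shown below. The second design assumes that a per-sample batch vector, when supplied, enters additively alongside the group indicators. That is an assumption for illustration; the exact formula the function builds is not documented here.

```r
group <- factor(c("DMSO_0", "DMSO_0", "Staurosporine_10", "Staurosporine_10"))
batch <- factor(c("b1", "b2", "b1", "b2"))

# batch of length 1: intercept-free design, one indicator column per group
design_groups <- model.matrix(~ 0 + group)

# Assumed form when a per-sample batch vector is supplied: batch as a covariate
design_batch <- model.matrix(~ 0 + group + batch)
```

Without an intercept, each group column directly encodes membership, so every sample has exactly one 1 across the group columns.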
Value
A list with:
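- gene_metrics_by_group: long data frame with one row per group × gene and the columns group, gene, mean_count_group, dispersion, p0_obs, obs_zeros_num, p0_nb, expected_zeros_num, and ZI.
- summary_by_group: one row per group with n_genes, n_wells, the medians of p0_obs, p0_nb, and ZI, the observed and expected zero counts for the group, and one pct_ZI_gt_<cutoff> column per cutoff giving the fraction of genes whose ZI exceeds that cutoff.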
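The per-group summary can be derived from a gene-level table like gene_metrics_by_group. A minimal base-R sketch, assuming ZI is compared with a strict ">" and that the pct_ZI_gt_* columns hold fractions (as the example output below suggests, e.g. 0.004 of 500 genes); summarise_zi() is a hypothetical helper, and n_wells is omitted because it is not a column of the gene-level table.

```r
# Hypothetical helper: per-group summary from a long gene-level table with
# columns group, gene, p0_obs, p0_nb, ZI, obs_zeros_num, expected_zeros_num.
summarise_zi <- function(gm, cutoffs = c(0.1, 0.2)) {
  rows <- lapply(split(gm, gm$group), function(d) {
    out <- data.frame(
      group = d$group[1],
      n_genes = nrow(d),
      median_p0_obs = median(d$p0_obs),
      median_p0_nb = median(d$p0_nb),
      median_ZI = median(d$ZI),
      observed_zeros_num = sum(d$obs_zeros_num),
      expected_zeros_num = sum(d$expected_zeros_num)
    )
    # One column per cutoff: fraction of genes whose ZI exceeds it
    for (k in cutoffs) out[[paste0("pct_ZI_gt_", k)]] <- mean(d$ZI > k)
    out
  })
  res <- do.call(rbind, rows)
  rownames(res) <- res$group
  res
}

# Toy gene-level table (ZI = p0_obs - p0_nb)
gm <- data.frame(
  group = rep(c("A", "B"), each = 4),
  gene = rep(paste0("g", 1:4), 2),
  p0_obs = c(1, 0.5, 0.4, 0.3, 1, 0.9, 0.5, 0.1),
  p0_nb = c(1, 0.45, 0.25, 0, 1, 0.65, 0.5, 0.2),
  ZI = c(0, 0.05, 0.15, 0.3, 0, 0.25, 0, -0.1),
  obs_zeros_num = c(19, 9.5, 7.6, 5.7, 3, 2.7, 1.5, 0.3),
  expected_zeros_num = c(19, 8.55, 4.75, 0, 3, 1.95, 1.5, 0.6)
)
s <- summarise_zi(gm)
```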
Note
This is a screening tool; it is not a replacement for fitting a full GLM with your actual design. If strong covariates exist, a GLM baseline (e.g., glmGamPoi::glm_gp) will yield more faithful expected-zero rates.
For single-group experiments, consider either adding a reference group or switching to a GLM-based baseline that does not require multiple groups.
Examples
data(mini_mac)
check_zeroinflation(mini_mac, group_by = "combined_id",
samples = c("DMSO_0","Staurosporine_10"))
#> $gene_metrics_by_group
#> group gene mean_count_group dispersion p0_obs
#> NAMPT DMSO_0 NAMPT 8.31578947 7.684511e-03 0.0000000
#> ENSG00000278869 DMSO_0 ENSG00000278869 0.00000000 9.765625e-05 1.0000000
#> CABP7-DT DMSO_0 CABP7-DT 0.00000000 9.765625e-05 1.0000000
#> NBEAP4 DMSO_0 NBEAP4 0.00000000 9.765625e-05 1.0000000
#> FMO2 DMSO_0 FMO2 0.00000000 9.765625e-05 1.0000000
#> NDUFA4P2 DMSO_0 NDUFA4P2 0.00000000 9.765625e-05 1.0000000
#> DPY19L4P2 DMSO_0 DPY19L4P2 0.00000000 9.765625e-05 1.0000000
#> ENSG00000286114 DMSO_0 ENSG00000286114 0.00000000 9.765625e-05 1.0000000
#> ENSG00000265935 DMSO_0 ENSG00000265935 0.00000000 9.765625e-05 1.0000000
#> Y-RNA DMSO_0 Y-RNA 0.00000000 9.765625e-05 1.0000000
#> FAM201B DMSO_0 FAM201B 0.00000000 9.765625e-05 1.0000000
#> ENSG00000243018 DMSO_0 ENSG00000243018 0.00000000 9.765625e-05 1.0000000
#> TRBC1 DMSO_0 TRBC1 0.05263158 9.765625e-05 0.9473684
#> FAM20C DMSO_0 FAM20C 5.10526316 6.737674e-03 0.0000000
#> ENSG00000251536 DMSO_0 ENSG00000251536 0.00000000 9.765625e-05 1.0000000
#> CLDN18 DMSO_0 CLDN18 0.00000000 9.765625e-05 1.0000000
#> ENSG00000259688 DMSO_0 ENSG00000259688 0.00000000 9.765625e-05 1.0000000
#> RBM7P1 DMSO_0 RBM7P1 0.00000000 9.765625e-05 1.0000000
#> UBE2HP1 DMSO_0 UBE2HP1 0.00000000 9.765625e-05 1.0000000
#> ENSG00000258419 DMSO_0 ENSG00000258419 0.00000000 9.765625e-05 1.0000000
#> ENSG00000231698 DMSO_0 ENSG00000231698 0.00000000 9.765625e-05 1.0000000
#> NF1P5 DMSO_0 NF1P5 0.00000000 9.765625e-05 1.0000000
#> PTPN20 DMSO_0 PTPN20 0.00000000 9.765625e-05 1.0000000
#> ENSG00000280048 DMSO_0 ENSG00000280048 0.00000000 9.765625e-05 1.0000000
#> AHCY DMSO_0 AHCY 5.10526316 7.578144e-03 0.0000000
#> CCDC127 DMSO_0 CCDC127 7.73684211 7.721302e-03 0.0000000
#> FSHR DMSO_0 FSHR 0.00000000 9.765625e-05 1.0000000
#> TRIM64C DMSO_0 TRIM64C 0.00000000 9.765625e-05 1.0000000
#> ENSG00000285971 DMSO_0 ENSG00000285971 0.00000000 9.765625e-05 1.0000000
#> ENSG00000217239 DMSO_0 ENSG00000217239 0.68421053 9.765625e-05 0.4210526
#> ENSG00000236366 DMSO_0 ENSG00000236366 0.00000000 9.765625e-05 1.0000000
#> ENSG00000286853 DMSO_0 ENSG00000286853 0.00000000 9.765625e-05 1.0000000
#> RN7SL270P DMSO_0 RN7SL270P 0.00000000 9.765625e-05 1.0000000
#> CYCSP41 DMSO_0 CYCSP41 0.00000000 9.765625e-05 1.0000000
#> MIR150 DMSO_0 MIR150 0.00000000 9.765625e-05 1.0000000
#> ENSG00000289359 DMSO_0 ENSG00000289359 0.00000000 9.765625e-05 1.0000000
#> Metazoa-SRP DMSO_0 Metazoa-SRP 0.00000000 9.765625e-05 1.0000000
#> ENSG00000284620 DMSO_0 ENSG00000284620 0.00000000 9.765625e-05 1.0000000
#> Y-RNA.1 DMSO_0 Y-RNA.1 0.00000000 9.765625e-05 1.0000000
#> CNN2P9 DMSO_0 CNN2P9 0.00000000 9.765625e-05 1.0000000
#> ENSG00000273375 DMSO_0 ENSG00000273375 0.05263158 9.765625e-05 0.9473684
#> ENSG00000287871 DMSO_0 ENSG00000287871 0.00000000 9.765625e-05 1.0000000
#> LINC02862 DMSO_0 LINC02862 0.00000000 9.765625e-05 1.0000000
#> MIR556 DMSO_0 MIR556 0.00000000 9.765625e-05 1.0000000
#> ENSG00000235609 DMSO_0 ENSG00000235609 0.73684211 9.765625e-05 0.6315789
#> MBD2 DMSO_0 MBD2 6.73684211 7.776025e-03 0.0000000
#> HIGD1AP6 DMSO_0 HIGD1AP6 0.00000000 9.765625e-05 1.0000000
#> ENSG00000276958 DMSO_0 ENSG00000276958 0.00000000 9.765625e-05 1.0000000
#> ENSG00000275295 DMSO_0 ENSG00000275295 0.00000000 9.765625e-05 1.0000000
#> ENSG00000285454 DMSO_0 ENSG00000285454 0.00000000 9.765625e-05 1.0000000
#> C10orf71-AS1 DMSO_0 C10orf71-AS1 0.00000000 9.765625e-05 1.0000000
#> ENSG00000256001 DMSO_0 ENSG00000256001 0.00000000 9.765625e-05 1.0000000
#> ENSG00000279294 DMSO_0 ENSG00000279294 0.00000000 9.765625e-05 1.0000000
#> IFNWP5 DMSO_0 IFNWP5 0.00000000 9.765625e-05 1.0000000
#> MAN1C1 DMSO_0 MAN1C1 0.05263158 9.765625e-05 0.9473684
#> RN7SL211P DMSO_0 RN7SL211P 0.00000000 9.765625e-05 1.0000000
#> GNRHR2P1 DMSO_0 GNRHR2P1 0.00000000 9.765625e-05 1.0000000
#> ENSG00000273904 DMSO_0 ENSG00000273904 0.00000000 9.765625e-05 1.0000000
#> ENSG00000241593 DMSO_0 ENSG00000241593 0.00000000 9.765625e-05 1.0000000
#> WDR31 DMSO_0 WDR31 0.42105263 9.765625e-05 0.6842105
#> DRD5 DMSO_0 DRD5 0.00000000 9.765625e-05 1.0000000
#> ENSG00000256569 DMSO_0 ENSG00000256569 0.00000000 9.765625e-05 1.0000000
#> EPHX1 DMSO_0 EPHX1 6.21052632 7.764337e-03 0.0000000
#> ACTN1 DMSO_0 ACTN1 3.47368421 5.144421e-03 0.0000000
#> MIR5188 DMSO_0 MIR5188 0.00000000 9.765625e-05 1.0000000
#> RNU6-118P DMSO_0 RNU6-118P 0.00000000 9.765625e-05 1.0000000
#> ENSG00000271758 DMSO_0 ENSG00000271758 0.00000000 9.765625e-05 1.0000000
#> ZNF84-DT DMSO_0 ZNF84-DT 0.00000000 9.765625e-05 1.0000000
#> ENSG00000248733 DMSO_0 ENSG00000248733 0.00000000 9.765625e-05 1.0000000
#> ACTL7A DMSO_0 ACTL7A 0.00000000 9.765625e-05 1.0000000
#> GID4 DMSO_0 GID4 3.10526316 2.384059e-03 0.0000000
#> Y-RNA.2 DMSO_0 Y-RNA.2 0.00000000 9.765625e-05 1.0000000
#> MIR200C DMSO_0 MIR200C 0.00000000 9.765625e-05 1.0000000
#> ENSG00000224644 DMSO_0 ENSG00000224644 0.00000000 9.765625e-05 1.0000000
#> CSTA DMSO_0 CSTA 4.47368421 7.184685e-03 0.0000000
#> MIR664A DMSO_0 MIR664A 0.00000000 9.765625e-05 1.0000000
#> MIR4802 DMSO_0 MIR4802 0.00000000 9.765625e-05 1.0000000
#> ENSG00000278655 DMSO_0 ENSG00000278655 0.00000000 9.765625e-05 1.0000000
#> ENSG00000280122 DMSO_0 ENSG00000280122 0.00000000 9.765625e-05 1.0000000
#> ENSG00000254180 DMSO_0 ENSG00000254180 0.00000000 9.765625e-05 1.0000000
#> RNU6-896P DMSO_0 RNU6-896P 0.00000000 9.765625e-05 1.0000000
#> ENSG00000286805 DMSO_0 ENSG00000286805 0.00000000 9.765625e-05 1.0000000
#> SHANK1 DMSO_0 SHANK1 0.00000000 9.765625e-05 1.0000000
#> ENSG00000291048 DMSO_0 ENSG00000291048 0.00000000 9.765625e-05 1.0000000
#> RN7SL268P DMSO_0 RN7SL268P 0.00000000 9.765625e-05 1.0000000
#> NLGN2 DMSO_0 NLGN2 0.52631579 9.765625e-05 0.5789474
#> DMC1 DMSO_0 DMC1 0.26315789 9.765625e-05 0.7368421
#> KCNAB1-AS1 DMSO_0 KCNAB1-AS1 0.00000000 9.765625e-05 1.0000000
#> ENSG00000276015 DMSO_0 ENSG00000276015 0.00000000 9.765625e-05 1.0000000
#> WWTR1-IT1 DMSO_0 WWTR1-IT1 0.00000000 9.765625e-05 1.0000000
#> ENSG00000260465 DMSO_0 ENSG00000260465 0.00000000 9.765625e-05 1.0000000
#> RPL5P30 DMSO_0 RPL5P30 0.10526316 9.765625e-05 0.8947368
#> ENSG00000270988 DMSO_0 ENSG00000270988 0.00000000 9.765625e-05 1.0000000
#> MIR545 DMSO_0 MIR545 0.00000000 9.765625e-05 1.0000000
#> ENSG00000257548 DMSO_0 ENSG00000257548 0.00000000 9.765625e-05 1.0000000
#> ENSG00000289950 DMSO_0 ENSG00000289950 0.00000000 9.765625e-05 1.0000000
#> ENSG00000262413 DMSO_0 ENSG00000262413 0.15789474 9.765625e-05 0.8421053
#> ENSG00000249890 DMSO_0 ENSG00000249890 0.00000000 9.765625e-05 1.0000000
#> RN7SL255P DMSO_0 RN7SL255P 0.00000000 9.765625e-05 1.0000000
#> TRIM53CP DMSO_0 TRIM53CP 0.00000000 9.765625e-05 1.0000000
#> RNA5SP107 DMSO_0 RNA5SP107 0.00000000 9.765625e-05 1.0000000
#> RNU6-845P DMSO_0 RNU6-845P 0.00000000 9.765625e-05 1.0000000
#> ENSG00000241114 DMSO_0 ENSG00000241114 0.00000000 9.765625e-05 1.0000000
#> SERBP1P2 DMSO_0 SERBP1P2 0.00000000 9.765625e-05 1.0000000
#> RPS10-NUDT3 DMSO_0 RPS10-NUDT3 0.05263158 9.765625e-05 0.9473684
#> CDY12P DMSO_0 CDY12P 0.00000000 9.765625e-05 1.0000000
#> MIR4644 DMSO_0 MIR4644 0.00000000 9.765625e-05 1.0000000
#> ENSG00000223343 DMSO_0 ENSG00000223343 0.00000000 9.765625e-05 1.0000000
#> MORF4L1P3 DMSO_0 MORF4L1P3 0.00000000 9.765625e-05 1.0000000
#> MRGPRX3 DMSO_0 MRGPRX3 0.89473684 9.765625e-05 0.3157895
#> CD160 DMSO_0 CD160 0.00000000 9.765625e-05 1.0000000
#> obs_zeros_num p0_nb expected_zeros_num ZI
#> NAMPT 0 0.0005640631 0.01071720 -0.0005640631
#> ENSG00000278869 19 1.0000000000 19.00000000 0.0000000000
#> CABP7-DT 19 1.0000000000 19.00000000 0.0000000000
#> NBEAP4 19 1.0000000000 19.00000000 0.0000000000
#> FMO2 19 1.0000000000 19.00000000 0.0000000000
#> NDUFA4P2 19 1.0000000000 19.00000000 0.0000000000
#> DPY19L4P2 19 1.0000000000 19.00000000 0.0000000000
#> ENSG00000286114 19 1.0000000000 19.00000000 0.0000000000
#> ENSG00000265935 19 1.0000000000 19.00000000 0.0000000000
#> Y-RNA 19 1.0000000000 19.00000000 0.0000000000
#> FAM201B 19 1.0000000000 19.00000000 0.0000000000
#> ENSG00000243018 19 1.0000000000 19.00000000 0.0000000000
#> TRBC1 18 0.9487604337 18.02644824 -0.0013920127
#> FAM20C 0 0.0085194504 0.16186956 -0.0085194504
#> ENSG00000251536 19 1.0000000000 19.00000000 0.0000000000
#> CLDN18 19 1.0000000000 19.00000000 0.0000000000
#> ENSG00000259688 19 1.0000000000 19.00000000 0.0000000000
#> RBM7P1 19 1.0000000000 19.00000000 0.0000000000
#> UBE2HP1 19 1.0000000000 19.00000000 0.0000000000
#> ENSG00000258419 19 1.0000000000 19.00000000 0.0000000000
#> ENSG00000231698 19 1.0000000000 19.00000000 0.0000000000
#> NF1P5 19 1.0000000000 19.00000000 0.0000000000
#> PTPN20 19 1.0000000000 19.00000000 0.0000000000
#> ENSG00000280048 19 1.0000000000 19.00000000 0.0000000000
#> AHCY 0 0.0085942501 0.16329075 -0.0085942501
#> CCDC127 0 0.0009125321 0.01733811 -0.0009125321
#> FSHR 19 1.0000000000 19.00000000 0.0000000000
#> TRIM64C 19 1.0000000000 19.00000000 0.0000000000
#> ENSG00000285971 19 1.0000000000 19.00000000 0.0000000000
#> ENSG00000217239 8 0.5072457703 9.63766964 -0.0861931387
#> ENSG00000236366 19 1.0000000000 19.00000000 0.0000000000
#> ENSG00000286853 19 1.0000000000 19.00000000 0.0000000000
#> RN7SL270P 19 1.0000000000 19.00000000 0.0000000000
#> CYCSP41 19 1.0000000000 19.00000000 0.0000000000
#> MIR150 19 1.0000000000 19.00000000 0.0000000000
#> ENSG00000289359 19 1.0000000000 19.00000000 0.0000000000
#> Metazoa-SRP 19 1.0000000000 19.00000000 0.0000000000
#> ENSG00000284620 19 1.0000000000 19.00000000 0.0000000000
#> Y-RNA.1 19 1.0000000000 19.00000000 0.0000000000
#> CNN2P9 19 1.0000000000 19.00000000 0.0000000000
#> ENSG00000273375 18 0.9487604337 18.02644824 -0.0013920127
#> ENSG00000287871 19 1.0000000000 19.00000000 0.0000000000
#> LINC02862 19 1.0000000000 19.00000000 0.0000000000
#> MIR556 19 1.0000000000 19.00000000 0.0000000000
#> ENSG00000235609 12 0.4816551325 9.15144752 0.1499238148
#> MBD2 0 0.0021160863 0.04020564 -0.0021160863
#> HIGD1AP6 19 1.0000000000 19.00000000 0.0000000000
#> ENSG00000276958 19 1.0000000000 19.00000000 0.0000000000
#> ENSG00000275295 19 1.0000000000 19.00000000 0.0000000000
#> ENSG00000285454 19 1.0000000000 19.00000000 0.0000000000
#> C10orf71-AS1 19 1.0000000000 19.00000000 0.0000000000
#> ENSG00000256001 19 1.0000000000 19.00000000 0.0000000000
#> ENSG00000279294 19 1.0000000000 19.00000000 0.0000000000
#> IFNWP5 19 1.0000000000 19.00000000 0.0000000000
#> MAN1C1 18 0.9487604337 18.02644824 -0.0013920127
#> RN7SL211P 19 1.0000000000 19.00000000 0.0000000000
#> GNRHR2P1 19 1.0000000000 19.00000000 0.0000000000
#> ENSG00000273904 19 1.0000000000 19.00000000 0.0000000000
#> ENSG00000241593 19 1.0000000000 19.00000000 0.0000000000
#> WDR31 13 0.6577186357 12.49665408 0.0264918906
#> DRD5 19 1.0000000000 19.00000000 0.0000000000
#> ENSG00000256569 19 1.0000000000 19.00000000 0.0000000000
#> EPHX1 0 0.0033115123 0.06291873 -0.0033115123
#> ACTN1 0 0.0362952319 0.68960941 -0.0362952319
#> MIR5188 19 1.0000000000 19.00000000 0.0000000000
#> RNU6-118P 19 1.0000000000 19.00000000 0.0000000000
#> ENSG00000271758 19 1.0000000000 19.00000000 0.0000000000
#> ZNF84-DT 19 1.0000000000 19.00000000 0.0000000000
#> ENSG00000248733 19 1.0000000000 19.00000000 0.0000000000
#> ACTL7A 19 1.0000000000 19.00000000 0.0000000000
#> GID4 0 0.0503084282 0.95586014 -0.0503084282
#> Y-RNA.2 19 1.0000000000 19.00000000 0.0000000000
#> MIR200C 19 1.0000000000 19.00000000 0.0000000000
#> ENSG00000224644 19 1.0000000000 19.00000000 0.0000000000
#> CSTA 0 0.0149415657 0.28388975 -0.0149415657
#> MIR664A 19 1.0000000000 19.00000000 0.0000000000
#> MIR4802 19 1.0000000000 19.00000000 0.0000000000
#> ENSG00000278655 19 1.0000000000 19.00000000 0.0000000000
#> ENSG00000280122 19 1.0000000000 19.00000000 0.0000000000
#> ENSG00000254180 19 1.0000000000 19.00000000 0.0000000000
#> RNU6-896P 19 1.0000000000 19.00000000 0.0000000000
#> ENSG00000286805 19 1.0000000000 19.00000000 0.0000000000
#> SHANK1 19 1.0000000000 19.00000000 0.0000000000
#> ENSG00000291048 19 1.0000000000 19.00000000 0.0000000000
#> RN7SL268P 19 1.0000000000 19.00000000 0.0000000000
#> NLGN2 11 0.5926918838 11.26114579 -0.0137445154
#> DMC1 14 0.7692454377 14.61566332 -0.0324033325
#> KCNAB1-AS1 19 1.0000000000 19.00000000 0.0000000000
#> ENSG00000276015 19 1.0000000000 19.00000000 0.0000000000
#> WWTR1-IT1 19 1.0000000000 19.00000000 0.0000000000
#> ENSG00000260465 19 1.0000000000 19.00000000 0.0000000000
#> RPL5P30 17 0.9002049947 17.10389490 -0.0054681526
#> ENSG00000270988 19 1.0000000000 19.00000000 0.0000000000
#> MIR545 19 1.0000000000 19.00000000 0.0000000000
#> ENSG00000257548 19 1.0000000000 19.00000000 0.0000000000
#> ENSG00000289950 19 1.0000000000 19.00000000 0.0000000000
#> ENSG00000262413 16 0.8541900003 16.22961001 -0.0120847371
#> ENSG00000249890 19 1.0000000000 19.00000000 0.0000000000
#> RN7SL255P 19 1.0000000000 19.00000000 0.0000000000
#> TRIM53CP 19 1.0000000000 19.00000000 0.0000000000
#> RNA5SP107 19 1.0000000000 19.00000000 0.0000000000
#> RNU6-845P 19 1.0000000000 19.00000000 0.0000000000
#> ENSG00000241114 19 1.0000000000 19.00000000 0.0000000000
#> SERBP1P2 19 1.0000000000 19.00000000 0.0000000000
#> RPS10-NUDT3 18 0.9487604337 18.02644824 -0.0013920127
#> CDY12P 19 1.0000000000 19.00000000 0.0000000000
#> MIR4644 19 1.0000000000 19.00000000 0.0000000000
#> ENSG00000223343 19 1.0000000000 19.00000000 0.0000000000
#> MORF4L1P3 19 1.0000000000 19.00000000 0.0000000000
#> MRGPRX3 6 0.4125265457 7.83800437 -0.0967370721
#> CD160 19 1.0000000000 19.00000000 0.0000000000
#> [ reached 'max' / getOption("max.print") -- omitted 889 rows ]
#>
#> $summary_by_group
#>                             group n_genes n_wells median_p0_obs median_p0_nb median_ZI observed_zeros_num expected_zeros_num pct_ZI_gt_0.1 pct_ZI_gt_0.2
#> DMSO_0                     DMSO_0     500      19             1            1         0               7780           7789.191         0.004         0.000
#> Staurosporine_10 Staurosporine_10     500       3             1            1         0               1332           1307.828         0.080         0.052
#>
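Genes that exceed a ZI cutoff can then be pulled from gene_metrics_by_group. So that the sketch runs without the package, it rebuilds a few DMSO_0 rows from the printed output above (values rounded to 7 decimal places) as a stand-in data frame. With the real result you would start from res$gene_metrics_by_group instead, assuming res holds the returned list.

```r
# A few DMSO_0 rows copied from the printed output above
gm <- data.frame(
  group = "DMSO_0",
  gene = c("ENSG00000235609", "WDR31", "MRGPRX3", "ENSG00000217239", "NAMPT"),
  p0_obs = c(0.6315789, 0.6842105, 0.3157895, 0.4210526, 0),
  p0_nb = c(0.4816551, 0.6577186, 0.4125265, 0.5072458, 0.0005641)
)
gm$ZI <- gm$p0_obs - gm$p0_nb

# Genes whose observed zero fraction exceeds the NB expectation by more than 0.1
flagged <- gm[gm$ZI > 0.1, c("group", "gene", "p0_obs", "p0_nb", "ZI")]
flagged <- flagged[order(-flagged$ZI), ]
```

Among these rows only ENSG00000235609 (ZI of about 0.15) passes the 0.1 cutoff; negative ZI values, as for MRGPRX3, mean fewer zeros were observed than the NB baseline predicts.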