
Quick group-aware zero-inflation check (Negative Binomial baseline via edgeR)
Source: R/check_zeroinflation.R
Computes a sample-group-aware Zero-Inflation (ZI) index for each gene using a negative-binomial (NB) baseline fitted with edgeR. For each group (e.g., drug condition), the function:
- estimates tagwise (gene-wise) dispersions with edgeR, using all selected groups,
- builds NB-expected zero probabilities from TMMwsp-scaled means, and
- returns per-gene ZI (observed zero fraction minus NB-expected zero probability) and per-group summaries (e.g., the share of genes with ZI > 0.05). ZI cutoffs are user-defined through the cutoffs argument.
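The ZI index described above can be illustrated for a single gene in a single group. This is a minimal base-R sketch, not the package's code: the helper nb_p0, the toy counts, and the dispersion value are hypothetical, and the plain group mean stands in for the TMMwsp-scaled mean. How the function combines individual wells is not shown on this page. The sketch assumes the usual NB parameterisation with variance mu + phi * mu^2, under which the zero probability is (1 + phi * mu)^(-1 / phi).

```r
# NB zero probability for mean `mu` and dispersion `phi`
# (hypothetical helper; variance is mu + phi * mu^2).
nb_p0 <- function(mu, phi) {
  # As phi -> 0 the NB tends to a Poisson, where P(0) = exp(-mu).
  ifelse(phi > 0, (1 + phi * mu)^(-1 / phi), exp(-mu))
}

# Toy counts for one gene across the wells of one group (made-up values).
counts <- c(0, 0, 1, 0, 3, 0, 0, 2, 0, 0)
mu  <- mean(counts)  # stand-in for the TMMwsp-scaled group mean
phi <- 0.5           # stand-in for the edgeR tagwise dispersion

p0_obs <- mean(counts == 0)  # observed fraction of zeros
p0_nb  <- nb_p0(mu, phi)     # NB-expected zero probability
zi     <- p0_obs - p0_nb     # positive values mean more zeros than the NB predicts

# Cross-check against R's built-in NB density (size = 1 / phi).
all.equal(p0_nb, dnbinom(0, size = 1 / phi, mu = mu))
```

A ZI near zero suggests the NB baseline already accounts for the observed zeros; only clearly positive values point towards zero inflation.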
This is intended as a fast screening diagnostic to decide whether standard NB GLM methods (edgeR/DESeq2) are adequate or whether a zero-aware workflow (e.g., ZINB-WaVE) might be warranted.
This function relies on edgeR to estimate dispersion. The current implementation requires ≥2 groups in the design so that edgeR can stabilize gene-wise dispersions across groups. If you only have a single group and still want a design-aware baseline for expected zeros, fit a Gamma–Poisson/NB GLM and compute the expected zero probabilities from its fitted means and over-dispersion.
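For the single-group case, the GLM route mentioned above might look like the following sketch. It uses MASS::glm.nb on one simulated gene as a stand-in for a Gamma-Poisson fitter such as glmGamPoi::glm_gp, purely because MASS ships with standard R installations. The simulated counts, the size factors, and all variable names are assumptions for illustration.

```r
library(MASS)  # provides glm.nb

set.seed(1)
n_wells     <- 200
size_factor <- runif(n_wells, 0.5, 1.5)  # hypothetical per-well scaling factors
# Simulated counts for one gene (true mean 1.2 * size_factor, NB size 2).
y <- rnbinom(n_wells, mu = 1.2 * size_factor, size = 2)

# Intercept-only NB GLM with a log size-factor offset: a single-group design.
fit <- glm.nb(y ~ 1 + offset(log(size_factor)))

mu_hat <- fitted(fit)  # fitted mean per well (offset included)
theta  <- fit$theta    # NB size parameter; over-dispersion is 1 / theta

# Expected zero probability per well, averaged over wells.
p0_nb  <- mean(dnbinom(0, size = theta, mu = mu_hat))
p0_obs <- mean(y == 0)
zi     <- p0_obs - p0_nb
```

Because the counts here are drawn from an NB distribution, zi should land close to zero. With real data, the same three quantities can be compared gene by gene.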
Usage
check_zeroinflation(
  data = NULL,
  group_by = NULL,
  samples = NULL,
  batch = 1,
  cutoffs = c(0.1, 0.2)
)
Arguments
- data
Seurat object.
- group_by
Character, column in data@meta.data that defines groups (default: "combined_id").
- samples
Character vector of group labels/patterns to include. If NULL or if none match, all groups in group_by are used.
- batch
Optional batch indicator; if length 1, an intercept-free design is used with group dummies.
- cutoffs
Numeric vector of user-supplied ZI thresholds for the summary statistics.
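The intercept-free design with group dummies mentioned under batch can be pictured with base R's model.matrix. This is only an illustration of that kind of design matrix; the function's internals are not shown on this page, and the four-well group vector is hypothetical.

```r
# Hypothetical group labels for four wells.
group <- factor(c("DMSO_0", "DMSO_0", "Staurosporine_10", "Staurosporine_10"))

# `~ 0 + group` drops the intercept, giving one indicator column per group.
design <- model.matrix(~ 0 + group)
colnames(design) <- levels(group)
design
```

Each row has a single 1 in the column of its group, so every group gets its own mean parameter rather than being expressed relative to a reference level.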
Value
A list with:
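- gene_metrics_by_group: long data frame (group × gene) with p0_obs, p0_nb, ZI, and counts.
- summary_by_group: one row per group with the group averages (mean_p0_obs, mean_p0_nb, mean_ZI in the example output below), the share of genes with ZI above each cutoff (pct_ZI_gt_* columns), and the observed/expected zero counts for the group.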
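A typical next step is to pull out the genes whose ZI exceeds a cutoff. The sketch below rebuilds a few rows by hand, copied from the printed DMSO_0 output in the Examples section, so that it runs without the package. With a real result you would use the gene_metrics_by_group element of the returned list instead of this hand-made data frame.

```r
# Four rows copied from the printed DMSO_0 output (not the full table).
gene_metrics <- data.frame(
  group = "DMSO_0",
  gene  = c("ENSG00000235609", "WDR31", "MRGPRX3", "GID4"),
  ZI    = c(0.149924, 0.026492, -0.096737, -0.050308)
)
cutoffs <- c(0.1, 0.2)

# Genes above the first cutoff, largest ZI first.
flagged <- gene_metrics[gene_metrics$ZI > cutoffs[1], ]
flagged <- flagged[order(-flagged$ZI), ]

# Share of these genes above each cutoff (single group here).
share_above <- vapply(cutoffs, function(k) mean(gene_metrics$ZI > k), numeric(1))
```

In this subset only ENSG00000235609 passes the 0.1 cutoff, and none passes 0.2.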
Note
This is a screening tool; it is not a replacement for fitting a full GLM with your actual design. If strong covariates exist, a GLM baseline (e.g., glmGamPoi::glm_gp) will yield more faithful expected-zero rates.
For single-group experiments, consider either adding a reference group or switching to a GLM-based baseline that does not require multiple groups.
Examples
data(mini_mac)
check_zeroinflation(mini_mac, group_by = "combined_id",
samples = c("DMSO_0","Staurosporine_10"))
#> $gene_metrics_by_group
#> group gene mean_count_group dispersion p0_obs
#> NAMPT DMSO_0 NAMPT 8.3158 0.0076845 0.000
#> ENSG00000278869 DMSO_0 ENSG00000278869 0.0000 0.0000977 1.000
#> CABP7-DT DMSO_0 CABP7-DT 0.0000 0.0000977 1.000
#> NBEAP4 DMSO_0 NBEAP4 0.0000 0.0000977 1.000
#> FMO2 DMSO_0 FMO2 0.0000 0.0000977 1.000
#> NDUFA4P2 DMSO_0 NDUFA4P2 0.0000 0.0000977 1.000
#> DPY19L4P2 DMSO_0 DPY19L4P2 0.0000 0.0000977 1.000
#> ENSG00000286114 DMSO_0 ENSG00000286114 0.0000 0.0000977 1.000
#> ENSG00000265935 DMSO_0 ENSG00000265935 0.0000 0.0000977 1.000
#> Y-RNA DMSO_0 Y-RNA 0.0000 0.0000977 1.000
#> FAM201B DMSO_0 FAM201B 0.0000 0.0000977 1.000
#> ENSG00000243018 DMSO_0 ENSG00000243018 0.0000 0.0000977 1.000
#> TRBC1 DMSO_0 TRBC1 0.0526 0.0000977 0.947
#> FAM20C DMSO_0 FAM20C 5.1053 0.0067377 0.000
#> ENSG00000251536 DMSO_0 ENSG00000251536 0.0000 0.0000977 1.000
#> CLDN18 DMSO_0 CLDN18 0.0000 0.0000977 1.000
#> ENSG00000259688 DMSO_0 ENSG00000259688 0.0000 0.0000977 1.000
#> RBM7P1 DMSO_0 RBM7P1 0.0000 0.0000977 1.000
#> UBE2HP1 DMSO_0 UBE2HP1 0.0000 0.0000977 1.000
#> ENSG00000258419 DMSO_0 ENSG00000258419 0.0000 0.0000977 1.000
#> ENSG00000231698 DMSO_0 ENSG00000231698 0.0000 0.0000977 1.000
#> NF1P5 DMSO_0 NF1P5 0.0000 0.0000977 1.000
#> PTPN20 DMSO_0 PTPN20 0.0000 0.0000977 1.000
#> ENSG00000280048 DMSO_0 ENSG00000280048 0.0000 0.0000977 1.000
#> AHCY DMSO_0 AHCY 5.1053 0.0075781 0.000
#> CCDC127 DMSO_0 CCDC127 7.7368 0.0077213 0.000
#> FSHR DMSO_0 FSHR 0.0000 0.0000977 1.000
#> TRIM64C DMSO_0 TRIM64C 0.0000 0.0000977 1.000
#> ENSG00000285971 DMSO_0 ENSG00000285971 0.0000 0.0000977 1.000
#> ENSG00000217239 DMSO_0 ENSG00000217239 0.6842 0.0000977 0.421
#> ENSG00000236366 DMSO_0 ENSG00000236366 0.0000 0.0000977 1.000
#> ENSG00000286853 DMSO_0 ENSG00000286853 0.0000 0.0000977 1.000
#> RN7SL270P DMSO_0 RN7SL270P 0.0000 0.0000977 1.000
#> CYCSP41 DMSO_0 CYCSP41 0.0000 0.0000977 1.000
#> MIR150 DMSO_0 MIR150 0.0000 0.0000977 1.000
#> ENSG00000289359 DMSO_0 ENSG00000289359 0.0000 0.0000977 1.000
#> Metazoa-SRP DMSO_0 Metazoa-SRP 0.0000 0.0000977 1.000
#> ENSG00000284620 DMSO_0 ENSG00000284620 0.0000 0.0000977 1.000
#> Y-RNA.1 DMSO_0 Y-RNA.1 0.0000 0.0000977 1.000
#> CNN2P9 DMSO_0 CNN2P9 0.0000 0.0000977 1.000
#> ENSG00000273375 DMSO_0 ENSG00000273375 0.0526 0.0000977 0.947
#> ENSG00000287871 DMSO_0 ENSG00000287871 0.0000 0.0000977 1.000
#> LINC02862 DMSO_0 LINC02862 0.0000 0.0000977 1.000
#> MIR556 DMSO_0 MIR556 0.0000 0.0000977 1.000
#> ENSG00000235609 DMSO_0 ENSG00000235609 0.7368 0.0000977 0.632
#> MBD2 DMSO_0 MBD2 6.7368 0.0077760 0.000
#> HIGD1AP6 DMSO_0 HIGD1AP6 0.0000 0.0000977 1.000
#> ENSG00000276958 DMSO_0 ENSG00000276958 0.0000 0.0000977 1.000
#> ENSG00000275295 DMSO_0 ENSG00000275295 0.0000 0.0000977 1.000
#> ENSG00000285454 DMSO_0 ENSG00000285454 0.0000 0.0000977 1.000
#> C10orf71-AS1 DMSO_0 C10orf71-AS1 0.0000 0.0000977 1.000
#> ENSG00000256001 DMSO_0 ENSG00000256001 0.0000 0.0000977 1.000
#> ENSG00000279294 DMSO_0 ENSG00000279294 0.0000 0.0000977 1.000
#> IFNWP5 DMSO_0 IFNWP5 0.0000 0.0000977 1.000
#> MAN1C1 DMSO_0 MAN1C1 0.0526 0.0000977 0.947
#> RN7SL211P DMSO_0 RN7SL211P 0.0000 0.0000977 1.000
#> GNRHR2P1 DMSO_0 GNRHR2P1 0.0000 0.0000977 1.000
#> ENSG00000273904 DMSO_0 ENSG00000273904 0.0000 0.0000977 1.000
#> ENSG00000241593 DMSO_0 ENSG00000241593 0.0000 0.0000977 1.000
#> WDR31 DMSO_0 WDR31 0.4211 0.0000977 0.684
#> DRD5 DMSO_0 DRD5 0.0000 0.0000977 1.000
#> ENSG00000256569 DMSO_0 ENSG00000256569 0.0000 0.0000977 1.000
#> EPHX1 DMSO_0 EPHX1 6.2105 0.0077643 0.000
#> ACTN1 DMSO_0 ACTN1 3.4737 0.0051444 0.000
#> MIR5188 DMSO_0 MIR5188 0.0000 0.0000977 1.000
#> RNU6-118P DMSO_0 RNU6-118P 0.0000 0.0000977 1.000
#> ENSG00000271758 DMSO_0 ENSG00000271758 0.0000 0.0000977 1.000
#> ZNF84-DT DMSO_0 ZNF84-DT 0.0000 0.0000977 1.000
#> ENSG00000248733 DMSO_0 ENSG00000248733 0.0000 0.0000977 1.000
#> ACTL7A DMSO_0 ACTL7A 0.0000 0.0000977 1.000
#> GID4 DMSO_0 GID4 3.1053 0.0023841 0.000
#> Y-RNA.2 DMSO_0 Y-RNA.2 0.0000 0.0000977 1.000
#> MIR200C DMSO_0 MIR200C 0.0000 0.0000977 1.000
#> ENSG00000224644 DMSO_0 ENSG00000224644 0.0000 0.0000977 1.000
#> CSTA DMSO_0 CSTA 4.4737 0.0071847 0.000
#> MIR664A DMSO_0 MIR664A 0.0000 0.0000977 1.000
#> MIR4802 DMSO_0 MIR4802 0.0000 0.0000977 1.000
#> ENSG00000278655 DMSO_0 ENSG00000278655 0.0000 0.0000977 1.000
#> ENSG00000280122 DMSO_0 ENSG00000280122 0.0000 0.0000977 1.000
#> ENSG00000254180 DMSO_0 ENSG00000254180 0.0000 0.0000977 1.000
#> RNU6-896P DMSO_0 RNU6-896P 0.0000 0.0000977 1.000
#> ENSG00000286805 DMSO_0 ENSG00000286805 0.0000 0.0000977 1.000
#> SHANK1 DMSO_0 SHANK1 0.0000 0.0000977 1.000
#> ENSG00000291048 DMSO_0 ENSG00000291048 0.0000 0.0000977 1.000
#> RN7SL268P DMSO_0 RN7SL268P 0.0000 0.0000977 1.000
#> NLGN2 DMSO_0 NLGN2 0.5263 0.0000977 0.579
#> DMC1 DMSO_0 DMC1 0.2632 0.0000977 0.737
#> KCNAB1-AS1 DMSO_0 KCNAB1-AS1 0.0000 0.0000977 1.000
#> ENSG00000276015 DMSO_0 ENSG00000276015 0.0000 0.0000977 1.000
#> WWTR1-IT1 DMSO_0 WWTR1-IT1 0.0000 0.0000977 1.000
#> ENSG00000260465 DMSO_0 ENSG00000260465 0.0000 0.0000977 1.000
#> RPL5P30 DMSO_0 RPL5P30 0.1053 0.0000977 0.895
#> ENSG00000270988 DMSO_0 ENSG00000270988 0.0000 0.0000977 1.000
#> MIR545 DMSO_0 MIR545 0.0000 0.0000977 1.000
#> ENSG00000257548 DMSO_0 ENSG00000257548 0.0000 0.0000977 1.000
#> ENSG00000289950 DMSO_0 ENSG00000289950 0.0000 0.0000977 1.000
#> ENSG00000262413 DMSO_0 ENSG00000262413 0.1579 0.0000977 0.842
#> ENSG00000249890 DMSO_0 ENSG00000249890 0.0000 0.0000977 1.000
#> RN7SL255P DMSO_0 RN7SL255P 0.0000 0.0000977 1.000
#> TRIM53CP DMSO_0 TRIM53CP 0.0000 0.0000977 1.000
#> RNA5SP107 DMSO_0 RNA5SP107 0.0000 0.0000977 1.000
#> RNU6-845P DMSO_0 RNU6-845P 0.0000 0.0000977 1.000
#> ENSG00000241114 DMSO_0 ENSG00000241114 0.0000 0.0000977 1.000
#> SERBP1P2 DMSO_0 SERBP1P2 0.0000 0.0000977 1.000
#> RPS10-NUDT3 DMSO_0 RPS10-NUDT3 0.0526 0.0000977 0.947
#> CDY12P DMSO_0 CDY12P 0.0000 0.0000977 1.000
#> MIR4644 DMSO_0 MIR4644 0.0000 0.0000977 1.000
#> ENSG00000223343 DMSO_0 ENSG00000223343 0.0000 0.0000977 1.000
#> MORF4L1P3 DMSO_0 MORF4L1P3 0.0000 0.0000977 1.000
#> MRGPRX3 DMSO_0 MRGPRX3 0.8947 0.0000977 0.316
#> CD160 DMSO_0 CD160 0.0000 0.0000977 1.000
#> obs_zeros_num p0_nb expected_zeros_num ZI
#> NAMPT 0 0.000564 0.0107 -0.000564
#> ENSG00000278869 19 1.000000 19.0000 0.000000
#> CABP7-DT 19 1.000000 19.0000 0.000000
#> NBEAP4 19 1.000000 19.0000 0.000000
#> FMO2 19 1.000000 19.0000 0.000000
#> NDUFA4P2 19 1.000000 19.0000 0.000000
#> DPY19L4P2 19 1.000000 19.0000 0.000000
#> ENSG00000286114 19 1.000000 19.0000 0.000000
#> ENSG00000265935 19 1.000000 19.0000 0.000000
#> Y-RNA 19 1.000000 19.0000 0.000000
#> FAM201B 19 1.000000 19.0000 0.000000
#> ENSG00000243018 19 1.000000 19.0000 0.000000
#> TRBC1 18 0.948760 18.0264 -0.001392
#> FAM20C 0 0.008519 0.1619 -0.008519
#> ENSG00000251536 19 1.000000 19.0000 0.000000
#> CLDN18 19 1.000000 19.0000 0.000000
#> ENSG00000259688 19 1.000000 19.0000 0.000000
#> RBM7P1 19 1.000000 19.0000 0.000000
#> UBE2HP1 19 1.000000 19.0000 0.000000
#> ENSG00000258419 19 1.000000 19.0000 0.000000
#> ENSG00000231698 19 1.000000 19.0000 0.000000
#> NF1P5 19 1.000000 19.0000 0.000000
#> PTPN20 19 1.000000 19.0000 0.000000
#> ENSG00000280048 19 1.000000 19.0000 0.000000
#> AHCY 0 0.008594 0.1633 -0.008594
#> CCDC127 0 0.000913 0.0173 -0.000913
#> FSHR 19 1.000000 19.0000 0.000000
#> TRIM64C 19 1.000000 19.0000 0.000000
#> ENSG00000285971 19 1.000000 19.0000 0.000000
#> ENSG00000217239 8 0.507246 9.6377 -0.086193
#> ENSG00000236366 19 1.000000 19.0000 0.000000
#> ENSG00000286853 19 1.000000 19.0000 0.000000
#> RN7SL270P 19 1.000000 19.0000 0.000000
#> CYCSP41 19 1.000000 19.0000 0.000000
#> MIR150 19 1.000000 19.0000 0.000000
#> ENSG00000289359 19 1.000000 19.0000 0.000000
#> Metazoa-SRP 19 1.000000 19.0000 0.000000
#> ENSG00000284620 19 1.000000 19.0000 0.000000
#> Y-RNA.1 19 1.000000 19.0000 0.000000
#> CNN2P9 19 1.000000 19.0000 0.000000
#> ENSG00000273375 18 0.948760 18.0264 -0.001392
#> ENSG00000287871 19 1.000000 19.0000 0.000000
#> LINC02862 19 1.000000 19.0000 0.000000
#> MIR556 19 1.000000 19.0000 0.000000
#> ENSG00000235609 12 0.481655 9.1514 0.149924
#> MBD2 0 0.002116 0.0402 -0.002116
#> HIGD1AP6 19 1.000000 19.0000 0.000000
#> ENSG00000276958 19 1.000000 19.0000 0.000000
#> ENSG00000275295 19 1.000000 19.0000 0.000000
#> ENSG00000285454 19 1.000000 19.0000 0.000000
#> C10orf71-AS1 19 1.000000 19.0000 0.000000
#> ENSG00000256001 19 1.000000 19.0000 0.000000
#> ENSG00000279294 19 1.000000 19.0000 0.000000
#> IFNWP5 19 1.000000 19.0000 0.000000
#> MAN1C1 18 0.948760 18.0264 -0.001392
#> RN7SL211P 19 1.000000 19.0000 0.000000
#> GNRHR2P1 19 1.000000 19.0000 0.000000
#> ENSG00000273904 19 1.000000 19.0000 0.000000
#> ENSG00000241593 19 1.000000 19.0000 0.000000
#> WDR31 13 0.657719 12.4967 0.026492
#> DRD5 19 1.000000 19.0000 0.000000
#> ENSG00000256569 19 1.000000 19.0000 0.000000
#> EPHX1 0 0.003312 0.0629 -0.003312
#> ACTN1 0 0.036295 0.6896 -0.036295
#> MIR5188 19 1.000000 19.0000 0.000000
#> RNU6-118P 19 1.000000 19.0000 0.000000
#> ENSG00000271758 19 1.000000 19.0000 0.000000
#> ZNF84-DT 19 1.000000 19.0000 0.000000
#> ENSG00000248733 19 1.000000 19.0000 0.000000
#> ACTL7A 19 1.000000 19.0000 0.000000
#> GID4 0 0.050308 0.9559 -0.050308
#> Y-RNA.2 19 1.000000 19.0000 0.000000
#> MIR200C 19 1.000000 19.0000 0.000000
#> ENSG00000224644 19 1.000000 19.0000 0.000000
#> CSTA 0 0.014942 0.2839 -0.014942
#> MIR664A 19 1.000000 19.0000 0.000000
#> MIR4802 19 1.000000 19.0000 0.000000
#> ENSG00000278655 19 1.000000 19.0000 0.000000
#> ENSG00000280122 19 1.000000 19.0000 0.000000
#> ENSG00000254180 19 1.000000 19.0000 0.000000
#> RNU6-896P 19 1.000000 19.0000 0.000000
#> ENSG00000286805 19 1.000000 19.0000 0.000000
#> SHANK1 19 1.000000 19.0000 0.000000
#> ENSG00000291048 19 1.000000 19.0000 0.000000
#> RN7SL268P 19 1.000000 19.0000 0.000000
#> NLGN2 11 0.592692 11.2611 -0.013745
#> DMC1 14 0.769245 14.6157 -0.032403
#> KCNAB1-AS1 19 1.000000 19.0000 0.000000
#> ENSG00000276015 19 1.000000 19.0000 0.000000
#> WWTR1-IT1 19 1.000000 19.0000 0.000000
#> ENSG00000260465 19 1.000000 19.0000 0.000000
#> RPL5P30 17 0.900205 17.1039 -0.005468
#> ENSG00000270988 19 1.000000 19.0000 0.000000
#> MIR545 19 1.000000 19.0000 0.000000
#> ENSG00000257548 19 1.000000 19.0000 0.000000
#> ENSG00000289950 19 1.000000 19.0000 0.000000
#> ENSG00000262413 16 0.854190 16.2296 -0.012085
#> ENSG00000249890 19 1.000000 19.0000 0.000000
#> RN7SL255P 19 1.000000 19.0000 0.000000
#> TRIM53CP 19 1.000000 19.0000 0.000000
#> RNA5SP107 19 1.000000 19.0000 0.000000
#> RNU6-845P 19 1.000000 19.0000 0.000000
#> ENSG00000241114 19 1.000000 19.0000 0.000000
#> SERBP1P2 19 1.000000 19.0000 0.000000
#> RPS10-NUDT3 18 0.948760 18.0264 -0.001392
#> CDY12P 19 1.000000 19.0000 0.000000
#> MIR4644 19 1.000000 19.0000 0.000000
#> ENSG00000223343 19 1.000000 19.0000 0.000000
#> MORF4L1P3 19 1.000000 19.0000 0.000000
#> MRGPRX3 6 0.412527 7.8380 -0.096737
#> CD160 19 1.000000 19.0000 0.000000
#> [ reached 'max' / getOption("max.print") -- omitted 889 rows ]
#>
#> $summary_by_group
#> group n_genes n_wells mean_p0_obs mean_p0_nb
#> DMSO_0 DMSO_0 500 19 0.819 0.820
#> Staurosporine_10 Staurosporine_10 500 3 0.888 0.872
#> mean_ZI observed_zeros_num expected_zeros_num pct_ZI_gt_0.1
#> DMSO_0 -0.000968 7780 7789 0.004
#> Staurosporine_10 0.016115 1332 1308 0.080
#> pct_ZI_gt_0.2
#> DMSO_0 0.000
#> Staurosporine_10 0.052
#>
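The printed summary can be sanity-checked by hand. The sketch below copies the zero counts from the summary_by_group output above and divides them by n_genes × n_wells. The results appear consistent with the mean_p0_obs and mean_p0_nb columns, and their difference is close to mean_ZI (not identical, since expected_zeros_num is printed rounded). This is a reading of the printed numbers, not a statement about how the function computes them. Likewise, the pct_ZI_gt_* values look like proportions rather than percentages: 0.004 × 500 genes is 2 genes and 0.080 × 500 is 40.

```r
# Values copied from the printed summary_by_group output.
summary_by_group <- data.frame(
  group              = c("DMSO_0", "Staurosporine_10"),
  n_genes            = c(500, 500),
  n_wells            = c(19, 3),
  observed_zeros_num = c(7780, 1332),
  expected_zeros_num = c(7789, 1308)
)

# Total number of gene-by-well entries per group.
n_entries <- with(summary_by_group, n_genes * n_wells)

p0_obs <- summary_by_group$observed_zeros_num / n_entries
p0_nb  <- summary_by_group$expected_zeros_num / n_entries

round(p0_obs, 3)          # compare with mean_p0_obs: 0.819, 0.888
round(p0_nb, 3)           # compare with mean_p0_nb:  0.820, 0.872
round(p0_obs - p0_nb, 3)  # compare with mean_ZI:     -0.000968, 0.016115
```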