
Cross platform compatibility
Source:vignettes/cross_platform_compatibility.Rmd
cross_platform_compatibility.Rmd
Introduction
This vignette is to demonstrate how to adapt macpie’s MACseq workflows to any high-throughput transcriptomic profiling (HTTr) data format such as DRUGseq. As the experimental design of DRUGseq plates is a bit different from MACseq. In MACseq experimental design, we prefer to have replicate wells in the same plate. While in DRUGseq, the replicates are in different plates.
Key points:
This vignette will cover the following two parts:
1. Converting a DRUGseq plate
Mapping DRUGseq plate layouts into macpie’s metadata format.
Validating the metadata format and content.
Some basic QC steps from QC vignette.
2. Converting multiple DRUGseq plates
Importing multiple plates into a single macpie object.
Detecting batch effects across plates in the QC section.
Performing differential expression and pathway enrichment tests on merged data with limma-voom correction.
The DRUGseq dataset is a large-scale drug screening dataset that includes a large set of small molecules (N = 4,343) tested on U2OS cells. This dataset was retrieved from Zenodo (Ozer et al., 2024).
library(macpie)
library(tibble)
library(stringr)
library(pheatmap)
library(ggiraph)
library(tidyseurat)
#> Loading required package: ttservice
#>
#> Attaching package: 'ttservice'
#> The following objects are masked from 'package:macpie':
#>
#> bind_cols, bind_rows, plot_ly
#> Loading required package: SeuratObject
#> Loading required package: sp
#>
#> Attaching package: 'SeuratObject'
#> The following objects are masked from 'package:base':
#>
#> intersect, t
#> ========================================
#> tidyseurat version 0.8.0
#> If you use TIDYSEURAT in published research, please cite:
#>
#> Mangiola et al. Interfacing Seurat with the R tidy universe. Bioinformatics 2021.
#>
#> This message can be suppressed by:
#> suppressPackageStartupMessages(library(tidyseurat))
#>
#> To restore the Seurat default display use options("restore_Seurat_show" = TRUE)
#> ========================================
#>
#> Attaching package: 'tidyseurat'
#> The following object is masked from 'package:ttservice':
#>
#> plot_ly
library(purrr)
#>
#> Attaching package: 'purrr'
#> The following objects are masked from 'package:macpie':
#>
#> %@%, flatten, flatten_chr, flatten_dbl, flatten_int, flatten_lgl,
#> flatten_raw, invoke, splice
#> The following object is masked from 'package:testthat':
#>
#> is_null
Converting a DRUGseq plate to a macpie object
1. Metadata import and validation
Experimental background and metadata files are in Novartis_drugseq_U2OS_MoABox/ on DRUGseq Github and data can be downloaded from this ZENODO link: Novartis/DRUG-seq U2OS MoABox Dataset Creators.
plate_well_metadata <- read.csv(paste0(dir,"DRUGseqData/DRUGseq_U2OS_MoABox_plate_wells_metadata_public.txt"), sep = "\t")
As the metadata contains plate and well-level information for the 59,904 samples, we only read in the metadata for plate VH02012944, which is the plate we use in this vignette.
1.1 Convert DRUGseq metadata to macpie metadata format
An example of a DRUGseq plate content format is also available on on DRUGseq Github.
First, we can have a look at the content and column names:
head(metadata)
#> analysis_id investigation_id investigation_name batch_id plate_replicate
#> 1 24 2384 DRUG-seq_CBT_U2OS_MoA 24 2
#> 2 24 2384 DRUG-seq_CBT_U2OS_MoA 24 2
#> 3 24 2384 DRUG-seq_CBT_U2OS_MoA 24 2
#> 4 24 2384 DRUG-seq_CBT_U2OS_MoA 24 2
#> 5 24 2384 DRUG-seq_CBT_U2OS_MoA 24 2
#> 6 24 2384 DRUG-seq_CBT_U2OS_MoA 24 2
#> experiment_id plate_name plate_barcode plate_index well_id
#> 1 6487 U-2_OS_MoA_batch24_rep2 VH02012944 TTACAGGA A01
#> 2 6487 U-2_OS_MoA_batch24_rep2 VH02012944 TTACAGGA J01
#> 3 6487 U-2_OS_MoA_batch24_rep2 VH02012944 TTACAGGA K01
#> 4 6487 U-2_OS_MoA_batch24_rep2 VH02012944 TTACAGGA L01
#> 5 6487 U-2_OS_MoA_batch24_rep2 VH02012944 TTACAGGA M01
#> 6 6487 U-2_OS_MoA_batch24_rep2 VH02012944 TTACAGGA N01
#> well_index col row well_type cell_line_name cell_line_ncn concentration unit
#> 1 AACAAGGTAC 1 1 SA U-2 OS FH55-48QE 10 uM
#> 2 AAGATCGGCG 1 10 SA U-2 OS FH55-48QE 10 uM
#> 3 AAGCGATGTT 1 11 SA U-2 OS FH55-48QE 10 uM
#> 4 AAGCGTTCAG 1 12 SA U-2 OS FH55-48QE 10 uM
#> 5 AAGGTCTGGA 1 13 SA U-2 OS FH55-48QE 10 uM
#> 6 AAGTTAGCGC 1 14 SA U-2 OS FH55-48QE 10 uM
#> hours_post_treatment biosample_id external_biosample_id cmpd_sample_id
#> 1 24 2018772 KA-74-VF86 BA-86-AL61
#> 2 24 2018988 AB-94-KK84 BB-79-AG41
#> 3 24 2019012 OD-91-CJ88 BB-41-XE67
#> 4 24 2019036 AE-90-GO84 ED-91-LA02
#> 5 24 2019060 MF-99-JS89 DF-11-IL69
#> 6 24 2019084 UB-01-LS89 LE-80-BM08
#> plate_well
#> 1 VH02012944_AACAAGGTAC
#> 2 VH02012944_AAGATCGGCG
#> 3 VH02012944_AAGCGATGTT
#> 4 VH02012944_AAGCGTTCAG
#> 5 VH02012944_AAGGTCTGGA
#> 6 VH02012944_AAGTTAGCGC
Next, we convert relevant columns in DRUGseq metadata to macpie metadata format.
#extract relevant columns from DRUGseq plate
macpie_metadata <- metadata %>%
select(plate_barcode, well_id, well_index, row, col, well_type, cell_line_name, concentration, unit, hours_post_treatment, cmpd_sample_id) %>%
mutate(
Plate_ID = plate_barcode,
Well_ID = well_id,
Barcode = well_index,
Cell_type = "U2OS",
Unit_1 = unit,
Treatment_1 = cmpd_sample_id,
Sample_type = well_type,
Concentration_1 = as.numeric(concentration),
Row = LETTERS[row],
Column = as.integer(col),
Time = as.factor(hours_post_treatment),
Unit = "h",
Species = "human",
Model_type = "2D_adherent",
Sample_type = if_else(well_type == "SA" & col == 24,
"Positive Control",
well_type)
)
#Column names for macpie
col_names <- c("Plate_ID", "Well_ID", "Row", "Column", "Species",
"Cell_type", "Model_type", "Time", "Unit", "Treatment_1",
"Concentration_1", "Unit_1", "Sample_type", "Barcode")
macpie_metadata <- macpie_metadata[, col_names]
head(macpie_metadata)
#> Plate_ID Well_ID Row Column Species Cell_type Model_type Time Unit
#> 1 VH02012944 A01 A 1 human U2OS 2D_adherent 24 h
#> 2 VH02012944 J01 J 1 human U2OS 2D_adherent 24 h
#> 3 VH02012944 K01 K 1 human U2OS 2D_adherent 24 h
#> 4 VH02012944 L01 L 1 human U2OS 2D_adherent 24 h
#> 5 VH02012944 M01 M 1 human U2OS 2D_adherent 24 h
#> 6 VH02012944 N01 N 1 human U2OS 2D_adherent 24 h
#> Treatment_1 Concentration_1 Unit_1 Sample_type Barcode
#> 1 BA-86-AL61 10 uM SA AACAAGGTAC
#> 2 BB-79-AG41 10 uM SA AAGATCGGCG
#> 3 BB-41-XE67 10 uM SA AAGCGATGTT
#> 4 ED-91-LA02 10 uM SA AAGCGTTCAG
#> 5 DF-11-IL69 10 uM SA AAGGTCTGGA
#> 6 LE-80-BM08 10 uM SA AAGTTAGCGC
1.2 Validate DRUGseq metadata
Now, we can test the metadata validation function to ensure that the metadata is in the correct format and contains all the required columns.
validate_metadata(macpie_metadata)
#>
#> No validation issues found. Metadata is clean.
#>
#> Generating summary table grouped by Plate_ID...
#> Plate_ID count_Species count_Cell_type count_Model_type count_Time
#> 1 VH02012944 1 1 1 1
#> count_Unit count_Treatment_1 count_Concentration_1 count_Unit_1
#> 1 1 341 2 1
#> count_Sample_type
#> 1 4
Visualizing and inspecting the metadata layout
plot_metadata_heatmap(macpie_metadata)
2. Quality control
2.1 Import data
Let’s import one of the replicate plates VH02012944
from
batch 24.
DRUGseq data was downloaded from ./Exp_gzip.RData from the
above Zenodo link and saved as Exp_batch24.Rds
locally for
faster loading.
Each batch is a list of plates, and each plate contains the UMI counts matrix and the corresponding metadata.
# load("DRUGseqData/Exp_gzip.RData")
# batch24 <- Exp$`24`
# saveRDS(batch24, file = "DRUGseqData/Exp_batch24.Rds")
batch24 <- readRDS(paste0(dir,"DRUGseqData/Exp_batch24.Rds"))
data <- batch24$VH02012944
counts <- data$UMI.counts
colnames(counts) <- str_remove_all(colnames(counts), "VH02012944_")
Quick check of the UMI counts matrix dimensions and the first few row names:
dim(data$UMI.counts)
#> [1] 59594 384
rownames(data$UMI.counts)[1:10]
#> [1] "DUX4L9,grch38_4" "AC010378.2,grch38_5" "AL136295.5,grch38_14"
#> [4] "AC106786.1,grch38_5" "AC138956.2,grch38_5" "MTND2P40,grch38_19"
#> [7] "AC104109.2,grch38_5" "RNU6-1263P,grch38_3" "AC243972.3,grch38_14"
#> [10] "COX6B1P5,grch38_4"
Row names of DRUGseq data are in the format of “gene_chromosome”. But we only keep gene names for macpie data.
counts <- rownames_to_column(as.data.frame(counts), var= "gene_id") %>%
separate(gene_id, into = c("gene_name", "chrom"), sep = ",")
counts[1:10,1:5]
#> gene_name chrom CGGAGATTGG ACACCGAATT AGCCACTAGC
#> 1 DUX4L9 grch38_4 0 0 0
#> 2 AC010378.2 grch38_5 0 0 0
#> 3 AL136295.5 grch38_14 4 2 9
#> 4 AC106786.1 grch38_5 0 0 0
#> 5 AC138956.2 grch38_5 0 0 0
#> 6 MTND2P40 grch38_19 0 0 0
#> 7 AC104109.2 grch38_5 0 0 0
#> 8 RNU6-1263P grch38_3 0 0 0
#> 9 AC243972.3 grch38_14 0 0 0
#> 10 COX6B1P5 grch38_4 0 0 0
counts$gene_name <- make.unique(counts$gene_name)
counts <- counts %>%
select(-chrom) %>%
tibble::column_to_rownames(var = "gene_name") %>%
as.matrix()
The UMI counts matrix and metadata are now ready to be used with macpie functions. We can create a tidySeurat object and join it with the metadata.
as_mac<- CreateSeuratObject(counts = counts,
assay = "RNA",
project = "VH02012944")
#> Warning: Feature names cannot have underscores ('_'), replacing with dashes
#> ('-')
#> Warning: Data is of class matrix. Coercing to dgCMatrix.
as_mac<- as_mac%>% inner_join(macpie_metadata, by = c(".cell"="Barcode"))
Filtering:
We previously saved a filtered data set, which filtered out genes with < 5 reads in at least 1 replicate well for each treatment.
2.2 Visualize plate layout
Now, we can check UMI counts and sample types in the wells.
p <- plot_plate_layout(as_mac, "nCount_RNA", "Sample_type")
girafe(ggobj = p,
fonts = list(sans = "sans"),
options = list(
opts_hover(css = "stroke:black; stroke-width:0.8px;") # <- slight darkening
))
We can check common QC plots from the Seurat package to visualise the number of genes, reads, and percentage of mitochondrial and ribosomal genes per sample.
# Calculate percent of mitochondrial and ribosomal genes
as_mac[["percent.mt"]] <- PercentageFeatureSet(as_mac, pattern = "^mt-|^MT-")
as_mac[["percent.ribo"]] <- PercentageFeatureSet(as_mac, pattern = "^Rp[slp][[:digit:]]|^Rpsa|^RP[SLP][[:digit:]]|^RPSA")
# Example of a function from Seurat QC
VlnPlot(as_mac, features = c("nFeature_RNA", "nCount_RNA", "percent.mt", "percent.ribo"),
ncol = 4, group.by = "Sample_type") &
scale_fill_manual(values = macpie_colours$discrete)
#> Warning: Default search for "data" layer in "RNA" assay yielded no results;
#> utilizing "counts" layer instead.
#> Warning: The `slot` argument of `FetchData()` is deprecated as of SeuratObject 5.0.0.
#> ℹ Please use the `layer` argument instead.
#> ℹ The deprecated feature was likely used in the Seurat package.
#> Please report the issue at <https://github.com/satijalab/seurat/issues>.
#> This warning is displayed once every 8 hours.
#> Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
#> generated.
#> Warning: `PackageCheck()` was deprecated in SeuratObject 5.0.0.
#> ℹ Please use `rlang::check_installed()` instead.
#> ℹ The deprecated feature was likely used in the Seurat package.
#> Please report the issue at <https://github.com/satijalab/seurat/issues>.
#> This warning is displayed once every 8 hours.
#> Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
#> generated.
2.2 Basic QC metrics
2.2.1 Sample grouping
Same as the Quality control vignette, we first visualise grouping of the samples based on top 500 expressed genes and limma’s MDS function. Hovering over the individual dots reveals sample identity and grouping.
p <- plot_mds(as_mac, group_by = "Sample_type", label = "Sample_type", n_labels = 30)
#> tidyseurat says: Key columns are missing. A data frame is returned for independent data analysis.
#> tidyseurat says: Key columns are missing. A data frame is returned for independent data analysis.
girafe(ggobj = p, fonts = list(sans = "sans"))
#> Warning: ggrepel: 7 unlabeled data points (too many overlaps). Consider
#> increasing max.overlaps
2.2.2 Correction of the batch effect
For simplicity, we only plot the RLE (Relative Log Expression) plot for different types of controls.
plot_rle(as_mac %>% filter(Sample_type !="SA"), label_column = "Sample_type", normalisation = "raw")
#> tidyseurat says: Key columns are missing. A data frame is returned for independent data analysis.
plot_rle(as_mac %>% filter(Sample_type !="SA"), label_column = "Sample_type", normalisation = "limma_voom")
#> tidyseurat says: Key columns are missing. A data frame is returned for independent data analysis.
Coverting multiple DRUGseq plates to a macpie object
1. Metadata import and validation
In this part of the vignette, we need three replicate plates from batch 24.
DRUGseq data was downloaded from ./Exp_gzip.RData from the
above Zenodo link and saved as Exp_batch24.Rds
locally for
faster loading.
Each batch is a list of plates, and each plate contains the UMI counts matrix and the corresponding metadata.
names(batch24)
#> [1] "VH02012956" "VH02012942" "VH02012944"
This batch contains 3 replicate plates, each has the UMI counts matrix and the metadata.
Now, we convert the metadata format to macpie metadata format.
The idea is to make a combined metadata and a combined count matrix
for three plates with the plate_ID
labelled.
#make a combined metadata for three plates
batch24_metadata <- batch24 %>%
map_dfr(~ {
.x$Annotation %>%
mutate(
Plate_ID = plate_barcode,
Well_ID = well_id,
Barcode = paste0(plate_barcode, "_", well_index),
Row = LETTERS[row],
Column = as.integer(col),
Species = "human",
Cell_type = "U2OS",
Model_type = "2D_adherent",
Time = as.factor(hours_post_treatment),
Unit = "h",
Treatment_1 = cmpd_sample_id,
Concentration_1 = as.numeric(concentration),
Unit_1 = unit,
Sample_type = if_else(well_type == "SA" & col == 24,
"Positive Control",
well_type)
)
})
batch24_metadata <- batch24_metadata%>%select(-c(batch_id, plate_barcode,plate_index, well_id,
well_index, col, row, biosample_id, external_biosample_id,
cmpd_sample_id, well_type, cell_line_name, cell_line_ncn, concentration, unit, hours_post_treatment, Sample))
# create a combined UMI matrix for 3 plates
batch24_counts <- batch24 %>%
map(~ {
.x$UMI.counts %>%
as.data.frame() %>%
rownames_to_column("gene_id") %>%
separate(col = gene_id, into = c("gene_name", "chrom"), sep = ",") %>%
mutate(gene_name = make.unique(gene_name)) %>%
select(-chrom) %>%
tibble::column_to_rownames(var = "gene_name") %>%
as.matrix()
})
binded_counts <- do.call(cbind, batch24_counts)
2. Quality control
2.1 Import data
Then, we can read in the combined count matrix and metadata.
as_mac <- CreateSeuratObject(counts = binded_counts,
min.cells = 1,
min.features = 1)
#> Warning: Feature names cannot have underscores ('_'), replacing with dashes
#> ('-')
#> Warning: Data is of class matrix. Coercing to dgCMatrix.
as_mac<- as_mac%>% inner_join(batch24_metadata, by = c(".cell"="Barcode"))
Filtering
As filtering genes in three 384-well plates could take a while. We suggest to save a previously filtered object to work with.
as_mac$combined_id <- paste0(as_mac$Treatment_1,"_", as_mac$Concentration_1)
min_sample_num <- min(table(as_mac$combined_id))
# mac_filtered <- filter_genes_by_expression(as_mac,
# group_by = "combined_id", min_counts = 10,
# min_samples = min_sample_num)
#
# saveRDS(mac_filtered,
# file = paste0(dir, "DRUGseqData/macpie_filtered_batch24.Rds"))
mac_filtered <- readRDS(paste0(dir, "/DRUGseqData/macpie_filtered_batch24.Rds"))
Here, we focus on check the data quality across three replicate plates, especially any batch effects from this batch 24.
mac_filtered[["percent.mt"]] <- PercentageFeatureSet(mac_filtered, pattern = "^mt-|^MT-")
mac_filtered[["percent.ribo"]] <- PercentageFeatureSet(mac_filtered, pattern = "^Rp[slp][[:digit:]]|^Rpsa|^RP[SLP][[:digit:]]|^RPSA")
# Example of a function from Seurat QC
VlnPlot(mac_filtered, features = c("nFeature_RNA", "nCount_RNA", "percent.mt", "percent.ribo"),
ncol = 4, group.by = "Sample_type", split.by = "orig.ident") + theme(legend.position = 'right') +
scale_fill_manual(values = macpie_colours$discrete)
#> Warning: Default search for "data" layer in "RNA" assay yielded no results;
#> utilizing "counts" layer instead.
#> The default behaviour of split.by has changed.
#> Separate violin plots are now plotted side-by-side.
#> To restore the old behaviour of a single split violin,
#> set split.plot = TRUE.
#>
#> This message will be shown once per session.
#> Scale for fill is already present.
#> Adding another scale for fill, which will replace the existing scale.
p <- plot_plate_layout(mac_filtered, "nCount_RNA", "combined_id") + facet_wrap(~orig.ident, ncol = 1) +
theme(strip.text = element_text(size=10),
axis.text.x = element_text(size=10),
axis.text.y = element_text(size=8),
legend.title = element_text(size=10),
legend.text = element_text(size=8),
trip.background = element_blank())
girafe(ggobj = p,
fonts = list(sans = "sans"),
options = list(
opts_hover(css = "stroke:black; stroke-width:1px;")
))
#> Warning in plot_theme(plot): The `trip.background` theme element is not
#> defined in the element hierarchy.
2.2 Basic QC metrics
2.2.1 Sample grouping with MDS plot
Same as above, we first visualise grouping of the samples in MDS plot.
p <- plot_mds(mac_filtered, group_by = "Sample_type", label = "combined_id", n_labels = 30)
#> tidyseurat says: Key columns are missing. A data frame is returned for independent data analysis.
#> tidyseurat says: Key columns are missing. A data frame is returned for independent data analysis.
p1 <- plot_mds(mac_filtered, group_by = "orig.ident", label = "combined_id", n_labels = 30)
#> tidyseurat says: Key columns are missing. A data frame is returned for independent data analysis.
#> tidyseurat says: Key columns are missing. A data frame is returned for independent data analysis.
g <- patchwork::wrap_plots(list(p, p1), ncol = 1, nrow = 2, rel_widths = c(7, 7), rel_heights = c(10, 10))
girafe(
ggobj = g,
width_svg = 10, # 10 inches wide
height_svg = 10, # 10 inches tall
fonts = list(sans = "sans"),
options = list(opts_hover(css = "stroke:black; stroke-width:0.8px;"))
)
#> Warning: ggrepel: 30 unlabeled data points (too many overlaps). Consider increasing max.overlaps
#> ggrepel: 30 unlabeled data points (too many overlaps). Consider increasing max.overlaps
2.2.2 Sample grouping with UMAP
Apart from MDS plot, we show that this data can also be applied to Seurat’s SCTransform to visualise in UMAP.
mac_sct <- SCTransform(mac_filtered, verbose = FALSE) %>%
RunPCA(verbose = FALSE) %>%
RunUMAP(verbose = FALSE,dims = 1:30)
DimPlot(mac_sct, reduction = "umap", group.by = "orig.ident", cols = macpie_colours$discrete)
From MDS and UMAP, there are batch effects among three replicate plates. Only one compound: three wells of BA-51-N076_10 formed a distinct cluster. All wells were separated by plates.
2.2.3 Distribution of UMI counts
We also want to expect what is the distribution of UMIs across the experiment. To that end we use box plot to show distribution of UMI counts grouped across treatments.
As there are 341 unique combinations of compound_concentration, it’s too messy to show in the vignette. In here, we only show 200 of them.
compounds_subset <- unique(mac_filtered$combined_id)
qc_stats <- compute_qc_metrics(mac_filtered %>% filter(combined_id %in% compounds_subset[1:200]), group_by = "combined_id", order_by = "median")
2.2.5 Correction of batch effect
According to DRUGseq metadata:
Wells with water are labelled as EC-27-RY89
Wells with DMSO are labelled as CB-43-EP73
mac_filtered_dmso <- mac_filtered %>% filter(Treatment_1 == "CB-43-EP73")
plot_rle(mac_filtered_dmso, label_column = "orig.ident", normalisation = "raw")+ scale_x_discrete(drop = FALSE) +
theme(axis.text.x = element_text(angle = 45, hjust = 1))
#> tidyseurat says: Key columns are missing. A data frame is returned for independent data analysis.
#> Scale for x is already present.
#> Adding another scale for x, which will replace the existing scale.
plot_rle(mac_filtered_dmso, label_column = "orig.ident", normalisation = "limma_voom")+ scale_x_discrete(drop = FALSE) +
theme(axis.text.x = element_text(angle = 45, hjust = 1))
#> tidyseurat says: Key columns are missing. A data frame is returned for independent data analysis.
#> Scale for x is already present.
#> Adding another scale for x, which will replace the existing scale.
Note: instead of discussing which correction methods we should use for this data, we only show the ways we detected and corrected batch effect here. As batch effect adjustment for sequencing data has been implemented in different methods, such as DESeq2, RUVSeq, edgeR. We highly recommend users thoroughly checking any batch effects and exploring different methods.
In the next part of the vignette, we demonstrate a batch
parameter has implemented in the differential expression test for batch
correction.
3. Differential gene expression
3.1 Single comparison
In here, you can specify a single condition in the combined_id column
and compare with DMSO (i.e.CB_43_EP73_0). By using the plate IDs in the
column of orig.ident as the input for batch parameter,
compute_singe_de
function can perform differential
expression analysis using the preferred method (limma voom in this
example) with batch information.
mac_filtered$combined_id <- str_replace_all(mac_filtered$combined_id, "-","_")
treatment_samples <- "FF_86_NH56_10"
control_samples <- "CB_43_EP73_0"
subset <- mac_filtered%>%filter(combined_id%in%c(treatment_samples,control_samples))
batch <- subset$orig.ident
top_table <- compute_single_de(mac_filtered, treatment_samples, control_samples, method = "limma_voom", batch = batch)
plot_volcano(top_table, max.overlaps = 6)
#> Warning: ggrepel: 5718 unlabeled data points (too many overlaps). Consider
#> increasing max.overlaps
Now we can visualise gene expression (in CPM) for the top 6 genes from the differential expression analysis.
genes <- top_table$gene[1:6]
group_by <- "combined_id"
plot_counts(mac_filtered,genes, group_by, treatment_samples, control_samples, normalisation = "cpm")
#> Normalizing layer: counts
Pathway enrichment analysis can be performed on the top
differentially expressed genes using function enrichr
.
3.2 Multiple comparisons
As there are 339 compounds with 10uM in the data, which could take quite a while (around 24 mins for a M3 Pro 18GB memory on parallelisation speedup num_cores = 4) to run. For the purpose of this vignette, we only include 100 compounds with 10uM in the data.
treatments <- mac_filtered %>%
filter(Concentration_1 == 10) %>%
select(combined_id) %>%
pull() %>%
unique()
#> tidyseurat says: Key columns are missing. A data frame is returned for independent data analysis.
treatments_subset <- treatments[1:100] # only use 100 compounds for the vignette
mac_filtered <- compute_multi_de(mac_filtered, treatments_subset, control_samples = "CB_43_EP73_0", method = "limma_voom", num_cores = 4)
We often want to ask which genes are differentially expressed in more than one treatment group.
Here, we can visualise treatment groups with shared differentially expressed genes, defined as the top 20 up-regulated genes from each single drug comparison (treatment vs control) that are found in at least 2 different treatment groups.
The heatmap below shows shared differentially expressed genes with corresponding log2FC values.
plot_multi_de(mac_filtered, group_by = "combined_id", value = "log2FC", p_value_cutoff = 0.01, direction="up", n_genes = 20, control = "CB_43_EP73_0", by="fc")
4. Pathway enrichment analysis
4.1 Multiple comparisons
The pathway enrichment analysis is done by using enrichR. Results from differential gene expression - multiple comparisons are used to pass on to the pathway enrichment analysis.
You can visualise the pathway enrichment results for multiple comparisons in a heatmap.
# Load genesets from enrichr for a specific species or define your own
enrichr_genesets <- download_geneset("human", "MSigDB_Hallmark_2020")
mac_filtered<- compute_multi_enrichr(mac_filtered, genesets = enrichr_genesets)
enriched_pathways_mat <- mac_filtered@tools$pathway_enrichment %>%
bind_rows() %>%
select(combined_id, Term, Combined.Score) %>%
pivot_wider(names_from = combined_id, values_from = Combined.Score) %>%
column_to_rownames(var = "Term") %>%
mutate(across(everything(), ~ ifelse(is.na(.), 0, log1p(.)))) %>% # Replace NA with 0 across all columns
as.matrix()
pheatmap(enriched_pathways_mat, color = macpie_colours$continuous_rev)