Tutorial 1: Treatment-based Metabolomics
Tutorial_1_Treatment_based_study.RmdDownload and run this tutorial: Tutorial_1_Treatment-based_study.Rmd — open in RStudio and run chunks with Ctrl+Enter.
Background
In this tutorial, we demonstrate how to use the eCOMET package for analyzing metabolomics data from a treatment-based study
What is a treatment-based study?
- In a treatment-based study, we compare metabolite profiles between different treatment groups (e.g., control vs. treated).
- The main goal is to identify differentially accumulated metabolites (DAMs) — compounds whose abundance changes significantly in response to a treatment.
- Common downstream analyses include volcano plots, heatmaps, and chemical class enrichment to understand what types of metabolites respond to treatment.
Tutorial dataset
The example data represent metabolomics analysis of Arabidopsis thaliana (Col-0) leaves attacked by two different herbivores:
| Group | Description | Replicates |
|---|---|---|
ctrl |
Undamaged control | 8 |
sl1 |
Spodoptera litura (chewing herbivore) | 8 |
le1 |
Lipaphis erysimi (aphid, piercing-sucking) | 8 |
These files are shipped with the eCOMET package and can be accessed via system.file().
Step 1: Locate tutorial data
eCOMET is distributed with a small set of example files so that tutorials can be run immediately without downloading additional data.
# Locate tutorial data shipped with the eCOMET package
data_dir <- system.file(
"extdata/tutorials/treatment_based",
package = "ecomet"
)
stopifnot(nzchar(data_dir)) # fail loudly if package/data not installed
# Define file paths
demo_feature <- file.path(data_dir, "feature_table_demo.csv")
demo_metadata <- file.path(data_dir, "metadata_demo.csv")
demo_sirius_formula <- file.path(data_dir, "canopus_formula_summary.tsv")
demo_sirius_structure <- file.path(data_dir, "structure_identifications.tsv")
demo_dreams <- file.path(data_dir, "dreams_sim_demo.csv")
gls_db <- file.path(data_dir, "custom_DB_glucosinolates.csv")Step 2: Create an mmo object
The mmo (mass-spectrometry metabolomics object) is the central data structure in eCOMET. It stores feature abundances, sample metadata, annotations, and analysis results as a named list.
To create an mmo object, you need at minimum:
- Feature abundance table — exported from MZmine (preferably the “full feature table” format)
- Sample metadata — a CSV identifying samples and their treatment groups
Optionally, you can later add:
- SIRIUS/CANOPUS annotations for chemical classification
- Chemical similarity scores (DreaMS, cosine, MS2DeepScore)
- Custom compound database annotations
2.1 Initialize the mmo object
GetMZmineFeature() reads your MZmine feature table and metadata, then links them together.
mmo <- GetMZmineFeature(
mzmine_dir = demo_feature,
metadata_dir = demo_metadata,
group_col = 'group', # column in metadata defining treatment groups
sample_col = 'sample' # column in metadata with sample file names
)The newly created mmo object contains three main components:
1) Feature abundance matrix — rows are features (m/z + retention time), columns are samples:
mmo$feature_data2) Feature information — m/z, retention time, and other feature-level metadata:
mmo$feature_info3) Sample metadata — treatment groups, sample masses, phenotype data:
mmo$metadataStep 3: Preprocessing and normalization
Before statistical analysis, we apply several preprocessing steps. Each step adds a new element to the mmo object — the original data is never overwritten, so you can always go back.
3.1 Replace zeros
Mass spectrometry data often contains zeros (undetected features). These need to be handled before log transformation or fold-change calculations. ReplaceZero() adds an imputed version of the feature table (mmo$imputed_feature_data).
-
method = 'one': Replace 0 and NA values with 1 (simple, conservative) -
method = 'half_min': Replace with half the minimum non-zero value per feature
mmo <- ReplaceZero(mmo, method = 'one')
head(mmo$imputed_feature_data)3.2 Mass normalization
If your metadata contains a mass column (tissue mass in mg), MassNormalization() adjusts peak areas for differences in sample mass: normalized = original * mean(mass) / sample_mass.
mmo <- MassNormalization(mmo)
mmo$feature_data3.3 Additional normalizations
eCOMET offers several normalization methods. Each adds a new element to the mmo object:
| Function | Result stored in | Description |
|---|---|---|
LogNormalization() |
mmo$log |
log2(x + 1) transformation — stabilizes variance |
MeancenterNormalization() |
mmo$meancentered |
Row-wise mean subtraction — centers features around zero |
ZNormalization() |
mmo$zscore |
Row-wise z-score — standardizes to mean=0, sd=1 |
mmo <- MeancenterNormalization(mmo, imputed = TRUE)
mmo <- LogNormalization(mmo, imputed = TRUE)
mmo <- ZNormalization(mmo, imputed = TRUE) 3.4 Presence/absence matrix
You can also convert the feature abundance matrix to a binary presence/absence matrix. Features with abundance above the threshold are marked as present (1), otherwise absent (0).
mmo <- FeaturePresence(mmo, threshold = 1)Step 4: Add annotations
4.1 SIRIUS/CANOPUS annotation
SIRIUS provides computational annotations including molecular formula predictions and compound class assignments via CANOPUS (NPC and ClassyFire classifications).
mmo <- AddSiriusAnnot(
mmo,
canopus_structuredir = demo_sirius_structure,
canopus_formuladir = demo_sirius_formula
)
mmo$sirius_annotNote on duplicates: SIRIUS may assign multiple candidate annotations to the same feature. If you see duplicate feature entries in
mmo$sirius_annot, only the top-ranked annotation per feature is retained. Checksummary(mmo)to confirm the annotation count matches.
4.2 Filter CANOPUS annotations by confidence
CANOPUS assigns posterior probabilities to each compound class prediction. We can filter out low-confidence predictions to improve the quality of downstream enrichment analyses.
First, let’s visualize the distribution of prediction confidence:
hist(mmo$sirius_annot$`NPC#pathway Probability`,
main = "Distribution of NPC Pathway Probabilities",
xlab = "Probability", col = "lightblue")There is typically a long tail of low-confidence predictions. We can filter these using filter_canopus_annotations():
-
pathway_level: which classification levels to filter (e.g.,"NPC#pathway","All","All_NPC") -
threshold: minimum probability to retain (values below are set to NA) -
suffix: label for the filtered result
How to choose a threshold? There is no universal threshold. The SIRIUS documentation recommends using context-dependent thresholds. For statistical analyses, you can sum probabilities to get expected counts rather than applying a hard cutoff.
mmo <- filter_canopus_annotations(
mmo,
pathway_level = "NPC#pathway",
threshold = 0.8,
suffix = "NPC_pathway_0.8",
overwrite = TRUE
)
mmo$sirius_annot_filtered_NPC_pathway_0.84.3 Filter structure predictions by COSMIC confidence
SIRIUS structure predictions are scored by the COSMIC confidence score. These scores should be interpreted with caution — they are not probabilities. The SIRIUS team recommends focusing on the highest-confidence hits (e.g., top 5–10%).
# Replace -Infinity values with 0
mmo$sirius_annot_filtered_NPC_pathway_0.8$ConfidenceScoreApproximate[
which(mmo$sirius_annot_filtered_NPC_pathway_0.8$ConfidenceScoreApproximate == "-Infinity")
] <- 0
Cosmic_Scores <- as.numeric(
mmo$sirius_annot_filtered_NPC_pathway_0.8$ConfidenceScoreApproximate
)
hist(Cosmic_Scores, main = "COSMIC Confidence Scores for Structures",
xlab = "Confidence Score", col = "lightblue")
# Find the threshold for the top 10% of scores
quantile(x = Cosmic_Scores, probs = 0.9, na.rm = TRUE)
mmo <- filter_cosmic_structure(
mmo,
input = "sirius_annot_filtered_NPC_pathway_0.8",
cosmic_mode = "approx",
threshold = 0.3892,
suffix = "CANOPUS_0.8_COSMIC_Top_10"
)4.4 Create feature vectors from annotations
Using SIRIUS annotations, you can create vectors of features belonging to specific compound classes. These are useful for downstream analyses like targeted heatmaps or PCA.
# Extract flavonoid features using ClassyFire annotation
FLVs <- mmo$sirius_annot %>%
filter(str_detect(`ClassyFire#most specific class`, "Flavonoid")) %>%
pull(id)
FLVs4.5 Custom compound database annotation
You can match features against an in-house compound library by m/z (ppm tolerance) and retention time (minute tolerance).
4.6 Add chemical distance matrix
Chemical similarity scores (e.g., from DreaMS) can be used for chemically-informed heatmap clustering and diversity analyses.
mmo <- AddChemDist(mmo, dreams_dir = demo_dreams)
mmo$dreams.dissim[1:10, 1:10]Step 5: Dimensionality reduction plots
5.1 PCA plot and PERMANOVA
A PCA (Principal Component Analysis) plot visualizes the overall distribution of samples across groups. PCAplot() performs PCA, generates a scatter plot with confidence ellipses, and runs a PERMANOVA test to check if groups are significantly different.
Key options:
-
normalization: which normalization to use ("None","Log","Meancentered","Z") -
label: whether to label individual sample points -
filter_id/id_list: restrict analysis to specific features -
save_output: set toFALSEto return the plot object without saving files
# Define group colors
colors <- c("ctrl" = "grey", "sl1" = "#fdcdac", "le1" = "#b3e2cd")
# Basic PCA
pca <- PCAplot(mmo, color = colors, save_output = FALSE)
pca$plot
# PCA with log-normalized data, no sample labels
log_pca <- PCAplot(mmo, color = colors, label = FALSE, normalization = 'Log', save_output = FALSE)
log_pca$plot
ggsave("PCA_log.pdf", log_pca$plot, width = 4, height = 4)
# PCA of flavonoids only
flv_pca <- PCAplot(mmo, color = colors,label = FALSE, filter_id = TRUE, id_list = FLVs, save_output = FALSE)
flv_pca$plotCustomizing PCA plots
All plot-generating functions in eCOMET return a list containing the plot object, data frames, and statistical results. You can use these to create plots with your own aesthetics.
# Get PCA plot data
pca_res <- PCAplot(mmo, color = colors, save_output = FALSE)
# Access the components
# pca_res$plot — the ggplot object
# pca_res$df — data.frame with PC1, PC2, group, sample columns
# pca_res$permanova — PERMANOVA results
# Create a custom plot
library(ggplot2)
custom_pca <- ggplot(pca_res$df, aes(x = PC1, y = PC2, color = group)) +
geom_point(size = 3) +
scale_color_manual(values = colors) +
theme_minimal() +
labs(title = "Custom PCA Plot")
ggsave("260320_demo/plot/PCA/PCA_custom.pdf", custom_pca, width = 8, height = 6)5.2 PLS-DA plot
PLS-DA (Partial Least Squares Discriminant Analysis) is a supervised method that maximizes separation between groups. Unlike PCA, PLS-DA uses group labels during dimensionality reduction.
plsda <- PLSDAplot(mmo, color = colors, save_output = FALSE)
plsda$plotStep 6: Identify differentially accumulated metabolites (DAMs)
A central goal of treatment-based studies is to find differentially accumulated metabolites (DAMs) — features whose abundance changes significantly between treatment and control.
6.1 Pairwise comparison
PairwiseComp() performs a pairwise t-test between two groups and computes:
- log2 fold change — magnitude of change (log2(group2_mean / group1_mean))
- adjusted p-value — significance after BH correction
Convention: The first group (
group1) is the reference/control. Positive log2FC means the feature is higher ingroup2(treatment).
# Compare each treatment to control
mmo <- PairwiseComp(mmo, group1 = 'ctrl', group2 = 'sl1')
mmo <- PairwiseComp(mmo, group1 = 'ctrl', group2 = 'le1')6.2 Extract DAMs
GetDAMs() extracts DAMs from all comparisons based on fold-change and p-value cutoffs.
-
fc_cutoff = 0.5849625corresponds to log2(1.5) — a 1.5-fold change -
pval_cutoff = 0.1is used here to capture more DAMs in this small demo dataset
DAMs <- GetDAMs(mmo, fc_cutoff = 0.5849625, pval_cutoff = 0.1)
DAMs_up <- DAMs$DAMs_up # Features upregulated in treatment
DAMs_down <- DAMs$DAMs_down # Features downregulated in treatment
head(DAMs_up)Step 7: Visualization of DAMs
7.1 Volcano plot
A volcano plot visualizes all features by their fold change (x-axis) and significance (y-axis). Features exceeding both thresholds are highlighted as DAMs.
vol_sl <- VolcanoPlot(mmo, comp = 'ctrl_vs_sl1',save_output = FALSE)
vol_sl$plot
# Remove feature labels by setting topk = 0
vol_le <- VolcanoPlot(mmo, comp = 'ctrl_vs_le1', topk = 0, save_output = FALSE)
vol_le$plot7.2 Venn diagram and UpSet plot
These plots visualize the overlap of DAMs between comparisons — e.g., which upregulated features are shared between herbivore treatments?
# Define input list for Venn/UpSet
VennInput <- list(
sl1.up = DAMs_up$ctrl_vs_sl1.up,
le1.up = DAMs_up$ctrl_vs_le1.up
)
# Venn diagram
library(ggvenn)
venn_plot <- ggvenn(VennInput, stroke_size = 0.5, set_name_size = 4,
show_percentage = FALSE) +
theme(legend.position = "none")
venn_plot
# UpSet plot
library(UpSetR)
upset(fromList(VennInput), nsets = 10, nintersects = 20,
order.by = 'freq', mainbar.y.label = 'Features in Set',
line.size = 1, point.size = 4, shade.color = 'white',
text.scale = 1, show.numbers = FALSE)Step 8: Heatmap
Heatmaps visualize the relative abundance of features across samples or groups. Features can be clustered using hierarchical clustering (default) or by chemical distances (e.g., DreaMS), following the concept of Qemistree (Tripathi et al., 2021, Nat Chem Biol).
Key options for GenerateHeatmapInputs():
| Parameter | Options | Description |
|---|---|---|
summarize |
"fold_change", "mean"
|
What values to display |
control_group |
group name | Reference for fold-change calculation |
normalization |
"None", "Log", "Z", "Meancentered"
|
Which normalization to use |
distance |
"dreams", "cosine", "m2ds", NULL
|
Chemical distance for row clustering |
Important: When using
distance, do NOT wrappheatmap()inpdf()/dev.off(). Instead, usepheatmap(filename = "output.pdf")to avoid side-effect plot contamination.
8.1 Fold-change heatmap with chemical clustering
Note on Inf values: Fold-change calculations can produce
Infor-Infwhen a feature has zero abundance in the control group (log2(x / 0) = Inf). We cap these to the maximum finite fold-change value before plotting.
library(pheatmap)
# Generate heatmap inputs using fold change and DreaMS chemical distance
heatmap_inputs <- GenerateHeatmapInputs(
mmo,
summarize = 'fold_change',
control_group = 'ctrl',
normalization = 'None',
distance = 'dreams'
)
# Handle Inf/NaN in fold-change matrix (from features absent in control)
fc_mat <- as.matrix(heatmap_inputs$FC_matrix)
finite_max <- max(abs(fc_mat[is.finite(fc_mat)]), na.rm = TRUE)
fc_mat[is.infinite(fc_mat) & fc_mat > 0] <- finite_max
fc_mat[is.infinite(fc_mat) & fc_mat < 0] <- -finite_max
fc_mat[is.nan(fc_mat)] <- 0
# Plot heatmap — use filename argument to avoid PDF contamination
pheatmap(
mat = fc_mat,
clustering_distance_rows = heatmap_inputs$dist_matrix,
clustering_method = "average",
cellwidth = 100,
cellheight = 0.3,
treeheight_row = 100,
fontsize_row = 3,
fontsize_col = 15,
scale = 'none',
width = 10, height = 10
)8.2 Z-score heatmap with default clustering
heatmap_inputs_z <- GenerateHeatmapInputs(
mmo,
summarize = 'mean',
normalization = 'Z',
distance = 'dreams'
)
# Omit clustering_distance_rows to use default euclidean clustering
pdf('heatmap_Zscore.pdf', width = 10, height = 10)
pheatmap(
mat = heatmap_inputs_z$FC_matrix,
clustering_method = "average",
cellwidth = 100,
cellheight = 0.3,
treeheight_row = 100,
fontsize_row = 3,
fontsize_col = 15,
scale = 'none',
width = 10, height = 10
)
dev.off()8.3 Targeted heatmap: glucosinolates only
By filtering to specific features, you can create focused heatmaps for compound classes of interest. When custom annotations are available, compound names can be shown as row labels.
heatmap_inputs_GLS <- GenerateHeatmapInputs(
mmo,
summarize = 'mean',
normalization = 'Z',
distance = 'dreams',
filter_id = TRUE,
id_list = GLSs
)
pheatmap(
mat = heatmap_inputs_GLS$FC_matrix,
clustering_method = "average",
cellwidth = 100,
cellheight = 8,
treeheight_row = 100,
fontsize_row = 8,
fontsize_col = 15,
scale = 'none',
labels_row = heatmap_inputs_GLS$row_label,
filename = "260320_demo/plot/Heatmap/Heatmap_Mean_Z_GLS.pdf",
width = 10, height = 10
)Step 9: CANOPUS class overrepresentation analysis (cORA)
Which chemical classes are over-represented among DAMs? This is analogous to Gene Ontology enrichment analysis in transcriptomics. eCOMET uses NPC and ClassyFire annotations from CANOPUS to test for enrichment via Fisher’s exact test.
9.1 Single-list enrichment
For a single set of features, CanopusListEnrichmentPlot() tests enrichment across all classification levels and generates a detailed plot.
# Detailed enrichment plot (style 1)
cORA_1 <- CanopusListEnrichmentPlot(
mmo, DAMs_up$ctrl_vs_sl1.up,
pthr = 0.1,
height = 6, width = 6,
save_output = FALSE
)
cORA_1$plot
# Compact enrichment plot showing top N terms (style 2)
cORA_2 <- CanopusListEnrichmentPlot_2(
mmo, DAMs_up$ctrl_vs_sl1.up,
pthr = 0.1,
topn = 10,
height = 6, width = 6,
save_output = FALSE
)
cORA_2$plot9.2 Multi-comparison cORA
When comparing enrichment across multiple DAM lists (e.g., up in sl1 vs. up in le1), use CanopusLevelEnrichmentPlot() for a single classification level, or CanopusAllLevelEnrichmentPlot() for all levels.
Available classification levels:
| Level | Description |
|---|---|
NPC_pathway |
NPC biosynthetic pathway |
NPC_superclass |
NPC superclass |
NPC_class |
NPC class |
ClassyFire_superclass |
ClassyFire superclass |
ClassyFire_class |
ClassyFire class |
ClassyFire_subclass |
ClassyFire subclass |
ClassyFire_level5 |
ClassyFire level 5 |
ClassyFire_most_specific |
ClassyFire most specific class |
# Single level: NPC class
cORA_multi_1 <- CanopusLevelEnrichmentPlot(
mmo, DAMs_up,
term_level = 'NPC_class',
pthr = 0.1,
pval = 'pval',
save_output = FALSE
)
cORA_multi_1$plot
# All NPC levels
cORA_multi_2 <- CanopusAllLevelEnrichmentPlot(
mmo, DAMs_up,
term_level = 'NPC',
pthr = 0.1,
save_output = FALSE,
width = 8, height = 12
)
cORA_multi_2$plot Step 10: Correlation analysis
To find metabolites linked to a phenotypic outcome (e.g., herbivore performance), we can screen for correlations between feature abundance and a phenotype variable in the metadata.
In this example, we correlate each feature’s abundance with Spodoptera litura performance (sl column in metadata) using Spearman rank correlation.
sl_cor <- ScreenFeaturePhenotypeCorrelation(
mmo,
phenotype = 'sl',
groups = c('sl1'),
model = 'spearman',
normalization = 'Z'
)
head(sl_cor)Step 11: Save and share the mmo object
The mmo object can be saved to a file and shared with collaborators. This preserves all data, annotations, normalizations, and analysis results.
# Save
SaveMMO(mmo, '260320_demo/output/mmo.RData')
# Load in a future session
# mmo <- LoadMMO('260320_demo/output/mmo.RData')Summary
This tutorial covered the complete eCOMET workflow for treatment-based metabolomics studies:
| Step | Function(s) | Purpose |
|---|---|---|
| 1 | system.file() |
Locate tutorial data |
| 2 | GetMZmineFeature() |
Create mmo object |
| 3 |
ReplaceZero(), *Normalization()
|
Preprocessing |
| 4 |
AddSiriusAnnot(), AddCustomAnnot(), AddChemDist()
|
Annotations |
| 5 |
PCAplot(), PLSDAplot()
|
Dimensionality reduction |
| 6 |
PairwiseComp(), GetDAMs()
|
Differential accumulation |
| 7 |
VolcanoPlot(), Venn/UpSet |
DAM visualization |
| 8 |
GenerateHeatmapInputs() + pheatmap()
|
Heatmaps |
| 9 | Canopus*EnrichmentPlot() |
Chemical class enrichment |
| 10 | ScreenFeaturePhenotypeCorrelation() |
Phenotype correlation |
| 11 |
SaveMMO() / LoadMMO()
|
Save/share results |
For more information, see the eCOMET documentation or the Interspecific Study Tutorial for comparing metabolomes across multiple species.