Interspecific Ecometabolomics Tutorial • ecomet

This Relative Paths will work if you open the .Rjroject.

library(here)
here()

library(ecomet)
library(dplyr)
library(ggplot2)

source("~/Library/CloudStorage/OneDrive-SmithsonianInstitution/One_Drive_BackUps_Local_Mac_Files/CODE_GIT_HUB_2017_Aug_31/eCOMET/R/250813_eCOMET_build_V2_MC.R")

Background

In this tutorial, we will demonstrate how to use the eCOMET package for analyzing metabolic data in a study that analyzes samples across biological species. We start by comparing chemical richess and diversity across samples. We then compare the composition of samples using dimensionality reduction (hierachical clustering, PCAs, NMDS) to display how samples within groups compare to one another.

Unlike in a treatment based study, where the expectation is that many compounds will be shared between treatment groups, and the goal is to identify differential expressed metabolites. In studies comparing across species, we are often presented with a case where the vast majority of compounds are not shared between groups (i.e species of plants). In these cases, metrics of chemical similarity can become relatively meaningless if we rely completely on shared MZ-RT features. To address this, chemical ecologists have integrated tools such a feature based molecular networking and structurally weighted scores of chemical similarity to account for the fact that although two samples do not contain the same mz-rt feature, they both are producing compounds from the same biosynthetic origin and thus, should be considered more chemically similar than another sample that produces compounds from a different class.

The tutorial coveres X parts:

we want to compare across species. *Note this assumes that all features re indepenent of one another. richness sirius annotation stacked barplot cumualitive richness
PCA and Brey Curtus
We illustrate how different species of trees share a small fraction of there mz-rt features and show how this can be usefull for identifying chemotypes.
We introduce feature based molecular networking and compound dendrograms that group mz-features into structurally similar groups on a tree.

show a feature based molecular network…

Show different trees. actual based on the standards chemodev cosine, dreams,

#dendrogram of

We illustrate how a feature abundance matrix and a chemical dendrogram/similarity matrix can be used to calculate:

Structurally weighted Chemical Diversity
Structurally weighted Chemical Similarity between samples
richness by pathway NPCstacked barplot - potentially add in pathway stacked

#Questions for MS

how do deal with correlation groups and ion identity? pick M+H/-h delete from mmo ignore cor group entirely…

What would figure 5 look like for the Demo….

#1) Created the mmo object

#demo_feature <- here("tutorial/tutorial_2_interspecific/Ecomet_Interspecific_Demo_full_feature_table.csv")
#demo_metadata <- here("tutorial/tutorial_2_interspecific/Ecomet_Interspecific_Demo_metadata_no_blanks.csv")
#demo_sirius_formula <- here("tutorial/tutorial_2_interspecific/sirius/formula_identifications.tsv")
#demo_sirius_structure <- here("tutorial/tutorial_2_interspecific/sirius/structure_identifications.tsv")
#demo_dreams <- here("tutorial/tutorial_2_interspecific/Ecomet_Interspecific_Demo_dreams_sim_dreams.csv")


#demo_feature <- "/Users/dlforrister/Downloads/Test_Panama_New_Data/mzmine_output/FLS_500/EcoMET_Interspecific_full_feature_table.csv"
#demo_metadata <- "/Users/dlforrister/Downloads/Test_Panama_New_Data/mzmine_output/FLS_100/EcoMET_Interspecific_metadata.csv"
#demo_sirius_formula <- here("tutorial/tutorial_2_interspecific/sirius/formula_identifications.tsv")
#demo_sirius_structure <- here("tutorial/tutorial_2_interspecific/sirius/structure_identifications.tsv")
#demo_dreams <- "/Users/dlforrister/Downloads/Test_Panama_New_Data/mzmine_output/FLS_500/EcoMET_Interspecific_edges_dreams.csv"


# demo_feature <- "/Users/dlforrister/Downloads/Test_Panama_New_Data/mzmine_output/Madidi_FLS_100/EcoMET_Interspecific_full_feature_table.csv"
# demo_metadata <- "/Users/dlforrister/Downloads/Test_Panama_New_Data/mzmine_output/Madidi_FLS_100/EcoMET_Interspecific_metadata_blank.csv"
# #demo_sirius_formula <- here("tutorial/tutorial_2_interspecific/sirius/formula_identifications.tsv")
# #demo_sirius_structure <- here("tutorial/tutorial_2_interspecific/sirius/structure_identifications.tsv")
# demo_dreams <- "/Users/dlforrister/Downloads/Test_Panama_New_Data/mzmine_output/Madidi_FLS_100/EcoMET_Interspecific_edges_dreams_dreams.csv"
# 
# demo_feature <- "/Users/dlforrister/Downloads/Test_Panama_New_Data/mzmine_output/Madidi_FLS_250_Gap_Filled/EcoMET_Interspecific_full_feature_table.csv"
# demo_metadata <- "/Users/dlforrister/Downloads/Test_Panama_New_Data/mzmine_output/Madidi_FLS_100/EcoMET_Interspecific_metadata_blank.csv"
# #demo_sirius_formula <- here("tutorial/tutorial_2_interspecific/sirius/formula_identifications.tsv")
# #demo_sirius_structure <- here("tutorial/tutorial_2_interspecific/sirius/structure_identifications.tsv")
# demo_dreams <- "/Users/dlforrister/Downloads/Test_Panama_New_Data/mzmine_output/Madidi_FLS_250_Gap_Filled/EcoMET_Interspecific_edges_dreams_dreams.csv"


demo_feature <- "/Users/dlforrister/Downloads/Test_Panama_New_Data/mzmine_output/Madidi_FLS_250_Gap_Filled_Standards/EcoMET_Interspecific_full_feature_table.csv"
demo_metadata <- "/Users/dlforrister/Downloads/Test_Panama_New_Data/mzmine_output/Madidi_FLS_250_Gap_Filled_Standards/EcoMET_Interspecific_metadata_blank.csv"
#demo_sirius_formula <- here("tutorial/tutorial_2_interspecific/sirius/formula_identifications.tsv")
#demo_sirius_structure <- here("tutorial/tutorial_2_interspecific/sirius/structure_identifications.tsv")
demo_dreams <- "/Users/dlforrister/Downloads/Test_Panama_New_Data/mzmine_output/Madidi_FLS_250_Gap_Filled_Standards/EcoMET_Interspecific_edges_dreams_dreams.csv"



demo_feature <- "/Users/dlforrister/Downloads/Test_Panama_New_Data/mzmine_output/Madidi_FLS_250_Gap_Filled_Standards_Max_Dreams/EcoMET_Interspecific_full_feature_table.csv"
demo_metadata <- "/Users/dlforrister/Downloads/Test_Panama_New_Data/mzmine_output/Madidi_FLS_250_Gap_Filled_Standards/EcoMET_Interspecific_metadata_blank.csv"
#demo_sirius_formula <- here("tutorial/tutorial_2_interspecific/sirius/formula_identifications.tsv")
#demo_sirius_structure <- here("tutorial/tutorial_2_interspecific/sirius/structure_identifications.tsv")
demo_dreams <- "/Users/dlforrister/Downloads/Test_Panama_New_Data/mzmine_output/Madidi_FLS_250_Gap_Filled_Standards_Max_Dreams/EcoMET_Interspecific_edges_dreams_dreams.csv"



# Initialize eCOMET object
mmo <- GetMZmineFeature(mzmine_dir=demo_feature, metadata_dir = demo_metadata, group_col = 'Species_binomial',sample_col = "filename")
mmo <- AddFeatureInfo(mmo = mmo,full_feature_csv=demo_feature)
# Add normalizations
mmo <- ReplaceZero(mmo, method = 'one') # Replace 0 and NA values by 1
mmo <- LogNormalization(mmo) # Add log-transformed area
mmo <- ZNormalization(mmo) # Add Zscore

Now the mmo object contains the Feature x Abundance matrix where the first two columns identify individual metabolites and the remaining columns represent the abundance in each sample

mmo$feature_data

mmo$feature_info %>% arrange(feature_group)

We can add the feature x feature pairwise distance to the mmo object based on the output of MzMine. This can accept modified cosine, dreaMS and MS2Deepscore

# Add Dreams distance
mmo <- AddChemDist(mmo, dreams_dir = demo_dreams)

mmo$dreams.dissim[1:10,1:10]

Sample Metadata: In this tutorial we are comparing 5 species of hickory along side a sample of Eastern black walnut.

mmo $metadata$ Species

There are also two types of samples “pools” and "samples. Pools represent species pools where each individual sample for each spcies was combined and run for ms/ms data analysis. Samples represent MS1 data for anindividual plant.

mmo$metadata

Note we can subset an mmo object using filter mmo to filte based on a list of samples, a list of groups or a list of features

In this tutorial we will subset to only individual samples removing the pooled samples

Build a dendrogram three options…

We are going to randomly pick 100 feature groups in order to illustrate how to generate how to make a dendrogram

# samples_single <- mmo$metadata$sample[mmo$metadata$speciesCode == "ingalb"]
# mmo_single <- filter_mmo(mmo,sample_list  = samples_single,drop_empty_feat = TRUE)
# 
# View(mmo_single$feature_info %>% arrange(feature_group,`ion_identities:iin_id`))
# 
table(mmo_single$feature_info$feature_group)
table(table(mmo_single$feature_info$feature_group))

samples_single <- mmo$metadata$sample[mmo$metadata$speciesCode %in% c("stanard1","stanard2")]
mmo_single <- filter_mmo(mmo,sample_list  = samples_single,drop_empty_feat = TRUE)
 

table(mmo_single$feature_info$feature_group)


feature_info_abundance <- mmo_single$feature_info %>%
  mutate(id = as.character(id)) %>%
  left_join(
    mmo_single$feature_data[, c(1, 3:4)] %>%
      mutate(id = as.character(id)) %>%
      group_by(id) %>%
      summarise(max = max(I1_C1_1, I1_C2_1), .groups = "drop"),
    by = "id"
  )

abundant_standard_feature <- feature_info_abundance$id[feature_info_abundance$max > 1e6]

mmo_single <- filter_mmo(mmo,feature_list = abundant_standard_feature,drop_empty_feat = TRUE)

mmo_single
table(mmo_single$feature_info$feature_group)

mmo_single$feature_info %>% filter(feature_group== "253")

mmo_single$feature_info %>% filter(rt > 6.0 & rt < 6.2)
mmo_single$feature_info %>% filter(mz > 139.02 & mz < 139.04)

mmo_single$feature_data %>% filter(id %in% as.character(mmo_single$feature_info %>% filter(feature_group== "255") %>% pull(id))) %>% select(id,feature,I1_C1_1,I1_C2_1) %>% arrange(-I1_C2_1)



mmo_single_dendrogram_correlation <- FeatureDendrogram(mmo=mmo_single,distance = "dreams",
  method = "average",
  ion_identity = "correlation",
  corr_col = "feature_group",
  iin_col  = "ion_identities:iin_id",
  within_group_dist = 0,
  save_output = FALSE,
  outprefix = "feature_tree"
)
plot(mmo_single_dendrogram_correlation$phylo, type = "fan", cex = 0.4, label.offset = 0.002,main="Dreams_Distance")

all_feature_groups <- unique(mmo_individual $feature_info$ feature_group[!is.na(mmo_individual $feature_info$ feature_group)])

sampled_feature_groups <- sample(all_feature_groups,50)

sampled_features <- mmo_individual $feature_info$ id[mmo_individual $feature_info$ feature_group %in% sampled_feature_groups]

mmo_individual_sampled_features <- filter_mmo(mmo_individual,feature_list = sampled_features,drop_empty_feat = TRUE)

mmo_single_dendrogram <- FeatureDendrogram(mmo=mmo_individual_sampled_features,distance = “dreams”, method = “average”, ion_identity = “none”, corr_col = “feature_group”, iin_col = “ion_identities:iin_id”, within_group_dist = 0, save_output = FALSE, outprefix = “feature_tree” )

plot(mmo_single_dendrogram$phylo, type = “fan”, cex = 0.4, label.offset = 0.002,main=“Dreams_Distance”)

table(mmo_individual_sampled_features$feature_info$feature_group)
table(mmo_individual_sampled_features$feature_info$`ion_identities:iin_id`)

#highlight_groups <- c("249", "41", "16", "195", "604")

highlight_groups <- c("12", "19", "24", "50", "74")

highlight_groups <- as.character(highlight_groups)

phy <- mmo_single_dendrogram$phylo
fi  <- mmo_individual_sampled_features$feature_info

fi$id <- as.character(fi$id)
tips  <- as.character(phy$tip.label)

idx <- match(tips, fi$id)
if (anyNA(idx)) {
  stop("Some phylo tip labels not found in feature_info$id")
}

#fg <- as.character(fi$feature_group[idx])
fg <- as.character(fi$`ion_identities:iin_id`[idx])
fg[is.na(fg) | fg == "" | fg == "NA"] <- "other"

# base color for non-highlighted tips
tip_cols <- rep("grey80", length(fg))

# colors ONLY for highlighted groups
hl_cols <- setNames(
  grDevices::rainbow(length(highlight_groups)),
  highlight_groups
)

# apply highlight colors
sel <- fg %in% highlight_groups
tip_cols[sel] <- hl_cols[fg[sel]]


plot(
  phy,
  type = "fan",
  tip.color = tip_cols,
  cex = 0.4,
  label.offset = 0.002,
  main = "Dreams distance (selected feature groups highlighted)"
)

mmo_single_dendrogram_correlation <- FeatureDendrogram(mmo=mmo_individual_sampled_features,distance = "dreams",
  method = "average",
  ion_identity = "correlation",
  corr_col = "feature_group",
  iin_col  = "ion_identities:iin_id",
  within_group_dist = 0,
  save_output = FALSE,
  outprefix = "feature_tree"
)

plot(mmo_single_dendrogram_correlation$phylo, type = "fan", cex = 0.4, label.offset = 0.002,main="Dreams_Distance + Force by Correlation Group ID")


phy <- mmo_single_dendrogram_correlation$phylo
fi  <- mmo_individual_sampled_features$feature_info

fi$id <- as.character(fi$id)
tips  <- as.character(phy$tip.label)

idx <- match(tips, fi$id)
if (anyNA(idx)) {
  stop("Some phylo tip labels not found in feature_info$id")
}

fg <- as.character(fi$feature_group[idx])
fg[is.na(fg) | fg == "" | fg == "NA"] <- "other"

# base color for non-highlighted tips
tip_cols <- rep("grey80", length(fg))

# colors ONLY for highlighted groups
hl_cols <- setNames(
  grDevices::rainbow(length(highlight_groups)),
  highlight_groups
)

# apply highlight colors
sel <- fg %in% highlight_groups
tip_cols[sel] <- hl_cols[fg[sel]]


plot(
  phy,
  type = "fan",
  tip.color = tip_cols,
  cex = 0.4,
  label.offset = 0.002,
  main = "Dreams_Distance + Force by Correlation Group ID"
)

mmo_single_dendrogram_ion_identity <- FeatureDendrogram(mmo=mmo_individual_sampled_features,distance = "dreams",
  method = "average",
  ion_identity = "ion_identity_network",
  corr_col = "feature_group",
  iin_col  = "ion_identities:iin_id",
  within_group_dist = 0,
  save_output = FALSE,
  outprefix = "feature_tree"
)

plot(mmo_single_dendrogram_ion_identity$phylo, type = "fan", cex = 0.4, label.offset = 0.002,main="Dreams_Distance + Force by Ion Identity Group ID")

phy <- mmo_single_dendrogram_ion_identity$phylo
fi  <- mmo_individual_sampled_features$feature_info

fi$id <- as.character(fi$id)
tips  <- as.character(phy$tip.label)

idx <- match(tips, fi$id)
if (anyNA(idx)) {
  stop("Some phylo tip labels not found in feature_info$id")
}

#fg <- as.character(fi$feature_group[idx])
fg <- as.character(fi$`ion_identities:iin_id`[idx])
fg[is.na(fg) | fg == "" | fg == "NA"] <- "other"

# base color for non-highlighted tips
tip_cols <- rep("grey80", length(fg))

# colors ONLY for highlighted groups
hl_cols <- setNames(
  grDevices::rainbow(length(highlight_groups)),
  highlight_groups
)

# apply highlight colors
sel <- fg %in% highlight_groups
tip_cols[sel] <- hl_cols[fg[sel]]


plot(
  phy,
  type = "fan",
  tip.color = tip_cols,
  cex = 0.4,
  label.offset = 0.002,
  main = "Dreams_Distance + Force by Ion Identity Group ID"
)

#samples_keep <- mmo$metadata$sample[mmo$metadata$Type != "pool"]

#samples_keep <- mmo$metadata$sample[mmo$metadata$Keep_Demo == "Yes"]
samples_keep <- mmo$metadata$sample[mmo$metadata$Keep != "No"]
mmo_individual <- filter_mmo(mmo,sample_list  = samples_keep,drop_empty_feat = TRUE)

mmo_individual$metadata


mmo_individual <- FeaturePresence(mmo_individual,threshold = 1)

library(ggvenn)

fp <- mmo_individual$feature_presence
md <- mmo_individual$metadata

# 1) Identify which columns in feature_presence are sample columns
#    (everything except the feature ID column)
sample_cols <- setdiff(colnames(fp), "feature")

# 2) Keep metadata rows that correspond to samples in the presence table
group_df <- mmo_individual$metadata[, c("sample", "group")]
groups <- unique(group_df$group)


# 3) For each group, collect features present in ANY sample in that group

VennInput <- list()

for (g in groups) {

  # samples that belong to this group
  group_samples <- group_df$sample[group_df$group == g]

  # subset the presence matrix to just those samples
  sub <- mmo_individual$feature_presence[, c("feature", group_samples), drop = FALSE]

  # feature is present in the group if it is >0 in at least one sample
  present_in_group <- apply(sub[, -1, drop = FALSE], 1, function(x) any(x > 0))

  # store the feature IDs
  VennInput[[g]] <- sub$feature[present_in_group]
}

# Quick check: number of features per group
sapply(VennInput, length)

# 4) Plot Venn diagram
p <- ggvenn(
  VennInput,
  stroke_size = 0.5,
  set_name_size = 4,
  show_percentage = FALSE
) +
  theme(legend.position = "none")

p

# 5) Save
if (!dir.exists("plot")) dir.create("plot", recursive = TRUE)
ggsave("plot/Venn_FeaturePresence.pdf", plot = p, height = 5, width = 5)



require(ggvenn)
library(ggvenn)
VennInput <- list(
  
)

ggvenn(VennInput, stroke_size = 0.5, set_name_size = 4, show_percentage = FALSE) +
  theme(legend.position = "none") 
ggsave("plot/Venn_Upreg.pdf", height = 5, width = 5)

2) Visualize distance between samples with PLSDA

library(colorspace)

make_pal <- function(n) qualitative_hcl(n, palette = "Dark 3")

n_groups <- length(unique(mmo_individual$metadata$Species_binomial))  # change column
colors <- make_pal(n_groups)

#PLSDAplot(mmo_individual, color = colors, outdir = "~/Downloads/PLSDA.pdf")

beta_diversity_dreams_bray <- GetBetaDiversity(mmo_individual, method = 'bray', 
 normalization = 'Log', distance = 'dreams', filter_feature = FALSE)

beta_diversity_dreams_bray

# Convert to 'dist' object
d <- as.dist(beta_diversity_dreams_bray)

# Run hierarchical clustering
hc <- hclust(d, method = "ward.D2")   # or "ward.D2", "complete"
plot(hc)


library(ape)
library(colorspace)


# convert to phylo
tr <- as.phylo(hc)

# ---- map species to tree tip labels (EDIT these column names) ----
md <- mmo_individual$metadata
# md must contain: sample ID that matches tr$tip.label, and species

tip_ids <- tr$tip.label

species_vec <- md$Species_binomial[match(tip_ids, md$sample)]
if (anyNA(species_vec)) {
  bad <- tip_ids[is.na(species_vec)]
  stop("Species missing for some tree tips. Example missing IDs:\n",
       paste(head(bad, 20), collapse = "\n"))
}
species_vec <- as.character(species_vec)

# palette
sp_levels <- sort(unique(species_vec))
sp_cols <- setNames(qualitative_hcl(length(sp_levels), "Dark 3"), sp_levels)

tip_cols <- unname(sp_cols[species_vec])

# ---- plot ----
par(mar = c(2, 2, 2, 10))  # extra right margin for long labels
plot(tr, type = "phylogram", cex = 0.6, tip.color = tip_cols,
     main = "Hierarchical clustering (DREAMS–Bray)")

legend("topleft", legend = sp_levels, col = sp_cols, pch = 15,
       cex = 0.6, bty = "n")

mmo_individual_dendrogram_correlation <- FeatureDendrogram(mmo=mmo_individual,distance = "dreams",
  method = "average",
  ion_identity = "correlation",
  corr_col = "feature_group",
  iin_col  = "ion_identities:iin_id",
  within_group_dist = 0,
  save_output = FALSE,
  outprefix = "feature_tree"
)

beta_diversity_dreams_Gen_Uni <- GetBetaDiversity(mmo_individual, method = 'Gen.Uni', 
 normalization = 'Log', distance = 'dreams', filter_feature = FALSE)

dim(beta_diversity_dreams_Gen_Uni)

# Convert to 'dist' object
d_Gen_Uni <- as.dist(beta_diversity_dreams_Gen_Uni)

# Run hierarchical clustering
hc_Gen_Uni <- hclust(d_Gen_Uni, method = "average")   # or "ward.D2", "complete"
plot(hc_Gen_Uni, main = "Hierarchical clustering Gen_Uni")

3) Calculate Beta Diversity between samples

#First we calculate dreams
beta_diversity_dreams <- GetBetaDiversity(mmo_individual, method = 'CSCS', 
 normalization = 'None', distance = 'dreams', filter_feature = FALSE)