Tutorial 3: Compound Dendrograms and Molecular Networks

Overview

This tutorial covers how to use eCOMET to build and visualize structural relationships among detected metabolite features. The two main outputs are a compound dendrogram (a hierarchical tree grouping features by chemical similarity, exportable to iTOL) and a molecular network (exportable to Cytoscape).

Both require the same four inputs:

MZmine feature table — the full feature abundance matrix exported from MZmine
Sample metadata — a CSV mapping sample filenames to group labels
DreaMS similarity scores — pairwise spectral similarity output from MZmine molecular networking, used to compute chemical distances among features
SIRIUS/CANOPUS predictions — compound class annotations used to color and annotate the dendrogram and network nodes

Required packages

# Install or update eCOMET if needed:
# pak::pak("phytoecia/eCOMET")

library(ecomet)
library(dplyr)
library(ggplot2)
library(ape)
library(colorspace)
library(stringr)

1. Load data and build the mmo object

Point eCOMET to the four input files and build the mmo object in one block. Everything downstream reads from this object.

data_dir <- system.file("extdata/tutorials/interspecific", package = "ecomet")
stopifnot(nzchar(data_dir))  # fail loudly if package data is missing

demo_feature          <- file.path(data_dir, "Ecomet_Interspecific_Demo_full_feature_table.csv")
demo_metadata         <- file.path(data_dir, "Ecomet_Interspecific_Demo_metadata_no_blanks.csv")
demo_sirius_formula   <- file.path(data_dir, "canopus_formula_summary.tsv")
demo_sirius_structure <- file.path(data_dir, "structure_identifications.tsv")
demo_dreams           <- file.path(data_dir, "Ecomet_Interspecific_Demo_dreams_sim_dreams.csv")

# 1a. Feature abundance matrix + metadata
mmo <- GetMZmineFeature(
  mzmine_dir   = demo_feature,
  metadata_dir = demo_metadata,
  group_col    = "Species_binomial",
  sample_col   = "filename"
)

# 1b. SIRIUS/CANOPUS compound class predictions
mmo <- AddSiriusAnnot(
  mmo,
  canopus_structuredir = demo_sirius_structure,
  canopus_formuladir   = demo_sirius_formula
)

# 1c. DreaMS pairwise chemical distances
mmo <- AddChemDist(mmo, dreams_dir = demo_dreams)

After this block:

mmo$feature_data — feature × sample abundance matrix
mmo$metadata — sample group assignments
mmo$sirius_annot — CANOPUS class predictions per feature
mmo$dreams.dissim — pairwise chemical dissimilarity matrix among features

2. Filtering the mmo object

Before building a dendrogram or molecular network it is often useful to restrict the dataset to a subset of samples, groups, or features. filter_mmo() handles all three cases and keeps every slot in the mmo object — abundance matrix, metadata, distance matrices, and annotations — consistently aligned.

Three filtering axes are available. Feature filtering can be combined with group filtering, whereas group and sample filtering are mutually exclusive:

By group — retain only samples belonging to certain biological groups (e.g. a single species, a treatment arm)
By sample — retain a hand-picked list of individual sample IDs
By feature — retain a specific set of feature IDs, for example all compounds derived from a group in an annotation table

By default, features with no detected abundance in the retained samples are dropped automatically (drop_empty_feat = TRUE). You can raise or lower the zero-detection threshold with empty_threshold.

2.1 Filter by group

Use group_list to keep only samples that belong to certain groups. This is useful when you want to build a dendrogram for a single species or compare a subset of species side by side.

# See what groups are available
unique(mmo$metadata$group)

# Keep two species
mmo_two_sp <- filter_mmo(
  mmo,
  group_list = c("Annona RSS-85", "Beilschmiedia tovarensis")
)

2.2 Filter by sample

Use sample_list when you need finer control than groups allow — for example, removing a single outlier replicate or selecting samples by a condition not captured in the group column.

# Keep a hand-picked set of samples
samples_to_keep <- c("annRSS85_MDP0005", "annRSS85_MDP0091", "annRSS85_MDP0307")

mmo_subset <- filter_mmo(
  mmo,
  sample_list = samples_to_keep
)

sample_list and group_list are mutually exclusive — supply one or the other, not both.

2.3 Filter by feature: flavonoids example

Use id_list to restrict the dataset to a set of feature IDs. The most common source for this list is the SIRIUS annotation table — you pull the IDs of features that match a compound class of interest, then filter to those features.

Here we extract only features predicted to belong to the Flavonoid class:

# Pull feature IDs annotated as Flavonoids by CANOPUS
flavonoid_ids <- mmo$sirius_annot %>%
  filter(str_detect(`ClassyFire#most specific class`, "Flavonoid")) %>%
  pull(id)


length(flavonoid_ids)  # how many features matched

mmo_flavonoids <- filter_mmo(
  mmo,
  id_list = flavonoid_ids
)

The resulting object contains only flavonoid features, but all sample columns and group structure are preserved. The distance matrix is also trimmed to the retained features, so downstream dendrogram and network steps work without any additional changes.
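The class match above is a case-sensitive substring match, so it is worth checking which labels it actually captures before filtering. The following is a minimal base-R sketch on invented labels; the data frame and its column names are hypothetical and not taken from the demo dataset.

```r
# Toy annotation table; ids and class labels are invented for illustration.
annot <- data.frame(
  id  = c("f1", "f2", "f3", "f4"),
  cls = c("Flavonoids", "Flavonoid glycosides", "Isoflavonoids", "Terpenoids"),
  stringsAsFactors = FALSE
)

# Case-sensitive substring match, similar in spirit to str_detect(cls, "Flavonoid")
hits_exact <- annot$id[grepl("Flavonoid", annot$cls, fixed = TRUE)]

# Case-insensitive variant, which also picks up "Isoflavonoids"
hits_loose <- annot$id[grepl("flavonoid", annot$cls, ignore.case = TRUE)]

print(hits_exact)
print(hits_loose)
```

Which variant is appropriate depends on whether related classes should be included in the filtered object.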

You can combine feature and group filtering in a single call:

# Flavonoids in two species only
mmo_flv_subset <- filter_mmo(
  mmo,
  id_list    = flavonoid_ids,
  group_list = c("Annona RSS-85", "Beilschmiedia tovarensis")
)

3. Compound dendrogram from DreaMS distances

A compound dendrogram groups features by their pairwise chemical similarity. Features that are structurally related end up on nearby branches; features from unrelated compound classes end up far apart. This is a useful first visualization for understanding the chemical space covered by your dataset and for identifying which compound classes are well-represented.

3.1 Build the dendrogram

FeatureDendrogram() takes the mmo object and the name of a stored distance matrix. Here we use the DreaMS dissimilarity matrix added in section 1. No filtering is applied, so the tree includes all features that have an entry in the distance matrix.

tree_dreams <- FeatureDendrogram(
  mmo,
  distance = "dreams",
  method   = "average"   # UPGMA — sensible default for spectral similarity trees
)

# The return value is a list; the core objects are:
# tree_dreams$hclust     — the hclust object for further manipulation
# tree_dreams$phylo      — ape phylo object (for iTOL export or ape functions)
# tree_dreams$dist_used  — the distance matrix actually used
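
To make the clustering step concrete, here is a minimal base-R sketch of average-linkage (UPGMA) clustering on a toy dissimilarity matrix. It only illustrates the general technique with stats::hclust(); it is not a description of how FeatureDendrogram() is implemented, and the feature names and distances are invented.

```r
# Toy symmetric dissimilarity matrix for four hypothetical features
ids <- paste0("f", 1:4)
d <- matrix(
  c(0.0, 0.1, 0.8, 0.9,
    0.1, 0.0, 0.7, 0.8,
    0.8, 0.7, 0.0, 0.2,
    0.9, 0.8, 0.2, 0.0),
  nrow = 4, byrow = TRUE, dimnames = list(ids, ids)
)

# Average linkage (UPGMA) on the dissimilarities
hc <- hclust(as.dist(d), method = "average")

# Cutting the tree into two clusters separates f1/f2 from f3/f4
clusters <- cutree(hc, k = 2)
print(clusters)
```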

3.2 Plot colored by NPC compound class

PlotFeatureDendrogram() takes the tree and colors each tip by a column in the SIRIUS annotation table. The default column is "NPC#pathway", which gives a broad chemical class label (e.g. Terpenoids, Alkaloids, Shikimates and Phenylpropanoids). Features without a CANOPUS prediction are shown in grey.

Tip labels are hidden by default (show_tip_labels = FALSE) because feature IDs are not meaningful at a glance and a large tree becomes unreadable with them. The color legend is the primary way to read the tree.

# Note: this plot is not circular; for a circular layout, export the tree to iTOL (section 5).
PlotFeatureDendrogram(
  tree    = tree_dreams,
  mmo     = mmo,
  color_by = "NPC#pathway"
)

To zoom in on a specific class, pass that class's feature IDs to the features argument and rebuild the tree:

# Terpenoid-only dendrogram
terpenoid_ids <- mmo$sirius_annot |>
  filter(grepl("Terpenoid", `NPC#pathway`)) |>
  pull(id)

tree_terp <- FeatureDendrogram(
  mmo,
  distance = "dreams",
  features = terpenoid_ids
)

PlotFeatureDendrogram(
  tree     = tree_terp,
  mmo      = mmo,
  color_by = "NPC#pathway",
  main     = "Terpenoid features — DreaMS dendrogram"
)

To save the plot to a PDF:

PlotFeatureDendrogram(
  tree        = tree_dreams,
  mmo         = mmo,
  color_by    = "NPC#pathway",
  save_output = TRUE,
  outprefix   = "output/compound_dendro_dreams"
)

And to export the tree topology for use in iTOL or FigTree:

tree_dreams <- FeatureDendrogram(
  mmo,
  distance    = "dreams",
  save_newick = TRUE,
  outprefix   = "output/compound_dendro_dreams"
)
# Writes output/compound_dendro_dreams.nwk

4. Ion identity networking and the IIN-constrained dendrogram

4.1 What is ion identity networking?

Untargeted metabolomics detects ionized forms of compounds, not compounds themselves. A single metabolite can appear in the feature table as multiple entries (for example as [M+H]+, [M+Na]+, and [M+2H]2+), each at a different m/z but with nearly identical retention time and correlated abundance across samples. These are different ion forms of the same compound, not different compounds.

MZmine’s Ion Identity Networking (IIN) module identifies these groups by checking m/z differences that match known adduct mass shifts and confirming that the features co-elute and co-vary. Each confirmed group of adducts is assigned a shared ion_identities:iin_id.
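
The mass-shift check can be illustrated with a small base-R sketch. This is not MZmine's algorithm: the helper name, the tolerances, and the toy m/z values are assumptions made for illustration, and the abundance-correlation test is left out. Only the two adduct shifts (proton and sodium adducts) are standard monoisotopic values.

```r
# Monoisotopic mass shifts relative to the neutral molecule M (Da)
adduct_shift <- c("[M+H]+" = 1.007276, "[M+Na]+" = 22.989218)

# Hypothetical helper: TRUE if two features are consistent with being the
# [M+H]+ and [M+Na]+ ions of one compound (m/z shift and co-elution only).
is_h_na_pair <- function(mz1, mz2, rt1, rt2, mz_tol = 0.005, rt_tol = 0.05) {
  expected <- adduct_shift[["[M+Na]+"]] - adduct_shift[["[M+H]+"]]
  abs(abs(mz1 - mz2) - expected) <= mz_tol && abs(rt1 - rt2) <= rt_tol
}

# Toy features: matching shift and retention time
print(is_h_na_pair(303.0499, 325.0318, rt1 = 5.21, rt2 = 5.22))

# Toy features: m/z difference does not match the expected shift
print(is_h_na_pair(303.0499, 320.0000, rt1 = 5.21, rt2 = 5.22))
```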

In our interspecific dataset, IIN has identified 574 features belonging to 283 adduct groups out of 4,982 total features.

# Inspect the IIN columns in feature_info
mmo$feature_info |>
  select(id,
         `ion_identities:iin_id`,
         `ion_identities:ion_identities`) |>
  filter(!is.na(`ion_identities:iin_id`)) |> 
  
  ## order by  ion_identities:iin_id decreasing
  head(10)

4.2 Why IIN matters for the dendrogram

In the plain DreaMS dendrogram from section 3, adducts of the same compound appear as separate tips. Because DreaMS similarity is computed from MS2 spectra and adducts of the same compound produce nearly identical fragmentation, they will usually cluster together anyway — but not always. Noise, chimeric spectra, or missing MS2 can scatter adducts across branches.

The IIN-constrained dendrogram addresses this directly: pairs of features within the same IIN group are assigned a small fixed distance (within_group_dist = 0.01) before clustering. As long as 0.01 is smaller than the between-compound distances, adducts merge first and collapse onto the same branch, and all higher-level topology is then driven by genuine chemical differences between compounds rather than by adduct artefacts.

4.3 Build the IIN-constrained dendrogram

tree_iin <- FeatureDendrogram(
  mmo,
  distance     = "dreams",
  method       = "average",
  ion_identity = "ion_identity_network",
  iin_col      = "ion_identities:iin_id",  # default — shown here for clarity
  within_group_dist = 0.01
)
# Prints: "Ion identity constraint applied: 283 groups, 574 features affected."

The returned object is identical in structure to the plain tree. The difference is in tree_iin$dist_used, where within-group pairs have been set to 0.01 before hclust ran. tree_iin$tip_map now contains the group assignment for every feature, which PlotFeatureDendrogram() uses to draw the group highlights.
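
The effect of the constraint can be reproduced on a toy matrix in base R. The sketch below assumes the constraint works by overwriting within-group entries of the distance matrix, as described above; the matrix, the group labels, and the variable names are invented, and the code is not taken from eCOMET.

```r
# Toy dissimilarities; f1 and f3 are hypothetical adducts of one compound
# whose spectra happen to look dissimilar (0.8).
ids <- paste0("f", 1:4)
d <- matrix(
  c(0.0, 0.3, 0.8, 0.9,
    0.3, 0.0, 0.7, 0.8,
    0.8, 0.7, 0.0, 0.2,
    0.9, 0.8, 0.2, 0.0),
  nrow = 4, byrow = TRUE, dimnames = list(ids, ids)
)
group <- c(f1 = "g1", f2 = NA, f3 = "g1", f4 = NA)

# Overwrite within-group pairs with a small fixed distance
same_group <- outer(group, group,
                    function(a, b) !is.na(a) & !is.na(b) & a == b)
diag(same_group) <- FALSE
d_con <- d
d_con[same_group] <- 0.01

hc_plain <- hclust(as.dist(d), method = "average")
hc_con   <- hclust(as.dist(d_con), method = "average")

# hclust reports merged singletons as negative indices in $merge
first_plain <- sort(ids[-hc_plain$merge[1, ]])
first_con   <- sort(ids[-hc_con$merge[1, ]])
print(first_plain)  # first pair merged without the constraint
print(first_con)    # first pair merged with the constraint
```

Without the constraint the first merge joins the two most similar spectra; with it, the two adducts are joined first.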

4.4 Plot with IIN groups highlighted

Passing highlight_groups = TRUE draws a semi-transparent rectangle behind each IIN group in the tree. This lets you see at a glance which tips are adduct siblings and how those groups are distributed across the chemical-class landscape.

PlotFeatureDendrogram(
  tree             = tree_iin,
  mmo              = mmo,
  color_by         = "NPC#pathway",
  highlight_groups = TRUE
)

The rectangles mark the span of each adduct group on the tip axis. When the constraint has worked as intended, each rectangle should contain tips on consecutive branches. Groups that span a wide gap in the tree indicate features where the DreaMS spectra were dissimilar despite the adduct relationship — worth inspecting individually.
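
One way to find such groups without reading the plot is to test whether each group occupies consecutive positions in the tip order. The sketch below uses an invented tip order and group assignment; in practice the tip order would come from the tree (for example the labels of an hclust object in plotting order), and the helper name is hypothetical.

```r
# Hypothetical tip order and group membership (NA = ungrouped feature)
tip_order <- c("f2", "f1", "f3", "f5", "f4")
group <- c(f1 = "g1", f2 = "g1", f3 = "g2", f4 = "g2", f5 = NA)

# TRUE if the members sit next to each other in the tip order
is_contiguous <- function(members, tip_order) {
  pos <- match(members, tip_order)
  diff(range(pos)) + 1 == length(pos)
}

grouped <- group[!is.na(group)]
contiguous <- vapply(split(names(grouped), grouped), is_contiguous,
                     logical(1), tip_order = tip_order)
print(contiguous)
```

Groups flagged FALSE are the ones worth inspecting individually.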

4.5 Correlation groups as an alternative

MZmine also assigns a feature_group to features that co-elute and show correlated abundance across samples, regardless of whether their m/z differences match known adducts. This is a broader grouping than IIN — in this dataset 2,283 features fall into 740 correlation groups. Correlation groups capture the same compound across different charge states or neutral losses that IIN might miss, but they can also group unrelated features that happen to co-vary.

tree_corr <- FeatureDendrogram(
  mmo,
  distance     = "dreams",
  method       = "average",
  ion_identity = "correlation",
  corr_col     = "feature_group"  # default — shown here for clarity
)

PlotFeatureDendrogram(
  tree             = tree_corr,
  mmo              = mmo,
  color_by         = "NPC#pathway",
  highlight_groups = TRUE,
  main             = "DreaMS dendrogram — correlation group constraints"
)

When to use which:

Use ion_identity_network when you want conservative, chemically validated grouping. Only confirmed adduct relationships are collapsed.
Use correlation when your data lacks IIN annotation or when you want a broader grouping that includes all co-eluting features regardless of adduct confirmation.
For the dendrogram and iTOL output, IIN is generally preferable because the groups have a direct chemical interpretation (same compound, different ion forms).

5. Export to iTOL

iTOL (Interactive Tree of Life, itol.embl.de) is a web-based tool for interactive tree visualisation. It lets you annotate, rotate, and style trees in the browser and export publication-quality figures. This is the recommended route for producing the circular compound dendrogram with NPC class coloring and prevalence bars shown in qemistree-style figures.

ExportITOL() generates three files from your tree and mmo object:

.nwk — the Newick tree file, uploaded to iTOL first
_colorstrip.txt — a coloured strip for each tip, coloured by NPC pathway class
_barplot.txt — a bar chart for each tip showing the proportion of samples in which that feature was detected
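
The bar chart value is a per-feature prevalence. A minimal base-R sketch of that calculation follows, assuming that "detected" means an abundance greater than zero; the matrix is invented and the exact rule used by ExportITOL() may differ.

```r
# Toy feature x sample abundance matrix
abund <- matrix(
  c(10, 0, 5, 0,
     0, 0, 0, 3,
     7, 8, 9, 6),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("f1", "f2", "f3"), c("s1", "s2", "s3", "s4"))
)

# Proportion of samples in which each feature has non-zero abundance
prevalence <- rowMeans(abund > 0)
print(prevalence)
```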

5.1 Basic export

ExportITOL(
  tree      = tree_dreams,
  mmo       = mmo,
  outprefix = "output/itol_dreams"
)
# Writes:
#   output/itol_dreams.nwk
#   output/itol_dreams_colorstrip.txt
#   output/itol_dreams_barplot.txt

For the IIN-constrained tree, the export is identical — just swap the tree object. The Newick topology will reflect the within-group constraint:

ExportITOL(
  tree      = tree_iin,
  mmo       = mmo,
  outprefix = "output/itol_dreams_iin"
)

5.2 Loading in iTOL

Go to itol.embl.de/upload.cgi and upload the .nwk file
Once the tree is displayed, drag and drop _colorstrip.txt onto the tree — a colour strip appears next to the tips
Drag and drop _barplot.txt — a bar chart appears outside the colour strip
In the Controls panel on the right, set Tree structure → Display mode to Circular
Under each dataset in the Datasets panel, click Display to toggle visibility

The legend for the NPC pathway colours is written into the annotation file header and will appear automatically in iTOL.

5.3 Choosing the annotation column

The default color_by = "NPC#pathway" gives broad chemical classes (Terpenoids, Alkaloids, Polyketides, etc.). You can use any column in mmo$sirius_annot for finer or coarser resolution:

# NPC superclass — one level finer than pathway
ExportITOL(tree_dreams, mmo,
           outprefix = "output/itol_npc_superclass",
           color_by  = "NPC#superclass")

# ClassyFire class
ExportITOL(tree_dreams, mmo,
           outprefix = "output/itol_classyfire",
           color_by  = "ClassyFire#class")

Available NPC/ClassyFire columns in mmo$sirius_annot:

Column	Resolution
`NPC#pathway`	Broadest (Terpenoids, Alkaloids, …)
`NPC#superclass`	Intermediate
`NPC#class`	Fine
`ClassyFire#superclass`	Broad chemical hierarchy
`ClassyFire#class`	Intermediate
`ClassyFire#most specific class`	Finest

6. Export a molecular network for Cytoscape

A molecular network represents features as nodes and pairwise chemical similarity as edges. Features that are structurally related are connected; isolated features have no close neighbours. Visualising this in Cytoscape lets you explore chemical space interactively, color nodes by compound class or abundance, and identify clusters of related metabolites.

ExportCytoscape() generates two files from the mmo object:

_edges.csv — one row per retained edge, with source, target, similarity, and distance_method columns
_nodes.csv — one row per feature, combining abundance statistics and all available annotations from mmo$feature_info and mmo$sirius_annot

6.1 Choosing an edge filter

The raw DreaMS matrix holds a pairwise value for every feature pair. Most of these pairs are chemically unrelated and should not appear as edges. Two parameters control which edges are retained:

sim_threshold sets a minimum similarity floor. Any pair below this value is excluded regardless of how many other neighbours each node has. This is the primary lever — raise it for a sparser, higher-confidence network; lower it to include more distant structural neighbours.
top_k limits each node to its k most similar neighbours after threshold filtering. This prevents highly-connected hub features from overwhelming the layout and makes the network easier to read. An edge is kept if it falls in the top-k for either endpoint.

A warning is printed if the retained edge count exceeds 50,000, as Cytoscape becomes slow at that scale.
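
The interaction between the two parameters, in particular the "either endpoint" rule, can be sketched in base R. The function below is a hypothetical illustration written for this tutorial, not the ExportCytoscape() implementation; the similarity matrix is invented, and only the parameter names sim_threshold and top_k are borrowed from the text above.

```r
ids <- paste0("f", 1:4)
sim <- matrix(
  c(1.00, 0.90, 0.80, 0.75,
    0.90, 1.00, 0.85, 0.20,
    0.80, 0.85, 1.00, 0.10,
    0.75, 0.20, 0.10, 1.00),
  nrow = 4, byrow = TRUE, dimnames = list(ids, ids)
)

# Hypothetical helper: threshold first, then optional top-k per node
filter_edges <- function(sim, sim_threshold = 0.7, top_k = NULL) {
  idx <- which(upper.tri(sim) & sim >= sim_threshold, arr.ind = TRUE)
  edges <- data.frame(
    source     = rownames(sim)[unname(idx[, "row"])],
    target     = colnames(sim)[unname(idx[, "col"])],
    similarity = sim[idx],
    stringsAsFactors = FALSE
  )
  if (is.null(top_k) || nrow(edges) == 0) return(edges)

  # Rank of each edge among all retained edges touching the given endpoint
  rank_at <- function(node) {
    vapply(seq_len(nrow(edges)), function(i) {
      touching <- edges$source == node[i] | edges$target == node[i]
      sum(edges$similarity[touching] > edges$similarity[i]) + 1L
    }, integer(1))
  }

  # Keep an edge if it is in the top-k for either endpoint
  keep <- rank_at(edges$source) <= top_k | rank_at(edges$target) <= top_k
  edges[keep, , drop = FALSE]
}

e_all <- filter_edges(sim, sim_threshold = 0.7)
e_k1  <- filter_edges(sim, sim_threshold = 0.7, top_k = 1)
print(e_all)
print(e_k1)
```

In this toy case the f1 to f3 edge passes the threshold but is dropped with top_k = 1, because it is not the strongest edge for either f1 or f3, while f1 to f4 survives because it is the strongest edge for f4.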

6.2 Basic export

# Default: similarity >= 0.7, no k limit
ExportCytoscape(
  mmo,
  distance  = "dreams",
  outprefix = "output/network_dreams"
)
# Writes:
#   output/network_dreams_edges.csv
#   output/network_dreams_nodes.csv

6.3 Sparser networks with top-k filtering

For large datasets or exploratory work, combining a moderate threshold with a small top_k produces a cleaner network where each feature is connected only to its closest structural neighbours:

# Each node connects to at most its 5 most similar neighbours, sim >= 0.6
ExportCytoscape(
  mmo,
  distance      = "dreams",
  outprefix     = "output/network_k5",
  sim_threshold = 0.6,
  top_k         = 5
)

If you have already filtered the mmo object to a compound class of interest (e.g. flavonoids from section 2.3), pass the filtered object to get a focused subnetwork:

ExportCytoscape(
  mmo_flavonoids,
  distance      = "dreams",
  outprefix     = "output/network_flavonoids",
  sim_threshold = 0.5   # lower threshold — fewer features, so more edges are tolerable
)

6.4 What is in the node table

The node table is built automatically from whatever is present in the mmo object; no annotation columns are assumed. It always includes:

id — feature identifier
prevalence — proportion of samples in which the feature was detected
mean_<group> — one column per biological group with mean abundance

If mmo$feature_info is present, all its columns are appended (m/z, RT, IIN group, adduct type, MZmine network cluster ID, etc.). If mmo$sirius_annot is present, all annotation columns are appended (NPC pathway/superclass/class, ClassyFire hierarchy, structure names). Column names are sanitised to be Cytoscape-safe (special characters replaced with _).
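
As a rough picture of what such a node table looks like, the base-R sketch below builds per-group mean columns and then sanitises the column names. The data, the group labels, and the exact replacement pattern are assumptions for illustration; the real function may use a different rule for which characters it replaces.

```r
# Toy feature x sample abundances and sample-to-group mapping
abund <- matrix(
  c(10, 0, 4, 6,
     0, 2, 0, 0),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("f1", "f2"), c("s1", "s2", "s3", "s4"))
)
sample_group <- c(s1 = "Sp A", s2 = "Sp A", s3 = "Sp B", s4 = "Sp B")

# Mean abundance per group: one column per group
group_means <- sapply(unique(sample_group), function(g) {
  rowMeans(abund[, sample_group == g, drop = FALSE])
})
colnames(group_means) <- paste0("mean_", colnames(group_means))

nodes <- data.frame(
  id = rownames(abund),
  group_means,
  `NPC#pathway` = c("Terpenoids", "Alkaloids"),
  check.names = FALSE, stringsAsFactors = FALSE
)

# Assumed rule: replace anything that is not a letter, digit, or underscore
names(nodes) <- gsub("[^A-Za-z0-9_]", "_", names(nodes))
print(names(nodes))
```

Under this rule the column NPC#pathway becomes NPC_pathway, which is the name to select when mapping colors in Cytoscape.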

6.5 Loading in Cytoscape

Import edges: File → Import → Network from File → select _edges.csv. In the import dialog, set source as Source Node and target as Target Node. similarity is imported as an edge attribute automatically.
Import node table: File → Import → Table from File → select _nodes.csv. Set id as the Key column. All other columns become node attributes.
Color by compound class: In the Style panel, click the box next to Fill Color, set the column to NPC_pathway (or any annotation column), and choose a discrete color mapping.
Size by prevalence: Map Size to prevalence using a continuous mapping to make widely-detected features visually prominent.
Layout: Apply a force-directed layout (Layout → Prefuse Force Directed) to pull similar features into clusters.