3  Working with networks

3.1 Metadata network

The construction of metadata network is accessible from the menu option Tools ➡️ Networks ➡️ Metadata Network.

Options for metadata network

In metadata networks, nodes representing metadata are connected to nodes representing peptides by the following relationships:

Metadata node names and relationships in starPepDB.
Metadata node Relationship
Origin produced_by
Target assessed_against
Function related_to
Database compiled_in
Crossref linked_to
Nterminus modified_by
UnusualAA constituted_by
Cterminus modified_by
Subcategory of another node is_a

3.2 Similarity network

The construction of similarity network is accessible from the menu option Tools ➡️ Networks ➡️ Similarity Network. To create a similarity network, we first recommended to configure the workflow using the Configuration Wizard and then press the button Run.

Chemical Space window (Nodes tab).

  1. Runs the workflow for building the similarity network.
  2. Opens Configuration Wizard (Sect.3.2.1).
  3. Changes between Nodes and Edges tabs.
  4. Applies PCA coordinates changes.
  5. Selects X and Y axis for PCA coordinates.
  6. PCA results panel.

Chemical Space window (Edges tab).

  1. Similarity threshold selector: After changing the value, it is necessary to press Apply.
  2. Network Density plot: Helps to decide a similarity threshold.

3.2.1 Configuration wizard

This section will show the configuration wizard for mapping and visualizing the Chemical Space.

3.2.1.1 Wizard Step 1: Input sequences

To remove redundant sequences, press Yes (recommended). Then, you can choose between local or global alignment, multiple substitution matrices, and a identity threshold.

Wizard Step 1: Input sequences.

3.2.1.2 Wizard Step 2: Feature extraction

If you already calculated a set of molecular descriptors, you can select the first option and press Next. If not, select the new descriptors to be calculated.

Wizard Step 2: Feature extraction

3.2.1.3 Wizard Step 3: Feature selection

If you plan to use all available descriptors, select the first option, and press Next. If not, select and configure the two-stage unsupervised feature selection method.

Wizard Step 3: Feature selection.

3.2.1.4 Wizard Step 4: Distance function.

Select the desired distance function and the standardization/normalization for the calculated descriptors.

Wizard Step 4: Distance function.

3.2.1.5 Wizard Step 5: Network model

For generating a network model, select between the Half-Space Proximal Network or the traditional Chemical Space Network/Similarity Network (not recommended for large datasets). For more details, please refer to the methodological paper.

Note

The position of nodes may be determined by the first two principal components of descriptor space. However, layout algorithms are recommended for a better rearrangement of nodes.

Wizard Step 5: Network model.

3.3 Network model options

After creating the network model, the following options are available.

Network model options.

  1. Positioning nodes.
  2. Adding/removing similarity edges.
  3. Embedding new peptides. When new peptides are projected, a network model will be opened into a new workspace.

3.4 Layout algorithms

A layout algorithm option may be opened from Tools ➡️ Network ➡️ Layout ➡️ [layout option]. The main graph layouts available are Fruchterman Reingold, ForceAtlas 2, Yifan Hu Proportional, and Random Layout. Any layout result could be adjusted using the options Rotate, Contraction, Expansion, Noverlap, and Label Adjust.

option.

3.5 Appearance

This panel is opened from Tools ➡️ Network ➡️ Appearance.

  1. Runs the appearance customization of either nodes or edges. If the Preview window is active in the Network visualization window, you need to press the button Refresh from mentioned window to update the network view.
  2. Selects the elements (either Nodes or Edges) whose appearance is to be changed.
  3. Applies configuration via Unique, Partition, or Ranking functions. For nodes, the calculated measures are available as Partition or Ranking options.
  4. Modifiable configurations panel. For color options, you need to press and drag the cursor to the desired color, or press right-click to open the color window.
  5. Changes the color of either Nodes or Edges (if edges are not taking the color of attached nodes, see sect. 2.5.2).
  6. Changes Nodes size (this option only applies to nodes).
  7. Changes label color of either or Nodes or Edges.
  8. Changes label size of either or Nodes or Edges.
  9. Resets current options.
  10. Resets customization to the default appearance.

3.6 Clustering

A clustering panel may be opened from Tools ➡️ Network ➡️ Clustering ➡️ [clustering algorithm]. For instance, k-means:

k-means clustering option.

Note

After running the clustering algorithm, you may visualize the network structure in Tools ➡️ Network ➡️ Appearance ➡️ Nodes ➡️ Partition.

3.7 Centrality

A centrality panel may be opened from Tools ➡️ Network ➡️ Centrality ➡️ [measure option]. For instance, Betweenness Centrality:

Betweenness Centrality option.

Note

After running the centrality measure, you may customize the appearance of nodes according to the centrality values in Tools ➡️ Network ➡️ Appearance ➡️ Nodes ➡️ Ranking.

3.8 Case study

In this case study, we will try to answer the following questions for a given sequence of interest.

>Example sequence
FLPAIVGAAGQFLPKIFCAISKKC

3.8.1 Which biological database holds peptides similar to the sequence of interest?

Step 1: Opens the Search panel with the commands Tools ➡️ Peptide search by ➡️ Single query sequence. Types the query sequence in the input field, configures the sequence alignment at 70% of sequence identity, and press Run. This search should return 25 peptide sequences and 595 metadata relationships.

Sequence search panel

Step 2: Creates the metadata network by selecting the option Database.

Options for metadata network

Step 3: In the graph table view of Navigator window, select the option Columns..., then mark Degree and click OK.

Display settings

We can sort the graph table by node Degree by clicking the Degree column 2 times, and now we can observe that the database SATPdb contains the most similar sequences to the query sequence.

Graph table

3.8.2 What are the biological functions of peptides similar to the sequence of interest?

Follow the Step 1 of the previous example.

Step 2: Creates the metadata network by selecting the option Function.

Step 3: In the graph table view of Navigator window, select the option Columns..., then mark Degree and click OK.

Graph table

Step 4: In the Network visualization window, select the following options.

Network visualization

  1. Shows node labels.
  2. Disables the option Show peptide labels.
  3. Modifies the label size to Node size.

In the Appearance panel (see sect. 3.5), customizes the appearance of nodes for sizing and coloring nodes according to the degree measure.

  1. In the Nodes view, select Node size ➡️ Ranking ➡️ Degree. Set min and max sizes to 5 and 100 respectively, select the interpolator Bezier, select the second predefined spline and press Run.
  2. In the Nodes view, customizes Node color ➡️ Ranking ➡️ Degree, and press Run.

Run the Tools ➡️ Network ➡️ Layout ➡️ Fruchterman Reingold about 10s and then press stop. The result may be similar to the following network:

Metadata network