How to prepare data files

From SpatoWiki
Revision as of 14:24, 6 June 2011 by Christian

Spato documents are mainly XML files, but can be supplemented by binary data.

Writing all data into text (XML) files

The easiest way to create a new document is to create a directory with a name that ends in .spato, e.g., My_Network.spato. Then create a text file called document.xml within that directory. Here is an example:

<?xml version="1.0" ?>
<document>
  <title>Everglades Food Web</title>
  <description>
    From the Pajek website:
    Originally described by Ulanowicz, R.E., J.J. Heymans, and M.S. Egnotovich. (2000)
    Reduced to largest connected component
  </description>
  <nodes src="nodes.xml" />
  <links src="links.xml" />
  <slices name="SPT" src="spt.xml" />
  <dataset src="nodeprops.xml" selected="true" />
  <dataset src="dist.xml" />
</document>

Each document has to have exactly one nodes tag and may have a links tag in which the weight matrix of the network is defined. The slices tag defines the shortest-path tree (or any other tree) for each node. If no slices element is found in the document, then SPaTo will automatically compute shortest-path trees using Dijkstra's algorithm. The dataset tags define collections of node properties which can be used to color the nodes. As you can guess, the actual content is stored in additional XML files (nodes.xml, links.xml, etc.), but this is optional.
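
The constraints above can be checked mechanically before loading a document. The following Python sketch is only an illustration: check_document is a hypothetical helper (not part of SPaTo), the root element name document in the sample is an assumption, and "may have a links tag" is read here as "at most one".

```python
import xml.etree.ElementTree as ET

def check_document(xml_text):
    """Return a list of problems found in the text of a document.xml file."""
    root = ET.fromstring(xml_text)  # the root element's name is not inspected
    problems = []
    n_nodes = len(root.findall("nodes"))
    if n_nodes != 1:
        problems.append(f"expected exactly one nodes tag, found {n_nodes}")
    n_links = len(root.findall("links"))
    if n_links > 1:  # assumption: the links tag is optional but not repeatable
        problems.append(f"expected at most one links tag, found {n_links}")
    return problems

# Sample input; the root element name "document" is assumed for illustration.
example = """<document>
  <nodes src="nodes.xml" />
  <links src="links.xml" />
  <dataset src="nodeprops.xml" selected="true" />
</document>"""

print(check_document(example))
```

An empty list means that neither of the two checks failed; it says nothing about the referenced files themselves.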

In the following, we will go through the individual tags and what their content should look like.


Nodes

Here are the contents of nodes.xml from the Everglades food web example:

<?xml version="1.0" ?>
<nodes>
  <projection name="LonLat" />
  <node id="Living Sediments" name="Living Sediments" location="-5.95118,-6.82493" strength="1.72566" />
  <node id="Living POC" name="Living POC" location="-4.33367,-8.01817" strength="0.105276" />
  <node id="Periphyton" name="Periphyton" location="4.39522,-2.45945" strength="5.54369" />
  <!-- ... many nodes omitted for brevity ... -->
  <node id="Passerines" name="Passerines" location="-1.42228,8.97679" strength="0.00031404" />
</nodes>

Each node is defined by one node tag that usually has four attributes:

  • name is the name of the node, which is sometimes displayed in the bottom right corner of the application window and can be searched using the search input field
  • id is a string that will be displayed as the “label” next to the node in the network visualization and can also be searched using the search input field
  • location is a comma-separated pair of coordinates that defines the node position in the map view
  • strength is a float value that should represent the node strength (node flux), which is currently used to show only the labels of the strongest nodes in the network

The order of the node tags is significant: it determines the index of each node, by which the node is referred to in the links and slices tags.

There is one more tag, named projection, that determines how the coordinates are projected in the map view. For historical reasons, the Cartesian projection is called “LonLat”. Other currently recognized values are “Albers” (projection parameters will be automatically determined) and “LonLat Roll” (a Cartesian, or equirectangular, projection with periodic boundary conditions).
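
For larger networks it is usually easier to generate nodes.xml with a script than to write it by hand. The following Python sketch builds a nodes element with the attributes described above; build_nodes_xml and the layout of its input dictionaries are hypothetical and chosen only for illustration.

```python
import xml.etree.ElementTree as ET

def build_nodes_xml(nodes, projection="LonLat"):
    """Build the text of a nodes element.

    nodes is a list of dicts with the keys id, name, location (an (x, y)
    pair) and strength. This input layout is an assumption of the sketch.
    """
    root = ET.Element("nodes")
    ET.SubElement(root, "projection", name=projection)
    # The list order is kept, because it determines the (1-based) node index.
    for node in nodes:
        x, y = node["location"]
        ET.SubElement(
            root, "node",
            id=node["id"],
            name=node["name"],
            location=f"{x},{y}",            # comma-separated coordinate pair
            strength=str(node["strength"]),
        )
    return ET.tostring(root, encoding="unicode")

sample = [
    {"id": "Periphyton", "name": "Periphyton",
     "location": (4.39522, -2.45945), "strength": 5.54369},
    {"id": "Passerines", "name": "Passerines",
     "location": (-1.42228, 8.97679), "strength": 0.00031404},
]
print(build_nodes_xml(sample))
```

The output is a single line without the XML declaration; prepend the declaration and add indentation if the file is meant to be read by humans.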


Links

Again, we start with links.xml from the Everglades food web example:

<?xml version="1.0" ?>
<links inverse="true">
  <source index="1">
    <target index="7" weight="0.0167681" />
    <target index="8" weight="0.0150874" />
    <target index="9" weight="0.0271651" />
    <!-- ... omitted ... -->
    <target index="41" weight="3.7967e-05" />
  </source>
  <source index="2">
    <target index="8" weight="0.000810104" />
    <!-- ... omitted ... -->
    <target index="41" weight="1.81713e-07" />
  </source>
  <!-- ... omitted ... -->
  <source index="63">
    <target index="4" weight="3.59066e-05" />
    <!-- ... omitted ... -->
    <target index="63" weight="1.3815e-09" />
  </source>
</links>

For each node defined in the nodes tag, there should be one source tag. Its index attribute refers to the index-th node (index numbers start at 1). Within each source tag, there is one target tag for each link that originates from the source node; it specifies the index of the linked node and the weight of that link.

The boolean attribute inverse is used by the built-in shortest-path algorithm when no trees are explicitly given in the document (see below). If it is true, the inverse of the link weight is used as the length of a link when calculating path lengths.
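
The structure of the links tag maps naturally onto a nested dictionary. The following Python sketch builds a links element from such a dictionary and shows one reading of the inverse attribute. Both build_links and link_length are hypothetical helpers; that "inverse" means the reciprocal 1/weight, and that the plain weight is used otherwise, are assumptions about SPaTo's behavior.

```python
import xml.etree.ElementTree as ET

def build_links(weights, inverse=True):
    """Build a links element.

    weights maps each 1-based source index to a dict that maps 1-based
    target indices to link weights. Nodes without outgoing links should
    still appear as keys (with an empty dict), so that there is one
    source tag per node.
    """
    root = ET.Element("links", inverse="true" if inverse else "false")
    for src in sorted(weights):
        source = ET.SubElement(root, "source", index=str(src))
        for tgt in sorted(weights[src]):
            ET.SubElement(source, "target",
                          index=str(tgt), weight=repr(weights[src][tgt]))
    return root

def link_length(weight, inverse):
    """Length of a single link (assumed: 1/weight if inverse, else weight)."""
    return 1.0 / weight if inverse else weight

# Hypothetical three-node network, not taken from the Everglades data.
links = build_links({1: {2: 0.5, 3: 0.25}, 2: {3: 2.0}, 3: {}})
print(ET.tostring(links, encoding="unicode"))
print(link_length(0.25, inverse=True))
```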

Slices (Trees)

This is spt.xml from the Everglades food web example:

<?xml version="1.0" ?>
<slices name="SPT">
    <slice root="1">0 21 10 8 10 46 ... 3 7 13 6 4 10</slice>
    <slice root="2">21 0 21 8 10 46 ... 3 7 13 6 4 10</slice>
    <!-- ... omitted ... -->
    <slice root="63">10 21 10 8 10 46 ... 3 7 13 6 4 0</slice>
</slices>

For each node in the network, one tree has to be defined using a slice tag, with the root attribute specifying the (1-based) index of the root node. The content of the tag is a space-separated list of N integers (where N is the number of nodes). The n-th integer in that list is the index of the parent of node n in the tree. The index 0 is used for the root node (which, by definition, has no parent) and for any node that is disconnected from the root node.
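
To make the parent-list encoding concrete, the following Python sketch parses the content of one slice tag and walks from a node up to the root. The function names and the five-node tree are made up for illustration and are not taken from the Everglades example.

```python
def parse_slice(text):
    """Turn the space-separated content of a slice tag into a list of ints."""
    return [int(token) for token in text.split()]

def path_to_root(parents, node, root):
    """Return the 1-based indices from node up to root, or None if the
    node is disconnected from the root.

    parents[n - 1] is the parent of node n; 0 means "no parent".
    """
    path = [node]
    while parents[path[-1] - 1] != 0:
        path.append(parents[path[-1] - 1])
        if len(path) > len(parents):  # guard against malformed (cyclic) input
            raise ValueError("parent list contains a cycle")
    # A walk that stops at a node other than the root started in a
    # disconnected part of the network.
    return path if path[-1] == root else None

# Hypothetical tree rooted at node 1; node 5 is disconnected.
parents = parse_slice("0 1 1 2 0")
print(path_to_root(parents, 4, root=1))  # [4, 2, 1]
print(path_to_root(parents, 5, root=1))  # None
```

Because 0 marks both the root and disconnected nodes, the root index from the root attribute is needed to tell the two cases apart.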


Datasets

Each dataset can contain multiple node properties (“quantities”), each defined in a data tag. Each data tag must have a values tag, the content of which is a space-separated list of float values specifying the value of this quantity for each node, in the order in which the nodes are defined inside the nodes tag.

An optional colormap tag indicates which colormap to use when coloring nodes according to this quantity. Its (optional) name attribute gives the name of the colormap, and its boolean log attribute indicates whether to use a log scale when mapping values to colors. You can also specify the limits of the colormap using the minval and maxval attributes. Values below minval are mapped to the color associated with the lower limit, and values above maxval to the color associated with the upper limit.
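
The effect of the log, minval and maxval attributes can be sketched as a function that maps a value to a position between 0 and 1 along the colormap. This Python snippet shows one common way to implement such a mapping; it is an assumption, not SPaTo's actual code, and colormap_position is a hypothetical name. The log variant requires minval to be positive.

```python
import math

def colormap_position(value, minval, maxval, log=False):
    """Map value to a position in [0, 1] along a colormap.

    Values outside [minval, maxval] are clamped to the nearest limit.
    """
    clamped = min(max(value, minval), maxval)
    if log:
        # Interpolate linearly in log space (needs minval > 0).
        return ((math.log(clamped) - math.log(minval))
                / (math.log(maxval) - math.log(minval)))
    return (clamped - minval) / (maxval - minval)

print(colormap_position(5.0, 0.0, 10.0))             # linear scale
print(colormap_position(10.0, 1.0, 100.0, log=True)) # log scale
print(colormap_position(1000.0, 1.0, 100.0, log=True))  # clamped to upper limit
```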

In the Everglades food web example, nodeprops.xml defines some general node properties:

<?xml version="1.0" ?>
<dataset name="Node Properties">
  <data name="Node Degree">
    <colormap log="true" />
    <values>22 21 17 18 9 11 ... 29 8 5 31 9 37</values>
  </data>
  <data name="Node Strength" selected="true">
    <colormap log="true" />
    <values>1.7257 0.10528 5.5437 ... 0.00025997 2.4328e-05 0.00031404</values>
  </data>
  <!-- ... omitted ... -->
</dataset>

All the quantities in the example above are static properties of each node. It is also possible to define quantities in which the value associated with a node also depends on the currently selected root node, i.e., properties that depend on pairs of nodes. The shortest-path distance is such a property; here are the contents of dist.xml from the Everglades food web:

<?xml version="1.0" ?>
<dataset name="Distance Measures">
  <data name="SPD" distmat="true">
    <colormap log="true" minval="0.459869" maxval="2.09185e+08" />
    <values root="1">0 11.4585 3.40721 ... 8732.1 82394.8 14601.1</values>
    <values root="2">11.4585 0 12.5688 ... 8741.26 82403.9 14611.1</values>
    <!-- ... omitted ... -->
    <values root="63">14601.1 14611.1 14598.6 ... 23327.3 96989.9 0</values>
  </data>
</dataset>

All of these two-dimensional quantities can be used as the radial distance measure in the tomogram view and will be listed in the distance matrix selector in the upper left corner of the application window.

Both dataset and data tags can have name attributes, which are used in the graphical user interface. Boolean selected attributes mark the dataset and the quantity that are used to color the nodes when the document is loaded. Similarly, the boolean distmat attribute marks the quantity that is used as the radial distance in the tomogram view.
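
A root-dependent quantity such as a distance matrix can be written with one values tag per root node. The following Python sketch builds such a dataset element from a square matrix given as a list of rows. The function name, its default names and the 2×2 sample matrix are hypothetical; the "g" number format is merely chosen to resemble the listings above.

```python
import xml.etree.ElementTree as ET

def build_distance_dataset(matrix, dataset_name="Distance Measures",
                           data_name="SPD"):
    """Build a dataset element holding one root-dependent quantity.

    matrix[i][j] is the value for node j + 1 when node i + 1 is the root.
    """
    dataset = ET.Element("dataset", name=dataset_name)
    data = ET.SubElement(dataset, "data", name=data_name, distmat="true")
    for root_index, row in enumerate(matrix, start=1):
        values = ET.SubElement(data, "values", root=str(root_index))
        # One space-separated float per node, in node order.
        values.text = " ".join(f"{value:g}" for value in row)
    return dataset

# Hypothetical two-node distance matrix.
dataset = build_distance_dataset([[0, 2.5], [2.5, 0]])
print(ET.tostring(dataset, encoding="unicode"))
```

Note that the "g" format keeps only six significant digits; use a wider format if the values need more precision.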