DESCRIPTION
Recently, graph-theory has been characterised as an efficient and useful tool for conservation planning (e.g. Bunn et al. 2000, Calabrese & Fagan 2004, Minor & Urban 2008, Zetterberg et. al. 2010). As a part of the r.connectivity.* tool-chain, r.connectivity.network is intended to make graph-theory more easily available to conservation planning.
r.connectivity.network is the 2nd tool of the r.connectivity.* toolchain and performs the (core) network analysis (usig the igraph-package in R) on the network data prepared with r.connectivity.distance. This network data is analysed on graph, edge and vertex level.
Connectivity measures for the graph level are: number of vertices, number of edges, number of clusters, size of the largest cluster, average cluster size, diameter, density, modularity, number of communities, community size (in number of vertices) Network characteristics are visualised in a plot showing an overview over number of connections, number of components and and the size of the largest network component within the network with regards to cost-distance between patches.
Connectivity measures calculated for the edges are:
Minimum-spanning-trees, directness, edge-betweenness, local edge-betweenness, edge-betweenness-community, bridges (biconnected components), tree edges (biconnected components), and potential connectors of clusters and communities.
Connectivity measures calculated for the vertices are:
Degree-centrality, Eigenvector-centrality, closeness centrality, vertex-betweenness, local vertex-betweenness, cluster-membership, edge betweenness community structure, community-membership, neighbourhood size, local neighbourhood size, articulation points, and articulation.
Most measured can be calculated both on a directed and/or an undirected graph.
Analysis is based on a negative exponential decay kernel (as described e.g. in Bunn et al. (2000), which the user can modify according to the dispersal characteristics of her/his species or habitat type.
!!!This script is designed to work based on the output of r.connectivity.distance!!!
REQUIREMENTS
For running this tool R language and environment for statistical computing and graphics has to be installed (see: http://www.r-project.org). On Windows the path to R has to be added to the %path% variable in the environment settings (see: http://grass.osgeo.org/wiki/R_statistics#MS_Windows) and GRASS GIS has to be started after that.
Furthermore the igraph package in R is required with version 0.6-2 or newer.
For parallel processing (at the moment only supported for Linux) the R packages doMC, multicore, iterators, codetools and foreach are required as well.
All R packages can be installed by running the AddOn using the i-flag (-i). Installation of R packages requires internet access.
For postscript output (overview and kernel plot) also ghostscript is required.
OUTPUT NOTES
Field naming
Due to SQL-standards (especially with the dbf-driver) the length of field names is limited to ten signs. This required to abbreviate the naming of the edge and vertex measures in the vector attribute table. This was (mostly) done using the following pattern for field names (of vertex and edge measures):
- weight _ measure _ (sub)graph
(Sub)Graphs
In r.connectivity.network a graph is build from the network dataset produced in r.connectivity.distance. From this (entire, directed) graph a set of (sub)graphs is extracted, specified by:
- u = for the undirected graph
- d = for the graph with only direct edges
- c = for the graph with only edges shorter than cost distance threshold
Graph measures
- Number of edges
- The number of edges is the number of connections between pairs of vertices in the network.
- Number of vertices
- The number of vertices is the number of polygons (patches) from the input vector map which are analysed in the r.connectivity.*.
- Cluster
- A cluster is a group of vertices which are connected with each other, but not with the rest of vertices in a graph (a single isolated vertex can be a cluster as well).
- Diameter
- The diameter of a graph is the length of the longest geodesic.
- Density
- The density of a graph is the ratio of the number of edges and the number of possible edges.
Edge measures
- Input parameters
- con_id = ID of the directed connection between two vertices (edge)
- con_id_u = ID of the undirected connection between two vertices (edge)
- from_p = Patch ID (patch_id) of the start-vertex of an (directed) edge
- from_pop = Population proxy (pop_proxy) of the start-vertex of an (directed) edge
- to_p = Patch ID (patch_id) of the end-vertex of an (directed) edge
- to_pop = Population proxy (pop_proxy) of the end-vertex of an (directed) edge
- Weights
- Cost distance (cd)
- Cost distance is calculated for the directed (cd) and undirected graph (cd_u). The cd weight for directed edges is exactly like measured in r.connectivity.distance, the cd weight for undirected edges is the average cost distance of the two directed edges between a pair of vertices (mean( a--b, b-a)). The cost distance weight does not take he user defined dispersal information (the population proxy for the different patches and the dispersal kernel) into account.
- Negative exponential decay kernel flow (distk)
- The negative exponential decay kernel is defined by the variables euler, base, and exponent specified by the user. It is calculated for the directed (distk) and undirected edges(distk_u) using the following formulas:
- distk = euler ^ ( ( basis * ( 10 ^ exponent ) )*( cd))
- distk_u = euler ^ ( ( basis * ( 10 ^ exponent ) )*( cd_u))
- Maximum potential flow (mf)
For the maximum potential flow (mf) weight the user defined dispersal information (the population proxy and the dispersal kernel) is used to model the flow of propagules between a pair of vertices. This weight is based on the assution that propagules disperse evenly and in full amount into all sourrounding patches. Maximum potential flow is calculated using the following formula:
- mf = disk * pop_proxy
- Maximum potential flow is calculated for the directed graph as incoming (mf_i) and outgoing (mf_o) flow as well as for the undirected graph (mf_u). Unlike the cost distance weight, mf represents the "closeness" of a pair of vertices. For some (most) algorithms it had to be inverted in order to give meaningful results (mf_inv = Inverted maximum potential flow for directed edges, mf_inv_u = Inverted maximum potential flow for undirected edges).
- Competing potential flow (cf)
- The competing potential flow (cf) weight is based on the work by Ranius & Roberge 2011. Like in the maximum potential flow (mf) weight, the user defined dispersal information (the population proxy and the negative exponential decay kernel) is used to model the flow of propagules between a pair of vertices. But here it is assumed that the total amount of propagules is limited to the population proxy given by the user, and that this amount is distributed according to the "attractiveness" (defined by population size and cost distance) of the surrounding patches.
- Competing potential flow is calculated for the directed (cf) and undirected (cf_u) graph. Unlike the cost distance weight, cf represents the "closeness" between a pair of vertices. For some (most) algorithms it had to be inverted in order to give meaningful results (cf_inv = Inverted competing potential flow for directed edges, cf_inv_u = Inverted competing potential flow for undirected edges).
- The mf and cf weights are inverted (mf_inv, cf_inv) by multiplying them with -1 followed by linear normalising to the original maximum and minimum values.
- Output parameters
- Shortest connections (isshort)
- In r.connectivity.network all edges are classified if they represent the shortest path from the start- to the end-vertex, which is stored as a logical value (0 = FALSE or 1 = TRUE). Shortest connections are identified for every weight (isshort_cd, isshort_mf, isshort_cf). Finally the isshort attribute is set to 1 if an edge represents the shortest path for either the cd-weight, the mf-weight or the cf-weight (or 0 otherwise).
- Biconnected components (bc)
- If the removal of a single vertex and its adjacent edges does not disconnect the graph (increase the number of clusters) it is part of a biconnected component.
- Bridges (is_br)
- A bridge is an edge who's removal increases the number of clusters in a graph. Edges are classified either as bridge (1) or not (0).
- Edge betweenness (eb)
- The edge betweenness (eb) value represents the number of shortest paths that go through an edge, considering the shortest paths between all possible pairs of vertices. Edge betweenness is calculated with all three weights for the entire directed graph (cd_eb_u, mf_eb_u, cf_eb_u), the undirected graph with only direct edges (cd_eb_ud, mf_eb_ud, cf_eb_ud), and the undirected graph with only direct edges shorter than cost distance threshold (cd_eb_udc, mf_eb_udc, cf_eb_udc).
- Local edge betweenness (leb)
- The local edge betweenness (leb) value represents the number of shortest paths that go through an edge, considering the shortest paths between all possible pairs of vertices which are shorter than the user defined local neighbourhood (lnbh_threshold). Local edge betweenness is calculated with all three weights for the entire directed graph (cd_leb_u, mf_leb_u, cf_leb_u), the undirected graph with only direct edges (cd_leb_ud, mf_leb_ud, cf_leb_ud), and the undirected graph with only direct edges shorter than cost distance threshold (cd_leb_udc, mf_leb_udc, cf_leb_udc).
- Edge betweenness community (ebc)
- Some groups of vertices can be densely connected with each other but only little connected to other vertices. Such groups of relative intense connected vertices are communities. One algorithm to identify communities in a graph is edge betweenness community (ebc). Edge betweenness community is calculated in an iterative loop, where, first edge betweenness is calculated, then the edge with the largest edge betweenness value is removed, edge betweenness is recalculated and so on. The algorithm produces mainly two edge measures:
- 1. the order in which edges were deleted (ebc_r)
- 2. the edge betweenness value of an edge at the time of removal
- For large graphs calculating edge betweenness community can take long time. Therefore, it is only calculated on request (y-flag) and only for one weight (cf). It is only calculated for the entire (undirected) graph.
- Minimum spanning trees (mst)
- A minimum spanning tree (mst) is a subgraph consisting of the minimal possible number of edges (= number of vertices - 1) connecting all vertices to a minimal possible number of clusters, while the sum of all edge weights is minimised. If the graph consists of more than one cluster, the result of the minimum spanning tree algorithm is a so called minimum spanning forest, consisting of the minimum spanning tree for each cluster. In the context of nature conservation, the minimum spanning tree is sometimes referred to as the "backbone" of an area network (Bunn etal. 2000). In r.connectivity.network the minimum spanning tree/forest is calculated with all weights (cd, mf, cf) and for all undirected (sub)graphs (u, ud, udc) resulting in the following (logical) edge measures cd_mst_u, cd_mst_ud, cd_mst_udc, mf_mst_u, mf_mst_ud, mf_mst_udc, cf_mst_u, cf_mst_ud, cf_mst_udc, where edges are either part of the minimum spanning tree/forest (1) or not (0).
- Potential cluster connectors (cl_pc_u)
- A cluster is a group of vertices which are connected with each other, but not with the rest of vertices in a graph (a single isolated vertex can be a cluster as well). Potential cluster connectors are edges which are longer than the connectivity threshold, and connect clusters in the subgraph with only edges shorter than the connectivity threshold.
- Potential community connectors (cf_ebc_pc, cf_iebc_pc)
- Potential community connectors are edges which connect communities identified by the edge betweenness community algorithm for a user defined community level.
Vertex measures
- Input parameters
- Patch ID (patch_id)
- ID of the vertex = category value (cat) of the patch vector given by the user in r.connectivity.distance.
- Population proxy (pop_proxy)
- The population proxy given by the user in r.connectivity.distance.
- Output parameters
- Cluster membership (cl)
- A cluster is a group of vertices which are connected with each other, but not with the rest of vertices in a graph (a single isolated vertex is a cluster as well). The cluster membership value (integer) is the id of the cluster a vertex belongs to.
- Community structure and membership (cf_(i)ebc_cs, cf_(i)ebc_cl)
- The edge betweenness community algorithm is used to analyse the community structure of a graph. It results in a hierarchical structure which describes how a graph is split up by the edge removal process into an increasing number of communities. The hierarchical community structure is stored as the (character) vertex attribute "cf_ebc_cs". The community membership value (integer) is the id of the community (identified by edge betweeenness community) a vertex belongs to (on a user defined level of community division).
- Articulation points (art_p)
- Articulation points (art_p) are vertices who's removal would increase the number of clusters in a graph. Articulation points are identified for the entire directed graph (art_p_u), the undirected graph with only direct edges (art_p_ud), and the undirected graph with only direct edges shorter than cost distance threshold (art_p_udc). Articulation points (art_p) are a (logical) measure of the graph structure and do not take edge weights into account. Vertices are either articulation points (1) or not (0).
- Articulation (art)
- The articulation value (integer) (art) is the number of new clusters which would occur when a vertex is removed. The articulation value (art) is a measure of the graph structure and does not take edge weights into account.
- Degree centrality (deg)
- Degree centrality of a vertex is defined as the number of vertices connected to this vertex. This (integer) measure is a measure of the graph structure and does not take edge weights into account.. Degree centrality is calculated for all three undirected graphs (deg_u, deg_ud, deg_udc).
- Eigenvector centrality (evc)
- Eigenvector centrality (evc) in r.connectivity.network is a variation of the eigenvector centrality algorithm provided by the igraph library (see also: Csardi & Nepusz 2006). In r.connectivity.network eigenvector centrality is the sum of incoming potential flow of a vertex.
- Because eigenvector centrality takes the direction of potential flows into account it is calculated only for the directed graph and the directed graph with only edges shorter than connectivity threshold, using both cf and mf weight (cf_evc_u, cf_evc_uc, mf_evc_u, mf_evc_uc). Eigenvector centrality is stored with double precision.
- Closeness centrality (cl)
- Cloness centrality represents the number of steps which are necessary to access every other vertex from a given vertex. Closeness centrality (integer) is calculated with all three weights for the entire directed graph (cd_cl_u, mf_cl_u, cf_cl_u), the undirected graph with only direct edges (cd_cl_ud, mf_cl_ud, cf_cl_ud), and the undirected graph with only direct edges shorter than cost distance threshold (cd_cl_udc, mf_cl_udc, cf_cl_udc).
- Vertex betweenness (vb)
- The vertex betweenness (vb) value represents the number of shortest paths that go through a vertex (but that do not start or end in that vertex), considering the shortest paths between all possible pairs of vertices. Vertex betweenness is calculated with all three weights for the entire directed graph (cd_vb_u, mf_vb_u, cf_vb_u), the undirected graph with only direct edges (cd_vb_ud, mf_vb_ud, cf_vb_ud), and the undirected graph with only direct edges shorter than cost distance threshold (cd_vb_udc, mf_vb_udc, cf_vb_udc).
- Local vertex betweenness (lvb)
- The local vertex betweenness (lvb) value represents the number of shortest paths that go through a vertex (but that do not start or end in that vertex), considering the shortest paths between all possible pairs of vertices wich are shorter than the user defined local neighbourhood (lnbh_threshold). Local vertex betweenness is calculated with all three weights for the entire directed graph (cd_vb_u, mf_vb_u, cf_vb_u), the undirected graph with only direct edges (cd_vb_ud, mf_vb_ud, cf_vb_ud), and the undirected graph with only direct edges shorter than cost distance threshold (cd_vb_udc, mf_vb_udc, cf_vb_udc).
- Neighbourhood size (nbh_s)
- The neighbourhood size (nbh_s) is the number of of other vertices which can be reached from a vertex.
- Local neighbourhood size (nbh_sl)
- The local neighbourhood size (nbh_sl) is the number of other vertices which can be reached from a vertex along a path which is shorter than the user defined local neighbourhood.
EXAMPLE
The following example is based on the North Carolina dataset!
Please be aware that all input parameter of the following example are purely hypothetical (though they intend to imitate a real life situation) and chosen only for the matter of the exercise. Parameters have to be adjusted in other cases according to ecological knowledge in order to give meaningful results!
Let us assume we want to analyse the habitat connectivity for a hypothetical species of the Hardwood Swamps in North Carolina. This species lives only in the larger core area of the swamps (> 1ha) while the borders are no suitable habitats.
It is not the most mobile of species and can cover (under optimal conditions) maximal 1.5 km.
Prepare input data
Befor we can run the connectivity analysis with the r.connectivity.*-tools we need to prepare the the example input data. Because we want to use cost distance as a distance measure we have to provide a cost raster map in addition to the required vector map of (habitat) patches:
Create input patch vector map
Create input patch vector map
###Set region
g.region -p rast=landuse96_28m@PERMANENT align=landuse96_28m@PERMANENT
###Patch input vector map
##Extract habitat patches
#Condition 1: Category 11 = Bottomland Hardwoods/Hardwood Swamps
#Condition 2: no border-cells are suitable
r.mapcalc "patches=if(landuse96_28m[0,1]==11&&landuse96_28m[0,-1]==11&&landuse96_28m[1,1]==11&&landuse96_28m[1,0]==11&&landuse96_28m[1,-1]==11&&landuse96_28m[-1,1]==11&&landuse96_28m[-1,0]==11&&landuse96_28m[-1,-1]==11&&landuse96_28m==11,1,null())"
#Vectorize patches
r.to.vect input=patches output=patches feature=area
#Add a column for the population proxy (in this case area in hectares)
v.db.addcol map=patches layer=1 'columns=area_ha double precision'
#Upload area to attribute table (later used as population proxy)
v.to.db map=patches type=point,line,boundary,centroid layer=1 qlayer=1 option=area units=hectares columns=area_ha
#Extract core habitat areas with more than 1 ha
v.extract input=patches output=patches_1ha type=area layer=1 where="area_ha>1"
Create a cost raster (se also: r.cost)
For the cost raster, we assume that areas which are developed with high intensity are absolute barriers for our species (not crossable at all).
Other developed and managed areas can be crossed, but only at high costs (they are avoided where possible). Other Hardwood landcover types offer best opportnity for the dispersal of our species (they are prefered), while the costs for crossing the other landcover types is somewhere inbetween.
Hint: One might also add infrastructure like e.g. roads
#Reclassify land use map
#Windows users may have to use the GUI version of r.reclass and paste the rules there...
echo '0 = 56 #not classified (2*resolution (28m))
1 = NULL #High Intensity Developed (absolut barrier)
2 = 140 #Low Intensity Developed (5*resolution (28m))
3 = 112 #Cultivated (4*resolution (28m))
4 = 70 #Managed Herbaceous Cover (2,5*resolution (28m))
6 = 28 #Riverine/Estuarine Herbaceous (1*resolution (28m))
7 = 42 #Evergreen Shrubland (1,5*resolution (28m))
8 = 42 #Deciduous Shrubland (1,5*resolution (28m))
9 = 42 #Mixed Shrubland (1,5*resolution (28m))
10 = 28 #Mixed Hardwoods (1*resolution (28m))
11 = 28 #Bottomland Hardwoods/Hardwood Swamps (1*resolution (28m))
15 = 56 #Southern Yellow Pine (2*resolution (28m))
18 = 28 #Mixed Hardwoods/Conifers (1*resolution (28m))
20 = 42 #Water Bodies (1,5*resolution (28m))
21 = 84 #Unconsolidated Sediment (3*resolution (28m))' | r.reclass input=landuse96_28m@PERMANENT output=costs --overwrite
Create the network (r.connectivity.distance)
In the first step the network dataset is created, and the cost distance between all possible pairs of vertices (patches) is calculated.
Our species is known to be uable to cover more than 1.5 km distance. In order to identify potential for improving the connectectivity of the landscape for our speies we set the cutoff distance (maximum search distance for connections) to three times dispersal potential of our species (4500). In lack of real population data we use the area (ha) as a proxy. The distance between two patches is measured as the maximum distance along the closest 500m of boundary of a patch (ca. 18 border cells with 28m resolution). All non map output is stored in a folder named hws_connectivity in the current working directory.
r.connectivity.distance --verbose patches=patches_1ha costs=costs pop_proxy=area_ha prefix=hws_connectivity folder=./hws_connectivity cutoff=4500 border_dist=18
Analyse the network (r.connectivity.network)
In the next step network analysis is performed. We define the connectivity cutoff (maximum cost distance where connectivity is assumed) to be 1500.0 cost distance units, which represents the maximum known dispersal ability of our species. The dispersal kernel is adjusted to characteristics of this species by setting the exponent to -3, which gives a steeper decrease in successful dispersal with increasing cost distance. Only very few dispersal units of our species are expected to cross more than ca. 900 cost distance units successfully (se: ./hws_connectivity/kernel.eps after running the command or run it with the x-flag (-x) for only displaying the kernel). Connectivity metrics reslults are stored in te attribute table of the vector maps on vertex (patches, map name: "prefix" _ vertex_measures) and edge level (connections, map name: "prefix" _ edge_measures). An overview over connectivity metrics on the graph level (the entire network) is stored in "folder" (./hws_connectivity). We requested also an R-plot of the dispersal kernel and an R-plot givig an overview over network characteristics to be stored in the same folder.
Users with a multi-processor computer (e.g. dual-core) may speed up processing by setting cores > 1 in r.connectivity.network.
r.connectivity.network -y -i folder=./hws_connectivity connectivity_cutoff=1500.0 lnbh_cutoff=2.0 cl_thres=10 exponent=-3 kernel_plot=./hws_connectivity/kernel.eps overview_plot=./hws_connectivity/overview.eps cores=1
Calculate corridors (r.connectivity.corridors)
In this example two alternative (or to some extent complementary) sets of corridors are calculated:
Corridor set 1:
For the first set, corridors are only computed for the minimum spanning tree (in literture sometimes called the backbone of an ecological network). This is specified in the "where"-parameter. Edge betweenness values are used as weights, which represent the centrality of a connection (edge) in a network (and is therewith a measure of importance). The output is a set of corridor raster maps (named: "prefix" _corridor_ "edge_id" _ "weight") and a summary over all computed corridors (named: "prefix" _corridor_ "edge_id" _ "weight" _sum_ "suffix").
r.connectivity.corridors --verbose folder=./hws_connectivity suffix=mst weights=cf_eb_ud where="cf_mst_ud > 1" cores=1
Corridor set 2:
For the second set, corridors are computed for all edges in the graph (network). This takes a little more time. Corridors are then weighted by the modeled, undirected (= in oth directions) flow of propagules ("cf_u").
r.connectivity.corridors --verbose folder=./hws_connectivity suffix=alle weights=cf_u cores=1
Users with a multi-processor computer (e.g. dual-core) may speed up corridor processing by setting cores > 1 r.connectivity.corridors.
An other means for speeding up processing would be to use integer weights (e.g. by converting edge betweenness to integer afte multiplying with 10 to the power of the relevant number of digits). Then r.reclass is used in the script instead of r.recode.
When all analysis is finnished some cleanup may be appropriate, in order to save disk space:
#Add the f-lag (-f) to the following two commands if you are sure you want to delete all intermediate maps from r.connectivity.*.
g.mremove rast=hws_connectivity_corridor_*
g.mremove rast=hws_connectivity_patch_*
REFERENCE
- Bunn, A. G., Urban, D. L. & Keitt, T. H. 2000: Landscape connectivity: A conservation application of graph theory. Journal of Environmental Management (2000) 59: 265-278 http://www.sciencedirect.com/science/article/pii/S0301479700903736
- Calabrese, J. M. & Fagan, W. F. 2004: A comparison-shopper's guide to connectivity metrics. Front Ecol Environ 2 (10): 529-536 http://dx.doi.org/10.1890/1540-9295(2004)002[0529:ACGTCM]2.0.CO;2
- Minor, E. S. & Urban, D. L. 2008: A Graph-Theory Framework for Evaluating Landscape Connectivity and Conservation Planning. Conservation Biology 22 (2): 297-307 http://www.uic.edu/labs/minor/minor&urban2008.pdf
- Zetterberg, A., Mörtberg, U. M. & Balfors, B. 2010: Making graph theory operational for landscape ecological assessments, planning, and design. Landscape and Urban Planning (2010) 95: 181-191 http://www.sciencedirect.com/science/article/pii/S0169204610000204
- Ranius, T. & Roberge, J.-M. 2011: Effects of intensified forestry on the landscape-scale extinction risk of dead wood dependent species. Biodiversity and Conservation 20 (13): 2867-2882http://www.springerlink.com/content/ch9qtv2665h624q4/
- Csardi G. & Nepusz T. 2006: The igraph software package for complex network research, InterJournal, Complex Systems 1695. 2006. http://igraph.sf.net
- Csardi, G. 2012: igraph: Network analysis and visualization. http://cran.r-project.org/web/packages/igraph/index.html
SEE ALSO
r.connectivity.distance,
r.connectivity.corridors
AUTHOR
Stefan Blumentrath, Norwegian Institute for Nature Research (NINA)
Last changed: $Date$