= Performance comparison of Open Source Geometry Libraries (under construction!) = The [/ggl GGL] is compared to other Open Source Geometry and GIS libraries. This comparison is a study driven by curiosity of how different programming models and designs influence performance as well as to compare Open Source products available on the market. We are aware of the weaknesses of performance tests and that it is hard to design objective benchmarks. So, this comparison is not considered to be objective in terms of intentions. However, we did what we can to achieve most comparable results. All compared products are Open Source. The benchmark is Open Source itself and we aim to improve it in future (ie. fix bugs). Therefore, everyone can review it and provide his critique as well as point bugs and weaknesses, so the benchmark can be improved to provide more objective results. We also encourage to download the benchmark code, build and run performance tests as well as share the results with GGL [http://lists.osgeo.org/mailman/listinfo/ggl developers and other users]. This page explains how the comparison is done, how it might be reproduced, and shows the results. == Dataset == Used, for all comparisons, are the US Counties. They are stored in a large shapefile (82MB), which can be downloaded for free from [http://www.weather.gov/geodata/catalog/county/html/county100k.htm US National Weather Services] web page. [[Image(county100k.gif)]] Source: [http://www.weather.gov/geodata/catalog/county/graphics/county100k.gif weather.gov] Note that also other shapefiles can be used by the comparison programs. == Algorithms == Measured is the elapsed time executing following algorithms: * area * [http://en.wikipedia.org/wiki/Centroid centroid] * [http://en.wikipedia.org/wiki/Convex_hull_algorithms convex hull] * intersection (polygon / polygon) * clip (intersection polygon / box) * simplify * within The algorithms will be explained in more detail below. === ''Area'' === Areas of all polygons are calculated. Because area calculation is a fast operation, it is measured several times per polygon (ten times for all US counties). The US Counties are stored in latitude-longitude, the area is also calculated in those units for all libraries. This results in a (not so useful) square degree area. However, the purpose is to measure performance. === ''Centroid'' === The centroid of all polygons is calculated. Like area, it is a fast operation, so it is done ten times for all polygons. === ''Convex hull'' === The convex hull of all polygons is calculated. The resulting area is shown to check the results === ''Intersection (polygon / polygon)'' === For all counties, an '''ellipse''' is generated (the generation is not included in the timing). The ellipse has 100 points for the standard test. The ellipse is generated such that the center of the bounding box of the polygon is the center of the ellipse, and the minor and major axes of the ellipse are formed of the width and height of the bounding box of the polygon, multiplied by factor 1.1. Therefore the ellipse intersects the polygon in nearly all cases. To test scalability of the intersection, also tests are done with ellipses of 1000 and 10000 points. To test the performance of cases with a high intersection density, also a '''star-ellipse''' test is done. This star-ellipse is generated in the same way, but with alternating factors (of 0.2 and 1.1). This way a star-shape is formed, intersecting the polygon again and again. This test is done with star-ellipses of 101, 1001 and 10001 points. ''Note that, currently, in the results below the 101 point star-ellipse test is shown.'' === ''Clip (Intersection polygon / box)'' === Some libraries implement a specific intersection of a polygon with a box. This '''clipping''' operation is very useful in user interfaces or tiling. For each polygon (US county) a clipping box is generated based on the bounding box, in horizontal direction multiplied by factor 1.1, in vertical direction multiplied by factor 0.9, such that the polygon always intersects with its clipping box. === ''Simplify'' === The boundaries of all counties are simplified to reduce the number of vertices. This is not an OGC operation but some libraries implement it. Note that there are different algorithms for simplifying a linestring, most used is the Douglas-Peucker algorithm. Terralib is using another algorithm (so the performance timings are not really comparable). The simplification is done per linestring, not per polygon, because not all libraries implement simplification of polygons. The simplifying distance is chosen such that the resulting linestring is 99% of the original linestring. This is, for the US Counties, a distance of 0.0008975 degrees. === ''Within'' === The '''Within''' algorithm determines if a geometry (in this case a point, the center of the bounding box) is lying within another geometry (in this case the polygon, US county). Within calculations are usually fast and were therefore calculated for all centers combined with all polygons. The Within algorithm normally profits from first comparing with the polygons bounding box before considering the polygon itself. Per library, the fastest method of these is used in the comparisons. == Libraries == The following libraries are compared * [http://www.cgal.org/ CGAL 3.4] * [http://trac.osgeo.org/geos/ GEOS 3.1.1] * [http://www.cs.man.ac.uk/~toby/alan/software GPC 2.32] (intersection) * [/ggl/ GGL] * [https://svn.boost.org/svn/boost/sandbox/gtl GTL (now called Boost.Polygon)] * [http://www.terralib.org TerraLib 3.3.1] * [http://www.wykobi.com Wykobi 0.0.4] == Compiler == The comparison is done using [http://www.mingw.org/ MinGW] on a Dell Latitude D820 laptop. All programs are compiled using [http://gcc.gnu.org/gcc-4.4/ GCC 4.4.0] with full optimization: {{{ gcc -O3 ... }}} Most (but not all) libraries also compile using Microsoft Visual C++ Compiler (MSVC) 8.0 (2005) and 9.0 (2008). However, measurements are done using GCC because it has one simple optimization flag, allows also running on Linux and has no big differences between different versions. (MSVC 2005/2008 [http://stlab.adobe.com/wiki/index.php/Performance/Analysis/Example3 have sometimes]). == Results == All tests are run 10 times. The best results are listed in the tables below. === Area === ||'''Library'''||'''Result (summed area)'''||'''Speed (s)'''|| ||cgal||746.29304||0.109|| ||geos||746.29304||1.187|| ||ggl||746.29304||0.109|| ||gtl||746.29304||0.156|| ||terralib||746.29304||0.109|| ||wykobi||746.29304||0.109|| === Centroid === ||'''Library'''||'''Result (summed X-value)'''||'''Speed (s)'''|| ||cgal||-92.28795||0.093|| ||geos||-92.28962||1.203|| ||ggl||-92.28962||0.156 ^1^|| ||terralib||-92.28964||0.859|| ||wykobi||-92.28962||0.171|| ^1^ enhanced in Oct'09 from 0.171 to 0.156 === Clip === ||'''Library'''||'''Result (summed area)'''||'''Speed (s)'''|| ||ggl||376.60163||0.203|| ||wykobi||376.60163||0.578|| === Convex hull === ||'''Library'''||'''Result (summed area)'''||'''Speed (s)'''|| ||cgal||840.51277||0.875|| ||geos||840.51277||4.094|| ||ggl||840.51277||0.875|| ||terralib||840.51277||3.344|| ||wykobi||840.51277||1.437|| === Interesection === ||'''Library'''||'''Result (summed area)'''||'''Speed (s) ^3^'''|| ||cgal||695.81160||122.8 ^1^|| ||geos||696.86163||13.4|| ||ggl||696.86163||1.1|| ||gpc||696.86163||9.0|| ||gtl||696.86163||92.2 ^2^|| ||terralib||696.86163||4.6|| ^1^ some counties are not accepted by CGAL (which might be a correct choice), therefore area is less and this might influence performance. These kind of counties are not accepted, with the CGAL message: [wiki:CgalNotStrictlySimple not strictly simple] ^2^ gtl needs integer coordinates, therefore area is slightly less (696.861628315 vs. 696.86163168) ^3^ intersections of ellipses and star-ellipses with different amount of points are reported [wiki:IntersectionMore here] === Simplify === ||'''Library'''||'''Result (summed perimeter)'''||'''Speed (s)'''|| ||geos||6403.80878||3.828|| ||ggl||6403.86492||0.718|| ||terralib||6402.16732||15.297 ^1^|| ^1^ terralib uses another algorithm (non Douglas-Peucker), results still have to be evaluated visually === Within === ||'''Library'''||'''Result ^1^'''||'''Speed (s)'''|| ||cgal||2775||0.109|| ||geos||2775||3.125|| ||ggl||2775||0.047 ^2^|| ||gtl||2775||0.078|| ||terralib||2775||0.203|| ||wykobi||2775||0.078|| ^1^ number of counties with the center of their bounding box inside their polygon outline, which should be the case in 2775 of 2776 counties. ^2^ depends on selected strategy: ||'''Strategy'''||'''Speed (s)'''||'''Properties'''|| ||Windings||0.093||Default strategy. Agnostic; point on border handled correctly|| ||Crossings-multiply||0.062||Cartesian; point on border non-determinate|| ||Franklin||0.047||Cartesian; point on border non-determinate|| == Source == The source programs of the comparison benchmark are be available for download: https://svn.boost.org/svn/boost/sandbox/ggl/other/comparisons. Note that the GGL uses its latest sources, which are not yet in the Boost sandbox, so polygon intersection will not yet compile. This will be resolved (moved) soon.