.. _vector_optimization:

*****************************************************************************
 Vector
*****************************************************************************

:Author: HostGIS
:Revision: $Revision$
:Date: $Date$
:Last Updated: 2008/08/08

.. contents:: Table of Contents
    :depth: 2
    :backlinks: top

Splitting your data
-------------------

If you find yourself making several layers that all use the same dataset but
FILTER it down to a different subset of the records, you can probably do
better.  If the criteria are static, one approach is to pre-split the data:
the *ogr2ogr* utility can select certain features from a datasource and save
them to a new datasource.  You can thus split your dataset into several
smaller ones that are already effectively filtered, and remove the FILTER
statement from each layer (an example command is given at the end of this
document).

Shapefiles
----------

Use :ref:`shptree` to generate a spatial index on your shapefile.  This is
quick and easy ("shptree foo.shp") and generates a .qix file.  MapServer will
automagically detect the index and use it.

.. note::

    :ref:`tileindex` :ref:`shapefiles` can be indexed with :ref:`shptree`.

MapServer also comes with the :ref:`sortshp` utility.  This rewrites a
shapefile, sorting its records according to the values in one of its columns.
If you commonly filter on a specific column, sorting by that column can make
the process slightly more efficient (see the example at the end of this
document).

Although shapefiles are a very fast data format, PostGIS is pretty speedy as
well, especially if you use indexes well and have memory to throw at caching.

PostGIS
-------

The single biggest boost to performance is indexing.  Make sure that there is
a GIST index on the geometry column, and that each record also has an indexed
primary key.  If you loaded your data with shp2pgsql, these statements create
the necessary indexes:

.. code-block:: sql

    ALTER TABLE table ADD PRIMARY KEY (gid);
    CREATE INDEX table_the_geom ON table USING GIST (the_geom);

PostgreSQL also supports reorganizing the data in a table so that it is
physically sorted by an index, which allows PostgreSQL to read the indexed
data much more efficiently.  Use the CLUSTER command with the name of the
spatial index, e.g.

.. code-block:: sql

    CLUSTER table_the_geom ON table;

Then there are numerous optimizations one can perform on the database server
itself, aside from the geospatial component.  The easiest is to increase
*shared_buffers* in the *postgresql.conf* file, which allows PostgreSQL to
use more memory for caching.  More information can be found at the
`PostgreSQL website`_.

Databases in General (PostGIS, Oracle, MySQL)
---------------------------------------------

By default, MapServer opens and closes a new database connection for each
database-driven layer in the mapfile.  If several layers read from the same
database this doesn't make much sense, and with some databases (Oracle in
particular) establishing a connection takes enough time to become
significant.  Try adding this line to your database layers:

.. code-block:: mapfile

    PROCESSING "CLOSE_CONNECTION=DEFER"

This tells MapServer not to close a layer's database connection until the
whole mapfile has been processed, which can shave a few seconds off of map
generation times.

.. #### rST Link Section ####

.. _`PostgreSQL website`: http://www.postgresql.org/
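As a sketch of the pre-splitting described under `Splitting your data`_, the
following *ogr2ogr* command copies only the features matching a WHERE clause
into a new shapefile.  The file and attribute names are placeholders, not
part of the original text.

.. code-block:: bash

    # Copy only the motorway features out of roads.shp into a new,
    # pre-filtered shapefile (file and column names are hypothetical).
    ogr2ogr -where "ROAD_TYPE = 'motorway'" roads_motorways.shp roads.shp

Run one such command per subset and point each layer at its own shapefile,
and the FILTER statements can go away.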
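For the :ref:`sortshp` utility mentioned under `Shapefiles`_, the invocation
is along these lines; the argument order shown is an assumption (check the
utility's own usage message), and the file and column names are again
placeholders.

.. code-block:: bash

    # Write a sorted copy of the roads shapefile, ordered by the ROAD_TYPE
    # column (assumed argument order: input, output, column, direction).
    sortshp roads roads_sorted ROAD_TYPE ascending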
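The *shared_buffers* change mentioned under `PostGIS`_ is made in
*postgresql.conf* and takes effect after PostgreSQL is restarted.  The value
below is purely illustrative; size it to the memory actually available on
your server.

.. code-block:: ini

    # postgresql.conf -- illustrative value only, size to your available RAM
    shared_buffers = 256MB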
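Finally, here is a minimal sketch of CLOSE_CONNECTION=DEFER in context: two
PostGIS layers sharing one deferred connection.  The connection string, table
and column names are placeholders.  Because the CONNECTION strings are
identical and both layers defer closing, MapServer can reuse a single
connection for both.

.. code-block:: mapfile

    LAYER # hypothetical roads layer; connection details are placeholders
      NAME "roads"
      STATUS ON
      TYPE LINE
      CONNECTIONTYPE POSTGIS
      CONNECTION "host=localhost dbname=gisdata user=mapserver"
      DATA "the_geom FROM roads USING UNIQUE gid"
      PROCESSING "CLOSE_CONNECTION=DEFER" # keep the connection open for later layers
      CLASS
        STYLE
          COLOR 128 128 128
        END
      END
    END

    LAYER # identical CONNECTION string, so the deferred connection is reused
      NAME "rivers"
      STATUS ON
      TYPE LINE
      CONNECTIONTYPE POSTGIS
      CONNECTION "host=localhost dbname=gisdata user=mapserver"
      DATA "the_geom FROM rivers USING UNIQUE gid"
      PROCESSING "CLOSE_CONNECTION=DEFER"
      CLASS
        STYLE
          COLOR 0 0 255
        END
      END
    END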