Geospatial PDF

(Available for GDAL >= 1.8.0)

GDAL supports reading Geospatial PDF documents, by extracting georeferencing information and rasterizing the data. Non-geospatial PDF documents will also be recognized by the driver.

GDAL must be compiled with libpoppler support (GPL-licensed), and libpoppler itself must have been configured with --enable-xpdf-headers so that the xpdf C++ headers are available. Note: the poppler C++ API isn't stable, so the driver compilation may fail with too old or too recent poppler versions. Successfully tested versions are poppler >= 0.12.X and <= 0.18.0.

Starting with GDAL 1.9.0, as an alternative, the PDF driver can be compiled against libpodofo (LGPL-licensed) to avoid the libpoppler dependency. This is sufficient to get the georeferencing information. However, for getting the imagery, the pdftoppm utility that comes with the poppler distribution must be available in the system PATH. A temporary file will be generated in a directory determined by the following configuration options : CPL_TMPDIR, TMPDIR or TEMP (in that order). If none are defined, the current directory will be used. Successfully tested versions are libpodofo 0.8.4 and 0.9.1.

The driver supports reading georeferencing encoded in either of the 2 current existing ways : according to the OGC encoding best practice, or according to the Adobe Supplement to ISO 32000.

The dimensions of the raster can be controlled by specifying the DPI of the rasterization with the GDAL_PDF_DPI configuration option. Its default value is 150.

Multipage documents are exposed as subdatasets, one subdataset par page of the document.

The neatline (for OGC best practice) or the bounding box (Adobe style) will be reported as a NEATLINE metadata item, so that it can be later used as a cutline for the warping algorithm.

Starting with GDAL 1.9.0, XMP metadata can be extracted from the file, and will be stored as XML raw content in the xml:XMP metadata domain.

Restrictions

The opening of a PDF document (to get the georeferencing) is fast, but at the first access to a raster block, the whole page will be rasterized, which can be a slow operation.

Only a few of the possible Datums available in the OGC best practice spec have been currently mapped in the driver. Unrecognized datums will be considered as being based on the WGS84 ellipsoid.

For documents that contain several neatlines in a page (insets), the georeferencing will be extracted from the inset that has the largest area (in term of screen points).

There is currently no support for selective layer rendering.

See also