Geospatial PDF

(Available for GDAL >= 1.8.0)

GDAL supports reading Geospatial PDF documents by extracting georeferencing information and rasterizing the data. Non-geospatial PDF documents will also be recognized by the driver.

Starting with GDAL 1.10.0, PDF documents can be created from other GDAL raster datasets, and OGR datasources can optionally be drawn on top of the raster layer (see the OGR_* creation options in the section below).

GDAL must be compiled with libpoppler support (GPL-licensed), and libpoppler itself must have been configured with --enable-xpdf-headers so that the xpdf C++ headers are available. Note: the poppler C++ API isn't stable, so compiling the driver may fail with poppler versions that are too old or too recent. Successfully tested versions are poppler >= 0.12.X and <= 0.24.0.

Starting with GDAL 1.9.0, as an alternative, the PDF driver can be compiled against libpodofo (LGPL-licensed) to avoid the libpoppler dependency. This is sufficient to get the georeferencing information. To get the imagery, however, the pdftoppm utility that comes with the poppler distribution must be available in the system PATH. A temporary file will be generated in a directory determined by the following configuration options: CPL_TMPDIR, TMPDIR or TEMP (in that order). If none are defined, the current directory will be used. Successfully tested versions are libpodofo 0.8.4 and 0.9.1.
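The lookup order for the temporary directory can be illustrated with a short Python sketch. This is not the driver's code: the function name resolve_tmp_dir and its options argument are hypothetical, and the sketch assumes the options are read from the process environment, whereas GDAL configuration options can also be set in other ways.

```python
import os


def resolve_tmp_dir(options=None):
    """Return the first defined of CPL_TMPDIR, TMPDIR, TEMP, else the current directory.

    `options` is a hypothetical mapping standing in for GDAL configuration
    options; it defaults to the process environment (an assumption).
    """
    options = os.environ if options is None else options
    for key in ("CPL_TMPDIR", "TMPDIR", "TEMP"):
        value = options.get(key)
        if value:
            return value
    # None of the three options is defined: fall back to the current directory.
    return os.getcwd()
```

Calling resolve_tmp_dir() with no argument shows which directory would be picked for the current environment under these assumptions.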

The driver supports reading georeferencing encoded in either of the two existing ways: according to the OGC encoding best practice, or according to the Adobe Supplement to ISO 32000.

Multipage documents are exposed as subdatasets, one subdataset per page of the document.

The neatline (for OGC best practice) or the bounding box (Adobe style) will be reported as a NEATLINE metadata item, so that it can be later used as a cutline for the warping algorithm.
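One possible way to turn the NEATLINE item into a cutline is sketched below in Python. The helper names are hypothetical. The sketch assumes that gdalinfo prints the item as a NEATLINE=... line, that the OGR CSV driver reads a column named WKT as geometry, and that the cutline needs no reprojection; it only builds the gdalwarp argument list (using the -cutline and -crop_to_cutline options) and does not run it.

```python
import csv


def neatline_from_gdalinfo(text):
    """Return the value of the NEATLINE metadata item found in gdalinfo output, or None."""
    for line in text.splitlines():
        key, sep, value = line.strip().partition("=")
        if sep and key == "NEATLINE":
            return value
    return None


def write_cutline_csv(neatline_wkt, csv_path):
    """Write a one-feature CSV whose WKT column holds the neatline polygon."""
    with open(csv_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["id", "WKT"])
        writer.writerow([1, neatline_wkt])


def gdalwarp_cutline_args(src_pdf, dst_path, csv_path):
    """Build (but do not run) a gdalwarp command that crops to the cutline."""
    return ["gdalwarp", "-cutline", csv_path, "-crop_to_cutline", src_pdf, dst_path]
```

The returned list can be passed to subprocess.run() on a system where the GDAL utilities are installed.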

Starting with GDAL 1.9.0, XMP metadata can be extracted from the file, and will be stored as XML raw content in the xml:XMP metadata domain.

Starting with GDAL 1.10.0, additional metadata, such as that found in USGS Topo PDF files, can be extracted from the file, and will be stored as XML raw content in the EMBEDDED_METADATA metadata domain.

Configuration options

LAYERS Metadata domain

Starting with GDAL 1.10.0, and when GDAL is compiled against libpoppler, the LAYERS metadata domain can be queried to retrieve the names of layers that can be turned ON or OFF. This is useful to know which values to specify for the GDAL_PDF_LAYERS or GDAL_PDF_LAYERS_OFF configuration options.

For example:

$ gdalinfo ../autotest/gdrivers/data/adobe_style_geospatial.pdf -mdd LAYERS

Driver: PDF/Geospatial PDF
Files: ../autotest/gdrivers/data/adobe_style_geospatial.pdf
[...]
Metadata (LAYERS):
  LAYER_00_NAME=New_Data_Frame
  LAYER_01_NAME=New_Data_Frame.Graticule
  LAYER_02_NAME=Layers
  LAYER_03_NAME=Layers.Measured_Grid
  LAYER_04_NAME=Layers.Graticule
[...]

$ gdal_translate ../autotest/gdrivers/data/adobe_style_geospatial.pdf out.tif --config GDAL_PDF_LAYERS_OFF "New_Data_Frame"
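When this needs to be scripted, the layer names can be pulled out of the gdalinfo text output. The Python sketch below is one way to do it; the function names are hypothetical, and the parsing assumes the output layout shown above (a "Metadata (LAYERS):" header followed by indented LAYER_nn_NAME=value lines).

```python
import re


def parse_layer_names(gdalinfo_text):
    """Return the layer names listed under 'Metadata (LAYERS):' in gdalinfo output."""
    names = []
    in_layers = False
    for line in gdalinfo_text.splitlines():
        if line.startswith("Metadata (LAYERS):"):
            in_layers = True
            continue
        if in_layers:
            match = re.match(r"\s+LAYER_\d+_NAME=(.*)$", line)
            if match is None:
                break  # first line that is not a layer entry ends the domain
            names.append(match.group(1))
    return names


def layers_off_args(src_pdf, dst_path, layer):
    """Build (but do not run) the gdal_translate call shown above for one layer."""
    return ["gdal_translate", src_pdf, dst_path,
            "--config", "GDAL_PDF_LAYERS_OFF", layer]
```

The text passed to parse_layer_names() would typically come from running gdalinfo with -mdd LAYERS and capturing its standard output.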

Restrictions

The opening of a PDF document (to get the georeferencing) is fast, but at the first access to a raster block, the whole page will be rasterized, which can be a slow operation.

Note: starting with GDAL 1.10, some raster-only PDF files (such as some USGS GeoPDF files) that are regularly tiled are exposed as tiled datasets by the GDAL PDF driver, and can be rendered with either the Poppler or the Podofo backend.

Only a few of the datums available in the OGC best practice spec are currently mapped in the driver. Unrecognized datums will be considered as being based on the WGS84 ellipsoid.

For documents that contain several neatlines in a page (insets), the georeferencing will be extracted from the inset that has the largest area (in terms of screen points).
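The selection rule can be illustrated with a small Python sketch. It is not the driver's implementation: it assumes each neatline is a simple polygon given as a list of (x, y) vertices in screen points, and the function names and sample data are hypothetical. Areas are computed with the shoelace formula.

```python
def polygon_area(points):
    """Area of a simple polygon given as a list of (x, y) vertices (shoelace formula)."""
    twice_area = 0.0
    count = len(points)
    for i in range(count):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % count]  # wrap around to close the ring
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0


def largest_inset(insets):
    """Return the key of the neatline with the largest area."""
    return max(insets, key=lambda name: polygon_area(insets[name]))


# Hypothetical page with a main map frame and a smaller inset.
insets = {
    "main": [(0, 0), (100, 0), (100, 80), (0, 80)],
    "inset": [(10, 10), (30, 10), (30, 25), (10, 25)],
}
selected = largest_inset(insets)
```

With this sample data, the main frame covers 8000 square points against 300 for the inset, so the main frame is the one whose georeferencing would be kept.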

Creation Issues (GDAL >= 1.10.0)

PDF documents can be created from other GDAL raster datasets that have 1 band (gray level or with a color table), 3 bands (RGB) or 4 bands (RGBA).

Georeferencing information will be written by default according to the ISO32000 specification. It is also possible to write it according to the OGC Best Practice conventions (but limited to a few datum and projection types).

Note: PDF write support does not require linking to poppler or podofo.

Creation Options

Update of existing files (GDAL >= 1.10.0)

Existing PDF files (created or not with GDAL) can be opened in update mode in order to set or update the following elements:

For geotransform or GCPs, the Geo encoding method used by default is ISO32000. OGC_BP can be selected by setting the GDAL_PDF_GEO_ENCODING configuration option to OGC_BP.

Updated elements are written at the end of the file, following the incremental update method described in the PDF specification.

Examples

See also

Other drivers :

Specifications :

Libraries :

Samples :