Geospatial PDF

(Available for GDAL >= 1.8.0)

GDAL supports reading Geospatial PDF documents, by extracting georeferencing information and rasterizing the data. Non-geospatial PDF documents will also be recognized by the driver.

Starting with GDAL 1.10.0, PDF documents can be created from other GDAL raster datasets, and OGR datasources can optionally be drawn on top of the raster layer (see the OGR_* creation options in the section below).

The driver supports reading georeferencing encoded in either of the two existing ways: according to the OGC encoding best practice, or according to the Adobe Supplement to ISO 32000.

Multipage documents are exposed as subdatasets, one subdataset per page of the document.

Vector support

Starting with GDAL 1.10, this driver can read and write geospatial PDF with vector features. Vector read support requires linking to one of the dependent libraries listed under Build dependencies, but write support does not. The driver can read vector features encoded according to PDF's logical structure facilities (as described by "§10.6 - Logical Structure" of the PDF specification), or retrieve only vector geometries for other vector PDF files.

If there is no such logical structure, the driver will not try to interpret the vector content of the PDF, unless the OGR_PDF_READ_NON_STRUCTURED configuration option is set to YES.

Feature style support

For write support, the driver has partial support for the style information attached to features, encoded according to the OGR Feature Style Specification.

The following tools are recognized: PEN, BRUSH, LABEL and SYMBOL.

The supported attributes for each tool are summed up in the following table:

Tool | Supported attributes | Example
PEN | color (c); width (w); dash pattern (p) | PEN(c:#FF0000,w:5px)
BRUSH | foreground color (fc) | BRUSH(fc:#0000FF)
LABEL | GDAL >= 2.3.0: text (t), limited to ASCII strings; font name (f), see note below; font size (s); bold (bo); italic (it); text color (c); x and y offsets (dx, dy); angle (a); anchor point (p), values 1 through 9; stretch (w). GDAL <= 2.2.x: text (t), limited to ASCII strings; font size (s); text color (c); x and y offsets (dx, dy); angle (a) | LABEL(c:#000000,t:"Hello World!",s:5g)
SYMBOL | id (ogr-sym-0 to ogr-sym-9, and filenames for raster symbols); color (c); size (s) | SYMBOL(c:#00FF00,id:"ogr-sym-3",s:10) or SYMBOL(c:#00000080,id:"a_symbol.png")

Alpha values are supported for colors to control the opacity. If no alpha value is specified for BRUSH, the fill is drawn 50% opaque.

For SYMBOL with a bitmap name, only the alpha value of the color specified with 'c' is taken into account.
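As a rough illustration of the style strings in the table above, the following Python sketch composes a PEN and a BRUSH tool into one style string and appends an explicit alpha byte to the brush color. The helper names (with_alpha, polygon_style) are hypothetical and not part of GDAL. The #RRGGBBAA color layout matches the SYMBOL example above, and joining tools with ";" is assumed from the OGR Feature Style Specification.

```python
def with_alpha(rgb_hex, opacity):
    """Append an alpha byte to a '#RRGGBB' color; opacity is in [0.0, 1.0]."""
    if len(rgb_hex) != 7 or not rgb_hex.startswith("#"):
        raise ValueError("expected a color of the form #RRGGBB")
    if not 0.0 <= opacity <= 1.0:
        raise ValueError("opacity must be between 0.0 and 1.0")
    alpha = int(opacity * 255 + 0.5)  # round to the nearest byte value
    return "%s%02X" % (rgb_hex, alpha)


def polygon_style(outline_rgb, width_px, fill_rgb, fill_opacity):
    """Build a 'PEN(...);BRUSH(...)' style string (hypothetical helper)."""
    pen = "PEN(c:%s,w:%dpx)" % (outline_rgb, width_px)
    brush = "BRUSH(fc:%s)" % with_alpha(fill_rgb, fill_opacity)
    return pen + ";" + brush


# Red 5px outline with a blue fill at 25% opacity.
style = polygon_style("#FF0000", 5, "#0000FF", 0.25)
print(style)
```

Setting the alpha explicitly avoids relying on the default BRUSH opacity.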

A font name starting with "Times" or containing the string "Serif" (case sensitive) will be treated as Times. A font name starting with "Courier" or containing the string "Mono" (case sensitive) will be treated as Courier. All other font names will be treated as Helvetica.
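The font substitution rule above can be sketched in Python as follows. This is an illustration of the rule as worded here, not the driver's actual code. The function name base_font is hypothetical, and the order of the two checks (Times before Courier) is an assumption for names that would match both.

```python
def base_font(font_name):
    """Map a requested font name to the base font family described above."""
    # Both tests are case sensitive, as stated in the text.
    if font_name.startswith("Times") or "Serif" in font_name:
        return "Times"
    if font_name.startswith("Courier") or "Mono" in font_name:
        return "Courier"
    return "Helvetica"


for name in ("Times New Roman", "DejaVu Sans Mono", "Arial", "times"):
    print(name, "->", base_font(name))
```

Because the comparison is case sensitive, a lowercase name such as "times" falls through to Helvetica.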

Metadata

The neatline (for OGC best practice) or the bounding box (Adobe style) will be reported as a NEATLINE metadata item, so that it can be later used as a cutline for the warping algorithm.
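One possible way to use the NEATLINE value as a cutline is sketched below in Python. It writes the WKT polygon to a small CSV file and assembles a gdalwarp command line that references it. The polygon value, the file names and the helper names are placeholders. The sketch assumes that the OGR CSV driver treats a column named WKT as geometry, and it only builds the command rather than running it.

```python
import csv
import os
import tempfile


def write_cutline_csv(neatline_wkt, csv_path):
    """Write a one-feature CSV whose WKT column holds the neatline polygon."""
    with open(csv_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["id", "WKT"])
        writer.writerow([1, neatline_wkt])


def warp_command(src, dst, cutline_path):
    """Assemble (but do not run) a gdalwarp call that clips to the cutline."""
    return ["gdalwarp", "-cutline", cutline_path, "-crop_to_cutline", src, dst]


# Placeholder polygon; a real value would come from the NEATLINE metadata item.
neatline = "POLYGON ((0 0,0 10,10 10,10 0,0 0))"
csv_path = os.path.join(tempfile.mkdtemp(), "cutline.csv")
write_cutline_csv(neatline, csv_path)
cmd = warp_command("in.pdf", "out.tif", csv_path)
print(" ".join(cmd))
```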

Starting with GDAL 1.9.0, XMP metadata can be extracted from the file, and will be stored as raw XML content in the xml:XMP metadata domain.

Starting with GDAL 1.10.0, additional metadata, such as that found in USGS Topo PDF files, can be extracted from the file, and will be stored as raw XML content in the EMBEDDED_METADATA metadata domain.

Configuration options

Open Options

Since GDAL 2.0, the above configuration options are also available as open options.

LAYERS Metadata domain

Starting with GDAL 1.10.0, and when GDAL is compiled against Poppler or PDFium, the LAYERS metadata domain can be queried to retrieve the names of layers that can be turned ON or OFF. This is useful to know which values to specify for the GDAL_PDF_LAYERS or GDAL_PDF_LAYERS_OFF configuration options.

For example:

$ gdalinfo ../autotest/gdrivers/data/adobe_style_geospatial.pdf -mdd LAYERS

Driver: PDF/Geospatial PDF
Files: ../autotest/gdrivers/data/adobe_style_geospatial.pdf
[...]
Metadata (LAYERS):
  LAYER_00_NAME=New_Data_Frame
  LAYER_01_NAME=New_Data_Frame.Graticule
  LAYER_02_NAME=Layers
  LAYER_03_NAME=Layers.Measured_Grid
  LAYER_04_NAME=Layers.Graticule
[...]

$ gdal_translate ../autotest/gdrivers/data/adobe_style_geospatial.pdf out.tif --config GDAL_PDF_LAYERS_OFF "New_Data_Frame"
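The same workflow can be scripted. The Python sketch below parses the layer names out of gdalinfo text like the output shown above and assembles the matching gdal_translate command line. The function names are hypothetical, the sample text is copied from the example, and the command is only built, not executed.

```python
import re

# Excerpt of the gdalinfo output shown above.
SAMPLE = """\
Metadata (LAYERS):
  LAYER_00_NAME=New_Data_Frame
  LAYER_01_NAME=New_Data_Frame.Graticule
  LAYER_02_NAME=Layers
  LAYER_03_NAME=Layers.Measured_Grid
  LAYER_04_NAME=Layers.Graticule
"""


def parse_layer_names(gdalinfo_text):
    """Return layer names in index order from LAYER_nn_NAME=... lines."""
    pattern = re.compile(r"^\s*LAYER_(\d+)_NAME=(.*)$", re.MULTILINE)
    found = [(int(i), name.strip()) for i, name in pattern.findall(gdalinfo_text)]
    return [name for _, name in sorted(found)]


def translate_command(src, dst, layer_off):
    """Assemble a gdal_translate call that turns one layer OFF."""
    return ["gdal_translate", src, dst,
            "--config", "GDAL_PDF_LAYERS_OFF", layer_off]


layers = parse_layer_names(SAMPLE)
cmd = translate_command("adobe_style_geospatial.pdf", "out.tif", layers[0])
print(layers)
print(" ".join(cmd))
```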

Restrictions

Opening a PDF document (to get the georeferencing) is fast, but on the first access to a raster block, the whole page will be rasterized (with Poppler), which can be a slow operation.

Note: starting with GDAL 1.10, some raster-only PDF files (such as some USGS GeoPDF files) that are regularly tiled are exposed as tiled datasets by the GDAL PDF driver, and can be rendered with any backend.

Only a few of the possible datums available in the OGC best practice specification are currently mapped in the driver. Unrecognized datums will be considered as being based on the WGS84 ellipsoid.

For documents that contain several neatlines in a page (insets), the georeferencing will be extracted from the inset that has the largest area (in terms of screen points).
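The inset selection rule can be illustrated with a short Python sketch that computes each neatline's area in page units with the shoelace formula and keeps the largest one. This is only a model of the rule as described here, with made-up coordinates and hypothetical function names. It is not the driver's implementation.

```python
def polygon_area(points):
    """Area of a simple polygon given as (x, y) vertices, via the shoelace formula."""
    twice_area = 0.0
    n = len(points)
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]  # wrap around to close the ring
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0


def largest_neatline(neatlines):
    """Return the neatline covering the largest area."""
    if not neatlines:
        raise ValueError("no neatline found")
    return max(neatlines, key=polygon_area)


# Made-up page coordinates: a main map frame and a small inset.
main_map = [(0, 0), (500, 0), (500, 700), (0, 700)]
inset = [(520, 20), (600, 20), (600, 100), (520, 100)]
chosen = largest_neatline([inset, main_map])
print(polygon_area(main_map), polygon_area(inset))
```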

Creation Issues (GDAL >= 1.10.0)

PDF documents can be created from other GDAL raster datasets that have 1 band (graylevel or with color table), 3 bands (RGB) or 4 bands (RGBA).

Georeferencing information will be written by default according to the ISO32000 specification. It is also possible to write it according to the OGC Best Practice conventions (but limited to a few datum and projection types).

Note: PDF write support does not require linking to any backend.

Creation Options

Update of existing files (GDAL >= 1.10.0)

Existing PDF files (created or not with GDAL) can be opened in update mode in order to set or update the following elements:

For geotransform or GCPs, the Geo encoding method used by default is ISO32000. OGC_BP can be selected by setting the GDAL_PDF_GEO_ENCODING configuration option to OGC_BP.

Updated elements are written at the end of the file, following the incremental update method described in the PDF specification.

Build dependencies

For read support, GDAL must be built against one of the following libraries: Poppler, PoDoFo or PDFium.

Note: it is also possible to build against a combination of several of the above libraries. PDFium will be used in priority over Poppler, itself used in priority over PoDoFo.
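The priority order stated in the note can be written as a small Python sketch. It only models the documented order for a build that links several libraries. The names BACKEND_PRIORITY and preferred_backend are hypothetical and do not correspond to any GDAL API.

```python
# Documented priority: PDFium, then Poppler, then PoDoFo.
BACKEND_PRIORITY = ("PDFium", "Poppler", "PoDoFo")


def preferred_backend(available):
    """Return the highest-priority backend among those compiled in, or None."""
    for name in BACKEND_PRIORITY:
        if name in available:
            return name
    return None


print(preferred_backend({"PoDoFo", "Poppler"}))
print(preferred_backend({"PoDoFo", "PDFium", "Poppler"}))
```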

Unix build

The relevant configure options are --with-poppler, --with-podofo, --with-podofo-lib and --with-podofo-extra-lib-for-test.

Starting with GDAL 2.1.0, --with-pdfium, --with-pdfium-lib, --with-pdfium-extra-lib-for-test and --enable-pdf-plugin are also available.

Poppler

libpoppler itself must have been configured with --enable-xpdf-headers so that the xpdf C++ headers are available. Note: the poppler C++ API isn't stable, so the driver compilation may fail with too old or too recent poppler versions. Successfully tested versions are poppler >= 0.12.X and <= 0.31.0.
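Before building, a version string can be checked against the tested range quoted above. The Python sketch below is a convenience check only, with hypothetical function names. It assumes purely numeric, dot-separated version strings, and a version outside the range is untested rather than known to fail.

```python
def parse_version(text):
    """Turn a dotted version string such as '0.24.5' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def in_tested_poppler_range(version):
    """True if the version lies within poppler >= 0.12.X and <= 0.31.0."""
    v = parse_version(version)
    return (0, 12) <= v[:2] and v <= (0, 31, 0)


for candidate in ("0.11.9", "0.12.4", "0.31.0", "0.32.0"):
    print(candidate, in_tested_poppler_range(candidate))
```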

PoDoFo

As a partial alternative, the PDF driver can be compiled against libpodofo to avoid the libpoppler dependency. This is sufficient to get the georeferencing and vector information. However, for getting the imagery, the pdftoppm utility that comes with the poppler distribution must be available in the system PATH. A temporary file will be generated in a directory determined by the following configuration options : CPL_TMPDIR, TMPDIR or TEMP (in that order). If none are defined, the current directory will be used. Successfully tested versions are libpodofo 0.8.4, 0.9.1 and 0.9.3. Important note: using PoDoFo 0.9.0 is strongly discouraged, as it could cause crashes in GDAL due to a bug in PoDoFo.
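The lookup order for the temporary directory can be sketched in Python as follows. The function name is hypothetical, and treating an empty value as undefined is an assumption of this sketch. The option order and the fallback to the current directory follow the description above.

```python
import os


def podofo_temp_dir(env=None):
    """Pick the temporary directory from CPL_TMPDIR, TMPDIR or TEMP, in that order."""
    env = os.environ if env is None else env
    for key in ("CPL_TMPDIR", "TMPDIR", "TEMP"):
        value = env.get(key)
        if value:  # assumption: an empty value counts as undefined
            return value
    return os.curdir  # none defined: use the current directory


print(podofo_temp_dir({"TMPDIR": "/tmp", "TEMP": "/var/tmp"}))
print(podofo_temp_dir({}))
```

Passing a plain dictionary instead of the process environment makes the precedence easy to verify in isolation.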

PDFium (GDAL >= 2.1.0)

Using PDFium as a backend allows access to raster, vector, georeferencing and other metadata. The PDFium backend also supports arbitrary overviews, for fast zoom-out.

Only GDAL builds against static builds of PDFium have been tested. Building PDFium can be challenging. A forked version of PDFium that simplifies the build is available (for Windows, a dedicated win_gdal_build branch is recommended). A build repository is also available, with a few scripts that can be used as a template to build PDFium for Linux/MacOSX/Windows. Those forked versions remove the dependency on the V8 JavaScript engine, and also include a few changes to avoid symbol clashes, on Linux, with libjpeg and libopenjpeg. Building the PDF driver as a GDAL plugin is another way of avoiding such issues. Both building PDFium and building GDAL against PDFium require a C++11 compatible compiler. Successfully tested versions are GCC 4.7.0 (previous versions aren't compatible) and Visual Studio 12 / VS2013.

Examples

See also

Specifications :

Libraries :

Samples :