Comma Separated Value (.csv)

OGR supports reading and writing primarily non-spatial tabular data stored in text CSV files. CSV files are a common interchange format between software packages supporting tabular data and are also easily produced manually with a text editor or with end-user written scripts or programs.

While in theory .csv files could have any extension, in order to auto-recognise the format OGR only supports CSV files ending with the extention ".csv". The datasource name may be either a single CSV file or to a directory. For a directory to be recognised as a .csv datasource at least half the files in the directory need to have the extension .csv. One layer (table) is produced from each .csv file accessed.

The OGR CSV driver supports reading and writing. Because the CSV format has variable length text lines, reading is done sequentially. Reading features in random order will generally be very slow. OGR CSV layer never have any coordinate system. When reading a field named "WKT" is assumed to contain WKT geometry, but also is treated as a regular field. The OGR CSV driver returns all attribute columns with a type of string if no field type information file (with .csvt extension) is available.

Limited type recognition can be done for Integer, Real, String, Date (YYYY-MM-DD), Time (HH:MM:SS+nn) and DateTime (YYYY-MM-DD HH:MM:SS+nn) columns through a descriptive file with same name as the CSV file, but .csvt extension. In a single line the types for each column have to be listed: double quoted and comma separated (e.g., "Integer","String"). It is also possible to specify explicitely the width and precision of each column, e.g. "Integer(5)","Real(10.7)","String(15)". The driver will then use these types as specified for the csv columns.

Format

CSV files have one line for each feature (record) in the layer (table). The attribute field values are separated by commas. At least two fields per line must be present. Lines may be terminated by a DOS (CR/LF) or Unix (LF) style line terminators. Each record should have the same number of fields. Starting with GDAL 1.7.0, the driver will also accept a semicolon or a tabulation character as field separator . This autodetection will work only if there's no other potential separator on the first line of the CSV file. Otherwise it will default to comma as separator.

Complex attribute values (such as those containing commas, quotes or newlines) may be placed in double quotes. Any occurances of double quotes within the quoted string should be doubled up to "escape" them.

The driver attempts to treat the first line of the file as a list of field names for all the fields. However, if one or more of the names is all numeric it is assumed that the first line is actually data values and dummy field names are generated internally (field_1 through field_n) and the first record is treated as a feature.

Example (employee.csv):

ID,Salary,Name,Comments
132,55000.0,John Walker,"The ""big"" cheese."
133,11000.0,Jane Lake,Cleaning Staff

Note that the Comments value for the first data record is placed in double quotes because the value contains quotes, and those quotes have to be doubled up so we know we haven't reached the end of the quoted string yet.

Many variations of textual input are sometimes called Comma Separated Value files, including files without commas, but fixed column widths, those using tabs as seperators or those with other auxilary data defining field types or structure. This driver does not attempt to support all such files, but instead to support simple .csv files that can be auto-recognised. Scripts or other mechanisms can generally be used to convert other variations into a form that is compatible with the OGR CSV driver.

Reading CSV containing spatial information

It is possible to extract spatial information (points) from a CSV file which has columns for the X and Y coordinates, through the use of the VRT driver

Consider the following CSV file (test.csv):

Latitude,Longitude,Name
48.1,0.25,"First point"
49.2,1.1,"Second point"
47.5,0.75,"Third point"

You can write the associated VRT file (test.vrt):

<OGRVRTDataSource>
    <OGRVRTLayer name="test">
        <SrcDataSource>test.csv</SrcDataSource>
        <GeometryType>wkbPoint</GeometryType>
        <LayerSRS>WGS84</LayerSRS>
        <GeometryField encoding="PointFromColumns" x="Longitude" y="Latitude"/>
    </OGRVRTLayer>
</OGRVRTDataSource>

and ogrinfo -ro -al test.vrt will return :

OGRFeature(test):1
  Latitude (String) = 48.1
  Longitude (String) = 0.25
  Name (String) = First point
  POINT (0.25 48.1 0)

OGRFeature(test):2
  Latitude (String) = 49.2
  Longitude (String) = 1.1
  Name (String) = Second point
  POINT (1.1 49.200000000000003 0)

OGRFeature(test):3
  Latitude (String) = 47.5
  Longitude (String) = 0.75
  Name (String) = Third point
  POINT (0.75 47.5 0)

Creation Issues

The driver supports creating new databases (as a directory of .csv files), adding new .csv files to an existing directory or .csv files or appending features to an existing .csv table. Deleting or replacing existing features is not supported.

Layer Creation options:

LINEFORMAT: By default when creating new .csv files they are created with the line termination conventions of the local platform (CR/LF on win32 or LF on all other systems). This may be overridden through use of the LINEFORMAT layer creation option which may have a value of CRLF (DOS format) or LF (Unix format).

GEOMETRY (Starting with GDAL 1.6.0): By default, the geometry of a feature written to a .csv file is discarded. It is possible to export the geometry in its WKT representation by specifying GEOMETRY=AS_WKT. It is also possible to export point geometries into their X,Y,Z components (different columns in the csv file) by specifying GEOMETRY=AS_XYZ, GEOMETRY=AS_XY or GEOMETRY=AS_YX. The geometry column(s) will be prepended to the columns with the attributes values.

CREATE_CSVT=YES/NO (Starting with GDAL 1.7.0): Create the associated .csvt file (see above paragraph) to describe the type of each column of the layer and its optional width and precision. Default value : NO

SEPARATOR=COMMA/SEMICOLON/TAB (Starting with GDAL 1.7.0): Field separator character. Default value : COMMA

Examples

This example shows using ogr2ogr to transform a shapefile with point geometry into a .csv file with the X,Y,Z coordinates of the points as first columns in the .csv file
```
ogr2ogr -f CSV output.csv input.shp -lco GEOMETRY=AS_XYZ
```

Other Notes

Development of the OGR CSV driver was supported by DM Solutions Group and GoMOOS.