Comma Separated Value (.csv)

OGR supports reading and writing primarily non-spatial tabular data stored in text CSV files. CSV files are a common interchange format between software packages supporting tabular data and are also easily produced manually with a text editor or with end-user written scripts or programs.

While in theory .csv files could have any extension, in order to auto-recognise the format OGR only supports CSV files ending with the extension ".csv". The datasource name may be either a single CSV file or point to a directory. For a directory to be recognised as a .csv datasource at least half the files in the directory need to have the extension .csv. One layer (table) is produced from each .csv file accessed.

Starting with GDAL 1.8.0, for files structured as CSV, but not ending with .CSV extension, the 'CSV:' prefix can be added before the filename to force loading by the CSV driver.

The OGR CSV driver supports reading and writing. Because the CSV format has variable length text lines, reading is done sequentially. Reading features in random order will generally be very slow. OGR CSV layer might have a coordinate system stored in a .prj file (see GeoCSV specification). When reading a field named "WKT" is assumed to contain WKT geometry, but also is treated as a regular field. The OGR CSV driver returns all attribute columns as string data types if no field type information file (with .csvt extension) is available.

Limited type recognition can be done for Integer, Real, String, Date (YYYY-MM-DD), Time (HH:MM:SS+nn), DateTime (YYYY-MM-DD HH:MM:SS+nn) columns through a descriptive file with the same name as the CSV file, but a .csvt extension. In a single line the types for each column have to be listed with double quotes and be comma separated (e.g., "Integer","String"). It is also possible to specify explicitly the width and precision of each column, e.g. "Integer(5)","Real(10.7)","String(15)". The driver will then use these types as specified for the csv columns. Starting with GDAL 2.0, subtypes can be passed between parenthesis, such as "Integer(Boolean)", "Integer(Int16)" and "Real(Float32)". Starting with GDAL 2.1, accordingly with the GeoCSV specification, the "CoordX" or "Point(X)" type can be used to specify a column with longitude/easting values, "CoordY" or "Point(Y)" for latitude/northing values and "WKT" for geometries encoded in WKT

Starting with GDAL 2.2, the "JSonStringList", "JSonIntegerList", "JSonInteger64List" and "JSonRealList" types can be used in .csvt to map to the corresponding OGR StringList, IntegerList, Integer64List and RealList types. The field values are then encoded as JSon arrays, with proper CSV escaping.

Starting with GDAL 2.0, automatic field type guessing can also be done if specifying the open options described in the below "Open options" section.

Format

CSV files have one line for each feature (record) in the layer (table). The attribute field values are separated by commas. At least two fields per line must be present. Lines may be terminated by a DOS (CR/LF) or Unix (LF) style line terminators. Each record should have the same number of fields. The driver will also accept a semicolon, a tabulation or a space (GDAL >= 2.0 for the later) character as field separator . This autodetection will work only if there's no other potential separator on the first line of the CSV file. Otherwise it will default to comma as separator.

Complex attribute values (such as those containing commas, quotes or newlines) may be placed in double quotes. Any occurrences of double quotes within the quoted string should be doubled up to "escape" them.

By default, the driver attempts to treat the first line of the file as a list of field names for all the fields. However, if one or more of the names is all numeric it is assumed that the first line is actually data values and dummy field names are generated internally (field_1 through field_n) and the first record is treated as a feature. Starting with GDAL 1.9.0 numeric values are treated as field names if they are enclosed in double quotes. Starting with GDAL 2.1, this behaviour can be modified via the HEADERS open option.

All CSV files are treated as UTF-8 encoded. Starting with GDAL 1.9.0, a Byte Order Mark (BOM) at the beginning of the file will be parsed correctly. From 1.9.2, The option WRITE_BOM can be used to create a file with a Byte Order Mark, which can improve compatibility with some software (particularly Excel).

Example (employee.csv):
ID,Salary,Name,Comments
132,55000.0,John Walker,"The ""big"" cheese."
133,11000.0,Jane Lake,Cleaning Staff

Note that the Comments value for the first data record is placed in double quotes because the value contains quotes, and those quotes have to be doubled up so we know we haven't reached the end of the quoted string yet.

Many variations of textual input are sometimes called Comma Separated Value files, including files without commas, but fixed column widths, those using tabs as separators or those with other auxiliary data defining field types or structure. This driver does not attempt to support all such files, but instead to support simple .csv files that can be auto-recognised. Scripts or other mechanisms can generally be used to convert other variations into a form that is compatible with the OGR CSV driver.

Reading CSV containing spatial information

Building point geometries

Consider the following CSV file (test.csv):

Latitude,Longitude,Name
48.1,0.25,"First point"
49.2,1.1,"Second point"
47.5,0.75,"Third point"

Starting with GDAL 2.1, it is possible to directly specify the potential names of the columns that can contain X/longitude and Y/latitude with the X_POSSIBLE_NAMES and Y_POSSIBLE_NAMES open option.

ogrinfo -ro -al test.csv -oo X_POSSIBLE_NAMES=Lon* -oo Y_POSSIBLE_NAMES=Lat* -oo KEEP_GEOM_COLUMNS=NO will return :

OGRFeature(test):1
  Name (String) = First point
  POINT (0.25 48.1)

OGRFeature(test):2
  Name (String) = Second point
  POINT (1.1 49.2)

OGRFeature(test):3
  Name (String) = Third point
  POINT (0.75 47.5)

Otherwise, if one or several columns contain a geometry definition encoded as WKT, WKB (encoded in hexadecimal) or GeoJSON (in which case the GeoJSON content must be formatted to follow CSV rules, that is to say it must be surrounded by double-quotes, and double-quotes inside the string must be repeated for proper escaping), the name of such column(s) the GEOM_POSSIBLE_NAMES open option.

For older versions, it is possible to extract spatial information (points) from a CSV file which has columns for the X and Y coordinates, through the use of the VRT driver.

You can write the associated VRT file (test.vrt):
<OGRVRTDataSource>
    <OGRVRTLayer name="test">
        <SrcDataSource>test.csv</SrcDataSource>
        <GeometryType>wkbPoint</GeometryType>
        <LayerSRS>WGS84</LayerSRS>
        <GeometryField encoding="PointFromColumns" x="Longitude" y="Latitude"/>
    </OGRVRTLayer>
</OGRVRTDataSource>

and ogrinfo -ro -al test.vrt will return :
OGRFeature(test):1
  Latitude (String) = 48.1
  Longitude (String) = 0.25
  Name (String) = First point
  POINT (0.25 48.1 0)

OGRFeature(test):2
  Latitude (String) = 49.2
  Longitude (String) = 1.1
  Name (String) = Second point
  POINT (1.1 49.200000000000003 0)

OGRFeature(test):3
  Latitude (String) = 47.5
  Longitude (String) = 0.75
  Name (String) = Third point
  POINT (0.75 47.5 0)

Building line geometries

Consider the following CSV file (test.csv):

way_id,pt_id,x,y
1,1,2,49
1,2,3,50
2,1,-2,49
2,2,-3,50

With a GDAL build with Spatialite enabled, ogrinfo test.csv -dialect SQLite -sql "SELECT way_id, MakeLine(MakePoint(CAST(x AS float),CAST(y AS float))) FROM test GROUP BY way_id" will return :
OGRFeature(SELECT):0
  way_id (String) = 1
  LINESTRING (2 49,3 50)

OGRFeature(SELECT):1
  way_id (String) = 2
  LINESTRING (-2 49,-3 50)

Open options

Starting with GDAL 2.0, the following open options can be specified (typically with the -oo name=value parameters of ogrinfo or ogr2ogr):

Creation Issues

The driver supports creating new databases (as a directory of .csv files), adding new .csv files to an existing directory or .csv files or appending features to an existing .csv table. Starting with GDAL 2.1, deleting or replacing existing features, or adding/modifying/deleting fields is supported, provided the modifications done are small enough to be stored in RAM temporarily before flushing to disk.

Layer Creation options:

Configuration options (set with "--config key value" on command line utilities):

VSI Virtual File System API support

(Some features below might require OGR >= 1.9.0)

The driver supports reading and writing to files managed by VSI Virtual File System API, which include "regular" files, as well as files in the /vsizip/ (read-write) , /vsigzip/ (read-only) , /vsicurl/ (read-only) domains.

Writing to /dev/stdout or /vsistdout/ is also supported.

Examples

Particular datasources

The CSV driver can also read files whose structure is close to CSV files :

Other Notes