DESCRIPTION

The hd.hdfs.in.vector module converts a GRASS vector map to serialized GeoJSON and copies it to HDFS.

NOTES

Vector maps in the native GRASS format are not suitable for serialization, which is required to exploit the potential of spatial frameworks for Hadoop. The most effective way, and in most cases the only possible one, is to store spatial data in JSON, in particular GeoJSON. This format is well suited to serialization, and a library for reading it is available in the Hive catalog.

The hd.hdfs.in.vector module converts a GRASS map to GeoJSON and transfers it to HDFS. The module works in two main steps. First, the map is exported to GeoJSON using v.out.ogr and edited into a form that can be parsed by widely used SerDe functions for Hive. Then the resulting custom GeoJSON file is uploaded to its destination on HDFS. By default, the HDFS path is set to hdfs://grass_data_hdfs/LOCATION_NAME/MAPSET/vector.

The hd.hdfs.* package also includes the hd.hdfs.in.fs module, which transfers external files to HDFS. It is useful for uploading CSV or GeoJSON files that originate outside GRASS. Before an external GeoJSON file is uploaded to HDFS, its standard format must be modified, because JSON serialization has several formatting requirements. See the documentation on the wiki page.
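The kind of reformatting an external GeoJSON file may need can be sketched as follows. This is only an illustration, not the module's actual implementation: it assumes that the target Hive SerDe expects one JSON record per line, which is a common requirement of JSON SerDe libraries but is not stated in this manual. The function names geojson_to_lines and convert_file are hypothetical. The exact format expected by the hd.* modules is described on the wiki page.

```python
import json


def geojson_to_lines(collection):
    """Return one compact JSON string per feature of a FeatureCollection.

    Assumption: the Hive SerDe reads one JSON record per line.
    """
    if collection.get("type") != "FeatureCollection":
        raise ValueError("expected a GeoJSON FeatureCollection")
    # Compact separators drop the whitespace between tokens, and json.dumps
    # escapes newlines inside strings, so every record stays on one line.
    return [
        json.dumps(feature, separators=(",", ":"))
        for feature in collection.get("features", [])
    ]


def convert_file(src_path, dst_path):
    """Rewrite a GeoJSON file as line-delimited features (hypothetical helper)."""
    with open(src_path, encoding="utf-8") as src:
        collection = json.load(src)
    with open(dst_path, "w", encoding="utf-8") as dst:
        for line in geojson_to_lines(collection):
            dst.write(line + "\n")
```

A file prepared this way could then be passed to hd.hdfs.in.fs, provided the table definition in Hive matches the structure of the individual features.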

EXAMPLES

Upload a vector map to HDFS:
hd.hdfs.in.vector driver=webhdfs hdfs=/data map=klad_zm10 layer=1

SEE ALSO

hd.hdfs.in.fs, hd.hdfs.in.vector, hd.hdfs.out.vector, hd.hdfs.info, hd.hive.execute, hd.hive.csv.table, hd.hive.select, hd.hive.info, hd.hive.json.table

See also the related wiki page.

AUTHOR

Matej Krejci, OSGeoREL at the Czech Technical University in Prague. Developed as part of a master's thesis project in 2016 (mentor: Martin Landa).