Details Ticket 4564




Serial Number 4564
Subject v.in.ogr w/ large datasets
Area grass6
Queue grass
Requestors Brian.Beckage@uvm.edu,holl@gdf-hannover.de
Owner none
Status open
Last User Contact Tue Jun 13 04:07:35 2006 (2 yr ago)
Current Priority 50
Final Priority 70
Due No date assigned
Last Action Thu Aug 3 15:31:06 2006 (2 yr ago)
Created Tue Apr 19 06:20:34 2005 (3 yr ago)

Transaction History Ticket 4564


Tue, Apr 19 2005 06:20:34    Request created by guest (as #3161)  
Subject: inability to import shape files using v.in.ogr

Platform: Mac OSX
grass obtained from: Trento Italy site
grass binary for platform: Downloaded precompiled Binaries
GRASS Version: 6.0.0


From Brian Beckage:

I have been unable to import a vector file (using v.in.ogr) of the Everglades
coastline in Florida, USA.  The coastline is very irregular with many small
islands.  My machine worked on this for three days and appeared to be making
progress (e.g., the line number at the bottom of the output shown below keeps
increasing), albeit very slowly.  I estimated that it would take about a week
to complete this import.  I'm using GRASS 6.0.0 on Mac OSX 10.3.8 (1 GHz,
1 GB RAM) and the output is displayed below.  I've had to kill the process.

>v.in.ogr dsn=/Users/PROJECTS/ENP/Plot_Info/GIS/PaulsCD/k24bound.shp output=k24bound
A datum name nad27 (North_American_Datum_1927) was specified without transformation
parameters.
WARNING: Non-interactive mode: the GRASS default for nad27 is
         towgs84=-22.000,157.000,176.000.
Projection of input dataset and current location appear to match.
Proceeding with import...
Layer: k24bound
-----------------------------------------------------
Building topology ...
18233 primitives registered      
Building areas:  100%
18233 areas built      
18233 isles built
Attaching islands:  100%
Attaching centroids:  100%
Topology was built.
Number of nodes     :   18233
Number of primitives:   18233
Number of points    :   0
Number of lines     :   0
Number of boundaries:   18233
Number of centroids :   0
Number of areas     :   18233
Number of isles     :   18233
Number of areas without centroid :   18233
-----------------------------------------------------
WARNING: Cleaning polygons, result is not guaranteed!
Building topology ...
Topology was built.
Number of nodes     :   18233
Number of primitives:   18233
Number of points    :   0
Number of lines     :   0
Number of boundaries:   18233
Number of centroids :   0
Number of areas     :   -
Number of isles     :   -
-----------------------------------------------------
Break polygons:
Registering points ... 745775
All points (vertices): 764031
Registered points (unique coordinates): 745775
Points marked for break: 18254
Breaks:    65
-----------------------------------------------------
Remove duplicates:
Duplicates:     0
-----------------------------------------------------
Break boundaries:
Intersections: 297999 (line 89173)





Tue, Feb 21 2006 20:41:26    Mail sent by msieczka (as #3161)  
The current vector engine is very memory-demanding for big datasets. The
default DBF database driver is another memory hog, and badly slow. You could
speed things up by using another DB backend. Postgres is often advised. SQLite,
although very powerful and in most cases faster than DBF, is extremely slow for
v.in.ogr, for a reason I don't understand, so it won't be any help in your
case. It still needs optimization.

Maciek
Tue, May 9 2006 12:51:15    Comments added by hbowman (as #3161)  
see these threads:
 http://thread.gmane.org/gmane.comp.gis.grass.user/7391/focus=7394
 http://thread.gmane.org/gmane.comp.gis.grass.devel/7417
 http://article.gmane.org/gmane.comp.gis.grass.devel/7400
 http://grass.itc.it/pipermail/grassuser/2005-April/028518.html
 http://grass.itc.it/pipermail/grass-dev/2005-March/017745.html


Hamish
Fri, Jun 9 2006 14:47:57    Request created by sholl  
Subject: v.in.ogr w/ large datasets

Dear GRASSers,

we have problems importing a large dataset (~300.000 features) with v.in.ogr.
It eats up all mem and dies without importing.

We have tried using the -c flag (no clean) in order to reduce the memory
consumption, but without luck.

At least this thread seems to be related to our problem:
http://grass.itc.it/pipermail/grass-dev/2006-May/023088.html

Some links to similar problems discussed on the mailinglist:
http://grass.itc.it/pipermail/grass-dev/2006-May/022917.html

Radim wrote something about it, but did not elaborate where to start solving
this problem.

Best regards

   Stephan
Mon, Jun 12 2006 05:48:08    Mail sent by hamish_nospam@yahoo.com  
Date Mon, 12 Jun 2006 15:12:21 +1200
From Hamish <hamish_nospam@yahoo.com>
To Request Tracker <grass-bugs@intevation.de>
Cc grass-dev@grass.itc.it
Subject Re: [GRASS-dev] [bug #4564] (grass) v.in.ogr w/ large datasets
> this bug's URL: http://intevation.de/rt/webrt?serial_num=4564
> ---------------------------------------------------------------------
> 
> Subject: v.in.ogr w/ large datasets
> 
> Dear GRASSers,
> 
> we have problems importing a large dataset (~300.000 features) with
> v.in.ogr. It eats up all mem and dies without importing.
> 
> We have tried using the -c flag (no clean) in order to reduce the
> memory consumption, but without luck.
> 
> At least this thread seems to be related to our problem:
> http://grass.itc.it/pipermail/grass-dev/2006-May/023088.html
> 
> Some links to similar problems discussed on the mailinglist:
> http://grass.itc.it/pipermail/grass-dev/2006-May/022917.html
> 
> Radim wrote something about it, but did not elaborate where to start
> solving this problem.


Perhaps populating the table is the problem? Try running "top" in another
window and type "M" to sort by memory use.

Memory use when building topology isn't a problem until ~1-3 million
features?


Try using v.external.


What sort of features? areas? lines? points? 3D?


Hamish


Mon, Jun 12 2006 16:29:59    Comments added by sholl  
Cc:  hamish_nospam@yahoo.com

Hamish,

the death of the process usually happens during some topological operation. I
will give "top" a try soon.

v.external is not an option since it segfaults in the current CVS.

The data are polygons inside an Oracle DB, with a 3D column enabled (needed for
3rd-party tools), which is empty.

Thanks for your suggestions, I will post back my test results.

Stephan
Tue, Jun 13 2006 04:07:35    Mail sent by guest  
I believe that the problem is in Vect_build. Below are Radim's suggestions on
how to solve it;
I am putting them here in case somebody wants to give it a try:

The spatial index occupies a lot of memory, but it is necessary for
topology building. Also, it takes a long time to release the memory
occupied by the spatial index (dig_spidx_free).

The function building topology (Vect_build) is usually called
at the end of a module (before Vect_close), so it is faster to call
exit() and let the operating system release all the memory.
By default the memory is not released.

It is possible to call Vect_set_release_support() before Vect_close()
to force the memory to be released, but that takes a long time on large files.

Currently most of the modules do not release the spatial index and work
like this:
main
{
     Vect_open_new()
     //writing new vector

     Vect_build()
     Vect_close()  // memory is not released
}

You can add Vect_set_release_support():

main
{
     Vect_open_new()
     // writing new vector

     Vect_build()
     Vect_set_release_support()
     Vect_close()  // memory is released
}

but this only makes the module take longer.

It makes sense to release the spatial index if it is used only at the beginning
of a module, or in permanently running programs like QGIS.
For example:

main
{
     Vect_open_old()
     // select features using spatial index, e.g.  Vect_select_lines_by_box()
     Vect_set_release_support()
     Vect_close()  // memory is released

     // do some processing which needs memory
}


Radim
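
The trade-off Radim describes can be illustrated outside GRASS with a minimal, self-contained C sketch. The `node` struct and the `build_index()` / `release_index()` helpers are hypothetical stand-ins and are not GRASS types or library calls. The sketch only shows why an explicit release is slow: a pointer-linked structure has to be walked and freed node by node, whereas a process that simply exits leaves the cleanup to the operating system.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for one spatial index node (not a GRASS type). */
struct node {
    struct node *next;
    double box[4]; /* placeholder payload: west, south, east, north */
};

/* Free every node one by one and return how many were freed.
 * This per-node walk is what makes an explicit release slow on big maps. */
size_t release_index(struct node *head)
{
    size_t freed = 0;

    while (head != NULL) {
        struct node *next = head->next;
        free(head);
        head = next;
        freed++;
    }
    return freed;
}

/* Build a list of n nodes; returns NULL if n is 0 or an allocation fails. */
struct node *build_index(size_t n)
{
    struct node *head = NULL;

    for (size_t i = 0; i < n; i++) {
        struct node *p = malloc(sizeof *p);
        if (p == NULL) {
            release_index(head); /* clean up the partial list */
            return NULL;
        }
        p->box[0] = p->box[1] = (double)i;
        p->box[2] = p->box[3] = (double)i + 1.0;
        p->next = head;
        head = p;
    }
    return head;
}
```

Under these assumptions, a short-lived module could skip `release_index()` and return from `main()` right after its last write, while a long-running program would call it to get the memory back before continuing.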
Thu, Aug 3 2006 15:31:06    Comments added by sholl  
Cc: tutey@o2.pl

Hi all,

I am currently trying to use the solution above from Radim, which is
already in v.in.ogr/main.c. My dataset has ~300000 polygons. The machine I
am working on has 512 MB RAM and 2 GB of swap.

Processes go _slow_ and the box is completely unresponsive.

Any other thoughts on where to dig?
 
Thu, Aug 3 2006 16:08:52    Comments added by sholl (as #3161)  
Cc: tutey@o2.pl

perhaps this is related to bug 4564, where large datasets are difficult or
impossible to import using v.in.ogr
 
 
http://intevation.de/rt/webrt?serial_num=4564&display=History 
 
Stephan 
Thu, Aug 3 2006 17:19:30    Request 3161 merged into 4564 by msieczka (as #3161)  