Thu, Jun 16 2005
17:58:03
|
|
Request created by guest
|
|
Subject: v.in.ascii crashing on large inputs
Platform: GNU/Linux/i386
grass obtained from: CVS
grass binary for platform: Compiled from Sources
GRASS Version: 6.1 CVS 11 June
A followup to bug 2903. I cannot import more than about three or four million
LIDAR points using v.in.ascii. It crashes with an out of memory error.
From: Hamish
In reply to: Andrew Danner
Re: [GRASSLIST:7168] v.in.ascii out of memory
Date: Thu, 16 Jun 2005 15:42:02 +1200 (05:42 CEST)
> I'm trying to import a somewhat large file of 24.8 million LIDAR
> points (866MB) into GRASS 6.1 (11 Jun CVS) using v.in.ascii
Status: (all limits due to memory limitations [i.e. leaks])
GRASS 6.0.0's v.in.ascii: ~100k points
GRASS 6.1-cvs v.in.ascii: ~3M points (you)
> File format is just a list of easting, northing, and elevation in NC
> state-plane.
>
> 1939340.8400,825793.8900,657.2200
> 1939071.9500,825987.7800,660.2200
> 1939035.5200,826013.9700,662.4600
> 1938762.4500,826210.1500,686.2800
>
>
> I'm using
>
> v.in.ascii -z input=strip_09.txt output=g6Npts fs=, z=3
try with -t too. No need to create a table if you have no attributes!
*This only offers a small improvement of < 100000 points
> and I get the following:
>
> Maximum input row length: 34
> Maximum number of columns: 3
> Minimum number of columns: 3
>
> Building topology ...
> Registering lines: ERROR: G_realloc: out of memory
>
> There shouldn't be any lines to register. This is just a list of
> points. Any ideas?
I've seen the same thing. Not sure if it has to know everything to
build topology or if it is a memory leak. I think it's a leak.
I did some tests with valgrind back in March. Radim plugged the biggest
leak, but I think "Registering Lines.." still has one too.
see this thread:
http://article.gmane.org/gmane.comp.gis.grass.devel/7212/
> At three million points, I noticed that I'm using over 1GB of memory.
> Is it trying to build the entire topology in memory? I don't need to
> build any topology for a set of points. Is there a way to turn this
> off?
>
> In 5.4 I was developing a scalable version of s.surf.rst that could
> interpolate surfaces on over 500 million points, but now I can't even
> import that many points into a vector/site format.
>
> Comments/Suggestions?
a) Try adding more swap.
b) We need to fix the leak. (more valgrind)
*I have 8GB of swap. There may be some Linux 2.6 oom-killer issues, and I will
try to fix this, but reading in just a list of points should not require over
1GB of memory. There is no topology to build.
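For illustration, a record in the comma-separated easting, northing, elevation format quoted above can be read without keeping anything per point. This is only a minimal sketch and not the v.in.ascii code; parse_point and count_points are hypothetical names, and the comma separator and 256-byte line buffer are assumptions.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper, not part of GRASS: parse one
 * "easting,northing,elevation" record.
 * Returns 1 on success, 0 otherwise. */
int parse_point(const char *line, double *east, double *north, double *elev)
{
    assert(line && east && north && elev);
    return sscanf(line, "%lf,%lf,%lf", east, north, elev) == 3;
}

/* Stream records from fp and return how many parsed cleanly.
 * Only one fixed-size line buffer is live at any time, so memory use
 * does not grow with the number of points. Lines longer than the
 * buffer are read in pieces, which this sketch does not handle. */
long count_points(FILE *fp)
{
    char line[256];
    double e, n, z;
    long count = 0;

    assert(fp);
    while (fgets(line, sizeof line, fp) != NULL) {
        if (parse_point(line, &e, &n, &z))
            count++;
    }
    return count;
}
```

A caller would open the input file, pass the FILE pointer to count_points, and close it afterwards.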
|
|
Tue, Jul 19 2005
22:56:22
|
|
Mail sent by guest
|
|
Hi,
can you help us with valgrind analysis on v.in.ascii?
Best
Markus
|
|
Sat, Jul 1 2006
00:14:03
|
|
Request created by guest (as #4769)
|
|
Subject: memory leak in v.in.ascii
Platform: GNU/Linux/x86
grass obtained from: CVS
grass binary for platform: Compiled from Sources
GRASS Version: cvs June 10? 2006
Recent changes in v.in.ascii to support LL (lat/long) data introduced a memory leak for
projected data sets. The affected code is the points_analyse function in
vector/v.in.ascii/points.c. The call tokens = G_tokenize (buf, fs); allocates an
array of char*s that point to various positions in buf, but the array is not freed
before the next call to G_tokenize. On large lidar point sets, the leak causes out
of memory errors.
There was a G_free_tokens(tokens) inside the loop in an older CVS version (1.12),
but I don't think this is right either, as G_free_tokens frees both the char*
array AND the buffer. I think the correct order of frees to prevent memory leaks
is:
Inside the while loop: call G_free(tokens), not G_free_tokens(tokens). This frees the char*
array, but not the underlying buffer, buf.
Outside the while loop: call
G_free(buf)
G_free(tmp_token)
G_free(coorbuf)
Do not call G_free_tokens(tokens) outside of the loop if G_free(tokens) is called
inside.
I do not have LL point data to check this. If you need some projected data to
test that both cases work, I can provide a sample set of a few megabytes. Memory usage
in this function should not be more than a few KB, and should not depend on the
number of points imported.
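To illustrate the point about memory use, a row analysis pass of this kind can be written so that it reuses one line buffer for every row. The following is a rough sketch under assumptions, not the points_analyse code: struct row_stats, count_columns and analyse_rows are hypothetical names, the separator is a single character, and the 1000-byte buffer only mirrors the buflen seen in points.c.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical summary struct; the field names are illustrative only. */
struct row_stats {
    size_t max_row_len; /* longest row, excluding the line terminator */
    int max_cols;
    int min_cols;
    long rows;
};

/* Count fields in a row by counting separator characters. */
static int count_columns(const char *row, char fs)
{
    int cols = 1;

    for (; *row; row++)
        if (*row == fs)
            cols++;
    return cols;
}

/* Scan fp once, reusing a single line buffer for every row, so the
 * memory needed is independent of the number of rows. */
struct row_stats analyse_rows(FILE *fp, char fs)
{
    char buf[1000];
    struct row_stats st = {0, 0, 0, 0};

    assert(fp);
    while (fgets(buf, sizeof buf, fp) != NULL) {
        size_t len = strcspn(buf, "\r\n");
        int cols;

        buf[len] = '\0';
        if (len == 0)
            continue; /* skip blank lines */
        cols = count_columns(buf, fs);
        if (len > st.max_row_len)
            st.max_row_len = len;
        if (cols > st.max_cols)
            st.max_cols = cols;
        if (st.rows == 0 || cols < st.min_cols)
            st.min_cols = cols;
        st.rows++;
    }
    return st;
}
```

Nothing is allocated per row here, so there is nothing to leak; the real function tokenizes each row and therefore has to release the tokens before reading the next one.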
|
|
Sat, Jul 1 2006
19:15:23
|
|
Mail sent by guest (as #4769)
|
|
I looked into G_tokenize a bit more and it allocates its own buffer space
using G_store. You probably shouldn't redirect tokens[0] after G_tokenize as
points.c does in v.in.ascii. This patch saves the original tokens[0] pointer
if the LL code re-assigns it and restores the pointer after all tokens have
been read so that G_free_tokens frees the right buffer. I tested this on
projected data and there is no more leak for me, but I don't have LL data to
test whether there are seg-faults or memory leaks.
Index: points.c
===================================================================
RCS file: /home/grass/grassrepository/grass6/vector/v.in.ascii/points.c,v
retrieving revision 1.16
diff -u -r1.16 points.c
--- points.c 12 May 2006 05:57:44 -0000 1.16
+++ points.c 1 Jul 2006 17:08:18 -0000
@@ -64,13 +64,14 @@
struct Cell_head window;
double northing=.0;
double easting=.0;
- char *coorbuf, *tmp_token;
+ char *coorbuf, *tmp_token, *sav_buf;
buflen = 1000;
buf = (char *) G_malloc ( buflen );
coorbuf=(char *) G_malloc(256);
tmp_token=(char *) G_malloc(256);
-
+ sav_buf = NULL;
+
/* fetch projection for LatLong test */
G_get_window(&window);
@@ -121,6 +122,16 @@
for ( i = 0; i < ntokens; i++ ) {
if ((G_projection() == PROJECTION_LL)){
if (i==xcol || i==ycol ){
+ if(i==0){ /* Save position of original internal token buffer */
+ /* Prevent memory leaks */
+ sav_buf=tokens[0];
+ }
+ if(i==ntokens-1 && sav_buf != NULL){
+ /* Restore original token buffer so free_tokens works */
+ /* Only do this if tokens[0] was re-assigned */
+ tokens[0]=sav_buf;
+ sav_buf = NULL;
+ }
/* check if coordinates are DMS or decimal or not latlong at all */
sprintf(coorbuf,"%s", tokens[i]);
G_debug (4, "token: %s", coorbuf);
@@ -156,7 +167,7 @@
len = strlen (tokens[i]);
if ( len > collen[i] ) collen[i] = len;
}
-
+ G_free_tokens(tokens);
row++;
}
@@ -166,9 +177,9 @@
*column_type = coltype;
*column_length = collen;
- G_free_tokens(tokens);
+ G_free(buf);
G_free(coorbuf);
- /* G_free(tmp_token); ?? */
+ G_free(tmp_token);
return 0;
}
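The ownership issue this patch addresses can be sketched with a stand-in tokenizer. The code below is not the GRASS implementation; tokenize, number_of_tokens and free_tokens are hypothetical stand-ins that only assume the behaviour described in this report: the tokenizer copies its input, tokens[0] points at the start of that copy, and the matching free releases tokens[0] and then the array.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in tokenizer: copies the input, splits the copy in place, and
 * returns a NULL-terminated array whose first entry points at the
 * start of the copy. Returns NULL if allocation fails. */
char **tokenize(const char *buf, char fs)
{
    size_t n = 1, i = 0;
    const char *s;
    char *copy, *p, **tokens;

    assert(buf);
    for (s = buf; *s; s++)
        if (*s == fs)
            n++;

    copy = malloc(strlen(buf) + 1);
    tokens = malloc((n + 1) * sizeof *tokens);
    if (!copy || !tokens) {
        free(copy);
        free(tokens);
        return NULL;
    }
    strcpy(copy, buf);

    tokens[i++] = copy;
    for (p = copy; *p; p++) {
        if (*p == fs) {
            *p = '\0';           /* terminate the current token */
            tokens[i++] = p + 1; /* next token starts after the separator */
        }
    }
    tokens[i] = NULL;
    return tokens;
}

/* Count entries up to the NULL terminator. */
int number_of_tokens(char **tokens)
{
    int n = 0;

    while (tokens[n])
        n++;
    return n;
}

/* Releases the private copy through tokens[0], then the array itself.
 * If tokens[0] has been redirected elsewhere, the copy is leaked and
 * the wrong pointer is freed. */
void free_tokens(char **tokens)
{
    if (!tokens)
        return;
    free(tokens[0]);
    free(tokens);
}
```

Under this model, calling tokenize and free_tokens once per input row keeps at most one row's allocation live, provided tokens[0] still holds the pointer the tokenizer returned.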
|
|
Sat, Jul 1 2006
22:44:26
|
|
Mail sent by mneteler (as #4769)
|
|
Hi,
I have tested it with Latlong data, unfortunately it crashes
in points.c, line 136:
sprintf(coorbuf,"%s", tokens[i]);
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread -1227356480 (LWP 10484)]
0xb77654c0 in strcpy () from /lib/tls/libc.so.6
(gdb) bt
#0 0xb77654c0 in strcpy () from /lib/tls/libc.so.6
#1 0x0804c127 in points_analyse (ascii_in=0x80676a0, ascii=0x8069e10,
fs=0x804ced2 "|",
rowlength=0xbff182e4, ncolumns=0xbff182e0, minncolumns=0xbff182dc,
column_type=0xbff182d8,
column_length=0xbff182d4, skip_lines=0, xcol=0, ycol=1) at points.c:136
#2 0x0804b07b in main (argc=4, argv=0xbff18844) at in.c:208
Apparently tokens[i] with i=1 is undefined.
For testing, you can easily generate random data in Latlong:
v.random random n=1000000
v.out.ascii random > random.csv
v.in.ascii random.csv out=random2
Best,
Markus
|
|
Sat, Jul 1 2006
23:46:22
|
|
Mail sent by adanner@cs.duke.edu (as #4769)
|
|
Hi Markus,
Oops. Yes. I'm dumb. I tried to restore the sav_buf pointer in the
wrong place. It needs to be restored after processing the last token in
the line, regardless of whether that token is an x or y column. A new patch is
attached. I tried it with a lat/long mapset and the random data trick
you suggested and it worked fine for 1 million points. Thanks for the
hint.
-Andy
Attachment: v_in_ascii_points_c.udiff-2.patch
Index: points.c
===================================================================
RCS file: /home/grass/grassrepository/grass6/vector/v.in.ascii/points.c,v
retrieving revision 1.16
diff -u -r1.16 points.c
--- points.c 12 May 2006 05:57:44 -0000 1.16
+++ points.c 1 Jul 2006 21:41:11 -0000
@@ -64,13 +64,14 @@
struct Cell_head window;
double northing=.0;
double easting=.0;
- char *coorbuf, *tmp_token;
+ char *coorbuf, *tmp_token, *sav_buf;
buflen = 1000;
buf = (char *) G_malloc ( buflen );
coorbuf=(char *) G_malloc(256);
tmp_token=(char *) G_malloc(256);
-
+ sav_buf = NULL;
+
/* fetch projection for LatLong test */
G_get_window(&window);
@@ -121,7 +122,11 @@
for ( i = 0; i < ntokens; i++ ) {
if ((G_projection() == PROJECTION_LL)){
if (i==xcol || i==ycol ){
- /* check if coordinates are DMS or decimal or not latlong at all */
+ if(i==0){ /* Save position of original internal token buffer */
+ /* Prevent memory leaks */
+ sav_buf=tokens[0];
+ }
+ /* check if coordinates are DMS or decimal or not latlong at all */
sprintf(coorbuf,"%s", tokens[i]);
G_debug (4, "token: %s", coorbuf);
if (G_scan_northing ( coorbuf, &northing, window.proj) ){
@@ -141,6 +146,12 @@
}
} /* G_scan_northing else */
}
+ if(i==ntokens-1 && sav_buf != NULL){
+ /* Restore original token buffer so free_tokens works */
+ /* Only do this if tokens[0] was re-assigned */
+ tokens[0]=sav_buf;
+ sav_buf = NULL;
+ }
} /* PROJECTION_LL */
G_debug (4, "row %d col %d: '%s' is_int = %d is_double = %d",
row, i, tokens[i], is_int(tokens[i]), is_double(tokens[i]) );
@@ -156,7 +167,7 @@
len = strlen (tokens[i]);
if ( len > collen[i] ) collen[i] = len;
}
-
+ G_free_tokens(tokens);
row++;
}
@@ -166,9 +177,9 @@
*column_type = coltype;
*column_length = collen;
- G_free_tokens(tokens);
+ G_free(buf);
G_free(coorbuf);
- /* G_free(tmp_token); ?? */
+ G_free(tmp_token);
return 0;
}
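The placement of the restore step in this second patch can be shown in isolation. This is a simplified sketch, not the patched points.c: scan_row is a hypothetical function, and scratch stands in for whatever temporary buffer the lat/long branch may point tokens[0] at.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical row pass over an already tokenized row. */
void scan_row(char **tokens, int ntokens, int xcol, int ycol, char *scratch)
{
    char *saved = NULL;
    int i;

    assert(tokens && ntokens > 0 && scratch);
    for (i = 0; i < ntokens; i++) {
        if (i == xcol || i == ycol) {
            if (i == 0) {
                saved = tokens[0];   /* remember the tokenizer's buffer */
                tokens[0] = scratch; /* simulated re-assignment */
            }
            /* ... coordinate checks would go here ... */
        }
        /* Restore on the last token whether or not it is an x or y
         * column, so the later free sees the original pointer. */
        if (i == ntokens - 1 && saved != NULL) {
            tokens[0] = saved;
            saved = NULL;
        }
    }
}
```

With xcol=0 and ycol=1 on a three-column row, the last token is not a coordinate column; a restore nested inside the coordinate branch would never run in that case, which is why the check sits outside it.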
|
|
Tue, Jul 4 2006
12:04:02
|
|
Taken by mneteler (as #4769)
|
|
Tue, Jul 4 2006
12:09:27
|
|
Mail sent by mneteler (as #4769)
|
|
Hi Andrew,
could you send the patch to me? I cannot extract it from RT
in a safe way.
I have now "taken" the bug in RT to get notified once you
reply on this report.
Markus
|
|
Tue, Jul 4 2006
14:07:04
|
|
Request 4769 merged into 3354 by mneteler (as #4769)
|
|
Tue, Jul 4 2006
14:07:19
|
|
Taken by mneteler
|
|
Tue, Jul 4 2006
15:28:56
|
|
Mail sent by mneteler
|
|
Andrew,
patch received.
In the Spearfish (UTM) location, 1 million points consume around 400k of RAM.
That seems to be a reasonable amount.
In latlong, 1 million points seem to run OK as well now!
Patch applied to CVS.
Thanks,
Markus
PS: note to others:
for easy testing, run v.random + v.out.ascii + v.in.ascii
|
|
Tue, Jul 4 2006
15:29:00
|
|
Status changed to resolved by mneteler
|
|