Details Ticket 3354




Serial Number: 3354
Subject: v.in.ascii crashing on large inputs
Area: grass6
Queue: grass
Requestors: adanner@cs.duke.edu
Owner: mneteler
Status: resolved
Last User Contact: Tue Jul 4 15:28:56 2006 (2 yr ago)
Current Priority: 70
Final Priority: 70
Due: No date assigned
Last Action: Tue Jul 4 15:29:00 2006 (2 yr ago)
Created: Thu Jun 16 17:58:03 2005 (3 yr ago)

Transaction History Ticket 3354


Thu, Jun 16 2005 17:58:03    Request created by guest  
Subject: v.in.ascii crashing on large inputs

Platform: GNU/Linux/i386
grass obtained from: CVS
grass binary for platform: Compiled from Sources
GRASS Version: 6.1 CVS 11 June

A followup to bug 2903. I cannot import more than about three or four million
LIDAR points using v.in.ascii. It crashes with an out of memory error. 

From: 	Hamish
In reply to: Andrew Danner
Re: [GRASSLIST:7168] v.in.ascii out of memory
Date: 	Thu, 16 Jun 2005 15:42:02 +1200  (05:42 CEST)

> I'm trying to import a somewhat large file of 24.8 million LIDAR
> points (866MB) into GRASS 6.1 (11 Jun CVS) using v.in.ascii

Status: (all due to memory limitations [ie leaks])
  GRASS 6.0.0's v.in.ascii   ~ 100k points
  GRASS 6.1-cvs v.in.ascii   ~ 3M points (you)


> File format is just a list of easting, northing, and elevation in NC
> state-plane. 
> 
> 1939340.8400,825793.8900,657.2200
> 1939071.9500,825987.7800,660.2200
> 1939035.5200,826013.9700,662.4600
> 1938762.4500,826210.1500,686.2800
> 
> 
> I'm using
> 
> v.in.ascii -z input=strip_09.txt output=g6Npts fs=, z=3

try with -t too. No need to create a table if you have no attributes!

*This only offers a small improvement: fewer than 100,000 additional points.
 
> and I get the following:
> 
> Maximum input row length: 34
> Maximum number of columns: 3
> Minimum number of columns: 3
> 
> Building topology ...
> Registering lines: ERROR: G_realloc: out of memory
> 
> There shouldn't be any lines to register. This is just a list of
> points. Any ideas?

I've seen the same thing. Not sure if it has to know everything to 
build topology or if it is a memory leak. I think it's a leak.

I did some tests with valgrind back in March. Radim plugged the biggest 
leak, but I think "Registering Lines.." still has one too.

see this thread:
 http://article.gmane.org/gmane.comp.gis.grass.devel/7212/


> At three million points, I noticed that I'm using over 1GB of memory.
> Is it trying to build the entire topology in memory? I don't need to
> build any topology for a set of points. Is there a way to turn this
> off? 
>
> In 5.4 I was developing a scalable version of s.surf.rst that could
> interpolate surfaces on over 500 million points, but now I can't even
> import that many points into a vector/site format. 
> 
> Comments/Suggestions?

a) Try adding more swap.
b) We need to fix the leak. (more valgrind)


*I have 8GB of swap. There may be some Linux 2.6 oom-killer issues, and I will
try to fix this, but reading in just a list of points should not require over
1GB of memory. There is no topology to build. 

Tue, Jul 19 2005 22:56:22    Mail sent by guest  
Hi,

can you help us with valgrind analysis on v.in.ascii?

Best

 Markus
Sat, Jul 1 2006 00:14:03    Request created by guest (as #4769)  
Subject: memory leak in v.in.ascii

Platform: GNU/Linux/x86
grass obtained from: CVS
grass binary for platform: Compiled from Sources
GRASS Version: cvs June 10? 2006

Recent changes in v.in.ascii to support lat/long (LL) data introduced a memory
leak for projected data sets. The affected function is points_analyse in
vector/v.in.ascii/points.c. The call tokens = G_tokenize (buf, fs); allocates an
array of char*s that point to various positions in buf, but this array is not
freed before the next call to G_tokenize. On large lidar point sets, the leak
causes out of memory errors.

There was a G_free_tokens(tokens) inside the loop in an older CVS version (1.12),
but I don't think this is right either, as G_free_tokens frees both the char*
array AND the buffer. I think the correct order of frees to prevent memory leaks
is 

Inside loop: call G_free(tokens), not G_free_tokens(tokens). This frees the char*
array, but not the underlying buffer, buf

Outside of while loop: call

G_free(buf)
G_free(tmp_token)
G_free(coorbuf)

Do not call G_free_tokens(tokens) outside of the loop if G_free(tokens) is called
inside. 

I do not have LL point data to check this. If you need some projected data to
test that both cases work, I can provide a sample set of a few megabytes. Memory
usage in this function should not be more than a few KB, and should not depend
on the number of points imported.

  

Sat, Jul 1 2006 19:15:23    Mail sent by guest (as #4769)  
I looked into G_tokenize a bit more and it allocates its own buffer space
using G_store. You probably shouldn't redirect tokens[0] after G_tokenize as
points.c does in v.in.ascii. This patch saves the original tokens[0] pointer
if the LL code re-assigns it and restores the pointer after all tokens have
been read so that G_free_tokens frees the right buffer. I tested this on
projected data and there is no more leak for me, but I don't have LL data to
test whether there are any seg-faults or memory leaks.

Index: points.c
===================================================================
RCS file: /home/grass/grassrepository/grass6/vector/v.in.ascii/points.c,v
retrieving revision 1.16
diff -u -r1.16 points.c
--- points.c  12 May 2006 05:57:44 -0000  1.16
+++ points.c  1 Jul 2006 17:08:18 -0000
@@ -64,13 +64,14 @@
     struct Cell_head window;
     double northing=.0;
     double easting=.0;
-    char *coorbuf, *tmp_token;
+    char *coorbuf, *tmp_token, *sav_buf;

     buflen = 1000;
     buf = (char *) G_malloc ( buflen );
     coorbuf=(char *) G_malloc(256);
     tmp_token=(char *) G_malloc(256);
-
+    sav_buf = NULL;
+
     /* fetch projection for LatLong test */
     G_get_window(&window);

@@ -121,6 +122,16 @@
  for ( i = 0; i < ntokens; i++ ) {
      if ((G_projection() == PROJECTION_LL)){
       if (i==xcol || i==ycol ){
+          if(i==0){ /* Save position of original internal token buffer */
+            /* Prevent memory leaks */
+            sav_buf=tokens[0];
+          }
+          if(i==ntokens-1 && sav_buf != NULL){
+            /* Restore original token buffer so free_tokens works */
+            /* Only do this if tokens[0] was re-assigned */
+            tokens[0]=sav_buf;
+            sav_buf = NULL;
+          }
          /* check if coordinates are DMS or decimal or not latlong at all */
          sprintf(coorbuf,"%s", tokens[i]);
          G_debug (4, "token: %s", coorbuf);
@@ -156,7 +167,7 @@
      len = strlen (tokens[i]);
      if ( len > collen[i] ) collen[i] = len;
  }
-
+  G_free_tokens(tokens);
  row++;
     }

@@ -166,9 +177,9 @@
     *column_type = coltype;
     *column_length = collen;

-    G_free_tokens(tokens);
+    G_free(buf);
     G_free(coorbuf);
-    /* G_free(tmp_token); ?? */
+    G_free(tmp_token);

     return 0;
 }

Sat, Jul 1 2006 22:44:26    Mail sent by mneteler (as #4769)  
Hi,

I have tested it with Latlong data, unfortunately it crashes
in points.c, line 136:

                sprintf(coorbuf,"%s", tokens[i]);

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread -1227356480 (LWP 10484)]
0xb77654c0 in strcpy () from /lib/tls/libc.so.6
(gdb) bt
#0  0xb77654c0 in strcpy () from /lib/tls/libc.so.6
#1  0x0804c127 in points_analyse (ascii_in=0x80676a0, ascii=0x8069e10,
fs=0x804ced2 "|",
    rowlength=0xbff182e4, ncolumns=0xbff182e0, minncolumns=0xbff182dc,
column_type=0xbff182d8,
    column_length=0xbff182d4, skip_lines=0, xcol=0, ycol=1) at points.c:136
#2  0x0804b07b in main (argc=4, argv=0xbff18844) at in.c:208

Apparently tokens[i] with i=1 is undefined.


For testing, you can easily generate random data in Latlong:
 v.random random n=1000000
 v.out.ascii random > random.csv
 v.in.ascii random.csv out=random2

Best,

 Markus
Sat, Jul 1 2006 23:46:22    Mail sent by adanner@cs.duke.edu (as #4769)  

Hi Markus, 

 Ooops. Yes. I'm dumb. I tried to restore the sav_buf pointer in the
wrong place. It needs to be restored after processing the last token in
the line, regardless of whether that token is an x or y column. A new patch is
attached. I tried it with a lat/long mapset and the random data trick
you suggested and it worked fine for 1 million points. Thanks for the
hint. 

-Andy


Attachment: v_in_ascii_points_c.udiff-2.patch

Index: points.c
===================================================================
RCS file: /home/grass/grassrepository/grass6/vector/v.in.ascii/points.c,v
retrieving revision 1.16
diff -u -r1.16 points.c
--- points.c	12 May 2006 05:57:44 -0000	1.16
+++ points.c	1 Jul 2006 21:41:11 -0000
@@ -64,13 +64,14 @@
     struct Cell_head window;
     double northing=.0;
     double easting=.0;
-    char *coorbuf, *tmp_token;
+    char *coorbuf, *tmp_token, *sav_buf;
 
     buflen = 1000;
     buf = (char *) G_malloc ( buflen );
     coorbuf=(char *) G_malloc(256);
     tmp_token=(char *) G_malloc(256);
-
+    sav_buf = NULL;
+    
     /* fetch projection for LatLong test */
     G_get_window(&window);
 
@@ -121,7 +122,11 @@
 	for ( i = 0; i < ntokens; i++ ) {
 	    if ((G_projection() == PROJECTION_LL)){
 	     if (i==xcol || i==ycol ){
-	        /* check if coordinates are DMS or decimal or not latlong at all */
+          if(i==0){ /* Save position of original internal token buffer */
+            /* Prevent memory leaks */
+            sav_buf=tokens[0];
+          }
+        /* check if coordinates are DMS or decimal or not latlong at all */
 	        sprintf(coorbuf,"%s", tokens[i]);
 	        G_debug (4, "token: %s", coorbuf);
 	        if (G_scan_northing ( coorbuf, &northing, window.proj) ){
@@ -141,6 +146,12 @@
 		 }
                } /* G_scan_northing else */
 	     }
+	     if(i==ntokens-1 && sav_buf != NULL){ 
+         /* Restore original token buffer so free_tokens works */
+         /* Only do this if tokens[0] was re-assigned */
+         tokens[0]=sav_buf;
+         sav_buf = NULL;
+       }         
 	    } /* PROJECTION_LL */
 	    G_debug (4, "row %d col %d: '%s' is_int = %d is_double = %d", 
 		         row, i, tokens[i], is_int(tokens[i]), is_double(tokens[i]) );
@@ -156,7 +167,7 @@
 	    len = strlen (tokens[i]);
 	    if ( len > collen[i] ) collen[i] = len;
 	}
-
+  G_free_tokens(tokens);
 	row++;
     }
 
@@ -166,9 +177,9 @@
     *column_type = coltype;
     *column_length = collen;
 
-    G_free_tokens(tokens);
+    G_free(buf);
     G_free(coorbuf);
-    /* G_free(tmp_token); ?? */ 
+    G_free(tmp_token);
 
     return 0;
 }



Tue, Jul 4 2006 12:04:02    Taken by mneteler (as #4769)  
Tue, Jul 4 2006 12:09:27    Mail sent by mneteler (as #4769)  
Hi Andrew,

could you send the patch to me? I cannot extract it from RT
in a safe way.
I have now "taken" the bug in RT to get notified once you
reply on this report.

Markus
Tue, Jul 4 2006 14:07:04    Request 4769 merged into 3354 by mneteler (as #4769)  
Tue, Jul 4 2006 14:07:19    Taken by mneteler  
Tue, Jul 4 2006 15:28:56    Mail sent by mneteler  
Andrew,

patch received.

In the Spearfish UTM location, 1 million points consume around 400k of RAM.
Seems to be a reasonable amount.

In latlong, 1 million points seem to run ok as well now!

Patch applied to CVS.

Thanks,

 Markus

PS: notes to others:
    for easy testing run v.random + v.out.ascii + v.in.ascii
Tue, Jul 4 2006 15:29:00    Status changed to resolved by mneteler  