.. _rfc24first:

======================================================================
MS RFC 24: Mapscript memory management
======================================================================

:Date: 2006/12/3
:Author: Umberto Nicoletti
:Contact: umberto.nicoletti@gmail.com
:Last Edited: 2006/12/31
:Status: Proposed
:Version: Mapserver 5.0

0. Warning!
-----------

*IMPORTANT: this is the first and obsolete version of the RFC, which is kept
here only for reference purposes and for a limited time.*

1. Overview
-----------

Memory management in SWIG wrappers has a tradition of being difficult and
error prone. The programmer of the wrapper has to deal with memory that can be
allocated and then freed in two separate environments: the hosting language
such as Java, C# or Perl and the wrapped native code.

Most modern languages implement garbage collection, so that the developer does
not have to care about memory management. The programming language tracks
memory (or objects, really) allocations and when an object goes out of scope
(it is no longer reachable from the running program) it marks it as eligible
for garbage collection. A background process once in a while wakes up and
frees the memory associated with marked objects.

For the details on GC see this wikipedia entry:

http://en.wikipedia.org/wiki/Garbage_collection_(computer_science)

What happens in most cases is that some memory is allocated in, say, Java,
then another pointer is pointed to it by invoking some wrapped method.
Eventually the GC runs and frees the memory. As soon as the other pointers
are dereferenced the hosting language will crash because of a segmentation
fault error (in Unix terms).

Mapserver SWIG wrappers suffer from issues with garbage collection, for
example when dynamically adding layers to a map. The purpose of this RFC is to
address these issues and provide a solution that can be implemented in time
for the release of Mapserver 5.0.

This RFC does not address thread safety.

2. Problem description
----------------------

This section gives an overview (along with examples) of errors in mapscript
memory management. Most of the examples will be in Java, but they apply to all
other mapscripts too. They can be reproduced against the latest CVS code of
mapserver-HEAD as of 31st December 2006.

2.1 Object equality/identity
++++++++++++++++++++++++++++

Consider the following Java mapscript code:

::

    mapObj map=new mapObj("my.map");
    layerObj layer=new layerObj(null);
    // add layer to the map
    int index=map.insertLayer(layer, -1);
    // set its name
    layer.setName("Change me");
    // fetch from map
    layerObj newLayer=map.getLayer(index);
    // they should be the same...
    System.out.println(newLayer.getName()+"=="+layer.getName());
    // and this should print true (it is a reference comparison)
    System.out.println(newLayer==layer);

When executed it produces the following output:

::

    null==Change me
    false

This happens because the current implementation strategy copies the layer when
it is inserted into the map. The Java reference is not re-pointed to the new
copy and is therefore 'disconnected' from the actual memory area.

This currently happens for the most used insert methods (e.g. *insertClass*).

2.2 Early garbage collection
++++++++++++++++++++++++++++

Objects created through mapscript can be garbage-collected "early", when there
are live objects still referencing them. See this example in Java:

::

    mapObj map=new mapObj("data/emptymap.map");
    layerObj layer=new layerObj(map);
    layer.setName("Layer 0");
    classObj clazz=new classObj(null);
    clazz.setName("Clazz 0 NULL");
    int pos=layer.insertClass(clazz, -1);

    map=null; layer=null;
    // force garbage collection
    for(int i=0;i<100;i++)
        System.gc();

    clazz.getLayer().getMap().draw();
    // Java crashes because memory has been freed!


and its perl equivalent:

::

    use mapscript;
    $map = new mapscript::mapObj("../../tests/test.map");
    $layer = new mapscript::layerObj($map);
    print "Before first draw for $layer->{map}\n";
    $layer->{map}->draw();
    print "Map drawn, now undef map\n";
    $map = undef;
    $map1=$layer->{map}->draw();
    # perl interpreter segfault

2.3 Dynamically populated layers, classes, etc
++++++++++++++++++++++++++++++++++++++++++++++

See the following bug reports:

- http://mapserver.gis.umn.edu/bugs/show_bug.cgi?id=1400
- http://mapserver.gis.umn.edu/bugs/show_bug.cgi?id=1743
- http://mapserver.gis.umn.edu/bugs/show_bug.cgi?id=1841

Please note that this issue can be difficult to reproduce; credits go to Tamas
for pointing it out.

3. Proposed implementation
--------------------------

To solve the problems shown at items 2.1, 2.2 and 2.3 this RFC proposes that:

1. each mapscript wrapper object acting as a container or as content will
   have its mapscript language class augmented to maintain references to the
   contained objects and, when it is available, to the container object.
   These references allow the GC to make better informed decisions on which
   objects can be freed. From now on, and only for the purpose of this RFC,
   we will call these references the *cache*.
2. getters must be wrapped so that they look up references in the *cache*
   first and setters/mutators must be wrapped so that they keep the
   references in the *cache* up to date when they are changed
3. when an object has a non-null parent its memory must be *disowned* (in
   SWIG terms) and *owned* again when it is removed from the parent

This RFC should be applicable (with the necessary modifications) to all
mapscript languages. Examples will be given for Perl or Java because of the
author's familiarity with these languages.

The three items above are described in more detail in the following
subsections. Subsections 3.4 and 3.5 offer an implementation example for the
layerObj class.
Please note that in the following we limit the scope of our analysis to the
classes/layer relationship.

3.1 Add references to the mapscript wrapper object
++++++++++++++++++++++++++++++++++++++++++++++++++

The mapscript objects will be modified so that they keep a reference to the
other mapscript objects added to them or that they are added to, exactly like
the C struct does.

For example, the layerObj class will be extended to contain:

- a reference to the mapObj that contains the layer
- an array of classObjs

The purpose of these changes is that whenever a class is added to the layer
the corresponding object is also added to the class array (the *cache*). The
same happens when a classObj is returned as the result of getClass.

By doing so the hosting language knows of the web of relations between these
objects and we solve the early garbage collection problem.

The problem of object equality/identity is also resolved if we modify the
getClass method to look first in the class array (the *cache*) and only
afterwards in the C data structures, as shown in 3.4.

3.2 Keep *cache* and C internal structure in sync
+++++++++++++++++++++++++++++++++++++++++++++++++

For the mechanism described at the previous item to work some functions in the
mapscript objects and in the native code must be modified so that the
mapscript objects and the C data structures stay in sync.

Returning to the example of the layerObj, the *insertClass*, *getClass* and
*removeClass* methods will have to be modified to keep the *cache* in sync
with the C data structures. The current methods will be modified by using
specific typemaps.

The constructor will also need to be modified to store the reference to the
mapObj.

Finally, the native code that currently performs the copy-and-insert operation
must also be modified to perform only the insert operation (*layerobject.c*,
line 52, function *msInsertClass*).

The API will be kept backward compatible whenever possible.

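
The cache bookkeeping described in 3.1 and 3.2 can be pictured with a small,
self-contained Java sketch. It is not SWIG-generated code and it never calls
the native library: *LayerWrapper* and *ClassWrapper* are hypothetical
stand-ins for *layerObj* and *classObj*, the native insert is reduced to an
append at the end of the list, and only the *swigCMemOwn* flag name is
borrowed from the SWIG Java wrappers.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for the classObj wrapper (not the real mapscript class). */
class ClassWrapper {
    boolean swigCMemOwn = true;   // true while the wrapper owns the native memory
    LayerWrapper layer = null;    // child -> parent reference
}

/** Hypothetical stand-in for the layerObj wrapper (not the real mapscript class). */
class LayerWrapper {
    // The cache: position i mirrors position i of the native class array.
    private final List<ClassWrapper> cache = new ArrayList<>();

    int insertClass(ClassWrapper c) {
        // A real wrapper would call the native insert here; this sketch only appends.
        cache.add(c);
        c.layer = this;
        c.swigCMemOwn = false;    // the parent now owns the native memory
        return cache.size() - 1;
    }

    ClassWrapper getClass(int i) {
        // Look in the cache first; a real wrapper would fall back to the C structures.
        return (i >= 0 && i < cache.size()) ? cache.get(i) : null;
    }

    ClassWrapper removeClass(int i) {
        // Removing from the list shifts later entries, like the native array would.
        ClassWrapper c = cache.remove(i);
        c.layer = null;
        c.swigCMemOwn = true;     // the wrapper owns the native memory again
        return c;
    }
}

class CacheSketch {
    public static void main(String[] args) {
        LayerWrapper layer = new LayerWrapper();
        ClassWrapper clazz = new ClassWrapper();
        int index = layer.insertClass(clazz);
        System.out.println(layer.getClass(index) == clazz); // prints true
        System.out.println(clazz.swigCMemOwn);              // prints false
    }
}
```

Because the lookup returns the very object that was inserted, a reference
comparison like the one in 2.1 succeeds, and the parent keeps every child
reachable for as long as the parent itself is reachable.
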

3.3 Disown memory allocated by swig for objects with non-null parents
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

The patch suggested by Tamas (bug 1743) must be extended so that memory is
also disowned when a layer is inserted into a map, and it must be ported to
all mapscript languages.

3.4 JAVA: mapscript code example for layerObj
+++++++++++++++++++++++++++++++++++++++++++++

Code example for layerObj (*javamodule.i*):

::

    /*
       Modified constructor according to:
       - bug#1743, item 3.3
       - cache population and sync, item 3.2
    */
    %typemap(javaconstruct) layerObj(mapObj map) %{ {
            this($imcall, true);
            if (map != null) {
                    this.swigCMemOwn = false;
                    /* Store reference in field member */
                    this.map=map;
                    /* Add myself to the layers array of the map */
                    map.layers[this.getIndex()]=this;
            }
    }
    %}

    %typemap(javaout) int insertClass {
            // call the C API, which needs to be modified
            // so that the classObj is not copied anymore
            int actualIndex=$jnicall;
            classes[actualIndex]=classobj;
            // disown the classObj just inserted, item 3.3
            classobj.swigCMemOwn=false;
            return actualIndex;
    }

    %typemap(javaout) classObj *getClass {
            if (classes[i]!=null)
                    return classes[i];
            long cPtr = $jnicall;
            classObj result = (cPtr == 0) ? null : new classObj(cPtr, false);
            if (result!=null)
                    classes[i]=result;
            return result;
    }

    %typemap(javacode) layerObj %{
            /* an array of classes to keep referenced objects alive
               and prevent their garbage collection, item 3.1 */
            classObj[] classes=new classObj[mapscriptConstants.MS_MAXCLASSES];
            /* same for the map, item 3.1 */
            mapObj map=null;
    %}

3.5 PERL: mapscript code example for layerObj
+++++++++++++++++++++++++++++++++++++++++++++

Code example for layerObj (*plmodule.i*):

::

    %feature("shadow") ~mapObj()
    %{
    sub DESTROY {
        return unless $_[0]->isa('HASH');
        my $self = tied(%{$_[0]});
        return unless defined $self;
        delete $ITERATORS{$self};
        if (exists $OWNER{$self}) {
            mapscriptc::delete_mapObj($self);
            delete $OWNER{$self};
        }
        delete $mapscript::MAPLAYERS{$self};
    }
    %}

    %feature("shadow") layerObj(mapObj *map)
    %{
    sub new {
        my $pkg = shift;
        my $self = mapscriptc::new_layerObj(@_);
        bless $self, $pkg if defined($self);
        if (defined($_[0])) {
            $self->DISOWN();
            mapscript::LAYER_ADD_MAP_REF($self, $_[0]);
            mapscript::MAP_ADD_LAYER_REF($_[0], $self);
        }
        return $self;
    }
    %}

    %feature("shadow") ~layerObj()
    %{
    sub DESTROY {
        return unless $_[0]->isa('HASH');
        my $self = tied(%{$_[0]});
        return unless defined $self;
        delete $ITERATORS{$self};
        if (exists $OWNER{$self}) {
            mapscriptc::delete_layerObj($self);
            delete $OWNER{$self};
        }
        delete $mapscript::LAYERMAP{$self};
    }
    %}

    %perlcode %{
    %LAYERMAP=();
    sub LAYER_ADD_MAP_REF {
        my ($layer, $map)=@_;
        #print "MAP key=" . tied(%$layer) . "\n";
        $LAYERMAP{ tied(%$layer) }=$map;
    }

    %MAPLAYERS=();
    sub MAP_ADD_LAYER_REF {
        my ($map, $layer)=@_;
        my $layers=$MAPLAYERS{ tied(%$map) };
        if (defined($layers)) {
            $layers->{$layer->{index}}=$layer;
        } else {
            $layers={};
            $layers->{$layer->{index}}=$layer;
        }
        $MAPLAYERS{tied(%$map)}= $layers;
    }

    sub map_get {
        my $layer=$_[0];
        #print "GET MAP key" . $layer . "\n";
        if ( $LAYERMAP{ $layer } ) {
            return \%{$LAYERMAP{$layer}};
        }
        return mapscriptc::layerObj_map_get(@_);
    }

    ##################
    # DEBUGGING ONLY #
    ##################
    sub getLayerFrom {
        my ($map, $idx)=@_;
        return $MAPLAYERS{tied(%$map)}->{$idx};
    }

    sub getMAPLAYERS {
        return \%MAPLAYERS;
    }
    %}

Note: this example implements a bidirectional *cache* (map->layers and
layer->map), but this does not play well with Perl's reference-count based GC
and will cause memory leaks. To address this limitation Perl mapscript will
only implement the layer->map or child->parent side of the *cache*, as
described in 4.3.

4. Implementation plan
----------------------

It seems that for most mapscripts (java, csharp, perl and python) there is
enough functionality in SWIG to implement the features described in this RFC.

For ruby we'll probably have to go a different route and implement the
*%trackobjects* feature. As for Tcl, I currently don't know if it's possible.

The two following sections describe in detail the required SWIG-mapscript
features (injection of code and constructor customization). Each language then
gets a specific section to deal with its own characteristics.

4.1 Checking SWIG-Mapscript capabilities: %javacode
+++++++++++++++++++++++++++++++++++++++++++++++++++++

Swig provides the equivalent of *%javacode* for the following languages:

1. perl through %perlcode
2. python through %pythoncode
3. csharp through %cscode
4. ruby does not have any ruby code at all in its wrapper objects
5. swig-Tcl doesn't support a %tclcode construct

This swig construct will be used to inject into the wrapper the definition of
the references described in 3.1 and the wrapper methods.

4.1 Checking SWIG-Mapscript capabilities: constructor customization
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

The *%javaconstruct* / *%csconstruct* typemap used to wrap and customize the
constructor of mapscript objects (item 3.3) is only available in java and
csharp.
It should be possible to simulate its behaviour with *%pythonprepend* or
*%pythonappend* in python and with *%perlcode* or *%feature("shadow")* in
perl.

This swig construct will be used to disown memory allocated by SWIG when the
object has a reference to a valid parent (bug 1743).

4.2 Java and C-Sharp
++++++++++++++++++++

SWIG-Java and SWIG-CSharp share a common ground and are therefore very
similar. The names of SWIG-Java constructs can be roughly translated into
their C-Sharp equivalents by changing the *java* prefix into *cs* (e.g.
*javacode* into *cscode*, *javaconstruct* into *csconstruct* and *javaout*
into *csout*).

The implementation should follow this RFC exactly.

4.3 Perl
++++++++

As in the example above, most of the perl customization can be done with the
use of the *shadow* construct.

The implementation will differ from the RFC in the following ways:

1) the Perl wrapper will maintain references only in one direction (layer ->
   map, or generally contained -> container) and not both, because of the
   implementation of the Perl GC
   (http://www.perl.com/doc/manual/html/pod/perlobj.html). If we had circular
   references this *island* of objects would normally never be garbage
   collected.
2) to wrap the *layerObj->{map}* invocation we must edit the *mapscript.pm*
   file manually. To overcome this limitation and automate the find/replace
   we will use perl *pies* (*perl -pi -e* one-liners) in the Makefile, like
   the following two:

::

    perl -pi -e "s/\*swig_map_get = \*mapscriptc::layerObj_map_get/\*swig_map_get = \*mapscript::map_get/" mapscript.pm
    perl -pi -e "s/\*getMap = \*mapscriptc::layerObj_getMap/\*getMap = \*mapscript::map_get/" mapscript.pm

4.4 Python
++++++++++

Python enjoys first-class support in SWIG so the RFC should be implemented
exactly as described. Unlike Perl's, Python's GC can find and free cycles of
objects (http://arctrix.com/nas/python/gc/).


4.5 Ruby
++++++++

Needs investigation; probably we'll have to use rb_gc_* functions to mark
objects and prevent their garbage collection, or use *%trackobjects*. Ruby
will not implement this RFC.

4.6 Tcl
+++++++

Needs investigation and a Tcl expert. At the time of this writing Tcl will
probably not implement this RFC.

5.1 Implementation checklist
----------------------------

The following tables will be used to track the implementation status of this
RFC. There is a table for each mapscript object, and when a language has
implemented this RFC for a given object the maintainer will populate the
corresponding cell with one of the following marks:

- > if *cache* and wrappers have been implemented in the parent->child direction
- < if *cache* and wrappers have been implemented in the child->parent direction
- <> if *cache* and wrappers have been implemented in both directions
- a plus sign (+) if object has been disowned
- a minus sign (-) if object has been owned

5.2 mapObj
++++++++++

===================== ============= ============ ============ ============= ========== =============
Method                java          C#           perl         python        tcl        ruby
===================== ============= ============ ============ ============= ========== =============
getSymbolset
getFontset
getLabelcache
getExtent
setSaved_extent
getSaved_extent
getImagecolor
getOutputformat
getReference
getScalebar
getLegend
getQuerymap
getWeb
getConfigoptions
insertLayer
removeLayer
getLayer
getLayerByName
prepareImage
setOutputFormat
draw
drawQuery
drawLegend
drawScalebar
drawReferenceMap
getLabel
nextLabel
getOutputFormatByName
appendOutputFormat
removeOutputFormat
===================== ============= ============ ============ ============= ========== =============

5.3 layerObj
++++++++++++

======================= ============= ============ ============ ============= ========== =============
Method                  java          C#           perl         python        tcl        ruby
======================= ============= ============ ============ ============= ========== =============
layerObj (constructor)
getMap
getOffsite
getMetadata
cloneLayer
insertClass
removeClass
nextShape
getFeature
getShape
getResult
getClass
getResults
addFeature
getExtent
======================= ============= ============ ============ ============= ========== =============

5.4 classObj
++++++++++++

======================= ============= ============ ============ ============= ========== =============
Method                  java          C#           perl         python        tcl        ruby
======================= ============= ============ ============ ============= ========== =============
classObj (constructor)
getLabel
getMetadata
getLayer
clone
createLegendIcon
drawLegendIcon
getStyle
insertStyle
removeStyle
======================= ============= ============ ============ ============= ========== =============

5.5 webObj
++++++++++

======================= ============= ============ ============ ============= ========== =============
Method                  java          C#           perl         python        tcl        ruby
======================= ============= ============ ============ ============= ========== =============
webObj (constructor)
getMap
getExtent
setExtent
getMetadata
======================= ============= ============ ============ ============= ========== =============

5.5 styleObj
++++++++++++

For styleObjs it is enough to disown them when they are fetched from the
container object. It is not necessary to add the reference pointing back to
the container object.


======================= ============= ============ ============ ============= ========== =============
Method                  java          C#           perl         python        tcl        ruby
======================= ============= ============ ============ ============= ========== =============
styleObj (constructor)
setColor
getColor
getBackgroundcolor
setBackgroundcolor
getOutlinecolor
setOutlinecolor
getMincolor
setMincolor
getMaxcolor
setMaxcolor
clone
======================= ============= ============ ============ ============= ========== =============

5.6 labelObj
++++++++++++

For labelObjs it is enough to disown them when they are fetched from the
container object. It is not necessary to add the reference pointing back to
the container object.

========================== ============= ============ ============ ============= ========== =============
Method                     java          C#           perl         python        tcl        ruby
========================== ============= ============ ============ ============= ========== =============
labelObj (constructor)
getColor
setColor
setOutlinecolor
getOutlinecolor
setShadowcolor
getShadowcolor
getBackgroundcolor
setBackgroundcolor
setBackgroundshadowcolor
getBackgroundshadowcolor
========================== ============= ============ ============ ============= ========== =============

5.7 hashTableObj
++++++++++++++++

For hashTableObjs it is enough to disown them when they are fetched from the
container object (i.e. a layerObj). It is not necessary to add the reference
pointing back to the container object.

========================== ============= ============ ============ ============= ========== =============
Method                     java          C#           perl         python        tcl        ruby
========================== ============= ============ ============ ============= ========== =============
hashTableObj (constructor)
========================== ============= ============ ============ ============= ========== =============

5.8 colorObj
++++++++++++

For colorObjs it is enough to disown them when they are fetched from the container object.
It is not necessary to add the reference pointing back to the container object.

========================== ============= ============ ============ ============= ========== =============
Method                     java          C#           perl         python        tcl        ruby
========================== ============= ============ ============ ============= ========== =============
colorObj (constructor)
========================== ============= ============ ============ ============= ========== =============

5.9 imageObj
++++++++++++

For imageObjs it is enough to own them when they are fetched from the
container object. It is not necessary to add the reference pointing back to
the container object.

========================== ============= ============ ============ ============= ========== =============
Method                     java          C#           perl         python        tcl        ruby
========================== ============= ============ ============ ============= ========== =============
imageObj (constructor)
========================== ============= ============ ============ ============= ========== =============

5.10 shapeObj
+++++++++++++

For shapeObjs it is enough to set ownership properly when they are fetched
from or added to the container object. It is not necessary to add the
reference pointing back to the container object.

========================== ============= ============ ============ ============= ========== =============
Method                     java          C#           perl         python        tcl        ruby
========================== ============= ============ ============ ============= ========== =============
shapeObj (constructor)
getLine
getBounds
get
add
clone
copy
buffer
convexHull
boundary
getCentroid
Union
intersection
difference
symDifference
========================== ============= ============ ============ ============= ========== =============

5.11 lineObj
++++++++++++

For lineObjs it is enough to set ownership properly when they are fetched from
or added to the container object. It is not necessary to add the reference
pointing back to the container object.


========================== ============= ============ ============ ============= ========== =============
Method                     java          C#           perl         python        tcl        ruby
========================== ============= ============ ============ ============= ========== =============
lineObj (constructor)
getPoint
get
add
set
========================== ============= ============ ============ ============= ========== =============

5.12 pointObj
+++++++++++++

For pointObjs it is enough to set ownership properly when they are fetched
from or added to the container object. It is not necessary to add the
reference pointing back to the container object.

========================== ============= ============ ============ ============= ========== =============
Method                     java          C#           perl         python        tcl        ruby
========================== ============= ============ ============ ============= ========== =============
pointObj (constructors)
toShape
========================== ============= ============ ============ ============= ========== =============

5.13 symbolsetObj
+++++++++++++++++

For symbolsetObjs it is not necessary to add the reference pointing back to
the map.

========================== ============= ============ ============ ============= ========== =============
Method                     java          C#           perl         python        tcl        ruby
========================== ============= ============ ============ ============= ========== =============
symbolsetObj (constructor)
setSymbol(symbolObj value)
getSymbol
getSymbol(int i)
getSymbolByName
index
appendSymbol
removeSymbol
========================== ============= ============ ============ ============= ========== =============

5.14 symbolObj
+++++++++++++++++

For symbolObjs it is enough to set ownership properly when they are fetched
from or added to the container object. It is not necessary to add the
reference pointing back to the container object.


========================== ============= ============ ============ ============= ========== =============
Method                     java          C#           perl         python        tcl        ruby
========================== ============= ============ ============ ============= ========== =============
symbolObj (constructor)
setPoints
getPoints
getImage
setImage
========================== ============= ============ ============ ============= ========== =============

6. Open issues
--------------

Fetching an object from a map and then adding it to another map is likely to
cause a segfault. The solution would be either to remove it from the first map
and then add it to the second, or to implement a check before updating the
reference to its parent and bail out (throw an exception) if the object
already has one.

7. Status
---------

RFC opened for comments on January 10th, 2007 with a post on mapserver-dev.

RFC undergoing revision after discussion on mapserver-dev.