DESCRIPTION

g.infer is a tool to create rule-based, data-driven workflows from GRASS data layers and additional data sources. g.infer can modify existing GRASS data layers, create new vector layers or start additional GRASS modules. This is controlled by an inference process, which applies a knowledge base to a set of known facts (data). g.infer provides a Production System to set up Expert Systems from domain knowledge and GIS data layers.

The g.infer inference environment is based on a strict separation of knowledge (rules) and data (facts and object instances). Data consists of GRASS-derived spatial information and other input. The g.infer inference engine applies the knowledge, consisting of rules (stored in the rulebase), to the available data. For development and testing, interactive access and logging options are available.

g.infer provides a flexible environment to start rule-based work. For complex tasks there are also advanced capabilities to extend the knowledge modelling whenever necessary. This approach can be used for geodata classification, GIS workflow control and other tasks.

The C Language Integrated Production System (CLIPS) Expert System Shell environment is embedded within g.infer using the pyCLIPS Python module. In-depth information is found in the References, which also contain links to literature describing the projects Jess and Drools, which are based on CLIPS. Their documentation contains additional clues for CLIPS-related development.

The exchange of data between the GRASS and CLIPS environments is handled by extensions of the pyCLIPS functionality, connecting both GRASS and CLIPS. See the "Notes on g.infer pyCLIPS extensions" section for details.

Input

g.infer can ingest multiple input sources as data: Spatial information can be imported from GRASS data layers (raster, rast3d, point-vector) and additional sources. Rule base-related constructs (templates, facts, classes and instances) are read from external files or built-in rule bases via the library option.

Additional data can also be imported during the inference process if requested by the rule base. The user can further enter facts and rulebase elements via an interactive CLIPS prompt.

Output

g.infer can manipulate the content of imported GRASS data layers and other sources of data. These changes can be written back to the respective GRASS layers using the export option for persistent storage. A new vector layer can be created through the output option. The creation of additional GRASS layers can be invoked by the rulebase beyond the limited range of options provided by g.infer module parameters.

Rulebase

The g.infer rulebases contain the working knowledge for the inference process, which is separate from the data (facts). They are read from file by the rulebase option or selected using the library option. Rulebases can be edited via the interactive CLIPS prompt (-i flag) before the actual inference run.

There are three ways to represent knowledge in g.infer. They are based on concepts from the CLIPS programming language (see CLIPS documentation for details). All of them can be part of the rulebase:

Rules adhere to an IF-THEN layout defined by the CLIPS programming language (see below). The antecedent ("IF") part, which holds the conditions, is referred to as the left hand side (LHS), while the consequent ("THEN") part, which holds the invoked actions, is called the right hand side (RHS). The LHS and RHS of a rule statement are connected by the "=>" characters.
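As a minimal sketch of this layout, the following rule uses a hypothetical ordered fact (cell-wet) and a hypothetical rule name; neither is part of g.infer:

```clips
; Illustrative only: "cell-wet" and "report-wet-cell" are made-up names.
(defrule report-wet-cell
   ; LHS ("IF"): matches any ordered fact of the form (cell-wet <id>)
   (cell-wet ?id)
   =>
   ; RHS ("THEN"): executed when the rule fires
   (printout t "Cell " ?id " is wet." crlf))
```

After asserting a matching fact, for example (assert (cell-wet 12)), the rule is placed on the agenda and fires on the next (run).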

Inference Engine

The inference engine is the part of the g.infer module which evaluates which rules are to be applied, based on the currently available data (facts). The CLIPS inference engine used in g.infer is based on the Rete algorithm (Forgy 1982), using a forward-chaining, data-driven approach. The order in which rules are executed can be controlled in three ways: by explicitly defining salience values for each rule, by partitioning the knowledge base into modules, or implicitly by setting the conflict resolution strategy (strategy option). The section "Rulebase Development, Operation and Debugging" holds a more detailed description.

Performance

The Rete algorithm performs well when less than 10 percent of the available data (facts) change during an inference run. Scenarios in which the complete content of a GRASS layer must be changed, resulting in 100 percent change, will result in decreased performance.

g.infer uses the current settings of the GRASS region to create facts from the GIS layers which are provided as input. The resolution of the region must be set appropriately for the task, as a very high resolution will result in a large number of facts.

Knowledge Engineering in g.infer

Knowledge engineering (modelling) in g.infer involves in most cases the definition of rules for the rulebase. It can also include more complex programming tasks, such as conditions, loops, functions and object oriented programming. For knowledge modelling and programming in g.infer, the programming language of the CLIPS Expert System Toolkit is used, which is closely related to the programming language LISP. For an in-depth description of the CLIPS language, refer to the CLIPS User's Guide and Reference Manuals (Giarratano 2007, 2008). A useful introduction to LISP is provided by Graham (1995).

The examples given in the following refer to a rulebase. They are imported via a rulebase file, taken from a built-in demo rulebase, or entered using the interactive CLIPS prompt. This CLIPS-based programming cannot be mixed with GRASS GIS scripting, but it can invoke GRASS modules as part of the RHS of rules.

Rulebase Notation

g.infer rulebase constructs use a fully parenthesized Polish prefix notation: any term, or formula, must be put in parentheses, with the operator preceding the operands:
(operator operand1 operand2 ... operandN)

This differs from the infix notation commonly used in GRASS GIS, for example in r.mapcalc, where the operator is written between its operands.
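A small illustration of the difference, using made-up numbers; the infix form is shown only as a comment for comparison:

```clips
; Infix, as it would be written in an r.mapcalc expression:  (100 + 10) * 2
; Prefix, as written in a rulebase or at the CLIPS prompt:
(* (+ 100 10) 2)
```

Both forms evaluate to 220. In the prefix form, nesting replaces operator precedence: the inner expression is always evaluated first.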

Data Types

g.infer's rule engine provides eight primitive data types for representing information: float, integer, symbol, string, external-address, fact-address, instance-name and instance-address. As the programming language of the CLIPS Production System is weakly typed, variables can be declared without explicitly setting a type. A number consists only of digits (0-9), a decimal point (.), a sign (+ or -), and, optionally, an (e) for exponential notation with its corresponding sign. A number is either stored as a float or an integer. Any number consisting of an optional sign followed by only digits is stored as an integer. All other numbers are stored as floats.
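The integer and float rule can be checked at the interactive CLIPS prompt with the standard CLIPS type predicates. This is only a sketch with arbitrary sample values:

```clips
(integerp 42)      ; TRUE:  optional sign followed by digits only
(integerp 42.0)    ; FALSE: the decimal point makes it a float
(floatp 42.0)      ; TRUE
(floatp 237e3)     ; TRUE:  exponential notation is stored as a float
```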

Facts

Facts are the common high level form for representing information in g.infer. Within the CLIPS Production System, facts are the fundamental unit of data to be processed by the rulebase to infer information by the firing of applicable rules. Each fact represents a piece of data which has been placed in the overall fact list of the currently known facts.

Facts may be added to the fact list (using the assert command), removed from the fact list (using the retract command), modified (using the modify command), or duplicated (using the duplicate command) by either direct user interaction or while the rule base is executed. The number of facts in the fact list and the amount of information that can be stored within a fact is limited only by the amount of available memory. If a fact is to be asserted into the fact list which exactly matches an already existing fact, the new assertion will be ignored. This default behaviour can be changed.

Some CLIPS language commands, including retract, modify, and duplicate, require a fact to be specified by a reference. A fact can be specified either by fact index or fact address: Whenever a fact is asserted (or modified) it is assigned a unique index number called its fact index. Fact indices start at zero and are incremented by one for each new or changed fact. Whenever a reset or clear command is given, the fact indices restart at zero. A fact may also be specified through the use of a fact address. A fact address can be obtained by capturing the return value of commands which return fact addresses (such as assert, modify, and duplicate) or by binding a variable to the fact address of a fact which matches a pattern on the LHS of a rule. A fact identifier is a shorthand notation for printing a fact. It consists of the character f, followed by a dash, followed by the fact index of the fact.
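The two ways of referring to a fact can be sketched as follows. The fact content is illustrative, and the index used with retract is only an example, since the actual index depends on what has been asserted before:

```clips
; At the interactive prompt: refer to a fact by its fact index.
(assert (trail "OtterTrail" scenic flat))   ; returns the new fact's address
(facts)                                     ; lists all facts with their identifiers (f-<index>)
(retract 1)                                 ; removes the fact with index 1, if it exists

; Inside a rule: refer to a fact by its fact address.
(defrule remove-flat-trails
   ?f <- (trail ?name ?scenery flat)        ; ?f is bound to the matching fact's address
   =>
   (retract ?f))
```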

There are two categories of facts, ordered facts and non-ordered facts. Both types are applied in g.infer to represent information imported from the GRASS GIS environment.

Ordered facts
Ordered facts consist of a symbol followed by a sequence of zero or more fields separated by spaces and delimited by an opening parenthesis on the left and a closing parenthesis on the right. Ordered facts encode information positionally and can contain multiple data fields. The first field of an ordered fact specifies a relation that applies to the remaining fields in the ordered fact.

For example,

(is-a GRASS GIS)
states that GRASS is a GIS.

See the CLIPS Introduction and User Guides in the Reference Section for an in-depth discussion.

A simple fact can contain multiple fields for content:

(assert (trail "OtterTrail" scenic flat))

GRASS environment variables as ordered facts
Ordered facts are used to represent GRASS environment variables in g.infer, including g.region parameters. These ordered facts can be manipulated within the CLIPS environment and be used when calling GRASS modules from within CLIPS. They must be exported back into GRASS to become permanent within the current session, which is not done by default. The following representations of GRASS environment variables as facts are generated by default:

REGION_ROWS, REGION_COLS, REGION_CELLS, REGION_NSRES, REGION_EWRES, REGION_N, REGION_S, REGION_E, REGION_W, REGION_ROWS3, REGION_COLS3, REGION_CELLS3, REGION_NSRES3, REGION_EWRES3, REGION_TBRES, REGION_T, REGION_B, REGION_DEPTHS, GISDBASE, LOCATION_NAME, MAPSET, GRASS_GUI

Non-ordered facts
Facts that contain structured information require a template to define the structure:

Template-based, non-ordered facts allow the content of a fact to be structured by accessing data fields by name (and type). The deftemplate construct is used to create a named template (which can then be used to access fields by name). This approach is used to import GRASS layers in g.infer to create the respective facts.

The deftemplate construct allows the name of a template to be defined along with zero or more definitions of named fields, also called slots. Unlike ordered facts, the slots of a deftemplate fact may be constrained by type, value, and numeric range. In addition, default values can be specified for a slot. A slot consists of an opening parenthesis followed by the name of the slot, zero or more fields, and a closing parenthesis.

Deftemplate facts are easily distinguished from ordered facts by their first field. If the symbol serving as the first field corresponds to the name of a deftemplate, then the fact is a deftemplate fact. Like ordered facts, deftemplate facts are enclosed by an opening parenthesis on the left and a closing parenthesis on the right.

In addition to being asserted and retracted, deftemplate facts can also be modified and duplicated (using the modify and duplicate commands). Modifying a fact changes a set of specified slots within that fact. Duplicating a fact creates a new fact identical to the original fact and then changes a set of specified slots within the new fact. The benefit of using the modify and duplicate commands is that slots which do not change can be left out of the statement.
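A minimal sketch of a template with constrained slots and a rule that modifies matching facts. The template name, slot names and threshold are hypothetical and are not generated by g.infer:

```clips
; Hypothetical template: slots are constrained by type, value and range.
(deftemplate hiking-trail
   (slot title (type STRING))
   (slot difficulty (type SYMBOL)
                    (allowed-symbols easy moderate hard)
                    (default easy))
   (slot length-km (type FLOAT) (range 0.0 ?VARIABLE)))

; Only the slot that changes is named in the modify call.
(defrule regrade-long-trails
   ?t <- (hiking-trail (difficulty easy) (length-km ?l&:(> ?l 20.0)))
   =>
   (modify ?t (difficulty moderate)))
```

A fact such as (assert (hiking-trail (title "OtterTrail") (length-km 42.5))) receives the default difficulty easy and is then regraded by the rule. Because the modified fact no longer matches the LHS, the rule does not fire again for it.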

Examples are provided in the built-in rule bases (library option) and in the CLIPS User's Guide.

Template based Facts for GRASS Layers

Whenever a GRASS layer is imported by g.infer, a template is created according to its data structure and facts are asserted for the layer's content. While the template for vector layers will differ depending on the attributes of the layer, raster data templates adhere to the following structure:

Raster:

(geo_test "a comment" (x 599000.0) (y 4921800.0)
(value 5) (attribute "something"))

Vector:

SOIL EXAMPLE (field as point layer)

Rast3d:

(random_slice_00009 (x 599000.0) (y 4921800.0) (z 9.0)
(value 1.0))

"_slice_000009" has been added by g.infer for internal reference.

Rules

The primary method of representing and modifying knowledge in g.infer is rules. Any g.infer rulebase consists of a set of rules which collectively solve a classification task, establish a workflow or perform a similar task. Rules are used to represent heuristics, or "rules of thumb", which specify a set of actions to be performed for a given situation. Rules can cause actions such as the creation, modification or deletion of facts or the invocation of GRASS modules and scripts. A rule is composed of an antecedent and a consequent. The antecedent of a rule is also referred to as the if portion or the left hand side (LHS) of the rule. The consequent of a rule is also referred to as the then portion or the right hand side (RHS) of the rule.

The antecedent of a rule (LHS) is a set of conditions (or conditional elements) which must be satisfied for the rule to be applicable. The conditions of a rule are satisfied based on the existence or non existence of specified facts in the fact list or specified instances of user defined classes in the instance list. One type of condition which can be specified is a pattern. Patterns consist of a set of restrictions which are used to determine which facts or objects satisfy the condition specified by the pattern.

The process of matching facts and objects to patterns is called pattern matching. A mechanism, called the inference engine, matches patterns against the current state of the fact list and instance list and determines which rules are applicable during the inference run.

The consequent of a rule (RHS) is the set of actions to be executed when the rule is applicable. This is colloquially referred to as "the rule fires". The actions of applicable rules are executed when the inference engine is instructed to begin execution of applicable rules. If more than one rule is applicable, the inference engine uses a conflict resolution strategy to determine which rule should fire first. The actions of the selected rule are executed (which can change the overall list of applicable rules). Afterwards the inference engine selects another rule and executes its actions. This process continues until no applicable rules remain.

In many ways, rules can be thought of as IF - THEN statements found in procedural programming languages. However, the conditions of an IF - THEN statement in a procedural language are only evaluated when the program flow of control is directly at the IF - THEN statement. In contrast, rules act like WHENEVER - THEN statements. The inference engine always keeps track of rules which have their conditions satisfied, and thus rules can be executed immediately when they are applicable. In this sense, rules are similar to exception handlers found in other programming languages.
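The WHENEVER - THEN behaviour can be sketched with a rule over raster facts. The sketch assumes that a raster layer named geo_test has been imported, so that template facts with x, y and value slots exist; the rule name and the threshold are hypothetical:

```clips
; Assumption: facts of the form (geo_test (x ...) (y ...) (value ...))
; were created by importing a raster layer called "geo_test".
(defrule flag-high-cells
   ; LHS: a pattern with a constraint on the value slot
   (geo_test (x ?x) (y ?y) (value ?v&:(> ?v 4)))
   =>
   ; RHS: executed once for every matching cell fact
   (printout t "High value " ?v " at " ?x ", " ?y crlf))
```

The rule is activated for every existing fact that satisfies the pattern and also for any matching fact asserted later during the run, without the rule being called explicitly.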

Comments

Rulebase content and other CLIPS code can be commented by starting lines with a semicolon.

Variables

Variables in CLIPS are weakly typed. They are not restricted to a predefined data type. So when creating a variable, it is not required to provide typing information. The defglobal construct allows variables to be defined which are global in scope throughout the CLIPS environment. Such a global variable can be accessed anywhere in the CLIPS environment and retains its value independent of other constructs. In contrast, some constructs (such as defrule and deffunction) allow local variables to be defined within the definition of the construct. These local variables can be referred to within the construct, but have no meaning outside the construct.
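A brief sketch of both kinds of variables; the global name, the fact name and the value are hypothetical:

```clips
; Global variable: visible in all constructs, written as ?*name*.
(defglobal ?*threshold* = 5)

(defrule above-threshold
   (reading ?v)                     ; ?v is local to this rule
   (test (> ?v ?*threshold*))
   =>
   (printout t ?v " exceeds " ?*threshold* crlf))
```

Note that changing a global variable does not by itself re-trigger pattern matching; rules are re-evaluated when facts change.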

Functions

A function in CLIPS is a piece of executable code identified by a specific name which returns a useful value or performs a side effect (such as printing a message).

The deffunction construct is used to define new functions. The body of a deffunction is a series of expressions similar to the RHS of a rule that are executed in order by the CLIPS inference engine when the deffunction is called. The return value of a deffunction is the value of the last expression evaluated within the deffunction. Calling a deffunction is identical to calling any other function in CLIPS.

Function Definition Example:

(deffunction fahrenheit_celsius (?celsius_value) (+ (* ?celsius_value 1.8) 32))

Examples are provided in the built-in rule bases (library option) and in the CLIPS User's Guide and Basic Programming Guide.

Conditionals

Examples for conditions, comparing facts and variables with each other, are provided in the built-in rule bases (library option) and in the CLIPS User's Guide and Basic Programming Guide.
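As a minimal sketch, a condition inside a function can be written with the CLIPS if ... then ... else form. The function name and the class limits are hypothetical:

```clips
; Hypothetical helper: classify a slope angle given in degrees.
(deffunction classify-slope (?degrees)
   (if (< ?degrees 5)
      then flat
      else (if (< ?degrees 20)
              then moderate
              else steep)))
```

For example, (classify-slope 12) returns the symbol moderate. On the LHS of a rule, comparable checks are written as pattern constraints or with the test conditional element.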

Rulebase Development, Operation and Debugging

g.infer is a tool to set up and operate rule-based workflows for information classification and data processing. The workflow will be loaded via a 'rulebase'-file or a built in rulebase, consisting of a set of rules, called a knowledge base.

Rule activation is controlled by the g.infer rule engine, based on spatial data from the GIS layers. When a rule is activated for the first time, it is placed on the agenda, based (in order) on the following factors:

a) Newly activated rules are placed above all rules of lower salience and below all rules of higher salience.

b) Among rules of equal salience, the current conflict resolution strategy is used to determine the placement among the other rules of equal salience.

c) If a rule is activated (along with several other rules) by the same assertion or retraction of a fact, and steps a and b are unable to specify an ordering, then the rule is arbitrarily (not randomly) ordered in relation to the other rules with which it was activated. In this respect, the order in which rules are defined has an arbitrary effect on conflict resolution (which is also dependent upon the current underlying implementation of rules). This arbitrary ordering for the proper execution of rules should not be depended on for knowledge modelling.

Once a knowledge base (in the form of rules) has been loaded into g.infer and the fact and instance lists are available, the inference engine is ready to execute rules:

Whenever a rule modifies a specific fact, that fact is retracted, that is, removed from the fact list, and a new (altered) fact is created and added to the fact list.

A rule base can be designed to act similarly to the r.mapcalc module. This requires a closely coupled, multi-stage design of the rule base.

It is easy to construct rule sets which act like an infinite loop and never terminate. This is for the most part undesired, but may be an asset for monitoring activities.
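How such a loop arises, and one way to bound it, can be sketched with a hypothetical counter template. The two rules are alternatives and should not be loaded together:

```clips
(deftemplate counter (slot n (type INTEGER) (default 0)))

; Alternative 1: never terminates. Each modify retracts the fact and
; asserts an altered one, which activates the same rule again.
(defrule count-forever
   ?c <- (counter (n ?n))
   =>
   (modify ?c (n (+ ?n 1))))

; Alternative 2: terminates. The constraint stops matching once n reaches 10.
; (defrule count-to-ten
;    ?c <- (counter (n ?n&:(< ?n 10)))
;    =>
;    (modify ?c (n (+ ?n 1))))
```

During development, an upper limit on rule firings, for example (run 100), is a simple safeguard against unintended loops.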

Examples are provided in the built-in rule bases (library option) and in the CLIPS User's Guide and Basic Programming Guide.

Conflict Resolution

When more than one rule is eligible to fire, a prioritization among the candidate rules is needed. Rules can be explicitly assigned a rank/priority called salience.

The conflict resolution strategy is an implicit mechanism for specifying the order in which rules of equal salience should be executed.

g.infer provides seven conflict resolution strategies: depth, breadth, simplicity, complexity, lex, mea and random. The default strategy is depth. The current strategy can be set by using the set-strategy command (which will reorder the agenda based upon the new strategy).
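At the interactive CLIPS prompt, the strategy can be inspected and changed with the standard CLIPS functions get-strategy and set-strategy, as in this short sketch:

```clips
(get-strategy)           ; returns the current strategy (depth unless changed)
(set-strategy breadth)   ; switches to breadth and returns the previous strategy
```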

Salience
The preferred mechanisms in g.infer for ordering the execution of rules are explicitly assigned salience values and knowledge base grouping using modules. Salience allows one to explicitly specify that one rule should be executed before another rule.
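A minimal sketch of explicit salience values, using hypothetical rule names and a hypothetical trigger fact (start). Both rules are activated by the same fact, and the rule with the higher salience fires first:

```clips
(defrule load-defaults
   (declare (salience 100))      ; higher value: placed above on the agenda
   (start)
   =>
   (printout t "load-defaults fires first" crlf))

(defrule process-data
   (declare (salience 0))        ; 0 is also the default salience
   (start)
   =>
   (printout t "process-data fires second" crlf))
```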

Options to apply salience values:

Defmodules
Modules allow one to explicitly specify that all of the rules in a particular group (module) should be executed before all of the rules in a different group.

Defmodules allow a knowledge base to be partitioned. Every construct defined must be placed in a module. The programmer can explicitly control which constructs in a module are visible to other modules and which constructs from other modules are visible to a module. The visibility of facts and instances between modules can be controlled in a similar manner. Modules can also be used to control the flow of execution of rules.

Examples are provided in the CLIPS User's Guide and Basic Programming Guide.

Agenda

The agenda is the list of all rules which have their conditions satisfied (and have not yet been executed). If the knowledge base is partitioned into multiple modules, each module has its own agenda. The agenda acts similar to a stack (the top rule on the agenda is the first one to be executed).

Focus

The current focus determines which agenda the run command uses during execution. The reset and clear commands automatically set the current focus to the MAIN module.
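The interplay of modules, agendas and focus can be sketched as follows. The module CLASSIFY, the template task and the rule names are hypothetical, and the sketch assumes a freshly cleared environment in which MAIN has not been redefined yet:

```clips
; MAIN exports its constructs; CLASSIFY imports them.
(defmodule MAIN (export ?ALL))
(deftemplate MAIN::task (slot name))
(defmodule CLASSIFY (import MAIN ?ALL))

; Fires from the MAIN agenda and hands control to CLASSIFY.
(defrule MAIN::start-classification
   (task (name classify))
   =>
   (focus CLASSIFY))

; Sits on the CLASSIFY agenda until CLASSIFY receives the focus.
(defrule CLASSIFY::announce
   (task (name classify))
   =>
   (printout t "Classification stage running" crlf))
```

After (reset), (assert (task (name classify))) and (run), the MAIN rule fires first. Once the CLASSIFY agenda is empty, the focus returns to MAIN.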

Logging and Debugging

Several features of g.infer support the development and debugging of rule-bases:

Debugging: g.infer Flags for Logging, Interaction and Abort:

Watch-Option: Logging of Knowledge Base Performance

See the section "Notes on CLIPS" for complete definition of terminology.
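At the interactive CLIPS prompt, logging of this kind corresponds to the standard CLIPS watch command. The sketch below uses only standard CLIPS watch items; which of them the g.infer watch option exposes should be checked against the module parameters:

```clips
(watch facts)         ; log every assertion and retraction of a fact
(watch rules)         ; log every rule firing
(watch activations)   ; log rules entering and leaving the agenda
(run)
(unwatch all)         ; switch the logging off again
```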

Config-Option

config
Configuration options for CLIPS engine: auto-float-dividend, dynamic-constraint-checking, fact-duplication, incremental-reset, reset-globals, sequence-operator-recognition, static-constraint-checking.

Classdefault-Option

classdefault
Class default of CLIPS engine: convenience (default), conservation

Rulebase Libraries

g.infer provides several pre-configured rulebases. The rulebases from the library can be used alone or in combination with user-defined rulebases. They provide a basic interactive user interface to launch specific demo applications and can also be started from the CLIPS prompt (using the (run) command). All demo rulebases contain extensive comments to document their functionality for the interested user.

Notes on g.infer pyCLIPS extensions

g.infer extends the pyCLIPS and CLIPS environments with several commands to communicate and interact with GRASS GIS:

Examples of the invocation of GRASS modules as part of the RHS of rules are provided in the built in demo rule bases.

CLIPS Object Oriented Language

The CLIPS Object Oriented Language (COOL) includes elements of data abstraction and knowledge representation. It supports abstraction, encapsulation, inheritance, polymorphism and dynamic binding. An overview of COOL as a whole, incorporating the elements of both concepts, is given in the CLIPS Basic Programming Guide. The primary difference between objects and template-based (non-ordered) facts is the notion of inheritance. Inheritance allows the properties and behavior of a class to be described in terms of other classes. COOL supports multiple inheritance: a class may directly inherit slots and message-handlers from more than one class. Since inheritance is only useful for slots and message-handlers, it is often not meaningful to inherit from one of the primitive type classes, such as MULTIFIELD or NUMBER. This is because these classes cannot have slots and usually do not have message-handlers.
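Inheritance of slots and message-handlers can be sketched with two hypothetical classes. The class, slot and handler names are illustrative, and the role is declared explicitly so that the sketch does not depend on the class default setting:

```clips
; Base class with one slot and one message-handler.
(defclass LAYER (is-a USER)
   (role concrete)
   (slot layer-name (type STRING)))

(defmessage-handler LAYER describe ()
   (printout t "Layer " ?self:layer-name crlf))

; Derived class: inherits layer-name and describe, adds its own slot.
(defclass RASTER-LAYER (is-a LAYER)
   (role concrete)
   (slot resolution (type FLOAT) (default 10.0)))

(make-instance [dem] of RASTER-LAYER (layer-name "elevation"))
(send [dem] describe)
```

The describe message is answered by the handler inherited from LAYER, although the instance belongs to RASTER-LAYER.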

EXAMPLES

Examples are provided in the built in rule bases (library option).

BUGS

REFERENCES

Browne P. (2009) JBOSS Drools Business Rules. Packt Publishing. ISBN 1-847-19606-3

Forgy C. (1982). Rete: A Fast Algorithm for the Many Pattern/Many Object Pattern Match Problem. Artificial Intelligence, 19, pp 17–37

Friedman-Hill E. (2003). Jess in Action. Manning Publications. ISBN 1-930-11089-8

Garosi F. (2008). PyCLIPS Manual Release 1.0. URL http://sourceforge.net/projects/pyclips/files/pyclips/pyclips-1.0/pyclips-1.0.7.348.pdf/download

Giarratano J., Riley G. (2004). Expert Systems: Principles and Programming. Course Technology. ISBN 0-534-38447-1

Giarratano, J.C. (2007). CLIPS User's Guide. URL http://clipsrules.sourceforge.net/documentation/v630/ug.pdf

Giarratano, J.C. (2007). CLIPS Reference Manual: Basic Programming Guide. URL http://clipsrules.sourceforge.net/documentation/v630/bpg.pdf

Giarratano, J.C. (2008). CLIPS Reference Manual: Advanced Programming Guide. URL http://clipsrules.sourceforge.net/documentation/v630/apg.pdf

Graham P. (1995). ANSI Common Lisp. Prentice Hall, ISBN 0-133-79875-6

Löwe P. (2004). Technical Note - A Spatial Decision Support System for Radar-Meteorology in South Africa. Transactions in GIS. 8(2):235-244. Blackwell Publishing Ltd. Oxford.

Löwe P. (2004). Methoden der Künstlichen Intelligenz in Radarmeteorologie und Bodenerosionsforschung (Dissertation). URL http://opus.bibliothek.uni-wuerzburg.de/volltexte/2004/759/

Jackson P. (1998). Introduction to Expert Systems. Addison Wesley. ISBN 0-201-87686-8

Puppe F. (1993) Systematic Introduction to Expert Systems. Springer. ISBN 3-540-56255-9

Riley G. (2008). The History of CLIPS. URL http://clipsrules.sourceforge.net/WhatIsCLIPS.html#History

Rudolph G. (2008). Some Guidelines For Deciding Whether To Use a Rule Engine. URL http://www.jessrules.com/jess/guidelines.shtml

SEE ALSO

r.fuzzy, r.mapcalc: Raster algebra, r.infer, v.in.ascii: Definition of vector columns for output vectors

AUTHOR

Peter Löwe