One major advantage of modern native XML databases is their ability to index the XML documents they contain. Proper use of indexes can significantly reduce the time required to execute a particular XQuery expression. The previous examples likely executed in a perceptible amount of time, because BDB XML was evaluating each and every document in the container against the query. Without indexes, BDB XML has no choice but to review each document in turn. With indexes, BDB XML can find a subset of matching documents with a single, or significantly reduced, set of lookups. By carefully applying BDB XML indexing strategies we can improve retrieval performance considerably.
By default, BDB XML turns several useful indexes on so you
do not have to worry about them. However, for the purposes
of this document we turned them off at the beginning of
this chapter. (Using the shell
setAutoIndexing
command). We do this here so
we can see relative performance differences between
containers with no indexes, and containers with the indexes
that we set.
To examine the usefulness of our indexes, we will use the
time
command with each of our queries.
This will report how long it takes for each operation to complete.
The following query execution times are relative to the computer and operating system used by the author. Your query times will differ as they depend on many qualities of your system. However, the percentage in improvement in query execution time should be relatively similar.
Recall the first structural query:
time query ' collection("parts.dbxml")/part[parent-part]' 10000 objects returned for eager expression ' collection("parts.dbxml")/part[parent-part]' Time in seconds for command 'query': 0.437096
Notice the query execution time. This query takes almost a half
a second to execute because the query is examining each document in turn as it
searches for the presence of a parent-part element. To improve our
performance, we want to specify an index that allows BDB XML to identify
the subset of documents containing the parent-part
element without actually examining each document.
Indices are specified in four parts: path type, node type, key type,
and uniqueness. This query requires an index of the node elements to
determine if something is present or not. Because the pattern is not
expected to be unique, we do not want to turn on uniqueness. Therefore,
the BDB XML index type that we should use is
node-element-presence-none
.
dbxml> addIndex "" parent-part node-element-presence-none Adding index type: node-element-presence-none to node: {}:parent-part dbxml> time query ' collection("parts.dbxml")/part[parent-part]' 10000 objects returned for eager expression ' collection("parts.dbxml")/part[parent-part]'
Our query time improved from .4 seconds to .2 seconds. As containers grow in size or complexity, indexes increase performance even more dramatically.
The previous index will also improve the performance of the value query
designed to search for the value of the parent-part
element.
But for better results, we should index the node as a double
value. (You use double here instead of decimal because the
XQuery specification indicates that implicit numerical casts
should be cast to double).
To do this, use a node-element-equality-double
index.
dbxml> time query ' collection("parts.dbxml")/part[parent-part = 1]' 3333 objects returned for eager expression ' collection("parts.dbxml")/part[parent-part = 1]' Time in seconds for command 'query': 0.511752 dbxml> addIndex "" parent-part node-element-equality-double Adding index type: node-element-equality-decimal to node: {}:parent-part dbxml> time query ' collection("parts.dbxml")/part[parent-part = 1]' 3333 objects returned for eager expression ' collection("parts.dbxml")/part[parent-part = 1]' Time in seconds for command 'query': 0.070674
Additional indexes will improve performance for the other value queries.
dbxml> time query ' collection("parts.dbxml")/part[@number > 100 and @number < 105]' 4 objects returned for eager expression 'collection("parts.dbxml")/part[@number > 100 and @number < 105]' Time in seconds for command 'query': 5.06106
At over 5 seconds there is plenty of room for improvement. To improve our range query, we can provide an index for the number attribute:
dbxml> addIndex "" number node-attribute-equality-double Adding index type: node-attribute-equality-double to node: {}:number dbxml> time query ' collection("parts.dbxml")/part[@number > 100 and @number < 105]' 4 objects returned for eager expression ' collection("parts.dbxml")/part[@number > 100 and @number < 105]' Time in seconds for command 'query': 3.33212
As you can see, proper use of indexes can dramatically effect query performance.
We mentioned at the beginning of this section that we had turned auto indexing off. If we had left it on, the container would have automatically had the following indexes:
node-element-string-equality node-attribute-string-equality node-element-double-equality node-attribute-double-equality
These indexes would have been added for all attribute and
leaf nodes. For this example, the indexes would have been added for
the <description>
,
<category>
and
<parent-part>
nodes. They would
have also been added for the number
attribute on the <part>
node.