Introducing Indices

One major advantage of modern native XML databases is their ability to index the XML documents they contain. Proper use of indices can significantly reduce the time required to execute a particular Xquery expression. The previous examples likely executed in a perceptible amount of time, because BDB XML was evaluating each and every document in the container against the query. Without indices, BDB XML has no choice but to review each document in turn. With indices, BDB XML can find a subset of matching documents with a single, or significantly reduced, set of lookups. By carefully applying BDB XML indexing strategies we can improve retrieval performance considerably.

To examine the usefulness of our indices, we begin by raising the level of verbosity in the shell:

dbxml> setVerbose 2 2

Note

The following query execution times are relative to the computer and operating system used by the author. Your query times will differ as they depend on many qualities of your system. However, the percentage in improvement in query execution time should be relatively similar.

Recall the first structural query:

query '
collection("parts.dbxml")/part[parent-part]'

Query      - Starting eager query execution
Query      - parts.dbxml - U : [3000] 256 512 768 1024 1280 1536 1792 2048
2304 2560 2816 257 513 769 1025 1281 1537 1793 2049 2305 ...
Query      - Finished eager query execution, time taken = 2495.82ms
300 objects returned for eager expression '
collection("parts.dbxml")/part[parent-part]'

Notice the query execution time. This query takes almost 2.5 seconds to execute because the query is examining each document in turn as it searches for the presence of a parent-part element. To improve our performance, we want to specify an index that allows BDB XML to identify the subset of documents containing the parent-part element without actually examining each document.

Indices are specified in four parts: path type, node type, key type, and uniqueness. This query requires an index of the node elements to determine if something is present or not. Because the pattern is not expected to be unique, we do not want to turn on uniqueness. Therefore, the BDB XML index type that we should use is node-element-presence-none.

dbxml> addIndex "" parent-part node-element-presence-none
Adding index type: node-element-presence-none to node: {}:parent-part

dbxml> query '
collection("parts.dbxml")/part[parent-part]'

Query      - Starting eager query execution
Query      - parts.dbxml - P(parent-part) : [300] 2 12 22 32 42 52 62 72 82 92 
102 112 122 132 142 152 162 172 182 192 ...
Query      - Finished eager query execution, time taken = 173.084ms
300 objects returned for eager expression '
collection("parts.dbxml")/part[parent-part]'

Our query time improved from 2.5 seconds to just under 1/5th of a second. As containers grow in size or complexity, indices increase performance even more dramatically.

The previous index will also improve the performance of the value query designed to search for the value of the parent-part element.

dbxml> query '
collection("parts.dbxml")/part[parent-part = 1]'

Query      - Starting eager query execution
Query      - parts.dbxml - P(parent-part) : [300] 2 12 22 32 42 52 62 72 82 92 
102 112 122 132 142 152 162 172 182 192 ...
Query      - Finished eager query execution, time taken = 223.821ms
100 objects returned for eager expression '
collection("parts.dbxml")/part[parent-part = 1]' 

This query time also improved from 2.4 seconds to just over 1/5th of a second. Because the node's content that we are examining involves a number, we can improve our query performance time even more by indexing that node as a decimal value. To do this, use node-element-equality-decimal.

dbxml> addIndex "" parent-part node-element-equality-decimal

Adding index type: node-element-equality-decimal to node: {}:parent-part

dbxml>  query '
collection("parts.dbxml")/part[parent-part = 1]'

Query      - Starting eager query execution
Query      - parts.dbxml - V(parent-part,=,'1') : [100] 12 42 72 102 132 162 
192 222 252 282 312 342 372 402 432 462 492 522 552 582 ...
Query      - Finished eager query execution, time taken = 69.803ms
100 objects returned for eager expression '
collection("parts.dbxml")/part[parent-part = 1]'
 

With this second index, the query runs in less than 1/10th of a second, or three times faster than without the index.

Additional indices will improve performance for the other value queries.

dbxml>  query '
collection("parts.dbxml")/part[@number > 100 and @number < 105]'

Query      - Starting eager query execution
Query      - parts.dbxml - U : [3000] 256 512 768 1024 1280 1536 1792 2048 
2304 2560 2816 257 513 769 1025 1281 1537 1793 2049 2305 ...
Query      - Finished eager query execution, time taken = 6938.48ms
4 objects returned for eager expression '
collection("parts.dbxml")/part[@number > 100 and @number < 105]'

At almost 7 seconds there is plenty of room for improvement. To improve our range query, we can provide an index for the number attribute: .

dbxml>  addIndex "" number node-attribute-equality-decimal

Adding index type: node-attribute-equality-decimal to node: {}:number

dbxml> query '
collection("parts.dbxml")/part[@number > 100 and @number < 105]'

Query      - Starting eager query execution
Query      - parts.dbxml - V(@number,>,'100') : [2899] 103 104 105 106 107 
108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 ...
Query      - parts.dbxml - V(@number,<,'105') : [105] 2 3 4 5 6 7 8 9 10 
11 12 13 14 15 16 17 18 19 20 21 ...
Query      - parts.dbxml - n(V(@number,>,'100'),V(@number,<,'105')) : [4] 
103 104 105 106 
Query      - Finished eager query execution, time taken = 29.967ms
4 objects returned for eager expression '
collection("parts.dbxml")/part[@number > 100 and @number < 105]'

This query's execution time has been reduced to less than 1/10 of a second. Proper use of indices can dramatically effect query performance.

BDB XML provides a wide variety of different index types to improve the performance of queries.