Chapter 2. XQuery and Berkeley DB XML

Table of Contents

Adding Data
Queries Involving Document Structure
Value Queries
Introducing Indices
Reshaping the Result
Sorting the Result
Working with Data from Multiple Containers
Working with Data from a Specific Document
Using Metadata
Modifying Documents
Schema Constraints
The Berkeley DB XML API

This chapter steps through some of the XQuery functionality provided by BDB XML and then introduces a few of the facilities BDB XML provides that make working with XML efficient. If you are unfamiliar with XQuery, first review one of the XQuery tutorials listed at the end of this document.

Adding Data

In this example, the container will manage 100,000 small documents modeling an imaginary parts database. Begin by using the following command to create a container called parts.dbxml:

dbxml> createContainer parts.dbxml

Creating node storage container with nodes indexed

A successful response indicates that the container was created on disk, opened, and made the default container within the current context of the shell.

Before continuing, turn off auto-indexing, which BDB XML enables by default. Doing so here makes it possible to demonstrate some points about XQuery performance later in this chapter, where the behavior is explained. For now, simply enter the command:

dbxml> setAutoIndexing off

Set auto-indexing state to off, was on

Next, populate the container with 100,000 XML documents that have the following basic structure:

<part number="999">
    <description>Description of 999</description>
    <category>9</category>
</part>

One document in ten (each part whose number is a multiple of 10) adds some complexity to the database by also carrying a parent-part element:

<part number="990">
   <description>Description of 990</description>
   <category>0</category>
   <parent-part>0</parent-part>
</part>
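The only structural difference between the two shapes is the optional parent-part child. As a rough illustration outside the dbxml shell, the following Python sketch parses both sample documents and reports whether that element is present. It uses only the Python standard library. The names BASIC, EXTENDED, and summarize are illustrative assumptions and are not part of BDB XML.

```python
import xml.etree.ElementTree as ET

# The two sample documents shown above, copied verbatim.
BASIC = """<part number="999">
    <description>Description of 999</description>
    <category>9</category>
</part>"""

EXTENDED = """<part number="990">
   <description>Description of 990</description>
   <category>0</category>
   <parent-part>0</parent-part>
</part>"""


def summarize(xml_text):
    """Return (number, category, parent_part) for one part document.

    parent_part is None when the document has no parent-part element.
    """
    part = ET.fromstring(xml_text)
    parent = part.find("parent-part")  # None when the element is absent
    return (
        part.get("number"),
        part.findtext("category"),
        parent.text if parent is not None else None,
    )


print(summarize(BASIC))
print(summarize(EXTENDED))
```

Any query that touches parent-part has to allow for documents in which the element is missing, which is why the sketch returns None rather than raising an error.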

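Use the following putDocument command to insert the sample data into the new parts container. The trailing q tells the shell to treat the quoted string as an XQuery expression and to add each item the expression returns as a separate document.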

Note

Depending on the speed of your machine, you may want to reduce the number of documents you add to the container. This example uses a moderately sized document set so that the timing results shown later in this chapter are easier to observe. On slower hardware, you should be able to observe the same results with a smaller document set.

dbxml> putDocument "" '
for $i in (0 to 99999) 
return 
  <part number="{$i}">
    <description>Description of {$i}</description>
    <category>{$i mod 10}</category>
    {
      if (($i mod 10) = 0) 
      then <parent-part>{$i mod 3}</parent-part> 
      else ""
    }
  </part>' q

As the query executes, one line will be printed for each document inserted into the database.
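To preview what the query generates without loading a container, its rules can be mirrored in a few lines of Python. This is a minimal sketch under stated assumptions, not part of BDB XML: the function names make_part and count_with_parent are hypothetical, and the indentation in the generated text is for readability only and may differ from what the container actually stores.

```python
def make_part(i):
    """Build the XML text for part number i, following the query's rules."""
    lines = [
        f'<part number="{i}">',
        f"  <description>Description of {i}</description>",
        f"  <category>{i % 10}</category>",  # category is i mod 10
    ]
    if i % 10 == 0:
        # Every tenth part also gets a parent-part element holding i mod 3.
        lines.append(f"  <parent-part>{i % 3}</parent-part>")
    lines.append("</part>")
    return "\n".join(lines)


def count_with_parent(total):
    """Count the parts in 0..total-1 that carry a parent-part element."""
    return sum(1 for i in range(total) if i % 10 == 0)


print(make_part(990))
print(count_with_parent(100000))  # prints 10000
```

With the full range of 0 to 99999, one document in ten (10,000 in total) includes parent-part. If you shrink the range in the putDocument command, that proportion stays the same.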