Before you can begin indexing new documents, you have to instantiate the query engine and tell it what sorts of things to index or ignore. It's easy to instantiate:
XQEngine engine = new XQEngine();
Once you've got a copy of the engine, you can start calling its APIs.
If you don't tell it differently and just use the default settings, XQEngine indexes everything it encounters. That might not be what you want. For example, you might not want to index common stop words like "a" and "the". To supply a list of stop words to ignore when indexing, call setStopList(stopListFilePath) before you index your first document. A stop words file is a simple ASCII text file with one stop word per line. If you want to ignore numbers, call setDoIndexNumbers(false).
NOTE: At the moment, both the setStopList() and setDoIndexNumbers() APIs are inoperative.
And if you consider short words to be insignificant and want to ignore them, call setMinIndexableWordLength(minLength) to discard any word that's less than minLength characters long.
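To make the intended effect of these three settings concrete, here is a small stand-alone sketch of the kind of filtering they describe. It is not XQEngine's code: the class IndexFilterSketch, its method, and the tokenization rule (split on anything that isn't a letter or digit, then lower-case) are assumptions made purely for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical illustration only: this class is NOT part of the XQEngine API.
final class IndexFilterSketch {

    // Returns the words that survive the stop list, minimum length and number filters.
    static List<String> indexableWords(String text, Set<String> stopWords,
                                       int minLength, boolean indexNumbers) {
        List<String> kept = new ArrayList<>();
        // Assumption: a word is a run of letters/digits; anything else is a separator.
        for (String raw : text.split("[^\\p{L}\\p{Nd}]+")) {
            String word = raw.toLowerCase(Locale.ROOT);
            if (word.isEmpty()) continue;                             // text began with a separator
            if (word.length() < minLength) continue;                  // shorter than minLength
            if (stopWords.contains(word)) continue;                   // listed stop word
            if (!indexNumbers && word.matches("\\p{Nd}+")) continue;  // purely numeric
            kept.add(word);
        }
        return kept;
    }

    public static void main(String[] args) {
        // In a stop words file these would appear one per line.
        Set<String> stopWords = new HashSet<>(Arrays.asList("a", "the"));
        System.out.println(
            indexableWords("The heart of a darkness, 1902", stopWords, 3, false));
        // prints [heart, darkness]
    }
}
```

With a minimum length of 3, "of" and "a" are dropped for being too short, "the" is dropped as a stop word, and "1902" is dropped because numbers are not being indexed.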
All the above APIs are optional and have reasonable defaults. The only API you absolutely must call before indexing your first document is setXMLReader() to tell XQEngine which SAX2 XMLReader to use. (You need to be working with a SAX2-capable parser to use XQEngine.) How you actually instantiate a SAX parser and obtain an XMLReader reader from it is parser-dependent. The SampleApp sample application that ships with the download shows how to do this for both Sun's crimson-based parser and for Xerces. You'll need to put the parser code on the CLASSPATH, unless you're using the latest version of the JDK, which has the crimson-based parser code built-in.
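On JDKs that bundle a JAXP parser, one parser-neutral way to obtain a SAX2 XMLReader is through javax.xml.parsers.SAXParserFactory. The sketch below uses only standard JDK classes; the class name XmlReaderSketch is made up, and the final hand-off to the engine is left as a comment because it assumes setXMLReader() takes an org.xml.sax.XMLReader directly (SampleApp remains the reference for the exact usage).

```java
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;

// Illustrative helper; not part of XQEngine or SampleApp.
final class XmlReaderSketch {

    // Asks JAXP for whichever SAX2 parser is on the CLASSPATH or built into the JDK.
    static XMLReader newXmlReader() {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);   // assumption: namespace-aware parsing is wanted
            return factory.newSAXParser().getXMLReader();
        } catch (ParserConfigurationException | SAXException e) {
            throw new IllegalStateException("No usable SAX2 parser found", e);
        }
    }

    public static void main(String[] args) {
        XMLReader reader = newXmlReader();
        System.out.println("Using XMLReader: " + reader.getClass().getName());
        // Assumed hand-off, not verified against the XQEngine sources:
        // engine.setXMLReader( reader );
    }
}
```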
Some of the query engine's APIs use a JavaBean-style "setter/getter" nomenclature. To index a document, for example, you call setDocument(fileAddress), passing in one of:
- the local pathname of the file as a String, or a file://-based URL, if it's on your local drive,
- an http://-based URL if it's on a remote machine, or
- a scheme-prefixed custom address to invoke the content() method of a pre-registered custom protocol handler.
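The three address forms can be told apart by their prefix alone. The following sketch shows one way to do that; AddressKindSketch is a hypothetical helper, "myscheme:" is a made-up custom scheme, and XQEngine's own dispatch logic may well differ.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; XQEngine's own address handling may differ.
final class AddressKindSketch {

    enum Kind { LOCAL_FILE, HTTP_URL, CUSTOM_SCHEME }

    // A scheme is a letter followed by letters, digits, '+', '-' or '.', then ':'.
    // Requiring at least two characters keeps a drive letter like "C:" from matching.
    private static final Pattern SCHEME =
        Pattern.compile("^([A-Za-z][A-Za-z0-9+.-]+):");

    static Kind classify(String address) {
        Matcher m = SCHEME.matcher(address);
        if (!m.find()) return Kind.LOCAL_FILE;               // plain pathname
        String scheme = m.group(1).toLowerCase(Locale.ROOT);
        if (scheme.equals("file")) return Kind.LOCAL_FILE;   // file: URL on the local drive
        if (scheme.equals("http")) return Kind.HTTP_URL;     // remote document
        return Kind.CUSTOM_SCHEME;                           // left to a custom handler
    }

    public static void main(String[] args) {
        String[] samples = {
            "file:/C:/XQEngine/testFiles/darkness/darkness.xml",
            "C:\\XQEngine\\testFiles\\bib_2.xml",
            "XQEngine/testFiles/bib.xml",
            "http://example.com/doc.xml",
            "myscheme:some/address"
        };
        for (String s : samples) {
            System.out.println(classify(s) + "  " + s);
        }
    }
}
```

The two-character minimum on the scheme name is the design choice worth noting: without it, a Windows path such as C:\XQEngine\... would be mistaken for a custom scheme named "C".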
Here are some samples of setDocument() calls to index documents being addressed using options #1 and #2 above:
|
String fileName;
fileName = "file:/C:/XQEngine/testFiles/darkness/darkness.xml";
m_engine.setDocument( fileName );
fileName = "http://www.fatdog.com/XmlInXml.xml";
m_engine.setDocument( fileName );
fileName = "C:\\XQEngine\\testFiles\\bib_2.xml";
m_engine.setDocument( fileName );
fileName = "XQEngine/testFiles/bib.xml";
m_engine.setDocument( fileName );
|
|
You can see an example of option #3, using a custom protocol handler, at usage.
These method calls all ignore the docId returned by setDocument(). This int result can be used both as an argument to the doc() built-in function and to retrieve a document's filename (see the API for getDocumentName()).
Once you've indexed some documents, by whichever method, you can pose queries against the resulting index (or repository, or collection) using setQuery( queryString ), where queryString is any valid query as defined in the XQuery or XPath specifications (although XQEngine currently supports far less of XQuery than the full standard). Queries are posed against the combined content of all the documents you've indexed up to that point.
You can also work with explicit documents, which let you index and query XML documents that aren't file-based. They're called "explicit" because, rather than passing in a filename as above, you pass in the literal content of the document itself. The API for this is called (no surprise) setExplicitDocument(String docContents). It lets you index and query any sort of XML document, as long as you can supply the engine with a copy of that document in serialized, "angle bracket" form on demand.
Here's an example of a call to setExplicitDocument() :
|
// Quotes inside the XML must be escaped within a Java string literal.
int docId = m_engine.setExplicitDocument(
"<log>" +
"<entry num=\"1\" date=\"030615\"><hits>1556</hits></entry>" +
"<entry num=\"2\" date=\"030616\"><hits>2033</hits></entry>" +
"<entry num=\"3\" date=\"030617\"><hits>2033</hits></entry>" +
"<entry num=\"4\" date=\"030618\"><hits>2033</hits></entry>" +
"<entry num=\"5\" date=\"030619\"><hits>2033</hits></entry>" +
// ...
"</log>" );
|
|
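When the XML lives in memory as a DOM tree rather than as a string, it has to be serialized before it can be passed in. The sketch below builds a one-entry log with the JDK's standard DOM classes and turns it into "angle bracket" text with javax.xml.transform. The class name ExplicitDocumentSketch is made up, and the hand-off to the engine appears only as a comment.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Illustrative only; uses standard JDK classes, not XQEngine ones.
final class ExplicitDocumentSketch {

    // Builds a tiny <log> document in memory and returns its serialized form.
    static String logAsXml() {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                                                 .newDocumentBuilder()
                                                 .newDocument();
            Element log = doc.createElement("log");
            doc.appendChild(log);

            Element entry = doc.createElement("entry");
            entry.setAttribute("num", "1");
            entry.setAttribute("date", "030615");
            Element hits = doc.createElement("hits");
            hits.setTextContent("1556");
            entry.appendChild(hits);
            log.appendChild(entry);

            // An identity transform copies the DOM tree to text unchanged.
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (ParserConfigurationException | TransformerException e) {
            throw new IllegalStateException("Could not serialize document", e);
        }
    }

    public static void main(String[] args) {
        String xml = logAsXml();
        System.out.println(xml);
        // The string could then be handed to the engine, for example:
        // int docId = m_engine.setExplicitDocument( xml );
    }
}
```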
Here's a canonical example of working with an XQEngine ResultList object to get serialized XML output from a query:
|
ResultList results = m_engine.setQuery( queryString );
results.emitXml( );
|
|
ResultList.emitXml(), shown below, in turn calls the emitXml() method of each of its subordinate DocItems objects through an enumeration-style interface. (DocItems was called DocumentItems in previous versions of the program.)
|
public void emitXml( PrintWriter pw )
//-----------------------------------
{
DocItems doc;
while (( doc = nextDocument() ) != null )
{
doc.emitXml( );
}
}
|
|
Here are some examples of the built-in functions that XQEngine currently supports, adapted from the JUnit tests in the TestCase file "Function.java" (see XQEngine JUnit Tests):
count()
|
ResultList hits = m_engine.setQuery( "count( //node() )" );
int count = hits.evaluateAsInteger();
// => count == 90 (for "bib.xml")
|
|
|
ResultList hits = m_engine.setQuery( "count( 1,'two',3 )" );
int count = hits.evaluateAsInteger();
// => count == 3
|
|
|
ResultList hits = m_engine.setQuery( "count( () )" );
int count = hits.evaluateAsInteger();
// => count == 0
|
|
doc()
|
String query = "doc( 'C:/XQEngine_TestFiles/bib.xml')//editor/last/text()";
ResultList hits = m_engine.setQuery( query );
hits.emitXml( );
// => Gerbarg
|
|
contains-word()
|
ResultList hits = m_engine.setQuery( "contains-word( //title, 'tcp' )" );
hits.emitXml( );
// => <title>TCP/IP Illustrated</title>
|
|
exists()
|
ResultList hits = m_engine.setQuery( "exists( //title )" );
hits.emitXml( );
// => true
|
|
empty()
|
ResultList hits = m_engine.setQuery( "empty( //noSuchElementThatIKnowOf )" );
hits.emitXml( );
// => true
|
|
|