PhiloLogic 4 Query Syntax
PhiloLogic4’s query syntax has five basic operators:
the plain token, e.g. token
the quoted token, e.g. "token"
the range, e.g. a-f
OR, written |, e.g. token | word
NOT, e.g. token.* NOT tokens
The syntax is the same for both types of text query field, word search and metadata search, but each interprets it slightly differently.
Full-text word search is unique in having the concept of a “term”, which is either a single plain or quoted token, or a group of plain or quoted tokens joined by |, optionally followed by NOT and another term-like filter expression.
When specifying a query, one can select a query method to constrain the relation between terms, such as within k words or in the same sentence.
OR can conjoin plain and quoted tokens, and is evaluated before phrase distance.
NOT is a filter on a preceding term, but cannot stand alone: a.* NOT abalone is legal, NOT a.* is illegal.
Metadata search does not support phrases, but supports more sophisticated Boolean searching.
OR can still be used to conjoin plain tokens, as well as quoted tokens, and is evaluated before the implied Boolean AND.
NOT is still available both as a filter and as a stand-alone negation: contrat NOT social is legal, and so is NOT rousseau.
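The difference in how the two field types treat a leading NOT can be sketched in a few lines of Python. This is only an illustration of the rule stated above, not PhiloLogic's own parser: the function names and the "word" / "metadata" labels are hypothetical, and the check looks at nothing except whether the query begins with NOT.

```python
import re


def tokenize(query):
    """Split a query on whitespace, keeping quoted tokens whole."""
    return re.findall(r'"[^"]*"|\S+', query)


def is_legal(query, field_type):
    """Check only the stand-alone NOT rule.

    field_type is "word" or "metadata" (hypothetical labels,
    not PhiloLogic identifiers).
    """
    tokens = tokenize(query)
    if tokens and tokens[0] == "NOT":
        # A leading NOT has no preceding term to filter, so it is
        # a stand-alone negation: allowed in metadata search only.
        return field_type == "metadata"
    return True


print(is_legal("a.* NOT abalone", "word"))         # True
print(is_legal("NOT a.*", "word"))                 # False
print(is_legal("contrat NOT social", "metadata"))  # True
print(is_legal("NOT rousseau", "metadata"))        # True
```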
Metadata objects also have the unique property of recursion, which creates some unusual consequences for search semantics. Searching for a div that has property NOT x does not guarantee that the result does not contain a child with property x, or a parent with property x. This can often be handled at the database level by normalizing metadata to a single fine-grained layer, but is tricky. Likewise, searches for NULL values in recursive objects will often return “virtual” PhiloLogic objects, which don’t exist in the XML but are necessary for balanced tree arithmetic.
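A toy model may make the recursion issue concrete. The sketch below assumes a hypothetical Node class whose props set stands in for an object's own metadata; it does not reflect how PhiloLogic stores or queries objects. It shows that a NOT x test applied to each object in isolation can return a div whose parent and child both carry x.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical stand-in for a recursive metadata object."""
    name: str
    props: set
    children: list = field(default_factory=list)


def matches_not(node, prop):
    # Looks only at the node's own metadata, not at parents or children.
    return prop not in node.props


def walk(node):
    """Yield the node and all of its descendants, depth first."""
    yield node
    for child in node.children:
        yield from walk(child)


inner = Node("div2", {"x"})
middle = Node("div1", {"y"}, [inner])
root = Node("doc", {"x"}, [middle])

hits = [n.name for n in walk(root) if matches_not(n, "x")]
print(hits)  # ['div1']
```

Here div1 satisfies NOT x even though it sits inside doc and contains div2, both of which have property x.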
Basic regexp syntax is adapted from the egrep regular expression syntax:
. matches any single character except newline.
[aeiou] or [a-z] matches any one character from the listed set or range, but will only match a single character unless followed by one of the quantifiers below.
* indicates that the regular expression should match zero or more occurrences of the previous character or bracketed group.
+ indicates that the regular expression should match one or more occurrences of the previous character or bracketed group.
? indicates that the regular expression should match zero or one occurrence of the previous character or bracketed group.
Thus, .* is an approximate “match anything” wildcard operator, rather than the more traditional (but less precise) * in many other search engines.
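The operators above can be tried out with Python's re module, which behaves like egrep for this small subset. The word list is invented for illustration, and anchoring each pattern to the whole word with re.fullmatch is an assumption of this sketch rather than a documented PhiloLogic behavior.

```python
import re

# Invented sample vocabulary, for illustration only.
words = ["token", "tokens", "taken", "tkn", "abalone"]


def matching(pattern):
    """Return the words that the pattern matches in full."""
    return [w for w in words if re.fullmatch(pattern, w)]


print(matching("token.*"))   # ['token', 'tokens']
print(matching("t[ao]ken"))  # ['token', 'taken']
print(matching("tokens?"))   # ['token', 'tokens']
print(matching("t.+n"))      # ['token', 'taken', 'tkn']
print(matching(".*"))        # every word in the list
```

Note that token.* matches token itself as well as tokens, because .* also accepts zero characters.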