Filter Operators

SQL-like operators for metadata filters in bucket search — eq, ne, like, prefix, in, gt/gte/lt/lte, exists.

Bucket search accepts a filter object that narrows results by metadata. A plain scalar value is an exact match, so existing filters keep working unchanged (backward compatible). To express a richer predicate, wrap the value in an operator dict.

Operators

| Operator | Type | Example | Notes |
|---|---|---|---|
| `eq` | any | `{"eq": "urgent"}` | Exact match. Lowered to engine fast path. |
| `ne` | any | `{"ne": "draft"}` | Not-equal. Post-filter. |
| `like` | string | `{"like": "%legal%"}` | SQL `LIKE`: `%` = any chars, `_` = single char, `\%` / `\_` = literal. Case-insensitive. |
| `prefix` | string | `{"prefix": "2024-"}` | Starts-with. Case-insensitive. |
| `in` | array | `{"in": ["a","b","c"]}` | Membership. Max 100 entries. |
| `gt` / `gte` | number / ISO date | `{"gte": 0.8}` | Greater-than (or equal). Numeric coercion, ISO date string fallback. |
| `lt` / `lte` | number / ISO date | `{"lt": 100}` | Less-than (or equal). |
| `exists` | boolean | `{"exists": true}` | `true` requires the field to be present and non-empty; `false` requires it to be missing. |
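The Python sketch below evaluates one operator dict against one metadata value, to make the table's semantics concrete. It is not Schift's engine code: the names `match_op` and `MISSING`, the rule that absent keys fail every operator except `exists: false`, ANDing several operators placed in one dict, and the numeric-then-ISO-date coercion order are all assumptions. `like` is left out for brevity.

```python
import operator
from datetime import datetime
from typing import Any

MISSING = object()      # sentinel: the key is absent from the document's metadata
MAX_IN_ENTRIES = 100    # documented cap for "in" lists
_RANGE = {"gt": operator.gt, "gte": operator.ge, "lt": operator.lt, "lte": operator.le}

def _coerce(value: Any) -> Any:
    """Numeric coercion first, ISO date string fallback (order is an assumption)."""
    if isinstance(value, bool):
        raise TypeError("booleans are not range-comparable")
    if isinstance(value, (int, float)):
        return float(value)
    if isinstance(value, str):
        try:
            return float(value)
        except ValueError:
            return datetime.fromisoformat(value)  # raises ValueError if not ISO 8601
    raise TypeError(f"cannot compare {value!r}")

def _match_one(actual: Any, op: str, expected: Any) -> bool:
    if op == "exists":
        present = actual is not MISSING and actual not in (None, "", [])
        return present if expected else actual is MISSING
    if actual is MISSING:
        return False  # assumption: absent keys fail every other operator
    if op == "eq":
        return actual == expected
    if op == "ne":
        return actual != expected
    if op == "prefix":  # case-insensitive starts-with
        return isinstance(actual, str) and actual.lower().startswith(str(expected).lower())
    if op == "in":
        if len(expected) > MAX_IN_ENTRIES:
            raise ValueError("'in' accepts at most 100 entries")
        return actual in expected
    if op in _RANGE:
        try:
            return _RANGE[op](_coerce(actual), _coerce(expected))
        except (TypeError, ValueError):
            return False  # uncoercible, or number vs. date: treated as a non-match (assumption)
    raise ValueError(f"operator not covered by this sketch: {op!r}")

def match_op(actual: Any, op_dict: dict) -> bool:
    """True if every operator in op_dict holds for the value (AND is an assumption)."""
    return all(_match_one(actual, op, expected) for op, expected in op_dict.items())
```

For example, `match_op("2024-06-01", {"gt": "2024-01-01"})` fails numeric coercion, falls back to date comparison, and returns `True`; comparing the same date against a number is treated as a non-match.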

Example

```bash
curl -X POST https://api.schift.io/v1/buckets/{bucket_id}/search \
  -H "Authorization: Bearer $SCHIFT_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "fire safety inspection cycle",
    "top_k": 10,
    "min_score": 0.6,
    "filter": {
      "tag": "urgent",
      "source_url": {"like": "%fire-safety%"},
      "filename":   {"prefix": "2024-"},
      "doc_type":   {"in": ["policy", "spec"]},
      "score":      {"gte": 0.8},
      "stage":      {"ne": "draft"},
      "author":     {"exists": true}
    }
  }'
```
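The same request can be built from Python with only the standard library. This sketch constructs the `urllib.request.Request` without sending it; `build_search_request` and `EXAMPLE_BODY` are illustrative names, not part of an official Schift SDK.

```python
import json
import urllib.parse
import urllib.request

API_BASE = "https://api.schift.io/v1"  # host and path taken from the curl example

def build_search_request(bucket_id: str, api_key: str, body: dict) -> urllib.request.Request:
    """Build (without sending) the same POST that the curl example issues."""
    url = f"{API_BASE}/buckets/{urllib.parse.quote(bucket_id, safe='')}/search"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

# Request body mirroring the curl example.
EXAMPLE_BODY = {
    "query": "fire safety inspection cycle",
    "top_k": 10,
    "min_score": 0.6,
    "filter": {
        "tag": "urgent",
        "source_url": {"like": "%fire-safety%"},
        "filename": {"prefix": "2024-"},
        "doc_type": {"in": ["policy", "spec"]},
        "score": {"gte": 0.8},
        "stage": {"ne": "draft"},
        "author": {"exists": True},
    },
}
```

Send it with `urllib.request.urlopen(req)` and decode the body with `json.load`; the response schema is not described here.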

Filters are conjunctive (AND)

Every top-level clause must match. Use `in` for OR within a single key, or `$or` (below) for OR across keys.

Cross-key OR (`$or`)

Use $or for disjunctive predicates across different metadata keys. Each arm is a full sub-filter dict and supports any operator.

```json
{
  "filter": {
    "doc_type": "policy",
    "$or": [
      {"severity": "high"},
      {"priority": {"in": ["P0", "P1"]}}
    ]
  }
}
```

This filter matches documents where `doc_type` is `policy` AND (`severity` is `high` OR `priority` is `P0` or `P1`). `$or` can be nested up to 3 levels deep, with at most 16 arms per `$or`.
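A minimal Python sketch of how top-level AND and `$or` combine, including the documented limits of 3 nesting levels and 16 arms. The function names are hypothetical; counting a top-level `$or` as level 1 and rejecting an empty `$or` are assumptions, and leaves handle only plain scalars, `eq`, and `in` to keep it short.

```python
from typing import Any

MAX_OR_DEPTH = 3  # documented: nesting allowed up to 3 levels
MAX_OR_ARMS = 16  # documented: max 16 arms per $or

def validate(filt: dict, depth: int = 0) -> None:
    """Reject filters exceeding the $or limits (a top-level $or counts as level 1)."""
    for key, cond in filt.items():
        if key != "$or":
            continue
        if depth + 1 > MAX_OR_DEPTH:
            raise ValueError("$or nested more than 3 levels deep")
        if not isinstance(cond, list) or not 1 <= len(cond) <= MAX_OR_ARMS:
            raise ValueError("$or needs a list of 1 to 16 sub-filters")
        for arm in cond:
            validate(arm, depth + 1)

def _leaf(metadata: dict, key: str, cond: Any) -> bool:
    if key not in metadata:
        return False
    value = metadata[key]
    if not isinstance(cond, dict):
        return value == cond  # plain scalar: exact match
    if set(cond) == {"eq"}:
        return value == cond["eq"]
    if set(cond) == {"in"}:
        return value in cond["in"]  # OR within a single key
    raise ValueError(f"operator not covered by this sketch: {cond!r}")

def _eval(metadata: dict, filt: dict) -> bool:
    for key, cond in filt.items():
        if key == "$or":
            if not any(_eval(metadata, arm) for arm in cond):
                return False  # no arm matched
        elif not _leaf(metadata, key, cond):
            return False  # top-level clauses are ANDed
    return True

def matches(metadata: dict, filt: dict) -> bool:
    validate(filt)  # check limits up front, before any short-circuiting
    return _eval(metadata, filt)
```

Validating before evaluating means an over-deep or over-wide `$or` is rejected even when an earlier clause would already have failed.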

Result ranking

Filters never modify scores; they only prune the candidate set. When post-filter operators are present, the engine fetches `top_k * 3` candidates to absorb the shrinkage from filtering, then trims the filtered results to `top_k`.
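Over-fetching can be sketched in Python as follows. `engine_search` and `post_filter` are hypothetical stand-ins for the engine's ranked retrieval and the post-filter predicate. Treating every operator other than a plain scalar or `eq` (and any `$or`) as a post-filter is an assumption based on the operator table, which marks only `eq` as fast path and `ne` as post-filter.

```python
from typing import Callable, List, Tuple

OVERFETCH_FACTOR = 3       # documented: top_k * 3 when post-filter operators are present
Hit = Tuple[str, float]    # (doc_id, score)

def has_post_filter(filt: dict) -> bool:
    """Assumption: only plain scalars and {"eq": ...} run on the engine fast path."""
    for key, cond in filt.items():
        if key == "$or":
            return True  # assumption: $or is evaluated as a post-filter
        if isinstance(cond, dict) and set(cond) != {"eq"}:
            return True
    return False

def filtered_search(
    engine_search: Callable[[int], List[Hit]],
    post_filter: Callable[[str], bool],
    filt: dict,
    top_k: int,
) -> List[Hit]:
    fetch_k = top_k * OVERFETCH_FACTOR if has_post_filter(filt) else top_k
    candidates = engine_search(fetch_k)  # best-first; fast-path clauses already applied
    survivors = [hit for hit in candidates if post_filter(hit[0])]
    return survivors[:top_k]  # scores pass through unchanged; filtering only prunes
```

In this sketch, if more than two thirds of the over-fetched candidates are pruned, fewer than `top_k` hits are returned.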

min_score

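A top-level `min_score` (0.0–1.0) is applied after reranking and drops every hit whose final score is below the threshold. In chatbot pipelines this is a useful hallucination defense, because weakly matching passages are dropped instead of being handed to the model as context.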
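The threshold step amounts to a plain comparison on the reranked list. This Python sketch assumes hits are `(doc_id, score)` pairs; `context_or_abstain` is a hypothetical chatbot helper showing how an empty result can signal the caller to decline rather than answer from weak context.

```python
from typing import List, Optional, Tuple

Hit = Tuple[str, float]  # (doc_id, final score after rerank)

def apply_min_score(reranked: List[Hit], min_score: Optional[float]) -> List[Hit]:
    """Drop hits whose final (post-rerank) score is below min_score."""
    if min_score is None:
        return reranked
    if not 0.0 <= min_score <= 1.0:
        raise ValueError("min_score must be within 0.0-1.0")
    return [hit for hit in reranked if hit[1] >= min_score]

def context_or_abstain(reranked: List[Hit], min_score: float) -> Optional[List[str]]:
    """Hypothetical helper: None tells the caller to decline instead of answering."""
    kept = apply_min_score(reranked, min_score)
    return [doc_id for doc_id, _ in kept] or None
```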

Safety

Patterns are capped at 256 characters and 16 wildcards, and `in` lists are capped at 100 entries. Patterns are translated to anchored, case-insensitive regular expressions with all metacharacters escaped, so there is no SQL injection surface.
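One way to perform the LIKE-to-regex translation with the documented caps, sketched in Python with the standard `re` module. Counting both `%` and `_` toward the 16-wildcard limit, treating a backslash before any other character as a literal backslash, and letting `%` match newlines (`re.DOTALL`) are assumptions.

```python
import re

MAX_PATTERN_LEN = 256  # documented pattern length cap
MAX_WILDCARDS = 16     # documented wildcard cap (assumed to count both % and _)

def like_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate a SQL LIKE pattern into an anchored, case-insensitive regex."""
    if len(pattern) > MAX_PATTERN_LEN:
        raise ValueError("pattern exceeds 256 characters")
    parts, wildcards, i = [], 0, 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern) and pattern[i + 1] in "%_":
            parts.append(re.escape(pattern[i + 1]))  # \% or \_ is a literal character
            i += 2
            continue
        if ch == "%":
            parts.append(".*")  # any run of characters, including none
            wildcards += 1
        elif ch == "_":
            parts.append(".")   # exactly one character
            wildcards += 1
        else:
            parts.append(re.escape(ch))  # everything else, regex metacharacters included, is literal
        i += 1
    if wildcards > MAX_WILDCARDS:
        raise ValueError("pattern has more than 16 wildcards")
    # \A and \Z anchor the match to the whole value.
    return re.compile(r"\A" + "".join(parts) + r"\Z", re.IGNORECASE | re.DOTALL)

def like(value: str, pattern: str) -> bool:
    return like_to_regex(pattern).search(value) is not None
```

Because the regex is built only from escaped literals plus `.*` and `.`, user input can never inject regex syntax, let alone SQL.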