POST /search/query/async/start

Start an asynchronous search query job against a SearchIndex.

The request wraps a SearchQuery — the top-level OpenSearch _search body, allowlist-validated server-side and submitted to AOSS. Each slot's contents are pass-through OpenSearch DSL.

query is required (use {"match_all":{}} to match all documents). from and search_after are mutually exclusive: when search_after is supplied the server pins from=0 internally; supplying both search_after and from > 0 is rejected with HTTP 400.

Two pagination modes. Use from + size for the simple "jump to page N" case. Use search_after for deep pagination past the OpenSearch from + size ~10,000-row ceiling: omit search_after on the first request and on every subsequent request pass back the previous response's nextSearchAfter unchanged. Cursors are stable as long as the underlying sort is unchanged. For shallow paging within the first ~10,000 rows, prefer from + size.
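As a rough illustration of the two modes, the sketch below builds SearchQuery bodies in Python. The helper names (page_by_offset, page_by_cursor) are hypothetical and not part of any Synapse client; the code only assembles the documented top-level keys and does not call the service.

```python
import copy


def page_by_offset(query, page, size=25):
    """Body for the simple "jump to page N" case (page is zero-based)."""
    return {"query": query, "from": page * size, "size": size}


def page_by_cursor(query, sort=None, size=25, next_search_after=None):
    """Body for cursor paging with search_after.

    The first request omits search_after; later requests pass back the
    previous response's nextSearchAfter unchanged. `from` is never set,
    because a non-zero `from` alongside search_after is rejected.
    """
    body = {"query": query, "size": size}
    if sort is not None:
        # Keep the sort identical across pages so the cursor stays stable.
        body["sort"] = copy.deepcopy(sort)
    if next_search_after is not None:
        body["search_after"] = next_search_after  # opaque, passed through as-is
    return body
```

A caller would typically loop: send page_by_cursor(...), read nextSearchAfter from the response, and feed it into the next call until no cursor is returned. How the end of the result set is signalled is not stated here, so that stopping condition is an assumption.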

Field references and sub-field routing

Field references use column names everywhere (DSL clauses, aggregation field, highlight.fields keys, sort, _source.includes / _source.excludes). The server resolves names to internal column ids before sending to OpenSearch and rewrites them back to column names on response so callers see their original schema.

Pass the bare column name in every clause. The server knows the index schema and routes text-typed columns (STRING, STRING_LIST, MEDIUMTEXT, LARGETEXT, LINK) through {column}.keyword automatically when the operation requires it: term / terms / prefix / wildcard / fuzzy / range / match_phrase_prefix, every aggregation kind, sort, and collapse. The relevance-scored match-family clauses (match, multi_match, match_phrase, match_bool_prefix, simple_query_string) use the analyzed text field directly. Numeric, boolean, keyword (ENTITYID / USERID), and date columns always use the bare name.

Aggregation results come back with field references reported as the caller's bare column name — the server strips the .keyword suffix it auto-appended on the request side. Callers who prefer to be explicit may still supply {columnName}.keyword on a reference; the server preserves the suffix verbatim. The {field}^{boost} form on multi_match.fields is also preserved.
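The routing rule can be pictured with a small client-side model. This is only an illustration of the behaviour described above, not the server's implementation: the function name routed_field is hypothetical, the column names and the INTEGER type in the usage are made up, and the real server additionally maps names to internal column ids.

```python
# Column types the text above calls "text-typed".
TEXT_TYPES = {"STRING", "STRING_LIST", "MEDIUMTEXT", "LARGETEXT", "LINK"}

# Operations that are routed through {column}.keyword for text-typed columns.
KEYWORD_OPERATIONS = {
    "term", "terms", "prefix", "wildcard", "fuzzy", "range",
    "match_phrase_prefix", "aggregation", "sort", "collapse",
}


def routed_field(column, column_type, operation):
    """Return the field name a request would effectively target."""
    if column.endswith(".keyword"):
        return column  # an explicit suffix is preserved verbatim
    if column_type in TEXT_TYPES and operation in KEYWORD_OPERATIONS:
        return column + ".keyword"
    return column  # match-family clauses and non-text columns use the bare name


print(routed_field("title", "STRING", "term"))    # title.keyword
print(routed_field("title", "STRING", "match"))   # title
print(routed_field("age", "INTEGER", "range"))    # age
```

In practice callers never need to do this themselves; passing the bare column name is enough.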

Allowlisted top-level keys

query — required. See the OpenSearch query DSL. Supports compound (bool / dis_max / constant_score / boosting) and leaf (match / multi_match / match_phrase / match_phrase_prefix / match_bool_prefix / term / terms / range / exists / prefix / wildcard / fuzzy / simple_query_string / match_all) clauses. The server wraps the supplied subtree as a must clause inside its own bool.

post_filter — optional. Same DSL shape as query, applied after aggregations are computed: aggregations see the unfiltered population (matched by query) while the returned hits are narrowed by post_filter. For filters that should also constrain aggregations, place them inside query.bool.filter instead.

aggregations — optional. Map of caller-chosen name to aggregation definition. Supports terms / histogram / date_histogram / range / date_range / min / max / avg / sum / stats / extended_stats / value_count / cardinality / missing, with nested sub-aggregations. Aggregations need doc values; text-typed columns are auto-routed through .keyword. The raw aggregation result comes back on SearchQueryResults.aggregationResults, with field references rewritten back to bare column names.

highlight — optional. Adds per-field snippet fragments with matched terms wrapped in <em> / </em> tags (configurable via pre_tags / post_tags) to each SearchQueryResults.hits[*].highlights entry. highlight.fields keys are caller column names. Allowlisted highlighter types: unified (default), plain, fvh; semantic is rejected.

collapse — optional. Groups the result list so only one hit is returned per distinct value of field. Collapse needs doc values; text-typed columns are auto-routed through .keyword. inner_hits is rejected.

rescore — optional. Re-ranks the top window_size hits returned by query using a secondary, typically more expensive, scoring query. The original ranking is preserved past the rescore window. The inner rescore_query is validated against the same allowlist as query.

sort — optional. OpenSearch sort shape (string column name, {column: "asc|desc"}, or {column: {order: ...}}). The pseudo-column _score sorts by relevance. When omitted, results are sorted by relevance descending (_score DESC). Text-typed columns are auto-routed through .keyword.

_source — optional. Source filter. Accepts the full OpenSearch SourceConfig shape: a boolean (false to omit _source entirely), an array of column-name patterns (shorthand for {includes: [...]}), or {includes: [...], excludes: [...]}. Column names are rewritten to column ids before the request is sent to AOSS.

from — optional. Zero-based pagination offset; default 0. Maximum reach: from + size of roughly 10,000. For deeper pagination, switch to search_after; when a cursor is supplied the server pins from to 0, and a from greater than 0 is rejected with HTTP 400.

size — optional. Maximum number of hits to return per page. Default: 25. Maximum: 100 (larger values are silently capped). Set to 0 with HITS omitted from SearchIndexQuery.responseParts to retrieve only aggregation counts.

search_after — optional. Opaque cursor emitted as nextSearchAfter on the previous response. Pass back unchanged. Stable as long as the underlying sort is unchanged. Mutually exclusive with from > 0.

Any other top-level key returns HTTP 400 naming the offender.
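To make the allowlist concrete, here is a minimal sketch of a SearchQuery body together with a client-side pre-check that flags keys the server would reject. The column names (status, description, species, name) and the helper unknown_top_level_keys are hypothetical; the server remains the authority on what is accepted.

```python
# Top-level keys listed in this section.
ALLOWED_TOP_LEVEL_KEYS = {
    "query", "post_filter", "aggregations", "highlight", "collapse",
    "rescore", "sort", "_source", "from", "size", "search_after",
}


def unknown_top_level_keys(body):
    """Return the keys of `body` that are not on the documented allowlist."""
    return sorted(key for key in body if key not in ALLOWED_TOP_LEVEL_KEYS)


example_body = {
    # Filters placed in query.bool.filter constrain hits AND aggregations.
    "query": {
        "bool": {
            "must": [{"match": {"description": "kinase"}}],
            "filter": [{"term": {"status": "active"}}],
        }
    },
    # post_filter narrows the returned hits only; the aggregation below
    # still sees every document matched by `query`.
    "post_filter": {"term": {"species": "human"}},
    "aggregations": {"by_species": {"terms": {"field": "species", "size": 10}}},
    "_source": {"includes": ["name", "species"]},
    "from": 0,
    "size": 25,
}

print(unknown_top_level_keys(example_body))  # []
```

Running the check before submitting a job can save a round trip, but it only covers the top level; nested clauses are validated separately by the server.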

Per-request limits

Violations return HTTP 400 with a message naming the limit:

Per-request Limits by Surface

Surface        Limit                                    Value
query          max nesting depth                        20
query          max total clauses                        256
query          max inline terms array length            1,024
aggregations   max nesting depth                        10
aggregations   max total aggregations                   100
aggregations   max bucket size / shard_size             1,000
highlight      max fields entries                       50
highlight      max number_of_fragments per field        100
highlight      max fragment_size per field              1,000
collapse       max max_concurrent_group_searches        10
rescore        max window_size (single stage only)      1,000

The query limits above apply equally to post_filter and rescore.rescore_query. In query / post_filter / rescore.rescore_query, prefix values starting with * or ? are rejected.

Any nested highlight_query is validated against the same allowlist as query.
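Some of these limits are easy to check before a job is submitted. The sketch below covers only two of them, the inline terms array length and the leading * or ? in prefix values; find_limit_problems is a hypothetical helper. It does not attempt nesting depth or clause counts, because how the server counts those is not specified here.

```python
MAX_INLINE_TERMS = 1024  # documented cap on an inline terms array


def find_limit_problems(node, problems=None):
    """Walk a query subtree and collect the two violations described above."""
    if problems is None:
        problems = []
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "terms" and isinstance(value, dict):
                for column, terms in value.items():
                    if isinstance(terms, list) and len(terms) > MAX_INLINE_TERMS:
                        problems.append(f"terms on {column!r}: {len(terms)} values")
            elif key == "prefix" and isinstance(value, dict):
                for column, spec in value.items():
                    # prefix accepts either a bare string or {"value": "..."}
                    text = spec.get("value") if isinstance(spec, dict) else spec
                    if isinstance(text, str) and text[:1] in ("*", "?"):
                        problems.append(f"prefix on {column!r} starts with {text[0]!r}")
            find_limit_problems(value, problems)
    elif isinstance(node, list):
        for item in node:
            find_limit_problems(item, problems)
    return problems
```

The same function can be applied to post_filter and rescore.rescore_query, since the query limits apply to them equally. A column that happens to be named terms or prefix could confuse this simple walk, which is one reason to treat it as a convenience check only.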

Disallowed clauses

The following are rejected anywhere in the body with HTTP 400:

  • Inside query / post_filter / rescore.rescore_query / highlight.highlight_query: script, script_score, function_score, more_like_this, geo_shape / shape with an indexed shape, has_child / has_parent, terms-lookup form, percolate, wrapper.
  • Inside aggregations: scripted aggregations, pipeline aggregations, embedded script.
  • Inside highlight: embedded script or indexed_shape.
  • Inside collapse: inner_hits.
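A simplified client-side scan for the first bullet might look like the following. It is a sketch under stated assumptions: find_disallowed is a hypothetical helper, it matches on key names only, and it does not detect the terms-lookup form or geo_shape / shape with an indexed shape, which need structural checks.

```python
# Clause names from the list above that can be detected by key alone.
DISALLOWED_QUERY_KEYS = {
    "script", "script_score", "function_score", "more_like_this",
    "has_child", "has_parent", "percolate", "wrapper",
}


def find_disallowed(node, path="query"):
    """Return dotted paths of disallowed clause names found under `node`."""
    found = []
    if isinstance(node, dict):
        for key, value in node.items():
            child_path = f"{path}.{key}"
            if key in DISALLOWED_QUERY_KEYS:
                found.append(child_path)
            found.extend(find_disallowed(value, child_path))
    elif isinstance(node, list):
        for index, item in enumerate(node):
            found.extend(find_disallowed(item, f"{path}[{index}]"))
    return found


query = {"bool": {"must": [{"match": {"title": "kinase"}},
                           {"function_score": {"query": {"match_all": {}}}}]}}
print(find_disallowed(query))  # ['query.bool.must[1].function_score']
```

Because the scan is name-based, a column that is itself called script or wrapper would produce a false positive; the server's HTTP 400 response is the definitive signal.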

Results are returned as a SearchQueryResults. Use GET /search/query/async/get/{asyncToken} to poll for results: while the job is running, the GET returns HTTP 202 with an AsynchronousJobStatus.
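The start-then-poll flow can be sketched as follows. Several things here are assumptions rather than documented facts: the http callable is a hypothetical transport that takes (method, path, json_body) and returns (status_code, parsed_json), the AsyncJobId response is assumed to carry the job token in a field named token, and the polling interval and retry count are arbitrary.

```python
import time


def run_search(http, start_body, poll_interval=1.0, max_polls=60, sleep=time.sleep):
    """Start a search job and poll until results are ready.

    `start_body` is the SearchIndexQuery request object; its exact shape
    is not described in this section, so it is passed through untouched.
    """
    status, job = http("POST", "/search/query/async/start", start_body)
    if status >= 400:
        raise RuntimeError(f"start failed with HTTP {status}: {job}")
    token = job["token"]  # assumption: AsyncJobId exposes the token as "token"

    for _ in range(max_polls):
        status, payload = http("GET", f"/search/query/async/get/{token}", None)
        if status == 202:
            # Job still running; payload is an AsynchronousJobStatus.
            sleep(poll_interval)
            continue
        if status >= 400:
            raise RuntimeError(f"job failed with HTTP {status}: {payload}")
        return payload  # SearchQueryResults
    raise TimeoutError(f"job {token} did not finish after {max_polls} polls")
```

Injecting the transport keeps the sketch independent of any particular HTTP library and makes the loop easy to exercise with a fake; a real caller would wrap an authenticated client pointed at the resource URL below.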

The caller must have READ access to the source table or view that backs the SearchIndex. Row-level access is enforced automatically: rows the caller cannot read are filtered out of the results before they leave the server.

Resource URL

https://repo-prod.prod.sagebase.org/repo/v1/search/query/async/start

Resource Information

Authentication          Required
Required OAuth Scopes   view
HTTP Method             POST
Request Object          SearchIndexQuery (application/json)
Response Object         AsyncJobId (application/json)