POST /search/query/async/start
Start an asynchronous search query job against a SearchIndex.
The request wraps a SearchQuery: the top-level OpenSearch _search body, which the server
validates against an allowlist and submits to AOSS. The contents of each slot are
pass-through OpenSearch DSL.
Required: query is required (use {"match_all":{}} to match all documents). from and
search_after are mutually exclusive: when search_after is supplied the server pins from=0
internally; supplying both search_after and from > 0 is rejected with HTTP 400.
Two pagination modes. Use from + size for the simple "jump to page N" case. Use
search_after for deep pagination past the OpenSearch from + size ceiling of roughly
10,000 rows: omit search_after on the first request, and on every subsequent request pass
back the previous response's nextSearchAfter unchanged. Cursors are stable as long as the
underlying sort is unchanged. For shallow paging within the first ~10,000 rows, prefer
from + size.
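The two modes can be sketched in Python as follows. This is a minimal illustration that builds only the SearchQuery body as a plain dict; the helper name page_body and the placeholder cursor value are hypothetical, and the sketch assumes the caller has already read nextSearchAfter from the previous response.

```python
from typing import Any, Optional


def page_body(query: dict, size: int = 25, from_: int = 0,
              search_after: Optional[Any] = None) -> dict:
    """Build a SearchQuery body for one page (hypothetical helper)."""
    if search_after is not None and from_ > 0:
        # Mirrors the documented rule: the server rejects this with HTTP 400.
        raise ValueError("search_after and from > 0 are mutually exclusive")
    body = {"query": query, "size": size}
    if search_after is not None:
        body["search_after"] = search_after  # pass the cursor back unchanged
    elif from_ > 0:
        body["from"] = from_
    return body


# Shallow paging: jump straight to page 3 at 25 rows per page.
page3 = page_body({"match_all": {}}, size=25, from_=50)

# Deep paging: no cursor on the first request, then the previous
# response's nextSearchAfter on every later request.
first = page_body({"match_all": {}}, size=100)
cursor = ["placeholder-cursor"]  # stand-in for a real nextSearchAfter value
second = page_body({"match_all": {}}, size=100, search_after=cursor)
```

Keeping the sort clause identical across all requests of one deep-paging run is what keeps the cursor valid.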
Field references and sub-field routing
Field references use column names everywhere (DSL clauses, aggregation
field, highlight.fields keys,
sort, _source.includes / _source.excludes). The
server resolves names to internal column ids before sending to OpenSearch and rewrites
them back to column names on response so callers see their original schema.
Pass the bare column name in every clause. The server knows the index schema and
routes text-typed columns (STRING, STRING_LIST, MEDIUMTEXT, LARGETEXT, LINK) through
{column}.keyword automatically when the operation requires it:
term / terms / prefix / wildcard /
fuzzy / range / match_phrase_prefix, every
aggregation kind, sort, and collapse. The relevance-scored
match-family clauses (match, multi_match,
match_phrase, match_bool_prefix,
simple_query_string) use the analyzed text field directly. Numeric,
boolean, keyword (ENTITYID / USERID), and date columns always use the bare name.
Aggregation results come back with field references reported as the
caller's bare column name — the server strips the .keyword suffix it
auto-appended on the request side. Callers who prefer to be explicit may still supply
{columnName}.keyword on a reference; the server preserves the suffix
verbatim. The {field}^{boost} form on multi_match.fields is
also preserved.
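The routing rule described above can be restated as a short Python sketch. It is an illustration of the documented behaviour, not the server's implementation: the function name routed_field and the operation tags "aggregation", "sort" and "collapse" are hypothetical, and the name-to-column-id rewrite is left out.

```python
# Column types the section lists as text-typed.
TEXT_TYPES = {"STRING", "STRING_LIST", "MEDIUMTEXT", "LARGETEXT", "LINK"}

# Operations the section lists as needing the .keyword sub-field.
KEYWORD_OPS = {"term", "terms", "prefix", "wildcard", "fuzzy", "range",
               "match_phrase_prefix", "aggregation", "sort", "collapse"}


def routed_field(column: str, column_type: str, operation: str) -> str:
    """Return the field reference that would be used for this operation."""
    if column.endswith(".keyword"):
        return column  # an explicit suffix is preserved verbatim
    if column_type in TEXT_TYPES and operation in KEYWORD_OPS:
        return column + ".keyword"
    return column  # match-family clauses and non-text columns use the bare name


print(routed_field("title", "STRING", "term"))   # title.keyword
print(routed_field("title", "STRING", "match"))  # title
print(routed_field("owner", "USERID", "term"))   # owner
```

In practice callers never need this logic; passing the bare column name in every clause is enough.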
Allowlisted top-level keys
query: required. See the OpenSearch query DSL. Supports compound (bool / dis_max /
constant_score / boosting) and leaf (match / multi_match / match_phrase /
match_phrase_prefix / match_bool_prefix / term / terms / range / exists / prefix /
wildcard / fuzzy / simple_query_string / match_all) clauses. The server wraps the
supplied subtree as a must clause inside its own bool.
post_filter: optional. Same DSL shape as query, applied after aggregations are computed.
Aggregations see the unfiltered population (everything matched by query), while the
returned hits are narrowed by post_filter. Filters that should also constrain
aggregations belong inside query.bool.filter instead.
aggregations: optional. Map of caller-chosen name to aggregation definition. Supports
terms / histogram / date_histogram / range / date_range / min / max / avg / sum / stats /
extended_stats / value_count / cardinality / missing, with nested sub-aggregations.
Aggregations need doc values; text-typed columns are auto-routed through .keyword. The
raw aggregation result comes back on SearchQueryResults.aggregationResults, with field
references rewritten back to bare column names.
highlight: optional. Adds per-field snippet fragments, with matched terms wrapped in
<em> / </em> tags (configurable via pre_tags / post_tags), to each
SearchQueryResults.hits[*].highlights entry. highlight.fields keys are caller column
names. Allowlisted highlighter types: unified (default), plain, fvh; semantic is
rejected.
collapse: optional. Groups the result list so only one hit is returned per distinct value
of field. Collapse needs doc values; text-typed columns are auto-routed through .keyword.
inner_hits is rejected.
rescore: optional. Re-ranks the top window_size hits returned by query using a secondary,
typically more expensive, scoring query. The original ranking is preserved past the
rescore window. The inner rescore_query is validated against the same allowlist as query.
sort: optional. OpenSearch sort shape (string column name, {column: "asc|desc"}, or
{column: {order: ...}}). The pseudo-column _score sorts by relevance. When omitted,
results are sorted by relevance descending (_score DESC). Text-typed columns are
auto-routed through .keyword.
_source: optional. Source filter. Accepts the full OpenSearch SourceConfig shape: a
boolean (false to omit _source entirely), an array of column-name patterns (shorthand for
{includes: [...]}), or {includes: [...], excludes: [...]}. Column names are rewritten to
internal column ids before the request is sent to AOSS.
from: optional. Zero-based pagination offset; default 0. Maximum reach: from + size of
roughly 10,000. For deeper pagination, switch to search_after; when a cursor is supplied,
the server pins from to 0.
size: optional. Maximum number of hits to return per page. Default: 25. Maximum: 100
(larger values are silently capped). To retrieve only aggregation results, set size to 0
and omit HITS from SearchIndexQuery.responseParts.
search_after: optional. Opaque cursor emitted as nextSearchAfter on the previous
response. Pass it back unchanged. Stable as long as the underlying sort is unchanged.
Mutually exclusive with from > 0.
Any other top-level key returns HTTP 400 naming the offender.
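As an illustration of how the allowlisted keys combine, here is a hedged Python sketch of one SearchQuery body plus a client-side check for stray top-level keys. The column names (description, species, assay, name) are made up for the example, unknown_keys is a hypothetical helper, and the sketch assumes sort accepts a list of the entry shapes listed above.

```python
import json

# Top-level keys listed in this section.
ALLOWED_TOP_LEVEL = {"query", "post_filter", "aggregations", "highlight",
                     "collapse", "rescore", "sort", "_source", "from",
                     "size", "search_after"}


def unknown_keys(body: dict) -> list:
    """Return top-level keys the server would reject with HTTP 400."""
    return sorted(key for key in body if key not in ALLOWED_TOP_LEVEL)


search_query = {
    # Scored full-text match plus a non-scoring filter that also
    # constrains the aggregation below.
    "query": {"bool": {
        "must": [{"match": {"description": "melanoma"}}],
        "filter": [{"term": {"species": "Homo sapiens"}}],
    }},
    # Narrows the returned hits only; by_assay still counts every assay.
    "post_filter": {"term": {"assay": "rnaSeq"}},
    "aggregations": {"by_assay": {"terms": {"field": "assay", "size": 20}}},
    "highlight": {"fields": {"description": {}}},
    "sort": [{"_score": "desc"}, {"name": "asc"}],  # assumed list form
    "_source": {"includes": ["name", "assay", "species"]},
    "from": 0,
    "size": 25,
}

assert unknown_keys(search_query) == []
print(json.dumps(search_query, indent=2))
```

Every field reference uses the bare column name; no .keyword suffix is needed on species, assay or name.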
Per-request limits
Violations return HTTP 400 with a message naming the limit:
Per-request Limits by Surface

| Surface | Limit | Value |
|---|---|---|
| query | max nesting depth | 20 |
| query | max total clauses | 256 |
| query | max inline terms array length | 1,024 |
| aggregations | max nesting depth | 10 |
| aggregations | max total aggregations | 100 |
| aggregations | max bucket size / shard_size | 1,000 |
| highlight | max fields entries | 50 |
| highlight | max number_of_fragments per field | 100 |
| highlight | max fragment_size per field | 1,000 |
| collapse | max max_concurrent_group_searches | 10 |
| rescore | max window_size (single stage only) | 1,000 |
The query limits above apply equally to post_filter and
rescore.rescore_query. In query / post_filter /
rescore.rescore_query, prefix values starting with
* or ? are rejected.
Any nested highlight_query is validated against the same allowlist as
query.
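A client can catch some of these violations before sending the request. The Python sketch below checks only the limits whose meaning is unambiguous from the table (terms array length, highlight limits, collapse concurrency). It does not attempt the nesting-depth or clause-count limits, since this section does not define how the server counts them. The function name limit_violations is hypothetical, and highlight.fields is assumed to be a map keyed by column name.

```python
def limit_violations(body: dict) -> list:
    """Return human-readable messages for limits this sketch can verify."""
    problems = []

    def walk_terms(node):
        # Find every {"terms": {column: [values...]}} clause in the subtree.
        if isinstance(node, dict):
            for key, value in node.items():
                if key == "terms" and isinstance(value, dict):
                    for column, values in value.items():
                        if isinstance(values, list) and len(values) > 1024:
                            problems.append(
                                f"terms array for {column!r} has "
                                f"{len(values)} values (max 1,024)")
                walk_terms(value)
        elif isinstance(node, list):
            for item in node:
                walk_terms(item)

    # The query limits also apply to post_filter and rescore.rescore_query.
    for slot in ("query", "post_filter", "rescore"):
        walk_terms(body.get(slot))

    fields = (body.get("highlight") or {}).get("fields") or {}
    if len(fields) > 50:
        problems.append(f"highlight has {len(fields)} fields (max 50)")
    for column, opts in fields.items():
        opts = opts or {}
        if opts.get("number_of_fragments", 0) > 100:
            problems.append(f"number_of_fragments on {column!r} exceeds 100")
        if opts.get("fragment_size", 0) > 1000:
            problems.append(f"fragment_size on {column!r} exceeds 1,000")

    collapse = body.get("collapse") or {}
    if collapse.get("max_concurrent_group_searches", 0) > 10:
        problems.append("max_concurrent_group_searches exceeds 10")
    return problems
```

The server remains the authority; a pre-check like this only saves a round trip for the obvious cases.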
Disallowed clauses
The following are rejected anywhere in the body with HTTP 400:
- Inside query / post_filter / rescore.rescore_query / highlight.highlight_query: script, script_score, function_score, more_like_this, geo_shape / shape with an indexed shape, has_child / has_parent, terms-lookup form, percolate, wrapper.
- Inside aggregations: scripted aggregations, pipeline aggregations, embedded script.
- Inside highlight: embedded script or indexed_shape.
- Inside collapse: inner_hits.
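A rough client-side scan for the key-named clauses in the first bullet might look like the Python sketch below. It is deliberately conservative and hypothetical (find_disallowed is not part of any API): it matches on key names only, so a column that happens to be called script would be flagged too, and it does not detect the indexed-shape or terms-lookup forms.

```python
# Key-named clauses the section lists as rejected inside query-shaped slots.
DISALLOWED_QUERY_KEYS = {"script", "script_score", "function_score",
                         "more_like_this", "has_child", "has_parent",
                         "percolate", "wrapper"}


def find_disallowed(node, path="query"):
    """Return dotted paths of disallowed keys found anywhere in the subtree."""
    found = []
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{path}.{key}"
            if key in DISALLOWED_QUERY_KEYS:
                found.append(child)
            found.extend(find_disallowed(value, child))
    elif isinstance(node, list):
        for index, item in enumerate(node):
            found.extend(find_disallowed(item, f"{path}[{index}]"))
    return found


query = {"bool": {"must": [
    {"match": {"title": "kinase"}},
    {"script_score": {"query": {"match_all": {}}}},
]}}
print(find_disallowed(query))  # ['query.bool.must[1].script_score']
```

The same function can be pointed at post_filter or a rescore query by passing a different path label.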
Results are returned as a SearchQueryResults. Use GET /search/query/async/get/{asyncToken} to poll for results. While the job is running, the GET returns HTTP 202 with an AsynchronousJobStatus.
The caller must have READ access to the source table or view that backs the
SearchIndex. Row-level access is enforced automatically: rows the caller cannot read are
filtered out of the results before they leave the server.
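The start-then-poll flow can be sketched in Python with only the standard library. Several details here are assumptions rather than facts from this section: that the access token is sent as a Bearer token, that AsyncJobId carries the job token in a field named token, and the polling interval and retry count. The SearchIndexQuery request is taken as an opaque dict because its field names are not listed here; run_search and http_json are hypothetical helper names.

```python
import json
import time
import urllib.error
import urllib.request

BASE = "https://repo-prod.prod.sagebase.org/repo/v1"


def http_json(method, url, token, payload=None):
    """Send one JSON request and return (status, parsed body)."""
    data = json.dumps(payload).encode("utf-8") if payload is not None else None
    request = urllib.request.Request(url, data=data, method=method, headers={
        "Authorization": f"Bearer {token}",  # assumed auth scheme
        "Content-Type": "application/json",
        "Accept": "application/json",
    })
    try:
        with urllib.request.urlopen(request) as response:
            return response.status, json.loads(response.read().decode("utf-8"))
    except urllib.error.HTTPError as err:
        detail = err.read().decode("utf-8", "replace")
        raise RuntimeError(f"HTTP {err.code}: {detail}") from err


def run_search(request, token, http=http_json, poll_seconds=1.0, max_polls=60):
    """Start the job, then poll until the GET stops answering HTTP 202."""
    _, job = http("POST", f"{BASE}/search/query/async/start", token, request)
    job_token = job["token"]  # assumption: AsyncJobId exposes "token"
    for _ in range(max_polls):
        status, body = http(
            "GET", f"{BASE}/search/query/async/get/{job_token}", token)
        if status != 202:
            return body  # SearchQueryResults
        time.sleep(poll_seconds)  # 202 means the job is still running
    raise TimeoutError("search job did not finish within the polling budget")
```

The http parameter exists so the transport can be swapped out, for example with a stub in tests or with a different HTTP client.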
Resource URL
https://repo-prod.prod.sagebase.org/repo/v1/search/query/async/start
| Resource Information | |
|---|---|
| Authentication | Required |
| Required OAuth Scopes | view |
| HTTP Method | POST |
| Request Object | SearchIndexQuery (application/json) |
| Response Object | AsyncJobId (application/json) |