TextAnalyzer

org.sagebionetworks.repo.model.search.table.TextAnalyzer

A shareable, named OpenSearch custom analyzer. The settings field is an opaque JSON object that holds the contents of the settings.analysis block of an OpenSearch create-index request body — not the outer settings wrapper, and not the full create-index body. The allowed root keys are char_filter, tokenizer, filter, and analyzer; index-level keys (index.max_ngram_diff, index.refresh_interval, analysis.normalizer, etc.) are out of scope here.
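
The root-key rule above can be checked on the client before submitting a record. The following is a minimal sketch, not Synapse's own validation code; the function name check_root_keys and the wording of the messages are hypothetical.

```python
# Root keys permitted in the settings object, as listed in the description above.
ALLOWED_ROOT_KEYS = {"char_filter", "tokenizer", "filter", "analyzer"}


def check_root_keys(settings):
    """Return a list of problems found at the root of a settings object.

    Hypothetical client-side helper; the server remains the authority.
    """
    if not isinstance(settings, dict):
        return ["settings must be a JSON object"]
    problems = []
    for key in settings:
        if key not in ALLOWED_ROOT_KEYS:
            problems.append(f"unexpected root key: {key!r}")
    return problems


# A value wrapped in the outer create-index envelope is flagged ...
wrapped = {"settings": {"analysis": {"analyzer": {"default": {"type": "standard"}}}}}
# ... while the bare contents of the analysis block pass.
bare = {"analyzer": {"default": {"type": "standard"}}}

print(check_root_keys(wrapped))
print(check_root_keys(bare))
```

A check like this only catches the envelope mistake and stray index-level keys early; it says nothing about whether the analyzer itself is valid.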

Synapse parses the JSON, enforces the addressability contract below, and resolves $ref entries against existing SynonymSets at index-build time. AOSS validates the rest of the analyzer shape (component types, parameters, chain ordering) at index-build time.

One TextAnalyzer record = one externally-addressable analyzer. The inner analyzer map must declare an entry named default and may optionally declare a second entry named default_search. Any other entry inside analyzer is rejected at create / update time. Curators who need separate analyzers (e.g. headline + body) create separate TextAnalyzer records. Each TextAnalyzer record is itself shareable across SearchConfigurations and organizations, so this does not duplicate the addressing model.
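
The addressability contract above can be expressed as a short check. This is an illustrative sketch under the rules stated in this section, not the server implementation; check_analyzer_entries and its messages are hypothetical names.

```python
# The only entry names permitted inside the inner analyzer map.
ALLOWED_ANALYZER_ENTRIES = {"default", "default_search"}


def check_analyzer_entries(settings):
    """Return a list of contract violations in settings["analyzer"].

    Hypothetical helper: "default" is required, "default_search" is
    optional, and every other entry name is reported.
    """
    analyzers = settings.get("analyzer")
    if not isinstance(analyzers, dict):
        return ["settings.analyzer must be a JSON object"]
    problems = []
    if "default" not in analyzers:
        problems.append("analyzer.default is required")
    for name in sorted(set(analyzers) - ALLOWED_ANALYZER_ENTRIES):
        problems.append(f"analyzer.{name} is not allowed")
    return problems


ok = {"analyzer": {"default": {"type": "standard"}}}
bad = {"analyzer": {"headline": {"type": "standard"}, "body": {"type": "standard"}}}

print(check_analyzer_entries(ok))
print(check_analyzer_entries(bad))
```

In the second case the headline and body analyzers would each become their own TextAnalyzer record, with the analyzer renamed to default in each.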

analyzer.default (required): the entry that SearchConfiguration and ColumnAnalyzerOverride bind to when they reference this TextAnalyzer by qualified name. When this TextAnalyzer is the defaultAnalyzer of a SearchConfiguration, this entry lands at the index's analysis.analyzer.default reserved slot (see OpenSearch index analyzers).

analyzer.default_search (optional): declare alongside analyzer.default when index-time and search-time should produce different token streams (the typical edge_ngram autocomplete pattern). When this TextAnalyzer is a SearchConfiguration's defaultAnalyzer, this entry lands at the index's analysis.analyzer.default_search reserved slot per OpenSearch's search-analyzer precedence.
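
As an illustration of the autocomplete pattern mentioned above, the sketch below builds a TextAnalyzer body whose index-time analyzer emits edge n-grams while the search-time analyzer leaves query terms whole. The record name autocomplete, the tokenizer name autocomplete_tok, and the gram sizes are made-up values; the tokenizer parameters follow the OpenSearch edge_ngram tokenizer, and whether a given AOSS collection accepts them is only confirmed at index-build time.

```python
import json

# Hypothetical settings: names and gram sizes are illustrative only.
settings = {
    "tokenizer": {
        "autocomplete_tok": {
            "type": "edge_ngram",
            "min_gram": 2,
            "max_gram": 10,
            "token_chars": ["letter", "digit"],
        }
    },
    "analyzer": {
        # Index time: "synapse" is indexed as "sy", "syn", "syna", ...
        "default": {
            "type": "custom",
            "tokenizer": "autocomplete_tok",
            "filter": ["lowercase"],
        },
        # Search time: the typed prefix is kept as a single lowercase token.
        "default_search": {
            "type": "custom",
            "tokenizer": "standard",
            "filter": ["lowercase"],
        },
    },
}

text_analyzer = {
    "organizationName": "biomed",
    "name": "autocomplete",
    "settings": settings,
}

print(json.dumps(text_analyzer, indent=2))
```

Without default_search the query text would also be split into edge n-grams, so a query for "syn" would match every document containing a term that starts with "sy".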

Example (the settings field is a JSON object — paste OpenSearch examples directly without escaping):

{
  "organizationName": "biomed",
  "name": "publications",
  "settings": {
    "char_filter": { "strip_html": { "type": "html_strip" } },
    "tokenizer":   { "std": { "type": "standard" } },
    "filter": {
      "english_stop": { "type": "stop", "stopwords": "_english_" },
      "med_syn":      { "$ref": "biomed-medical_terms" }
    },
    "analyzer": {
      "default": {
        "type": "custom",
        "tokenizer": "std",
        "char_filter": ["strip_html"],
        "filter": ["lowercase", "med_syn", "english_stop"],
        "position_increment_gap": 100
      }
    }
  }
}

Cross-resource references — $ref: anywhere in the analyzer's filter registry map you may write {"$ref": "{organizationName}-{name}"} in place of an inline filter definition. The reference must resolve to a SynonymSet by qualified name. $ref values are resolved at index-build time; cycles are rejected. $ref entries are NOT permitted inside chain arrays (analyzer.*.filter, analyzer.*.char_filter) — those stay as plain string arrays mirroring OpenSearch.
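
The placement rules for $ref can be sketched as two small helpers: one collects references from the filter registry map, the other confirms that chain arrays hold only plain strings. Both function names are hypothetical, and the sketch does not resolve references or detect cycles, which Synapse does at index-build time.

```python
def collect_filter_refs(settings):
    """Map each filter registry name to the qualified name it references."""
    refs = {}
    filters = settings.get("filter", {})
    for filter_name, definition in filters.items():
        if isinstance(definition, dict) and "$ref" in definition:
            refs[filter_name] = definition["$ref"]
    return refs


def check_chain_arrays(settings):
    """Report chain-array items that are not plain strings (e.g. a $ref object)."""
    problems = []
    for analyzer_name, analyzer in settings.get("analyzer", {}).items():
        if not isinstance(analyzer, dict):
            continue
        for chain_key in ("filter", "char_filter"):
            for index, item in enumerate(analyzer.get(chain_key, [])):
                if not isinstance(item, str):
                    problems.append(
                        f"analyzer.{analyzer_name}.{chain_key}[{index}] must be a string"
                    )
    return problems


# Settings taken from the example earlier in this section (abridged).
settings = {
    "filter": {
        "english_stop": {"type": "stop", "stopwords": "_english_"},
        "med_syn": {"$ref": "biomed-medical_terms"},
    },
    "analyzer": {
        "default": {
            "type": "custom",
            "tokenizer": "standard",
            "filter": ["lowercase", "med_syn", "english_stop"],
        }
    },
}

print(collect_filter_refs(settings))
print(check_chain_arrays(settings))
```

The chain array refers to med_syn by its registry name; the reference to the SynonymSet lives only in the registry entry.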

Reference this analyzer from a SearchConfiguration's defaultAnalyzer field or from a ColumnAnalyzerOverride entry, using its qualified name {organizationName}-{name}.
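
A qualified name can be assembled from its two parts as sketched below. The name pattern follows the rule given for the name field of this resource and assumes that "letters" means ASCII letters; the rules for organizationName are not stated here, so that part is passed through unchecked. The helper names are hypothetical.

```python
import re

# Assumption: ASCII letters only. The name must start with a letter and
# contain only letters, digits, and underscores.
NAME_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9_]*")


def is_valid_name(name):
    """Check a resource name against the rule stated for the name field."""
    return NAME_PATTERN.fullmatch(name) is not None


def qualified_name(organization_name, name):
    """Build "{organizationName}-{name}" after validating the name part."""
    if not is_valid_name(name):
        raise ValueError(f"invalid resource name: {name!r}")
    return f"{organization_name}-{name}"


print(qualified_name("biomed", "publications"))
```

Because a valid name cannot contain a hyphen, the text after the last hyphen of a qualified name is always the resource name, whatever the organization name looks like.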

Field Type Description
id STRING The unique ID of this text analyzer.
organizationName STRING The name of the Organization this resource belongs to. Immutable after creation.
name STRING The resource name. Must start with a letter and contain only letters, digits, and underscores. Unique within the organization and immutable after creation. Used as part of the qualified name ({organizationName}-{name}) when referenced by other resources.
description STRING Optional description.
settings OBJECT

Required. JSON object holding the contents of the settings.analysis block of an OpenSearch create-index request body — see also the OpenSearch text analysis overview. Do not wrap the value in an outer {"settings": {"analysis": ...}} envelope; the allowed root keys are char_filter, tokenizer, filter, and analyzer. Index-level keys such as index.max_ngram_diff, index.refresh_interval, and analysis.normalizer are out of scope here.

The inner analyzer map must declare an entry named default and may optionally declare a second entry named default_search; any other entry under analyzer is rejected at create / update time. See the type-level description for the rationale.

Synapse parses this object and resolves any {"$ref": "{organizationName}-{name}"} entries against existing SynonymSets at index-build time; everything else passes through to AOSS verbatim, including custom analyzer shape, parameter names, and built-in component types. AOSS rejects file-based parameters (any *_path key) at index-build time; use the inline equivalents instead (stopwords, synonyms, mappings, protected_words).

etag STRING Synapse employs an Optimistic Concurrency Control (OCC) scheme.
createdOn STRING The date this resource was created.
createdBy STRING The ID of the user that created this resource.
modifiedOn STRING The date this resource was last modified.
modifiedBy STRING The ID of the user that last modified this resource.
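
Since file-based parameters are only rejected at index-build time, it can help to scan a settings object for *_path keys before creating the record. The following is a rough sketch with a hypothetical function name; it flags any key ending in _path, so a user-chosen component name with that suffix would be reported as well.

```python
def find_path_parameters(node, path="settings"):
    """Return the dotted locations of every key ending in "_path"."""
    found = []
    if isinstance(node, dict):
        for key, value in node.items():
            child_path = f"{path}.{key}"
            if key.endswith("_path"):
                found.append(child_path)
            found.extend(find_path_parameters(value, child_path))
    elif isinstance(node, list):
        for index, item in enumerate(node):
            found.extend(find_path_parameters(item, f"{path}[{index}]"))
    return found


# Hypothetical settings: one file-based parameter, one inline equivalent.
settings = {
    "filter": {
        "file_stop": {"type": "stop", "stopwords_path": "stopwords.txt"},
        "inline_stop": {"type": "stop", "stopwords": ["a", "an", "the"]},
    },
    "analyzer": {
        "default": {
            "type": "custom",
            "tokenizer": "standard",
            "filter": ["lowercase", "inline_stop"],
        }
    },
}

print(find_path_parameters(settings))
```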