TextAnalyzer

org.sagebionetworks.repo.model.search.table.TextAnalyzer

A shareable, named OpenSearch custom analyzer. The settings field is an opaque JSON object that holds the contents of the settings.analysis block of an OpenSearch create-index request body — not the outer settings wrapper, and not the full create-index body. The allowed root keys are char_filter, tokenizer, filter, and analyzer; index-level keys (index.max_ngram_diff, index.refresh_interval, analysis.normalizer, etc.) are out of scope here.
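The root-key rule above can be pre-checked on the client before a record is submitted. The following is a minimal sketch, not Synapse's own validation code; the helper name `check_root_keys` is hypothetical, and the server may report such errors differently.

```python
# Hypothetical client-side pre-check; not part of any Synapse client library.
ALLOWED_ROOT_KEYS = {"char_filter", "tokenizer", "filter", "analyzer"}


def check_root_keys(settings: dict) -> list[str]:
    """Return the root keys that fall outside the allowed set, sorted."""
    if not isinstance(settings, dict):
        raise TypeError("settings must be a JSON object")
    return sorted(set(settings) - ALLOWED_ROOT_KEYS)


# Correct shape: the contents of settings.analysis, nothing more.
ok = {
    "tokenizer": {"std": {"type": "standard"}},
    "analyzer": {"default": {"type": "custom", "tokenizer": "std"}},
}
# Common mistake: pasting the outer envelope as well.
wrapped = {"settings": {"analysis": ok}}

print(check_root_keys(ok))       # []
print(check_root_keys(wrapped))  # ['settings']
```

An empty list means the value has the expected shape; any returned key points at an envelope or index-level setting that does not belong in this field.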

Synapse parses the JSON and enforces the addressability contract below at create / update time, then resolves $ref entries against existing SynonymSets at index-build time. AOSS (Amazon OpenSearch Serverless) validates the rest of the analyzer shape (component types, parameters, chain ordering) at index-build time.

One TextAnalyzer record = one externally-addressable analyzer. The inner analyzer map must declare exactly one entry named default (required), and may optionally declare a second entry named default_search. Any other entry inside analyzer is rejected at create / update time. Curators who need separate analyzers (e.g. headline + body) create separate TextAnalyzer records — each TextAnalyzer record is itself shareable across SearchConfigurations and organizations, so this does not duplicate the addressing model.
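The addressability contract can be expressed as a small check over the inner analyzer map. This is an illustrative sketch under the rules stated above; `check_analyzer_entries` is a hypothetical helper, and the messages are not the ones Synapse returns.

```python
# Hypothetical helper illustrating the "default / default_search only" rule.
ALLOWED_ANALYZER_NAMES = {"default", "default_search"}


def check_analyzer_entries(settings: dict) -> list[str]:
    """Return a list of problems; an empty list means the contract holds."""
    analyzers = settings.get("analyzer")
    if not isinstance(analyzers, dict):
        return ["settings.analyzer must be a JSON object"]
    problems = []
    if "default" not in analyzers:
        problems.append("analyzer.default is required")
    for name in sorted(set(analyzers) - ALLOWED_ANALYZER_NAMES):
        problems.append(f"analyzer.{name} is not allowed")
    return problems


# Two named analyzers in one record violate the contract twice over:
# there is no "default", and neither name is permitted.
two_in_one = {"analyzer": {"headline": {"type": "custom", "tokenizer": "standard"},
                           "body": {"type": "custom", "tokenizer": "standard"}}}
print(check_analyzer_entries(two_in_one))
```

In the headline + body case, the fix is two TextAnalyzer records, each declaring its own `default` entry.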

analyzer.default (required): the entry that a SearchConfiguration or ColumnAnalyzerOverride binds to when it references this TextAnalyzer by qualified name. When this TextAnalyzer is the defaultAnalyzer of a SearchConfiguration, this entry lands at the index's reserved analysis.analyzer.default slot (see OpenSearch index analyzers).

analyzer.default_search (optional): declare alongside analyzer.default when index-time and search-time should produce different token streams (the typical edge_ngram autocomplete pattern). When this TextAnalyzer is a SearchConfiguration's defaultAnalyzer, this entry lands at the index's analysis.analyzer.default_search reserved slot per OpenSearch's search-analyzer precedence.
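As an illustration of the autocomplete pattern, the sketch below builds a request body in which the index-time analyzer emits edge n-grams and the search-time analyzer does not. The record name `autocomplete`, the filter name `autocomplete_edge`, and the gram sizes are assumptions chosen for the example; `edge_ngram`, `min_gram`, and `max_gram` are standard OpenSearch token filter settings.

```python
import json

# Illustrative values only: names and gram sizes are assumptions.
settings = {
    "filter": {
        "autocomplete_edge": {"type": "edge_ngram", "min_gram": 2, "max_gram": 15},
    },
    "analyzer": {
        # Index time: each token is expanded into its prefixes.
        "default": {
            "type": "custom",
            "tokenizer": "standard",
            "filter": ["lowercase", "autocomplete_edge"],
        },
        # Search time: the query is lowercased but not expanded, so a
        # typed prefix matches the indexed prefixes directly.
        "default_search": {
            "type": "custom",
            "tokenizer": "standard",
            "filter": ["lowercase"],
        },
    },
}

body = {"organizationName": "biomed", "name": "autocomplete", "settings": settings}
print(json.dumps(body, indent=2))
```

Leaving the n-gram filter out of `default_search` is the point of the pattern: expanding the query as well would make short prefixes match far more documents than intended.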

Example (the settings field is a JSON object — paste OpenSearch examples directly without escaping):

{
  "organizationName": "biomed",
  "name": "publications",
  "settings": {
    "char_filter": { "strip_html": { "type": "html_strip" } },
    "tokenizer":   { "std": { "type": "standard" } },
    "filter": {
      "english_stop": { "type": "stop", "stopwords": "_english_" },
      "med_syn":      { "$ref": "biomed-medical_terms" }
    },
    "analyzer": {
      "default": {
        "type": "custom",
        "tokenizer": "std",
        "char_filter": ["strip_html"],
        "filter": ["lowercase", "med_syn", "english_stop"],
        "position_increment_gap": 100
      }
    }
  }
}

Cross-resource references ($ref): any entry in the root filter map (the filter registry) may be written as {"$ref": "{organizationName}-{name}"} in place of an inline filter definition. The reference must resolve to a SynonymSet by qualified name. $ref values are resolved at index-build time; cycles are rejected. $ref entries are NOT permitted inside chain arrays (analyzer.*.filter, analyzer.*.char_filter); those stay plain string arrays, mirroring OpenSearch.
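The $ref rules can be sketched as two small functions: one that flags non-string entries in chain arrays, and one that swaps registry-level references for inline definitions. This is a sketch under stated assumptions, not Synapse's resolver. The `synonym_sets` lookup is a hypothetical in-memory stand-in for stored SynonymSets, the sample rule is made up, and the expansion into an OpenSearch `synonym` filter with inline `synonyms` is an assumption about what a SynonymSet resolves to.

```python
import copy


def find_chain_refs(settings: dict) -> list[str]:
    """Locate non-string entries in chain arrays, where $ref is not permitted."""
    bad = []
    for a_name, a_def in settings.get("analyzer", {}).items():
        for chain in ("filter", "char_filter"):
            for i, entry in enumerate(a_def.get(chain, [])):
                if not isinstance(entry, str):
                    bad.append(f"analyzer.{a_name}.{chain}[{i}]")
    return bad


def resolve_filter_refs(settings: dict, synonym_sets: dict) -> dict:
    """Return a copy with each filter-registry $ref replaced by an inline filter."""
    resolved = copy.deepcopy(settings)
    for f_name, f_def in list(resolved.get("filter", {}).items()):
        if isinstance(f_def, dict) and "$ref" in f_def:
            qualified = f_def["$ref"]
            if qualified not in synonym_sets:
                raise KeyError(f"filter.{f_name}: no SynonymSet named {qualified!r}")
            # Assumed expansion: an inline OpenSearch synonym filter.
            resolved["filter"][f_name] = {
                "type": "synonym",
                "synonyms": list(synonym_sets[qualified]),
            }
    return resolved


# Made-up lookup data standing in for a stored SynonymSet.
synonym_sets = {"biomed-medical_terms": ["heart attack, myocardial infarction"]}
settings = {
    "filter": {"med_syn": {"$ref": "biomed-medical_terms"}},
    "analyzer": {"default": {"type": "custom", "tokenizer": "standard",
                             "filter": ["lowercase", "med_syn"]}},
}
print(find_chain_refs(settings))
print(resolve_filter_refs(settings, synonym_sets)["filter"]["med_syn"])
```

The chain array keeps referring to `med_syn` by name before and after resolution; only the registry entry changes, which is why chains can stay plain string arrays.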

Reference this analyzer from a SearchConfiguration's defaultAnalyzer field or from a ColumnAnalyzerOverride entry, using its qualified name {organizationName}-{name}.
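A qualified name can be split back into its parts by cutting at the last hyphen, because the resource-name rule in the field table does not allow a hyphen inside `name`. The sketch below assumes ASCII letters for the name pattern and makes no assumption about which characters an organization name may contain; `split_qualified_name` is a hypothetical helper.

```python
import re

# Name rule from the field table, assuming "letters" means ASCII letters.
NAME_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9_]*")


def split_qualified_name(qualified: str) -> tuple[str, str]:
    """Split '{organizationName}-{name}' at the last hyphen."""
    org, sep, name = qualified.rpartition("-")
    if not sep or not org:
        raise ValueError(f"not a qualified name: {qualified!r}")
    if not NAME_PATTERN.fullmatch(name):
        raise ValueError(f"invalid resource name: {name!r}")
    return org, name


print(split_qualified_name("biomed-publications"))  # ('biomed', 'publications')
```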

Field	Type	Description
id	STRING	The unique ID of this text analyzer.
organizationName	STRING	The name of the Organization this resource belongs to. Immutable after creation.
name	STRING	The resource name. Must start with a letter and contain only letters, digits, and underscores. Unique within the organization and immutable after creation. Used as part of the qualified name ({organizationName}-{name}) when referenced by other resources.
description	STRING	Optional description.
settings	OBJECT	Required. JSON object holding the contents of the `settings.analysis` block of an OpenSearch create-index request body (see also the OpenSearch text analysis overview). Do not wrap the value in an outer `{"settings": {"analysis": ...}}` envelope; the allowed root keys are `char_filter`, `tokenizer`, `filter`, and `analyzer`. Index-level keys such as `index.max_ngram_diff`, `index.refresh_interval`, and `analysis.normalizer` are out of scope here. The inner `analyzer` map must declare exactly one entry named `default` (required), and may optionally declare a second entry named `default_search`; any other entry under `analyzer` is rejected at create / update time. See the type-level description for the rationale. Synapse parses this object and resolves any `{"$ref": "{organizationName}-{name}"}` entries against existing SynonymSets at index-build time; everything else passes through to AOSS verbatim, including custom analyzer shape, parameter names, and built-in component types. AOSS (Amazon OpenSearch Serverless) rejects file-based parameters (any `*_path` key) at index-build time; use the inline equivalents (`stopwords`, `synonyms`, `mappings`, `protected_words`).
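etag	STRING	Synapse employs an Optimistic Concurrency Control (OCC) scheme to handle concurrent updates. The etag changes every time the resource is updated, so it is used to detect when a client's copy of the resource is out of date.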
createdOn	STRING	The date this resource was created.
createdBy	STRING	The ID of the user that created this resource.
modifiedOn	STRING	The date this resource was last modified.
modifiedBy	STRING	The ID of the user that last modified this resource.
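The restriction on file-based parameters noted in the settings row only surfaces at index-build time, so a client-side scan before submitting can save a round trip. The following is a minimal sketch; `find_path_params` is a hypothetical helper that inspects only the parameters of each component definition, so a component that merely happens to be named something ending in `_path` is not flagged.

```python
def find_path_params(settings: dict) -> list[str]:
    """Return dotted locations of component parameters whose key ends in '_path'."""
    hits = []
    for section, components in settings.items():
        if not isinstance(components, dict):
            continue
        for comp_name, definition in components.items():
            if not isinstance(definition, dict):
                continue
            for param in definition:
                if param.endswith("_path"):
                    hits.append(f"{section}.{comp_name}.{param}")
    return sorted(hits)


settings = {
    "filter": {
        # File-based: expected to be rejected at index-build time.
        "english_stop": {"type": "stop", "stopwords_path": "analysis/stop.txt"},
        # Inline equivalent: accepted.
        "inline_stop": {"type": "stop", "stopwords": ["a", "an", "the"]},
    },
    "analyzer": {"default": {"type": "custom", "tokenizer": "standard",
                             "filter": ["lowercase", "inline_stop"]}},
}
print(find_path_params(settings))  # ['filter.english_stop.stopwords_path']
```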