TextAnalyzer
org.sagebionetworks.repo.model.search.table.TextAnalyzer
A shareable, named OpenSearch custom analyzer. The settings field is an opaque JSON object that holds the contents of the settings.analysis block of an OpenSearch create-index request body — not the outer settings wrapper, and not the full create-index body. The allowed root keys are char_filter, tokenizer, filter, and analyzer; index-level keys (index.max_ngram_diff, index.refresh_interval, analysis.normalizer, etc.) are out of scope here.
Synapse parses the JSON, enforces the addressability contract below, and resolves $ref entries against existing SynonymSets at index-build time. AOSS validates the rest of the analyzer shape (component types, parameters, chain ordering) at index-build time.
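A minimal client-side sketch of the scoping rule above, assuming you start from a full OpenSearch create-index body and want only the part that belongs in the settings field. The helper name extract_analysis_settings is hypothetical and is not part of the Synapse API. Rejecting keys outside the four allowed roots is an assumption about how a client might pre-check its input, not a description of Synapse's server-side behavior.

```python
# Hypothetical helper: pull the settings.analysis block out of a full
# OpenSearch create-index body so it can be used as TextAnalyzer.settings.
ALLOWED_ROOT_KEYS = {"char_filter", "tokenizer", "filter", "analyzer"}


def extract_analysis_settings(create_index_body: dict) -> dict:
    """Return the contents of settings.analysis, without the outer wrappers."""
    analysis = create_index_body.get("settings", {}).get("analysis")
    if not isinstance(analysis, dict):
        raise ValueError("create-index body has no settings.analysis object")
    # Assumption: fail early on keys such as "normalizer" that the
    # documentation lists as out of scope for a TextAnalyzer.
    unexpected = sorted(set(analysis) - ALLOWED_ROOT_KEYS)
    if unexpected:
        raise ValueError(f"unsupported root keys: {unexpected}")
    return analysis


create_index_body = {
    "settings": {
        "index": {"refresh_interval": "1s"},  # index-level key, not copied
        "analysis": {
            "analyzer": {"default": {"type": "standard"}},
        },
    },
    "mappings": {"properties": {"title": {"type": "text"}}},
}

text_analyzer_settings = extract_analysis_settings(create_index_body)
```

Only the inner analysis object survives; the index-level settings and the mappings stay behind in the original body.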
One TextAnalyzer record = one externally-addressable analyzer. The inner analyzer map must declare exactly one entry named default (required), and may optionally declare a second entry named default_search. Any other entry inside analyzer is rejected at create / update time. Curators who need separate analyzers (e.g. headline + body) create separate TextAnalyzer records — each TextAnalyzer record is itself shareable across SearchConfigurations and organizations, so this does not duplicate the addressing model.
analyzer.default (required): the entry SearchConfiguration / ColumnAnalyzerOverride bind to when they reference this TextAnalyzer by qualified name. When this TextAnalyzer is the defaultAnalyzer of a SearchConfiguration, this entry lands at the index's analysis.analyzer.default reserved slot (see OpenSearch index analyzers).
analyzer.default_search (optional): declare alongside analyzer.default when index-time and search-time should produce different token streams (the typical edge_ngram autocomplete pattern). When this TextAnalyzer is a SearchConfiguration's defaultAnalyzer, this entry lands at the index's analysis.analyzer.default_search reserved slot per OpenSearch's search-analyzer precedence.
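The addressability contract can be sketched as a small pre-flight check, shown here against an edge_ngram autocomplete configuration that declares both entries. The function name check_analyzer_entries and the filter name prefixes are hypothetical, and the gram sizes are illustrative values. Synapse performs its own validation at create / update time, so this only approximates that check on the client side.

```python
# Hypothetical pre-flight check mirroring the documented contract:
# "default" is required, "default_search" is optional, nothing else is allowed.
PERMITTED_ANALYZER_ENTRIES = {"default", "default_search"}


def check_analyzer_entries(settings: dict) -> None:
    analyzers = settings.get("analyzer")
    if not isinstance(analyzers, dict) or "default" not in analyzers:
        raise ValueError("settings.analyzer must declare an entry named 'default'")
    extra = sorted(set(analyzers) - PERMITTED_ANALYZER_ENTRIES)
    if extra:
        raise ValueError(f"unsupported analyzer entries: {extra}")


# Illustrative settings: index-time tokens are expanded into prefixes,
# search-time tokens are only lowercased.
autocomplete_settings = {
    "filter": {
        "prefixes": {"type": "edge_ngram", "min_gram": 2, "max_gram": 15},
    },
    "analyzer": {
        "default": {
            "type": "custom",
            "tokenizer": "standard",
            "filter": ["lowercase", "prefixes"],
        },
        "default_search": {
            "type": "custom",
            "tokenizer": "standard",
            "filter": ["lowercase"],
        },
    },
}

check_analyzer_entries(autocomplete_settings)  # passes silently
```

A settings object that added a third entry, for example one named headline, would fail this check; per the contract, that analyzer belongs in its own TextAnalyzer record.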
Example (the settings field is a JSON object — paste OpenSearch examples directly without escaping):
{
  "organizationName": "biomed",
  "name": "publications",
  "settings": {
    "char_filter": { "strip_html": { "type": "html_strip" } },
    "tokenizer": { "std": { "type": "standard" } },
    "filter": {
      "english_stop": { "type": "stop", "stopwords": "_english_" },
      "med_syn": { "$ref": "biomed-medical_terms" }
    },
    "analyzer": {
      "default": {
        "type": "custom",
        "tokenizer": "std",
        "char_filter": ["strip_html"],
        "filter": ["lowercase", "med_syn", "english_stop"],
        "position_increment_gap": 100
      }
    }
  }
}
Cross-resource references ($ref): anywhere in the analyzer's filter registry map you may write {"$ref": "{organizationName}-{name}"} in place of an inline filter definition. The reference must resolve to a SynonymSet by qualified name. $ref values are resolved at index-build time; cycles are rejected. $ref entries are NOT permitted inside chain arrays (analyzer.*.filter, analyzer.*.char_filter); those stay as plain string arrays, mirroring OpenSearch.
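The placement rule for $ref can be illustrated with a short sketch that lists the references declared in the filter registry and flags any non-string item in a chain array. The function name collect_refs is hypothetical. The sketch does not resolve references or detect cycles, since that happens inside Synapse at index-build time and the resolved filter shape is not described here.

```python
# Hypothetical helper: find $ref entries in the filter registry and make sure
# chain arrays contain only plain strings.
def collect_refs(settings: dict) -> dict:
    """Map each registry filter name to the qualified name it references."""
    refs = {}
    for filter_name, definition in settings.get("filter", {}).items():
        if isinstance(definition, dict) and "$ref" in definition:
            refs[filter_name] = definition["$ref"]

    for analyzer_name, analyzer in settings.get("analyzer", {}).items():
        for chain_key in ("filter", "char_filter"):
            for item in analyzer.get(chain_key, []):
                if not isinstance(item, str):
                    raise ValueError(
                        f"analyzer.{analyzer_name}.{chain_key} must contain "
                        f"only strings, found {item!r}"
                    )
    return refs


settings = {
    "filter": {
        "english_stop": {"type": "stop", "stopwords": "_english_"},
        "med_syn": {"$ref": "biomed-medical_terms"},
    },
    "analyzer": {
        "default": {
            "type": "custom",
            "tokenizer": "standard",
            "filter": ["lowercase", "med_syn", "english_stop"],
        }
    },
}

refs = collect_refs(settings)
# refs == {"med_syn": "biomed-medical_terms"}
```

The chain array refers to the referenced filter by its registry name (med_syn), which is why the array itself never needs to hold a $ref object.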
Reference this analyzer from a SearchConfiguration's defaultAnalyzer field or from a ColumnAnalyzerOverride entry, using its qualified name {organizationName}-{name}.
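A minimal sketch of building and splitting a qualified name, based on the {organizationName}-{name} format and the name rule in the field table (starts with a letter; letters, digits, and underscores only). Both function names are hypothetical. Treating "letters" as ASCII letters is an assumption, and so is splitting on the last hyphen, which works only because a resource name cannot itself contain a hyphen.

```python
import re

# Assumption: "letters" means ASCII letters.
NAME_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9_]*")


def qualified_name(organization_name: str, name: str) -> str:
    """Join an organization name and a resource name into a qualified name."""
    if not NAME_PATTERN.fullmatch(name):
        raise ValueError(f"invalid resource name: {name!r}")
    return f"{organization_name}-{name}"


def split_qualified_name(qualified: str) -> tuple:
    """Split on the last hyphen; the resource name never contains one."""
    organization_name, separator, name = qualified.rpartition("-")
    if not separator or not organization_name or not NAME_PATTERN.fullmatch(name):
        raise ValueError(f"not a qualified name: {qualified!r}")
    return organization_name, name


reference = qualified_name("biomed", "publications")
organization, resource = split_qualified_name("biomed-medical_terms")
```

Here reference is the string a SearchConfiguration would place in its defaultAnalyzer field, and the split recovers the two parts of a SynonymSet reference such as the one used in the example above.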
| Field | Type | Description |
|---|---|---|
| id | STRING | The unique ID of this text analyzer. |
| organizationName | STRING | The name of the Organization this resource belongs to. Immutable after creation. |
| name | STRING | The resource name. Must start with a letter and contain only letters, digits, and underscores. Unique within the organization and immutable after creation. Used as part of the qualified name ({organizationName}-{name}) when referenced by other resources. |
| description | STRING | Optional description. |
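| settings | OBJECT | Required. JSON object holding the contents of the settings.analysis block of an OpenSearch create-index request body. The inner analyzer map must declare an entry named default and may optionally declare default_search. Synapse parses this object and resolves any $ref entries against existing SynonymSets at index-build time. |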
| etag | STRING | Synapse employs an Optimistic Concurrency Control (OCC) scheme. |
| createdOn | STRING | The date this resource was created. |
| createdBy | STRING | The ID of the user that created this resource. |
| modifiedOn | STRING | The date this resource was last modified. |
| modifiedBy | STRING | The ID of the user that last modified this resource. |