Metadata Template

Concept

Metadata Templates are reusable definitions of metadata fields created at the Topic level. They keep information consistent and predictable across your content hierarchy.

Unlike custom metadata, which exists only at the document level, Metadata Templates are structured definitions that flow through the Topic tree. They provide a shared vocabulary for describing information: flexible enough to adapt to different contexts, yet stable enough to maintain consistency throughout the structure.

Metadata Configs represent the metadata configuration associated with a selected Topic. They define which template and custom metadata fields are active and how they behave within that Topic's scope.

Attributes

Each template defines a set of attributes that control how metadata behaves and is reused:

Attribute | Description
Key Name | Unique identifier for the metadata field (e.g., importance, created_date).
Type | Data type of the field: string, integer, boolean, float, url, or date.
Required | Indicates whether all documents in the Topic must have this metadata field.
Filtrable | Determines whether this field can be used to filter search results.
In Retrieval | Controls whether the metadata participates in retrieval pipelines.
In Embedding | Controls whether the metadata is included when generating block embeddings.
Possible Values | List of allowed predefined values (optional).
Default Value | Default selection; must exist within the allowed values (required when Required is true).
Format | Optional regex or format pattern (required for date types).
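
The rules in the table can be pictured as a small data model. The following is a minimal sketch, not the product's API: the `MetadataTemplate` class, its field names, and the `validate` method are hypothetical and only mirror the attributes listed above.

```python
from dataclasses import dataclass, field
from typing import Optional

ALLOWED_TYPES = {"string", "integer", "boolean", "float", "url", "date"}


@dataclass
class MetadataTemplate:
    """Hypothetical in-memory model of one template (illustration only)."""
    key_name: str
    type: str
    required: bool = False
    filtrable: bool = False
    in_retrieval: bool = False
    in_embedding: bool = False
    possible_values: list = field(default_factory=list)
    default_value: Optional[object] = None
    format: Optional[str] = None

    def validate(self) -> list:
        """Return a list of rule violations (empty when the template is valid)."""
        errors = []
        if self.type not in ALLOWED_TYPES:
            errors.append(f"unknown type: {self.type}")
        if self.required and self.default_value is None:
            errors.append("default_value is required when required is true")
        if (self.possible_values and self.default_value is not None
                and self.default_value not in self.possible_values):
            errors.append("default_value must be one of possible_values")
        if self.type == "date" and not self.format:
            errors.append("format is required for date types")
        return errors


# A date template without a format violates the last rule in the table.
date_template = MetadataTemplate(key_name="created_date", type="date")
print(date_template.validate())  # ['format is required for date types']

# A default outside the allowed values is rejected as well.
importance = MetadataTemplate(
    key_name="importance", type="string", required=True,
    possible_values=["low", "high"], default_value="medium",
)
print(importance.validate())  # ['default_value must be one of possible_values']
```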

Inheritance and Specialization

Templates are defined once at a Topic level and are automatically inherited by all descendant Topics. This inheritance mechanism ensures consistency across complex content hierarchies.

  • Child Topics cannot edit or delete inherited templates.
  • Child Topics automatically inherit all templates from their ancestor Topics.
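
One way to picture these two rules is a walk from a Topic up to the root that collects every template along the way. This is a sketch under assumptions: the `Topic` class and `effective_templates` function are hypothetical names, and the real resolution logic may differ.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Topic:
    """Hypothetical Topic node; `templates` holds only locally defined keys."""
    name: str
    parent: Optional["Topic"] = None
    templates: dict = field(default_factory=dict)


def effective_templates(topic: Topic) -> dict:
    """Map each template key to (definition, name of the defining Topic)."""
    chain = []
    node = topic
    while node is not None:
        chain.append(node)
        node = node.parent
    resolved = {}
    for node in reversed(chain):  # visit the root first
        for key, definition in node.templates.items():
            # Children cannot edit inherited templates, so the ancestor's
            # definition is kept if the same key shows up again lower down.
            resolved.setdefault(key, (definition, node.name))
    return resolved


root = Topic("Root", templates={"importance": {"type": "integer"}})
child = Topic("Child", parent=root, templates={"created_date": {"type": "date"}})
grandchild = Topic("Grandchild", parent=child)

for key, (definition, origin) in effective_templates(grandchild).items():
    print(key, definition["type"], "from", origin)
```

The grandchild defines nothing itself, yet it resolves both keys, each tagged with the Topic that owns the definition.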

How Filtering Works

When you create a Metadata Config for a Topic with the Filtrable option enabled, you can configure searches with filters based on that metadata.

The available filters are passed to the model, which decides which ones to apply for a given query. When a filter is applied, the semantic search operates only on the subset of documents that match the filter criteria.

Filter behavior:

  • If the Metadata Config has Possible Values configured, the model filters strictly by those values.
  • If the field allows free values, a document matches when its metadata value contains the filter text.
  • For Date type fields, filters can be applied by exact match or by date range.
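
The three behaviors can be sketched as a single matching function. This is illustrative only: `matches_filter` is a hypothetical helper, and the case-insensitive substring match, the inclusive date range, and the use of `strptime` format codes are assumptions rather than documented behavior.

```python
from datetime import datetime


def matches_filter(value, wanted, *, possible_values=None, date_format=None):
    """Return True when a document's metadata value satisfies a filter.

    `wanted` is a single value, or a (start, end) tuple for a date range.
    """
    if date_format is not None:
        actual = datetime.strptime(value, date_format)
        if isinstance(wanted, tuple):
            # Date range: assumed inclusive on both ends.
            start, end = (datetime.strptime(d, date_format) for d in wanted)
            return start <= actual <= end
        # Date exact match.
        return actual == datetime.strptime(wanted, date_format)
    if possible_values:
        # Predefined values: strict equality against an allowed value.
        return wanted in possible_values and value == wanted
    # Free values: the stored text must contain the filter text
    # (case-insensitive here, which is an assumption).
    return wanted.lower() in value.lower()


allowed = ["Robotics", "Chemistry"]
print(matches_filter("Robotics", "Robotics", possible_values=allowed))  # True
print(matches_filter("Robotics", "Robot", possible_values=allowed))     # False
print(matches_filter("Intro to Robotics", "robot"))                     # True
print(matches_filter("09/24/2024", ("01/01/2024", "12/31/2024"),
                     date_format="%m/%d/%Y"))                           # True
```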

Example: University Document Management

Consider a university that organizes academic documents in a hierarchical structure. The goal is to separate general university-wide information from course-specific details while enabling intelligent search and retrieval.

Topic Hierarchy: University → Freshman Year → Calculus 1

Template Definition

At the University level (root), we define one template that applies across the entire institution:

  • Site_URL — Type: URL | Default: university.edu

At the Freshman Year level, we inherit Site_URL and add three new templates specific to first-year courses:

  • Specialization — Type: string | Possible values: Computer Science, Robotics, Chemistry, Literature
  • Date — Type: date | Format: MM/DD/YYYY
  • Core_Course — Type: boolean | Default: true

At the Calculus 1 level, we automatically inherit all templates from above. No new templates are defined here, so all documents in Calculus 1 will use:

  • Site_URL (from University)
  • Specialization (from Freshman Year)
  • Date (from Freshman Year)
  • Core_Course (from Freshman Year)

Real Document Example

Let's say we have a document called "Exam September Answer Key" stored under the Calculus 1 topic.

Since Calculus 1 inherits all templates from its parent topics, this document would have the following metadata:

Field | Value | Origin
Site_URL | university.edu/freshman/calculus1 | University
Specialization | Computer Science | Freshman Year
Date | 09/24/2024 | Freshman Year
Core_Course | true | Freshman Year

Search Configuration Example

Now let's see how this works in practice with AI-powered search.

Imagine you configure a search called "Freshman Year Documents" with these settings:

  • Search scope: Freshman Year topic
  • Filterable metadata:
    • Specialization
    • Date

When you activate this search for your AI Assistant, you can ask natural language questions like:

"How many students are enrolled as freshmen in Computer Science for 2025?"

The assistant will automatically:

  1. Identify that it needs to use the Freshman Year Documents search
  2. Apply filters based on your question:
    • Specialization = Computer Science
    • Date in range 01/01/2025 - 12/31/2025
  3. Retrieve relevant documents (like "2025 Computer Science Enrollment Report")
  4. Provide you with an accurate answer based on the filtered results
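
Step 2 can be pictured as a pre-filter that narrows the document set before any semantic ranking happens. The sketch below makes several assumptions: the document list, its dates, and the `passes` helper are hypothetical (only the first title comes from the example), and `%m/%d/%Y` is taken as the `strptime` equivalent of MM/DD/YYYY.

```python
from datetime import datetime

DATE_FORMAT = "%m/%d/%Y"  # assumed strptime equivalent of MM/DD/YYYY

# Hypothetical documents in the Freshman Year scope.
documents = [
    {"title": "2025 Computer Science Enrollment Report",
     "Specialization": "Computer Science", "Date": "03/15/2025"},
    {"title": "2024 Computer Science Enrollment Report",
     "Specialization": "Computer Science", "Date": "03/12/2024"},
    {"title": "2025 Chemistry Lab Schedule",
     "Specialization": "Chemistry", "Date": "02/01/2025"},
]

# Filters the assistant derives from the question in step 2.
filters = {
    "Specialization": "Computer Science",
    "Date": ("01/01/2025", "12/31/2025"),
}


def parse(text):
    """Parse a date string in the template's format."""
    return datetime.strptime(text, DATE_FORMAT)


def passes(doc, filters):
    """True when the document satisfies every filter."""
    start, end = filters["Date"]
    return (doc["Specialization"] == filters["Specialization"]
            and parse(start) <= parse(doc["Date"]) <= parse(end))


# Only these candidates would be handed to the semantic search.
candidates = [doc["title"] for doc in documents if passes(doc, filters)]
print(candidates)  # ['2025 Computer Science Enrollment Report']
```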

Benefits

Faster Information Retrieval

With the Filtrable option enabled, semantic search runs on a smaller, targeted subset of documents. This reduces processing time and returns more relevant results.

Embedding Enrichment

When the "In Embedding" option is marked, all instances of that metadata template are included in the embedding generation process. This enriches the semantic representation of documents, leading to more accurate and context-aware search results.
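
As a rough illustration of what "included in the embedding generation process" could mean, the sketch below prefixes a block's text with the flagged metadata before it would be embedded. The `text_for_embedding` helper and the `key: value` layout are assumptions; the actual format used by the platform is not documented here.

```python
def text_for_embedding(block_text, metadata, in_embedding_keys):
    """Prefix a block with the metadata values flagged In Embedding."""
    lines = [f"{key}: {metadata[key]}"
             for key in in_embedding_keys if key in metadata]
    return "\n".join(lines + [block_text])


metadata = {"Specialization": "Computer Science", "Core_Course": True}
enriched = text_for_embedding(
    "Answer key for the September exam.",
    metadata,
    in_embedding_keys=["Specialization"],  # Core_Course is not flagged
)
print(enriched)
```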

Uniform Document Metadata

Metadata Templates eliminate the inconsistency of custom metadata, where it is easy to forget which keys were used across different documents. By enforcing shared definitions, you build more robust and maintainable solutions with predictable metadata structures.

Transparent Retrieval with "In Retrieval"

The "In Retrieval" option sends raw metadata directly to the LLM, where it is visible in the Troubleshoot section. You can see exactly what information is being sent to the model, which makes debugging easier and the system more transparent.