Metadata filtering

Metadata filtering narrows search results based on metadata so that only relevant content is retrieved. The filter is applied before retrieval, so the query only runs against the documents that are in scope.

The following example applies metadata filtering through the Workers binding. It can be adapted to use the REST API instead.

const answer = await env.AI.autorag("my-autorag").search({
  query: "How do I train a llama to deliver coffee?",
  filters: {
    type: "and",
    filters: [
      {
        type: "eq",
        key: "folder",
        value: "llama/logistics/",
      },
      {
        type: "gte",
        key: "timestamp",
        value: "1735689600000", // Unix timestamp (ms) for 2025-01-01 00:00:00 UTC
      },
    ],
  },
});

Metadata attributes

You can filter by the folder and timestamp of an R2 object. Custom metadata attributes are not currently supported.

Folder

The directory that contains the object. For example, the folder of the object at llama/logistics/llama-logistics.mdx is llama/logistics/. Note that the folder does not include a leading /.

A folder filter only matches files located exactly in that folder, so files in subdirectories are not included. For example, specifying folder: "llama/" matches files in llama/ but does not match files in llama/logistics/.
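To make the relationship between an object key and its folder concrete, here is a minimal sketch that derives the folder from a key and mimics the exact-match behavior described above. The helpers folderOf and matchesFolderEq and the file llama/llama-facts.mdx are hypothetical. Returning an empty string for keys without a / is an assumption, since the text above does not say how root-level objects are represented.

```javascript
// Hypothetical helper: derive the folder attribute from an R2 object key.
// Assumption: keys without a "/" map to an empty folder string.
function folderOf(key) {
  const lastSlash = key.lastIndexOf("/");
  return lastSlash === -1 ? "" : key.slice(0, lastSlash + 1);
}

// Hypothetical helper: local mimic of an eq filter on folder (exact match only).
function matchesFolderEq(key, folder) {
  return folderOf(key) === folder;
}

console.log(folderOf("llama/logistics/llama-logistics.mdx")); // "llama/logistics/"
console.log(matchesFolderEq("llama/llama-facts.mdx", "llama/")); // true
console.log(matchesFolderEq("llama/logistics/llama-logistics.mdx", "llama/")); // false
```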

Timestamp

The timestamp indicating when the object was last modified. Comparisons accept a 13-digit Unix timestamp (milliseconds), but values are rounded down to second precision (10 digits). For example, 1735689600999 (2025-01-01 00:00:00.999 UTC) is rounded down to 1735689600000 (2025-01-01 00:00:00 UTC).
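As a small illustration, the sketch below builds a millisecond timestamp for a given UTC date and reproduces the rounding described above. The helper floorToSecond is hypothetical and only mimics the rounding locally. Passing the value as a string mirrors the example at the top of this page.

```javascript
// Build a millisecond Unix timestamp for 2025-01-01 00:00:00 UTC.
// Date.UTC takes a zero-based month, so 0 is January.
const startOf2025 = Date.UTC(2025, 0, 1);

// Hypothetical helper: mimic the rounding described above by dropping milliseconds.
function floorToSecond(timestampMs) {
  return Math.floor(timestampMs / 1000) * 1000;
}

// A comparison filter that matches objects modified on or after that instant.
const timestampFilter = {
  type: "gte",
  key: "timestamp",
  value: String(floorToSecond(startOf2025)),
};

console.log(startOf2025); // 1735689600000
console.log(floorToSecond(1735689600999)); // 1735689600000
console.log(timestampFilter.value); // "1735689600000"
```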

Filter schema

You can use a single comparison filter, or combine an array of comparison filters in a compound filter.

Comparison filter

You can compare a metadata attribute (for example, folder or timestamp) with a target value using a comparison filter.

filters: {
  type: "operator",
  key: "metadata_attribute",
  value: "target_value"
}

The available operators for the comparison are:

| Operator | Description |
| --- | --- |
| eq | Equals |
| ne | Not equals |
| gt | Greater than |
| gte | Greater than or equal to |
| lt | Less than |
| lte | Less than or equal to |
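For illustration, the following sketch builds a comparison filter and rejects operators that are not in the list above. The helper comparisonFilter is hypothetical and is not part of the AutoRAG API. It only produces the plain object that the filters parameter expects.

```javascript
// Operators listed in the table above.
const COMPARISON_OPERATORS = ["eq", "ne", "gt", "gte", "lt", "lte"];

// Hypothetical helper: build a comparison filter, rejecting unknown operators early.
function comparisonFilter(type, key, value) {
  if (!COMPARISON_OPERATORS.includes(type)) {
    throw new Error(`Unsupported comparison operator: ${type}`);
  }
  return { type, key, value };
}

// Matches objects whose folder is anything other than llama/logistics/.
const notInLogistics = comparisonFilter("ne", "folder", "llama/logistics/");
console.log(notInLogistics);
```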

Compound filter

You can use a compound filter to combine multiple comparison filters with a logical operator.

filters: {
  type: "compound_operator",
  filters: [...]
}

The available compound operators are: and, or.

Note the following limitations with the compound operators:

  • Compound filters cannot be nested or mixed: a filter can use a single and or a single or, but not a combination of both.
  • When using or:
    • Only the eq operator is allowed.
    • All conditions must filter on the same key (for example, all on folder).
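The limitations above can be checked locally before sending a query. The sketch below is one way to do that. The helper checkCompoundFilter and the folder llama/training/ are hypothetical, and the checks encode only the rules listed above, not the service's actual validation.

```javascript
// Hypothetical helper: list the problems with a compound filter,
// based only on the limitations described above.
function checkCompoundFilter(filter) {
  const problems = [];
  if (filter.type !== "and" && filter.type !== "or") {
    problems.push(`unknown compound operator: ${filter.type}`);
    return problems;
  }
  for (const child of filter.filters) {
    // A child that is itself "and" or "or" would be a nested compound filter.
    if (child.type === "and" || child.type === "or") {
      problems.push("compound filters cannot be nested");
    } else if (filter.type === "or" && child.type !== "eq") {
      problems.push(`"or" only allows eq, found ${child.type}`);
    }
  }
  if (filter.type === "or") {
    const keys = new Set(
      filter.filters.filter((c) => "key" in c).map((c) => c.key)
    );
    if (keys.size > 1) problems.push('all "or" conditions must use the same key');
  }
  return problems;
}

// Valid: an "or" of eq conditions on the same key.
const eitherFolder = {
  type: "or",
  filters: [
    { type: "eq", key: "folder", value: "llama/logistics/" },
    { type: "eq", key: "folder", value: "llama/training/" },
  ],
};
console.log(checkCompoundFilter(eitherFolder)); // []

// Invalid: "or" that mixes keys and uses a non-eq operator.
const mixed = {
  type: "or",
  filters: [
    { type: "eq", key: "folder", value: "llama/logistics/" },
    { type: "gte", key: "timestamp", value: "1735689600000" },
  ],
};
console.log(checkCompoundFilter(mixed).length); // 2
```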

"Starts with" filter for folders

You can use "starts with" filtering on the folder metadata attribute to search for all files and subfolders within a specific path.

For example, consider this file structure:

  • customer-a/
    • profile.md
    • contracts/
      • property/
        • contract-1.pdf

If you were to filter using an eq (equals) operator with value: "customer-a/", it would only match files directly within that folder, like profile.md. It would not include files in subfolders like customer-a/contracts/.

To recursively filter for all items starting with the path customer-a/, you can use the following compound filter:

filters: {
  type: "and",
  filters: [
    {
      type: "gt",
      key: "folder",
      value: "customer-a//",
    },
    {
      type: "lte",
      key: "folder",
      value: "customer-a/z",
    },
  ],
},

This filter identifies paths starting with customer-a/ by using:

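  • The and condition to combine the effects of the gt and lte conditions.
  • The gt condition to include paths that sort after customer-a//, that is, paths whose next character after the prefix is greater than the / ASCII character.
  • The lte condition to include paths that sort at or before customer-a/z, that is, up to and including the lowercase z ASCII character.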

Together, these conditions effectively select paths that begin with the provided path value.
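The same pattern can be generated for any prefix. The sketch below builds the compound filter shown above and previews which folder values fall inside the range. The helpers folderStartsWith and inRange are hypothetical, and the preview assumes that the service compares folder strings in plain character-code order, which the text above does not state explicitly.

```javascript
// Hypothetical helper: build the "starts with" compound filter shown above
// for any folder prefix. The prefix is expected to end with "/".
function folderStartsWith(prefix) {
  if (!prefix.endsWith("/")) {
    throw new Error('prefix should end with "/"');
  }
  return {
    type: "and",
    filters: [
      { type: "gt", key: "folder", value: `${prefix}/` },
      { type: "lte", key: "folder", value: `${prefix}z` },
    ],
  };
}

// Assumption: folder strings are compared in plain character-code order.
// Under that assumption, this previews whether a folder falls inside the range.
function inRange(folder, filter) {
  const [lower, upper] = filter.filters;
  return folder > lower.value && folder <= upper.value;
}

const filter = folderStartsWith("customer-a/");
console.log(filter.filters[0].value); // "customer-a//"
console.log(filter.filters[1].value); // "customer-a/z"
console.log(inRange("customer-a/contracts/property/", filter)); // true
console.log(inRange("customer-b/", filter)); // false
```

If the service does compare strings this way, the bounds are approximate rather than a strict prefix match, so it is worth previewing the filter against your own folder names before relying on it.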

Response

The response includes the metadata attributes of your retrieved data in the attributes property of each retrieved chunk. For example:

"data": [
  {
    "file_id": "llama001",
    "filename": "llama/logistics/llama-logistics.md",
    "score": 0.45,
    "attributes": {
      "timestamp": 1735689600000, // Unix timestamp (ms) for 2025-01-01 00:00:00 UTC
      "folder": "llama/logistics/"
    },
    "content": [
      {
        "id": "llama001",
        "type": "text",
        "text": "Llamas can carry 3 drinks max."
      }
    ]
  }
]
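As a rough sketch of how these attributes might be consumed, the snippet below summarizes the metadata of each chunk. It assumes the result object exposes the data array as shown above. The sample values are copied from the example, and the helper summarizeAttributes is hypothetical.

```javascript
// Sample result shaped like the response above (values copied from the example).
const result = {
  data: [
    {
      file_id: "llama001",
      filename: "llama/logistics/llama-logistics.md",
      score: 0.45,
      attributes: { timestamp: 1735689600000, folder: "llama/logistics/" },
      content: [
        { id: "llama001", type: "text", text: "Llamas can carry 3 drinks max." },
      ],
    },
  ],
};

// Hypothetical helper: summarize each chunk's metadata attributes.
function summarizeAttributes(data) {
  return data.map((chunk) => ({
    filename: chunk.filename,
    folder: chunk.attributes.folder,
    // The timestamp is in milliseconds, so it can be passed to Date directly.
    modified: new Date(chunk.attributes.timestamp).toISOString(),
  }));
}

console.log(summarizeAttributes(result.data));
// modified is "2025-01-01T00:00:00.000Z" for the sample chunk
```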