- Title: Auto-Batching
# 1. Summary
Meilisearch can automatically group consecutive asynchronous `documentAdditionOrUpdate` tasks for the same index via an automatic batching mechanism.
Users can disable this auto-batching behavior. See the 3.2. Auto-batching mechanisms options section.
# 2. Motivation
Over the last year, users have regularly reported slow indexing as a pain point. We have repeatedly advised them to send as many documents as possible in each addition or update batch, which reduces the time spent indexing specific data structures.
To make Meilisearch easier to use, we explored the idea of automatically creating these batches within Meilisearch before indexing users’ documents.
# 3. Functional Specification
# 3.1. Explanations
A batch preserves the logical order of the tasks for a given index. `documentAdditionOrUpdate` tasks for the same index can be in the same batch. All tasks concerning other operations will also be part of a batch having only one task.
# 3.1.1. Grouping tasks to a single batch
The scheduling program that groups tasks within a single batch is triggered when the asynchronous task currently being processed reaches a terminal state. In other words, when a scheduled `documentAdditionOrUpdate` task for a given index is picked from the task queue, the scheduler fetches all `documentAdditionOrUpdate` tasks for that same index and groups them in a batch.
The more similar tasks a user sends in a row, the more likely the batching mechanism is to group them.
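The grouping rule described above can be sketched as follows. This is a minimal illustration, not the scheduler's actual implementation: the `Task` class, the `next_batch` function, and the in-memory list used as a queue are all hypothetical names. It also assumes that a different operation on the same index ends the run of groupable tasks, which is one way to preserve the logical order for that index.

```python
from dataclasses import dataclass

DOC_TASK = "documentAdditionOrUpdate"


@dataclass
class Task:
    uid: int
    index_uid: str
    type: str


def next_batch(queue: list) -> list:
    """Remove and return the next batch from the front of `queue` (hypothetical helper)."""
    if not queue:
        return []
    first = queue.pop(0)
    batch = [first]
    if first.type != DOC_TASK:
        # Any other operation forms a batch containing a single task.
        return batch

    remaining = []
    blocked = False  # set once another operation on the same index is seen
    for task in queue:
        if task.index_uid != first.index_uid:
            remaining.append(task)  # tasks for other indexes stay queued
        elif not blocked and task.type == DOC_TASK:
            batch.append(task)  # same index, same operation: group it
        else:
            blocked = True  # assumption: stop here to keep per-index order
            remaining.append(task)
    queue[:] = remaining
    return batch


queue = [
    Task(0, "movies", DOC_TASK),
    Task(1, "books", DOC_TASK),
    Task(2, "movies", DOC_TASK),
    Task(3, "movies", "settingsUpdate"),
    Task(4, "movies", DOC_TASK),
]
first_batch = next_batch(queue)
print([t.uid for t in first_batch])  # tasks 0 and 2 are grouped
print([t.uid for t in queue])        # tasks 1, 3 and 4 stay queued
```

In this sketch, task 4 is not grouped with tasks 0 and 2 because the settings update on the same index sits between them.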
# 3.1.1.1. Schema
All tasks are part of a batch identified by an internal `batchUid` field. A task batch preserves the logical order of the tasks for a given index. The batch identifiers are unique and strictly increasing. The `batchUid` field is internal and is therefore not visible on a `task` API resource.
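As an illustration of this schema, the sketch below assigns unique, strictly increasing batch identifiers and keeps them out of the public task representation. The `BatchUidAllocator` class, the `assign_batch` function, and the `to_api` method are hypothetical names introduced for this example, and the serialized task is reduced to a few fields.

```python
from dataclasses import dataclass
from itertools import count
from typing import Optional


@dataclass
class Task:
    uid: int
    index_uid: str
    type: str
    batch_uid: Optional[int] = None  # internal: never serialized

    def to_api(self) -> dict:
        # Simplified public representation; `batchUid` is deliberately omitted.
        return {"uid": self.uid, "indexUid": self.index_uid, "type": self.type}


class BatchUidAllocator:
    """Hands out unique, strictly increasing batch identifiers (hypothetical)."""

    def __init__(self) -> None:
        self._counter = count()

    def assign_batch(self, tasks: list) -> int:
        batch_uid = next(self._counter)
        for task in tasks:
            task.batch_uid = batch_uid  # every task in the batch shares it
        return batch_uid


allocator = BatchUidAllocator()
t0 = Task(0, "movies", "documentAdditionOrUpdate")
t1 = Task(1, "movies", "documentAdditionOrUpdate")
t2 = Task(2, "movies", "settingsUpdate")

first_uid = allocator.assign_batch([t0, t1])  # two grouped tasks
second_uid = allocator.assign_batch([t2])     # a batch of one task
print(first_uid < second_uid)
print(t0.to_api())
```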
# 3.1.2. Impacts on `task` API resource
- Tasks grouped in a batch are processed within the same transaction. However, if a task fails within a batch, only that task fails, not the whole batch.
- Tasks within the same batch share the same values for their `duration` fields, and the same `error` object if an error occurs for a `task` during the batch processing.
- If a batch contains many tasks, the `indexedDocuments` value is identical in all `tasks` belonging to the same processed batch.
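The effects listed above can be sketched as follows. This is a simplified model under stated assumptions, not the engine's behavior: the `Task` class and `process_batch` function are hypothetical, the validation rule (every document needs an `id`) and the error message are invented for the example, and `duration` is a plain number of seconds rather than the API's actual format.

```python
import time
from dataclasses import dataclass
from typing import Optional


@dataclass
class Task:
    uid: int
    documents: list
    status: str = "enqueued"
    duration: Optional[float] = None
    indexed_documents: Optional[int] = None
    error: Optional[dict] = None


def process_batch(tasks: list) -> None:
    """Process all tasks together, then stamp the shared values on each one."""
    started = time.perf_counter()
    indexed = 0
    for task in tasks:
        # Hypothetical validation rule: every document must carry an "id".
        if all("id" in doc for doc in task.documents):
            indexed += len(task.documents)
            task.status = "succeeded"
        else:
            # Only the related task fails; the rest of the batch goes on.
            task.status = "failed"
            task.error = {"message": "a document is missing its id"}
    duration = time.perf_counter() - started
    for task in tasks:
        task.duration = duration          # shared by the whole batch
        task.indexed_documents = indexed  # identical in every task


batch = [
    Task(0, [{"id": 1}, {"id": 2}]),
    Task(1, [{"title": "no id"}]),
    Task(2, [{"id": 3}]),
]
process_batch(batch)
for task in batch:
    print(task.uid, task.status, task.indexed_documents)
```

Because the counters are written once per batch, a client reading any task of the batch sees the batch-wide `indexedDocuments` total rather than a per-task figure.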
# 4. Technical Aspects
# 5. Future Possibilities
- Extend auto-batching to all consecutive payload types.
- Expose the `batchUid` field and add a filter capability on it.
- Report the documents that could not be indexed to the user in a more precise manner.
- Optimize some task sequences. For example, if a document addition is followed by an index deletion, the document addition could be skipped.