Cache

The Cache node materializes and caches the input table in a data processing workflow. This node is useful after a sequence of preprocessing steps, especially when these steps involve column transformations, such as removing, manipulating, or adding new columns.

In workflows with several transformation nodes, each node stores only the data it modifies (e.g., added columns), while unmodified columns continue to reference the input data. This optimizes execution and data caching for individual nodes, but the resulting tables can be composites of multiple nested tables. Iterating over such a composite table may be less efficient than iterating over a single, unified table.

The Cache node addresses this by materializing the input data, creating a self-contained table that consolidates all columns. Additionally, the Cache node is useful in scenarios where portions of a workflow are executed in streaming mode, as it allows data to be staged at specific points. This staging facilitates inspection and debugging, providing a snapshot of the data at the desired point in the workflow.
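The difference between a composite table and a materialized one can be illustrated with a small Python sketch. This is a conceptual model only, not the node's actual implementation: the `LayeredTable` class, its fields, and the `materialize` function are hypothetical names introduced for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class LayeredTable:
    """Hypothetical table that stores only its own columns and
    resolves every other column through its parent table."""

    own_columns: dict[str, list]
    parent: LayeredTable | None = None
    removed: set[str] = field(default_factory=set)

    def column_names(self) -> list[str]:
        # Inherited columns first (minus removed ones), then new columns.
        inherited = [] if self.parent is None else [
            name for name in self.parent.column_names()
            if name not in self.removed
        ]
        added = [name for name in self.own_columns if name not in inherited]
        return inherited + added

    def column(self, name: str) -> list:
        # A lookup may have to walk through every nested layer.
        if name in self.own_columns:
            return self.own_columns[name]
        if self.parent is None or name in self.removed:
            raise KeyError(name)
        return self.parent.column(name)

    def depth(self) -> int:
        return 1 if self.parent is None else 1 + self.parent.depth()

    def rows(self) -> list[dict]:
        names = self.column_names()
        columns = [self.column(name) for name in names]
        return [dict(zip(names, values)) for values in zip(*columns)]


def materialize(table: LayeredTable) -> LayeredTable:
    """Copy every visible column into one self-contained table."""
    return LayeredTable(
        {name: list(table.column(name)) for name in table.column_names()}
    )


# Three preprocessing steps produce a table nested three layers deep.
source = LayeredTable({"id": [1, 2, 3], "price": [10, 20, 30], "note": ["a", "b", "c"]})
with_cents = LayeredTable(
    {"price_cents": [p * 100 for p in source.column("price")]}, parent=source
)
trimmed = LayeredTable({}, parent=with_cents, removed={"note"})

# Materializing yields the same rows in a single layer.
cached = materialize(trimmed)
```

In this sketch, `trimmed` has a depth of 3 and resolves each column through its parents, whereas `cached` holds all of its columns directly and no longer depends on the upstream tables.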

Node details

Input ports

Type: Table
Input table
The table to cache.

Output ports

Type: Table
Cached table
The same data as the input table, materialized as a single cached table.
