AP-22457: Parallel Chunk Loop (End) is unnecessarily slow due to synchronous data write (columnar backend)
The Parallel Chunk End node synchronized unnecessarily while writing its output when the workflow was set to use the "Columnar Backend".
Performance comparison for 50M rows (Data Generator), with a parallel chunk loop containing a Row Filter that removes about 2/3 of the rows.
Runtimes (on my system):
- Parallel Chunk, 5.2.3: 202s
- Parallel Chunk, 5.3 Nightly: 65s
- Plain Row Filter: 35s (no par-chunker, just for reference)
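To put the runtimes above in relation to each other, here is a minimal Python sketch that computes the ratios. The dictionary keys and the helper name `ratio` are only illustrative; the numbers are the measurements listed above, taken on a single system, so the ratios should be read as rough indications rather than a benchmark.

```python
# Runtimes in seconds, copied from the comparison above (one system only).
runtimes = {
    "Parallel Chunk, 5.2.3": 202,
    "Parallel Chunk, 5.3 Nightly": 65,
    "Plain Row Filter": 35,
}


def ratio(slower: str, faster: str) -> float:
    """Return how many times longer `slower` ran compared to `faster`."""
    return runtimes[slower] / runtimes[faster]


# Speedup of the parallel chunk loop between 5.2.3 and the 5.3 nightly.
speedup = ratio("Parallel Chunk, 5.2.3", "Parallel Chunk, 5.3 Nightly")

# Remaining overhead of the parallel chunk loop versus a plain Row Filter.
overhead = ratio("Parallel Chunk, 5.3 Nightly", "Plain Row Filter")

print(f"5.3 Nightly vs 5.2.3: {speedup:.1f}x faster")
print(f"Par-chunker vs plain Row Filter: {overhead:.1f}x slower")
```

On these numbers the nightly build is roughly three times faster than 5.2.3, while the parallel chunk loop still takes a bit under twice as long as the plain Row Filter.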
Created with KNIME Analytics Platform version 5.3.0.