Hive/Big Data - a simple "self-healing" (automated) ETL or analytics system on a big data cluster
The scenario: you have a table with daily data on a big data system ("default.db_main_table"), partitioned by d_date, and you want to build a report (the number of lines per day) stored in a new table, "default.db_analytics".
Because the data lives on a big data cluster, you want to do this with partitioned Hive tables.
The main workflow can be run several times per day and only builds the report for days that have not been processed yet. If a day is missing, the next run picks it up again until every day is finished. You can trigger the workflow by hand or schedule it on the KNIME Server.
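A minimal HiveQL sketch of that "self-healing" step, assuming the report table holds a single count column and string-typed d_date partitions (the two table names come from the scenario above; everything else, such as the line_count column, is illustrative):

-- Create the report table once, partitioned like the source.
CREATE TABLE IF NOT EXISTS default.db_analytics (
  line_count BIGINT
)
PARTITIONED BY (d_date STRING);

-- Allow dynamic partitioning so every missing day can be written in one pass.
SET hive.exec.dynamic.partition = true;
SET hive.exec.dynamic.partition.mode = nonstrict;

-- Count lines per day, but only for days not yet present in the report table.
-- Re-running this is a no-op once all days are done; a day that failed or
-- arrived late is picked up automatically on the next run.
INSERT INTO TABLE default.db_analytics PARTITION (d_date)
SELECT COUNT(*) AS line_count,
       src.d_date
FROM default.db_main_table src
LEFT JOIN (SELECT DISTINCT d_date FROM default.db_analytics) done
  ON src.d_date = done.d_date
WHERE done.d_date IS NULL
GROUP BY src.d_date;

In the workflow this logic would sit in KNIME database nodes (e.g. a DB SQL Executor on a Hive connection) rather than in a standalone script; the LEFT JOIN ... IS NULL idiom is used instead of NOT IN because it behaves consistently across Hive versions.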
Created with KNIME Analytics Platform version 4.5.1