Hive/Big Data - a simple "self-healing" (automated) ETL or analytics system on a big data cluster
The scenario: you have a table with daily data on a big data system ("default.db_main_table"), partitioned by d_date, and you want to build a report (the number of lines per day) stored in a new table, "default.db_analytics".
Because the data lives on a big data cluster, you want to do this with partitioned Hive tables.
The main workflow can be run several times per day and only builds the report for days that have not been processed yet. If a day is missing, the next run picks it up again until every day is finished. You can trigger the workflow by hand or schedule it on the KNIME Server.
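A minimal HiveQL sketch of that "self-healing" step, assuming the report table holds a single count column and string-typed d_date partitions (the two table names come from the scenario above; everything else, such as the line_count column, is illustrative):

-- Create the report table once, partitioned like the source.
CREATE TABLE IF NOT EXISTS default.db_analytics (
  line_count BIGINT
)
PARTITIONED BY (d_date STRING);

-- Allow dynamic partitioning so every missing day can be written in one pass.
SET hive.exec.dynamic.partition = true;
SET hive.exec.dynamic.partition.mode = nonstrict;

-- Count lines per day, but only for days not yet present in the report table.
-- Re-running this is a no-op once all days are done; a day that failed or
-- arrived late is picked up automatically on the next run.
INSERT INTO TABLE default.db_analytics PARTITION (d_date)
SELECT COUNT(*) AS line_count,
       src.d_date
FROM default.db_main_table src
LEFT JOIN (SELECT DISTINCT d_date FROM default.db_analytics) done
  ON src.d_date = done.d_date
WHERE done.d_date IS NULL
GROUP BY src.d_date;

In the workflow this logic would sit in KNIME database nodes (e.g. a DB SQL Executor on a Hive connection) rather than in a standalone script; the LEFT JOIN ... IS NULL idiom is used instead of NOT IN because it behaves consistently across Hive versions.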
Created with KNIME Analytics Platform version 4.5.1