A meta collection of KNIME and databases (SQL, Big Data/Hive/Impala and Spark/PySpark)

Workflow

A meta collection of KNIME and databases (SQL, Big Data/Hive/Impala and Spark/PySpark)

Draft Latest edits on

A meta collection of KNIME and databases (SQL, Big Data/Hive/Impala and Spark/PySpark) KNIME as a powerful data anaylytics platform is able to connecct to all sorts of data bases and supports a lot of functions mainly by SQL and SQL-like languages. KNIME also supports big data platforms and related frameworks - like Spark and PySpark. There are myriads of database related topics and books and online ressources but ... What I try to do is present a collection of useful links and examples when you want to set out to use databases and KNIME abd hopefully by providing some sampel workflows that one could try out at home ease some quirks especially when it comes to dealing with big data systems and Spark (including PySpark). The 'normal' SQL examples will be with SQLite and H2 so you can easily try this at home (but they will work for other databases like Postgres or MySQL as well). The big data examples are based on KNIME's local big data environment - which is very useful if you happen not to have such a cluster at home. A good starting point in general is the official "KNIME Database Extension Guide" https://docs.knime.com/latest/db_extension_guide/index.html?u=mlauber71 => you can dive right into the examples and will see quick results, but sometimes it helps to read about the concept. ------------------------------- A - About SQL in general - a few links to refersh and learn Structured Query Language Basics https://www.nuwavesolutions.com/structured-query-language-basics/ SQL CASE WHEN - and other code in the sqltutorial https://www.sqltutorial.org/sql-case/ KNIME courses Lesson 4. Bringing Things Together - Join and Concatenation https://www.knime.com/self-paced-course/l1-dw-knime-analytics-platform-for-data-wranglers-basics/lesson4#join-concatenation?u=mlauber71 KNIME Nodeguide for Joining and Concatenating https://www.knime.com/nodeguide/etl-data-manipulation/joining-and-concatenating?u=mlauber71 ------------------------------- B - KNIME and SQL databases - going from basic to advanced Database - Simple IO (using standalone SQL-databse SQLite) https://hub.knime.com/knime/spaces/Examples/latest/01_Data_Access/02_Databases/01_Database_Simple_IO_Example?u=mlauber71 KNIME and SQLite - simple example (with the New DB nodes) https://hub.knime.com/mlauber71/spaces/Public/latest/sqlite_knime_40/t_001_sqlite_knime_40_db_nodes?u=mlauber71 H2 - use a SQL standalone Database from Scratch or from upload https://hub.knime.com/mlauber71/spaces/Public/latest/kn_example_db_h2_create_table_from_scratch?u=mlauber71 H2 - Example of H2 database - handling of database structure and use of Primary Keys - insert only new lines by ID https://hub.knime.com/mlauber71/spaces/Public/latest/kn_example_db_h2_primary_key?u=mlauber71 Databases - Advanced Usage (using SQLite) https://hub.knime.com/knime/spaces/Examples/latest/01_Data_Access/02_Databases/02_Database_Advanced_Example?u=mlauber71 use H2 to produce a Position / Rank number within a group variable (window functions with new H2 JDBC drivers) https://forum.knime.com/t/sqlite-and-window-functions/31608/4?u=mlauber71 https://hub.knime.com/mlauber71/spaces/Public/latest/forum/kn_forum_31608_h2_group_rank_window_function?u=mlauber71 ------------------------------- C - KNIME and Big Data KNIME can also handle Big Data systems and databases. So you can go from handling your local or small SQL-DB right up to big enterprise systems with (eg.) Cloudera. KNIME Big Data Extensions User Guide https://docs.knime.com/latest/bigdata_extensions_user_guide/index.html?u=mlauber71 School of Hive – everything you need to know to work with Hive tables on a Big Data system https://hub.knime.com/mlauber71/spaces/Public/latest/kn_example_hive_school_of?u=mlauber71 => if you want to understand the concepts of Big Data tables and partitions by doing it in code steps from scratch An overview of KNIME based functions to access big data systems (with KNIME's local big data environment) https://hub.knime.com/mlauber71/spaces/Public/latest/kn_example_db_bigdata_nodes/m_020_db_access_local_bigdata_tables?u=mlauber71 => see for yourself how the big data nodes work on your local computer KNIME and Hive - load multiple CSV files at once via external table https://hub.knime.com/mlauber71/spaces/Public/latest/kn_example_bigdata_hive_csv_loader/m_001_import_hive_csv?u=mlauber71 -- C+ Bonus Track (some more advanced big data functions with Hive like external tables demonstrated and partitions half-open code) work with Hive and external tables in CSV and Parquet KNIME and Hive - load multiple CSV files at once via external table https://hub.knime.com/mlauber71/spaces/Public/latest/kn_example_bigdata_hive_csv_loader/m_001_import_hive_csv?u=mlauber71 KNIME and Hive - load multiple Parquet files at once via external table https://hub.knime.com/mlauber71/spaces/Public/latest/kn_example_bigdata_hive_parquet_loader/m_001_import_hive_parquet?u=mlauber71 Some more functions with Hive like adding fields to Hive table https://hub.knime.com/mlauber71/spaces/Public/latest/kn_example_bigdata_hive_add_column_db_40?u=mlauber71 If you experience slow connection when working with complex (Impala) queries you could check the "Retrieve in configure" settings https://forum.knime.com/t/db-nodes-in-a-more-complex-flow-executing-very-slow/23649/2?u=mlauber71 https://docs.knime.com/latest/db_extension_guide/index.html#advanced_tab Hive - how to handle missing tables in Hive? - us a try-if combination (THX to H. Stölting for the inspiration) https://hub.knime.com/mlauber71/spaces/Public/latest/kn_example_hive_missing_table_switch?u=mlauber71 D ------------------------------- KNIME and Spark Being Lazy is Useful — Lazy Evaluation in Spark https://medium.com/analytics-vidhya/being-lazy-is-useful-lazy-evaluation-in-spark-1f04072a3648 => you can do without and jump right into it, but it makes sense to understand this basic concept of Spark Comparison of Hive and Spark SQL - a gentle introductory example https://hub.knime.com/knime/spaces/Examples/latest/10_Big_Data/02_Spark_Executor/07_SparkSQL_meets_HiveQL?u=mlauber71 An overview of KNIME based functions to access big data systems - use it on your own big data system (including PySpark) https://hub.knime.com/mlauber71/spaces/Public/latest/kn_example_db_bigdata_nodes/m_120_db_access_bigdata_tables?u=mlauber71 => use the DB, Spark and PySpark nodes on your big data system -- going further with Spark Overview of Examples using Spark (and ML) with KNIME https://hub.knime.com/knime/spaces/Education/latest/Courses/L4-BD%20Introduction%20to%20Big%20Data%20with%20KNIME%20Analytics%20Platform/3_Spark/4_Examples/?u=mlauber71 Local Big Data Irish Meter https://hub.knime.com/knime/spaces/Examples/latest/10_Big_Data/02_Spark_Executor/09_Big_Data_Irish_Meter_on_Spark_only?u=mlauber71 Cleaning the NYC taxi dataset on Spark https://hub.knime.com/knime/spaces/Examples/latest/50_Applications/49_NYC_Taxi_Visualization/Data_Preparation?u=mlauber71 E ------------------------------- More database and SQL related stuff Tobias Kötter's KNIME hub space with a lot of DB related workflows https://hub.knime.com/tobias.koetter/spaces/Public/latest/DB/?u=mlauber71 If you want to transfer data between data bases without down- and uploading it all https://hub.knime.com/tobias.koetter/spaces/Public/latest/DB/DBStreamingDataTransfer?u=mlauber71 Microsoft Access is still a thing - and you can access it with KNIME https://hub.knime.com/mlauber71/spaces/Public/latest/kn_example_db_update_merge_ms_access?u=mlauber71 The full SQL can be found in this free eBook (not all SQL commands might work on all databases) https://goalkicker.com/SQLBook/

External resources

Loading deploymentsLoading ad hoc jobs

Legal

By using or downloading the workflow, you agree to our terms and conditions.