This workflow demonstrates several methods to import one or many CSV files into Hive.
Two approaches are demonstrated: a direct upload, where you create a Hive table with KNIME nodes, and an external table, where you copy your files into an /upload/ folder and point an external table at them. As long as the files all share the same structure, Hive will read them, and you can then use this external table to process your files further.
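The external-table approach described above can be sketched in HiveQL roughly as follows; the table name, columns, and the /upload/ path are illustrative placeholders, not the workflow's actual definitions:

```sql
-- Sketch only: table name, columns, and the folder path are assumed placeholders.
-- Every CSV file placed under the LOCATION folder (with the same structure)
-- is read as part of this one external table.
CREATE EXTERNAL TABLE IF NOT EXISTS csv_uploads (
  id     INT,
  name   STRING,
  amount DOUBLE
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/upload/csv_uploads/';
```

Because the table is EXTERNAL, dropping it removes only the metadata; the CSV files in the /upload/ folder stay in place.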
If the files are very large you might have to use partitions. The Hive drivers shipped with KNIME have a problem with the headers of the CSV files; the workflow also demonstrates how to get rid of them.
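One common way to drop CSV headers and add partitions in Hive is sketched below, assuming a Hive version that supports the `skip.header.line.count` table property; names and paths are again placeholders:

```sql
-- Sketch: skip the first line (the header) of every file in the table folder,
-- and partition the table for very large data volumes.
CREATE EXTERNAL TABLE IF NOT EXISTS csv_uploads (
  id     INT,
  name   STRING,
  amount DOUBLE
)
PARTITIONED BY (load_date STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/upload/csv_uploads/'
TBLPROPERTIES ("skip.header.line.count"="1");

-- After copying files into a new partition folder, register the partition:
ALTER TABLE csv_uploads
  ADD IF NOT EXISTS PARTITION (load_date='2021-10-01')
  LOCATION '/upload/csv_uploads/load_date=2021-10-01/';
```

If the driver ignores the table property, a common fallback is a WHERE clause that filters out rows whose first column still contains the header text.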
Please familiarize yourself with the concepts of big data and partitions before using this. Also note that KNIME's local big data environment is only there to demonstrate the usage. It might work with your large files, but it is called Big Data for a reason.
https://hub.knime.com/mlauber71/spaces/Public/latest/kn_example_hive_school_of?u=mlauber71
External resources
- A meta collection about KNIME and performance and performance tuning and some problems
- Processing hundreds of millions of records
- forum entry
- School of Hive - with KNIME's local Big Data environment (SQL for Big Data)
- A meta collection of KNIME and databases (SQL, Big Data/Hive/Impala and Spark/PySpark)
Used extensions & nodes
Created with KNIME Analytics Platform version 4.4.1