Extract Data from Bank Statements (PDF) into JSON files with the help of Ollama / Llama3 LLM
- list PDFs or other documents (CSV, TXT, LOG) from your drive that roughly share a similar layout and from which you expect an LLM to be able to extract data
- formulate a concise prompt (and instruction) and try to force the LLM to always return a JSON file with the same structure (Mistral seems to be very good at that)
- convert the single document to a vector store, either Chroma or Meta's FAISS, with the help of Ollama and a suitable embedding model (mxbai-embed-large)
- use the Ollama wrapper (via Python and a KNIME node) to put the document and query before the LLM
- collect the data back from Python into KNIME
- extract the data from the JSON files, either with the help of Regex or by converting the JSON with KNIME nodes
- make sure they have the same structure
=> you need a Python environment and Ollama installed, the models pulled locally, and Ollama running!
If you experience problems with the model download: check your proxy settings, then kill all running Ollama jobs in your task manager and try again
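The prompting and JSON-extraction steps above can be sketched in plain Python. This is a minimal sketch, assuming Ollama is running locally on its default port (11434) and llama3:instruct has been pulled; the field names in the prompt (account, date, amount, currency) are placeholders for your own statement layout:

```python
import json
import re
import urllib.request


def build_request(document_text, model="llama3:instruct"):
    """Payload for Ollama's /api/generate endpoint.

    format='json' asks Ollama to constrain the model to valid JSON,
    which helps enforce the same structure on every document.
    """
    prompt = (
        "Extract the account number, date, amount and currency from the "
        "bank statement below. Answer with a single JSON object using "
        "exactly these keys: account, date, amount, currency.\n\n"
        + document_text
    )
    return {"model": model, "prompt": prompt, "format": "json", "stream": False}


def ask_ollama(payload):
    """POST the payload to the local Ollama server and return the reply text."""
    req = urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["response"]


def extract_json(reply):
    """Regex fallback: pull the first {...} block out of a chatty reply."""
    match = re.search(r"\{.*\}", reply, re.DOTALL)
    return json.loads(match.group(0)) if match else None
```

In a KNIME Python Script node you would call ask_ollama(build_request(text)) per document and hand the parsed dictionary back as a table row; the extract_json helper covers models that wrap the JSON in extra prose.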
------
Run this in a terminal window to start Ollama. You can also try other models (https://ollama.com), or just pull the model without running it:
ollama pull llama3:instruct
ollama run llama3:instruct
To get the embedding model, run this command in the terminal window:
ollama pull mxbai-embed-large
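As a rough illustration of the vector-store step, the sketch below chunks a document, requests embeddings from Ollama's /api/embeddings endpoint (assuming Ollama is running locally and mxbai-embed-large is pulled), and uses a brute-force cosine-similarity search as a stand-in for Chroma or FAISS:

```python
import json
import math
import urllib.request

# Assumption: Ollama's default local endpoint.
OLLAMA_EMBED_URL = "http://localhost:11434/api/embeddings"


def chunk_text(text, size=500, overlap=100):
    """Split a document into overlapping character chunks for embedding."""
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + size])
        start += size - overlap
    return chunks


def embed(text, model="mxbai-embed-large"):
    """Request an embedding vector for one chunk from the local Ollama server."""
    payload = json.dumps({"model": model, "prompt": text}).encode()
    req = urllib.request.Request(
        OLLAMA_EMBED_URL, data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["embedding"]


def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)
```

A real workflow would store the chunk embeddings in Chroma or FAISS instead of a Python list, but the retrieval idea is the same: embed the query, then return the chunks with the highest cosine similarity as context for the LLM.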
Ollama and Llama3 - A Streamlit App to convert your files into local Vector Stores and chat with them using the latest LLMs
https://medium.com/p/c5340fcd6ad0
Medium - Chat with local Llama3 Model via Ollama in KNIME Analytics Platform - Also extract Logs into structured JSON Files
https://medium.com/p/aca61e4a690a