JKISeason 4-29 - Preparing Your Office Equipment Data

JKISeason4-29
trj
Draft. Latest edits on Dec 8, 2025 10:42 AM

Challenge 29: Preparing Your Office Equipment Data

Level: Hard

Description:
After assessing the quality of some scraped product and review data, your company decides to move forward with the office upgrade project. However, before any purchase decisions can be made, the data needs some serious cleaning and wrangling. The team wants to compare prices, identify the best products by category, and understand where the product catalog might be lacking — yet the current data is messy, inconsistent, and full of oddities. Your task is to transform this raw data into an analysis-ready format: you will clean and enrich both product and review datasets, engineer meaningful features, detect and remove outliers, and even use LLMs to help normalize product categories from titles against a curated list. You should end up with a polished dataset that a merchandising or analytics team could use to explore pricing trends, visualize category gaps, and confidently prepare for the big office upgrade.

Beginner-friendly objective(s):
1. Load the two Excel sheets (product details and product reviews) and perform an initial cleaning: extract product IDs (ASINs) from URLs, convert price and ratings to numeric format, and normalize category labels.
2. Parse review metadata: split country and date from a combined field, convert dates to a proper Date type, derive year/month information, and compute per-review text length.
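The beginner steps can be sketched outside KNIME in plain Python. This is a minimal illustration, not the workflow's actual node configuration: the URL pattern (`/dp/` followed by a 10-character ID), the review metadata format (`Reviewed in <country> on <Month D, YYYY>`), and all function names are assumptions made for the example, since the challenge text does not specify them.

```python
import re
from datetime import datetime

# Assumption: the ASIN is the 10-character ID that follows "/dp/" in the URL.
ASIN_RE = re.compile(r"/dp/([A-Z0-9]{10})")
# Assumption: the combined field reads "Reviewed in <country> on <Month D, YYYY>".
META_RE = re.compile(r"Reviewed in (?:the )?(.+?) on (.+)$")


def extract_asin(url):
    """Return the ASIN found in a product URL, or None if there is none."""
    match = ASIN_RE.search(url)
    return match.group(1) if match else None


def parse_price(text):
    """Drop currency symbols and thousands separators; None if not numeric."""
    cleaned = re.sub(r"[^0-9.]", "", text or "")
    try:
        return float(cleaned)
    except ValueError:
        return None


def parse_review_meta(meta):
    """Split the combined field into (country, date); (None, None) if no match."""
    match = META_RE.match(meta)
    if match is None:
        return None, None
    country, date_text = match.groups()
    # Raises ValueError if the date part does not follow the assumed format.
    return country, datetime.strptime(date_text, "%B %d, %Y").date()


def review_features(meta, text):
    """Derive country, date, year, month, and text length for one review."""
    country, date = parse_review_meta(meta)
    return {
        "country": country,
        "date": date,
        "year": date.year if date else None,
        "month": date.month if date else None,
        "text_length": len(text),
    }
```

Returning `None` for unparseable values keeps missing prices visible, so they can be dropped explicitly in a later step rather than silently turned into zeros.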

Intermediate-friendly objective(s):
1. Remove extreme text-length outliers.
2. Drop rows with missing price, fill missing brand values, and prepare a tidy table ready for analysis and visualization.
3. Visualize price by category and remove within-category price outliers using IQR-based rules; compute domain statistics and rank products by rating, rating count, and price.
4. Identify under-represented categories and assemble a categories list (as a flow variable) to guide reclassification of ambiguous items.
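The within-category IQR rule from the intermediate objectives can be sketched as follows. This is an illustration under assumptions: the row layout (dicts with `category` and `price` keys), the conventional multiplier `k = 1.5`, and the inclusive quartile method are choices made for the example; the challenge does not prescribe them, and KNIME's own outlier nodes may compute quartiles differently.

```python
from collections import defaultdict
from statistics import quantiles


def iqr_bounds(values, k=1.5):
    """Return (low, high) fences: Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def remove_price_outliers(rows, k=1.5):
    """Keep rows whose price lies inside the IQR fences of their own category."""
    prices_by_category = defaultdict(list)
    for row in rows:
        prices_by_category[row["category"]].append(row["price"])

    # Quartiles need at least two values; smaller categories are left unfiltered.
    bounds = {
        category: iqr_bounds(prices, k)
        for category, prices in prices_by_category.items()
        if len(prices) >= 2
    }

    unbounded = (float("-inf"), float("inf"))
    kept = []
    for row in rows:
        low, high = bounds.get(row["category"], unbounded)
        if low <= row["price"] <= high:
            kept.append(row)
    return kept
```

Computing the fences per category matters because a price that is ordinary for a standing desk can be extreme for a stapler; a single global fence would remove the wrong rows.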

Advanced objective(s):
1. Build LLM prompts from product titles and your categories list; generate a single best-fit category response per item.
2. Replace uncertain or targeted categories with the LLM’s output and recombine with the rest of the catalog to deliver a coherent, reclassified dataset for downstream reporting.
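The advanced steps can be sketched as prompt construction plus a guarded replacement. This is a hedged illustration only: the prompt wording, the row layout (dicts with `title` and `category` keys), and the `ask_llm` callable are hypothetical stand-ins. No real model API is called here; in the workflow this role is played by nodes from the KNIME AI Extension.

```python
def build_prompt(title, categories):
    """Build one prompt asking for a single category from the curated list."""
    options = "\n".join(f"- {category}" for category in categories)
    return (
        "Assign the product below to exactly one category from the list.\n"
        f"Categories:\n{options}\n"
        f"Product title: {title}\n"
        "Answer with the category name only."
    )


def normalize_response(response, categories, fallback):
    """Map a raw model answer onto the curated list, else return the fallback."""
    lookup = {category.casefold(): category for category in categories}
    key = response.strip().strip(".").casefold()
    return lookup.get(key, fallback)


def reclassify(rows, categories, targets, ask_llm):
    """Re-label rows whose category is in `targets`; pass other rows through.

    `ask_llm` is a hypothetical callable taking a prompt string and returning
    the model's text answer.
    """
    result = []
    for row in rows:
        if row["category"] in targets:
            answer = ask_llm(build_prompt(row["title"], categories))
            new_category = normalize_response(answer, categories, row["category"])
            row = {**row, "category": new_category}
        result.append(row)
    return result
```

Checking the answer against the curated list before accepting it guards against responses that fall outside the list: an unrecognized answer leaves the original category in place instead of introducing a new, inconsistent label.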

Author: Armin Ghassemi Rudd

Dataset: Office product data on KNIME Community Hub

Remember to upload your solution with tag JKISeason4-29 to your public space on KNIME Community Hub. To increase the visibility of your solution, also post it to this challenge thread on KNIME Forum.

External resources

  • Just KNIME It!

Used extensions & nodes

Created with KNIME Analytics Platform version 5.8.0
  • KNIME AI Extension (trusted extension), KNIME AG, Zurich, Switzerland, version 5.5.2
  • KNIME Base nodes (trusted extension), KNIME AG, Zurich, Switzerland, version 5.5.1
  • KNIME Excel Support (trusted extension), KNIME AG, Zurich, Switzerland, version 5.5.0
  • KNIME Expressions (trusted extension), KNIME AG, Zurich, Switzerland, version 5.5.1
  • KNIME Math Expression (JEP) (trusted extension), KNIME AG, Zurich, Switzerland, version 5.5.0
  • KNIME Quick Forms (trusted extension), KNIME AG, Zurich, Switzerland, version 5.5.2
  • KNIME Statistics Nodes (trusted extension), KNIME AG, Zurich, Switzerland, version 5.5.0
  • KNIME Views (trusted extension), KNIME AG, Zurich, Switzerland, version 5.5.1

Legal

By using or downloading the workflow, you agree to our terms and conditions.
