Node / Manipulator

String Splitter (Regex)

Manipulation, Column, Split & Combine, Streamable

This node is available for KNIME Analytics Platform version 5.3.0 or higher.


This node splits the string content of a selected column into logical groups using regular expressions. A capturing group is usually written as a pair of parentheses enclosing a regular expression. Optionally, a group can be named. See Pattern for more information. For each input value, the captured groups form the output values. These can be appended to the table in different ways; by default, each group corresponds to one additional output column.

A short introduction to groups and capturing is given in the Java API. Some examples are given below:

Parsing Patent Numbers

Patent identifiers such as "US5443036-X21" consist of a country code of at most two letters ("US"), a patent number ("5443036"), and possibly an application code ("X21") that is separated from the number by a dash or a space character. Such identifiers can be grouped by the expression ([A-Za-z]{1,2})([0-9]+)[ \-]?(.*$). Each of the parenthesized terms corresponds to one of the aforementioned properties. To get named output columns, we can add group names to the pattern:

  • (?<CC>[A-Za-z]{1,2}) is now identified with "CC" in the output.
  • (?<patentNumber>[0-9]+) is now identified with "patentNumber".
  • [ \-]? is not a capturing group, so it remains unchanged.
  • (?<applicationCode>.*$) is now identified with "applicationCode".
Named and unnamed groups can also be mixed in one pattern.
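
The named pattern above can be tried outside of KNIME with plain java.util.regex. The following is a minimal sketch, not the node's implementation: the class name PatentSplitSketch and the method split are hypothetical, and it assumes that the whole input string has to match the pattern.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, used for illustration only.
class PatentSplitSketch {
    // The pattern from the text, with all three capturing groups named.
    static final Pattern PATENT = Pattern.compile(
        "(?<CC>[A-Za-z]{1,2})(?<patentNumber>[0-9]+)[ \\-]?(?<applicationCode>.*$)");

    // Maps each group name to its captured value.
    // Assumption: the whole input must match; otherwise the map is empty.
    static Map<String, String> split(String input) {
        Map<String, String> groups = new LinkedHashMap<>();
        Matcher m = PATENT.matcher(input);
        if (m.matches()) {
            groups.put("CC", m.group("CC"));
            groups.put("patentNumber", m.group("patentNumber"));
            groups.put("applicationCode", m.group("applicationCode"));
        }
        return groups;
    }

    public static void main(String[] args) {
        // Prints {CC=US, patentNumber=5443036, applicationCode=X21}
        System.out.println(split("US5443036-X21"));
    }
}
```

Each map entry plays the role of one output column: the key is the group name and the value is the captured text. An identifier without an application code, such as "US5443036", still matches, and the last group is then an empty string.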

Strip File URLs

This node is particularly useful for parsing the file URL of a file reader node (the URL is exposed as a flow variable and then exported to a table using a Variable to Table node). The format of such URLs is similar to "file:c:\some\directory\foo.csv". The pattern [A-Za-z]*:(.*[/\\])(?<filename>([^\.]*)\.(.*$)) generates four groups. The first group, (.*[/\\]), identifies the directory. It consumes all characters up to and including the final slash or backslash; in the example, this is "c:\some\directory\". The second group, named "filename", represents the file name and encloses the third and fourth groups. The third group, ([^\.]*), consumes all characters after the directory that are not a dot '.' ("foo" in the above example). The pattern then expects a single dot (which is not captured on its own), followed by the fourth group, (.*$), which reads until the end of the string and yields the file suffix ("csv"). The groups for the above example are:

  1. Group 1 : c:\some\directory\
  2. Group filename : foo.csv
  3. Group 3 : foo
  4. Group 4 : csv
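
The group numbering above, including the nested groups, can be checked with a short java.util.regex sketch. It is illustrative only and does not use any KNIME API: the class name FileUrlSplitSketch and the method split are hypothetical, and whole-string matching is an assumption.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, used for illustration only.
class FileUrlSplitSketch {
    // The pattern from the text, with backslashes escaped for a Java string literal.
    static final Pattern FILE_URL = Pattern.compile(
        "[A-Za-z]*:(.*[/\\\\])(?<filename>([^\\.]*)\\.(.*$))");

    // Returns the captured groups in order (index 0 holds group 1).
    // Assumption: the whole input must match; otherwise the array is empty.
    static String[] split(String url) {
        Matcher m = FILE_URL.matcher(url);
        if (!m.matches()) {
            return new String[0];
        }
        String[] groups = new String[m.groupCount()];
        for (int i = 1; i <= m.groupCount(); i++) {
            groups[i - 1] = m.group(i);
        }
        return groups;
    }

    public static void main(String[] args) {
        String[] groups = split("file:c:\\some\\directory\\foo.csv");
        for (int i = 0; i < groups.length; i++) {
            System.out.println("Group " + (i + 1) + " : " + groups[i]);
        }
    }
}
```

Groups are numbered by the position of their opening parenthesis, which is why the named group "filename" is also group 2 and the two groups nested inside it are groups 3 and 4.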

Email Address Extraction

Consider a scenario where you have a list of email addresses. Using the pattern (?<username>.+)@(?<domain>.+), you can extract the username and domain from each address. The groups for the email address "john.doe@example.com" are:

  • Group username : john.doe
  • Group domain : example.com
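
The email pattern can be sketched in the same way with java.util.regex. This is an illustration under stated assumptions, not the node's own code: the class name EmailSplitSketch and the method split are hypothetical, and returning an empty Optional for non-matching input is a choice made here, not documented node behavior.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, used for illustration only.
class EmailSplitSketch {
    // The pattern from the text with its two named groups.
    static final Pattern EMAIL = Pattern.compile("(?<username>.+)@(?<domain>.+)");

    // Returns {username, domain}.
    // Assumption: the whole input must match; otherwise the result is empty.
    static Optional<String[]> split(String address) {
        Matcher m = EMAIL.matcher(address);
        if (!m.matches()) {
            return Optional.empty();
        }
        return Optional.of(new String[] {m.group("username"), m.group("domain")});
    }

    public static void main(String[] args) {
        // Prints john.doe | example.com
        split("john.doe@example.com")
            .ifPresent(g -> System.out.println(g[0] + " | " + g[1]));
    }
}
```

Because .+ is greedy, an input that contains several @ characters is split at the last one. A stricter character class such as [^@]+ would be needed if that is not desired.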

Node details

Input ports
  1. Type: Table
    Data Table
    Input table with string column to be split.
Output ports
  1. Type: Table
    Input with split columns
    Input table with additional column(s) and potentially duplicated rows representing the pattern groups. See "Output matched groups as" for more details.

