
String Emoji and Character Class Filter

By takbb
Draft. Latest edits on Sep 23, 2022 6:23 PM
Filter out emoji and other classes of characters from a string using built-in regular expressions. This component can be used as a replacement for my original String Emoji Filter.

Version history:
  • v1, 17 September 2022, @takbb (Brian Bates)
  • v1.1, 23 September 2022: bug fix, "Connector Punctuation" was removing spaces

For examples of the characters in each Unicode character class (general category), see https://www.compart.com/en/unicode/category

The component uses a Java Snippet that calls Java's regex replaceAll with built-in regular expressions to filter characters from a string based on predefined classes. Choose one or more classes of characters to filter from the list provided. The filter converts the selected class names into regex character classes and then removes every matching character.

Additional character classes and/or regex patterns may be added over time. Please let me know if specific character classes don't appear to work.

The classes are a subset of those described in the "Unicode Categories" section of https://www.regular-expressions.info/unicode.html, which describes each category. Because this component uses Java, it is the Java implementation of these regex patterns that applies. The categories currently implemented, with their regex equivalents, are:

  • Unassigned Characters: \p{Cn}
  • Formatting Indicators: \p{Cf}
  • Control Characters: \p{C}\p{Cc}
  • Half of UTF-16 Surrogate pair: \p{Cs}
  • Codepoints Reserved for Private Use: \p{Co}
  • Punctuation - Other: \p{Po}
  • Symbols: \p{S}
  • Symbols - Emoji and Other: \p{So}
  • Symbols - Currency: \p{Sc}
  • Symbols - Modifiers: \p{Sk}
  • Mathematical Symbols: \p{Sm}
  • Letters: \p{L}
  • Letters - Upper Case: \p{Lu}
  • Letters - Lower case: \p{Ll}
  • Numbers: \p{N}
  • Character Marks: \p{M}
  • Non Spacing Marks: \p{Mn}
  • Enclosing Marks: \p{Me}
  • Separators: \p{Z}
  • Space Separator: \p{Zs}
  • Line Separator: \p{Zl}
  • Paragraph Separators: \p{Zp}
  • Other Numbers (e.g. superscript digits): \p{No}
  • Punctuation: \p{P}
  • Dash Punctuation: \p{Pd}
  • Connector Punctuation: \p{Pc}

Please contact @takbb on the forum if you have suggestions for improvements or additional useful filter classes.
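The component's own Java Snippet code is not reproduced on this page, so the following is only a minimal sketch of the approach described above, assuming the selected class names are looked up in a table and concatenated into one regex character class that is then removed with replaceAll. The class name CharacterClassFilter and the CLASSES map (holding only a few entries from the list above) are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

public class CharacterClassFilter {

    // Hypothetical lookup table: a few of the display names listed above,
    // mapped to their Java regex Unicode category escapes.
    private static final Map<String, String> CLASSES = new LinkedHashMap<>();
    static {
        CLASSES.put("Symbols - Emoji and Other", "\\p{So}");
        CLASSES.put("Symbols - Currency", "\\p{Sc}");
        CLASSES.put("Numbers", "\\p{N}");
        CLASSES.put("Dash Punctuation", "\\p{Pd}");
        CLASSES.put("Connector Punctuation", "\\p{Pc}");
    }

    // Combine the selected categories into one character class, e.g. [\p{So}\p{Sc}]
    static Pattern buildPattern(List<String> selected) {
        StringBuilder sb = new StringBuilder("[");
        for (String name : selected) {
            String escape = CLASSES.get(name);
            if (escape == null) {
                throw new IllegalArgumentException("Unknown class: " + name);
            }
            sb.append(escape);
        }
        return Pattern.compile(sb.append(']').toString());
    }

    // Remove every character belonging to any selected category.
    // An empty selection returns the input unchanged ("[]" is not a valid Java regex).
    static String filter(String input, List<String> selected) {
        if (input == null || selected.isEmpty()) {
            return input;
        }
        return buildPattern(selected).matcher(input).replaceAll("");
    }

    public static void main(String[] args) {
        // "Price: " + euro sign + "5 " + U+1F600 (grinning face, a surrogate pair) + " ok"
        String s = "Price: \u20AC5 \uD83D\uDE00 ok";
        System.out.println(filter(s, List.of("Symbols - Emoji and Other", "Symbols - Currency")));
    }
}
```

Running main prints "Price: 5  ok": the euro sign and the emoji are removed and the surrounding spaces are kept. Java regex matching works on code points, so a surrogate-pair emoji such as U+1F600 is removed whole rather than leaving half a pair behind. Emoji sequences can, however, contain a zero-width joiner (U+200D, category Cf) or a variation selector (U+FE0F, category Mn), so selecting "Symbols - Emoji and Other" alone may leave those invisible characters in the output; also selecting "Formatting Indicators" and "Non Spacing Marks" removes them, along with any other characters in those categories.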

Component details

Input ports
  1. Type: Table
    Data Table In
    The data table to be filtered
Output ports
  1. Type: Table
    Filtered Data
    The data table with filtering applied
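The port descriptions do not say which columns are filtered or how missing values are treated. Purely as an illustration, assuming every String cell is filtered and missing values (modelled here as null) pass through unchanged, applying one compiled pattern across a table could look like this. TableFilterSketch and the List<Object[]> row model are hypothetical stand-ins, not KNIME's table API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class TableFilterSketch {

    // Apply one compiled pattern to every String cell; non-string cells and
    // nulls (standing in for missing values) are copied through unchanged.
    static List<Object[]> filterTable(List<Object[]> rows, Pattern pattern) {
        List<Object[]> out = new ArrayList<>(rows.size());
        for (Object[] row : rows) {
            Object[] copy = Arrays.copyOf(row, row.length);
            for (int i = 0; i < copy.length; i++) {
                if (copy[i] instanceof String) {
                    copy[i] = pattern.matcher((String) copy[i]).replaceAll("");
                }
            }
            out.add(copy);
        }
        return out;
    }

    public static void main(String[] args) {
        Pattern emoji = Pattern.compile("\\p{So}");
        List<Object[]> rows = List.of(
                new Object[] {"hello \uD83D\uDC4B", 42, null},   // U+1F44B waving hand
                new Object[] {"no emoji", 7, "\u2764 love"});     // U+2764 heavy black heart
        for (Object[] row : filterTable(rows, emoji)) {
            System.out.println(Arrays.toString(row));
        }
    }
}
```

Compiling the pattern once and reusing it for every cell avoids recompiling the regex per row. Running main prints [hello , 42, null] followed by [no emoji, 7,  love].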

Used extensions & nodes

Created with KNIME Analytics Platform version 4.6.1
  • KNIME Base nodes (trusted extension), KNIME AG, Zurich, Switzerland, version 4.6.1
  • KNIME Javasnippet (trusted extension), KNIME AG, Zurich, Switzerland, version 4.6.0
  • KNIME Quick Forms (trusted extension), KNIME AG, Zurich, Switzerland, version 4.6.0


Legal

By using or downloading the component, you agree to our terms and conditions.
