
String Emoji Filter

takbb
Draft. Latest edits on Apr 24, 2021 11:17 AM
** EXPERIMENTAL ** You are welcome to use it (at your own risk). Please check back for improvements to the filters.

Filters out emoji characters from a string using a built-in regular expression. This is a proof-of-concept demonstration component. A future version may include the ability to change the regular expression used.

23 April 2021, @takbb (Brian Bates)

This component uses a Java Snippet that calls Java's regex replaceAll method, with the regular expressions below used to identify emoji. It is currently experimental, with different "filter types" available. Please contact @takbb on the forum if you have suggestions for improving the regular expressions or the techniques used.

FILTER TYPE 1
**************
Filters using the following regular expression, and appears to provide only limited emoji filtering:

[\\p{C}\\p{So}\uFE00-\uFE0F\\x{E0100}-\\x{E01EF}]

FILTER TYPE 2
**************
Uses the following regular expression, and is currently the most extensive of the filters:

emojiRegex="(?:[\\u2700-\\u27bf]|"
    + "(?:[\\ud83c\\udde6-\\ud83c\\uddff]){2}|"
    + "[\\ud800\\udc00-\\uDBFF\\uDFFF]|[\\u2600-\\u26FF])[\\ufe0e\\ufe0f]?(?:[\\u0300-\\u036f\\ufe20-\\ufe23\\u20d0-\\u20f0]|[\\ud83c\\udffb-\\ud83c\\udfff])?"
    + "(?:\\u200d(?:[^\\ud800-\\udfff]|"
    + "(?:[\\ud83c\\udde6-\\ud83c\\uddff]){2}|"
    + "[\\ud800\\udc00-\\uDBFF\\uDFFF]|[\\u2600-\\u26FF])[\\ufe0e\\ufe0f]?(?:[\\u0300-\\u036f\\ufe20-\\ufe23\\u20d0-\\u20f0]|[\\ud83c\\udffb-\\ud83c\\udfff])?)*|"
    + "[\\u0023-\\u0039]\\ufe0f?\\u20e3|\\u3299|\\u3297|\\u303d|\\u3030|\\u24c2|[\\ud83c\\udd70-\\ud83c\\udd71]|[\\ud83c\\udd7e-\\ud83c\\udd7f]|\\ud83c\\udd8e|[\\ud83c\\udd91-\\ud83c\\udd9a]|[\\ud83c\\udde6-\\ud83c\\uddff]|[\\ud83c\\ude01-\\ud83c\\ude02]|\\ud83c\\ude1a|\\ud83c\\ude2f|[\\ud83c\\ude32-\\ud83c\\ude3a]|[\\ud83c\\ude50-\\ud83c\\ude51]|\\u203c|\\u2049|[\\u25aa-\\u25ab]|\\u25b6|\\u25c0|[\\u25fb-\\u25fe]|\\u00a9|\\u00ae|\\u2122|\\u2139|\\ud83c\\udc04|[\\u2600-\\u26FF]|\\u2b05|\\u2b06|\\u2b07|\\u2b1b|\\u2b1c|\\u2b50|\\u2b55|\\u231a|\\u231b|\\u2328|\\u23cf|[\\u23e9-\\u23f3]|[\\u23f8-\\u23fa]|\\ud83c\\udccf|\\u2934|\\u2935|[\\u2190-\\u21ff]";

FILTER TYPE 3
**************
Does not use regular expressions. Instead, it treats characters that Java stores as UTF-16 "surrogate pairs" as likely to be emoji. It removes many common emoji but does not find all of them.
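The regular-expression approach (filter types 1 and 2) can be sketched roughly as follows, using the FILTER TYPE 1 pattern copied from the description. This is a minimal sketch, not the component's actual Java Snippet code: the class and method names (EmojiFilterType1, removeEmoji) and the handling of missing values are hypothetical. One detail worth knowing: \p{C} is the Unicode "Other" category, which also covers control characters such as tab and line feed, so this pattern strips those as well as emoji.

```java
import java.util.regex.Pattern;

// Hypothetical illustration of the FILTER TYPE 1 idea: replace every match of
// the pattern with an empty string.
class EmojiFilterType1 {
    // Pattern text as given for FILTER TYPE 1. The double backslashes reach the
    // regex engine as single backslashes; the single-backslash escapes are turned
    // into literal variation-selector characters by the Java compiler.
    static final String FILTER_TYPE_1 =
        "[\\p{C}\\p{So}\uFE00-\uFE0F\\x{E0100}-\\x{E01EF}]";

    // Compiled once, instead of recompiling on every String.replaceAll call.
    static final Pattern PATTERN = Pattern.compile(FILTER_TYPE_1);

    static String removeEmoji(String input) {
        if (input == null) {
            return null; // assumption: missing values simply pass through
        }
        return PATTERN.matcher(input).replaceAll("");
    }

    public static void main(String[] args) {
        // A grinning face (outside the BMP) and a heart followed by a variation selector.
        String s = "Hello \uD83D\uDE00 world \u2764\uFE0F!";
        System.out.println(removeEmoji(s)); // prints "Hello  world !"
    }
}
```

The FILTER TYPE 2 string can be swapped in for FILTER_TYPE_1 in the same way; only the pattern changes, not the replaceAll mechanics.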
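The surrogate-pair idea behind FILTER TYPE 3 might look something like the sketch below. The description does not show the component's actual code, so this is an assumed interpretation: drop every high/low surrogate pair, i.e. every code point above U+FFFF. The class and method names (EmojiFilterType3, removeSurrogatePairs) are hypothetical.

```java
// Hypothetical illustration of FILTER TYPE 3: no regular expressions, just skip
// any character pair that forms a UTF-16 surrogate pair.
class EmojiFilterType3 {
    static String removeSurrogatePairs(String input) {
        if (input == null) {
            return null; // assumption: missing values simply pass through
        }
        StringBuilder out = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            boolean pairStarts = Character.isHighSurrogate(c)
                    && i + 1 < input.length()
                    && Character.isLowSurrogate(input.charAt(i + 1));
            if (pairStarts) {
                i++; // skip the low surrogate too, dropping the whole code point
                continue;
            }
            out.append(c); // ordinary (BMP) characters are kept
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The grinning face is a surrogate pair and is removed; the heart and its
        // variation selector are single BMP characters and are kept.
        System.out.println(removeSurrogatePairs("Hi \uD83D\uDE00 \u2764\uFE0F"));
    }
}
```

This shows why such a filter "does not find all of them": emoji in the Basic Multilingual Plane, such as the heart above, survive, while non-emoji characters outside it (for example some rare CJK ideographs) are removed.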

Component details

Input ports
  1. Type: Table
    Data source containing the String column to be filtered
Output ports
  1. Type: Table
    The input data passed through unchanged, plus an additional column containing the values of the selected String column with emoji characters removed

Used extensions & nodes

Created with KNIME Analytics Platform version 4.3.2
  • KNIME Base nodes (Trusted extension)
    KNIME AG, Zurich, Switzerland
    Version 4.3.2
  • KNIME Javasnippet (Trusted extension)
    KNIME AG, Zurich, Switzerland
    Version 4.3.0
  • KNIME Quick Forms (Trusted extension)
    KNIME AG, Zurich, Switzerland
    Version 4.3.2


Legal

By using or downloading the component, you agree to our terms and conditions.
