17 September 2022
***************
Thank you for your interest in this component. I recommend switching to the String Emoji and Character Class Filter, which lets you filter emojis and other characters using a list of "class/category" names:
https://hub.knime.com/takbb/spaces/Public/latest/Components/String%20Emoji%20Filter
***************
** EXPERIMENTAL ** You are welcome to use it (at your own risk). Please check back for improvements to the filters.
Filters out emoji characters from a string using a built-in regular expression. This is a proof-of-concept demonstration component. A future version may include the ability to update the regular expression used.
23 April 2021 @takbb Brian Bates
This uses a Java Snippet with a Java regex replaceAll call and the regular expressions listed below to identify emoji. It is currently experimental, with different "filter types" being tried.
Please contact @takbb on the forum if you have suggestions for improving the regex or the techniques used.
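As a rough illustration of the replaceAll approach, the sketch below applies a short excerpt of the FILTER TYPE 2 expression to a string. The class, constant, and method names are hypothetical, and the actual Java Snippet inside the component may be organised differently.

```java
final class ReplaceAllSketch {
    // Abbreviated excerpt of the FILTER TYPE 2 expression (not the full pattern):
    // dingbats, all supplementary code points, and miscellaneous symbols,
    // each optionally followed by a variation selector.
    static final String EMOJI_EXCERPT =
        "(?:[\\u2700-\\u27bf]|[\\ud800\\udc00-\\uDBFF\\uDFFF]|[\\u2600-\\u26FF])[\\ufe0e\\ufe0f]?";

    // Hypothetical helper: remove every match and keep everything else.
    static String filter(String input) {
        return input.replaceAll(EMOJI_EXCERPT, "");
    }

    public static void main(String[] args) {
        // A heart with a variation selector, then a supplementary-plane emoji.
        System.out.println(filter("I \u2764\uFE0F KNIME \uD83D\uDE00"));
    }
}
```

The surrounding whitespace is left untouched, so removing an emoji between two words leaves a double space behind.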
FILTER TYPE 1
**************
Filters using the following regular expression, which appears to provide limited emoji filtering:
[\\p{C}\\p{So}\uFE00-\uFE0F\\x{E0100}-\\x{E01EF}]
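A minimal sketch of how this character class could be applied in plain Java follows. The class and method names are hypothetical, the escaping is written uniformly with doubled backslashes for a Java string literal, and the \x{...} syntax assumes Java 7 or later.

```java
import java.util.regex.Pattern;

final class FilterType1Sketch {
    // The FILTER TYPE 1 character class: "Other" characters (C), "Other Symbol"
    // characters (So), and the two variation selector ranges.
    private static final Pattern TYPE1 =
        Pattern.compile("[\\p{C}\\p{So}\\uFE00-\\uFE0F\\x{E0100}-\\x{E01EF}]");

    // Hypothetical helper: delete every character matched by the class.
    static String filter(String input) {
        return TYPE1.matcher(input).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(filter("Hello \uD83D\uDE00 world"));
    }
}
```

Note that the C category also covers control characters such as tabs and line breaks, so a filter built this way removes those as well as emoji.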
FILTER TYPE 2
**************
Uses the following regular expression, which is currently the most extensive of the filters:
emojiRegex="(?:[\\u2700-\\u27bf]|" +
"(?:[\\ud83c\\udde6-\\ud83c\\uddff]){2}|" +
"[\\ud800\\udc00-\\uDBFF\\uDFFF]|[\\u2600-\\u26FF])[\\ufe0e\\ufe0f]?(?:[\\u0300-\\u036f\\ufe20-\\ufe23\\u20d0-\\u20f0]|[\\ud83c\\udffb-\\ud83c\\udfff])?" +
"(?:\\u200d(?:[^\\ud800-\\udfff]|" +
"(?:[\\ud83c\\udde6-\\ud83c\\uddff]){2}|" +
"[\\ud800\\udc00-\\uDBFF\\uDFFF]|[\\u2600-\\u26FF])[\\ufe0e\\ufe0f]?(?:[\\u0300-\\u036f\\ufe20-\\ufe23\\u20d0-\\u20f0]|[\\ud83c\\udffb-\\ud83c\\udfff])?)*|" +
"[\\u0023-\\u0039]\\ufe0f?\\u20e3|\\u3299|\\u3297|\\u303d|\\u3030|\\u24c2|[\\ud83c\\udd70-\\ud83c\\udd71]|[\\ud83c\\udd7e-\\ud83c\\udd7f]|\\ud83c\\udd8e|[\\ud83c\\udd91-\\ud83c\\udd9a]|[\\ud83c\\udde6-\\ud83c\\uddff]|[\\ud83c\\ude01-\\ud83c\\ude02]|\\ud83c\\ude1a|\\ud83c\\ude2f|[\\ud83c\\ude32-\\ud83c\\ude3a]|[\\ud83c\\ude50-\\ud83c\\ude51]|\\u203c|\\u2049|[\\u25aa-\\u25ab]|\\u25b6|\\u25c0|[\\u25fb-\\u25fe]|\\u00a9|\\u00ae|\\u2122|\\u2139|\\ud83c\\udc04|[\\u2600-\\u26FF]|\\u2b05|\\u2b06|\\u2b07|\\u2b1b|\\u2b1c|\\u2b50|\\u2b55|\\u231a|\\u231b|\\u2328|\\u23cf|[\\u23e9-\\u23f3]|[\\u23f8-\\u23fa]|\\ud83c\\udccf|\\u2934|\\u2935|[\\u2190-\\u21ff]";
FILTER TYPE 3
**************
Does not use regular expressions. Instead, it looks for "surrogate pairs" of characters as a sign that the text is likely to contain an emoji. It filters many common emoji but does not find all of them.
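One way a surrogate-pair filter could look is sketched below. This is an assumption about the general technique, not the component's actual code, and the class and method names are hypothetical. It drops every UTF-16 surrogate code unit, which removes emoji stored outside the Basic Multilingual Plane but leaves emoji that fit in a single char.

```java
final class SurrogatePairFilterSketch {
    // Hypothetical helper: copy the input, skipping all surrogate code units.
    static String dropSurrogates(String input) {
        StringBuilder out = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (!Character.isSurrogate(c)) {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The first emoji is a surrogate pair; the second is a single char.
        System.out.println(dropSurrogates("Hi \uD83D\uDE00 and \u2600"));
    }
}
```

The second emoji in the usage line survives because it is not encoded as a surrogate pair, which is consistent with this filter type missing some emoji. A filter like this also removes any non-emoji text that happens to use supplementary characters.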
- Type: Table. Data source containing the String column to be parsed.