Create Bit Vector

Manipulator

Generates a bit vector for each row of the input table. The bit vectors are created from multiple numerical or string columns, from a single string column (containing hexadecimal strings, binary strings, or the bit positions to set), or from a single collection column. To configure the node, first select the source column type, that is, whether the bit vectors should be created from multiple string/numerical columns or from a single string/collection column. The dialog elements that correspond to the selected option are then enabled.

Bit vectors from multiple columns

If multiple columns are used, the bit positions in the resulting bit vector correspond to the positions of the selected columns in the input table. For example, if the second and third columns of the input table are selected and the first column is omitted, the bit vector of each row has length 2. The first bit is set if the value of the second column matches the selected criterion; likewise, the second bit is set if the value of the third column matches the selected criterion. The columns to consider when creating the bit vector are specified in the multiple column selection section. The enforce exclusion/inclusion option configures how the node handles previously unknown columns: if enforce exclusion is selected, all unknown columns are automatically added to the include list, whereas if enforce inclusion is selected, all unknown columns are added to the exclude list. The columns to include can also be defined by a pattern if the Wildcard/Regex Selection option is selected.
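
The mapping from column position to bit position can be illustrated with a minimal Python sketch. This is not the node's implementation; the function name `row_to_bits`, the column names, and the criterion are hypothetical and chosen only for illustration.

```python
from typing import Callable, Sequence


def row_to_bits(row: dict, included: Sequence[str],
                criterion: Callable[[object], bool]) -> str:
    """Return one bit per included column, in the order of `included`."""
    return "".join("1" if criterion(row[col]) else "0" for col in included)


# Hypothetical row of a three-column table; the first column is omitted,
# so only col2 and col3 contribute a bit.
row = {"col1": 7, "col2": 0, "col3": 5}
bits = row_to_bits(row, ["col2", "col3"], lambda value: value >= 1)
print(bits)
```

Here the resulting vector has length 2: its first bit reflects col2 and its second bit reflects col3.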

Multiple string columns

A bit is set if the value in the corresponding column matches, or does not match, the specified pattern, depending on the "Set bit if pattern does match/does not match" option. The pattern may contain the wildcards '?' (matches any single character) and '*' (matches any sequence of characters, including none). It can also be a regular expression.
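
The wildcard semantics described above can be sketched in Python by translating the pattern into a regular expression. The helper names `wildcard_to_regex` and `bit_for_value` are hypothetical. The sketch assumes that the pattern must match the whole cell value and that matching is case sensitive; the description above does not state either point.

```python
import re


def wildcard_to_regex(pattern: str) -> re.Pattern:
    """Translate '?' and '*' wildcards into an equivalent regular expression."""
    escaped = re.escape(pattern)  # escapes '*' and '?' as well
    translated = escaped.replace(r"\*", ".*").replace(r"\?", ".")
    return re.compile(translated, re.DOTALL)


def bit_for_value(value: str, pattern: str, set_if_match: bool = True) -> int:
    """Return 1 if the bit should be set for this value, else 0."""
    matches = wildcard_to_regex(pattern).fullmatch(value) is not None
    return int(matches == set_if_match)


print(bit_for_value("apple", "a*"))                      # '*' matches "pple"
print(bit_for_value("apple", "a?"))                      # '?' matches one character only
print(bit_for_value("apple", "a*", set_if_match=False))  # inverted option
```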

Multiple numeric columns

There are two options to determine whether the bit for the value in the corresponding column is set:
  • either a global threshold is defined, in which case all values that are above or equal to the threshold are converted into set bits and all other bit positions remain 0, or
  • a certain percentage of the mean of each column is used as the threshold, in which case all values that are above or equal to that percentage of the column mean are converted into set bits. For example, if the mean percentage is set to 50%, the mean of col1 is 2, and the mean of col2 is 8, then the bit for col1 is set if the value is above or equal to 1, and the bit for col2 is set if the value is above or equal to 4.
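
Both threshold options can be worked through in a short Python sketch. The function names and the sample data are hypothetical; the data is chosen so that the column means are 2 and 8, as in the example above.

```python
from statistics import mean


def bits_global(row, threshold):
    """Set a bit for every value that is above or equal to a global threshold."""
    return [int(value >= threshold) for value in row]


def bits_mean_percentage(rows, percentage):
    """Set a bit for every value that is above or equal to the given
    percentage of its own column's mean."""
    columns = list(zip(*rows))
    thresholds = [mean(col) * percentage / 100 for col in columns]
    return [[int(v >= t) for v, t in zip(row, thresholds)] for row in rows]


# Hypothetical data: col1 has mean 2, col2 has mean 8.
rows = [[0.5, 4], [3.5, 3], [2, 17]]
result = bits_mean_percentage(rows, 50)  # thresholds: 1 for col1, 4 for col2
print(result)  # [[0, 1], [1, 0], [1, 1]]
```

With a global threshold of 1 instead, `bits_global([0.5, 4], 1)` yields `[0, 1]`.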

Bit vectors from a single column

If a single input column is used, only the selected column is considered when generating the bit vectors.

Single string column

For a string input, the string in the selected column is parsed and converted into a bit vector. Three input formats can be parsed and converted:
  • Hexadecimal strings: strings consisting only of the characters 0-9 and A-F (case is ignored). The represented hexadecimal number is converted into a binary number, which is represented by the resulting bit vector.
  • Binary strings: strings consisting only of 0s and 1s are parsed and converted into the corresponding bit vectors.
  • ID strings: strings consisting of numbers (separated by spaces), where the numbers refer to the positions in the bit vector that should be set. (This is a typical input format for association rule mining.)
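
The three formats can be illustrated with a minimal Python sketch. The function names are hypothetical, and the sketch makes assumptions that the description above does not settle: hexadecimal digits are expanded to four binary digits each in reading order, and ID strings are returned as a sorted list of positions without deciding whether positions are zero-based. The bit ordering used by the node itself may differ.

```python
import string


def parse_hex(text: str) -> str:
    """Expand each hexadecimal digit (0-9, A-F, any case) to four binary digits."""
    if not text or any(ch not in string.hexdigits for ch in text):
        raise ValueError(f"not a hexadecimal string: {text!r}")
    return "".join(format(int(ch, 16), "04b") for ch in text)


def parse_binary(text: str) -> str:
    """Accept strings that consist only of 0s and 1s."""
    if not text or any(ch not in "01" for ch in text):
        raise ValueError(f"not a binary string: {text!r}")
    return text


def parse_ids(text: str) -> list:
    """Return the sorted bit positions listed in a space-separated ID string."""
    tokens = text.split()
    if not tokens or not all(tok.isdigit() for tok in tokens):
        raise ValueError(f"not an ID string: {text!r}")
    return sorted({int(tok) for tok in tokens})


print(parse_hex("1f"))       # 00011111
print(parse_binary("0101"))  # 0101
print(parse_ids("7 1 4"))    # [1, 4, 7]
```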

Single collection column

If a single collection column is used, each unique collection element is assigned a bit position. The length of the bit vectors corresponds to the number of unique elements across all collection cells. For example, if the input table contains two rows with the collections {a,b} and {b,c}, the unique elements are a, b, and c, and the corresponding bit vectors are [110] and [011].
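
The collection example can be reproduced with a small Python sketch. The function name `collections_to_bits` is hypothetical, and the sketch assumes that bit positions are assigned in order of first appearance, which is consistent with the example above but not stated explicitly. Missing elements are represented as `None` and skipped.

```python
def collections_to_bits(cells):
    """Assign one bit position per unique element and encode each cell."""
    positions = {}
    for cell in cells:
        for element in cell:
            if element is not None:  # missing elements are ignored
                positions.setdefault(element, len(positions))
    return ["".join("1" if element in cell else "0" for element in positions)
            for cell in cells]


vectors = collections_to_bits([["a", "b"], ["b", "c"]])
print(vectors)  # ['110', '011']
```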

Missing values

For numeric data, incoming missing values result in 0s. For multiple string columns, missing values also result in 0s. For a single string column, a missing value results in a missing value in the output table. If a string cannot be parsed, it also results in a missing cell in the output table, and an error message with detailed information is printed to the console. For a collection column, all missing collection elements are ignored.

Input ports

  1. Input data to create bit vectors from (Type: Data)
    Data table with numerical data or a string column to be parsed.

Output ports

  1. Bit vector data (Type: Data)
    Data table with the generated bit vectors.