Challenge 34 - Word Scramble
Level - Medium
Description - One of your tasks at work is to train a model using sentences with the correct word context (i.e., words in a sentence following a meaningful and correct order). However, to train such a model, you also need to create a dataset of words used in an incorrect context. You can think of this task as a version of Negative Sampling, a neat technique for training the famous Word2Vec model. Concretely, in this challenge you will create a workflow that takes a sentence and scrambles the order of its words. You can use the Table Creator node to create a small sample of sentences for testing your work.
Input
I like cats.
Output
cats. like I
Hint - Our simple solution only uses 5 nodes, but the permutations are not exactly random. Conversely, our more complex solution uses more than 15 nodes and 2 loops, as well as the Random Numbers Generator node, to create truly scrambled sentences.
Bonus - Create a solution with true randomization without using any loops.
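For readers who want to check the expected behaviour outside KNIME, the task can be sketched in a few lines of Python. This is only an illustration of the challenge, not part of any of the KNIME solutions; the function name scramble_sentence and the optional rng parameter are assumptions made for this example. Words are split on whitespace, so punctuation stays attached to its word, as in the "cats." of the sample output.

```python
import random


def scramble_sentence(sentence, rng=None):
    """Return the words of `sentence` in a random order (hypothetical helper)."""
    if rng is None:
        rng = random.Random()
    # Split on whitespace; punctuation stays attached to its word ("cats.").
    words = sentence.split()
    # Shuffle in place; every ordering of the words is equally likely.
    rng.shuffle(words)
    return " ".join(words)


if __name__ == "__main__":
    print(scramble_sentence("I like cats."))
```

Note that a uniform shuffle can return the original order by chance: a three-word sentence has only six possible orderings, so roughly one result in six is unchanged.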
-------------------------------------------------------------------------------------------------------------
Four workflows have been designed to solve this problem -
1) The simplest solution requires only 5 nodes. However, it has a small limitation: all sentences are shuffled at the same time, i.e. every column (one per sentence) has its rows reordered in the same way. Hence, it does not cover all permutations.
2) A loop-based solution, in which each sentence (represented by an individual column) is scrambled independently of the others.
3) Another loop-based solution, similar to 2 but using a Random Numbers Generator node to assign random numbers that determine the scrambled word order for each sentence.
4) A loop-free workflow that achieves true randomization of each sentence. In this workflow, all the words from all the sentences are collected in a single column. The rows are then shuffled, producing a column of words in random order. Next, a GroupBy node groups the words by the sentence they originally came from, so that its output is a list of scrambled words for each sentence. Further manipulation turns each list into the fully scrambled sentence.
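The data flow of solution 4 can be sketched in Python as follows. This is a rough analogue under stated assumptions, not a translation of the actual KNIME workflow: the function name scramble_all is hypothetical, words are split on whitespace, and the sentence index stands in for whatever identifier the workflow uses to remember which sentence a word came from. "Loop-free" refers to the absence of KNIME loop nodes; the Python version still iterates internally.

```python
import random
from collections import defaultdict


def scramble_all(sentences, rng=None):
    """Scramble every sentence with one global shuffle (hypothetical helper)."""
    if rng is None:
        rng = random.Random()
    # Step 1: collect all words in a single "column", one row per word,
    # each tagged with the index of the sentence it came from.
    rows = [(i, word) for i, s in enumerate(sentences) for word in s.split()]
    # Step 2: shuffle all rows at once.
    rng.shuffle(rows)
    # Step 3: group the shuffled words by their original sentence.
    grouped = defaultdict(list)
    for i, word in rows:
        grouped[i].append(word)
    # Step 4: join each group back into a sentence, in the input order.
    return [" ".join(grouped[i]) for i in range(len(sentences))]


if __name__ == "__main__":
    for line in scramble_all(["I like cats.", "Dogs chase the red ball"]):
        print(line)
```

Because a uniform shuffle of all rows induces an independent uniform ordering within each group, every sentence gets its own permutation, which is what distinguishes this approach from solution 1, where one shared reordering is applied to all sentences.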
Workflow
justknimeit-34 - Word Scramble
Used extensions & nodes
Created with KNIME Analytics Platform version 4.6.1