- Type: Workflow Port Object. Model: A model captured with Integrated Deployment.
- Type: Table. Deployment data: Deployment data with target (ground truth) and timestamp columns:
  - A table of instances gathered after deployment, to which the model was previously applied.
  - The ground truth (also called target) column, collected afterwards, should also be available.
  - The timestamp must already be converted to the "Date&Time" column type or a similar type that includes a date. The String column type is not supported.
This Component generates a view that monitors the performance of a model captured with Integrated Deployment. The view works for machine learning classifiers with binary as well as multiclass targets. The Component requires deployment data with timestamp (date) and target columns in order to show the performance over time.

In the generated Interactive View, the performance metric is plotted against the time axis, together with a trend line based on that performance. The view also provides a "Retrain" button: based on the model performance, the user can decide whether retraining is necessary. This decision is passed to the output of the Component as a flow variable. Connect the flow variable output to the workflow branch that retrains the model. That branch should execute only if the user chose to retrain in the view.

CAPTURED MODEL REQUIREMENTS (Top Port)

We recommend using the "AutoML" Component with this Component. All you need to do is connect the two Components via the black Integrated Deployment port. You can also monitor a custom-trained model with this Component. A model that was not trained by the "AutoML" Component must satisfy the following black-box requirements:
- The model should be captured with Integrated Deployment and have a single input and a single output of type Data.
- All feature columns have to be provided at the model input.
- Additional columns that are not features may also be provided at the model input.
- The model output should contain all the model input data (features and non-features) with the prediction columns appended.
- The model output predictions should consist of one String column and "n" Double columns, where "n" is the number of classes in the target column.
- The String type prediction column should be named "Prediction ([T])", where [T] is the name of your target column (e.g. "Prediction (Churn)").
- The Double type prediction columns should be named "P ([T]=[C1])", "P ([T]=[C2])", ..., "P ([T]=[Cn])", where [Cn] is the name of the class whose probability the column holds (e.g. "P (Churn=not churned)" and "P (Churn=churned)" in the binary case).

Additionally, if you are not using the "AutoML" Component, you need to provide a flow variable called "target_column" of type String, holding the name of your ground truth/target column, at the top input of the "Model Monitor View" Component.

INPUT DEPLOYMENT TABLE REQUIREMENTS (Bottom Port)
- All feature columns that were used to train the captured model.
- A target column and a timestamp column. Each record's timestamp tracks the date on which the model was applied to that data row. The timestamp should be of a "Date&Time" column type; "Time" and "String" types are not supported. Use the "String to Date&Time" node to convert String timestamps. The timestamps should be spread evenly across the sample: gaps between dates for which samples are missing should be roughly constant.
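The naming rules and the timestamp requirement above can be checked before connecting a custom model. The following is a minimal Python sketch, not part of the Component: the helper names (`expected_prediction_columns`, `missing_prediction_columns`, `gap_days`) and the sample columns are hypothetical, and it assumes the column names are available as plain strings and the timestamps as `datetime.date` values.

```python
from datetime import date


def expected_prediction_columns(target, classes):
    """Build the prediction column names described in the requirements."""
    # One String column with the predicted class, then one Double column per class.
    return [f"Prediction ({target})"] + [f"P ({target}={c})" for c in classes]


def missing_prediction_columns(columns, target, classes):
    """Return the expected prediction columns that are absent from `columns`."""
    present = set(columns)
    return [c for c in expected_prediction_columns(target, classes) if c not in present]


def gap_days(timestamps):
    """Return the gaps, in days, between consecutive distinct dates."""
    days = sorted(set(timestamps))
    return [(later - earlier).days for earlier, later in zip(days, days[1:])]


# Hypothetical model output that lacks one probability column.
output_columns = ["Age", "Plan", "Churn", "Prediction (Churn)", "P (Churn=churned)"]
print(missing_prediction_columns(output_columns, "Churn", ["not churned", "churned"]))

# Hypothetical timestamps: one gap is much larger than the others.
stamps = [date(2021, 1, 1), date(2021, 1, 2), date(2021, 1, 3), date(2021, 1, 10)]
print(gap_days(stamps))
```

A gap that is far larger than the rest suggests the timestamps are not evenly spread, which may distort the plot over time.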
- Type: Flow Variable. Retrain: Flow variable to activate the downstream workflow branch for retraining.
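To make the idea of performance over time with a trend line concrete, here is a minimal Python sketch. It rests on assumptions: the description does not state which metric, time binning, or trend fit the Component uses, so the sketch picks accuracy, daily bins, and a least-squares line. The function names and the `retrain_suggested` rule are hypothetical; in the Component the retraining decision is made by the user in the view.

```python
from collections import defaultdict
from datetime import date


def accuracy_by_day(records):
    """records: iterable of (day, truth, prediction). Returns sorted (day, accuracy) pairs."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for day, truth, prediction in records:
        totals[day] += 1
        hits[day] += int(truth == prediction)
    return [(day, hits[day] / totals[day]) for day in sorted(totals)]


def trend_slope(values):
    """Least-squares slope of the values against their positions 0..n-1."""
    n = len(values)
    if n < 2:
        return 0.0
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    denominator = sum((x - mean_x) ** 2 for x in range(n))
    return numerator / denominator


# Hypothetical deployment records: (timestamp, ground truth, prediction).
records = [
    (date(2021, 1, 1), "churned", "churned"),
    (date(2021, 1, 1), "not churned", "not churned"),
    (date(2021, 1, 2), "churned", "not churned"),
    (date(2021, 1, 2), "not churned", "not churned"),
    (date(2021, 1, 3), "churned", "not churned"),
    (date(2021, 1, 3), "not churned", "churned"),
]

series = accuracy_by_day(records)
slope = trend_slope([accuracy for _, accuracy in series])
retrain_suggested = slope < 0  # hypothetical rule: a falling trend hints at retraining
print(series, slope, retrain_suggested)
```

Because the slope is computed against positions rather than actual dates, the sketch is only meaningful when the dates are evenly spaced, which matches the requirement on the timestamp column.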
Created with KNIME Analytics Platform version 4.3.0