Evaluation of R models in Azure Machine Learning Studio

MS Azure Machine Learning Studio is a cloud service that enables users to build, deploy, and share predictive analytics solutions through a web interface. Microsoft currently offers a Create R Model module, which allows custom ML models written in R to be created and integrated into Azure ML Studio pipelines, opening up a vast range of possibilities. However, the Evaluate Model module currently cannot be used with a Create R Model module. This means that the quality of a Create R Model module cannot be assessed uniformly, nor directly compared to that of native Azure models.

Create a custom R script to change the necessary metadata

Fortunately, R can help again. From the error message we can see that Evaluate Model is missing a label column in the incoming dataset, whereas Score Model produces all the necessary information. This indicates a metadata issue: the appropriate columns do exist, but they need further annotation. The solution emerges when reading the metadata documentation. Typical metadata operations include (a short R sketch after the list shows how a few of them can be expressed in code):

- Treating Boolean or numeric columns as categorical values
- Indicating which column contains the true label, the assigned label or the classification score
- Marking columns as features
- Changing date/time values to a numeric value
- Adding or changing column names
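
A few of these operations can also be performed directly in R inside an Execute R Script module. The following is a minimal sketch; the column names IsFraud and TransactionDate are hypothetical placeholders, not columns from the example discussed below.

dataset1 <- maml.mapInputPort(1)

# Treat a Boolean/numeric column as categorical (hypothetical column name)
dataset1$IsFraud <- as.factor(dataset1$IsFraud)

# Change a date/time column to a numeric value (hypothetical column name)
dataset1$TransactionDate <- as.numeric(as.POSIXct(dataset1$TransactionDate))

# Rename a column
names(dataset1)[names(dataset1) == "IsFraud"] <- "Class"

data.set <- dataset1
maml.mapOutputPort("data.set");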

We need to mark these special columns, i.e. the true label, the assigned label, and the classification score, in the dataset passed between the scoring and evaluation modules. The simplest way to do this is to use an Execute R Script module as a bridge between the Score Model and Evaluate Model modules.

Before we provide the code of the Execute R Script module, we must make three important observations about the R training script and the R scoring script:

Training Script:
library(e1071)
features <- get.feature.columns(dataset)            # feature columns selected in the module
labels <- as.factor(get.label.column(dataset))      # label column selected in the module
train.data <- data.frame(features, labels)
feature.names <- get.feature.column.names(dataset)
names(train.data) <- c(feature.names, "Class")      # the label column is named "Class"
model <- naiveBayes(Class ~ ., train.data)          # train a naive Bayes classifier

Scoring Script:
library(e1071)
probabilities <- predict(model, dataset, type="raw")[,2]   # posterior probability of the positive class
classes <- as.factor(as.numeric(probabilities >= 0.5))     # assigned label using a 0.5 threshold
scores <- data.frame(classes, probabilities)

- In the ‘Training Script’ of the example we can see that the classification column (true label) is called Class.
- The ‘Scoring Script’ of the example ends with scores <- data.frame(classes, probabilities). The first column corresponds to the assigned label and the second to the classification score.
- The Score Model module has the option Append score columns to output checked, so we expect the ‘Scoring Script’ to add two extra columns to the input dataset: classes and probabilities (the inspection sketch below shows how to verify this).
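
If it is ever unclear which columns actually arrive downstream, a throwaway Execute R Script placed right after Score Model can be used to inspect the dataset before writing the bridge. A minimal sketch that passes the data through unchanged:

dataset1 <- maml.mapInputPort(1)
print(names(dataset1))   # column names should appear in the module's output log / R Device port
str(dataset1)            # column types, useful for spotting classes and probabilities
data.set <- dataset1     # pass the data through unchanged
maml.mapOutputPort("data.set");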

The final R script that bridges the Score Model and Evaluate Model modules is as follows:


dataset1 <- maml.mapInputPort(1)

# Collect the three special columns produced by the training and scoring scripts
data.set <- data.frame(true_labels      = dataset1$Class,          # true label
                       assigned_label   = dataset1$classes,        # assigned label
                       calibrated_score = dataset1$probabilities)  # classification score

# Annotate the columns with the metadata Evaluate Model expects
attr(data.set$assigned_label, "feature.channel") <- "Binary Classification Scores"
attr(data.set$assigned_label, "score.type") <- "Assigned Labels"
attr(data.set$calibrated_score, "feature.channel") <- "Binary Classification Scores"
attr(data.set$calibrated_score, "score.type") <- "Calibrated Score"

# Rename the columns to the names used by native Azure ML scoring modules
names(data.set) <- c("Class", "Scored Labels", "Scored Probabilities")

maml.mapOutputPort("data.set");

The R code above is tailored to this specific example; nevertheless, it is easily customized by identifying the corresponding columns and making the appropriate modifications.
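
For instance, if another experiment's training script named the label column Outcome and its scoring script produced columns called predicted and prob (all three names are hypothetical), only the data.frame construction would change; the attr() annotations and the final renaming stay exactly as in the script above:

dataset1 <- maml.mapInputPort(1)
data.set <- data.frame(true_labels      = dataset1$Outcome,    # true label from the training script
                       assigned_label   = dataset1$predicted,  # assigned label from the scoring script
                       calibrated_score = dataset1$prob)       # classification score from the scoring script
# ... attr() metadata and names() renaming exactly as in the script above ...
maml.mapOutputPort("data.set");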
