There are several strategies for handling missing values in machine learning, such as ignoring them or replacing each missing value with a plausible substitute. One challenging but rewarding strategy is to replace the value with the median of the attribute for the corresponding class. Given an attribute that has missing values and a class outcome for every row, you have to calculate the median of the attribute for each class and use it to fill the missing values of the rows belonging to that class. The solution has to run automatically once initiated, without any additional manual clicking. The dataset on which this strategy is performed is not relevant, as long as it possesses at least one numerical attribute column, so that a median can be computed. There is no median for boolean or categorical values. The implementation should run fast, without long-running processes or bottlenecks.
Please provide a screenshot and a short description of the solution using WEKA or Azure Machine Learning, as they represent the relevant target systems where the strategy should be applied.

1 answer

Submitted by Mateusz Gren on Sun, 12/04/2016 - 18:36

This article solves the following challenge:

Replace missing value with median of attribute for the corresponding class

First, take the dataset and apply one SQL Transformation per class label, replacing the placeholder ATTRIBUTE with the value of that class label:

select * from t1 where label = 'ATTRIBUTE';

Then compute the median of the attribute separately for each of the resulting per-class datasets and use this computed median as the parameter for "Clean Missing Data". Then add all rows back together, as they were all split apart by the SQL Transformations.

Finally, run your cross-validation on the recombined dataset and review the result.

Please review the attached file to see how to structure your model in Azure for this strategy.

Attachments:
