Friday, August 12, 2016

Random Forest Regression, Negative Variance Explained mechanism

Jeffrey Evans, Senior Landscape Ecologist, The Nature Conservancy, Global Lands Science Team, and Affiliate Assistant Professor, Zoology & Physiology, University of Wyoming, explains a negative percent variance explained in a random forest regression in a hilarious way:

I have recently been asked the question: “Why do I receive a negative percent variance explained in a random forest regression?” Besides the obvious answer, “because your model is crap”, I thought that I would explain the mechanism at work here so that nobody assumes randomForest is producing erroneous results. For poorly supported models it is, in fact, possible to receive a negative percent variance explained.

Generally, explained variance (R²) is defined as:

R² = 1 - sum((y - ŷ)²) / sum((y - mean(y))²)

However, as indicated by Breiman (2001) and the R randomForest documentation, the (regression only) “pseudo R-squared” is derived as:

R² = 1 - (mean squared error) / var(y)

Mathematically, this can produce negative values: whenever the mean squared error exceeds var(y), the ratio is greater than 1 and R² drops below zero. A simple interpretation of a negative R² (rsq) is that you would be better off predicting every sample as equal to the overall estimated mean, which indicates very poor model performance.
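
To make the arithmetic concrete, here is a minimal Python sketch of the pseudo R-squared formula above. It is not the randomForest implementation: the function name pseudo_r2, the toy response values, and the three prediction vectors are all made up for illustration, and the variance is assumed to use the divisor-n (population) form so that predicting the mean scores exactly zero. In a real forest the predictions would come from the model's out-of-bag estimates rather than being typed in by hand.

```python
from statistics import fmean, pvariance


def pseudo_r2(y, y_hat):
    """Pseudo R-squared: 1 - (mean squared error) / var(y).

    Assumption: var(y) is the population variance (divisor n), so a model
    that always predicts mean(y) scores exactly 0.
    """
    mse = fmean([(obs - pred) ** 2 for obs, pred in zip(y, y_hat)])
    return 1.0 - mse / pvariance(y)


# Hypothetical observed response values.
y = [1.0, 2.0, 3.0, 4.0, 5.0]

# Three hypothetical sets of predictions for the same observations.
good_pred = [1.1, 1.9, 3.2, 3.8, 5.1]  # close to the observations
mean_pred = [fmean(y)] * len(y)        # always predict the overall mean
bad_pred = [5.0, 4.0, 3.0, 2.0, 1.0]   # worse than predicting the mean

print(pseudo_r2(y, good_pred))  # close to 1
print(pseudo_r2(y, mean_pred))  # 0.0
print(pseudo_r2(y, bad_pred))   # -3.0
```

The last case is the one the question is about: the mean squared error (8) is four times var(y) (2), so the result is 1 - 4 = -3. Nothing is broken in the calculation; the predictions are simply further from the data than the plain mean would be.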

© 2011 GIS and Remote Sensing Tools, Tips and more