## Friday, August 12, 2016

### Random Forest Regression, Negative Variance Explained mechanism

Jeffrey Evans, Senior Landscape Ecologist, The Nature Conservancy, Global Lands Science Team, and Affiliate Assistant Professor, Zoology & Physiology, University of Wyoming, explains, in a humorous way, the mechanism behind a negative percent variance explained in a random forest regression:

I have recently been asked the question: “Why do I receive a negative percent variance explained in a random forest regression?” Besides the obvious answer, “because your model is crap,” I thought that I would explain the mechanism at work here, so that no one assumes randomForest is producing erroneous results. For poorly supported models it is, in fact, possible to receive a negative percent variance explained.

The usual coefficient of determination (R²) is defined as:

$$R^2 = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2}$$

However, as indicated by Breiman (2001) and the R randomForest documentation, the (regression only) “pseudo R-squared” is derived as:

$$R^2 = 1 - \frac{\mathrm{MSE}}{\operatorname{Var}(y)}$$

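Here the mean squared error is computed from the out-of-bag predictions. Unlike the textbook R² of a least-squares fit with an intercept, this quantity is not bounded below by 0: whenever the mean squared error exceeds the variance of y, it is negative. A simple interpretation of a negative R² (rsq) is that you would be better off predicting every sample as the overall mean of y, which indicates very poor model performance.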
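To make the mechanism concrete, here is a minimal sketch that computes the pseudo R-squared from the documented formula for two made-up prediction vectors. `pseudo_rsq()` is a hypothetical helper, not part of randomForest; with a fitted regression forest the package reports this quantity (per number of trees) in the `rsq` component, using out-of-bag predictions rather than the hand-made ones below.

```r
# Pseudo R-squared following the randomForest documentation: 1 - MSE / Var(y).
# randomForest computes MSE from out-of-bag predictions; here any
# vector of predictions can be passed in for illustration.
pseudo_rsq <- function(y, yhat) {
  mse <- mean((y - yhat)^2)  # mean squared error of the predictions
  1 - mse / var(y)           # var() uses the n - 1 denominator
}

y <- c(1, 2, 3, 4, 5)                 # var(y) is 2.5

good <- c(1.1, 1.9, 3.2, 3.8, 5.1)    # small errors
bad  <- c(5, 4, 3, 2, 1)              # errors larger than the spread of y

pseudo_rsq(y, good)  # about 0.991: most variance explained
pseudo_rsq(y, bad)   # -2.2: worse than predicting the mean for every sample
```

With this formula, predicting the mean of y for every sample gives 1/n rather than exactly 0, because `var()` divides by n - 1 while the MSE divides by n.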

## Friday, November 6, 2015

### Passing R variables dynamically to JavaScript for data visualization

For a random project, I was interested to see if there was a way to pass data between R and JavaScript. My purpose was to populate a Highcharts graph dynamically using data preprocessed in R. Of course, my first look was at rCharts, an R package for creating and publishing interactive JavaScript visualizations, which also supports Highcharts for building interactive graphs and plots. rCharts is a great package and is widely popular among R users for creating interactive charts without knowing the underlying JavaScript.

My requirement was a chart that was not only interactive but also dynamic, updating in near real time as shown in the Highcharts demo. It seems to me that rCharts provides neither a straightforward way to add points to a series nor a way to customize the JavaScript to create such dynamic charts. As far as I know, it has limitations (or such tasks fall outside the rCharts philosophy) for sophisticated jobs that require JavaScript customization. At the time of writing, it supports only 24 functions from the Highcharts API through its R interface.
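Whatever the charting layer, the core step is turning R data into a JavaScript literal the page can consume. The following is a minimal base-R sketch under stated assumptions: `to_js_pairs()` is a hypothetical helper, and the generated script assumes the page already holds a Highcharts chart in a JavaScript variable named `chart` (also hypothetical). For real data, `jsonlite::toJSON()` is a sturdier way to serialize.

```r
# Hypothetical helper: format x/y vectors as a JavaScript array of [x, y] pairs,
# the shape Highcharts accepts for series data.
to_js_pairs <- function(x, y) {
  stopifnot(length(x) == length(y), is.numeric(x), is.numeric(y))
  paste0("[", paste0("[", x, ",", y, "]", collapse = ","), "]")
}

js_data <- to_js_pairs(c(1, 2, 3), c(2.5, 3.1, 4))

# Build a <script> body that appends each point to the first series.
# 'chart' is an assumed JavaScript variable holding an existing Highcharts chart.
script <- sprintf(
  "var rData = %s;\nrData.forEach(function (p) { chart.series[0].addPoint(p, true); });",
  js_data
)
cat(script, "\n")
```

Regenerating `rData` from R on a timer (or serving it from an endpoint) and calling `addPoint` on the client is one way to get the near-real-time behaviour described above.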

## Thursday, April 9, 2015

### How to update R to a new version?

An R update method proposed by Dr. Jeffrey S. Evans, University of Wyoming. Here are the method's background and details:

Updating R to a new version can be difficult, so I thought that R users out there would find these R version management and configuration approaches useful. The R package “installr” allows one, at the command line, to check the current R version and optionally: 1) download the new version, 2) launch the install, 3) move installed libraries to the new R install, 4) update libraries and 5) uninstall the previous R version. I also attached an “Rprofile.site” R GUI configuration file that sets the path for the R library and changes some other settings.

Following is the code that will install and load the “installr” package and run the “updateR” function with the appropriate flags. To run it in Windows, right-click the R icon and select "Run as administrator". This script will automatically: 1) check for and download the current version of R, 2) execute the install, 3) move current packages to the new install, 4) delete old packages, 5) update packages. Copy and paste the following code block and then answer the queries at the R command line.

# set a CRAN mirror and library path
.Library.site <- file.path(chartr("\\", "/", R.home()), "library")
.libPaths(file.path(chartr("\\", "/", R.home()), "library"))
local({
  r <- getOption("repos")
  r["CRAN"] <- "http://cran.stat.ucla.edu/"
  options(repos = r)
})
# If not installed, adds installr library and runs updateR function
if (!require(installr)) {
  install.packages("installr", repos = "http://cran.us.r-project.org")
  require(installr)
}
updateR(install_R = TRUE, copy_packages = TRUE, keep_old_packages = FALSE,
  update_packages = TRUE, quit_R = FALSE, keep_install_file = FALSE)
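The installr approach above targets Windows. On other platforms, or simply to see what the copy-packages step amounts to, a common manual approach is to record the installed package names before upgrading and reinstall whatever is missing afterwards. This is a minimal sketch: `save_package_list()` and `missing_packages()` are hypothetical helpers, and `tempdir()` is used only so the example runs anywhere. Because `tempdir()` is deleted when R exits, point `pkg_file` at a permanent location in practice.

```r
# Hypothetical helpers for carrying packages across an R upgrade.
# tempdir() is only for demonstration; it is removed when R exits,
# so point pkg_file at a permanent location in practice.
pkg_file <- file.path(tempdir(), "my_packages.rds")

# Run in the OLD R version: save the names of all installed packages
save_package_list <- function(file) {
  pkgs <- rownames(installed.packages())
  saveRDS(pkgs, file)
  invisible(pkgs)
}

# Run in the NEW R version: list saved packages that are not yet installed
missing_packages <- function(file) {
  old <- readRDS(file)
  setdiff(old, rownames(installed.packages()))
}

save_package_list(pkg_file)
todo <- missing_packages(pkg_file)
if (length(todo) > 0) install.packages(todo)
```

Run in a single session, as here, `todo` is empty, so nothing is installed; the two halves are meant to run in the old and new R versions respectively.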

In the R install wizard:
1) Go with the install defaults until...
2) Under "select components", uncheck 32-bit core files, then click NEXT>
3) Under "startup options", select: Yes (customize startup), then click NEXT>
4) Under "Display Mode", select: SDI (separate windows), then click NEXT>
5) Then go with the defaults.

You can then uninstall the previous R version(s) with the uninstall.R command, either by choosing the previous version in the pop-up box or by specifying the version directly:

uninstall.R() opens an interactive window, and uninstall.R("3.1.2") uninstalls a specific version.

The “Rprofile.site” file configures the R GUI. It:
1) sets the default library path, so that it does not default to your Windows user directory (as long as packages are installed with R running as administrator);
2) sets a few options so that numbers are never displayed in scientific notation and strings are not automatically coerced into factors; and
3) adds two useful functions, “ls.functions” and “unfactor”. “ls.functions” lists all the functions associated with the specified package in the R namespace, and “unfactor” coerces the factors in a data.frame to the character class.
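The attached Rprofile.site is not reproduced in this post. As a hedged illustration only, lines like the following could go into an Rprofile.site to get similar behaviour; `ls.functions` and `unfactor` below are hypothetical re-implementations, not the author's code. `options(scipen = 999)` strongly discourages scientific notation, and since R 4.0.0 strings are no longer converted to factors by default (older versions needed `options(stringsAsFactors = FALSE)`).

```r
# Hypothetical re-implementations; not the original Rprofile.site contents.
options(scipen = 999)  # strongly discourage scientific notation when printing

# List the exported functions of a package; accepts MASS or "MASS"
ls.functions <- function(pkg) {
  pkg <- substitute(pkg)
  if (!is.character(pkg)) pkg <- deparse(pkg)  # unquoted name -> string
  exports <- getNamespaceExports(pkg)          # loads the namespace if needed
  is_fun <- vapply(exports,
                   function(nm) is.function(getExportedValue(pkg, nm)),
                   logical(1))
  sort(exports[is_fun])
}

# Convert every factor column of a data.frame to character
unfactor <- function(df) {
  is_fac <- vapply(df, is.factor, logical(1))
  df[is_fac] <- lapply(df[is_fac], as.character)
  df
}
```

Because `substitute()` captures the argument as written, pass the package name itself (`ls.functions(MASS)` or `ls.functions("MASS")`), not a variable that holds it.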

#### example to list all available functions in the MASS package ####
require(MASS)
ls.functions(MASS)

#### coerce factor to character ####
( x <- data.frame( y=as.factor(c("1","3","5")),
x=as.factor(c("yes","no","perhaps"))) )
str(x)

# Coerce single column
class(unfactor(x)$y)

# Coerce entire data.frame
x <- unfactor(x)
str(x)

## Monday, June 10, 2013

### How to get raster pixel values along the overlaying line?

One afternoon at Java City, my friend Eric and I were discussing ways to get raster pixel values along an overlaying line. The conversation encouraged me to write a quick-and-dirty solution. The following R code snippet illustrates a simple way to extract the raster values that are intersected or touched by an overlaying straight line.

#Print the raster pixel values along the overlaying line in R. The line's start and end row/col (coordinates) must be provided.

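library(raster)
#Create an arbitrary raster; here color names are used as the pixel values.
#as.raster() applied to a matrix returns a grDevices "raster" (a matrix of color strings), not a RasterLayer.
r <- as.raster(matrix(colors()[1:100], ncol = 10))

#Start coordinate of a sample line
x0 <- 1   #row = 1
y0 <- 3   #column = 3

r[x0, y0]

#End coordinate of a sample line
x1 <- 10  #row = 10
y1 <- 7   #column = 7

#Easy sample line generation algorithm: a naive line-drawing algorithm
#It visits one row per step, so it assumes x0 != x1 and |y1 - y0| <= |x1 - x0|.

dx <- x1 - x0
dy <- y1 - y0

for (x in x0:x1)
{
  #Round to the nearest column; R would otherwise truncate a fractional index
  y <- round(y0 + dy * (x - x0) / dx)

  #Print the raster pixel values along the line
  print(r[x, y])
}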

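It is a pretty simple concept. You can tweak the code and the line-drawing algorithm to suit your requirements; note that because this naive version advances one row per step, it skips cells when the line is steeper than 45 degrees. Several line-drawing algorithms are available on the internet; here I used the simplest one I found.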
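For lines steeper than 45 degrees the naive loop skips cells, because it visits exactly one row per step. Bresenham's line algorithm, a standard integer line-rasterization method (swapped in here, not the method used above), steps along whichever axis changes faster and returns an 8-connected chain of cells. A base-R sketch; `bresenham()` is a hypothetical helper name, and a plain character matrix stands in for the raster.

```r
# Bresenham's line algorithm (integer, all octants): returns the 8-connected
# chain of grid cells (row, col) approximating the line from (x0, y0) to (x1, y1).
bresenham <- function(x0, y0, x1, y1) {
  dx <- abs(x1 - x0)
  dy <- -abs(y1 - y0)
  sx <- if (x0 < x1) 1 else -1  # step direction along rows
  sy <- if (y0 < y1) 1 else -1  # step direction along columns
  err <- dx + dy
  rows <- numeric(0)
  cols <- numeric(0)
  repeat {
    rows <- c(rows, x0)
    cols <- c(cols, y0)
    if (x0 == x1 && y0 == y1) break
    e2 <- 2 * err
    if (e2 >= dy) { err <- err + dy; x0 <- x0 + sx }  # advance one row
    if (e2 <= dx) { err <- err + dx; y0 <- y0 + sy }  # advance one column
  }
  cbind(row = rows, col = cols)
}

# Same sample line and color values as above, as a plain character matrix
m <- matrix(colors()[1:100], ncol = 10)
cells <- bresenham(1, 3, 10, 7)
m[cells]  # pixel values along the line, in order
```

For this sample line, a naive loop that rounds y to the nearest column picks the same ten cells; the difference shows up for steep lines such as `bresenham(1, 1, 3, 7)`, where every column from 1 to 7 is visited.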

## Tuesday, May 21, 2013

### Generate Euclidean distance matrix from a point to its neighboring points in R

library(sp)

#Create a 2D matrix of X & Y coordinates of the neighboring points
neighbours_point <- matrix(c(5, 6,3,5,4,8,7, 10, 60, 60,11,12), ncol=2)

neighbours_point
[,1] [,2]
[1,]    5    7
[2,]    6   10
[3,]    3   60
[4,]    5   60
[5,]    4   11
[6,]    8   12

#Create a point vector with x and y coordinates from which distance should be calculated
reference_point <- c(2, 3)

reference_point
[1] 2 3

#Compute the distances (a vector, one value per neighboring point)

distmat <- spDistsN1(neighbours_point, reference_point, longlat = FALSE)

distmat
[1]  5.000000  8.062258 57.008771 57.078893  8.246211 10.816654
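Because `longlat = FALSE` means plain planar distance, the same values can be computed without sp, which makes a handy cross-check. A sketch in base R; `dist_base` is just an illustrative name. Like `spDistsN1`'s result, it is a vector with one distance per neighbouring point. With `longlat = TRUE`, `spDistsN1` instead returns great-circle distances in kilometres, which this shortcut does not reproduce.

```r
# Base-R equivalent of spDistsN1(..., longlat = FALSE): planar Euclidean
# distance from one reference point to every row of a coordinate matrix.
neighbours_point <- matrix(c(5, 6, 3, 5, 4, 8, 7, 10, 60, 60, 11, 12), ncol = 2)
reference_point <- c(2, 3)

# Subtract the reference x and y from each row (sweep's default FUN is "-"),
# then sum the squared differences per row and take the square root
diffs <- sweep(neighbours_point, 2, reference_point)
dist_base <- sqrt(rowSums(diffs^2))
dist_base
```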

Enjoy!!

## Thursday, October 11, 2012

### Extract Raster Values from Points

The R blog article encouraged me to write this solution for extracting raster values at points in R.

In geospatial analysis, extracting the raster value at a point is a common task. If you have a few raster files or a few points, you can extract the raster values by overlaying the points on top of the raster in ArcGIS. But what will you do if you have hundreds of raster files and thousands of points? The easy solution is to use a loop in Python with ArcGIS. Is a loop efficient? No. Can the loop be avoided? Yes.

Then how?

Step 1: Create a RasterStack or RasterBrick of your raster files using the “raster” package in R.
For example:
library(raster)
rasStack <- stack(raster1, raster2, raster3 …rasterN)

Step 2: Read the point data and convert it into a SpatialPointsDataFrame.
Sample: pointfile.csv

| Point_ID | LATITUDE | LONGITUDE |
|---|---|---|
| 1 | 48.765863 | -94.745194 |
| 2 | 48.820552 | -122.485318 |
| 3 | 48.233939 | -107.857685 |
| 4 | 48.705964 | -99.817363 |

For example:
pointCoordinates <- read.csv("pointfile.csv")
coordinates(pointCoordinates) <- ~ LONGITUDE + LATITUDE

Step 3: Extract the raster values at the points.
For example:
rasValue <- extract(rasStack, pointCoordinates)

Step 4: Combine the raster values with the points and save them as a CSV file.
For example:
combinePointValue <- cbind(as.data.frame(pointCoordinates), rasValue)
write.table(combinePointValue, file = "combinedPointValue.csv", append = FALSE, sep = ",", row.names = FALSE, col.names = TRUE)

Step 5: You should get results like the following table.
Result: combinedPointValue.csv

| Point_ID | LATITUDE | LONGITUDE | raster1 | raster2 | raster3 |
|---|---|---|---|---|---|
| 1 | 48.765863 | -94.745194 | 200 | 500 | -100 |
| 2 | 48.820552 | -122.485318 | 178.94 | 18.90 | 10.94 |
| 3 | 48.233939 | -107.857685 | -30.74 | -30.74 | -0.4 |
| 4 | 48.705964 | -99.817363 | 0 | 110 | -0.7 |

## Tuesday, September 18, 2012

### R tips every geospatial analyst must know

1) Merge an ESRI shapefile with an external CSV file or data frame, then map the CSV/data frame variables

library(maptools)
library(sp)
library(shapefiles)
library(RColorBrewer)  # creates nice color schemes
library(classInt)      # finds class intervals for continuous variables

# Merge data by unique ID; match() keeps the CSV rows aligned with the polygons
shapefile@data <- data.frame(shapefile@data,
  csvvalues[match(shapefile@data$ID, csvvalues$ID), setdiff(names(csvvalues), "ID"), drop = FALSE])

# Define the number of classes to be mapped
nclass <- 5

# Set the color ramp that will be used to plot these classes
cols <- brewer.pal(nclass, "YlGnBu")

# Set the class breakpoints using quantiles
# Can also use equal intervals or natural breaks - see help(classIntervals)
breaks <- classIntervals(shapefile@data$Column_name_to_be_mapped, nclass, style = "quantile")

# Based on the breakpoints and color ramp, specify a color to plot for each polygon
plotcols <- findColours(breaks, cols)

# Generate the map
plot(shapefile, col = plotcols)
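`classIntervals()` and `findColours()` do the classification and color lookup in the map code. To show what that step amounts to, here is a minimal base-R sketch with made-up values: `quantile()` supplies the breaks, `cut()` assigns each value a class, and the class indexes a palette. This is a swapped-in illustration rather than the classInt implementation; `hcl.colors()` needs R >= 3.6.0 and its "YlGnBu" palette only approximates the RColorBrewer one. With many tied values the quantile breaks may not be unique, and `cut()` would then stop with an error.

```r
# Base-R sketch of quantile classification + color lookup,
# roughly what classIntervals(style = "quantile") and findColours() do.
values <- c(3, 8, 1, 15, 6, 10, 2, 12, 5, 9)   # made-up attribute values
nclass <- 5

# Class breaks at the 0%, 20%, ..., 100% quantiles
brks <- quantile(values, probs = seq(0, 1, length.out = nclass + 1))

# Integer class (1..nclass) for each value; include.lowest keeps the minimum
cls <- cut(values, breaks = brks, include.lowest = TRUE, labels = FALSE)

# One color per class, then one color per value
pal <- hcl.colors(nclass, "YlGnBu")
plotcols <- pal[cls]
```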

2) Combine multiple data frames with similar column names into a single data frame by matching column names.
For example, use the smartbind() function from the gtools package:
library(gtools)
df_result <- smartbind(df1, df2)
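`gtools::smartbind()` handles this directly. For readers who want to avoid the dependency, the same idea (add missing columns as NA, then bind by name) can be sketched in base R; `bind_by_name()` is a hypothetical helper, not part of any package, and it does not replicate every option of `smartbind()`.

```r
# Hypothetical base-R stand-in for gtools::smartbind(): add any missing
# columns as NA, put columns in a common order, then rbind by name.
bind_by_name <- function(...) {
  dfs <- list(...)
  all_cols <- unique(unlist(lapply(dfs, names)))
  padded <- lapply(dfs, function(d) {
    absent <- setdiff(all_cols, names(d))
    if (length(absent) > 0) d[absent] <- NA  # new columns filled with NA
    d[all_cols]                               # common column order
  })
  do.call(rbind, padded)
}

df1 <- data.frame(id = 1:2, a = c("x", "y"))
df2 <- data.frame(id = 3L, b = 10)
res <- bind_by_name(df1, df2)
res
```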

## Thursday, February 16, 2012

### Interesting cheat sheet for R beginners

To clear the console: CTRL + L

To seek help: ?command_name

To view an object's class: class(object_name)

To raise the console print limit: options(max.print = 999999)

To list active objects in R: ls()

To remove a single object: rm(object_name)

To remove all objects in R: rm(list = ls())

To remove all objects except 'a': rm(list = setdiff(ls(), "a"))
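The "all except 'a'" pattern can be tried safely in a scratch environment, so nothing in your workspace is touched. The older form `ls()[-a]` does not do what it looks like: it uses the value of `a` as a negative position index, not the name "a". A small sketch; `e` is just an illustrative scratch environment.

```r
# Demonstrate "remove everything except 'a'" in a scratch environment,
# so nothing in the global workspace is touched.
e <- new.env()
assign("a", 1, envir = e)
assign("b", 2, envir = e)
assign("c", 3, envir = e)

# setdiff() drops the name "a" from the list of objects to remove
rm(list = setdiff(ls(e), "a"), envir = e)
ls(e)  # returns "a"
```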