01. R Descriptive Statistics

4 variables
- Name
- Transmission:factor
- Cylinders:int
- Fuel.Economy:num
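        cars <- read.csv("Cars.csv")

Name and Transmission are read as factors. In R 4.0.0 and later, read.csv keeps text columns as character vectors by default, so pass stringsAsFactors = TRUE to get the same result.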
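
A minimal sketch of checking the variable types after loading. Cars.csv is not included in these notes, so the three rows below are illustrative stand-ins read from an inline string, and the name cars_demo is hypothetical. stringsAsFactors = TRUE is set explicitly so that the text columns become factors on any R version.

```r
# Illustrative stand-in for Cars.csv (hypothetical rows, same four columns)
csv_text <- "Name,Transmission,Cylinders,Fuel.Economy
Mazda RX4,Manual,6,21.0
Hornet 4 Drive,Automatic,6,21.4
Datsun 710,Manual,4,22.8"

# Read the inline text the same way a file would be read
cars_demo <- read.csv(text = csv_text, stringsAsFactors = TRUE)

# str() lists each variable together with its type
str(cars_demo)

# The same information as one class name per column
col_classes <- sapply(cars_demo, class)
print(col_classes)
```

With a real file, only the read.csv call changes; str and sapply work the same way.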

Create a frequency table
Transmission is a categorical variable.
        table(cars$Transmission)

        Automatic    Manual
               19        13
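
The counts can also be expressed as proportions. A small sketch, using the built-in mtcars data as a stand-in for Cars.csv (an assumption; its am column codes 0 for automatic and 1 for manual):

```r
# Stand-in data: mtcars ships with R; am is 0 (automatic) or 1 (manual)
transmission <- factor(
    mtcars$am,
    levels = c(0, 1),
    labels = c("Automatic", "Manual"))

# Frequency table: number of cars per category
counts <- table(transmission)
print(counts)

# Relative frequencies: each count divided by the total
shares <- prop.table(counts)
print(round(shares, 3))
```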

Fuel.Economy is measured in miles per gallon.
Fuel.Economy is a numeric variable, so it holds quantitative values.
To get the minimum, use min(cars$Fuel.Economy).
Other statistics are available through max, mean, median, quantile, and sd.
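
The functions named above can be collected in one place. This is only a sketch: mtcars$mpg from the built-in mtcars data stands in for cars$Fuel.Economy, which is an assumption because Cars.csv is not part of these notes.

```r
# Stand-in for cars$Fuel.Economy: miles per gallon from the built-in mtcars data
mpg <- mtcars$mpg

# One named vector holding the single-value statistics
stats <- c(
    min    = min(mpg),
    max    = max(mpg),
    mean   = mean(mpg),
    median = median(mpg),
    sd     = sd(mpg))
print(round(stats, 3))

# quantile() returns the 0%, 25%, 50%, 75% and 100% points by default
quartiles <- quantile(mpg)
print(quartiles)
```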

To get summary statistics for all variables at once:

        summary(cars)

                 Transmission         Cylinders           ....
                 Automatic: 19       Min.    : 4.000
                 Manual   : 13       1st Qu. : 4.000
                                     Median  : 6.000
                                     Mean    : 6.188
                                     3rd Qu. : 8.000
                                     Max.    : 8.000

Get the correlation coefficient
Cylinders and Fuel.Economy are both numeric variables.
The result is -0.852162.
That is a strong negative correlation: cars with more cylinders tend to get fewer miles per gallon.
1 means a perfect positive linear correlation.
-1 means a perfect negative linear correlation.
0 means no linear correlation at all.


        cor(
                x = cars$Cylinders,
                y = cars$Fuel.Economy)
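
To make the coefficient less of a black box, the sketch below computes Pearson's r by hand and compares it with cor. The built-in mtcars columns cyl and mpg are used as stand-ins for cars$Cylinders and cars$Fuel.Economy; that mapping is an assumption.

```r
x <- mtcars$cyl   # stand-in for cars$Cylinders
y <- mtcars$mpg   # stand-in for cars$Fuel.Economy

# Pearson's r: the summed products of the deviations from the means,
# divided by the square root of the product of the summed squared deviations
r_manual <- sum((x - mean(x)) * (y - mean(y))) /
    sqrt(sum((x - mean(x))^2) * sum((y - mean(y))^2))

# cor() uses Pearson's method by default
r_builtin <- cor(x = x, y = y)

print(c(manual = r_manual, builtin = r_builtin))
```

Both values should agree, and the negative sign reflects that fuel economy falls as the cylinder count rises.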

02. Prediction

PURPOSES

One purpose of data analysis is to understand the problem domain through data.
Prediction is another purpose.

code part 1: prepare the data

        data(iris)
        set.seed(42)
        indexes <- sample(
            x = 1:150,
            size = 100)
        indexes
        train <- iris[indexes, ]
        test <- iris[-indexes, ]

data(iris) loads the built-in iris data set into the workspace.
set.seed(42) makes the random sampling reproducible.
sample draws 100 random row numbers from 1 to 150 without replacement.
indexes on its own line prints the sampled row numbers to the console.
The last two lines create the subsets, with 100 and 50 observations respectively.
- train holds the 100 observations that the algorithm learns from.
- test holds the remaining 50 observations, used to measure accuracy by comparing the predicted values with the actual values.
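
A quick sanity check on the split, as a sketch using only base R. It repeats the sampling step and confirms that the two subsets have the expected sizes and share no rows; the name shared_rows is introduced here for illustration.

```r
data(iris)
set.seed(42)

# 100 distinct row numbers out of 150 (sample() draws without replacement by default)
indexes <- sample(x = 1:150, size = 100)

train <- iris[indexes, ]
test <- iris[-indexes, ]

# Sizes of the two subsets
print(c(train = nrow(train), test = nrow(test)))

# No row should appear in both subsets
shared_rows <- intersect(rownames(train), rownames(test))
print(length(shared_rows))

# Species counts per subset; a random split is not guaranteed to be balanced
print(table(train$Species))
print(table(test$Species))
```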

code part 2.1: Train a decision tree model

        library(tree)

        # Train a decision tree model
        model <- tree(
            formula = Species ~ .,
            data = train)

        # Inspect the model
        summary(model)

        # Visualize the decision tree model
        plot(model)
        text(model)

The tree function analyzes the data and builds a model that predicts the species.
It is a decision tree model.
summary(model) inspects the model.
- Petal.Length and Petal.Width are the variables the tree actually uses.
- There are 4 terminal nodes: setosa, versicolor, virginica, virginica.
- After plotting it, you can see the picture on the left.
- That is the picture of the decision tree model.
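
The points in the list can also be read off the fitted object without a plot. The sketch below assumes package tree is installed and repeats the data preparation so that it runs on its own; is_leaf and used_vars are names introduced here for illustration.

```r
library(tree)

data(iris)
set.seed(42)
indexes <- sample(x = 1:150, size = 100)
train <- iris[indexes, ]

model <- tree(
    formula = Species ~ .,
    data = train)

# Printing the model lists every node with its split rule and predicted class
print(model)

# model$frame has one row per node; terminal nodes are marked "<leaf>"
is_leaf <- model$frame$var == "<leaf>"
print(sum(is_leaf))

# Variables that appear in at least one split
used_vars <- unique(as.character(model$frame$var[!is_leaf]))
print(used_vars)
```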

code part 2.2: Present the train model with a scatter plot

        library(RColorBrewer)

        palette <- brewer.pal(3, "Set2")

        plot(
            x = iris$Petal.Length,
            y = iris$Petal.Width,
            pch = 19,
            col = palette[as.numeric(iris$Species)],
            main = "Iris Petal Length vs. Width",
            xlab = "Petal Length (cm)",
            ylab = "Petal Width (cm)")

        # Plot the decision boundaries
        partition.tree(
            tree = model,
            label = "Species",
            add = TRUE)

        #-------------------------------------

        # Set working directory
        setwd("~/documents/peter_r")

        # Save the tree model for others
        save(model, file = "Tree.RData")

        # Save the training data for others
        save(train, file = "Train.RData")

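The scatter plot is on the right-hand side.
The plot function comes from base R graphics.
The partition.tree function comes from package tree.
The code plots all 150 iris observations, colored by species, and draws the decision boundaries learned from the train data on top.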

code part 3: Predict

Use the test data: 50 observations with their actual species.
Their Petal.Length and Petal.Width are used to predict the species.
Function confusionMatrix from package caret evaluates the prediction results for the 50 test observations.
The accuracy of the prediction is 0.96, which is very good.
Tree.RData and Train.RData can be loaded into a web app to make this kind of prediction.

        # Predict with the model
        predictions <- predict(
            object = model,
            newdata = test,
            type = "class")

        # Load the caret package
        library(caret)

        # Evaluate the prediction results
        confusionMatrix(
            data = predictions,
            reference = test$Species)
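
The accuracy figure is simply the share of correct predictions, so 0.96 on 50 observations corresponds to 48 correct and 2 wrong. The sketch below reproduces that arithmetic with base R only. The label vectors are hypothetical and chosen to give 48 correct out of 50; they are not the actual predictions from the model above.

```r
species <- c("setosa", "versicolor", "virginica")

# Hypothetical actual labels for 50 test observations
actual <- factor(
    rep(species, times = c(16, 17, 17)),
    levels = species)

# Hypothetical predictions: identical except for two versicolor rows
# that are predicted as virginica
predicted <- actual
predicted[c(17, 18)] <- "virginica"

# Rows are predictions, columns are the actual species
cm <- table(Prediction = predicted, Reference = actual)
print(cm)

# Accuracy = correct predictions (the diagonal) / all predictions
accuracy <- sum(diag(cm)) / sum(cm)
print(accuracy)
```

confusionMatrix from caret reports the same ratio along with further statistics.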

03. Package shiny for R HTML client and server app

features

HTML app
R language exclusively for both client and server
- At run time, the R code for the UI is converted into HTML.
- The R code also works together with JavaScript code from related packages.
- On the server side, the code is R.
RStudio is used for development and also plays the role of the server.
- Running individual commands is no longer enough; the code must be an app.
- When the app runs, the server listens on localhost:4690 (the port number can differ from run to run).
- When no client is connected, the app shuts down.

web app demo 1

        # install and load package shiny
        library(shiny)

        ui <- fluidPage("How are you doing!")

        server <- function(input, output) {}

        shinyApp(
            ui = ui,
            server = server)

steps

In the script window, prepare the code above.
Select it and click Run.
The console shows "Listening on http://127.0.0.1:4690".
A web page opens in the browser.
Open another browser and go to localhost:4690.
You see the same page; click View Page Source and you see the greeting.
The greeting is populated automatically from the ui object into the returned page.
Close both browsers and the server shuts down.

client, server, app

The client is ui, an object.
The server is a function.
Both persist in the Environment window as ui and server.
shinyApp is the function that creates a Shiny app from them.
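
A minimal sketch of the three pieces as R objects, assuming package shiny is installed. The options argument of shinyApp is forwarded to runApp; fixing the port to 4690 is an illustrative choice so that the address is predictable, not something the notes require.

```r
library(shiny)

# The client: a UI object that shiny renders to HTML
ui <- fluidPage("How are you doing!")

# The server: a function that shiny calls for each connected session
server <- function(input, output) {}

# options is passed on to runApp(); port fixes the port instead of a random one
app <- shinyApp(
    ui = ui,
    server = server,
    options = list(port = 4690))

# Assigning the result does not start the server; it only builds the app object
print(class(app))

# Start the app only in an interactive session such as RStudio
if (interactive()) {
    runApp(app)
}
```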

web app demo 2

        library(shiny)

        ui <- fluidPage(
            titlePanel("Input and Output"),
            sidebarLayout(
                sidebarPanel(
                    sliderInput(
                        inputId = "num",
                        label = "Select a Number",
                        min = 0,
                        max = 50,
                        value = 25)),
                mainPanel(
                    textOutput(
                        outputId = "text"))))

        server <- function(input, output) {
            # Re-runs whenever the slider value changes
            output$text <- renderText({
                paste("You selected ", input$num)
            })
        }

        shinyApp(
            ui = ui,
            server = server)

web app demo 3 - Prediction

The web page has two parts.
On the right, the panel is divided into 4 rectangles for the 3 iris species.
Based on the software's analysis of the data (training, learning, AI), an iris flower's petal width and petal length are the factors that determine its species with high accuracy.
The iris observations are shown as colored dots on the panel.
On the left, there are two sliders to set the petal length and width.
When you move them, the "x" marker moves.
The zone where the "x" ends up is the predicted species.
This scenario shows the interactivity of the web page.

predict

        library(tree)
        setwd("~/documents/peter_r")
        load("Train.RData")
        load("Tree.RData")
        library(RColorBrewer)
        palette <- brewer.pal(3, "Set2")

        # shiny provides fluidPage, sliderInput, renderText, shinyApp, etc.
        library(shiny)

        ui <- fluidPage(
            titlePanel("Iris Species Predictor"),
            sidebarLayout(
                sidebarPanel(
                    sliderInput(
                        inputId = "petal.length",
                        label = "Petal Length (cm)",
                        min = 1,
                        max = 7,
                        value = 4),
                    sliderInput(
                        inputId = "petal.width",
                        label = "Petal Width (cm)",
                        min = 0.0,
                        max = 2.5,
                        step = 0.5,
                        value = 1.5)),
                mainPanel(
                    textOutput(
                        outputId = "text"),
                    plotOutput(
                        outputId = "plot"))))

        server <- function(input, output) {

            output$text <- renderText({

                # Create predictors (the sepal columns must exist but are not used by the tree)
                predictors <- data.frame(
                    Petal.Length = input$petal.length,
                    Petal.Width = input$petal.width,
                    Sepal.Length = 0,
                    Sepal.Width = 0)

                # Make prediction
                prediction <- predict(
                    object = model,
                    newdata = predictors,
                    type = "class")

                # Create prediction text
                paste(
                    "The predicted species is ",
                    as.character(prediction))
            })

            output$plot <- renderPlot({

                # Create a scatterplot colored by species
                plot(
                    x = iris$Petal.Length,
                    y = iris$Petal.Width,
                    pch = 19,
                    col = palette[as.numeric(iris$Species)],
                    main = "Iris Petal Length vs. Width",
                    xlab = "Petal Length (cm)",
                    ylab = "Petal Width (cm)")

                # Plot the decision boundaries
                partition.tree(
                    model,
                    label = "Species",
                    add = TRUE)

                # Draw predictor on plot
                points(
                    x = input$petal.length,
                    y = input$petal.width,
                    col = "red",
                    pch = 4,
                    cex = 2,
                    lwd = 2)
            })
        }

        shinyApp(
            ui = ui,
            server = server)

code review

Package tree provides functions such as partition.tree.
Train.RData and Tree.RData were created in the previous topic.
When they are loaded, the model and train objects are created.
Function predict is a generic that works with several different kinds of models.
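
To illustrate what "generic" means here, the sketch below calls predict on a different kind of model. The linear model is an assumption chosen only because lm ships with base R; it is not part of the app above, and new_flowers is a hypothetical input.

```r
# A linear model serves as a second model type (illustration only;
# the notes themselves use a decision tree)
fit <- lm(
    formula = Petal.Width ~ Petal.Length,
    data = iris)

# Hypothetical new observations to predict for
new_flowers <- data.frame(Petal.Length = c(1.5, 5.0))

# predict() looks at class(fit) and dispatches to the method for "lm"
print(class(fit))
estimates <- predict(
    object = fit,
    newdata = new_flowers)
print(estimates)

# Each loaded package can register its own predict method
print(head(as.character(methods("predict"))))
```

The call has the same shape as the one used with the tree model; only the class of the object decides which method runs.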

my test experience

When I ran the whole code as one app, I ran into some strange problems.
Then I divided the code into several parts (first six lines, ui, server, app) and ran them one at a time. That worked.
The problems may have come from one step starting before the previous one had finished.
It is OK for learning.

26. R descriptive, predict, shiny