## 26. R descriptive, predict, shiny


### 01. R Descriptive Statistics

• The cars data set has 4 variables:
• Name
• Transmission: factor
• Cylinders: int
• Fuel.Economy: num
• Name and Transmission are factors.
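
The notes do not show how cars was built. As a sketch only, assuming it was derived from R's built-in mtcars data set (the counts and correlation quoted in this section are consistent with that, but it is a guess), a data frame with the same four variables could be put together like this. The name cars_demo is hypothetical, chosen to avoid masking the built-in cars data set.

```r
# Assumption: the cars data frame resembles a relabelled subset of mtcars.
cars_demo <- data.frame(
  Name = factor(rownames(mtcars)),
  Transmission = factor(mtcars$am, levels = c(0, 1),
                        labels = c("Automatic", "Manual")),
  Cylinders = as.integer(mtcars$cyl),
  Fuel.Economy = mtcars$mpg)

# Show each variable together with its type
str(cars_demo)
```

str lists every column with its type, which is a quick way to confirm which variables are factors and which are numeric.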

• Create a frequency table
• Transmission is a categorical variable.
• table(cars$Transmission)
• Result: Automatic 19, Manual 13

• Fuel.Economy is measured in miles per gallon.
• Fuel.Economy is a numeric variable; it holds quantitative values.
• Get the minimum with min(cars$Fuel.Economy).
• Other statistics work the same way: max, mean, median, quantile, sd.
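
The functions listed above can be tried on any numeric vector. A minimal sketch, assuming the built-in mtcars$mpg column as a stand-in for cars$Fuel.Economy (the variable names mpg and stats are made up for this example):

```r
# Assumption: mtcars$mpg stands in for cars$Fuel.Economy.
mpg <- mtcars$mpg

# Collect the single-value statistics in one named vector
stats <- c(
  min = min(mpg),
  max = max(mpg),
  mean = mean(mpg),
  median = median(mpg),
  sd = sd(mpg))
print(round(stats, 2))

# quantile returns several values: minimum, quartiles and maximum
print(quantile(mpg))
```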

• To get summary statistics for all variables at once:
```
summary(cars)

   Transmission       Cylinders        ....
 Automatic: 19     Min.   : 4.000
 Manual   : 13     1st Qu.: 4.000
                   Median : 6.000
                   Mean   : 6.188
                   3rd Qu.: 8.000
                   Max.   : 8.000
```

• Get the correlation coefficient between Cylinders and Fuel.Economy.
• Both are numeric variables.
• The result is -0.852162.
• That is a strong negative correlation: cars with more cylinders tend to have lower fuel economy.
• 1 means perfectly correlated in a positive direction.
• -1 means perfectly correlated in a negative direction.
• 0 means no linear correlation at all.
```
cor(
  x = cars$Cylinders,
  y = cars$Fuel.Economy)
```
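
By default cor computes the Pearson coefficient, which is the covariance of the two variables divided by the product of their standard deviations. The sketch below checks this by hand; it assumes the built-in mtcars columns cyl and mpg as stand-ins for Cylinders and Fuel.Economy.

```r
# Assumption: mtcars$cyl and mtcars$mpg stand in for the cars columns.
x <- mtcars$cyl
y <- mtcars$mpg

# Built-in Pearson correlation
r_builtin <- cor(x, y)

# Same value from the definition: cov(x, y) / (sd(x) * sd(y))
r_manual <- cov(x, y) / (sd(x) * sd(y))

print(c(builtin = r_builtin, manual = r_manual))
```

Both numbers agree, and the negative sign reflects that the two variables move in opposite directions.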

### 02. Prediction

PURPOSES
• One purpose of data analysis is to understand the problem domain through data.
• Prediction is another purpose.

code part 1: prepare the data

```
data(iris)
set.seed(42)
indexes <- sample(
  x = 1:150,
  size = 100)
indexes
train <- iris[indexes, ]
test <- iris[-indexes, ]
```
• data(iris) loads the built-in iris data set into the workspace.
• set.seed(42) makes the randomness reproducible, so the same split is produced on every run.
• Function sample draws 100 random row numbers from 1 to 150 without replacement.
• indexes prints the sampled row numbers to the console.
• The last two lines create the subsets: train with 100 obs and test with the remaining 50 obs.
• train is the data the algorithm learns from (100 obs).
• test is used to measure accuracy by comparing the predicted values with the real values (50 obs).
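
A quick sanity check of the split is sketched below. It repeats the sampling step and then confirms the subset sizes and that no row ends up in both subsets; the variable shared is a name introduced only for this check.

```r
data(iris)
set.seed(42)
indexes <- sample(x = 1:150, size = 100)
train <- iris[indexes, ]
test <- iris[-indexes, ]

# Sizes of the two subsets
print(c(train = nrow(train), test = nrow(test)))

# Row names that appear in both subsets (there should be none)
shared <- intersect(rownames(train), rownames(test))
print(length(shared))

# Species counts in the training subset; a random split is not
# guaranteed to be perfectly balanced
print(table(train$Species))
```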

code part 2.1: Train a decision tree model

```
library(tree)

# Train a decision tree model
model <- tree(
  formula = Species ~ .,
  data = train)

# Inspect the model
summary(model)

# Visualize the decision tree model
plot(model)
text(model)
```
• Function tree analyzes the data and builds the rules used to predict the species.
• It is a decision tree model.
• summary(model) inspects the model.
• Variables Petal.Length and Petal.Width are the main factors.
• There are 4 terminal nodes: setosa, versicolor, virginica, virginica.
• After plotting it, you can see the picture on the left.
• That is the picture of the decision tree model.

code part 2.2: Present the trained model with a scatter plot

```
library(RColorBrewer)
palette <- brewer.pal(3, "Set2")

plot(
  x = iris$Petal.Length,
  y = iris$Petal.Width,
  pch = 19,
  col = palette[as.numeric(iris$Species)],
  main = "Iris Petal Length vs. Width",
  xlab = "Petal Length (cm)",
  ylab = "Petal Width (cm)")

# Plot the decision boundaries
partition.tree(
  tree = model,
  label = "Species",
  add = TRUE)

#-------------------------------------
# Set working directory
setwd("~/documents/peter_r")

# Save the tree model for others
save(model, file = "Tree.RData")

# Save the training data for others
save(train, file = "Train.RData")
```
• The scatter plot is on the right-hand side.
• Function plot is from base R.
• Function partition.tree is from package tree.
• The data points are presented in the plot, with the decision boundaries drawn over them.

code part 3: Predict

• Use the test data: 50 obs with their actual species.
• Their Petal.Length and Petal.Width are used to predict their species.
• Function confusionMatrix from package caret evaluates the prediction results for the 50 obs of test data.
• The accuracy of the prediction is 0.96, which is very good.
• Tree.RData and Train.RData can be loaded by a web app to make this kind of prediction.
```
# Predict with the model
predictions <- predict(
  object = model,
  newdata = test,
  type = "class")

# Load the caret package
library(caret)

# Evaluate the prediction results
confusionMatrix(
  data = predictions,
  reference = test$Species)
```
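
Accuracy is the share of predictions that match the actual values, so 0.96 on 50 observations corresponds to 48 correct predictions. The sketch below shows that arithmetic in base R. The label vectors are invented purely for illustration and are not the real predictions from the tree model.

```r
species <- c("setosa", "versicolor", "virginica")

# Made-up labels for 50 test observations (NOT the real iris test split)
actual <- factor(rep(species, times = c(17, 16, 17)), levels = species)
predicted <- actual
predicted[20] <- "virginica"    # a versicolor predicted as virginica
predicted[40] <- "versicolor"   # a virginica predicted as versicolor

# Confusion matrix: rows are predictions, columns are actual species
cm <- table(Prediction = predicted, Reference = actual)
print(cm)

# Accuracy = correct predictions (the diagonal) / all predictions
accuracy <- sum(diag(cm)) / sum(cm)
print(accuracy)
```

The off-diagonal cells of the matrix show which species were confused with each other, which a single accuracy number hides.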

### 03. Package shiny for R HTML client and server app

features
• HTML app
• R is used exclusively for both client and server.
• At run time, the R code for the client is converted into HTML.
• The R code also works together with JavaScript code from related packages.
• On the server side, R is the code.
• RStudio is used for development and plays the role of the server.
• Running individual commands is no longer enough; the code must be an app.
• When an app runs, the server listens on a local port, localhost:4690 in this session.
• When no client is connected, the app shuts down.
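
The port is normally chosen by Shiny when the app starts. If a fixed port is wanted, runApp accepts a port argument. The sketch below assumes port 4690 is free on the machine; the page text and the name app are placeholders for this example.

```r
library(shiny)

ui <- fluidPage("Port demo")           # hypothetical one-line page
server <- function(input, output) {}   # no server logic needed here
app <- shinyApp(ui = ui, server = server)

# runApp blocks until the app is stopped, so only start it when
# working interactively (for example in the RStudio console).
if (interactive()) {
  runApp(app, port = 4690, launch.browser = FALSE)
}
```

With launch.browser = FALSE no browser window opens automatically, so the page has to be opened by hand at localhost:4690.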

#### web app demo 1

```
# install and load package shiny
library(shiny)

ui <- fluidPage("How are you doing!")
server <- function(input, output) {}
shinyApp(
  ui = ui,
  server = server)
```
steps
• In the script window, prepare the code above.
• Select it and click Run.
• The console shows "Listening on http://127.0.0.1:4690".
• A web page opens in the browser.
• Open another browser and go to localhost:4690.
• You see the same page; click View Page Source and you see the greeting.
• The greeting is populated automatically from the input to the output and is shown in the returned page.
• Close both browsers and the server shuts down.
client, server, app
• The client is ui, an object.
• The server is a function.
• Both appear in the Environment window as ui and server, so they persist in the session.
• shinyApp is the function that creates a Shiny app from them.

#### web app demo 2

```
ui <- fluidPage(
  titlePanel("Input and Output"),
  sidebarLayout(
    sidebarPanel(
      sliderInput(
        inputId = "num",
        label = "select a Number",
        min = 0,
        max = 50,
        value = 25)),
    mainPanel(
      textOutput(
        outputId = "text"))))

server <- function(input, output) {
  output$text <- renderText({
    paste("You selected ", input$num)})
}

shinyApp(
  ui = ui,
  server = server)
```
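
The server logic of this demo can also be exercised without a browser. The sketch below uses shiny's testServer helper, assuming a shiny version that provides it (1.5.0 or later). The session argument is added to the server function for the test, and the slider value 10 is an arbitrary choice.

```r
library(shiny)

# Same server logic as demo 2; session is added for use with testServer
server <- function(input, output, session) {
  output$text <- renderText({
    paste("You selected ", input$num)
  })
}

# Simulate the slider being set to 10 and read the rendered text
testServer(server, {
  session$setInputs(num = 10)
  print(output$text)
})
```

This shows the flow from input$num to output$text directly: each call to setInputs plays the role of a user moving the slider.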

#### web app demo 3 - Prediction

• The following web page has two parts.
• On the right, the panel is divided into 4 rectangles for the 3 iris species.
• Based on the model trained on the data, an iris flower's petal width and petal length are the factors that determine its species with high accuracy.
• The data points are shown as colored dots on the panel.
• On the left, there are two sliders to set the petal length and width.
• When you move them, you can see the "x" marker moving.
• The zone where the "x" ends up is the predicted species.
• This shows the interactive behavior of the web page.

```
library(tree)
setwd("~/documents/peter_r")
load("Train.RData")
load("Tree.RData")
library(RColorBrewer)
palette <- brewer.pal(3, "Set2")

# ui and server below assume package shiny is already loaded (see demo 1)
ui <- fluidPage(
  titlePanel("Iris Species Predictor"),
  sidebarLayout(
    sidebarPanel(
      sliderInput(
        inputId = "petal.length",
        label = "Petal Length (cm)",
        min = 1,
        max = 7,
        value = 4),
      sliderInput(
        inputId = "petal.width",
        label = "Petal Width (cm)",
        min = 0.0,
        max = 2.5,
        step = 0.5,
        value = 1.5)),
    mainPanel(
      textOutput(
        outputId = "text"),
      plotOutput(
        outputId = "plot"))))

server <- function(input, output) {
  output$text <- renderText({
    # Create predictors
    predictors <- data.frame(
      Petal.Length = input$petal.length,
      Petal.Width = input$petal.width,
      Sepal.Length = 0,
      Sepal.Width = 0)

    # Make prediction
    prediction <- predict(
      object = model,
      newdata = predictors,
      type = "class")

    # Create prediction text
    paste(
      "The predicted species is ",
      as.character(prediction))
  })

  output$plot <- renderPlot({
    # Create a scatterplot colored by species
    plot(
      x = iris$Petal.Length,
      y = iris$Petal.Width,
      pch = 19,
      col = palette[as.numeric(iris$Species)],
      main = "Iris Petal Length vs. Width",
      xlab = "Petal Length (cm)",
      ylab = "Petal Width (cm)")

    # Plot the decision boundaries
    partition.tree(
      model,
      label = "Species",
      add = TRUE)

    # Draw predictor on plot
    points(
      x = input$petal.length,
      y = input$petal.width,
      col = "red",
      pch = 4,
      cex = 2,
      lwd = 2)
  })
}

shinyApp(
  ui = ui,
  server = server)
```
code review
• Package tree provides functions like partition.tree.
• Train.RData and Tree.RData were created in the previous topic.
• When they are loaded, the model and train objects are created.
• Function predict is a generic that works with several different model types.
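
To illustrate what "generic" means here, the sketch below calls predict on a different kind of model, a linear regression fitted with lm from base R. This model is not part of the notes; it only shows that the same function name dispatches on the class of the model object. The names fit, new_flower and estimate are made up for the example.

```r
# A different model type: linear regression instead of a decision tree
fit <- lm(Petal.Width ~ Petal.Length, data = iris)
print(class(fit))

# The same generic predict() call dispatches to the lm method here
new_flower <- data.frame(Petal.Length = 4)   # hypothetical input
estimate <- predict(object = fit, newdata = new_flower)
print(estimate)
```

For the tree model the same call returns a species label (with type = "class"), while for lm it returns a number.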
my test experience
• When I ran the whole code as one app, I experienced some strange problems.
• Then I divided the code into several parts (the first 6 lines, ui, server, app) and ran them one at a time. That worked.
• The next piece of code may start before the previous one has completed. That could be the cause of the problems.
• It is OK for learning.