02. R data structure and plots

Dimensions	single type	multiple type
1	vector c()	list list()
2	matrix matrix()	data frame data.frame()

vector

1 dimension (x-axis only), defined for one data type
If the elements have different types, R coerces them to one common type. R is weakly typed.

    x <- c(1, 2, 3)        # env window: num [1:3] 1 2 3
                           # In R, this is called a vector, not an array.
    x                      # [1] 1 2 3
                           # One vector as the output result, 1 2 3
    typeof(x)              # "double"
    class(x)               # "numeric"
    y <- c(1, "2", TRUE)   # different types
    y                      # [1] "1"    "2"    "TRUE"
    typeof(y)              # "character"
    class(y)               # "character"
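The coercion rule can be checked directly. The sketch below is not part of the original example; it uses only base R and suggests that the result takes the most general type among the elements.

```r
# Each vector mixes two types; R keeps the more general one.
v1 <- c(TRUE, 2L)        # logical + integer
v2 <- c(2L, 3.5)         # integer + double
v3 <- c(3.5, "a")        # double + character

typeof(v1)               # "integer"
typeof(v2)               # "double"
typeof(v3)               # "character"

# Explicit conversion goes the other way and gives NA where it fails.
n <- suppressWarnings(as.numeric(c("1", "2", "x")))
n
```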

The following example uses a vector when plotting a histogram.

    # Plotting with one line.
    # Without the breaks argument, default break points are created.
    hist(ChickWeight$weight, breaks = fivenum(ChickWeight$weight))

    # Find out what is returned from function fivenum.
    five.values <- fivenum(ChickWeight$weight)
    # In the env window, five.values is num [1:5]. It is a vector.
    # The values are min, lower hinge (close to the 1st quartile), median,
    # upper hinge (close to the 3rd quartile), max.
    # Five break points give 4 bins.

hist vs boxplot

The hist call above creates 4 bins that are adjacent to one another.
This is not the case for function boxplot, which draws a box with whiskers rather than adjacent bars.
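To compare the two without looking at the plots, the numbers behind them can be inspected. This is a minimal sketch using base R only; `plot = FALSE` and `boxplot.stats` are not in the original example and are used here just to expose the computed values.

```r
w <- ChickWeight$weight

# hist: five break points give four adjacent bins.
h <- hist(w, breaks = fivenum(w), plot = FALSE)
length(h$counts)         # number of bins
sum(h$counts)            # every observation falls in one of the bins

# boxplot: the box is drawn from the hinges and the median.
b <- boxplot.stats(w)
b$stats                  # lower whisker, lower hinge, median, upper hinge, upper whisker
length(b$out)            # points beyond the whiskers are drawn as outliers
```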

list

1 dimension (x-axis only), multiple data types
list(...) creates a list.
In the env window, expand the list to see its 3 elements:
- chr "Lucky"
- num 32
- logi TRUE
You can get the same result with str(l).
str(...) displays the structure of the data.

    l <- list("Lucky", 32, TRUE)
    class(l)     # "list"
    typeof(l)    # "list"
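Getting elements back out of a list is the usual next step. The sketch below assumes the same three values; the element names in `p` are made up for illustration.

```r
l <- list("Lucky", 32, TRUE)

str(l)                   # prints the three elements with their types
length(l)                # 3

l[[2]]                   # double brackets return the element itself
class(l[2])              # single brackets return a list of length 1

# Names are optional; these names are hypothetical.
p <- list(name = "Lucky", age = 32, adopted = TRUE)
p$age + 1
```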

matrix

2 dimensions(x-axis and y-axis), one data type
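A small sketch of a matrix, using made-up values, may help here. It assumes nothing beyond base R and shows that the single-type rule applies to every cell.

```r
m <- matrix(1:6, nrow = 2)   # filled column by column
m
dim(m)                       # rows, then columns
typeof(m)                    # "integer"
m[2, 3]                      # row 2, column 3

# Mixing types coerces every cell, as with vectors.
m2 <- matrix(c(1, "a", TRUE, 2), nrow = 2)
typeof(m2)                   # "character"
```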

data frame

2 dimensions (x-axis and y-axis), multiple data types
A data frame is like an Excel worksheet or a SQL result set.
Usually, you build a data frame from a csv file or a SQL result set.
In the following code, the iris data frame is used.

    ?datasets                #help for the package of base R datasets
    ?iris                    #about data frame iris
    head(iris)               #the contents of the first six rows
                             #There are five column names.
                             #no row names are defined, so the defaults 1,2,3... are used
    class(iris)              #data.frame
                             #the class describes the object as a whole.
    typeof(iris)             #list
    str(iris)                #structure of the data frame
                             #iris data frame has 5 elements. Each column is one element.
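Because typeof reports list, the list tools from the previous section also work on a data frame. The following sketch checks this on iris with base R only.

```r
is.list(iris)                # TRUE
length(iris)                 # one element per column
dim(iris)                    # rows, then columns
sapply(iris, class)          # four numeric columns and one factor

# A column can be taken out like a list element.
identical(iris$Species, iris[["Species"]])
levels(iris$Species)
```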

Another example is the diamonds data set in package ggplot2.

    library(ggplot2)            #load the package that provides diamonds
    data(package = "ggplot2")   #list the datasets in ggplot2
    ?diamonds                   #get to know one data set, diamonds
    View(diamonds)              #opens the data viewer
    summary(diamonds)           #min .. mean .. 3rd quartile ... for numeric variables, counts for factors.
    s <- subset(diamonds, cut %in% 'Fair' & price < 1000)$price    #subset rows, then select a column
    mean(s)                     #get one of its statistics.
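The subset-and-select pattern can also be written with bracket indexing. The sketch below uses iris instead of diamonds so that it runs without ggplot2; the filter values are arbitrary and chosen only for illustration.

```r
# subset() version, same shape as the diamonds example
s1 <- subset(iris, Species %in% "setosa" & Sepal.Length < 5)$Sepal.Width

# bracket version: logical row filter, then pick one column
rows <- iris$Species == "setosa" & iris$Sepal.Length < 5
s2 <- iris[rows, "Sepal.Width"]

identical(s1, s2)
mean(s1)
```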

data frame example 1

ChickWeight is a data frame.
Two variables are involved: Time and weight.

    plot(ChickWeight$Time, ChickWeight$weight)

This is a scatter plot with Time on the x-coordinate and weight on the y-coordinate.
Both variables are num.

data frame example 2

ChickWeight is a data frame.
Two variables are involved: weight and Diet.
A formula is involved: weight ~ Diet means weight grouped by Diet.
Function boxplot expects its arguments as below.
The output is four distributions of weight, one for each Diet.
The two variables are named in the formula; their data is looked up in the data argument.

        boxplot(weight ~ Diet, data = ChickWeight)
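What the formula does can be reproduced by hand. This is a rough sketch in base R; `split` and `plot = FALSE` are not in the original example and are used only to show the grouping.

```r
# Group the weights by Diet by hand.
groups <- split(ChickWeight$weight, ChickWeight$Diet)
names(groups)                # one group per Diet level
sapply(groups, median)       # the line inside each box

# boxplot computes the same grouping from the formula.
b <- boxplot(weight ~ Diet, data = ChickWeight, plot = FALSE)
b$names
b$n                          # observations per Diet
```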

data frame example 3

Two variables are involved.
The data of one variable is for the x-coordinate.
The data of the other variable is for the y-coordinate.

        library(ggplot2)
        g <- ggplot(diamonds, aes(x = carat, y = price))
        g <- g + geom_point(aes(color=clarity))
        g

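        # see the trend
        library(mgcv)    # provides gam, the default smoother for 1,000 or more observations
        g.smooth <- g + geom_smooth(color = 'yellow')
        g.smooth

        # see the trend with a linear model
        g.lm <- g + geom_smooth(method = 'lm', color = 'red')
        g.lm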

data frame index and combining data frames

    #data frame index
    #The index in an R data frame is 1-based.
    # create a data frame
    name <- c('happy', 'lucky', 'joy')
    age <- c(1, 3, 5)
    my.df <- data.frame(name, age)   #cbind(name, age) would give a character matrix instead
    #access the data frame
    my.df        #output
                 #    name age
                 # 1 happy   1
                 # 2 lucky   3
                 # 3   joy   5
    my.df[,1]    # happy, lucky, joy
    my.df[2,2]   # 3

    #merge for inner join, full join, left join, right join
    df1 <- data.frame(LETTERS, share.keys = 1:26)                            #26 rows
    df2 <- data.frame(letters, share.keys = c(1:9, 11, 12, 13, 14, 22:34))   #26 rows
    merge(df1, df2)                 # inner join 18 rows
    merge(df1, df2, all = TRUE)     # full join 34 rows, <NA> for mismatch
    merge(df1, df2, all.x = TRUE)   # left join 26 rows, all the left + matched right
    merge(df1, df2, all.y = TRUE)   # right join 26 rows, all the right + matched left

    #combine the rows from two data frames
    name <- c('John', 'Mary', "Mike")
    age <- c(20, 30, 40)
    df1 <- data.frame(name, age)
    df1
    name <- c('Wiwi', 'Tairo', "Emi")
    age <- c(5, 6, 7)
    df2 <- data.frame(name, age)
    df2
    two <- rbind(df1, df2)
    two
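The join types are easier to see on a few rows. The sketch below uses two made-up data frames, `left` and `right`, and names the key column explicitly with the `by` argument of merge.

```r
# Two small, hypothetical data frames sharing the column "id".
left  <- data.frame(id = c(1, 2, 3), name = c("happy", "lucky", "joy"))
right <- data.frame(id = c(2, 3, 4), age = c(3, 5, 7))

inner <- merge(left, right, by = "id")                 # ids 2, 3
full  <- merge(left, right, by = "id", all = TRUE)     # ids 1, 2, 3, 4
lj    <- merge(left, right, by = "id", all.x = TRUE)   # ids 1, 2, 3

nrow(inner)
nrow(full)
nrow(lj)
full        # cells without a match are NA
```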

table function

Data set ChickWeight has 578 observations of 4 variables.
The four variables are weight, Time, Chick, Diet.
Function table creates tabular data as below.
The function output has class table.
The output is the count for each level of a factor (a categorical variable), such as diet type.
Some plot functions take a table as their input argument.

    data(ChickWeight)
    t <- table(ChickWeight$Diet)
    class(t)
    t

    ----- result ----------------------
    table

      1   2   3   4
    220 120 120 118
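As one example of a plot function that accepts a table, barplot can be used. This sketch is an illustration only; the variable name `counts` and the axis labels are chosen here and are not from the original.

```r
counts <- table(ChickWeight$Diet)

names(counts)                # the Diet levels
counts["1"]                  # count for Diet 1
prop.table(counts)           # shares instead of counts

# barplot draws one bar per entry of the table.
barplot(counts, xlab = "Diet", ylab = "Number of observations")
```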

04. debug

If your R code has no function of your own,
- You can select one R statement and run it.
- See the result.
- Then, select the next statement...
- Step through all of them.
- No debugging is needed.
If you have any function of your own, you need a debugging process.
If you experience code problems, a debugging process is needed.
If you want to learn how an open-source package works, a debugging process is helpful.

how to debug

Use the code from section 03, function.
Add browser() before statement sum = a + b.
Add browser() after statement sum = a + b.
Function browser is an R function.
It sets break points when executing the code.
You can browse the data at that location.
At the top of the script window, click the Source icon, not Run, to execute in debug mode.
The first browser() is highlighted.
In the console, the prompt changes to Browse[1]>
- enter a; you can also see the value in the env window. a is 2
Click Next twice to reach the second break point.
In the console, the prompt changes to Browse[2]>
- enter sum; you can also see the value in the env window. sum is 5
Click the Stop icon to leave debug mode.
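The function being debugged is not shown in this section, so the sketch below assumes a small hypothetical function, add.two, with arguments a and b. The pause argument is also an assumption, added so the break points are skipped when the script runs outside an interactive session.

```r
# Hypothetical function with two break points around the sum statement.
add.two <- function(a, b, pause = interactive()) {
  if (pause) browser()     # first break point: a and b can be inspected
  sum <- a + b
  if (pause) browser()     # second break point: sum can be inspected
  sum
}

# With pause = FALSE the function runs straight through.
result <- add.two(2, 3, pause = FALSE)
result
```

In RStudio, source the file and call add.two(2, 3) from the console to stop at each browser() call.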

28. More on R - data structure, function, R files, debug

November 21, 2018

Contents

01. R simple data

02. R data structure and plots

03. function

04. debug

05. R files