Coursera: An Introduction to R (My Course Notes)

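Setting up your Working Directory (WD)
  • To get your WD, use getwd()
  • To list the contents of your WD, use dir() or list.files(). ls() lists the objects in your session, not the files in the WD.
  • To change WD, use setwd(@path) or the matching menu option in your GUI.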
  • R scripts can be created in any editor.
    • To import to R
      • Copy the text and paste in the Console
      • Create the RScript and import it to the session
        • Save the script as a .r file in the WD.
        • Use source(@.rFile) to import the file to the session.
    • To see the contents of a session, use ls()
  • To see the contents of the WD, use dir()
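  • Example (a sketch, not part of the course material): the steps above, run against a temporary directory. The file name hello.R and the function say_hello() are made up for illustration.

```r
# Write a tiny script to a temporary directory, then load it with source().
# "hello.R" and say_hello() are hypothetical names used only in this sketch.
script_dir  <- tempdir()
script_path <- file.path(script_dir, "hello.R")
writeLines('say_hello <- function() "hello from the script"', script_path)

getwd()                          # the current working directory
"hello.R" %in% dir(script_dir)   # files on disk are listed with dir()
source(script_path)              # runs the script in the current session
"say_hello" %in% ls()            # objects in the session are listed with ls()
say_hello()
```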
R Input and Evaluation
  • Entering Input
    • We type expressions at the R prompt.
    • <- is the assignment operator
    • # indicates a comment. Anything after # is ignored
  • Evaluation
    • This occurs when you press enter after an expression and causes the result of the evaluation to be displayed.
    • To print the contents of a vector
      • Auto-printing: Enter the name of the vector and its contents are shown on the screen
      • Explicit printing: use print(@vector)
    • When a vector is printed, the value in [@index] at the beginning of each line shows the ordinal position in the vector of the first value on that line.
R Data types and operations
  • R has 5 basic (atomic) classes of objects
    • character
    • numeric (real numbers)
    • integer
    • complex: 1+4i
    • logical: TRUE/FALSE
  • The most basic object is a vector
    • It contains objects of the same class
    • Created by vector() if it should be empty; otherwise the values can be stored directly using the assignment operator (<-).
  • A list is a vector that can have objects of multiple classes.
  • Numbers
    • Numbers are numeric objects (double-precision real numbers) by default.
    • If you want an integer, you must say so explicitly by putting L after the value
      • x <- 1L
    • The number Inf represents infinity (e.g. 1/0). It can be used in calculations
    • The value NaN ("not a number") represents an undefined value, e.g. 0/0.
      • It can also be thought of as a missing value
  • Attributes
    • R objects can have attributes, such as
      • names/ dimnames
      • Dimensions(e.g. matrices, arrays)
      • class
      • length
      • other user-defined attributes
    • The attributes of an object can be accessed using attributes()
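  • Example (a minimal sketch, not from the course): the number types and the attributes() call described above. Variable names are illustrative.

```r
x <- 1        # numeric (double) by default
y <- 1L       # the L suffix makes it an integer
class(x)      # "numeric"
class(y)      # "integer"

1 / 0         # Inf
0 / 0         # NaN

v <- vector("numeric", length = 3)    # 0 0 0
m <- matrix(1:6, nrow = 2, ncol = 3)
attributes(m)                         # a list whose $dim element is 2 3
```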
R data types( vectors and lists)
  • Creating Vectors
    • c() can be used.
      • c() concatenates
        • e.g. c(@value1, @value2)
      • A shortcut for inputting a series of numbers is @firstnumber:@lastnumber
        • e.g. @vector <- 9:19
    • vector()
      • Used to create an empty vector
      • Has the format vector(@class, length= @LengthOfVector)
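      • The vector is filled with the default value for its class: 0 for numeric, FALSE for logical, "" for character.
  • Creating a vector with mixed objects
    • R coerces so every element of the vector is of the same class.
    • Coercion goes to the most general class present: logical, then numeric, then character.
      • As numbers, TRUE is 1 and FALSE is 0.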
    • Explicit coercion
      • as.@destinationClass() is used to convert an object to a specific class.
      • Nonsensical coercion results in NAs
        • e.g. coercing a vector of characters to numeric.
        • This results in a warning
  • When printed, elements of a vector are prefixed with their index in [@Element] format
  • Lists
    • A vector that can contain elements of different classes
      • G <- list(1, "a", TRUE, 1+4i)
    • When printed, elements of a list are prefixed with their index in [[@Element]] format
    • Created using list()
    • The elements of a list can be named when they’re populated
      • list(@ElementName1 = @Element1, @ElementName2 = @Element2, @ElementName3 = @Element3)
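  • Example (a sketch, not from the course): implicit and explicit coercion, and a named list. The element names id, label and flag are made up.

```r
# Implicit coercion: c() converts everything to the most general class.
c(1.7, "a")      # character: "1.7" "a"
c(TRUE, 2)       # numeric: 1 2
c("a", TRUE)     # character: "a" "TRUE"

# Explicit coercion with the as.* functions.
x <- 0:3
as.logical(x)    # FALSE TRUE TRUE TRUE
as.character(x)  # "0" "1" "2" "3"

# Nonsensical coercion gives NA; the warning is suppressed here on purpose.
bad <- suppressWarnings(as.numeric(c("a", "b")))

# A list keeps each element in its own class.
g <- list(id = 1, label = "a", flag = TRUE)
g$label          # "a"
```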
Matrices
  • They’re vectors with a dimension attribute.
    • The dimension attribute is a vector of length 2 (nrow, ncol)
    • They’re created using
      • matrix()
        • matrix(nrow = @NumberOfRows, ncol = @NumberOfColumns): This creates a matrix filled with NA
        • matrix(@vectorOfValues, nrow = @NumberOfRows, ncol = @NumberOfColumns)
        • e.g. matrix(1:6, nrow = 2, ncol = 3)
      • Transforming a vector using dim()
        • dim(@vectorToBeConverted) <- c(@NumberOfRows, @NumberOfColumns)
      • Column binding and row binding
        • done using cbind() or rbind()
          • cbind(@VectorToBeCol1, @VectorToBeCol2 ):
            • The vectors are converted to columns
          • rbind(@VectorToBeRow1, @VectorToBeRow2):
            • The vectors are converted to rows
    • dim(@matrix) is used to show the dimensions of a matrix
  • Every element must be of the same class
  • Matrices are constructed column-wise
    • Entries start in the upper-left corner and run down the column. When a column is filled, they continue at the top of the next column
  • The rows and columns of a matrix can be named using dimnames()
    • dimnames(@matrix) <- list(c(@row1Name, @row2Name), c(@column1Name, @column2Name))
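  • Example (a sketch, not from the course): the ways of building a matrix listed above. The row and column names are made up.

```r
m <- matrix(1:6, nrow = 2, ncol = 3)   # filled column-wise
m[1, ]                                 # 1 3 5

v <- 1:6
dim(v) <- c(2, 3)                      # turn a vector into a 2 x 3 matrix
all(m == v)                            # TRUE

cols <- cbind(1:3, 10:12)              # vectors become columns: 3 x 2
rows <- rbind(1:3, 10:12)              # vectors become rows: 2 x 3

dimnames(m) <- list(c("r1", "r2"), c("a", "b", "c"))
m["r2", "c"]                           # 6
```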
Factors
  • They’re vectors used to represent categorical data.
    • Ordered factors: Factors where their values have orders: e.g. small, medium, large
    • Unordered factors: Factors where their values do not have orders e.g. Male and Female
  • They’re treated specially by modeling functions like lm() and glm()
  • Factors are best represented with labels instead of values
    • E.g. use Male and Female instead of 1 and 2
  • Created using
    • factor(c(@value1, @value2, @value3))
  • Functions
    • levels(@Factor) is used to show the distinct values of the factor
    • table(@Factor) is used to show the count of the distinct values of the factor
    • unclass(@Factor) is used to show the underlying integer codes of the factor. Each level maps to one integer code.
  • A factor is basically an integer vector with a levels attribute.
  • The order of the levels can be set using the levels argument to factor()
    • This is important because in linear modeling, the first level is the baseline level. If levels is not given, the levels are sorted (alphabetically for character data), so the baseline is the first level in that order.
    • factor(c(@value1, @value2, @value3), levels = c(@value1, @value2, @value3))
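  • Example (a sketch, not from the course): a factor with default and with explicit level order. The values "yes" and "no" are illustrative.

```r
answers <- factor(c("yes", "yes", "no", "yes", "no"))
levels(answers)     # "no" "yes" (alphabetical by default)
table(answers)      # no: 2, yes: 3
unclass(answers)    # integer codes 2 2 1 2 1, plus the levels attribute

# Setting the order explicitly makes "yes" the first (baseline) level.
answers2 <- factor(c("yes", "yes", "no", "yes", "no"),
                   levels = c("yes", "no"))
levels(answers2)    # "yes" "no"
```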
Missing values
  • Denoted by either NA or NaN (for undefined mathematical operations)
  • is.na() is used to test objects to see if they’re NA or NaN
  • is.nan() is used to test for NaN
  • NA values have a class also
    • They can be integer NA, character NA etc
  • A NaN value is also NA, but the converse is not true
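  • Example (a sketch, not from the course): how is.na() and is.nan() treat NA and NaN.

```r
y <- c(1, 2, NaN, NA, 4)
is.na(y)     # FALSE FALSE TRUE  TRUE  FALSE  (NaN counts as NA)
is.nan(y)    # FALSE FALSE TRUE  FALSE FALSE  (NA is not NaN)

# NA values carry a class.
class(NA)              # "logical"
class(NA_integer_)     # "integer"
class(NA_character_)   # "character"
```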
Data Frames
  • Used to store tabular data
  • Represented as a special type of list where every element has the same length
  • Each element of the list can be thought of as the column, and the length of an element shows the number of rows
  • Unlike a matrix, data frames can store a different class of objects in each column
  • They have a special attribute row.names
    • It is used to annotate the data.
  • Created using
    • read.table()
    • read.csv()
    • data.frame(@FirstElement = @Values, @SecondElement = @Values)
  • A data frame can be converted to a matrix using data.matrix()
    • Implicit conversion is applied to convert the values to the same class
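  • Example (a sketch, not from the course): building a small data frame and converting it to a matrix. The column names id and flag are made up.

```r
df <- data.frame(id = 1:4, flag = c(TRUE, TRUE, FALSE, FALSE))
nrow(df)            # 4
ncol(df)            # 2
sapply(df, class)   # id is "integer", flag is "logical"

# data.matrix() coerces every column to a common numeric type,
# so the logical column becomes 1/0.
mat <- data.matrix(df)
mat[, "flag"]
```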
Names
  • Objects in R can have names
    • Useful for writing self-describing objects and readable code
  • created by populating names(@object)
    • names(@object) <- c(@element1Name, @element2Name)
  • names(@object) prints the names of the elements of the object
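  • Example (a sketch, not from the course): naming the elements of a vector and using the names to subset.

```r
x <- 1:3
names(x)                      # NULL: no names yet
names(x) <- c("a", "b", "c")
names(x)                      # "a" "b" "c"
x["b"]                        # the element named b, which is 2
```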
Reading and writing Data in R
For reading Data into R
  • read.table: Reads a text file that contains data in rows and columns and returns a data frame
    • For small and moderate data sizes, it is usually called with only the file parameter; the other parameters take their default values.
    • For large data sets,
      • The size of the data frame mustn’t exceed the size of your system’s memory
      • Some parameters should be specified, as they help read operations run faster and more efficiently.
        • nrows
          • Helps R allocate memory better
        • colClasses
          • This saves R from scanning every column to determine its data type
          • An easy method to get this is to read in the first 100 rows, let R calculate the classes for those and save it to a vector, and then use the vector as colClasses
                                                       Data <- read.table("@File", nrows = 100)
                                                       classes <- sapply(Data, class)
                                                       DataFrame <- read.table("@File", colClasses = classes)
  • You can calculate the memory requirements for a data frame
    • using the formula nrows * ((@ncolsOfColClassA * @SizeOfColClassA) + (@ncolsOfColClassB * @SizeOfColClassB))
      • This gives the size in bytes
        • To convert to MB, divide by 2^20
          • To convert MB to GB, divide by 2^10
      • As a rule of thumb, you need about 2 times as much memory as the size of the data frame to import a data set
    • read.table has the following params
      • file: name of a file or a text-mode connection
      • header: logical param indicating if the file has a header line or not. if true, the first line is used to set variable Names
      • sep: string param indicating how the columns are separated
      • colClasses: character vector whose length is the number of columns in the data set. Indicates the class of each column. Default is for R to work it out.
      • nrows: the number of rows in the data set. default is for R to calculate it
      • comment.char: string param indicating the comment character. default is #
      • skip: the number of lines to skip from the beginning
      • stringsAsFactors: logical param stating whether character variables should be coded as factors. The default was TRUE before R 4.0.0 and is FALSE from R 4.0.0 onwards.
  • read.csv
    • same notes as read.table, except the default sep value is "," and header defaults to TRUE
  • scan: For reading matrices
  • readLines:
    • Reads a file and returns text
    • Can also be used to read a webpage
  • source: For reading R code e.g. functions
  • dget: For reading R code for R objects that have been deparsed to text files
  • load: For reading in saved workspaces
  • unserialize: For reading single R objects in binary form.
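  • Example (a sketch, not from the course): the colClasses trick and the memory formula above, run on a small temporary file. The column names are made up, and the 1,500,000 x 120 figures are only an illustration of the arithmetic.

```r
# Create a small text file to read back.
path <- tempfile(fileext = ".txt")
write.table(data.frame(id = 1:5,
                       score = c(2.5, 3.1, 4.8, 1.2, 3.3),
                       name = letters[1:5]),
            path, row.names = FALSE)

# Read a few rows, record the column classes, then reuse them.
initial <- read.table(path, header = TRUE, nrows = 3)
classes <- sapply(initial, class)
full    <- read.table(path, header = TRUE, colClasses = classes)
nrow(full)                    # 5

# Rough memory estimate: 1,500,000 rows of 120 numeric columns, 8 bytes each.
bytes <- 1500000 * 120 * 8    # 1.44e9 bytes
mb    <- bytes / 2^20         # about 1373 MB
gb    <- mb / 2^10            # about 1.34 GB
```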
For Writing Data from R
  • write.table
  • writeLines: used to write to a file
    • Each element is written to the file one line at a time
  • dump and dput
    • dput
      • see notes for dump.
      • Used to deparse a single R object.
        • The output of this is the code required to reconstruct the R object
          • dput(@vector, file = @destinationFile)
        • dget is used to read the data back.
          • dget(@OutputFileOfDput)
    • dump
      • Used to deparse multiple R objects
        • dump(c("@vector1","@vector2"), file = @destinationFile)
        • source is used to read the data back
          • source(@OutputFileOfDump)
    • They’re text formats, so the output is editable
    • They preserve the metadata of the data set so it doesn’t have to be specified again
    • They’re not very space efficient
  • save
  • serialize
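  • Example (a sketch, not from the course): a round trip with dput()/dget() and with dump()/source(), using temporary files. The object names are illustrative.

```r
y <- data.frame(a = 1, b = "a")

# dput() writes the code needed to rebuild one object; dget() reads it back.
obj_file <- tempfile(fileext = ".R")
dput(y, file = obj_file)
new_y <- dget(obj_file)

# dump() takes the names of several objects; source() restores them.
x <- "foo"
dump_file <- tempfile(fileext = ".R")
dump(c("x", "y"), file = dump_file)
rm(x, y)
source(dump_file)
x            # "foo"
```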
Connection
  • Used to interface with external objects.
  • Data is read in using connection interfaces.
    • file: standard uncompressed file
      • accepts the following parameters
        • description: The name of the file
        • open: a code indicating what operations are allowed on the file
          • r : read only
          • w: writing
          • a: appending
          • rb, wb, ab: reading, writing or appending in binary mode
            • The distinction from text mode matters mainly on Windows
    • gzfile: file of gzip compression
    • bzfile: file of bzip2 compression
    • url: webpage
  • A connection is usually created implicitly, e.g. when read.csv() is called with a file name
  • Connections have to be closed after they’ve been used
               con <- file(@File, @allowedOperations)
               close(con)
  • Connections can be used to read part of a file
                    con <- gzfile(@File)
                    x <- readLines(con, 10)
                    close(con)
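  • Example (a sketch, not from the course): writing a gzip-compressed temporary file through a connection and reading back only its first 10 lines.

```r
path <- tempfile(fileext = ".gz")

con <- gzfile(path, "w")                 # open for writing
writeLines(paste("line", 1:20), con)
close(con)

con <- gzfile(path, "r")                 # open for reading
first <- readLines(con, 10)              # read only part of the file
close(con)

length(first)                            # 10
first[1]                                 # "line 1"
```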
Subsetting
Subsetting operations extract subsets of R objects
  • There are 3 operators for performing subsetting
    • []
      • Returns an object of the same class as the original
        • The one exception is a matrix: selecting a single element, row or column returns a vector (see the drop parameter)
      • Can be used to select more than one element
    • [[]]
      • is used to extract elements of a list or a data frame
      • Can only extract a single element
      • the class of the returned object will not necessarily be a list or a data frame
    • $
      • used to extract elements of a list or a data frame by name
      • similar to [[]]
  • There are several ways to perform subsetting
    • Use a numeric index
      • Enter the ordinal position of the element in the object to extract that element
        • @vector[3]
      • can be used to extract a sequence of objects
        • @vector[1:5]
    • Use a logical index
      • Enter a logical proposition to select elements in the object that satisfy the proposition
        • @vector[@vector > "a"]
        • @newLogicalVector <- @vector > "a"
          • The output of this is a logical vector that shows a logical element for all the elements of the object based on the proposition.
          • @vector[@newLogicalVector] returns an object of all the elements that satisfy the proposition
  • Subsetting lists
    • Reference list is x<- list(food=1:4, bar = 0.6, baz = c(3.14, 2.81))
    • [] returns a list
      • x[1] returns list
        • $food 1,2,3,4
      • x["elementName"] returns a list
      • Allows you return more than one element of the list
        • x[c(1,3)] returns food and baz
    • [[]] returns the element itself, without the list wrapper or the name
      • x[[1]] returns 1,2,3,4
      • x[["bar"]] returns an object of class numeric
      • Allows you work with computed index(where the name of the element in the object is the result of a computation)
        • e.g. if name <- "food"
          • x[[name]] returns the food object.
      • Can be used to subset nested elements of a list
        • x[[c(@element, @subelement)]]
          • E.g. x[[c(1,3)]] or x[[1]][[3]] returns 3 (the third element of the first element)
    • $@elementName: returns the element itself, without the name
      • x$food returns a numeric vector object
  • Subsetting Matrices
    • Reference matrix is x<- matrix(1:6,2,3) a 2 row by 3 column matrix
                         1     3     5
                         2     4     6
    • Format is @matrix[@row, @column]
    • To select the whole row use @matrix[@row,]
      • e.g. x[1,] returns 1,3,5
    • To select the whole column use @matrix[,@column]
      • e.g. x[,1] returns 1,2
    • The drop parameter
      • The drop parameter drops the matrix structure (the dim attribute) when a subset operation on a matrix yields a single element, row or column.
        • Default is drop = TRUE
        • When a single element of a matrix is retrieved, it returns a vector of length 1 instead of a 1x1 matrix; use drop = FALSE to keep the matrix
          • x[1,2 ,drop = FALSE]
        • When a single row or column is retrieved, it returns a vector not a matrix
          • x[1, , drop = FALSE]
  • Subsetting data frames
    • To subset only observations (rows)
      • use subset(@dataFrame, @column @operator @value)
        • E.g. subset(iris, Species == "virginica") # This returns all rows where Species is virginica
      • use the [] operator
        • @dataFrame[@dataFrame$@column @operator @value, ]
        • E.g. iris[iris$Species == "virginica", ]
    • To subset only variables (columns)
      • @dataFrame[, @vectorOfColumnNumbersToReturn]
        • E.g. iris[, 1:3]
        • iris[, c(1,2)]
  • Subsetting with partial matching of names
    • Partial matching of name is allowed with [[]] and $
    • It allows you return objects with a name that start with the characters you supply and saves you typing in the command line
    • Reference object is x<- list(aardvark = 1:5)
    • $ looks for a name in the list that starts with the characters supplied
      • x$a returns the aardvark element (1:5)
    • [[]]
      • using the exact = FALSE parameter on [[]] allows it to carry out partial matching
      • x[["a", exact = FALSE]] returns the aardvark element
        • exact = TRUE by default
  • Removing NA values
    • The most common way is to create a logical vector showing which elements are NA
      • use is.na() on the object
      • return the non-NA elements
        • e.g.
                                                  x <- c(1,2,NA,4,NA,5)
                                                  bad <- is.na(x) # bad is a logical vector F,F,T,F,T,F
                                                  x[!bad] # returns 1,2,4,5
    • For data frames, complete.cases() flags the rows that have no NA in any column
                                   good <- complete.cases(@dataFrame)
                                   @dataFrame[good,][1:6,] # the first 6 rows that have no missing values
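  • Example (a sketch, not from the course): the subsetting operators above applied to the reference list and matrix from these notes and to the built-in iris data. The names m and p are used here to avoid overwriting x.

```r
x <- list(food = 1:4, bar = 0.6, baz = c(3.14, 2.81))
class(x[1])            # "list": [] keeps the list wrapper
class(x[["bar"]])      # "numeric": [[]] returns the element itself
x[[c(1, 3)]]           # 3: third item of the first element

m <- matrix(1:6, 2, 3)
m[1, ]                     # 1 3 5, a plain vector
dim(m[1, , drop = FALSE])  # 1 3, still a matrix

virginica <- iris[iris$Species == "virginica", ]
nrow(virginica)        # 50

p <- list(aardvark = 1:5)
p$a                        # partial match: 1 2 3 4 5
p[["a"]]                   # NULL: [[]] matches exactly by default
p[["a", exact = FALSE]]    # 1 2 3 4 5
```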
Vectorized operations
  • Operations are vectorized to make coding more efficient and easier to read
    • Carrying out operation on vectors of the same length
      • Elements at the same position in both vectors interact based on the operation applied to the vector
        • E.g.
                                             x<- 1:4; y<-6:9
                                             x+y # returns 7     9      11     13
                                             x*y # returns 6     14     24     36
  • Carrying out comparison operations
    • Compares each element of the vector, returning a logical vector
      • E.g.
                                             x<- 1:4; y<-6:9
                                             x>2 # returns a logical vector F     F     T     T
  • Carrying out operations on vectors of different lengths
    • The shorter vector is recycled (repeated) until it matches the length of the longer one
  • Vectorized matrix operations
    • Reference is x<- matrix(1:4,2,2); y<-matrix(rep(10,4),2,2)
    • x*y carries out element-wise multiplication; x/y carries out element-wise division
                         x*y #multiplies all elements in the same position together and returns a matrix
  • To carry out matrix multiplication
    • x %*% y
seq_along(): takes a vector as input and creates an integer sequence equal to the length of that vector
seq_len(): takes an integer and creates a sequence from 1 to the value of the integer
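  • Example (a sketch, not from the course): the vectorized operations above, plus seq_along() and seq_len(). The names a and b are used for the matrices to keep x and y as vectors.

```r
x <- 1:4; y <- 6:9
x + y           # 7 9 11 13
x * y           # 6 14 24 36
x > 2           # FALSE FALSE TRUE TRUE
x + c(10, 20)   # recycling the shorter vector: 11 22 13 24

a <- matrix(1:4, 2, 2); b <- matrix(rep(10, 4), 2, 2)
a * b           # element-wise: rows are 10 30 and 20 40
a %*% b         # matrix product: rows are 40 40 and 60 60

seq_along(c("a", "b", "c"))   # 1 2 3
seq_len(4)                    # 1 2 3 4
```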
Control Structures in R
  • Used to control the flow of execution of a program, depending on runtime conditions
  • The types of control structures are
    • if...else: Allows you to test for a condition
      • if(@Condition) { Do something } else if(@Condition) { Do something } else { Do something }
        • E.g.
                                   y <- if(x>3){10} else {0}
    • for: execute a loop a fixed number of times
      • Takes an iterator variable and assigns it successive values from a sequence or vector.
      • For loops can be nested.
    • while: Execute a loop while a condition is true
      • Begin by testing a condition. If it is true, then execute the loop body. Once the loop body is executed, test the condition again.
      • Can execute infinitely if the condition never gets met.
      • conditions are evaluated from left to right
    • repeat: execute an infinite loop
      • Initiates an infinite loop. Can only be exited with break
    • break: break the execution of a loop
    • next: skip an iteration of a loop
    • return: exit a function
      • Signals that a function should exit and return a given value.
  • The above are primarily for writing programs; for interactive work, use the *apply functions
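  • Example (a sketch, not from the course): one small loop for each control structure listed above. All variable names are illustrative.

```r
x <- 5
y <- if (x > 3) 10 else 0          # if...else used as an expression: y is 10

total <- 0
for (i in 1:4) total <- total + i  # for: total ends at 10

count <- 0
while (count < 3) count <- count + 1   # while: stops once count reaches 3

n <- 0
repeat {                           # repeat: only break gets you out
  n <- n + 1
  if (n >= 5) break
}

evens <- c()
for (i in 1:10) {
  if (i %% 2 == 1) next            # next: skip the odd numbers
  evens <- c(evens, i)
}
evens                              # 2 4 6 8 10
```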
Writing functions
  • Functions are created with the keyword function
               function(@Argument1, @Argument2)
               { @Argument1 @Operator @Argument2 }
  • Functions are R objects
    • Functions can be passed as arguments to other functions
    • Functions can be nested.
  • Functions have named arguments which potentially have default values.
    • Formal arguments are the arguments included in the function definition
      • formals() returns a list of all the formal arguments of a function
    • Function arguments can be missing or might have default values
  • Function arguments can be matched positionally or by name
    • You can mix positional matching with matching by name.
      • When an argument is matched by name, it is taken out of the argument list and the remaining unnamed arguments are matched in the order they are listed in the function definition
  • Function arguments can be partially matched( only first few characters are used to match the arguments)
    • The order of operations when given an argument is
      • Check for exact match for a named argument
      • Check for a partial match
      • Check for a positional match
  • An argument’s value can be set to NULL
  • Arguments in R are evaluated lazily
    • They are evaluated only as needed. If the argument is never used, its presence is never tested.
    • No error is thrown until R actually evaluates a problematic argument.
  • The ... argument
    • Used to indicate a variable number of arguments that are usually passed on to other functions
    • used when extending another function so you don’t have to copy the entire argument list of the original function
                         myplot <- function(x, y, type = "l", ...) {
                              plot(x, y, type = type, ...)} ## This passes the remaining arguments of myplot on to plot
  • Generic functions use ... so that extra arguments can be passed to methods.
    • Necessary when the number of arguments passed to the function cannot be known in advance
      • E.g. see args(paste), args(cat)
    • Any arguments that appear after ... on the argument list must be named explicitly and cannot be positionally or partially matched.
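  • Example (a sketch, not from the course): argument matching, lazy evaluation and the ... argument. The functions power(), lazy() and collapse() are hypothetical, written only for this illustration.

```r
power <- function(base, exponent = 2) base^exponent
power(3)                    # 9: exponent takes its default
power(exponent = 3, 2)      # 8: the named argument is matched first, 2 goes to base
power(2, exp = 3)           # 8: exp partially matches exponent
names(formals(power))       # "base" "exponent"

# Lazy evaluation: b is never used, so leaving it out is not an error.
lazy <- function(a, b) a^2
lazy(2)                     # 4

# Arguments after ... must be named in full.
collapse <- function(..., sep = "-") paste(..., sep = sep)
collapse("a", "b")              # "a-b"
collapse("a", "b", sep = "+")   # "a+b"
```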
Scoping rules
  • How R binds values to symbols
  • R searches through a series of environments(list of objects/symbols and values) to find the appropriate value.
    • In command line, when attempting to retrieve the value of an R object, the order is
      • Search the global environment for a symbol name matching the one requested
      • Search the namespaces of each of the packages on the search list(see search())
    • The global environment or the user’s workspace is always the first element of the search list and the base package is always the last( with regards to namespaces, see search()).
      • The order of packages on the search list matters
      • Users can configure which packages get loaded on startup, so you cannot assume a set of packages is available
      • When a user loads a package with library(), the package is placed at position 2 in search() and everything else gets shifted down.
      • R has separate namespaces for functions and non-functions.
        • you can have an object and function named the same.
  • Scoping Rules
    • They’re the main difference between S and R
    • They determine how a value is associated with a free variable in a function
    • R uses lexical/static scoping
      • The scoping rules of a language determine how values are assigned to free variables.
        • Free variables are not formal arguments and are not local variables( assigned inside the function body).
      • E.g.
                                   f<-function(x,y)
                                   {
                                        g <-9 ##g is a local variable
                                        g*(x^2) + y/z ##x and y are formal arguments, z is a free variable. The scoping rules state how we assign a value to z.
                                   }
    • Lexical scoping states that the values of free variables are searched for in the environment in which the function was defined.
      • An environment is a collection of symbol,value pairs
      • Every environment has a parent environment, and an environment can have multiple children. The only environment without a parent is the empty environment
      • A function + an environment = a closure or function closure.
      • To search for the value of a free variable
        • If the value of a symbol is not found in the environment in which a function was defined, then the search is continued in the parent environment.
        • The search continues up the sequence of parent environments until we hit the top-level environment; this is usually the global environment (workspace) or the namespace of a package.
        • After the top-level environment, the search continues down the search list until the empty environment is reached, at which point an error is thrown
      • Typically a function is defined in the global environment, so that the values of free variables are just found in the user’s workspace
        • In R you can have functions inside other functions 
          • In this case, the environment in which a function is defined is the body of another function.
            • E.g. when pow is defined inside another function, n is a free variable with respect to pow, hence its value is found in the parent environment.
          • To see what’s in a function’s environment
            • run ls(environment(@function))
              • e.g. ls(environment(cube))
          • To get the value of a variable in the closure environment,
            • run get("@variable", environment(@function))
    • Difference between Lexical and dynamic scoping
      • In lexical scoping, the value of a variable is looked up in the environment in which the function was defined
      • In dynamic scoping, the value of a variable is looked up in the environment from which the function was called( calling environment). 
      • E.G.

                                   y<- 10

                                   f<- function(x)
                                   {
                                        y<-2
                                        y^2 + g(x)
                                   }
                                   g<- function(x)
                                   {
                                        x*y
                                   }
                                   
                                   # Using lexical scoping, the value of y in g() is 10. y is looked up in the global environment, where g was defined
                                   # Using dynamic scoping, the value of y in g() would be 2. y would be looked up in f(), from which g was called

    • Consequence of Lexical scoping
      • All objects must be stored in memory
      • All functions must carry a pointer to their respective defining environments, which could be anywhere
      • Makes it easy to build objective functions (e.g. for optimization)
        • These are functions that carry around all their data, so you’re sure the values used are always the same
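  • Example (a sketch): the code for the pow/n/cube example did not survive in these notes, so the version below is an assumption that merely fits those names. make_power() is a hypothetical name. The second half runs the f/g example from above to show that R uses lexical scoping.

```r
# A function that returns another function (a closure).
make_power <- function(n) {
  pow <- function(x) x^n    # n is a free variable inside pow
  pow
}
cube <- make_power(3)
cube(3)                        # 27

ls(environment(cube))          # "n" "pow"
get("n", environment(cube))    # 3

# Lexical scoping in action: g finds y where g was defined, not in f.
y <- 10
f <- function(x) {
  y <- 2
  y^2 + g(x)
}
g <- function(x) x * y
f(3)                           # 2^2 + 3 * 10 = 34
```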
Dates and times in R

  • Dates are represented by the Date class
    • Dates are stored internally as the number of days since 1970-01-01
    • Format is YYYY-MM-DD
    • Dates can be coerced from a character string using as.Date()
  • Times are represented by the POSIXct and POSIXlt classes
    • Times are stored internally as the number of seconds since 1970-01-01
    • They keep track of leap years, leap seconds, daylight savings and time zones
    • POSIXct is just a very large number underneath. Useful for storing times in a data frame
    • POSIXlt is a list underneath and stores other useful information, e.g. day of week, day of year, month, day of month.
      • A component can be extracted using $.
        • @POSIXltVariable$sec extracts the value of seconds in the variable.
    • Times can be coerced from a character string using as.POSIXlt() or as.POSIXct()
    • Sys.time() is used to print the system time.
  • There are generic functions that work on dates and times
    • weekdays(): gives the day of the week
    • months(): give the month name
    • quarters(): gives the quarter number("Q1", "Q2","Q3","Q4")
  • unclass() can be used to show the content of date and time classed variables in R
    • Date class
      • the output is the number of days since 1970-01-01
    • POSIXct
      • the output is the number of seconds since 1970-01-01
    • POSIXlt
      • The output is a list; use names(unclass(@POSIXltVariable)) to see the names of its components.
  • strptime() is used to convert a character string to a date/time when it’s written in a different format from YYYY-MM-DD
    • A format string is used to indicate the format the date is stored in.
    • The format codes are listed in ?strptime.
  • Operations on dates and times
    • You can perform the mathematical operations + and -
    • You can perform comparison operations
    • For time objects, both operands have to be of the same class
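  • Example (a sketch, not from the course): dates, times and strptime(). The dates are arbitrary, and the time zone is fixed to UTC so the results do not depend on the machine.

```r
d <- as.Date("1970-01-02")
unclass(d)                       # 1: one day after 1970-01-01

lt <- as.POSIXlt("2012-10-25 06:00:00", tz = "UTC")
lt$hour                          # 6
names(unclass(lt))               # the components stored in the list

ct <- as.POSIXct("1970-01-01 00:01:00", tz = "UTC")
unclass(ct)                      # 60 seconds since 1970-01-01

# Parse a string that is not in YYYY-MM-DD format.
parsed <- strptime("10/01/2012 10:40", "%d/%m/%Y %H:%M", tz = "UTC")
format(parsed, "%Y-%m-%d")       # "2012-01-10"

# Date arithmetic knows about leap years: 2012 had a 29 February.
gap <- as.Date("2012-03-01") - as.Date("2012-02-28")
as.numeric(gap)                  # 2

weekdays(d); months(d); quarters(d)   # the day and month names depend on the locale
```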
Loop Functions
  • They are usually in the *apply format
    • lapply(): loop over a list and evaluate a function on each element of the list.
      • Always returns a list
      • implemented using
        • lapply(@list, @function, …)
          • if the first argument is not a list, lapply() coerces it using as.list(); if that is not possible, an error occurs.
        • To pass an argument to @function, specify them in the …
          • lapply(x, runif, min=0, max=10) ## this passes arguments min and max to runif.
    • sapply(): same as lapply except it simplifies the result
      • Simplifies the result of lapply if possible.
        • If the result is a list where every element is length 1, then a vector is returned
        • If the result is a list where every element is a vector of the same length(>1), a matrix is returned
        • if it can’t figure things out, a list is returned.
        • implemented using
          • sapply(@list, @function, …)
    • apply(): apply a function over the margins of an array
      • Used to apply a function to the rows or columns of a matrix
      • Can be used with general arrays e.g. taking the average of an array of matrices
      • Implemented using
        •  apply(@array, @marginToBeRetained, @function, …)
          • @marginToBeRetained is an integer vector naming the dimension(s) of the array to keep, where the dimensions are dim(@array) = c(@Dimension1, @Dimension2)
            • When @marginToBeRetained = 1, it applies the function to each row (collapsing the second dimension) and, if the function returns a single value, returns a vector with @Dimension1 items
            • When @marginToBeRetained = 2, it applies the function to each column (collapsing the first dimension) and, if the function returns a single value, returns a vector with @Dimension2 items
            • If the array has more than 2 dimensions, @marginToBeRetained can name several dimensions, and the result is an array of those dimensions.
              • e.g. apply(x, c(1,2), mean) on a 3-dimensional array returns a 2-dimensional array (matrix).
      • Shortcuts already exists for calculating the sum and mean of arrays, and they operate much faster than apply()
        • rowSums, rowMeans, colSums, colMeans.
    • tapply(): apply a function over subsets of a vector
      • implemented using 
        • tapply(@Vector, @listOfFactors, @function, …, [simplify = TRUE])
        • length(@Vector) must be == the length of each factor in @listOfFactors
    • mapply(): multivariate version of lapply()
      • It applies a function in parallel over a set of arguments.
      • Can be used to apply a function to multiple lists.
      • implemented using 
        • mapply(@function, …, MoreArgs = @MoreArgsToFunction, [SIMPLIFY = TRUE])
          • … contains the arguments to apply over
          • @function must accept at least as many arguments as the number of lists passed to mapply()
      • Used to vectorize a function 
        • Allows a function that doesn’t usually take vectors to do so.
  • split(): used to split objects in groups determined by a factor or a list of factors
    • used in conjunction with lapply() and sapply()
    • implemented using
      • split(@vector, @ListOfFactors,[drop = FALSE], …)
      • split() divides @vector into one group per level of the factor (or per combination of levels for a list of factors) and returns a list
  • loop functions use anonymous functions
    • Anonymous functions are functions without names
      • implemented in format
        • @LoopFunction(@variable, @FunctionDefinition)
        • e.g. 
          • lapply(x, function(elt) elt[,1]) #This creates a function that accepts the argument elt and returns the first column of the argument
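  • Example (a sketch, not from the course): each loop function above on small made-up data. The names values and groups are illustrative.

```r
x <- list(a = 1:5, b = c(2, 4, 6))
lapply(x, mean)     # a list: a is 3, b is 4
sapply(x, mean)     # simplified to a named vector: a = 3, b = 4

m <- matrix(1:6, 2, 3)
apply(m, 1, sum)    # one value per row: 9 12
apply(m, 2, sum)    # one value per column: 3 7 11
rowSums(m)          # same as apply(m, 1, sum), but faster

values <- c(1, 2, 3, 10, 20, 30)
groups <- factor(c("lo", "lo", "lo", "hi", "hi", "hi"))
tapply(values, groups, mean)    # hi = 20, lo = 2

mapply(rep, 1:3, 3:1)           # a list: 1 1 1, then 2 2, then 3

# split() followed by sapply() with an anonymous function.
parts <- split(values, groups)
sapply(parts, function(v) max(v) - min(v))   # hi = 20, lo = 2
```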
Debugging
  • For figuring out what’s wrong.
  • Indications that something is not right
    • message: A generic notification/diagnostic message produced by the message function; execution of the function continues
    • warning: An indication that something is wrong but not necessarily fatal; execution of the function continues; generated by the warning function
    • error: An indication that a fatal problem has occurred; execution stops; produced by the stop function
    • condition: A generic concept for indicating that something unexpected can occur; programmers can create their own conditions.
  • Debugging tools
    • traceback(): Prints out the function call stack after an error occurs; does nothing if there’s no error. 
    • debug
      • Flags a function for debug mode which allows you to step through execution of a function one line at a time
    • browser
      • suspends execution of a function wherever it is called and puts the function in debug mode
    • trace: allows you to insert debugging code into a function at specific places
    • recover: allows you to modify the error behavior so that you can browse the function call stack; set it with options(error = recover)
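Most of these tools are interactive, so the sketch below only flags and unflags a hypothetical function `f` without calling it in debug mode. The interactive calls are left as comments:

```r
# Hypothetical function to debug.
f <- function(x) {
  y <- x + 1
  y * 2
}

debug(f)                  # flag f: the next call would step line by line
flagged <- isdebugged(f)  # check the flag without entering the browser
undebug(f)                # remove the flag again

# Interactive-only tools, shown for reference:
# traceback()               # call right after an error at the prompt
# options(error = recover)  # browse the call stack when an error occurs
# browser()                 # place inside a function body to pause there
```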
  • str()
    • A diagnostic function and an alternative to summary().
    • Answers the question what’s in this object? in a compact format.
    • str means structure.
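A brief sketch of str() on a few kinds of objects. The data frame `df` is made up for the example:

```r
# Hypothetical data frame.
df <- data.frame(id = 1:3, name = c("a", "b", "c"), stringsAsFactors = FALSE)

str(df)     # one line per column: name, type, first few values
str(1:5)    # works on plain vectors
str(mean)   # and on functions, where it shows the arguments
```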

Simulation

  • Generating random numbers
    • Functions for probability distributions in R
      • rnorm(): generate random normal variates with a given mean and standard deviation
      • dnorm(): evaluate the normal probability density (with a given mean/SD) at a point (or vector of points)
      • pnorm(): evaluate the cumulative distribution function for a normal distribution
      • rpois(): generate random Poisson variates with a given rate
    • Each probability distribution usually has four functions associated with it. The prefix shows the use
      • d for density
      • r for random number generation
      • p for cumulative distribution
      • q for quantile function
    • set.seed(@integer) is used to ensure reproducibility
      • It is best to set this so the randomization is reproducible.
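The four prefixes and the effect of set.seed() can be sketched for the normal distribution like this. The seed and parameter values are arbitrary choices:

```r
# Same seed, same draws: this is what makes a simulation reproducible.
set.seed(10)
x <- rnorm(5, mean = 20, sd = 2)
set.seed(10)
y <- rnorm(5, mean = 20, sd = 2)
same_draws <- identical(x, y)

d0 <- dnorm(0)        # density of the standard normal at 0, about 0.399
p196 <- pnorm(1.96)   # P(X <= 1.96), about 0.975
q975 <- qnorm(0.975)  # inverse of pnorm, about 1.96

counts <- rpois(5, lambda = 3)  # five Poisson counts with rate 3
```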
  • Generating random numbers from a linear model.
    • to generate binary (0/1) values, use rbinom() with size = 1
  • Generating random numbers from a generalized linear model
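A minimal sketch of both cases, assuming made-up coefficients (0.5, 2 and 0.3) and a log link for the Poisson model. None of these values come from the notes:

```r
set.seed(20)
n <- 100

# Linear model: y = 0.5 + 2 * x + e, with normal noise e.
x <- rnorm(n)
e <- rnorm(n, mean = 0, sd = 2)
y <- 0.5 + 2 * x + e

# Same model with a binary predictor drawn by rbinom().
xb <- rbinom(n, size = 1, prob = 0.5)
yb <- 0.5 + 2 * xb + e

# Generalized linear model (Poisson): the linear predictor gives log(mu),
# and the counts are drawn from a Poisson distribution with mean mu.
log_mu <- 0.5 + 0.3 * x
counts <- rpois(n, lambda = exp(log_mu))

summary(y)
summary(counts)
```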
  • Random sampling
    • sample() allows you to draw randomly from a specified set of objects (e.g. a vector)
      • implemented using sample(@vector, @NumberOfSamples, [replace = FALSE])
        • the replace argument allows you to sample with replacement, so the same element can be drawn more than once.
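A short sketch of sample() with and without replacement. The seed and sizes are arbitrary:

```r
set.seed(1)

perm <- sample(1:10)       # no size given: a random permutation
draw <- sample(1:10, 4)    # four distinct values, without replacement

# With replacement the sample can be larger than the input vector.
with_rep <- sample(letters[1:3], 10, replace = TRUE)
print(with_rep)
```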
R profiler
  • Useful for understanding why an operation is taking time to complete
  • A way to examine how much time is spent in different parts of a program
  • Useful when trying to optimize code.
  • system.time()
    • Takes an R expression as input and returns the amount of time taken to evaluate the expression
      • returns an object of class proc_time that includes 
        • user: the CPU time charged for evaluating this expression
        • elapsed: the wall-clock time the expression took to run
    • Usually user time and elapsed time are close for straight computing tasks
      • elapsed time may be greater than user time if the CPU spends a lot of time waiting around
      • elapsed time may be smaller than user time if your machine has multiple cores and is capable of using them
        • multi-threaded libraries
        • parallel processing via the parallel package
    • implemented using system.time(@Expression)
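As a sketch, a multi-line expression can be timed by wrapping it in curly braces. The loop below is an arbitrary workload chosen for illustration:

```r
# Time a block of code; the braces make it a single expression.
timing <- system.time({
  total <- 0
  for (i in 1:100000) total <- total + sqrt(i)
})

print(class(timing))    # "proc_time"
timing[["user.self"]]   # CPU time spent in this R process
timing[["elapsed"]]     # wall-clock time
```

The actual numbers depend on the machine, so none are shown here.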
  • Rprof()
    • Start the profiler in R
    • summaryRprof() summarizes the output from Rprof()
      • tabulates the output and calculates how much time is spent in which function
      • There are two methods for normalizing the data
        • by.total: divides the time spent in each function by the total run time
        • by.self: first subtracts the time spent in the other functions it calls, then normalizes as in by.total
          • shows the actual time spent per function
    • Do not use system.time() and Rprof() together 
    • Rprof() keeps track of the function call stack at regularly sampled intervals and tabulates how much time is spent in each function.
      • default sampling interval is 0.02 seconds
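A minimal profiling session might look like the sketch below. It assumes an R build with profiling support (the usual default), and the half-second busy loop is an arbitrary workload used only so that the profiler collects some samples:

```r
prof_file <- tempfile(fileext = ".out")

Rprof(prof_file, interval = 0.02)   # start profiling, sample every 0.02 s
start <- Sys.time()
while (as.numeric(difftime(Sys.time(), start, units = "secs")) < 0.5) {
  sorted <- sort(runif(10000))      # arbitrary work to profile
}
Rprof(NULL)                         # stop profiling

prof <- summaryRprof(prof_file)
print(names(prof))        # includes by.self and by.total
head(prof$by.self)        # time spent in each function itself
head(prof$by.total)       # time including the functions it calls
```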
