Thursday, 13 April 2017

Introduction to R


NOTE: WORK IN PROGRESS - THIS IS AN UNFINISHED ARTICLE

Types of objects: vector, matrix, table, data frame, function.
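A minimal sketch that creates one object of each type listed above. The object names are illustrative only. The is.* predicates are a sturdier check than comparing class() output directly, because since R 4.0 a matrix has class c("matrix", "array").

```r
v  <- c(1.5, 2, 3)                               # numeric vector
m  <- matrix(1:6, nrow = 2)                      # matrix with 2 rows and 3 columns
tb <- table(c("a", "b", "a"))                    # table of counts per value
df <- data.frame(x = 1:3, y = c("a", "b", "c"))  # data frame: columns may differ in type
f  <- function(x) x^2                            # function

# class() reports what kind of object each one is (first class only)
sapply(list(v = v, m = m, tb = tb, df = df, f = f), function(obj) class(obj)[1])
```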

Variables


To assign a value to a variable, use the assignment operator (<- or =):
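A minimal sketch of both operators. The last line shows that inside a function call = only matches an argument, so x keeps its earlier value.

```r
x <- 5        # the conventional assignment operator
y = 10        # = also assigns at the top level
z <- x + y
print(z)      # [1] 15

# Inside a function call, = matches the argument x of mean();
# it does not change the variable x defined above
m <- mean(x = c(1, 2, 3))
print(x)      # [1] 5
```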




If the variable table holds a data frame whose columns are named column1, column2, etc., we can access each column almost like a variable by using dollar sign notation ($ works on data frames and lists, but not on matrices):

table$column1

We can also use dollar sign notation to add a new column to the table:

table$column1_log <- log(table$column1)
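A minimal runnable sketch of dollar sign access and of adding a column, using a hypothetical data frame. It is named tbl here so it is not confused with the base::table() function. Because log() is vectorised, it transforms the whole column at once.

```r
# Hypothetical data frame with a numeric and a character column
tbl <- data.frame(column1 = c(1, 10, 100), column2 = c("a", "b", "c"))

tbl$column1                           # access a column: 1 10 100
tbl$column1_log <- log(tbl$column1)   # add a new column; log() works element-wise
names(tbl)                            # "column1" "column2" "column1_log"
```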


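typeof returns the internal type of an object. Applied to a list, single brackets return a sub-list, while double brackets extract the element itself:

> typeof(credit_samples[1])
[1] "list"
> typeof(credit_samples)
[1] "list"
> typeof(credit_samples[[1]])
[1] "environment"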
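A minimal sketch that reproduces the same pattern with a hypothetical list of environments standing in for credit_samples (whose actual contents are not shown here):

```r
# Hypothetical list of environments, standing in for credit_samples
samples <- list(first = new.env(), second = new.env())

typeof(samples[1])     # "list": [ ] returns a sub-list of length 1
typeof(samples[[1]])   # "environment": [[ ]] returns the element itself
typeof(samples$first)  # "environment": $ behaves like [[ ]] with a name
```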


File System


The tilde (~) denotes the user's home directory in Linux and typically expands to /home/username. This shortcut is convenient because it hides the absolute path (and the user name).

To expand the tilde and get the absolute path, use base::path.expand:

> base::path.expand('~/projectA/file1')
[1] "/home/some_user_name/projectA/file1"
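A minimal sketch showing that path.expand only rewrites the string: the file does not need to exist, and a path without a leading tilde is returned unchanged. The paths used are illustrative.

```r
home <- path.expand("~")                # the home directory, e.g. "/home/some_user_name"
p <- path.expand("~/projectA/file1")    # a leading ~ is replaced by the home directory
print(p)

# Paths without a leading ~ pass through unchanged
path.expand("/tmp/data.csv")            # "/tmp/data.csv"
```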


String operations


To concatenate two or more strings, use base::paste, which by default inserts a space between the strings (controlled by its sep argument), or base::paste0, which inserts nothing between them:
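A minimal sketch of paste and paste0, including the sep and collapse arguments:

```r
first  <- "Hello"
second <- "World"

paste(first, second)                      # "Hello World" (default sep = " ")
paste0(first, second)                     # "HelloWorld"
paste(first, second, sep = ", ")          # "Hello, World"
paste(c("a", "b", "c"), collapse = "+")   # "a+b+c": collapse joins a vector into one string
```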


cat


Data Exploration


print

colnames

To print the first 6 values of a particular column of a data frame, refer to the column by name:
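A minimal sketch with a hypothetical 10-row data frame, showing three ways to select a column by name before taking its first rows:

```r
# Hypothetical data frame with 10 rows
df <- data.frame(id = 1:10, amount = seq(100, 1000, by = 100))

head(df$amount)          # [1] 100 200 300 400 500 600
head(df[, "amount"])     # same values, column selected by name with [ , ]
head(df["amount"])       # single brackets without a comma keep a one-column data frame
```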


To get a summary of each column of a data frame (table), use base::summary:


For numeric columns the summary contains the following values:
  • minimum
  • 1st quartile
  • median
  • mean
  • 3rd quartile
  • maximum
  • number of missing values (NA's), shown only when the column contains any
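A minimal sketch with a hypothetical data frame containing one numeric column (with a missing value) and one factor column:

```r
# Hypothetical data frame with one numeric and one factor column
df <- data.frame(
  amount  = c(100, 200, 300, NA),
  purpose = factor(c("car", "tv", "car", "tv"))
)

summary(df)            # one block per column; a factor shows counts per level
s <- summary(df$amount)
s                      # Min., 1st Qu., Median, Mean, 3rd Qu., Max., NA's
s[["Median"]]          # 200 (NA is ignored)
```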

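To get the first n rows (6 by default) of a vector, matrix, table, data frame or function, use utils::head (for a vector it returns the first n elements, for a function the first n lines of its code):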


To specify the number of rows, set the n argument:
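A minimal sketch of head with its default and with an explicit n, on a vector and on a matrix:

```r
x <- 1:20
head(x)            # first 6 elements: 1 2 3 4 5 6
head(x, n = 3)     # first 3 elements: 1 2 3

m <- matrix(1:20, nrow = 10)   # 10 rows, 2 columns
head(m, n = 2)                 # first 2 rows of the matrix
```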


Use utils::tail to display the last n rows.

If n is negative, head returns all rows except the last n, and tail returns all rows except the first n.
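A minimal sketch of tail and of negative n on a short vector:

```r
x <- 1:10
tail(x)            # last 6 elements: 5 6 7 8 9 10
tail(x, n = 3)     # 8 9 10
head(x, n = -3)    # all but the last 3: 1 2 3 4 5 6 7
tail(x, n = -3)    # all but the first 3: 4 5 6 7 8 9 10
```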

nrow

To find the elements that belong to one set but not to another, use setdiff:
> a <- 1:5
> a
[1] 1 2 3 4 5
> b <- 3:8
> b
[1] 3 4 5 6 7 8
> setdiff(a, b)
[1] 1 2
> setdiff(b, a)
[1] 6 7 8

Data Manipulation


c combines its arguments to form a vector:
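A minimal sketch of c: vectors passed to it are flattened into one vector, mixed types are coerced to a common type, and elements can be named.

```r
nums  <- c(1, 2, 3)           # numeric vector 1 2 3
more  <- c(nums, 4, 5)        # vectors are flattened: 1 2 3 4 5
mixed <- c(1, "a", TRUE)      # coerced to character: "1" "a" "TRUE"
named <- c(a = 1, b = 2)      # elements can be given names
named["b"]                    # b 2
```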


To transform the elements of a matrix or data frame in bulk (whole rows, whole columns or every element), use apply(X, MARGIN, FUN, ...). X is an array, typically a matrix (a data frame is first converted with as.matrix), and MARGIN tells apply which dimensions FUN is applied over: 1 means rows, 2 means columns, and c(1, 2) means every individual element (each row and column combination). Any additional arguments in ... are passed on to FUN.

apply(data[, "Credit Amount"], 1, log)
C1
1 6.955593
2 7.937017
3 6.734592
4 7.660114
5 7.682943
6 7.714677

[1000 rows x 1 column]
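For a plain base R data.frame, data[, "Credit Amount"] selects a single column and drops it to a vector, and apply rejects a vector because it has no dim attribute. A minimal base R sketch of the three MARGIN settings, using a small hypothetical data frame in place of the credit data (the column names amount and duration are assumptions):

```r
# Hypothetical stand-in for the credit data
credit <- data.frame(amount = c(1049, 2799, 841), duration = c(18, 9, 12))

col_means <- apply(credit, 2, mean)       # MARGIN = 2: FUN applied to each column
row_sums  <- apply(credit, 1, sum)        # MARGIN = 1: FUN applied to each row
logged    <- apply(credit, c(1, 2), log)  # MARGIN = c(1, 2): FUN applied to every element

# log() is vectorised, so a single column is usually transformed directly
credit$amount_log <- log(credit$amount)
```

Since the data frame is converted with as.matrix, apply works best when all columns are numeric; a character column would turn the whole matrix into character.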

setdiff

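A single question mark (?apply) opens a function's help page directly. To search the help system instead, type a double question mark in front of the name (a shortcut for utils::help.search). If no package is specified, RStudio lists in the Help tab the matching help pages from all installed packages: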
> ??apply

It is also possible to specify the package name before the name of the function:
> ??base::apply

