NOTE: WORK IN PROGRESS - THIS IS AN UNFINISHED ARTICLE
Types of objects: vector, matrix, table, data frame, function.
Variables
To assign a value to a variable, use an assignment operator (<- or =):
If the variable table holds a data frame whose columns are named column1, column2, etc., we can access each column with the dollar sign notation (this works for data frames and lists, not for plain matrices):
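As a small illustration (the variable names x and y are made up for this sketch), both operators bind a value to a name:

```r
# Hypothetical variable names, used only for illustration.
x <- 42        # the arrow is the conventional assignment operator
y = "hello"    # the equals sign also assigns at the top level
print(x)
print(y)
```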
table$column1
We can also use dollar sign notation to add a new column to the table:
table$column1_log <- log(table$column1)
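The dollar sign notation can be tried on a small data frame. This is only a sketch: the data is made up, and the name table simply mirrors the text (it masks base::table as a variable, although calls to table() still find the function).

```r
# Made-up data; column names follow the column1, column2 pattern from the text.
table <- data.frame(column1 = c(1, 10, 100), column2 = c("a", "b", "c"))

table$column1                              # read a column by name
table$column1_log <- log(table$column1)    # add a derived column
print(table)
```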
Single brackets return a sub-list, while double brackets extract the element itself:
> typeof(credit_samples[1])
[1] "list"
> typeof(credit_samples)
[1] "list"
> typeof(credit_samples[[1]])
[1] "environment"
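The same behaviour can be reproduced without the original data. The sketch below assumes a hypothetical list named samples whose first element is an environment, which only mimics the shape of credit_samples:

```r
# Hypothetical stand-in for credit_samples: a list holding an environment.
samples <- list(new.env(), 1:3)

typeof(samples[1])     # "list": single brackets keep the list wrapper
typeof(samples[[1]])   # "environment": double brackets extract the element
typeof(samples[[2]])   # "integer"
```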
File System
The tilde (~) denotes the user's home directory on Linux and expands to /home/username. This shortcut is convenient because it hides the absolute path (and the user name).
To expand the tilde and get the absolute path, use the base::path.expand function:
> base::path.expand('~/projectA/file1')
[1] "/home/some_user_name/projectA/file1"
String operations
To concatenate two or more strings, use base::paste, which by default inserts a space between the strings (the separator can be changed with the sep argument), or base::paste0, which inserts nothing between them:
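A minimal sketch of the difference, using two made-up strings:

```r
# Hypothetical inputs for illustration.
first <- "Hello"
second <- "World"

paste(first, second)              # "Hello World"
paste0(first, second)             # "HelloWorld"
paste(first, second, sep = "-")   # "Hello-World"
```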
cat
Data Exploration
print
colnames
To print the first 6 rows of a particular column in a data frame, select the column by its name and pass it to head:
To get a summary of each column in the data frame (table), use the base::summary function:
For numerical columns the summary contains the following values:
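For a plain data frame, this could look like the sketch below. The data frame df and its columns are made up for the example:

```r
# Made-up data frame with 10 rows.
df <- data.frame(id = 1:10, amount = seq(100, 1000, by = 100))

head(df$amount)      # first 6 values of the column, as a vector
head(df["amount"])   # first 6 rows, kept as a one-column data frame
```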
- minimum
- maximum
- mean
- median
- 1st quartile
- 3rd quartile
- number of Not Available values (NAs), shown only when the column contains any
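The summary can be sketched on a small made-up data frame; one NA is included on purpose so that the NA count shows up:

```r
# Made-up data; column names are hypothetical.
df <- data.frame(amount = c(100, 250, 400, NA, 900),
                 purpose = c("car", "tv", "car", "repairs", "tv"))

summary(df)                # one summary per column
s <- summary(df$amount)    # Min., 1st Qu., Median, Mean, 3rd Qu., Max., NA's
print(s)
```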
To get the first n elements or rows (6 by default) of a vector, matrix, table, data frame or function, use utils::head:
To specify the number of rows, set the n argument:
Use utils::tail to display the last n rows.
If n is negative, head returns everything except the last |n| rows, and tail returns everything except the first |n| rows.
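The behaviour of head and tail, including negative n, can be checked on a short vector (v is a made-up example):

```r
v <- 1:10

head(v)           # 1 2 3 4 5 6
head(v, n = 3)    # 1 2 3
tail(v, n = 3)    # 8 9 10
head(v, n = -3)   # 1 2 3 4 5 6 7   (all but the last 3)
tail(v, n = -3)   # 4 5 6 7 8 9 10  (all but the first 3)
```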
nrow
To find the elements that belong to one set but not to another, use setdiff:
> a <- 1:5
> a
[1] 1 2 3 4 5
> b <- 3:8
> b
[1] 3 4 5 6 7 8
> setdiff(a, b)
[1] 1 2
> setdiff(b, a)
[1] 6 7 8
Data Manipulation
c - combines its arguments to form a vector:
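A short sketch of c with made-up values, including what happens when the arguments have different types:

```r
numbers <- c(1, 2, 3)
more <- c(numbers, 4, 5)    # nested vectors are flattened: 1 2 3 4 5
mixed <- c(1, "a", TRUE)    # mixed types are coerced to character

print(more)
print(mixed)
```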
To transform elements of a data frame in bulk (entire rows or columns), use apply(X, MARGIN, FUN, ...). X is an array, which includes matrices (a data frame is coerced to a matrix first). MARGIN is a vector giving the dimensions over which the function FUN is applied: 1 denotes rows, 2 denotes columns, and c(1, 2) denotes both rows and columns, so FUN is applied to every element.
apply(data[, "Credit Amount"], 1, log)
C1
1 6.955593
2 7.937017
3 6.734592
4 7.660114
5 7.682943
6 7.714677
[1000 rows x 1 column]
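The effect of the MARGIN argument is easier to see on a tiny base R matrix. The matrix m below is made up and unrelated to the credit data above:

```r
# Made-up 2 x 3 matrix: rows are (1, 3, 5) and (2, 4, 6).
m <- matrix(1:6, nrow = 2)

row_sums <- apply(m, 1, sum)                         # 9 12
col_sums <- apply(m, 2, sum)                         # 3 7 11
scaled   <- apply(m, c(1, 2), function(x) x * 10)    # every element times 10

print(row_sums)
print(col_sums)
print(scaled)
```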
setdiff
To search the help system for a function, type a double question mark in front of its name (a single question mark opens the help page for that name directly). If the package is not specified, RStudio lists in the Help tab the matching entries from all installed packages:
> ??apply
It is also possible to specify the package name before the name of the function:
> ??base::apply
https://stat.ethz.ch/R-manual/R-devel/library/base/html/normalizePath.html
https://wiki.mobilizingcs.org/rstudio/examining_data
https://www.stat.berkeley.edu/~spector/R.pdf
https://stat.ethz.ch/R-manual/R-devel/library/base/html/c.html