Thursday, 13 April 2017

Introduction to R


Types of objects: vector, matrix, table, data frame, function.
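
As a rough illustration of these object types, the sketch below creates one object of each kind and checks it with the matching is.* function. All variable names are made up for the example.

```r
# One small object per type; every name here is illustrative.
v   <- c(10, 20, 30)                             # numeric vector
m   <- matrix(1:6, nrow = 2)                     # 2 x 3 integer matrix
tab <- table(c("a", "b", "a"))                   # table of counts per value
d   <- data.frame(x = 1:3, y = c("a", "b", "c")) # data frame with two columns
f   <- function(x) x + 1                         # function

is.vector(v)      # TRUE
is.matrix(m)      # TRUE
is.table(tab)     # TRUE
is.data.frame(d)  # TRUE
is.function(f)    # TRUE
```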


To assign a value to a variable, use an assignment operator (<- or =):
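
A minimal sketch of both assignment forms, with made-up variable names:

```r
# Both operators bind a value to a name at the top level.
x <- 5       # arrow form, the more common convention in R code
y = 10       # equals form, equivalent here
z <- x + y
z            # 15
```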

If the variable table holds a data frame whose columns are named column1, column2, etc., we can access each column as a variable by using dollar sign notation (this works for data frames and lists, not for plain matrices):
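
A small sketch of dollar sign access. The data frame tbl and its values are invented for the example:

```r
# A made-up data frame; tbl, column1 and column2 are example names.
tbl <- data.frame(column1 = c(1, 10, 100), column2 = c("a", "b", "c"))

tbl$column1        # the whole column as a vector: 1 10 100
mean(tbl$column1)  # 37
```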


We can also use dollar sign notation to add a new column to the table:

table$column1_log <- log(table$column1)
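
The same idea on a tiny made-up data frame (tbl and its values are assumptions for illustration), to show that the new column really appears next to the original one:

```r
# Values chosen so that their natural logarithms are 0, 1 and 2.
tbl <- data.frame(column1 = c(1, exp(1), exp(2)))

# log is vectorised, so it can be applied to the whole column at once.
tbl$column1_log <- log(tbl$column1)

names(tbl)       # "column1" "column1_log"
tbl$column1_log  # 0 1 2
```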

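Single brackets applied to a list return a sub-list, while double brackets return the element itself:

> typeof(credit_samples[1])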
[1] "list"
> typeof(credit_samples)
[1] "list"
> typeof(credit_samples[[1]])
[1] "environment"
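
The output above can be reproduced with a small made-up list. The name samples and its contents are assumptions, not the original credit_samples object:

```r
# A list whose first element is an environment and whose second is a vector.
samples <- list(new.env(), 1:3)

typeof(samples)       # "list"
typeof(samples[1])    # "list": single brackets keep the list wrapper
typeof(samples[[1]])  # "environment": double brackets extract the element
typeof(samples[[2]])  # "integer"
```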

File System

The tilde (~) stands for the home (user) directory on Linux and expands to /home/username. This shortcut is convenient because it avoids writing out the absolute path (and the user name).

To expand the tilde and get the absolute path, use the base::path.expand function:

> base::path.expand('~/projectA/file1')
[1] "/home/some_user_name/projectA/file1"

String operations

To concatenate two or more strings, use base::paste, which by default inserts a space between the strings (change this with the sep argument), or base::paste0, which inserts nothing between them:
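
A short sketch of the difference, using made-up strings:

```r
a <- paste("Hello", "world")         # default separator is a space
b <- paste0("Hello", "world")        # no separator
d <- paste("a", "b", "c", sep = "-") # custom separator

a  # "Hello world"
b  # "Helloworld"
d  # "a-b-c"
```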


Data Exploration



To print the first 6 values of a particular column in a data frame, pass the column (selected by its name) to head:
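
A minimal sketch, assuming a made-up data frame df with an amount column:

```r
# Made-up data frame; df and its column names are illustrative only.
df <- data.frame(id = 1:10, amount = seq(100, 1000, by = 100))

first_six <- head(df$amount)
first_six             # 100 200 300 400 500 600
head(df[["amount"]])  # same result, selecting the column with double brackets
```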

To get a summary of each column in a data frame (table), use the base::summary function:

For numerical columns the summary contains the following values:
  • minimum
  • 1st quartile
  • median
  • mean
  • 3rd quartile
  • maximum
  • number of Not Available values (NAs), shown only if the column contains any
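
The values listed above can be seen on a small made-up data frame. The names df, amount and purpose are assumptions for the example:

```r
# One numeric column with a missing value and one character column.
df <- data.frame(
  amount  = c(100, 200, NA, 400, 500),
  purpose = c("car", "tv", "car", "tv", "car")
)

summary(df)              # prints one block of statistics per column

s <- summary(df$amount)  # summary of a single numeric column
s[["Min."]]              # 100
s[["Median"]]            # 300
s[["Mean"]]              # 300 (the NA is ignored)
s[["Max."]]              # 500
s[["NA's"]]              # 1
```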

To get the first n rows or elements (6 by default) of a vector, matrix, table, data frame or function, use utils::head:

To specify the number of rows, set the n argument:
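
A short sketch of head with and without n, on made-up data:

```r
v  <- 1:20
df <- data.frame(id = 1:10, sq = (1:10)^2)

head(v)                   # 1 2 3 4 5 6 (default n = 6)
head(v, n = 3)            # 1 2 3
top2 <- head(df, n = 2)   # data frame with the first two rows
nrow(top2)                # 2
```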

Use utils::tail to display the last n rows.

If n is negative, head returns all rows except the last |n| rows, and tail returns all rows except the first |n| rows.
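
A minimal sketch of tail and of negative n, on a made-up vector:

```r
v <- 1:10

last3      <- tail(v, n = 3)   # 8 9 10
drop_last3 <- head(v, n = -3)  # 1 2 3 4 5 6 7
drop_first <- tail(v, n = -3)  # 4 5 6 7 8 9 10
```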


To find the elements that belong to one set but not to another, use setdiff:
> a <- 1:5
> a
[1] 1 2 3 4 5
> b <- 3:8
> b
[1] 3 4 5 6 7 8
> setdiff(a, b)
[1] 1 2
> setdiff(b, a)
[1] 6 7 8

Data Manipulation

The c function combines its arguments to form a vector:
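
A short sketch with made-up values. Note that mixing types makes c coerce everything to a common type:

```r
nums  <- c(1, 2, 3)
more  <- c(nums, 4, 5)     # vectors are flattened: 1 2 3 4 5
mixed <- c(1, "a", TRUE)   # coerced to character: "1" "a" "TRUE"

length(more)   # 5
class(mixed)   # "character"
```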

To transform the elements of a data frame in bulk (by entire rows or columns), use apply(X, MARGIN, FUN, ...). X is an array or matrix (a data frame is converted to a matrix first), and MARGIN determines the dimension over which the function FUN is applied. Set MARGIN to 1 to apply FUN to each row, to 2 to apply it to each column, or to c(1, 2) to apply it to every individual element.

apply(data[, "Credit Amount"], 1, log)
1 6.955593
2 7.937017
3 6.734592
4 7.660114
5 7.682943
6 7.714677

[1000 rows x 1 column]
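
To make the three MARGIN settings concrete, here is a minimal sketch on a small made-up matrix m in base R (not the credit data used above):

```r
# matrix() fills column by column, so m is:
#    1   100   10000
#   10  1000  100000
m <- matrix(c(1, 10, 100, 1000, 10000, 100000), nrow = 2)

row_sums <- apply(m, 1, sum)          # one value per row: 10101 101010
col_max  <- apply(m, 2, max)          # one value per column: 10 1000 100000
logs     <- apply(m, c(1, 2), log10)  # per element, same shape as m
logs
#      [,1] [,2] [,3]
# [1,]    0    2    4
# [2,]    1    3    5
```

For a plain elementwise transformation like this one, calling the vectorised function directly (log10(m)) gives the same result and is usually simpler.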


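To get help on a function, type a single question mark in front of its name (?apply) to open its help page, or a double question mark to search the help pages of all installed packages. If no package is specified, RStudio lists all matching entries from all packages in the Help tab: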
> ??apply

It is also possible to specify the package name before the name of the function:
> ??base::apply
