Friday, 28 April 2017

How to build Chromium on Ubuntu


I followed the official instructions and, with minor modifications, managed to build and run Chromium on 64-bit Ubuntu 16.04. On a PC with an Intel i7 and 16 GB of RAM the entire process takes a couple of hours. I already had Git and Python installed, so I went straight into the process:

Clone depot_tools repository (this was in my ~/dev directory):

$ git clone https://chromium.googlesource.com/chromium/tools/depot_tools.git


The next step was to append the depot_tools path to PATH. I first tried:

$ export PATH="$PATH:~/dev/depot_tools"


This caused the install-build-deps.sh script to fail, as described here. I therefore applied the suggestion from that forum thread and used the path in the following form:

$ export PATH="$PATH:${HOME}/dev/depot_tools"
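The difference between the two forms can be checked directly in the shell. The sketch below assumes a POSIX-style shell, and the helper name last_entry is made up for illustration. Inside double quotes the tilde is not expanded, so the first form stores a literal ~ in PATH, and programs that do not expand it themselves cannot resolve that entry.

```shell
# Return the last colon-separated entry of a PATH-like string.
last_entry() {
    printf '%s\n' "${1##*:}"
}

tilde_form="$PATH:~/dev/depot_tools"        # tilde stays literal inside double quotes
home_form="$PATH:${HOME}/dev/depot_tools"   # ${HOME} is expanded inside double quotes

last_entry "$tilde_form"   # prints ~/dev/depot_tools
last_entry "$home_form"    # prints the absolute path under your home directory
```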


Create a directory for the Chromium source code:

$ mkdir chromium
$ cd chromium


Check out the code (without the full repo history) and its dependencies:

$ fetch --nohooks --no-history chromium


Fetch creates the src directory:

$ cd src


Install additional build dependencies:

$ ./build/install-build-deps.sh


Run hooks which fetch additional binaries:

$ gclient runhooks


Create a build directory:

$ gn gen out/Default


Set gn arguments to speed up the build. The following command opens the build's config file in an editor (vi in my case):

$ gn args out/Default


I added the following lines:

enable_nacl=false
symbol_level=1
remove_webcore_debug_symbols=true
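gn args edits the args.gn file inside the build directory, so the same settings can also be written without opening an editor. This is a minimal sketch, assuming it is run from the src directory and that these arguments are still valid for your checkout (the available arguments change between Chromium versions). The BUILD_DIR variable is only a convenience introduced here.

```shell
# Build directory; defaults to the one used in this post.
build_dir="${BUILD_DIR:-out/Default}"
mkdir -p "$build_dir"

# Write the build arguments that gn reads from args.gn.
cat > "$build_dir/args.gn" <<'EOF'
enable_nacl = false
symbol_level = 1
remove_webcore_debug_symbols = true
EOF

cat "$build_dir/args.gn"
```

Running gn gen out/Default afterwards regenerates the build files with these arguments.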


Build Chromium:

$ ninja -C out/Default chrome


Run built Chromium executable:

$ ./out/Default/chrome


Et voila! Chromium opens:



NOTE: I didn't install Google API keys and this is the reason for the notification which appears in the browser.

Wednesday, 26 April 2017

How to add a NuGet package to C# project in VSCode on Ubuntu

In VSCode, open the Terminal and run the dotnet add package command, specifying the name of the desired package (e.g. Newtonsoft.Json):


$ dotnet add package Newtonsoft.Json



This will add a reference to that NuGet package in project:



Output in VSCode:


This can be verified in the project file:
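On the command line, the same check can be done with grep. The sketch below runs against a throwaway sample file so that it is self-contained. The file contents and the version number are placeholders, not the actual result of the command above. In a real project you would point grep at your own .csproj file.

```shell
# Create a sample project file (hypothetical contents, placeholder version).
project_file="$(mktemp)"
cat > "$project_file" <<'EOF'
<Project Sdk="Microsoft.NET.Sdk">
  <ItemGroup>
    <PackageReference Include="Newtonsoft.Json" Version="0.0.0" />
  </ItemGroup>
</Project>
EOF

# List the NuGet packages referenced by the project file.
referenced_packages="$(grep -o 'PackageReference Include="[^"]*"' "$project_file" | cut -d'"' -f2)"
echo "$referenced_packages"   # prints Newtonsoft.Json

rm -f "$project_file"
```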



VSCode notifies us that there are unresolved dependencies:


If we click on Restore, the newtonsoft.json library gets downloaded to the ~/.nuget/packages directory.

After resolving them we can start using types from the newly added package. IntelliSense also works for the new dependency assembly:


Sunday, 23 April 2017

Strategy Pattern

Problem:

Context has to be able to apply different Algorithms (strategies, actions, behaviours) at runtime but is coupled to their implementations. It contains all possible concrete implementations of an Algorithm family and has to change if:
  • the implementation of some Algorithm has to change
  • a new Algorithm has to be added or an existing one has to be removed
This breaks the Single Responsibility Principle (Context has to change for more than one reason) and the Open-Closed Principle (Context has to be modified whenever the list of Algorithms is extended).

Solution:

Move the Algorithm implementations out of the Context and into their own classes, each implementing a new interface IStrategy with a method DoAlgorithm(). Introduce a lookup table (Dictionary) which holds all IStrategy implementations. When Context receives a key from the input, it looks the key up in the Dictionary and calls the matching IStrategy implementation.
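The lookup-and-delegate idea can be sketched in a few lines of shell. This is only an analogy under stated assumptions: each strategy is a function rather than a class implementing IStrategy, the shell's own function table stands in for the Dictionary, and all names (strategy_upper, strategy_lower, run_strategy) are made up for illustration.

```shell
# Each strategy lives in its own function, outside the context.
strategy_upper() { printf '%s\n' "$1" | tr '[:lower:]' '[:upper:]'; }
strategy_lower() { printf '%s\n' "$1" | tr '[:upper:]' '[:lower:]'; }

# Context: looks up the strategy by key and delegates to it.
# It does not change when a strategy is added, removed or reimplemented.
run_strategy() {
    key="$1"
    input="$2"
    handler="strategy_${key}"
    if command -v "$handler" >/dev/null 2>&1; then
        "$handler" "$input"
    else
        echo "unknown strategy: $key" >&2
        return 1
    fi
}

run_strategy upper "Hello"   # prints HELLO
run_strategy lower "Hello"   # prints hello
```

Adding a new strategy means defining one more strategy_<key> function; run_strategy itself stays untouched.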

References:

Strategy pattern
Applying Strategy Pattern Instead of Using Switch Statements
Strategy

Saturday, 22 April 2017

How to run .NET Core console application in VSCode on Ubuntu


In the previous article I demonstrated how to create a simple .NET Core "Hello, world!" console application, and here I want to show how we can load, run and debug that project in VSCode.

In VSCode, open TestProject directory. All generated files are shown in the left pane. VSCode downloads and installs required packages:




The required assets are standard VSCode JSON files, so after we answer Yes to the first question the .vscode directory appears:


Clicking on Restore triggers restoring packages:


If we hit F5, VSCode will execute the program:


We can set the breakpoints as well:


How to create .NET Core Console application on Ubuntu


To create a .NET project of the desired type we can use the .NET Core command line tool (dotnet). Let's see the list of all available project types:

$ dotnet new

Template Instantiation Commands for .NET Core CLI.

Usage: dotnet new [arguments] [options]

Arguments:
template The template to instantiate.

Options:
-l|--list List templates containing the specified name.
-lang|--language Specifies the language of the template to create
-n|--name The name for the output being created. If no name is specified, the name of the current directory is used.
-o|--output Location to place the generated output.
-h|--help Displays help for this command.
-all|--show-all Shows all templates


Templates                 Short Name      Language      Tags
----------------------------------------------------------------------
Console Application       console         [C#], F#      Common/Console
Class library             classlib        [C#], F#      Common/Library
Unit Test Project         mstest          [C#], F#      Test/MSTest
xUnit Test Project        xunit           [C#], F#      Test/xUnit
ASP.NET Core Empty        web             [C#]          Web/Empty
ASP.NET Core Web App      mvc             [C#], F#      Web/MVC
ASP.NET Core Web API      webapi          [C#]          Web/WebAPI
Solution File             sln                           Solution

Examples:
dotnet new mvc --auth None --framework netcoreapp1.1
dotnet new classlib
dotnet new --help

To create a Console application project we have to use the console template:
$ dotnet new console -o TestProject -n HelloWorld
Content generation time: 54.4945 ms
The template "Console Application" created successfully.

This creates a directory TestProject and, in it, a project named HelloWorld and an initial source code file:
$ ls
TestProject

$ cd TestProject/

/TestProject$ ls
HelloWorld.csproj Program.cs

HelloWorld.csproj:
/TestProject$ cat HelloWorld.csproj


Program.cs:
/TestProject$ cat Program.cs


Let's now update dependencies (NuGet packages) and tools specified in the project:
/TestProject$ dotnet restore
Restoring packages for /home/bojan/Downloads/test/TestProject/HelloWorld.csproj...
Generating MSBuild file /home/bojan/Downloads/test/TestProject/obj/HelloWorld.csproj.nuget.g.props.
Generating MSBuild file /home/bojan/Downloads/test/TestProject/obj/HelloWorld.csproj.nuget.g.targets.
Writing lock file to disk. Path: /home/bojan/Downloads/test/TestProject/obj/project.assets.json
Restore completed in 492.24 ms for /home/bojan/Downloads/test/TestProject/HelloWorld.csproj.

NuGet Config files used:
/home/bojan/.nuget/NuGet/NuGet.Config

Feeds used:
https://api.nuget.org/v3/index.json

This creates the obj directory and various config files:
/TestProject$ ls
HelloWorld.csproj obj Program.cs

/TestProject$ cd obj/

/TestProject/obj$ ls
HelloWorld.csproj.nuget.g.props HelloWorld.csproj.nuget.g.targets project.assets.json

HelloWorld.csproj.nuget.g.props:
/TestProject/obj$ cat HelloWorld.csproj.nuget.g.props


HelloWorld.csproj.nuget.g.targets:
/TestProject/obj$ cat HelloWorld.csproj.nuget.g.targets


project.assets.json:
/TestProject/obj$ cat project.assets.json


We can now build the project and run the binary output:
/TestProject$ dotnet run
Hello World!

This command built the project and placed binary output and other build artifacts in newly created bin directory:
/TestProject$ ls
bin HelloWorld.csproj obj Program.cs

/TestProject$ cd bin

/TestProject/bin$ ls
Debug

/TestProject/bin$ cd Debug/

/TestProject/bin/Debug$ ls
netcoreapp1.1

/TestProject/bin/Debug$ cd netcoreapp1.1/

/TestProject/bin/Debug/netcoreapp1.1$ ls
HelloWorld.deps.json HelloWorld.dll HelloWorld.pdb HelloWorld.runtimeconfig.dev.json HelloWorld.runtimeconfig.json

.deps.json (dependencies JSON file) lists dependencies of the application:
/TestProject/bin/Debug/netcoreapp1.1$ cat HelloWorld.deps.json


.runtimeconfig.dev.json:
/TestProject/bin/Debug/netcoreapp1.1$ cat HelloWorld.runtimeconfig.dev.json


.runtimeconfig.json file specifies the shared runtime and its version for the application:
/TestProject/bin/Debug/netcoreapp1.1$ cat HelloWorld.runtimeconfig.json


It might seem unexpected that the binary output is a .dll rather than an .exe. This is because .NET Core's default deployment model is framework-dependent deployment, where the output assembly contains only the compiled source and 3rd party dependencies but not the .NET Core dependencies: the assembly assumes that the .NET Core runtime is installed on the target machine. This is why we have to use the dotnet tool to run it:

/TestProject/bin/Debug/netcoreapp1.1$ dotnet HelloWorld.dll
Hello World!

The other type of deployment is self-contained deployment, in which case the output includes a platform-specific executable (an .exe on Windows, an extensionless executable on Linux) together with the .NET Core dependencies and runtime, so nothing else needs to be installed on the target system.


References:

.NET Core application deployment
.NET Core command-line interface (CLI) tools

How to install .NET Core on Ubuntu 16.04


.NET Core Installation


It is enough to follow the .NET Core installation guide. Select Linux and the Ubuntu, Mint distro, then follow the instructions for Ubuntu 16.04.

Add .NET Core repository to the local repository list:
$ sudo sh -c 'echo "deb [arch=amd64] https://apt-mo.trafficmanager.net/repos/dotnet-release/ xenial main" > /etc/apt/sources.list.d/dotnetdev.list'

Import key from the key server:
$ sudo apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv-keys 417A0893
Executing: /tmp/tmp.fX22g9wfIT/gpg.1.sh --keyserver
hkp://keyserver.ubuntu.com:80
--recv-keys
417A0893
gpg: requesting key 417A0893 from hkp server keyserver.ubuntu.com
gpg: key 417A0893: public key "MS Open Tech " imported
gpg: Total number processed: 1
gpg: imported: 1 (RSA: 1)

Update package info:
$ sudo apt-get update

Install .NET Core:
$ sudo apt-get install dotnet-dev-1.0.1
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following packages were automatically installed and are no longer required:
linux-headers-4.4.0-66 linux-headers-4.4.0-66-generic linux-image-4.4.0-66-generic
linux-image-extra-4.4.0-66-generic
Use 'sudo apt autoremove' to remove them.
The following additional packages will be installed:
dotnet-host dotnet-hostfxr-1.0.1 dotnet-hostfxr-1.1.0 dotnet-sharedframework-microsoft.netcore.app-1.0.4
dotnet-sharedframework-microsoft.netcore.app-1.1.1 liblldb-3.6 libllvm3.6v5 liblttng-ust-ctl2 liblttng-ust0 liburcu4
The following NEW packages will be installed
dotnet-dev-1.0.1 dotnet-host dotnet-hostfxr-1.0.1 dotnet-hostfxr-1.1.0
dotnet-sharedframework-microsoft.netcore.app-1.0.4 dotnet-sharedframework-microsoft.netcore.app-1.1.1 liblldb-3.6
libllvm3.6v5 liblttng-ust-ctl2 liblttng-ust0 liburcu4
0 to upgrade, 11 to newly install, 0 to remove and 7 not to upgrade.
Need to get 113 MB of archives.
After this operation, 341 MB of additional disk space will be used.
Do you want to continue? [Y/n] y
Get:1 http://gb.archive.ubuntu.com/ubuntu xenial/main amd64 libllvm3.6v5 amd64 1:3.6.2-3ubuntu2 [8,075 kB]
Get:2 https://apt-mo.trafficmanager.net/repos/dotnet-release xenial/main amd64 dotnet-host amd64 1.1.0-preview1-001100-00-1 [33.7 kB]
Get:3 https://apt-mo.trafficmanager.net/repos/dotnet-release xenial/main amd64 dotnet-hostfxr-1.0.1 amd64 1.0.1-1 [123 kB]
Get:4 https://apt-mo.trafficmanager.net/repos/dotnet-release xenial/main amd64 dotnet-sharedframework-microsoft.netcore.app-1.0.4 amd64 1.0.4-1 [22.6 MB]
Get:5 http://gb.archive.ubuntu.com/ubuntu xenial/main amd64 liblldb-3.6 amd64 1:3.6.2-3ubuntu2 [7,303 kB]
Get:6 http://gb.archive.ubuntu.com/ubuntu xenial/universe amd64 liburcu4 amd64 0.9.1-3 [47.3 kB]
Get:7 http://gb.archive.ubuntu.com/ubuntu xenial/universe amd64 liblttng-ust-ctl2 amd64 2.7.1-1 [72.2 kB]
Get:8 http://gb.archive.ubuntu.com/ubuntu xenial/universe amd64 liblttng-ust0 amd64 2.7.1-1 [127 kB]
Get:9 https://apt-mo.trafficmanager.net/repos/dotnet-release xenial/main amd64 dotnet-hostfxr-1.1.0 amd64 1.1.0-1 [124 kB]
Get:10 https://apt-mo.trafficmanager.net/repos/dotnet-release xenial/main amd64 dotnet-sharedframework-microsoft.netcore.app-1.1.1 amd64 1.1.1-1 [22.9 MB]
Get:11 https://apt-mo.trafficmanager.net/repos/dotnet-release xenial/main amd64 dotnet-dev-1.0.1 amd64 1.0.1-1 [51.4 MB]
Fetched 113 MB in 13min 21s (141 kB/s)
Selecting previously unselected package libllvm3.6v5:amd64.
(Reading database ... 307849 files and directories currently installed.)
Preparing to unpack .../libllvm3.6v5_1%3a3.6.2-3ubuntu2_amd64.deb ...
Unpacking libllvm3.6v5:amd64 (1:3.6.2-3ubuntu2) ...
Selecting previously unselected package liblldb-3.6.
Preparing to unpack .../liblldb-3.6_1%3a3.6.2-3ubuntu2_amd64.deb ...
Unpacking liblldb-3.6 (1:3.6.2-3ubuntu2) ...
Selecting previously unselected package liburcu4:amd64.
Preparing to unpack .../liburcu4_0.9.1-3_amd64.deb ...
Unpacking liburcu4:amd64 (0.9.1-3) ...
Selecting previously unselected package liblttng-ust-ctl2:amd64.
Preparing to unpack .../liblttng-ust-ctl2_2.7.1-1_amd64.deb ...
Unpacking liblttng-ust-ctl2:amd64 (2.7.1-1) ...
Selecting previously unselected package liblttng-ust0:amd64.
Preparing to unpack .../liblttng-ust0_2.7.1-1_amd64.deb ...
Unpacking liblttng-ust0:amd64 (2.7.1-1) ...
Selecting previously unselected package dotnet-host.
Preparing to unpack .../dotnet-host_1.1.0-preview1-001100-00-1_amd64.deb ...
Unpacking dotnet-host (1.1.0-preview1-001100-00-1) ...
Selecting previously unselected package dotnet-hostfxr-1.0.1.
Preparing to unpack .../dotnet-hostfxr-1.0.1_1.0.1-1_amd64.deb ...
Unpacking dotnet-hostfxr-1.0.1 (1.0.1-1) ...
Selecting previously unselected package dotnet-sharedframework-microsoft.netcore.app-1.0.4.
Preparing to unpack .../dotnet-sharedframework-microsoft.netcore.app-1.0.4_1.0.4-1_amd64.deb ...
Unpacking dotnet-sharedframework-microsoft.netcore.app-1.0.4 (1.0.4-1) ...
Selecting previously unselected package dotnet-hostfxr-1.1.0.
Preparing to unpack .../dotnet-hostfxr-1.1.0_1.1.0-1_amd64.deb ...
Unpacking dotnet-hostfxr-1.1.0 (1.1.0-1) ...
Selecting previously unselected package dotnet-sharedframework-microsoft.netcore.app-1.1.1.
Preparing to unpack .../dotnet-sharedframework-microsoft.netcore.app-1.1.1_1.1.1-1_amd64.deb ...
Unpacking dotnet-sharedframework-microsoft.netcore.app-1.1.1 (1.1.1-1) ...
Selecting previously unselected package dotnet-dev-1.0.1.
Preparing to unpack .../dotnet-dev-1.0.1_1.0.1-1_amd64.deb ...
Unpacking dotnet-dev-1.0.1 (1.0.1-1) ...
Processing triggers for libc-bin (2.23-0ubuntu7) ...
/sbin/ldconfig.real: /usr/local/cuda-8.0/targets/x86_64-linux/lib/libcudnn.so.5 is not a symbolic link
/sbin/ldconfig.real: /usr/lib/nvidia-375/libEGL.so.1 is not a symbolic link
/sbin/ldconfig.real: /usr/lib32/nvidia-375/libEGL.so.1 is not a symbolic link
Processing triggers for man-db (2.7.5-1) ...
Setting up libllvm3.6v5:amd64 (1:3.6.2-3ubuntu2) ...
Setting up liblldb-3.6 (1:3.6.2-3ubuntu2) ...
Setting up liburcu4:amd64 (0.9.1-3) ...
Setting up liblttng-ust-ctl2:amd64 (2.7.1-1) ...
Setting up liblttng-ust0:amd64 (2.7.1-1) ...
Setting up dotnet-host (1.1.0-preview1-001100-00-1) ...
Setting up dotnet-hostfxr-1.0.1 (1.0.1-1) ...
Setting up dotnet-sharedframework-microsoft.netcore.app-1.0.4 (1.0.4-1) ...
Setting up dotnet-hostfxr-1.1.0 (1.1.0-1) ...
Setting up dotnet-sharedframework-microsoft.netcore.app-1.1.1 (1.1.1-1) ...
Setting up dotnet-dev-1.0.1 (1.0.1-1) ...
Processing triggers for libc-bin (2.23-0ubuntu7) ...
/sbin/ldconfig.real: /usr/local/cuda-8.0/targets/x86_64-linux/lib/libcudnn.so.5 is not a symbolic link
/sbin/ldconfig.real: /usr/lib/nvidia-375/libEGL.so.1 is not a symbolic link
/sbin/ldconfig.real: /usr/lib32/nvidia-375/libEGL.so.1 is not a symbolic link

To verify the installation, query the .NET Core version:
$ dotnet --version
1.0.1

.NET Core Uninstallation


Verify which version has been installed:
$ sudo apt --installed list | grep "dotnet-dev"
dotnet-dev-1.0.1/xenial,now 1.0.1-1 amd64 [installed]

Uninstall it:
$ sudo apt-get remove dotnet-dev-1.0.1

Thursday, 13 April 2017

Introduction to H2O with R


H2O is a scalable, open-source Machine Learning framework with interfaces in Python, R, Java, Scala and C++. Through its Deep Water extension it can sit on top of other major ML frameworks (MXNet, Caffe, TensorFlow, etc.), adding a layer of abstraction that unifies and simplifies the API for client/consumer applications. H2O can run in standalone mode, on Hadoop, or within a Spark cluster.


Prerequisites


RStudio
Installed package: R interface for H2O

Installation in RStudio:
install.packages("h2o")


Launching


To load h2o package and its namespace:
library(h2o)

To start and connect to H2O instance running on localhost and listening on port 54321:
h2o.init()
Connection successful!

R is connected to the H2O cluster:
H2O cluster uptime: 3 days 6 hours
H2O cluster version: 3.10.3.6
H2O cluster version age: 1 month and 19 days
H2O cluster name: H2O_started_from_R_bojan_lsr768
H2O cluster total nodes: 1
H2O cluster total memory: 3.46 GB
H2O cluster total cores: 8
H2O cluster allowed cores: 2
H2O cluster healthy: TRUE
H2O Connection ip: localhost
H2O Connection port: 54321
H2O Connection proxy: NA
R Version: R version 3.2.3 (2015-12-10)

This command starts H2O on at most 2 CPUs. To use all CPUs on the system, set the nthreads argument to -1:
h2o.init(nthreads = -1)


Importing Data


To import data from a file into H2O cloud we can use h2o.importFile or h2o.uploadFile functions.

If the file resides on the server, we have to use h2o.importFile and specify the file's absolute path (on the server):
frame <- h2o.importFile(file_absolute_path)

This file can be e.g. CSV (Comma Separated Value) file.
The output type is an instance of H2OFrame class which represents a table (2D array).
If CSV file does not have specified column names, H2O will automatically assign names C1, C2... to such columns.

If we want to push file from a client onto the server, we have to use h2o.uploadFile and specify file's absolute path (on the client):
h2o.uploadFile(file_absolute_path)


Data Exploration


To get a string vector containing column names from the H2OFrame object:
h2o.colnames(frame)
[1] "Creditability" "Account Balance" "Duration of Credit (month)"
[4] "Payment Status of Previous Credit" "Purpose" "Credit Amount"
[7] "Value Savings/Stocks" "Length of current employment" "Instalment per cent"
[10] "Sex & Marital Status" "Guarantors" "Duration in Current address"
[13] "Most valuable available asset" "Age (years)" "Concurrent Credits"
[16] "Type of apartment" "No of Credits at this Bank" "Occupation"
[19] "No of dependents" "Telephone" "Foreign Worker"

To print first 6 rows from the H2OFrame object we can use h2o.head:
h2o.head(frame)

To get detailed report on each column (type, number of missing values etc...), use h2o.describe:
h2o.describe(frame)
Label Type Missing Zeros PosInf NegInf Min Max Mean Sigma Cardinality
1 Creditability enum 4 300 0 0 0 1 0.698795180722892 0.459011997978603 2
2 Account Balance enum 0 274 0 0 0 3 4
3 Duration of Credit (month) int 0 0 0 0 4 72 20.903 12.0588144527564
4 Payment Status of Previous Credit enum 0 40 0 0 0 4 5
5 Purpose enum 0 234 0 0 0 9 10
6 Credit Amount int 0 0 0 0 250 18424 3271.248 2822.75175989565
7 Value Savings/Stocks enum 0 603 0 0 0 4 5
...

h2o.summary prints information for each column. It treats factor (enum) columns differently from columns of other types. For factor columns it prints how many times each enum value occurs and how many values are missing ("NA"). For other columns it shows the minimum, maximum, median, mean, and 1st and 3rd quartiles:
h2o.summary(frame)
Creditability Account Balance Duration of Credit (month) Payment Status of Previous Credit Purpose Credit Amount
1 :696 4:394 Min. : 4.0 2:530 3:280 Min. : 250
0 :300 1:274 1st Qu.:12.0 4:293 0:234 1st Qu.: 1359
NA: 4 2:269 Median :18.0 3: 88 2:181 Median : 2304
3: 63 Mean :20.9 1: 49 1:103 Mean : 3271
3rd Qu.:24.0 0: 40 9: 97 3rd Qu.: 3958
Max. :72.0 6: 50 Max. :18424
...

h2o.str displays the structure of an H2OFrame object:
h2o.str(frame)
Class 'H2OFrame'
- attr(*, "op")= chr ":="
- attr(*, "eval")= logi TRUE
- attr(*, "id")= chr "RTMP_sid_a051_45"
- attr(*, "nrow")= int 1000
- attr(*, "ncol")= int 21
- attr(*, "types")=List of 21
..$ : chr "enum"
..$ : chr "enum"
..$ : chr "int"
..$ : chr "enum"
..$ : chr "enum"
..$ : chr "int"
..$ : chr "enum"
..$ : chr "enum"
..$ : chr "enum"
..$ : chr "enum"
..$ : chr "enum"
..$ : chr "enum"
..$ : chr "enum"
..$ : chr "int"
..$ : chr "enum"
..$ : chr "enum"
..$ : chr "enum"
..$ : chr "enum"
..$ : chr "enum"
..$ : chr "enum"
..$ : chr "int"
- attr(*, "data")='data.frame': 10 obs. of 21 variables:
..$ Creditability : Factor w/ 2 levels "0","1": 2 2 2 2 2 2 2 2 2 2
..$ Account Balance : Factor w/ 4 levels "1","2","3","4": 1 1 2 1 1 1 1 1 4 2
..$ Duration of Credit (month) : num 18 9 12 12 12 10 8 6 18 24
..$ Payment Status of Previous Credit: Factor w/ 5 levels "0","1","2","3",..: 5 5 3 5 5 5 5 5 5 3
..$ Purpose : Factor w/ 10 levels "0","1","2","3",..: 3 1 9 1 1 1 1 1 4 4
..$ Credit Amount : num 1049 2799 841 2122 2171 ...
..$ Value Savings/Stocks : Factor w/ 5 levels "1","2","3","4",..: 1 1 2 1 1 1 1 1 1 3
..$ Length of current employment : Factor w/ 5 levels "1","2","3","4",..: 2 3 4 3 3 2 4 2 1 1
..$ Instalment per cent : Factor w/ 4 levels "1","2","3","4": 4 2 2 3 4 1 1 2 4 1
..$ Sex & Marital Status : Factor w/ 4 levels "1","2","3","4": 2 3 2 3 3 3 3 3 2 2
..$ Guarantors : Factor w/ 3 levels "1","2","3": 1 1 1 1 1 1 1 1 1 1
..$ Duration in Current address : Factor w/ 4 levels "1","2","3","4": 4 2 4 2 4 3 4 4 4 4
..$ Most valuable available asset : Factor w/ 4 levels "1","2","3","4": 2 1 1 1 2 1 1 1 3 4
..$ Age (years) : num 21 36 23 39 38 48 39 40 65 23
..$ Concurrent Credits : Factor w/ 3 levels "1","2","3": 3 3 3 3 1 3 3 3 3 3
..$ Type of apartment : Factor w/ 3 levels "1","2","3": 1 1 1 1 2 1 2 2 2 1
..$ No of Credits at this Bank : Factor w/ 4 levels "1","2","3","4": 1 2 1 2 2 2 2 1 2 1
..$ Occupation : Factor w/ 4 levels "1","2","3","4": 3 3 2 2 2 2 2 2 1 1
..$ No of dependents : Factor w/ 2 levels "1","2": 1 2 1 2 1 2 1 2 1 1
..$ Telephone : Factor w/ 2 levels "1","2": 1 1 1 1 1 1 1 1 1 1
..$ Foreign Worker : num 1 1 1 2 2 2 2 2 1 1

To draw a histogram of values of some column:
h2o.hist(data[, "Height"])

We can also use dollar notation to specify desired column:
h2o.hist(data$Height)

This divides the range of values in the column "Height" into 10 equal sub-ranges and draws a vertical bar for each of them showing how often values fall into it. Instead of specifying the column name, we can specify the column number:
h2o.hist(data[, 3])


Data Manipulation


A factor column is one whose values belong to a finite set of predefined values (like an enum type in some programming languages). To convert some column i of an H2OFrame data set (of type e.g. int) into enum we can use h2o.asfactor:
data[, i] <- h2o.asfactor(data[, i])


To split a single data set into multiple smaller ones, use h2o.splitFrame(frame, ratios, destination_frames, seed). frame is the source data set (an H2OFrame object). ratios is a scalar or a vector of fractions; if scalar, it denotes the fraction of rows in the first part; if vector, the sum of its elements must be less than 1, and the remaining rows go into one additional, last part. seed is the seed for the random number generator.

frame.split = h2o.splitFrame(frame.hex, 0.7)

frame.split = h2o.splitFrame(frame.hex, ratios = c(0.2, 0.5))

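The result is a list of H2OFrame objects (stored here in a variable named credit_samples). Single brackets return a sub-list, while double brackets return the element itself: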
> typeof(credit_samples)
[1] "list"
> typeof(credit_samples[1])
[1] "list"
> typeof(credit_samples[[1]])
[1] "environment"

To extract an H2OFrame object we can use the double square bracket notation:
frame.training_set <- frame.split[[1]]
frame.test_set <- frame.split[[2]]


To create a new frame which contains rows grouped by values in some specific column we can use h2o.group_by (similar to SQL's GROUP BY):
h2o.group_by(frame, by="Creditability", nrow("Creditability"))
Creditability nrow_Creditability
1 4
2 0 300
3 1 696

h2o.group_by's arguments are: name of the original frame, column whose values are used for grouping and the aggregate function which is using values from the chosen column to map multiple rows into aggregate values - one per each group.

To calculate natural logarithm of values in specific column in the H2OFrame object we can use h2o.log. The output is a new column-vector with the same number of elements as the source vector:
h2o.log(data[, "Velocity"])
log(Velocity)
1 6.955593
2 7.937017
3 6.734592
4 7.660114
5 7.682943
6 7.714677

[1000 rows x 1 column]


Machine Learning Algorithms


Generalized Linear Model


For Generalized Linear Model use h2o.glm. Arguments are:
y - dependent variable. This is a string, name of the column in the frame.
x - list of predictors (independent/random variables). This is a vector of strings where each string is a name of the (independent variable) column in the table.
training_frame - training data set; H2OFrame object which represents table containing columns mentioned above.
family - response's distribution family which is a type of exponential family. Supported values are: "gaussian", "poisson", "binomial", "multinomial", "gamma", "tweedie".

model <- h2o.glm(y = "VOL", x = c("AGE", "RACE", "PSA", "GLEASON"), training_frame = frame, model_id = "glm_model1", family = "binomial")
|==============================================================================================================================| 100%


Return value is a GLM model, an object of type H2OBinomialModel:

summary(model)
Model Details:
==============

H2OBinomialModel: glm
Model Key: glm_model1
GLM Model: summary
family link regularization number_of_predictors_total number_of_active_predictors
1 binomial logit Elastic Net (alpha = 0.5, lambda = 0.02103 ) 71 24
number_of_iterations training_frame
1 5 RTMP_sid_be14_25

H2OBinomialMetrics: glm
** Reported on training data. **

MSE: 0.1604356
RMSE: 0.4005442
LogLoss: 0.4897464
Mean Per-Class Error: 0.275817
AUC: 0.8095359
Gini: 0.6190719
R^2: 0.2323731
Null Deviance: 736.5862
Residual Deviance: 592.5932
AIC: 642.5932

Confusion Matrix (vertical: actual; across: predicted) for F1-optimal threshold:
0 1 Error Rate
0 104 76 0.422222 =76/180
1 55 370 0.129412 =55/425
Totals 159 446 0.216529 =131/605

Maximum Metrics: Maximum metrics at their respective thresholds
metric threshold value idx
1 max f1 0.601181 0.849598 266
2 max f2 0.412327 0.926251 373
3 max f0point5 0.671558 0.848516 221
4 max accuracy 0.605270 0.783471 264
5 max precision 0.947226 1.000000 0
6 max recall 0.258939 1.000000 394
7 max specificity 0.947226 1.000000 0
8 max absolute_mcc 0.671558 0.470868 221
9 max min_per_class_accuracy 0.683051 0.744444 213
10 max mean_per_class_accuracy 0.671558 0.750196 221

Gains/Lift Table: Extract with `h2o.gainsLift(<model>, <data>)` or `h2o.gainsLift(<model>, valid=<T/F>, xval=<T/F>)`



Scoring History:
timestamp duration iteration negative_log_likelihood objective
1 2017-04-13 08:03:21 0.000 sec 0 368.29309 0.60575
2 2017-04-13 08:03:21 0.007 sec 1 308.02649 0.54301
3 2017-04-13 08:03:21 0.010 sec 2 305.03591 0.54149
4 2017-04-13 08:03:21 0.013 sec 3 304.82054 0.54148
5 2017-04-13 08:03:21 0.023 sec 4 296.50608 0.53809
6 2017-04-13 08:03:21 0.027 sec 5 296.29658 0.53809

Variable Importances: (Extract with `h2o.varimp`)
=================================================

Standardized Coefficient Magnitudes: standardized coefficient magnitudes
names coefficients sign
1 Account Balance.4 0.669816 POS
2 Account Balance.1 0.419182 NEG
3 Duration of Credit (month) 0.294272 NEG
4 Purpose.3 0.273682 POS
5 Payment Status of Previous Credit.4 0.269195 POS

---
names coefficients sign
66 Guarantors.2 0.000000 POS
67 Guarantors.3 0.000000 POS
68 Concurrent Credits.2 0.000000 POS
69 No of dependents.1 0.000000 POS
70 No of dependents.2 0.000000 POS
71 credit_amount_trnsf 0.000000 POS


Neural networks


h2o.deeplearning


Random Forest


rfHex <- h2o.randomForest(x=features, y="logSales", ntrees = 500, max_depth = 30, nbins_cats = 1115, training_frame=trainHex, validation_frame=validHex)




Model Analysis


Once the model is trained, we can calculate its performance on a new (unseen) data set by using h2o.performance. This new data set has to have the same column names and types as the data set used for training. Arguments are:
model - one of H2O objects representing trained model (e.g. H2OBinomialModel)
newdata - H2OFrame object representing table with unseen data
train, valid, xval - logical (boolean) values indicating whether function shall return training, validation and the cross-validation metrics (all constructed during training)

The return value is an object of one of the H2O metrics types. E.g. if the model is of type H2OBinomialModel then the metrics object is of type H2OBinomialMetrics.

performance <- h2o.performance(model, newdata = test_frame)
> performance
H2OBinomialMetrics: glm

MSE: 0.1747882
RMSE: 0.4180768
LogLoss: 0.5196604
Mean Per-Class Error: 0.3554121
AUC: 0.7768143
Gini: 0.5536285
R^2: 0.1782966
Null Deviance: 482.3467
Residual Deviance: 406.3744
AIC: 456.3744

Confusion Matrix (vertical: actual; across: predicted) for F1-optimal threshold:
0 1 Error Rate
0 44 76 0.633333 =76/120
1 21 250 0.077491 =21/271
Totals 65 326 0.248082 =97/391

Maximum Metrics: Maximum metrics at their respective thresholds
metric threshold value idx
1 max f1 0.547567 0.837521 325
2 max f2 0.283012 0.918644 390
3 max f0point5 0.613870 0.811103 278
4 max accuracy 0.558243 0.751918 319
5 max precision 0.968788 1.000000 0
6 max recall 0.283012 1.000000 390
7 max specificity 0.968788 1.000000 0
8 max absolute_mcc 0.613870 0.387921 278
9 max min_per_class_accuracy 0.692120 0.683333 223
10 max mean_per_class_accuracy 0.772244 0.705028 159

Gains/Lift Table: Extract with `h2o.gainsLift(<model>, <data>)` or `h2o.gainsLift(<model>, valid=<T/F>, xval=<T/F>)`

To calculate the accuracy of the model (the only supported model at the moment is H2OBinomialModel), we can use h2o.accuracy. Arguments are:
object - H2OModelMetrics object (H2OBinomialMetrics is currently the only one supported)
thresholds - a value or a list of values between 0.0 and 1.0

h2o.accuracy(performance, 0.95)
[[1]]
[1] 0.314578

To use the trained model on a test set in order to make predictions, we can use h2o.predict.
pred_creditability <- h2o.predict(glm_model1,credit_test)
|==============================================================================================================================| 100%
> pred_creditability
predict p0 p1
1 1 0.3593716 0.6406284
2 1 0.2807624 0.7192376
3 1 0.2209632 0.7790368
4 1 0.1332073 0.8667927
5 1 0.3779753 0.6220247
6 1 0.3902468 0.6097532

[392 rows x 3 columns]


References


https://www.rdocumentation.org/packages/h2o/versions/3.10.3.6/topics/h2o.init
https://www.rdocumentation.org/packages/h2o/versions/3.10.3.6/topics/h2o.importFile
https://www.rdocumentation.org/packages/h2o/versions/3.10.3.6/topics/h2o.str
https://www.rdocumentation.org/packages/h2o/versions/3.10.3.6/topics/h2o.group_by
https://www.rdocumentation.org/packages/h2o/versions/3.10.3.6/topics/h2o.log
https://www.rdocumentation.org/packages/h2o/versions/3.10.3.6/topics/h2o.colnames
https://rdrr.io/cran/h2o/man/h2o.splitFrame.html
http://h2o-release.s3.amazonaws.com/h2o/master/3574/docs-website/h2o-docs/data-munging/splitting-datasets.html
http://docs.h2o.ai/h2o/latest-stable/h2o-docs/booklets/GLMBooklet.pdf
https://h2o-release.s3.amazonaws.com/h2o/rel-slater/9/docs-website/h2o-py/docs/frame.html

Introduction to R


NOTE: WORK IN PROGRESS - THIS IS AN UNFINISHED ARTICLE

Types of objects: vector, matrix, table, data frame, function.

Variables


To assign a value to some variable we have to use assignment operator (<- or =):




If the variable table contains rows and columns from some table (matrix) and the columns have names like column1, column2 etc., we can access the columns as variables by using the dollar sign notation:

table$column1

We can also use dollar sign notation to add a new column to the table:

table$column1_log <- log(table$column1)


> typeof(credit_samples[1])
[1] "list"
> typeof(credit_samples)
[1] "list"
> typeof(credit_samples[[1]])
[1] "environment"


File System


The tilde (~) stands for the home (user) directory in Linux and expands to /home/username. This shortcut is convenient because it hides the absolute path (and the user name).

To expand the tilde and get the absolute path we can use the base::path.expand function:

> base::path.expand('~/projectA/file1')
[1] "/home/some_user_name/projectA/file1"


String operations


To concatenate two or more strings we can use base::paste, which inserts a space between the strings by default, or base::paste0, which inserts nothing between them.
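
A short sketch of the difference between the two functions. The strings and variable names are illustrative only:

```r
first <- "Hello"
second <- "world"

with_space <- paste(first, second)               # default separator is a space
without_space <- paste0(first, second)           # no separator at all
custom_sep <- paste(first, second, sep = ", ")   # separator chosen explicitly

print(with_space)
print(without_space)
print(custom_sep)
```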


cat


Data Exploration


print

colnames

To print the first 6 rows of a particular column in a data frame, select the column by its name and pass it to head.
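
One way this could look, assuming a hypothetical data frame df with an amount column:

```r
# Hypothetical data frame with 10 rows.
df <- data.frame(id = 1:10, amount = seq(100, 1000, by = 100))

# Select the column by name, then keep only its first 6 values.
first_amounts <- head(df$amount)
print(first_amounts)
```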


To get a summary of each column in a data frame (table), use the base::summary function.
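
A small sketch of summary on a hypothetical data frame. The column names and values are invented, and one value is deliberately missing so that the NA count shows up:

```r
# Hypothetical data: one numeric column with a missing value, one text column.
df <- data.frame(
  amount = c(100, 250, NA, 400, 550),
  purpose = c("car", "tv", "car", "repairs", "tv")
)

# Summary of the whole data frame, one block per column.
print(summary(df))

# Summary of a single numeric column is a named vector of statistics.
amount_summary <- summary(df$amount)
print(names(amount_summary))
```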


For numerical types the summary contains the following values:
  • minimum
  • maximum
  • mean
  • median
  • 1st quartile
  • 3rd quartile
  • number of Not Available values (NAs)

To get the first n rows (6 by default) of a vector, matrix, table, data frame or function, use utils::head.


To specify the number of rows, set the n argument.


Use utils::tail to display the last n rows.

If n is a negative number, head returns all rows except the last |n| rows and tail returns all rows except the first |n| rows.
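
The behaviour of head and tail described above can be sketched on a plain integer vector (the vector v is just an illustration):

```r
v <- 1:10

first_default <- head(v)          # first 6 elements
first_three <- head(v, n = 3)     # first 3 elements
last_three <- tail(v, n = 3)      # last 3 elements
all_but_last3 <- head(v, n = -3)  # everything except the last 3
all_but_first3 <- tail(v, n = -3) # everything except the first 3

print(first_default)
print(all_but_last3)
print(all_but_first3)
```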

nrow

To find the elements which belong to one set but not to another we can use setdiff:
> a <- 1:5
> a
[1] 1 2 3 4 5
> b <- 3:8
> b
[1] 3 4 5 6 7 8
> setdiff(a, b)
[1] 1 2
> setdiff(b, a)
[1] 6 7 8

Data Manipulation


c - combines its arguments to form a vector.
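
A brief sketch of c with made-up values, including what happens when the arguments have different types:

```r
numbers <- c(1, 2, 3)        # numeric vector of length 3
more <- c(numbers, 4, 5)     # existing vectors are flattened into the result
mixed <- c(1, "two", TRUE)   # mixed types are coerced to a common type (character)

print(more)
print(mixed)
```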


To transform specific elements of a data frame, or elements in bulk (an entire row or column), use apply(X, MARGIN, FUN, ...). X is an array or matrix (a data frame is coerced to a matrix), and MARGIN is a vector of indices that determines whether the function FUN is applied over rows, columns or both. Set MARGIN to 1 to apply FUN to each row, 2 to apply it to each column, and c(1, 2) to apply it to each individual element.

apply(data[, "Credit Amount"], 1, log)
C1
1 6.955593
2 7.937017
3 6.734592
4 7.660114
5 7.682943
6 7.714677

[1000 rows x 1 column]
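
For comparison, here is a sketch of the three MARGIN values using base R's apply on a small matrix. The matrix m is invented for illustration and is unrelated to the credit data above:

```r
# 2 rows, 3 columns, filled column by column: (1,2), (3,4), (5,6).
m <- matrix(1:6, nrow = 2)

row_sums <- apply(m, 1, sum)     # one result per row
col_sums <- apply(m, 2, sum)     # one result per column
logs <- apply(m, c(1, 2), log)   # element-wise, result has the shape of m

print(row_sums)
print(col_sums)
print(logs)
```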

setdiff

To search the help for a function, type a double question mark in front of its name. If the package is not specified, RStudio lists in the Help tab all matching help pages from all installed packages:
> ??apply

It is also possible to restrict the search to one package by putting the package name before the name of the function:
> ??base::apply

https://stat.ethz.ch/R-manual/R-devel/library/base/html/normalizePath.html
https://wiki.mobilizingcs.org/rstudio/examining_data
https://www.stat.berkeley.edu/~spector/R.pdf
https://stat.ethz.ch/R-manual/R-devel/library/base/html/c.html

Sunday, 9 April 2017

Building and debugging C++ code in VSCode on Ubuntu

We have a "Hello, world!" example in the main.cpp file and want to build it with the g++ compiler and debug it with the gdb debugger in VSCode on Ubuntu.

Packages


C/C++ for Visual Studio Code (ms-vscode.cpptools)
C++ Intellisense (austin.code-gnu-global) (optional)

Building


We have to create and set up a task.

Open Command Palette (CTRL+SHIFT+P).
Type in task and select Tasks: Configure Task Runner and then Others.
tasks.json gets created in workspace's .vscode directory and shows up in the editor.

VSCode defines variables that can be used in tasks.json.

We can verify what each of them expands to by changing the args value for the echo command in the default version of the config file.

tasks.json:
{
    // See https://go.microsoft.com/fwlink/?LinkId=733558
    // for the documentation about the tasks.json format
    "version": "0.1.0",
    "command": "echo",
    "isShellCommand": true,
    "args": ["${file}"],
    "showOutput": "always"
}

If we press CTRL+SHIFT+B while tasks.json is the active file, the output will be something like:
/home/user/dev/cpp/my_project/.vscode/tasks.json

Similarly, ${workspaceRoot} expands to the path to the workspace's root directory so we just have to append the name of the .cpp file to get its full path:

tasks.json:
...
"args": ["${workspaceRoot}/main.cpp"],
...

Running the task now gives the following output:
/home/user/dev/cpp/my_project/main.cpp

If we change command to g++...

tasks.json:
{
    // See https://go.microsoft.com/fwlink/?LinkId=733558
    // for the documentation about the tasks.json format
    "version": "0.1.0",
    "command": "g++",
    "isShellCommand": true,
    "args": ["${workspaceRoot}/main.cpp"],
    "showOutput": "always"
}

...after hitting CTRL+SHIFT+B, the g++ compiler compiles main.cpp and a.out appears in the my_project directory.

Running


If we hit F5, VSCode launches the program using the configuration in launch.json.

launch.json:
{
    "version": "0.2.0",
    "configurations": [
        {
            "name": "C++ Launch",
            "type": "cppdbg",
            "request": "launch",
            "program": "${workspaceRoot}/a.out",
            "args": [],
            "stopAtEntry": false,
            "cwd": "${workspaceRoot}",
            "environment": [],
            "externalConsole": true,
            "linux": {
                "MIMode": "gdb",
                "setupCommands": [
                    {
                        "description": "Enable pretty-printing for gdb",
                        "text": "-enable-pretty-printing",
                        "ignoreFailures": true
                    }
                ]
            },
            "osx": {
                "MIMode": "lldb"
            },
            "windows": {
                "MIMode": "gdb",
                "setupCommands": [
                    {
                        "description": "Enable pretty-printing for gdb",
                        "text": "-enable-pretty-printing",
                        "ignoreFailures": true
                    }
                ]
            }
        },
        {
            "name": "C++ Attach",
            "type": "cppdbg",
            "request": "attach",
            "program": "enter program name, for example ${workspaceRoot}/a.out",
            "processId": "${command:pickProcess}",
            "linux": {
                "MIMode": "gdb",
                "setupCommands": [
                    {
                        "description": "Enable pretty-printing for gdb",
                        "text": "-enable-pretty-printing",
                        "ignoreFailures": true
                    }
                ]
            },
            "osx": {
                "MIMode": "lldb"
            },
            "windows": {
                "MIMode": "gdb",
                "setupCommands": [
                    {
                        "description": "Enable pretty-printing for gdb",
                        "text": "-enable-pretty-printing",
                        "ignoreFailures": true
                    }
                ]
            }
        }
    ]
}

Debugging


To be able to use breakpoints we have to generate debug information when building the source code. It is enough to add -g to the g++ arguments:

tasks.json:
{
    // See https://go.microsoft.com/fwlink/?LinkId=733558
    // for the documentation about the tasks.json format
    "version": "0.1.0",
    "command": "g++",
    "isShellCommand": true,
    "args": ["-g", "${workspaceRoot}/main.cpp"],
    "showOutput": "always"
}