Thursday 21 February 2019

Concurrency in Go - Notes on Coursera course

Week 1 - Why Use Concurrency?

  • a big selling point of Go is that concurrency is built into the language
  • concurrency and parallelism are two closely related ideas
  • in C, Python, etc. you can do concurrent programming, but you need to use some 3rd-party library

Parallel Execution

  • parallel is not the same as concurrent
  • why do we need concurrency?
  • parallel execution is when two programs execute at the same time
  • at some point in time, instructions from two programs are executing in parallel
    • at time t, an instruction is being performed for both P1 and P2
  • one processor core executes one instruction at a time
  • if we want parallel execution, we need two processors, or at least two processor cores
  • we need replicated hardware: e.g. CPU1 and CPU2
  • with a quad-core CPU we can run 4 instructions in parallel, at the same time

Why use Parallel Execution?


  • tasks may be completed more quickly
  • you get better throughput overall
  • example: two piles of dishes to wash
    • two dishwashers can complete the job twice as fast as one
  • BUT: some tasks must be performed sequentially
    • example: wash dish then dry dish; it has to be in this order
      • must wash before you can dry
  • some tasks are more parallelizable than others
    • we can't parallelize washing and drying of the same dish - even if we have a hundred dishwashers and sinks available
  • some things can't be parallelized

Von Neumann Bottleneck

  • writing concurrent programs is hard
  • undergraduate curricula usually teach sequential programming
  • concurrent programming is difficult
  • can we achieve speedup without parallelism?
  • Solution 1: design faster processors
    • get speedup without changing software
    • this is how it used to work until recently
    • hardware kept getting faster and faster, but that has essentially stopped now
    • faster = higher processor clock rate; it used to double every year or so
  • the limitation on speed that we have now is called the Von Neumann Bottleneck: delayed access to memory
    • the CPU has to access memory to get instructions and data
    • memory is always slower than the CPU
    • even if the CPU has a high clock rate, it has to wait on memory accesses; lots of time is wasted just waiting for memory
  • Solution 2: design processors with more on-chip memory
    • build caches: fast memory on the chip
    • that's what has traditionally been done until now
    • again, the same software, the same code, would run faster
  • but this is not the case anymore

Moore's Law

  • it is not a physical law but rather an observation of a trend that held in the past
  • predicted that transistor density would double every two years
  • smaller transistors switch faster
  • the exponential increase in density used to lead to an exponential increase in software speed
  • this is not the case anymore, so software engineers had to do something

Power Wall

Power/Temperature Problem

  • the speedup we get from Moore's Law can't continue, because transistors consume power and power has become a critical issue - the Power Wall
  • transistors consume power whenever they switch
  • as you increase the number of transistors, the power used also increases
    • smaller transistors use less power, but transistor density scaling is much faster
  • nowadays many devices are portable and run on batteries; the available power is limited
  • even if a device is plugged into a socket, the issue is temperature
    • we need fans and heat sinks above processors, e.g. an i7; without them the chip would melt
    • high power leads to high temperature
    • air cooling (fans) can only remove so much heat

Dynamic Power

  • generic equation for dynamic power: P = alpha * C * F * V^2
    • alpha - percent of time spent switching; transistors consume dynamic power when they switch (go from 0 to 1 or from 1 to 0); when they don't switch, they don't use dynamic power
    • C - capacitance (related to size; goes down as the transistor shrinks)
    • F - clock frequency (we want to increase frequency)
    • V - voltage swing (from low to high) (0 is 0V and 1 is e.g. 5V; we want to reduce the voltage in order to reduce power, e.g. for 1 to be 0.3V)
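  • a quick sanity check of the V^2 term in Go (a toy sketch; all numbers are made up for illustration, not real chip parameters): halving the voltage quarters the dynamic power

package main

import "fmt"

// power evaluates the dynamic-power equation P = alpha * C * F * V^2
func power(alpha, c, f, v float64) float64 {
   return alpha * c * f * v * v
}

func main() {
   p1 := power(0.1, 1e-9, 3e9, 1.0) // hypothetical chip at 1.0 V
   p2 := power(0.1, 1e-9, 3e9, 0.5) // the same chip at half the voltage
   fmt.Println(p2 / p1)             // prints 0.25: a 4x power reduction
}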

Dennard Scaling

  • it is paired together with Moore's Law
  • Voltage should scale with transistor size
    • smaller transistor => voltage should be smaller
  • we want to scale down voltage to keep power consumption low
  • we want to have Dennard Scaling
  • but it can't go on forever:
  • Problem: Voltage can't go too low
    • must stay above threshold voltage otherwise transistor won't switch
    • noise problems occur; if there is a noise in a system, voltage can go + or - a volt or two and we can get wrong readings/errors
  • Problem: doesn't consider leakage power
    • leakage happens when you have thin insulators
    • as you scale everything down, insulators get thinner and leakage increases
  • Dennard scaling must stop

Multi-Core Systems

  • generic equation for Power: P = alpha * C * F * V^2
    • we can't increase frequency anymore
    • frequency still goes up, but at a very slow rate
  • so instead we add more cores to chips, without increasing frequency => multi-core systems
    • transistor density is still increasing
  • if you have a 4-core system but run a program on only one core, you are wasting the 3 other cores, which sit idle
  • parallel execution is needed to exploit multi-core systems
  • code made to execute on multiple cores
    • we want to divide our program into smaller chunks which are executed in parallel, on different cores
  • different programs on different cores
  • parallel compilers:
    • they take sequential code and parallelize it
    • they chop it into tasks that can be run in parallel
    • this is an extremely complex problem
    • it does not work that well in practice
  • concurrent programming: it is the programmer who divides the code into tasks that will be run in parallel

Concurrent vs Parallel

Concurrent Execution

  • concurrent execution is not necessarily the same as parallel execution
  • concurrent: start and end times overlap
    • the tasks don't literally execute at the same time
    • if we take any point in time, we see that either task1 or task2 is executing
    • concurrent tasks compete for chunks of time (CPU cycles); during its chunk, each task executes its own instructions, not those of the competing task
    • task1: --------       -------                           (time -->)
    • task2:           ----- 
  • parallel: execute at exactly the same time
    • if we take any point in time, we see that both task1 and task2 are executing
    • task1: ----------------                                 (time -->)
    • task2:         ----- 
  • completion time for both tasks is longer in the concurrent case
  • so why do concurrent execution at all? why not just run task2 after task1, if the total execution time is the same?

Concurrent vs Parallel

  • parallel tasks need to be executed on different hardware (different cores)
  • concurrent tasks may be executed on the same hardware (e.g. on one core)
    • only one task is actually executed at a time
  • Mapping from tasks to hardware (which task is executed on which core) is not directly controlled by the programmer
    • at least not in Go

Concurrent Programming

  • deciding which tasks are executed on which core is not up to the programmer
  • the programmer determines which tasks can be executed in parallel; e.g. task1 has to do something before task2 can run, etc.
  • the programmer describes what can be (not what will be) executed in parallel
    • the programmer defines the possible concurrency
  • what will actually be executed in parallel depends on how tasks are mapped to hardware
  • mapping task to hardware is done by:
    • operating system
    • Go runtime scheduler

Hiding Latency

  • say we do concurrent programming on one core - why bother with concurrency at all?
  • even though we can't do parallel execution, we can still get significant performance improvements, because tasks typically have to wait periodically for some slow event (e.g. input/output)
  • when the CPU has to communicate with memory, the network, the file system, the video card... all of this is slow
  • example: reading from memory
    • X = Y + Z
    • the CPU has to read Y and Z from memory, do the operation, and write the result back to memory
    • a memory read/write costs the CPU hundreds of cycles of waiting; in the meantime the CPU could use those cycles to execute other useful tasks
  • other concurrent tasks can operate while one task is waiting

Hardware Mapping 

  • Parallel Execution:
    • task1 --> Core1
    • task2 --> Core2
  • Concurrent Execution:
    • task1 ---/--> Core1
    • task2 --/

Hardware Mapping in Go

  • hardware mapping is not under the direct control of the programmer in Go or any other standard language
  • the programmer only makes parallelism possible
  • doing the mapping manually would be a very hard task and would slow down programming
  • it is hard because it depends on many factors
    • e.g. the underlying hardware architecture - programmers should not have to care about its details
  • simple arbitrary multi-core system:
core            core
  |               |
cache --------- cache
         |
   shared memory
         |
cache --------- cache
  |               |
core            core

  • cache - a local cache for each core
  • one big consideration when figuring out the hardware mapping is: where is the data?
    • if core 1 has to perform some task, we'd like the data to be in core 1's cache
    • if the data is in core 2's cache, then core 2 should perform that task
      • a task should be performed on the core whose cache holds the data
  • in Go we only define which tasks can be executed in parallel


Week 2 - CONCURRENCY BASICS

Processes

  • a lot of concurrent execution ideas came from operating systems

Processes

  • an instance of a running program
  • things unique to a process
    • Memory
      • virtual address space 
      • code 
      • stack - region of memory which usually handles function calls
      • heap - for memory allocations
      • shared libraries - shared between processes
    • Registers - each stores a single value; tiny, super-fast memory
      • Program counter - points at the instruction being executed (or the next one)
      • data registers
      • stack pointer
      • ...

Operating System

  • Allows many processes to execute concurrently
    • makes sure the processes' virtual address spaces do not overlap
    • makes sure that all processes get a fair share of processor time and resources
    • these processes run concurrently; the OS switches between them quickly, e.g. every 20 ms
    • from the user's perspective it looks like all processes run in parallel, although they don't - the OS maintains the illusion of parallelism

Task Manager

  • shows all running processes
    • foreground
    • background

Scheduling

  • one of the main tasks of OS
  • OS schedules processes for execution
  • Gives the illusion of parallel execution
...
process1
process2
process3
process1
process2
...

  • OS gives fair access to CPU, memory, etc...
  • there are many different scheduling algorithms
    • one is called Round Robin - processes simply alternate in a round fashion (1, 2, 3, 1, 2, 3, 1, 2, 3...) so every process gets the same chunk of time
    • if processes don't have the same priority, processes with high priority get more CPU time - they are scheduled more frequently
  • embedded systems: some tasks are critical, e.g. braking - that would be a high-priority task, while playing stereo music would be a low-priority task

Context Switch

  • control flow changes from one process to another
  • e.g. switching from processA to processB
  • before each switch the OS has to save the state of the currently running process, and restore it the next time that process's execution resumes
  • this state is called the context - all the stuff unique to the process that we listed before
  • process "context" must be swapped
...
processA
context switch
processB
context switch
processA
...
  • during context switch periods kernel of the OS is running
  • context switch usually happens after a timer times out

Threads vs Processes

  • there used to be only processes
  • downside of processes: process switching time is long (switching between processes is slow because of the memory accesses involved)
  • to speed this up: threads
  • a thread is like a process, but it has less context; it shares some of its context with the other threads in the process
  • Parts of Process context shared among process threads:
    • Virtual memory
    • File descriptors
  • Specific (unique) parts of the process context for each thread:
    • Stack
    • Data registers
    • Program counter (PC)
  • Switching between threads is faster because there is less context - less data that has to be read/written from/to memory

Goroutines

  • a goroutine is basically like a thread in Go
  • many goroutines execute within a single OS thread
  • Go takes a process with main thread and schedules / switches goroutines within that thread

Go Runtime Scheduler

  • schedules goroutines inside an OS thread (main thread)
  • like a little OS inside a single OS thread
  • Go runs on a main thread and switches goroutines on one thread
  • from OS point of view - there is only a single thread
           Main thread
                |
        Logical processor
                |
           Go runtime
                |
     -----------------------
     |          |          |
Goroutine1  Goroutine2  Goroutine3


  • Go runtime scheduler uses Logical Processor which is mapped to a thread
  • typically there is one Logical Processor which is mapped to a main thread
  • since all these goroutines are running on one thread, we don't have parallelism but concurrency
  • we can increase the number of Logical Processors - they are mapped to different threads, and the OS can map those threads to different cores
  • the program can set how many Logical Processors there are via runtime.GOMAXPROCS (historically the default was 1, giving concurrent execution of goroutines; since Go 1.5 it defaults to the number of cores), so we might get parallel goroutine execution if the OS schedules the threads on different cores
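  • a minimal sketch of querying and setting this from code (runtime.NumCPU() and runtime.GOMAXPROCS() are the standard-library calls):

package main

import (
   "fmt"
   "runtime"
)

func main() {
   fmt.Println("cores:", runtime.NumCPU()) // cores visible to the OS
   prev := runtime.GOMAXPROCS(4)           // allow up to 4 logical processors
   fmt.Println("was:", prev)               // GOMAXPROCS returns the previous setting
}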

Interleavings

  • writing concurrent code is hard, as it's mentally difficult to keep track of what is happening on which thread
  • the overall state of the machine is not deterministic
  • in case of a crash, it can happen at different places
  • the order of execution within a task is known
  • the order of execution between concurrent tasks is unknown
  • let's look at the instructions of two tasks:
Task1 

1: a = b + c
2: d = e + f
3: g = h + i

Task2

1: r = s + t
2: u = v + w
3: x = y + z

Possible Interleavings



1: a = b + c
             1: r = s + t
2: d = e + f
             2: u = v + w
3: g = h + i
             3: x = y + z

OR

1: a = b + c
2: d = e + f
3: g = h + i
             1: r = s + t
             2: u = v + w
             3: x = y + z

  • many interleavings are possible
  • must consider all possibilities
  • ordering is non-deterministic

Race Conditions

  • a problem that can occur because of these interleavings
  • the outcome of the program depends on the interleaving; interleavings are non-deterministic => the outcome is non-deterministic
  • the outcome of the program depends on a non-deterministic ordering
  • the interleaving can change every time we run the program
  • we want determinism: for the same set of inputs, we want the same set of outputs
  • we want the outcome of the program to not depend on interleavings
1st running - 1st interleaving combination

x = 1
         print x
x = x + 1

Output: 1

2nd running - 2nd interleaving combination

x = 1
x = x + 1
         print x

Output: 2
  • this needs to be avoided/prevented
  • races occur due to communication: the two tasks are communicating through the shared variable x
  • if we didn't have this communication, there would be no race condition
  • communication between tasks is often unavoidable, though (a sketch of this race follows)
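  • a minimal sketch of such a race in Go: the printed value depends on the interleaving (the Sleep only keeps the program alive long enough for the goroutine to run); running it with the race detector (go run -race) flags the unsynchronized access to x

package main

import (
   "fmt"
   "time"
)

func main() {
   x := 1
   go func() {
      x = x + 1 // concurrent write to the shared variable x
   }()
   fmt.Println(x) // concurrent read: may print 1 or 2
   time.Sleep(10 * time.Millisecond)
}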

Communication Between Tasks

  • Threads are largely independent but not completely independent
  • Web server, one thread per client
Web                       Client 1
page  --> Web server -->  Client 2
Data                      Client 3
  • Clients are coming at the same time
  • Example: webpage shows visits counter
  • Image processing example: 1 thread per pixel block
thread1 --> [][] --> thread2
  • image processing is parallelizable, but there must be some level of communication between threads, e.g. in the case of blurring, where a thread needs pixel values from a neighboring block

Week 3 - THREADS IN GO

Goroutines

  • to create threads of execution we have to use constructs built into Go

Creating a goroutine

  • one goroutine is created automatically to execute the main()
  • other goroutines are created using the go keyword
  • code snippet from some main() function with only one goroutine (the one for main())
    • a is assigned 2 only after foo() returns
    • foo() blocks main()
a = 1
foo()
a = 2 
  • code snippet from some main() function with two goroutines
    • with go foo() we create a new goroutine
    • a might (but also might not) be assigned 2 before foo() returns - this depends on how the scheduler schedules the concurrent goroutines
    • foo() is non-blocking for main() - main() continues execution without waiting for foo() to return
a = 1
go foo()
a = 2 

Exiting a goroutine

  • a goroutine exits when the code of its associated function completes
  • when the main goroutine is complete, all other goroutines exit, even if they are not finished
  • a goroutine may not complete its execution because main completes early

Exiting goroutines

  • goroutines are forced to exit when main goroutine exits

Early Exit

func main() {
   go fmt.Printf("New routine")
   fmt.Printf("Main routine")
}

  • we don't know the order of execution of these routines - it is non-deterministic
  • we'd expect to see both messages, but actually only the 2nd message is printed (almost always)
  • this is because the scheduler seems to give preference to the main routine (this might not be completely true) and also (very true) main() does not block after the first message is scheduled to be printed in the new goroutine
  • main() is finished before the new goroutine has a chance to start
  • we want main() to wait for other goroutines to complete

Delayed Exit

  • the following snippet shows a hacky/bad solution which might work:
func main() {
   go fmt.Printf("New routine")
   time.Sleep(100 * time.Millisecond)
   fmt.Printf("Main routine")
}
  • we put the main goroutine to sleep so the other goroutine can start or resume
  • now the 1st message is printed, then the 2nd
  • this is a hack and a bad solution, because we assumed 100ms would be enough for the 2nd goroutine to complete - but that might not be the case, as we don't know how long it will take. Maybe the OS spends 100ms scheduling some other thread instead of our Go application's thread, and we assumed that would not happen - bad! Maybe the Go runtime schedules another goroutine - and we assumed it won't - again, bad!
  • timing is non-deterministic
  • we need formal synchronization constructs

Basic Synchronization

Synchronization

  • synchronization is when multiple threads agree on a timing of an event
  • there are global events whose execution is viewed by all threads, simultaneously
  • one goroutine does not know the timing of other goroutines
  • synchronization breaks that - it introduces some global events that every thread sees at the same time
  • this is important in order to restrict interleavings
  • e.g. two possible interleavings showing a race condition: the output depends on the interleaving; interleavings/schedules are non-deterministic

1st running - 1st interleaving combination

Task 1    Task 2  
------    ------

x = 1
         print x
x = x + 1

Output: 1

2nd running - 2nd interleaving combination

x = 1
x = x + 1
         print x

  • we want to know the intention of the programmer
  • let's assume they want the printing to happen after x is incremented
  • synchronization is used to restrict bad interleavings
  • synchronization:
    • we need some global event (GLOBAL EVENT) that both tasks/threads/goroutines can see at the same time
    • Task 1: makes the event happen
    • Task 2: uses conditional execution:
      • depending on whether the event has happened, the goroutine waits or runs
      • if the GLOBAL EVENT happened, we execute the printing
Task 1          Task 2  
------          ------

x = 1
x = x + 1
GLOBAL EVENT
              
             if GLOBAL EVENT
                print x
  • synchronization is the opposite of concurrency
  • synchronization makes some threads/goroutines to wait
  • synchronization reduces the effective use of the hardware
  • but in some cases it is necessary - when things have to happen in order
  • synchronization is a necessary evil

Wait Groups

  • a common type of synchronization

Sync WaitGroup

  • sync package contains functions to synchronize between goroutines
  • sync.WaitGroup forces a goroutine to wait for other goroutines
  • a wait group is like a group of goroutines that our goroutine has to wait for
    • our goroutine will not continue until all goroutines from the WaitGroup finish
  • we can wait on 1 or more other goroutines
  • in this example we want to make the main goroutine wait for the 2nd goroutine:
func main() {
   go fmt.Printf("New routine")
   fmt.Printf("Main routine")
}

  • WaitGroup contains an internal counter
    • increment the counter for each goroutine we want to wait for
      • if there are 3 goroutines, we increase it by 3
    • decrement the counter when each goroutine completes
    • the waiting goroutine waits until the counter becomes 0

Using WaitGroup

  • main() runs in main goroutine
  • foo() runs in 2nd (worker) goroutine

Main thread:

   var wg sync.WaitGroup
   wg.Add(1)
   go foo(&wg)
   wg.Wait()

Foo thread:

   wg.Done()

  • WaitGroup methods:
    • Add() increments the counter
    • Done() decrements the counter
    • Wait() blocks until counter == 0

WaitGroup Example


func foo(wg *sync.WaitGroup) {
   fmt.Printf("New routine")
   wg.Done()
}

func main() {
   var wg sync.WaitGroup
   wg.Add(1)
   go foo(&wg)
   wg.Wait()
   fmt.Printf("Main routine")
}
  • Output: "New routine" and then "Main routine"

Communication


Goroutine Communication

  • goroutines can wait for each other
  • but they can also communicate with each other
  • generally, goroutines work together to perform a bigger task
  • these goroutines are not completely independent
  • each is doing a small piece of a bigger task
  • e.g. a web server
    • it makes sense to create a new thread for each new browser connection
    • all these threads share some data; e.g. data sent from one browser may need to be seen by another browser => the threads have to cooperate
  • example: find the product of 4 integers
    • make 2 goroutines, each multiplies a pair
    • the main goroutine multiplies the 2 results
    • need to send ints from the main goroutine to the two worker goroutines
    • need to send results from the workers back to the main routine

Channels

  • used for communication between goroutines
  • used to transfer data between goroutines
  • channels are typed
    • one channel can handle integers, another strings etc...
  • use make() to create a channel:
c := make(chan int)
  • send and receive data using arrow operator (<-)
    • send data on a channel: 
c <- 3
    • receive data from a channel: 
x := <- c

Channel Example

func prod(v1 int, v2 int, c chan int) {
   c <- v1 * v2
}

func main() {
   c := make(chan int)
   go prod(1, 2, c)
   go prod(3, 4, c)
   a := <- c
   b := <- c
   fmt.Println(a * b)
}

  • a gets whatever first comes out of the channel
  • b gets whatever second comes out of the channel
  • there is also another way to send data to a goroutine - by passing arguments to it when it is started

Blocking in Channels

Unbuffered Channel

  • by default, when channel is created (with make()), it is unbuffered
    • default is unbuffered
  • unbuffered channels can't hold data in transit
  • the implications are: 
    • sending blocks until the data is received
    • receiving blocks until data is sent
Task 1: c <- 3

one hour later...

Task 2: x := <- c

Blocking and Synchronization

  • this also performs synchronization, just like a WaitGroup
    • Task 2 has to wait until Task 1 sends the data
  • channel communication is synchronous
  • blocking is the same as waiting for communication
  • this kind of communication can be used for pure synchronization - we can freely drop the data, i.e. throw away the received result:
Task 1: c <- 3

one hour later...

Task 2: <- c

  • Task 2 is receiving the data but is throwing it away
  • All we do here is synchronizing two tasks: Task 2 has to wait for Task 1 to send the data
  • this is another way to implement WaitGroup's Wait()
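  • a minimal sketch of that idiom, assuming a single worker goroutine signalling on a done channel:

package main

import "fmt"

func worker(done chan int) {
   fmt.Println("New routine")
   done <- 0 // the value is irrelevant; only the send event matters
}

func main() {
   done := make(chan int)
   go worker(done)
   <-done // block until the worker signals, discard the value
   fmt.Println("Main routine")
}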

Buffered Channel 

Channel Capacity

  • channels by default are unbuffered
    • they have no capacity to hold the data
    • unbuffered channels have capacity 0
  • channels can have some capacity
  • capacity is the number of objects channel can hold in transit
  • to make a buffered channel, we use the optional second argument of make(), which defines the channel's capacity
c := make(chan int, 3)
  • default size is 0
  • channel with some capacity still blocks under some conditions
  • sending only blocks if buffer is full
    • e.g. if we have 3 sends with no receives, new sends will be blocked
    • as soon as a new receive happens, the next, 4th send will unblock
  • receiving only blocks if the buffer is empty - if there is nothing in the buffer, a channel read will block until there is something to read

Channel Blocking, Receive

  • channel with capacity 1
Task 1 -------> [    ] --------> Task 2

Task 1:
c <- 3

Task2:
a := <- c
b := <- c 

  • first receive blocks (in Task 2) until send occurs (in Task 1)
  • second receive blocks forever (in Task 2)

Channel Blocking, Send

  • channel with capacity 1
Task 1 -------> [    ] --------> Task 2

Task 1:
c <- 3
c <- 4

Task2:
a := <- c
  • second send blocks till first receive is done

Use of Buffering

  • buffering is used when the producer and the consumer work at different speeds
  • the sender and the receiver do not need to operate at exactly the same speed
  • the producer generates data
    • e.g. reading sensors; taking audio samples
    • it can do so continuously
  • the consumer processes the data
Producer -------> [|||||||||] --------> Consumer
  • buffering & blocking help equalize the speeds of the producer (sender) and the consumer (receiver)
  • if the producer produces data faster than the consumer can consume it, the producer has to block and slow down - this happens when the buffer is full
  • if the consumer consumes data faster than the producer can produce it, the consumer has to block - this happens when there is no data in the buffer, i.e. when the buffer is empty
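  • a minimal sketch, assuming a fast producer and a slow consumer: the buffer absorbs short bursts, and the blocking send throttles the producer once the buffer is full

package main

import (
   "fmt"
   "time"
)

func main() {
   c := make(chan int, 3) // buffered channel, capacity 3
   go func() {            // producer: fast
      for i := 0; i < 6; i++ {
         c <- i // blocks only while the buffer already holds 3 values
      }
   }()
   for i := 0; i < 6; i++ { // consumer: slow
      time.Sleep(100 * time.Millisecond) // simulate slow processing
      fmt.Println(<-c)
   }
}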

Week 4 -  SYNCHRONIZED COMMUNICATION

Blocking on Channels

Iterating through a Channel


  • a common operation is to iteratively read from a channel
  • this happens when we have a producer and a consumer
    • the consumer wants to continuously receive data from a channel and process it
  • there is a construct in Go made specifically for this:
for i := range c {
   fmt.Println(i)
}


  • continues to read from channel c
  • one iteration each time a new value is received (is available in the channel)
  • i is assigned the received value
  • this for loop could be an infinite loop
    • to quit the loop, the sender can close the channel
    • the loop quits when the sender calls close(c)
    • close() is another method that can be performed on a channel
    • when the sender closes the channel, that signals the receiver and the for loop ends
    • if we use range to read from a channel, then the sender needs to call close(c) (see the sketch below)
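  • a minimal runnable sketch of the range/close idiom:

package main

import "fmt"

func main() {
   c := make(chan int)
   go func() { // sender
      for i := 0; i < 3; i++ {
         c <- i
      }
      close(c) // ends the receiver's range loop
   }()
   for i := range c { // receiver
      fmt.Println(i) // prints 0, 1, 2, then the loop exits
   }
}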

Receiving from Multiple Goroutines

  • another common scenario is receiving from multiple goroutines, or from multiple channels, each associated with a different goroutine
  • multiple channels may be used to receive from multiple sources
  • let's say we have 3 goroutines that are communicating with 2 channels
    • e.g. task3 tries to compute a product of two numbers, each coming from a different channel

task1 ----- c1 -----> task3 <----- c2 ------ task2

  • data from both sources might be needed
  • read sequentially
a := <- c1
b := <- c2
fmt.Println(a * b)

  • this is blocking
    • T3 first has to wait for data to appear on c1, and then for data to appear on c2
  • eventually, T3 will get both pieces of data and complete its task

Select Statement

  • sometimes a task needs data from either channel: from c1 OR c2
  • if we have a choice of multiple channels and want to use the data that comes first ("first come, first served"), no matter from which channel:
    • we don't want to read from all channels
    • we don't want to block (wait for data) on one channel, e.g. c1, since data might never arrive there - in the meantime data might be available on another channel, e.g. c2
    • we don't know on which channel data will come first
    • in this case we'll use the select statement:
select {
   case a = <- c1:
      fmt.Println(a)
   case b = <- c2:
      fmt.Println(b)
}
  • whichever case happens first, its print will be executed

Select Send or Receive

  • select allows choosing data from among several channels
  • we don't have to block on all channels; we effectively wait on all of them at once and proceed with whichever delivers first
  • here we block on receiving data, but we can also block on sending data
  • with select, a case can be a receive or a send:
select {
   case a = <- inchan:
      fmt.Println(a)
   case outchan <- b:
      fmt.Println("sent b")
}
  • if something comes in on inchan, a gets that value
  • we also want to write the value of b to outchan
  • a receive on inchan blocks if no one is writing to it
  • a send on outchan blocks if no one is reading from it
  • either of these two actions (cases) can happen first, e.g. inchan might have data available before outchan's receiver is ready
  • whichever happens first, that case is executed
  • if data arrives on inchan before the send on outchan can proceed, then the first case is executed

Select with Abort Channel

  • one common use of select is to have a separate abort channel
  • producer-consumer scenario
  • use select with a separate abort channel
  • may want to receive data until an abort signal is received
for { // infinite loop
   select {
   case a := <-c:
      fmt.Println(a) // process data
   case <-abort:
      return // abort signal received
   }
}
  • if anything comes in on the abort channel, we quit the loop
  • we don't pay attention to the value received on the abort channel - we discard it

Default Select

  • we have the regular cases - e.g. waiting on channels c1 and c2
  • default case: executed if no other case is satisfied
  • with a default, select will not block at all!
select {
   case a = <- c1:
      fmt.Println(a)
   case b = <- c2:
      fmt.Println(b)
   default:
      fmt.Println("nop")
}

Mutual Exclusion

Goroutines Sharing Variables

  • sharing variables between goroutines (concurrently) can cause problems
  • two goroutines writing to the same shared variable can interfere with each other
  • a function/goroutine is said to be concurrency-safe if it can be executed concurrently with other goroutines without improperly interfering with them
    • e.g. it will not alter variables in other goroutines in an unexpected/unintended/unsafe way

Variable Sharing Example


var i int = 0 
var wg sync.WaitGroup

func inc() {
   i = i + 1
   wg.Done()
}

func main() {
   wg.Add(2)
   go inc()
   go inc()
   wg.Wait()
   fmt.Println(i)
}

  • two goroutines write to i
  • i should equal 2
  • BUT this doesn't always happen

Possible Interleavings


i = 0
Task1: i = i + 1
i = 1
Task2: i = i + 1
i = 2

i = 0
Task2: i = i + 1
i = 1
Task1: i = i + 1
i = 2

  • it seems like there is no problem
  • BUT that is deceiving, as there are more interleavings than we think

Granularity of Concurrency

  • concurrency is at the machine code level, NOT the source code level!
  • the interleavings are not of Go source code instructions; what actually gets interleaved is the underlying machine code
  • Go source code is compiled to machine code
  • machine instructions get interleaved
  • an interleaving can start in the middle of a Go source code statement
  • i = i + 1 might be mapped into three machine instructions:
    • read i (read the value from memory and place it in a register)
    • increment (in the register)
    • write i (write it back to memory)
  • interleaving happens at this level
  • interleaving machine instructions causes unexpected problems

Interleaving Machine Instructions

  • both tasks read 0 as the value of i
  • each task is using its own register
  • both tasks share the variable i
i == 0
Task1: read i // 0
Task2: read i // 0
Task1: inc // 1 
Task1: write // 1
i == 1
Task2: inc  // 1
Task2: write // overwrites 1 with the same value - 1
i == 1

Mutex

  • how do we share data correctly between two goroutines?
  • don't let two goroutines write to a shared variable at the same time
  • we need to restrict the possible interleavings so that the goroutines don't write to the shared variable at the same time
  • access to shared variables cannot be interleaved
  • Mutual Exclusion
    • declare code segments in different goroutines which cannot execute concurrently => they cannot be interleaved
  • writing to shared variables should be mutually exclusive

Sync.Mutex

  • A Mutex ensures mutual exclusion
  • uses a binary semaphore
    • if the flag is up
      • the shared variable is in use by somebody
      • one goroutine is writing to it
      • only one goroutine can write to the variable at a time
      • when a goroutine wants to use the shared variable, it has to put the flag up
      • once a goroutine is done using the shared variable, it has to put the flag down
    • if the flag is down
      • the shared variable is available
      • if another goroutine sees that the flag is down, it knows it can use the shared variable, but first it has to put the flag up

Mutex Methods

  • putting the flag up and down is implemented by the Lock and Unlock methods
  • Lock()
    • puts the flag up (if no other goroutine has already put the flag up)
    • notifies the others that the shared variable is in use
    • if a second goroutine also calls Lock(), it blocks; it has to wait until the first goroutine releases the lock
    • we can have any number of goroutines (not just two) competing to put the flag up
  • Unlock() method puts the flag down
    • notifies others that it is done with using shared variable
  • When Unlock() is called, a blocked Lock() can proceed
  • so in each goroutine we put Lock() at the beginning of the mutually exclusive region and call Unlock() at the end of it; this ensures that only one goroutine is inside the mutually exclusive region at a time

Using Sync.Mutex

  • Let's fix the code with double increment
  • Increment operation is now mutually exclusive
var i int = 0
var mut sync.Mutex

func inc() {
   mut.Lock()
   i = i + 1
   mut.Unlock()
}
  • this ensures that reading i from memory, incrementing it, and writing the result back happen without interference: even if the context switches in the middle, the other goroutine blocks in Lock() until Unlock() is called
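  • putting it together with the earlier WaitGroup example, a sketch of the full fixed program:

package main

import (
   "fmt"
   "sync"
)

var i int = 0
var wg sync.WaitGroup
var mut sync.Mutex

func inc() {
   mut.Lock()
   i = i + 1 // read, increment, write - now mutually exclusive
   mut.Unlock()
   wg.Done()
}

func main() {
   wg.Add(2)
   go inc()
   go inc()
   wg.Wait()
   fmt.Println(i) // now always prints 2
}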

Once Synchronization


  • the sync package provides functions and types that can be used for synchronization between goroutines

Synchronous Initialization

  • useful idiom
  • let's assume we have multi-threaded program and we need to perform initialization
  • initialization
    • must happen once
    • must happen before everything else
  • the order of execution of goroutines is not known, so how can we determine which goroutine should perform the initialization?
  • How do you  perform initialization with multiple goroutines?
  • Could perform initialization before starting the goroutines
    • put it at the beginning of main() - the first goroutine
    • sometimes this might not be an option
  • another option is using once.Do()

Sync.Once

  • sync.Once has one method, once.Do(f)
  • once.Do(f) can be called in many goroutines, but the function f (e.g. some initialization) will be executed only once
  • all calls to once.Do(f) block until the first call returns
    • this ensures that the initialization is executed first

Sync.Once Example

  • make two goroutines, initialization only once
  • each goroutine executes doStuff()
  • doStuff() performs initialization at the beginning of it
  • only one goroutine has to perform initialization within doStuff()
  • setup() should execute only once
    • only one goroutine will execute it, although it is called in both
  • "hello" should not print until setup() returns
var wg sync.WaitGroup
var on sync.Once

func Setup() {
   fmt.Println("Init")
}

func doStuff() {
   on.Do(Setup)
   fmt.Println("hello")
   wg.Done()
}

func main() {
   wg.Add(2)
   go doStuff()
   go doStuff()
   wg.Wait()
}

Output:
Init // result of Setup()
hello // from one goroutine 
hello // from another goroutine 

Deadlock

  • it should be avoided when coding
  • comes from synchronization dependencies
  • with multiple goroutines, synchronization can make one goroutine's execution depend on another's

Synchronization Dependencies

  • synchronization causes the execution of different goroutines (e.g. G1 and G2) to depend on each other
G1:
ch <- 1
mut.Unlock()

G2:
x := <- ch 
mut.Lock()

  • there are blocking dependencies here:
    • G1 writes to the channel and G2 reads from it, so G2 depends on G1: G2 blocks until G1 executes its send; G2 cannot continue until G1 does something
    • G2 can acquire the lock only after G1 releases it

Deadlock

  • circular dependencies cause all involved goroutines to block 
    • G1 waits for G2
    • G2 waits for G1
  • can be caused by waiting on channels too

Deadlock Example

func doStuff(c1 chan int, c2 chan int) {
   <- c1
   c2 <- 1
   wg.Done()
}
  • read from first channel
    • wait for write onto first channel
  • write to second channel
    • wait for read from second channel

func main() {
   ch1 := make(chan int)
   ch2 := make(chan int)
   wg.Add(2)
   go doStuff(ch1, ch2)
   go doStuff(ch2, ch1)
   wg.Wait()
}
  • doStuff() argument order is swapped
  • Each goroutine blocked on a channel read
  • nothing can progress => deadlock!

Deadlock Detection

  •  Golang runtime automatically detects when all goroutines are deadlocked:
    • "fatal error: all goroutines are asleep - deadlock!"
  • however, it cannot detect when only a subset of the goroutines is deadlocked (see the sketch below)
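  • a minimal sketch of an undetected deadlock: the anonymous goroutine blocks on c forever, but main is merely sleeping (and then exits), so "all goroutines are asleep" never triggers

package main

import "time"

func main() {
   c := make(chan int)
   go func() {
      <-c // blocks forever; nobody ever sends on c
   }()
   time.Sleep(time.Second) // main stays alive, then exits, silently abandoning the blocked goroutine
}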

Dining Philosophers Problem

  • classic concurrency problem, with a deadlock
  • Problem:
    • 5 philosophers sitting at a round table
    • each one has a plate of rice
    • 1 chopstick is placed between each adjacent pair
    • to eat, a philosopher needs two chopsticks, the one to the left and the one to the right of his plate
    • only one philosopher can hold a given chopstick at a time
    • not enough chopsticks for everyone to eat at once

Dining Philosophers Issues


(figure: 5 philosophers around a round table, with one chopstick between each adjacent pair)
  • each chopstick is represented by a mutex, because philosophers need mutually exclusive access to it
  • each philosopher is associated with a goroutine and two chopsticks

Chopsticks and Philosophers


type ChopS struct {
   sync.Mutex
}

type Philo struct {
   leftCS, rightCS *ChopS
}

Philosopher Eat Method


func (p Philo) eat() {
   for {
      p.leftCS.Lock()
      p.rightCS.Lock()
      fmt.Println("eating")
      p.rightCS.Unlock()
      p.leftCS.Unlock()
   }
}

Initialization in Main


CSticks := make([]*ChopS, 5) // create slice

for i := 0; i < 5; i++ {
   CSticks[i] = new(ChopS) // BK: why not use &ChopS{}? (new(ChopS) and &ChopS{} are equivalent here)
}

philos := make([]*Philo, 5) // create slice

for i := 0; i < 5; i++ {
   philos[i] = &Philo{
      CSticks[i],
      CSticks[(i+1)%5],
   }
}
  • Initialize chopsticks and philosophers
  • Notice (i + 1)%5
    • we can't simply use i + 1, because for i = 4 we'd get index 5, which must wrap around to 0 (5 % 5 = 0)

Start the dining in Main


for i := 0; i < 5; i++ {
   go philos[i].eat()
}

  • start each philosopher eating
  • we would also need a Wait in main() so that main() does not return before the philosophers complete eating
  • that was the code written in a naive way; naive because we'd have a deadlock!

Deadlock Problem


p.leftCS.Lock()
p.rightCS.Lock()
fmt.Println("eating")
p.rightCS.Unlock()
p.leftCS.Unlock()

  • this sequence is executed in each of the 5 goroutines
  • these goroutines are scheduled in a non-deterministic order
  • in one possible interleaving, all philosophers lock their left chopsticks concurrently
  • all chopsticks would then be locked => no one can lock their right chopstick, because each right chopstick is someone else's left one and is already locked
  • in such an interleaving we are in a deadlock!

Deadlock Solution

  • Dijkstra's solution: each philosopher picks up lowest numbered chopstick first
  • our code is NOT doing that:
philos[i] = &Philo{
   CSticks[i],
   CSticks[(i+1)%5],
}
  • In our current code:
    • Philosopher 0 picks first CS0 as left and CS1 as right
      • this is OK as it picked CS with lowest number first
    • Philosopher 1 picks first CS1 as left and CS2 as right
      • this is OK as it picked CS with lowest number first
    • Philosopher 2 picks first CS2 as left and CS3 as right
      • this is OK as it picked CS with lowest number first
    • Philosopher 3 picks first CS3 as left and CS4 as right
      • this is OK as it picked CS with lowest number first
    • Philosopher 4 picks first CS4 as left and CS0 as right
      • this is NOT OK as it picked CS with higher number first
  • Philosopher 4 picks up chopstick 4 before chopstick 0
    • this violates Dijkstra's rule
  • with Dijkstra's solution:
    • in one interleaving we may have:
      • Philosopher 0 picks up CS0 (left) first, then CS1 (right)
      • Philosopher 1 waits for PH0 to finish so it can grab CS1 first; once PH0 finishes eating, PH1 picks up CS1 first, then CS2
      • PH2 waits for PH1 to finish so it can grab CS2 first; once PH1 finishes eating, PH2 picks up CS2 first, then CS3
      • PH3 waits for PH2 to finish so it can grab CS3 first; once PH2 finishes eating, PH3 picks up CS3 first, then CS4
      • PH4 must pick up CS0 first and only then CS4
    • Philosopher 4 first picks CS0 and then CS4, BUT while Philosopher 0 holds CS0, PH4 is blocked on CS0 and therefore cannot pick up CS4 either - which leaves CS4 free, allowing Philosopher 3 to pick up both of its chopsticks
    • no deadlock, but Philosopher 4 may starve
    • Philosopher 4 effectively gets the lowest priority; it has to wait for the others most of the time
    • starvation: some goroutines may not get to execute as often as others
    • this happens here because of the circular arrangement combined with the ordering rule
    • starvation is not ideal, but deadlock is the worst (a sketch of the fix follows)
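  • a sketch of Dijkstra's fix using the types above: always store the lower-numbered chopstick in leftCS, so eat() (which locks leftCS first) automatically picks up the lowest-numbered chopstick first

for i := 0; i < 5; i++ {
   first, second := i, (i+1)%5
   if second < first { // only philosopher 4 hits this case
      first, second = second, first
   }
   philos[i] = &Philo{CSticks[first], CSticks[second]}
}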