Thursday, 21 February 2019

Concurrency in Go - Notes on Coursera course

Week 1 - Why Use Concurrency?

Parallel Execution


  • a big property of the Go language is that concurrency is built into the language
  • concurrency and parallelism are two closely related ideas
  • in C, Python, etc. you can do concurrent programming, but you need some 3rd-party library

Parallel Execution

  • parallel is not the same as concurrent
  • why do we need concurrency?
  • parallel execution is when two programs are executed at the same time
  • at some point in time instructions from two programs are executing in parallel
    • at time t, an instruction is being performed for both P1 and P2
  • one processor core is executing 1 instruction at a time
  • if we want to have parallel execution, we need two processors or at least two processor cores
  • we need replicated hardware: e.g. CPU1 and CPU2
  • or if we have quad-core CPU then we can run 4 instructions in parallel, at the same time

Why use Parallel Execution?


  • tasks may be completed more quickly
  • you get a better throughput overall
  • example: two piles of dishes to wash
    • two dishwashers can complete twice as fast as one
  • BUT: some tasks must be performed sequentially
    • example: wash dish then dry dish; it has to be in this order
      • must wash before you can dry
  • some tasks are more parallelizable than others
    • we can't parallelize washing and drying of the same dish - even if we have a hundred dishwashers and sinks available
  • some things can't be parallelized

Von Neumann Bottleneck

  • writing concurrent programs is hard
  • students usually learn sequential programming in the undergraduate curriculum
  • concurrent programming is difficult
  • can we achieve speedup without Parallelism?
  • Solution 1: Design faster processors
    • get speedup without changing software
    • this is what used to be the case until recently
    • hardware kept getting faster and faster, but that has largely stopped now
    • faster = higher processor clock rate; it used to double every year or so
  • the limitation on speed that we have now is called the Von Neumann Bottleneck: delayed access to memory
    • the CPU has to access memory to get instructions and data
    • memory is always slower than the CPU
    • even with a high clock rate, the CPU has to wait on memory access; lots of time is wasted just waiting for memory
  • Solution 2: design processors with more on-chip memory
    • build caches: fast memory on the chip
    • that's what has traditionally been done until now
    • again, the same software, the same code, would run faster
  • but this is not enough anymore

Moore's Law

  • it is not a physical law, but rather an observation of a trend that held in the past
  • Predicted that transistor density would double every two years
  • smaller transistors switch faster
  • exponential increase in density would lead to exponential increase in software speed
  • this is not the case anymore so software engineers had to do something

Power Wall

Power/Temperature Problem

  • the speedup that we get from Moore's Law can't continue because transistors consume power and power is becoming a critical issue - Power Wall
  • transistors consume power whenever they switch 
  • as you increase the number of transistors, the power used also increases
    • smaller transistors use less power, but transistor density scales up much faster
  • nowadays many devices are portable and run on batteries; there is limited power available
  • even if a device is plugged into a socket, the issue is temperature
    • we need fans and heat sinks on top of processors, e.g. an i7; without them it would melt
    • high power leads to high temperature
    • air cooling (fans) can only remove so much heat

Dynamic Power

  • generic equation for Power: P = alpha * C * F * V^2
    • alpha - percent of time switching; transistors consume dynamic power when they switch (go from 0 to 1 or from 1 to 0; when they don't switch they don't use dynamic power)
    • C - capacitance (related to size; goes down as transistor shrinks)
    • F - clock frequency (we want to increase frequency)
    • V - voltage swing (from low to high; e.g. 0 is 0V and 1 is 5V); we want to reduce the voltage in order to reduce power, e.g. logic 1 at 0.3V
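
  • a quick sanity check of the formula (a sketch with toy numbers, not real chip parameters and not course material): halving V at the same alpha, C, and F cuts dynamic power to a quarter

package main

import "fmt"

// power computes P = alpha * C * F * V^2 (values below are made up for illustration).
func power(alpha, c, f, v float64) float64 {
   return alpha * c * f * v * v
}

func main() {
   base := power(0.1, 1e-9, 3e9, 1.0) // full voltage swing
   half := power(0.1, 1e-9, 3e9, 0.5) // half voltage swing
   fmt.Printf("relative power at half voltage: %.2f\n", half/base) // prints 0.25
}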

Dennard Scaling

  • it is paired together with Moore's Law
  • Voltage should scale with transistor size
    • smaller transistor => voltage should be smaller
  • we want to scale down voltage to keep power consumption low
  • we want to have Dennard Scaling
  • but it can't go on forever:
  • Problem: Voltage can't go too low
    • must stay above threshold voltage otherwise transistor won't switch
    • noise problems occur; if there is noise in the system, the voltage can fluctuate and we can get wrong readings/errors
  • Problem: doesn't consider leakage power
    • leakage happens when you have thin insulators
    • as you scale everything down, insulators get thinner and leakage increases
  • Dennard scaling must stop

Multi-Core Systems

  • generic equation for Power: P = alpha * C * F * V^2
    • we can't increase the frequency
    • frequency still goes up, but at a very slow rate
  • so instead we add more cores to the chip, without increasing frequency => multi-core systems
    • transistor density is still increasing
  • if you have a 4-core system but run your program on only one core, you are wasting the 3 other cores, which sit idle
  • parallel execution is needed to exploit multi-core systems
  • code made to execute on multiple cores
    • we want to divide our program into smaller chunks which are executed in parallel, on different cores
  • different programs on different cores
  • parallel compilers: 
    • they take sequential code and parallelize it 
    • they chop it in tasks that can be run in parallel
    • this is extremely complex problem
    • this does not work that well
  • concurrent programming: it is actually a programmer who divides code into tasks that will be run in parallel

Concurrent vs Parallel

Concurrent Execution

  • concurrent execution is not necessarily the same as parallel execution
  • concurrent: start and end times overlap
    • they don't literally execute at the same time
    • if we take any point in time we can see that either task1 or task2 is executing
    • concurrent: tasks compete for chunks of processor time (CPU cycles); each chunk executes instructions of one task only, not of the competing task
    • task1: --------       -------                           (time -->)
    • task2:           ----- 
  • parallel: execute at exactly the same time
    • if we take any point in time we can see that both task1 and task2 are executing
    • task1: ----------------                                 (time -->)
    • task2:         ----- 
  • completion time for both tasks is longer in the concurrent case
  • why do concurrent execution? why not just run task2 after task1 if the joint execution time is the same?

Concurrent vs Parallel

  • parallel tasks need to be executed on different hardware (different cores)
  • concurrent tasks may be executed on the same hardware (e.g. on one core)
    • only one task is actually executed at a time
  • Mapping from tasks to hardware (which task is executed on which core) is not directly controlled by the programmer
    • at least not in Go

Concurrent Programming

  • the programmer does not decide which tasks are executed on which core
  • the programmer determines which tasks can be executed in parallel; e.g. task1 has to do some work before task2, etc.
  • programmer describes what can be (not what will be) executed in parallel
    • programmer defines possible concurrency
  • what will be executed in parallel depends on how tasks are mapped to hardware
  • mapping task to hardware is done by:
    • operating system
    • Go runtime scheduler

Hiding Latency

  • let's say we do concurrent programming and have one core; so why bother doing concurrency at all?
  • even though we can't do parallel execution we can still get significant performance improvements because tasks typically have to periodically wait for some slow events (e.g. input/output)
  • when CPUs have to communicate with memory, the network, the file system, the video card... all of this is slow
  • example: reading from memory
    • X = Y + Z
    • CPU has to read Y and Z from memory, do operation and write result back to memory
    • the CPU may have to wait hundreds of cycles to read/write memory; in the meantime it could use those cycles to execute useful work from other tasks
  • Other concurrent tasks can operate while one task is waiting
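
  • a minimal sketch (not from the course; it uses a channel, covered in Week 3) of hiding latency: one goroutine waits on a simulated slow event while the main goroutine keeps doing useful work on the same core

package main

import (
   "fmt"
   "time"
)

func main() {
   done := make(chan string)
   go func() {
      time.Sleep(100 * time.Millisecond) // stands in for a slow I/O operation
      done <- "I/O finished"
   }()
   sum := 0
   for i := 0; i < 1000000; i++ { // useful work done while the other goroutine waits
      sum += i
   }
   fmt.Println(sum, <-done) // receive blocks only if the "I/O" is still in flight
}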

Hardware Mapping 

  • Parallel Execution:
    • task1 --> Core1
    • task2 --> Core2
  • Concurrent Execution:
    • task1 ---/--> Core1
    • task2 --/

Hardware Mapping in Go

  • not under direct control of the programmer in Go or any other standard language
  • the programmer only makes parallelism possible
  • doing the mapping manually would be a very hard task and would slow down programming
  • it is hard because it depends on many factors
    • e.g. the underlying hardware architecture - programmers should not have to care about its details
  • simple arbitrary multi-core system:

core           core
  |              |
cache -------- cache
        |
  shared memory
        |
cache -------- cache
  |              |
core           core

  • cache - a local cache for each core
  • one big consideration when figuring out the hardware mapping is: where is the data?
    • if core 1 has to perform some task, we'd like the data to be in its cache
    • if that data is in the cache of core 2, then core 2 should perform that task
      • a task should run on the core whose cache holds its data
  • in Go we define which task can be executed in parallel


Week 2 - CONCURRENCY BASICS

Processes

  • a lot of concurrent execution ideas came from operating systems

Processes

  • an instance of a running program
  • things unique to a process
    • Memory
      • virtual address space 
      • code 
      • stack - region of memory which usually handles function calls
      • heap - for memory allocations
      • shared libraries - shared between processes
    • Registers - tiny, super-fast memory; each register stores a single value
      • Program counter - points to the instruction being executed or the next one
      • data registers
      • stack pointer
      • ...

Operating System

  • Allows many processes to execute concurrently
    • makes sure that virtual address spaces do not overlap
    • makes sure that all processes get a fair share of processor time and resources
    • these processes run concurrently; the OS switches between them quickly... like every 20ms
    • from the user's perspective it looks like all processes run in parallel, although they don't - the OS creates the illusion of parallelism

Task Manager

  • shows all running processes
    • foreground
    • background

Scheduling

  • one of the main tasks of OS
  • OS schedules processes for execution
  • Gives the illusion of parallel execution
...
process1
process2
process3
process1
process2
...

  • OS gives fair access to CPU, memory, etc...
  • there are many different scheduling algorithms
    • one is called Round Robin - processes simply alternate in a round fashion (1, 2, 3, 1, 2, 3, 1, 2, 3...) so every process gets the same chunk of time
    • if processes don't have the same priority, processes with high priority get more CPU time - they are scheduled more frequently
  • embedded systems: some tasks are critical, e.g. braking - that would be a high-priority task, while playing stereo music would be a low-priority task

Context Switch

  • control flow changes from one process to another
  • e.g. switching from processA to processB
  • before each switch, the OS has to save the state of the currently running process and restore it the next time that process's execution resumes
  • this state is called the context - all the stuff unique to the process we listed before
  • process "context" must be swapped
...
processA
context switch
processB
context switch
processA
...
  • during context switch periods kernel of the OS is running
  • context switch usually happens after a timer times out

Threads vs Processes

  • there used to be only Processes
  • downside of processes: process switching time is long (switching between processes is slow because of memory access)
  • to speed this up: threads
  • a thread is like a process, but it has less context; it shares some of the context with other threads in the same process
  • Parts of Process context shared among process threads:
    • Virtual memory
    • File descriptors
  • Specific (unique) parts of Process context for each thread:
    • Stack
    • Data registers
    • Program counter (PC)
  • Switching between threads is faster because there is less context - less data that has to be read/written from/to memory

Goroutines

  • a goroutine is basically like a thread in Go
  • many goroutines execute within a single OS thread
  • Go takes a process with main thread and schedules / switches goroutines within that thread

Go Runtime Scheduler

  • schedules goroutines inside an OS thread (main thread)
  • like a little OS inside a single OS thread
  • Go runs on a main thread and switches goroutines on one thread
  • from OS point of view - there is only a single thread
    Main thread
         |
 Logical processor
         |
    Go runtime
         |
         |-------------------------- 
         /              |           \
Goroutine1         Goroutine2        Goroutine3


  • Go runtime scheduler uses Logical Processor which is mapped to a thread
  • typically there is one Logical Processor which is mapped to a main thread
  • since all these goroutines run on one thread, we don't have parallelism, only concurrency
  • we can increase the number of Logical Processors - each mapped to a different thread, and the OS can map those threads to different cores
  • the program can determine how many Logical Processors there will be; the course assumes a default of 1 (since Go 1.5 the default is actually the number of CPU cores), which gives concurrent execution of goroutines; increasing it makes parallel goroutine execution possible - if the OS schedules the threads on different cores
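
  • a small sketch (assumed, not course code) of querying and setting the number of Logical Processors via runtime.GOMAXPROCS:

package main

import (
   "fmt"
   "runtime"
)

func main() {
   // GOMAXPROCS(0) does not change the setting; it just returns the current value
   fmt.Println("logical processors:", runtime.GOMAXPROCS(0))
   runtime.GOMAXPROCS(4) // allow up to 4 threads to execute Go code simultaneously
}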

Interleavings

  • writing concurrent code is hard, as it is mentally difficult to keep track of what is happening in each thread
  • the overall state of the machine is not deterministic
  • in case of a crash, it can happen at different places
  • the order of execution within a task is known
  • the order of execution between concurrent tasks is unknown
  • let's look at the instructions in two tasks:
Task1 

1: a = b + c
2: d = e + f
3: g = h + i

Task2

1: r = s + t
2: u = v + w
3: x = y + z

Possible Interleavings



1: a = b + c
             1: r = s + t
2: d = e + f
             2: u = v + w
3: g = h + i
             3: x = y + z

OR

1: a = b + c
2: d = e + f
3: g = h + i
             1: r = s + t
             2: u = v + w
             3: x = y + z

  • many interleavings are possible
  • must consider all possibilities
  • ordering is non-deterministic

Race Conditions

  • a problem that can arise because of these interleavings
  • the outcome of the program depends on the interleaving; interleavings are non-deterministic => the outcome is non-deterministic
  • the outcome of the program depends on non-deterministic ordering
  • the interleaving can change every time we run the program
  • we want determinism: for the same set of inputs we want the same set of outputs
  • we want the outcome of the program not to depend on interleavings
1st running - 1st interleaving combination

x = 1
         print x
x = x + 1

Output: 1

2nd running - 2nd interleaving combination

x = 1
x = x + 1
         print x

Output: 2
  • This needs to be avoided, prevented
  • races occur due to communication: the two tasks are communicating through the shared variable x
  • if we didn't have this communication there would not be race condition
  • communication between tasks is often unavoidable 
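
  • a minimal runnable sketch (not course code) of the race above; it uses sync.WaitGroup, covered in Week 3, to keep main alive; running it with the race detector (go run -race) makes Go flag the conflicting accesses

package main

import (
   "fmt"
   "sync"
)

func main() {
   x := 0
   var wg sync.WaitGroup
   wg.Add(2)
   go func() { // Task 1: writes the shared variable
      x = x + 1
      wg.Done()
   }()
   go func() { // Task 2: reads the shared variable; may print 0 or 1
      fmt.Println(x)
      wg.Done()
   }()
   wg.Wait()
}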

Communication Between Tasks

  • Threads are largely independent but not completely independent
  • Web server, one thread per client
Web                       Client 1
page  --> Web server -->  Client 2
Data                      Client 3
  • Clients are coming at the same time
  • Example: webpage shows visits counter
  • Image processing example: 1 thread per pixel block
thread1 --> [][] --> thread2
  • image processing is parallelizable but there must be some level of communication between threads e.g. in case of blurring

Week 3 - THREADS IN GO

Goroutines

  • to create threads of execution we use constructs built into Go

Creating a goroutine

  • one goroutine is created automatically to execute the main()
  • other goroutines are created using the go keyword
  • code snippet from some main() function with only one goroutine (the one for main())
    • a is assigned 2 only after foo() returns
    • foo() blocks main()
a = 1
foo()
a = 2 
  • code snippet from some main() function with two goroutines
    • with go foo() we create a new goroutine
    • a might (but also might not) be assigned 2 before foo() returns - this depends on how the scheduler schedules the concurrent goroutine
    • foo() is non-blocking for main() - main() continues execution without waiting for foo() to return
a = 1
go foo()
a = 2 

Exiting a goroutine

  • a goroutine exits when the code of its associated function completes
  • when the main goroutine is complete, all other goroutines exit, even if they are not finished
  • a goroutine may not complete its execution because main completes early

Exiting goroutines

  • goroutines are forced to exit when main goroutine exits

Early Exit

func main() {
   go fmt.Printf("New routine")
   fmt.Printf("Main routine")
}

  • we don't know the order of execution of these routines - it is non-deterministic
  • we'd expect to see both messages, but actually only the 2nd message is printed (almost always)
  • this is because the scheduler seems to give preference to the main routine (this might not be completely true) and, more importantly, main() does not block after the first message is scheduled to be printed in the new goroutine
  • main() finishes before the new goroutine has a chance to start
  • we want main() to wait for other goroutines to complete

Delayed Exit

  • the following snippet shows a hacky/bad solution which might work:
func main() {
   go fmt.Printf("New routine")
   time.Sleep(100 * time.Millisecond)
   fmt.Printf("Main routine")
}
  • we put the main goroutine to sleep so the other goroutine can start or resume
  • now the 1st message is printed and then the 2nd
  • this is a hack and a bad solution: we assumed that 100ms would be enough for the 2nd goroutine to complete, but we don't actually know how long it will take. The OS might take longer than 100ms to schedule the thread, or the Go runtime might schedule yet another goroutine - we assumed none of this would happen, which is bad!
  • timing is non-deterministic
  • we need formal synchronization constructs

Basic Synchronization

Synchronization

  • synchronization is when multiple threads agree on a timing of an event
  • there are global events whose execution is viewed by all threads, simultaneously
  • one goroutine does not know the timing of other goroutines
  • synchronization breaks that - it introduces some global events that every thread sees at the same time
  • this is important in order to restrict interleavings
  • e.g. two possible interleavings showing a race condition: the output depends on the interleaving; interleavings/schedules are non-deterministic

1st running - 1st interleaving combination

Task 1    Task 2  
------    ------

x = 1
         print x
x = x + 1

Output: 1

2nd running - 2nd interleaving combination

x = 1
x = x + 1
         print x

  • we want to know the intention of the programmer
  • let's assume they want printing to happen after x is incremented
  • synchronization is used to restrict bad interleavings
  • synchronization:
    • we need some global event (GLOBAL EVENT) that both tasks/threads/goroutines can see at the same time
    • Task 1: makes the event happen
    • Task 2: uses conditional execution:
      • if the GLOBAL EVENT has happened, the goroutine runs; otherwise it waits
      • if the GLOBAL EVENT happened, we execute the print
Task 1          Task 2  
------          ------

x = 1
x = x + 1
GLOBAL EVENT
              
             if GLOBAL EVENT
                print x
  • synchronization is the opposite of concurrency
  • synchronization makes some threads/goroutines wait
  • synchronization reduces the effective use of hardware
  • but in some cases it is necessary - when things have to happen in order
  • synchronization is a necessary evil

Wait Groups

  • a common type of synchronization

Sync WaitGroup

  • sync package contains functions to synchronize between goroutines
  • sync.WaitGroup forces a goroutine to wait for other goroutines
  • wait group is like a group of goroutines that our goroutine has to wait for
    • our goroutine will not continue until all goroutines from WaitGroup finish
  • We can wait on 1 or more other goroutines
  • in this example we want to make the main goroutine wait for the 2nd goroutine
func main() {
   go fmt.Printf("New routine")
   fmt.Printf("Main routine")
}

  • a WaitGroup contains an internal counter
    • increment the counter for each goroutine we want to wait for
      • if there are 3 goroutines, we increase it by 3
    • decrement the counter when each goroutine completes
    • the waiting goroutine waits until the counter becomes 0

Using WaitGroup

  • main() runs in main goroutine
  • foo() runs in 2nd (worker) goroutine

Main thread:

   var wg sync.WaitGroup
   wg.Add(1)
   go foo(&wg)
   wg.Wait()

Foo thread:

   wg.Done()

  • WaitGroup methods:
    • Add() increments the counter
    • Done() decrements the counter
    • Wait() blocks until counter == 0

WaitGroup Example


func foo(wg *sync.WaitGroup) {
   fmt.Printf("New routine")
   wg.Done()
}

func main() {
   var wg sync.WaitGroup
   wg.Add(1)
   go foo(&wg)
   wg.Wait()
   fmt.Printf("Main routine")
}
  • Output: "New routine" and then "Main routine"

Communication


Goroutine Communication

  • goroutines can wait for each other
  • but they can also communicate with each other
  • generally, goroutines work together to perform a bigger task
  • these goroutines are not completely independent
  • they are doing a small piece of a bigger task
  • e.g. web server
    • makes sense to create a new thread for each new browser connection
    • all these threads share some data; e.g. data sent from one browser can be seen by another browser => these threads have to cooperate
  • example: find the product of 4 integers
    • make 2 goroutines, each multiplies a pair
    • main goroutine multiplies the 2 results
    • need to send ints from the main goroutine to the two worker goroutines
    • need to send the results from the workers back to the main goroutine

Channels

  • used for communication between goroutines
  • used to transfer data between goroutines
  • channels are typed
    • one channel can handle integers, another strings etc...
  • use make() to create a channel:
c := make(chan int)
  • send and receive data using arrow operator (<-)
    • send data on a channel: 
c <- 3
    • receive data from a channel: 
x := <- c

Channel Example

func prod(v1 int, v2 int, c chan int) {
   c <- v1 * v2
}

func main() {
   c := make(chan int)
   go prod(1, 2, c)
   go prod(3, 4, c)
   a := <- c
   b := <- c
   fmt.Println(a * b)
}

  • a gets whatever first comes out of the channel
  • b gets whatever second comes out of the channel
  • there is also another way to send data to a goroutine - by passing arguments to it when it is started

Blocking in Channels

Unbuffered Channel

  • by default, when channel is created (with make()), it is unbuffered
    • default is unbuffered
  • unbuffered channels can't hold data in transit
  • the implications are: 
    • sending blocks until the data is received
    • receiving blocks until data is sent
Task 1: c <- 3

one hour later...

Task 2: x := <- c

Blocking and Synchronization

  • this also provides synchronization, just like a WaitGroup
    • Task 2 has to wait until Task 1 sends the data
  • channel communication is synchronous
  • Blocking is the same as waiting for communication
  • this kind of communication can be used for pure synchronization - we can freely drop the data - throw away received result:
Task 1: c <- 3

one hour later...

Task 2: <- c

  • Task 2 is receiving the data but is throwing it away
  • All we do here is synchronizing two tasks: Task 2 has to wait for Task 1 to send the data
  • this is another way to implement WaitGroup's Wait()
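
  • a runnable sketch (assumed, not course code) of this pattern: an unbuffered channel used purely for synchronization, mimicking WaitGroup's Wait() for a single goroutine

package main

import "fmt"

func main() {
   done := make(chan bool)
   go func() {
      fmt.Println("worker finished")
      done <- true // the value is irrelevant; only the event matters
   }()
   <-done // block until the worker signals; the received value is thrown away
   fmt.Println("main continues")
}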

Buffered Channel 

Channel Capacity

  • channels by default are unbuffered
    • they have no capacity to hold the data
    • unbuffered channels have capacity 0
  • channels can have some capacity
  • capacity is the number of objects channel can hold in transit
  • to make a channel buffered, we use an optional argument of the make() function - its second argument defines the channel capacity
c := make(chan int, 3)
  • default size is 0
  • channel with some capacity still blocks under some conditions
  • sending only blocks if buffer is full
    • e.g. if we have 3 sends with no receives, new sends will be blocked
    • as soon as a new receive happens, the next, 4th send will unblock
  • receiving only blocks if buffer is empty - if there is nothing in a buffer, channel read operation will block until there is something to be read from the channel

Channel Blocking, Receive

  • channel with capacity 1
Task 1 -------> [    ] --------> Task 2

Task 1:
c <- 3

Task2:
a := <- c
b := <- c 

  • first receive blocks (in Task 2) until send occurs (in Task 1)
  • second receive blocks forever (in Task 2)

Channel Blocking, Send

  • channel with capacity 1
Task 1 -------> [    ] --------> Task 2

Task 1:
c <- 3
c <- 4

Task2:
a := <- c
  • second send blocks till first receive is done

Use of Buffering

  • buffering is used when the producer and consumer work at different speeds
  • sender and receiver do not need to operate at exactly the same speed
  • Producer is generating data 
    • e.g. reading sensors; taking audio samples
    • can do it continuously
  • Consumer is processing data
Producer -------> [|||||||||] --------> Consumer
  • the buffer & blocking help equalize the speeds of the producer (sender) and consumer (receiver)
  • if the producer produces data faster than the consumer can consume it, the producer blocks - when the buffer is full - which slows down production
  • if the consumer consumes data faster than the producer can produce it, the consumer blocks - when there is no data in the buffer, i.e. the buffer is empty
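
  • a small sketch (not course code) of a producer and a consumer equalized by a buffered channel:

package main

import "fmt"

func main() {
   c := make(chan int, 3) // buffer of capacity 3 absorbs short bursts
   go func() {            // producer
      for i := 0; i < 10; i++ {
         c <- i // blocks only when the buffer is full
      }
   }()
   for i := 0; i < 10; i++ { // consumer
      fmt.Println(<-c) // blocks only when the buffer is empty
   }
}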

Week 4 -  SYNCHRONIZED COMMUNICATION

Blocking on Channels

Iterating through a Channel


  • a common operation is to iteratively read from a channel
  • this happens when we have a producer and a consumer
    • the consumer wants to continuously receive data from a channel and process it
  • there is a construct in Go made specifically for this:
for i:= range c {
   fmt.Println(i)
}


  • continues to read from channel c
  • one iteration runs each time a new value is received (is available in the channel)
  • i is assigned the received value
  • this for loop could be an infinite loop
    • to quit the loop, the sender can close the channel
    • the loop quits when the sender calls close(c)
    • close() is another method that can be performed on a channel
    • when the sender closes the channel, that signals the receiver - the for loop ends
    • if we use range to read from a channel, the sender needs to call close(c)
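
  • a runnable sketch (assumed, not course code) combining range and close():

package main

import "fmt"

func main() {
   c := make(chan int)
   go func() { // sender
      for i := 1; i <= 3; i++ {
         c <- i
      }
      close(c) // without this, the range loop below would block forever
   }()
   for i := range c { // receiver: one iteration per value sent
      fmt.Println(i)
   }
}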

Receiving from Multiple Goroutines

  • another common scenario is receiving from multiple goroutines, i.e. from multiple channels, each associated with a different goroutine
  • multiple channels may be used to receive from multiple sources
  • let's say we have 3 goroutines that are communicating with 2 channels
    • e.g. task3 tries to compute a product of two numbers, each coming from a different channel

task1 ----- c1 -----> task3 <----- c2 ------ task2

  • data from both sources might be needed
  • read sequentially
a := <- c1
b := <- c2
fmt.Println(a * b)

  • this is blocking
    • T3 first has to wait for data to appear on c1, and then for data to appear on c2
  • eventually, T3 will have both values and complete the task

Select Statement

  • sometimes a task needs data from either channel, from either c1 OR c2
  • if we have a choice of multiple channels and want to use whichever data comes first ("first come, first served"), no matter from which channel
    • we don't want to read from all channels
    • we don't want to block (wait for data) on one channel, e.g. c1, as data there might never arrive - in the meantime data might be available on another channel, e.g. c2
    • we don't know on which channel data will come first
    • in this case we use the select statement:
select {
   case a = <- c1:
      fmt.Println(a)
   case b = <- c2:
      fmt.Println(b)
}
  • whichever case happens first, its print will be executed

Select Send or Receive

  • select allows choosing data from several channels
  • we don't have to block on all channels; in practice we block only until the first channel delivers data
  • we are blocking here on receiving data but we can also block on sending data
  • with select, case can be on receiving or sending data:
select {
   case a = <- inchan:
      fmt.Println(a)
   case outchan <- b:
      fmt.Println("sent b")
}
  • if something comes on inchan, a gets that value
  • we also want to write b value to outchan 
  • the inchan case blocks if no one is writing to inchan
  • the outchan case blocks if no one is reading from outchan
  • either of the two cases can happen first, e.g. inchan might have data available before outchan is ready to accept a new value
  • whichever thing happens first that case will be executed
  • if some data comes to inchan before data on outchan becomes available then case 1 will be executed

Select with Abort Channel

  • one common use of select is to have a separate abort channel
  • producer-consumer scenario
  • use select with a separate abort channel
  • may want to receive data until an abort signal is received
for { // infinite loop
   select {
      case a := <- c:
         fmt.Println(a) // process data
      case <- abort:
         return // abort signal received
   }
}
  • if anything arrives on the abort channel, we quit the loop
  • we don't pay attention to what data arrives on the abort channel - we discard it

Default Select

  • we have regular cases - e.g. waiting on channels c1 and c2
  • default case: executed if no other case is satisfied
  • with a default case, select does not block at all!
select {
   case a = <- c1:
      fmt.Println(a)
   case b = <- c2:
      fmt.Println(b)
   default:
      fmt.Println("nop")
}

Mutual Exclusion

Goroutines Sharing Variables

  • sharing variables between goroutines (concurrently) can cause problems
  • two goroutines writing to the same shared variable can interfere with each other
  • a function/goroutine is said to be concurrency-safe if it can be executed concurrently with other goroutines without improperly interfering with them
    • e.g. it will not alter variables in other goroutines in an unexpected/unintended/unsafe way

Variable Sharing Example


var i int = 0 
var wg sync.WaitGroup

func inc() {
   i = i + 1
   wg.Done()
}

func main() {
   wg.Add(2)
   go inc()
   go inc()
   wg.Wait()
   fmt.Println(i)
}

  • two goroutines write to i
  • i should equal 2
  • BUT this doesn't always happen

Possible Interleavings


i = 0
Task1: i = i + 1
i = 1
Task2: i = i + 1
i = 2

i = 0
Task2: i = i + 1
i = 1
Task1: i = i + 1
i = 2

  • seems like there is no problem
  • BUT that is deceiving, as there are more interleavings than we think

Granularity of Concurrency

  • concurrency operates at the machine code level, NOT the source code level!
  • the interleavings are not of Go source code instructions; what actually gets interleaved is the underlying machine code
  • Go source code is compiled to machine code
  • machine instructions get interleaved
  • an interleaving can start in the middle of a Go source code statement
  • i = i + 1 might be mapped to three machine instructions:
    • read i (read the value from memory and place it in a register)
    • increment (in the register)
    • write i (write it back to memory)
  • interleaving happens at this level
  • interleaving machine instructions causes unexpected problems

Interleaving Machine Instructions

  • both tasks read 0 as the value of i
  • each task uses its own register
  • both tasks share the variable i
i == 0
Task1: read i // 0
Task2: read i // 0
Task1: inc // 1 
Task1: write // 1
i == 1
Task2: inc  // 1
Task2: write // overwrites 1 with the same value - 1
i == 1

Mutex

  • how do we share data correctly between two goroutines?
  • don't let two goroutines write to a shared variable at the same time
  • we need to restrict the possible interleavings so that the goroutines don't write to the shared variable at the same time
  • access to shared variables cannot be interleaved
  • Mutual Exclusion
    • declare code segments in different goroutines which cannot execute concurrently => they cannot be interleaved
  • writing to shared variables should be mutually exclusive

Sync.Mutex

  • A Mutex ensures mutual exclusion
  • uses a binary semaphore (a flag)
    • if the flag is up
      • the shared variable is in use by somebody
      • one goroutine is writing into it
      • only one goroutine can write to the variable at a time
      • before using the shared variable, a goroutine has to put the flag up
      • once a goroutine is done using the shared variable, it puts the flag down
    • if the flag is down
      • the shared variable is available
      • if another goroutine sees that the flag is down, it knows it can use the shared variable, but first it has to put the flag up

Mutex Methods

  • putting the flag up and down is implemented by the Lock() and Unlock() methods
  • Lock()
    • puts the flag up (if no other goroutine has already put the flag up)
    • notifies others that the shared variable is in use
    • if a second goroutine also calls Lock(), it blocks; it has to wait until the first goroutine releases the lock
    • any number of goroutines (not just two) can compete to put the flag up
  • Unlock() puts the flag down
    • notifies others that the goroutine is done using the shared variable
  • when Unlock() is called, a blocked Lock() can proceed
  • so in a goroutine we put Lock() at the beginning of the mutually exclusive region and Unlock() at the end; this ensures that only one goroutine is inside this region at a time

Using Sync.Mutex

  • Let's fix the code with double increment
  • Increment operation is now mutually exclusive
var i int = 0
var mut sync.Mutex

func inc() {
   mut.Lock()
   i = i + 1
   mut.Unlock()
}
  • this ensures that reading i from memory, incrementing it, and writing the result back complete as a unit for each task; the other goroutine cannot enter the locked region in the middle of this Go statement
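
  • not covered in the course, but worth noting: for a plain counter, the sync/atomic package achieves the same safety without an explicit mutex (a sketch under that assumption):

package main

import (
   "fmt"
   "sync"
   "sync/atomic"
)

var i int64
var wg sync.WaitGroup

func inc() {
   atomic.AddInt64(&i, 1) // atomic read-modify-write; no interleaving possible
   wg.Done()
}

func main() {
   wg.Add(2)
   go inc()
   go inc()
   wg.Wait()
   fmt.Println(atomic.LoadInt64(&i)) // always prints 2
}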

Once Synchronization


  • the sync package provides a set of types and methods that can be used for synchronization between goroutines

Synchronous Initialization

  • useful idiom
  • let's assume we have multi-threaded program and we need to perform initialization
  • initialization
    • must happen once
    • must happen before everything else
  • the order of execution of goroutines is not known, so how can we determine which goroutine should perform the initialization?
  • how do you perform initialization with multiple goroutines?
  • Could perform initialization before starting the goroutines
    • put it at the beginning of main() - the first goroutine
    • sometimes this might not be an option
  • another option is using once.Do()

Sync.Once

  • has one method: once.Do(f)
  • once.Do(f) can be called in many goroutines, but the function f (e.g. some initialization) will be executed only once
  • all calls to once.Do(f) block until the first call returns
    • this ensures that the initialization is executed first

Sync.Once Example

  • make two goroutines; initialization happens only once
  • each goroutine executes doStuff()
  • doStuff() performs initialization at its beginning
  • only one goroutine actually performs the initialization inside doStuff()
  • Setup() should execute only once
    • only one goroutine will execute it, although it is called in both
  • "hello" should not print until Setup() returns
var wg sync.WaitGroup
var on sync.Once

func Setup() {
   fmt.Println("Init")
}

func doStuff() {
   on.Do(Setup)
   fmt.Println("hello")
   wg.Done()
}

func main() {
   wg.Add(2)
   go doStuff()
   go doStuff()
   wg.Wait()
}

Output:
Init // result of Setup()
hello // from one goroutine 
hello // from another goroutine 

Deadlock

  • deadlock should be avoided when coding
  • it comes from synchronization dependencies
  • with multiple goroutines, synchronization can make one goroutine's execution depend on another's

Synchronization Dependencies

  • synchronization causes the execution of different goroutines (e.g. G1 and G2) to depend on each other
G1:
ch <- 1
mut.Unlock()

G2:
x := <- ch 
mut.Lock()

  • there are blocking dependencies here:
    • G1 writes to the channel and G2 reads from it, so G2 depends on G1: G2 blocks until G1 executes; G2 cannot continue until G1 does something
    • G2 can acquire the lock only after G1 releases it

Deadlock

  • circular dependencies cause all involved goroutines to block 
    • G1 waits for G2
    • G2 waits for G1
  • can be caused by waiting on channels too

Deadlock Example

func doStuff(c1 chan int, c2 chan int) {
   <- c1
   c2 <- 1
   wg.Done()
}
  • read from the first channel
    • blocks until something is written to the first channel
  • write to the second channel
    • blocks until something is read from the second channel

func main() {
   ch1 := make(chan int)
   ch2 := make(chan int)
   wg.Add(2)
   go doStuff(ch1, ch2)
   go doStuff(ch2, ch1)
   wg.Wait()
}
  • the doStuff() argument order is swapped in the second call
  • Each goroutine blocked on a channel read
  • nothing can progress => deadlock!

Deadlock Detection

  • the Go runtime automatically detects when all goroutines are deadlocked:
    • "fatal error: all goroutines are asleep - deadlock!"
  • however, it cannot detect when only a subset of goroutines is deadlocked
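
  • a tiny demonstration (assumed, not course code) that triggers the detector: main blocks on a receive that no goroutine will ever satisfy

package main

func main() {
   c := make(chan int)
   <-c // no sender exists; the runtime prints:
       // "fatal error: all goroutines are asleep - deadlock!"
}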

Dining Philosophers Problem

  • classic concurrency problem, with a deadlock
  • Problem:
    • 5 philosophers sitting at a round table
    • each one has a plate of rice
    • 1 chopstick is placed between each adjacent pair
    • to eat, a philosopher needs two chopsticks, one from the left and one from the right of his plate
    • only one philosopher can hold a given chopstick at a time
    • there are not enough chopsticks for everyone to eat at once

Dining Philosophers Issues


  \ O | O /
O           O
  /   O   \
  • each chopstick is represented by a mutex, because philosophers have mutually exclusive access to it
  • each philosopher is associated with a goroutine and two chopsticks

Chopsticks and Philosophers


type ChopS struct {
   sync.Mutex
}

type Philo struct {
   leftCS, rightCS *ChopS
}

Philosopher Eat Method


func (p Philo) eat() {
   for {
      p.leftCS.Lock()
      p.rightCS.Lock()
      fmt.Println("eating")
      p.rightCS.Unlock()
      p.leftCS.Unlock()
   }
}

Initialization in Main


CSticks := make([]*ChopS, 5) // create slice

for i := 0; i < 5; i++ {
   CSticks[i] = new(ChopS) // BK: why not using &ChopS?
}

philos := make([]*Philo, 5) // create slice

for i := 0; i < 5; i++ {
   philos[i] = &Philo{
      CSticks[i],
      CSticks[(i+1)%5],
   }
}
  • Initialize chopsticks and philosophers
  • notice (i+1)%5
    • we can't simply use i+1 because for i = 4 we'd get 5, but it should wrap around to 0 (5 % 5 = 0)

Start the dining in Main


for i := 0; i < 5; i++ {
   go philos[i].eat()
}

  • start each philosopher eating
  • we would also need a Wait in main() so main() does not return before the philosophers finish eating
  • that was the code written in a naive way; naive because we'd get a deadlock!

Deadlock Problem


p.leftCS.Lock()
p.rightCS.Lock()
fmt.Println("eating")
p.rightCS.Unlock()
p.leftCS.Unlock()

  • this sequence is executed in each of the 5 goroutines
  • the goroutines run in a non-deterministic order
  • in one possible interleaving, all philosophers lock their left chopsticks concurrently
  • then all chopsticks are locked => no one can lock their right chopstick, because each right chopstick is someone else's left one and is already locked
  • in such an interleaving we are in a deadlock!

Deadlock Solution

  • Dijkstra's solution: each philosopher picks up lowest numbered chopstick first
  • our code is NOT doing that:
philos[i] = &Philo{
   CSticks[i],
   CSticks[(i+1)%5],
}
  • In our current code:
    • Philosopher 0 picks first CS0 as left and CS1 as right
      • this is OK as it picked CS with lowest number first
    • Philosopher 1 picks first CS1 as left and CS2 as right
      • this is OK as it picked CS with lowest number first
    • Philosopher 2 picks first CS2 as left and CS3 as right
      • this is OK as it picked CS with lowest number first
    • Philosopher 3 picks first CS3 as left and CS4 as right
      • this is OK as it picked CS with lowest number first
    • Philosopher 4 picks first CS4 as left and CS0 as right
      • this is NOT OK as it picked CS with higher number first
  • Philosopher 4 picks up chopstick 4 before chopstick 0
    • this violates Dijkstra's rule
  • with Dijkstra's solution (see the sketch after this list):
    • in one interleaving we may have:
      • Philosopher 0 picks up CS0 first (left), then CS1 (right)
      • Philosopher 1 waits for PH0 so it can grab CS1 first; once PH0 finishes eating, PH1 picks up CS1 first, then CS2
      • PH2 waits for PH1 so it can grab CS2 first; once PH1 finishes eating, PH2 picks up CS2 first, then CS3
      • PH3 waits for PH2 so it can grab CS3 first; once PH2 finishes eating, PH3 picks up CS3 first, then CS4
      • PH4 has to pick up CS0 first (its lowest-numbered chopstick), then CS4
    • Philosopher 4 first tries to pick up CS0 and then CS4; if Philosopher 0 has already picked up CS0, PH4 blocks on CS0 and never picks up CS4 either, which leaves Philosopher 3 free to pick up both of his chopsticks
    • no deadlock, but philosopher 4 may starve
    • Philosopher 4 effectively gets the lowest priority; he has to wait for the others most of the time
    • starvation: some goroutines may not get to execute as often as others
    • this happens when we have this kind of circular dependency
    • starvation is not ideal, but deadlock is the worst
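
  • a sketch of Dijkstra's fix (my reconstruction, not the course's code): give each chopstick a number and always lock the lower-numbered one first, which breaks the circular wait

package main

import (
   "fmt"
   "sync"
)

type ChopS struct {
   sync.Mutex
   id int
}

type Philo struct {
   leftCS, rightCS *ChopS
}

func (p Philo) eat() {
   for {
      first, second := p.leftCS, p.rightCS
      if second.id < first.id { // always grab the lowest-numbered chopstick first
         first, second = second, first
      }
      first.Lock()
      second.Lock()
      fmt.Println("eating")
      second.Unlock()
      first.Unlock()
   }
}

func main() {
   CSticks := make([]*ChopS, 5)
   for i := 0; i < 5; i++ {
      CSticks[i] = &ChopS{id: i}
   }
   philos := make([]*Philo, 5)
   for i := 0; i < 5; i++ {
      philos[i] = &Philo{CSticks[i], CSticks[(i+1)%5]}
   }
   for i := 0; i < 5; i++ {
      go philos[i].eat()
   }
   select {} // block forever; the philosophers keep eating without deadlock
}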