Thursday, 21 February 2019

Concurrency in Go - Notes on Coursera course

Week 1 - Why Use Concurrency?

Parallel Execution


  • a big property of the Go language is that concurrency is built into the language
  • concurrency and parallelism are two closely related ideas
  • in C, Python, etc. you can do concurrent programming, but you need to use some 3rd-party library

Parallel Execution

  • parallel is not the same as concurrent
  • why do we need concurrency?
  • parallel execution is when two programs are executed at the same time
  • at some point in time instructions from two programs are executing in parallel
    • at time t an instruction is being performed for both P1 and P2
  • one processor core is executing 1 instruction at a time
  • if we want to have parallel execution, we need two processors or at least two processor cores
  • we need replicated hardware: e.g. CPU1 and CPU2
  • or if we have quad-core CPU then we can run 4 instructions in parallel, at the same time

Why use Parallel Execution?


  • tasks may be completed more quickly
  • you get a better throughput overall
  • example: two piles of dishes to wash
    • two dishwashers can complete twice as fast as one
  • BUT: some tasks must be performed sequentially
    • example: wash dish then dry dish; it has to be in this order
      • must wash before you can dry
  • some tasks are more parallelizable than others
    • we can't parallelize washing and drying of the same dish - even if we have a hundred dishwashers and sinks available
  • some things can't be parallelized

Von Neumann Bottleneck

  • writing concurrent code is hard
  • students usually learn only sequential programming in the undergraduate curriculum
  • concurrent programming is difficult
  • can we achieve speedup without Parallelism?
  • Solution 1: Design faster processors
    • get speedup without changing software
    • this is what used to be the case until recently
    • hardware kept getting faster and faster, but that has essentially stopped now
    • faster = higher processor clock rate; the clock rate used to double every year or so
  • the limitation on speed that we have now is called the Von Neumann Bottleneck: delayed access to memory
    • CPU has to access memory to get instructions and data
    • memory is always slower than CPU
    • even if CPU has high clock rate, it has to wait on memory access; lots of time is wasted just waiting for memory access
  • Solution 2: design processors with more memory
    • build caches - fast memory on the chip
    • that's what has traditionally been done until now
    • again, the same software, the same code, would run faster
  • but this doesn't work anymore either

Moore's Law

  • it is not a physical law but rather an observation of a trend that held in the past
  • Predicted that transistor density would double every two years
  • smaller transistors switch faster
  • the exponential increase in density used to lead to an exponential increase in software speed
  • this is no longer the case, so software engineers had to do something

Power Wall

Power/Temperature Problem

  • the speedup that we get from Moore's Law can't continue because transistors consume power and power is becoming a critical issue - Power Wall
  • transistors consume power whenever they switch 
  • as you increase the number of transistors, the power used also increases
    • smaller transistors use less power, but transistor density scales much faster
  • nowadays many devices are portable and run on batteries; there is only limited power available
  • even if a device is plugged into a socket, the issue is temperature
    • we need fans and heat sinks above processors such as the i7; without them the chip would melt
    • high power leads to high temperature
    • air cooling (fans) can only remove so much heat

Dynamic Power

  • generic equation for dynamic power: P = alpha * C * F * V^2
    • alpha - percent of time spent switching; transistors consume dynamic power only when they switch (go from 0 to 1 or from 1 to 0; when they don't switch, they don't use dynamic power)
    • C - capacitance (related to size; goes down as transistors shrink)
    • F - clock frequency (we want to increase frequency)
    • V - voltage swing from low to high (e.g. 0 is 0V and 1 is 5V); we want to reduce the voltage in order to reduce power, e.g. represent 1 with 0.3V - see the numeric sketch below
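  • a small numeric sketch of the V^2 dependence (the constants below are made up for illustration, not real hardware values): halving the voltage at the same frequency cuts dynamic power to a quarter

package main

import "fmt"

// dynamicPower computes P = alpha * C * F * V^2
// (all values passed in below are illustrative, not real hardware numbers)
func dynamicPower(alpha, c, f, v float64) float64 {
   return alpha * c * f * v * v
}

func main() {
   p1 := dynamicPower(0.1, 1e-9, 3e9, 1.0) // V = 1.0
   p2 := dynamicPower(0.1, 1e-9, 3e9, 0.5) // V = 0.5
   fmt.Println(p2 / p1) // 0.25: half the voltage => a quarter of the power
}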

Dennard Scaling

  • it is paired together with Moore's Law
  • Voltage should scale with transistor size
    • smaller transistor => voltage should be smaller
  • we want to scale down voltage to keep power consumption low
  • we want to have Dennard Scaling
  • but it can't go on forever:
  • Problem: Voltage can't go too low
    • must stay above threshold voltage otherwise transistor won't switch
    • noise problems occur; if there is noise in the system, the voltage can fluctuate up or down, and we can get wrong readings/errors
  • Problem: doesn't consider leakage power
    • leakage happens when you have thin insulators
    • as you scale everything down, insulators get thinner and leakage increases
  • Dennard scaling must stop

Multi-Core Systems

  • generic equation for Power: P = alpha * C * F * V^2
    • we can't increase frequency
    • frequency still goes up but with very slow rate
  • we then increase cores on chips, without increasing frequency => multi-core systems
    • they are still increasing density
  • if you have a 4-core system but run your program on only one core, you are wasting the 3 other cores, which sit idle
  • parallel execution is needed to exploit multi-core systems
  • code made to execute on multiple cores
    • we want to divide our program into smaller chunks which are executed in parallel, on different cores
  • different programs on different cores
  • parallel compilers: 
    • they take sequential code and parallelize it 
    • they chop it in tasks that can be run in parallel
    • this is extremely complex problem
    • this does not work that well
  • concurrent programming: it is actually a programmer who divides code into tasks that will be run in parallel

Concurrent vs Parallel

Concurrent Execution

  • concurrent execution is not necessarily the same as parallel execution
  • concurrent: start and end times overlap
    • they don't literally execute at the same time
    • if we take any point in time we can see that either task1 or task2 is executing
    • concurrent: tasks compete for chunks of time (CPU cycles), and during its chunk each task executes its own instructions, not those of the competing task
    • task1: --------       -------                           (time -->)
    • task2:           ----- 
  • parallel: execute at exactly the same time
    • if we take any point in time we can see that both task1 and task2 are executing
    • task1: ----------------                                 (time -->)
    • task2:         ----- 
  • the completion time for both tasks is longer in the concurrent case
  • why do concurrent execution? why not just run task2 after task1, if the joint execution time is the same?

Concurrent vs Parallel

  • parallel tasks need to be executed on different hardware (different cores)
  • concurrent tasks may be executed on the same hardware (e.g. on one core)
    • only one task is actually executed at a time
  • Mapping from tasks to hardware (which task is executed on which core) is not directly controlled by the programmer
    • at least not in Go

Concurrent Programming

  • concurrent programming is not about deciding which task is executed on which core
  • the programmer determines which tasks can be executed in parallel; e.g. task1 has to do some work before task2 can start, etc.
  • programmer describes what can be (not what will be) executed in parallel
    • programmer defines possible concurrency
  • what will be executed in parallel depends on how tasks are mapped to hardware
  • mapping task to hardware is done by:
    • operating system
    • Go runtime scheduler

Hiding Latency

  • let's say we do concurrent programming and have one core; so why bother doing concurrency at all?
  • even though we can't do parallel execution, we can still get significant performance improvements, because tasks typically have to wait periodically for slow events (e.g. input/output)
  • when the CPU has to communicate with memory, the network, the file system, the video card... all of this is slow
  • example: reading from memory
    • X = Y + Z
    • CPU has to read Y and Z from memory, do operation and write result back to memory
    • to read/write from/to memory, the CPU has to wait hundreds of cycles; in the meantime it could use those cycles to execute some useful work
  • other concurrent tasks can run while one task is waiting - see the sketch below
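  • a minimal sketch of hiding latency (my own example, not from the course): the sleep stands in for a slow I/O operation, and the second goroutine uses the CPU while the first one waits; sync.WaitGroup (covered in week 3) just keeps main alive until both finish

package main

import (
   "fmt"
   "sync"
   "time"
)

func main() {
   var wg sync.WaitGroup
   wg.Add(2)
   go func() { // task 1: blocked on "I/O" most of the time
      defer wg.Done()
      time.Sleep(50 * time.Millisecond) // stands in for a slow read
      fmt.Println("I/O done")
   }()
   go func() { // task 2: uses the CPU while task 1 waits
      defer wg.Done()
      sum := 0
      for i := 0; i < 1000000; i++ {
         sum += i
      }
      fmt.Println("sum:", sum)
   }()
   wg.Wait()
}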

Hardware Mapping 

  • Parallel Execution:
    • task1 --> Core1
    • task2 --> Core2
  • Concurrent Execution:
    • task1 ---/--> Core1
    • task2 --/

Hardware Mapping in Go

  • not under direct control of programmer in Go or any other standard language
  • programmer makes parallelism possible
  • doing this mapping by hand would be a very hard task and would slow programming down
  • it is hard because it depends on many factors
    • e.g. the underlying hardware architecture - programmers should not have to care about its details
  • simple arbitrary multi-core system:

core            core
  |               |
cache --------- cache
         |
   shared memory
         |
cache --------- cache
  |               |
core            core

  • cache - a local cache for each core
  • one big consideration when figuring out the hardware mapping is: where is the data?
    • if core 1 has to perform some task, we'd like the data to be in core 1's cache
    • if that data is in the cache of core 2, then core 2 should perform that task
      • a task should be performed on the core that holds its data
  • in Go we define which tasks can be executed in parallel


Week 2 - CONCURRENCY BASICS

Processes

  • a lot of concurrent execution ideas came from operating systems

Processes

  • an instance of a running program
  • things unique to a process
    • Memory
      • virtual address space 
      • code 
      • stack - region of memory which usually handles function calls
      • heap - for memory allocations
      • shared libraries - shared between processes
    • Registers - tiny, super-fast memory; each register stores a single value
      • Program counter - tells which instruction is executing now or which comes next
      • data registers
      • stack pointer
      • ...

Operating System

  • Allows many processes to execute concurrently
    • makes sure that virtual address spaces do not overlap
    • makes sure that all processes get a fair share of processor time and resources
    • these processes run concurrently; the OS switches between them quickly, e.g. every ~20ms
    • from the user's perspective it looks like all processes run in parallel, although they don't - the OS creates the illusion of parallelism

Task Manager

  • shows all running processes
    • foreground
    • background

Scheduling

  • one of the main tasks of OS
  • OS schedules processes for execution
  • Gives the illusion of parallel execution
...
process1
process2
process3
process1
process2
...

  • OS gives fair access to CPU, memory, etc...
  • there are many different scheduling algorithms
    • one is called Round Robin - processes simply alternate in a round fashion (1, 2, 3, 1, 2, 3, 1, 2, 3...) so every process gets the same chunk of time
    • if processes don't have the same priority, processes with higher priority get more CPU time - they are scheduled more frequently
  • embedded systems: some tasks are critical, e.g. braking - that would be a high-priority task, while playing stereo music would be a low-priority task

Context Switch

  • control flow changes from one process to another
  • e.g. switching from processA to processB
  • before each switch, the OS has to save the state of the currently running process and restore it the next time that process's execution resumes
  • this state is called the context - all the stuff unique to the process we listed before
  • process "context" must be swapped
...
processA
context switch
processB
context switch
processA
...
  • during context switch periods kernel of the OS is running
  • context switch usually happens after a timer times out

Threads vs Processes

  • there used to be only Processes
  • downside of processes: process switching time is long (switching between processes is slow because of memory access)
  • to speed this up: threads
  • a thread is like a process but it has less context; it shares some of the context with other threads in the same process
  • Parts of Process context shared among process threads:
    • Virtual memory
    • File descriptors
  • Specific (unique) parts of Process context for each thread:
    • Stack
    • Data registers
    • Program counter (PC)
  • Switching between threads is faster because there is less context - less data that has to be read/written from/to memory

Goroutines

  • a goroutine is basically like a thread in Go
  • many goroutines execute within a single OS thread
  • Go takes a process with its main thread and schedules/switches goroutines within that thread

Go Runtime Scheduler

  • schedules goroutines inside an OS thread (main thread)
  • like a little OS inside a single OS thread
  • Go runs on a main thread and switches goroutines on one thread
  • from OS point of view - there is only a single thread
       Main thread
            |
    Logical processor
            |
       Go runtime
            |
     /------+------\
    /       |       \
Goroutine1  Goroutine2  Goroutine3


  • Go runtime scheduler uses Logical Processor which is mapped to a thread
  • typically there is one Logical Processor which is mapped to a main thread
  • since all these goroutines run on one thread, we don't have parallelism, only concurrency
  • we can increase the number of Logical Processors - they get mapped to different threads, and the OS can map those threads to different cores
  • the program can determine how many Logical Processors there will be; with one, goroutines execute concurrently, with more than one they may execute in parallel (if the OS schedules the different threads on different cores). Historically the default was 1; since Go 1.5 the default is the number of cores - see the sketch below
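  • the number of Logical Processors can be queried and set with runtime.GOMAXPROCS; a small sketch (my own, not from the course):

package main

import (
   "fmt"
   "runtime"
)

func main() {
   fmt.Println("cores:", runtime.NumCPU())
   prev := runtime.GOMAXPROCS(1) // one logical processor: goroutines run concurrently, not in parallel
   fmt.Println("previous GOMAXPROCS:", prev)
   runtime.GOMAXPROCS(runtime.NumCPU()) // allow parallel execution across all cores
}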

Interleavings

  • writing concurrent code is hard, as it's mentally difficult to keep track of what's happening on which thread
  • the overall state of the machine is not deterministic
  • in case of crash, it can happen at different places
  • order of execution within task is known
  • order of execution between concurrent tasks is unknown
  • let's look at the instructions in two tasks:
Task1 

1: a = b + c
2: d = e + f
3: g = h + i

Task2

1: r = s + t
2: u = v + w
3: x = y + z

Possible Interleavings



1: a = b + c
             1: r = s + t
2: d = e + f
             2: u = v + w
3: g = h + i
             3: x = y + z

OR

1: a = b + c
2: d = e + f
3: g = h + i
             1: r = s + t
             2: u = v + w
             3: x = y + z

  • many interleavings are possible
  • must consider all possibilities
  • ordering is non-deterministic

Race Conditions

  • a problem that can occur because of these interleavings
  • the outcome of the program depends on the interleaving; interleavings are non-deterministic => the outcome is non-deterministic
  • in other words, the outcome of the program depends on a non-deterministic ordering
  • the interleaving can change every time we run the program
  • we want determinism: for the same set of inputs we want the same set of outputs
  • we want the outcome of the program not to depend on the interleaving
1st running - 1st interleaving combination

x = 1
         print x
x = x + 1

Output: 1

2nd running - 2nd interleaving combination

x = 1
x = x + 1
         print x

Output: 2
  • This needs to be avoided, prevented
  • races occur due to communication: two tasks are communicating through the shared variable x
  • if we didn't have this communication, there would be no race condition
  • communication between tasks is often unavoidable 

Communication Between Tasks

  • threads are largely independent, but not completely independent
  • Web server, one thread per client
Web                       Client 1
page  --> Web server -->  Client 2
Data                      Client 3
  • clients come in at the same time
  • Example: webpage shows visits counter
  • Image processing example: 1 thread per pixel block
thread1 --> [][] --> thread2
  • image processing is parallelizable, but there must be some level of communication between the threads, e.g. in the case of blurring, where neighbouring pixel blocks are needed

Week 3 - THREADS IN GO

Goroutines

  • to create threads of execution, we use constructs built into Go

Creating a goroutine

  • one goroutine is created automatically to execute the main()
  • other goroutines are created using the go keyword
  • code snippet from some main() function with only one goroutine (the one for main())
    • a is assigned 2 only after foo() returns
    • foo() blocks main()
a = 1
foo()
a = 2 
  • code snippet from some main() function with two goroutines
    • with go foo() we create a new goroutine
    • a might (but also might not) be assigned 2 before foo() returns - this depends on how the scheduler schedules the concurrent goroutines
    • foo() is non-blocking for main() - main() continues execution without waiting for foo() to return
a = 1
go foo()
a = 2 

Exiting a goroutine

  • a goroutine exits when the code of its associated function completes
  • when the main goroutine is complete, all other goroutines exit, even if they are not finished
  • a goroutine may not complete its execution because main completes early

Exiting goroutines

  • goroutines are forced to exit when main goroutine exits

Early Exit

func main() {
   go fmt.Printf("New routine")
   fmt.Printf("Main routine")
}

  • we don't know the order of execution of these routines - it is non-deterministic
  • we'd expect to see both messages, but actually only the 2nd message is printed (almost always)
  • this is because main() does not block after spawning the new goroutine - it goes straight on to print and return; the scheduler also seems to give preference to the main routine (though that may not be entirely accurate)
  • main() finishes before the new goroutine has a chance to run
  • we want main() to wait for other goroutines to complete

Delayed Exit

  • the following snippet shows a hacky/bad solution which might work:
func main() {
   go fmt.Printf("New routine")
   time.Sleep(100 * time.Millisecond)
   fmt.Printf("Main routine")
}
  • we put the main goroutine to sleep so the other goroutine can start or resume
  • now the 1st message is printed, then the 2nd
  • this is a hack and a bad solution, because we assume 100ms is enough for the 2nd goroutine to complete - but we don't actually know how long it will take. The OS might schedule another thread in place of the main Go thread for longer than that, or the Go runtime might schedule other goroutines first; we assumed neither would happen, which is bad!
  • timing is non-deterministic
  • we need formal synchronization constructs

Basic Synchronization

Synchronization

  • synchronization is when multiple threads agree on a timing of an event
  • there are global events whose execution is viewed by all threads, simultaneously
  • one goroutine does not know the timing of other goroutines
  • synchronization breaks that - it introduces some global events that every thread sees at the same time
  • this is important in order to restrict interleavings
  • e.g. two possible interleavings showing a race condition: the output depends on the interleaving, and interleavings/schedules are non-deterministic

1st running - 1st interleaving combination

Task 1    Task 2  
------    ------

x = 1
         print x
x = x + 1

Output: 1

2nd running - 2nd interleaving combination

x = 1
x = x + 1
         print x

  • we want to know what the intention of the programmer is
  • let's assume they want the printing to happen after x is incremented
  • synchronization is used to restrict the bad interleavings
  • synchronization:
    • we need some global event (GLOBAL EVENT) that both tasks/threads/goroutines see at the same time
    • Task 1: makes the event happen
    • Task 2: uses conditional execution:
      • depending on whether the event has happened, the goroutine waits or runs
      • if GLOBAL EVENT happened, we execute the printing
Task 1          Task 2  
------          ------

x = 1
x = x + 1
GLOBAL EVENT
              
             if GLOBAL EVENT
                print x
  • synchronization is the opposite of concurrency
  • synchronization makes some threads/goroutines wait
  • synchronization reduces the effective use of the hardware
  • but in some cases it is necessary - when things have to happen in order
  • synchronization is a necessary evil

Wait Groups

  • a commonly used type of synchronization

Sync WaitGroup

  • sync package contains functions to synchronize between goroutines
  • sync.WaitGroup forces a goroutine to wait for other goroutines
  • a wait group is like a group of goroutines that our goroutine has to wait for
    • our goroutine will not continue until all goroutines from the WaitGroup finish
  • we can wait on 1 or more other goroutines
  • in this example, we want the main goroutine to wait for the 2nd goroutine:
func main() {
   go fmt.Printf("New routine")
   fmt.Printf("Main routine")
}

  • WaitGroup contains an internal counter
    • increment the counter for each goroutine we want to wait for
      • if there are 3 goroutines, we'll increase it by 3
    • decrement the counter when each goroutine completes
    • the waiting goroutine waits until the counter becomes 0

Using WaitGroup

  • main() runs in main goroutine
  • foo() runs in 2nd (worker) goroutine

Main thread:

   var wg sync.WaitGroup
   wg.Add(1)
   go foo(&wg)
   wg.Wait()

Foo thread:

   wg.Done()

  • WaitGroup methods:
    • Add() increments the counter
    • Done() decrements the counter
    • Wait() blocks until counter == 0

WaitGroup Example


func foo(wg *sync.WaitGroup) {
   fmt.Printf("New routine")
   wg.Done()
}

func main() {
   var wg sync.WaitGroup
   wg.Add(1)
   go foo(&wg)
   wg.Wait()
   fmt.Printf("Main routine")
}
  • Output: "New routine" and then "Main routine"; a variant waiting on several goroutines is sketched below
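  • the same pattern scales to several goroutines; a sketch with three workers (worker() and its id are my names, not from the course):

package main

import (
   "fmt"
   "sync"
)

func worker(id int, wg *sync.WaitGroup) {
   defer wg.Done() // decrement the counter when this worker finishes
   fmt.Println("worker", id)
}

func main() {
   var wg sync.WaitGroup
   wg.Add(3) // one count per goroutine we wait for
   for i := 1; i <= 3; i++ {
      go worker(i, &wg)
   }
   wg.Wait() // blocks until the counter reaches 0
   fmt.Println("all workers done")
}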

Communication

Goroutine Communication


  • goroutines can wait for each other
  • but they can also communicate with each other
  • generally, goroutines work together to perform a bigger task
  • these goroutines are not completely independent
  • they are doing a small piece of a bigger task
  • e.g. web server
    • makes sense to create a new thread for each new browser connection
    • all these threads share some data; e.g. data sent from one browser may need to be seen by another browser => the threads have to cooperate
  • example: find the product of 4 integers
    • make 2 goroutines, each multiplies a pair
    • main goroutine multiplies the 2 results
    • need to send ints from the main goroutine to the two worker goroutines
    • need to send the results from the workers back to the main routine

Channels

  • used for communication between goroutines
  • used to transfer data between goroutines
  • channels are typed
    • one channel can handle integers, another strings etc...
  • use make() to create a channel:
c := make(chan int)


  • send and receive data using arrow operator (<-)
    • send data on a channel: 
c <- 3
    • receive data from a channel: 
x := <- c

Channel Example

func prod(v1 int, v2 int, c chan int) {
   c <- v1 * v2
}

func main() {
   c := make(chan int)
   go prod(1, 2, c)
   go prod(3, 4, c)
   a := <- c
   b := <- c
   fmt.Println(a * b)
}

  • a gets whatever first comes out of the channel
  • b gets whatever second comes out of the channel
  • there is also another way to send data to a goroutine - by passing arguments to it when it starts, as sketched below
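  • a sketch of that alternative (my own example): arguments are evaluated and copied when the goroutine is created, so this only passes data in at start time - it is not ongoing communication

package main

import "fmt"

func main() {
   done := make(chan bool)
   go func(v1, v2 int) { // v1 and v2 are copies made when the goroutine starts
      fmt.Println(v1 * v2)
      done <- true
   }(3, 4)
   <-done // wait for the goroutine to finish
}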

Blocking in Channels

Unbuffered Channel

  • by default, when channel is created (with make()), it is unbuffered
    • default is unbuffered
  • unbuffered channels can't hold data in transit
  • the implications are: 
    • sending blocks until the data is received
    • receiving blocks until data is sent
Task 1: c <- 3

one hour later...

Task 2: x := <- c

Blocking and Synchronization


  • this also performs synchronization, just like a WaitGroup
    • Task2 has to wait till Task1 sends data
  • channel communication is synchronous
  • Blocking is the same as waiting for communication
  • this kind of communication can be used purely for synchronization - we can freely drop the data and throw away the received result:
Task 1: c <- 3

one hour later...

Task 2: <- c

  • Task 2 is receiving the data but is throwing it away
  • All we do here is synchronizing two tasks: Task 2 has to wait for Task 1 to send the data
  • this is another way to implement WaitGroup's Wait() - see the sketch below
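  • a minimal sketch of that idea (the done channel and worker() are my names): main blocks on a receive instead of wg.Wait()

package main

import "fmt"

func worker(done chan bool) {
   fmt.Println("New routine")
   done <- true // signal completion; the value itself does not matter
}

func main() {
   done := make(chan bool)
   go worker(done)
   <-done // blocks until worker sends; the received value is thrown away
   fmt.Println("Main routine")
}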

Buffered Channel 

Channel Capacity

  • channels by default are unbuffered
    • they have no capacity to hold the data
    • unbuffered channels have capacity 0
  • channels can have some capacity
  • capacity is the number of objects channel can hold in transit
  • to make a channel buffered, we use the optional second argument of make(), which defines the channel capacity:
c := make(chan int, 3)
  • default size is 0
  • channel with some capacity still blocks under some conditions
  • sending only blocks if buffer is full
    • e.g. if we have 3 sends with no receives, new sends will be blocked
    • as soon as a new receive happens, the next, 4th send will unblock
  • receiving only blocks if the buffer is empty - if there is nothing in the buffer, the read operation blocks until there is something to read from the channel

Channel Blocking, Receive

  • channel with capacity 1
Task 1 -------> [    ] --------> Task 2

Task 1:
c <- 3

Task2:
a := <- c
b := <-c 

  • first receive blocks (in Task 2) until send occurs (in Task 1)
  • second receive blocks forever (in Task 2)

Channel Blocking, Send

  • channel with capacity 1
Task 1 -------> [    ] --------> Task 2

Task 1:
c <- 3
c <- 4

Task2:
a := <- c
  • second send blocks till first receive is done

Use of Buffering

  • buffering is used when the producer and consumer work at different speeds
  • the sender and receiver do not need to operate at exactly the same speed
  • the producer generates data
    • e.g. reading sensors, taking audio samples
    • it can do so continuously
  • the consumer processes the data
Producer -------> [|||||||||] --------> Consumer
  • buffering & blocking help equalize the speeds of the producer (sender) and the consumer (receiver)
  • if the producer produces data faster than the consumer can consume it, the producer blocks when the buffer is full, which slows its production down
  • if the consumer consumes data faster than the producer can produce it, the consumer blocks when the buffer is empty - see the sketch below
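  • a sketch of speed matching with a small buffer (sizes and sleep times are made up):

package main

import (
   "fmt"
   "time"
)

func main() {
   c := make(chan int, 3) // buffer of 3 absorbs short bursts from the producer
   go func() {            // producer: fast
      for i := 0; i < 6; i++ {
         c <- i // blocks only when the buffer is full
      }
   }()
   for i := 0; i < 6; i++ { // consumer: slow
      time.Sleep(10 * time.Millisecond)
      fmt.Println(<-c) // blocks only when the buffer is empty
   }
}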

Week 4 -  SYNCHRONIZED COMMUNICATION

Blocking on Channels

Iterating through a Channel


  • a common operation is to iteratively read from a channel
  • this would happen when we have producer and consumer
    • consumer wants to continuously receive data from a channel and process it
  • there is a construct in Go which is made specifically to do this:
for i := range c {
   fmt.Println(i)
}


  • this continues to read from channel c
  • one iteration each time a new value is received (becomes available in the channel)
  • i is assigned the received value
  • this for loop could be an infinite loop
    • to quit the loop, the sender can close the channel
    • the loop quits when the sender calls close(c)
    • close() is another operation that can be performed on a channel
    • when the sender closes the channel, that signals the receiver and the for loop ends
    • if we use range to read from a channel, the sender needs to call close(c) - see the sketch below
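  • putting it together, a sketch (produce() is my name) where the sender closes the channel to end the receiver's range loop:

package main

import "fmt"

func produce(c chan int) {
   for i := 1; i <= 3; i++ {
      c <- i
   }
   close(c) // signals the receiver that no more data is coming
}

func main() {
   c := make(chan int)
   go produce(c)
   for i := range c { // one iteration per received value; ends when c is closed
      fmt.Println(i)
   }
}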

Receiving from Multiple Goroutines

  • another common scenario is receiving from multiple goroutines, or from multiple channels associated with multiple goroutines
  • multiple channels may be used to receive from multiple sources
  • let's say we have 3 goroutines that are communicating with 2 channels
    • e.g. task3 tries to compute a product of two numbers, each coming from a different channel

task1 ----- c1 -----> task3 <----- c2 ------ task2

  • data from both sources might be needed
  • read sequentially
a := <- c1
b := <- c2
fmt.Println(a * b)

  • this is blocking
    • T3 first has to wait for data to appear on c1, and then for data to appear on c2
  • eventually T3 will get both pieces of data and complete its task

Select Statement

  • sometimes a task needs data from either channel: from c1 OR c2
  • if we have a choice of multiple channels and want to use whichever data comes first ("first come, first served"), no matter which channel it arrives on:
    • we don't want to read from all the channels
    • we don't want to block (wait for data) on one channel, e.g. c1, as data might never arrive there - in the meantime data might be available on another channel, e.g. c2
    • we don't know which channel the data will arrive on first
    • in this case we use the select statement:
select {
   case a = <- c1:
      fmt.Println(a)
   case b = <- c2:
      fmt.Println(b)
}
  • whichever case happens first, its print will be executed

Select Send or Receive

  • select allows choosing data from several channels
  • we don't have to block on all channels; effectively we block only until the first channel delivers
  • here we block on receiving data, but we can also block on sending data
  • with select, case can be on receiving or sending data:
select {
   case a = <- inchan:
      fmt.Println(a)
   case outchan <- b:
      fmt.Println("sent b")
}
  • if something comes in on inchan, a gets that value
  • we also want to write the value b to outchan
  • the receive on inchan blocks if no one is writing to the channel
  • the send on outchan blocks if no one is reading from the channel
  • either of these two actions (cases) can happen first, e.g. inchan might have data available before outchan is ready to accept a new value
  • whichever happens first, that case is executed
  • if data arrives on inchan before the send on outchan can proceed, then the first case is executed

Select with Abort Channel

  • one common use of select is to have a separate abort channel
  • producer-consumer scenario
  • use select with a separate abort channel
  • may want to receive data until an abort signal is received
for { // infinite loop
   select {
      case a := <- c:
         fmt.Println(a) // process data
      case <- abort:
         return // abort signal received
   }
}
  • if anything comes to an abort channel we quit the loop
  • we don't pay attention to what data comes in on the abort channel - we discard it

Default Select

  • we have regular cases, e.g. waiting on channels c1 and c2
  • default case: executed if no other case is satisfied
    • with a default case, select does not block at all!
select {
   case a = <- c1:
      fmt.Println(a)
   case b = <- c2:
      fmt.Println(b)
   default:
      fmt.Println("nop")
}

Mutual Exclusion

Goroutines Sharing Variables

  • sharing variables between goroutines (concurrently) can cause problems
  • two goroutines writing to the same shared variable can interfere with each other
  • a function/goroutine is said to be concurrency-safe if it can be executed concurrently with other goroutines without improperly interfering with them
    • e.g. it will not alter variables in other goroutines in some unexpected/unintended/unsafe way

Variable Sharing Example


var i int = 0 
var wg sync.WaitGroup

func inc() {
   i = i + 1
   wg.Done()
}

func main() {
   wg.Add(2)
   go inc()
   go inc()
   wg.Wait()
   fmt.Println(i)
}

  • two goroutines write to i
  • i should equal 2
  • BUT this doesn't always happen

Possible Interleavings


i = 0
Task1: i = i + 1
i = 1
Task2: i = i + 1
i = 2

i = 0
Task2: i = i + 1
i = 1
Task1: i = i + 1
i = 2

  • it seems like there is no problem
  • BUT that is deceiving, as there are more interleavings than we think

Granularity of Concurrency

  • concurrency is at the machine code level, NOT the source code level!
  • what gets interleaved is not Go source code statements but the underlying machine instructions
  • Go source code is compiled to machine code
  • machine instructions get interleaved
  • an interleaving can start in the middle of a Go source code statement
  • i = i + 1 might be mapped to three machine instructions:
    • read i (read the value from memory and place it in a register)
    • increment (in the register)
    • write i (write it back to memory)
  • interleaving happens at this level
  • interleaving machine instructions causes unexpected problems

Interleaving Machine Instructions

  • both tasks read 0 as the value of i
  • each task uses its own register for the increment
  • both tasks share the variable i
i == 0
Task1: read i // 0
Task2: read i // 0
Task1: inc // 1 
Task1: write // 1
i == 1
Task2: inc  // 1
Task2: write // overwrites 1 with the same value - 1
i == 1

Mutex

  • how do we share data correctly between two goroutines?
  • don't let two goroutines write to a shared variable at the same time
  • we need to restrict the possible interleavings so that the goroutines don't write to the shared variable at the same time
  • access to shared variables cannot be interleaved
  • Mutual Exclusion
    • declare code segments in different goroutines which cannot execute concurrently => they cannot be interleaved
  • writing to shared variables should be mutually exclusive

Sync.Mutex

  • A Mutex ensures mutual exclusion
  • it uses a binary semaphore - a flag that can be up or down
    • if the flag is up
      • the shared variable is in use by somebody
      • some goroutine is writing into it
      • only one goroutine can write into the variable at a time
      • before using the shared variable, a goroutine has to put the flag up
      • once a goroutine is done using the shared variable, it has to put the flag down
    • if the flag is down
      • the shared variable is available
      • if another goroutine sees that the flag is down, it knows it can use the shared variable, but first it has to put the flag up

Mutex Methods

  • putting the flag up and down is implemented by the methods Lock() and Unlock()
  • Lock()
    • puts the flag up (if no other goroutine has already put the flag up)
    • notifies others that the shared variable is in use
    • if a second goroutine also calls Lock(), it blocks; it has to wait until the first goroutine releases the lock
    • any number of goroutines (not just two) can compete to put the flag up
  • Unlock() puts the flag down
    • notifies others that the goroutine is done using the shared variable
  • when Unlock() is called, a blocked Lock() can proceed
  • so in a goroutine we put Lock() at the beginning of the mutually exclusive region and call Unlock() at the end; this ensures that only one goroutine is inside the mutually exclusive region at a time

Using Sync.Mutex

  • Let's fix the code with double increment
  • Increment operation is now mutually exclusive
var i int = 0
var mut sync.Mutex

func inc() {
   mut.Lock()
   i = i + 1
   mut.Unlock()
}
  • this ensures that reading i from memory, incrementing it, and writing the result back behave as one indivisible unit for each task: another goroutine may still be scheduled in between, but it cannot enter the locked region and touch i until Unlock() is called

Once Synchronization

Synchronous Initialization
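
  • initialization often has to happen exactly once, before anything else uses the shared state
  • the sync package provides sync.Once for this: once.Do(f) executes f only once, no matter how many goroutines call it, and other callers block until that first call completes
  • a minimal sketch (the function names are mine):

package main

import (
   "fmt"
   "sync"
)

var wg sync.WaitGroup
var once sync.Once

func setup() {
   fmt.Println("init") // printed exactly once
}

func worker() {
   defer wg.Done()
   once.Do(setup) // every worker calls this, but setup runs only on the first call
   fmt.Println("worker running")
}

func main() {
   wg.Add(2)
   go worker()
   go worker()
   wg.Wait()
}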

Tuesday, 19 February 2019

Functions, Methods, and Interfaces in Go - Notes on Coursera course

Week 1 - FUNCTIONS AND ORGANIZATION


Why Use Functions?


  • function - a set of instructions grouped together, usually with a name
  • all programs in Go must have a function main() - this is where the program's execution begins
func main() {
   fmt.Printf("Hello, world!")
}
  • the main() function is special in the sense that we don't call it explicitly, which is not the case for other functions:
func PrintHello() {
   fmt.Printf("Hello, world!")
}

func main() {
   PrintHello()
}
  • a function declaration starts with the keyword func, followed by the function name, arguments, return type, and body
  • Why using functions?
    • reusability (within the same project or across multiple projects via libraries)
    • abstraction
      • hiding details of implementation; only input-output behaviour is what we need to know (we look at function as a black box)
      • improves understandability
        • naming 
        • grouping of function calls


Function Parameters and Return Values

  • functions need some data to work on - they can be passed via function parameters
  • parameters are listed in parentheses after function name
  • arguments are supplied in the call
func foo(x int, y int) int {
   return x * y
}

func main() {
   foo(2, 3)
}
  • Parameter Options
    • if no parameters are needed, put nothing in the parentheses; you still need the parentheses
    • parameters of the same type can be listed together, sharing one type declaration:
func foo(x, y int) {
}
  • Return Values
    • functions can return a value as a result
    • the type of the return value comes after the parameters in the declaration
    • a function call can appear on the right side of an assignment
func foo(x int) int {
   return x + 1
}

y := foo(1) // y gets assigned value 2
  • Functions can have multiple return values
    • their types must be listed in the declaration
func foo2(x int) (int, int) {
   return x, x + 1
}

a, b := foo2(3) // a is assigned 3, b 4

Call by Value, Reference

  • how arguments are passed to parameters during a function call
  • Call by Value
    • arguments are copied to parameters
    • the data used is a copy of the original
    • the called function can't interfere with the original variables in the calling function
    • modifying parameters has no effect outside the function
func foo(y int) {
   y = y + 1 // modifies only the local copy
}

func main() {
   x := 2
   foo(x)
   fmt.Print(x) // still 2
}
  • tradeoffs of call by value
    • advantage: 
      • data encapsulation
      • function variables are changed only inside the function
    • disadvantage:
      • copying time
      • large object may take a long time to copy
  • Call by Reference
    •  programmer can pass a pointer as an argument
    • called function has direct access to caller variable in memory
    • we don't pass data but a pointer to data (the address of data)
func foo(y *int) {
   *y = *y + 1
}

func main() {
   x := 2
   foo(&x) // foo gets a copy of the address where x is in memory; foo is modifying x
   fmt.Print(x) // 3
}
  • tradeoffs of Call by Reference
    • advantage:
      • less copying time; only the address is copied, not the data
    • disadvantage:
      • (no) data encapsulation
      • called function can change data/variables in the calling function

Passing Arrays and Slices


  • how to pass an array to a function?
  • array arguments are copied
    • array can be big so this can be a problem
func foo(x [3]int) int { // NOTE size of array => argument is of type array (not slice)
   return x[0]
}

func main() {
   a := [3]int{1, 2, 3}
   fmt.Print(foo(a)) // entire array is copied!
}
  • instead, we can pass a reference - a pointer to the array (array pointer)
    • this can be messy because of all the explicit referencing and dereferencing
func foo(x *[3]int) {
   (*x)[0] = (*x)[0] + 1
}

func main() {
   a := [3]int{1, 2, 3}
   foo(&a)
   fmt.Print(a) // [2 2 3]
}
  • proper Go approach is to pass slices instead of arrays
  • in Go, get used to using slices instead of arrays! you can almost always use slices instead of arrays
  • slice is a structure that contains 3 things:
    • pointer to an underlying array
    • length
    • capacity
  • if we pass a slice, we actually pass a copy of the slice header, whose pointer still refers to the same underlying array
    • the function can use that pointer directly, without explicit referencing/dereferencing
func foo(sli []int) {
   sli[0] = sli[0] + 1
}

func main() {
   a := []int{1, 2, 3} // NOTE no size is specified between [] => this is a SLICE declaration!!!
   foo(a)
   fmt.Print(a) // [2 2 3]
}
  • instead of arrays or pointers to arrays pass slices to functions!!!

Well-Written Functions

  • how you should write functions so your code is well-organized and understandable
  • Understandability
    • code = functions + data
    • if you are asked to find a feature, you should be able to find it quickly; your peer reviewers should be able to find it quickly too
    • if you are asked where data is used (defined, accessed, modified), you should be able to find it quickly
  • Debugging Principles
    • e.g. code crashes inside a function
    • two options for a cause:
      • function is written incorrectly
        • e.g. sorts a slice in a wrong order
      • data that function uses is incorrect
        • e.g. sorts slice correctly but slice has wrong elements in it
  • Supporting Debugging
    • functions need to be understandable
      • determining whether the actual behaviour matches the desired behaviour should be easy
    • data needs to be traceable
      • we should be able to trace where the data came from
      • global variables complicate this
        • anybody can write into them
        • without globals, we know the data was passed in by the calling function

Guidelines for Functions

  • Function Naming
    • behaviour should be understandable at a glance
    • parameter naming counts too
func ProcessArray(a []int) float64 {}  // BAD! What kind of processing? What is the meaning of a?

func ComputeRMS(samples []float64) float64 {} // GOOD. We know what the function does and what the argument means

  • Functional Cohesion
    • function should perform only one "operation"
    • an "operation" depends on context
    • even from a function name we can see that it does only one thing
      • if you need to put two or more actions in a function name, that should raise an alert
    • merging behaviours makes code complicated
  • Reduce number of parameters
    • use few parameters
    • debugging requires tracing function input data
      • it's more difficult with large number of parameters
    • if your function has many parameters, it may have bad functional cohesion - it does too many things, and each functionality ("operation") needs its own set of inputs
    • group related arguments into a structure
      • e.g. if we want to pass 3 points in space to a function, we might need to pass 3 x 3 = 9 arguments
      • improvement: define a struct point and pass 3 points:
type point struct {x, y, z float64}
  • best: define a struct triangle and pass a triangle as a single argument (a usage sketch follows):
type triangle struct {p1, p2, p3 point}
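
  • a sketch of the improvement (perimeter() and dist() are my names, not from the course):

package main

import (
   "fmt"
   "math"
)

type point struct{ x, y, z float64 }
type triangle struct{ p1, p2, p3 point }

func dist(a, b point) float64 {
   return math.Sqrt(math.Pow(b.x-a.x, 2) + math.Pow(b.y-a.y, 2) + math.Pow(b.z-a.z, 2))
}

// one cohesive argument instead of nine raw float64 coordinates
func perimeter(t triangle) float64 {
   return dist(t.p1, t.p2) + dist(t.p2, t.p3) + dist(t.p3, t.p1)
}

func main() {
   t := triangle{point{0, 0, 0}, point{3, 0, 0}, point{0, 4, 0}}
   fmt.Println(perimeter(t)) // 12
}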


Function Guidelines

  • Function Complexity
    • function length is the most obvious measure
    • functions should not be complex; simple functions are easier to debug
    • short functions can be complex too
  • Function length
    • write complicated code with simple functions
    • function call hierarchy
  • Control-flow Complexity
    • how many control-flow paths are there in a function?
    • paths from the start to the end of function
    • if there are no if statements: one control-flow path
    • if there is if statement: two control-flow paths
    • control-flow describes conditional paths
    • a function call hierarchy can reduce control-flow complexity: separate conditional code into its own functions

Week 2 - FUNCTION TYPES

First-Class Values

  • Go treats functions as first-class values
  • functions are First-class
  • Go implements some features of functional programming
  • Go treats functions as any other type like int, float...
    • variables can be declared to be a function type and then assigned a function
    • functions can be created dynamically, on the fly
      • so far we've been creating them statically: in the global scope, with func declarations
      • but they can be created dynamically, inside other functions
    • functions can be passed as arguments to functions
    • functions can be returned from functions
    • functions can be stored in structs

Variables as Functions 

  • declare variable as function
    • the variable becomes an alias (another name) for that function
var funcVar func(int) int // "func(int) int" is function signature

func incFn(x int) int {
   return x + 1
}

func main() {
   funcVar = incFn // in assignment just use function name, without () as we're not calling function here
   fmt.Print(funcVar(1))
}

Functions as Arguments

  • functions can be passed to other functions as arguments
    • we have to use keyword func
func applyIt(afunc func(int) int, val int) int {
   return afunc(val)
}


func incFn(x int) int { return x + 1 }
func decFn(x int) int { return x - 1 }

func main() {
   fmt.Println(applyIt(incFn, 2)) // 3
   fmt.Println(applyIt(decFn, 2)) // 1
}

Anonymous Functions

  • functions don't need to have names
  • functions with no name are called anonymous 
  • when passing a function to another function, you usually don't need to name the passed function
    • the function is created right there, at the call site
    • this comes from lambda calculus
func main() {
   v := applyIt(func(x int) int { return x + 1 }, 2)
   fmt.Println(v) // 3
}

Returning Functions

Functions as Return Values

  • functions can create functions and return them
    • the returned function can have a different set of parameters, with some values baked in as controllable parameters
  •  example: Distance to Origin function
    • takes a point (x, y coordinates)
    • returns distance to origin
    • what if I want to change the origin?
      • option1: origin becomes a parameter
      • option 2: we create a function for each origin (o_x, o_y)
        • the origin is built into the returned function
        • func (float64, float64) float64 is the type of the function returned
func MakeDistOrigin(o_x, o_y float64) func (float64, float64) float64 {
   fn := func (x, y float64) float64 {
      return math.Sqrt(math.Pow(x - o_x, 2) + math.Pow(y - o_y, 2))
   }
   return fn
}

Special-Purpose Functions

  • we make special-purpose functions by baking in parameters (e.g. Dist1 and Dist2 have different origins)

func main() {
   Dist1 := MakeDistOrigin(0, 0)
   Dist2 := MakeDistOrigin(2, 2)
   fmt.Println(Dist1(2, 2))
   fmt.Println(Dist2(2, 2))
}

Environment (Scope) of a Function

  • every function has an environment ("scope")
  • the set of all names that are valid inside a function; the names you can refer to inside the function
  • environment includes names defined locally, in the function
  • Lexical Scoping: Go is lexically scoped 
    • environment includes names defined in block where the function is defined
    • BK: this is called variable capturing 
    • when you start passing functions around as arguments, the environment goes along with them
var x int
func foo(y int) {
   z := 1
   ...
}

Closure

  • function + its environment, together
  • in Go, it is implemented as a structure which contains pointer to function and pointer to environment
  • when you pass function as an argument to another function, you pass its environment with it
  • at the place where the function is executed, it still has access to variables from the place where it was defined
  • e.g. o_x and o_y are carried along with the returned function and are accessible when it is called later, wherever and whenever that happens
  • variables are coming from the closure, from the environment where function was defined
func MakeDistOrigin(o_x, o_y float64) func (float64, float64) float64 {
   fn := func (x, y float64) float64 {
      return math.Sqrt(math.Pow(x - o_x, 2) + math.Pow(y - o_y, 2))
   }
   return fn
}


Variadic and Deferred

Variable Argument Number

  • it is possible to pass a variable number of arguments to a function; such a function is called variadic
  • to specify this, use the ellipsis: ...
  • such an argument is treated as a slice inside the function
func getMax(vals ...int) int {
   maxV := -1
   for _, v := range vals {
      if v > maxV {
         maxV = v
      }
   }
   return maxV
}

  • How to pass list of arguments to variadic function?
    • you can pass a comma-separated list of arguments
    • you can pass a slice
      • need a ... suffix
func main() {
   fmt.Println(getMax(1, 2, 6, 4))
   vslice := []int {1, 3, 6, 4}
   fmt.Println(getMax(vslice...))
}

Deferred Function Calls


  • a call can be deferred until the surrounding function completes
  • deferred calls don't execute where they are written, but after the surrounding function is done
  • typically used for cleanup activities
  • use keyword defer
func main() {
   defer fmt.Println("Bye!") // "Bye!" printed after "Hello!"
   fmt.Println("Hello!")
}
  • the arguments are NOT evaluated in a deferred way - they are evaluated immediately, but the call itself is deferred
  • if you pass an argument, it is evaluated right where the defer statement is
func main() {
   i := 1
   defer fmt.Println(i + 1) // the argument is evaluated here (as 2), but printed last
   i++ // i == 2
   fmt.Println(i) // 2 is printed first
}


Week 3 - OBJECT ORIENTATION IN GO

Classes and Encapsulation

Classes 

  • What is OOP?
  • Go supports OOP
  • It does not have classes, but something equivalent (structs with methods)
  • What is a class? A collection of data fields and functions that share a well-defined responsibility (they are all related to the same concept)
    • function in a class is called a method
  • Example: Point class
    • used in geometry program
    • data: x and y coordinate
    • functions:
      • DistToOrigin(), Quadrant()
      • AddXOffset(), AddYOffset()
      • SetX(), SetY()
  • a class is a template; it defines the fields but contains no data

Object

  • instance of the class
  • contains data
  • Example: instances of Point class

Encapsulation

  • associated with OOP (and generally, with abstraction)
  • if there is a program using your class, you want to hide the implementation details
  • you want to prevent someone changing internal data; therefore we provide public methods that shall be used to modify the state of the object from the outside
  • Example: double distance to origin (double x and y)
    • option 1 (safe): expose method DoubleDist() which doubles x and y internally
    • option 2 (not safe): allow the programmer to access x and y directly; the programmer can then make a mistake, e.g. double x but forget to double y
    • by exposing methods instead, we prevent such mistakes, and the object always stays in a consistent state

Support for Classes (1)

No "class" keyword


  • there is no "class" keyword in Go
  • most OO languages have class keyword
  • Data fields and methods are defined inside a class block
  • example in Python:

class Point:
   def __init__(self, xval, yval):
      self.x = xval
      self.y = yval


Associating Methods with Data

  • Go has different way of associating methods with data
  • Go is using "receiver types"
  • data is some type 
  • method has a receiver type that it is associated with
  • BK: the approach is similar to C: we have some struct type, and a function that has to work with it just takes a pointer to that struct. The terminology reminds me of Objective-C
  • type and function have to be defined in the same package
  • when we call a method we use dot notation
  • example: we want to associate function Double with our custom type MyInt
    • MyInt is the receiver type - it is specified before the name of the function
    • mi is the receiver object (an instance of the receiver type) that Double() is called on
type MyInt int

func (mi MyInt) Double() int {
   return int(mi * 2)
}

func main() {
   v := MyInt(3)
   fmt.Println(v.Double())
}

  • Double() could be defined for multiple receiver types; Go looks at the type to the left of the dot (.) operator to determine the receiver type (and thus which Double() to call) - see the sketch below
  • in the example above, mi becomes an implicit argument of Double() (just like this - the pointer to the current instance - is an implicit argument of C++ class methods)
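  • a sketch of the same method name on two receiver types (MyFloat is my own addition); Go picks the method based on the type left of the dot:

package main

import "fmt"

type MyInt int
type MyFloat float64

func (mi MyInt) Double() int       { return int(mi * 2) }
func (mf MyFloat) Double() float64 { return float64(mf * 2) }

func main() {
   fmt.Println(MyInt(3).Double())     // 6, calls MyInt's Double()
   fmt.Println(MyFloat(1.5).Double()) // 3, calls MyFloat's Double()
}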

Implicit Method Arguments

  • although it seems that Double() takes no arguments, there is one implicit (hidden) argument: an instance (object) of the receiver type
  • object v is an implicit argument to the method
    • call by value (that's how argument passing is done in Go)
    • a copy of v is made and passed to the function

Support for Classes (2)

  • in a normal OOP language, lots of different data (the fields of a class) can be associated with any number of methods
  • the same can be done in Go: we just make a struct with all the various data and make it the receiver type
  • in a struct you can group together an arbitrary number of data fields

Structs, again

  • struct types compose data fields 
    • this is traditional feature of classes
type Point struct {
   x float64
   y float64
}

Structs with Methods

  • structs and methods together allow arbitrary data and functions to be composed
func (p Point) DistToOrig() float64 {
   t := math.Pow(p.x, 2) + math.Pow(p.y, 2)
   return math.Sqrt(t)
}

func main() {
   p1 := Point{3, 4}
   fmt.Println(p1.DistToOrig())
}

Encapsulation

Controlling Access

  • Go provides various kinds of support for encapsulation and for keeping data private
  • we want to be able to control data access
    • we want people to use data in a way we define - via functions/methods
    • we can define a set of public functions that allow another/external package to access the data
package data

var x int = 1

func PrintX() {
   fmt.Println(x)
}
-------------------
package main

import "data"

func main() {
   data.PrintX()
}
  • the PrintX function starts with a capital letter => it gets exported
  • package main can access (see) x only through that exported function
  • x can't be modified externally in the example above, but we could allow that by exporting another method that does it

Controlling Access to Structs

  • we can do something similar for struct members
  • hide fields of a struct by starting the field names with a lower-case letter
  • define public methods which access hidden data
  • example: need InitMe() to assign hidden data fields
package data

type Point struct {
   x float64
   y float64
}

func (p *Point) InitMe(xn, yn float64) {
   p.x = xn
   p.y = yn
}

func (p *Point) Scale (v float64) {
   p.x = p.x * v
   p.y = p.y * v
}

func (p *Point) PrintMe() {
   fmt.Println(p.x, p.y)
}

-----------------------------
package main
import "data"

func main() {
   var p data.Point
   p.InitMe(3, 4)
   p.Scale(2)
   p.PrintMe()
}

  • access to hidden fields only through public methods

Point Receivers

Limitations of Methods

  • there are some limitations to the process of associating methods with receiver types described above
(1) A method cannot change the state of the receiver object, as it is passed by value
  • the receiver type/object is implicitly passed to the method
  • it is passed by value (like any function argument in Go) => the method receives only a copy => the method can't change the receiver object (the data inside the receiver)
  • example: OffsetX() should increase coordinate x in object p1
func main() {
   p1 := Point{3, 4}
   p1.OffsetX(5)  // only the temporary copy of p1 inside OffsetX() is changed
}
(2) Large Receivers

  • if the receiver object is large, lots of copying is required when you make a call
  • the whole object is copied onto the stack
type image [100][100]int

func main() {
   i1 := GrabImage()
   i1.BlurImage() // the whole 10,000-element array gets copied onto the stack - this can be slow
}

Solution:

Pointer Receivers

  • instead of passing objects by value, we can pass them by reference (pointer)
  • instead of using a regular type as the receiver type, we use a pointer to that type
func (p *Point) OffsetX(v float64) {
   p.x = p.x + v
}

Point Receivers, Referencing, Dereferencing

No Need to Dereference

  • when using a pointer receiver, there is no need to dereference explicitly (as in the previous example, where the fields are accessed through p, not *p)
  • dereferencing is automatic with . operator

No Need to Reference


func main() {
   p := Point{3, 4}
   p.OffsetX(5)  // no need to write something like (&p).OffsetX(5)
   fmt.Println(p.x)
}

Using Pointer Receivers

  • Good programming practice
    • all methods for a type have pointer receivers, or 
    • all methods for a type have non-pointer receivers
  • this is because mixing pointer and non-pointer receivers for one type gets confusing
  • pointer receiver allows modification

Week 4 - INTERFACES FOR ABSTRACTION

Polymorphism

  • one of OOP properties
  • ability of an object to have different "forms" depending on the context
  • example: an Area() function - functions with the same name can do the same thing, but in different ways, depending on the context
    • rectangle: area = base * height
    • triangle: area = 0.5 * base * height
  • these two functions:
    • at a high level of abstraction they are identical: they do the same thing - compute the area
    • at a low level of abstraction they are different: they compute the area in different ways
  • We need Go's support for polymorphism to achieve this
  • How is polymorphism implemented in traditional OOP languages?

Inheritance

  • Go does NOT have inheritance
  • in traditional OOP languages there are parent-child (base-derived; superclass-subclass) relations between classes
    • superclass is a top level class
    • subclass extends from superclass, subclass inherits data and methods from a superclass
    • Example: 
      • Speaker superclass - represents anything that can make noise/speak
        • Speak() method prints "<noise>"
      • Subclasses Cat and Dog
        • Also have the Speak() method, inherited from the Speaker superclass
        • Cat and Dog are different forms of Speaker

Overriding

  • subclass redefines a method inherited from the superclass 
  • example: Speaker, Cat, Dog
    • Speaker Speak() prints "<noise>"
    • Cat Speak() prints "meow"
    • Dog Speak() prints "woof"
  • without overriding, the Cat and Dog Speak() methods would print "<noise>"; overriding allows them to redefine Speak() to print what they want
  • Speak() is polymorphic: in the context of Cat it prints "meow" and in the context of Dog it prints "woof"
  • although overridden in subclasses, the method keeps the same signature

Interfaces

  • interface is a concept used in Go to help us get polymorphism
  • we don't need inheritance or overriding to get polymorphism, we can get it with interfaces
  • interface is a set of method signatures (name, parameters, return values)
    • implementation is not defined
  • it is used to express conceptual similarity between types
  • example: Shape2D interface has two methods: Area() and Perimeter()
    • all 2D shapes must have Area() and Perimeter()
    • any type that has these two methods can be considered to be a 2D shape

Satisfying the Interface

  • a type satisfies an interface if it defines all methods specified in the interface
    • same method signatures (names, arguments, return values)
    • example: Rectangle and Triangle types satisfy the Shape2D interface if:
      • have Area() and Perimeter() methods
      • Additional methods are OK
  • similar to inheritance with overriding

Defining an Interface Type

  • use keyword interface

type Shape2D interface {
   Area() float64
   Perimeter() float64
}

type Triangle struct {
   ...
}

func (t Triangle) Area() float64 {
   ...
}

func (t Triangle) Perimeter() float64 {
   ...
}

  • in the example above, we can say that Triangle implements (satisfies) the interface Shape2D
  • we don't state explicitly that Triangle implements the interface; we just present the interface and the methods to the compiler and it infers which types satisfy which interfaces
  • we don't care what data is in Triangle (which fields/properties it has); all that matters is that the type has methods with the same signatures as those defined in the interface (see the runnable sketch below)
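
  • putting it together, a complete runnable sketch; the Triangle fields (side lengths) and the Heron's-formula implementation are assumed fill-ins for the elided bodies above:

package main

import (
   "fmt"
   "math"
)

type Shape2D interface {
   Area() float64
   Perimeter() float64
}

// side lengths are an assumed representation - the course leaves the fields out
type Triangle struct {
   a, b, c float64
}

func (t Triangle) Perimeter() float64 {
   return t.a + t.b + t.c
}

func (t Triangle) Area() float64 {
   s := t.Perimeter() / 2 // semi-perimeter for Heron's formula
   return math.Sqrt(s * (s - t.a) * (s - t.b) * (s - t.c))
}

func main() {
   var sh Shape2D = Triangle{3, 4, 5} // compiles only because Triangle satisfies Shape2D
   fmt.Println(sh.Area(), sh.Perimeter()) // 6 12
}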

Interface vs. Concrete Types

Concrete vs Interface Types

  • concrete and interface types are fundamentally different
  • Concrete Type
    • a regular type
    • Specifies the exact representation of the data and methods 
    • fully specified
    • complete method implementation included
    • it has data which is associated with it
  • Interface type:
    • just specifies some method signatures
    • not data
    • implementations are abstracted
    • interface type eventually gets mapped to a concrete type

Interface Values

  • can be treated like other values
    • assigned to variables
    • passed, returned
  • interface values have two components
    • Dynamic Type: concrete type which it is assigned to (like a class which implements an interface in classic OOP languages)
    • Dynamic Value: value of the dynamic type (like an instance of the class which implements an interface in classic OOP languages)
  • interface value is actually a pair (dynamic type, dynamic value)

Defining an Interface Type


type Speaker interface {
   Speak()
}

type Dog struct {
   name string
}

func (d Dog) Speak() {
   fmt.Println(d.name)
}

func main() {
   var s1 Speaker // interface value
   d1 := Dog{name: "Brian"}
   s1 = d1 // legal as Dog satisfies interface Speaker
   s1.Speak()
}
  • interface type is Speaker
  • interface value is s1
  • dynamic type is Dog
  • dynamic value is d1
  • the pair is (Dog, d1) - the sketch below makes this pair visible at run time
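
  • a quick runnable sketch (my own, reusing the definitions above) that shows the (dynamic type, dynamic value) pair via fmt's %T and %v verbs:

package main

import "fmt"

type Speaker interface {
   Speak()
}

type Dog struct {
   name string
}

func (d Dog) Speak() {
   fmt.Println(d.name)
}

func main() {
   var s1 Speaker = Dog{name: "Brian"}
   fmt.Printf("dynamic type: %T, dynamic value: %v\n", s1, s1)
   // prints: dynamic type: main.Dog, dynamic value: {Brian}
}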

Interface with Nil Dynamic Value

  • an interface can have a nil dynamic value (no dynamic value)
var s1 Speaker
var d1 *Dog // pointer
s1 = d1 // legal
  • d1 is a pointer to Dog; it points to no concrete object yet, so it carries no data
  • s1 has a dynamic type (*Dog) but NO dynamic value

Nil Dynamic Value

  • an interface with a dynamic type but no dynamic value; it is legal to call interface methods on a nil dynamic value
  • can still call the Speak() method of s1
  • doesn't need a dynamic value to call interface methods
  • need to check inside the method
func (d *Dog) Speak() {
   if d == nil { // does it have a dynamic value or not?
      fmt.Println("<noise>")
   } else {
      fmt.Println(d.name)
   }
}

var s1 Speaker
var d1 *Dog
s1 = d1
s1.Speak() // legal to call the method even though the receiver pointer is nil!

Nil Interface Value

  • interface with nil dynamic type
  • very different from an interface with a nil dynamic value
  • we can't call interface methods: there is no underlying concrete type, so there are no method implementations to call
  • Nil dynamic value and valid dynamic type
    • can call a method since type is known
    • example:
var s1 Speaker
var d1 *Dog
s1 = d1
s1.Speak()
  • Nil dynamic type
    • cannot call a method - causes a runtime error
var s1 Speaker
s1.Speak() // runtime error: nil dynamic type, nothing to call
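
  • both cases side by side in one runnable sketch (a consolidation of the snippets above); the last call is expected to panic:

package main

import "fmt"

type Speaker interface {
   Speak()
}

type Dog struct {
   name string
}

func (d *Dog) Speak() {
   if d == nil {
      fmt.Println("<noise>") // nil dynamic value - still callable
      return
   }
   fmt.Println(d.name)
}

func main() {
   var s1 Speaker
   var d1 *Dog
   s1 = d1
   s1.Speak() // OK: dynamic type *Dog is known - prints "<noise>"

   var s2 Speaker
   s2.Speak() // runtime panic: nil dynamic type, nothing to call
}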


Using Interfaces

  • why would we need interfaces? why are they used?

Ways to Use an Interface

  • sometimes we need a function which can take parameters of multiple types
    • e.g. we want a function to take either type int or type float64
  • we want function foo() to accept parameter of type X or type Y
  • we can:
    • define interface Z
    • make types X and Y satisfy Z (BK: this is something like class extensions...as any type can at any time be extended to satisfy any interface; this can happen after some concrete type is defined)
    • make Z to be the type of the foo() argument
    • interface methods must be those needed (called) by foo()

Pool in a Yard

  • I need to put a pool in my yard
  • Pool needs to fit in my yard
    • total area must be limited
  • Pool needs to be fenced
    • total perimeter must be limited
  • Need to determine if a pool shape satisfies criteria
  • FitInYard() bool
    • takes a shape as argument
    • returns true if the shape satisfies criteria

FitInYard()

  • many possible shape types
    • rectangle, triangle, circle, etc...
  • FitInYard() should take many shape types
  • Valid shape types must have:
    • Area()
    • Perimeter()
  • Any shape with these methods is OK

Interface for Shapes

type Shape2D interface {
   Area() float64
   Perimeter() float64
}

type Triangle struct {...}
func (t Triangle) Area() float64 {...}
func (t Triangle) Perimeter() float64 {...}

type Rectangle struct {...}
func (r Rectangle) Area() float64 {...}
func (r Rectangle) Perimeter() float64 {...}

  • Rectangle and Triangle satisfy interface Shape2D
func FitInYard(s Shape2D) bool {
   if s.Area() < 100 && s.Perimeter() < 100 {
      return true
   }
   return false
}
  • the parameter can be any type that satisfies the Shape2D interface (a complete runnable version follows below)
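
  • a complete runnable version (Rectangle's fields and the example dimensions are assumed for illustration):

package main

import "fmt"

type Shape2D interface {
   Area() float64
   Perimeter() float64
}

// width/height are an assumed representation
type Rectangle struct {
   w, h float64
}

func (r Rectangle) Area() float64      { return r.w * r.h }
func (r Rectangle) Perimeter() float64 { return 2 * (r.w + r.h) }

func FitInYard(s Shape2D) bool {
   return s.Area() < 100 && s.Perimeter() < 100
}

func main() {
   fmt.Println(FitInYard(Rectangle{5, 10}))  // true: area 50, perimeter 30
   fmt.Println(FitInYard(Rectangle{50, 10})) // false: area 500
}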

Empty Interface

  • specifies no methods
  • all types satisfy the empty interface
  • use it to have a function accept any type as a parameter
  • use interface{} to specify it
  • val can be any type
func PrintMe(val interface{}) {
   fmt.Println(val)
}
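
  • for example, PrintMe() can then be called with any concrete type:

func main() {
   PrintMe("hello")     // string
   PrintMe(42)          // int
   PrintMe([]int{1, 2}) // even a slice - every type satisfies the empty interface
}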

Type Assertions

Concealing Type Differences

  • much of the point of interfaces is to hide differences (or highlight similarities) between types
    • example: Triangles and Rectangles are treated in the same way in FitInYard() - as long as they satisfy the Shape2D interface
    • different types which have some similarities are treated in the same way
  • sometimes you need to treat different types in different ways
  • sometimes we need to differentiate based on the type, to figure out what the concrete type is
  • in FitInYard() it does not matter what is the concrete type; it can be Rectangle or Triangle

Exposing Type Differences

  • example: Graphics program
  • DrawShape() will draw any shape, it can take any Shape2D as an argument
    • func DrawShape(s Shape2D) { ...
  • Underlying API has different drawing functions for each shape so they have to take particular, specific, concrete types as arguments:
    • func DrawRect (r Rectangle) { ...
    • func DrawTriangle (t Triangle) { ...
  • inside DrawShape() we need to find out what the concrete type of s is, so we know which underlying function to call; the concrete type must be determined
  • type assertions are used for that

Type Assertions for Disambiguation

  • type assertions can be used to determine and extract the underlying concrete type 

func DrawShape(s Shape2D) bool {
   rect, ok := s.(Rectangle)
   if ok {
      DrawRect(rect)
      return true
   }
   tri, ok := s.(Triangle)
   if ok {
      DrawTriangle(tri)
      return true
   }
   return false
}

  • the type assertion extracts the Rectangle from the Shape2D
    • the concrete type has to be specified in parentheses
  • if the interface contains the specified concrete type:
    • rect == the dynamic value (with the specified concrete type)
    • ok == true
  • if the interface does not contain the specified concrete type:
    • rect == zero value of that type
    • ok == false
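
  • worth noting (not shown in the course snippet): the single-value form of a type assertion, without the ok flag, panics at run time if the assertion fails:

rect := s.(Rectangle) // panics if s does not actually hold a Rectangle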

Type Switch 


  • Interface can be satisfied by many concrete types; we might be interested only in some of them
  • Another way to do this disambiguation is to use switch
  • switch statement used with a type assertion
    • use keyword type in parentheses: .(type)
    • if s is Triangle then sh will be Triangle
func DrawShape(s Shape2D) bool {
   switch sh := s.(type) {
   case Rectangle:
      DrawRect(sh)
      return true
   case Triangle:
      DrawTriangle(sh)
      return true
   }
   return false
}
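
  • type switches are not limited to Shape2D; a small runnable sketch of my own, combining the empty interface with a default case:

package main

import "fmt"

func Describe(v interface{}) {
   switch x := v.(type) {
   case int:
      fmt.Println("int:", x)
   case string:
      fmt.Println("string:", x)
   default:
      fmt.Printf("some other type: %T\n", x) // default catches all remaining types
   }
}

func main() {
   Describe(42)      // int: 42
   Describe("hello") // string: hello
   Describe(3.14)    // some other type: float64
}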


Error Handling

  • a common use of interfaces in Go is the error interface

Error Interface

  • many functions in Go's built-in packages return two values:
    • a value
    • an error (an error interface object which indicates whether an error occurred)

type error interface {
   Error() string // returns the error message
}

  • correct / successful operation: error == nil
  • incorrect / failed operation: error != nil
  • if a Go function returns an error (usually as the second value) you should check that error and handle it! (BK: the compiler will complain if the returned error is assigned to a variable that is never used)
f, err := os.Open("harris/test.txt")
if err != nil {
   fmt.Println(err)
   return
}

  • check whether error is nil
  • if it's not nil, handle it
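
  • a minimal sketch of returning an error from your own function (Divide is an assumed example, not from the course), using errors.New from the standard library:

package main

import (
   "errors"
   "fmt"
)

func Divide(a, b float64) (float64, error) {
   if b == 0 {
      return 0, errors.New("division by zero") // failed operation: error != nil
   }
   return a / b, nil // successful operation: error == nil
}

func main() {
   q, err := Divide(10, 0)
   if err != nil {
      fmt.Println(err) // handle the error
      return
   }
   fmt.Println(q)
}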