Concurrency and Parallelism

Introduction

Concurrency and parallelism are a must for responsive applications today. Multicore CPUs and GPUs, which allow work to execute in parallel effectively, are now commonplace.

Before talking about parallelism and concurrency we need to define what a process is. A process is a program in execution with its own dedicated memory address space. A thread is a lightweight process that has its own call stack but shares memory with the process that created it.
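Foundation's Thread type makes that shared-memory relationship easy to see. A minimal sketch (the sleep is only a crude way to wait for the thread in a script):

import Foundation

// A thread shares the address space of the process that spawned it,
// so both threads see the same `shared` variable.
var shared = 0

let thread = Thread {
    shared = 42 // writes memory owned by the whole process
}
thread.start()

Thread.sleep(forTimeInterval: 0.1) // crude wait, for illustration only
print(shared) // prints 42 — the main thread sees the worker's write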

Threads allow multiple activities to proceed concurrently. Concurrent programming is harder than single-threaded programming because more things can go wrong and failures can be hard to reproduce. But you cannot avoid concurrency: it is inherent in much of what we do, and a requirement for getting good performance out of multicore processors. It is therefore important for modern programmers, in all industries including games, to have a solid understanding of parallel computing hardware and to be well versed in concurrent programming techniques.

Concurrency vs Parallelism

Concurrency and parallelism are often confused and sometimes used interchangeably, but they are different concepts.

Concurrency is when a computer appears to do many different things at the same time by context switching between threads. This is a feature of a multitasking operating system: it allows a single CPU to be shared by multiple processes, with only one thread actually executing at any given instant.

Parallelism refers to any situation in which two or more distinct hardware components are operating simultaneously. Devices with multicore CPUs can achieve parallelism: each core runs the instructions of a separate path of execution, so all of them make forward progress during the same instant.

The key difference between parallelism and concurrency is speedup. When two distinct paths of execution in a program make forward progress in parallel, the total work takes roughly half the time; execution is faster by a factor of two. In contrast, a concurrent program may run thousands of separate paths of execution seemingly in parallel yet gain no speedup on the total work, unless some of those threads would otherwise be blocked on I/O.
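To make the distinction concrete, here is a minimal sketch using DispatchQueue.concurrentPerform from GCD (introduced later in this post). The iteration counts are arbitrary, and the measured speedup depends on how many cores the machine has:

import Foundation

// A rough comparison of serial versus parallel execution of the same work.
func work(_ i: Int) -> Double {
    var x = 0.0
    for j in 0..<5_000_000 {
        x += Double(i * j)
    }
    return x
}

var results = [Double](repeating: 0, count: 8)

var start = DispatchTime.now()
for i in 0..<8 {
    results[i] = work(i) // one path of execution: one core at a time
}
print("serial:   \(Double(DispatchTime.now().uptimeNanoseconds - start.uptimeNanoseconds) / 1_000_000_000) s")

start = DispatchTime.now()
results.withUnsafeMutableBufferPointer { buffer in
    DispatchQueue.concurrentPerform(iterations: 8) { i in
        buffer[i] = work(i) // eight paths, spread across the cores
    }
}
print("parallel: \(Double(DispatchTime.now().uptimeNanoseconds - start.uptimeNanoseconds) / 1_000_000_000) s")

On a quad-core machine the parallel figure should come out at roughly a quarter of the serial one; on a single core the two would be about equal, which is exactly the "concurrency without speedup" case described above. The buffer pointer keeps the concurrent writes to distinct indices well-defined.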

Concurrency issues

Threads have their own call stack, but can also share data as shown in the figure.

[Figure: within each thread, functions execute sequentially on its own call stack (LIFO); across threads, functions execute concurrently, or in parallel on multicore CPUs.]

Therefore you will face basic problems such as:

  • Race condition: thread A updates shared data which is later changed by thread B without thread A being aware of it (a minimal sketch follows this list).
  • Stale data: thread A reads shared data believing it has the latest value, while thread B later updates the same data without informing thread A.
  • Deadlock: thread A holds resource 1 and waits for thread B to release resource 2, while thread B holds resource 2 and waits for thread A to release resource 1.
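Here is a minimal sketch of the first problem; the iteration count is arbitrary, and the lock-based fix is just one of several possible remedies:

import Foundation

var unsafeCounter = 0
// Race: many threads perform an unsynchronized read-modify-write,
// so increments get lost. (A thread sanitizer would flag this.)
DispatchQueue.concurrentPerform(iterations: 100_000) { _ in
    unsafeCounter += 1
}
print("racy total:   \(unsafeCounter)")   // almost certainly < 100000

var safeCounter = 0
let lock = NSLock()
DispatchQueue.concurrentPerform(iterations: 100_000) { _ in
    lock.lock()
    safeCounter += 1   // only one thread mutates at a time
    lock.unlock()
}
print("locked total: \(safeCounter)")     // always 100000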

Concurrency in Swift

Some modern languages, such as Erlang (and C# through libraries), have the actor model for concurrency. Unfortunately, Swift relies on the Foundation library and GCD for concurrency; hopefully a future version of Swift will gain native support. To get a head start, check out WWDC16 session 720, Concurrent Programming With GCD in Swift 3.

Grand Central Dispatch to the Rescue

Creating and managing threads is no easy or fun task. The burden of creating a scalable solution rests squarely on your shoulders. It is your responsibility to decide how many threads to create and adjust that number dynamically as system conditions change. Fortunately there is GCD.

GCD takes the headaches of threading off your hands: it creates the needed threads and schedules your tasks to run on them. Work that you would normally write in your own application moves down to the system level. It is no longer your problem; it is the OS's (macOS, iPadOS, iOS, tvOS, and watchOS) problem, and the OS will handle it.
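In practice that means handing closures to a queue. A minimal sketch (the queue label is just an arbitrary reverse-DNS name used for debugging):

import Foundation

let worker = DispatchQueue(label: "com.example.worker") // serial by default

worker.async {
    // Runs on a thread GCD creates and manages for us.
    print("background task")
}

// sync enqueues a task and waits for it; since the queue is serial,
// this runs only after the async task above has finished.
worker.sync {
    print("done")
}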

Coding

The fun part, actual code! We will start by writing the code in a completely traditional way, without any concurrency. The code simply creates a two-dimensional array and initializes each element with the product of its two indices. The running time is obviously O(rows × columns).

import Foundation

let row = 30000
let column = 40000
// Note: 30,000 × 40,000 Doubles is roughly 9.6 GB of memory.
var array = Array(repeating: Array<Double>(repeating: 0, count: column), count: row)

let start = DispatchTime.now() // <<<<<<<<<< Start time

for i in 0..<row {
    for j in 0..<column {
        array[i][j] = Double(i * j)
    }
}

let end = DispatchTime.now()   // <<<<<<<<<<   end time

let nanoTime = end.uptimeNanoseconds - start.uptimeNanoseconds // <<<<< Difference in nano seconds (UInt64)
let timeInterval = Double(nanoTime) / 1_000_000_000 // Technically could overflow for long running tests

print("Time to execute: \(timeInterval) seconds")

The output of the executed code on my iMac is

Time to execute: 609.508379696 seconds
Program ended with exit code: 0

Obviously the time depends on the machine. During execution, keep an eye on memory and CPU usage; if it is taking too long, reduce the number of rows and columns.

Next we will rework our code by creating a dispatch queue that executes each row as its own task. Because the queue is serial (the default), each row task finishes before the next one starts.

import Foundation

let row = 30000
let column = 40000
var array  = Array(repeating: Array<Double>(repeating: 0, count: column), count: row)

let queue = DispatchQueue(label: "com.lanterntech.rowWorker")

let start = DispatchTime.now() // <<<<<<<<<< Start time

for i in 0..<row {
    queue.async {
        for j in 0..<column {
            array[i][j] = Double(i * j)
        }
    }
}

let end = DispatchTime.now()   // <<<<<<<<<<   end time

let nanoTime = end.uptimeNanoseconds - start.uptimeNanoseconds // <<<<< Difference in nano seconds (UInt64)
let timeInterval = Double(nanoTime) / 1_000_000_000 // Technically could overflow for long running tests

print("Time to execute: \(timeInterval) seconds")

The output executed on the same machine:

Time to execute: 1.842261556 seconds
Program ended with exit code: 0

A dramatic improvement! Too good to be true? Unfortunately yes!

The code is correct and it executes fine; the problem is the timing. The measurement is wrong because the tasks have not finished: the row tasks were dispatched asynchronously to the background queue, and the main thread continued straight on to take the end time before that work was done. We need to wait for all the tasks to finish before taking the end time.

import Foundation

let row = 30000
let column = 40000
var array  = Array(repeating: Array<Double>(repeating: 0, count: column), count: row)

let queue = DispatchQueue(label: "com.lanterntech.rowWorker")
let group = DispatchGroup()

let start = DispatchTime.now() // <<<<<<<<<< Start time

for i in 0..<row {
    queue.async(group: group) {
        // async(group:) enters the group when the task is submitted and
        // leaves it automatically when the task finishes, so no manual
        // group.leave() is needed (an unbalanced leave() would crash).
        for j in 0..<column {
            array[i][j] = Double(i * j)
        }
    }
}

group.wait() // block the main thread until every task in the group has finished
let end = DispatchTime.now()   // <<<<<<<<<<   end time

let nanoTime = end.uptimeNanoseconds - start.uptimeNanoseconds // <<<<< Difference in nano seconds (UInt64)
let timeInterval = Double(nanoTime) / 1_000_000_000 // Technically could overflow for long running tests

print("Time to execute: \(timeInterval) seconds")
Time to execute: 301.481945919 seconds
Program ended with exit code: 0

Here we created a dispatch group and associated each row task with it through async(group:). GCD enters the group when a task is submitted and leaves it when the task finishes, so the group always knows how much work is still in flight. The wait method on the group then blocks the main thread until every task in the group has completed.
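For work that does not go through async(group:), such as an API that takes a completion handler, you can pair enter() and leave() manually. A minimal sketch with a hypothetical fetchValue function:

import Foundation

let group = DispatchGroup()

// Hypothetical completion-handler API, used purely for illustration.
func fetchValue(completion: @escaping (Int) -> Void) {
    DispatchQueue.global().async { completion(42) }
}

group.enter()            // announce one unit of in-flight work...
fetchValue { value in
    print("got \(value)")
    group.leave()        // ...and balance it exactly once when done
}

group.wait()             // returns once enters and leaves balance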

Conclusion

By utilizing concurrency and parallelism we gained a tremendous performance boost with minimal effort. In our next blog we will utilize the GPU for further gains.

Til next time, happy coding!
