How it works...
You can create a new thread by calling thread::spawn, which will then begin executing the provided lambda. This will return a JoinHandle, which you can use to, well, join the thread. Joining a thread means waiting for the thread to finish its work. If you don't join a thread, you have no guarantee of it actually ever finishing. This might be valid though when setting up threads to do tasks that never complete, such as listening for incoming connections.
Keep in mind that you cannot predetermine the order in which your threads will complete any work. In our example, it is impossible to foretell whether Hello from a new thread! or Hello from the main thread! is going to be printed first, although most of the time it will probably be the main thread, as the operating system needs to put some effort into spawning a new thread. This is the reason why small algorithms can be faster when not executed in parallel. Sometimes, the overhead of letting the OS spawn and manage new threads is just not worth it.
As demonstrated by line [49], joining a thread will return a Result that contains the value your lambda returned.
Threads can also be given names. Depending on your OS, in case of a crash, the name of the responsible thread will be displayed. In line [37], we call our new summation threads calculation. If one of them were to crash, we would be able to quickly identify the issue. Try it out for yourself, insert a call to panic!(); at the beginning of sum_bucket in order to intentionally crash the program and run it. If your OS supports named threads, you will now be told that your thread calculation panicked with an explicit panic.
parallel_sum is a function that takes a slice of integers and adds them together in parallel on four threads. If you have limited experience in working with parallel algorithms, this function will be hard to grasp at first. I invite you to copy it by hand into your text editor and play around with it in order to get a grasp on it. If you still feel a bit lost, don't worry, we will revisit parallelism again later.
Adapting algorithms to run in parallel normally comes at the risk of data races. A data race is defined as the behavior in a system where the output is dependent on the random timing of external events. In our case, having a data race would mean that multiple threads try to access and modify a resource at the same time. Normally, programmers have to analyze their usage of resources and use external tools in order to catch all of the data races. In contrast, Rust's compiler is smart enough to catch data races at compile time and stops if it finds one. This is the reason why we had to call .to_vec() in line [35]:
let bucket = range[count..count + bucket_size].to_vec();
We will cover vectors in a later recipe (the Using a vector section in Chapter 2, Working with Collections), so if you're curious about what is happening here, feel free to jump to Chapter 2, Working with Collections and come back again. The essence of it is that we're copying the data into bucket. If we instead passed a reference into sum_bucket in our new thread, we would have a problem, the memory referenced by range is only guaranteed to live inside of parallel_sum, but the threads we spawn are allowed to outlive their parent threads. This would mean that in theory, if we didn't join the threads at the right time, sum_bucket might get unlucky and get called late enough for range to be invalid.
This would then be a data race, as the outcome of our function would depend on the uncontrollable sequence in which our operating system decides to launch the threads.
But don't just take my word for it, try it yourself. Simply replace the aforementioned line with let bucket = &range[count..count + bucket_size]; and try to compile it.