Andrzej Pragacz

Learning Rust while solving Advent of Code puzzles (a post mortem)

Did you try to learn the new, shiny, language called Rust?

I wanted to learn Rust for some time, but wasn't motivated enough. Finally, there was an opportunity: Every year in December, there is a coding challenge called Advent of Code. So I decided I will use edition 2018 as a motivator to learn new programming language.

Disclaimer 1: Unfortunately I wasn't able to finish all tasks on time (till 25 December 2018). I will not get into the reasons why. This post is about things I learned about the Rust language, not the solutions for given puzzles.

If you're interested about the solutions, I posted some of them here.

Disclaimer 2: I used Rust compiler version 1.30.1. So some things I wrote below may not apply to the subsequent versions.

So I started reading "The Book" and coding. In this blog post I will mention various things I unexpectedly ran into during coding.

Moving semantics

This is the first thing you notice when you will start coding in Rust. In most languages, when you pass some parameter to a function then it is copied. Even when the parameter is a reference to some object located elsewhere, and the actual object does not get copied, semantically speaking the reference is "copied" (some people call it "Call by sharing").

However, in Rust the parameters are moved by default. This gives more control for the user when the data should be copied, which has performance and memory usage implications.

you can achieve similar result in C++ by using std::move but the difference is that in Rust you cannot reuse the moved variable anymore (it will cause a compilation error), which will spare you potentially a lot of trouble.

If you want your type to have the copy semantics by default, there is a way to do it - it needs to implement special Copy trait (We will talk about traits later but for now you may think about it as more exquisite version of Java interface). The caveat this that for you custom types you don't implement it directly, but rather implement Clone trait, so the clone method from it used implictly when you pass the value to function or assign it to another variable. Then you use special derive directive with Copy. In our example we also used the derive to automatically provide implementation for Clone:

#[derive(Clone, Copy)]
struct Point {
    x: i32,
    y: i32,
}

Then:

let p = Point {x: 0, y: 0};
let p_copy = p;
// `p` can be still used because it was copied, not moved.

Some built-in types, which you could consider as "primitive" are already implementing the Copy trait. Examples would be signed integers ( i16, i32), unsigned integers (usize, u8, u32), floats ( f32, f64) and others (char, bool).

As you may imagine, you shouldn't mark every struct you make with the Copy trait - only the ones where the cost of copying whole value is comparable to using the references. It's usually better to have only the Clone trait implemented/derived and use the clone() method explicitly when needed.

So now, let's talk about the references.

References

Because of the moving semantics, in a lot of cases you end up passing parameters as references. In Rust, you need to explicitly state that by using the & operator:

fn do_stuff(numbers: &Vec<i32>)  {
    ...
}

fn main() {
    let numbers = vec![0, 1, 1, 2, 3];
    do_stuff(&numbers); // referencing here
}

The vec! is a macro to easily define a vector. We will mention macros later.

The reference gives you read-only access to the object. You can also use mutable references but more about that later.

What is important to know that you do need to know that the . attribute access operator will work the same whether you're using object as value (for instance vector of integers which has type Vec<i32>) or reference (type &Vec<i32> appropriately).

Sometimes, you will need to dereference it using the * operator. In some cases, dereference will not be needed because operation (like addition) is working for both cases:

fn sum1(numbers: &Vec<i32>) -> i32  {
    let mut acc = 0;
    for el in numbers {
        // The `el` will be of type `&i32` because we are
        // passing `numbers` as a reference.
        acc += *el;
    }
    return acc;
}

fn sum2(numbers: &Vec<i32>) -> i32  {
    let mut acc = 0;
    for el in numbers {
        // The `el` will be of type `&i32`
        acc += el;
    }
    return acc;
}
// both sum1 and sum2 will compile without errors

But if you want to perform other operation (like inserting value into a set which expects you to move the value), you will need in most cases to dereference it:

use std::collections::HashSet;

fn count_unique(numbers: &Vec<i32>) -> usize  {
    let mut unique_numbers: HashSet<i32> = HashSet::new();
    for el in numbers {
        unique_numbers.insert(*el);
    }
    return unique_numbers.len();
}

If you remove dereference from *el you compilation will fail and compiler will be complaining about "mismatched types".

As you may imagine, you can have multi-level references (e.g &&Vec<i32> is legitimate type). So the references behave somewhat like the pointers in C/C++, except you don't have the dangerous pointer arithmetic and the arrow access operator (->).

The other confusing aspect of references is that they may appear unexpectedly in inferred types.

For instance if you remove the dereferencing operator from *el from the function above you could make count_unique working by removing also the type of unique_numbers:

use std::collections::HashSet;

fn count_unique(numbers: &Vec<i32>) -> usize  {
    let mut unique_numbers = HashSet::new();
    for el in numbers {
        unique_numbers.insert(el);
    }
    return unique_numbers.len();
}

The reason is that Rust will infer the type of unique_numbers to be HashSet<&i32> instead of HashSet<i32>.

Also the references may unexpectedly appear in the return type of some functions. One example could be the iter() function. The code below will not work:

let numbers = vec![1,2,3,4,5,6,7,8,9,10];
let even_numbers: Vec<i32> = numbers.iter()
    .filter(|n| n % 2 == 0).collect();

The reason for that is that n will be a reference and will not play well with % operator. Also, the resulting collection should be of elements of type i32, not &i32.

You can fix the problem by using cloned() method which would effectively clone all elements of the iterator (the type of element should implement clone() method from the Clone trait).

let even_numbers: Vec<i32> = numbers.iter().cloned()
    .filter(|n| n % 2 == 0).collect();

Another way to fix the problem would be to use the into_iter() method, which will move the iterated collection so it can't be reused.

let even_numbers: Vec<i32> = numbers.into_iter()
    .filter(|n| n % 2 == 0).collect();

Return types are important

As you noticed, there is type inferencing in Rust. But in contrary to languages like Scala, you cannot omit the type of the value returned by function.

For instance, ommitting the return type of function:

fn count_unique(numbers: &Vec<i32>)  {
    ...
}

will make it equivalent to function

fn count_unique(numbers: &Vec<i32>) -> () {
    ...
}

where () is the unit type. Basically function with unit return type is equivalent to a function without return value (like C/C++ function with void return type).

I was bitten a lot of times because I forgot to specify return type of function (this resulted in error message like this: expected (), found i32).

Semicolons are important

Rust is very strict about semicolons. If you forget to add a semicolon you will end up with following error message: expected one of ., ;, ?, or an operator, found <something>.

Semicolons are important: for instance, if you not use semicolon on the last line (or in other words: last expression) of your function, then the value of the expression will be considered the return value of the function (even without return keyword used):

fn add_numbers(a: i32, b: i32) -> i32 {
    a + b
}

In contrast, when you mistakenly add an excessive semicolon:

fn add_numbers(a: i32, b: i32) -> i32 {
    a + b;
}

You will end up with confusing error message expected i32, found ().

No functions overloading

Let's say that you want add_numbers to support two types. If you come from C++/Java background, you would try something like this:

fn add_numbers(a: i32, b: i32) -> i32 {
    a + b
}

fn add_numbers(a: f64, b: f64) -> f64 {
    a + b
}

If you try to compile it, you will get error message:

`add_numbers` must be defined only once in the value namespace of this module

Which basically means that you can't overload a function with a different type. What you can do is you can obviously change the names of the functions:

fn add_int_numbers(a: i32, b: i32) -> i32 {
    a + b
}

fn add_float_numbers(a: f64, b: f64) -> f64 {
    a + b
}

Or make the function a generic:

use std::ops;

fn add_numbers<T: ops::Add>(a: T, b: T) -> T::Output {
    a + b
}

Using T::Output is a quirk of std::ops::Add trait - in most cases T::Output will be the same type as T.

No default parameter values

What is even more surprising is that in Rust there are no default parameter values, so something like this will not work:

fn sum(numbers: &Vec<i32>, initial_value: i32 = 0) -> i32  {
    let mut acc = initial_value;
    for el in numbers {
        acc += *el;
    }
    return acc;
}

Again, the way to go around this is use different function name for the 2-parameter function (in this example it could be sum_with_initial_value) and call this function from the 1-parameter function with the "default" parameter.

In case of creating an object you probably want to use builder pattern.

Now, the Borrow checker

A lot of people complain that this is the first problem which they encounter when using Rust for the first time. In my case I encountered the problems later.

I think the most problems happen when you need to use mutable reference (of type &mut T). the reason of that is that mutable reference is exclusive, so you can't get any other reference to the object, including immutable references (of type &T).

One example is the following. Let's say that you have a graph defined as adjacency lists, with graph node struct like this:

#[derive(Debug)]
struct GraphNode {
    id: char,
    edges_to: BTreeSet<char>,
    num_edges_from: usize,
}

And the graph defined like this (dependencies is of type Vec<(char, char)>):

let mut graph: HashMap<char, GraphNode> = HashMap::new();
for (a, b) in &dependencies {
    graph.entry(*a).or_insert(create_graph_node(*a));
    graph.entry(*b).or_insert(create_graph_node(*b));
}

If you tried something like this, the borrow checker would start to complain:

for (a, b) in &dependencies {
    let a_node: &mut GraphNode = graph.get_mut(a).unwrap();
    a_node.edges_to.insert(*b);
    let b_node: &mut GraphNode = graph.get_mut(b).unwrap();
    b_node.num_edges_from += 1;
}

To avoid the problems, you need to put some instructios in separate blocks:

for (a, b) in &dependencies {
    {
        let a_node: &mut GraphNode = graph.get_mut(a).unwrap();
        a_node.edges_to.insert(*b);
    }
    {
        let b_node: &mut GraphNode = graph.get_mut(b).unwrap();
        b_node.num_edges_from += 1;
    }
}

Let's look at another example. Lets say that we want to simulate groups of microbes attacking each other. We will skip all the details and just assume that the attack on group g1 by group g2 is simulated by calling g1.deal_damage_by(&g2). Below is the skeleton code:

type GroupId = usize;

#[derive(Eq, PartialEq, Clone, Debug)]
struct Group {
    ...
}
impl Group {
    fn deal_damage_by(&mut self, group: &Self) {
        ...
    }
}

Now let's say that we store the groups in a map under groups_map variable (groups_map is of type HashMap<GroupId, Group>).

We also calculated the attack sequence saing which group id attacks another group id (attack_sequence is of type Vec<(GroupId, GroupId)>).

Now let's say that we perform the actual attack:

for (group_id, selected_group_id) in attack_sequence {
    let group: &mut Group = groups_map.get_mut(&selected_group_id).unwrap();
    group.deal_damage_by(&groups_map.get(&group_id).unwrap());
}

Ooops. This time we will get following error from the compiler:

cannot borrow `groups_map` as immutable because it is
also borrowed as mutable

To fix the problem, we need to clone the group to get rid of the immutable borrow, by cloning the group:

for (group_id, selected_group_id) in attack_sequence {
    let damaging_group: Group = groups_map.get(&group_id).unwrap().clone();
    let group: &mut Group = groups_map.get_mut(&selected_group_id).unwrap();
    group.deal_damage_by(&damaging_group);
}

Now, let's talk about error handling in Rust.

No exceptions

You may ask yourself, how you can handle errors. In most modern languages you use the exceptions mechanism to do that.

If you ever wrote code without exceptions (for instance, in pure C), you're probably aware that is quite hard to to because:

  1. It is difficult to remember to handle each function returring error code as return value. Therefore it is possible to omit handling an error and end up in a terrible situation
  2. Even if one is careful enough to handle all errors, the code will end up as a clutterly pile of if statements handling the errors.

Rust does not use exceptions. Instead it uses special enum wrapper type Result<T, E> where T is the expected type, and E is error type.

How Rust handles the two problems mentioned above?

First of all, it does not allow that any expression separated by the semicolon will be of type Result; in other words, you need to transform the result by for instance using methods like .unwrap(), .expect() or performing a pattern match.

OK, but how do you avoid writing annoying checks for each function call returning a Result? There is a special macro ? which as a expression suffix allows you to simplify the code of handling the potential error.

Let's say that we want to able to parse a string into a Point structure;

let p: Point = "1,2".parse().expect("not a valid point");

like one below:

#[derive(Clone)]
struct Point {
    x: i32,
    y: i32,
}

To do that, instead of directly implementing parse method, we need to implement std::str::FromStr trait for our Point structure:

use std::str::FromStr;

#[derive(Clone)]
enum ParsePointError{
    InvalidFormat(String),
    InvalidNumber(String),
}

impl FromStr for Point {
    type Err = ParsePointError;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        let segments: Vec<_> = s.split(',').collect();
        let coords: Vec<Result<i32, _>> = segments
            .iter()
            .map(|seg| Self::parse_number(seg)).collect();
        if coords.len() != 2 {
            return Err(ParsePointError::InvalidFormat(
                s.to_string()))
        }
        return Ok(
            Point {
                // using macro '?':
                x: coords[0].clone()?,
                y: coords[1].clone()?,
            }
        );
    }
}

If coords[i] is an Err(...) then the '?' macro will cause the error to propagate so from_str will return this error.

You could notice that we used custom parse_number method instead of built-in parse method. The reason for that is that parse uses its custom std::num::ParseIntError as error type, which is incompatible with our ParsePointError, so we need to convert it manually:

impl Point {
    fn parse_number<T: FromStr>(s: &str) -> Result<T, ParsePointError> {
        match s.parse() {
            Ok(m) => Ok(m),
            Err(_) => Err(ParsePointError::InvalidNumber(
                s.to_string())),
        }
    }
}

Another way to solve this problem could be using map_err method from std::result::Result to map the error ParseIntError into ParsePointError.

Two versions of string type

This is somewhat confusing, because the string literals ("foo") are represented by the str type, usually in the borrowed form (of type &str). Yet you usually want to use the String type, which is a growable version. String is also the type you want to use as a building block of your structs or enums.

Because of the type inference I sometimes ended being confused whether the variable is of type &str or String.

Also in case of C++ you can cleary see the difference between char[n]/char* and std::string. In case of Rust it's not clear what str actually is, at least not at first glance.

Traits implementation

I mentioned earlier about traits. We also implemented std::str::FromStr trait for our Point struct.

Another example could be implementing addition using the + operator. This can be done by implementing std::ops::Add trait:

use std::ops;

...

impl ops::Add for Point {
    type Output = Point;

    fn add(self, other: Self) -> Self {
        Point {
            x: self.x + other.x,
            y: self.y + other.y,
        }
    }
}

You can also define complex types from built-in types with type keyword:

type TuplePoint = (i32, i32);

If you try to to implement std::ops::Add trait for it:

impl ops::Add for TuplePoint {
    type Output = TuplePoint;

    fn add(self, other: Self) -> Self {
        let (sx, sy) = self;
        let (ox, oy) = other;
        (sx + ox, sy + oy);
    }
}

The compiler will complain with following error: only traits defined in the current crate can be implemented for arbitrary types.

However, you can define your own trait and implement it for TuplePoint:

trait PointOps<T> {
    fn add(&self, other: &T) -> T;
}
impl PointOps<TuplePoint> for TuplePoint {

    fn add(&self, other: &Self) -> Self {
        let (sx, sy) = self;
        let (ox, oy) = other;
        (*sx + *ox, *sy + *oy)
    }
}

and then use it like this:

let p1: TuplePoint = (1, 2);
let p2: TuplePoint = (5, 3);
let p3 = p1.add(&p2);

Weird import requirements

Let's say that you want to read the standard input line by line. The easiest way to do that is to use .lock().lines():

let stdin = io::stdin();
for line in stdin.lock().lines() {
    let l = line.unwrap();
    ...
}

In this case it is obvious that you need to add use std::io; in your import section so io can be accessed in your code. What is less obvious that you need to add use std::io::BufRead; so you can use .lock() method.

Lifetime is confusing

At some point I wanted to pass an iterator as an function parameter. I tried to pass something of type Iterator<Item=&'a u32>, where 'a is the lifetime of the iterator.

It turned to be hard, so I ended up making a struct which contained the vector and current position, which 'emulated' the iteration behavior.

Let's look at another example. Let's say that we want to read standard input line by line, but this time we want also to trim any whitespace characters:

let stdin = io::stdin();
for line in stdin.lock().lines() {
    let l = line.unwrap().trim();
    ...
}

Suprisingly, this will not work, because compiler will complain with following error: temporary value does not live long enough.

The fix is simple in this case, we need additional temporary variable:

let stdin = io::stdin();
for line in stdin.lock().lines() {
    let tmp = line.unwrap();
    let l = l.trim();
    ...
}

The lifetime topic is complicated enough that explaining it could take separate blog post.

Various versions of callable types

I wanted to write a function which could find an element in a vector for which given function f would return the biggest (maximal) value. I wanted to be able to pass a closure (inline function) there.

fn argmax<T: Clone, S: cmp::Ord>(
        vec: &Vec<T>, f: Fn(&T) -> S) -> Option<T>

It turned out that there are several function types:

  • Fn(&T) -> S - immutable function
  • FnMut(mut &T) -> S - mutable function
  • FnOnce(&T) -> S - can be called only once

The closure is actually not of type Fn, but it rather seems to implement one or more of these function traits. Therefore the type of parameter f in function declaration becomes impl Fn(&T) -> S.

So the final implementation looks like this:

use std::cmp;

use std::cmp::Ordering;

fn argmax<T: Clone, S: cmp::Ord>(
        vec: &Vec<T>, f: impl Fn(&T) -> S) -> Option<T> {
    let mut best_el: Option<&T> = None;

    for el in vec {
        let val = f(el);
        match best_el.map(|el| f(el)) {
            None => {
                best_el = Some(el);
            },
            Some(bv) => {
                match bv.cmp(&val) {
                    Ordering::Less => {
                        best_el = Some(el);
                    },
                    _ => {},
                }
            },
        }
    }

    return best_el.cloned();
}

My guess is that the function can be improved by handling any iterator (we wouldn't need the dependency on Clone trait being implemented), but that may require improving my knowledge about handling lifetimes, which I mentioned earlier.

Batteries not included

I was surprised that there was no built-in library for regular expressions. you need to install separate cargo crate regex (crate is a package in Rust ecosystem) if you want this kind of support. This was annoying for me because I wanted to have the solution of each AoC task as a simple .rs source file.

This is probably because Rust positions itself as a system language. So you should expect similar things you may find in standard libraries of languages like C++ (STL).

Macros

A lot of times I forgot to add ! after println. The reason for the ! is that println! is a macro, not a function.

There are other useful macros:

  • vec! - allows to easily define a vector in a inline way
  • panic! - prints the message and exits for a given format string and parameters. It is very useful for prototyping, but not so much for production-grade code.
  • format! - generates a String for a given format string and parameters (works similar to println!, except it does not print the formatted message but returns it as a string)

The topic of Rust macros is also something which could fill a separate blog post.

Final thoughts

If you ask me, whether Rust is a good programming language for solving Advent of Code puzzles I would say it performed quite well. However, if your main focus is to solve all tasks on time, I would stick with a language you are fluent in (in my case it would be Python).

To summarize, Rust seems to be a very promising language. It has some quirks (some of them mentioned above). I imagine a lot of them has to do with the principle of making the language secure and less error-prone.

I'm looking to build more complex stuff using Rust. I'm sure that I will share my experiences in further blog posts!