Learning Rust while solving Advent of Code puzzles (a post mortem)
Did you try to learn the new, shiny, language called Rust?
I wanted to learn Rust for some time, but wasn't motivated enough. Finally, there was an opportunity: Every year in December, there is a coding challenge called Advent of Code. So I decided I will use edition 2018 as a motivator to learn new programming language.
Disclaimer 1: Unfortunately I wasn't able to finish all tasks on time (till 25 December 2018). I will not get into the reasons why. This post is about things I learned about the Rust language, not the solutions for given puzzles.
If you're interested about the solutions, I posted some of them here.
Disclaimer 2: I used Rust compiler version 1.30.1. So some things I wrote below may not apply to the subsequent versions.
So I started reading "The Book" and coding. In this blog post I will mention various things I unexpectedly ran into during coding.
Moving semantics
This is the first thing you notice when you will start coding in Rust. In most languages, when you pass some parameter to a function then it is copied. Even when the parameter is a reference to some object located elsewhere, and the actual object does not get copied, semantically speaking the reference is "copied" (some people call it "Call by sharing").
However, in Rust the parameters are moved by default. This gives more control for the user when the data should be copied, which has performance and memory usage implications.
you can achieve similar result in C++ by using std::move
but the difference
is that in Rust you cannot reuse the moved variable anymore (it will cause
a compilation error), which will spare you potentially a lot of trouble.
If you want your type to have the copy semantics by default, there is
a way to do it - it needs to implement special Copy
trait (We will talk
about traits later but for now you may think about it as more exquisite
version of Java interface).
The caveat this that for you custom types you don't implement it directly,
but rather implement Clone
trait, so the clone
method from it used
implictly when you pass the value to function or assign it to another variable.
Then you use special derive
directive with Copy
.
In our example we also used the derive
to automatically provide
implementation for Clone
:
#[derive(Clone, Copy)]
struct Point {
x: i32,
y: i32,
}
Then:
let p = Point {x: 0, y: 0};
let p_copy = p;
// `p` can be still used because it was copied, not moved.
Some built-in types, which you could consider as "primitive" are already
implementing the Copy
trait. Examples would be signed integers (
i16
, i32
), unsigned integers (usize
, u8
, u32
), floats (
f32
, f64
) and others (char
, bool
).
As you may imagine, you shouldn't mark every struct
you make with the
Copy
trait - only the ones where the cost of copying whole value
is comparable to using the references. It's usually better to have only the
Clone
trait implemented/derived and use the clone()
method explicitly
when needed.
So now, let's talk about the references.
References
Because of the moving semantics, in a lot of cases you end up
passing parameters as references.
In Rust, you need to explicitly state that by using the &
operator:
fn do_stuff(numbers: &Vec<i32>) {
...
}
fn main() {
let numbers = vec![0, 1, 1, 2, 3];
do_stuff(&numbers); // referencing here
}
The vec!
is a macro to easily define a vector. We will mention macros later.
The reference gives you read-only access to the object. You can also use mutable references but more about that later.
What is important to know that you do need to know that the .
attribute access
operator will work the same whether you're using object as value (for instance
vector of integers which has type Vec<i32>
) or reference (type &Vec<i32>
appropriately).
Sometimes, you will need to dereference it using the *
operator.
In some cases, dereference will not be needed because operation (like addition)
is working for both cases:
fn sum1(numbers: &Vec<i32>) -> i32 {
let mut acc = 0;
for el in numbers {
// The `el` will be of type `&i32` because we are
// passing `numbers` as a reference.
acc += *el;
}
return acc;
}
fn sum2(numbers: &Vec<i32>) -> i32 {
let mut acc = 0;
for el in numbers {
// The `el` will be of type `&i32`
acc += el;
}
return acc;
}
// both sum1 and sum2 will compile without errors
But if you want to perform other operation (like inserting value into a set which expects you to move the value), you will need in most cases to dereference it:
use std::collections::HashSet;
fn count_unique(numbers: &Vec<i32>) -> usize {
let mut unique_numbers: HashSet<i32> = HashSet::new();
for el in numbers {
unique_numbers.insert(*el);
}
return unique_numbers.len();
}
If you remove dereference from *el
you compilation will fail and compiler
will be complaining about "mismatched types".
As you may imagine, you can have multi-level references (e.g &&Vec<i32>
is
legitimate type).
So the references behave somewhat like the pointers in C/C++,
except you don't have the dangerous pointer arithmetic and the arrow access
operator (->
).
The other confusing aspect of references is that they may appear unexpectedly in inferred types.
For instance if you remove the dereferencing operator from *el
from
the function above you could make count_unique
working by removing also
the type of unique_numbers
:
use std::collections::HashSet;
fn count_unique(numbers: &Vec<i32>) -> usize {
let mut unique_numbers = HashSet::new();
for el in numbers {
unique_numbers.insert(el);
}
return unique_numbers.len();
}
The reason is that Rust will infer the type of unique_numbers
to be HashSet<&i32>
instead of HashSet<i32>
.
Also the references may unexpectedly appear in the return type of some functions.
One example could be the iter()
function. The code below will not work:
let numbers = vec![1,2,3,4,5,6,7,8,9,10];
let even_numbers: Vec<i32> = numbers.iter()
.filter(|n| n % 2 == 0).collect();
The reason for that is that n
will be a reference and will not play well
with %
operator. Also, the resulting collection should be of elements
of type i32
, not &i32
.
You can fix the problem by using cloned()
method which would effectively
clone all elements of the iterator (the type of element should implement
clone()
method from the Clone
trait).
let even_numbers: Vec<i32> = numbers.iter().cloned()
.filter(|n| n % 2 == 0).collect();
Another way to fix the problem would be to use the into_iter()
method, which
will move the iterated collection so it can't be reused.
let even_numbers: Vec<i32> = numbers.into_iter()
.filter(|n| n % 2 == 0).collect();
Return types are important
As you noticed, there is type inferencing in Rust. But in contrary to languages like Scala, you cannot omit the type of the value returned by function.
For instance, ommitting the return type of function:
fn count_unique(numbers: &Vec<i32>) {
...
}
will make it equivalent to function
fn count_unique(numbers: &Vec<i32>) -> () {
...
}
where ()
is the unit type.
Basically function with unit return type is equivalent to a function without
return value (like C/C++ function with void
return type).
I was bitten a lot of times because I forgot to specify return type of function
(this resulted in error message like this: expected (), found i32
).
Semicolons are important
Rust is very strict about semicolons. If you forget to add a semicolon
you will end up with following error message:
expected one of
.,
;,
?, or an operator, found <something>
.
Semicolons are important:
for instance, if you not use semicolon on the last line (or in other
words: last expression) of your function, then the value of the expression
will be considered the return value of the function (even without return
keyword used):
fn add_numbers(a: i32, b: i32) -> i32 {
a + b
}
In contrast, when you mistakenly add an excessive semicolon:
fn add_numbers(a: i32, b: i32) -> i32 {
a + b;
}
You will end up with confusing error message expected i32, found ()
.
No functions overloading
Let's say that you want add_numbers
to support two types.
If you come from C++/Java background, you would try something like this:
fn add_numbers(a: i32, b: i32) -> i32 {
a + b
}
fn add_numbers(a: f64, b: f64) -> f64 {
a + b
}
If you try to compile it, you will get error message:
`add_numbers` must be defined only once in the value namespace of this module
Which basically means that you can't overload a function with a different type. What you can do is you can obviously change the names of the functions:
fn add_int_numbers(a: i32, b: i32) -> i32 {
a + b
}
fn add_float_numbers(a: f64, b: f64) -> f64 {
a + b
}
Or make the function a generic:
use std::ops;
fn add_numbers<T: ops::Add>(a: T, b: T) -> T::Output {
a + b
}
Using T::Output
is a quirk of std::ops::Add
trait - in most cases
T::Output
will be the same type as T
.
No default parameter values
What is even more surprising is that in Rust there are no default parameter values, so something like this will not work:
fn sum(numbers: &Vec<i32>, initial_value: i32 = 0) -> i32 {
let mut acc = initial_value;
for el in numbers {
acc += *el;
}
return acc;
}
Again, the way to go around this is use different function name for the
2-parameter function (in this example it could be sum_with_initial_value
)
and call this function from the 1-parameter function with the "default"
parameter.
In case of creating an object you probably want to use builder pattern.
Now, the Borrow checker
A lot of people complain that this is the first problem which they encounter when using Rust for the first time. In my case I encountered the problems later.
I think the most problems happen when you need to use mutable reference
(of type &mut T
). the reason of that is that mutable reference is exclusive, so
you can't get any other reference to the object, including immutable references
(of type &T
).
One example is the following. Let's say that you have a graph defined as adjacency lists, with graph node struct like this:
#[derive(Debug)]
struct GraphNode {
id: char,
edges_to: BTreeSet<char>,
num_edges_from: usize,
}
And the graph defined like this (dependencies
is of type Vec<(char, char)>
):
let mut graph: HashMap<char, GraphNode> = HashMap::new();
for (a, b) in &dependencies {
graph.entry(*a).or_insert(create_graph_node(*a));
graph.entry(*b).or_insert(create_graph_node(*b));
}
If you tried something like this, the borrow checker would start to complain:
for (a, b) in &dependencies {
let a_node: &mut GraphNode = graph.get_mut(a).unwrap();
a_node.edges_to.insert(*b);
let b_node: &mut GraphNode = graph.get_mut(b).unwrap();
b_node.num_edges_from += 1;
}
To avoid the problems, you need to put some instructios in separate blocks:
for (a, b) in &dependencies {
{
let a_node: &mut GraphNode = graph.get_mut(a).unwrap();
a_node.edges_to.insert(*b);
}
{
let b_node: &mut GraphNode = graph.get_mut(b).unwrap();
b_node.num_edges_from += 1;
}
}
Let's look at another example. Lets say that we want to simulate groups of
microbes attacking each other. We will skip all the details and just assume
that the attack on group g1
by group g2
is simulated by calling
g1.deal_damage_by(&g2)
. Below is the skeleton code:
type GroupId = usize;
#[derive(Eq, PartialEq, Clone, Debug)]
struct Group {
...
}
impl Group {
fn deal_damage_by(&mut self, group: &Self) {
...
}
}
Now let's say that we store the groups in a map under groups_map
variable
(groups_map
is of type HashMap<GroupId, Group>
).
We also calculated the attack sequence saing which group id attacks another
group id (attack_sequence
is of type Vec<(GroupId, GroupId)>
).
Now let's say that we perform the actual attack:
for (group_id, selected_group_id) in attack_sequence {
let group: &mut Group = groups_map.get_mut(&selected_group_id).unwrap();
group.deal_damage_by(&groups_map.get(&group_id).unwrap());
}
Ooops. This time we will get following error from the compiler:
cannot borrow `groups_map` as immutable because it is
also borrowed as mutable
To fix the problem, we need to clone the group to get rid of the immutable borrow, by cloning the group:
for (group_id, selected_group_id) in attack_sequence {
let damaging_group: Group = groups_map.get(&group_id).unwrap().clone();
let group: &mut Group = groups_map.get_mut(&selected_group_id).unwrap();
group.deal_damage_by(&damaging_group);
}
Now, let's talk about error handling in Rust.
No exceptions
You may ask yourself, how you can handle errors. In most modern languages you use the exceptions mechanism to do that.
If you ever wrote code without exceptions (for instance, in pure C), you're probably aware that is quite hard to to because:
- It is difficult to remember to handle each function returring error code as return value. Therefore it is possible to omit handling an error and end up in a terrible situation
- Even if one is careful enough to handle all errors, the code will end up as a clutterly pile of if statements handling the errors.
Rust does not use exceptions. Instead it uses special enum wrapper type
Result<T, E>
where T
is the expected type, and E
is error type.
How Rust handles the two problems mentioned above?
First of all, it does not allow that any expression separated by the semicolon
will be of type Result
; in other words, you need to transform the result
by for instance using methods like .unwrap()
, .expect()
or performing
a pattern match.
OK, but how do you avoid writing annoying checks for each function call
returning a Result
? There is a special macro ?
which as a expression suffix
allows you to simplify the code of handling the potential error.
Let's say that we want to able to parse a string into a Point
structure;
let p: Point = "1,2".parse().expect("not a valid point");
like one below:
#[derive(Clone)]
struct Point {
x: i32,
y: i32,
}
To do that, instead of directly implementing parse
method, we need
to implement std::str::FromStr
trait for our Point
structure:
use std::str::FromStr;
#[derive(Clone)]
enum ParsePointError{
InvalidFormat(String),
InvalidNumber(String),
}
impl FromStr for Point {
type Err = ParsePointError;
fn from_str(s: &str) -> Result<Self, Self::Err> {
let segments: Vec<_> = s.split(',').collect();
let coords: Vec<Result<i32, _>> = segments
.iter()
.map(|seg| Self::parse_number(seg)).collect();
if coords.len() != 2 {
return Err(ParsePointError::InvalidFormat(
s.to_string()))
}
return Ok(
Point {
// using macro '?':
x: coords[0].clone()?,
y: coords[1].clone()?,
}
);
}
}
If coords[i]
is an Err(...)
then the '?' macro will cause the error to
propagate so from_str
will return this error.
You could notice that we used custom parse_number
method instead of built-in
parse
method. The reason for that is that parse
uses its custom
std::num::ParseIntError
as error type, which is incompatible with
our ParsePointError
, so we need to convert it manually:
impl Point {
fn parse_number<T: FromStr>(s: &str) -> Result<T, ParsePointError> {
match s.parse() {
Ok(m) => Ok(m),
Err(_) => Err(ParsePointError::InvalidNumber(
s.to_string())),
}
}
}
Another way to solve this problem could be using
map_err
method from std::result::Result
to map the error ParseIntError
into ParsePointError
.
Two versions of string type
This is somewhat confusing, because the string literals ("foo"
) are
represented by the str
type, usually in the borrowed form (of type &str
).
Yet you usually want to use the String
type, which is a growable version.
String
is also the type you want to use as a building block of your structs
or enums.
Because of the type inference I sometimes ended being confused whether
the variable is of type &str
or String
.
Also in case of C++ you can cleary see the difference between
char[n]
/char*
and std::string
. In case of Rust it's not clear
what str
actually is, at least not at first glance.
Traits implementation
I mentioned earlier about traits. We also implemented std::str::FromStr
trait for our Point
struct.
Another example could be implementing addition using the +
operator.
This can be done by implementing std::ops::Add
trait:
use std::ops;
...
impl ops::Add for Point {
type Output = Point;
fn add(self, other: Self) -> Self {
Point {
x: self.x + other.x,
y: self.y + other.y,
}
}
}
You can also define complex types from built-in types with type
keyword:
type TuplePoint = (i32, i32);
If you try to to implement std::ops::Add
trait for it:
impl ops::Add for TuplePoint {
type Output = TuplePoint;
fn add(self, other: Self) -> Self {
let (sx, sy) = self;
let (ox, oy) = other;
(sx + ox, sy + oy);
}
}
The compiler will complain with following error:
only traits defined in the current crate can be implemented for arbitrary types
.
However, you can define your own trait and implement it for TuplePoint
:
trait PointOps<T> {
fn add(&self, other: &T) -> T;
}
impl PointOps<TuplePoint> for TuplePoint {
fn add(&self, other: &Self) -> Self {
let (sx, sy) = self;
let (ox, oy) = other;
(*sx + *ox, *sy + *oy)
}
}
and then use it like this:
let p1: TuplePoint = (1, 2);
let p2: TuplePoint = (5, 3);
let p3 = p1.add(&p2);
Weird import requirements
Let's say that you want to read the standard input line by line. The easiest
way to do that is to use .lock().lines()
:
let stdin = io::stdin();
for line in stdin.lock().lines() {
let l = line.unwrap();
...
}
In this case it is obvious that you need to add use std::io;
in your
import section so io
can be accessed in your code. What is less obvious
that you need to add use std::io::BufRead;
so you can use .lock()
method.
Lifetime is confusing
At some point I wanted to pass an iterator as an function parameter.
I tried to pass something of type Iterator<Item=&'a u32>
, where 'a
is
the lifetime of the iterator.
It turned to be hard, so I ended up making a struct which contained the vector and current position, which 'emulated' the iteration behavior.
Let's look at another example. Let's say that we want to read standard input line by line, but this time we want also to trim any whitespace characters:
let stdin = io::stdin();
for line in stdin.lock().lines() {
let l = line.unwrap().trim();
...
}
Suprisingly, this will not work, because compiler will complain with
following error:
temporary value does not live long enough
.
The fix is simple in this case, we need additional temporary variable:
let stdin = io::stdin();
for line in stdin.lock().lines() {
let tmp = line.unwrap();
let l = l.trim();
...
}
The lifetime topic is complicated enough that explaining it could take separate blog post.
Various versions of callable types
I wanted to write a function which could find an element in a vector
for which given function f
would return the biggest (maximal) value.
I wanted to be able to pass a closure (inline function) there.
fn argmax<T: Clone, S: cmp::Ord>(
vec: &Vec<T>, f: Fn(&T) -> S) -> Option<T>
It turned out that there are several function types:
Fn(&T) -> S
- immutable functionFnMut(mut &T) -> S
- mutable functionFnOnce(&T) -> S
- can be called only once
The closure is actually not of type Fn
, but it rather seems to implement
one or more of these function traits. Therefore the type of parameter f
in
function declaration becomes
impl Fn(&T) -> S
.
So the final implementation looks like this:
use std::cmp;
use std::cmp::Ordering;
fn argmax<T: Clone, S: cmp::Ord>(
vec: &Vec<T>, f: impl Fn(&T) -> S) -> Option<T> {
let mut best_el: Option<&T> = None;
for el in vec {
let val = f(el);
match best_el.map(|el| f(el)) {
None => {
best_el = Some(el);
},
Some(bv) => {
match bv.cmp(&val) {
Ordering::Less => {
best_el = Some(el);
},
_ => {},
}
},
}
}
return best_el.cloned();
}
My guess is that the function can be improved by handling any
iterator (we wouldn't need the dependency on Clone
trait being implemented),
but that may require improving my knowledge about handling lifetimes,
which I mentioned earlier.
Batteries not included
I was surprised that there was no built-in library for regular expressions.
you need to install separate cargo crate
regex
(crate is a package in Rust ecosystem)
if you want this kind of support.
This was annoying for me because I wanted to have the solution of each AoC task
as a simple .rs
source file.
This is probably because Rust positions itself as a system language. So you should expect similar things you may find in standard libraries of languages like C++ (STL).
Macros
A lot of times I forgot to add !
after println
. The reason for the !
is
that println!
is a macro, not a function.
There are other useful macros:
vec!
- allows to easily define a vector in a inline waypanic!
- prints the message and exits for a given format string and parameters. It is very useful for prototyping, but not so much for production-grade code.format!
- generates a String for a given format string and parameters (works similar toprintln!
, except it does not print the formatted message but returns it as a string)
The topic of Rust macros is also something which could fill a separate blog post.
Final thoughts
If you ask me, whether Rust is a good programming language for solving Advent of Code puzzles I would say it performed quite well. However, if your main focus is to solve all tasks on time, I would stick with a language you are fluent in (in my case it would be Python).
To summarize, Rust seems to be a very promising language. It has some quirks (some of them mentioned above). I imagine a lot of them has to do with the principle of making the language secure and less error-prone.
I'm looking to build more complex stuff using Rust. I'm sure that I will share my experiences in further blog posts!