Doubly linked list in Rust
Doubly linked list is a linear data structure in which each node holds a reference to the node before and the node after. This is a particularly challenging data structure to implement in Rust (for beginners like me) because the borrow-checker prohibits the kind of trivial solutions that one can implement in higher-level language (like Python) or memory-unsafe language (like C):
class Node:
def __ini__(self, val):
self.val = val
self.prev = None
self.next = None
class Queue:
def push(self, new):
if len(self) == 0:
# Rust won't allow both self.head and self.tail to own new at the
# same time
self.head = self.tail = new
self.size = 1
else:
# After self.tail.next owns "new", you can no longer use new.prev
self.tail.next = new
new.prev = self.tail
self.tail = new
As a work around, this implementation takes advantage of the two smart pointers provided by Rust’s stadard library:
Rc
is used so that one node can be referenced by the node before it and the node after itRefCell
is used so that we can mutate the references that a node holds
Before diving into the implementation, let’s first define what we expect out of the doubly linked list (or rather, a queue that we will implement using a doubly linked list):
fn test_queue_push() {
let mut queue: Queue<i32> = Queue::new(); // []
queue.push(0); // [0]
assert_eq!(queue.peek(), 0);
assert_eq!(queue.len(), 1);
queue.push(1); // [0 <- 1]
assert_eq!(queue.peek(), 0);
assert_eq!(queue.len(), 2);
queue.push(2); // [0 <- 1 <- 2]
assert_eq!(queue.peek(), 0);
assert_eq!(queue.len(), 3);
assert_eq!(queue.pop(), 0); // [1 <- 2]
assert_eq!(queue.len(), 2);
assert_eq!(queue.peek(), 1);
assert_eq!(queue.pop(), 1); // [2]
assert_eq!(queue.len(), 1);
assert_eq!(queue.pop(), 2); // []
assert_eq!(queue.len(), 0);
}
The Node
type
Similar to how a singly linked list is defined, the Node
type will be defined as an enum with a Nil
indicating an empty node. With this implementation, we will use Nil
as a sentinel node so that the head node and the tail node has something to point to:
use std::rc::{Rc, Weak};
use core::cell::RefCell;
enum Node<T: Copy> {
Nil,
Cons{
val: T,
prev: Weak<RefCell<Node<T>>>,
next: Rc<RefCell<Node<T>>>,
},
}
There are two things to note of this definition. First, I choose to declare Node<T>
as an enum instead of a struct out of personal preferences. It is entirely possible to declare the node type as the following, and I could probably do it at another time as an exercise:
struct Node<T: Copy> {
val: T,
// empty references presented using "None"
prev: Optional<Weak<RefCell<Node<T>>>>,
next: Optional<Rc<RefCell<Node<T>>>>,
}
Second, prev
is a Weak
references because in doubly linked list, two neighboring nodes necessarily hold reference to each other, at which time the circular references will fail to be automatically dropped, resulting in a memory leak (which the compiler will not be able to detect).
The Queue
type
Because unlike the singly linked list, a doubly linked list needs to keep track of both the head
and the tail
node, it is impractical to leave the implementation on the node alone. As a result, we need to define a separate Queue
type
pub struct Queue<T: Copy> {
head: Rc<RefCell<Node<T>>>,
tail: Rc<RefCell<Node<T>>>,
_len: usize, // hidden, can only be read using the len() method
_sentinel: Rc<RefCell<Node<T>>>, // the one and only sentinel node
}
conveniently with the Queue
type we can also store additional information, such as _len
to achieve constant time len()
implementation (with nodes alone we will have to traverse through the entire list to count the number of elements).
A second design choice was to define a unique sentinel
node for each queue. While it is possible to maintain a unique sentinel node using references of head
and tail
, it is a tedious chore that can be very hard to debug. So instead, I chose to keep a reference to the unique sentinel node in each queue.
Pointers can be hard to use. They are hard to use and easy to mess up in C and C++, and they are still hard to use in Rust, although the borrow-checker (whether at compile time or runtime) will make it harder to write bug that blow up at a later time in production (instead of at development). Here are a few things that I found challenging during my implementation
Mutating references on a node
Mutating references on the node type is actually kind of difficult. Say you have new: Rc<RefCell<Node<T>>>
and tail: Rc<RefCell<Node<T>>>
, and you want to set tail
’s next
to point to the RefCell
that new
points to, and you want the new
’s prev to point to the RefCell
that tail
points to, then you will have to the following:
- match a mutable reference on
&mut tail
and disregard theNode::Nil
branch because it is unreachable - for the
Node::Cons{ val, prev, next }
branch, takenext: &mut Rc<RefCell<Node<T>>>
- dereference
next
and reassignRc::clone(&new)
to*next
- Repeat the steps above, but with
tail
andnext
switched, andRc::clone
replaced withRc::downgrade
because we are working with weak references
This could become very tedious and redundant to implement all by hand. Instead, I chose to implement two methods on Node
type:
impl<T: Copy> Node<T> {
pub fn set_next(&mut self, next: &Rc<RefCell<Node<T>>>) {
match self {
Node::Nil => panic!("Setting next on Nil"),
Node::Cons{ val: _, prev: _, next: selfnext } => {
*selfnext = Rc::clone(next);
},
}
}
pub fn set_prev(&mut self, prev: &Rc<RefCell<Node<T>>>) {
match self {
Node::Nil => panic!("Setting prev on Nil"),
Node::Cons{ val: _, prev: selfprev, next: _ } => {
*selfprev = Rc::downgrade(prev);
}
}
}
}
Extracting reference out of a node
In implementing the branch of pop()
when queue.len() > 1
, I need to grab both the head
and head.next
, return the value held in head, and mutate the references in head.next
so that head.next.prev
points to the sentinel node instead of the old head
.
head
has type Rc<RefCell<Node<T>>>
, so *head.borrow()
has type Node<T>
, which we can put in match
statement. However, this violates the borrow-checker: the value behind deferencing a Ref
cannot be moved (which we intend to do).
// compiler error: cannot move out of deference of 'Ref<'_, queue::Node<T>>'
let (output, second) = match *self.head.borrow() {
Node::Cons{ val, prev: _, next } => {
(val, Rc::clone(&next))
},
_ => unreachable!(),
};
Instead we have to borrow the dereferenced value:
let (output, second) = match &*self.head.borrow() {
Node::Cons{
val, // &T
prev: _, // &Weak<RefCell<Node<T>>>
next // &Rc<RefCell<Node<T>>>
} => {
(*val, Rc::clone(next))
},
_ => unreachable!(),
};
Full implementation
Finally, here is the full implementation
impl<T: Copy> Queue<T> {
/// Return an empty queue
pub fn new() -> Queue<T> {
let sentinel = Rc::new(RefCell::new(Node::Nil));
return Queue {
head: Rc::clone(&sentinel),
tail: Rc::clone(&sentinel),
_len: 0,
_sentinel: sentinel,
};
}
/// Add an element to the tail-end of the queue
pub fn push(&mut self, val: T) {
match self._len {
0 => { // create new node then set both head and tail to new node
let new_node = Rc::new(RefCell::new(Node::Cons{
val,
prev: Rc::downgrade(&self._sentinel),
next: Rc::clone(&self._sentinel),
}));
self.head = Rc::clone(&new_node);
self.tail = Rc::clone(&new_node);
self._len = 1;
},
_ => {
let new_node = Rc::new(RefCell::new(Node::Cons{
val,
prev: Rc::downgrade(&self.tail),
next: Rc::clone(&self._sentinel),
}));
self.tail.borrow_mut().set_next(&new_node);
self.tail = Rc::clone(&new_node);
self._len += 1;
}
}
}
/// Pop the head and return (a copy of) the element held
pub fn pop(&mut self) -> T {
match self._len {
0 => panic!("Popping an empty queue!"),
1 => {
let output = match *self.head.borrow() {
Node::Cons{ val, prev: _, next: _ } => val,
_ => unreachable!(),
};
let sentinel = Rc::new(RefCell::new(Node::Nil));
self.head = Rc::clone(&sentinel);
self.tail = Rc::clone(&sentinel);
self._len = 0;
return output;
},
_ => {
let (output, second) = match &*self.head.borrow() {
Node::Cons{ val, prev: _, next } => {
(*val, Rc::clone(next))
},
_ => unreachable!(),
};
self.head = Rc::clone(&second);
self.head.borrow_mut().set_prev(&self._sentinel);
self._len -= 1;
return output;
}
}
}
/// Return the number of elements
pub fn len(&self) -> usize {
return self._len;
}
/// Return a copy of the value held by the head
pub fn peek(&self) -> T {
let head = self.head.borrow();
match &*head {
Node::Nil => panic!(""),
Node::Cons{ val, prev: _, next: _ } => *val,
}
}
}
Conclusion
This is a fantastic exercise that reviews Rc
, Weak
, RefCell
, and that helps practice the use of pointers and deferencing. With this exercise done I felt more confident of my knowledge of this chapter of “the book.”
And remember kids, don’t roll your own linked list in production, use the dequeue
that is already provided in the standard library.
You can find the complete source code here.