Command-line rust notes (5)
With three chapters remaining (fortune, cal, and ls), I felt sufficiently bored with the content and would like to move on. However, leaving behind all the effort that went into this set of mini projects without a proper review makes me worry that I will forget the valuable lessons I learned over the past ten weeks of grind. Therefore, for the remainder of March, while I also need to deal with my tax return, I will be re-reading the first ten chapters and consolidating what I learned into one or more blog posts, as well as a cheat sheet in this repository.
Exit code patterns
On UNIX systems, the exit code of a program communicates its final status. By convention, an exit code of 0 indicates that the program finished without any errors, while non-zero exit codes can be used to express a variety of errors.
A common pattern in all but the most fundamental programs (true, false, and echo) is to let the library's run function return a Result and use its variants to decide which exit code to exit with:
use std::process;
use crate::libtail;

fn main() {
    if let Err(e) = libtail::run() {
        eprintln!("tail: {e}");
        process::exit(1);
    }
    process::exit(0); // kind of redundant
}
I personally found this pattern insufficient because it requires the run function to return an Err for the program to exit with a non-zero exit code, which is not always the elegant thing to do. For example, when a program reads through multiple files (such as head, tail, or cat) and some of the files fail to open, the program should still apply its logic to the other files, yet the exit code should be non-zero.
My solution is to have the returned Result carry an i32 exit code in its Ok variant:
use std::process;
use crate::libtail;

fn main() {
    match libtail::run() {
        Ok(exit_code) => process::exit(exit_code),
        Err(e) => {
            eprintln!("tail: {e}");
            process::exit(1);
        }
    }
}
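As a rough illustration of why an exit code in the Ok variant helps, the sketch below copies several files to an output, reports the ones that fail to open, keeps going, and remembers that the final exit code should be non-zero. The function name run_over and its parameters are hypothetical stand-ins, not the actual libtail::run.

```rust
use std::error::Error;
use std::fs::File;
use std::io::{Read, Write};

/// Hypothetical stand-in for a library `run` function: copies every readable
/// file to `out` and returns the exit code the binary should use.
fn run_over(paths: &[&str], out: &mut impl Write) -> Result<i32, Box<dyn Error>> {
    let mut exit_code = 0;
    for path in paths {
        match File::open(path) {
            Ok(mut file) => {
                let mut contents = Vec::new();
                file.read_to_end(&mut contents)?;
                out.write_all(&contents)?;
            }
            Err(e) => {
                // Report the failure, remember it, and move on to the next file.
                eprintln!("{path}: {e}");
                exit_code = 1;
            }
        }
    }
    Ok(exit_code)
}
```

The Err variant stays reserved for failures that should abort the whole run, while a file that cannot be opened only changes the code passed to process::exit.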
CLI argument parsing using Clap
While it is possible to parse command-line arguments directly from std::env::args, in practice it is impractical and error-prone. This project uses the clap crate instead.
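To give a sense of what hand-rolled parsing involves, here is a minimal sketch for a head-like program with a single -n flag. The Options struct, the parse_by_hand function, and the flag names are all hypothetical; the function takes any iterator of strings so that std::env::args().skip(1) can be passed in.

```rust
/// Hypothetical options for a `head`-like program.
#[derive(Debug, PartialEq)]
struct Options {
    count: usize,
    files: Vec<String>,
}

/// Parse `-n <count>` plus positional files from an iterator of arguments
/// (the program name is assumed to have been skipped already).
fn parse_by_hand<I: Iterator<Item = String>>(mut args: I) -> Result<Options, String> {
    let mut opts = Options { count: 10, files: Vec::new() };
    while let Some(arg) = args.next() {
        match arg.as_str() {
            "-n" | "--lines" => {
                // The value is a separate argument, which may be missing.
                let value = args.next().ok_or("-n requires a value")?;
                opts.count = value
                    .parse::<usize>()
                    .map_err(|e| format!("invalid count '{value}': {e}"))?;
            }
            // A lone "-" conventionally means stdin, so it is not a flag.
            flag if flag.starts_with('-') && flag != "-" => {
                return Err(format!("unknown flag '{flag}'"));
            }
            other => opts.files.push(other.to_string()),
        }
    }
    Ok(opts)
}
```

Even this small sketch does not handle --lines=3, combined short flags, or help text, which is the kind of detail clap takes care of.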
In newer versions of clap, a "derive" pattern can be used to define the CLI parsing scheme through a struct that derives the Parser trait. The struct can be instantiated using the try_parse method, which I prefer over parse since try_parse returns the error as a Result, whereas parse prints the error and exits the process on its own.
use std::error::Error;
use clap::Parser;

/// Brief description of the program
#[derive(Debug, Parser)]
#[command(version = "x.y.z")]
#[command(author = "Ganyu Xu <xuganyu@berkeley.edu>")]
struct Args {
    /// Boolean flag
    #[arg(short = 'v', long = "verbose")]
    verbose: bool,
    /// An integer argument with a default value
    #[arg(short, long, default_value_t = 10)]
    count: usize,
    /// Demonstrate mutual exclusivity with "that"
    // long-only: the inferred short flag would be -t for both "this" and "that"
    #[arg(long)]
    this: bool,
    /// Demonstrate mutual exclusivity with "this"
    #[arg(long, conflicts_with = "this")]
    that: bool,
    /// A positional argument
    file1: String,
    /// A second, optional positional argument
    file2: Option<String>,
}

pub fn run() -> Result<i32, Box<dyn Error>> {
    let args = Args::try_parse()?;
    // ...
}
Helpful information
The -h flag can be used to display information about the program and the arguments it accepts.
- A short description of the program is specified using /// comments on the parser struct
- The description of each argument is specified using /// comments on that argument
- Apply the #[command(version = "x.y.z")] attribute to the parser struct so that --version can be used to display version information. If no value is specified (#[command(version)]), the version information is derived from Cargo.toml. Author info works the same way.
Keyword arguments
Keyword arguments must have the #[arg(...)] attribute, in which the short flag name, the long flag name, or both can be specified; a field with neither is treated as a positional argument.
If no value is given for short or long, the flag is inferred from the name of the field. Otherwise, the short flag must be a char while the long flag must be a string.
Parsing non-string type
When an argument is meant to be a non-string type, it is possible to declare the field with that type and let clap parse it. However, for anything other than the simplest parsing, I recommend using clap to read the string and then parsing the argument explicitly.
If the input argument cannot be parsed, clap reports an error: parse prints the message and exits the program, while try_parse returns it as an Err.
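A minimal sketch of the "read the string, then parse explicitly" approach might look like the following. The helper name parse_positive, the rule that the count must be greater than zero, and the error wording are all assumptions made for illustration.

```rust
/// Hypothetical helper: turn the raw string read by the argument parser into
/// a strictly positive count, with an error message that names the bad value.
fn parse_positive(raw: &str) -> Result<usize, String> {
    match raw.parse::<usize>() {
        Ok(n) if n > 0 => Ok(n),
        // Covers both "not a number" and "zero".
        _ => Err(format!("illegal count -- {raw}")),
    }
}
```

Doing the conversion by hand keeps the validation rule and the wording of the error message under the program's control.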
Optional argument
Keyword arguments that are not required should be declared as Option<T>; otherwise they are considered required and lead to an error if missing.
Alternatively, a default value can be specified using #[arg(default_value_t = ...)]. However, what can be specified in this attribute is limited: it has to be an expression of the field's type, and that type has to implement Display.
#[arg(short, long, default_value_t = 10)]
count: usize
For anything other than the simplest default value, I personally recommend using an Option<T> and then providing the default after parsing, before constructing the Config struct.
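The "Option first, default later" approach can be sketched as follows. RawArgs, the field names, and the defaults (10 lines, "-" for stdin when no file is given) are hypothetical; only the idea of resolving defaults while building Config comes from the text above.

```rust
/// Hypothetical shape of what the argument parser produces.
struct RawArgs {
    lines: Option<usize>,
    files: Vec<String>,
}

/// Hypothetical validated configuration used by the rest of the program.
#[derive(Debug, PartialEq)]
struct Config {
    lines: usize,
    files: Vec<String>,
}

/// Resolve every default in one place, after parsing.
fn into_config(raw: RawArgs) -> Config {
    // Fall back to stdin ("-") when no file was given.
    let files = if raw.files.is_empty() {
        vec!["-".to_string()]
    } else {
        raw.files
    };
    Config {
        lines: raw.lines.unwrap_or(10),
        files,
    }
}
```

Because the defaults live in ordinary code, they can depend on other arguments or on values computed at runtime.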
Mutually exclusive arguments
Some keyword arguments are mutually exclusive (e.g. see the bytes and chars flags in wc). Mutual exclusivity is specified using #[arg(conflicts_with = "field_name")].
Project organization
For all but the most straightforward programs, it makes sense to divide the source code between the binary and the library, where the binary simply invokes functions in the library module, including the main routine that is conventionally named run.
Anticipating the number of binaries and libraries, I chose to organize the project as follows:
- Each binary is stored under src/bin/program.rs
- Functions common to all programs are stored under src/lib.rs in a common module
- Functions unique to individual programs are stored in individual src/libxxx.rs modules and referenced in src/lib.rs
For example, one function that shows up in almost every program starting with cat is open, which takes a path and returns a buffered reader that points to either a file or stdin depending on the input:
use std::fs::File;
use std::io::{self, BufRead, BufReader};

// MyResult<T> is assumed to be an alias for Result<T, Box<dyn std::error::Error>>

/// Open a file or stdin and return a buffered reader against it
/// Upon encountering an error, the error will be prepended with the path
pub fn open(path: &str) -> MyResult<Box<dyn BufRead>> {
    let reader: Box<dyn BufRead> = match path {
        "" | "-" => Box::new(BufReader::new(io::stdin())),
        _ => {
            let file = File::open(path)
                .map_err(|e| format!("{path}: {e}"))?;
            Box::new(BufReader::new(file))
        }
    };
    Ok(reader)
}
Note that within the library modules, modules and components are imported using crate::xxx; the binaries, on the other hand, need to use packagename::xxx to import components from the library (reference).
Iterating over lines using BufRead
The BufReader struct and the BufRead trait recur throughout the programs of this project for interacting with stdin and files.
First, note that many methods are not defined on BufReader itself; instead, the BufRead trait must be brought into scope before methods like lines() and read_line() become available on a BufReader object.
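A small sketch of what this means in practice: both helpers below only compile because BufRead is imported next to BufReader. The helper names collect_lines and first_line are made up for the example.

```rust
use std::io::{self, BufRead, BufReader, Read};

/// Collect every line via `lines()`, which strips the trailing newline.
fn collect_lines<R: Read>(input: R) -> io::Result<Vec<String>> {
    // Without `BufRead` in scope, the call to `lines()` fails to compile.
    BufReader::new(input).lines().collect()
}

/// Read only the first line via `read_line()`, which keeps the trailing newline.
fn first_line<R: Read>(input: R) -> io::Result<String> {
    let mut buf = String::new();
    BufReader::new(input).read_line(&mut buf)?;
    Ok(buf)
}
```

Any type implementing Read works as input, so the same helpers accept a File, stdin, or a byte slice in tests.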
When implementing cat, I chose to write a function that reads from the input (stdin or a file) line by line so as to keep count of the appropriate line number, depending on whether I am counting all lines or only non-empty lines. My first implementation uses the read_line method from the BufRead trait, which requires a buffer as input:
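/// An implementation of "cat" with C-style read_line
fn cat<T: BufRead>(
    reader: &mut T,
    ...
) -> MyResult<String> {
    let mut buf = String::new();
    // Note: `while let Ok` also ends the loop, silently, on a read error
    while let Ok(nbytes) = reader.read_line(&mut buf) {
        if nbytes == 0 { break; }
        // cat logic ...
        buf.clear(); // read_line appends, so reset the buffer for the next line
    }
    // return
}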
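One detail worth knowing about read_line is that it appends to the buffer instead of overwriting it, so a loop that reuses one buffer normally clears it between lines. The sketch below makes the difference visible by recording the buffer length after each read; the function line_lengths and its clear_each_time switch exist only for this demonstration.

```rust
use std::io::{self, BufRead};

/// Record the buffer length after each `read_line` call.
fn line_lengths<R: BufRead>(reader: &mut R, clear_each_time: bool) -> io::Result<Vec<usize>> {
    let mut buf = String::new();
    let mut lengths = Vec::new();
    loop {
        // `?` propagates read errors instead of silently ending the loop.
        let nbytes = reader.read_line(&mut buf)?;
        if nbytes == 0 {
            break; // end of input
        }
        lengths.push(buf.len());
        if clear_each_time {
            buf.clear();
        }
    }
    Ok(lengths)
}
```

For the input "ab\ncd\n", clearing yields lengths of 3 and 3, while not clearing yields 3 and 6 because the second line is appended after the first.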
We can further simplify the implementation using the lines() method, which returns an iterator over the lines, of type Iterator<Item = Result<String, io::Error>>.
fn cat<T: BufRead>(
    reader: &mut T,
    count_nonblank: bool,
    count_all: bool,
) -> MyResult<()> {
    let mut line_no = 0;
    for line in reader.lines() {
        let line = line?; // why I don't use iterators
        if (count_nonblank && !line.is_empty()) || count_all {
            println!("{:>6}\t{}", line_no + 1, line);
            line_no += 1;
        } else {
            println!("{line}");
        }
    }
    Ok(())
}
Finally, we can convert the for loop into functional-style code using closures:
fn cat<T: BufRead>(
    reader: &mut T,
    count_nonblank: bool,
    count_all: bool,
) -> MyResult<()> {
    let mut line_no = 0;
    reader.lines()
        // Keep only the lines that were read successfully; read errors are dropped here
        .filter_map(|line_or_err| line_or_err.ok())
        .map(|line| {
            if (count_nonblank && !line.is_empty()) || count_all {
                line_no += 1;
                format!("{:>6}\t{}", line_no, line)
            } else {
                line
            }
        })
        .for_each(|line| println!("{line}"));
    Ok(())
}
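If the functional style should still return read errors the way the for loop does, one option is try_for_each, which stops at the first Err and hands it back. The sketch below is a variant under my own assumptions, not the original implementation: the name cat_to is made up, it writes to a generic Write so that the output can be inspected, and it returns io::Result rather than MyResult.

```rust
use std::io::{self, BufRead, Write};

/// Number lines like `cat -n` / `cat -b`, writing to `out` and
/// returning the first read or write error encountered.
fn cat_to<R: BufRead, W: Write>(
    reader: R,
    out: &mut W,
    count_nonblank: bool,
    count_all: bool,
) -> io::Result<()> {
    let mut line_no = 0;
    reader.lines().try_for_each(|line| -> io::Result<()> {
        let line = line?; // a failed read stops the iteration and is returned
        if count_all || (count_nonblank && !line.is_empty()) {
            line_no += 1;
            writeln!(out, "{:>6}\t{}", line_no, line)
        } else {
            writeln!(out, "{line}")
        }
    })
}
```

Passing io::stdout() as out gives the same behavior as printing, while a Vec<u8> makes the function easy to test.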
Another minor detail to note is the various syntaxes for declaring these functions:
/// For a single trait bound, put it next to the type parameter:
fn cat<T: BufRead>(reader: &mut T) -> MyResult<()> {}

/// For a combination of traits, use a "where" clause:
fn tail<T>(reader: &mut T) -> MyResult<()>
    where T: Read + Seek {}

/// TODO: I am not sure if it makes sense to move the reader object
fn cat<T>(mut reader: T) -> MyResult<()> {}
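One observation that may help with the TODO: the standard library implements BufRead for &mut B whenever B implements BufRead, so a function that takes the reader by value still accepts a mutable borrow, and the caller keeps ownership. The sketch below uses a hypothetical count_lines function to show this; whether moving the reader is the right design for cat remains a judgment call.

```rust
use std::io::{self, BufRead};

/// Takes the reader by value and counts the lines it yields.
fn count_lines<T: BufRead>(reader: T) -> io::Result<usize> {
    let mut n = 0;
    for line in reader.lines() {
        line?; // propagate read errors
        n += 1;
    }
    Ok(n)
}
```

Calling count_lines(&mut reader) lends the reader for the duration of the call, whereas count_lines(reader) gives it away; the same function body serves both.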