RU beehive logo promo banner for Computing & Info Sciences
CS 320
2025spring
ibarland

Random Walks
passing arrays in Rust

#1-#3 (incl. unit tests) due in class Apr.01 (Tue), D2L main.rs only;
All problems due in class Apr.08 (Tue), on D2L and hardcopy main.rs only.
Some details might be clarified/added; Any such changes will be summarized at the top of this file.

clarifications/updates As per the syllabus's honor policy, you must understand, and be the direct author of, all work you submit. (See syllabus for what level of help is acceptable, and how to proceed when you work through the code w/ help from others.)

How far does one tend to get, on a random walk? We'll calculate some random walks, find their average distance, and then compute how far those values tend to be (i.e. are the numbers clustered around the average, or do they vary widely?).

We will write functions which, for an array of numbers, compute the sum, average (mean), standard deviation, and mean-distance-from-mean.


  1. Write a function fill_0_to_N which, given a &mut [f64], fills it with data. In this case, it will be boring data: the location at index i will just hold i.
    recall: We discussed how to test a void method (which mutates an arrays contents) in our recent lab/lecture.
  2. Write a function which, given a &[f64], returns its sum. Do not use the sum method of Iterators.

    Use assert_f64_near! to test with doubles. You can import this function as we discussed our lab/lecture.

  3. Write a function which, given a &[f64], returns its average (mean). If the array is empty, its average can be f64::NAN (not-a-number, which is what 0.0/0.0 returns). What do you think the average would be for an array filled with [0,1,2,3,4] be? How about [0,1,2,3,…,100]? These can become a couple of test-cases, when combined with fill_1_to_N.
    Testing NaN: Very-oddly, f64::NAN != f64::NAN! (This is also true in Java: Double.NaN != Double.NaN; it's part of the IEEE standard for floating-point numbers.) If you expect a function to return NaN, then you can't use assert_eq!; instead you will need to assert!(someF64.is_nan()).
  4. Write a function distances_from which, given a &[f64] and a particular target value x, returns another new array (or array-slice) where each element is the distance (absolute value)1 from x.
    Part of this requirement is so you have a function which returns a new array/slice. Do not use Vec (or vec!) — just array-slices. You may, however, use the following one function:
    /// Return a (boxed) array of `len` elements of type T.
    /// author: nlahn@radford.edu
    ///
    fn boxed_array<T : Copy + Default>( len: usize ) -> Box<[T]> {
        vec![T::default(); len].into_boxed_slice()
    }
    You can call this function with, e.g., boxed_array::<bool>(99).
  5. Write a function mean_distance_from_mean( data: &[f64] ) -> f64, which returns a measure of data's spread: how far each element is from the average, on average. See below.
    This function will be a one-liner, using the functions above. It's tests will also be easy to make, between the tests you already have for the above functions, and the examples mentioned below
  6. Write many_walks, which (given n and k) returns the lengths of n random walks each of k steps (see below).

    Your function should also take in a seed, used for seed_from_u64. Even better: have the function take in a Rng object, as we showed in class, in rng-example.rs. This lets our unit tests be reproducible, even in the presence of randomness.

  7. Finally: For each walk-length in 4, 16, 64, 256, and 1024: Run 100 random walks of that length, and print both the average-distance of those walks, and the average-distance-from-the-average.
    A couple of things to think about: Do you see a pattern, in the distance achieved in a walk of length n? (If so, can you test it?) Are the walk-lengths pretty tightly clustered, or spread out?

Tests, Comments and Program Style

Every function should be preceded by /// and a short (one-sentence) purpose-statement; it should mention all its parameters. (It might even be copy/pasted from this program.) It should not mention any implementation details.

Comments about implementation should be inside the function. You don't need to comment things that would be clear to a Rust programmer. So no need for comments like “This is creating a new struct”, but it's still fine to have comments like “Because nums.contains(…) returned true above, nums is non-empty, so this nums.get(0) is safe”).

Every function should include unit tests. For unsigned-integers, testing 0, 1, and a bigger value help “cover all the bases”. Other types may warrant checking with a negative number and/or fractional value. For array inputs and/or results, be sure to test an array-of-size-0, an array-of-size-1, and an array-of-size-many. Depending on the function, you might have further tests checking other situtations. You do not need multiple tests all checking for the same situation repeatedly. For example, in a function processing strings: testing the strings "", "a", and "abcde" are all plausible, but adding a test for "abcdefgh" probably won't catch any bugs that the previous three didn't. Depending on the function's purpose, you may — or may not — want to test strings that have spaces, a string of only-spaces, and a string that has punctuation (or, is only punctuation). Though if you know2 that the function's task need never pay any attention to what the letters are, then testing spaces/punctuation would not be a priority.

Use good, meaningful names athat are self-documenting, not just for local variables (incl. parameters), but also for functions. This can greatly lessen, or even replace, the need for other documentation.


Measuring spread: mean-distance-from-mean

We are not so much concerned with the average value of our data, but rather how far the data tends to vary from its average value (from its “mean”). For instance, the values {3,4,5} and the values {−1,9} both have a mean of 4, but in the first case the values tend to be close to 4, while in the second they are spread further away.

In particular, how far are the numbers {3,4,5} from their mean, 4, on average? We add up the distance of each number from 4, and then divide by how many numbers we had: (1 + 0 + 1)/3 = 2/3. This indicates that the data is clustered close to its mean: on average, it’s only 2/3 away from the mean.

In the second case, how far does {−1,9} tend to be from 4? This should be a larger value than before, since these numbers are spread further away from their mean. Don’t confuse the two things we are averaging: the average value (mean) of the array, and later the average distance-from-that-mean.3

So all that said, how can our program calculate the average distance of our data from its mean? We already have a function which calculates the average of an array of numbers, so let’s use it again!


Random Walks

Consider a person who leaves a pub, and travels in the following manner: a third of the time they walk north a block. A third of the time they walk south a block. And the remaining third of the time they don’t move at all. This is called a random walk (along one dimension, only walking along a line going north-south.)

After taking, say, k moves, where will the person tend to be? Sometimes they’ll be fully k moves north of the pub, though that’s not very likely. Sometimes they’ll be exactly back at the pub entrance (also still pretty unlikely that they went north exactly as many times as they went south). Most of the time they’ll probably be somewhat near the pub, but not exactly back where they started from. So a natural question is: On average, where do they end up?

And if fifty such random walkers all leave the pub, where will they be, on average? How much variation will there be — will these fifty people all be roughly the same distance away from the pub, or will several be very close to the pub while several others are nearly a full k blocks away?

Aha, we are now in a position to experimentally find how far away they are!

First, write a function randWalk( numSteps: u32 ) -> i32, which simulates a single random walk of numSteps steps, and returns the final position (in blocks north of the pub; this number will of course be negative if the final position was south of the pub). You’ll call this function, passing it WALK LENGTH, which you should #define to be 150. Initially, the walker starts 0 blocks north of the pub, and for the following numSteps times, it takes adds a random amount of −1, 0, or +1. (Hmm, last week’s function randBetween could come in handy here!) See the section coding-randomness.

We will run our program, using NUM DATA different random walkers leaving the pub. The function fillArray will put into each array element the result of a single random walk: That is, arr[0] will be the ending position of one random walk of length WALK LENGTH. Seperately, arr[1] will be the ending position of another random walk, and so on. So fillArray is almost identical to what it was before, except that instead of filling arr[i] with the value i, it fills it with randWalk( WALK LENGTH ).

The rest of your program, without changing it4, is now calculating the mean ending position of all those random walkers, and how far away they are from this mean position, on average.

This approach is called a Monte Carlo simulation: rather than mathematically figuring out exactly what the average is, we do a whole bunch of simulations, and get an idea of what’s going on. Can you guess a mathematical expression which gives how far the walkers tend to be from their mean position, as WALK LENGTH gets larger and larger?


Coding randomness

In class Apr.01, we go over rng-example, which illustratres using random numbers in Rust. The tl;dr:


1 You can call abs as a method: e.g. (-10).abs() returns 10      
2 In black-box testing, you make no assumptions about what the code might be doing. But for this course, white-box testing is fine: you can make assumptions/guesses about how the code might work (and after it's written, even add more tests based its exact code, if-statements, etc.).      
3 Another common way to measure how spread out data is from its mean is its standard deviation. This quantity is nicer to deal with mathematically, but isn't quite as intuitive a quantity as mean-difference-from-mean.

For standard deviation, instead of taking the absolute-value of the difference (ensuring our distance is a positive amount), we use the difference squared. The mean-squared-distance-from-mean is called the variance. It is decent measure of spread, although it kinda has the wrong units (e.g. square-miles, instead of miles), so we finish by taking the square-root of the variance, and that is the standard deviation.

However note that $\sqrt{3^2 + 4^5}$ is different than $3+4$; by squaring, then adding, then taking the root we've distorted things a bit. So why use it at all, instead of mean-distance-from-mean? Because calculus: we can take the derivative of square-root, but absolute-value has a cusp, which prevents us from using calculus to do all sorts of higher-level analysis of distributions.

     
4 You may want to modify what your program prints out when reporting its results, though.      

logo for creative commons by-attribution license
This page licensed CC-BY 4.0 Ian Barland
Page last generated
Please mail any suggestions
(incl. typos, broken links)
to ibarlandradford.edu
Rendered by Racket.