cs320 Random Walks

#1-#3 (incl. unit tests) due in class Apr.01 (Tue), on D2L, main.rs only.
All problems due in class Apr.08 (Tue), on D2L and hardcopy main.rs only.
Some details might be clarified/added; any such changes will be summarized at the top of this file.

clarifications/updates

As per the syllabus's honor policy, you must understand, and be the direct author of, all work you submit. (See the syllabus for what level of help is acceptable, and how to proceed when you work through the code w/ help from others.)

How far does one tend to get, on a random walk? We'll simulate some random walks, find the average of their ending positions, and then compute how far those values tend to be from that average (i.e. are the numbers clustered around the average, or do they vary widely?).

We will write functions which, for an array of numbers, compute the sum, average (mean), standard deviation, and mean-distance-from-mean.

Tests, Comments and Program Style

Every function should be preceded by /// and a short (one-sentence) purpose-statement; it should mention all its parameters. (It might even be copy/pasted from this program.) It should not mention any implementation details.

Comments about implementation should be inside the function. You don't need to comment things that would be clear to a Rust programmer. So there's no need for comments like “This is creating a new struct”, but it's still fine to have comments like “Because nums.contains(…) returned true above, nums is non-empty, so this nums.get(0) is safe”.
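Here is a minimal sketch of that commenting style. It uses a hypothetical function, largest, which is not one of the assignment's functions. The /// line states the purpose and mentions the parameter nums without describing the loop. The comment inside the body explains why the index is safe.

```rust
/// Return the largest value in `nums`, or `None` if `nums` is empty.
fn largest(nums: &[i32]) -> Option<i32> {
    if nums.is_empty() {
        return None;
    }
    // nums is non-empty (checked above), so nums[0] cannot panic.
    let mut best = nums[0];
    for &n in &nums[1..] {
        if n > best {
            best = n;
        }
    }
    Some(best)
}

fn main() {
    println!("{:?}", largest(&[3, 9, 4])); // Some(9)
    println!("{:?}", largest(&[])); // None
}
```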

Every function should include unit tests. For unsigned integers, testing 0, 1, and a bigger value helps “cover all the bases”. Other types may warrant checking with a negative number and/or fractional value. For array inputs and/or results, be sure to test an array-of-size-0, an array-of-size-1, and an array-of-size-many. Depending on the function, you might have further tests checking other situations. You do not need multiple tests all checking for the same situation repeatedly. For example, in a function processing strings, "", "a", and "abcde" are all plausible test inputs, but adding a test for "abcdefgh" probably won't catch any bugs that the previous three didn't. Depending on the function's purpose, you may — or may not — want to test strings that have spaces, a string of only-spaces, and a string that has punctuation (or, is only punctuation). Though if you know ² that the function's task never needs to pay any attention to what the letters are, then testing spaces/punctuation would not be a priority.
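As one illustration, the size-0 / size-1 / size-many advice might look like this as a Rust test module (run with cargo test). It tests a hypothetical sum over an array of f64. The function name and test values are assumptions. The size-1 and size-many cases also cover negative and fractional values, as suggested above.

```rust
/// Return the sum of all the numbers in `nums`.
fn sum(nums: &[f64]) -> f64 {
    let mut total = 0.0;
    for &x in nums {
        total += x;
    }
    total
}

fn main() {
    println!("{}", sum(&[1.5, -2.0, 4.0])); // 3.5
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn sum_of_empty_array_is_zero() {
        assert_eq!(sum(&[]), 0.0);
    }

    #[test]
    fn sum_of_one_element_is_that_element() {
        assert_eq!(sum(&[-2.5]), -2.5);
    }

    #[test]
    fn sum_of_many_elements_with_negatives_and_fractions() {
        assert_eq!(sum(&[1.5, -2.0, 4.0]), 3.5);
    }
}
```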

Use good, meaningful names that are self-documenting, not just for local variables (incl. parameters), but also for functions. This can greatly lessen, or even replace, the need for other documentation.

Measuring spread: mean-distance-from-mean

We are not so much concerned with the average value of our data, but rather how far the data tends to vary from its average value (from its “mean”). For instance, the values {3,4,5} and the values {−1,9} both have a mean of 4, but in the first case the values tend to be close to 4, while in the second they are spread further away.

In particular, how far are the numbers {3,4,5} from their mean, 4, on average? We add up the distance¹ of each number from 4, and then divide by how many numbers we had: (1 + 0 + 1)/3 = 2/3. This indicates that the data is clustered close to its mean: on average, it’s only 2/3 away from the mean.

In the second case, how far does {−1,9} tend to be from 4? This should be a larger value than before, since these numbers are spread further away from their mean. Don’t confuse the two things we are averaging: the average value (mean) of the array, and later the average distance-from-that-mean.³

So all that said, how can our program calculate the average distance of our data from its mean? We already have a function which calculates the average of an array of numbers, so let’s use it again!
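Below is a minimal sketch of that reuse. It assumes the array is a slice of f64. It also assumes an empty array should give 0.0; the write-up doesn't say what to do in that case. mean_distance_from_mean builds an array of distances from the mean, then hands that array back to mean.

```rust
/// Return the average (mean) of the numbers in `nums`; 0.0 if `nums` is empty.
fn mean(nums: &[f64]) -> f64 {
    if nums.is_empty() {
        // Assumed convention: avoid dividing by zero for an empty array.
        return 0.0;
    }
    nums.iter().sum::<f64>() / nums.len() as f64
}

/// Return how far, on average, the numbers in `nums` are from their mean.
fn mean_distance_from_mean(nums: &[f64]) -> f64 {
    let m = mean(nums);
    // Distance of each number from the mean; abs() keeps each distance non-negative.
    let distances: Vec<f64> = nums.iter().map(|&x| (x - m).abs()).collect();
    // Reuse mean: the average of those distances.
    mean(&distances)
}

fn main() {
    println!("{}", mean_distance_from_mean(&[3.0, 4.0, 5.0])); // about 0.667, i.e. 2/3
    println!("{}", mean_distance_from_mean(&[-1.0, 9.0])); // 5
}
```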

Random Walks

Consider a person who leaves a pub, and travels in the following manner: a third of the time they walk north a block. A third of the time they walk south a block. And the remaining third of the time they don’t move at all. This is called a random walk (along one dimension: only walking along a line going north-south).

After taking, say, k moves, where will the person tend to be? Sometimes they’ll be fully k moves north of the pub, though that’s not very likely. Sometimes they’ll be exactly back at the pub entrance (also still pretty unlikely that they went north exactly as many times as they went south). Most of the time they’ll probably be somewhat near the pub, but not exactly back where they started from. So a natural question is: On average, where do they end up?

And if fifty such random walkers all leave the pub, where will they be, on average? How much variation will there be — will these fifty people all be roughly the same distance away from the pub, or will several be very close to the pub while several others are nearly a full k blocks away?

First, write a function randWalk( numSteps: u32 ) -> i32, which simulates a single random walk of numSteps steps and returns the final position (in blocks north of the pub; this number will of course be negative if the final position was south of the pub). You’ll call this function, passing it WALK_LENGTH, which you should declare as a constant equal to 150 (in Rust, a const plays the role of C's #define). The walker starts 0 blocks north of the pub, and for each of the following numSteps steps, it adds a random amount of −1, 0, or +1 to its position. ~~(Hmm, last week’s function randBetween could come in handy here!)~~ See the section Coding randomness.
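The class's rng-example isn't reproduced here, so this sketch avoids assuming any particular random-number API. The hypothetical rand_walk (a snake_case variant of randWalk) takes an extra closure, next_step, that is assumed to return −1, 0, or +1. In a real program that closure would call whatever generator rng-example uses. The fixed step pattern in main is only there to make the result predictable.

```rust
const WALK_LENGTH: u32 = 150;

/// Simulate a random walk of `num_steps` steps, each step supplied by `next_step`
/// (assumed to return -1, 0, or +1), and return the final position in blocks north of the pub.
fn rand_walk(num_steps: u32, mut next_step: impl FnMut() -> i32) -> i32 {
    let mut position = 0; // the walker starts at the pub
    for _ in 0..num_steps {
        position += next_step();
    }
    position
}

fn main() {
    // Illustration only: a repeating +1, +1, 0, -1 pattern instead of real randomness.
    let pattern = [1, 1, 0, -1];
    let mut i = 0;
    let final_position = rand_walk(WALK_LENGTH, || {
        let step = pattern[i % pattern.len()];
        i += 1;
        step
    });
    println!("{}", final_position); // 39: 37 full cycles (+1 each), then +1, +1
}
```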

We will run our program using NUM_DATA different random walkers leaving the pub. The function fillArray will put into each array element the result of a single random walk. That is, arr[0] will be the ending position of one random walk of length WALK_LENGTH. Separately, arr[1] will be the ending position of another random walk, and so on. So fillArray is almost identical to what it was before, except that instead of filling arr[i] with the value i, it fills it with randWalk( WALK_LENGTH ).

The rest of your program, without changing it ⁴, is now calculating the mean ending position of all those random walkers, and how far away they are from this mean position, on average.
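Putting the pieces together, here is a sketch of the whole pipeline: fill the array with walk results, then compute the mean and the mean-distance-from-mean. To stay self-contained with only the standard library, it swaps in a tiny xorshift64 generator, XorShift. This is a hypothetical stand-in, not the generator from rng-example. NUM_DATA = 50 follows the "fifty walkers" above but is an assumed value. The output varies with the seed, so no expected numbers are shown.

```rust
const WALK_LENGTH: u32 = 150;
const NUM_DATA: usize = 50; // assumed value

/// Hypothetical stand-in PRNG (xorshift64); its state must start non-zero.
struct XorShift(u64);

impl XorShift {
    /// Advance this generator `self` and return a pseudo-random step of -1, 0, or +1.
    fn step(&mut self) -> i32 {
        self.0 ^= self.0 << 13;
        self.0 ^= self.0 >> 7;
        self.0 ^= self.0 << 17;
        // The state % 3 is 0, 1, or 2 (very nearly equally often); shift it to -1, 0, or +1.
        (self.0 % 3) as i32 - 1
    }
}

/// Return the final position after `num_steps` steps, each supplied by `next_step`.
fn rand_walk(num_steps: u32, mut next_step: impl FnMut() -> i32) -> i32 {
    let mut position = 0;
    for _ in 0..num_steps {
        position += next_step();
    }
    position
}

/// Fill every element of `arr` with the ending position of its own walk, using steps from `rng`.
fn fill_array(arr: &mut [i32], rng: &mut XorShift) {
    for slot in arr.iter_mut() {
        *slot = rand_walk(WALK_LENGTH, || rng.step());
    }
}

/// Return the mean of `nums`; 0.0 if `nums` is empty (assumed convention).
fn mean(nums: &[f64]) -> f64 {
    if nums.is_empty() {
        return 0.0;
    }
    nums.iter().sum::<f64>() / nums.len() as f64
}

/// Return the average distance of the numbers in `nums` from their mean.
fn mean_distance_from_mean(nums: &[f64]) -> f64 {
    let m = mean(nums);
    let distances: Vec<f64> = nums.iter().map(|&x| (x - m).abs()).collect();
    mean(&distances)
}

fn main() {
    let mut rng = XorShift(0x9E37_79B9_7F4A_7C15); // any non-zero seed
    let mut arr = [0i32; NUM_DATA];
    fill_array(&mut arr, &mut rng);

    // The statistics functions take f64, so convert the i32 positions once.
    let positions: Vec<f64> = arr.iter().map(|&p| p as f64).collect();
    println!("mean ending position: {:.2}", mean(&positions));
    println!("mean distance from that mean: {:.2}", mean_distance_from_mean(&positions));
}
```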

This approach is called a Monte Carlo simulation: rather than mathematically figuring out exactly what the average is, we do a whole bunch of simulations, and get an idea of what’s going on. Can you guess a mathematical expression which gives how far the walkers tend to be from their mean position, as WALK_LENGTH gets larger and larger?
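To explore that question empirically, one could repeat the whole experiment for several walk lengths and print the spread for each. The sketch below does that using the same hypothetical xorshift stand-in. It takes a walk_length parameter instead of the WALK_LENGTH constant, so that one run can try several lengths. The choice of 1,000 walkers is also an assumption. It deliberately prints numbers rather than giving a formula.

```rust
/// Hypothetical stand-in PRNG (xorshift64); its state must start non-zero.
struct XorShift(u64);

impl XorShift {
    /// Advance this generator `self` and return a pseudo-random step of -1, 0, or +1.
    fn step(&mut self) -> i32 {
        self.0 ^= self.0 << 13;
        self.0 ^= self.0 >> 7;
        self.0 ^= self.0 << 17;
        (self.0 % 3) as i32 - 1
    }
}

/// Estimate, by simulating `num_walkers` walks of `walk_length` steps drawn from `rng`,
/// the mean distance of the walkers' ending positions from their mean.
fn simulated_spread(walk_length: u32, num_walkers: usize, rng: &mut XorShift) -> f64 {
    // Ending position of each walker: the sum of its steps.
    let ends: Vec<f64> = (0..num_walkers)
        .map(|_| (0..walk_length).map(|_| rng.step()).sum::<i32>() as f64)
        .collect();
    if ends.is_empty() {
        return 0.0;
    }
    let n = ends.len() as f64;
    let mean = ends.iter().sum::<f64>() / n;
    ends.iter().map(|&e| (e - mean).abs()).sum::<f64>() / n
}

fn main() {
    let mut rng = XorShift(0x9E37_79B9_7F4A_7C15);
    for walk_length in [10, 100, 1_000, 10_000] {
        let spread = simulated_spread(walk_length, 1_000, &mut rng);
        println!("walk length {:>6}: mean distance from mean = {:.2}", walk_length, spread);
    }
}
```

Comparing how the printed spread grows when the walk length is multiplied by 10 is one way to test a guess.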

Coding randomness

In class Apr.01, we go over rng-example, which illustrates using random numbers in Rust. The tl;dr:

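¹ You can call abs as a method: e.g. (-10_i32).abs() returns 10. (A bare (-10).abs() won't compile, because Rust can't tell which integer type the literal is.) ↩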

² In black-box testing, you make no assumptions about what the code might be doing. But for this course, white-box testing is fine: you can make assumptions/guesses about how the code might work (and after it's written, even add more tests based on its exact code, if-statements, etc.). ↩

³ Another common way to measure how spread out data is from its mean is its standard deviation. This quantity is nicer to deal with mathematically, but isn't quite as intuitive a quantity as mean-distance-from-mean.

For standard deviation, instead of taking the absolute value of the difference (ensuring our distance is non-negative), we use the difference squared. The mean-squared-distance-from-mean is called the variance. It is a decent measure of spread, although it kinda has the wrong units (e.g. square-miles, instead of miles), so we finish by taking the square root of the variance, and that is the standard deviation.
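The recipe in this footnote (square the distances, average them to get the variance, take the square root) can be sketched as follows. It assumes the same 0.0-for-an-empty-array convention as before. It divides by n, as "mean-squared-distance" describes. Note that the "sample" standard deviation used in statistics divides by n − 1 instead. On {3,4,5} it gives about 0.816, compared with 2/3 for mean-distance-from-mean on the same data.

```rust
/// Return the standard deviation of `nums` (the square root of the mean squared
/// distance from the mean); 0.0 if `nums` is empty.
fn standard_deviation(nums: &[f64]) -> f64 {
    if nums.is_empty() {
        return 0.0; // assumed convention, avoiding division by zero
    }
    let n = nums.len() as f64;
    let mean = nums.iter().sum::<f64>() / n;
    // Variance: the mean of the squared distances from the mean.
    let variance = nums.iter().map(|&x| (x - mean).powi(2)).sum::<f64>() / n;
    variance.sqrt()
}

fn main() {
    println!("{}", standard_deviation(&[3.0, 4.0, 5.0])); // about 0.816, i.e. sqrt(2/3)
    println!("{}", standard_deviation(&[-1.0, 9.0])); // 5
}
```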

However, note that $\sqrt{3^2 + 4^2} = 5$ is different from $3 + 4 = 7$; by squaring, then adding, then taking the root, we've distorted things a bit. So why use it at all, instead of mean-distance-from-mean? Because calculus: squaring is differentiable everywhere, but absolute-value has a cusp at 0, which prevents us from using calculus to do all sorts of higher-level analysis of distributions. ↩

⁴ You may want to modify what your program prints out when reporting its results, though. ↩
