![]() |
![]() |
|
#1-#3 (incl. unit tests) due in class Apr.01 (Tue), D2L main.rs only;
All problems due in class Apr.08 (Tue), on D2L and hardcopy main.rs only.
Some details might be clarified/added;
Any such changes will be summarized at the top of this file.
clarifications/updates As per the syllabus's honor policy, you must understand, and be the direct author of, all work you submit. (See syllabus for what level of help is acceptable, and how to proceed when you work through the code w/ help from others.)
How far does one tend to get, on a random walk? We'll calculate some random walks, find their average distance, and then compute how far those values tend to be (i.e. are the numbers clustered around the average, or do they vary widely?).
We will write functions which, for an array of numbers, compute the sum, average (mean), standard deviation, and mean-distance-from-mean.
recall: We discussed how to test a void method (which mutates an arrays contents) in our recent lab/lecture.
Use assert_f64_near! to test with doubles. You can import this function as we discussed our lab/lecture.
Testing NaN: Very-oddly, f64::NAN != f64::NAN! (This is also true in Java: Double.NaN != Double.NaN; it's part of the IEEE standard for floating-point numbers.) If you expect a function to return NaN, then you can't use assert_eq!; instead you will need to assert!(someF64.is_nan()).
/// Return a (boxed) array of `len` elements of type T. /// author: nlahn@radford.edu /// fn boxed_array<T : Copy + Default>( len: usize ) -> Box<[T]> { vec![T::default(); len].into_boxed_slice() } You can call this function with, e.g., boxed_array::<bool>(99). |
Your function should also take in a seed, used for seed_from_u64. Even better: have the function take in a Rng object, as we showed in class, in rng-example.rs. This lets our unit tests be reproducible, even in the presence of randomness.
Every function should be preceded by /// and a short (one-sentence) purpose-statement; it should mention all its parameters. (It might even be copy/pasted from this program.) It should not mention any implementation details.
Comments about implementation should be inside the function.
You don't need to comment things that would be clear to a Rust programmer.
So no need for comments like
Every function should include unit tests. For unsigned-integers, testing 0, 1, and a bigger value help “cover all the bases”. Other types may warrant checking with a negative number and/or fractional value. For array inputs and/or results, be sure to test an array-of-size-0, an array-of-size-1, and an array-of-size-many. Depending on the function, you might have further tests checking other situtations. You do not need multiple tests all checking for the same situation repeatedly. For example, in a function processing strings: testing the strings "", "a", and "abcde" are all plausible, but adding a test for "abcdefgh" probably won't catch any bugs that the previous three didn't. Depending on the function's purpose, you may — or may not — want to test strings that have spaces, a string of only-spaces, and a string that has punctuation (or, is only punctuation). Though if you know2 that the function's task need never pay any attention to what the letters are, then testing spaces/punctuation would not be a priority.
Use good, meaningful names athat are self-documenting, not just for local variables (incl. parameters), but also for functions. This can greatly lessen, or even replace, the need for other documentation.
We are not so much concerned with the average value of our data, but rather how far the data tends to vary from its average value (from its “mean”). For instance, the values {3,4,5} and the values {−1,9} both have a mean of 4, but in the first case the values tend to be close to 4, while in the second they are spread further away.
In particular, how far are the numbers {3,4,5} from their mean, 4, on average? We add up the distance of each number from 4, and then divide by how many numbers we had: (1 + 0 + 1)/3 = 2/3. This indicates that the data is clustered close to its mean: on average, it’s only 2/3 away from the mean.
In the second case, how far does {−1,9} tend to be from 4? This should be a larger value than before, since these numbers are spread further away from their mean. Don’t confuse the two things we are averaging: the average value (mean) of the array, and later the average distance-from-that-mean.3
So all that said, how can our program calculate the average distance of our data from its mean? We already have a function which calculates the average of an array of numbers, so let’s use it again!
Consider a person who leaves a pub, and travels in the following manner: a third of the time they walk north a block. A third of the time they walk south a block. And the remaining third of the time they don’t move at all. This is called a random walk (along one dimension, only walking along a line going north-south.)
After taking, say, k moves, where will the person tend to be? Sometimes they’ll be fully k moves north of the pub, though that’s not very likely. Sometimes they’ll be exactly back at the pub entrance (also still pretty unlikely that they went north exactly as many times as they went south). Most of the time they’ll probably be somewhat near the pub, but not exactly back where they started from. So a natural question is: On average, where do they end up?
And if fifty such random walkers all leave the pub, where will they be, on average? How much variation will there be — will these fifty people all be roughly the same distance away from the pub, or will several be very close to the pub while several others are nearly a full k blocks away?
Aha, we are now in a position to experimentally find how far away they are!
First, write a function
randWalk( numSteps: u32 ) -> i32, which simulates a single random
walk of numSteps steps,
and returns the final position
(in blocks north of the pub; this number will of course be negative
if the final position was south of the pub).
You’ll call this function, passing it WALK LENGTH,
which you should #define to be 150.
Initially, the walker starts 0 blocks north of the pub,
and for the following numSteps times,
it takes adds a random amount of
−1, 0, or +1.
(Hmm, last week’s function randBetween could come in handy here!)
See the section coding-randomness.
We will run our program, using NUM DATA different random walkers leaving the pub. The function fillArray will put into each array element the result of a single random walk: That is, arr[0] will be the ending position of one random walk of length WALK LENGTH. Seperately, arr[1] will be the ending position of another random walk, and so on. So fillArray is almost identical to what it was before, except that instead of filling arr[i] with the value i, it fills it with randWalk( WALK LENGTH ).
The rest of your program, without changing it4, is now calculating the mean ending position of all those random walkers, and how far away they are from this mean position, on average.
This approach is called a Monte Carlo simulation: rather than mathematically figuring out exactly what the average is, we do a whole bunch of simulations, and get an idea of what’s going on. Can you guess a mathematical expression which gives how far the walkers tend to be from their mean position, as WALK LENGTH gets larger and larger?
In class Apr.01, we go over rng-example, which illustratres using random numbers in Rust. The tl;dr:
For standard deviation, instead of taking the absolute-value of the difference (ensuring our distance is a positive amount), we use the difference squared. The mean-squared-distance-from-mean is called the variance. It is decent measure of spread, although it kinda has the wrong units (e.g. square-miles, instead of miles), so we finish by taking the square-root of the variance, and that is the standard deviation.
However note that $\sqrt{3^2 + 4^5}$ is different than $3+4$; by squaring, then adding, then taking the root we've distorted things a bit. So why use it at all, instead of mean-distance-from-mean? Because calculus: we can take the derivative of square-root, but absolute-value has a cusp, which prevents us from using calculus to do all sorts of higher-level analysis of distributions.
↩This page licensed CC-BY 4.0 Ian Barland Page last generated | Please mail any suggestions (incl. typos, broken links) to ibarland ![]() |