immutable data
pros and cons

In the course of writing our videogame, we (hopefully) learned:

The template for processing lists
model-view-controller — a very obvious distinction. The View is all of our draw-stuff functions; the Controller is our event-handling functions (stuff-handle-key, and the "tick-handlers" move-stuff/update-stuff). The model is the actual structs, along with functions like frog-overlap-truck? and game-over?.
How to write a videogame with immutable data — never reassigning to a single variable or field!
In particular, this approach is called “world passing style” — we have a (say) list of trucks, give it to a function, and get back a whole new list of trucks. Or, a world struct, which big-bang passes to various event-handler functions, and gets back the new/updated world struct.
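
The course code is in Racket, but the same idea can be sketched in Java. This is only a minimal illustration, not our actual game: the names World, onTick and onKey are made up for this example, and each truck is simplified to a bare x-coordinate. The point is that each handler receives a world and returns a new one.

```java
import java.util.List;
import java.util.stream.Collectors;

/** A simplified, immutable world: the frog's x-coordinate plus each truck's x-coordinate. */
final class World {
    final int frogX;
    final List<Integer> truckXs;

    World(int frogX, List<Integer> truckXs) {
        this.frogX = frogX;
        this.truckXs = List.copyOf(truckXs);  // unmodifiable snapshot of the list
    }
}

final class WorldPassingDemo {
    /** Tick handler: returns a new world whose trucks have each moved 5 pixels. */
    static World onTick(World w) {
        List<Integer> moved = w.truckXs.stream()
                .map(x -> x + 5)
                .collect(Collectors.toList());
        return new World(w.frogX, moved);
    }

    /** Key handler: returns a new world with the frog shifted; the old world is untouched. */
    static World onKey(World w, String key) {
        if (key.equals("left"))  return new World(w.frogX - 10, w.truckXs);
        if (key.equals("right")) return new World(w.frogX + 10, w.truckXs);
        return w;  // unrecognized key: nothing changes, so the same world is returned
    }

    public static void main(String[] args) {
        World w0 = new World(100, List.of(0, 40));
        World w1 = onTick(w0);          // the trucks move
        World w2 = onKey(w1, "right");  // the frog moves
        System.out.println(w0.truckXs + " " + w2.truckXs + " frog at " + w2.frogX);
    }
}
```

Note that w0 is still intact after both handlers have run; nothing was ever re-assigned.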

Immutable data: disadvantages

Compare move-truck with an imperative version that just re-assigns to a field. The immutable version requires more memory (it allocates a whole new truck) and more time (it has to copy all the fields over, rather than just assign to one field). For a struct with 20 fields, this is a slowdown of perhaps 20x for that one update, plus more time for memory allocation and garbage collection.
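
To make the comparison concrete, here is a rough Java sketch of the two styles side by side. The classes MutableTruck and ImmutableTruck (and their three fields) are invented for this illustration; our actual truck struct lives in the Racket code.

```java
/** Imperative style: moving re-assigns one field of the existing object. */
final class MutableTruck {
    int x;
    final int y;
    final int speed;

    MutableTruck(int x, int y, int speed) {
        this.x = x;
        this.y = y;
        this.speed = speed;
    }

    void move() {
        x += speed;  // a single assignment, no allocation
    }
}

/** Immutable style: moving allocates a new truck and copies every field. */
final class ImmutableTruck {
    final int x;
    final int y;
    final int speed;

    ImmutableTruck(int x, int y, int speed) {
        this.x = x;
        this.y = y;
        this.speed = speed;
    }

    ImmutableTruck move() {
        return new ImmutableTruck(x + speed, y, speed);  // all three fields are written
    }
}

final class TruckComparison {
    public static void main(String[] args) {
        MutableTruck m = new MutableTruck(0, 10, 5);
        m.move();  // m itself has changed

        ImmutableTruck before = new ImmutableTruck(0, 10, 5);
        ImmutableTruck after = before.move();  // before is unchanged

        System.out.println(m.x + " " + before.x + " " + after.x);
    }
}
```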

Certainly, this seems expensive, but we can also make the following observations:

People often mis-estimate where delays come from: programs almost always keep up with my keystrokes more than fine — it’s web-connections that really slow things down to the degree that I notice a delay.
In practice, our video-game ran just fine — a few hundred trucks/aliens/bricks/etc doesn’t come close to taxing a modern system.
You can take your video-game assignment and scale up the number of objects, and see how it plays. (Any lagginess in the assignment is usually based on only moving 1 pixel per keypress, not actual CPU run-time! That said, the image-library is not designed for time-efficiency.)

In real life, rather than fear or suspect that an approach won’t scale, I encourage you to implement something both ways and actually measure the difference. The slower approach might end up being too slow, but then again maybe it’s not a bottleneck, and you have saved dozens of development & debugging hours by avoiding useless optimization. (And even if not useless: how many times must your program run milliseconds faster, just to recoup the extra hours and hours of development time and cost?)
Measuring the effects is better than just imagining performance differences:
- A rule-of-thumb is that (imperative) Python programs tend to be a factor of 5 slower than a C program, and (mostly-functional) Racket programs are a factor of 2-3 slower than a Python program.
- A confounding factor: dynamically-typed languages require more run-time checks, which slows performance.
- See some actual benchmarks.
- Haskell, a statically-typed pure-functional language, often approaches C in efficiency. But there do exist some problems where it doesn’t (can’t?).
- Again, most programs you write are not computation-bound these days, so efficiency trade-offs are not as important as people tend to think. When measuring tradeoffs, ignore differences less than 100ms for typical-size inputs! In my experience, if I notice a program is running slowly, it is almost always a network-delay, not a CPU issue.
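
As a starting point for "implement it both ways and measure", here is a rough Java sketch that times in-place updates against allocate-a-new-object updates. Everything in it (the brick classes, the counts) is made up for illustration, and timing with System.nanoTime like this is crude: JIT warm-up and garbage collection can skew a single run, so treat the numbers as a first impression only. A dedicated benchmark harness such as JMH would be the more careful tool.

```java
final class MeasureBothWays {
    static final class MutableBrick {
        int x;
        MutableBrick(int x) { this.x = x; }
    }

    static final class ImmutableBrick {
        final int x;
        ImmutableBrick(int x) { this.x = x; }
    }

    /** Updates every brick in place, once per tick; returns the sum of the final x's. */
    static long sumAfterMutableUpdates(int count, int ticks) {
        MutableBrick[] bricks = new MutableBrick[count];
        for (int i = 0; i < count; i++) bricks[i] = new MutableBrick(i);
        for (int t = 0; t < ticks; t++) {
            for (MutableBrick b : bricks) b.x += 1;
        }
        long sum = 0;
        for (MutableBrick b : bricks) sum += b.x;
        return sum;
    }

    /** Builds a whole new array of new bricks on every tick; returns the same sum. */
    static long sumAfterImmutableUpdates(int count, int ticks) {
        ImmutableBrick[] bricks = new ImmutableBrick[count];
        for (int i = 0; i < count; i++) bricks[i] = new ImmutableBrick(i);
        for (int t = 0; t < ticks; t++) {
            ImmutableBrick[] next = new ImmutableBrick[count];
            for (int i = 0; i < count; i++) next[i] = new ImmutableBrick(bricks[i].x + 1);
            bricks = next;  // the old array becomes garbage
        }
        long sum = 0;
        for (ImmutableBrick b : bricks) sum += b.x;
        return sum;
    }

    public static void main(String[] args) {
        int count = 500;     // a few hundred objects, as in the video-game
        int ticks = 10_000;

        long start = System.nanoTime();
        long mutableSum = sumAfterMutableUpdates(count, ticks);
        long mutableMs = (System.nanoTime() - start) / 1_000_000;

        start = System.nanoTime();
        long immutableSum = sumAfterImmutableUpdates(count, ticks);
        long immutableMs = (System.nanoTime() - start) / 1_000_000;

        System.out.println("mutable:   " + mutableMs + " ms (sum " + mutableSum + ")");
        System.out.println("immutable: " + immutableMs + " ms (sum " + immutableSum + ")");
    }
}
```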

Further discussion

Immutable data can mean better memory-sharing, w/o worrying about aliasing. This might (slightly) ameliorate the increased memory overhead.
To do in class: draw a memory-diagram of sharing in lists, and trees.
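
Until we draw the diagram, here is a small Java sketch of the sharing idea, using a hand-rolled immutable linked list (the class ConsList is invented for this example; it is not a library class). Two lists share the same tail, and that is safe precisely because nobody can modify the tail.

```java
/** A minimal immutable linked list of ints; null stands for the empty list. */
final class ConsList {
    final int first;
    final ConsList rest;

    ConsList(int first, ConsList rest) {
        this.first = first;
        this.rest = rest;
    }

    static int length(ConsList list) {
        int n = 0;
        for (ConsList c = list; c != null; c = c.rest) n++;
        return n;
    }
}

final class SharingDemo {
    public static void main(String[] args) {
        ConsList tail = new ConsList(2, new ConsList(3, null));

        // Two different three-element lists, built by allocating just one new cell each.
        ConsList a = new ConsList(1, tail);
        ConsList b = new ConsList(99, tail);

        // Both lists point at the very same tail object; no copy was made.
        System.out.println(a.rest == b.rest);
        System.out.println(ConsList.length(a) + " " + ConsList.length(b));
    }
}
```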
How much extra memory is needed? At most a factor of two, in the world-passing style. When passing in (say) a list-of-trucks and getting a new “modified” list back, it’s likely that the old list is now unreachable, and can be garbage-collected. You can imagine that there is enough memory for two copies of the list, and the program alternates between which memory chunk holds the current version, and which chunk is being used to write the updated version into.
With mutable data, if you need two distant functions to convey information to each other, they can do so with global variables. (This is a strength, and a weakness.) See the next point.
To be purely functional, you tend to need more inputs to functions, and want/need to return multiple values. This can be tedious, especially if the language doesn’t provide support.
Though global state is also precisely why such code is harder to reason about: it’s unclear what changes in distant code might change this procedure’s behavior. If you hear people talk about “dependency injection”, they mean “pass in extra arguments (like database-connections), rather than using global state”; people feel this is a win even in imperative programming, which suggests this negative is not so big after all.
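
A small Java sketch of the contrast between global state and passing in an extra argument. The interface ScoreStore is a hypothetical stand-in for something like a database connection; none of these names come from a real library.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a resource such as a database connection. */
interface ScoreStore {
    void save(int score);
}

/** Global-state style: recordScore silently depends on whoever last assigned `store`. */
final class GlobalStyle {
    static ScoreStore store;  // any distant code can re-assign this

    static void recordScore(int score) {
        store.save(score);
    }
}

/** Injected style: everything recordScore depends on is visible in its parameter list. */
final class InjectedStyle {
    static void recordScore(ScoreStore store, int score) {
        store.save(score);
    }
}

final class InjectionDemo {
    public static void main(String[] args) {
        List<Integer> saved = new ArrayList<>();

        // A unit test can hand in a fake store, with no global set-up or tear-down.
        InjectedStyle.recordScore(score -> saved.add(score), 42);
        System.out.println(saved);
    }
}
```

The injected version costs one extra argument at every call site, which is the tedium mentioned above, but its behavior no longer depends on code you can't see.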

Immutable data: advantages

It’s easier to reason about your program
(since a function’s behavior is entirely given by its code, w/o other (distant) code changing any fields or variables the local function deals with).
Every bug can be exposed by unit tests. Code in one place can’t trample on data that’s also used in a far-away function. Compare that with having a bug in your program and resorting to a debugger, stepping through until you locate where things went wrong. (This is part of what’s meant by “functional programs are easier to reason about”.)
Immutable data can mean that the compiler can make better optimizations, if it knows that certain values won’t/can’t change.
One small example:
      int a = foo();  // this might be a long/complicated expression.
      if (someCondition) return a; else return -1;
In a functional setting, you can evaluate someCondition first, and if it’s false then don’t even bother calling foo(). But in an imperative program, that can’t be done—the call to foo() might change state which then affects the outcome of someCondition. (That’s often not the case, but the compiler can’t know it’s safe to discard the call.)
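
Here is a contrived Java sketch of why that reordering is unsafe when foo has side effects. The names impureFoo, counter and conditionFirst are invented for the example; the condition deliberately reads state that the impure function changes.

```java
final class ReorderingDemo {
    static int counter = 0;  // mutable state shared by the functions below

    /** Impure: besides returning a value, it changes state that other code reads. */
    static int impureFoo() {
        counter++;
        return 42;
    }

    static boolean someCondition() {
        return counter > 0;
    }

    /** The original order: call foo first, then test the condition. */
    static int eager() {
        int a = impureFoo();
        if (someCondition()) return a; else return -1;
    }

    /** The "optimized" order: test the condition first, call foo only if needed. */
    static int conditionFirst() {
        if (someCondition()) return impureFoo(); else return -1;
    }

    public static void main(String[] args) {
        counter = 0;
        int original = eager();            // foo's side effect makes the condition true

        counter = 0;
        int reordered = conditionFirst();  // the condition is checked before foo ever runs

        System.out.println(original + " vs " + reordered);  // the two orders disagree
    }
}
```

If impureFoo never touched counter, both orders would agree, and a compiler that could prove that would be free to skip the call.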
The more data is (known to the compiler to be) immutable, the more multiple cores can be easily used. In particular, cache coherency isn’t a problem with immutable data.
No == vs. .equals subtleties to worry about — in absence of mutation, you tend to only care about objects having equal fields. (Think Java’s String — equals is almost certainly the comparison you care about.)
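
A tiny Java reminder of the subtlety (the helper names sameObject and sameContents are made up for this sketch):

```java
final class EqualityDemo {
    /** Reference comparison: are these the very same object in memory? */
    static boolean sameObject(Object a, Object b) {
        return a == b;
    }

    /** Value comparison: do these objects have equal contents? */
    static boolean sameContents(Object a, Object b) {
        return a.equals(b);
    }

    public static void main(String[] args) {
        // new String(...) forces two distinct objects with the same characters.
        String a = new String("frog");
        String b = new String("frog");

        System.out.println(sameObject(a, b));    // false: two separate objects
        System.out.println(sameContents(a, b));  // true: same characters
    }
}
```

Since a String can never change, there is rarely a reason to care which of the two objects you are holding; only the contents matter.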

Secure Programming

Mutability can become a security hole. (We'll assume we're using Java, for the moment.) Imagine the following:

In an operating system, you certainly want to know who all your users are — String[] usernames = { ... };. And of course you wouldn't make this a public-field/variable, because you don't want other code to assign to it! But you may want to let other people know all the users, so it might be plausible that you'd include a getter, String[] getUsernames() { return usernames; }.

What's the problem? Aliasing + Mutability! An attacker can go ahead:

      String[] theUsers = getUsernames();
      theUsers[2] = "hax0r";

At this point, the OS now thinks that hax0r is a known, registered user. Oops!

Secure Programming principle: Don't give references to your own data-structures, to untrusted code! Instead, make a copy.

So we'll change our function to be String[] getUsernames() { return Arrays.copyOf(usernames, usernames.length); }. Now an attacker can get a copy of the array, and if they assign into the array that's fine, they're only assigning to their own copy.
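
Putting the leaky getter and the copying getter side by side, as a minimal Java sketch. The class UserRegistry, the sample usernames, and the helper isRegistered are all invented for illustration; the copy is made with Arrays.copyOf from java.util.

```java
import java.util.Arrays;

final class UserRegistry {
    private final String[] usernames = { "alice", "bob", "carol" };

    /** Leaky getter: hands out a reference to our own internal array. */
    String[] getUsernamesUnsafe() {
        return usernames;
    }

    /** Safer getter: callers receive their own copy of the array. */
    String[] getUsernames() {
        return Arrays.copyOf(usernames, usernames.length);
    }

    boolean isRegistered(String name) {
        return Arrays.asList(usernames).contains(name);
    }
}

final class DefensiveCopyDemo {
    public static void main(String[] args) {
        UserRegistry registry = new UserRegistry();

        String[] copy = registry.getUsernames();
        copy[2] = "hax0r";  // only the attacker's own copy changes
        System.out.println(registry.isRegistered("hax0r"));  // false

        String[] alias = registry.getUsernamesUnsafe();
        alias[2] = "hax0r";  // this writes into the registry's own array
        System.out.println(registry.isRegistered("hax0r"));  // true
    }
}
```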

But our problems may not be over yet! If we are in a language like C (or many others), strings are mutable. So an attacker can still weasel their way into the system's trust:

      String[] theUsers = getUsernames();
      theUsers[2][0] = 'h';
      theUsers[2][1] = 'a';
      theUsers[2][2] = 'x';
      theUsers[2][3] = '0';
      theUsers[2][4] = 'r';
      theUsers[2][5] = '\0';

Secure Programming principle, cont.: Remember to make a deep copy!

Note that Java is not susceptible to this latter attack. Why not? Java's Strings are immutable!
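
To see the shallow-vs-deep issue in Java itself, we have to store the names in something mutable. The sketch below uses char[] as a stand-in for C-style mutable strings; the class MutableNameRegistry and its methods are invented for this illustration.

```java
import java.util.Arrays;

final class MutableNameRegistry {
    // Each name is a mutable char[], mimicking a language with mutable strings.
    private final char[][] usernames = {
        "alice".toCharArray(), "bob".toCharArray(), "carol".toCharArray()
    };

    /** Shallow copy: a new outer array, but it still points at our own char[]s. */
    char[][] getShallowCopy() {
        return Arrays.copyOf(usernames, usernames.length);
    }

    /** Deep copy: a new outer array AND a fresh copy of every name. */
    char[][] getDeepCopy() {
        char[][] copy = new char[usernames.length][];
        for (int i = 0; i < usernames.length; i++) {
            copy[i] = Arrays.copyOf(usernames[i], usernames[i].length);
        }
        return copy;
    }

    boolean isRegistered(String name) {
        for (char[] u : usernames) {
            if (new String(u).equals(name)) return true;
        }
        return false;
    }
}

final class DeepCopyDemo {
    public static void main(String[] args) {
        MutableNameRegistry registry = new MutableNameRegistry();

        char[][] deep = registry.getDeepCopy();
        "hax0r".getChars(0, 5, deep[2], 0);  // overwrite the characters of the copy
        System.out.println(registry.isRegistered("hax0r"));  // false

        char[][] shallow = registry.getShallowCopy();
        "hax0r".getChars(0, 5, shallow[2], 0);  // overwrites the registry's own "carol"
        System.out.println(registry.isRegistered("hax0r"));  // true
    }
}
```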

This page licensed CC-BY 4.0 Ian Barland
Page last generated Please mail any suggestions
(incl. typos, broken links)
to ibarlandradford.edu
