The story of Morgiana from the Arabian Nights, as told by Mill and quoted in my previous post, is central to understanding information theory. To reiterate:
If like the robber in the Arabian Nights we make a mark with chalk on a house to enable us to know it again, the mark has a purpose, but it has not properly any meaning. The chalk does not declare anything about the house; it does not mean, This is such a person’s house, or This is a house which contains a booty. The object of making the mark is merely distinction. I say to myself, All these houses are so nearly alike that if I lose sight of them I shall not again be able to distinguish that which I am now looking at, from any of the others; I must therefore contrive to make the appearance of this house unlike that of others, that I may hereafter know when I see the mark – not indeed any attribute of the house – but simply that it is the same house that I am looking at. Morgiana chalked all the other houses in a similar manner, and defeated the scheme: how? simply by obliterating the difference of appearance between that house and the others. The chalk was still there, but it no longer served the purpose of a distinctive mark.
The basic question for information theory is: given some chalk and some houses with doors, how many different sorts of messages could Morgiana send or erase? In what follows I am going to try to show how entropy, average surprisal and Zipfian slopes tell this story, and show why Zipfian slopes are more universally applicable and therefore more useful to science generally.
The basic insight behind using Zipfian slopes is that the number of doors is irrelevant to Morgiana’s strategy. Each door could also be subdivided indefinitely and the same approach would still work. In the charts that I have been showing there are twelve doors (or marking slots) in total, and twelve colours of chalk (including no mark). For twelve doors, there are a total of 77 ways in which the doors and chalk marks could be combined.
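The figure of 77 matches the number of integer partitions of 12, that is, the number of ways of splitting twelve doors into groups that share a mark when only the group sizes matter (not which door or which colour). Assuming that is the intended count, a minimal sketch that enumerates them looks like this; the function name `partitions` is my own choice for illustration.

```python
def partitions(n, max_part=None):
    """Yield each way of writing n as a non-increasing tuple of positive integers."""
    if max_part is None:
        max_part = n
    if n == 0:
        yield ()
        return
    # Pick the largest group first, then split what is left into groups no larger.
    for first in range(min(n, max_part), 0, -1):
        for rest in partitions(n - first, first):
            yield (first,) + rest


all_marks = list(partitions(12))
print(len(all_marks))   # 77
print(all_marks[0])     # (12,)  every door marked the same
print(all_marks[-1])    # (1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1)  every door different
```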
The average surprisal to entropy graph below is useful because it shows the structure of these combinations clearly. Don’t worry too much about the values: part of my thesis here is that the values themselves are not as important as the structural relationship between the results given by different measures (this is a key point I will be returning to repeatedly).
At the top left, with an entropy of 3.58, is the case where all 12 doors are marked differently. At the bottom right is the case where every door is marked the same (including no marks), with an entropy of 0. We can denote these by {1,1,1,1,1,1,1,1,1,1,1,1} and {12} respectively.
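The two entropy values can be checked directly, assuming the usual Shannon entropy in bits computed over the relative frequencies of the marks. This is only a sketch; `entropy_bits` is a name I have made up for the example.

```python
import math


def entropy_bits(counts):
    """Shannon entropy (base 2) of a list of mark counts, e.g. [6, 6] or [12]."""
    total = sum(counts)
    # Each mark type with count c has probability c / total and surprisal log2(total / c).
    return sum((c / total) * math.log2(total / c) for c in counts)


print(round(entropy_bits([1] * 12), 2))  # 3.58  all twelve doors marked differently
print(entropy_bits([12]))                # 0.0   every door marked the same
```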
Importantly, if we imagine ourselves trying to communicate a message without pre-arranging a code, both of these arrangements are equally useless. The only way that we can know if a mark is special is if it is repeated. When we pre-arrange a code (e.g. when we develop a language), we rely on the repetition of marks between things we hold in our memory and what we see on the door. In that way, we Re-Cognize marks.
So how does repeating marks in different combinations affect the number of ways Morgiana can send a message (or erase it)?
Let us now tilt the graph to focus on the bands of results:
Going from top to bottom in the tilted graph, the first line is {12} (no differences), and the next line down has, from left to right, {6,6}, {7,5}, {8,4}, … {11,1}. So as we go down the graph, the number of different types of chalk mark increases. And as we go from left to right, the difference in frequency between the most and least common mark increases.
The next view shows all the values where there is one repeated mark and the rest all different, from top to bottom: {11,1}, {10,1,1}, {9,1,1,1}…
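For readers who want to see the members of these two bands written out in full, here is a small sketch that generates them for twelve doors. The function names are hypothetical, and I am assuming "one repeated mark" means one mark used at least twice with every other door marked uniquely.

```python
DOORS = 12


def two_mark_band(n=DOORS):
    """Arrangements using exactly two kinds of mark, from most even to most lopsided."""
    return [(a, n - a) for a in range((n + 1) // 2, n)]


def one_repeated_mark(n=DOORS):
    """Arrangements with one mark repeated k >= 2 times and all other doors unique."""
    return [(k,) + (1,) * (n - k) for k in range(n - 1, 1, -1)]


print(two_mark_band())
# [(6, 6), (7, 5), (8, 4), (9, 3), (10, 2), (11, 1)]
print(one_repeated_mark()[:3])
# [(11, 1), (10, 1, 1), (9, 1, 1, 1)]
```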
All that the Zipfian chart does to this picture is tilt it, such that all cases where the marks are spread evenly across the types in use (for example {6,6} or {1,1,1,1,1,1,1,1,1,1,1,1}) have the same slope value of 0. This is to represent the fact that the effectiveness of Morgiana’s strategy is not a function of the number of doors or chalk marks, but of the relationship between their relative frequencies.
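To make the "slope of 0" claim concrete, here is one common way to get a Zipfian slope: a least-squares fit of log frequency against log rank. The post does not spell out the exact formula behind the charts, so treat this as a sketch under that assumption. The name `zipf_slope` and the convention of returning 0 when only one kind of mark is present are mine.

```python
import math


def zipf_slope(counts):
    """Least-squares slope of log(frequency) against log(rank).

    Assumption: with a single mark type there is only one point to fit,
    so the slope is taken to be 0 by convention.
    """
    freqs = sorted(counts, reverse=True)
    if len(freqs) < 2:
        return 0.0
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Evenly spread marks give a flat line whatever the number of mark types.
for even in ([12], [6, 6], [4, 4, 4], [1] * 12):
    print(even, zipf_slope(even))

# An uneven arrangement gives a negative slope under this sign convention.
print([11, 1], zipf_slope([11, 1]))
```

With this sign convention steeper (more uneven) arrangements have more negative slopes; some presentations report the absolute value instead.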
Average surprisal is useful here, because it helps clarify the connection between entropy and Zipfian slopes.
So this is where the following chart from my thesis comes from:
I hope that these charts demonstrate how we can use Zipfian slopes to tell the story of the relative amount of markedness that Morgiana achieves, without us having to know anything about the number of doors or the nature of the marks themselves.
In the next post I will have a quick look at the role of Zipf’s law itself in this whole affair.