What is Information?

Is there information in a book? Is it in the pages and extracted when it is read? Why don’t two people get the same amount of information from the same book? For that matter, why do I feel informed differently when I read the same book a second time? Or if I read the same book but it’s written in another language? Why do I matter so much when I’m talking about the information contained in a book?

The first lesson of understanding information is that it is contextual.

It’s impossible to disentangle the two aspects that make up information: the thinking thing, which I will refer to as an epistemic agent (EA), and the interaction of the EA with data, which I will refer to as an experience. An EA changes (it feels informed) when it has an experience that is informative, so information must be related to experiences that change an EA. An experience is interpreted through the lens of the EA, so information must be related to the state of the EA.

The question that must then be addressed is what is it about the EA that changes during an experience? Put more concisely, what is the state of an EA?

We can look to ourselves for guidance in this matter. Whenever I have an experience that I feel is informative, it is because the experience has made me see things in a new light. I believe in certain things more strongly, and in other things less so after the experience. My beliefs about the world, whatever that may be, change when I am informed. The more moved I am by an experience, the more change occurs to my beliefs. At the very least, for the most mundane of experiences, my belief in the statement ‘I have experienced that’ increases, all else remaining the same.

If belief is the key to defining the state of an EA, then the next question is what are these beliefs about? I have already mentioned belief in a particular statement, so let me expand on that in a rather Wittgensteinian fashion. The world is what it is, and EAs have pictures of the world that they believe. It is important to distinguish between the world itself and the picture of the world. The ontology of the former is important, but it is the epistemology of the latter that we must look to. The picture of the world is painted with language, and statements are the atoms through which the picture is generated. As an EA’s beliefs in statements fluctuate, so does the picture. The picture of the world that an EA has is an ever-shifting mental masterpiece, updated through experiences from the blank slate of a child to the well-colored vision of an adult.

To be more precise, let $\mathcal R = (S,L)$ be a Realm of Discourse (RoD) consisting of a set of statements about the world, $S$, and logical connectives, $L:S^n\rightarrow S$. The latter are n-ary maps from multiple statements to statements. The logical connectives imply that there is a subset of $S$ known as atomic statements, $S_0 \subset S$, which form the most basic things that can be said about the world within the RoD. The remaining statements are called compound statements, since they are built from other statements by applying connectives. Some examples of logical connectives, for all $s,s'\in S$, are the unary function of negation, $\neg(s) = \neg s$, and the binary functions of conjunction and disjunction, $\land(s,s')=s\land s'$ and $\lor(s,s')=s\lor s'$. Logical connectives are the rules by which the RoD is constructed, generating all possible statements that could be made about the world by EAs. The key thing to get from the preceding is that an RoD establishes all possible pictures of the world that can be reached, and hence puts limits on what can be said. What is outside of an RoD, an EA cannot speak of.
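As a rough illustration of how connectives generate an RoD from atomic statements, here is a minimal Python sketch. The class names `Atom`, `Not`, `And` and the helper `generate` are hypothetical, introduced only for this example, and the sketch is purely syntactic: it treats $s\land s'$ and $s'\land s$ as distinct statements and makes no attempt to merge logically equivalent ones.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Atom:
    """An atomic statement, identified by its text (hypothetical helper)."""
    name: str


@dataclass(frozen=True)
class Not:
    """Unary connective: negation of a statement."""
    arg: object


@dataclass(frozen=True)
class And:
    """Binary connective: conjunction of two statements."""
    left: object
    right: object


def generate(atoms, depth):
    """Return every statement reachable from `atoms` by applying
    negation and conjunction for at most `depth` rounds."""
    statements = set(atoms)
    for _ in range(depth):
        negations = {Not(s) for s in statements}
        conjunctions = {And(a, b) for a, b in product(statements, repeat=2)}
        statements = statements | negations | conjunctions
    return statements


atoms = [Atom("the sky is blue"), Atom("it is raining")]
for depth in (0, 1, 2):
    print(depth, len(generate(atoms, depth)))
```

Even with two atoms the number of statements grows very quickly with depth, which gives a feel for how much a small RoD can already say.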

With all possible pictures of the world in hand, what determines the particular picture of the world that an EA has, and hence what is the state of the EA? We are now back to thinking about belief, and are almost in the position to define it quantitatively. From the RoD, a picture of the world is the degree to which an EA believes in every possible statement. ‘The sky is blue’ and ‘vaccines cause autism’ are examples of statements that an EA has some belief in. Note that we are NEVER talking about the truth value of any particular statement, only the degree to which an EA believes it. The epistemology is ontologically neutral. What the world is is less important than the picture of the world that the EA has. It is the picture of the world that we must examine in order to understand how it changes due to experience. Furthermore, the beliefs of an EA are history dependent, in that they are what they are due to the experiences that an EA has had. More on this in a bit.

An interesting aspect of belief, which I alluded to by describing it as a degree, is that it is transitive: if I believe that I am a human being more than that I am an animal, and I believe that I am an animal more than that I am a rock, then necessarily I believe that I am a human being more than that I am a rock. This transitivity of belief is incredibly powerful, because it means that beliefs are both comparable and ordered. I can take any two beliefs I have and compare them, and then I can say which one I believe in more. It may be difficult to compare two beliefs that are very similar, but upon close inspection it appears all but impossible to find a pair that are fundamentally incomparable. These two properties of belief form the first of Cox’s axioms, whose work has heavily influenced my thinking.

The ordering of belief has a wonderful consequence: we can model the degree to which an EA believes in statements from the RoD by real numbers. A picture of the world is a mapping from the RoD to the continuum, assigning to each statement in the RoD a real number. Recalling the history dependence on past experience, we denote the real number that describes the belief in statement $s\in S$ by $b(s|H)$, where $H$ is the history: the set of all past experiences relevant to the statement $s$.

Where do the logical connectives of the RoD come into play? This brings us back to the difference between atomic and compound statements. A picture of the world is rational if the belief function on the RoD obeys the second and third of Cox’s axioms: Common Sense and Consistency. Common sense reduces the space of possible relationships between beliefs in atomic and compound statements. Consistency results in a pragmatic rescaling of the RoD map, rendering the common sense relationships in a simpler form. These rescalings are referred to as regraduations of belief, and the end result will look very familiar.

Moving forward, let’s examine what we mean by common sense. Imagine a unary connective acting on a single statement, keeping all else fixed. Common sense dictates that the belief in the transformed statement should somehow be related to the EA’s belief in the original statement:

$$b(\neg s|H) = f\big(b(s|H)\big)$$

Similarly for binary connectives, belief in a compound statement should be related to the beliefs in the statements separately, and conditioned on one another:

$$b(s\land s'|H) = g\big(b(s|H),\, b(s'|H),\, b(s|s',H),\, b(s'|s,H)\big)$$

One could go ahead towards arbitrary n-ary connectives, but writing down these relationships becomes quite unwieldy. Fortunately all we need is contained in the relationships $f:\mathbb R\rightarrow\mathbb R$ and $g:\mathbb R^4\rightarrow\mathbb R$. In fact, we can simplify things even further by restricting ourselves to the particular RoD that has negation and conjunction connectives. In this RoD all other logical connectives can be written as combinations of these two (for example, disjunction can be expressed as $s\lor s' = \neg (\neg s \land \neg s')$; see propositional logic for more). In this RoD the common sense relationship for binary operators is trivial for many combinations of its arguments, and reduces the domain to a 2-dimensional subspace of the original domain. There is still a freedom to choose which two arguments. Given the commutativity of conjunction, either pair $(b(s|H),b(s'|s,H))$ or $(b(s'|H),b(s|s',H))$ leads to the same results, so we choose the former for simplicity.
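The claim that negation and conjunction suffice can be checked by brute force over truth assignments. The following is a small sketch; the function names are hypothetical, and implication is included as a second example that is not discussed in the text.

```python
from itertools import product


def disjunction_via_not_and(s, t):
    """s OR t, written using only negation and conjunction."""
    return not ((not s) and (not t))


def implication_via_not_and(s, t):
    """s IMPLIES t, written using only negation and conjunction."""
    return not (s and (not t))


# Compare each rewritten connective against Python's built-in logic
# on all four truth assignments.
rows = []
for s, t in product([False, True], repeat=2):
    rows.append((
        s,
        t,
        disjunction_via_not_and(s, t) == (s or t),
        implication_via_not_and(s, t) == ((not s) or t),
    ))

for row in rows:
    print(row)
```

Every row should report agreement in both columns, which is just De Morgan's law checked by exhaustion.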

The application of common sense has led us to necessary relationships between the belief function, which generates the EA’s picture of the world, and the logical connectives the RoD is equipped with, which limit what the EA can speak of. One could have, of course, demanded more complicated relationships, or none at all, but then one would have to argue why my belief in ‘it is raining’ is not at all related to my belief in ‘it is not raining’. Common sense is epistemically satisfying, and, as will be shown, incredibly powerful in its restrictiveness on the possible forms of belief.

Consistency is the final ingredient that must be incorporated into the quantification of belief. It demands that if there are multiple ways of constructing a statement in the RoD, then the common sense relationships should act in such a way that belief in the statement does not depend on the path used to get to it. For example, I could decompose the compound statement $s\land s'\land s''$ as either $s\land (s'\land s'')$ or $(s\land s')\land s''$. Consistency requires that associatively equivalent decompositions lead to the same belief value. This alone is quite powerful; consider its implications for $g$. Denote $x = b(s|H)$, $y = b(s'|s,H)$, and $z=b(s''|s,s',H)$. The above mentioned associativity implies:

$$g\big(g(x,y),z\big) = g\big(x,g(y,z)\big)$$

This is a functional equation, and it is a beast. Functional equations make differential equations look like adorable little cupcakes. Functional equations are beautiful. Solving them, if possible, requires far more dexterity at analysis than attacking other types of equations, and this is but the first functional equation that we will find on our journey towards an epistemic understanding of information.

The solution to the associativity functional equation requires showing the existence of a function $\psi$ that satisfies:

$$\psi\big(g(x,y)\big) = \psi(x)\,\psi(y)$$

The existence proof is long and tedious, but checking that the associative functional equation is satisfied is straightforward. Once one is convinced of this, we can regraduate (rescale) beliefs with $\psi$, $b \rightarrow \psi \circ b$. Why would we do this? The only reasonable explanation is pragmatism. This regraduation transforms the relationship for conjunctions into simple multiplication, yielding:

$$b(s\land s'|H) = b(s|H)\,b(s'|s,H)$$

If you just got a tingly sensation, you’re not alone…
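To make the regraduation step less abstract, here is a numerical sketch. The choice of $\psi$ as the odds function $x/(1-x)$ is an arbitrary assumption made purely for illustration, not anything taken from Cox; the resulting $g$ does not look like multiplication at all, yet it is associative, and rescaling by $\psi$ turns it into a plain product.

```python
from itertools import product


def psi(x):
    """Illustrative regraduation: the odds x / (1 - x) on the open interval (0, 1)."""
    return x / (1.0 - x)


def psi_inv(u):
    """Inverse of psi, mapping positive odds back into (0, 1)."""
    return u / (1.0 + u)


def g(x, y):
    """A conjunction rule built so that psi(g(x, y)) == psi(x) * psi(y)."""
    return psi_inv(psi(x) * psi(y))


grid = [0.1, 0.3, 0.5, 0.7, 0.9]

# Largest violation of associativity, g(g(x, y), z) vs g(x, g(y, z)).
max_gap = max(
    abs(g(g(x, y), z) - g(x, g(y, z)))
    for x, y, z in product(grid, repeat=3)
)

# Largest violation of the product form after regraduating with psi.
regrad_gap = max(
    abs(psi(g(x, y)) - psi(x) * psi(y))
    for x, y in product(grid, repeat=2)
)

print(max_gap, regrad_gap)
```

Both gaps should be at the level of floating-point round-off. Any other invertible $\psi$ could be swapped in; the point is only that the awkward-looking $g$ and simple multiplication describe the same ordering of beliefs on different scales.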

Recalling that $\neg\neg s=s$, consistency produces a second functional equation from the first common sense relationship on unary operators:

$$f\big(f(x)\big) = x$$

This equation is sometimes referred to as Babbage’s equation, and has a long history. One can verify quickly that for any invertible function $\phi$, a solution to this equation is of the form:

$$f(x) = \phi^{-1}\big(1-\phi(x)\big)$$

Regraduating beliefs once more with this function, $b\rightarrow \phi\circ b$, the unary relationship becomes:

$$b(\neg s|H) = 1 - b(s|H)$$

Let’s discuss these tingly sensations that we’re feeling in a moment. It’s a good idea to note that we have just derived that the belief in a negation is a monotonically decreasing function of the original belief. This means that the more an EA believes in a statement, the less they believe in the negation of the statement. Common sense, right?
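The involution property behind Babbage's equation is easy to test numerically. In this sketch $\phi(x) = x^2$ on $[0,1]$ is an arbitrary invertible function chosen for illustration, and the solution is assumed to take the form $f = \phi^{-1}(1-\phi(x))$; the code checks that $f(f(x)) = x$, that $f$ is decreasing, and that regraduating by $\phi$ gives the "one minus" rule.

```python
import math


def phi(x):
    """Illustrative invertible regraduation on [0, 1]."""
    return x * x


def phi_inv(u):
    """Inverse of phi on [0, 1]."""
    return math.sqrt(u)


def f(x):
    """Candidate negation rule: f(x) = phi_inv(1 - phi(x))."""
    return phi_inv(1.0 - phi(x))


grid = [i / 10 for i in range(11)]

# Babbage's equation: applying f twice returns the original belief.
involution_gap = max(abs(f(f(x)) - x) for x in grid)

# More belief in s means less belief in its negation.
is_decreasing = all(f(a) > f(b) for a, b in zip(grid, grid[1:]))

# After regraduation, belief in the negation is 1 minus the belief.
negation_gap = max(abs(phi(f(x)) - (1.0 - phi(x))) for x in grid)

print(involution_gap, is_decreasing, negation_gap)
```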

One final consideration should be made for the bounds of the regraduated belief function. To find bounds we should consider statements that are purposefully complicated, such as $s=s\land s$. We denote maximal belief by $b_T$, and apply the multiplication rule, $b(s|H) = b(s\land s|H) = b(s|H)b(s|s,H)=b(s|H)b_T$. Since an EA’s belief in a statement that is part of their history is maximal, this implies (for any statement with $b(s|H)\neq 0$) that $b_T =1$. Furthermore, the negation of a maximal belief is minimal due to the monotonicity of the negation relationship. Denoting minimal belief as $b_F$, the regraduated negation relationship $b(\neg s|H) = 1 - b(s|H)$ gives $b_F = 1 - b_T = 0$. We have then that beliefs are real numbers in the interval $[0,1]$.

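The astute reader will notice that the regraduated belief function satisfies two rules which are identical to those found in probability theory: Bayes’ Rule (here in its product-rule form) and the Normalization Condition:

$$b(s\land s'|H) = b(s|H)\,b(s'|s,H), \qquad b(s|H) + b(\neg s|H) = 1$$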
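These two rules are enough to update a belief in light of an experience. The sketch below uses made-up numbers and a hypothetical helper `posterior`: it combines the multiplication rule with normalization to get the belief in $s'$, then uses the commutativity of conjunction to turn $b(s'|s,H)$ into $b(s|s',H)$.

```python
def posterior(prior, like_if_true, like_if_false):
    """Belief in s after experiencing s'.

    prior         : b(s|H)
    like_if_true  : b(s'|s,H)
    like_if_false : b(s'|not s,H)
    """
    joint_true = prior * like_if_true            # b(s and s'|H), multiplication rule
    joint_false = (1.0 - prior) * like_if_false  # b(not s and s'|H), using normalization
    evidence = joint_true + joint_false          # b(s'|H)
    return joint_true / evidence                 # b(s|s',H)


# Made-up beliefs, for illustration only.
belief_s = posterior(prior=0.5, like_if_true=0.6, like_if_false=0.2)

# The same update for the negation of s: the prior is 1 - 0.5 and the
# two conditional beliefs swap roles.
belief_not_s = posterior(prior=0.5, like_if_true=0.2, like_if_false=0.6)

print(belief_s, belief_not_s, belief_s + belief_not_s)
```

The updated beliefs in $s$ and $\neg s$ still sum to one, so the update respects normalization.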

Seeing this coincidence prompted Cox to wonder, as many have, on the nature of probability. The predominant view of what probability is stems from the frequentist school of thought. Probabilities are frequencies. When I say that a coin has a probability of landing heads of 0.5, one usually explains this by discussing an experiment. A coin is tossed many times (or an ensemble of identical coins is tossed all at once) and the number of heads is counted and divided by the total number of throws. The resulting number is not necessarily equal to 0.5, but it’s probably close, and the frequentist will then tell you that if you just performed the experiment an infinite number of times then the frequency of heads would approach one half.
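The frequentist experiment is simple to simulate. This is a minimal sketch using Python's standard pseudo-random generator as a stand-in for a fair coin; the function name and the fixed seed are arbitrary choices made so the run is repeatable.

```python
import random


def heads_frequency(n_tosses, seed=0):
    """Toss a simulated fair coin n_tosses times and return the
    fraction of tosses that came up heads."""
    rng = random.Random(seed)
    heads = sum(rng.random() < 0.5 for _ in range(n_tosses))
    return heads / n_tosses


for n in (10, 1_000, 100_000):
    print(n, heads_frequency(n))
```

For small runs the frequency can sit noticeably away from one half, and it tends to settle closer as the number of tosses grows, which is exactly the limiting behavior the frequentist definition leans on.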

That’s cute, but then a planetologist tells you that the probability of life on Mars is 42%. Do you envision an experiment where a multitude of Universes are created and the planetologist counts the number of them that have a Mars with life in them, and then divides by the total number of Universes that were created? Let’s not forget they have to keep making Universes forever if we want to apply the frequentist definition of probability. Clearly this interpretation cannot handle such a statement.

An opposing school of thought on the matter is that of Bayes. The Bayesian interpretation of probability eschews the use of frequencies, and casts probabilities as parts of the state of knowledge. Here at last we see the connection between the epistemological theory of belief developed by Cox (read his original 1946 paper, “Probability, Frequency, and Reasonable Expectation”) and the Bayesian school of thought. Since belief obeys the same rules as probability, it is no leap to conjecture that $p=b$.

Probabilities ARE beliefs.

When the planetologist tells you that there is a 42% probability of life on Mars, they are telling you, based on their past experiences (education, research, exposure to popular culture, etc.), how strongly they believe life exists on Mars. Their background is suited for them to have a well thought out belief, and so their statement has weight. Meteorologists do the exact same thing with the weather, which is why sometimes you’ll notice that different weather sites have slightly different probabilities for future weather patterns. These differences reflect the differences in belief that the meteorologists (or, I should say, the models they’re using to analyze meteorological data) have in predicting the future based on the data that they have interacted with.

What has been done here is an exposition on the epistemic foundations of the Bayesian interpretation of probability theory. By grounding the interpretation of probability in an epistemic theory, we can now move forward with our main investigation of what information is. The tools that have been developed will help us in this journey, since now we see how an epistemic agent has an internal state that is defined by a belief function on the Realm of Discourse. This belief function creates a painting of the world which influences how the EA behaves when novel experiences are had. Beliefs can be analyzed via the rules of probability, and, in particular, how they change will lead us to an answer to the question: What is Information?

## 4 thoughts on “Part I: Belief and Probability”

1. keaswaran says:

Really good stuff. We should talk about this more. Are you ever in Boston? I’ll be there for a few days in mid March.

Most significant – look up Joe Halpern’s paper “A Counterexample to Theorems of Cox and Fine”. It turns out that Cox’s assumptions don’t entail associativity unless you assume that the set of values of b is dense in its range (and in particular, this means that there must be infinitely many atomic sentences in the language). I happen to think there are good reasons to add this assumption, but it is a further assumption that tends to get left out in many presentations (and the gap in the proof of associativity wasn’t really noticed for about 40 years).

Second, I think that the axioms you call “common sense” and “consistency” are much more substantial than you suggest. It would certainly be nice to satisfy them, but I think we need a deeper argument than that. (I think Dennis Lindley, “Scoring Rules and the Inevitability of Probability” gives the best justification, and he does so in a way that involves the relation of belief to truth, which I think is much more satisfying than just allowing belief to float entirely free of truth! I’m arguing that Lindley’s theorem is the most relevant thing here in Chapter 4 of the book I’m working on.)

Third, the formulation you give at the beginning of connectives as arbitrary functions from the language to itself doesn’t actually give you a notion of atomic sentences. Even if we restrict to the case where the language together with the connectives forms a Boolean ring, there’s in a sense a “choice of coordinates” we can make. For instance, the Boolean ring freely generated by elements A and B is also generated by either one of them together with C=((A&B)v(~A&~B)). (In this alternate set of generators, B=((A&C)v(~A&~C)).)

1. Hey Kenny, thanks for the read!
I was just in Boston for a Faerie Gathering this past week, and I do try to get out there on occasion. We should hang out and catch up.

I’ve breezed through Halpern’s paper in the past, but haven’t taken the time to fully digest his counterexample. I’ve always implicitly assumed that the set of atomic sentences is not finite, so I’m perfectly ok with the range of belief being dense.

As much as I would like to take the credit for calling those two axioms “common sense” and “consistency”, I can’t. I’m content with justifying the relationships f and g from a pragmatic point of view, though I can see why alternate approaches using value scores related to truth would give the feeling of a deeper understanding. I’d love to see a synthesis of epistemology and ontology, and feel like this may be an excellent place to begin. Physics aside, I’m just happy I’ve discovered this bedrock that people have been working on for years.

Gahhh! I was hoping most would gloss over the sentence where I imply the existence of atomic sentences. I wrote the post in a way that implied a unique set, $S_0$, but specifically didn’t say that they were unique. Once again, I’m fine with language being flexible enough to allow for a “change of basis” in what one means by the atomic sentences. In the end it is the entire RoD that matters, and not the set that generates it. As long as the belief function is consistent over the logical connectives, the particular choice of generators shouldn’t affect the state of an EA. Though that does make me wonder what to do with the extra “coordinate invariance” freedom floating around?

Hit me up on FB and I’ll send you my number (same one from Berkeley if you still have it), and we can chat more.