Recently I came across an article on Blue Nation Review claiming that Hillary Clinton is the most truthful of the candidates, based on statistics from Politifact’s Truth-O-Meter. Sure, she has the largest percentage of True statements, but a measure of truthfulness should draw on the full distribution of statements that a candidate has made, not just a particular subset. Clinton has also spoken partial truths, half truths, and falsehoods, and has racked up two cases of “Liar, liar, pants on fire!” What other information is lurking in the full distributions that is lost in BNR’s simplistic analysis? Being enamored of information theory and epistemology, I decided to look a little more closely at the Truth-O-Meters of Clinton, Cruz, Kasich, Sanders, and Trump.

To understand this analysis, imagine an idealized thinking being, an epistemic agent (EA), which is initially completely ignorant of the truthfulness of the candidates: the EA has no preference for believing or disbelieving what a candidate says. The EA is then exposed to statements by the candidates, and uses its fact-checking facility (the journalists at Politifact) to inform it of the truthfulness of those statements. As it learns more, its preferences shift. Analyzing more statements moves the EA from its tabula rasa starting state to a biased one where there is a preference to believe or not believe a candidate. The more statements the EA hears, the less the belief function changes, so that after enough statements the EA has a pretty definite belief in the truthfulness of the candidates. We begin, then, with an ignorant EA, which opens up the Politifact site and begins reading the analyzed statements made by the candidates in the 2016 primary election.

Number of fact checks that have been done on each candidate by Politifact’s Truth-O-Meter Journalists.

The number of statements analyzed for each candidate varies (above). Kasich and Sanders have been the least scrutinized (61 and 75 statements, respectively), while Clinton has been the most scrutinized (174 statements) by the site. Because of the differing amounts of data going into informing the EA, its belief in Clinton’s truthfulness will be the least malleable, while its belief in Kasich’s will be the most malleable. A new datum from Clinton will not sway what the EA thinks of her as much as a new datum from Kasich or Sanders would sway what it thinks of them. Cruz and Trump are in between these extremes. We call the EA’s initial beliefs about the candidates priors, and the beliefs after assimilating all the data on the candidate Truth-O-Meters the posteriors. You can read more about how epistemic states of knowledge change because of experiences here and here.
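To make the "malleability" point concrete, here is a minimal sketch of one way to model the update. It assumes the EA keeps a symmetric Dirichlet prior over the six Truth-O-Meter ratings; the post itself only says the prior is uninformative, so the flat prior and the function names are my assumptions. Only the totals (61 and 174) come from the numbers above, and the per-rating splits are invented placeholders.

```python
def posterior_mean(counts, prior=1.0):
    """Posterior mean of each rating's probability under a symmetric Dirichlet prior.

    `prior` is the pseudo-count per rating; 1.0 is a flat prior (an assumption
    of this sketch, not something stated in the post).
    """
    total = sum(counts) + prior * len(counts)
    return [(c + prior) / total for c in counts]


def shift_from_one_more(counts, category, prior=1.0):
    """Change in the belief for `category` after one more statement lands there."""
    before = posterior_mean(counts, prior)[category]
    updated = list(counts)
    updated[category] += 1
    after = posterior_mean(updated, prior)[category]
    return after - before


# Placeholder rating counts, ordered "Pants on Fire" ... "True".
# Only the totals (61 and 174) come from the post; the splits are invented.
few_checks = [2, 8, 10, 12, 15, 14]     # 61 statements
many_checks = [2, 20, 26, 40, 46, 40]   # 174 statements

# One extra "True" statement moves the lightly checked candidate's belief more.
print(shift_from_one_more(few_checks, 5))
print(shift_from_one_more(many_checks, 5))
```

The first printed shift is larger than the second, which is the sense in which a belief built on fewer fact checks is more malleable.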

The posterior belief coming from an uninformative prior matches the normalized frequencies. Truth-O-Meter uses a six-point scale, but in common speech truthfulness is a continuous variable, because it depends not just on facts but also on context, intensity of language, etc. The closer to 0, the more false a statement is; the closer to 1, the more true it is (recall that these terms are always with respect to the fact-checking facility of the EA). Interpolation to a continuous belief distribution was accomplished by using a cubic Hermite spline, as described here. This choice does an excellent job of preserving the shape of the discrete frequency distribution. Below we compare the discrete frequencies with the credence/belief distributions of the EA. The area under a curve over a given range of truthfulness represents the probability that a statement falls in that range; the total area under any of these curves is unity.
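The interpolation step can be sketched in a few lines of plain Python. This is an illustration rather than the exact procedure used for the plots: it assumes the six ratings sit at evenly spaced knots on [0, 1], and it picks finite-difference tangents, which is only one of several possible choices for a cubic Hermite spline. The function name and the example counts are hypothetical.

```python
def hermite_density(freqs, steps=2000):
    """Return a density p(t) on [0, 1] that interpolates `freqs` at even knots."""
    n = len(freqs)
    xs = [i / (n - 1) for i in range(n)]
    h = xs[1] - xs[0]

    # Tangents from finite differences: central inside, one-sided at the ends.
    # ASSUMPTION: the post does not say which tangent rule was used.
    slopes = []
    for i in range(n):
        lo, hi = max(i - 1, 0), min(i + 1, n - 1)
        slopes.append((freqs[hi] - freqs[lo]) / (xs[hi] - xs[lo]))

    def raw(t):
        i = min(int(t / h), n - 2)           # index of the interval holding t
        s = (t - xs[i]) / h                  # local coordinate in [0, 1]
        h00 = 2 * s**3 - 3 * s**2 + 1        # cubic Hermite basis functions
        h10 = s**3 - 2 * s**2 + s
        h01 = -2 * s**3 + 3 * s**2
        h11 = s**3 - s**2
        return (h00 * freqs[i] + h10 * h * slopes[i]
                + h01 * freqs[i + 1] + h11 * h * slopes[i + 1])

    # Normalize with a midpoint rule so the total area under the curve is 1.
    area = sum(raw((k + 0.5) / steps) for k in range(steps)) / steps
    return lambda t: raw(t) / area


# Invented counts, ordered "Pants on Fire" (t = 0) ... "True" (t = 1).
p = hermite_density([2, 8, 10, 12, 15, 14])
print(p(0.0), p(0.5), p(1.0))
```

Finite-difference tangents can overshoot and even dip below zero for spiky data; a monotone variant of the Hermite spline would avoid that at the cost of a little more code.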

The frequency vs. truthfulness of the repertoire of statements said by each candidate. Truthfulness of 0 corresponds to Truth-O-Meter’s “Liar, Liar, Pants on Fire” category, and 1 is “True.”
The continuous credence (belief) function of an epistemic agent for each of the candidates. These are interpolated from the frequencies above by using a cubic spline.

These curves are interesting because they show what the beliefs of an epistemic agent look like under the assumption that it is judging the candidates solely on what they have said, and NOTHING else. Of course real people are not EAs, and past experiences end up coloring the judgements we make concerning new experiences. Nonetheless, an EA’s credence function might be a good indicator of what someone who is not very informed about the history of the candidates ends up thinking about them after reading through all the statements analyzed by Politifact.

The cumulative distribution functions (cdf) derived from the credence functions for each candidate. The shaded region is to emphasize where the median lies for each of the candidates.
Median truthfulness of each candidate. Clearly the candidates break up into two groups based on this measure.

What value of truthfulness does an EA associate with half the things that a candidate says? The answer is found by looking at the cumulative distribution functions (above). A CDF tells us what fraction of the statements that come out of a candidate’s mouth fall below some value of truthfulness. The median is the truthfulness at which the EA believes that half of a candidate’s statements are more truthful and half are less truthful. We observe that the candidates cluster into groups: Clinton, Sanders, and Kasich lie in one, while Cruz and Trump occupy others.
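Reading the median off a CDF can be sketched as follows. The function name is hypothetical, and the test density p(t) = 2t is just a toy with a known median of 1/sqrt(2); it stands in for a candidate's credence function and is not real data.

```python
def median_of_density(p, steps=10000):
    """Truthfulness t at which the cumulative area under p first reaches 0.5."""
    dt = 1.0 / steps
    area = 0.0
    for k in range(steps):
        piece = p((k + 0.5) * dt) * dt       # midpoint-rule area of this cell
        if area + piece >= 0.5:
            # Interpolate linearly inside the cell that crosses one half.
            return (k + (0.5 - area) / piece) * dt
        area += piece
    return 1.0                               # only reached if p is not normalized


# Toy density on [0, 1] with CDF t**2, so the median is 0.5 ** 0.5.
print(median_of_density(lambda t: 2.0 * t))
```

The accumulated `area` is the CDF evaluated on a grid, so the same loop could also return the whole curve if one wanted to plot it.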

Thus far, it is safe to say that both Trump and Cruz do not inspire much faith in the epistemic agent. The other three candidates all seem to be on equal footing, which means we need a better way to distinguish what the EA actually believes about them.

A natural next step would be to look at the moments of the credence distributions; however, given the finite domain and the presence of significant tails, the standard interpretations of deviation, skewness, kurtosis, etc. would fail. Fortunately, rather than applying these ad hoc statistical measures, we can get a good grasp of the shape of the distributions by examining their Shannon entropy, which for a credence density p(t) on [0, 1] is

H[p] = -\int_0^1 p(t) \log p(t) \, dt

Shannon entropy is an informational measure that quantifies the amount of uncertainty in a distribution. Credence distributions with low entropy contain a lot of information, meaning that an EA is less uncertain about the truthfulness of future statements. Our EA started off with beliefs that had maximal entropy: those based on ignorance. By interacting with the candidates, the EA has acquired information that has biased its beliefs so that it can make more informed judgements. Another way to say this is that the negentropy (the difference between maximal and actual entropy) describes how an EA views the consistency of a candidate with regard to truthfulness.
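A rough numerical sketch of the negentropy of a credence density, assuming the continuous (differential) form of Shannon entropy on [0, 1]. On that interval the uniform density has the largest entropy, namely 0, so the negentropy is just minus the entropy. The function names and the toy density are mine.

```python
import math


def differential_entropy(p, steps=10000):
    """Midpoint-rule estimate of -integral of p(t) * log p(t) over [0, 1]."""
    dt = 1.0 / steps
    total = 0.0
    for k in range(steps):
        value = p((k + 0.5) * dt)
        if value > 0.0:                      # p * log p tends to 0 as p -> 0
            total -= value * math.log(value) * dt
    return total


def negentropy(p, steps=10000):
    """Maximal entropy on [0, 1] (uniform density, entropy 0) minus actual entropy."""
    return 0.0 - differential_entropy(p, steps)


print(negentropy(lambda t: 1.0))       # ignorant EA: no information gained
print(negentropy(lambda t: 2.0 * t))   # toy skewed belief: positive negentropy
```

For the toy density p(t) = 2t the exact value is log 2 - 1/2, which gives a handy check on the numerical integration.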

There is a problem with this line of thinking, however: it only applies to median truthfulness values that are close to a half. The reason is that in order to get a high (or low) median truthfulness, the distribution must necessarily be skewed and hence have a lower entropy. One can estimate the maximum possible entropy of a distribution with median truthfulness \tau by considering a piecewise constant distribution. The resulting minimal negentropy can then be written in closed analytical form: -\log\left(2\sqrt{\tau(1-\tau)}\right). The difference between the actual negentropy and this minimal value then gives a good measure of the consistency of a candidate in the eyes of the EA.
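For the curious, here is one way to arrive at that closed form. It is a sketch that assumes the piecewise constant distribution puts half of its probability uniformly on each side of the median \tau, and that entropy is the continuous (differential) Shannon entropy on [0, 1], whose maximum is 0 for the uniform density. The symbols p_\tau, H_max, and J_min are my own labels.

```latex
p_\tau(t) =
\begin{cases}
  \dfrac{1}{2\tau},      & 0 \le t < \tau, \\[6pt]
  \dfrac{1}{2(1-\tau)},  & \tau \le t \le 1,
\end{cases}
\qquad
\begin{aligned}
H[p_\tau] &= -\int_0^1 p_\tau(t)\,\log p_\tau(t)\,dt \\
          &= \tfrac{1}{2}\log(2\tau) + \tfrac{1}{2}\log\bigl(2(1-\tau)\bigr) \\
          &= \log\!\left(2\sqrt{\tau(1-\tau)}\right), \\[4pt]
J_{\min}(\tau) &= H_{\max} - H[p_\tau]
               = -\log\!\left(2\sqrt{\tau(1-\tau)}\right) \;\ge\; 0 .
\end{aligned}
```

Equality holds only at \tau = 1/2, where p_\tau is the uniform density; the further the median sits from one half, the more negentropy a distribution carries simply by virtue of being skewed.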

We plot truthfulness (as measured by the median) against consistency (as measured by admissible negentropy), and see more of the differences between how an EA reasons about the candidates. Kasich falls out of the company of Clinton and Sanders; the EA thinks he’s nearly three times less consistent with his statements than Clinton. Cruz’s consistency is on par with that of both Sanders and Clinton, while Trump beats everyone by a yuge margin in terms of how consistent he is about not being truthful. Sanders and Clinton appear grouped together, but the axes are a bit misleading: Sanders leads Clinton in truthfulness by about 3%, and in consistency by about 18%.

Consistency, as measured by entropic deviation from maximum, versus Truthfulness, as measured by the median of the continuum credence functions. Clustering is evident.

Using information-theoretic tools, one can examine how completely rational epistemic agents construct their beliefs out of experiences. In this toy study we saw that the technique can be used to analyze how journalistically derived data on the factuality of primary candidates’ statements during a campaign can shed light on the truthfulness and consistency of those candidates. It also shows us that running with simple ad hoc statistics (cough cough) can lead to results that no longer hold up in a more detailed analysis. Speaking of more detail… there are a million ways (information geometry of the candidate public perception manifold, time-dependent Bayesian updating of EA states, testing the effects of nonuniform priors due to coupling between socioeconomic variables, etc.) to make this analysis more complete, but there are only so many hours in a day. If you have any questions about my methods, please feel free to leave a comment.

