Deduction, Induction, and Abduction, Oh My!

A Tour of Needless Philosophical Terms

Mon 10 August 2015

As a scientist, I don't typically hold philosophy in that high regard. It has its uses, but can easily devolve into a word game with no real substance. My basic perspective is that philosophy is "science without data". However, in my reading and listening to debates in religious thought I come across philosophers often, and have to parse their arguments. An interesting collection of these arguments occurs in the Giunta-Dillahunty debate and subsequent appearance of these two on the Dogma Debate show.

During the Dogma Debate episode, the term "abduction" was bandied about in comparing methods of inference, including deduction and induction. I was not familiar with the term "abduction", and had to look it up. After much reading, it seems that the term "abductive reasoning" is basically needless - there is only one method of inference, and these other terms are at best subsets. First, we explore the philosophers' definitions of these terms, and then look at the proper way to describe them. ;)

Definitions of Terms

I'm getting the material here from the Stanford Encyclopedia of Philosophy article on Abduction.

Deduction

In deductive inferences, what is inferred is necessarily true if the premises from which it is inferred are true; that is, the truth of the premises guarantees the truth of the conclusion.

A familiar type of example is inferences instantiating the schema

All As are Bs. a is an A. Hence, a is a B.

or, in other words

All men are mortal
Socrates is a man
Therefore, Socrates is mortal.

Induction

Inductive inferences form a somewhat heterogeneous class, but for present purposes they may be characterized as those inferences that are based purely on statistical data, such as observed frequencies of occurrences of a particular feature in a given population. An example of such an inference would be this:

96 per cent of the Flemish college students speak both Dutch and French.

Louise is a Flemish college student.

Hence, Louise speaks both Dutch and French.

However, the relevant statistical information may also be more vaguely given, as in the premise, “Most people living in Chelsea are rich.” (There is much discussion about whether the conclusion of an inductive argument can be stated in purely qualitative terms or whether it should be a quantitative one — for instance, that it holds with a probability of .96 that Louise speaks both Dutch and French — or whether it can sometimes be stated in qualitative terms — for instance, if the probability that it is true is high enough — and sometimes not.)

The mere fact that an inference is based on statistical data is not enough to classify it as an inductive one. You may have observed many gray elephants and no non-gray ones, and infer from this that all elephants are gray, because that would provide the best explanation for why you have observed so many gray elephants and no non-gray ones. This would be an instance of an abductive inference.

That gets us to the final term, abduction.

Abduction

Abduction or, as it is also often called, Inference to the Best Explanation is a type of inference that assigns special status to explanatory considerations. Most philosophers agree that this type of inference is frequently employed, in some form or other, both in everyday and in scientific reasoning. [....] It suggests that the best way to distinguish between induction and abduction is this: both are ampliative, meaning that the conclusion goes beyond what is (logically) contained in the premises (which is why they are non-necessary inferences), but in abduction there is an implicit or explicit appeal to explanatory considerations, whereas in induction there is not; in induction, there is only an appeal to observed frequencies or statistics. (I emphasize “only,” because in abduction there may also be an appeal to frequencies or statistics, as the example about the elephants exhibits.)

A couple of examples of abductive reasoning

You happen to know that Tim and Harry have recently had a terrible row that ended their friendship. Now someone tells you that she just saw Tim and Harry jogging together. The best explanation for this that you can think of is that they made up. You conclude that they are friends again.

and another

One morning you enter the kitchen to find a plate and cup on the table, with breadcrumbs and a pat of butter on it, and surrounded by a jar of jam, a pack of sugar, and an empty carton of milk. You conclude that one of your house-mates got up at night to make him- or herself a midnight snack and was too tired to clear the table. This, you think, best explains the scene you are facing. To be sure, it might be that someone burgled the house and took the time to have a bite while on the job, or a house-mate might have arranged the things on the table without having a midnight snack but just to make you believe that someone had a midnight snack. But these hypotheses strike you as providing much more contrived explanations of the data than the one you infer to.

[...]

In these examples, the conclusions do not follow logically from the premises. For instance, it does not follow logically that Tim and Harry are friends again from the premises that they had a terrible row which ended their friendship and that they have just been seen jogging together; it does not even follow, we may suppose, from all the information you have about Tim and Harry. Nor do you have any useful statistical data about friendships, terrible rows, and joggers that might warrant an inference from the information that you have about Tim and Harry to the conclusion that they are friends again, or even to the conclusion that, probably (or with a certain probability), they are friends again. What leads you to the conclusion, and what according to a considerable number of philosophers may also warrant this conclusion, is precisely the fact that Tim and Harry's being friends again would, if true, best explain the fact that they have just been seen jogging together.

Explanatory Virtues

The article continues,

In textbooks on epistemology or the philosophy of science, one often encounters something like the following as a formulation of abduction:

ABD1 Given evidence $E$ and candidate explanations $H_1$, $\ldots$, $H_n$ of $E$, infer the truth of that $H_i$ which best explains $E$.

An observation that is frequently made about this rule, and that points to a potential problem for it, is that it presupposes the notions of candidate explanation and best explanation, neither of which has a straightforward interpretation. While some still hope that the former can be spelled out in purely logical, or at least purely formal, terms, it is often said that the latter must appeal to the so-called theoretical virtues, like simplicity, generality, and coherence with well-established theories; the best explanation would then be the hypothesis which, on balance, does best with respect to these virtues.

The article doesn't define these "virtues", and googling around isn't much clearer. A list from Richard Carrier's book Proving History is:

Plausibility - "conform to the expectations set by our background knowledge"
Ad Hocness or Simplicity - "it must include fewer new suppositions"
Explanatory power - "it must make the observation statements it implies more probable than any other."
Explanatory fitness - "must not contradict any evidence or well-established beliefs"
Explanatory scope - "it must imply a greater variety of observation statements"

Where does this all fit together?

E. T. Jaynes, in his great book on probability, starts with this example:

Suppose some dark night a policeman walks down a street, apparently deserted; but suddenly he hears a burglar alarm, looks across the street, and sees a jewelry store with a broken window. Then a gentleman wearing a mask comes crawling out through the broken window, carrying a bag which turns out to be full of expensive jewelry. The policeman doesn't hesitate at all in deciding that this gentleman is dishonest. But by what reasoning process does he arrive at this conclusion?

A moment's thought makes it clear that our policeman's conclusion was not a logical deduction from the evidence; for there may have been a perfectly innocent explanation for everything. It might be, for example, that this gentleman was the owner of the jewelry store and he was coming home from a masquerade party, and didn’t have the key with him. But just as he walked by his store a passing truck threw a stone through the window; and he was only protecting his own property. Now while the policeman's reasoning process was not logical deduction, we will grant that it had a certain degree of validity. The evidence did not make the gentleman's dishonesty certain, but it did make it extremely plausible. This is an example of a kind of reasoning in which we have all become more or less proficient, necessarily, long before studying mathematical theories. We are hardly able to get through one waking hour without facing some situation (e.g. will it rain or won't it?) where we do not have enough information to permit deductive reasoning; but still we must decide immediately what to do.

This sounds a lot like the scenarios outlined in the Stanford Encyclopedia of Philosophy article on Abduction, yet the word abduction is not found in any of the pages of Jaynes. Deduction and induction are both mentioned, yet abduction is not - why? It's because abduction is just a special case of induction, and not a very interesting one at that. Deduction is also a special case of induction, but is worthy of the distinction.

Induction vs Deduction

The process of induction, or inference in general, is simply the application of the rules of probability. These are derived from a set of axioms,

Degrees of Plausibility are represented by real numbers.
Qualitative Correspondence with common sense. (aka consistency with deductive logic)
Consistency
1. If a conclusion can be reasoned out in more than one way, then every possible way must lead to the same result.
2. One always takes into account all of the evidence one has relevant to a question. One does not arbitrarily ignore some of the information, basing one's conclusions only on what remains. In other words, one should be completely non-ideological.
3. One always represents equivalent states of knowledge by equivalent plausibility assignments. That is, if in two problems one's state of knowledge is the same (except perhaps for the labeling of the propositions), then one must assign the same plausibilities in both.

Deduction is simply this process applied exclusively to true/false propositions - essentially degrees of plausibility of 0 or 1. In this way, the total certainty of mathematical and philosophical proofs is a special case of the general inductive framework known as probability theory.
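To make that concrete, here is a small worked sketch of my own (it is not taken from Jaynes or Loredo). Let $A$ be the premise about the individual ("Socrates is a man", or "Louise is a Flemish college student") and $B$ the conclusion. Splitting $B$ over $A$ and not-$A$ using the rules of probability listed in the next section, and taking the premise as given so that $P(A)=1$ and $P({\rm\bf not}\, A)=0$:

\begin{eqnarray} P(B) &=& P(B|A)P(A) + P(B|{\rm\bf not}\, A)P({\rm\bf not}\, A) \\ &=& P(B|A)\times 1 + 0 \\ &=& P(B|A) \end{eqnarray}

With $P(B|A)=1$ ("all men are mortal") this is the Socrates syllogism and the conclusion is certain. With $P(B|A)=0.96$ it is the Louise example and the conclusion holds with probability 0.96. It is the same calculation with a different input.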

The Rules of Probability

Mathematically, the axioms above lead directly to a few simple quantitative rules. A very nice derivation is in Tom Loredo's article From Laplace to Supernova SN 1987A: Bayesian Inference in Astrophysics.

The Definition Rule

$P(A)$ is a number between 0 and 1, representing the strength of belief in a statement, $A$.

The Negation Rule

\begin{eqnarray} P(A) + P({\rm\bf not}\, A) = 1 \end{eqnarray}

The Sum Rule

\begin{eqnarray} P(A \,{\rm\bf or}\, B) = P(A) + P(B) - P(A \,{\rm\bf and}\, B) \end{eqnarray}

The Conjunction Rule

\begin{eqnarray} P(A \,{\rm\bf and}\, B) = P(B|A)P(A) \end{eqnarray}

Bayes Rule (derived from the Conjunction Rule)

\begin{eqnarray} P(A|B) = \frac{P(B|A)P(A)}{P(B)} \end{eqnarray}

These are then applied to all cases.
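As a quick sanity check, the rules can be verified numerically on a toy example. This is only a sketch: the joint distribution below is made up for illustration, and the helper names (`prob`, `close`) are mine. Note that the conditional probabilities are computed as ratios, so the Conjunction Rule holds by construction here; the point is just to see all of the rules agree on one set of numbers.

```python
# Hypothetical joint distribution over two statements A and B.
# The four numbers are invented for illustration and sum to 1.
joint = {
    (True, True): 0.30,
    (True, False): 0.20,
    (False, True): 0.10,
    (False, False): 0.40,
}

def prob(event):
    """Probability that event(a, b) is true under the joint distribution."""
    return sum(p for (a, b), p in joint.items() if event(a, b))

def close(x, y, tol=1e-12):
    """Compare two floats up to rounding error."""
    return abs(x - y) < tol

p_a = prob(lambda a, b: a)
p_b = prob(lambda a, b: b)
p_not_a = prob(lambda a, b: not a)
p_a_and_b = prob(lambda a, b: a and b)
p_a_or_b = prob(lambda a, b: a or b)

# Conditional probabilities, defined as ratios of the joint to the marginal.
p_b_given_a = p_a_and_b / p_a
p_a_given_b = p_a_and_b / p_b

negation_ok = close(p_a + p_not_a, 1.0)
sum_ok = close(p_a_or_b, p_a + p_b - p_a_and_b)
conjunction_ok = close(p_a_and_b, p_b_given_a * p_a)
bayes_ok = close(p_a_given_b, p_b_given_a * p_a / p_b)

print(negation_ok, sum_ok, conjunction_ok, bayes_ok)
```

Any other set of four non-negative numbers summing to 1 should pass the same checks, as long as $P(A)$ and $P(B)$ are both non-zero.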

Abduction mapped to Probability

The basic idea of abduction is to find the "best explanation" given the data we have. This reads like the following,

\begin{eqnarray} P(\mbox{explanation 1}|\mbox{data}) \\ P(\mbox{explanation 2}|\mbox{data}) \\ P(\mbox{explanation 3}|\mbox{data}) \\ \vdots \end{eqnarray}

or with shorter notation

\begin{eqnarray} P(E_1|\mbox{data}) \\ P(E_2|\mbox{data}) \\ P(E_3|\mbox{data}) \\ \vdots \end{eqnarray}

Then we find the one with the largest probability. Setting one of these terms up with Bayes Rule, we can see where the explanatory virtues come in.

\begin{eqnarray} P(\mbox{explanation 1}|\mbox{data})=\frac{P(\mbox{data}|\mbox{explanation 1})\times P(\mbox{explanation 1})}{P(\mbox{data})} \end{eqnarray}

where the denominator is a sum of terms over all of the candidate explanations, including this one. Written out with the shorter notation for convenience,

\begin{eqnarray} P(E_1|\mbox{data})=\frac{P(\mbox{data}|E_1)\times P(E_1)}{P(\mbox{data}|E_1)\times P(E_1) + P(\mbox{data}|E_2)\times P(E_2)+\cdots} \end{eqnarray}

For any of the explanations, we have the following

Plausibility - "conform to the expectations set by our background knowledge" - high prior probability, $P(\mbox{explanation 1})$, in the numerator.
Ad Hocness or Simplicity - "it must include fewer new suppositions" (see below on simplicity)
Explanatory power - "it must make the observation statements it implies more probable than any other." - high likelihood, $P(\mbox{data}|\mbox{explanation 1})$, especially compared to others. This appears both in the numerator and denominator of Bayes Rule.
Explanatory fitness - "must not contradict any evidence or well-established beliefs" - must not have a low likelihood, $P(\mbox{data}|\mbox{explanation 1})$, in the numerator (and one term in the denominator).
Explanatory scope - "it must imply a greater variety of observation statements" - again, high likelihood in the numerator (and one term in the denominator), $P(\mbox{data}|\mbox{explanation 1})$ - it is higher for more data explained with the same idea.
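To see the mapping at work, here is a minimal sketch that applies Bayes Rule to the kitchen scene from the Stanford article. The priors and likelihoods are numbers I made up purely for illustration, and the sketch assumes the three explanations are mutually exclusive and cover all the possibilities (so the priors sum to 1). The function name `posterior` is my own.

```python
def posterior(priors, likelihoods):
    """Return P(explanation | data) for each explanation via Bayes Rule.

    priors[name]      = P(explanation)         (the "plausibility")
    likelihoods[name] = P(data | explanation)  (the "explanatory power")
    """
    # Denominator: P(data), a sum over every candidate explanation.
    p_data = sum(likelihoods[name] * priors[name] for name in priors)
    return {name: likelihoods[name] * priors[name] / p_data for name in priors}

# Invented numbers for the kitchen scene; only their rough ordering matters.
priors = {
    "housemate snack": 0.90,
    "hungry burglar": 0.01,
    "staged scene": 0.09,
}
likelihoods = {
    "housemate snack": 0.5,
    "hungry burglar": 0.5,
    "staged scene": 0.9,  # a staged scene would look exactly like this
}

post = posterior(priors, likelihoods)
best = max(post, key=post.get)

for name, p in post.items():
    print(f"{name}: {p:.3f}")
print("best explanation:", best)
```

With these numbers the staged scene has the highest likelihood, yet the housemate's snack still comes out as the best explanation because of its much higher prior. No separate list of virtues was needed to get there, just the two terms in the numerator and the normalization.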

What is up with simplicity?

Perhaps the concept in the process of inference which is most often mistaken, especially with apologetic arguments, is the notion of simplicity. In the debate, Blake Giunta argues that God has very few properties, and is thus simple. Simple typically means higher prior probability. However, it isn't the number of components that is important, but their flexibility. One can explain anything by saying "magic did it" using only one component (magic). This is simple in the sense of number of components, but not simple in the probabilistic sense. Why is that? It's because "magic" has a high flexibility. For a process, this can take the form of different mechanisms. For a parameter, it can take the form of different numerical values. In Loredo's article, it is phrased as

Crudely speaking a complicated model can explain anything; thus, its prior predictive probability for any particular outcome is small, because the predictive probability is spread out more or less evenly among the many possible outcomes. But a simpler model is more constrained and limited in its ability to explain or fit data. As a result, its predictive distribution is concentrated on a subset of the possible outcomes.

In Bayesian probability analysis, there is an automatic penalty for models with larger parameter ranges compared to models with narrow ranges (see MacKay here for one treatment). Blake Giunta's "simple" definition of God is anything but - it is so ill-specified as to be consistent with nearly any result in the world. The "fewer new suppositions" definition of Ad Hocness may work, but it doesn't work for the virtue of Simplicity. Bayesian probability analysis captures both.
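A toy calculation shows how this penalty arises. This is a sketch under assumptions of my own: ten possible outcomes, a "flexible" model (think "magic did it") that is equally compatible with all of them, and a "constrained" model that stakes most of its probability on a single outcome. All of the numbers and names are hypothetical.

```python
# Ten possible outcomes of some observation, labelled 0..9 (hypothetical setup).
n_outcomes = 10

# Flexible model: one component, but compatible with every outcome,
# so its predictive probability is spread evenly.
flexible = [1.0 / n_outcomes] * n_outcomes

# Constrained model: commits most of its probability to outcome 3.
constrained = [0.01] * n_outcomes
constrained[3] = 0.91  # 9 * 0.01 + 0.91 = 1.0

def compare(observed, prior_flexible=0.5):
    """Return (P(flexible | observed), P(constrained | observed))."""
    prior_constrained = 1.0 - prior_flexible
    joint_flexible = flexible[observed] * prior_flexible
    joint_constrained = constrained[observed] * prior_constrained
    evidence = joint_flexible + joint_constrained  # P(observed)
    return joint_flexible / evidence, joint_constrained / evidence

# The constrained model's prediction comes true.
post_flexible, post_constrained = compare(observed=3)
print(f"observed 3: flexible {post_flexible:.3f}, constrained {post_constrained:.3f}")

# The constrained model's prediction fails.
post_flexible_miss, post_constrained_miss = compare(observed=7)
print(f"observed 7: flexible {post_flexible_miss:.3f}, constrained {post_constrained_miss:.3f}")
```

Even with equal priors, the constrained model wins decisively when its prediction comes true, and loses decisively when it does not. The flexible model can never lose badly, but it can never win either, which is exactly the price of being able to explain anything.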

Final Thoughts

As far as I can tell, the term abduction is an unnecessary term. It seems to me to be just a renaming of induction and doesn't add anything useful to it. The explanatory virtues are themselves somewhat ad hoc and not properly specified, and where they are specified, they are a trivial consequence of the well-specified process of induction. I had never heard of abduction, except in the realm of Christian apologetics, which makes me wonder what purpose it serves there that properly doing induction doesn't. Perhaps it lets one distance oneself from the extremely low prior probability of the claims made, and focus on fancy sounding terms like "explanatory power" and "ad hocness".