Rock It 《ML》: Causality Is Hard to Discern!!

Since people already know from "linear time-invariant (LTI) system theory" that "causality" is by no means an empty notion!

※ Further reading:

Causality

Causality (also referred to as causation,[1] or cause and effect) is what connects one process (the cause) with another process or state (the effect),[citation needed] where the first is partly responsible for the second, and the second is partly dependent on the first. In general, a process has many causes,[2] which are said to be causal factors for it, and all lie in its past. An effect can in turn be a cause of, or causal factor for, many other effects, which all lie in its future. Causality is metaphysically prior to notions of time and space.[3][4]

Causality is an abstraction that indicates how the world progresses,[citation needed] so basic a concept that it is more apt as an explanation of other concepts of progression than as something to be explained by others more basic. The concept is like those of agency and efficacy. For this reason, a leap of intuition may be needed to grasp it.[5] Accordingly, causality is implicit in the logic and structure of ordinary language.[6]

In the English language, as distinct from Aristotle’s own language, Aristotelian philosophy uses the word “cause” to mean “explanation” or “answer to a why question”, including Aristotle’s material, formal, efficient, and final “causes”; then the “cause” is the explanans for the explanandum. In this case, failure to recognize that different kinds of “cause” are being considered can lead to futile debate. Of Aristotle’s four explanatory modes, the one nearest to the concerns of the present article is the “efficient” one.

The topic of causality remains a staple in contemporary philosophy.

Theories

Counterfactual theories

Subjunctive conditionals are familiar from ordinary language. They are of the form, if A were the case, then B would be the case, or if A had been the case, then B would have been the case. Counterfactual conditionals are specifically subjunctive conditionals whose antecedents are in fact false, hence the name. However, the term as used technically may apply to conditionals with true antecedents as well.

Psychological research shows that people’s thoughts about the causal relationships between events influence their judgments of the plausibility of counterfactual alternatives, and conversely, their counterfactual thinking about how a situation could have turned out differently changes their judgments of the causal role of events and agents. Nonetheless, their identification of the cause of an event, and their counterfactual thought about how the event could have turned out differently do not always coincide.[19] People distinguish between various sorts of causes, e.g., strong and weak causes.[20] Research in the psychology of reasoning shows that people make different sorts of inferences from different sorts of causes, as found in the fields of cognitive linguistics[21] and accident analysis[22][23] for example.

In the philosophical literature, the suggestion that causation is to be defined in terms of a counterfactual relation is made by the 18th-century Scottish philosopher David Hume. Hume remarks that we may define the relation of cause and effect such that “where, if the first object had not been, the second never had existed.”[24]

A more full-fledged analysis of causation in terms of counterfactual conditionals came only in the 20th century, after the development of the possible world semantics for the evaluation of counterfactual conditionals. In his 1973 paper “Causation,” David Lewis proposed the following definition of the notion of causal dependence:[25]

An event E causally depends on C if, and only if, (i) if C had occurred, then E would have occurred, and (ii) if C had not occurred, then E would not have occurred.

Causation is then defined as a chain of causal dependence. That is, C causes E if and only if there exists a sequence of events C, D1, D2, … Dk, E such that each event in the sequence depends on the previous.
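Lewis's chain condition amounts to reachability through the dependence relation, which a short sketch can make concrete. The events and the dependence relation below are invented purely for illustration:

```python
# Lewis-style causation as a chain of causal dependence: C causes E iff
# there is a chain C -> D1 -> ... -> Dk -> E in the dependence relation.
depends_on = {        # maps an event to the events it causally depends on
    "E": {"D2"},
    "D2": {"D1"},
    "D1": {"C"},
    "D0": set(),
}

def causes(c, e, dep):
    """True iff a chain of causal dependence links c to e."""
    frontier, seen = {e}, set()
    while frontier:
        ev = frontier.pop()
        if ev == c:
            return True
        seen.add(ev)
        frontier |= dep.get(ev, set()) - seen
    return False

print(causes("C", "E", depends_on))   # True: C -> D1 -> D2 -> E
print(causes("D0", "E", depends_on))  # False: no chain of dependence
```

Note that defining causation as the transitive closure of causal dependence is exactly what makes Lewis's account handle chains of intermediate events.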

Note that the analysis does not purport to explain how we make causal judgements or how we reason about causation, but rather to give a metaphysical account of what it is for there to be a causal relation between some pair of events. If correct, the analysis has the power to explain certain features of causation. Knowing that causation is a matter of counterfactual dependence, we may reflect on the nature of counterfactual dependence to account for the nature of causation. For example, in his paper “Counterfactual Dependence and Time’s Arrow,” Lewis sought to account for the time-directedness of counterfactual dependence in terms of the semantics of the counterfactual conditional.[26] If correct, this theory can serve to explain a fundamental part of our experience, which is that we can only causally affect the future but not the past.

Probabilistic causation

Interpreting causation as a deterministic relation means that if A causes B, then A must always be followed by B. In this sense, war does not cause deaths, nor does smoking cause cancer or emphysema. As a result, many turn to a notion of probabilistic causation. Informally, A (“The person is a smoker”) probabilistically causes B (“The person has now or will have cancer at some time in the future”) if the information that A occurred increases the likelihood of B’s occurrence. Formally, P{B|A} ≥ P{B}, where P{B|A} is the conditional probability that B will occur given the information that A occurred, and P{B} is the probability that B will occur having no knowledge whether A did or did not occur. This intuitive condition is not adequate as a definition of probabilistic causation because it is too general and thus does not meet our intuitive notion of cause and effect. For example, if A denotes the event “The person is a smoker,” B denotes the event “The person now has or will have cancer at some time in the future” and C denotes the event “The person now has or will have emphysema at some time in the future,” then the following three relationships hold: P{B|A} ≥ P{B}, P{C|A} ≥ P{C} and P{B|C} ≥ P{B}. The last relationship states that knowing that the person has emphysema increases the likelihood that he will have cancer. The reason is that having the information that the person has emphysema increases the likelihood that the person is a smoker, thus indirectly increasing the likelihood that the person will have cancer. However, we would not want to conclude that having emphysema causes cancer. Thus, we need additional conditions such as a temporal relationship of A to B and a rational explanation as to the mechanism of action. It is hard to quantify this last requirement, and thus different authors prefer somewhat different definitions.[citation needed]
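A small simulation makes the smoking/emphysema point concrete. In the sketch below, all probabilities are invented for illustration: smoking raises the chance of both cancer and emphysema, while emphysema has no causal effect on cancer at all, yet the spurious inequality P{B|C} ≥ P{B} still shows up in the sampled frequencies:

```python
import random

random.seed(0)

# Hypothetical probabilities, chosen only for illustration: smoking raises
# the chance of both cancer and emphysema, but emphysema itself has no
# causal effect on cancer.
def sample_person():
    smoker = random.random() < 0.3
    cancer = random.random() < (0.15 if smoker else 0.02)
    emphysema = random.random() < (0.25 if smoker else 0.01)
    return smoker, cancer, emphysema

people = [sample_person() for _ in range(200_000)]

def p(event):
    return sum(1 for x in people if event(x)) / len(people)

def p_given(event, cond):
    pool = [x for x in people if cond(x)]
    return sum(1 for x in pool if event(x)) / len(pool)

p_cancer = p(lambda t: t[1])
p_cancer_given_smoker = p_given(lambda t: t[1], lambda t: t[0])
p_cancer_given_emph = p_given(lambda t: t[1], lambda t: t[2])

print(f"P(cancer)             = {p_cancer:.3f}")
print(f"P(cancer | smoker)    = {p_cancer_given_smoker:.3f}")
print(f"P(cancer | emphysema) = {p_cancer_given_emph:.3f}")
```

Both conditional probabilities exceed the marginal one, even though only smoking is a cause; emphysema merely signals that the person is likely a smoker.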

Causal calculus

When experimental interventions are infeasible or illegal, the derivation of cause-and-effect relationships from observational studies must rest on some qualitative theoretical assumptions, for example, that symptoms do not cause diseases, usually expressed in the form of missing arrows in causal graphs such as Bayesian networks or path diagrams. The theory underlying these derivations relies on the distinction between conditional probabilities, as in P(cancer | smoking), and interventional probabilities, as in P(cancer | do(smoking)). The former reads: “the probability of finding cancer in a person known to smoke, having started, unforced by the experimenter, to do so at an unspecified time in the past”, while the latter reads: “the probability of finding cancer in a person forced by the experimenter to smoke at a specified time in the past”. The former is a statistical notion that can be estimated by observation with negligible intervention by the experimenter, while the latter is a causal notion which is estimated in an experiment with an important controlled randomized intervention. It is specifically characteristic of quantal phenomena that observations defined by incompatible variables always involve important intervention by the experimenter, as described quantitatively by the Heisenberg uncertainty principle.[vague] In classical thermodynamics, processes are initiated by interventions called thermodynamic operations. In other branches of science, for example astronomy, the experimenter can often observe with negligible intervention.

The theory of “causal calculus”[27] permits one to infer interventional probabilities from conditional probabilities in causal Bayesian networks with unmeasured variables. One very practical result of this theory is the characterization of confounding variables, namely, a sufficient set of variables that, if adjusted for, would yield the correct causal effect between variables of interest. It can be shown that a sufficient set for estimating the causal effect of X on Y is any set of non-descendants of X that d-separates X from Y after removing all arrows emanating from X. This criterion, called “backdoor”, provides a mathematical definition of “confounding” and helps researchers identify accessible sets of variables worthy of measurement.
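A minimal worked instance of the backdoor adjustment, on a toy three-variable network with all numbers invented: a confounder Z influences both X and Y, so the observational P(Y|X) and the interventional P(Y|do(X)) come apart, and the latter is recovered by averaging P(Y|X,z) over P(z):

```python
# A toy causal Bayesian network with a confounder Z -> X and Z -> Y,
# plus a direct edge X -> Y.  All probabilities are made up for illustration.
p_z = {True: 0.3, False: 0.7}                 # P(Z)
p_x_given_z = {True: 0.8, False: 0.2}         # P(X=1 | Z)
p_y_given_xz = {(True, True): 0.6, (True, False): 0.3,
                (False, True): 0.4, (False, False): 0.1}  # P(Y=1 | X, Z)

# Observational: P(Y=1 | X=1), obtained from the joint distribution.
num = sum(p_z[z] * p_x_given_z[z] * p_y_given_xz[(True, z)] for z in (True, False))
den = sum(p_z[z] * p_x_given_z[z] for z in (True, False))
p_y_given_x = num / den

# Interventional: backdoor adjustment over Z (Z blocks the backdoor path):
# P(Y=1 | do(X=1)) = sum_z P(Y=1 | X=1, z) * P(z)
p_y_do_x = sum(p_y_given_xz[(True, z)] * p_z[z] for z in (True, False))

print(f"P(Y=1 | X=1)     = {p_y_given_x:.3f}")   # inflated by confounding
print(f"P(Y=1 | do(X=1)) = {p_y_do_x:.3f}")      # the true causal effect
```

Because Z makes X more likely while also raising Y, the observational probability overstates the causal effect; adjusting for Z removes the bias.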

………

 

and have also come to understand the phenomenon of Simpson's paradox:

Simpson’s paradox

Simpson’s paradox for quantitative data: a positive trend appears for each of two separate groups, whereas a negative trend appears when the groups are combined.

Simpson’s paradox (or Simpson’s reversal, Yule–Simpson effect, amalgamation paradox, or reversal paradox[1]) is a phenomenon in probability and statistics, in which a trend appears in several different groups of data but disappears or reverses when these groups are combined.

This result is often encountered in social-science and medical-science statistics[2][3][4] and is particularly problematic when frequency data is unduly given causal interpretations.[5] The paradoxical elements disappear when causal relations are brought into consideration.[6] It has been used to try to inform the non-specialist or public audience about the kind of misleading results mis-applied statistics can generate.[7][8] Martin Gardner wrote a popular account of Simpson’s paradox in his March 1976 Mathematical Games column in Scientific American.[9]

Edward H. Simpson first described this phenomenon in a technical paper in 1951,[10] but the statisticians Karl Pearson et al., in 1899,[11] and Udny Yule, in 1903,[12] had mentioned similar effects earlier. The name Simpson’s paradox was introduced by Colin R. Blyth in 1972.[13]
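The reversal is easy to verify numerically. The counts below are the frequently quoted kidney-stone treatment figures (Charig et al., 1986), a standard illustration of the paradox:

```python
# Success counts (successes, total) per treatment and stone size, from the
# widely cited kidney-stone study of Charig et al. (1986).
data = {
    "small stones": {"A": (81, 87),   "B": (234, 270)},
    "large stones": {"A": (192, 263), "B": (55, 80)},
}

def rate(successes, total):
    return successes / total

# Within each group, treatment A has the higher success rate ...
for group, by_treatment in data.items():
    a = rate(*by_treatment["A"])
    b = rate(*by_treatment["B"])
    print(f"{group}: A={a:.1%}, B={b:.1%}")

# ... yet once the groups are pooled, the ranking reverses.
totals = {t: (sum(data[g][t][0] for g in data),
              sum(data[g][t][1] for g in data)) for t in ("A", "B")}
print(f"combined: A={rate(*totals['A']):.1%}, B={rate(*totals['B']):.1%}")
```

The reversal arises because treatment A was given mostly to the harder (large-stone) cases, so stone size confounds the pooled comparison.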

………

 

how could one not want to find out whether the "machine mind" can surpass the human one ☺

What are the limits of machine learning, and is there a possibility to make robots learn languages?

Sridhar Mahadevan, Fellow of AAAI

 

There are plenty of well known limitations of machine learning. These shortcomings are usually associated with specific formal ways of defining ML paradigms.

  1. As the original question mentioned language, let’s begin with a classical result from Gold: the set of context-free languages is not learnable from positive examples. Let’s unpack this theorem and explain its deep impact on linguists like Noam Chomsky. First, the concept of learning here is formalized as “identification in the limit with zero error”. So, imagine that I as a teacher choose a particular context-free language in my head, and you as the learner have to guess what the language is (say, by inferring the context-free grammar that generates it). You can ask me for as many strings as you like that belong to the specific CFL (viewed as a set). Identification in the limit would require that, after seeing finitely many examples, your guess converges to the correct language and never changes again; Gold showed that no learner can achieve this for an arbitrary CFL, no matter how much computational power or time it spends on a potentially infinite series of positive examples. This result, proved by Gold in 1967 (“Language identification in the limit”, Information and Control, vol. 10, pp. 447–474), was as stunning in its impact on machine learning as Gödel’s incompleteness theorem was on computation and logic. It was well known that children by and large only get positive examples of their native language (English, Chinese, Hindi, Hebrew, etc.). So, linguists realized that since children learn language by 3 or 4, there must be severe innate constraints on the space of learnable languages. There has been a five-decade-long search for this so-called “universal grammar”. It is still under active research.
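For contrast, a class where Gold-style identification in the limit does succeed is the class of finite languages: the learner that conjectures exactly the set of strings seen so far converges to any finite target on every positive presentation. A minimal sketch, with the target language and presentation invented for illustration:

```python
# Identification in the limit for a class where it works: finite languages
# ARE learnable from positive examples (unlike the class of all context-free
# languages, per Gold 1967).  The learner conjectures "the set of strings
# seen so far"; once every string of the finite target has appeared, its
# guess never changes again and is correct.
def learner(examples_so_far):
    return frozenset(examples_so_far)

target = frozenset({"a", "ab", "abb"})             # teacher's finite language
presentation = ["a", "ab", "a", "abb", "ab", "a"]  # positive examples only

seen, guesses = [], []
for s in presentation:
    seen.append(s)
    guesses.append(learner(seen))

print(guesses[-1] == target)   # True: converged once all strings appeared
```

The asymmetry between this easy case and the CFL case is exactly what gives Gold's theorem its bite: enlarging the hypothesis class to all context-free languages destroys learnability from positive data.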

……

5.  The last and most recent breakthrough is the work on causal models, principally the work by Pearl in 2000. Here was another blow to the power of statistical learning. What Pearl showed convincingly in his book and many papers is that statistical learning is fundamentally limited: it cannot discover causal structures. The simple facts that diseases cause symptoms, not the other way around, or that lightning causes thunder, cannot be discovered by any purely statistical learner, no matter how many layers are present in a deep learning neural network. The fundamental problem is representational: one cannot express causality using probability theory alone. You need extra-probabilistic machinery to discover causality (e.g., Pearl’s do-calculus, Rubin and Neyman’s potential outcomes, or Fisher’s randomization protocols, all of which go beyond traditional statistics).

………