Inferential Statistics: Controlled Experiments, Hypothesis Testing, and the z Distribution, Study notes of Statistics

An introduction to inferential statistics, focusing on hypothesis testing in controlled experiments. It covers the importance of the null hypothesis, nondirectional and directional alternative hypotheses, and the four possible outcomes in hypothesis testing. The document also touches upon the basics of probability theory and probability distributions.



Inferential Statistics

The Controlled Experiment,

Hypothesis Testing,

and the z Distribution

Chapter 5 Goals

  • Understand hypothesis testing in controlled experiments
  • Understand why the null hypothesis is usually a conservative beginning
  • Understand nondirectional and directional alternative hypotheses and their advantages and disadvantages
  • Learn the four possible outcomes in hypothesis testing
  • Learn the difference between significant and nonsignificant statistical findings
  • Learn the fine art of baloney detection
  • Learn again why experimental designs are more important than the statistical analyses
  • Learn the basics of probability theory, some theorems, and probability distributions


Recently, when I was shopping at the grocery store, I became aware that music was softly playing throughout the store (in this case, the ancient rock group Strawberry Alarm Clock’s “Incense and Peppermint”). In a curious mood, I asked the store manager, “Why music?” and “why this type of music?” In a very serious manner, he told me that “studies” had shown people buy more groceries listening to this type of music. Perhaps more businesses would stay in business if they were more skeptical and fell for fewer scams that promise “a buying atmosphere.” In this chapter on inferential statistics, you will learn how to test hypotheses such as “music makes people buy more,” “HIV does not cause AIDS,” or “moving one’s eyes back and forth helps to forget traumatic events.”

Inferential statistics is concerned with making conclusions about populations from smaller samples drawn from the population. In descriptive statistics, we were primarily concerned with simple descriptions of numbers by graphs, tables, and parameters that summarized sets of numbers such as the mean and standard deviation. In inferential statistics, our primary concern will be testing hypotheses on samples and hoping that these hypotheses, if true of the sample, will be true of and generalize to the population. Remember that a population is defined as the mostly hypothetical group to whom we wish to generalize. The population is hypothetical for two reasons: First, we will rarely, if ever, have the time or money, or it will not be feasible, to test everyone in the population. Second, we will attempt to generalize from a current sample to future members of the population. For example, if we were able to determine a complete cure for AIDS, we would hope that the cure would work not only for the current population of AIDS patients in the world but also for any future AIDS patients.

The most common research designs in inferential statistics are actually very simple: We will test whether two different variables are related to each other (through correlation and the chi-square test) or whether two or more groups treated differently will have different means on a response (or outcome) variable (through t tests and analyses of variance). Examples of whether two different variables are related to each other are plentiful throughout science and its many disciplines.
We may wish to know whether cigarettes are related to cancer, whether violent crime rates are related to crack cocaine use, whether breast implants are related to immunodeficiency disease, whether twins’ IQs are more highly related than siblings’ IQs, and so on. Note that finding a relationship between two variables does not mean that the two variables are causally related. However, sometimes determining whether relationships exist between two variables, such as smoking and rates of lung cancer, may give us clues that allow us to set up controlled experiments where causality may be determined.

Controlled experiments, typically with two or more groups treated differently, are the most powerful experimental designs in all of statistics. Whereas correlational designs, which determine whether two variables are related, are very common and useful, they pale in comparison to the power of a well-designed experiment with two or more groups.

It is perhaps unfortunate (maybe statisticians should hire a public relations firm) that the most powerful experimental design is simply called a controlled experiment. There are other theories in science that have much better names, such as the big bang theory, which attempts to explain the origins of the universe. Nonetheless, for the present, we are stuck with the name

120 STATISTICS: A GENTLE INTRODUCTION

hypothesis will be the opposite of what the scientist believes or hopes to be true. The prior research hunch or belief about what is true is called the alternative hypothesis (abbreviated Ha). As noted earlier in the book, science must work slowly and conservatively. The repercussions of poorly performed science can be deadly or even worse. Thus, the null hypothesis is usually a safe, conservative position, which says that there is no relationship between the variables or, in the case of the drug experiment, that the drug does not affect the experimental group differently on the dependent variable compared to the control group.

Hypothesis Testing: The Big Decision ___________________

All experiments begin with the statement of the null and alternative hypotheses (at least in the experimenter’s mind, but not usually in the published article). However, the null hypothesis is like a default position: We will retain the null hypothesis (or we will fail to reject the null hypothesis) unless our statistical test tells us to do otherwise. If there is no statistical difference between the two means, then the null hypothesis is retained. If the statistical test determines that there is a difference between the means (beyond just chance differences), then the null hypothesis will be rejected.

In summary, when a statistical test is employed, one of two possible decisions must be made: (a) retain the null hypothesis, which means that there are no differences between the two means other than chance differences, or (b) reject the null hypothesis, which means that the means are different from each other well beyond what would be expected by chance.

How the Big Decision Is Made: Back to the z Distribution ____________________________

A statistical test of the classic two-group experiment will analyze the difference between the two means to determine whether the observed difference could have occurred by chance alone. The z distribution, or a similar distribution, will be used to make the decision to retain or reject the null hypothesis. To appreciate how this occurs, imagine a large vat of 10,000 ping-pong balls (see Figure 5.1). Let us suppose that each ping-pong ball has a z score written on it. Each z score on a ball occurs with the same frequency as in the z distribution. Remember that the z distribution reveals that exactly 68.26% of the 10,000 balls will fall within ±1 standard deviation of the mean z score of 0.00. This means that 6,826 of the 10,000 ping-pong balls will have numbers ranging from −1.00 to +1.00. Also, 95.44% of all the balls will fall within ±2 standard deviations of the mean. Therefore, 9,544 of the ping-pong balls will range between −2.00 and +2.00. Finally, we know that 9,974 ping-pong balls will be numbered from −3.00 to +3.00.
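The ball counts above can be checked with Python’s standard library. This is an illustrative sketch (using `statistics.NormalDist`, which is not part of the book); note that at full precision the shares come out to 68.27%, 95.45%, and 99.73%, which the text rounds slightly differently.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, standard deviation 1
for k in (1, 2, 3):
    share = z.cdf(k) - z.cdf(-k)  # area within ±k standard deviations
    print(f"within ±{k} SD: {share:.2%} -> about {round(share * 10_000):,} of 10,000 balls")
```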


Now, let us play a game of chance. If I am blindfolded and dig into the vat of balls and pull out one ball in random fashion, what is the probability that it will have a number between −1.00 and +1.00? If I bet you $20 that the number would be greater than +1.00 or less than −1.00, would you take my bet? You should take my bet because the probability that the ball has a number between −1.00 and +1.00 is 68.26%. Therefore, you would roughly have a 68% chance of winning, and I would only have a 32% chance of winning. How about if we up the stakes? I will bet you $100 that the z score on a randomly chosen ball is greater than +2.00 or less than −2.00. Would you take this bet? You should (and quickly) because now there is a 95.44% chance that you would win and less than a 5% chance that I would win.

What would happen if we finally decided to play the game officially, and I bet that a randomly chosen ball is greater than +3.00 or less than −3.00? You put your money next to my money. A fair and neutral party is chosen to select a ball and is blindfolded. What would be your conclusion if the resulting ball had a +3.80 on it? There are two possibilities: Either we both have witnessed an extremely unlikely event (only 1 ball out of 10,000 has a +3.80 on it), or something is happening beyond what would be expected by chance alone (namely, that the game is rigged and I am cheating in some unseen way).

Now, let us use this knowledge to understand the big decision (retain or reject the null hypothesis). The decision to retain or reject the null hypothesis will be tied to the z distribution. Each of the individual subjects’ scores in the two-group experiment will be cast into a large and complicated formula, and a single z-like number will result. In part, the size of this single z-like number will be based on the difference between the two groups’ means.
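The betting game can also be played by simulation. A minimal sketch, in which 100,000 random draws from a standard normal distribution stand in for pulls from the vat (the seed and draw count are arbitrary choices for illustration):

```python
import random

random.seed(42)          # make the simulated game reproducible
DRAWS = 100_000          # repeated plays of the game

balls = [random.gauss(0.0, 1.0) for _ in range(DRAWS)]
for cutoff in (1.0, 2.0, 3.0):
    # fraction of draws landing outside ±cutoff: the bettor's chance of winning
    p_outside = sum(abs(b) > cutoff for b in balls) / DRAWS
    print(f"P(|z| > {cutoff}): about {p_outside:.4f}")
```

The estimates land near .32, .05, and .003, matching the vat's proportions: betting on |z| > 3 is a bet you should almost never win honestly.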

Chapter 5: Inferential Statistics 123


Figure 5.1 A Vat of 10,000 Ping-Pong Balls, Each With a Single Value of z, Occurring With the Same Frequency as in the z Distribution

Another way of thinking about the two means is whether they were both drawn from the same population distribution (in other words, the treatment did not work to make one sample different from the other) or whether the two means came from different populations (because the treatment did work on one group and made its mean much larger or much smaller than the other group’s mean). The alternative hypothesis is often what we hope is true in our experiment. The alternative hypothesis is most often stated as

Ha: μ1 ≠ μ2

Note that the alternative hypothesis is stated as “Mean 1 does not equal Mean 2.” This is its most common form, and it is called a nondirectional alternative hypothesis. Logically, the “does not equal” sign allows for two possibilities: One possibility is that Mean 1 is greater than Mean 2, and the other is that Mean 1 is less than Mean 2. Because the controlled experiment involves making inferences about populations, the analysis of the experiment involves inferential statistics. Thus, the mean is an essential parameter in both descriptive and inferential statistics.

__________________________________ Nondirectional and Directional Alternative Hypotheses

An experimenter has a choice between two types of alternative hypotheses when hypothesis testing: a nondirectional or a directional alternative hypothesis. A directional alternative hypothesis, in the two-group experiment, states the expected direction of the difference between the two means. For example, one alternative hypothesis could be

Ha: μ1 > μ2

Here, the experimenter predicts that the mean for Group 1 will be higher than the mean for Group 2. Another possibility is that the experimenter predicts

Ha: μ1 < μ2

Here, the experimenter predicts that Mean 1 will be less than Mean 2. In practice, however, most statisticians choose a nondirectional alternative hypothesis. One of the reasons for this is that the nondirectional alternative hypothesis is less influenced by chance. Directional alternative hypotheses, however, are not all bad. They are more sensitive to small but real differences between the two groups’ means. Most statisticians agree that the directional alternative hypothesis should be reserved for situations where the


experimenter is relatively certain of the outcome. It is legitimate to wonder, however, why the experimenter was conducting the experiment in the first place, if he or she was so certain of the outcome.
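The trade-off between the two kinds of alternative hypothesis shows up in their critical values: a directional test puts all of alpha in one tail, so its cutoff is lower and H0 is easier to reject, for real and chance differences alike. A sketch, assuming a z-type test at the .05 level:

```python
from statistics import NormalDist

z = NormalDist()
alpha = 0.05
two_tailed = z.inv_cdf(1 - alpha / 2)  # nondirectional: alpha split over both tails
one_tailed = z.inv_cdf(1 - alpha)      # directional: all of alpha in one tail
print(f"nondirectional cutoff: ±{two_tailed:.3f}")  # about ±1.960
print(f"directional cutoff:     {one_tailed:.3f}")  # about  1.645
```

A test statistic of, say, 1.80 would reject H0 under the directional hypothesis but not under the nondirectional one.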

A Debate: Retain the Null Hypothesis or Fail to Reject the Null Hypothesis __________________

Remember that the classic two-group experiment begins with the statement of the null and the alternative hypotheses. Some statisticians are concerned about the wording of the decision that is to be made. Some say, “The null hypothesis was retained.” Others insist that it should be worded, “The null hypothesis was not rejected.” Although it may seem to be a trivial point, it has important implications for the entire meaning of the experiment.

After an experiment has been performed and statistically analyzed, and the null hypothesis was retained (or we failed to reject it), what is the overall conclusion? Does it really mean that your chosen independent variable has no effect whatsoever on your chosen dependent variable? Under any circumstances? With any kind of subjects? No! The conclusion is really limited to this particular sample of subjects. Perhaps the null hypothesis was retained because your sample of subjects (although it was randomly chosen) acted differently from another or larger sample of subjects.

There are other possibilities for why the null hypothesis might have been retained besides the sample of subjects. Suppose that your chosen independent variable does affect your subjects but you chose the wrong dependent variable. One famous example of this type of error was in studies of the effectiveness of Vitamin C against the common cold. Initial studies chose the dependent variable to be the number of new colds per time period (e.g., per year). In this case, the null hypothesis was retained. Does this mean that Vitamin C has no effect on the common cold? No! When the dependent variable was the number of days sick within a given time period, the null hypothesis was rejected, and it was preliminarily concluded that Vitamin C appears to reduce the number of days that people are sick with a cold. It is important to remember that just because you do not find an effect does not mean it does not exist.
You might be looking in the wrong place (using the wrong subjects, using the wrong experimental design) and/or you might be using the wrong dependent variable to measure the effect. Thus, some statisticians recommend that it be stated, “The null hypothesis was not rejected.” This variation of the statement has the connotation that there still may be a significant effect somewhere, but it just was not found this time. More important, it has the connotation that, although the null hypothesis was retained, it is not necessarily being endorsed as true. Again, this reflects the conservative nature of most statisticians.


relationship between the two variables. A statistical test (such as correlation) is performed on the data from a sample, and it is concluded that any relationship that is observed is due to chance. In this case, we retain H0 and infer that there is no relationship between these two variables in the population from which the sample was drawn. In reality, we do not know whether H0 is true. However, if it is true for the population and we retain H0 for the sample, then we have made a correct decision.

2. Type I Error: Reject H0, When H0 Is Actually True

The Type I error is considered to be the more dangerous of the two types of errors in hypothesis testing. When researchers commit a Type I error, they are claiming that their research hypothesis is true when it really is not. This is considered to be a serious error because it misleads people. Imagine, for example, a new drug for the cure of AIDS. A researcher who commits a Type I error is claiming that the new drug works when it really does not. People with AIDS are given false hopes, and resources that should be spent on a drug that really works will be spent on this bogus drug. The probability of committing a Type I error should be less than 5 chances out of 100, or p < .05. The probability of committing a Type I error is also called alpha (α).

3. Correct Decision: Reject H0, When H0 Is Actually False

In this case, we have concluded that there is a real relationship between the two variables, and it is probably not due to chance (or that there is a very small probability that our results may be attributed to chance). Therefore, we reject H0 and assume that there is a relationship between these two variables in the population. If in the population there is a real relationship between the two variables, then by rejecting H0, we have made the correct decision.

4. Type II Error: Retain H0, When H0 Is Actually False

A Type II error occurs when a researcher claims that a drug does not work when, in reality, it does work. This is not considered to be as serious an error as the Type I error. Researchers may not ever discover anything new or become famous if they frequently commit Type II errors, but at least they have not misled the public and other researchers. The probability of a Type II error is also called beta (β). A summary of these decisions appears in Table 5.1.
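The meaning of alpha can be made concrete by simulation: if H0 is really true (both groups are drawn from the same population) and we reject whenever a z-type statistic exceeds ±1.96, we should commit a Type I error in roughly 5% of experiments. A sketch under assumed conditions (30 subjects per group, 2,000 simulated experiments; not an example from the book):

```python
import random
import statistics

random.seed(1)
n, trials, rejections = 30, 2000, 0
for _ in range(trials):
    # Both groups come from the same population, so H0 is true by construction
    g1 = [random.gauss(0, 1) for _ in range(n)]
    g2 = [random.gauss(0, 1) for _ in range(n)]
    # z-like statistic: difference between means over its standard error
    se = (statistics.variance(g1) / n + statistics.variance(g2) / n) ** 0.5
    z = (statistics.mean(g1) - statistics.mean(g2)) / se
    if abs(z) > 1.96:          # the .05 two-tailed cutoff
        rejections += 1        # a Type I error: H0 rejected though it is true
print(rejections / trials)     # should hover near alpha = .05
```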


__________________________________ Significance Levels

A test of significance is used to determine whether we retain or reject H0. The significance test will result in a final test statistic, some single number. If this number is small, then it is more likely that our results are due to chance, and we will retain H0. If this number is large, then we will reject H0 and conclude that there is a very small probability that our results are due to chance. The minimum conventional level of significance is α = .05. This final test statistic is compared to a distribution of numbers, which are called critical values. The test statistic must exceed the critical value in order to reject H0.
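The retain-or-reject rule above can be sketched in a few lines. The z-type statistic and the .05 default are assumptions for illustration; the same comparison applies to any test statistic and its table of critical values:

```python
from statistics import NormalDist

def decision(test_statistic, alpha=0.05):
    """Retain or reject H0 by comparing a z-type statistic to its critical value."""
    critical = NormalDist().inv_cdf(1 - alpha / 2)  # two-tailed cutoff (1.96 at .05)
    return "reject H0" if abs(test_statistic) > critical else "retain H0"

print(decision(1.50))   # small statistic: retain H0
print(decision(2.40))   # exceeds the .05 cutoff: reject H0
```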

_________________ Significant and Nonsignificant Findings

When significant findings have been reported in an experiment, it means that the null hypothesis has been rejected. The word nonsignificant is the opposite of significant. When the word nonsignificant appears, it means that the null hypothesis has been retained. Do not use the word insignificant to report nonsignificant statistical findings. Insignificant is a value judgment, and it has no place in the statistical analysis section of a paper.

In the results section of a research paper, significant findings are reported if the data meet an alpha level of .05 or less. If the findings are significant, it is a statistical convention to report them as significant at the lowest alpha level possible. Thus, although H0 is rejected at the .05 level (or less), researchers will check to see if their results are significant at the .01 or .001 alpha levels. It appears more impressive if a researcher can conclude that the probability that his or her findings are due to chance is p < .01 or p < .001. It is important to note that this does not mean that results with alphas at .01 or .001 are any more important or meaningful than results reported at the .05 level.

Some statisticians also object to reporting results that are “highly significant.” By this, they mean that their findings were significant not only at p < .05 but also at p < .001. These statisticians would argue that the null hypothesis is rejected at .05, and thus one’s job is simply to report the lowest significance level possible (e.g., p < .01 or p < .001). They find it inappropriate, therefore, to use the word highly before the word significant.
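The reporting convention just described can be sketched as a small helper. The function name and the z-type statistic (compared two-tailed) are assumptions for illustration:

```python
from statistics import NormalDist

def lowest_reported_level(z_statistic):
    """Return the smallest conventional alpha (two-tailed) the statistic clears, or None."""
    nd = NormalDist()
    for alpha in (0.001, 0.01, 0.05):           # check the most impressive level first
        if abs(z_statistic) > nd.inv_cdf(1 - alpha / 2):
            return alpha
    return None                                  # nonsignificant: retain H0

print(lowest_reported_level(2.10))   # clears .05 but not .01 -> report p < .05
print(lowest_reported_level(3.50))   # clears .001 -> report p < .001
```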


Table 5.1

Our Decision    In Reality      The Result
Retain H0       H0 is true      Correct decision
Reject H0       H0 is true      Type I error (alpha = α)
Reject H0       H0 is false     Correct decision
Retain H0       H0 is false     Type II error (beta = β)

Thus, it also follows that the directional alternative hypothesis has the advantage that it is more sensitive to real differences in the data. In other words, if there is a real difference between two groups’ means, it is more likely to be detected with a directional alternative hypothesis. However, its major disadvantage is that it is also more sensitive to just chance differences between two groups’ means.

__________________________ Did Nuclear Fusion Occur?

In 1989, two chemists claimed that they produced nuclear fusion in a laboratory under “cold” conditions; that is, they claimed to have produced a vast amount of energy by fusing atoms without having to provide large amounts of energy to do so. Their claims can still be analyzed in the hypothesis-testing situation, although it is not absolutely known whether they did or did not produce fusion. However, most subsequent replications of their work were unsuccessful (see Park, 2000, for a fascinating discussion of the controversy). The null and alternative hypotheses in this situation are as follows:

H0: Fusion has not been produced.

Ha: Fusion has been produced.

Situation 1. If subsequent research supports their claims, then the two chemists made the correct decision to reject H0. Thus, they will probably receive the Nobel Prize, and their names will be immortalized.

Situation 2. If subsequent research shows that they did not really produce fusion, then they rejected H0 when H0 was true, and thus they committed the grievous Type I error. Why is this a serious error? They may have misled thousands of researchers, and millions of dollars may have been wasted. The money and resources might have been better spent pursuing other lines of research to demonstrate cold fusion (because physicists claim cold fusion is theoretically possible) rather than on these chemists’ mistake.

What about the quiet researcher who actually did demonstrate a small but real amount of fusion in the laboratory but used a nondirectional alternative hypothesis? The researcher failed to reject H0 when Ha was true, and thus the researcher committed a Type II error. What was the researcher’s name? We do not know. Fame will elude a researcher if there is a continual commission of Type II errors because of an inordinate fear of a Type I error! Remember, sometimes scientists must dare to be wrong.


Baloney Detection __________________________________

The late astronomer Carl Sagan, in his 1996 book The Demon-Haunted World: Science as a Candle in the Dark, proposed a baloney detection kit. The purpose of the kit was to evaluate new ideas. The primary tool in the kit was simply skeptical thinking, that is, to understand an argument and to recognize when it may be fallacious or fraudulent. The baloney detection kit would be exceptionally useful in all aspects of our lives, especially in regard to our health, where sometimes the quest for profit may outweigh the dangers of a product or where the product is an outright fraud. In the traditional natural sciences, the baloney detection kit can help draw boundaries between real science and pseudoscience. Michael Shermer, publisher of Skeptic magazine (www.skeptic.com), has modified Sagan’s baloney detection kit.

Let’s use some of Sagan’s and Shermer’s suggestions to investigate three claims: (a) magician David Copperfield’s recent announcement that he predicted Germany’s national lottery numbers 7 months before the drawing; (b) mangosteen, a Southeast Asian fruit, cures cancer, diabetes, and a plethora of other diseases and illnesses, and it works as well as or better than more than 50 prescription drugs; and (c) therapeutic touch (TT), a therapy in which a medical patient is not actually touched but the patient’s negative energy aura is manipulated by a trained TT therapist in order to relieve pain.

How Reliable Is the Source of the Claim?

A corollary of this criterion would be, Does the claimant have a financial (or fame) interest in the outcome? Pseudoscientists may, on the surface, appear to be reliable, but when we examine their facts and figures, they are often distorted, taken out of context, or even fabricated. Often, the claims are merely based on a desire for money and/or fame.

Copperfield is a professional magician. He specializes in illusions such as making large jet planes disappear. How reliable is his claim to have predicted lottery numbers in advance? Not very. Would his claim advance his fame (and fortune)? Of course!

The chief promoter of mangosteen is identified as a prominent medical doctor and medical researcher. In reality, the doctor is a Georgia family physician who has not published even a single clinical study in any medical journal. He has written a self-published book on mangosteen, touting near-miraculous cures for a variety of diseases in his patients. We noted earlier in Chapter 1 that books, particularly self-published ones and those published by commercial presses, have no scientific standards to meet; therefore, they often fail to supply us with any acceptable scientific evidence whatsoever! Claiming something is true or saying something is true does not make it so. Mangosteen is being marketed for $37 a bottle. Distributorships are being sold. Mangosteen’s proponents are clearly interested in financial gain. The latter is not a heinous crime, but it becomes one if its proponents know there are no clinical studies with humans that support their outlandish claims.

132 STATISTICS: A GENTLE INTRODUCTION

If TT therapists were serious about the scientific establishment of TT, they would employ acceptable scientific standards in their research. They would show that the results of TT are not due to the placebo effect. They would demonstrate scientifically that trained TT therapists can detect energy fields. To date, only one published TT study has attempted to determine whether TT therapists can detect energy auras better than chance. That study was published by a 9-year-old girl as a fourth-grade science fair project, and she found that experienced TT therapists could do no better than chance at detecting which of their hands she held her hand over (when the therapists could not see their hands). TT proponents tend to seek out other proponents. They cite research with positive outcomes. They ignore or deny claims to the contrary.

How Does the Claim Fit With Known Natural Scientific Laws?

A corollary of this criterion would be, Does the finding seem too good to be true? Copperfield claims his lottery prediction was not a trick. He said it was more like an experiment or a mental exercise. If it was, how does it fit into any known or replicated scientific principle? It simply does not. We would have to create a new principle to explain his mind/matter experiment or use an old one that is without any scientific merit (such as clairvoyance). There is no accepted scientific principle that explains how one would predict lottery numbers in advance. That is simply too good to be true. The health claims for mangosteen actually pass this criterion but not its corollary. The fruit does appear to contain known antioxidants called xan- thones. Xanthones from mangosteen do appear to have some antibacterial and antiviral properties in test tubes only! Where mangosteen fails to live up to its excessive hype is that there has not been one human clinical study to date that has demonstrated that the xanthones in mangosteen have helped or cured a disease. TT proponents propose that humans have an energy field, which can be detected by other “trained” humans. They propose that imbalances in the patient’s energy field cause disease and pain. TT therapists claim they can restore these imbalances by sweeping their hands about 3 inches over the patients’ bodies and in order to get rid of their excess negative energy. Does this fit with any scientifically supported natural laws? No. Does it seem too good to be true? Yes. This is a common ploy in pseudoscience: Concoct exaggerated claims around a kernel of scientific truth. Some fruits (those containing Vitamin C) do appear to aid physical health. Some cancer drugs have been created from plants. But it is not scientifically ethical to claim that mangosteen prevents and cures cancer, as well as lowers cholesterol and prevents heart disease, without acceptable scientific proof, and theoretical proof (i.e., mangosteen

134 STATISTICS: A GENTLE INTRODUCTION

has xanthones, xanthones have antioxidant properties, and antioxidants are thought to aid physical health) is not sufficient. Its power to prevent and cure disease must be demonstrated in empirical studies with humans. The same is true of TT. For example, there is some evidence that humans can interact with energy fields. For example, have you ever noticed that when straightening an antenna, you can sometimes get better reception when you are holding the antenna? However, it is a severe stretch (and pseudo- scientific) to claim humans generate energy fields, that imbalances in these fields cause pain, and that restoring balance by eliminating negative energy is a skill that can be learned. Sagan noted that we tell children about Santa Claus, the Easter Bunny, and the Tooth Fairy, but we retract these myths before they become adults. However, the desire to believe in something wonderful and magical remains in many adults. Wouldn’t it be wonderful if there were super-intelligent, super-nice beings in spaceships visiting the Earth who might give us the secrets to curing cancer and Alzheimer’s disease? Wouldn’t it be great if we only had to drink 3 ounces of mangosteen twice a day to ward off nearly all diseases and illnesses? Wouldn’t it be great if playing a classical CD to a baby boosted his or her IQ? Wouldn’t it be amazing if a person could really relieve pain without touching someone else? But let us return to the essential tool in the baloney detection kit—skeptical thinking. If something seems too good to be true, we should probably be even more skeptical than usual. Perhaps we should demand even higher scientific standards of evidence than usual, espe- cially if the claims appear to fall outside known natural laws. It has been said that extraordinary claims should require extraordinary evidence. An extraor- dinary claim, however, might not always have to provide extraordinary evidence if the evidence for the claim was ordinary but plentiful. 
A preponderance of ordinary evidence will suffice to support the scientific credibility of a theory. Thus, the theory of evolution rests on no single extraordinary piece of evidence; rather, a plethora of studies and observations supports it overall. I tell my students not to be disappointed when wonderful and magical claims are debunked. There are plenty of real wonders and magic in science yet to be discovered. We do not have to make them up. Francis Crick, Nobel Prize winner for unraveling the structure of DNA, reportedly told his mother when he was young that by the time he was older, everything would have been discovered. She is said to have replied, "There'll be plenty left, Ducky."

Can the Claim Be Disproven or
Has Only Supportive Evidence Been Sought?

Remember, good scientists are highly skeptical. They always fear a Type I error, that is, telling people something is true when it is not. Pseudoscientists are not typically skeptical. They believe in what they propose without any doubt that they might be wrong. Pseudoscientists typically seek only

Chapter 5: Inferential Statistics 135

beliefs but on the lack of even a "shred" of scientific evidence that sexual orientation is biologically determined. Because there is clear and increasing empirical evidence that sexual identity and sexual orientation are highly heritable and biologically based (e.g., Bailey, Pillard, Neale, & Agyei, 1993; Bailey et al., 1999; Bailey, Dunne, & Martin, 2000; Coolidge, Thede, & Young, 2002), it might be concluded that this religious leader is woefully ignorant of such studies, that he is unaware that his religious beliefs are driving his conclusions, or that he is lying when he claims his religious beliefs do not bias his conclusions.
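The Type I error that good scientists fear can be made concrete with a short simulation. The sketch below (the function names and specific numbers are my own, not the book's) repeatedly samples from a population in which the null hypothesis is actually true, runs a two-tailed z test at the .05 level, and counts how often a "significant" result appears anyway:

```python
import random
import statistics

def z_statistic(sample, mu0=0.0, sigma=1.0):
    """z for testing H0: population mean == mu0, with sigma known (a z test)."""
    n = len(sample)
    return (statistics.mean(sample) - mu0) / (sigma / n ** 0.5)

def type_i_error_rate(num_experiments=10_000, n=30, critical_z=1.96):
    """Sample from a population where H0 is TRUE (the mean really is 0) and
    count how often |z| still exceeds the two-tailed .05 cutoff of 1.96.
    Every such rejection is a Type I error: claiming an effect that is not there."""
    random.seed(42)  # fixed seed so the simulation is reproducible
    false_alarms = 0
    for _ in range(num_experiments):
        sample = [random.gauss(0.0, 1.0) for _ in range(n)]
        if abs(z_statistic(sample)) > critical_z:
            false_alarms += 1
    return false_alarms / num_experiments

print(type_i_error_rate())  # should come out near .05
```

By design, the false-alarm rate hovers near alpha = .05: even perfectly honest research produces some false positives, which is exactly why good scientists remain skeptical and why replication matters.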

Conclusions About Science and Pseudoscience

As noted earlier, skeptical thinking helps to draw a boundary between science and pseudoscience. As Shermer noted, it is the nature of science to be skeptical yet open-minded and flexible. Thus, science sometimes seems maddeningly slow and even contradictory. Good scientists may even offer potential flaws or findings that would disconfirm their own hypotheses! Good science involves guessing (hypothesizing), testing (experimentation), and retesting (replication). The last of these may be the most critical link in the chain: Can the results of a particular experiment be duplicated (replicated) by other researchers in other locations? There may not always be a very clear boundary between science and pseudoscience, but applying the skeptical thinking embodied in the principles of the baloney detection kit may help to light the way.

The Most Critical Elements in the Detection of Baloney in Suspicious Studies and Fraudulent Claims

In my opinion, there are two critical elements for detecting baloney in any experiment or claim. The first, and the most important in the social and medical sciences, is: Has the placebo effect been adequately controlled for? For example, in a highly controversial psychotherapeutic technique, eye movement desensitization and reprocessing (EMDR), intensively trained therapists teach their patients to move their eyes back and forth while discussing their traumatic experience (see Herbert et al., 2000, for a critical review). Despite calls for studies to control for the placebo effect (in this case, the therapist's very strong belief that the treatment works), there are few, if any, EMDR studies in which the placebo effect has been adequately controlled. In addition, there are obvious demand characteristics associated with the delivery of EMDR. Demand characteristics are the subtle hints and cues in human interactions (experiments, psychotherapy, etc.) that prompt participants to
act in ways consistent with the beliefs of the experimenter or therapist. Demand characteristics usually operate below one's level of awareness. Psychologist Martin Orne repeatedly demonstrated that demand characteristics can be very powerful. For example, if a devoted EMDR therapist worked for an hour on your traumatic experience and then asked, with a kind and expectant facial expression, how much the session had helped, would you not be at least slightly inclined to say "yes" or "a little bit," even if in reality it did not help at all, simply because you do not wish to disappoint the therapist?

Controlling for the gleam, glow, and religiosity of some devotees of new techniques, and for the demand characteristics of their methods, can be experimentally difficult. However, as Sagan and Shermer have noted, these questionable studies often do not seek evidence that would disconfirm their claims; only supporting evidence is sought. As I have already stated, this is particularly true where strong placebo effects are suspected.

The second most important element of baloney detection, for your author, is Sagan's and Shermer's fourth principle: How does the claim fit with known natural scientific laws? In the case of EMDR, its rationale relies on physiological and neurological processes such as information processing and eye movements somehow related to rapid eye movement (REM) sleep. Certainly, there is good support for cognitive behavioral models of therapy, and there is a wealth of evidence for REM sleep. However, the direct connection between information-processing models, cognitive behavioral techniques, and eye movements in the relief of psychological distress arising from traumatic experiences has not been demonstrated. In each case where I hear of a new technique that seems too good to be true, I find that the scientific or natural explanation for why the technique works is unstated, vague, or questionable.
In the case of new psychotherapeutic techniques, the absence of a clear scientific explanation, with a specific sequence of steps for how the therapy works, always makes me wonder how large a role the placebo effect plays in the therapy's outcome. In their defense, I will state that it is sometimes difficult to explain how some traditionally accepted therapies, such as psychoanalysis, work. However, that difficulty is no excuse for not searching for reasonable and scientific explanations of how a new therapy works. It is also absolutely imperative, in cases where the therapeutic mechanism is difficult to demonstrate scientifically, that placebo effects are completely and adequately controlled for and that disconfirming evidence has been sincerely and actively sought.
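Why a placebo control matters can be shown with a toy simulation. In the sketch below, every number (the placebo effect of 0.8 and the true treatment effect of 0.2, in standard-deviation units) is invented purely for illustration; nothing here comes from an actual EMDR study. Comparing a treated group only to an untreated group folds the placebo effect into the apparent benefit, while comparing against a placebo group isolates the treatment's real contribution:

```python
import random
import statistics

random.seed(1)  # fixed seed for reproducibility

def simulate_outcomes(n, true_drug_effect, placebo_effect):
    """Symptom-improvement scores (higher = better) for three groups.
    All effect sizes are hypothetical, chosen only to illustrate the logic."""
    untreated = [random.gauss(0.0, 1.0) for _ in range(n)]
    placebo = [random.gauss(placebo_effect, 1.0) for _ in range(n)]
    treated = [random.gauss(placebo_effect + true_drug_effect, 1.0) for _ in range(n)]
    return untreated, placebo, treated

untreated, placebo, treated = simulate_outcomes(
    n=1000, true_drug_effect=0.2, placebo_effect=0.8)

# Naive comparison (no placebo control): the placebo effect masquerades as treatment benefit.
naive = statistics.mean(treated) - statistics.mean(untreated)

# Placebo-controlled comparison: only the treatment's genuine contribution remains.
controlled = statistics.mean(treated) - statistics.mean(placebo)

print(f"apparent effect without a placebo control: {naive:.2f}")  # should be near 1.0
print(f"effect against a placebo group:            {controlled:.2f}")  # should be near 0.2
```

The uncontrolled comparison makes the treatment look roughly five times more effective than it really is, which is precisely the inflation that an enthusiastic therapist's belief, and the demand characteristics that go with it, can produce.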

Can Statistics Solve Every Problem?

Of course not! In fact, I must warn you that statistics may sometimes even muddle a problem. It has been pointed out that it is not statistics that lie; it is people who lie. Often, when I drive across the country, I listen to talk radio. I have heard many extremely sophisticated statistical arguments from both sides of the gun control issue. I am always impressed at the way each