An introduction to inferential statistics, focusing on hypothesis testing in controlled experiments. It covers the importance of the null hypothesis, nondirectional and directional alternative hypotheses, and the four possible outcomes in hypothesis testing. The document also touches upon the basics of probability theory and probability distributions.
Recently, when I was shopping at the grocery store, I became aware that music was softly playing throughout the store (in this case, the ancient rock group Strawberry Alarm Clock's "Incense and Peppermint"). In a curious mood, I asked the store manager, "Why music?" and "why this type of music?" In a very serious manner, he told me that "studies" had shown people buy more groceries listening to this type of music. Perhaps more
businesses would stay in business if they were more skeptical and fell less often for scams that promise "a buying atmosphere." In this chapter on inferential statistics, you will learn how to test hypotheses such as "music makes people buy more," or "HIV does not cause AIDS," or "moving one's eyes back and forth helps to forget traumatic events."

Inferential statistics is concerned with making conclusions about populations from smaller samples drawn from the population. In descriptive statistics, we were primarily concerned with simple descriptions of numbers by graphs, tables, and parameters that summarized sets of numbers, such as the mean and standard deviation. In inferential statistics, our primary concern will be testing hypotheses on samples, hoping that these hypotheses, if true of the sample, will be true of and generalize to the population.

Remember that a population is defined as the mostly hypothetical group to whom we wish to generalize. The population is hypothetical for two reasons: First, we will rarely, if ever, have the time or money, or it will not be feasible, to test everyone in the population. Second, we will attempt to generalize from a current sample to future members of the population. For example, if we were able to determine a complete cure for AIDS, we would hope that the cure would work not only for the current population of AIDS patients in the world but also for any future AIDS patients.

The most common research designs in inferential statistics are actually very simple: We will test whether two different variables are related to each other (through correlation and the chi-square test) or whether two or more groups treated differently will have different means on a response (or outcome) variable (through t tests and analyses of variance). Examples of whether two different variables are related to each other are plentiful throughout science and its many disciplines.
We may wish to know whether cigarettes are related to cancer, whether violent crime rates are related to crack cocaine use, whether breast implants are related to immunodeficiency disease, whether twins' IQs are more highly related than siblings' IQs, and so on. Note that finding a relationship between two variables does not mean that the two variables are causally related. However, sometimes determining whether relationships exist between two variables, such as smoking and rates of lung cancer, may give us clues that allow us to set up controlled experiments where causality may be determined.

Controlled experiments, typically with two or more groups treated differently, are the most powerful experimental designs in all of statistics. Whereas correlational designs, which determine whether two variables are related, are very common and useful, they pale in comparison to the power of a well-designed experiment with two or more groups. It is perhaps unfortunate (maybe statisticians should hire a public relations firm) that the most powerful experimental design is simply called a controlled experiment. There are other theories in science that have much better names, such as the big bang theory, which attempts to explain the origins of the universe. Nonetheless, for the present, we are stuck with the name
120 STATISTICS: A GENTLE INTRODUCTION
hypothesis will be the opposite of what the scientist believes or hopes to be true. The prior research hunch or belief about what is true is called the alternative hypothesis (abbreviated Ha). As noted earlier in the book, science must work slowly and conservatively; the repercussions of poorly performed science can be costly or even deadly. Thus, the null hypothesis is usually a safe, conservative position, which says that there is no relationship between the variables or, in the case of the drug experiment, that the drug does not affect the experimental group differently from the control group on the dependent variable.
All experiments begin with the statement of the null and alternative hypotheses (at least in the experimenter's mind, but not usually in the published article). However, the null hypothesis is like a default position: We will retain the null hypothesis (or we will fail to reject the null hypothesis) unless our statistical test tells us to do otherwise. If there is no statistical difference between the two means, then the null hypothesis is retained. If the statistical test determines that there is a difference between the means (beyond just chance differences), then the null hypothesis will be rejected.

In summary, when a statistical test is employed, one of two possible decisions must be made: (a) retain the null hypothesis, which means that there are no differences between the two means other than chance differences, or (b) reject the null hypothesis, which means that the means are different from each other well beyond what would be expected by chance.
A statistical test of the classic two-group experiment will analyze the difference between the two means to determine whether the observed difference could have occurred by chance alone. The z distribution, or a similar distribution, will be used to make the decision to retain or reject the null hypothesis. To appreciate how this occurs, imagine a large vat of 10,000 ping-pong balls (see Figure 5.1). Let us suppose that each ping-pong ball has a z score written on it, and that each z score occurs with the same frequency as in the z distribution. Remember that the z distribution reveals that 68.26% of the 10,000 balls will fall within ±1 standard deviation of the mean z score of 0.00. This means that 6,826 of the 10,000 ping-pong balls will have numbers ranging from −1.00 to +1.00. Also, 95.44% of all the balls will fall within ±2 standard deviations of the mean; therefore, 9,544 of the ping-pong balls will range between −2.00 and +2.00. Finally, we know that 9,974 ping-pong balls will be numbered from −3.00 to +3.00.
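The proportions above come straight from the standard normal curve, so they can be checked with a few lines of code. This is a minimal sketch using only Python's standard library; the helper name `proportion_within` is my own, not from the text. (Small rounding differences from the book's figures, such as 9,973 vs. 9,974, reflect different rounding conventions in printed z tables.)

```python
# Proportion of z scores falling within ±k standard deviations of the mean,
# computed from the error function: P(|Z| <= k) = erf(k / sqrt(2)).
import math

def proportion_within(k):
    """Fraction of a standard normal distribution lying in [-k, +k]."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    p = proportion_within(k)
    # Scale to the vat of 10,000 ping-pong balls from Figure 5.1.
    print(f"within ±{k} SD: {p:.4f}  (about {round(10_000 * p)} of 10,000 balls)")
```

Running this reproduces the familiar 68%/95%/99.7% figures without consulting a printed z table.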
Now, let us play a game of chance. If I am blindfolded and dig into the vat of balls and pull out one ball at random, what is the probability that it will have a number between −1.00 and +1.00? If I bet you $20 that the number will be greater than +1.00 or less than −1.00, would you take my bet? You should, because the probability that the ball has a number between −1.00 and +1.00 is 68.26%. Therefore, you would have roughly a 68% chance of winning, and I would have only a 32% chance of winning.

How about if we up the stakes? I will bet you $100 that the z score on a randomly chosen ball is greater than +2.00 or less than −2.00. Would you take this bet? You should (and quickly), because now there is a 95.44% chance that you would win and less than a 5% chance that I would win.

What would happen if we finally decided to play the game officially, and I bet that a randomly chosen ball is greater than +3.00 or less than −3.00? You put your money next to my money. A fair and neutral party is chosen to select a ball and is blindfolded. What would be your conclusion if the resulting ball had a +3.80 on it? There are two possibilities: Either we both have witnessed an extremely unlikely event (only 1 ball out of 10,000 has a +3.80 on it), or something is happening beyond what would be expected by chance alone (namely, the game is rigged and I am cheating in some unseen way).

Now, let us use this knowledge to understand the big decision (retain or reject the null hypothesis). The decision to retain or reject the null hypothesis will be tied to the z distribution. Each of the individual subjects' scores in the two-group experiment will be cast into a large and complicated formula, and a single z-like number will result. In part, the size of this single z-like number will be based on the difference between the two groups' means.
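The ball-drawing game can also be simulated directly. The sketch below, an illustration of my own rather than anything from the text, draws a large number of random z scores and estimates how often a draw lands beyond ±2.00; the sample size and seed are arbitrary choices.

```python
# Monte Carlo version of the ping-pong-ball game: draw z scores at random
# and estimate the chance that a single draw lands beyond +2.00 or -2.00.
import random

random.seed(42)            # fixed seed so the run is reproducible
N = 100_000                # number of simulated draws from the vat

draws = [random.gauss(0.0, 1.0) for _ in range(N)]
beyond_2 = sum(abs(z) > 2.0 for z in draws) / N

print(f"estimated P(|z| > 2) = {beyond_2:.4f}")  # theory says about 0.0456
```

The estimate hovers near 4.6%, which is why taking the $100 bet is such a good idea.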
Chapter 5: Inferential Statistics 123
Figure 5.1 A Vat of 10,000 Ping-Pong Balls, Each With a Single Value of z, Occurring With the Same Frequency as in the z Distribution
Another way of thinking about the two means is whether they were both drawn from the same population distribution (in other words, the treatment did not work to make one sample different from another) or whether the two means came from different populations (because the treatment did work on one group and made its mean much larger or much smaller than the other group's mean). The alternative hypothesis is often what we hope is true in our experiment. The alternative hypothesis is most often stated as
Ha: μ1 ≠ μ2
Note that the alternative hypothesis is stated as "Mean 1 does not equal Mean 2." This is its most common form, and it is called a nondirectional alternative hypothesis. Logically, the "does not equal" sign allows for two possibilities: One is that Mean 1 is greater than Mean 2, and the other is that Mean 1 is less than Mean 2. Because the controlled experiment involves making inferences about populations, the analysis of the experiment involves inferential statistics. Thus, the mean is an essential parameter in both descriptive and inferential statistics.
An experimenter has a choice between two types of alternative hypotheses when hypothesis testing: a nondirectional or a directional alternative hypothesis. A directional alternative hypothesis, in the two-group experiment, states the explicit expected direction of the difference between the two means. For example, one alternative hypothesis could be
Ha: μ1 > μ2
Here, the experimenter predicts that the mean for Group 1 will be higher than the mean for Group 2. Another possibility is that the experimenter predicts
Ha: μ1 < μ2
Here, the experimenter predicts that Mean 1 will be less than Mean 2. In practice, however, most statisticians choose a nondirectional alternative hypothesis. One of the reasons for this is that the nondirectional alternative hypothesis is less influenced by chance. Directional alternative hypotheses, however, are not all bad: They are more sensitive to small but real differences between the two groups' means. Most statisticians agree that the directional alternative hypothesis should be reserved for situations where the
experimenter is relatively certain of the outcome. It is legitimate to wonder, however, why the experimenter was conducting the experiment in the first place, if he or she was so certain of the outcome.
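The practical difference between the two kinds of alternative hypotheses shows up in the p value. The sketch below, my own illustration with an arbitrary observed statistic of 1.80, compares the one-tailed p (directional) with the two-tailed p (nondirectional) for the same z-like result, using only the standard library.

```python
# One-tailed (directional) vs. two-tailed (nondirectional) p values
# for the same observed z-like test statistic.
import math

def normal_cdf(z):
    """Cumulative probability P(Z <= z) for a standard normal Z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2)))

z_obs = 1.80  # hypothetical result in the direction favoring Group 1

p_one_tailed = 1.0 - normal_cdf(z_obs)               # Ha: mu1 > mu2
p_two_tailed = 2.0 * (1.0 - normal_cdf(abs(z_obs)))  # Ha: mu1 != mu2

print(f"one-tailed p = {p_one_tailed:.4f}")  # about .036: significant at .05
print(f"two-tailed p = {p_two_tailed:.4f}")  # about .072: not significant
```

The same data are significant under the directional hypothesis but not under the nondirectional one, which is exactly why the directional form is more sensitive to real differences and, by the same token, to chance ones.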
Remember that the classic two-group experiment begins with the statement of the null and the alternative hypotheses. Some statisticians are concerned about the wording of the decision that is to be made. Some say, "The null hypothesis was retained." Others insist that it should be worded, "The null hypothesis was not rejected." Although it may seem to be a trivial point, it has important implications for the entire meaning of the experiment.

After an experiment has been performed and statistically analyzed, and the null hypothesis was retained (or we failed to reject it), what is the overall conclusion? Does it really mean that your chosen independent variable has no effect whatsoever on your chosen dependent variable? Under any circumstances? With any kind of subjects? No! The conclusion is really limited to this particular sample of subjects. Perhaps the null hypothesis was retained because your sample of subjects (although it was randomly chosen) acted differently from another or larger sample of subjects.

There are other possibilities for why the null hypothesis might have been retained besides the sample of subjects. Suppose that your chosen independent variable does affect your subjects but you chose the wrong dependent variable. One famous example of this type of error was in studies of the effectiveness of Vitamin C against the common cold. Initial studies chose the dependent variable to be the number of new colds per time period (e.g., per year). In this case, the null hypothesis was retained. Does this mean that Vitamin C has no effect on the common cold? No! When the dependent variable was the number of days sick within a given time period, the null hypothesis was rejected, and it was preliminarily concluded that Vitamin C appears to reduce the number of days that people are sick with a cold. It is important to remember that just because you do not find an effect does not mean it does not exist.
You might be looking in the wrong place (using the wrong subjects, using the wrong experimental design) and/or you might be using the wrong dependent variable to measure the effect. Thus, some statisticians recommend that it be stated, "The null hypothesis was not rejected." This variation of the statement has the connotation that there still may be a significant effect somewhere, but it just was not found this time. More important, it has the connotation that, although the null hypothesis was retained, it is not necessarily being endorsed as true. Again, this reflects the conservative nature of most statisticians.
relationship between the two variables. A statistical test (such as correlation) is performed on the data from a sample, and it is concluded that any relationship that is observed is due to chance. In this case, we retain H0 and infer that there is no relationship between these two variables in the population from which the sample was drawn. In reality, we do not know whether H0 is true. However, if it is true for the population and we retain H0 for the sample, then we have made a correct decision.
The Type I error is considered to be the more dangerous of the two types of errors in hypothesis testing. When researchers commit a Type I error, they are claiming that their research hypothesis is true when it really is not. This is considered a serious error because it misleads people. Imagine, for example, a new drug for the cure of AIDS. A researcher who commits a Type I error is claiming that the new drug works when it really does not. People with AIDS are given false hopes, and resources that should be spent on a drug that really works will be spent on this bogus drug. The probability of committing a Type I error should be less than 5 chances out of 100, or p < .05. The probability of committing a Type I error is also called alpha (α).
In this case, we have concluded that there is a real relationship between the two variables and that it is probably not due to chance (or that there is a very small probability that our results may be attributed to chance). Therefore, we reject H0 and assume that there is a relationship between these two variables in the population. If in the population there is a real relationship between the two variables, then by rejecting H0, we have made the correct decision.
A Type II error occurs when a researcher claims that a drug does not work when, in reality, it does work. This is not considered to be as serious an error as the Type I error. Researchers may never discover anything new or become famous if they frequently commit Type II errors, but at least they have not misled the public and other researchers. The probability of a Type II error is also called beta (β). A summary of these decisions appears in Table 5.1.
A test of significance is used to determine whether we retain or reject H0. The significance test will result in a final test statistic, a single number. If this number is small, then it is more likely that our results are due to chance, and we will retain H0. If this number is large, then we will reject H0 and conclude that there is a very small probability that our results are due to chance. The conventional minimum level of significance is p, or α, = .05. This final test statistic is compared to a distribution of numbers called critical values. The test statistic must exceed the critical value in order to reject H0.
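The decision rule just described is simple enough to write down directly. This is a minimal sketch of that rule for a z-like statistic; the function name `decide` and the example statistics are my own, and 1.96 is the familiar two-tailed critical value at α = .05.

```python
# The decision rule in miniature: compare a final z-like test statistic
# to the critical value for a two-tailed test at alpha = .05.
def decide(test_statistic, critical_value=1.96):
    """Return the hypothesis-testing decision for a z-like statistic."""
    if abs(test_statistic) > critical_value:
        return "reject H0"
    return "retain H0"

print(decide(0.85))  # retain H0: a value this small is likely just chance
print(decide(3.80))  # reject H0: far beyond what chance alone would produce
```

Note that the +3.80 ping-pong ball from the vat example lands firmly in "reject H0" territory.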
When significant findings have been reported in an experiment, it means that the null hypothesis has been rejected. The word nonsignificant is the opposite of significant. When the word nonsignificant appears, it means that the null hypothesis has been retained. Do not use the word insignificant to report nonsignificant statistical findings; insignificant is a value judgment, and it has no place in the statistical analysis section of a paper.

In the results section of a research paper, significant findings are reported if the data meet an alpha level of .05 or less. If the findings are significant, it is a statistical convention to report them as significant at the lowest alpha level possible. Thus, although H0 is rejected at the .05 level (or less), researchers will check to see if their results are significant at the .01 or .001 alpha levels. It appears more impressive if a researcher can conclude that the probability that his or her findings are due to chance is p < .01 or p < .001. It is important to note that this does not mean that results with alphas at .01 or .001 are any more important or meaningful than results reported at the .05 level.

Some statisticians also object to reporting results that are "highly significant." By this, they mean that their findings were significant not only at p < .05 but also at p < .001. These statisticians would argue that the null hypothesis is rejected at .05, and thus one's job is simply to report the lowest significance level possible (e.g., p < .01 or p < .001). They find it inappropriate, therefore, to use the word highly before the word significant.
Table 5.1  The Four Possible Outcomes in Hypothesis Testing

Our Decision    In Reality     The Result
Retain H0       H0 is true     Correct decision
Reject H0       H0 is true     Type I error (alpha = α)
Reject H0       H0 is false    Correct decision
Retain H0       H0 is false    Type II error (beta = β)
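The error rates in the table can be watched happening by simulating many two-group "experiments" and counting the wrong decisions. This is a sketch under assumptions of my own choosing: group size 30, known standard deviation of 1, a true effect of 0.5 standard deviations in the second batch, and the two-tailed critical value 1.96; none of these numbers come from the text.

```python
# Simulate many two-group experiments and count Type I and Type II errors.
import math
import random
import statistics

random.seed(7)
N_EXPERIMENTS, N_PER_GROUP, Z_CRIT = 2_000, 30, 1.96

def z_for_difference(mu1, mu2):
    """z-like statistic for the difference between two sample means
    (population SD assumed known and equal to 1 in both groups)."""
    g1 = [random.gauss(mu1, 1.0) for _ in range(N_PER_GROUP)]
    g2 = [random.gauss(mu2, 1.0) for _ in range(N_PER_GROUP)]
    se = math.sqrt(1.0 / N_PER_GROUP + 1.0 / N_PER_GROUP)
    return (statistics.mean(g1) - statistics.mean(g2)) / se

# H0 is true (no real difference): any rejection here is a Type I error.
type1 = sum(abs(z_for_difference(0.0, 0.0)) > Z_CRIT
            for _ in range(N_EXPERIMENTS)) / N_EXPERIMENTS

# H0 is false (a real 0.5 SD difference): any retention is a Type II error.
type2 = sum(abs(z_for_difference(0.5, 0.0)) <= Z_CRIT
            for _ in range(N_EXPERIMENTS)) / N_EXPERIMENTS

print(f"Type I error rate  = {type1:.3f}  (alpha was set at .05)")
print(f"Type II error rate = {type2:.3f}  (beta depends on the true effect)")
```

The Type I rate lands near the .05 we chose, while the Type II rate depends on how large the real effect is relative to the noise, which is exactly the trade-off the table summarizes.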
Thus, it also follows that the directional alternative hypothesis has the advantage that it is more sensitive to real differences in the data. In other words, if there is a real difference between two groups’ means, it is more likely to be detected with a directional alternative hypothesis. However, its major disadvantage is that it is also more sensitive to just chance differences between two groups’ means.
In 1989, two chemists claimed that they produced nuclear fusion in a laboratory under "cold" conditions; that is, they claimed to have produced a vast amount of energy by fusing atoms without having to provide large amounts of energy to do so. Their claims can still be analyzed in the hypothesis testing framework, although it is not absolutely known whether they did or did not produce fusion. However, most subsequent replications of their work were unsuccessful (see Park, 2000, for a fascinating discussion of the controversy). The null and alternative hypotheses in this situation are as follows:
H0: Fusion has not been produced.
Ha: Fusion has been produced.
Situation 1. If subsequent research supports their claims, then the two chemists made the correct decision to reject H0. Thus, they will probably receive the Nobel Prize, and their names will be immortalized.
Situation 2. If subsequent research shows that they did not really produce fusion, then they rejected H0 when H0 was true, and thus they committed the grievous Type I error. Why is this a serious error? They may have misled thousands of researchers, and millions of dollars may have been wasted. The money and resources might have been better spent pursuing other lines of research to demonstrate cold fusion (because physicists claim cold fusion is theoretically possible) rather than on these chemists' mistake.
What about the quiet researcher who actually did demonstrate a small but real amount of fusion in the laboratory but used a nondirectional alternative hypothesis? The researcher failed to reject H0 when Ha was true, and thus the researcher committed a Type II error. What was the researcher's name? We do not know. Fame will elude a researcher who continually commits Type II errors because of an inordinate fear of a Type I error! Remember, sometimes scientists must dare to be wrong.
The late astronomer Carl Sagan, in his 1996 book The Demon-Haunted World: Science as a Candle in the Dark, proposed a baloney detection kit. The purpose of the kit was to evaluate new ideas. The primary tool in the kit was simply skeptical thinking: that is, to understand an argument and to recognize when it may be fallacious or fraudulent. The baloney detection kit would be exceptionally useful in all aspects of our lives, especially in regard to our health, where sometimes the quest for profit may outweigh the dangers of a product, or where the product is an outright fraud. In the traditional natural sciences, the baloney detection kit can help draw boundaries between real science and pseudoscience. Michael Shermer, publisher of Skeptic magazine (www.skeptic.com), has modified Sagan's baloney detection kit. Let's use some of Sagan's and Shermer's suggestions to investigate three claims: (a) magician David Copperfield's recent announcement that he predicted Germany's national lottery numbers 7 months before the drawing; (b) the claim that mangosteen, a South Asian fruit, cures cancer, diabetes, and a plethora of other diseases and illnesses, and works as well as or better than more than 50 prescription drugs; and (c) therapeutic touch (TT), a therapy in which a medical patient is not actually touched but the patient's negative energy aura is manipulated by a trained TT therapist in order to relieve pain.
A corollary of this criterion would be, Does the claimant have a financial (or fame) interest in the outcome? Pseudoscientists may, on the surface, appear to be reliable, but when we examine their facts and figures, they are often distorted, taken out of context, or even fabricated. Often, the claims are merely based on a desire for money and/or fame. Copperfield is a professional magician. He specializes in illusions such as making large jet planes disappear. How reliable is his claim to have predicted lottery numbers in advance? Not very. Would his claim advance his fame (and fortune)? Of course!

The chief promoter of mangosteen is identified as a prominent medical doctor and medical researcher. In reality, the doctor is a Georgia family physician who has not published even a single clinical study in any medical journal. He has written a self-published book on mangosteen, touting near-miraculous cures for a variety of diseases among his patients. We noted earlier in Chapter 1 that books, particularly self-published books and those published by commercial presses, have no scientific standards to meet; therefore, they often fail to supply us with any acceptable scientific evidence whatsoever! Claiming something is true or saying something is true does not make it so. Mangosteen is being marketed for $37 a bottle. Distributorships are being sold. Mangosteen's proponents are clearly interested in financial gain. The latter is not a heinous crime, but it becomes one if its proponents know there are no clinical studies with humans that support their outlandish claims.
If TT therapists were serious about the scientific establishment of TT, they would employ acceptable scientific standards in their research. They would show that the results of TT are not due to the placebo effect. They would demonstrate scientifically that trained TT therapists can detect energy fields. To date, only one published TT study has attempted to determine whether TT therapists can detect energy auras better than chance. That study was published by a 9-year-old girl as a fourth-grade science fair project, and she found that experienced TT therapists could do no better than chance at detecting which of their hands she held her hand over (when the TT therapists could not see their hands). TT proponents tend to seek out other proponents. They cite research with positive outcomes. They ignore or deny claims to the contrary.
A corollary of this criterion would be, Does the finding seem too good to be true? Copperfield claims his lottery prediction was not a trick. He said it was more like an experiment or a mental exercise. If it was, how does it fit into any known or replicated scientific principle? It simply does not. We would have to create a new principle to explain his mind/matter experiment or use an old one that is without any scientific merit (such as clairvoyance). There is no accepted scientific principle that explains how one would predict lottery numbers in advance. That is simply too good to be true.

The health claims for mangosteen actually pass this criterion but not its corollary. The fruit does appear to contain known antioxidants called xanthones. Xanthones from mangosteen do appear to have some antibacterial and antiviral properties, in test tubes only! Where mangosteen fails to live up to its excessive hype is that not one human clinical study to date has demonstrated that the xanthones in mangosteen have helped or cured a disease.

TT proponents propose that humans have an energy field, which can be detected by other "trained" humans. They propose that imbalances in the patient's energy field cause disease and pain. TT therapists claim they can restore these imbalances by sweeping their hands about 3 inches over the patients' bodies in order to get rid of their excess negative energy. Does this fit with any scientifically supported natural laws? No. Does it seem too good to be true? Yes.

This is a common ploy in pseudoscience: Concoct exaggerated claims around a kernel of scientific truth. Some fruits (those containing Vitamin C) do appear to aid physical health. Some cancer drugs have been created from plants. But it is not scientifically ethical to claim that mangosteen prevents and cures cancer, as well as lowers cholesterol and prevents heart disease, without acceptable scientific proof, and theoretical proof (i.e., mangosteen
has xanthones, xanthones have antioxidant properties, and antioxidants are thought to aid physical health) is not sufficient. Its power to prevent and cure disease must be demonstrated in empirical studies with humans. The same is true of TT. There is some evidence that humans can interact with energy fields. For example, have you ever noticed that when straightening an antenna, you can sometimes get better reception while you are holding the antenna? However, it is a severe stretch (and pseudoscientific) to claim that humans generate energy fields, that imbalances in these fields cause pain, and that restoring balance by eliminating negative energy is a skill that can be learned.

Sagan noted that we tell children about Santa Claus, the Easter Bunny, and the Tooth Fairy, but we retract these myths before they become adults. However, the desire to believe in something wonderful and magical remains in many adults. Wouldn't it be wonderful if there were super-intelligent, super-nice beings in spaceships visiting the Earth who might give us the secrets to curing cancer and Alzheimer's disease? Wouldn't it be great if we only had to drink 3 ounces of mangosteen twice a day to ward off nearly all diseases and illnesses? Wouldn't it be great if playing a classical CD to a baby boosted his or her IQ? Wouldn't it be amazing if a person could really relieve pain without touching someone else?

But let us return to the essential tool in the baloney detection kit: skeptical thinking. If something seems too good to be true, we should probably be even more skeptical than usual. Perhaps we should demand even higher scientific standards of evidence than usual, especially if the claims appear to fall outside known natural laws. It has been said that extraordinary claims should require extraordinary evidence. An extraordinary claim, however, might not always have to provide extraordinary evidence if the evidence for the claim was ordinary but plentiful.
A preponderance of ordinary evidence will suffice to support the scientific credibility of a theory. Thus, the theory of evolution has no single extraordinary piece of evidence; however, a plethora of studies and observations help to support it overall. I tell my students not to be disappointed when wonderful and magical claims are debunked. There are plenty of real wonders and magic in science yet to be discovered. We do not have to make them up. Francis Crick, Nobel Prize winner for unraveling DNA, reportedly told his mother when he was young that by the time he was older, everything would have been discovered. She is said to have replied, "There'll be plenty left, Ducky."
Remember, good scientists are highly skeptical. They would always fear a Type I error, that is, telling people something is true when it is not. Pseudoscientists are not typically skeptical. They believe in what they propose without any doubts that they are wrong. Pseudoscientists typically seek only
beliefs but on the lack of even a "shred" of scientific evidence that sexual orientation is biologically determined. Because there is clear and increasing empirical evidence that sexual identity and sexual orientation are highly heritable and biologically based (e.g., Bailey, Pillard, Neale, & Agyei, 1993; Bailey et al., 1999; Bailey, Dunne, & Martin, 2000; Coolidge, Thede, & Young, 2002), it might be concluded that this religious leader is woefully ignorant of such studies, that he is unconsciously unaware that his religious beliefs are driving his conclusions, or that he is lying about his religious beliefs not biasing his conclusions.
As noted earlier, skeptical thinking helps to draw a boundary between science and pseudoscience. As Shermer noted, it is the nature of science to be skeptical yet open-minded and flexible. Thus, sometimes science seems maddeningly slow and even contradictory. Good scientists may even offer potential flaws or findings that would disconfirm their own hypotheses! Good science involves guessing (hypothesizing), testing (experimentation), and retesting (replication). The last may be the most critical link: Can the results of a particular experiment be duplicated (replicated) by other researchers in other locations? There may not always be a very clear boundary between science and pseudoscience, but the application of the skeptical thinking offered by the principles in the baloney detection kit may help to light the way.
In my opinion, two elements are most critical for detecting baloney in any experiment or claim. The first and most important in the social and medical sciences is, Has the placebo effect been adequately controlled for? For example, in a highly controversial psychotherapeutic technique, eye movement desensitization and reprocessing (EMDR), intensively trained therapists teach their patients to move their eyes back and forth while discussing their traumatic experience (see Herbert et al., 2000, for a critical review). Despite calls for studies to control for the placebo effect—in this case, the therapist's very strong belief that the treatment works—there are few, if any, EMDR studies in which the placebo effect has been adequately controlled. In addition, there are obviously demand characteristics associated with the delivery of EMDR. Demand characteristics are the subtle hints and cues in human interactions (experiments, psychotherapy, etc.) that prompt participants to
act in ways consistent with the beliefs of the experimenter or therapist. Demand characteristics usually operate below one's level of awareness. Psychologist Martin Orne has repeatedly demonstrated that demand characteristics can be very powerful. For example, if a devoted EMDR therapist worked for an hour on your traumatic experience and then asked you how much it helped (with a very kind and expectant facial expression), would you not be at least slightly inclined to say "yes" or "a little bit," even if in reality it did not help at all, because you do not wish to disappoint the EMDR therapist? Controlling for the gleam, glow, and religiosity of some devotees of new techniques and the demand characteristics of their methods can be experimentally difficult. However, as Sagan and Shermer have noted, these questionable studies often do not seek evidence to disconfirm their claims; only supporting evidence is sought. As I have already stated, this is particularly true where strong placebo effects are suspected.

The second most important element of baloney detection for your author is Sagan's and Shermer's fourth principle: How does the claim fit with known natural scientific laws? In the case of EMDR, its rationale relies on physiological and neurological processes such as information processing and eye movements somehow related to rapid eye movement (REM) sleep. Certainly, there is good support for cognitive behavioral models of therapy, and there is a wealth of evidence for REM sleep. However, the direct connection between information-processing models, cognitive behavioral techniques, and eye movements in the relief of psychological distress arising from traumatic experiences has not been demonstrated. In each case where I hear of a new technique that seems too good to be true, I find that the scientific or natural explanation for why the technique works is unstated, vague, or questionable.
In the case of new psychotherapeutic techniques, the absence of a clear scientific explanation, with a clearly specified mechanism for how the therapy works, always makes me wonder how strong a role the placebo effect plays in the therapy's outcome. In their defense, I will state that it is sometimes difficult to explain how some traditionally accepted therapies, such as psychoanalysis, work. However, that defense is no excuse for not searching for reasonable and scientific explanations for how a new therapy works. It is also absolutely imperative, in cases where scientific explanations for the therapeutic mechanism are difficult to demonstrate, that placebo effects are completely and adequately controlled for and that disconfirming evidence has been sincerely and actively sought.
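A small simulation shows why an uncontrolled placebo effect can manufacture an apparently successful therapy. Everything here is an illustrative assumption (the group sizes, the size of the placebo lift, and the function names are invented for the sketch; these are not data from any EMDR study): the therapy itself is given zero true effect, yet it handily "beats" an untreated waitlist group, while a comparison against an equally convincing sham treatment exposes the improvement as pure placebo.

```python
import random

random.seed(1)

N = 200             # participants per group (assumed)
PLACEBO_LIFT = 1.0  # improvement from expectation and attention alone (assumed)
TRUE_EFFECT = 0.0   # the therapy's own effect: zero in this simulation

def outcome(got_ritual, got_real_therapy=False):
    """Simulated symptom improvement for one participant."""
    improvement = random.gauss(0, 1)   # ordinary fluctuation in symptoms
    if got_ritual:
        improvement += PLACEBO_LIFT    # any impressive-looking treatment helps
    if got_real_therapy:
        improvement += TRUE_EFFECT     # the therapy adds nothing on top
    return improvement

def z_stat(a, b):
    """Two-sample z statistic for the difference in group means."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    va = sum((x - ma) ** 2 for x in a) / (len(a) - 1)
    vb = sum((x - mb) ** 2 for x in b) / (len(b) - 1)
    return (ma - mb) / (va / len(a) + vb / len(b)) ** 0.5

therapy  = [outcome(True, True) for _ in range(N)]   # receives the new therapy
waitlist = [outcome(False) for _ in range(N)]        # no treatment at all
sham     = [outcome(True) for _ in range(N)]         # placebo: an equally convincing ritual

# Against a no-treatment group, the worthless therapy looks "significant"
print(f"therapy vs waitlist z = {z_stat(therapy, waitlist):.2f}")
# Against a proper placebo control, the apparent effect vanishes
print(f"therapy vs sham     z = {z_stat(therapy, sham):.2f}")
```

The first comparison produces a z statistic far beyond the 1.96 critical value, while the second hovers near zero: without a sham-treatment control group, a researcher would wrongly credit the therapy for what expectation alone produced, which is precisely the Type I error the placebo control is designed to prevent.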
Of course not! In fact, I must warn you that sometimes statistics may even muddle a problem. It has been pointed out that it is not statistics that lie; it is people who lie. Often, when I drive across the country, I listen to “talk radio.” I’ve heard many extremely sophisticated statistical arguments from both sides of the gun control issue. I am always impressed at the way each
138 STATISTICS: A GENTLE INTRODUCTION