Sea Change in Social Science Research?
- Downfall of NHST
- Numbers Must Mean Something
- Replication = Exact Replication
- Experimentation and Causation
- Qualitative vs. Quantitative Illusion
- Why Observation Oriented Modeling is Needed
Writing in 1936, Johnson stated, "Those data should be measured which can be measured; those which cannot be measured should be treated otherwise. Much remains to be discovered in scientific methodology about valid treatment and adequate and economic description of non-measurable facts" (p. 351). We would quibble with "Those data should be measured..." and change it to "Those attributes should be measured...", but the point stands: psychologists in the early 1900s had not demonstrated the continuous quantitative structure of the attributes they were studying. Without continuous quantitative structure, the application of t-tests, ANOVA, least squares regression, factor analysis, and the like to psychological data is dubious at best. Newer methods such as SEM and multilevel modeling are no more valid, because psychologists have not to date demonstrated that they are measuring the attributes they hold dear (e.g., intelligence, depression, anxiety, the Big Five personality traits). In this sense psychology has failed to progress as a science because it has failed to understand the difference between quantity and quality as distinguishable modes of being. None of these facts are secrets, and a good place for the reader to begin is Joel Michell's Measurement in Psychology: A Critical History of a Methodological Concept (1999, Cambridge University Press).
Summarizing the measurement dilemma in psychology, Paul Barrett (2003) argues that psychology can only move forward if it (1) demonstrates the continuous quantitative structure of its cherished attributes, (2) develops non-quantitative techniques, or (3) is more brutally honest about the serious limitations of current methods, treating them as, at best, crude approximations of attributes. In Observation Oriented Modeling we have chosen the second route, developing a philosophy, a body of techniques, and software that do not depend on assumed continuous quantities. OOM also breaks completely with the abstract and convoluted Null Hypothesis Significance Testing (NHST) paradigm that has stymied psychological research for some 70 years. NHST is, moreover, a product of positivistic thinking of the late 1800s and early 1900s, and it is time to let it go. Countless authors have discussed these and other serious issues; a few classic and recent examples follow:
- Paul Meehl's classic paper, Theoretical Risks and Tabular Asterisks: Sir Karl, Sir Ronald, and the Slow Progress of Soft Psychology, in which he calls reliance on NHST one of the worst things ever to happen in the history of psychology (p. 817).
- David Lykken's classic book chapter, What's Wrong with Psychology Anyway?
- Gerd Gigerenzer's Mindless Statistics paper, reminding us that most psychologists (that includes you and me) do not even know what the p-value actually means... yet it is the gatekeeper to publishable results.
- Outsiders can often see psychology more clearly than those of us on the inside. Two examples are Neil Postman's classic paper Social Science as Theology and Richard Feynman's classic Caltech commencement address Cargo Cult Science, both offering pointed critiques of the social sciences. St. Thomas Aquinas considered theology the queen of the sciences because through it the mind contemplates the source of all being as well as being itself. The truly disturbing conclusion of Postman's paper is that social science is neither good science nor good theology. What is it, then? To the extent that it relies on variable-based modeling and NHST, we would characterize it as a hodgepodge of statistical anecdotes.
- David A. Freedman's classic paper Statistical Models and Shoe Leather. William Mason, writing in the same 1991 issue of Sociological Methodology (vol. 21), summarized several of Freedman's arguments as follows:
- Simple [statistical/data analytic] tools should be used extensively. More complex tools should be used rarely, if at all. Thus, we should be doing more graphical analyses and computing fewer regressions, correlations, survival models, structural equation models, and so on.
- Virtually all social science modeling efforts (and here I include social science experiments, though I'm not sure Freedman would) fail to satisfy reasonable criteria for justification of the stochastic assumptions. (p. 338)
Much to the dismay of social scientists, Freedman adopted (during the latter part of his career) a critical view toward the use of linear modeling techniques such as regression, path analysis, and structural equation modeling in sociology, psychology, and education, among other disciplines. We agree with Freedman's arguments and view Observation Oriented Modeling as a relatively simple analytic tool that entails fewer assumptions than the traditional linear modeling techniques. We also hold that Freedman should have taken his critique further to include the measurement issues noted above.
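Mason's first point can be made concrete with a small fabricated illustration of our own (not Freedman's or Mason's; all numbers below are invented): two synthetic datasets yield nearly the same correlation coefficient, yet plotting the raw observations would reveal completely different structures, a genuine linear trend in one case and two disconnected clusters in the other.

```python
import numpy as np

rng = np.random.default_rng(0)

# Dataset A: a genuine, roughly linear relationship.
x_a = np.linspace(0, 10, 50)
y_a = 2.0 * x_a + rng.normal(0, 2.0, size=50)

# Dataset B: no relationship within either cluster; the high "correlation"
# is produced entirely by two separated clumps of observations.
x_b = np.concatenate([rng.normal(2, 0.5, 25), rng.normal(8, 0.5, 25)])
y_b = np.concatenate([rng.normal(2, 0.5, 25), rng.normal(8, 0.5, 25)])

r_a = np.corrcoef(x_a, y_a)[0, 1]
r_b = np.corrcoef(x_b, y_b)[0, 1]
print(f"r (linear trend) = {r_a:.2f}")
print(f"r (two clusters) = {r_b:.2f}")
```

Both coefficients come out high, yet only the first dataset contains the relationship a correlation is usually taken to imply. A simple scatterplot of each dataset, the "more graphical analyses" Mason calls for, exposes at a glance what the single summary number conceals.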
- It is an old lesson, long lost: between-person aggregate statistics can often obfuscate rather than enlighten our understanding of psychological phenomena. How many times, for instance, have psychologists posited ideas like "the two processes are dependent; that is, they interact" or "the personality traits are universal" and then supported such hypotheses with between-subjects factorial ANOVAs or between-person factor analyses? The logical disconnect between aggregates (e.g., means and variances) and individuals or individual processes has long been forgotten. In stating that two processes are dependent, for instance, the psychologist is discussing causal forces that inhere in individuals, not in statistical aggregates. The observations required to test the interaction of the two processes must therefore come from the same individuals. Stating that personality traits are universal likewise requires person-centered analyses, not between-subject analyses. Some of our work has shown that the Big Five, as represented by composite scores (or factors), are in fact not universal. We recommend Kurt Danziger's book Constructing the Subject for a thorough review of the history behind the "triumph of the aggregate" and for an informative bibliography.
- Modern psychology does not value exact replication. We should not be surprised, then, by this recent fascinating article in the New Yorker (and its postscript), which discusses how social science effects often fail to replicate in independent samples, or at best shrink in size. From our view, psychological "effects" are built on a foundation of sand (i.e., NHST and positivistic methodologies), and the so-called "decline effect" is a product of the human nature that comes into play when picking over statistical results; it is not Nature attempting to reveal a secret to us. A related story involves a recently published and highly publicized study of psi phenomena in the Journal of Personality and Social Psychology. When a subsequent failed replication was submitted to JPSP, it was not sent out for peer review but was immediately rejected by the editor. This sad history is recounted in The Psychologist, where the authors also point out that paranormal journals are, ironically, more likely to publish attempted replications than ostensibly top-tier journals like JPSP. The lesson from all of this is simple: exact replication is an indisputable hallmark of science, and some of our "top" journals are simply not interested. Again, this is partly why we refer to most of the published work in psychology, including much of our own past work, as a hodgepodge of statistical anecdotes.
- Joel Michell's most recent plea (Qualitative Research Meets the Ghost of Pythagoras) for psychologists to confront the ghosts in their history: redefining scientific measurement and then evading the difficult scientific questions that follow. In 1998 Sternberg and Williams lamented the ideational stagnation of the psychological testing industry; from our perspective this stagnation is the direct result of failing to address the consequences of accepting S. S. Stevens' subjective definition of measurement.
- It is worth posting the APA's 1999 attempt to reform the reporting practices of psychologists regarding their statistical analyses. As Gigerenzer notes (above), these efforts have largely failed. Pick up any journal and you are unlikely to find effect sizes, confidence intervals, or even standard errors on bar graphs (all recommended by the APA). If you do find effect sizes, you are unlikely to find any serious discussion of the values beyond conventional interpretations (e.g., "d = .80 is a large effect according to Cohen's conventions"). It is also disappointing that the task force failed to mention the importance of exact replication in its report. In our view the task force was doomed to fail because the problems with psychological research lie deeper than failing to report p-values to five or six decimal places or failing to report effect sizes for aggregate statistical analyses.
- A consortium of NIMH editors weighed in with a 2000 editorial statement in Applied Developmental Science. Notably, they stated: "We believe that traditional, variable-oriented, sample-based research strategies and data analytic techniques alone cannot reveal the complex causal processes that likely give rise to normal and abnormal behavior among different children and adolescents. To a large extent, the predominant methods of our social and psychological sciences have valued quantitative approaches over all others, to the exclusion of methods which might clarify the ecological context of behavioral and social phenomena."
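The "decline effect" mentioned above need not be a mystery of Nature; a selection filter on significant results is enough to produce it. The following is a minimal simulation of our own construction (not taken from any of the works cited above): many small two-group studies of the same modest true effect are run, only the statistically significant ones are "published," and the published effect sizes are consequently inflated, so that honest replications will appear to "shrink."

```python
import numpy as np

rng = np.random.default_rng(1)
true_d, n, studies = 0.2, 20, 10_000   # modest true effect, small samples

# Simulate many small two-group studies of the same true effect.
g1 = rng.normal(true_d, 1.0, size=(studies, n))
g0 = rng.normal(0.0, 1.0, size=(studies, n))
diff = g1.mean(axis=1) - g0.mean(axis=1)
sp = np.sqrt((g1.var(axis=1, ddof=1) + g0.var(axis=1, ddof=1)) / 2)
d_obs = diff / sp                      # observed Cohen's d per study
t = d_obs * np.sqrt(n / 2)             # equal-n two-sample t statistic

# "Publication filter": only significant results (|t| > ~2.02, df = 38) see print.
published = d_obs[np.abs(t) > 2.02]

print(f"true effect           : {true_d:.2f}")
print(f"mean published effect : {published.mean():.2f}")
print(f"share published       : {published.size / studies:.2%}")
```

The average published effect is several times the true effect, purely as an artifact of selecting on significance. An exact replication of a "published" study will, on average, estimate the true effect and therefore look like a decline.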
It is our hope that by reading these chapters and articles you will gain an appreciation for the deep-seated need for a new approach like Observation Oriented Modeling with its six principles:
- The primacy of observations
- Aggregation often leads to obfuscation
- Outliers are people too
- View the world through the lens of an integrated model
- The primacy of accuracy and repeatability
- Estimate population parameters in their proper, limited role in psychology
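The second principle, that aggregation often leads to obfuscation, can be shown with a small fabricated dataset of our own construction (for illustration only): pooling observations across persons yields a strongly positive correlation even though the relation within every single individual is negative.

```python
import numpy as np

rng = np.random.default_rng(2)
n_people, n_obs = 30, 40

xs, ys = [], []
for _ in range(n_people):
    base = rng.uniform(0, 10)                 # person-level baseline
    x = base + rng.normal(0, 0.5, n_obs)
    # Within each person, y DECREASES as x rises above that person's baseline.
    y = 2 * base - (x - base) + rng.normal(0, 0.5, n_obs)
    xs.append(x)
    ys.append(y)

xs, ys = np.array(xs), np.array(ys)

# Between-person analysis on the pooled observations: strongly positive.
pooled_r = np.corrcoef(xs.ravel(), ys.ravel())[0, 1]

# Person-centered analysis: the relation within each individual is negative.
within_r = [np.corrcoef(x, y)[0, 1] for x, y in zip(xs, ys)]

print(f"pooled (between-person) r : {pooled_r:.2f}")
print(f"mean within-person r      : {np.mean(within_r):.2f}")
```

A person-centered analysis recovers the sign of the process operating in each individual; the between-person aggregate reverses it. This is exactly the logical disconnect between aggregates and individuals discussed above.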
Observation Oriented Modeling invokes Aristotle's notion of final cause, which Thomas Aquinas considered the most important species of cause. Are psychological theories that invoke final cause possible? We believe they are, and the best example is Joseph Rychlak's Logical Learning Theory, which he tested over several decades of research. We also recommend Rychlak's books The Psychology of Rigorous Humanism, Introduction to Personality and Psychotherapy, and In Defense of Human Consciousness. In Rigorous Humanism, Professor Rychlak provides a tabled history of Aristotle's four causes (material, formal, efficient, and final) in philosophy and in science. Our work with Vladimir Lefebvre's algebraic model of cognition also leads us to think that it may be workable in final cause models; it certainly provides a formal cause model of the cognition involved in binary decision tasks. Lastly, Bill Powers' Perceptual Control Theory, which posits that "behavior is goal directed and purposeful, not mechanical and responsive," invokes the notion of final cause and offers a promising framework for developing the types of integrated models advocated in the OOM book. Because integrated models are causal models, efficient cause, which is allied with the randomized controlled experiment, as well as formal and material causes, are all treated in OOM. Aristotle made it clear that all four causes are needed to understand nature.