By Sherwood Waldron Jr.



Psychoanalytic efficacy has been demonstrated in general, but not in comparison with other therapies, nor with detailed study of the relationship between process and outcome. The steps necessary to accomplish such studies are outlined, along with a review of our present readiness. Crucial dimensions of such work are explored, including the use of single case studies, and ways of looking at sequences of interaction between analyst and patient as they change during the various phases of treatment. Methods of using control and comparison groups, and follow-up studies are described, and various promising specific strategies are proposed.

What attitude do most Americans have toward psychoanalysis? Many analysts would, I believe, agree with the following characterization: that psychoanalysis is an alien procedure to most Americans, and they would rather simply talk to someone about their personal problems than seek the aid of a psychoanalyst. Furthermore, even educated people are unfamiliar with the idea that psychoanalysis may be more effective than psychotherapy for ordinary problems. Of those who have taken introductory psychology courses, many have heard that mature college professors without psychotherapeutic training are as effective in helping troubled students as trained psychotherapists – an unwarranted conclusion at best (Strupp & Hadley 1979)(1). Few, certainly, have been told that persons with psychoanalytic training have more success in helping people than those with other credentials or no credentials at all, in part because systematic studies have not been conducted which would demonstrate the difference in results, if any.

Today, a century after Freud’s first case reports, the outcomes of different psychoanalytic treatments(2) have scarcely been compared with one another in a methodical, scientifically valid manner (Bachrach et al 1991). Many thoughtful professionals regard this as negligent, and it certainly jeopardizes support for psychoanalysis as a therapeutic procedure. In view of the difficulties besetting such studies, it is understandable that psychoanalytic organizations have not given them a high priority. Now, however, many of these difficulties can be surmounted by methodologies currently under development and research strategies applicable in the immediate future. The purpose of this paper is to discuss these possibilities(3).

It is by no means implied here that a systematic study of psychoanalytic cases will of itself necessarily lead to clear-cut, uncontroversial conclusions. Many issues in the development of psychoanalytic theory and in the interpretation of results will influence the fruits of the proposed empirical studies. Scientific advancement in the social sciences is far more complex, far more of a social phenomenon, than in the natural sciences (Mishler, 1990), and psychoanalysis is no exception. Hence the impact on our field of findings from empirical studies cannot be predicted. There is, however, little basis for pessimism about the value of conducting such studies, fraught though they inevitably are with difficulties in interpreting the significance of individual findings (Edelson, 1984).

What aspects of the patient, the therapist, and the treatment would constitute important variables in empirical studies? Unfortunately, these have not been successfully defined even for short-term treatments, and long-term treatments are much harder to study for practical reasons. One would hope that theories of psychoanalytic and psychotherapeutic technique would serve as a basis for specifying the significant aspects of the treatment procedure, but as yet there has been little systematic study of data in relation to such theories (Fine & Fine, 1990). Hence there are few if any agreed-upon criteria, except of the broadest kind, for distinguishing one treatment from another, and little empirical data to verify such distinctions. An important example is the role of interpretation as opposed to the role of the relationship, including so-called corrective emotional experiences and corrective object relationships. The relative importance of these two aspects has never been systematically assessed.

It has been suggested that we can reduce the complex variables in such studies by focusing exclusively on short-term treatments, in the hope that findings can be generalized to long-term treatments, and by ensuring that a specified treatment has been given strictly according to manuals that instruct the practitioner in correct procedure – what is described as “manualized psychotherapy.” These suggestions, which leave the experienced professional doubting whether crucial human interactions could possibly be captured by such abbreviations and oversimplifications, are not recommended here.

Clinicians and others who have expressed a wish for a study demonstrating “the results of psychoanalysis” are often unaware of the need to study the process as well. There are particular problems in studying the process because the clinician must engage in some fairly extensive and inherently uncongenial data collecting; nevertheless, the field is unlikely to advance without carefully examining what actually takes place in treatment. Psychoanalytic procedures vary a great deal in practice, as every clinician knows, and because of this variability it would be hard to interpret the results obtained from studying outcomes alone. Psychoanalyses need to be studied over their entire course, and the processes as actually observed need to be related to outcomes. Yet difficulties in the development of reliable measures of process have been the major impediments to research (Schlesinger 1974), and therefore the development of such measures is a major focus of these papers. For example, a core aspect of psychoanalytic or psychotherapeutic process is the quality of interventions. In a recent NIMH review about psychotherapy outcome research (Borkovec & Miranda, 1996, p 15), the authors offered their opinion that “despite initial attempts for some types of therapy, there is no valid way to measure quality for any therapy technique.” It is clear that studies which do not develop and use some valid way of evaluating the quality of psychoanalytic work are unlikely to contribute to advances in our understanding of the relationship between process and outcome (see later discussion of the Analytic Process Scales, Waldron et al 1995, for an example of a reliable approach to assessing quality). The relationships between processes and outcomes are also complex and in some ways a matter of changing definitions, which need to be carefully evaluated in carrying out studies (see the excellent discussion in Stiles, Shapiro & Harper, 1994).

This paper attempts to anticipate and suggest the methods that will be required to study the efficacy of psychoanalysis and related therapies, while recognizing the hazards of premature commitment to incompletely developed methods. The attempt appears worthwhile to me despite the hazards, because up to now, no extensive systematic efforts have been made to study material derived directly from psychoanalytic treatments with a view to evaluating the process and relating it to various indices of outcomes. It is time for the psychoanalytic profession to follow in the footsteps of Freud, who in The Interpretation of Dreams used new, untried and controversial methods of data collection. We must collect a representative body of cases and further develop the methodology to study them, even though many thoughtful individuals will undoubtedly object to each of the possible methods and the ultimate benefits of such studies cannot be predicted. Two steps need to be taken to accomplish process-outcome studies:

Step 1: The clinically relevant dimensions of psychoanalytic processes must be reliably assessed, by outside observers as well as by the treating analyst. A number of important developments have occurred in recent years in the methodology for assessing psychoanalytic and psychotherapeutic processes. Close study of the available instruments will show that although many of them have demonstrated promise, further work is necessary to determine fully their validity and reliability.

Step 2: The scores derived from these assessments must differentiate one treatment from another in a clinically meaningful way. In other words, evaluators will have to be able to distinguish the qualities of a treatment, such as those of the patient, the therapist, or the patient-therapist interaction, that have important predictive properties. Studies therefore must be designed that permit valid estimates of the relationships between specific dimensions of psychoanalysis and the outcome of treatment. Here we are on much less firm ground than in the first step: there has been little systematic study of the relationships between observed processes and ultimate outcomes (Wallerstein, 1986). This is largely because, in the past, the goals of the first step had not been achieved.

In order to orient the reader to the issues involved in designing efficacy studies, I will begin by addressing the broadest aspects, mentioned in step 2, even though they depend upon developing the instruments described in step 1. In subsequent articles I will present detailed considerations of data collection, ways to assess and characterize psychoanalytic and psychotherapeutic processes using the data, and specific studies that can then be undertaken.



Psychoanalysis is quintessentially a complex process. Research efforts inevitably entail simplification, but any effective study must retain sufficient complexity to permit advance. In a comprehensive discussion about oversimplifications in psychotherapy research, Elliott & Anderson (1994) describe a number of pitfalls which should be avoided in designing psychoanalytic studies as well. These include, among others, oversimplification by the use of only one variable or perspective in assessment, or only one level of measurement of a central aspect (such as quality of intervention), or only a few points in time. Of equal importance is the failure to take into account the patterns or configurations of various elements (such as the relationships between the type of intervention and its quality, and the patient’s state of readiness at the time of the intervention). Many previous efforts have failed to contribute to our clinical knowledge because they did not take into account the complexity of the subject matter, and this problem needs to be addressed by collaboration between experienced and sophisticated researchers and equally experienced and sophisticated clinicians.

There is a conflict between the established methods of psychoanalytic investigation and those most often utilized in the natural sciences: the former emphasize understanding the peculiarities of the individual, while the latter focus upon large numbers of individuals studied under standard conditions. It is not surprising, then, that some of the most interesting methodological developments (L. Horowitz et al, 1975, 1989; Jones & Windholz 1990; Kächele & Thomä 1995 p.121; Nye, 1991, etc.) have dealt with individual cases, for types of data analysis based upon understanding such cases in depth directly reflect psychoanalytic thinking, and are the most feasible using our current methodologies. Once a study of a particular case or cases has demonstrated a potentially important relationship between an aspect of the process and the outcome, the next step is to establish how well this relationship applies to a spectrum of similar cases. Then a finding can be stated as applying to a population of psychoanalyses (Edelson, 1984).



There is an extensive recent literature on the merits of single case studies (Kazdin, 1986). The study of individual patterns may well overcome the skepticism of the many experienced analysts who, citing the uniqueness of each patient, have questioned the utility of systematic research. As soon as one can specify dimensions that are relevant to a particular individual, whether in regard to symptoms (Battle et al, 1966), defenses and character traits (Perry & Cooper 1986; Perry et al, 1989, Vaillant 1986), ego capacities and functions (Wallerstein, 1988a; DeWitt et al, 1991; Zilberg et al, 1991) or personality styles (M. Horowitz et al, 1984), there are many possible ways (to be described further subsequently) of ascertaining how these specific characteristics are engaged in the analytic or therapeutic process, and what changes then are observable in these specific dimensions of the individual.

It has become paradigmatic to investigate a sample of a population in order to discover how one set of variables (e.g., initial diagnosis) relates to another set (e.g., analytic outcome). Investigators implicitly or explicitly generalize from the individuals studied to the larger population from which they were drawn. This is the classical method of population sampling, for which statistical methods have been developed. Over the years ever more sophisticated procedures have been elaborated to allow investigators to draw reliable conclusions from samples, and even, in many instances, to provide quantitative estimates of the probability that a given conclusion is valid for the larger population (Stigler, 1986). The level of sophistication that statistical methods have reached, their quantitative results, and their fruitful application to a wide range of technological and scientific problems have led to their current prestige.

These methods are, however, limited in two respects. They are inapplicable to problems that do not meet their underlying assumptions, as when the object of study is a unique or rare event – a major historical occurrence, for instance – and no underlying statistically distributed population exists from which a sample can be taken. They are also inapplicable when the technical requirements for achieving an adequately studied sample far exceed the capabilities of the investigator. For example, as the number of variables increases, the size of the sample needed to demonstrate the significance of the contribution of any one variable likewise increases. In highly complex systems with many interesting variables, statistical sampling may become wholly impractical. The immense prestige of sampling methods should neither blind the psychoanalytic investigator to the virtues of other methods nor lead him to equate them and only them with methodological rigor (Edelson, 1984). It would be a mistake to assume that the limited value of sampling methods for studying psychoanalysis means that scientifically rigorous investigations are impossible.

An important alternative to sampling strategies is the case study, which attempts to reach valid conclusions by exploring a single situation in depth (Jones, 1993). It has acquired an undeserved reputation for being less rigorous than other empirical methods, largely because it has been misunderstood as a variation on survey or quasi-experimental designs (Cook and Campbell, 1979). Useful case studies are characterized by a careful design that lays out the study’s goals and methods, the situation to be investigated, the logic that links observations with conclusions, and the criteria for determining to what extent that link is satisfactory (Nachmias and Nachmias, 1976).

Case study methods have been extremely fruitful and informative in a variety of situations. In medicine, case studies were the principal means of investigating diagnosis, pathology and therapeutics until the middle of this century(4). The accumulation of case histories over the centuries led to those formal generalizations that constitute the most important basis for the classification of physical illness. Biology also owes a great debt to case study methods. Darwin’s researches, for example, focused primarily on case studies of organisms living in various environments. From these, he generalized principles in a manner that illustrates the power of non-experimental methods to reveal underlying mechanisms. Case study methods have proved highly effective in disciplines ranging from the history of science (Conant 1957) to the study of business enterprises (Cheape 1985; Dalzell 1987; Popple 1974; Smith 1966; Tolliday 1987). Anthropologists have relied on the case study method in the development of their field (Geertz 1983) and similar methods have played a central role in sociology (Yin, 1989)(5).

Kazdin (1986) has pointed out three major advantages of the case study method for psychotherapeutic research. First, comparative studies of populations provide information only about a composite “average patient,” whereas case studies can provide insights into the mechanism of individual change (Barlow, 1981). Second, single case design allows a sharper assessment of whether an observed change resulted from treatment or some other cause. The flexibility of the single case study permits quasi-experimentation to produce a clearer picture of causal links than a population study can normally provide: phenomena of interest can be isolated and examined in more detail as they occur in a particular case, and further instances can be sought in case material from the same patient. Finally, information about idiosyncratic features of patients that may be central to their psychopathology (Kazdin, 1982) or to their treatment is lost in population studies.

Yin (1989) has described situations to which the case study method is appropriate. Survey and sample methods are better suited to questions formulated in terms of who, what, where, how many, and how much; case study methods to questions of how and why a phenomenon occurs. They can also be used for preliminary exploratory investigations. In studies involving other methodologies, they can be used to describe and explain complex phenomena. The multiple case design, in which several case studies are performed, is an especially valuable research tool, in that it permits the replication of results and the comparative study of cases. It should be carefully differentiated from investigations based on sampling methods applied to a population of individuals. Sampling methods can nonetheless be applied within a case study; there the sample is drawn from individual instances of interest in the same case (such as a particular symptom, hour, utterance or behavior). A failure to address the issue of sampling from instances in a case study can limit the significance of the findings as severely as does improper design in any study. Findings are based upon sampling from many individuals in a population study, whereas findings from a given single-case study may turn out to apply only to that case, or to some subset of cases, or to the entire population of cases. Just how widely these findings apply may be determined by multiple case studies. As there is no absolute differentiation between case studies and population studies, the question of how representative a given series of cases is needs to be addressed (Edelson, 1984). Psychoanalytic writings have often suffered from a failure to do this, and this failure can and should be remedied.

The inevitable personal involvement of the researcher in the material she is studying has led social scientists to recognize that case study data must be specially treated to avoid bias (Becker, 1958, 1967). Psychoanalysts have long recognized this problem in the analytic situation, but the research situation commonly poses problems of a similar nature with which even the most conscientious analyst is unlikely to be familiar. We expect distortions, resistances and other defensive operations in analysis, and we need to investigate these issues in the research arena as well. An awareness of our own irrational attachment to psychoanalytic ideas and the means by which we defend ourselves against contradictions to them (Greenacre, 1966, Edelson, 1984) can help us to deal with the impact of distortions arising from our own psychological needs in the analytic situation, and the same caution is needed in case studies. The analyst who employs case study methodology has to work hard to be aware of bias in his investigations. We can benefit from the techniques developed by social scientists to reduce observer bias in their own case studies. These include training in case study methodology, adequate protocols that include precise descriptions of the work to be done, and systematic review by peers (Yin, 1989).


Investigation of psychoanalytic processes in individual cases will often be enhanced by understanding the temporal relationship of events. An emphasis on the temporal aspects of analytic material has been the hallmark of much highly respected teaching since Freud. A group of analysts based largely in Washington, D.C. has been developing systematic understandings of these temporal relationships: Paniagua (1985) has described a systematic approach to what he calls “surface material” (see also Levy and Inderbitzen 1990), of which Davison, Pray and Bristol (1990) have published a detailed example in seeking evidence of mutative interpretations. Their efforts to classify the relationship between analysts’ interventions and patients’ response were preceded by only a small body of systematic work (Garduk and Haggard, 1972; Sampson et al, 1972; Luborsky, Bachrach, Graff et al, 1979; Silberschatz et al, 1986, 1988; Jones and Windholz, 1990), a paucity that probably reflects the inherent difficulty of studying sequences of events in a complex system. Some important recent methodological developments will be described below.

Gedo and Schaffer (1989) have developed methods of sequentially assessing alterations in interplay between analyst and patient, based upon ten randomly chosen sessions from early in a 324-hour analysis and ten from late hours. They coded the therapist’s statements as to whether they were interpretations, and the patient’s statements as to whether they demonstrated insight. Both the patient’s and the therapist’s statements were then coded to indicate whether they referred to the transference, using the Gill and Hoffman (1982) coding scheme, which assesses various aspects of transference-relatedness. The ratings for the presence of insight were not as reliable as the authors had wished; nevertheless they were able to characterize the degree to which the patient changed in producing more insights and more sequences of insights later in the analysis. They also showed how the patient’s insights were quite responsive early on to transference interpretations of the analyst. They used a Markov chain approach to analyzing sequential material(6).
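To make the idea of a Markov chain approach concrete, the following is a minimal sketch of how first-order transition probabilities might be estimated from a sequence of coded statements. It is not a reconstruction of Gedo and Schaffer's procedure: the function name, the four code labels and the short coded sequence are all hypothetical, chosen only to illustrate the technique.

```python
from collections import Counter, defaultdict


def transition_probabilities(codes):
    """Estimate first-order Markov transition probabilities.

    For each code, count which code immediately follows it in the
    sequence, then convert the counts in each row to proportions.
    """
    counts = defaultdict(Counter)
    for current, following in zip(codes, codes[1:]):
        counts[current][following] += 1
    return {
        state: {nxt: n / sum(row.values()) for nxt, n in row.items()}
        for state, row in counts.items()
    }


# Hypothetical codes (assumptions for illustration only):
#   "TI" = analyst transference interpretation
#   "OI" = other analyst intervention
#   "IN" = patient statement showing insight
#   "PN" = patient statement without insight
session_codes = ["OI", "PN", "TI", "IN", "OI", "PN", "TI", "IN", "TI", "PN"]

probs = transition_probabilities(session_codes)
# Estimated probability that insight follows a transference interpretation
print(probs["TI"]["IN"])
```

Comparing such a table for early sessions with one for late sessions is one way to express, in numbers, whether insight has become a more frequent sequel to a given kind of intervention. With sequences as short as this one the estimates are of course unstable.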

In another study of the same patient, Nye (1991) developed ways to systematically assess whether both patient and analyst were telling stories or transforming them. She found it feasible to rate sections of the transcribed work in regard to whether the meaning of statements was being transformed or not. The resulting ratings served to operationalize the concept of whether the speaker’s words represented an effort to develop insight. The concept of insight is in turn related to whether something is being analyzed, and if the speaker is the patient, whether a self-analytic function is in evidence at that time. Her conclusions illustrate the kinds of findings possible with this approach. “Changes in narrative process over the three phases of treatment corresponded to predictions made based on the psychoanalytic literature on the acquisition of the self analytic function. Early in treatment, the analyst provided the function of questioning and exploring narrative meaning; during the middle phase, the function was performed jointly, and during the end phase the analyst was less active and the patient assumed the function” (p. 28). Using totally different methodologies, both Gedo and Schaffer’s study and Nye’s were able to show evidence consistent with the hypothesis that interpretations contributed to changing the patient’s self-understanding in the course of an analysis. Further research is required to explore to what degree the relationships found reflect a cause-and-effect relationship, since the findings could be explained by other hypotheses as well.

Examination of the interaction between variables considered crucial to analytic work is illustrated by the two studies just described. Interpretation and insight were the objects of study, both being dimensions of analytic work generally agreed to as important among analysts. Progress along such lines has, however, been limited by past unreliability in describing or measuring crucial psychoanalytic dimensions. It is still often believed by many that psychoanalytic ideas are inherently unmeasurable (compare Seitz, 1966). With these problems in mind, a research group of senior analysts in New York (Waldron et al, 1991) has developed reliable rating scales of analyst and patient response characteristics on many dimensions significant to analysts. Called the Analytic Process Scales (APS), they are applied to audiotapes and transcripts of sessions after the rater has oriented himself by listening to the three previous sessions to establish context. Ratings are made of the types, aims, characteristics and quality of interventions. Ratings of type include the degree to which an intervention is an encouragement to elaborate, a clarification, an interpretation, or a different kind of intervention, such as one that provides education, direction, praise, support or analytic work enhancing strategies. Aims rated include the degree to which the analyst approaches and works with resistances, transference derivatives, the patient’s conflicts, and problems of self esteem, as well as the degree of developmental focus in the intervention. Characteristics assessed include how confronting the analyst is, and how much the analyst’s feelings are manifestly influencing his conduct with the patient. Finally, the quality of the intervention is assessed: how well does the analyst’s response follow the patient’s preceding material, and how optimal overall is the intervention for the patient? 
The patient is also assessed according to how well she conveys experiences in a way that permits the rater (and presumably the analyst) to understand her conflicts, both in regard to the analyst and to the rest of the patient’s life. Then the patient’s productions are assessed as to analytic productivity and the degree of productive use that has been made of the analyst’s previous interventions. Each analyst and patient variable is defined in a coding manual, and illustrative examples are provided for scale points. Anchoring the variables to actual clinical examples has resulted in much more reliable ratings of essential aspects of psychoanalytic work than have been accomplished before. This approach has the advantage of working with psychoanalytically meaningful dimensions in a way that is both scientifically acceptable and interesting to clinicians, and provides measures which can serve to explore in a more systematic way the interacting forces at work between analyst and patient. Early findings from this group have included clear-cut differentiation of patient-analyst pairs from each other on a wide variety of dimensions. Phases of treatment have also been differentiated. Differing responses of patients to different analytic interventions have been seen in a pilot sample, and the pattern of scores when examined through the course of sessions has revealed meaningful relationships as well (Waldron 1997).

The establishment of reliable scores on the APS is an example of the importance of examining recorded material in sufficiently full context. Nevertheless, for some limited purposes, strategies for studying the interaction between analyst and patient may effectively omit portions of the material or alter the original sequence. For example, leaving out interventions allows researchers to evaluate the changes from one segment to another without being influenced by their preconceptions about the particular interventions made. L. Horowitz et al (1975) removed all statements indicating that the patient felt that certain insights had previously been warded off, in order to give the raters an unbiased opportunity to assess for themselves whether a change in self-awareness had occurred. Similarly, scrambling the sequence in which material is presented, so that it cannot be determined whether it came from early, middle or late sessions, enables researchers to test hypotheses about change without being influenced by their knowledge of where the material occurred in the treatment. Such careful and ingenious planning can enhance the value of a study. If the impact of observer preconception or bias is minimized, the reliability of conclusions drawn from a study becomes greater.
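The scrambling strategy can be illustrated with a short sketch. It assumes that segments are held as plain text strings in their original order; the function name, the blind identifiers and the sample segments are hypothetical. Raters would see only the shuffled list, while the key that maps each blind identifier back to its original position is kept by someone who does no rating.

```python
import random


def blind_segments(segments, seed=0):
    """Shuffle segments and assign neutral identifiers.

    Returns the blinded list of (identifier, text) pairs for raters,
    plus a key mapping each identifier to the original position.
    """
    rng = random.Random(seed)  # fixed seed so the scramble can be reproduced
    order = list(range(len(segments)))
    rng.shuffle(order)
    blinded = []
    key = {}
    for blind_number, original_index in enumerate(order):
        identifier = f"S{blind_number:03d}"
        blinded.append((identifier, segments[original_index]))
        key[identifier] = original_index
    return blinded, key


# Hypothetical segments in their true chronological order
segments = ["early hour text", "middle hour text", "late hour text", "termination text"]
blinded, key = blind_segments(segments, seed=42)

for identifier, text in blinded:
    print(identifier, text)
```

Once ratings are complete, the key restores the chronological order so that the scores can be compared across phases of treatment.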

Two special methodological problems have to be surmounted in studying interaction. One of these is the problem of segmenting the material in such a way that the researcher can score what is going on at a particular point in the treatment, then use it as a basis for comparison with other points in the same treatment. We rightly regard an analysis as a continuous process throughout its course, one that may continue even after sessions have stopped; therefore a division into segments, with its unavoidable implications of discontinuity, will inevitably involve assumptions that may obscure more than they clarify. The other problem concerns the statistical aspects of how to assess changing relationships between variables in complex systems over time. Special tools, which will be described under the general heading of time series analysis, have been developed to deal with this.

1) Segmenting. Perhaps the simplest, most intuitive, solution to this problem, and one that is unquestionably effective in many situations, is to divide an analysis into sessions and regard each analytic hour as a discrete unit. Causal relationships can be hypothesized based upon which changes took place in earlier hours and which changes followed. For instance, if an analyst makes a certain kind of transference interpretation in regard to transference sexual fantasies or wishes, and a significant alteration in the analytic atmosphere occurs in subsequent hours, a causal inference can be proposed. Jones and Windholz (1990) successfully used the one-hour division in applying their Q-sort instrument to a series of hours throughout a lengthy analysis.

Often, however, investigators wish to explore the immediate responses of patients to specific interventions. This requires them to separate the analytic material into units shorter than whole sessions so that they can focus more precisely on the interaction. Many analysts, for example, believe that the analysis of resistance is central to our endeavors (Weinshel, 1984; Gray 1990). To study the relationship between the analyst's handling of resistance and the patient's response, it would be appropriate to divide the material into segments directly reflecting the interaction. Once this has been done, various measures can be applied which include whether the analyst addressed resistances, such as the Analytic Process Scales described above (Waldron et al 1991). Another application could be that of a reliable resistance scale recently developed by Schuller, Crits-Christoph and Connolly (1991).

How then may a session be divided? In some studies, the segment has been an arbitrary unit, such as the fifty lines of typescript used by the current group at Menninger (Horwitz et al, 1989). Certain computer-based studies, such as Spence, Mayes and Dahl’s (1994) study of the “analytic surface” using the co-occurrence of first- and second-person pronouns, make effective use of a 1,000-character search space. Many researchers, however, would prefer to divide their material less mechanically, according to natural changes in the process. Although change of speaker is a simple, natural, and widely used criterion for division, it has marked disadvantages because the size of each segment reflects inversely the activity of the analyst, as well as whatever patient factors may stimulate differences in analytic activity. It is better to use a method that is conceptually driven, such as one that identifies significant changes of topic, whether the analyst comments on them or not. Bucci and Stinson (personal communications, 1990) have developed a system of “Major Thematic Units” and “Thematic Units” to demarcate the boundaries of topics in texts. Other investigators, including my group (Waldron et al, 1991), have found their method both reliable and easy to use.
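A mechanical segmentation of the kind just mentioned can be sketched in a few lines. The example below is only an illustration of fixed-width windowing with a pronoun co-occurrence check; it does not reproduce the published procedure of Spence, Mayes and Dahl. The pronoun lists, the function name and the use of consecutive, non-overlapping windows are assumptions made for the sketch.

```python
import re

# Assumed pronoun lists; a real study would define these in its coding manual
FIRST_PERSON = {"i", "me", "my", "mine", "myself"}
SECOND_PERSON = {"you", "your", "yours", "yourself"}


def pronoun_cooccurrence(text, window=1000):
    """Flag each fixed-width window that contains both pronoun types.

    The text is cut into consecutive windows of `window` characters.
    Cutting by character count can split a word at a window boundary,
    which is one of the costs of an arbitrary unit.
    """
    results = []
    for start in range(0, len(text), window):
        chunk = text[start:start + window].lower()
        words = set(re.findall(r"[a-z]+", chunk))
        has_first = bool(words & FIRST_PERSON)
        has_second = bool(words & SECOND_PERSON)
        results.append(has_first and has_second)
    return results


# Tiny demonstration with an 18-character window instead of 1,000
sample = "I think you know. The train was late."
print(pronoun_cooccurrence(sample, window=18))
```

The same loop could be adapted to conceptually driven units by replacing the character offsets with boundaries marked by trained judges.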

Two studies may be cited to illustrate the fruits of well-conceived segmentation procedures. Gassner et al (1982) revealed that, with one exception, in the first hundred hours of Mrs. C. (a fully recorded case which has been extensively studied) warded-off mental contents emerged without the analyst specifically interpreting them. Similarly, in a study of psychotherapy, Elliott (1991a) used discourse analysis(7) to show that the client’s developing an important insight did not follow specific interpretations. Studies like these, by exploring the conditions leading to the development of insight, could lead to important changes in the theory of therapeutic action; and this in turn would help to clarify the role of interpretation and other factors in therapeutic change. Kris (1982), for example, has discussed the impact of the free associative procedure and of interventions aimed solely at facilitating the completeness of free associations. His views, among others, would provide an admirable basis for research into the preconditions of insight.

2) Time series analysis. When studying sequences of patient and analyst activities, researchers always encounter problems in assessing the patterns of change over time. These can be handled using a statistical approach called time series analysis, a well-defined discipline applicable to a wide range of fields, including the social sciences (Gottman, 1981, Gottman & Roy, 1990). Statistical methods are required to demonstrate meaningful correlations between a series of events because unaided human observers generally do a poor job of distinguishing chance variations from significant differences. Time series analysis aids the exploration of the source of change by assessing the statistical significance of patterns of change. For example, determining whether a change in average temperature over time reflects seasonal variation or some other phenomenon would be a question for time series analysis.
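As a small illustration of the temperature example, the following Python sketch computes the sample autocorrelation of a series at a chosen lag, one of the basic descriptive tools of time series analysis. The monthly "temperature" values are synthetic (a pure twelve-month cycle) and the function name is an assumption made for this example; real data would also contain trend and noise, and a formal analysis would go on to test significance.

```python
from math import pi, sin
from typing import Sequence


def autocorrelation(series: Sequence[float], lag: int) -> float:
    """Sample autocorrelation of `series` at the given positive lag."""
    n = len(series)
    if not 0 < lag < n:
        raise ValueError("lag must be between 1 and len(series) - 1")
    mean = sum(series) / n
    variance_sum = sum((x - mean) ** 2 for x in series)
    if variance_sum == 0:
        raise ValueError("series is constant")
    # Products of deviations for observations `lag` steps apart.
    lagged_sum = sum(
        (series[t] - mean) * (series[t + lag] - mean) for t in range(n - lag)
    )
    return lagged_sum / variance_sum


# Four years of synthetic monthly values with a pure seasonal cycle.
monthly = [sin(2 * pi * month / 12) for month in range(48)]
```

For this synthetic series the autocorrelation is strongly positive at a lag of twelve months and strongly negative at six months, the signature of seasonal variation; a change that was not seasonal would not show this pattern.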

Time series analysis has been widely used in the social sciences to study discourse (Gottman, 1981; Gottman & Parker, 1986). Gedo and Shaffer (1989) have applied it to the psychoanalytic process. To illustrate, one time-series method involves comparing the score for a variable in a patient segment with the score for that same variable in a previous segment, the latter serving as a baseline. One then compares the score with another variable, such as accuracy of interpretation, from the intervening analyst segment. This process is repeated for successive segments, thereby enabling one to ascertain what impact the analyst’s intervention had on the patient’s functioning in that dimension, as studied over a whole series of interventions. If, for example, one assessed increases in analytic productivity by means of time series analysis and found that they followed accurate interpretations of transference much more frequently than would be expected by chance, this would support the hypothesis of a causal relationship.(8) There are many pitfalls and problems in designing and carrying out such time series analyses, and the newer techniques of analyzing time-ordered data do not provide sure-fire answers to design problems (Elliott & Anderson 1994, pp. 83-86). However, careful attention to the measures used in relation to the goals of the study can lead to valuable results. It is possible to study the bi-directionality of influence in the psychoanalytic situation (ibid, pp. 90-91) and assess the degree to which the analyst’s approach is influencing the patient and vice-versa. For example, in the Analytic Process (APS) study described above (Waldron 1997), there was a patient-analyst pair showing a much more successful analytic process than was the case for two other pairs. For this successful pair, there was a strong relationship between the quality of the intervention and immediately subsequent patient productivity.
There was also a moderate relationship between the patient’s productivity and the quality of the immediately subsequent analyst intervention. In other words, both parties to the analytic process had a facilitating role, and analysis of the interaction patterns supported the view that the quality of interventions made a special contribution to a successful analytic process.
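The segment-by-segment comparison described above can be sketched as follows. This is a minimal Python illustration, not the procedure of the APS study or of any other study cited: the scores are hypothetical, the function names are invented for the example, and a plain Pearson correlation stands in for a proper time series model. Because successive segments are not independent of one another, such correlations are descriptive only and would not by themselves establish statistical significance.

```python
from math import sqrt
from typing import Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two equally long score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length (at least 2)")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("one of the lists is constant")
    return sxy / sqrt(sxx * syy)


def lagged_influence(
    patient: Sequence[float], analyst: Sequence[float]
) -> Tuple[float, float]:
    """Relate scores in both directions across alternating segments.

    patient[i] is the patient segment just before analyst[i], and
    patient[i + 1] is the patient segment just after it, so `patient`
    must be one element longer than `analyst`.
    """
    if len(patient) != len(analyst) + 1:
        raise ValueError("patient must have one more segment than analyst")
    # Change in the patient score across each intervention, using the
    # preceding patient segment as the baseline.
    change = [patient[i + 1] - patient[i] for i in range(len(analyst))]
    analyst_to_patient = pearson(analyst, change)
    # Reverse direction: patient score versus the quality of the
    # immediately following intervention.
    patient_to_analyst = pearson(patient[:-1], analyst)
    return analyst_to_patient, patient_to_analyst


# Hypothetical ratings: patient productivity and intervention quality.
productivity = [2, 3, 3, 5, 4, 6]
quality = [4, 1, 5, 2, 5]
analyst_effect, patient_effect = lagged_influence(productivity, quality)
```

Computing both directions from the same alternating series is what allows the bi-directionality of influence to be examined rather than assumed.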



For many years the use of multiple measures has been recommended to assess any characteristic of interest (Waskow & Parloff 1975). Agreement between findings from more than one approach increases our confidence in their validity and enhances our ability to generalize from them. There are important areas of overlap among the various measures that we can apply to treatments, and determining precisely where they differ and where they resemble one another would do much to establish their value. For example, Wallerstein’s group has developed what it calls “Scales of Psychological Capacities” (DeWitt et al, 1991; Sundin et al 1994) to tap the kinds of changes that most analysts believe are especially furthered by psychoanalysis and intensive psychoanalytic psychotherapy. These scales reflect capacities in living, and are clearly relevant to the quality-of-life issues that I will discuss shortly. They also reflect aspects of defensive functioning, when “defenses” are understood in a broad sense. It would be extremely valuable to apply these scales to patient material together with the much more complex method of assessment developed by M. Horowitz and his co-workers (1984). The Horowitz method defies succinct characterization, but certain comments can be made about it here. The assessment of a patient at multiple points in a treatment leads to thirteen dimensions in regard to symptoms, relationships, and the self, which are summarized in an instrument called the “Patterns of Individual Change Scales.” Changes in the patient are represented graphically in a way that is highly specific for the patient and clearly captures the actual changes (or lack thereof) brought about through treatment. Both the Wallerstein and Horowitz measures have the great virtue of reflecting how psychoanalysts actually think about their patients, especially in regard to important qualities in which they hope to effect change. 
Bringing these measures into a careful relationship with each other would therefore produce a whole greater than the sum of its parts.

Multiple measures are useful not only for validation but – and this is perhaps more important – for identifying changes in the psychoanalytic process through changes in the relationship of one measure to another. Skolnikoff (1985) compared the results of two different forms of data collection. First he dictated process notes immediately after each session. These were transcribed at the end of the week, and read by his collaborator, Emmanuel Windholz. He then began the following week by recounting in a freeform manner the previous week’s work with the patient. This report was tape-recorded. The collaborators found many discrepancies between the process notes and the tape-recorded recollections; moreover, these discrepancies were greatest at times that, in retrospect, had proved to be especially productive. In short, the analyst’s departures from neutrality tended to coincide with analytic progress. This discovery lends experimental support to Boesky’s (1990) view that effective treatment requires a complementary response in the analyst to the patient’s conflicts, a response usually marked by discomfort and a temporary departure from neutrality. Further studies should be planned of the variation in the reactions of both the analyst and the patient, using more than one source of information at various points in the analysis. A whole range of psychoanalytic process variables is possible based upon this approach. Von Benedek (1992), for example, has reported extensive recorded interviews with twenty psychoanalysts at the initiation of treatment and one year later, providing documentation of the complexity (and imperfection) of the analyst’s response to the patient over time.

Process notes normally focus on the analyst’s observations about the patient, while tape recordings of sessions provide only the spoken words of both participants. In keeping with an increasing emphasis on the emotional reactions of the analyst herself, new sources of information have come to include the analyst’s unspoken thoughts and feelings, and even unspoken associations, visual imagery and bodily sensations (Gardner 1983, Jacobs 1973). So far, however, there has been little systematic accumulation of such information. Many analysts might be more willing to write down their reactions during or immediately after a session if they felt assured that they would not be embarrassed by subsequent exposure. Tape recording the same sessions would allow comparisons to be made between the analyst’s internal experience and the external discourse. Using a special diary as a data source, Calder (1980) has demonstrated the value of such self-scrutiny in his study of self-analysis. Meyer (1988), in a small but elaborate study, has compared recorded sessions with “retro reports” dictated by the analyst immediately after each session. His clinical exploration of analytic thinking using this method appears to me to be well thought out and may hold much promise for future developments (see also Kächele, 1988, p. 66).



Evaluating the quality of the patient’s life after treatment is obviously central to any attempt to ascertain the efficacy of psychoanalysis. Multiple measures are very important in this regard, because so many different aspects – the quality of relationships, relative freedom from severe symptoms, and the capacity for a productive daily life – contribute to a person’s overall level of mental health. However, this task is not as difficult as it may appear: research has shown that measures that assess the various dimensions of mental health from interviews have become increasingly reliable, in that clinicians and others agree far more often than one might expect about how healthy or sick a given individual is. This remarkable and encouraging finding emerged from the use of the Health-Sickness Rating Scales (Luborsky, 1962; Luborsky & Bachrach, 1974). After determining the general level of the individual’s health, based upon the study of a particular data source (process notes, case reports, tape-recordings and so on), a manual containing thirty-four case illustrations graded on a 100-point scale is consulted, and the health-sickness rating is arrived at by deciding whether the individual in question is more or less healthy than a given case illustration. This has turned out to be a highly reliable method of evaluation, one that produces little disagreement about the degrees of illness exhibited by a wide range of patients from radically different backgrounds and with radically different symptom pictures. In other words, these studies have shown that mental health has to a considerable degree a unitary quality. Furthermore, there is a strong correlation among the various subdimensions of mental health (Luborsky 1962; Luborsky & Bachrach 1974; Waldron 1976, Ogles et al 1995), leading us to believe that the health-sickness rating represents an important property of the individual that is completely relevant to psychoanalytic efficacy research.
Hartmann’s (1939/1956) concept of a unitary adaptive function appears to be supported by these empirical findings(9).

Clinical assessment methods can be applied to various forms of primary data derived from diagnostic or therapeutic interviews. Global assessments of mental health derived from such materials tend to correlate strongly with such “objective” indices of social impairment as educational and job history, marital status, criminal record, and so on (Robins, 1966; Waldron, 1976). Epidemiological and developmental studies from several centers concur in this finding (Robins, 1966; Dohrenwend et al, 1980; Vaillant 1974, 1975, 1976, 1978, 1986; Vaillant & McArthur 1972; Vaillant & Vaillant, 1982, 1990). Wallerstein (1986) has provided perhaps the richest evidence for the interplay between “objective” indices and the course of a person’s psychological unfolding over decades.

Quality of life measures. The psychoanalytic understanding of mental processes has long recognized that an absence of symptoms cannot be equated with mental health. Nevertheless, psychiatric study has tended to focus on target symptoms because they provide a definable area for research (Battle et al, 1966). Recently, many researchers have come to realize that the measures for evaluating the outcomes of clinical interventions must reflect more than an absence of pathology (Greenfield, 1989). In evaluating coronary care units (Elwood, 1988), health care systems (Brooks, 1991, Nord, 1991), treatment of end-stage renal disease (Parfrey et al, 1989) and treatment of cancer (Reizenstine, 1986), the question has become not merely whether intervention has eradicated the disease, but whether it has made the patient’s general quality of life better or worse; and investigators have developed methodologies to this end (see also Stewart et al, 1989; Markowitz et al, 1989; Wells et al, 1989; Gill & Feinstein, 1994). Research into psychoanalytic outcomes should follow their lead.

Psychological tests are an important source of information about patient functioning, especially as they are largely protected from contamination by the motivations of either the patient or the analyst. Recently, Blatt and his co-workers (Blatt 1990, 1992; Blatt & Berman 1984) have developed measures of object-relatedness based upon the Rorschach test, and these have been productively applied to the protocols at the outset and termination of most Menninger Psychotherapy Research Project patients. They have found an interaction between the patient personality type, the type of treatment applied, and the results of the treatment. This illustrates the advantage of carefully appraising the treatment actually given and the patient’s actual response to it. Such a corroborative source of information about outcomes would be a valuable addition to any efficacy study.

We can also evaluate quality-of-life using several validated self-report instruments that correlate with assessments by experienced clinicians (L. Horowitz et al, 1988; Fisher et al, 1989). Analysts generally regard psychological data derived from self-report instruments as superficial; however, if the instruments are well chosen, such information allows us to evaluate the results of analytic work in settings where clinical evaluations and follow-ups are not feasible. These instruments may be useful in situations where repeated assessments are needed and in gathering data on control groups or comparison groups.

Developing process measures which correlate with quality-of-life measures. One reason to study the relationship between treatment and outcome is in order to find out whether we can use materials derived from treatments to assess their benefits with reasonable accuracy.(10) For example, can we assess the quality of relationships, capacity for productive involvement, and relative freedom from crippling symptoms with the aid of detailed process notes or tape recordings made toward the end of treatment? Such methodological advances would provide an important springboard for substantive studies.

Process-outcome studies in the closing phases of treatment could proceed by assessing how the patient relates to the analyst, or by assessing other aspects of the patient’s functioning based upon his reports during treatment of his ongoing daily life. Clinicians have often observed changes in the way their patients relate to them as a successful analysis draws to a close; however, these changes have not been systematically studied except by Pfeffer (1959, 1961, 1963) and those inspired by him. It would appear that the patient re-experiences the same core transference pattern during the brief period of the follow-up, but it no longer holds the same unbending sway over him and he is able to mobilize adaptive responses, especially that of self-analysis (see also Schlessinger and Robbins, 1983).

There have been other systematic findings that reflect the way the patient relates to the analyst, to others in her life and to herself toward the end of treatment: Dahl found less stereotypy in frames toward the end of Mrs. C.’s analysis. A change of this nature makes clinical sense, reflects desirable shifts of personality, and can be confirmed by other observers(11). Similarly, Luborsky, Crits-Christoph and others (1988) have ascertained that in psychotherapy that has been judged successful on other grounds, patients describe events (“relationship episodes”) in a less stereotyped manner toward the end of treatment. The patients’ scores on the Core Conflictual Relationship Theme (CCRT) changed correspondingly, the greater variety of themes directly indicating that they were no longer stuck in their old patterns to the same degree. Bucci’s studies of changes in referential activity (RA) permit additional confirming (or disconfirming) assessments of changes in patient material (see Dahl et al, 1988).

Other process measures can be developed which may prove valuable in assessing the change that occurs from the beginning to the end of an analysis. For example, if we could measure the degree to which the patient associates freely, we might be able to directly evaluate the quality of the work of analysis (A. Kris, 1982). Spence, Dahl and Jones (1993) have made such an effort in looking at lexical co-occurrences in relation to changes through an analysis. Changes in symptomatic impairment as manifested in the analytic hour can readily be studied (compare Jones & Windholz, 1990); and the quality of the patient’s life outside analysis, at least from the patient’s perspective, can be rated from what she tells us about her relationships, productivity, and symptomatic impairment.

To date, no study has explored the relationship between such process-derived measures of outcome and the gathering of information by various means at follow-up. There is a large body of data, comprising the more than fifty cases that have been studied using the methodology of Pfeffer (1959, 1961, 1963; Schlessinger and Robbins 1983; Oremland et al, 1975; Norman et al, 1976), which might be used for this purpose. Collected material could be studied from two points of view, that of outcomes as judged by the process and that of outcomes as described by patients to the follow-up analyst. Other studies along similar lines could be conducted within treatment centers at our institutes, in which systematic efforts at data collection would be made in order to assess process and outcome at beginning, end and follow-up using multiple measures. Studies of this kind are particularly important because the validity(12) of any assessment of efficacy of psychoanalysis is best established through convergent measures.

Our efforts would be greatly facilitated if it could be determined whether outcome measures derived from material recorded toward the end of treatment accurately predict outcome measures derived from material collected during follow-up. Follow-up studies are difficult to arrange at best, whereas it is relatively easy to record treatments (although it is not easy to persuade analysts to record). For this reason, establishing the relevance of process-derived outcome measures to ultimate outcomes would make a much broader sample of cases available to researchers interested in evaluating efficacy than could possibly be obtained from follow-up studies alone.




In the Menninger study (Wallerstein, 1986), a large proportion of patients showed substantial positive changes in their health-sickness ratings (Luborsky, 1962, see Bachrach et al 1991 for statistical summary of changes).(13) This is an encouraging finding; however, in the absence of a control group – that is, a group of subjects who were treated by other methods or not at all – for comparison, we cannot assume with complete confidence that these improvements resulted solely or even primarily from treatment (Malan et al, 1975). It is true that clinicians are generally convinced that the changes they observe in their patients are influenced, at least to a considerable extent, by the therapeutic relationship; on the other hand, it is also true that people do make improvements on their own, or with the help of Alcoholics Anonymous, various self-help groups, organized religion and other aids. Longitudinal studies have shown changes during the life cycle which sometimes suggest very substantial improvements (Vaillant, 1976, Vaillant & McArthur, 1972; Vaillant & Vaillant 1982, 1990; Wallerstein 1986), and these are sometimes brought about by the individual reflecting on his own characteristic behaviors, without any significant psychotherapeutic intervention. Vaillant (1976) has reported instances of this in connection with mid-life crises. If outcome studies are to support the value of psychoanalysis and other allied therapies, it is not enough for them to demonstrate that positive changes occurred – they must also demonstrate through the use of control groups that these changes were substantially less likely to have happened without treatment.

It would be grossly unethical, of course, to withhold treatment from persons who need it in order to create a control group. It would be possible, however, to do collaborative studies comparing the immediate and long-term outcomes of patients treated by analysis with those of patients treated by non-analytic modalities. This approach is consistent with thinking in regard to controlled studies of cancer patients: the emphasis has changed from having only one control group to having different kinds of comparison groups (Gehan & Freireich 1974). The study of any comparison group can tell us something about the relationship between the processes of treatment and outcome.

To a certain extent, cases within the study population itself can provide a kind of control group, since in virtually any study there will be persons in whom “analytic process” however defined will not occur. For instance, some patients will not develop reflective self-awareness specifically tied to understandings derived from interpretations. Studying the differences between these patients and those who develop a more typical psychoanalytic process would accomplish our goal of relating process to outcome, whether or not a given treatment was intended to be a psychoanalysis! In other words, one source of comparative information about the impacts of a typical psychoanalytic process may be the differences both at the time and subsequently, between those who do and don’t work with their psychoanalyst in a way characteristic of a psychoanalysis. The Menninger Psychotherapy Research Project (Wallerstein 1986) provides the best systematic documentation of the way many cases assigned to a psychoanalysis ended up having very different actual treatment experiences (see also Erle 1979 & Erle & Goldberg 1984).

Another kind of control group for psychoanalysis might be found in a community large enough to provide a sufficient number of patients but in which analysis is unavailable – for example, rural Stirling County in Maine, which provided the study population for the extensive study of mental health in a community by Leighton et al (1963). Data might be collected from such a control group with only moderate funding and the services of a single supervising analyst, who would coordinate data collection longitudinally during regular visits to the community. In fact, the data from studies like the Stirling County study may already be suitable to form comparison groups. Such longitudinal projects, including the one reported by Vaillant (1986), have accumulated extensive databases that might well be adapted to yield comparison data bearing on the natural course of health-sickness. If the measures applied to analyzed cases (Waskow & Parloff, 1975) can also be applied to other longitudinal databases, we will be able to compare changes in health-sickness following non-analytic therapies with those following psychoanalysis.




In order to demonstrate that the benefits of psychoanalysis are not only real but lasting, follow-up studies are indispensable (Wallerstein, 1992). Unfortunately, there have been very few efforts to collect follow-up data across a broad range of patients. The data collected in follow-up of the Menninger cohort over a period of up to thirty years (Wallerstein, 1986) is available; the studies using Pfeffer’s methodology (Schlessinger & Robbins 1974, 1983; Oremland et al, 1975; Norman et al, 1976) are in effect follow-up studies; and the termination of analysis has been studied systematically by Schachter (1990; Panel 1989) and by Firestein (1978). The work of Knapp (1960), Klein (1960) and Kantrowitz et al (1987a&b; 1990a-c) is also relevant in this regard. Admittedly, follow-up investigations pose formidable practical problems, not the least of which is securing the necessary long-term commitment and funding. However, the widespread belief that it is harmful for an analyst to contact her former patients should not be allowed to complicate an already difficult situation. The experiences of the investigators that I have mentioned, especially those of Schachter’s group (1990; Panel 1989), convinced those who collected the data that, far from being harmful, such contacts were actually beneficial to many patients. Of course, this finding needs further confirmation by other studies.

Follow-up research can also provide opportunities for studying the mechanism of therapeutic action. To the best of my knowledge, the relationships between the patient’s initial problems, the subsequent course of treatment, the patient’s report of what seemed beneficial to him in retrospect, and the analyst’s report have never been systematically studied. Some of the data collected by Schlessinger and Robbins (1974, 1983; also Schlessinger 1987) could be studied in this regard. It would be informative to study the degree of agreement between analyst and patient, and the degree to which core transference issues have been worked through. Information gained from this approach could help to illuminate the mechanisms of change in psychoanalysis (Appelbaum, 1978) and clarify the role of the match between patient and analyst (Kantrowitz et al 1990c).

Follow-up studies can also contribute a great deal to the education of analysts. With this in mind, it would seem reasonable to build follow-up agreements into the understandings reached with patients treated in our low cost clinics. The benefit to students and faculty alike of systematic follow-up might be considerable.




The extensive goals that I have described here cannot be achieved without extensive collaboration. The efforts required to initiate and sustain such collaboration are warranted when the findings can be expected to be of interest to most psychoanalysts, and to benefit the field as a whole. The scientific yields of collaboration were illustrated at the 1985 meeting of the Society for Psychotherapy Research (SPR) in Ulm and in the book by Dahl, Kächele and Thomä (1988) that resulted from this meeting. The meetings of the SPR have provided opportunities for scientific discussion, and the International Psychoanalytic Association (IPA) has recently begun an annual research meeting in London. However, these forums are not sufficiently accessible to most American psychoanalytic clinicians with a research interest, and the presentations at the SPR are generally distant from the central interests of psychoanalysts. There is as yet no central coordinated ongoing effort in the nature of a Task Force on Research under the aegis of the American Psychoanalytic Association, or its allied organizations. The efforts led by Wallerstein on a twice yearly basis, entitled the Collaborative Analytic Multisite Program (CAMP) have not so far led to a coherent enterprise with significant funding. Expenditures on research in psychoanalysis on an annual basis are minute, compared for example to the funding for research on brain wave imaging which has caught the imagination of many. Unfortunately, educated people have not become convinced so far that exciting advances can readily be attained in psychoanalysis through systematic research on a sufficient scale. The paths of research described in this paper could, I believe, lead to such exciting advances.

A related problem is that of the role of research in psychoanalytic education. A recent survey by Richards (1991) reveals that research teaching in most institutes is severely limited. Few even have a person specifically knowledgeable about research on their curriculum committees. Despite some conspicuous exceptions that I have discussed in this paper, there has been too little cross fertilization of ideas between clinicians and researchers in the psychoanalytic community: in fact, psychotherapy researchers and clinicians in general have had hardly any effect at all on one another’s thinking (Bachrach, 1992, Luborsky & Spence, 1978, Kazdin, 1986). National organizations should exert themselves to promote the exchange of ideas. Individuals or groups within each institute and society could be designated to facilitate the planning of collaborative studies of specific topics, and a consultative arm of the American Psychoanalytic Association could be formed to make experienced researchers with a knowledge of clinical work available to members who have research questions. This in turn might lead to engaging more clinicians in research efforts of interest to the clinical psychoanalytic community. In addition, coordinated efforts to raise research funds are sorely needed.

It is hoped that the broad overview presented by this paper may serve to inspire interest and support. I have prepared two subsequent articles to provide a further stimulus: the first considers in detail issues about data collection and utilization, and the second describes a series of specific studies which spring from the general principles espoused here. Many of the issues described here have been more extensively described in a recent volume edited by Miller, Luborsky, Barber and Dougherty (1993), which includes chapters by many of the authors cited in this paper. In addition, a 1995 volume edited by T. Shapiro & R. Emde has many thoughtful contributions addressing many of the same issues (see also Galatzer-Levy et al in press). Finally, many design and technical issues of importance are thoughtfully discussed in Reassessing Psychotherapy Research (Russell, 1994), as has been indicated by the several citations in this paper.


  1. Although the study cited was ingenious and carefully carried out, the conclusions that may be drawn from it are severely limited, as the authors themselves are well aware. First, there were significant differences between the two groups in that the students treated by them were not randomly chosen: for the most part, the therapists treated students seeking help at the university’s mental health facility, while the professors treated students who had responded to notices that had been distributed at large in order to generate more patients for the study. Second, because there were only about fifteen patients in each group the statistical value of the study was slight. Third, the patients were selected on the basis of elevated scores on the MMPI scales of depression, psychasthenia and social introversion, reflecting their feelings of alienation on campus. Contact with mature professors on the same campus, chosen for their warmth and ability to relate to students, was ideally suited to provide them with a “corrective emotional experience”, or at least a powerful supportive intervention. Fourth, treatment was restricted to a maximum of twenty-five twice-weekly sessions, a schedule that gave the therapists only limited opportunities to apply their skill. Finally, there was evidence that professional skill did indeed contribute something unique for those patients who had a positive rapport with their therapists. These facts might not come to the attention of those informed of the results of the study. For further evidence contradicting the hypothesis of the Strupp & Hadley paper, see Jones, Cumming & Horowitz, 1988.
  2. Throughout this paper, I refer to “psychoanalysts” and “psychoanalysis”; however, my hope is that most of the points made will be useful in regard to psychoanalytically oriented therapies as well. Systematic differences between psychoanalyses and psychoanalytic therapies have not as yet been established empirically (Wallerstein, 1986).
  3. In the Spring of 1988, the current and incoming Presidents of the American Psychoanalytic Association, Drs. H. Curtis and R. Simon, asked the Association’s Committee on Scientific Activities to summarize the research literature on the efficacy of psychoanalysis. A subcommittee, including Henry Bachrach as chair, Robert Galatzer-Levy, Alan Skolnikoff and myself, was formed to accomplish this task. The first result of this effort was a review of previous studies on psychoanalytic efficacy (Bachrach et al 1991). The present article grew out of the subcommittee’s continuing efforts to explore efficacy. Although I am indebted in many ways to the other members, the responsibility for the views expressed is solely mine.
  4. As will be discussed below, multiple case studies are distinct from population sampling methods. For example, a report in which a pathological finding is associated with a disease in twenty cases is simply a report of twenty cases and not a statistical sample.
  5. The use of statistical methods in a case study does not turn it into a sampling statistics study. For example, in a case study of the economic development of a single community, statistical sampling techniques may be used to investigate the community’s economics, but the object of the study is still a single entity, the community’s economic development.
  6. This consists in comparing the probability of any given remark being an insight with the probability of its being an insight following an immediately prior interpretation.
  7. “Discourse analysis” is not a set of theories or procedures, but is more loosely defined to include the ideas and methods developed by those interested in discourse.
  8. See Sexton (1993) for a sophisticated example of studying such change sequences in group therapy.
  9. There remains the thorny problem of whether, ultimately, such agreements about the mental health of individuals simply represent a shared cultural bias. However, I present the HSRS in such a positive way because the findings still represent, in my opinion, a major contribution and an important advance in our field, even if there ultimately prove to be important limits to the generalizability of findings across cultures.
  10. The concept of patient – treatment – outcome congruence, from the Vanderbilt research group (Strupp et al 1988), is a useful one in this regard.
  11. Remaining to be established would be evidence of the generalization of these changes to the rest of the patient’s life.
  12. See Cook and Campbell (1979) for a helpful discussion of various aspects of problems of validity.
  13. See Bachrach et al, 1991, for a statistical summary of these changes.
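
The comparison described in note 6 can be illustrated with a minimal sketch. It assumes a session has already been reduced to an ordered list of coded remarks; the category labels, the function name, and the sample sequence are hypothetical and serve only as illustration. They are not drawn from any study cited here, and the sketch says nothing about how such coding should be done or tested for significance.

```python
from typing import Sequence, Tuple


def insight_probabilities(
    codes: Sequence[str],
    antecedent: str = "interpretation",  # hypothetical label for an analyst's interpretation
    target: str = "insight",             # hypothetical label for a patient's insight
) -> Tuple[float, float]:
    """Return (base rate of target, rate of target immediately after antecedent)."""
    if len(codes) < 2:
        raise ValueError("need at least two coded remarks")
    # Only remarks that have a predecessor can "follow" anything,
    # so both rates are computed over codes[1:].
    followers = codes[1:]
    base_rate = sum(c == target for c in followers) / len(followers)
    # Remarks whose immediately prior remark was the antecedent.
    after = [nxt for prev, nxt in zip(codes, followers) if prev == antecedent]
    if not after:
        raise ValueError("antecedent never precedes another remark")
    conditional = sum(c == target for c in after) / len(after)
    return base_rate, conditional


# Invented sequence of coded remarks, for illustration only.
session = [
    "other", "interpretation", "insight", "other",
    "interpretation", "other", "insight", "other",
]
base, conditional = insight_probabilities(session)
print(f"P(insight) = {base:.3f}; P(insight | prior interpretation) = {conditional:.3f}")
```

A conditional rate well above the base rate would be consistent with interpretations tending to be followed by insight, though in a single case, with remarks that are not independent of one another, such a difference is only suggestive.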