The Quality of Group Performance and Individual Performance (1920 - 1957)
by Irving Lorge et al.
The Article in Full
This is an analysis of studies done in the years 1920-1957 which contrast the quality of performance by individuals and by groups in diverse situations. A number of studies are included which add to our understanding of this aspect of human behavior. However, an unpublished review by Lorge et al. (37) prepared in 1953 served as an important source for this presentation. In fact, some of the organization of that report is carried into this study. The existence of this report is due to a literature search made in connection with research into group performance and group process in problem solving.
It is important to focus on basic concepts such as "group," "task," and "criterion." These terms have been applied in so many different senses and in so many different situations that clarification and differentiation are necessary for the interpretation of the research results.
The most ambiguous term seems to be that of "group," which not only is recognized in a variety of senses by lexicographers, but also is used with a wide range of meanings by social psychologists. The lexicographer considers a group as: (a) an assemblage of persons in physical proximity considered as a collective unity, e.g., a group by aggregation; and (b) a unity of a number of persons classed together because of any kind of common relation, whether of organization or of commitment. The social psychologist recognizes three kinds of groups: (a) an assemblage of persons in a physical environment; (b) an association of persons with some form of social, political, or managerial organization; and (c) a collective unity of members subscribing to a common symbol or loyalty. Sapir, in the Encyclopedia of the Social Sciences, distinguishes three classes of groups: (a) persons at a football game or in a train; (b) organizationally defined, as having some mutuality of purpose, e.g., employees in a factory or pupils in a classroom; and (c) symbolically defined, as serving some well-recognized function or functions, e.g., family, military staff, or executive cabinet.
A military staff or an executive cabinet, however, achieves its collective unity as a consequence of its members having interacted with one another over a considerable period of time, so that they have developed a tradition of working together for mutual and common purposes. This viewpoint allows one to think of the group as continuously emergent: the longer its members work together, the greater the possibility of developing a more cohesive and more cooperative team. Group cohesiveness, moreover, may be one of the resultants of interaction among a team's members that leads to the development of a group or team "tradition." The social psychologist tends to think of the "group" as having a "tradition," i.e., a cooperative association of individuals whose members have progressed through the stages of coming together in physical proximity, of organizing for common goals, and of accepting commitment for the group's purposes. The members of a traditioned group will have assayed each other as resources and as personalities, will have established channels of communication, and will have achieved mutual reinforcement for the common goal. The traditioned group, therefore, is a functioning unity, functioning for a real and genuine goal. While the world's work is accomplished by many traditioned groups or teams or staffs, such traditioned groups have not been studied extensively, primarily because of the difficulty of access and their unwillingness to have others observe their processes.
Methodologically, it is important for social psychology to develop an understanding of the changing dynamics of the emerging groups. To do this, social psychologists have usually worked with ad hoc groups, i.e., some experimenter has assembled several individuals to work together mutually and cooperatively on some specific and externally assigned task. An ad hoc group, therefore, may represent one end of a continuum of "group" which extends from the just-assembled ad hoc, to the well established, traditioned group. Ad hoc groups, necessarily, will vary in the extent of cohesion that they achieve, as well as in the acceptance of the mutuality of purposes. Each externally designated ad hoc group, therefore, in some more or less tentative way, must organize, test each other's resources, accept the task goal, muster its resources to reach that goal, and then accomplish its end. Such experimental ad hoc groups usually cease to exist when the experimenter's purposes have been achieved. The research use of the ad hoc groups is exemplified in the experiments of Watson (70) and of Shaw (55). They each selected college students at random from the same class to form a group for the experimenter's purposes, and then, only for the duration of the experiment.
A common and dangerous practice is to generalize the principles valid for ad hoc groups to traditioned groups. The ad hoc group is treated as a microscopic model of the traditioned group. This might be true, but has not been experimentally validated. It is equally possible that ad hoc and traditioned groups behave in accordance with their individual principles.
The continuum, therefore, of ad hoc to traditioned groups constitutes an ambiguous and complex semantic range for interacting, face-to-face groups who deliberate to solve problems or produce joint products. In sharp contrast to the continuum of ad hoc to traditioned groups, whose members do interact, are the so-called groups whose individuals do not overtly interact with one another. Rather, these are groups only because their constituent units are in physical proximity. In social psychology, groups by physical proximity have been utilized primarily in research involving an individual's performance in a sociophysical setting of other individuals. Usually, the experiments have been studies of "social facilitation," designed to appraise the psychological consequences on the individual working in a mass or among one's fellows. The sociophysical setting has been termed by some psychologists the "climatized group," but it must be recognized that the group is a group only in the sense of having members in physical proximity.
Three variations in "climatized group" are reported in the research literature. Of these, the type nearest to the real group provides for group discussion of a problem followed by individual judgments or estimations. Such a "climatized group" has interaction among individuals but no measure of group consensus. The jury experiments of Bechterev and Lange (7) and of Burtt (9) illustrate the pattern, e.g., the credibility of the testimony of witnesses is discussed by the "jury," followed by judgment by each individual on the issue.
The second variety of "climatized group" does not provide for discussion, but rather is a sequel to individual evaluation of group judgment or is an evaluation made by some open form of voting, like a show of hands. Gurnee's (23) 1937 experiment employs such a "climatized group": college students first took a true-false examination as individuals, and then repeated the same examination as a group. Group choice was determined by a show of hands. While "visual" interaction by observations of other members' voting behavior was evident, it certainly was not the overt verbal interaction of deliberation.
The third variety of "climatized group" has neither interaction nor consensus. In such a "climatized group," the individual works alone at his task in the presence of other people, as, for example, in the social facilitation experiments of Allport (1) and of Dashiell (15), in which individuals either worked in isolation or in the presence of others, but without any interaction.
The research literature frequently refers to a type of group which is really not a group at all—rather, it is a consequent of statistical computations, i.e., averaging of the products of independent and non-interacting individuals. The "statisticized" group, for instance, was used in the 1924 study of Kate Gordon (21) in which college students as individuals judged weights. These individual judgments, then, were averaged to form "groups" of 5, or of 10, or of 20, or of 50. Since such a statistical "group" neither meets nor interacts, it does not function as a psychological entity. It is of dubious semantic advantage to designate the consequence of such statistical averaging or aggregating a "group" product. Experiments with the "statisticizing" technique may be more appropriately considered as evidence about the reliability of measurement (of one judge versus several judges) rather than about group dynamics. Basically, the "statisticized" group appraises aggregation, not interaction.
Another so-called group is the "concocted" group. It, too, neither meets nor interacts. In the "concocted" group the unique elements of each individual's products are combined to form a so-called group product. One form of "concocted" group is that in which each individual's products are summed to form the so-called group product. A second form is represented in Marquart's experiment of 1955 (42). Individuals' products are combined so that one "solution" is assigned to a fictitious group if at least one of the members working individually arrives at the correct solution. A "no solution" is assigned to the group if none of the "group" members arrives at the correct solution individually.
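The two "concocted" scoring rules just described can be stated compactly. The following is a minimal Python sketch under stated assumptions, not a reconstruction of Marquart's actual procedure: the function names and the sample inputs are hypothetical, and the pooling rule is read here as a union of the distinct items produced by the members.

```python
def concocted_solved(member_solved):
    """At-least-one rule: the fictitious group is credited with a
    solution if any member, working alone, reached the correct one."""
    return any(member_solved)


def concocted_pooled(member_products):
    """Pooling rule (assumed reading): the 'group' product is the set
    of distinct items found across all members' individual products."""
    pooled = set()
    for products in member_products:
        pooled |= set(products)
    return pooled


# Hypothetical three-person "group": only the second member solved it.
print(concocted_solved([False, True, False]))            # True
# Hypothetical idea lists produced separately by two members.
print(sorted(concocted_pooled([["a", "b"], ["b", "c"]])))  # ['a', 'b', 'c']
```

Under either rule the "group" can never score below its best member, which is one reason such a product says little about interaction.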
In the experimental literature, the earliest studies were of the "statisticized" group, followed subsequently and in succession by the "climatized," the "concocted," the ad hoc, and most recently the "traditioned" group. Since the "traditioned" group is most like real-life groups, the development may be considered to have moved along a continuum from artificial ("statisticized") to real ("traditioned").
The varieties of groups may be broadly classified, then, as follows:
1. Interacting, face-to-face group, i.e., involving group meeting and discussion:
a. with a tradition of working together (traditioned)
b. with no tradition of working together (ad hoc)
2. Noninteracting face-to-face group, i.e., involving physical meeting, but no discussion:
a. with a sequel appraisal of group opinion (climatized)
b. with a sequel appraisal of individual opinion (social climatized)
3. Noninteracting non-face-to-face group, i.e., involving no meeting and no discussion:
a. averaging of individuals' performances (statisticized)
b. combining of individuals' performances (concocted)
This broad classification, of course, fails to consider every variant of "group." For instance, the above classification does not give appropriate consideration to the interacting non-face-to-face group which has been used to evaluate the effect of different kinds of interaction networks, or of different kinds of information, on individuals trying to achieve some common end. Some of the network research has had as its dependent variable the quality of cohesiveness or the speed of problem solving; some of the information control studies have been concerned with group and individual satisfaction or success in completing a task. None of the research, however, has been oriented toward group vs. individual comparisons. Therefore, these important studies are not considered in the following discussion.
This review, also, omits studies concerned with group psychotherapy or group discussions designed to develop insights about individuals' attitudes. Group psychotherapy, in some ways, overlaps the "social climatized group" but its members do, or are expected to, interact. The individuals constituting the group are selected by an outside agent because of his belief that they as individuals will be changed by the nature of their interaction in, and with, the group. There is no group goal, but there is an individual objective for each person in it: amelioration of maladjustments and the achievement of self-understanding. Similarly, in some forms of opinion research, the experimenter meets with an assemblage of individuals to elicit a gamut of attitudes and values about an issue. The assemblage meets for the analyst's purpose and not for the individual's, or the group's, objective, although its individuals may achieve some groupness. In group discussions for opinion research, the group provides an atmosphere which tends to facilitate individual contributions for the experimenter, but is not oriented toward either any participant's goals or a mutual task for all of its units.
Therefore, the review is limited to researches contrasting the individual with any of the six kinds of group conceptualized, i.e.: (a) traditioned, (b) ad hoc, (c) climatized, (d) social climatized, (e) statisticized, and (f) concocted. Although some variations may be assigned cavalierly to the nearest broad category, nevertheless, such categorization may aid in reviewing specific studies. As such, the classification may provide a clearer basis for interpretations about conclusions involving the multimeanings of group.
CONSIDERATIONS IN INDIVIDUAL AND GROUP COMPARISONS
History of Subjects
Performances of groups, of course, are contrasted with those of individuals. The individual, too, is a multimeaning concept. The individual may be an executive thoroughly accustomed to making policy or taking action; or the individual may be any young person selected at random from a larger population to participate in an experiment. In the same sense that the "traditioned" group survives because of its joint ability to solve problems, so, too, the functioning executive continues because of his proficiency in setting policy or in making decisions. Comparisons of groups with individuals, indeed, should give full consideration to the similarity of the experiences of the groups and of the individuals. Logically and psychologically the traditioned group should be compared with the functioning executive; the ad hoc group with a random individual selected from the same supply. Researches all too frequently fail to appreciate the significance of, and the need for, equivalence between the groups and the individuals being contrasted in the quality level of their background or responsibility for action. Marston (44) demonstrated this in a study reported by Kelley and Thibaut (30). He showed that in the realm of legal judgments, a collection of untrained individuals may be an inferior judge of events when compared to a trained individual.
What performances should be involved in comparing the group with the individual? Is the group product to be compared with the average of the individual products? with the average individual? with the best individual? or with the "summated" individual? The usual procedure contrasts the average individual with the average group, although studies which contrast the best individual, or the "summated" individual with the best group may lead to different conclusions. Concern with the average disregards the fact that, in general, one (or more than one) individual exceeds the best group and conversely that one (or more) individual does worse than the worst group. Actually other mathematizations may be required to compare individual and group product; perhaps measurements based on probabilistic or other systems.
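How much the verdict depends on the baseline chosen can be shown with a small sketch. The Python below is illustrative only: the function name and the scores are hypothetical, higher scores are assumed to mean better products, and the "summated" individual is left out because its computation is not specified here.

```python
def compare_group_to_individuals(group_score, individual_scores):
    """Set one group score against three individual baselines
    (higher score = better product, by assumption)."""
    average = sum(individual_scores) / len(individual_scores)
    return {
        "beats_average_individual": group_score > average,
        "beats_best_individual": group_score > max(individual_scores),
        "beats_worst_individual": group_score > min(individual_scores),
    }


# Hypothetical data: one group scoring 7, four individuals working alone.
verdict = compare_group_to_individuals(7, [4, 5, 6, 9])
print(verdict)
# {'beats_average_individual': True, 'beats_best_individual': False,
#  'beats_worst_individual': True}
```

With these invented figures the group is "superior" against the average individual and "inferior" against the best one, so a study's conclusion should always name the baseline it used.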
In such comparisons, motivation, too, is often ignored. The view frequently cited in the literature is that meeting in a group stimulates participation and discussion, as well as interest in, and acceptance of, the experimental task. For instance, if, in a group of five members, two or more reject the experimental situation or task, the group may still emerge with some final product. By contrast, however, when an individual is not motivated to accept a situation or a task, there is no product, or certainly not a representative one from such an individual. Some product of a group, therefore, may exceed that of an individual, primarily because of differential task acceptance.
In addition to differential motivation, there is also the possibility of differential acceptance of the responsibility in the experimental situation. The degree to which a feeling of responsibility for the decision affects the content and quality of group, and of individual, decisions is not known, but should be recognized in evaluating results. Obviously, the effect of responsibility in experimental situations cannot approximate the real situation.
Not only do "group" and "individual" vary in setting and in motivation, but also a partial confounding, in the statistical sense, exists between task and kind of group. For instance, studies using "statisticized" groups tend to use tasks requiring estimating or judging. "Estimating" refers to estimating the number of items, the length of lines, the weight of substances, things perceptible to the senses, e.g., Bruce's (8) Ss estimated numerosity of buckshot on a card, or Schonbar's (53) Ss estimated line length. The "climatized" group tends to be primarily used in "learning" experiments on improvement in knowledge of subject matter or the mastery of a skill, e.g., Gurnee (24) measured improvement in the mastery of a maze.
The ad hoc group most commonly is used in studies in "problem-solving." "Problem-solving" is taken to mean the thinking out of the correct answer to a problem. Shaw's comparison (55) of problem-solving by individuals and by ad hoc groups is illustrative. Her problems were mathematical puzzles for which there is a known (or, at least, a knowable) solution. Such problems are less characteristic of life situations since few real situations occur for which the solution or the decision is known or completely knowable. The trend in research has been away from the puzzle with a Eureka solution to more realistic problems, where adequacy or goodness (not correctness) is the criterion, e.g., Maier's (41) human relations problem.
In this review, the range of problems from the puzzle to the human relations situation to the policy decision has been distinguished. While it is difficult to state wherein the processes needed to solve puzzles differ from those required to establish policy, it is felt that the nature of the potential feedback is not the same for all kinds of problems. Eureka problems can be evaluated as right or wrong, but human relations problems must be evaluated in terms of relative goodness: the range of considerations in the solution is evaluated, e.g., Maier's (40) parasol assembly problem. His problem had no correct or unique solution for adjusting the slow worker who is the bottleneck in an assembly line; rather, the several alternative plans for action must be appraised for "elegance" in terms of likely consequences.
The truly "traditioned group," as such, has not been used in researches on the quality of group product, although some approximations have been made in studies of quantity of "productivity," e.g., Coch and French (11) measuring output of factory workers. The multiplicity of different experimental tasks leads to a multiplicity of criteria. In "estimating," the criterion is the true order or true number; in mathematical problems or in puzzle solving, it is the right answer; in "learning," it is improvement; in "judging," and in complex problems, it is consensus of experts about the order of merit of the material or the quality of the decisions.
Studies of problem solving vary not only in nature of the group used, the kind of problem or task worked on, and the criterion, but they vary also in concern with the side effects of group participation. These include personal gains from the experience, commitment to decision, personality development or personal growth in empathy for the needs and feelings of others. This review is primarily concerned with studies estimating the quality of group, and of individual, products (although relevant studies of side effects will be considered). The review excludes group process as such, emphasizing only those studies contrasting the quality of the product from group interaction with the quality of the product by the individual.
Undoubtedly, there are many situations for which side effects are the major concern, and for which a group dynamics program has been instituted. Yet, in the military, in education, and in industry, quite frequently it is the quality of the group product that is the major desideratum, and frequently such side effects as commitment, morale, or feelings of participation are of less importance. One common inadequacy of all reviewed studies is that of Ss. Most tend to use any Ss available. The absence of studies with truly functioning groups contrasts sharply with the very large number that use college students. Among the major findings of this review of experimental work in group products has been the recognition of the relatively narrow base for the consistently broad generalizations about groups in complex situations, not only because of the narrow range of the kinds of Ss but also because of the narrow assortment of puzzles, games, riddles, and judgmental tasks. Insofar as the generalizations derive from the "statisticized," the "climatized," and the "ad hoc" group rather than from the functioning "traditioned" group, the generalizations in the main are founded on the behavior of college students with their less certain motivations and responsibilities rather than on the behavior of adults working under the genuine tensions and pressures of life. Thus, the generalizations will be limited and possibly not too realistic. Generalizations, psychologically, may be limited by kind of group, nature of population, the kind of task, and the basis of estimating correctness, goodness, or adequacy.
Judgment, in its long use in psychology, has often been used in research to contrast group with individual products. One type is exemplified in the work of Sherif (56), where the individual qua individual makes judgments in the presence of others to get an estimate of the effect of the group setting either upon the group's judgment or upon that of the individual. Another type contrasts the quality of judgments by the group with those by individuals to the same stimuli. In many reported contrasts of the judging of the "group" with the "individual," the "group" in the dynamic sense never existed; rather, it was an average of several judgments made by noninteracting and separate individuals, i.e., a "statisticized" group (3).
Judgments by Statisticized Groups
The earliest use of a "statisticized" group was by Hazel Knight in 1921 (33). In her best-known experiment, college students estimated the temperature of a classroom. The judgments of the individuals ranged from 60° to 85°; the "statisticized" group judgment was 72.4°, approximating the actual room temperature of 72°. The "statisticized" group judgment was better than 80% of the individual judgments, even though 20% of the latter were as good as, or superior to, the "statisticized" group. In a distinctly different experiment, the Ss, in the absence of any other information, ranked 12 children for intelligence from their photographs. Each S ranked the children independently; then a "statisticized" group rank order was obtained. The "group" rank order did not correlate with actual intelligence test scores any better than the individual rank orders. Finally, an ad hoc group of 10 Ss met together to discuss each photograph in order to obtain ranks by an interacting deliberative ad hoc group. The ad hoc group rank order was significantly more accurate than that of either the individual or the "statisticized" group ranking. While just one ad hoc group was too small for generalization, Knight developed a new approach to group versus individual judgment.
In 1923, Gordon (20, 21) began publication of her series of studies extending Knight's technique of the "statisticized" group. College students ranked weights appraised against the criterion of true order. The average of 200 correlations for that many individuals was .41. Averaging any five random individual rankings at a time, she obtained 40 "statisticized" group rankings which correlated with true order .68. Similarly "statisticized" groups of 10, 20, and 50 were computed. For four "groups" of 50, the average correlation rose to .94. Gordon, nevertheless, reports that among her 200 individual correlations, five were at least as high as .94. Her primary conclusion was that "results of the group are distinctly superior to the results of the average member and are equal to those of the best member."
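The mechanics of forming "statisticized" groups from individual rankings can be sketched in a short simulation. The Python below is an illustration on synthetic data, not a reanalysis of Gordon's records: the noise model (true value plus independent Gaussian error), the noise level, the number of items, the seed, and all function names are assumptions chosen only to make the procedure concrete.

```python
import random


def ranks(values):
    """Rank positions (1 = smallest) for a list of distinct values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for position, index in enumerate(order, start=1):
        result[index] = position
    return result


def pearson(xs, ys):
    """Product-moment correlation between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def simulate_rankings(n_judges=200, n_items=10, noise_sd=6.0, seed=1):
    """Synthetic judges: each perceives true value plus random error,
    then ranks the items by what was perceived."""
    rng = random.Random(seed)
    true_order = list(range(1, n_items + 1))
    rankings = [
        ranks([t + rng.gauss(0.0, noise_sd) for t in true_order])
        for _ in range(n_judges)
    ]
    return true_order, rankings


def mean_correlation(rankings, true_order, group_size):
    """Average the rankings of each successive block of judges, correlate
    each averaged ranking with the true order, and average the results."""
    correlations = []
    for start in range(0, len(rankings) - group_size + 1, group_size):
        members = rankings[start:start + group_size]
        averaged = [sum(column) / group_size for column in zip(*members)]
        correlations.append(pearson(averaged, true_order))
    return sum(correlations) / len(correlations)


true_order, rankings = simulate_rankings()
results = {k: mean_correlation(rankings, true_order, k) for k in (1, 5, 10, 20, 50)}
for size, r in results.items():
    print(size, round(r, 2))
```

No judge in this simulation ever meets or influences another, yet the correlation with the true order climbs as more rankings are averaged. The rise is produced by the averaging alone.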
The Knight technique of statisticizing group judgments was used by Smith in 1931 (59). He developed groups of 5, 10, 20, and 50 undergraduates who worked individually on the task of judging personality and behavior traits of children from written reports of their behavior. Although the correlations against the criterion (Smith's own judgment of the correct order) increased as size of group became larger, the increase was not as great as in Gordon's study. The average correlation based on 50 individuals was .37, versus .51 for the one "statisticized" group of 50. Six individuals exceeded the "statisticized" group correlation. Smith attributes the low correlations to the great number of, as well as to the ambiguity of, the traits, rather than to shortcomings in group judgment. Judgments of weight as well as the numerosity of buckshot were made by Bruce's (8) Ss in 1935. The average of the 120 correlations for individuals with actual weights was .50; the average for two "statisticized" groups of 60 was .88. For the visually-presented buckshot, the average of the 120 correlations for individuals was .82; for the two "statisticized" groups of 60 it was .95.
Eysenck (16) used "group" techniques in 1939, when his Ss judged the beauty of 12 pictures against the criterion of the average judgment of 700 students. An experimental group of 200 was selected from the same college population. The average judgment of the entire 700 was considered the "expert." Correlations for the 200 individuals against the "expert" averaged .47; four "statisticized" groups of 50 averaged .98; and for one "statisticized" group of 200 the correlation became unity. Eysenck also reported a table of the increment in correlations as a function of number of judges utilized in "statisticized" groups.
In 1945, Klugman (31), using the Knight method, had high school students judge the number of several kinds of items in a bottle: "familiar" (jacks, marbles) and "unfamiliar" (lima beans, marrow beans). For the unfamiliar items, the one "statisticized group" of 60 was significantly closer to the true value than was the average of the individuals. On the familiar items, by contrast, there was no significant difference. Klugman concluded "when items are unfamiliar group judgment is significantly better than most individuals while on familiar items only a tendency appears."
Soldiers estimated the dates of the ending of the war with Germany and with Japan in another Klugman study (32). For the German armistice, of the 109 individuals who were tested, 27 were closer to the actual date than the "group" mean; and, for the Japanese armistice, 59 were closer. For the German armistice, he found a significant difference between the percentages of individuals with errors greater than the "group" error as contrasted with individuals with errors less than the "group" error. This difference for the German armistice, he "interpreted to mean that the group judgment was better." For the Japanese armistice, however, there was no such significant difference, though the direction favored the individuals. In conclusion, Klugman quotes Poffenberger: "one cannot say categorically that a group opinion will or will not be better than the opinions of the individuals that comprise the group."
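Klugman's comparison rests on a simple tally: how many individuals come closer to the true value than the "group" mean does. On the counts reported above, 27 of 109 is about 25% and 59 of 109 is about 54%. The Python sketch below shows the tally itself; the function name and the six judgments are hypothetical, and how Klugman handled ties is not stated here, so ties are counted separately.

```python
def tally_against_group_mean(judgments, truth):
    """Count individuals whose error is smaller than, equal to, or
    larger than the error of the mean of all the judgments."""
    group_error = abs(sum(judgments) / len(judgments) - truth)
    closer = sum(1 for j in judgments if abs(j - truth) < group_error)
    tied = sum(1 for j in judgments if abs(j - truth) == group_error)
    farther = len(judgments) - closer - tied
    return closer, tied, farther


# Hypothetical estimates of a quantity whose true value is 72.
estimates = [62, 68, 72, 75, 80, 87]   # mean is 74, so the group error is 2
print(tally_against_group_mean(estimates, 72))   # (1, 0, 5)
```

A "group" mean can thus outperform most members while still being beaten by some of them, which is the pattern Knight and Klugman both report.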
Not until 1932 were the obvious defects of Knight's so-called "statisticized" technique criticized. Stroop (62), after verifying Gordon's results by repeating her experiment, then adapted it by requiring just one individual to make 50 separate judgments, i.e., his four "statisticized groups" of 50 were four individuals who had each made 50 judgments of the same stimulus. When he combined 5, 10, 20, and 50 judgments of the same individual, he obtained correlations with the criterion nearly identical with those that Gordon reported for combining 5, 10, 20, and 50 judgments of different individuals. Stroop argued that Gordon's results, rather than demonstrating the social psychology of "grouping," merely illustrated an obvious statistical principle of reducing the error variance.
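The statistical principle can be made explicit under a simple model, offered here as an assumption rather than as Stroop's own computation. Suppose each judgment equals the true value plus an independent random error of equal variance. If a single judgment correlates r with the criterion, the mean of n such judgments is expected to correlate r·√n / √(1 + (n − 1)·r²). The Python sketch below (the function name is hypothetical) applies this to Gordon's reported individual average of .41.

```python
import math


def predicted_group_correlation(r_single, n):
    """Expected correlation with the criterion for the mean of n
    judgments, assuming judgment = true value + independent error."""
    return r_single * math.sqrt(n) / math.sqrt(1 + (n - 1) * r_single ** 2)


for n in (5, 10, 20, 50):
    print(n, f"{predicted_group_correlation(0.41, n):.2f}")
# 5 0.71
# 10 0.82
# 20 0.90
# 50 0.95
```

The predicted values for 5 and 50 judgments, .71 and .95, lie close to the .68 and .94 that Gordon reported. Nothing in the formula depends on whether the n judgments come from different persons or from one person judging repeatedly, which is the substance of Stroop's objection.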
Farnsworth and Williams (18) in 1936 demonstrated that Knight's results were unrelated to the fact that the individual made estimations in a group setting. They repeated her experiment in every detail except that each individual estimation was made in isolation. The accuracy of their "statisticized group" results was almost identical with Knight's. In another experiment, they attempted to show that improvement by "grouping" was not a general principle but only applied to judgments about familiar material. Using the size-weight illusion, subjects hefted two boxes and then estimated the weight of a third constructed to be lighter than either of the others although larger in bulk. The estimations were made by individuals from whose data the "statisticized group" estimations were computed. These "group" estimations did not approach the true value, leading Farnsworth and Williams to conclude that when the material is unfamiliar, distorted in a way such that all individuals are prone to make similar errors of estimation, the "statisticized" group estimation is not likely to be any closer to the true value than are individual estimations. Klugman's first study (31), indeed, was an experimental investigation of the Farnsworth-Williams generalization, but, contrary to Farnsworth and Williams, he found that when Ss are unfamiliar (as he defines "unfamiliarity") with the object to be estimated, the "group" estimation is significantly different from that of the individual.
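The Farnsworth-Williams point, that averaging cancels scattered errors but not an error shared by every judge, can be illustrated with a few invented numbers. The Python below is a sketch only: the true value, the estimates, the size of the shared error, and the function names are all hypothetical.

```python
def group_error(judgments, truth):
    """Absolute error of the mean of the judgments."""
    return abs(sum(judgments) / len(judgments) - truth)


def mean_individual_error(judgments, truth):
    """Average absolute error of the judges taken one at a time."""
    return sum(abs(j - truth) for j in judgments) / len(judgments)


truth = 100
scattered = [90, 95, 100, 105, 110]        # errors fall on both sides of the truth
shared_bias = 20                           # hypothetical error common to every judge
shifted = [j + shared_bias for j in scattered]

print(group_error(scattered, truth), mean_individual_error(scattered, truth))  # 0.0 6.0
print(group_error(shifted, truth), mean_individual_error(shifted, truth))      # 20.0 20.0
```

In the first case the "group" mean is exactly right although the typical judge is off by 6. In the second, the shared error passes straight into the mean, and the "group" does no better than its average member.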
Despite Gordon's defense in 1935 (22), with its contention that "mere grouping ranks does not produce correlations," critique of her methodology was continued by Dashiell in 1935 (16), Preston in 1938 (49), and Smith (58) in 1941. The recent criticism has emphasized that, regardless of the statistical argument, experiments in which groups never meet can add little to understanding group process in social psychology. Preston (49), for instance, suggests that, notwithstanding what the Gordon results do show, they give no evidence either for psychological process or for group interaction. These studies have been cited not only because the technique has been used widely, but also because the Gordon and the Knight studies in particular are used as evidence for the values of group process.
Judgments by Interacting Group Members
What are the results from judgments by groups with genuine interaction among their members? The earliest study is that of H. E. Burtt (9) in 1920 with testimony. Individual Ss heard testimony by "stooges," some of whom were lying and some telling the truth. Each S judged which "stooges" were truthful and which were not. The individual votes were tallied and announced immediately. Ss then were constituted as a total interacting group of "jurors," who, after discussing the testimony, voted again as individuals. On the first vote, 48% were correct; after discussion, the percentages were not different. Of the 25 shifts in vote, 14 were in the right, and 11 in the wrong, direction. Burtt concluded that while discussion alters judgments, it does not necessarily improve them.
Dashiell (15) reports a study by Bechterev and Lange in 1924 in which individual judgments were made for a variety of tasks, ranging from the time interval between two sounds to the justification for a man beating a boy who had stolen from him. After the individual judgments had been made and summarized, the results were presented for discussion; after that, a second individual judgment was made. Their results seem to be consistently in favor of post-discussion judgments. Bechterev and Lange maintain that the group process is beneficial for all individuals, although those who have less to offer the group gain most by it. Dashiell states that Bechterev and Lange's report is not clear about the extent of actual discussion or even if the reading of summaries itself was the discussion.
In 1932, Arthur Jenness (29) investigated the effect of discussion in ad hoc groups on the accuracy of individual judgments. Ss as individuals estimated the number of beans in a bottle; then they discussed the estimates in ad hoc groups of three, and made a group estimate; and, finally, made a second postgroup individual estimate. In two different experiments, Jenness formed the ad hoc groups in different ways: in one, the individuals were chosen to make for maximum disagreement in the groups; in the other, the individuals were chosen to assure maximum agreement. With ad hoc groups selected for maximum disagreement, group estimates were less accurate than the average of individual estimates had been, but their individual post-discussion judgments were better in 20 of 26 instances, with a 60% average reduction in error.
In a control experiment, in a class of individuals who as individuals made two estimates without any intervening discussion, there was an average reduction in error of 4%. When the ad hoc groups were selected for maximum agreement, however, the group estimates were more accurate than the first individual estimates had been, but the post-discussion individual judgments were not significantly different from the control. In a fourth aspect of the experiment, after the initial individual judgments had been made, the results were read to the class who were then allowed to form groups as they wished. The results parallel closely those for groups selected for maximum disagreement. Jenness (29) concludes that discussion does not make group estimates more accurate, but stresses the importance of the knowledge of difference among judges in improving group judgments. He also introduced a method by which to estimate gain from group participation, i.e., the gain made by the individual subsequent to the group result, an appraisal too frequently ignored in the experimental literature despite the fact that it is usually suggested as an advantage of group process for education and for industry.
Judgments by Noninteracting Group Members
In 1937 Herbert Gurnee (23) attempted to evaluate only the effects of discussion by contrasting the judgments of individuals with those of noninteracting face-to-face groups with a sequent measure of group opinion. Individuals made their judgments on a written true-false examination. The same statements then were put to them in groups of 53, 57, 66, and 18 where each judgment was made by acclamation, with a show of hands when necessary. In each experiment, the group was better than the average individual, and approximately equal to the best individual. Gurnee computed "statisticized" groups but found that four of his five face-to-face groups were superior to their statisticized computed results. He reports a social influence upon the doubtful, in that those who were more certain of their judgments often carried the doubtful with them. His general conclusion was that although the group will be superior, the amount is unpredictable since the amount of gain depends on how well the individuals in it will do, since a task difficult for its constituent members will also be difficult for the group.
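The comparison between a "statisticized" group and its individual members can be made concrete with a small sketch. It assumes that the statisticized judgment on a true-false item is simply the majority of the individual written answers; the answer key and the five judges below are hypothetical and are not Gurnee's data.

```python
from collections import Counter

def statisticized_judgment(votes):
    """Return the majority answer among individual true/false votes (None on a tie)."""
    counts = Counter(votes)
    true_n, false_n = counts[True], counts[False]
    if true_n == false_n:
        return None
    return true_n > false_n

def score(answers, key):
    """Count how many answers agree with the key."""
    return sum(a == k for a, k in zip(answers, key))

# Hypothetical answer key for three statements and five hypothetical judges.
key = [True, False, True]
individual_answers = [
    [True, False, False],
    [True, True, True],
    [False, False, True],
    [True, False, False],
    [True, True, True],
]

# Pool the individual answers item by item, without any discussion.
pooled = [
    statisticized_judgment([answers[i] for answers in individual_answers])
    for i in range(len(key))
]

individual_scores = [score(answers, key) for answers in individual_answers]
pooled_score = score(pooled, key)
```

In this constructed case every judge misses a different item, so the pooled majority is right on all three statements while each individual is right on two. The pooled result depends entirely on how the individual errors are distributed; when most judges miss the same item, the majority misses it as well.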
The improvement in accuracy of group judgment was demonstrated by Rosalea Schonbar (53), who reported that pairs of Ss were more accurate in estimating line length than were individual Ss, since the pairs seemed effective in cancelling out over- and underestimates in interaction.
In his review, Dashiell (15) reports an experiment he conducted. He compared the written reports of two different witnesses to a staged classroom incident with those written by legal psychology students after hearing an oral version of the incident. The legal psychology students first reported as individuals their conception of the original event, and then made a subsequent report as a group. None of the seven students as individuals gave an account as complete as either of the two witnesses; most of the individual reports were intermediate between those of the two witnesses in accuracy. The group report was less complete but more accurate than either witness and all but one of the seven individuals.
What generalizations can be made about group and individual judgments? Generalization is more difficult than the earlier work based upon the "statisticized" groups had implied. Increase in accuracy of judgment is not obtained by the simple expedient of convening people into a group. For the results of Farnsworth and Williams (18) and Klugman (32) have shown that for some types of material a group judgment does not differ significantly from that of the average individual; and Jenness (29) as well as Gurnee (25) indicated that group superiority depends upon the quality of the judgments, and the range of judgments of individual members of the group. At best, group judgment equals the best individual judgment but usually is somewhat inferior to the best individual. Bechterev and Lange (7) have shown that the individuals making the poorer estimates benefit more from interaction in a group than those making the better estimates. Verbal interaction, however, does not seem essential to improvement, for Gurnee (25) as well as Bechterev and Lange (7) obtained significant improvement without discussion. Regardless of the shortcomings of the "statisticized" group technique, the experiments with face-to-face groups also showed improved group judgment. This predicted superiority of groups is more probable when the material is unfamiliar or when there is an extensive range of opinion in the group.
In using learning as a basis for contrasting groups and individuals, the researches usually are less rigorous than those based upon judging or estimating. The lack of rigor comes from the ambiguity of terms, e.g., "the lecture system," "class discussion," and "study group," as well as the semantic confusion in "group" and "individual." Furthermore, the concepts of "change," "improvement," and "growth" have no reference either for "greater" or "more" or in statistical significance. Many reports are more testimonials by classroom teachers for methods they have used than experiments. As is usual in evaluating methods of instruction, variability in the quality of instructor or of the instruction is ignored.
G. Ryan (52) made one of the first studies using learning as a basis for evaluating group and individual achievement. She divided each of four college levels into equivalent halves for intelligence. One of each of the equivalent halves studied English and education as individuals for a six-week period with the instructor available for consultation; the other studied as a "regular" class. At the end of the first six weeks, the halves reversed roles; and for a final six weeks both halves studied together as a "regular" class. At the end of each six weeks' period, comprehensive achievement tests were given with the general result that those who had studied as a "regular" class did better. Despite such results, however, Ryan concludes that when time spent on study is equated, independent study was superior for freshmen, sophomores, juniors, and seniors and for all ability levels. This interpretation is based on the assumption that independent study took less of the instructor's time than did class instruction. Ryan seems to gloss over significant aspects of her results by implying that one goal of education is saving the teacher's time.
In 1925 Bane (5) reported results comparing the "lecture method" with the "class discussion" technique. Ss were college students in education and psychology. In each of five "experiments" those taught by "class discussion" did significantly better on tests of delayed recall. On immediate recall, however, three did worse.
Using two equivalent sections of students, in 1926, Barton (6) gave each the same preliminary instruction on first-year algebra problem solving. One section was assigned new problems to solve as individuals; the other solved the new problems using class discussion. Two posttests of problem solving in algebra favored class discussion.
In 1928 Spence (61) compared the efficiency of learning by the "lecture" system with the "class discussion" system. Two large classes of approximately 150 Ss each were compared. The first section took an initial test, studied under the "lecture" system for one semester, took a second test, then studied under the "class discussion" system during the second semester, and took a final examination. The second section followed the same schedule, except it reversed the order of "lecture" and "class discussion" study. The test results indicate superiority for students in "lecture" classes. During the first semester those who had the usual "lectures" forged ahead. During the second semester, those who previously studied under the "class discussion" method made up the lost ground. The large size of these classes should be borne in mind. These results may be valid for extremely large classes, but different results may be obtained as class size varies.
The three following experimenters agree on the beneficial side effects of "class discussion" in comparison to the traditional "lecture" method, but disagree as to which is the superior learning system. Thie (64), using high school English students, contrasted two equivalent halves on the basis of ability, one half working as a usual class with instruction by lecture, and the other studying in groups of five members each. Improvement was measured by difference between pre- and post-term scores on a reading test and on the writing of an original paragraph. Not only did the half that had studied in groups show greater improvement on both tests, but, in addition, these students showed greater gain in self-sufficiency as appraised by amount of voluntary work, by individual activity, and by reported enjoyment of the course. Of the 24 students in groups, 16 registered for another term of English, in contrast with but one of the 24 of the class. Though the so-called measures of self-sufficiency are not adequately defined, Thie was one of the first to suggest that the benefits of small group techniques in the classroom may be underestimated when evaluation neglects side effects and emphasizes content achievement only.
In 1927 Zeleny (73) made much the same point in the first of his studies on the discussion-group method of teaching with college students in sociology. The experimental classes were formed into groups of seven who were given written assignments and a syllabus. The instructor gave help to the seven-member groups as needed. The control classes were taught by "traditional lecture." On terminal tests of factual knowledge and of opinion there were no significant differences between the group-discussion method and traditional lecture method. Zeleny suggests that the expected values of group discussion were not in content mastery but rather in more teacher-student cooperation, increased mutual tolerance of each other's views, and better working together with others without sacrifice in subject mastery.
In a second study of group learning, Zeleny (74) in 1940 matched two classes for age, sex, intelligence, and subject-matter proficiency. These were taught by the same instructor: one by lecture-recitation, the other in discussion groups of five students each. On gains in content knowledge, there were no differences between the two. Groups, however, were superior to those who had lecture recitation, in participation, in personality development, in social adjustment, and in cooperation. Essentially, the results corroborate Zeleny's earlier suggestion that the advantage of group techniques for school learning is more in personality changes than in mastery of academic skills and knowledge.
In 1951 Asch (2) conducted an experiment to compare the over-all effectiveness of nondirective teaching (group participation) using the counseling methods of Rogers, Combs, Snyder, etc., to the usual lecture method. Four undergraduate sections of general psychology served as Ss. The experimental section was informed that no tests or final examination would be required. The control sections worked toward a final examination. The groups were compared for knowledge of subject matter, social attitudes, emotional adjustment, and the over-all evaluation of the course. The results indicated that the control group was superior to the experimental group in knowledge of subject matter. However, the two groups were not similarly prepared to take the final examination. On the personal evaluations of the course by the students, a number suggested that nondirective teaching encourages greater amounts of outside reading, stimulates thinking about basic conceptual material, and makes for more independent decisions based on the knowledge of many individuals and not just one "authority." No differences were found between the directive and nondirective groups concerning their social attitudes as measured by the Social Distance Score. A comparison of MMPI scores indicated that the nondirective group improved to a significantly greater degree than the control group in emotional adjustment. Finally, an analysis of the Course Evaluation Forms completed by each S indicated that the Ss felt that the experimental section was more helpful in teaching the subject matter than did the members of the control group.
In 1937 Gurnee (24) reported the first of two "learning" experimental studies which, though more rigorous in control of basic variables, used somewhat less realistic problems. Half of his Ss worked as individuals, while the other half worked in groups averaging 10 members. The task was to learn a maze. Individuals were to concentrate on eliminating errors without concern for time; groups voted each step in the maze by acclamation. Groups and individuals had six trials. Groups did significantly better than individuals by having fewer errors and completing the first perfect trial sooner. Gurnee, then, tested all Ss as individuals on a seventh trial. He did not find any significant difference as a result of the two different kinds of experience. His results may be subsumed under Jenness' generalization that when groups are in agreement, group members will not improve as a function of group experience.
The next year Gurnee (25) reported a similar study with quite different results. In addition to the same maze, he added the learning of the arbitrarily correct number of 20 pairs of two-place numbers each in the course of six oral trials. Individuals were contrasted with groups, then a seventh written trial was given to all as individuals. On the seventh trial, those who had worked in groups made significantly fewer errors than those who had worked the first six trials as individuals.
Moore and Anderson (45) contrasted the learning of ad hoc three-man groups with that of individuals in applying some of the laws of the calculus of propositions in order to solve 10 different symbolic logic problems. The results are based on six individuals and six matched groups. In general, none of the differences is significant; i.e., the number of steps taken, of errors made and of time to solution did not differ statistically between individuals and groups. There was a greater tendency, however, for individuals to repeat steps, suggesting that the members of the group remember steps taken. It is not surprising that estimates of variance for groups and for individuals usually are not significantly different, but the direction is always for greater variance among individuals. Nevertheless, in the use of symbolic logic Moore and Anderson have introduced a novel learning task for use by psychologists.
Contrasting the results by groups and by individuals in "learning" suggests quite amorphous generalizations. Spence indicated that the lecture system is superior to the class discussion system for large classes. Ryan's results agreed. Asch's results, in a narrow sense, must be similarly interpreted. Thie, on the other hand, found that under his experimental conditions "class discussion" produced significantly better learning than the "lecture" method. Ryan found class discussion superior for certain types of learning, and inferior for others. Zeleny and Gurnee found no significant differences between the two forms of learning. These amorphous results suggest several explanations, the most likely of which is that these experiments were conducted under such varying conditions that seemingly diametrically opposed results are understandable. For example, the size of groups can be expected to have a profound effect on results. It is known that as group size increases, individual involvement decreases, and inhibition increases. Large discussion groups, therefore, might be expected to produce less learning than smaller groups. Other factors such as announced goals, subject matter, methods of measuring improvement, etc., can be expected to have profound effects on the results. This indicates a serious need for additional experimentation which can control important conditions.
Social facilitation refers to the effects on an individual of working at a task in the presence of other individuals but independently of them, i.e., not interacting with them, although they may be face-to-face in an audience or classroom. In social facilitation experiments, there is no interaction or cooperation, and, often, no expressed feeling of rivalry, although competition may affect the results. Allport (1) used graduate psychology students, who, in a first period, took a free association test alone, then, in a second period, took it individually in "groups" of three to five persons. Fourteen out of 15 individuals produced more words while working in the social setting than when working alone, although the differences were not significant statistically. Allport found that the effect of the so-called co-working "group" on individual productivity was to increase quantity but decrease quality of the associated words. He concluded that some tasks may be better done alone than in groups.
His basic technique was used through the years by Sims (57), Sengupta and Sinha (54), and others with almost no variation in results. The assigned tasks were always repetitive and meaningless, e.g., letter cancellation in running text, etc. In Sengupta and Sinha's study, for example, in Ss that worked at a task for nine days, output did not vary much after the third day. Upon changing the work situation into a social setting, output rose significantly until restabilized at a second but higher level. Mukerji (46) found that with children doing letter cancellation and letter-naming, almost 90% of the individuals had superior outputs in the social setting, but that oscillation in production was greater when performing in groups.
In 1952 Wapner and Alper (69) described a restraining force as a result of an audience. One hundred twenty Ss were tested in three varying situations. All were asked to select one of two words which best fit a given phrase. In the first situation only the S and the experimenter were present. In the second situation, the S and the experimenter were present, but the S was informed that an "unseen" audience was listening to and watching his performance. In the third situation, the S and the experimenter were present with a seen audience. Either task-oriented or ego-oriented instructions were given the Ss. In the task-oriented instructions the Ss were informed that the material rather than the S was being studied. In the ego-oriented instruction the Ss were informed that the task was a form of personality test and that they, rather than the task, were being evaluated. The results indicate that the time to make a choice was longest in the presence of an "unseen" audience under both forms of instructions; next longest in the presence of a "seen" audience; and shortest when there was no audience. The significant differential effects of the audience variable occurred for the first half of the experimental sessions only. Items with personality references yielded longer times than neutral items. Contrary to the expectations of the experimenters there were indications that time to make a choice is longer for task-oriented than for ego-oriented instructions.
An extensive inquiry into the effect of co-workers on productivity is reported by Dashiell (14). He investigated the conditions of working: (a) alone; (b) together but non-competitively; (c) together and competitively; and (d) alone, but under observation. Speed increased for each of the three tasks (multiplication, serial association, and a mixed relations test), particularly between conditions (a) and (d). Accuracy, on the other hand, was much more evenly distributed among the four conditions, with the only clear difference in the "observation" condition, for which the work was least accurate although the greatest amount was produced.
Kelly and Thibaut (30) reported a study by Wyatt, Frost, and Stock (72) in 1934 which indicated that in real life situations involving work of a highly repetitive nature, social facilitation effects have been found consisting of closely similar production curves for employees working together. The authors found that workers' rates of output varied with the output of others in the work group. This relationship was particularly close for pairs of workers seated opposite each other, and was somewhat more marked the more visible and the more measurable the output. When individual workers were subsequently isolated, the correspondence between their work and that of the others disappeared.
Hilgard, Sait, and Magaret (27) indicated that not only can actual production be affected by social facilitation, but also level of aspiration. The Ss worked in groups of three to six members. They individually worked on successive subtraction of three-place numbers. The material was graded in difficulty in order to produce experimental differences in success. After the first experimental session, when all Ss' scores were known to each other, they were asked to estimate their future performance. Those Ss ranking superior in relation to their social group tended to estimate their future performance too low, while those Ss making inferior scores tended to estimate their future performances too high. Though the critical ratios were low and caution was recommended in interpretations, the trends within the groups were clear. The authors speculated that the desire for social conformity might well produce this regression of predicted scores toward the mean.
To a degree, the influence of co-working members, indeed, may be stronger in a group interacting for a common objective. The feeling of ego-involvement in the group's product may be a significant factor as in problem solving for a group result.
In problem solving, few experimental studies contrast the quality of solutions by groups and by individuals. Most results seem to be byproducts of investigations of the problem-solving process. Nevertheless, these few studies are those most frequently cited as evidence of group superiority, e.g., Watson's (70) comparison of groups and of individuals in problem solving. He used ad hoc groups of college students given the task of making as many shorter words from the letters of a larger word as possible within a time limit. For the first trial, Ss worked as individuals; for the second and third trials they worked in 20 ad hoc groups of from 3 to 10 members; and for a final fourth trial again as individuals. The group product, i.e., the number of different words, was significantly larger than that made by the best individual and thus, obviously, larger than that of the average individual. When Watson (70) formed what may be called the "concocted group" or "summated individuals," i.e., added together all the different words in the first trial made by the individuals comprising the groups in Trials 2 and 3, he found the average "concocted group" product significantly larger than the average ad hoc group product. Even though the average product of the ad hoc groups significantly exceeded the product of the average individual or of the best individual, nevertheless it was significantly inferior to the full resources of all of its individual members. Group interaction may inhibit the fullest potential contribution by its members. Indeed, the superiority of the "concocted" group over the interacting ad hoc group suggested such an inhibition.
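Watson's "concocted group" amounts to taking the union of the distinct words produced by members working separately. The following is a minimal sketch of that pooling, assuming duplicate words count once and case is ignored; the word lists are hypothetical and are not drawn from Watson's materials.

```python
def concocted_group_product(individual_word_lists):
    """Return the set of distinct words produced by members working separately."""
    pooled = set()
    for words in individual_word_lists:
        # Normalise case so that "Tan" and "tan" count as one word.
        pooled |= {word.lower() for word in words}
    return pooled

# Hypothetical first-trial products of three members of one later ad hoc group.
member_words = [
    ["ant", "tan", "neat"],
    ["tan", "ten", "net"],
    ["eat", "ant", "tea"],
]

concocted = concocted_group_product(member_words)
concocted_size = len(concocted)

# Product of the best individual, for comparison with the pooled product.
best_individual_size = max(len(set(words)) for words in member_words)
```

Here the pooled product contains seven distinct words against three for the best member. Watson's finding was that the interacting ad hoc group fell between these two figures: above the best individual, but below the pooled total.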
In a subsequent study Watson (71) evaluated group and individual superiority on nine different tasks: finding antonyms, solving a cipher, drawing conclusions from stilted facts, completing sentences, listing steps in problem solving, composing limericks, comprehension of reading, and an intelligence problem. There were three equivalent forms of each task; Ss first did one form as individuals; second, another in ad hoc groups; and third, the remaining form as individuals. On all nine tasks, the average achievement of groups was superior to that of individuals; the differences, however, ranged from small and insignificant for reading comprehension to large and significant for completing sentences. In speed, on the average, groups were superior to individuals. For the nine tasks, on the average, about a third of the individuals were superior to their group in score and in speed. Such superiority, however, was a function of the task: for instance, on antonyms, 11% of the individuals made scores superior to their group in contrast to 50% of the individuals who did better than their group on the intelligence problem. The order of group superiority is as listed above.
In 1932 Shaw (55) compared groups with individuals in the rational solutions of complex problems. A class in social psychology was divided into halves. In the first period, half the class worked in five ad hoc groups while the others worked as individuals; in a subsequent period the roles of the two halves were reversed. In the first period, the task was the solution of three very similar classical "mathematical recreations" puzzles, e.g., the three beautiful wives and their jealous husbands who had to cross a river by rowboat carrying three persons at most, under the constraints that no wife and all husbands can row and that no husband would allow his wife in the presence of another man unless he was also present.
In the second period, the problems were quite different: (a) rearranging words to form the last sentence of a prose passage; (b) rearranging words to form the last three and a half lines of a sonnet; and (c) to find the most economical routes for two school buses to bring children to a common school under the constraint of maximum bus capacity and of a specified number of pick-up stations. For the puzzles, obviously, there is just one right answer; for the word rearrangements, however, the correct answer is arbitrarily the original word order, and for the school bus problem, it is the one that gives minimum mileage. Ss are more likely to be able to verify the solution for the three mathematical puzzles which were given in the first period; but for the second period problem, they have no way of verifying their solutions because the correct answer was arbitrary. For instance, word rearrangements can be completely appropriate in meaning despite deviation from the original word order. Puzzles having unique solutions may be termed "Eureka" since Ss can, and do, get confirmation for correct solution.
For the first period, on the so-called Eureka problems, three of the 21 individuals and three of the five groups solved the first problem; no individual and three groups solved the second problem; and two individuals and two groups solved the third problem. No individual solved more than one problem, but just three groups made the eight group solutions. Two groups and 16 individuals never solved any of the three puzzles. For the second period problems, three of 17 individuals and four of the five groups solved the first problem completely; a fifth group and seven other individuals made just one error. No individual and no group solved the other two problems. Group superiority rests only on the eight solutions by groups in contrast with the five by individuals. In general, interpreters of the Shaw experiment have disregarded not only the similarity among the three problems but also the fact that the solutions were based on the sum over all problems rather than on the number of identical solutions by individuals and by groups. For instance, when only the solutions by individuals and by groups for the first problem are compared, there is no statistically significant difference. Shaw discussed neither the fact that two of the groups never solved any of the three puzzles, nor the relative efficiency of three solutions among 21 individuals versus three solutions for five groups of four members each, i.e., 20 individuals altogether.
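The first-period counts can be tallied directly from the figures just given. The sketch below also compares individuals and groups on the first problem alone. Fisher's exact test is used here only as one plausible way to make that comparison; the source does not say which test underlies its statement, so the test and the helper name `fisher_one_sided` are assumptions.

```python
from math import comb

# First-period solution counts for problems 1 to 3, as reported in the text.
individual_solutions = [3, 0, 2]   # out of 21 individuals
group_solutions = [3, 3, 2]        # out of 5 groups
n_individuals, n_groups = 21, 5

total_individual = sum(individual_solutions)
total_group = sum(group_solutions)

# No individual solved more than one problem, so each solution is a different person.
individuals_without_solution = n_individuals - total_individual

def fisher_one_sided(successes_a, n_a, successes_b, n_b):
    """One-sided Fisher exact p-value: chance that sample b shows at least
    successes_b successes if successes are spread at random over both samples."""
    total = n_a + n_b
    successes = successes_a + successes_b
    denominator = comb(total, n_b)
    upper = min(successes, n_b)
    numerator = sum(
        comb(successes, k) * comb(total - successes, n_b - k)
        for k in range(successes_b, upper + 1)
    )
    return numerator / denominator

# Problem 1 only: 3 of 21 individuals versus 3 of 5 groups.
p_first_problem = fisher_one_sided(3, n_individuals, 3, n_groups)
```

Under this assumed test the one-sided probability for the first problem comes out at roughly .06, which is in keeping with the statement that the difference on that problem alone does not reach significance.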
Shaw advanced methodology by her more rigorous procedures for studying problem-solving of individuals and of groups; however, the interpretations implicit in her conclusions do not conform to the constraints placed either by the kind of problems, or the type of Ss, or the possibility of transfer of training. The fact that two groups never solved a problem, and that three groups obtained eight solutions, suggests two hypotheses for research: (a) that transfer of training is more likely in groups and (b) that group solution is possible only if at least one individual as an individual could have solved the problem.
Shaw accounted for group superiority on the basis of observations that groups rejected incorrect solutions and checked against errors. Since her results differed with the different problems, her interpretations might have been that for problems with just one unique answer, groups were superior; but for problems with a wide range of answers, there is no genuine difference. Lorge and Solomon (38) re-examined the data for the Eureka problems in 1955 and suggested other explanations for group superiority. Their work is reported later in the section on mathematical models. The question of the relative efficiency of three solutions among 21 individuals versus three solutions for five groups of four members each, i.e., 20 individuals, was investigated by Marquart (42) in 1956. She essentially replicated Shaw's experiments with similar results. However, Marquart noted that Shaw's conclusions about group superiority hinge on comparing percentages of possible successes obtained by individuals to percentages of possible successes obtained by groups. A fairer comparison, she proposed, involved treating individual successes on a group basis, e.g., if, when working individually, one of the three individuals who later make up a group of three gets the correct answer, individuals are credited with one success in one trial, rather than one in three. If, on the other hand, no correct solution is forthcoming from any of the three individuals, then one failure is attributed instead of three. On this basis, the individuals turned out to be slightly superior in both Marquart's results and in Shaw's.
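The difference between the two ways of scoring individuals can be sketched as follows. The function names and the six hypothetical Ss are illustrative assumptions; only the scoring rule itself (one success per set of individuals if any member solved, one failure otherwise) comes from the description of Marquart's proposal.

```python
def per_individual_rate(solved_flags):
    """Score each individual as a separate trial (the comparison Marquart questioned)."""
    return sum(solved_flags) / len(solved_flags)

def per_nominal_group_rate(solved_flags, group_size):
    """Score individuals on a group basis: one success if any member of the
    set solved the problem alone, one failure if none did."""
    if len(solved_flags) % group_size != 0:
        raise ValueError("number of individuals must be a multiple of group_size")
    nominal_groups = [
        solved_flags[i:i + group_size]
        for i in range(0, len(solved_flags), group_size)
    ]
    return sum(any(group) for group in nominal_groups) / len(nominal_groups)

# Hypothetical data: six Ss working alone, later forming two groups of three.
# Only the first S solved the problem.
solved_alone = [True, False, False, False, False, False]

individual_basis = per_individual_rate(solved_alone)      # 1 success in 6 trials
group_basis = per_nominal_group_rate(solved_alone, 3)     # 1 success in 2 trials
```

With the same raw data the individual success rate rises from one in six to one in two once it is expressed on a group basis, which is why the apparent superiority of real groups can shrink or reverse under this scoring.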
Shaw, however, did not consider her conclusions limited by problem type. In 1938 Thorndike (65) investigated the hypothesis that as the range of responses increased, the superiority of the group over individuals increased. Thorndike used two versions of each of four problems; one with a "limited" number of responses and the other with an "unlimited" number. For instance, a multiple-choice item with four options was paralleled by an open-ended version; or similarly, completing a crossword puzzle was paralleled by requiring the construction of a crossword puzzle. The other tasks were limerick completion (either one line or three to be supplied); and a vocabulary test of synonyms, five choices or recall. Ss were college students, who worked four two-hour sessions, each a week apart, in two sessions as individuals and in two as groups. For all four tasks, differences were in the direction of the hypothesis, with three of them significant. Thorndike's tasks, in a sense, contrast recognition versus recall in groups and in individuals. The recognition item form favors groups. This indicates, as Shaw had suggested, that group superiority results more from members pooling information by rejecting incorrect options than by contributing options for consideration. Thorndike's problems differed so much from those of Shaw as to suggest that generalizations about problem-solving and about group superiority seem to depend upon the nature of the tasks.
Husband (28) in 1940 attempted the study of a group in contrast with an individual as measured by required man-hours to arrive at a solution and the quality of the product. He used three tasks: deciphering a code, solving a jigsaw puzzle, and solving arithmetic problems. Ss were students in psychology, 40 working alone, 80 in pairs. Some pairs were friends; some strangers. Pairs were superior on the first two tasks, but on the third (arithmetic problems) there was no significant difference. Husband suggests that on the arithmetic task one member of the pair tends to take the lead and do all the work. In all comparisons, pairs of strangers did better than pairs of friends.
Husband's results emphasized the conclusions from some of the earlier studies about originality and routine performance. His pairs did better on problems requiring some originality or insight than on the more routine arithmetic problem; this confirmed Thorndike's hypothesis that the superiority of the group product over the individual product is greater in problems with unlimited solutions than in those with limited alternatives and confirmed Watson's and Shaw's findings that the group handled complex problems adequately. Regarding efficiency, however, he indicates that the time saved in pairs was never more than a third, not the half needed to equate time for pairs and for individuals, although Husband failed to consider the better quality for the time used by pairs.
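The man-hour argument can be checked with a short calculation. This is only a sketch: the one-hour solo time is a hypothetical figure, and the function names are introduced here for illustration.

```python
from fractions import Fraction


def man_hours(elapsed, group_size):
    """Total labor cost: clock time multiplied by the number of workers."""
    return elapsed * group_size


def break_even_saving(group_size):
    """Share of solo clock time a group must save to match one
    individual's man-hours."""
    return 1 - Fraction(1, group_size)


solo_time = Fraction(1)                        # hypothetical: one hour alone
pair_time = solo_time * (1 - Fraction(1, 3))   # a pair saving a third of the time

print(man_hours(solo_time, 1))   # 1
print(man_hours(pair_time, 2))   # 4/3
print(break_even_saving(2))      # 1/2
print(break_even_saving(4))      # 3/4
```

A pair that saves a third of the clock time still spends four thirds of the individual's man-hours; only a saving of one half would equate the two, and a four-person group would need to save three quarters.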
After a long interval following Thorndike's work, Taylor and Faust (63) compared individuals and ad hoc groups in solving the identity of a topic in the game of Twenty Questions. Elementary Psychology students worked for four days at the rate of four problems a day, either as individuals, or in pairs, or in groups of four. On the fifth day, all Ss worked alone. Although time was recorded, the prime criterion was the number of questions necessary to reach a solution. In pairs and in groups, discussion was allowed, with the motivation that they were competing against other groups but not against each other. There were significant differences between the scores of individuals and those of pairs and of groups in questions, time, and failures. Except for failures, there were no significant differences between the pairs and groups. Of course, in efficiency, i.e., the number of man-hours to reach a solution, the group is inferior to the individual, with four-man groups less efficient than pairs. The gain from training acquired the first four days by each individual as measured on the fifth day did not seem to be different whether the first four days' training came by practicing as individuals, or in pairs, or in groups.
Taylor and Faust's Twenty Questions approximated the Eureka, but also it was summative in that each member's contributions could add to the group result. Their data tend to corroborate Shaw and Watson; but they contradicted some studies of "learning" insofar as there was no transfer to individual achievement as a consequent of previous differential group or individual experience.
Research contrasting group and individual performance in "learning" suffered from a lack of experimental controls; research with problem solving suffered from a lack of reality, i.e., the problems or tasks are far removed from the genuine and the real. The problems, in general, have been puzzles, riddles, or information-test questions. Results from such tasks were not sufficiently conclusive to allow an unambiguous generalization about the superiority of groups over individuals with more realistic problems.
Little work in the area of group and individual memory has been completed. In 1952 Perlmutter and de Montmollin (48) experimented with group vs. individual learning of nonsense syllables. Twenty groups of three persons each were required to learn equivalent lists of nonsense words. One list was learned by each individual while working alone, but in the presence of the other two. A second list was learned as a cooperative three-person project. Half of the Ss worked first as individuals and half first as members of interacting groups. On all trials the average group recalled more words correctly than did the average individual. The group recall tended to be equal to or better than the best individual score, and those who worked first as members of interacting groups tended to do better as individuals than those who worked first as individuals. The converse was not found to be true. In agreement with Shaw (55), Perlmutter and de Montmollin noted processes of rejection and evaluation operating within the groups, and the results seemed to indicate that groups adopted fewer invented words and fewer words that represented modifications of those in the lists.
In 1953, Perlmutter (47) tested group vs. individual memory of "meaningful" material. A story entitled "War of the Ghosts" was read to 8 two-man groups, 8 three-man groups, 3 four-man groups, and 10 individuals. A comparison of recall was made after 15 minutes and after 24 hours. No statistically significant differences were found, although the results favored the groups. The standard deviations of individuals' scores were nearly twice those of the three-man groups, indicating the possible existence of a group pressure toward conformity. Individuals required less time than either two- or three-man groups at a statistically significant level in both sessions. Perlmutter concluded from this experiment that, on the one hand, hardly any evidence was found to support the extreme position that the content of group memory product is unique and not related to the content of individual member recalls. Very little correct information was found in group recall that was not in any member's recall. Conversely, some correct content was found in all or some of the individuals' recalls that was not found in the group memory product. He concluded that while it was interesting to attempt a derivation of group product from individual products, in some respects group product can be treated in its own right, and that some principles of product change can be formulated without measurement of individual member memory.
The single generalization derived from these studies is that, in conformity with other studies reported in this review, group participation appears to have both a depressing and an accelerating effect. These experiments do not aid in identifying or quantifying these effects.
SIZE OF GROUP
A section on group size is included because of its profound effect upon group productivity. "Group" in contrast to "individual" is affected by a number of important variables, size being one of the most important. This is true to such a great extent, that the term "group" can refer to materially different entities. It is important that knowledge of the variability of a group product, as conditions vary, be utilized when comparing "group" and "individual" products.
In 1927, South (60) conducted an experiment with 1,312 Ss divided into groups of three and of six. Four types of tasks were assigned ranging from the "concrete" to the more "abstract." The tasks were: judging emotion from a series of photographs portraying emotion (abstract); answering multiple choice questions (concrete); solving bridge problems (concrete); and judging English compositions (abstract). The results were obtained on the accuracy of the performance, and on the time required to complete the experiment. The results indicated that the size of the group affects its efficiency. In each of the four types of material there was a difference between the performance of groups of three and groups of six, depending somewhat on the type of material or the kind of problem given the group. The small groups were more efficient with "abstract" problems, while the larger groups did better with the "concrete" problems. South concluded that in the case of the abstract materials, the members had their own opinions after the first glance and the committee's task was largely that of compromising and overriding opinion. In the case of the smaller group there were fewer opinions and hence less to do. In the case of this particular type of problem, the small group was faster with no loss in accuracy.
Kelly and Thibaut (30) reported a study by Bales et al. (4) in which individual Ss were first ordered according to initiation rank, which is the degree to which they initiated responses in a group situation. Theoretical curves based on a harmonic distribution were then fitted to the obtained percentages of total acts contributed by members at each ranked position. For groups of size three and four the empirical curves were found to be flatter than the theoretical curves, but for groups of size five through eight the empirical curves were steeper than the theoretical curves. Thus it appears that the proportion of very infrequent contributors to the group interaction increases as the size increases. In the larger groups the discrepancy between obtained and expected frequencies was attributed to the large volume of participation by the highest initiator. This study suggested that as size increases from three to seven there is a sharp rise in the proportion of members who contribute less than would be expected if each member shared equally in the interaction. Beyond the size of seven, the proportion shows no consistent increase or decline.
In 1951, Gibb (19) experimented with the effects of group size upon idea production in a group problem solving situation. The Ss were 1,152 college students composed into groups of 1, 2, 3, 6, 12, 24, 48, and 96. The groups were asked to produce as many solutions as possible to a series of problems permitting multiple solutions. Each group session lasted 30 minutes. The results indicated that the number of ideas produced increased as a negatively accelerating function of size of group in each of the two conditions. A valid criticism of this experiment is that the time limit of 30 minutes was not sufficient to permit an exhaustion of the potential contributions of all of the members of the larger sized groups. Furthermore, the problems may not lend themselves to more than a limited number of solutions. However, of extreme importance is that Gibb reported that with increasing size a steadily increasing proportion of the groups' members reported a feeling of threat or inhibition of their impulses to participate. This, in addition to the statistical results, supports the hypothesis of a restraining force resulting from increased size of groups.
Carter et al. reported a study (10) comparing individual participation in groups of varying sizes. They concluded that in groups of four, individuals have sufficient space in which to behave, and thus the basic abilities of each individual can be expressed, but in the larger groups only the more forceful individuals were able to express their abilities and ideas, since the amount of freedom in the situation was not sufficient to accommodate all the group members.
From this limited number of studies certain tentative generalizations can be made. As indicated by South, greater production on "abstract" problems can be expected from smaller groups than from larger ones, and greater production on "concrete" problems from larger groups than from smaller ones. Bales et al. (4), Gibb (19), and Carter et al. (10) indicated the possibility that groups of increasing size will increase production at a negatively accelerating rate for problems of certain types. When comparing production of groups of varying size and individuals, these generalizations should be kept in mind. Considerable additional research is needed to confirm or refute these expectations.
PROBLEM SOLVING IN MORE REALISTIC SETTINGS
Studies using genuine and significant situations are less numerous than those involving judging, learning, etc., partly because concern with the more genuine human relations problems has emerged quite recently and partly because of the practical difficulties in working experimentally with problems involving decisions. These decisions require the individual or the group to weight alternatives for relative adequacy, followed by the selection of one or some combination of several as the most feasible solution rather than determination of the correct answer. Thus, the criterion for appraising decisions in these experimental studies should differ from agreement with the one true order or the one correct answer; rather, the evaluation of decision, ultimately, should be based on some system of credits for coverage and adequacy.
Timmons (67) used as criterion the experts' rank-order of five possible options to the genuine problem, "What type of parole system should Ohio adopt?" His research was oriented primarily to estimate the effect of discussion on the individual's ranking of the five options. The Ss were high school students in Ohio. Classes were divided so that some Ss worked as individuals throughout the experiment and others worked in specially constituted groups during part of the experiment. The controls (as individuals throughout) 1: on Day 1, ranked the five different options and took an attitude scale toward parole; 2: on Day 2, read a pamphlet containing authoritative information about parole, then again took the attitude scale and ranked the options; 3: on Day 3, reread the information pamphlet under motivation of competing with groups discussing the problem, and then again took the attitude scale and ranked the options; and 4: after an interval of a month, were measured for attitude and for ranking of options.
The experimental section was treated identically for Steps 1, 2, and 4. The essential difference was in Step 3 in which six different kinds of groups were formed, based on the performance on the first day. Each group was supplied with a copy of the informational pamphlet, discussed the problem in leaderless groups and formulated a ranking of the options as a group. When that had been completed, each member of each group took the attitude scale and ranked the options as individuals.
Timmons' measure was the individual's ranking of options, so much so that Timmons considered ranking by groups only incidentally and tangentially. In terms of the individual's agreement with expert ranking, the informational pamphlet produced a tremendous shift toward experts' rank (Day 2 minus Day 1). Individual study and group discussion resulted in further movement toward expert ranking (Day 3 minus Day 2). The individuals who participated in group discussion were closer to the experts than the individuals who restudied the pamphlet. These changes were maintained, in general, a month later. For attitudes, gains only followed the reading of the pamphlet on Day 2 and at no other time, and showed no difference at any time between those who discussed and those who restudied the pamphlet.
In this major aspect, Timmons demonstrated a significant transfer from group discussion to subsequent individual rankings. Unfortunately, Timmons considered the group's ranking a very minor aspect of his research. He reported that after discussion the groups' average agreement score was 2.93, which was not significantly different from the immediately subsequent individual (from those groups) average agreement score of 3.31. The 3.31, however, was significantly better than the 6.70 of the individuals who had restudied the pamphlet.
Although Timmons formed six different kinds of groups based on the amount of their agreement with experts' ranks initially, he failed to report the ranking of the various groups. Methodologically, the six groups were made up as: I. 4 Ss with good scores; II. 4 Ss with intermediate scores; III. 4 Ss with poor scores; IV. 2 Ss with good scores, 2 Ss with poor scores; V. 2 Ss with good scores, 2 Ss with intermediate scores; and VI. 2 Ss with intermediate scores, 2 Ss with poor scores.
He reported, however, in terms of individual change that the gains were largest for the poor, smaller for the intermediate, and least for the good student. The good made greater gains after discussing with other good than after discussing with the poor or the intermediate. The good did not get worse after discussing with the so-called poor. The poor gained as much from discussion with the good as from the intermediate, but always significantly more than from discussion with the poor. From the viewpoint of learning by individuals, all individuals seem to benefit from discussion even when the discussants were relatively less adequate.
In 1941, Robinson (51) investigated the effects of group discussion upon attitudes toward two social problems: capital punishment, and American policy to keep out of war. He contrasted college sophomores in 43 ad hoc groups of from 8 to 20 members as experimental samples, and 225 college sophomores as individuals. The experimental sample (a) studied group discussion theory for one month and had weekly practice discussions, (b) studied material on the problems, then, (c) took Thurstone Attitude Scales relevant to both problems, (d) had a two-hour discussion on each of the questions, and, finally, (e) took the attitude scales a second time. The control sample had no group discussion, but were given successively the two forms of the Thurstone Scales on each problem. In one variation, an information test was added before and after discussion; in other variations, discussion theory and the study of the informational material were omitted.
Although significant changes in mean scores were made only in attitudes about how to keep out of war, Robinson noted that a consideration of the magnitude of the attitude shifts by individuals revealed significant changes on both problems in all groups. When the informational test was used, the individuals showed gains after discussion. However, without comparable data for the control, this gain cannot be referenced to individual versus group superiority. In a third experiment, Robinson, comparing change in attitude after reading informational material with that after 30-minute group discussion, found that the magnitude of the shifts by individuals after reading not only exceeded those after 30-minute discussion, but also those after the two-hour discussions in the earlier experiments. This lack of shift after discussion, contrary to Timmons' findings, may be attributable to the inadequacy of experimental designs, or to a genuine difference between the sequelae of reading or of discussion.
Robert Thorndike (66) hypothesized that much of the superiority of groups over individuals was attributable to the elimination of (a) individual chance errors and (b) those errors differing from individual to individual and from time to time. The second hypothesis had been confirmed partly in the Gordon experiments. Thorndike attempted to isolate that part of group superiority that was due to averaging or summing individual contributions, from that part due to the elimination of each individual's chance errors. He used 1,200 college students formed into 220 ad hoc groups of from four to six members. They worked on 30 problems, e.g., selecting the better of two poems, the more socially significant of two headlines, etc. The design required choice by an individual together with a measure of confidence; then, after discussion, a group choice. Both the individual and the subsequent group choice were completed for each problem before the Ss proceeded to the next problem. There were significant differences between the mean scores for all individuals before discussion, for "concocted" groups before discussion, and for ad hoc groups after discussion.
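One way to picture a "concocted" group is as a purely statistical pooling of the members' pre-discussion choices, with no interaction among them. The sketch below uses a simple plurality rule; that rule, the function name, and the sample choices are assumptions made for illustration, since the review does not state how Thorndike combined the individual choices or used the confidence measures.

```python
from collections import Counter


def concocted_choice(individual_choices):
    """Pool pre-discussion choices by plurality; return None on a tie.

    Assumption: a plain plurality rule stands in for whatever pooling
    procedure the original study actually used.
    """
    counts = Counter(individual_choices).most_common()
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return None
    return counts[0][0]


# Hypothetical choices of five members between options "A" and "B".
print(concocted_choice(["A", "A", "B", "A", "B"]))  # A
print(concocted_choice(["A", "B"]))                 # None
```

Comparing such pooled choices with the choices of the same members after discussion is what separates the effect of mere pooling from the effect of discussion.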
The analysis of the group product revealed that part, but not all, of the difference between group and individual results can be explained as a consequence of the pooling of the individual products. As Gurnee found in group judgments, the group product was more than an expression of the majority of the members comprising the group, and Thorndike found this "more" attributable to the discussion among the group's members.
The data also were analyzed for the consequences of grouping, i.e., the effects when the majority were correct before discussing in the group, as opposed to when the majority were incorrect. When at least 70% of the individuals were correct before discussion, there was a gain after group discussion of 11%. When less than 50% of the individuals were correct before discussion there was a loss after group discussion of 7%. This result indicated the necessity of qualifying Jenness' earlier hypothesis that disagreement among members is more conducive for group improvement than is agreement. His hypothesis may be correct with judgments, where awareness that others do differ results in restudy, but may not apply in problem solving or decision making when disagreement involves a majority with an erroneous view. Indeed, Farnsworth and Williams (18) made the same point in their demonstration that when all group members were likely to be in error, there was no reason to expect the group product to be better than that of individuals.
Timmons (68) concluded that after allowance for the averaging of individual contributions is made, a significant superiority for all groups still remains; similarly, after allowing for the effect of majority influence, an insignificant amount of superiority is reported. When allowances both for averaging effects and for majority influence are made, there is "a large, but not significant" difference favoring the group. Timmons suggested that considering the rigor of his methods, "it seems probable that the differences are even more significant than they seem to be." He suggested four factors possibly inherent in discussion that may account for unexplained differences: the group (a) has more suggestions leading toward the solution (cf. Shaw), (b) has a wider range of interpretations of the facts of the problem, (c) has a wider range of criticism and suggestion (cf. Shaw), and finally (d) has more information (cf. Lorge and Solomon).
In 1955 Lorge et al. (39) experimented with the difference in the quality of solution to a practical problem which was presented in four settings differing in degree of remoteness from reality. The problem was presented as a verbal description; a photographic representation; a miniature scale model not allowing manipulation of parts and materials; or a miniature scale model allowing manipulation of parts and materials. The problem consisted of finding a way to get a squad of soldiers across a specified segment of mined road as quickly and secretly as possible using a limited number of available props. The problem was adaptable to many solutions of varying quality, and had the characteristics of a genuine field situation. Ten teams of five AFROTC students and 10 individuals worked on the problem. Any individual or group had the right to ask as many questions of the experimenter as desired, and the experimenter made an effort to answer all questions as long as they did not directly divulge a method of solution. The results indicated no significant differences among the solutions at the four levels of remoteness from reality. However, at all levels the solutions of the groups were markedly superior to the solutions of the individuals. It was concluded that the differences may be due in large part to the amount and kind of information the Ss had at their command. It was noted that the number of questions asked of the examiner increased with the remoteness of the model from reality. This worked toward equalization of the information available to all groups. Groups asked more questions than the individuals at all levels of remoteness, which meant that the groups had more information to work with than did the individuals, and may account, in part, for their superiority.
Lorge et al. (36) completed a study in 1953 comparing the quality of group and individual solutions of human relations problems before and after class instruction in staff procedures. At the beginning of the course, one half of the Ss spent one period in problem solving as individuals, while the other half worked as groups. In the very next period, those who had worked as individuals were formed into groups, while the earlier groups were dissolved and their members worked individually. The design was replicated six months later, at the termination of the course. The results indicated that as a result of this particular form of instruction, the quality of decisions prepared by ad hoc staffs after training is significantly superior to that of those prepared by ad hoc staffs in the opening week of class. By contrast, and of interest, the decisions written by individuals after instruction are not significantly different from those they had prepared as individuals in the opening days of the course. This indicated the possibility of individuals being able to improve in their performance of a given task as members of a group, without showing improvement in their performance of the same task as individuals.
These results emphasize the fact that the group is not necessarily superior to the individual in human relations decisions. The quality of individual decisions before instruction is significantly superior to that of groups. This difference in favor of individuals may indicate merely the relative ineffectiveness of ad hoc groups to solve the problem in the given time. In some of the appraisals it was found that groups, before instruction, lost upwards of 80% of the ideas that their constituent members as individuals had for the solution of the problems. Many of these lost ideas were important. At the beginning of instruction, more than 75% of the individual decisions were superior in quality to the best of group decisions. Since the decisions of the individuals at the end of instruction do not differ in quality from those at the beginning, the presumption was that there was a gain in group interaction but not in problem-solving skills whether among individuals or in groups.
The data also indicated that the probability that any individual's idea will be expressed in the group decision is a function of the commonality of the idea, i.e., of the number of individuals who had the same idea prior to the group meeting. Of all the ideas that were held in common by two or more group members prior to the group meeting, half appeared in the group decision. Only 10% of the unique ideas, those possessed by only one person prior to the group meeting, ultimately appeared in the group decision. Similarly, only one third of the ideas evolved in group decisions were original, i.e., ideas which none of the group members had mentioned in their earlier individual decisions, whereas two thirds of the ideas had already been so expressed. These data suggested that group process does not generate original ideas but relies heavily upon ideas formulated prior to the group meeting.
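The reported survival rates imply how many pre-meeting ideas a group decision can be expected to retain. The sketch below applies the two rates (one half for common ideas, one tenth for unique ideas) uniformly to a hypothetical pool of ideas; the idea counts and function names are invented for illustration, and treating the rates as fixed probabilities is an assumption.

```python
from fractions import Fraction

COMMON_RATE = Fraction(1, 2)   # reported share of common ideas reaching the group decision
UNIQUE_RATE = Fraction(1, 10)  # reported share of unique ideas reaching the group decision


def expected_carried(n_common, n_unique):
    """Expected number of pre-meeting ideas appearing in the group decision."""
    return COMMON_RATE * n_common + UNIQUE_RATE * n_unique


def expected_loss_share(n_common, n_unique):
    """Expected share of pre-meeting ideas that do not reach the decision."""
    total = n_common + n_unique
    return 1 - expected_carried(n_common, n_unique) / total


# Hypothetical pool: 10 ideas held by two or more members, 20 held by one member.
print(expected_carried(10, 20))     # 7
print(expected_loss_share(10, 20))  # 23/30
```

The more a group's ideas are unique to single members, the larger the expected loss, which is in line with the heavy loss of ideas reported for untrained ad hoc groups.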
A further aspect of the group's involvement with decisions was in the changes in food habits studied under the sponsorship of Lewin (35). The basic comparison was the carrying into actuality of decisions made either in groups or as individuals. As such, the Lewin studies concern a side effect of the group process without any reference to the quality of the decision.
One study was conducted with six groups made up of Red Cross volunteers in a home nursing course. The objective of the course was to increase actual use of the unpopular "variety meats," e.g., beef hearts, kidneys, and sweetbreads. Lecture was contrasted with group discussion in each of three groups to induce change, with the same nutritionist offering the same recipes. At the end of the discussion, the women, by a show of hands, indicated whether they intended to try the new foods. On follow-up, 3% of the lecture and 32% of the group-discussion volunteers used some of the "variety meats." Lewin stated, however, that only subjects in discussion groups were told of the planned follow-up.
In a study by Radke and Klisurich (50) six neighborhood groups composed of from six to nine housewives were organized with the objective to increase home consumption of whole and evaporated milk. The contrasted procedures were lecture and group discussion. After two and after four weeks, follow-ups indicated that those who had the discussions showed significantly greater change in the desired direction. As in the Lewin study, discussion groups were informed of the two-week follow-up but the lecture groups were not. Lewin, however, stated that neither sample knew about the four-week follow-up. The differences, on the four-week check, may be a consequent of successes with the new foods tried.
Klisurich and Radke (50) tried to have new mothers increase the amount of orange juice and cod liver oil fed their babies. The "individual" condition involved each mother individually discussing with the hospital nutritionist for about 25 minutes the feeding of her new baby after which she was given printed instructions on feeding. Both oral and written material stressed the importance of using orange juice and cod liver oil. The "group" involved other new mothers who were formed into ad hoc groups of six members for instruction and discussion of feeding. The time for a group of six was equivalent to that given any one individual, i.e., 25 minutes. Follow-up was made after two and after four weeks. The results show that significantly more who had made decisions in groups behaved in the desired fashion.
Lewin suggested that the first two experiments may be interpreted as the consequences of (a) greater involvement in the group situation as compared with the more passive audience role in the lecture, or (b) greater interest in the group discussion or (c) that only those in the discussion groups knew of the anticipated follow-up. The results of the third experiment were all the more striking because the individuals received special attention, much more than any group members; and, further, because the group members were farm mothers who were unacquainted with each other before and who had no subsequent contact after leaving the hospital. Nevertheless, a 25-minute discussion among six such strangers produced much greater change than did a 25-minute consultation with an individual. Lewin considers the third experiment as indicative either of greater individual involvement in the group decision, even under the described conditions, or that decision in groups tends toward action.
Levine and Butler (34) replicated the Lewin experiment, attempting better controls, i.e., avoiding differential expectations. Their Ss were factory supervisors who regularly gave ratings which determined the base pay of workers in their departments. It had been shown previously that the supervisors tended to rate the job rather than the man, i.e., workers for the more highly-skilled jobs were consistently rated higher than those for the less skilled jobs. The experimenters educated the supervisors to rate man performance, not job level. The 29 supervisors were randomly divided into three groups. One group had no training, one a 90-minute discussion on improving rating, and the third a lecture followed by a question and answer period totaling 90 minutes.
The control group did not change, rating the men on the more skilled jobs higher than the men on the less skilled jobs. "The lecture method had practically no influence upon the discrepancies in rating …. Performance ratings were affected significantly only after the raters had had a group discussion and had reached a group decision." The data, however, are not sufficiently rigorous to supply definite evidence for the superiority of the group technique.
In 1950, Maier (41) compared groups and individuals in decisions with a realistic problem. The task was to plan the action to solve the problem of what to do with the slowest man of a "parasol" or circular assembly line. The slowest man held up the whole assembly line. The problem was presented in two ways: (a) just a statement of the problem or (b) added to the statement of the problem was a description of the roles of the eight different members of the "parasol" assembly.
The Ss solved the problem in ad hoc groups and as individuals. All of the groups were assigned discussion leaders, only some of whom were trained in techniques of group leadership; the rest were not. The trained leaders, however, knew the "elegant" or experimenter's solution.
Maier's primary conclusion is that a trained leader can improve the quality of the group product. It is limited, of course, to situations where the trained leader knows the experimenter's solution. Under these conditions, Maier also discovered greater individual acceptance of the "elegant" solution. This illustrated the importance of appraising side effects, such as acceptance of a decision, as was done by Lewin (cf. 35).
In industrial situations, people work in genuine life settings with real problems. Since the problems are so genuine, the participants are usually motivated, often highly motivated. An example of interaction to influence productivity is Bavelas' study (see 40, pp. 264-266) in which three groups of factory workers met with a psychologist to set a new production goal for themselves. Their previous production "high" had been 75 units, their previous average was 60 units, and the goal they set was 84 units. This was achieved. Then, they met again and decided to set 95 units as a goal which they failed to realize, but production was stabilized at 87 units. Two other teams of workers serving as controls met with the psychologist but set no goal. They showed no significant variation from their previous average of 60 units.
Coch and French (11) reported a study using as Ss factory workers with an average of eight years of schooling. They were investigating the effects on productivity of three degrees of participation in decision making. All groups had been producing at 60 units before the job change. One group, the control, had no participation in making decisions about the job change; management gave them the reasons for the change and answered any questions they raised. Experimental Group I only partially participated in the decision making: they elected a committee which made the decisions. Two other experimental groups participated completely, making all decisions as a group. The decisions pertained to design of the new job, determination of new rates, training methods, etc. The control group dropped, on the average, 10 units per hour, had an exceedingly high rate of turnover, and a slow rate of learning on the new job. Production in the experimental groups dropped initially but quickly rose to their old levels. In the complete participation groups, productivity reached a new "high," 15% higher than previous production rates; the relearning rate was very rapid, while turnover was practically nil. Coch and French concluded that speed of regaining old production levels was directly proportional to the degree of participation in the decision to make the changes. That this was not just a function of the specific personnel involved is emphasized by the fact that, when the control group became the experimental group for a later experiment, it showed the same quick recovery and a new production high at the new job.
Marrow (43), president of the company in which Coch and French conducted their studies, reported that union-instituted job changes with money bonuses had failed to change production rates or to get workers to accept job changes. When the program of group decision-making was instituted, a control group that was changed by the customary technique objected bitterly: 17% quit the job, and the rest showed little improvement. The experimental group, on the other hand, as noted earlier reached and then exceeded their old levels with none quitting. Marrow concluded, as did Coch and French, that participation is the key to success in group production.
In 1952 Darley et al. (13) reported a study on group productivity only. The groups were residents of 13 women's cooperative housing units at the University of Minnesota. Each house (or group) contained 7 to 16 students with its own president. The authors stressed that "efforts were made to create an in-group spirit that would characterize residents of the village and give them a feeling of belongingness to the group." The students had been living together in the house for several months, long enough to be considered to have developed a group "tradition." The task was to prepare a "plan for better cooperative living in the village," a first instance of a complex human relations problem not only in a realistic but "real" situation of actual concern to the Ss. Faculty judges ranked the reports for quality so that cash prizes could be given.
In general, studies on productivity have advanced beyond the early use of the group as a social climate or as a means of inducing competition or cooperation. Recent work is moving toward work with people in real situations making a decision basic to their everyday work. This, however, has involved a subtle change from participating in selecting alternatives for a decision, as Lewin's Ss did, to participating in working out the action plan which will give effect to a decision already made, as in the Coch and French (11) study.
Furthermore, productivity experiments have tended not so much to contrast the participation in a group setting with participation in the individual setting, but, rather, to contrast different amounts and kinds of participation where the fullest participation usually was in a group setting. More valid comparisons might have been made if each individual in the control group had been consulted, had had an opportunity to discuss the whole problem, and had made his wishes known. Then the comparison would have been between the effect on productivity of participation as individuals and as a group.
As important as the generalizations from the experimental data are two changes in methodology over the years and the recent attempts to provide structure through mathematical models. The first change concerns the kind of group studied: it is a long step from Gordon's (22) statisticized group, or Allport's (1) non-interacting face-to-face groups, to Bavelas' (see 40, pp. 264-266) ad hoc groups and Darley's (13) "traditioned" group. Many of the early experiments did not produce results relevant for the problem under investigation. The "statisticized" group gives little evidence about real groups in real situations. As has been indicated, if all the judges are from the same population, the group products are sheer reiterations of the value of the Spearman-Brown prophecy formula.
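The point about the Spearman-Brown prophecy formula can be made concrete with a minimal sketch. It assumes the standard form of the formula, in which the reliability of the pooled judgment of k comparable judges is k·r / (1 + (k − 1)·r), where r is the reliability of a single judge. The function name and the single-judge reliability of 0.40 are hypothetical, chosen only for illustration; they are not taken from any of the studies reviewed.

```python
def spearman_brown(r_single: float, k: int) -> float:
    """Predicted reliability of the pooled judgment of k comparable judges,
    given the reliability r_single of one judge (Spearman-Brown formula)."""
    if k < 1:
        raise ValueError("k must be at least 1")
    return k * r_single / (1 + (k - 1) * r_single)


if __name__ == "__main__":
    # Hypothetical single-judge reliability, for illustration only.
    r = 0.40
    for k in (1, 5, 10, 50):
        # Reliability rises with k purely through pooling, with no interaction.
        print(f"{k:>2} judges: predicted reliability {spearman_brown(r, k):.3f}")
```

Under these assumptions the gain from "grouping" is fully determined by the number of judges pooled, which is why a statisticized group says little about what interaction adds.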
Similarly, non-interacting groups shed more light on the significance of cooperation and competition, with and without an audience, than on the dynamics of grouping for the quality of the final product. Recent work with interacting ad hoc groups, and approximations toward the traditioned group, points the direction for estimating the consequences of groups in solving problems in real situations. There is some hope that group dynamics will be understood more fully, a hope more evanescent when grouping was by computation, i.e., without interaction.
A second advance is in the nature of the problem. The older, less real tasks of ranking weights or estimating intelligence from photographs are not too crucial a basis for estimating group superiority. Fortunately, the trend is toward human relations problems as exemplified by Maier's parasol assembly line.
Yet, despite the accumulation of evidence suggesting group superiority, the question of the efficiency of the group has not been considered too often. Thorndike has hinted that for some problems, the group may not be as efficient as the individual. Husband (28) gives evidence that in terms of man-hours per correct solution, the group is not as efficient as the individual. The inefficiency in time cost is implicit in Watson's first study which showed the summated individual superior to the group. There is a practical need to specify the conditions under which the group may be used with the greatest efficiency so as to channel group process to circumstances for which the consequences are commensurate with the time and manpower used. It is on the question of efficiency of the group over the individual that mathematical models can at present play a very instructive role.
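The man-hours criterion mentioned above can be sketched as a small calculation. This is only an illustration of the arithmetic, assuming that every member of a group is charged for the full elapsed time on each problem. The function name and all of the numbers are hypothetical; they are not Husband's data.

```python
def man_hours_per_correct_solution(members: int,
                                   hours_per_problem: float,
                                   problems_attempted: int,
                                   problems_correct: int) -> float:
    """Total man-hours spent divided by the number of correct solutions."""
    if problems_correct == 0:
        return float("inf")  # no correct solutions: cost per solution is unbounded
    total_man_hours = members * hours_per_problem * problems_attempted
    return total_man_hours / problems_correct


if __name__ == "__main__":
    # Hypothetical figures: the pair is faster on the clock and more accurate,
    # yet costs more man-hours per correct solution than the lone worker.
    alone = man_hours_per_correct_solution(1, 1.0, 10, 6)
    pair = man_hours_per_correct_solution(2, 0.8, 10, 8)
    print(f"individual: {alone:.2f} man-hours per correct solution")
    print(f"pair:       {pair:.2f} man-hours per correct solution")
```

The sketch shows how a group can be superior in quality and in elapsed time while still being less efficient once the time of every member is counted.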
In this report we are trying to focus on group productivity in contrast to individual productivity. Another active area of group research for the last 25 years has focussed on the study of group dynamics and group processes without regard to group productivity. Kelly and Thibaut (30) take a strong position on this latter research activity and state that continuing research along these lines "is indicative of major inadequacies in the research field," and that the reason for continuing research of this type "may be suspected to lie in the absence of any good theory about either individual or group processes . . ." (p. 780). While the senior author of this report recognizes the merit of this position, he has the feeling that perhaps a more tenable position is a realization that both approaches have their legitimate place in this broad area of study, and are supplementary instead of contradictory. The one real, though unfortunate, restriction that now exists is that simultaneous experimentation with social dynamics and processes and end products is highly difficult and usually unrewarding. This is probably due to the necessity for concentration on the very basic concepts and minutiae of both areas. The eventual goal is to develop a sizeable body of knowledge in both areas and perhaps hope for the convergence of this knowledge into one unified and consistent theory. An inherent recognition of this is also contained in Kelly and Thibaut's article in their discussion of the requirements for an understanding of the social processes involved in problem solving. Here again the exploitation of mathematical models might prove useful.
The recent interest in mathematical models is exemplified in papers by Lorge and Solomon (38) and by Hays and Bush (26). Lorge and Solomon provided two mathematical models to reproduce the Shaw data. In their mathematizations they essentially say that the observed better group performance can be explained simply on the basis of individual ability or interaction of individual ability, a conclusion at variance with Shaw's generalization that positive personal interaction is yielding the observed better group performance. Hays and Bush use the Humphreys type learning experiment to mathematically assess performance in groups of three and in individuals. They consider two conceptualizations: one where the group acts as an individual; the second where majority vote in each instance determines the group performance. These two conceptualizations were thought of as boundaries for group performance. An experiment was then run and the results obtained did lie between the two expected group performances. Obviously one importance of mathematization is that it provides the design for a next step in experimentation which might never be revealed solely by examination of data. In the Lorge-Solomon situation some new experimentation was indicated by the results. In the Hays-Bush study new experimentation where none had been performed before was indicated by their two models. An interesting account of these two papers and several others not directly related to problem solving or learning is given by Coleman (12) in a survey of mathematical models in small group behavior.
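One way to read the explanation "on the basis of individual ability" is as a pooling model with no interaction at all. The sketch below assumes that each member solves the problem independently with the same probability, and that the group succeeds whenever at least one member would have solved it alone, so that the group probability is 1 − (1 − p)^k. The function name and the sample probability are illustrative assumptions, not figures from Lorge and Solomon or from Shaw.

```python
def pooled_solution_probability(p_individual: float, group_size: int) -> float:
    """Probability that at least one of group_size independent members solves
    the problem, when each solves it with probability p_individual."""
    if not 0.0 <= p_individual <= 1.0:
        raise ValueError("p_individual must lie between 0 and 1")
    if group_size < 1:
        raise ValueError("group_size must be at least 1")
    return 1.0 - (1.0 - p_individual) ** group_size


if __name__ == "__main__":
    # Hypothetical individual success rate, for illustration only.
    p = 0.20
    for k in (1, 2, 4, 8):
        print(f"group of {k}: expected proportion solving "
              f"{pooled_solution_probability(p, k):.3f}")
```

If observed group success rates do not exceed the values such a model predicts, group superiority need not be attributed to interaction among the members.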
In general, in the evaluation of the relative quality of the products produced by groups in contrast to those produced by individuals, the group is superior. The superiority of the group, however, all too frequently, is not as great as would be expected from an interactional theory. In many studies, the product of the "best" individual is superior to that of the "best" group. It is quite probable that group solution may have its advantages in stimulating one another for, and in inducing cooperation for, a common solution. Yet, it must be recognized that group procedures may have disadvantages, too. A single member, or a coalition of members, may retard the group by holding out for its kind of solution, a consequence that may reduce the quality of the group product if the solutions so proposed are inadequate or unrealistic.
Obviously, it would be valuable to study group processes to ascertain how members facilitate or inhibit the development of a group product. But, it is just as important to ascertain the quality of product developed by groups in contrast to those of individuals. The researches reviewed in this survey are limited to the experimental assessment of the quality of group and of individual products. Currently, the research literature is in a terminological confusion, generally, in that the group involved in the experimental studies is an ad hoc group created by the experimenter for the experimenter's purposes, but that the experimenter generalizes to a traditioned group which has organized for a common purpose and has been interacting over a considerable period of time. A similar conflict of terms is involved in the definition of the individual. Experimentally, the individual is a person selected at random from some population, but the interpretation suggests that even the best individuals cannot appraise or solve a problem in the round. Since some individuals can, the essential question is to determine how the efficient and able individual compares with the traditioned and able group.
In the researches reviewed, moreover, the range of tasks varies from estimating (e.g., the number of beans in a bottle) to solving genuine problems (e.g., the productivity of an assembly line). The significance of these tasks in involving individuals and groups does differ and so do the problem-solving processes. Nevertheless, the experimenters tend to interpret the results of all tasks in terms of the relative ability of groups and of individuals to solve novel problems of policy or procedure.
The researches, to date, tend to treat the ad hoc group solving a trivial task as a prototype of the fully traditioned group solving a very important problem. The microscopic approach is a prerequisite to understanding the genuine group but is not itself the understanding.
1. ALLPORT, F. The influence of the group upon association and thought. J. exp. Psychol., 1920, 3, 152-182.
2. ASCH, M. J. Nondirective teaching in psychology: An experimental study. Psychol. Monogr., 1951, 65, No. 4 (Whole no. 321).
3. ASCH, S. E., BLOCK, HELEN, & HERTZMAN, M. Studies in the principles of judgments and attitudes: I. Two basic principles of judgment. J. Psychol., 1938, 5, 219-251.
4. BALES, R. F., MILLS, T. M., ROSEBOROUGH, MARY E., & STRODTBECK, F. L. Channels of communication in small groups. Amer. sociol. Rev., 1951, 16, 461-468.
5. BANE, C. L. The lecture vs. the class discussion method of college teaching. Sch. & Soc., 1925, 21, 300-302.
6. BARTON, W. A., JR. The effect of group activity and individual effort in developing ability to solve problems in first year algebra. Educ. Admin. Supervis., 1926, 12, 512-518.
7. BECHTEREV, W., & LANGE, A. Die Ergebnisse des Experiments auf dem Gebiete der kollektiven Reflexologie. Zeit. angew. Psychol., 1924, 24, 305-344.
8. BRUCE, R. S. Group judgments in the fields of lifted weights and visual discrimination. J. Psychol., 1935-36, 1, 117-121.
9. BURTT, H. E. Sex differences in the effect of discrimination. J. exp. Psychol., 1920, 3, 390-395.
10. CARTER, L., HAYTHORN, W., LANZETTA, J., & MAIROWITZ, BEATRICE. The relation of categorizations and ratings in the observation of group behavior. Hum. Relat., 1951, 4, 239-254.
11. COCH, L., & FRENCH, J. R. P., JR. Overcoming resistance to change. Hum. Relat., 1948, 1, 512-532.
12. COLEMAN, J. S. A survey of mathematical models of small group behaviour. New York: Bureau of Applied Social Research, 1956. (Behavioral Models Project Tech Rep. No. 10.)
13. DARLEY, J., GROSS, N., & MARTIN, W. Studies of group behavior: Factors associated with the productivity of groups. J. appl. Psychol., 1952, 36, 396-403.
14. DASHIELL, J. F. An experimental analysis of some group effects. J. abnorm. soc. Psychol., 1930, 25, 190-199.
15. DASHIELL, J. F. Experimental studies of the influence of social situations on the behavior of individual human adults. In C. Murchison (Ed.), Handbook of social psychology. Worcester: Clark Univer. Press, 1935. Pp. 1097-1158.
16. EYSENCK, H. J. The validity of judgments as a function of number of judges. J. exp. Psychol., 1939, 25, 650-654.
17. EYSENCK, H. J. The validity and reliability of group judgments. J. exp. Psychol., 1941, 29, 427-434.
18. FARNSWORTH, P., & WILLIAMS, W. F. The accuracy of the median and the mean of a group of judges. J. soc. Psychol., 1936, 7, 237-239.
19. GIBB, J. R. The effects of group size and of threat reduction upon creativity in a problem solving situation. Amer. Psychologist, 1951, 6, 324. (Abstract)
20. GORDON, KATE. A study of aesthetic judgments. J. exp. Psychol., 1923, 6, 36-42.
21. GORDON, KATE. Group judgments in the field of lifted weights. J. exp. Psychol., 1924, 7, 389-400.
22. GORDON, KATE. Further observations on group judgments of lifted weights. J. Psychol., 1935-36, 1, 105-115.
23. GURNEE, H. A comparison of collective and individual judgments of fact. J. exp. Psychol., 1937, 21, 106-112.
24. GURNEE, H. Maze learning in the collective situation. J. Psychol., 1937, 3, 437-444.
25. GURNEE, H. The effect of collective learning upon the individual participants. J. abnorm. soc. Psychol., 1939, 34, 529-532.
26. HAYS, D. G., & BUSH, R. R. A study of group acting. Amer. sociol. Rev., 1954, 19, 693-701.
27. HILGARD, E. R., MAGARET, G. A., & SAIT, E. M. Level of aspiration as affected by relative standing in an experimental social group. J. exp. Psychol., 1940, 27, 411-421.
28. HUSBAND, R. W. Cooperative versus solitary problem solution. J. soc. Psychol., 1940, 11, 405-409.
29. JENNESS, A. The role of discussion in changing opinion regarding a matter of fact. J. abnorm. soc. Psychol., 1932, 27, 279-296.
30. KELLY, H. H., & THIBAUT, J. W. Experimental studies of group problem solving and process. In G. Lindzey (Ed.), Handbook of social psychology. Cambridge, Mass.: Addison-Wesley, 1954. Pp. 735-785.
31. KLUGMAN, S. F. Group judgment for familiar and unfamiliar materials. J. genet. Psychol., 1945, 32, 103-110.
32. KLUGMAN, S. F. Group and individual judgments for anticipated events. J. soc. Psychol., 1947, 26, 21-33.
33. KNIGHT, H. C. A comparison of the reliability of group and individual judgments. Unpublished master's thesis, Columbia Univer., 1921.
34. LEVINE, J., & BUTLER, J. Lecture versus group decision in changing behavior. J. appl. Psychol., 1952, 36, 29-33.
35. LEWIN, K. Group decisions and social change. In T. Newcomb & E. Hartley (Eds.), Readings in social psychology. New York: Holt, 1947.
36. LORGE, I., DAVITZ, J., FOX, D., & HERROLD, K. Evaluation of instruction in staff action and decision-making. USAF Hum. Resource Res. Inst. Tech. Rep., 1953, No. 16.
37. LORGE, I., DAVITZ, J., FOX, D., HERROLD, K., & WELTZ, PAULA. Studies contrasting the quality of products by individuals and by groups. Unpublished manuscript, Institute of Psychological Research, Teachers College, Columbia Univer.
38. LORGE, I., & SOLOMON, H. Two models of group behavior in the solution of Eureka-type problems. Psychometrika, 1955, 20, 139-148.
39. LORGE, I., AIKMAN, L., MOSS, GILDA, SPIEGEL, J., & TUCKMAN, J. Solutions by teams and by individuals to a field problem at different levels of reality. J. educ. Psychol., 1955, 46, 17-24.
40. MAIER, N. R. F. Psychology in industry. Boston: Houghton Mifflin, 1946.
41. MAIER, N. R. F. The quality of group decisions as influenced by the discussion leader. Hum. Relat., 1950, 3, 155-174.
42. MARQUART, DOROTHY I. Group problem solving. J. soc. Psychol., 1955, 41, 103-113.
43. MARROW, A. Group dynamics in industry: Implications for guidance and personnel workers. Occupations, 1948, 26, 472-476.
44. MARSTON, W. M. Studies in testimony. J. crim. Law Criminol., 1924, 15, 5-31.
45. MOORE, O. K., & ANDERSON, SCARVIA. Search behavior in individual and group problem solving. Amer. sociol. Rev., 1954, 19, 702-714.
46. MUKERJI, N. F. Investigation of ability to work in groups and in isolation. Brit. J. Psychol., 1940, 30, 352-356.
47. PERLMUTTER, H. V. Group memory of meaningful material. J. Psychol., 1953, 35, 361-370.
48. PERLMUTTER, H. V., & DE MONTMOLLIN, GERMAINE. Group learning of nonsense syllables. J. abnorm. soc. Psychol., 1952, 47, 762-769.
49. PRESTON, M. Note on the reliability and validity of the group judgment. J. exp. Psychol., 1938, 22, 462-471.
50. RADKE, MARIAN, & KLISURICH, D. Experiments in changing food habits. J. Amer. diet. Ass., 1947, 23, 403-409.
51. ROBINSON, K. F. An experimental study of the effects of group discussion upon the social attitudes of college students. Speech Monogr., 1941, 8, 34-57.
52. RYAN, G. An experiment in class instruction versus individual study at college level. Unpublished doctoral dissertation, Johns Hopkins Univer., 1932.
53. SCHONBAR, ROSALEA. The interaction of observer pairs in judging. Arch. Psychol., N. Y., 1945, 299.
54. SENGUPTA, N. N., & SINHA, C. P. N. Mental work in isolation and in group. Indian J. Psychol., 1926, 1, 106-109.
55. SHAW, M. E. Comparison of individuals and small groups in the rational solution of complex problems. Amer. J. Psychol., 1932, 44, 491-504.
56. SHERIF, M. A study of some social factors in perception. Arch. Psychol., N. Y., 1935, 187.
57. SIMS, V. M. The relative influence of two types of motivation on improvement. J. educ. Psychol., 1928, 19, 480-484.
58. SMITH, B. The validity and reliability of group judgments. J. exp. Psychol., 1941, 29, 420-426.
59. SMITH, M. Group judgments in the field of personality traits. J. exp. Psychol., 1931, 14, 562-565.
60. SOUTH, E. B. Some psychological aspects of committee work. J. appl. Psychol., 1927, 11, 348-368.
61. SPENCE, R. B. Lecture and class discussion in teaching educational psychology. J. educ. Psychol., 1928, 19, 454-462.
62. STROOP, J. B. Is the judgment of the group better than that of the average member of the group. J. exp. Psychol., 1932, 15, 550-560.
63. TAYLOR, D. W., & FAUST, W. L. Twenty questions: Efficiency in problem solving as a function of size of group. J. exp. Psychol., 1952, 44, 360-368.
64. THIE, T. W. The efficiency of the group method. English J., 1925, 14, 134-137.
65. THORNDIKE, R. L. The effects of discussion upon the correctness of group decision when the factor of majority influence is allowed for. J. soc. Psychol., 1938, 9, 343-362.
66. THORNDIKE, R. L. On what type of task will a group do well? J. abnorm. soc. Psychol., 1938, 33, 408-412.
67. TIMMONS, W. M. Decisions and attitudes as outcomes of the discussion of a social problem. Teachers Coll., Contrib. Educ., No. 777, Columbia Univer., Bureau of Publications, 1939.
68. TIMMONS, W. M. Can the product superiority of discussors be attributed to averaging and majority influences? J. soc. Psychol., 1942, 15, 23-32.
69. WAPNER, S., & ALPER, THELMA G. The effect of an audience on behavior in a choice situation. J. abnorm. soc. Psychol., 1952, 47, 222-229.
70. WATSON, G. B. Do groups think more efficiently than individuals? J. abnorm. soc. Psychol., 1928, 23, 328-336.
71. WATSON, G. B. A comparison of group and individual performances at certain intellectual tasks. Proc., Ninth intern. Congr. Psychol., 1929, 743.
72. WYATT, S., FROST, L., & STOCK, F. G. L. Incentives in repetitive work. Med. Res. Coun. Rep., 69. London: H. M. Stationery Office, 1934.
73. ZELENY, L. D. Teaching sociology by a discussion group method. Sociol. soc. Res., 1927, 162-172.
74. ZELENY, L. D. Experimental appraisal of a group learning plan. J. educ. Res., 1940, 34, 37-42.