ASA Sections on: Statistical Computing

John M. Chambers and Barbara F. Ryan
The involvement of statisticians with computing goes back a very long time, long before any device existed that we would call a computer today. The U.S. Bureau of the Census and its predecessors, for example, have wrestled since the inception of the decennial census with the challenge of recording, transmitting, storing, and processing the growing volume of data (e.g., see a fascinating history by K. S. Reid-Green in the February 1989 issue of Scientific American). Since American Statistical Association (ASA) members have always been instrumental in census activities, we can confidently assert that ASA has been concerned with computing from the association's earliest years. Similar arguments apply also to the statistical and computing needs of other organizations, such as the U.S. Bureau of Standards.
With the advent of programmed computing on a substantial scale in the 1950s and 1960s, the involvement of statisticians in research, development, and use of computing facilities became increasingly central to much activity in the profession. Growing recognition that the ASA should acknowledge the role of computing culminated in the formation of the Statistical Computing Section in 1972 and in the section's contributions to the association and the profession since then.
By the middle of the 1960s, much statistical software existed, of varied styles and aimed at varying applications. Early software may seem crude from our present perspective, just as the computers it ran on seem amazingly primitive now. But these early efforts had already changed statisticians' views of their activities profoundly and permanently. Those who were involved in statistical computing had a conviction that associations like ASA had to respond by recognizing the new field. Several events and activities during the 1960s provided special impetus to that conviction and the formation of the section.
In December 1966 a conference on statistical computing was held in the United Kingdom, under the auspices of the Royal Statistical Society. Papers at the conference presented both existing software systems and general ideas for future directions. The papers appeared later in the journal Applied Statistics. Discussions during and after the conference led to the formation of a Working Party on Statistical Computing by the Royal Statistical Society. Though the focus of the conference was naturally British, one of the papers and several of the participants in the conference came from the United States. Some of the participants, including John Chambers and Paul Meier, returned later to the United States, anxious to see ASA become involved in this area as well.
Beginning in 1968, a sequence of meetings generally called the Interface Symposia created a remarkable self-sustaining forum for exploring the interactions between statistics and computing. Originating in California from the efforts of Arnold Goodman and Nancy Mann, the symposia have flourished ever since and continue to draw together the two disciplines. In 1969 another conference was held on statistical computing, this time at the University of Wisconsin-Madison. Papers from this conference were published as a book, Statistical Computation, edited by Roy Milton and John Nelder.
The association responded to all of these stimuli at a typically deliberate pace. The first concrete step was the appointment of a Committee on Computers in Statistics, originally to serve for 1970. The committee had 10 members: Meier (chair), Frank Anscombe, Joe Cameron, Chambers, Joe Daly, Art Dempster, Wil Dixon, Michael Godfrey, Merv Muller, and Martin Schatzoff. As was typical of all early ASA computing activities, the members ranged from those directly involved in computing research to sympathetic observers who rarely, if ever, touched a keypunch themselves. The name given to the committee suggests, in retrospect, a continuing emphasis on computing machinery as something with an impact on statisticians, rather than on computing as an activity by statisticians. Change, however, was clearly on the way. The committee sponsored three sessions at the December 1970 annual meeting, marking the first time that computing had an official part at the meeting. The sessions were "Algorithms for Statistical Computation," "Computers in Teaching Statistics," and "Data Handling."
The next major step also came out of the December 1970 meeting: a petition with 75 signatures calling for an ASA section on statistical computing. Dixon sent the petition to Churchill Eisenhart, ASA president, on January 5, 1971. This event finally set in motion the creation of the section. Eisenhart replied sympathetically, outlining the steps to be taken to get approval and asking Dixon to prepare a charter statement. That draft charter cited six purposes for the section. The first and the sixth defined desirable interactions, in both directions, between statistics and computing: "to encourage the interest and involvement of the membership of ASA in statistical computing" and "to encourage the applications of statistics in the design, maintenance, and evaluation of computing hardware and software systems." The first has been the main theme of the section ever since. The other, the application of statistics to computing, has flickered in and out over the years in the form of compumetrics, program evaluation, software quality, and other topics. The other four purposes involved activities such as interacting with other organizations and providing a "resource" in statistical computing.
The approval process went forward smoothly, and a section was approved at the 1971 annual meeting. The first officers, appointed for 1972, were Dixon as chair, Dempster as program chair, and Alan Forsythe as secretary/treasurer. In its first election the section chose a partial slate of officers (Chambers as chair and Wes Nicholson as program chair) for 1973 and a full set of officers for 1974. From then on, the section was in business.
As the new field evolved, the section evolved with it. In the early 1970s statistical computing was still working toward a clear identity and recognized respectability, especially among the more mathematically inclined. Theorems, or even crisp results, are rare, and some of the most important developments, such as the interactive exploration of data, cannot be adequately captured in written form.
The computing-section sessions at the ASA annual meeting and the annual Symposium on the Interface were the primary homes for statisticians interested in computing. Even though the formal ties between these two groups were loose, the informal ties were strong and close; we regarded the two meetings as our spring and summer meetings on statistical computing. Journals were not readily available to us, were too slow for the fast-moving technology, and were often inappropriate for our work. So we exchanged our ideas and software informally, at these meetings.
Certain basic areas in computing emerged early: simulation, algorithms, computerized data analysis, and software. These remain the major areas in statistical computing today.
Surprisingly, perhaps, statisticians do not do much basic research in simulation as a technique, but they do use it extensively. Most theoretical work is supported to some extent by simulation, and most ASA meetings contain talks that use simulation. A few meetings, including the first for this section, contain sessions that develop simulation methodology. In 1985 the section became a cosponsor of the Winter Simulation Conference.
Algorithms, on the other hand, have the largest representation of the four areas mentioned previously. They form the supporting base for computing and hence for most data analysis. In the 1970s, research in the area emphasized algorithms for linear models, primarily analysis of variance and regression. Most ASA meetings had at least one invited-paper session on this topic, and they always had many contributed papers. We were all trying to learn how to do calculations efficiently and correctly, how numerical analysts could help us, and how calculations were actually being done in commercial software.
In the 1980s, work in algorithms was much more diverse and included traditional research in linear methods as well as the new computer-intensive methods for data analysis. The annual meetings had talks on mixed models in analysis of variance (ANOVA), optimality criteria for designed experiments, projection pursuit, nonparametric regression, clustering, methods for analyzing incomplete data, image analysis, Bayesian methods, and smoothing.
In 1981 the Committee on Statistical Algorithms was formed, with Sally Howe as chair, to develop a classification scheme for statistical algorithms and an index of available algorithms. During the next few years the committee compiled a guide with more than 2,500 entries.
We also explored ways to use computers effectively to analyze data. The area discussed most, with a total of seven sessions, was data management, including how to manage large, complex data sets, how to assure the quality of research data, and how to store data efficiently. Some of the most popular sessions were on the use of computers for graphics, from simple line-printer plots to interactive, dynamic graphics systems. In addition, there were talks on using computers in specific areas of analysis, including biased estimation, missing-data analysis, survival analysis, regression, ANOVA, and the analysis of frequency data.
The last major area in computing, software, contains some of the most interesting and controversial talks, those that evaluated statistical packages. The 1970s was the era of the central mainframe computer, and most statistical packages were large batch programs (remember punched cards). User interfaces were unfriendly, the methods were often obscure, and the answers were sometimes even wrong.
In 1973 Ivor Francis and Richard Heiberger asked the section to form a committee to evaluate statistical packages. With Paul Velleman, they put together a report covering criteria, considerations, and test plans for evaluation. The report was presented at the 1974 ASA annual meeting and subsequently appeared in The American Statistician. During the next eight years the Committee on the Evaluation of Statistical Program Packages and its successor, the Committee on the Evaluation of Statistical Software, organized more than a dozen sessions at the Interface and ASA meetings, evaluating various aspects of large batch packages, such as BMDP, Omnitab, P-Stat, SAS, and SPSS. There were talks on regression, ANOVA, cluster analysis, time series, nonlinear regression, graphics, and user documentation.
In 1983 Ken Berk, then chair of the committee, wrote a letter to the editor of The American Statistician, suggesting that software reviews appear in print so that more people could use them. The next year, he was appointed editor of a new section of The American Statistician, Statistical Computing Software Reviews. In February 1985 the first reviews were published.
The lack of appropriate publication outlets for work in statistical computing has been a problem and topic of discussion since the beginning of the section. As early as 1973 we explored the possibility of creating a journal appropriate for our work. As a first step, in 1975 the section started publishing the proceedings of papers presented at the ASA annual meeting. In December 1988 William Eddy, as editor, with six associate editors, put together a special Applications section in the Journal of the American Statistical Association. Active promotion of a new journal continues.
The section is governed by an executive committee consisting of a chair, chair-elect, program chair, program chair-elect, secretary/treasurer, board representative, publications liaison officer, two American Federation of Information Processing Societies representatives, and a representative to the committee on sections. The committee meets at the ASA annual meeting and, since 1981, also meets at the Interface meetings. In 1983 Barbara Ryan started what has become a tradition of having a mixer, sponsored by commercial software developers, after the section business meeting.
While writing this article, John M. Chambers was Statistician, Statistics Research Departments, AT&T Bell Laboratories, Murray Hill, NJ 07974. He is still with Bell Laboratories, but Bell Labs is now part of Lucent Technologies.
Barbara F. Ryan was a Visiting Scholar, Department of Statistics, Stanford University, Stanford, CA 94305. She is now President and Chief Executive Officer of Minitab, Inc.
This paper originally appeared in The American Statistician, May 1990 (Volume 44, No. 2, pp. 87-89).