A COMPARISON BETWEEN CLUSTER AND "RANDOM" SAMPLING

**************************************************************************************

The Journal of Social Psychology, 1983, 121, 155-156.

University of New South Wales, Australia

JOHN J. RAY

By far the most important gap between statistical theory and statistical practice is the fact that most statistical methods assume some sort of random sampling, whereas it seems doubtful that any truly random sample of any important human population has ever been obtained. There must be a "volunteer artifact": not all the people we wish to sample finally agree to co-operate. A vital policy-decision in real-world sampling is then what compromises we are prepared to accept in deviating from perfect random sampling.

A popular compromise is cluster sampling. In this method only starting points, not people, are randomly sampled. Whether the point is a person (typically chosen from voter registration lists) or a street intersection (typically chosen from area maps), there need not be great difficulties in ensuring randomness. Interviewers are then sent out to each point with instructions (of varying elaborateness) to interview, say, the 10 nearest co-operative people. The method has three advantages: it cuts down on travelling time between calls and thus on cost; it enables a non-co-operative person to be replaced by a neighbor in probably similar economic circumstances (insofar as suburbs are homogeneous); it enables people to be reached who do not appear on master lists (e.g., transients who have not registered to vote).
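The procedure just described can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions that are not in the article: people are placed along a single hypothetical line (so "nearest" is a one-dimensional distance), willingness to co-operate is a simulated coin flip, and the function name `draw_cluster_sample` and all parameters are invented for the example.

```python
import random


def draw_cluster_sample(positions, cooperative, n_points, cluster_size, rng):
    """Pick random starting people, then take the nearest willing respondents.

    positions   -- hypothetical location of each person along a line
    cooperative -- True if that person would agree to be interviewed
    """
    people = range(len(positions))
    # Only the starting points are chosen at random.
    starts = rng.sample(people, n_points)
    willing = [i for i in people if cooperative[i]]
    clusters = []
    for start in starts:
        # A refusal is replaced by the next-nearest willing neighbor.
        nearest = sorted(willing, key=lambda i: abs(positions[i] - positions[start]))
        clusters.append(nearest[:cluster_size])
    return starts, clusters


# Simulated population: 500 people, roughly 70% of whom co-operate (assumed figures).
rng = random.Random(0)
positions = [rng.uniform(0.0, 100.0) for _ in range(500)]
cooperative = [rng.random() < 0.7 for _ in range(500)]

starts, clusters = draw_cluster_sample(
    positions, cooperative, n_points=20, cluster_size=10, rng=rng
)
print(len(clusters), "clusters of", len(clusters[0]), "respondents each")
```

In this sketch nearby starting points can yield overlapping clusters; a real survey would need a rule for that case, which the article does not discuss.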

Purists, however, say that accepting one type of defect in one's sample is no warrant for injecting additional ones and that samples so obtained will be artificially homogeneous. A sample of 1000 obtained in clusters of 10 is really little more than a sample with an N of 100. The answer that cluster samplers give is twofold: (a) it seems to work very well at predicting voting percentages in elections; (b) it works well because the psychological difference between even next-door neighbors is typically great. Working interviewers who have often beheld how little agreement there is even between husband and wife on attitude and personality questions will be in little doubt that the members of any one cluster are in fact typically heterogeneous.
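The purist's arithmetic can be made explicit with the standard Kish design-effect approximation, which is not used in this article and is offered here only as a sketch. It assumes equal-sized clusters and an intra-cluster correlation `rho` (a hypothetical input here, restricted to the range 0 to 1) that measures how alike members of the same cluster are.

```python
def design_effect(cluster_size: int, icc: float) -> float:
    """Kish approximation for equal-sized clusters: 1 + (m - 1) * rho."""
    if cluster_size < 1:
        raise ValueError("cluster_size must be at least 1")
    if not 0.0 <= icc <= 1.0:
        raise ValueError("icc is assumed to lie in [0, 1] for this sketch")
    return 1.0 + (cluster_size - 1) * icc


def effective_sample_size(n: int, cluster_size: int, icc: float) -> float:
    """Size of the simple random sample with roughly the same precision."""
    return n / design_effect(cluster_size, icc)


# A sample of 1000 in clusters of 10, under several assumed values of rho.
for rho in (1.0, 0.5, 0.1, 0.0):
    n_eff = effective_sample_size(1000, 10, rho)
    print(f"rho = {rho:.1f}: effective N is about {n_eff:.0f}")
```

With `rho = 1` (cluster members identical) the effective N is 100, which is the purist's figure; with `rho = 0` (neighbors no more alike than strangers) it is the full 1000. The dispute between the two camps is therefore a dispute about where `rho` typically falls.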

The issue in evaluating cluster sampling is largely to determine the effect of its theoretically greater homogeneity. This question, unfortunately, can never be settled once and for all, as the answer must surely vary according to the variables being sampled. We may, however, be able to begin to work towards generalizations if the differences in results between clustered and simple "random" sampling are more frequently reported.

A survey employed a random cluster sample of 570 people (in clusters of five) taken in the Upper Hunter Valley area of New South Wales, Australia. The subject was attitudes towards the environment (fuller details are available elsewhere) (1). Sampling proceeded by choosing as starting points random names (and addresses) from the Australian electoral rolls (voter registration lists). As Australia has compulsory voter registration for all citizens over 18 and for many non-citizens, the sampling frame was unusually comprehensive. Results from the randomly-chosen person at the "focus" of each cluster were kept separate. The larger cluster sample thus included a smaller simple "random" sample of 94 persons. In spite of the cluster sample's being larger, it should in theory still have been much more homogeneous than the simple "random" sample and a comparison between the two should reveal any effects that this homogeneity had.

Although not necessarily so, it seemed possible that a more homogeneous sample should give more internally consistent answers to attitude questions. This prediction was confirmed. The internal reliability (alpha) of the Attitude to the Environment scale was .66 on the cluster sample and .55 on the "random" sample. The means and SDs, however, were 69.58 (8.70) and 69.25 (7.66), respectively. Since the accurate estimation of means is the usual purpose of a survey, this very small difference tends to support the view that in this one instance the greater homogeneity of a cluster sample was not a serious distorting influence.
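How small the difference in means is can be seen by expressing it as a fraction of a standard deviation. The sketch below uses only the four figures reported above; the function name and the choice of which SD to divide by are assumptions made for illustration. Because the "random" sample is a subset of the cluster sample, the result is a descriptive comparison, not a significance test.

```python
# Figures as reported in the text.
cluster_mean, cluster_sd = 69.58, 8.70  # full cluster sample
random_mean, random_sd = 69.25, 7.66    # "random" subsample of cluster foci


def standardized_difference(mean_a: float, mean_b: float, sd_ref: float) -> float:
    """Difference between two means in units of a reference SD."""
    if sd_ref <= 0:
        raise ValueError("sd_ref must be positive")
    return (mean_a - mean_b) / sd_ref


diff_vs_random_sd = standardized_difference(cluster_mean, random_mean, random_sd)
diff_vs_cluster_sd = standardized_difference(cluster_mean, random_mean, cluster_sd)

print(f"difference in 'random'-sample SD units: {diff_vs_random_sd:.3f}")
print(f"difference in cluster-sample SD units:  {diff_vs_cluster_sd:.3f}")
```

Whichever SD is used as the yardstick, the two means differ by about 0.04 of a standard deviation.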

The survey also included ratings of the effects of strip mining on 39 community variables. The means of all these ratings also differed between the two samples by similar small fractions of their standard deviations.

----------------

1. Ray, J.J. (1980) Does living near a coal mine change your attitude to the environment: A case study of the Hunter Valley. Australian & New Zealand Journal of Sociology, 16(3), 110-111.
