Cornucopia of Disability Information

Appendix B - Source and Accuracy of the Estimates


Source of Data

The SIPP universe is the noninstitutionalized resident population living in the United States. This population includes persons living in group quarters, such as dormitories, rooming houses, and religious group dwellings. Crew members of merchant vessels, Armed Forces personnel living in military barracks, and institutionalized persons, such as correctional facility inmates and nursing home residents, are not eligible to be in the survey. Also, United States citizens residing abroad are not eligible to be in the survey. Foreign visitors who work or attend school in this country and their families are eligible; all others are not eligible. With the exceptions noted above, field representatives interview eligible persons who are at least 15 years of age at the time of the interview.

The SIPP sample for the 1990 and 1991 panels is located in 230 Primary Sampling Units (PSUs), each consisting of a county or a group of contiguous counties. Within these PSUs, we systematically selected expected clusters of two living quarters (LQs) from lists of addresses prepared for the 1980 decennial census to form the bulk of the sample. To account for LQs built within each of the sample areas after the 1980 census, we selected a sample containing clusters of four LQs from permits issued for construction of residential LQs up until shortly before the beginning of the panel.

In jurisdictions that have incomplete addresses or don't issue building permits, we sampled small land areas, listed expected clusters of four LQs, and then subsampled. In addition, we selected a sample of LQs from a supplemental frame that included LQs identified as missed in the 1980 census.

The 1990 panel differs from other panels as a result of oversampling for low income households. The panel contains an oversample of Black headed households, Hispanic headed households and female headed family households with no spouse present and living with relatives.

The first interview for the 1990 and 1991 panels occurred during February, March, April, or May of 1990 and 1991, respectively. Interviews for approximately one-fourth of the sample took place in each of these months. For the remainder of the panels, interviews for each person occurred every 4 months. At each interview the reference period was the 4 months preceding the interview month.

Occupants of about 93 percent of all eligible living quarters participated in the first interview of the panel. For later interviews, field representatives interviewed only original sample persons (those in Wave 1 sample households and interviewed in Wave 1) and persons living with them. The Bureau automatically designated all first wave noninterviewed households as noninterviews for all subsequent interviews. Field representatives conducted personal interviews in the first, second, and sixth waves only. The remaining interviews were telephone interviews. For personal interviews we followed original sample persons if they moved to a new address, unless the new address was more than 100 miles from a SIPP sample area. If the original sample persons moved farther than 100 miles from a SIPP sample area, we attempted telephone interviews. When original sample persons moved to remote parts of the country and were unreachable by telephone, moved without leaving a forwarding address, or refused the interview, additional noninterviews resulted.

As a part of most waves, we cover subjects that are important to meet SIPP goals and don't require repeated measurement during the panel. The data on these subjects are of particular interest to data users and policy makers. We cover these subjects once during the panel or annually. By collecting data once for the panel or annually, we reduce respondent burden. We call a specific set of questions on a subject a topical module. For this report the topical modules analyzed include questions on disability status. We implemented them in Wave 6 of the 1990 panel (and Wave 3 of the 1991 panel).

(Since Wave 6 of the 1990 panel and Wave 3 of the 1991 panel are concurrent and contain the same relevant topical modules on disability status, we combined the data and analyzed it as a single data set. The primary motivation for combining this data is to obtain an increase in sample size and reduce effects of nonresponse over the life of the panel.)

Noninterviews

Tabulations in this report were drawn from interviews conducted from October 1991 through January 1992. Table B-1 summarizes information on nonresponse for the interview months in which we collected the data used to produce this report. Some respondents do not respond to some of the questions. Therefore, the overall nonresponse rate for some items such as income and money related items is higher than the nonresponse rates in table B-1. For more discussion of nonresponse see the Quality Profile for the Survey of Income and Program Participation, May 1990, by T. Jabine, K. King, and R. Petroni, available from Customer Services, Data Users Services Division, of the U.S. Census Bureau (301-763-6100).

Weighting Procedure

We derived SIPP person weights in each panel from several stages of weight adjustments. In the first wave, we gave each person a base weight equal to the inverse of his/her probability of selection. For each subsequent interview, the Bureau gave each person a base weight that accounted for following movers. We applied a factor to each interviewed person's weight to account for the SIPP sample areas not having the same population distribution as the strata they are from.

We applied a noninterview adjustment factor to the weight of every occupant of interviewed households to account for persons in noninterviewed occupied households which were eligible for the sample. (The Bureau treated individual nonresponse within partially interviewed households with imputation. We made no special adjustment for noninterviews in group quarters.)

The Bureau used complex techniques to adjust the weights for nonresponse. For a further explanation of the techniques used, see Nonresponse Adjustment Methods for Demographic Surveys at the U.S. Bureau of the Census, November 1988, Working paper 8823, by R. Singh and R. Petroni. The success of these techniques in avoiding bias is unknown. An example of successfully avoiding bias is in "Current Nonresponse Research for the Survey of Income and Program Participation" (paper by Petroni, presented at the Second International Workshop on Household Survey Nonresponse, October 1991).

We performed an additional stage of adjustment to persons' weights to reduce the mean square errors of the survey estimates. We accomplished this by ratio adjusting the sample estimates to agree with monthly Current Population Survey (CPS) type estimates of the civilian (and some military) noninstitutional population of the United States by demographic characteristics including age, sex, and race as of the specified date. The Bureau brought CPS estimates by age, sex, and race into agreement with adjusted estimates from the 1980 decennial census. Adjustments to the 1980 decennial census estimates reflect births, deaths, immigration, emigration, and changes in the Armed Forces since 1980. In addition, we controlled SIPP estimates to independent Hispanic controls and made an adjustment to assign equal weights to husbands and wives within the same household. We implemented all of the above adjustments for each reference month and the interview month.
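The ratio adjustment described above can be sketched in Python as follows. This is only an illustration of the general idea (scaling weights within each demographic cell so that weighted totals match independent controls); the function name, cell labels, weights, and control totals are all hypothetical, and the Bureau's actual procedure involves more stages than shown here.

```python
from collections import defaultdict

def ratio_adjust(records, controls):
    """Scale weights so each cell's weighted total equals its control.

    records:  list of (cell, weight) pairs, one per sample person
    controls: dict mapping cell -> independent population control
    (Both structures are hypothetical, chosen for illustration.)
    """
    # Weighted sample total in each demographic cell before adjustment.
    totals = defaultdict(float)
    for cell, weight in records:
        totals[cell] += weight
    # Multiply each weight by control / sample total for its cell.
    return [weight * controls[cell] / totals[cell] for cell, weight in records]

# Hypothetical cells and numbers, not taken from the report.
records = [("F 15-24", 1000.0), ("F 15-24", 1500.0), ("M 15-24", 2000.0)]
controls = {"F 15-24": 3000.0, "M 15-24": 2200.0}
adjusted = ratio_adjust(records, controls)
print(adjusted)
```

After adjustment, the weights in each cell sum to that cell's control total.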

Accuracy of Estimates

We base SIPP estimates on a sample. The sample estimates may differ somewhat from the values obtained from administering a complete census using the same questionnaire, instructions, and enumerators. The difference occurs because with an estimate based on a sample survey two types of errors are possible: nonsampling and sampling. We can provide estimates of the magnitude of the SIPP sampling error, but this is not true of nonsampling error. The next few sections describe SIPP nonsampling error sources, followed by a discussion of sampling error, its estimation, and its use in data analysis.

Nonsampling Variability

We attribute nonsampling errors to many sources, including:

  • inability to obtain information about all cases in the sample
  • definitional difficulties
  • differences in the interpretation of questions
  • inability or unwillingness on the part of the respondents to provide correct information
  • inability to recall information
  • errors made in collection (e.g., recording or coding the data)
  • errors made in processing the data
  • errors made in estimating values for missing data
  • biases resulting from the differing recall periods caused by the interviewing pattern used
  • undercoverage

We used quality control and edit procedures to reduce errors made by respondents, coders and interviewers. More detailed discussions of the existence and control of nonsampling errors in the SIPP are in the SIPP Quality Profile.

Undercoverage in SIPP resulted from missed living quarters and missed persons within sample households. It is known that undercoverage varies with age, race, and sex. Generally, undercoverage is larger for males than for females and larger for Blacks than for non-Blacks. Ratio estimation to independent age-race-sex population controls partially corrects for the bias due to survey undercoverage. However, biases exist in the estimates when persons in missed households or missed persons in interviewed households have characteristics different from those of interviewed persons in the same age-race-sex group. Further, we didn't adjust the independent population controls for undercoverage in the Census.

A common measure of survey coverage is the coverage ratio, the estimated population before ratio adjustment divided by the independent population control. Table B-2 shows CPS coverage ratios for age-sex-race groups for 1992. The CPS coverage ratios can exhibit some variability from month to month, but these are a typical set of coverage ratios. Other Census Bureau household surveys like the SIPP experience similar coverage.

Comparability with Other Estimates

Exercise caution when comparing data from this report with data from other SIPP publications or with data from other surveys. Comparability problems arise from varying seasonal patterns for many characteristics, different nonsampling errors, and different concepts and procedures. Refer to the SIPP Quality Profile for known differences with data from other sources and further discussion.

Sampling Variability

Standard errors indicate the magnitude of the sampling error. They also partially measure the effect of some nonsampling errors in response and enumeration, but do not measure any systematic biases in the data. The standard errors mostly measure the variations that occurred by chance because we surveyed a sample rather than the entire population.

Uses and Computation of Errors

Confidence Intervals. The sample estimate and its standard error enable one to construct confidence intervals, ranges that would include the average result of all possible samples with a known probability. For example, if we selected all possible samples and surveyed each of these under essentially the same conditions and with the same sample design, and if we calculated an estimate and its standard error from each sample, then:

1 - Approximately 68 percent of the intervals from one standard error below the estimate to one standard error above the estimate would include the average result of all possible samples.

2 - Approximately 90 percent of the intervals from 1.6 standard errors below the estimate to 1.6 standard errors above the estimate would include the average result of all possible samples.

3 - Approximately 95 percent of the intervals from two standard errors below the estimate to two standard errors above the estimate would include the average result of all possible samples.

The average estimate derived from all possible samples is or is not contained in any particular computed interval. However, for a particular sample, one can say with a specified confidence that the confidence interval includes the average estimate derived from all possible samples.
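A minimal Python sketch of the interval construction described above. The function name and the example estimate and standard error are hypothetical; only the multipliers (1, 1.6, and 2 standard errors) come from the text.

```python
def confidence_interval(estimate, standard_error, multiplier=1.6):
    """Return (lower, upper) = estimate -/+ multiplier * standard_error.

    Multipliers named in the text: 1.0 (about 68 percent),
    1.6 (about 90 percent), and 2.0 (about 95 percent).
    """
    half_width = multiplier * standard_error
    return estimate - half_width, estimate + half_width

# Hypothetical estimate and standard error, for illustration only.
low, high = confidence_interval(1_000_000, 50_000, multiplier=1.6)
print(low, high)
```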

Hypothesis Testing

One may also use standard errors for hypothesis testing. Hypothesis testing is a procedure for distinguishing between population characteristics using sample estimates. The most common type of hypothesis tested is 1) the population characteristics are identical versus 2) they are different. One can perform tests at various levels of significance, where a level of significance is the probability of concluding that the characteristics are different when, in fact, they are identical.

Unless noted otherwise, all statements of comparison in the report passed a hypothesis test at the 0.10 level of significance or better. This means that, for differences cited in the report, the estimated absolute difference between parameters is greater than 1.6 times the standard error of the difference.

To perform the most common test, compute the difference XA - XB, where XA and XB are sample estimates of the characteristics of interest. A later section explains how to derive an estimate of the standard error of the difference XA - XB. Let that standard error be S(DIFF). If XA - XB is between -1.6 times S(DIFF) and +1.6 times S(DIFF), no conclusion about the characteristics is justified at the 10 percent significance level. If, on the other hand, XA - XB is smaller than -1.6 times S(DIFF) or larger than +1.6 times S(DIFF), the observed difference is significant at the 10 percent level. In this event, it is commonly accepted practice to say that the characteristics are different. Of course, sometimes this conclusion will be wrong. When the characteristics are, in fact, the same, there is a 10 percent chance of concluding that they are different.

Note that as we perform more tests, more erroneous significant differences will occur. For example, at the 10 percent significance level, if we perform 100 independent hypothesis tests in which there are no real differences, it is likely that about 10 erroneous differences will occur. Therefore, interpret the significance of any single test cautiously.
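The decision rule above can be written as a short Python check. This is a sketch only: the function name and the second set of example values are hypothetical, and the rule shown is simply the comparison of the absolute difference with 1.6 times S(DIFF).

```python
def differs_significantly(x_a, x_b, s_diff, multiplier=1.6):
    """True when |x_a - x_b| exceeds multiplier * s_diff.

    With multiplier=1.6 this corresponds to the 10 percent
    significance level used in the report.
    """
    return abs(x_a - x_b) > multiplier * s_diff

# Hypothetical estimates: a difference of 0.5 against S(DIFF) = 0.5
# falls inside the band of -0.8 to +0.8, so no conclusion is justified.
print(differs_significantly(10.0, 9.5, 0.5))
```

Because each test at the 10 percent level has a 10 percent chance of a false positive when no real difference exists, running this check many times calls for the caution noted above.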

Note Concerning Small Estimates and Small Differences

We show summary measures in the report only when the base is 200,000 or greater. Because of the large standard errors involved, there is little chance that estimates will reveal useful information when computed on a base smaller than 200,000. Also, nonsampling error in one or more of the small number of cases providing the estimate can cause large relative error in that particular estimate. We show estimated numbers, however, even though the relative standard errors of these numbers are larger than those for the corresponding percentages. We provide smaller estimates primarily to permit such combinations of the categories as serve each user's needs. Therefore, be careful in the interpretation of small differences since even a small amount of nonsampling error can cause a borderline difference to appear significant or not, thus distorting a seemingly valid hypothesis test.

Standard Error Parameters and Tables and Their Use

Most SIPP estimates have greater standard errors than those obtained through a simple random sample because we sampled clusters of living quarters for the SIPP. To derive standard errors at a moderate cost and applicable to a wide variety of estimates, we made a number of approximations. We grouped estimates with similar standard error behavior and developed two parameters (denoted "a" and "b") to approximate the standard error behavior of each group of estimates. Because the actual standard error behavior was not identical for all estimates within a group, the standard errors we computed from these parameters provide an indication of the order of magnitude of the standard error for any specific estimate. These "a" and "b" parameters vary by characteristic and by demographic subgroup to which the estimate applies. Use base "a" and "b" parameters found in table B-6 for combined 1990/91 panel estimates.

For users who wish further simplification, we also provide general standard errors in tables B-4 and B-5. Note that you need to adjust these standard errors by a factor from table B-6. The standard errors resulting from this simplified approach are less accurate. Methods for using these parameters and tables for computation of standard errors are given in the following sections.

Standard Error of Estimated Numbers

There are two ways to compute the approximate standard error, $s_x$, of an estimated number shown in this report. The first uses the formula $s_x = f s$ (formula 1), where f is a factor from table B-6 and s is the standard error of the estimate obtained by interpolation from table B-4. Alternatively, approximate $s_x$ using the formula,

$$ s_x = \sqrt{a x^2 + b x} \qquad (2) $$

from which we calculated the standard errors in table B-4. Here x is the size of the estimate and a and b are the parameters in table B-6 associated with the particular type of characteristic. Use of formula 2 will provide more accurate results than the use of formula 1. When calculating standard errors for numbers from crosstabulations involving different characteristics, use the factor or set of parameters for the characteristic which will give the largest standard error.
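The two approaches can be sketched in Python as below, assuming formula 1 is the product of the factor and the tabled standard error and formula 2 has the usual generalized-variance form sqrt(a·x² + b·x). The function names are hypothetical, and the "a" and "b" values in the example are made up for illustration; they are not the table B-6 parameters.

```python
import math

def se_number_from_table(f, s):
    """Formula 1 (assumed form): tabled standard error s scaled by factor f."""
    return f * s

def se_number_from_parameters(a, b, x):
    """Formula 2 (assumed form): sqrt(a * x^2 + b * x) for an estimate of size x."""
    return math.sqrt(a * x * x + b * x)

# Hypothetical parameters, NOT taken from table B-6.
a_param, b_param = -0.00002, 5000.0
se = se_number_from_parameters(a_param, b_param, 1_000_000)
print(round(se))
```

For crosstabulations involving different characteristics, one would call the function once per candidate parameter set and keep the largest result, following the rule stated above.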

Illustration. SIPP estimates from text table B of this report show that there were 9,685,000 persons 15 years and older who experienced difficulty in seeing words and letters. The appropriate "a" and "b" parameters and "f" factor from table B-6 and the appropriate general standard error from table B-4 are

formula

Using formula 1, the approximate standard error is

formula

Using formula 2, the approximate standard error is

formula

Based on the standard error from formula 2, the 90-percent confidence interval as shown by the data is from 9,340,000 to 10,030,000. Therefore, a conclusion that the average estimate derived from all possible samples lies within a range computed in this way would be correct for roughly 90 percent of all samples.

Standard Errors of Estimated Percentages

The reliability of an estimated percentage, computed using sample data for both numerator and denominator, depends on the size of the percentage and its base. When the numerator and denominator of the percentage have different parameters, use the parameter (or appropriate factor) from table B-6 indicated by the numerator. Calculate the approximate standard error, $s_{(x,p)}$, of an estimated percentage p using the formula

$$ s_{(x,p)} = f s \qquad (3) $$

where p is the percentage of persons/families/households with a particular characteristic such as the percent of persons owning their own homes.

In this formula, f is the appropriate "f" factor from table B-6, and s is the standard error of the estimate obtained by interpolation from table B-5. Alternatively, approximate it by the formula:

$$ s_{(x,p)} = \sqrt{\frac{b}{x}\, p\,(100 - p)} \qquad (4) $$

from which we calculated the standard errors in table B-5. Here x is the total number of persons, families, households, or unrelated individuals in the base of the percentage, p is the percentage (0 < p < 100), and b is the "b" parameter in table B-6 associated with the characteristic in the numerator of the percentage. Use of this formula will give more accurate results than use of formula 3 above.
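A short Python sketch of the parameter-based calculation for a percentage, assuming formula 4 has the usual form sqrt((b/x)·p·(100 − p)). The function name is hypothetical, and the "b" parameter and base in the example are illustrative values rather than figures from table B-6 or text table B.

```python
import math

def se_percentage(b, base, p):
    """Formula 4 (assumed form): sqrt((b / base) * p * (100 - p)).

    b:    "b" parameter for the characteristic in the numerator
    base: number of persons, families, or households in the denominator
    p:    the estimated percentage, between 0 and 100
    """
    return math.sqrt(b / base * p * (100.0 - p))

# Hypothetical inputs: b = 5,000, base of 10 million, p = 5 percent.
se_p = se_percentage(5000.0, 10_000_000, 5.0)
print(round(se_p, 2))
```

The result is in percentage points, so it can be multiplied by 1.6 and added to or subtracted from p to form a 90-percent confidence interval.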

Illustration. Text table B shows that an estimated 5 percent of all persons 15 years and older had difficulty seeing words and letters. Using formula 3 with the "f" factor from table B-6 and the appropriate standard error from table B-5, the approximate standard error is

formula

Using formula 4 with the "b" parameter from table B-6, the approximate standard error is

formula

Consequently, the 90-percent confidence interval using the standard error from formula 4 is from 4.8 percent to 5.2 percent as shown by the data.

Standard Error of a Difference

The standard error of a difference between two sample estimates, x and y, is approximately equal to

$$ s_{(x-y)} = \sqrt{s_x^2 + s_y^2 - 2 r s_x s_y} \qquad (5) $$

where $s_x$ and $s_y$ are the standard errors of the estimates x and y and r is the correlation coefficient between the characteristics estimated by x and y. The estimates can be numbers, averages, percents, ratios, etc. Underestimates or overestimates of the standard error of differences result if the estimated correlation coefficient is overestimated or underestimated, respectively. In this report, we assume r is 0.

Illustration. Again using text table B, 3.4 percent of persons 15 to 64 years of age had a problem hearing normal conversation, while for those 65 and over the figure was 17.6 percent. The standard errors for these percentages are computed using formula 4 to be .14 percent and .71 percent, respectively. Assuming that these two estimates are not correlated, the standard error of the estimated difference of 14.2 percentage points is

$$ s_{(x-y)} = \sqrt{(0.14)^2 + (0.71)^2} \approx 0.72 \text{ percent} $$

To test whether the two percentages differ significantly at the 10-percent significance level, compare the difference of 14.2 percent to the product of 1.6 x .72 percent=1.15 percent. Since the difference is larger than 1.6 times the standard error of the difference, the data show that the estimates of 3.4 and 17.6 percent differ significantly at the 10-percent level.
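The illustration above can be reproduced with a few lines of Python. The function name is hypothetical, and the general form with the correlation term is the standard one assumed here; with r = 0, as the report assumes, it reduces to the square root of the sum of the squared standard errors.

```python
import math

def se_difference(s_x, s_y, r=0.0):
    """Standard error of x - y; r is the correlation between the estimates."""
    return math.sqrt(s_x ** 2 + s_y ** 2 - 2 * r * s_x * s_y)

# Figures from the illustration: percentages and their standard errors.
p_under_65, p_65_plus = 3.4, 17.6
s_diff = se_difference(0.14, 0.71)        # r assumed to be 0, as in the report
difference = p_65_plus - p_under_65
significant = abs(difference) > 1.6 * s_diff

print(round(s_diff, 2), round(difference, 1), significant)
# prints: 0.72 14.2 True
```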

Standard Error of a Mean

Define a mean as the average quantity of some item (other than persons, families, or households) per person, family, or household. (For the mean of these other items, compute the standard error of a ratio.) For example, the mean could be the average monthly household income of females age 25 to 34. Approximate the standard error of such a mean by formula 6 below. Because of the approximations used in developing formula 6, an estimate of the standard error of the mean obtained from that formula will generally underestimate the true standard error.

The formula used to estimate the standard error of a mean x is

$$ s_{\bar{x}} = \sqrt{\frac{b}{y}\, s^2} \qquad (6) $$

where y is the size of the base, $s^2$ is the estimated population variance of the item, and b is the parameter associated with the particular type of item. We estimated the population variance $s^2$ by letting $x_i$ be the value of the item for unit i. (A unit could be a person, family, or household.) Then we divided the range of values for the item into c intervals. The lower and upper boundaries of interval j were $Z_{j-1}$ and $Z_j$, respectively. We placed each unit into one of c groups such that $Z_{j-1} < x_i \le Z_j$. The estimated population variance, $s^2$, is given by the formula:

$$ s^2 = \sum_{j=1}^{c} p_j m_j^2 - \bar{x}^2 \qquad (7) $$

where $p_j$ is the estimated proportion of units in group j, and $m_j = (Z_{j-1} + Z_j)/2$. We assume the most representative value of the item in group j is $m_j$. If group c is open ended, i.e., no upper interval boundary exists, then an approximate value for $m_c$ is

$$ m_c = \frac{3}{2} Z_{c-1} \qquad (8) $$

Compute the mean, x, using the following formula:

$$ \bar{x} = \sum_{j=1}^{c} p_j m_j \qquad (9) $$

Illustration. Suppose that the distribution of monthly earnings among workers with a disability is given in table B-3. The mean monthly earnings from formula 9 is

formula

Using formula 7 and the mean monthly earnings of $2,672 the estimated population variance, s2, is

formula

The appropriate "b" parameter from table B-6 is 5,001. Now, using formula 6, the estimated standard error of the mean is

formula
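The grouped-data calculation for a mean can be sketched in Python as follows. The sketch assumes the usual forms of formulas 6 through 9: the mean is the sum of p_j·m_j, the variance is the sum of p_j·m_j² minus the squared mean, an open-ended top group uses 1.5 times its lower boundary as its midpoint, and the standard error is sqrt((b/y)·s²). The function name, interval boundaries, proportions, and base are hypothetical and do not reproduce table B-3; only the "b" value of 5,001 is taken from the text.

```python
import math

def grouped_mean_se(boundaries, proportions, b, base, open_ended=False):
    """Return (mean, variance, standard error of the mean) from grouped data.

    boundaries:  Z_0 ... Z_c; leave out Z_c when the top group is open ended
    proportions: estimated share p_j of units in each of the c groups
    b, base:     "b" parameter and size of the base (y in formula 6)
    """
    # Midpoint m_j = (Z_{j-1} + Z_j) / 2 for each closed interval.
    midpoints = [(lo + hi) / 2 for lo, hi in zip(boundaries, boundaries[1:])]
    if open_ended:
        # Assumed formula 8: midpoint of the open group is 1.5 * Z_{c-1}.
        midpoints.append(1.5 * boundaries[-1])
    mean = sum(p * m for p, m in zip(proportions, midpoints))
    variance = sum(p * m * m for p, m in zip(proportions, midpoints)) - mean ** 2
    return mean, variance, math.sqrt(b / base * variance)

# Hypothetical earnings distribution; the last group ($4,000 and over) is open ended.
mean, variance, se_mean = grouped_mean_se(
    boundaries=[0, 1000, 2000, 4000],
    proportions=[0.2, 0.3, 0.3, 0.2],
    b=5001,
    base=10_000_000,
    open_ended=True,
)
print(round(mean), round(variance), round(se_mean, 1))
```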

This site is maintained by Jennifer Weir, Disability Services at Texas A&M University -- Corpus Christi