Protein-Energy Requirements of Developing Countries: Evaluation of New Data (UNU, 1981, 268 p.)

Statistical considerations in the estimation of protein requirements

Precise estimation of the amount of protein that human populations require is too complex and too general a problem to be defined, much less accomplished, with our present understanding of human nutrition. The more specific problem of how to estimate the needs of a specific fraction of a specific population for protein derived from a diet of known composition is, however, a problem that can be explored.

From a statistical point of view there are four distinct steps to reach such an estimate:

a. measurement of the nitrogen content of a sample;
b. determination of the nitrogen balance of an individual receiving a constant dietary intake;
c. estimation of the response of an individual to varying levels of a specific protein;
d. estimation of the fraction of a population that would be in positive nitrogen balance at various levels of a specific protein.

In order to interpret the data presented at this workshop, it is necessary to consider the uncertainties at each of these steps.

1. Measurement of Nitrogen Content

Determining the nitrogen content of a sample is a complex procedure, with various potential sources of error. The uncertainty of the resultant value of nitrogen content can be considered to be composed of two different types of error. The first is the error of the method itself as carried out by a specific laboratory. This can be estimated by the reproducibility of results within a laboratory; results run one time can be compared with results run at another time. Second, methods may differ, and laboratories need to be compared with each other to see whether different results indicate different phenomena or just different procedures.

2. Determination of Nitrogen Balance

Nitrogen balance is defined as the difference between nitrogen intake and output. Nitrogen output is, in turn, calculated as the sum of the nitrogen contained in urine and faeces and that of miscellaneous losses, primarily sweat, all usually expressed as per kilogram of body weight:

$$N_{\mathrm{bal}} = I_N - (U_N + F_N + M_N)$$

Since the components of output appear to be uncorrelated, for a constant intake the variance of balance is simply the sum of the variances of the individual quantities:

$$V(N_{\mathrm{bal}}) = V(I_N) + V(U_N) + V(F_N) + V(M_N)$$
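
As a small worked illustration of this additivity, the sketch below combines component standard deviations into a standard deviation for balance. The numerical values are hypothetical and chosen only to make the arithmetic easy to follow; they are not taken from the workshop data, and the function name is ours.

```python
import math

def balance_variance(var_intake, var_urine, var_faeces, var_misc):
    """Variance of nitrogen balance, assuming the four components are uncorrelated."""
    return var_intake + var_urine + var_faeces + var_misc

# Hypothetical standard deviations in mg N/kg/day (illustrative only).
sd_intake, sd_urine, sd_faeces, sd_misc = 2.0, 6.0, 3.0, 1.0

var_bal = balance_variance(sd_intake ** 2, sd_urine ** 2, sd_faeces ** 2, sd_misc ** 2)
sd_bal = math.sqrt(var_bal)

print(f"V(Nbal) = {var_bal:.1f}, SD(Nbal) = {sd_bal:.2f} mg N/kg/day")
```

Note that the standard deviations themselves do not add; with these assumed values the largest component (urine) dominates the total.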

The observed variability of balance data can be usefully partitioned into measurement error (mentioned above) and inherent biological variability. We have explored the day-to-day variability of nitrogen balance using long-term studies of individuals fed a constant intake of protein. Most importantly, we found no pattern of variability over time. Other investigators have suggested the existence of long-term cycles in the data (i.e., that urinary nitrogen losses are serially correlated); however, we have been unable to reproduce these findings.
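
One simple way to look for serial correlation in a series of daily urinary nitrogen values is the lag-1 autocorrelation coefficient. The sketch below is only an illustration of that general idea; it is not necessarily the procedure used in the studies referred to above, and the function name is ours.

```python
def lag1_autocorrelation(series):
    """Lag-1 autocorrelation of a sequence of daily measurements."""
    n = len(series)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = sum(series) / n
    denom = sum((x - mean) ** 2 for x in series)
    if denom == 0:
        raise ValueError("series is constant; autocorrelation is undefined")
    # Products of deviations for each pair of consecutive days.
    num = sum((series[t] - mean) * (series[t + 1] - mean) for t in range(n - 1))
    return num / denom

# A steadily rising series is positively correlated from day to day,
# while a strictly alternating series is negatively correlated.
rising = lag1_autocorrelation([1.0, 2.0, 3.0, 4.0, 5.0])
alternating = lag1_autocorrelation([1.0, -1.0, 1.0, -1.0, 1.0, -1.0])
```

As a rough guide, for n uncorrelated observations the coefficient usually falls within about 2/sqrt(n) of zero, so values well outside that band would point toward serial correlation.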

Since many of the data regarding the response to constant levels of nitrogen intake are taken from experiments in which the individual changed from one level or pattern of nitrogen intake to another, the short-term adaptation period is very important. Our investigations, which are consistent with those of other workers in the field, suggest that within five to seven days individuals reach a steady state, or at least a state that cannot be discriminated from a new steady state.

3. Individual Response

Knowledge of how an individual responds to a single level of intake of a specific protein is only moderately useful: it tells us merely whether that level will or will not fulfil an individual's needs for nitrogen. If, however, an individual's balance is measured at several different levels of intake, these data can be used to describe the individual's general response to that particular protein and permit an estimation of the particular level necessary for that individual.

Fundamental to this is the concept of a function that relates nitrogen balance to nitrogen intake. Since balance directly includes intake, it is better to consider the relationship between output and intake. While we do not know the true form of this function, we can approximate it by a straight line for a limited range of intakes; it is this line that we seek to estimate.

$$O_N = a + b\,I_N$$

For experiments that follow the standard UNU protocol we have, for each individual, at each of the n levels of intake (I1, ..., In), a single faecal determination and five urinary determinations:

$$(F_i,\ U_{i1},\ U_{i2},\ U_{i3},\ U_{i4},\ U_{i5}), \qquad i = 1, \dots, n \quad (n = \text{number of levels})$$

If faecal nitrogen does not vary over the ranges examined, faecal averages can be calculated:

$$\bar{F} = \frac{1}{n} \sum_{i=1}^{n} F_i$$

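For the urinary data the replicates at each level of intake can be used to calculate a mean and variance of urinary output at each level:

$$\bar{U}_i = \frac{1}{5} \sum_{j=1}^{5} U_{ij}$$

$$S_i^2 = \frac{1}{4} \sum_{j=1}^{5} \left(U_{ij} - \bar{U}_i\right)^2$$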

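These values can then be used to estimate a urinary response curve by means of weighted linear regression, with each point being weighted by the reciprocal of its variance. Writing the weights as $w_i = 1/S_i^2$, where $\bar{U}_i$ and $S_i^2$ are the mean and variance of urinary nitrogen at intake level $I_i$, the slope and intercept are:

$$b_u = \frac{\sum w_i \sum w_i I_i \bar{U}_i - \sum w_i I_i \sum w_i \bar{U}_i}{\sum w_i \sum w_i I_i^2 - \left(\sum w_i I_i\right)^2}$$

$$a_u = \frac{\sum w_i \bar{U}_i - b_u \sum w_i I_i}{\sum w_i}$$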
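
The weighted fit described above can be sketched in a few lines of Python. This is a minimal illustration using the standard weighted least-squares formulas, with the replicate variance computed on an n - 1 denominator; the function names and the urinary values are hypothetical and are not taken from the workshop data.

```python
import statistics

def level_summary(replicates):
    """Mean and sample variance (n - 1 denominator) of the urinary replicates at one level."""
    return statistics.mean(replicates), statistics.variance(replicates)

def weighted_line(intakes, means, variances):
    """Fit U = a + b * I by weighted least squares, with weights 1 / variance."""
    # A level with zero replicate variance would need special handling (division by zero).
    weights = [1.0 / v for v in variances]
    sw = sum(weights)
    swx = sum(w * x for w, x in zip(weights, intakes))
    swy = sum(w * y for w, y in zip(weights, means))
    swxx = sum(w * x * x for w, x in zip(weights, intakes))
    swxy = sum(w * x * y for w, x, y in zip(weights, intakes, means))
    denom = sw * swxx - swx ** 2
    if denom == 0:
        raise ValueError("need at least two distinct intake levels")
    b = (sw * swxy - swx * swy) / denom
    a = (swy - b * swx) / sw
    return a, b

# Hypothetical data (mg N/kg/day): five urinary replicates at each of four intake levels.
intakes = [40.0, 60.0, 80.0, 100.0]
urine = [
    [40.0, 43.0, 41.0, 44.0, 42.0],
    [55.0, 61.0, 58.0, 57.0, 59.0],
    [70.0, 78.0, 74.0, 72.0, 76.0],
    [85.0, 95.0, 90.0, 88.0, 92.0],
]
summaries = [level_summary(u) for u in urine]
means = [m for m, _ in summaries]
variances = [v for _, v in summaries]
a_u, b_u = weighted_line(intakes, means, variances)
```

In this made-up data set the level means lie exactly on a line, so the fit recovers an intercept near 10 and a slope near 0.8 whatever the weights are. With real data the weights pull the line toward the levels whose replicates agree most closely.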

These estimates of urinary and faecal response are combined with an estimate of miscellaneous losses (usually 5 mg N/kg) to give a function representing nitrogen output for that individual.

$$O_N = (a_u + \bar{F} + 5) + b_u\,I_N$$

To estimate the amount of nitrogen that this individual would require (from that dietary source), we calculate the point at which intake balances output, that is, the intake at which $O_N = I_N$. Note that this is our mean estimate for each specific individual.

$$I_R = \frac{a_u + \bar{F} + 5}{1 - b_u}$$
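
A minimal sketch of this last calculation follows. The function name and the input values are hypothetical; the default of 5 mg N/kg for miscellaneous losses is the figure quoted above.

```python
def individual_requirement(a_u, b_u, faecal_mean, misc_loss=5.0):
    """Intake (mg N/kg) at which estimated output equals intake for one individual."""
    if b_u >= 1.0:
        # With a slope of 1 or more the output line never crosses the line output = intake
        # at a meaningful positive intake, so no requirement can be estimated.
        raise ValueError("urinary slope must be below 1 to estimate a requirement")
    return (a_u + faecal_mean + misc_loss) / (1.0 - b_u)

# Hypothetical individual: urinary intercept 10, urinary slope 0.75, mean faecal N 12 (mg N/kg).
requirement = individual_requirement(a_u=10.0, b_u=0.75, faecal_mean=12.0)
print(f"Estimated requirement: {requirement:.1f} mg N/kg")
```

Because the denominator is 1 - b_u, the estimate becomes very sensitive to small errors in the slope as b_u approaches 1.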

4. Population Requirement

The next step is to put individuals together to estimate a population response to a specific test protein. Here the direct approach is to use the estimates of the individual responses as a sample of the population response. This suggests use of average values and standard deviation, and this is what we do.

There are, however, two statistical constraints to this approach. The first is that use of the arithmetic mean to describe a population assumes a single population with no anomalous individuals, or outliers. If a population consists mainly of normal individuals but contains a few individuals who differ fundamentally, then more robust statistical techniques are necessary. Perhaps the simplest of these is the trimmed mean, where a fixed percentage of the high and low data is routinely discarded, resulting in a somewhat lower estimate of the variability. While this is a standard way of dealing with this sort of problem, we cannot really recommend it, since it requires the routine elimination of data that are expensive and difficult to obtain and usually are valid.
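
For readers unfamiliar with the trimmed mean, the sketch below shows how it behaves when one subject is anomalous. The requirement values are hypothetical, and the example is given only to illustrate the technique, not to recommend it.

```python
def trimmed_mean(values, proportion):
    """Mean after discarding the given proportion of values from each end of the sorted data."""
    if not 0.0 <= proportion < 0.5:
        raise ValueError("proportion must be in [0, 0.5)")
    ordered = sorted(values)
    k = int(len(ordered) * proportion)  # number of values dropped from each tail
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)

# Hypothetical individual requirements (mg N/kg), with one anomalous subject (250).
requirements = [90.0, 95.0, 100.0, 105.0, 110.0, 100.0, 98.0, 102.0, 97.0, 250.0]

plain_mean = sum(requirements) / len(requirements)
mean_10pct_trim = trimmed_mean(requirements, 0.10)
```

Here the 10 per cent trimmed mean drops the single lowest and single highest values, which removes the anomalous subject but also discards a perfectly valid low value.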

The alternative, followed by many investigators including ourselves, is to eliminate data for subjects with apparent clinical problems and use all the rest. This has the potential problem of introducing an investigator's subjective judgement into the analysis. One needs to be very careful on this point, and the general problem of deviant responses needs attention. Those subjects with anomalous responses need to be carefully examined and retested to determine the data's reproducibility; in addition, larger samples need to be examined to determine the actual extent of the outlier problem.

A second statistical constraint in the description of a population response arises in the estimation of upper percentiles. We are interested in determining the level of a test protein, or mixture of proteins, that would meet the requirements of a fixed fraction of the population. While statistical methods do exist for estimating percentiles with confidence, such as calculating tolerance intervals, they have two problems. First, they require a Gaussian or normal distribution, or at least a distribution that is not too far from normal. Second, these techniques often result in unreasonably large values when based on data from small samples. The inherent problem is that we are trying to estimate the tail of a distribution when most of the data describe the central portion.

Practically, in relation to the assessment of protein quality, we use, as the best technique available, the estimated mean requirement plus an appropriate number of standard deviations. As a general caution, however, studies that involve relatively few subjects do not provide sufficient data to establish estimates of the extreme percentiles with confidence. We thus recommend that the mean of the requirement be used to estimate the mean population requirement and that two standard deviations above the mean be considered as an estimate of the intake level that would suffice for 97.5 per cent of the population. Furthermore, as studies accumulate, consideration should be given to replacing the individual standard deviation by a pooled standard deviation.
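
The recommendation above can be sketched as follows. The requirement values and function names are hypothetical. The pooled standard deviation is computed here across separate studies, weighting each study's variance by its degrees of freedom; this is one common definition and is our assumption about what pooling would mean in practice.

```python
import math
import statistics

def population_estimate(requirements, n_sd=2.0):
    """Mean, standard deviation, and mean + n_sd * SD of individual requirements."""
    mean = statistics.mean(requirements)
    sd = statistics.stdev(requirements)  # sample SD (n - 1 denominator)
    return mean, sd, mean + n_sd * sd

def pooled_sd(studies):
    """Pooled SD across studies, weighting each variance by its degrees of freedom."""
    num = sum((len(s) - 1) * statistics.variance(s) for s in studies)
    den = sum(len(s) - 1 for s in studies)
    return math.sqrt(num / den)

# Hypothetical individual requirements (mg N/kg) from two small studies.
study_a = [90.0, 100.0, 110.0]
study_b = [95.0, 100.0, 105.0]

mean_a, sd_a, upper_a = population_estimate(study_a)
sd_pooled = pooled_sd([study_a, study_b])
upper_pooled = mean_a + 2.0 * sd_pooled
```

With so few subjects per study the individual standard deviations differ widely (10 and 5 in this made-up example), which is why a pooled value may give a steadier estimate of the upper level as studies accumulate.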