A fundamental rule regarding diet formulation is that one never knows the true value of anything. Although we have reasonably accurate estimates of the average requirements for most nutrients, we have less certainty regarding nutrient requirements of a specific herd or animal under specific circumstances.

We have equations that accurately estimate the average dry matter intake for groups of cows, but accurately estimating the intake of a specific cow is more difficult.

We have developed several good analytical procedures to measure the concentrations of many nutrients in feeds, and tables are available that contain the average nutrient composition of all feeds commonly fed to dairy cows. However, biological and manufacturing variation, variation caused by sampling and variation in analytical measurements can be substantial so that concentrations of nutrients within a specific feedstuff may be quite different from the average.

Does all this uncertainty mean that we should give up on ration formulation and feed analysis? The answer to that question is obviously no. However, the uncertainty associated with feed analysis and ration formulation must be understood and addressed. With proper sampling techniques, an adequate number of samples and appropriate data handling, one can reduce the uncertainty associated with feed analysis data.

**Elementary statistics**

We need to start thinking about feed composition data in terms of probabilities rather than actual, absolute concentrations. In other words, how confident should you be that the analytical value received from a laboratory actually represents the true concentration of a nutrient in a feed? Because we are working with probabilities, a basic understanding of some statistical principles and terminology is needed.

**Populations and samples**

The ultimate goal of feed analysis is to obtain an analytical value from a sample that reflects the actual value of a ‘population’. Populations can be quite different depending on the application. For example, a population can be a truckload of distillers dried grains, or all the distillers dried grains produced by a specific distillery or perhaps all the distillers dried grains produced in the country. In statistical terms, a population is loosely defined as a large set from which samples can be taken.

If distillers grains from a single distillery were sampled extensively, we would have a good estimate of the average nutrient composition of distillers grains produced at that plant. However, since other distilleries were not sampled, we should be very hesitant to extrapolate the data obtained from a single distillery (i.e., a narrow population) to the larger population of all distilleries.

**Central tendency and dispersion**

A population can be represented by a set of observations or samples. Because of inherent variation among the particles making a feed and because of variation caused by sampling and analytical procedures, we know that all the sample values will not be the same. Rather than one single value, one can obtain a distribution of values.

The two most important pieces of information we need to obtain from a set of samples are a measure of central tendency and a measure of dispersion. For observations that follow a normal statistical distribution, the mean (in this discussion average and mean will be used interchangeably) is the best measure of central tendency. The mean of a normal distribution is not the absolute ‘right’ answer, but rather it is the value that has the lowest probability of being substantially wrong (i.e., it is the most likely value or the expected value).

The concentrations of most nutrients in plant-based feedstuffs fit approximately a normal distribution; therefore the mean is the best measure of central tendency for those nutrients. With a normal distribution, approximately one-half of the samples have values lower than the mean and one-half have concentrations higher than the mean. The concentrations of trace minerals and a few other chemicals such as ether extracts or fats in plant-based feeds often have a skewed distribution (a few observations will have very high concentrations). With this type of distribution, the mean still represents the “expectation,” but it overestimates the central tendency. The median (the value at which half the observations are higher and half are lower) is the best measure of central tendency for this type of distribution.
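The difference between the mean and the median in a skewed distribution can be illustrated with a short Python sketch. The concentrations below are invented for illustration (a hypothetical trace mineral, in mg/kg) and do not come from any feed database.

```python
from statistics import mean, median

# Hypothetical trace-mineral concentrations (mg/kg), invented for illustration.
# One very high observation gives the set a skewed distribution.
samples = [9, 10, 10, 11, 11, 12, 12, 13, 14, 48]

avg = mean(samples)    # pulled upward by the single high value
mid = median(samples)  # half the observations are lower, half are higher

print(f"mean = {avg:.1f}, median = {mid:.1f}")
```

In this made-up set, nine of the 10 observations fall below the mean, which is why the median is the better description of a typical sample.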

For a normal distribution the most common measure of dispersion is the standard deviation (SD). In a normal distribution, approximately 38 percent of all observations are within plus or minus 0.5 SD units of the mean, 68 percent of all observations are within plus or minus 1 SD of the mean, and approximately 95 percent of the observations are within plus or minus 2 SD of the mean.

For example, if the mean concentration of crude protein in a population of brewers dried grains is 25 percent and the SD is 2, we would expect that about 68 percent of the samples from that population would contain between 23 and 27 percent CP and 95 percent of the samples would contain between 21 and 29 percent CP. This also means that about 5 percent of the samples would contain less than 21 or more than 29 percent CP. The smaller the SD, relative to the mean, the less likely it is that using the mean value will cause a substantial error in diet formulation.
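These percentages can be checked with Python's standard library. The sketch below assumes a perfectly normal distribution, which real feed data only approximate; the function name `fraction_within` is simply a label chosen for this illustration.

```python
from statistics import NormalDist

def fraction_within(k):
    """Fraction of a normal population within plus or minus k SD of the mean."""
    z = NormalDist()  # standard normal: mean 0, SD 1
    return z.cdf(k) - z.cdf(-k)

for k in (0.5, 1, 2):
    print(f"within +/- {k} SD: {fraction_within(k):.1%}")

# Brewers dried grains example from the text: mean 25 percent CP, SD 2.
cp = NormalDist(mu=25, sigma=2)
outside = cp.cdf(21) + (1 - cp.cdf(29))
print(f"below 21 or above 29 percent CP: {outside:.1%}")
```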

**Sources of variation**

Understanding potential sources of variation in feed composition data helps determine which data to use and how to use it. The nutrient composition of grains and byproducts can be influenced by plant genetics (hybrid, variety, etc.) and growing conditions (drought, climate, soil fertility, etc.). In addition, the composition of byproducts is affected by manufacturing techniques.

The above sources of variation are considered fixed (i.e., they can be described and replicated). In statistical quality control jargon, they are labelled as assignable causes. Hybrid X may have been genetically selected to produce corn grain with higher-than-average concentrations of protein. Distillery Y might dry its distillers grains at very high temperatures, causing high concentrations of acid detergent insoluble protein. A drought may reduce kernel size, thereby increasing the concentration of fiber in corn grain.

Another possible fixed source of variation is the analytical lab. Although great progress has been made in standardizing methods, labs may use different analytical techniques to measure nutrients. If Lab A measures NDF using sulfite but another lab does not, the NDF concentrations will differ between the labs because of procedure.

Other sources of variation are considered random. We do not know why the values differ; they just do. If you sample a load of brewers grains 10 times and send those 10 samples to a lab, you will probably get back 10 slightly different concentrations of protein. The variation could be caused by variation within the load of brewers grains or it could be caused by random errors at the lab. The causes of the variation are unknown. They are referred to in quality control jargon as unassignable causes.

Ideally, random variation would be considered within-population variation and fixed variation would be considered variation between populations. For example, because of manufacturing differences, distillers grains from Distillery X have consistently higher NDF concentrations than distillers grains from Distillery Y. If distillers grains from X and Y were considered separate populations, the SD within each population would be lower than the SD when the results from both distilleries are combined.
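A small sketch shows the effect of combining two populations that differ for an assignable cause. The NDF values below are invented for illustration and do not describe any real distillery.

```python
from statistics import stdev

# Hypothetical NDF values (percent of DM), invented for illustration.
distillery_x = [36.0, 37.5, 35.5, 38.0, 37.0]
distillery_y = [30.5, 31.0, 29.5, 32.0, 31.0]

sd_x = stdev(distillery_x)                        # within-population SD for X
sd_y = stdev(distillery_y)                        # within-population SD for Y
sd_combined = stdev(distillery_x + distillery_y)  # both treated as one population

print(f"SD within X: {sd_x:.2f}")
print(f"SD within Y: {sd_y:.2f}")
print(f"SD combined: {sd_combined:.2f}")
```

In this made-up case, the combined SD is roughly three times the SD within either distillery because the consistent difference between the two plants is counted as if it were random variation.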

Because of blending grains and multiple sources of feedstock for manufacturing facilities, many fixed sources of variation become blurred (you will not know the variety of soybeans used to make the soybean meal you purchased or whether the gluten feed you purchased was made from drought-stressed corn grain). In these situations, the fixed sources of variation (assignable causes) become random sources (unassignable causes), resulting in an increase in the within-population variation. Nonetheless, accounting for as many fixed sources of variation as possible by defining separate populations will reduce the dispersion of the data and reduce the potential of being substantially wrong when using the mean.

**Expected variation in nutritional composition of feeds**

The largest publicly available database of feed composition in the U.S. can be found in the NRC dairy publication. That database contains means, SD and the number of samples for measured nutrients in most common feedstuffs used in North America. The data used to calculate those means and SD came from a wide array of sources. Samples came from across the U.S. and over several years.

For some feeds and nutrients, the number of samples used to calculate the mean and SD is quite limited and those values should be used with caution. For other feeds, the sample size is quite large and the mean and SD are probably good estimates for the broad population from which the samples were drawn. However, it is important to remember that the broad population represented in the NRC tables may not be a good estimate for a specific source of a feed.

Based on expected variation, feeds can be classified as having low, moderate or high variability. Feeds with generally low variability include corn grain, sorghum grain and perhaps barley. Feeds with the largest variability in composition are byproducts that are usually not a direct co-product of manufacturing.

Feeds with moderate variability include most feeds that would be considered co-products rather than byproducts. Because production of these products is generally well controlled, the composition of the resulting co-product can be relatively constant within a production facility. Forages have moderate variability, but variation decreases when a more exact definition of the forages is used (alfalfa silage versus mid-maturity alfalfa silage).

Net energy for lactation (NEL) and metabolizable protein (MP) are arguably the most important nutrients used in dairy diet formulation, but they present unique problems in terms of variation. Those nutrients are not measured by laboratories but are calculated from numerous variables, some of which are measured while others are estimated.

To increase the accuracy of ration formulation, feeds with moderate and high variability in composition must be sampled and analyzed routinely and the data generated must be used correctly. An accurate estimate of SD for a specific feedstuff can be extremely useful in ration formulation.

**Handling variation in feed composition**

Variation in feed composition is handled differently depending on whether a given feed is best conceptualized as the outcome of a batch process or a continuous process.

**Batch-process feedstuffs**

Feeds in this category are handled in lots such as trucks and train cars. Their manufacturing may be a continuous process, but their use is generally best described as a batch process. Most feed commodities used by commercial feed manufacturers fall into this category. They are characterized by small variation within lots and small to large variation between lots.

Feeds with low expected variability between lots do not have to be analyzed routinely, and in some cases, not at all. Sampling and analytical errors become relatively small when large numbers of samples are analyzed. For these feeds, a mean derived from a large number of samples may actually be better than a single observation or a mean from a small set of samples. For these feeds, book values can be used unless one has good reason to believe that a particular feed is different (for example, if you grow or buy high-oil corn, the mean values for regular corn would not be appropriate).

For feeds with moderate or high variability in nutrient composition, routine feed sampling and analysis are essential. Although most people realize this, it is often not done because by the time they get the report back from the lab, the load has been fed. If this is your opinion, you are not using the analytical data correctly. As stated above, we need to think in terms of probabilities, not absolute numbers. You should be sampling and analyzing load samples to obtain estimates of mean composition and SD; the values obtained from a single load sample are not that important. The frequency of sampling depends on the expected variation and how much error one is willing to accept.

Populations with large variations require more sampling to obtain accurate estimates. I cannot give you a specific number of samples needed because it varies depending on the nutrient of interest (e.g., the number of samples needed to obtain accurate estimates of the mean and SD for CP is usually less than that needed for NDF) and the population. As a general guideline, 10 or more samples from a given population is reasonable. For highly variable feeds, more samples are desirable.
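One way to see why more samples help is the standard error of the mean, which is the SD divided by the square root of the number of samples. The sketch below assumes the samples are independent and drawn from the same population; the SD of 2 percentage units is a hypothetical value.

```python
from math import sqrt

def standard_error(sd, n):
    """Standard error of a mean based on n independent samples."""
    return sd / sqrt(n)

sd = 2.0  # hypothetical SD, in percentage units of the nutrient
for n in (1, 4, 10, 25):
    print(f"n = {n:2d}: standard error of the mean = {standard_error(sd, n):.2f}")
```

Because the standard error shrinks with the square root of the sample count, each additional sample adds less certainty than the one before it, and a feed with twice the SD needs about four times as many samples for the same precision.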

The approach followed by many nutritionists is to sample a load of feed, have it analyzed and then formulate a diet based on that information. When a new analysis is obtained, the previous data are eliminated and a new diet is formulated based on the new composition. The inherent assumption underlying this practice is that the new data represent the feed better than the old data did. This may or may not be true.

When new analytical data are obtained, the user should ask one simple question: Is there an identifiable reason why the composition changed? If you cannot think of a good reason for the composition change, the change may simply be a random event. The difference could be caused by load-to-load random variation, by within-load (i.e., sampling) variation or both. In this case, the new number may be no better than the old number, but the mean of the two numbers has the lowest probability of being substantially wrong. The mean, rather than the new or old number, should be used for ration formulation.

If you can come up with a logical reason why composition changed (i.e., a new population), then the new number should replace the old number, and you start the process of collating data again.
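The decision rule described above can be sketched in a few lines of Python. The function name `update_formulation_value` and the CP values are hypothetical, and the judgment about whether an assignable cause exists still has to come from the user.

```python
from statistics import mean

def update_formulation_value(history, new_value, assignable_cause):
    """Return (value to use in formulation, updated list of analyses)."""
    if assignable_cause:
        # A logical reason for the change means a new population: start over.
        updated = [new_value]
    else:
        # No identifiable reason: treat the change as random and keep averaging.
        updated = history + [new_value]
    return mean(updated), updated

# Hypothetical CP analyses (percent of DM) for successive loads.
random_value, random_history = update_formulation_value([24.0], 22.0, assignable_cause=False)
new_pop_value, new_pop_history = update_formulation_value([24.0], 22.0, assignable_cause=True)

print(random_value, random_history)
print(new_pop_value, new_pop_history)
```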

**Continuous process feedstuffs**

Silages are excellent examples of feeds of this type. Silos are filled and, more importantly, unloaded in a somewhat continuous fashion. The composition of the silage remains relatively constant until the occurrence of an assignable cause: the hybrid or the variety changed, or the field from which the silage originated changed, etc. In this case, sampling for analysis is not done as much to determine means and SD but to identify the occurrence of a shift in composition.

The optimal sampling design, i.e., the number of samples to be taken, the frequency of sampling and the departure from the mean (expressed in SD units) that triggers reformulation, must be determined. In the U.S., it has been customary to take one sample once a month and to automatically reformulate diets with the new data if the composition has changed by more than 2 SD.
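The customary 2 SD rule amounts to a simple check, sketched below. The baseline mean and SD for corn silage NDF are hypothetical numbers, and the 2 SD threshold is the customary value mentioned above rather than an optimized one.

```python
def composition_shifted(new_value, baseline_mean, baseline_sd, threshold_sd=2.0):
    """True when a new analysis departs from the baseline mean by more than threshold_sd SD."""
    return abs(new_value - baseline_mean) > threshold_sd * baseline_sd

# Hypothetical corn silage NDF baseline (percent of DM), invented for illustration.
baseline_mean, baseline_sd = 42.0, 1.5

for monthly_value in (43.0, 44.5, 46.0):
    shifted = composition_shifted(monthly_value, baseline_mean, baseline_sd)
    action = "reformulate" if shifted else "keep current diet"
    print(f"NDF {monthly_value:.1f}: {action}")
```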

**Accounting for feed variation during diet formulation**

As previously mentioned, the SD is an important statistic. It is an indicator of how wrong you could be. If a particular load of corn gluten feed had 18 percent crude protein (CP), you formulated with the mean concentration (23.8 percent, SD = 5.7) and corn gluten feed made up 10 percent of the diet DM, the actual CP concentration of the diet would be about 0.6 percentage units lower than the formulated value. An error of this magnitude or larger would be expected in about 16 out of every 100 loads. If you are willing to accept this risk, then using the mean is the best option.

However, if based on your experience, you conclude that milk production will drop 1 kilogram per cow per day (or some other number) if the diet contains 0.6 percent less crude protein than formulated and you are unwilling to accept that risk (even though this will happen only 16 percent of the time), you need to adjust for variation.

You can reduce your risk of substantially underfeeding CP by ‘adjusting’ the mean value based on its SD. Based on a normal distribution, if you use the mean minus 0.5 SD, rather than the mean, you reduce the risk of making the error discussed above from 16 percent of the time to 7 percent of the time. If you use the mean minus 1 SD, you reduce the risk of making the above error to just 2 percent of the time.

In the example above, mean CP for corn gluten feed was 23.8 percent (SD = 5.7). If I were willing to risk being substantially wrong 7 out of every 100 loads of corn gluten feed, I would use 23.8 - (0.5 x 5.7), or about 21 percent CP, for corn gluten feed when I balanced the diet. If I only wanted to be substantially wrong 2 percent of the time, I would use 23.8 - 5.7 = 18.1 percent CP. By using a lower CP concentration for corn gluten feed, I have greatly decreased the probability of being substantially deficient in CP; however, I will be oversupplementing CP most of the time. You will need to determine how much risk you are willing to accept and balance that against increased feed costs.
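The risk figures above follow from the normal distribution, as the sketch below shows. It assumes corn gluten feed CP is normally distributed with the mean and SD given in the text and treats a load that is at least 1 SD (5.7 percentage units) below the formulation value as a substantial error; the function name `shortfall_risk` is chosen only for this illustration.

```python
from statistics import NormalDist

MEAN_CP, SD_CP = 23.8, 5.7  # corn gluten feed values from the text
SHORTFALL = 5.7             # ingredient CP shortfall treated as a substantial error

load_cp = NormalDist(MEAN_CP, SD_CP)

def shortfall_risk(k):
    """Return (formulation value, probability a load falls SHORTFALL or more below it)."""
    formulation_value = MEAN_CP - k * SD_CP
    return formulation_value, load_cp.cdf(formulation_value - SHORTFALL)

for k in (0, 0.5, 1):
    value, risk = shortfall_risk(k)
    print(f"mean - {k} SD = {value:.2f} percent CP, risk = {risk:.1%}")
```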

The problem with this approach is that it only considers variation in a single ingredient, but the nutrient composition of all ingredients in a diet will vary. What really matters is not the variation in a single ingredient but rather the variation and mean for a diet. Software programs can calculate variation in nutrient composition of diets if the user has information on variation in the individual ingredients. In addition, the programs will calculate the implications of variation in nutrient composition on milk production. Currently, the software simulates the nutrient variation of a given diet, but it cannot optimize the diet.

**Reducing the impact of variation**

The composition of all feeds varies. However, the probability that all feeds in a diet will have a lower-than-expected concentration of a given nutrient on a given day is low. Some feeds will have higher-than-expected concentrations; others will have lower-than-expected concentrations. Therefore, the variation in nutrient composition of feedstuffs is usually greater than variation in nutrient composition of the TMR (assuming good, standard feeding practices are in place).
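This averaging effect can be sketched with the rule for the variance of a weighted sum. The example assumes that variation in each ingredient is independent of the others, which is a simplification, and the diet below (inclusion rates and CP SDs) is hypothetical.

```python
from math import sqrt

def diet_sd(ingredients):
    """SD of a nutrient in the diet, assuming independent ingredient variation.

    ingredients: list of (fraction of diet DM, SD of the nutrient in that feed).
    Each feed contributes (fraction * SD) squared to the diet variance.
    """
    return sqrt(sum((fraction * sd) ** 2 for fraction, sd in ingredients))

# Hypothetical diet: (fraction of diet DM, SD of CP in percentage units).
diet = [(0.40, 1.5), (0.30, 1.0), (0.20, 2.0), (0.10, 5.7)]

print(f"SD of diet CP: {diet_sd(diet):.2f} percentage units")
print(f"smallest ingredient SD: {min(sd for _, sd in diet):.2f} percentage units")
```

In this made-up diet, the SD of dietary CP is smaller than the SD of any single ingredient, even though one ingredient is highly variable.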

The impact of variation in the composition of feedstuffs is reduced as more feeds are included in diets. Relying on a particular feedstuff that is highly variable in CP concentration to provide a large proportion of dietary CP increases the risk of being wrong. We know that, on a theoretical basis, the contribution of a feedstuff to the variance of the total diet grows with the square of its inclusion rate. If that particular feedstuff made up only 10 percent of the diet DM, a 5 percentage unit change in its CP concentration would cause dietary CP concentration to change by only 0.5 percentage units. Using a wide variety of ingredients in a TMR and not relying too heavily on a single ingredient is probably the best way to reduce the costs associated with variation. **PD**

*—Excerpts from 2007 California Animal Nutrition Conference Proceedings*