I learnt this from Minhaj.

It is common in surveys to over- or undersample some groups and to use weights later to compensate for those decisions. For example, in Pakistan the largest province by population is undersampled, while the smallest one, Balochistan, is oversampled. In such cases it is critical that the cases are "weighted" for data analysis.

In the data files I have been using, the weights are stored in a variable that can be used for this technique. In the following example, the weight variable is named "weight."

The obvious first step is to weight the cases by this variable:

WEIGHT BY weight.

With that weight alone, the proportions in your data analysis come out right but the Ns do not, because the weighted N is the sum of the weights rather than the number of cases. So you need not just to apply the weight, but to rescale the weight variable so that, on average, each case counts as one (it seems baffling until you do it). To do that, create a new weight variable; let's call it "whi." Divide "weight" by the SUM of the weight variable over all cases and multiply by the total number of cases. In this case the SUM=13557999999 and total cases=13558.

comp whi = (weight/13557999999) * 13558.

format whi (f16.11).

var lab whi "Reduced weighted N to 13558".
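The two constants above were typed in by hand. As a rough sketch of an alternative, assuming an SPSS version whose AGGREGATE command supports MODE=ADDVARIABLES, the sum of the weights and the case count can be computed in the syntax itself. The names "const", "wsum" and "ncases" are made up for this illustration and are not part of the original data file.

```spss
* Make sure no weight is active, so SUM is the plain sum of the weights.
WEIGHT OFF.
* Helper constant (hypothetical name) so that all cases form one group.
COMPUTE const = 1.
* Add the sum of the weights and the unweighted case count to every case.
AGGREGATE
  /OUTFILE=* MODE=ADDVARIABLES
  /BREAK=const
  /wsum = SUM(weight)
  /ncases = NU.
* Same formula as before, without hard-coded numbers.
COMPUTE whi = (weight / wsum) * ncases.
EXECUTE.
```

The advantage of this variation is that the syntax keeps working if cases are added or dropped, since the totals are recalculated each time it runs.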

Now weight your file again with the new variable. Both the percentages and the Ns will come out right.

WEIGHT BY whi.
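One way to check the new variable, offered here only as a sketch: with weighting switched off, the mean of "whi" should be close to 1 and its sum close to the number of cases (13558 in this example). The weight is switched back on afterwards.

```spss
* Look at the raw values of whi, so turn weighting off first.
WEIGHT OFF.
* Mean should be about 1 and sum about the number of cases.
DESCRIPTIVES VARIABLES=whi
  /STATISTICS=MEAN SUM.
* Re-apply the rescaled weight for the actual analysis.
WEIGHT BY whi.
```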

To turn weighting off, use the following:

WEIGHT OFF.