Saturday 30 July 2016

Creating new codes from one meta code

This is another life-saving piece of syntax that I learnt from Minhaj. Some datasets come with a single long code that packs several pieces of information together. For example, it is common to have a PSU code that encodes the respondent's location, sex, etc.

Here is some syntax that can be adapted.

SET OVars Both ONumbers Both TVars Both TNumbers Both.
SET OVars Labels ONumbers Labels TVars Labels TNumbers Labels.


comp province = trunc (psu/100000).
comp temp = trunc (psu/10000).
*comp locality = temp - (province*10).
var lab province 'Province'.
val lab province 1'Punjab' 2'Sindh' 3'NWFP' 4'Balochistan'.

format province (f1.0).
fre province.

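The point to note in this syntax is that the variable "psu" has 6 digits and the code for "province" is stored in position 1. Hence, psu is divided by 100000 and truncated, which drops the last 5 digits and leaves only the first. Dividing by 10000 instead keeps the first two digits (stored in "temp").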
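
The digit arithmetic can be checked outside SPSS with a short Python sketch. The function name split_psu and the sample code 234567 are made up for illustration, and treating the second digit as "locality" is only an assumption taken from the commented-out line in the syntax above.

```python
def split_psu(psu):
    """Return (province, locality) from a 6-digit PSU code.

    Assumes a positive 6-digit integer whose first digit is the province.
    Reading the second digit as "locality" is an assumption for illustration.
    """
    province = psu // 100000         # drop the last 5 digits, keep digit 1
    temp = psu // 10000              # drop the last 4 digits, keep digits 1-2
    locality = temp - province * 10  # isolate digit 2
    return province, locality


print(split_psu(234567))  # (2, 3)
```

For positive codes, integer division in Python gives the same result as trunc() applied to the quotient in the SPSS syntax.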

Wednesday 3 February 2016

More sources of social science data

Here is a resource worth exploring to see what social science data is available.

Github Social Science data.


Sunday 31 January 2016

"LIST" command to display the data

Yesterday I was at a loss as I could not remember the "LIST" command to show or display the data "as is" rather than in aggregated form. I searched the internet and racked my brain for hours, but all in vain. Thanks to Muhammad Ali, who sometimes helps with data entry, I was relieved of this torture this morning.

It is a really simple command, which is probably why it was so hard to find through a search. The command to show the data "as is" in the data file is:

LIST [variable name(s)].

This command is used a lot for data cleaning and for looking at string variables, where the responses are free text (phrases, sentences, words) that you do not expect to match each other. It is also useful for listing all the responses of one respondent side by side.




Saturday 20 December 2014

Weighting in SPSS

I learnt this from Minhaj.

It is common in surveys to over- or under-sample some groups and to use weights later to compensate for such decisions. For example, in Pakistan the largest province by population is undersampled while the smallest one, Balochistan, is oversampled. In such cases it is critical that the cases are "weighted" for data analysis.

In the data files I have been using, the weights are included as a variable, which can be used for this technique. In the following example, the weight variable is named "weight."

If you just weight the cases by this variable:
WEIGHT BY  weight.

The proportions in your analysis come out right but not the Ns, which means you need to not just apply the weight but also rescale the weight variable so that the weighted N equals the actual number of cases (it seems baffling until you do it). To do that, create a new weight variable, let's call it "whi." Divide "weight" by the SUM of the weight variable over all cases and multiply by the total number of cases. In this case the SUM=13557999999 and total cases=13558.

comp whi = (weight/13557999999) * 13558.
format whi (f16.11).
var lab whi "Reduced weighted N to 13558".


Now weight your file again with the new variable. Both the percentages and the Ns will come out right.

weight  BY  whi.

To turn weighting off, use the following:

WEIGHT OFF.
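
The rescaling step can be illustrated outside SPSS with a small Python sketch. The function name rescale_weights and the four weights below are made up for illustration. The point is only that dividing each weight by the sum of the weights and multiplying by the number of cases makes the new weights add up to the number of cases, while keeping their ratios to each other (and hence the proportions) unchanged.

```python
def rescale_weights(weights):
    """Rescale weights so that they sum to the number of cases."""
    total = sum(weights)  # plays the role of SUM in the SPSS example
    n = len(weights)      # plays the role of the total number of cases
    return [w / total * n for w in weights]


weights = [1500.0, 1500.0, 500.0, 500.0]  # made-up population weights
whi = rescale_weights(weights)

print(whi)       # [1.5, 1.5, 0.5, 0.5]
print(sum(whi))  # 4.0, the number of cases
```

The first case still counts three times as much as the last one, so percentages are unaffected, but the weighted N is now 4 instead of 4000.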

Wednesday 1 May 2013

Creating dummy variables (from UCLA site)

* make dummies, method 1 .
COMPUTE race1=(race=1).
COMPUTE race2=(race=2).
COMPUTE race3=(race=3).
crosstabs /tables = race by race1 
          /tables = race by race2 
          /tables = race by race3.

* make dummies, method 2 .
DO REPEAT A=race1 race2 race3 /B=1 2 3.
COMPUTE A=(race=B).
END REPEAT.
crosstabs /tables = race by race1 
          /tables = race by race2 
          /tables = race by race3.
http://www.ats.ucla.edu/stat/spss/code/dummy.htm
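
For comparison, the same idea can be sketched in plain Python. This is not from the UCLA page. The function name make_dummies and the sample values are made up, and the sketch ignores missing values.

```python
def make_dummies(values, categories):
    """Return one 0/1 list per category: 1 where the value equals it."""
    return {c: [int(v == c) for v in values] for c in categories}


race = [1, 2, 3, 2, 1]  # made-up data
dummies = make_dummies(race, [1, 2, 3])

print(dummies[1])  # [1, 0, 0, 0, 1]
print(dummies[2])  # [0, 1, 0, 1, 0]
print(dummies[3])  # [0, 0, 1, 0, 0]
```

As in both SPSS methods, each dummy is simply the result of a logical comparison stored as 0 or 1.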
 

Thursday 11 April 2013

Thursday 28 March 2013

to explore more: Create a pivot table using Python

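# Create a pivot table.
# Excerpt only: groupIndex, sumIndex, Counts and Totals are defined
# elsewhere in the full example.
import spss  # available only inside SPSS Statistics with the Python plug-in

table = spss.BasePivotTable("Group Means",
                            "OMS table subtype")
# Row dimension: the grouping variable; column dimension: the summarized variable.
table.Append(spss.Dimension.Place.row,
             spss.GetVariableLabel(groupIndex))
table.Append(spss.Dimension.Place.column,
             spss.GetVariableLabel(sumIndex))

# One row per group, with a single "Mean" column.
category2 = spss.CellText.String("Mean")
for cat in sorted(Counts):
    category1 = spss.CellText.Number(cat)
    table[(category1,category2)] = \
           spss.CellText.Number(Totals[cat]/Counts[cat])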
 
Source: SPSS online help.
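
The excerpt relies on two dictionaries, Counts and Totals, that are built elsewhere in the full example and are not shown here. The following plain-Python sketch shows one way such per-group counts and totals could be accumulated and turned into means. The function name group_means and the sample data are hypothetical, and it does not use the spss module at all.

```python
from collections import defaultdict


def group_means(groups, values):
    """Return {group: mean of values in that group}, keyed in sorted order."""
    counts = defaultdict(int)    # number of cases per group
    totals = defaultdict(float)  # sum of values per group
    for g, v in zip(groups, values):
        counts[g] += 1
        totals[g] += v
    return {g: totals[g] / counts[g] for g in sorted(counts)}


print(group_means([1, 2, 1, 2], [10.0, 20.0, 30.0, 40.0]))  # {1: 20.0, 2: 30.0}
```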