Wednesday, 25 July 2012

To ADD (or SUM) in SPSS

Well, in SPSS you can add a series of variables in two different ways. First, you can simply add two variables, e.g., sons and daughters, to get the total number of children. Second, you may want to create an index based on a series of scores where some respondents missed one or more of the variables in the series (i.e., there is a MISSING value in 1 or more variables for them).
compute t_child=sons+daughters.
compute t_child=sum (sons, daughters).
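
To see what separates these two commands, here is a rough sketch in Python (not SPSS syntax) of how each one treats a missing value. None stands in for a missing value, and plus and spss_sum are made-up helper names used only for illustration.

```python
# Sketch of how the "+" operator and SUM() treat missing values in SPSS.
# None stands in for a missing value; both helper names are hypothetical.

def plus(*values):
    """Mimic a + b + ...: the result is missing if any operand is missing."""
    if any(v is None for v in values):
        return None
    return sum(values)

def spss_sum(*values):
    """Mimic SUM(a, b, ...): add the valid values and skip the missing ones."""
    valid = [v for v in values if v is not None]
    # With no valid value at all, SUM gives a missing result rather than 0.
    return sum(valid) if valid else None

# (sons, daughters) for three made-up respondents
rows = [(2, 1), (3, None), (None, None)]
for sons, daughters in rows:
    print(sons, daughters, plus(sons, daughters), spss_sum(sons, daughters))
```

Note the last row: when every value is missing, SUM gives back a missing value, not zero.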

"The difference between the two procedures above is that in the first procedure, the case on total would be missing if any one of the four variables had missing values on a case; in the second procedure, the total would be computed while ignoring missing values on the four variables." No cases will be dropped due to a missing value in any of the variables. "Essentially SPSS treats the missing value as ZERO." 

In the SUM argument the variables must be separated by commas, but if there are many variables you can use the TO keyword to provide a range. TO follows the order of the variables in the data file, so everything stored between the first and last named variable is included. For example, if you want to construct a happiness index based on 12 indicators/variables hap1 to hap12, you can use the following syntax:

compute happiness=sum (hap1 to hap12).

Source: Indiana University IT Services and others.

Another point to note is that "the SUM() function is evidently flexible enough to respect more complex statements like SUM(Var1+Var2, Var3-Var4, Var5*Var6)." Hence, do not use the addition symbol when you use SUM unless it is part of one of the arguments, because an argument such as Var1+Var2 is itself missing whenever either variable is missing. Source: SPSSX Discussion group

While talking about the flexibility and greatness of SUM, there is another neat feature that you can take note of. If you want to control how many MISSING values are tolerated, you can append a number to the function name to TELL the computer to compute the total for a CASE/RESPONDENT only if at least that many variables are answered. The general form is SUM.n(variable list); SUM.2, for example, requires at least two valid values.
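
As a rough illustration of the minimum-valid-values idea (SPSS writes it as a suffix on the function name, e.g. SUM.2), here is a small Python sketch. It is not SPSS syntax; None stands in for a missing value and spss_sum_n is a made-up helper name.

```python
# Sketch of SUM.n: compute the total only if at least n values are valid.
# None stands in for a missing value; the helper name is hypothetical.

def spss_sum_n(min_valid, *values):
    """Mimic SUM.n(a, b, ...): missing unless at least min_valid values are valid."""
    valid = [v for v in values if v is not None]
    if len(valid) < min_valid:
        return None
    return sum(valid)

# Three made-up respondents answering three items
print(spss_sum_n(2, 4, 5, 3))        # all three answered
print(spss_sum_n(2, 4, None, 5))     # two answered, still enough
print(spss_sum_n(2, 4, None, None))  # only one answered, result is missing
```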


"The .2 appended to the end of the SUM function in the above example can be any integer. Use it to indicate the minimum number of valid cases necessary to perform a given calculation." Source: Indiana University IT Services

Also keep in mind listwise and pairwise deletion, the two ways SPSS can handle missing values when it runs an analysis. According to a discussion group they are defined as:

Listwise - if the respondent has a missing value on any variable, the respondent is omitted from all your data analysis.

Pairwise - not as harsh as listwise, in that the respondent is dropped only from analyses involving variables on which that respondent has missing values.
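
To make the difference concrete, here is a small Python sketch (not SPSS syntax) that counts how many respondents survive under each rule. The data and the helper names listwise and pairwise are invented for illustration; None marks a missing answer.

```python
# Sketch: how many cases remain under listwise vs. pairwise deletion.
# The data and helper names are hypothetical; None marks a missing answer.

cases = [
    {"hap1": 4, "hap2": 5, "hap3": 3},
    {"hap1": 2, "hap2": None, "hap3": 4},
    {"hap1": 5, "hap2": 1, "hap3": None},
    {"hap1": 3, "hap2": 2, "hap3": 2},
]

def listwise(cases, variables):
    """Keep only cases that are complete on every variable in the list."""
    return [c for c in cases if all(c[v] is not None for v in variables)]

def pairwise(cases, x, y):
    """Keep cases that are complete on the two variables being analysed."""
    return [c for c in cases if c[x] is not None and c[y] is not None]

print("listwise n:", len(listwise(cases, ["hap1", "hap2", "hap3"])))
for x, y in [("hap1", "hap2"), ("hap1", "hap3"), ("hap2", "hap3")]:
    print(x, y, "pairwise n:", len(pairwise(cases, x, y)))
```

Under listwise deletion the two incomplete respondents disappear from everything; under pairwise deletion each of them still contributes to the pair of variables they did answer.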

Also check the IBM site and Psychwiki for more on list and pairwise deletion.

Tuesday, 24 July 2012

Factor Analysis in short (not my writing)

What is Factor Analysis?*
"Factor analysis is a form of exploratory multivariate analysis that is used to either reduce the number of variables in a model or to detect relationships among variables. All variables involved in the factor analysis need to be interval and are assumed to be normally distributed."

SPSS syntax:

factor
/variables read write math science socst
/criteria factors(2)
/extraction pc
/rotation varimax
/plot eigen.

Here is the syntax in SPSS from ANU course notes:

/VARIABLES q34_1 to q34_12

Create SCALE using FA

/VARIABLES q34_1 to q34_12

*Introduction to SAS. UCLA: Academic Technology Services, Statistical Consulting Group. from (accessed November 24, 2007).

Measuring unmet need for family planning.

“Millions of women would prefer to avoid becoming pregnant either right away or ever, but are not using contraception. These women have an unmet need for family planning. Programs can serve many of these women by developing strategies that respond directly to their concern.” Ref: Population Reports, Sept 1996.

Unmet need is defined on the basis of women’s responses to survey questions. The following are some of the definitions that have been used since the 1970s.

The KAP-Gap
Definition one: Women who wanted to have no more children but were not using contraception. (Ignored spacers, exposure to risk of pregnancy)

The World Fertility Survey (WFS 1972-1984)
Definition two: Same as above but excluded pregnant and amenorrheic women, because they did not currently need contraception. (Ignored spacers)
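
The first two definitions can be written down as simple rules. The Python sketch below is only an illustration of the wording above: the Woman record and its field names are invented here and do not come from any survey's actual variable list.

```python
# Sketch of the KAP-gap and WFS definitions of unmet need as stated above.
# The record layout and field names are hypothetical.
from dataclasses import dataclass

@dataclass
class Woman:
    wants_no_more: bool          # wants no more children
    using_contraception: bool    # currently using any method
    pregnant: bool = False
    amenorrheic: bool = False

def unmet_need_kap_gap(w):
    """Definition one: wants no more children but is not using contraception."""
    return w.wants_no_more and not w.using_contraception

def unmet_need_wfs(w):
    """Definition two: as above, excluding pregnant and amenorrheic women."""
    return unmet_need_kap_gap(w) and not (w.pregnant or w.amenorrheic)

w = Woman(wants_no_more=True, using_contraception=False, pregnant=True)
print(unmet_need_kap_gap(w), unmet_need_wfs(w))
```

A pregnant woman who wants no more children and uses no method counts under definition one but not under definition two. Neither rule covers spacers, as noted above.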

In 1981, John Anderson and Leo Morris measured the percentage of women of reproductive age who are “exposed to the risk of unintended pregnancy and are not using contraceptive”. (Included spacers). The following year, Nortman and Gary developed a model that included pregnant, breastfeeding, or amenorrheic women in the definition of unmet need.

After ICPD 1994, Sinding and Fathalla suggested measuring unmet need more broadly, including unmet need among people who are using contraception but may be dissatisfied with their method. Using both qualitative and quantitative data, they suggest that experience with side effects, discontinuation and other problems of contraception could help extend the focus of unmet need from use of any method to the quality of care.

Arguments continue over who is at risk and whether inappropriate method use and method failure should be included. DHS started asking questions on intentions about the current pregnancy, thereby including pregnant women. The most recently included category is unmarried women. In short, include all women who are “at risk” of an unintended or mistimed pregnancy.

Considering the importance of measuring unmet need, all DHS and FP/RH survey questionnaires now ask questions that cover the extended definition of unmet need.

Casterline (1997) pointed out that there can be inaccuracies in the reporting of contraceptive use and in the reporting of fertility preferences, and both pieces of information are required for estimating unmet need. Furthermore, his work shows that unmet need is subject to different definitions, and its measurement is not straightforward. Therefore, any survey undertaken for the measurement of unmet need must consider issues of definition in advance.

The following chart shows the standard formulation of unmet need.

Naming multiple variables at the same time, with syntax

Of course, it has to be with SYNTAX... I like to do everything with syntax for many reasons, but mostly to keep a log of what I am doing and to reduce the keystrokes I have to make for repeated jobs!!!

So, in case you have some variables to which you have to assign VALUE AND VARIABLE LABELS, wouldn't it be handy if you could do them all with one command? I know it is a small thing and most people who use SPSS would laugh at me for even writing a blog entry on this, but believe me, it is easy to forget little things, especially if you go out of touch for a year or two. So, here is the command:


VAR LAB
q29a '(RHC)Number of hours facility open for consultation'
/q29b 'Number of hours facility open for consultation BHU'
/q29c 'Number of hours facility open for consultation MCH center'
/q29d 'Number of hours facility open for consultation Dispensary'
/q29e 'Number of hours facility open for consultation govt hospital'
/q29f 'Number of hours facility open for consultation Pvt hospital'
/q29g 'Number of hours facility open for consultation Dispensary/Compoder'
/q29h 'Number of hours facility open for consultation Nurse/LHV'
/q29i 'Number of hours facility open for consultation Hakeem/ Homeopath'
/q29j 'Number of hours facility open for consultation FWC'
/q29k 'Number of hours facility open for consultation RHS-A'
/q29l '(Others)Number of hours facility open for consultation'.

Please note that in the above syntax, after VAR LAB (short for VARIABLE LABELS) the first variable name is written as is, but each of the rest is preceded by a forward slash "/". The whole command ends with a single period.
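
If the names and labels already sit in a list somewhere, the command can also be generated instead of typed. Below is a minimal Python sketch of that idea. The helper variable_labels_syntax and the two sample labels are made up for illustration, and the output is meant to be pasted into a syntax window and checked before running.

```python
# Sketch: build a VARIABLE LABELS command from a {name: label} mapping.
# The helper name and the sample labels are hypothetical.

def variable_labels_syntax(labels):
    """Return SPSS syntax: first variable as is, the rest preceded by '/'."""
    if not labels:
        raise ValueError("no labels given")
    lines = []
    for i, (name, label) in enumerate(labels.items()):
        # An apostrophe inside a single-quoted SPSS string is written twice.
        escaped = label.replace("'", "''")
        prefix = "" if i == 0 else "/"
        lines.append(f"{prefix}{name} '{escaped}'")
    # One period at the very end terminates the command.
    return "VARIABLE LABELS\n" + "\n".join(lines) + "."

labels = {
    "q29a": "Hours open for consultation (RHC)",
    "q29b": "Hours open for consultation (BHU)",
}
print(variable_labels_syntax(labels))
```

Doubling the apostrophe guards against the kind of stray quote mark that silently breaks a label command.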

Data transfer (migration) from Access to SPSS


Your data has been entered in MS Access, where all the variables (fields) have defined names, widths, types, etc., and there are lookup tables linked to each of the fields to describe the response values. But then you need to run some stats in SPSS. So, you basically EXPORT the file to some data analysis software like SPSS. One way to do it is to export to MS Excel format and open/import the Excel sheet in SPSS, which is pretty straightforward and simple. But when you examine the file you will notice that in this transition all the nifty labels of fields and values are gone, and you will have to either make guesses or look at your data collection instruments to make sense of the numbers.

So, the choice you have is either to keep doing that or manually assign all the labels in SPSS. It is fine if you have only a handful of variables, but if you have a long list it is a lot of work!!!

What do you do? I have been trying to get around this problem for months now with no success. There is a Script on my favorite SPSS site:

which should do the needful, but I am certainly not doing it right. Need help. All the Google searches and various discussion groups have proven to be of no use either. Apparently I can create a link through ODBC, but I am too lame to figure that out...

Any help?

Update 1: No luck with VB or Python or ODBC etc. because I am too dumb to learn them on my own! However, I learnt that if you have to do this a lot, there is a handy program that can do it for you. It is called Stat/Transfer. One caveat is that it is not FREE. The student version costs $59. In future I would buy it if I am stuck with multiple transfers between various databases. In the past they also used to have DBMS copy for such things but it does not exist anymore. I have tried to search for an open-source alternative to Stat/Transfer but no luck yet!!
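
One low-tech workaround, sketched here in Python, is to generate the label syntax from the lookup tables rather than moving the labels across automatically. This is not the script mentioned above and it has not been run against a real Access database. It assumes the lookup tables have already been exported (for example to Excel or CSV) and read into a plain dictionary; the helper names and sample data are invented.

```python
# Sketch: turn exported lookup tables into one VALUE LABELS command.
# Assumes numeric codes; the helper names and sample data are hypothetical.

def quote(label):
    """Single-quote a label for SPSS, doubling any apostrophe inside it."""
    return "'" + label.replace("'", "''") + "'"

def value_labels_syntax(lookups):
    """Build VALUE LABELS syntax from {variable: {code: label}}."""
    if not lookups:
        raise ValueError("no lookup tables given")
    blocks = []
    for i, (var, codes) in enumerate(lookups.items()):
        pairs = " ".join(f"{code} {quote(label)}"
                         for code, label in sorted(codes.items()))
        prefix = "" if i == 0 else "/"
        blocks.append(f"{prefix}{var} {pairs}")
    return "VALUE LABELS\n" + "\n".join(blocks) + "."

lookups = {
    "q1": {1: "Yes", 2: "No"},
    "area": {1: "Urban", 2: "Rural"},
}
print(value_labels_syntax(lookups))
```

The printed text can be pasted into an SPSS syntax window after the Excel import, so the labels are typed once in the lookup tables and never again by hand.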