Wednesday 7 March 2007

Aggregate

Why do we need this command? Basically, to collapse a data file on the basis of one or more variables. In other words, if you have an individual-level file that also carries household or community information, you can AGGREGATE it into a household- or community-level file using the id variables. Of course, you need some common sense, and your research questions, to decide how the individual-level characteristics should be collapsed to the higher level. Pick what makes sense!

AGGREGATE applies mathematical functions to variables across records, at an aggregate level. For example, ordinary syntax (such as COMPUTE, which works within each case) cannot ADD up all the beds recorded in a numeric variable called BEDS across all CASES. Nor can it pick the mean, median, max, min, etc. across all the CASES. If you understand this concept, you can go ahead and run the AGGREGATE command on your data file. I NEED AN EXPERT TO GIVE MORE DETAILS!

Here is the syntax:

****The vars used in BREAK get included automatically in the outfile****.

1) Get the file to aggregate
2) Give the command:

AGGREGATE
/OUTFILE='NEW FILE NAME WITH PATH.sav'
/BREAK= INDEX VARIABLES
/VAR1 OR NEWLY ASSIGNED NAME*=FIRST (VAR1)
/VAR2 =MAX (VAR2)
/VAR3 =LAST (VAR3)
/VAR4 =MIN (VAR4)
/VAR5 =SD (VAR5)
/VAR6 =SUM (VAR6)
/VAR7 =MEAN (VAR7)
/VAR8 =MEDIAN (VAR8).
* The same applies to all variables: each target can keep the original name or be given a new one.
** The mathematical functions (SD, SUM, MEAN, MEDIAN) make sense only for continuous variables. For categorical variables use FIRST, LAST, MIN, and MAX.

3) Get the new file.
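
For readers who want to check the logic outside SPSS, here is a minimal sketch of the same collapse in plain Python. The records and the names hh_id, beds and age are made up for illustration, and hh_id plays the role of the BREAK variable. The SD here is Python's sample standard deviation; treat this as an illustration of the idea, not as a replica of AGGREGATE.

```python
from collections import defaultdict
from statistics import mean, median, stdev

# Hypothetical individual-level records; hh_id is the BREAK (index) variable.
rows = [
    {"hh_id": 1, "beds": 2, "age": 30},
    {"hh_id": 1, "beds": 1, "age": 10},
    {"hh_id": 2, "beds": 3, "age": 45},
    {"hh_id": 2, "beds": 0, "age": 41},
    {"hh_id": 2, "beds": 3, "age": 7},
]


def aggregate(rows, break_var, var):
    """Collapse rows to one record per value of break_var, summarising var."""
    groups = defaultdict(list)
    for row in rows:  # values are collected in the original case order
        groups[row[break_var]].append(row[var])

    out = []
    for key in sorted(groups):
        values = groups[key]
        out.append({
            break_var: key,          # the break variable is carried over
            "first": values[0],
            "last": values[-1],
            "min": min(values),
            "max": max(values),
            "sum": sum(values),
            "mean": mean(values),
            "median": median(values),
            # sample SD needs at least two cases in the group
            "sd": stdev(values) if len(values) > 1 else None,
        })
    return out


households = aggregate(rows, "hh_id", "beds")
for record in households:
    print(record)
```

Each output record is one household, which is what the new file looks like after the collapse.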

Tuesday 6 March 2007

Joining two files

There are two ways of joining files in databases: you either join the columns (variables) or the rows (cases). I will give the formula for the former.

First of all, to match two files with different variables, the files must share at least one common (index) variable.

1) Sort the files to be matched on the index variable.

SORT CASES BY variable_name.

2) Save the sorted files.

3) The join command in SPSS syntax:

MATCH FILES
/FILE ='FILE_A_NAME_WITH_PATH.sav'
/FILE ='FILE_B_NAME_WITH_PATH.sav'
/BY index variable(s).

ANOTHER NOTE: I think we should TURN OFF Weights before this command.

** If one of the files has a different unit of analysis (for example, one is a household or community file and the other is an individual-level file), then the household/community file should be given with /TABLE instead of /FILE, like this:

/TABLE ='FILE_A_NAME_WITH_PATH.sav'

** If one of the data files is already open in SPSS, an asterisk can be used instead of the full path and file name:

/TABLE=*
OR
/FILE=*

4) The result is a new matched file. Save it with a new name.
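
To see what the /TABLE style of match does, here is a minimal sketch in plain Python. The data and the names hh_id, region, person and age are hypothetical. Each individual case picks up the variables of its household, and a case with no matching household gets missing values (None here). The rule used for name clashes is an assumption of this sketch only.

```python
# Hypothetical lookup table: one record per household, keyed by hh_id.
households = {
    101: {"region": "north"},
    102: {"region": "south"},
}

# Hypothetical individual-level file: several cases per household.
individuals = [
    {"hh_id": 101, "person": 1, "age": 34},
    {"hh_id": 101, "person": 2, "age": 8},
    {"hh_id": 102, "person": 1, "age": 51},
    {"hh_id": 103, "person": 1, "age": 27},  # no matching household
]


def table_match(cases, table, by):
    """Attach the table's variables to every case that shares the key."""
    table_vars = sorted({name for rec in table.values() for name in rec})
    merged = []
    for case in cases:
        row = {name: None for name in table_vars}  # missing unless matched
        row.update(table.get(case[by], {}))
        row.update(case)  # on a name clash the case file wins (sketch only)
        merged.append(row)
    return merged


matched = table_match(individuals, households, "hh_id")
for row in matched:
    print(row)
```

The number of cases stays the same as in the individual-level file; only the number of variables grows.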


Important Note:
The files to be matched may have variables with the same name but different information. For example, FileA may have q9 as the respondent's residency status, while FileB has q9 as the number of courses they are taking. The MATCH FILES command would keep only the first file's values, without giving you a choice. To deal with this problem, use the RENAME sub-command:
You can rename the variables in the MATCH FILES command (which renames the variables before doing the matching). This allows you to select variable names that do not conflict with each other, as illustrated below.
MATCH FILES FILE="FILEA.sav" /RENAME=(inc98 = dadinc98)
/FILE="FILEB.sav" /RENAME=(inc96 inc97 inc98 = faminc96 faminc97 faminc98)
/BY id.

(Source: http://www.ats.ucla.edu/stat/spss/modules/merge.htm)
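
The effect of renaming before matching can be sketched in plain Python as well. The two small files and the new names residency and n_courses are hypothetical, chosen to mirror the q9 example above.

```python
# Hypothetical files keyed by id; both use the name q9 for different things.
file_a = {1: {"q9": "resident"}, 2: {"q9": "non-resident"}}
file_b = {1: {"q9": 4}, 2: {"q9": 2}}


def rename(records, mapping):
    """Return a copy of records with variables renamed according to mapping."""
    return {
        key: {mapping.get(name, name): value for name, value in rec.items()}
        for key, rec in records.items()
    }


# Rename first, so that nothing is overwritten when the files are matched.
a = rename(file_a, {"q9": "residency"})
b = rename(file_b, {"q9": "n_courses"})

merged = {
    key: {**a.get(key, {}), **b.get(key, {})}
    for key in sorted(a.keys() | b.keys())
}
print(merged)
```

Without the rename step, one of the two q9 values would be lost for every case.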


TIPS:
****GET THE FILES AND DROP USELESS VARS****.
***FIRST SORT THE FILES TO MATCH ****.
**** IF REQ. SAVE AS NEW FILES****.

Calculate age from survey data

Age can be calculated from survey data as follows:

AGEMONTHS=((Date of survey-Date of birth)/30.42)
where 30.42 is the average length of a month in days (365/12).

or

AGEYEARS=((Date of survey or interview-Date of birth)/365.25)

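You should have the two dates (date of survey and date of birth) in NUMERIC format, so that their difference is a number of days. If they are SPSS date variables, which are stored as seconds, convert the difference to days first (for example with CTIME.DAYS).
You can also wrap the formula in an integer (truncation) function, such as TRUNC in SPSS, if you want the results in whole months or years (no fractions).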
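
The arithmetic can be checked with a few lines of plain Python. The two dates are made up for illustration, and int() stands in for the truncation step.

```python
from datetime import date

DAYS_PER_MONTH = 30.42   # average month length used in the formula above
DAYS_PER_YEAR = 365.25   # average year length, allowing for leap years


def age_in_days(birth, survey):
    """Number of days between date of birth and date of survey."""
    return (survey - birth).days


def age_months(birth, survey):
    return age_in_days(birth, survey) / DAYS_PER_MONTH


def age_years(birth, survey):
    return age_in_days(birth, survey) / DAYS_PER_YEAR


# Hypothetical respondent.
birth = date(2000, 3, 7)
survey = date(2007, 9, 7)

print(age_in_days(birth, survey))        # difference in days
print(age_months(birth, survey))         # fractional months
print(int(age_months(birth, survey)))    # whole months, fraction dropped
print(int(age_years(birth, survey)))     # whole years, fraction dropped
```

Because the divisors are averages, a survey held exactly on a birthday can come out a fraction below the whole number, so the truncated age may be one less than expected on that day.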