Tuesday 6 March 2007

Joining two files

There are two concepts of joining files in databases, either you join the columns (variables) or the rows (cases). I will give the formula for the former.

First of all, to match two files with different variables, it is imperative to have one common variable in the files.

1) Sort the files to be matched on the basis of index variable.

SORT CASES BY variable_name.

2) save the sorted files.

3) The join command in SPSS syntax:

MATCH FILES

/FILE ='FILE_A_NAME_WITH_PATH.sav'
/FILE ='FILE_B_NAME_WITH_PATH.sav'
/BY index variable(s).

ANOTHER NOTE: I think we should TURN OFF Weights before this command.

**If one of the files has different unit of analysis, for example a household file, or a community file and the other is a individual level file, then the household/community file should be written like the following:

/TABLE ='FILE_A_NAME_WITH_PATH.sav'

** If one of the data files is already open in the SPSS, then only a asterik can be used instead of full path and file name:

/TABLE=*
OR
/FILE=*

4) the resultant file is a new matched file, save with a new name.


Important Note:
If the files to be matched have variables with similar names but different information, such as FileA may have q9 which is respondent's residency status, while FileB also has q9 which is about No. of courses they are taking. The file match command would choose the first files values only without giving you a choice. In order to deal with the problem, you can utilize the RENAME variable sub-command option:
You can rename the variables in the MATCH FILES command (which renames the variables before doing the matching). This allows you to select variable names that do not conflict with each other, as illustrated below.
MATCH FILES FILE="FILEA.sav" /RENAME=(inc98 = dadinc98)
/FILE="FILEB.sav" /RENAME=(inc96 inc97 inc98 = faminc96 faminc97 faminc98)
/BY id.

(Source: http://www.ats.ucla.edu/stat/spss/modules/merge.htm)


TIPS:
****GET THE FILES AND DROP USELESS VARS****.
***FIRST SORT THE FILES TO MATCH ****.
**** IF REQ. SAVE AS NEW FILES****.