Thursday 3 November 2011

Uses of (System) missing

Sometimes you want to assign a newly computed variable the system-missing value (shown as a . in the SPSS Data Editor). Here is the command for that:

COMPUTE temp = $sysmis.

(This syntax creates a variable called temp with every value initially set to system-missing.)

To assign it conditionally:

IF sysmis(v1) v2=$sysmis.

You can also use missing values in RECODE (v1 below stands for your own variable name).

RECODE v1 (SYSMIS=99).

or

RECODE v1 (99=SYSMIS).

RECODE [your command] (ELSE=SYSMIS).

Also read the UCLA SPSS page and the CDC page on handling missing data.
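
The RECODE patterns above can be sketched in Python to make the semantics explicit. This is only an illustration, not SPSS: None stands in for the system-missing value, and the function names and the sample mapping are made up for the example.

```python
# None stands in for SPSS's system-missing value (an assumption of this sketch).
SYSMIS = None

def recode_sysmis_to(values, code=99):
    """Like RECODE v1 (SYSMIS=99): replace missing entries with a numeric code."""
    return [code if v is SYSMIS else v for v in values]

def recode_to_sysmis(values, code=99):
    """Like RECODE v1 (99=SYSMIS): turn a numeric code back into missing."""
    return [SYSMIS if v == code else v for v in values]

def recode_else_sysmis(values, mapping):
    """Like RECODE v1 (1=10) (2=20) (ELSE=SYSMIS): unmapped values become missing."""
    return [mapping.get(v, SYSMIS) for v in values]

v1 = [1, None, 99, 2]
print(recode_sysmis_to(v1))
print(recode_to_sysmis(v1))
print(recode_else_sysmis(v1, {1: 10, 2: 20}))
```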

Tuesday 1 November 2011

PURPOSIVE SAMPLING

PURPOSIVE SAMPLING - Subjects are selected because of some characteristic. Purposive sampling is popular in qualitative research. Patton (1990) has proposed the following types of purposive sampling.

  • Extreme or Deviant Case - Learning from highly unusual manifestations of the phenomenon of interest, such as outstanding success/notable failures, top of the class/dropouts, exotic events, crises.
  • Intensity - Information-rich cases that manifest the phenomenon intensely, but not extremely, such as good students/poor students, above average/below average.
  • Maximum Variation - Purposefully picking a wide range of variation on dimensions of interest...documents unique or diverse variations that have emerged in adapting to different conditions. Identifies important common patterns that cut across variations.
  • Homogeneous - Focuses, reduces variation, simplifies analysis, facilitates group interviewing.
  • Typical Case - Illustrates or highlights what is typical, normal, average.
  • Stratified Purposeful - Illustrates characteristics of particular subgroups of interest; facilitates comparisons.
  • Critical Case - Permits logical generalization and maximum application of information to other cases because if it's true of this one case it's likely to be true of all other cases.
  • Snowball or Chain - Identifies cases of interest from people who know people who know people who know what cases are information-rich, that is, good examples for study, good interview subjects.
  • Criterion - Picking all cases that meet some criterion, such as all children abused in a treatment facility. Quality assurance.
  • Theory-Based or Operational Construct - Finding manifestations of a theoretical construct of interest so as to elaborate and examine the construct.
  • Confirming or Disconfirming - Elaborating and deepening initial analysis, seeking exceptions, testing variation.
  • Opportunistic - Following new leads during fieldwork, taking advantage of the unexpected, flexibility.
  • Random Purposeful - (still small sample size) Adds credibility to sample when potential purposeful sample is larger than one can handle. Reduces judgment within a purposeful category. (Not for generalizations or representativeness.)
  • Politically Important Cases - Attracts attention to the study (or avoids attracting undesired attention by purposefully eliminating from the sample politically sensitive cases).
  • Convenience - Saves time, money, and effort. Poorest rationale; lowest credibility. Yields information-poor cases.
  • Combination or Mixed Purposeful - Triangulation, flexibility, meets multiple interests and needs. (Patton, 1990)
Patton, M. Q. (1990). Qualitative evaluation and research methods (2nd ed.). Newbury Park, CA: Sage Publications.

Thursday 27 October 2011

What can you do with COMPUTE in SPSS

Note what you can do with MEAN.# and SUM.# to tackle missing values, where # is the minimum number of non-missing arguments required. Also take note of LAG, which takes the value of the preceding case.

COMPUTE y=ABS(x). absolute value of x. ABS(-7)=7.
COMPUTE y=SQRT(x). square root
COMPUTE y=LN(x). natural logarithm
COMPUTE y=LG10(x). base 10 logarithm
COMPUTE y=EXP(x). exponential: e^x
COMPUTE y=TRUNC(x). integer part. TRUNC(5.7)=5.
COMPUTE y=RND(x). round to nearest integer. RND(5.7)=6
COMPUTE y=MOD(x,11). remainder after division by 11
COMPUTE y=SUM(x1,x2,x3). sum of 3 variables if at least one is non-missing
COMPUTE y=SUM.5(x1 TO x10). sum of 10 variables if at least 5 are non-missing.
COMPUTE y=MEAN.2(x1,x2,x3). mean of 3 variables if at least 2 are non-missing
COMPUTE y=LAG(x). x from previous case
COMPUTE y=$SYSMIS. sets Y to sysmis.

Source: SPSS for Windows 8, 9 and 10 by Svend Juul
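
To see what the .n suffix and LAG do, here is a minimal Python sketch of the same rules. It is not SPSS code: None stands in for system-missing, and sum_n, mean_n and lag are hypothetical helper names.

```python
def sum_n(values, min_valid=1):
    """SUM.n rule: sum the non-missing values, or missing if fewer than n are valid."""
    valid = [v for v in values if v is not None]
    return sum(valid) if valid and len(valid) >= min_valid else None

def mean_n(values, min_valid=1):
    """MEAN.n rule: mean of the non-missing values, or missing if fewer than n are valid."""
    valid = [v for v in values if v is not None]
    return sum(valid) / len(valid) if valid and len(valid) >= min_valid else None

def lag(column):
    """LAG rule: each case gets the previous case's value; the first case is missing."""
    return [None] + column[:-1]

print(sum_n([1, None, 3]))        # one valid value is enough by default
print(sum_n([1, None, None], 2))  # fewer than 2 valid values, so missing
print(mean_n([2, None, 4], 2))
print(lag([5, 6, 7]))
```

By contrast, a plain x1 + x2 + x3 in COMPUTE comes out missing as soon as any one of the variables is missing, which is why the SUM and MEAN functions are handy.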

Dropping missing values in SPSS

(This needs some more thought and cleaning up)

The SELECT IF command with the SYSMIS() function can drop all cases with missing values from the current SPSS data set. Consider the following:

SELECT IF NOT (SYSMIS(amount)).
SAVE OUTFILE='newfile.sav'.


This example drops all cases whose value of the variable amount is missing, and then saves this data to an SPSS system file called newfile.sav.


If cases must be non-missing on more than one variable, combine the tests with the AND operator. (SYSMIS() only detects system-missing values; if the dataset also uses user-defined codings for missing values, as is often the case for survey data, the MISSING() function catches both kinds.)

SELECT IF NOT (SYSMIS(amount1)) AND NOT (SYSMIS(amount2)).
SAVE OUTFILE='newfile.sav'.
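
The same filtering logic, sketched in Python for clarity. The rows, the id field and the helper name are invented for the illustration, and None plays the role of system-missing.

```python
# Toy data set: each dict is one case (the values are made up).
rows = [
    {"id": 1, "amount1": 10.0, "amount2": 5.0},
    {"id": 2, "amount1": None, "amount2": 7.0},
    {"id": 3, "amount1": 3.5, "amount2": None},
    {"id": 4, "amount1": 8.0, "amount2": 1.0},
]

def select_if_not_missing(cases, variables):
    """Keep only the cases where every listed variable is non-missing."""
    return [c for c in cases if all(c[v] is not None for v in variables)]

kept = select_if_not_missing(rows, ["amount1", "amount2"])
print([c["id"] for c in kept])
```

As with SELECT IF, the dropped cases are gone from the result, so keep a copy of the original data if you may need them later.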



http://kb.iu.edu/data/afay.html

Also look at this post (multivariate analysis):

http://core.ecu.edu/psyc/wuenschk/spss/SPSS-MV.htm

Thursday 7 July 2011

little issue with Excel sorting

I was going nuts trying to sort a little table in Excel that was built with formulas. Excel seemed to sort on the formulas in the cells rather than on the cell values. After a lot of googling, I found out that in Excel 2007 the "Calculation" option defaults to "Automatic". To avoid my problem, I just needed to turn it off and switch to "Manual". The option is on the "Formulas" tab in Excel 2007, the rightmost choice, "Calculation Options".

Found the answer on an Excel discussion board.

Tuesday 14 June 2011

Remove ALL spaces from cells in MS Excel

This post pertains to data cleaning after you download data from surveyshare.com.

One more thing you MAY have to do when you download your data from surveyshare.com is remove all unnecessary spaces from certain fields, especially if they happen to be your index variable. In my case I had to do it for the email field, which was used to match the responses to other surveys! Here is the macro I found:

Sub TrimEText()
    ' Trims spaces from BOTH SIDES, collapses runs of spaces inside the text,
    ' and finally removes every remaining space from each selected cell.
    Dim MyCell As Range
    On Error Resume Next
    For Each MyCell In Selection.Cells
        ' Collapse double spaces into single spaces (repeated to shrink longer runs).
        MyCell.Value = Application.WorksheetFunction.Substitute(Trim(MyCell.Value), "  ", " ")
        MyCell.Value = Application.WorksheetFunction.Substitute(Trim(MyCell.Value), "  ", " ")
        MyCell.Value = Application.WorksheetFunction.Substitute(Trim(MyCell.Value), "  ", " ")
        ' Remove all remaining spaces, including single ones inside the text.
        MyCell.Value = Application.WorksheetFunction.Substitute(Trim(MyCell.Value), " ", "")
    Next
    On Error GoTo 0
End Sub


Really grateful to the author of the macro!!
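
If the data ends up outside Excel, the same cleaning can be sketched in Python. This is only an illustration with made-up helper names; unlike the macro, which targets the space character, the regular expression here also removes tabs and other whitespace.

```python
import re

def strip_all_whitespace(text):
    """Remove every whitespace character: leading, trailing and inner."""
    return re.sub(r"\s+", "", text)

def collapse_whitespace(text):
    """Trim both ends and squeeze inner runs of whitespace to one space."""
    return " ".join(text.split())

# A matching key such as an email address should contain no spaces at all.
print(strip_all_whitespace("  jane. doe @example.com "))
# Free text usually only needs the extra spaces squeezed out.
print(collapse_whitespace("  some   free  text "))
```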

Thursday 9 June 2011

Excel Macro to convert the CASE of a range of TEXT

Really useful macro. I used it to clean up surveyshare data file before bringing it to SPSS.

Before using:

"Uncomment" (remove the apostrophe from) the line of code that changes the text to the case you want. For example, I needed everything to be converted to lower case, and hence I removed the apostrophe from "' Rng.Value = StrConv(Rng.Text, vbLowerCase)"


Sub ChangeCase()
    Dim Rng As Range
    On Error Resume Next
    Err.Clear
    Application.EnableEvents = False
    For Each Rng In Selection.SpecialCells(xlCellTypeConstants, _
            xlTextValues).Cells
        If Err.Number = 0 Then
            ' Rng.Value = StrConv(Rng.Text, vbUpperCase)
            ' Rng.Value = StrConv(Rng.Text, vbLowerCase)
            ' Rng.Value = StrConv(Rng.Text, vbProperCase)
        End If
    Next Rng
    Application.EnableEvents = True
End Sub

Source: http://www.cpearson.com/excel/ChangingCase.aspx

Wednesday 27 April 2011

What is the difference between causation and correlation?

One of the most common errors we find in the press is the confusion between correlation and causation in scientific and health-related studies. In theory, these are easy to distinguish — an action or occurrence can cause another (such as smoking causes lung cancer), or it can correlate with another (such as smoking is correlated with alcoholism). If one action causes another, then they are most certainly correlated. But just because two things occur together does not mean that one caused the other, even if it seems to make sense.
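
A small simulation can make the last point concrete. In this sketch (all names and numbers are made up for illustration), a hidden factor z drives both x and y. Neither x nor y influences the other, yet they come out clearly correlated.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

rng = random.Random(0)  # fixed seed so the run is repeatable
z = [rng.gauss(0, 1) for _ in range(5000)]   # hidden common cause
x = [zi + rng.gauss(0, 1) for zi in z]       # x depends on z only
y = [zi + rng.gauss(0, 1) for zi in z]       # y depends on z only

r = pearson(x, y)
print(round(r, 2))  # theoretical value for this setup is 0.5
```

Changing x by hand would do nothing to y here, which is exactly the gap between correlation and causation.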

Tuesday 15 February 2011

Truncating a string variable & other things

This text has been copied from UCLA website!!!

Create a string variable up that holds the name converted to upper case, lo that holds the name converted to lower case, and sub that holds the third through eighth characters of the person's name. Note that we first had to use the STRING command to tell SPSS that up, lo and sub are string variables (up and lo with a length of up to 14 characters, sub with a length of 6). Had we omitted the STRING command, these would have been treated as numeric variables, and when SPSS tried to assign a character value to the numeric variables, it would have generated an error. We also create len, the length of the name variable, and len2, the length of the person's name without trailing blanks.

STRING up lo (A14)
/sub (A6).

COMPUTE up = UPCASE(name).
COMPUTE lo = LOWER(name).
COMPUTE sub = SUBSTR(name,3,6).
COMPUTE len = LENGTH(name).

COMPUTE len2 = LENGTH(RTRIM(name)).

For more info visit: http://www.ats.ucla.edu/stat/spss/modules/functions.htm
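
For comparison, here is a rough Python sketch of the same string operations. It assumes name is a fixed-width 14-character field padded with blanks, as a string variable would be in a data file; the sample name and the substr helper are made up.

```python
def substr(text, pos, length):
    """SUBSTR-style slice: pos is 1-based, length is the number of characters."""
    return text[pos - 1 : pos - 1 + length]

name = "Washington".ljust(14)   # fixed-width field, padded with blanks

up = name.upper()               # like UPCASE(name)
lo = name.lower()               # like LOWER(name)
sub = substr(name, 3, 6)        # third through eighth characters
len_full = len(name)            # width of the field, trailing blanks included
len_name = len(name.rstrip())   # length of the actual name

print(repr(up), repr(lo), repr(sub), len_full, len_name)
```

Note that the third argument is a length, not an end position: six characters starting at position 3 end at position 8.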

Tuesday 8 February 2011

Random Forest

A random forest is a group (ensemble) of many decision trees. Learn more:

http://en.wikipedia.org/wiki/Random_forest

Assigning Student Grades Using Excel

Here is the formula from MS office website:

=IF(A2>89,"A",IF(A2>79,"B", IF(A2>69,"C",IF(A2>59,"D","F"))))

If there are more than 6 conditions to check, it is better to use LOOKUP than nested IFs:

=LOOKUP(A2,{0,60,63,67,70,73,77,80,83,87,90,93,97},{"F","D-","D","D+","C-","C","C+","B-","B","B+","A-","A","A+"})


source: http://office.microsoft.com/en-us/excel-help/if-HP005209118.aspx
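
The LOOKUP formula picks the largest cutoff that is less than or equal to the score. A minimal Python sketch of that rule, using the same cutoffs and letters (the function name is made up):

```python
from bisect import bisect_right

CUTOFFS = [0, 60, 63, 67, 70, 73, 77, 80, 83, 87, 90, 93, 97]
GRADES = ["F", "D-", "D", "D+", "C-", "C", "C+", "B-", "B", "B+", "A-", "A", "A+"]

def letter_grade(score):
    """Return the grade whose cutoff is the largest one not exceeding score."""
    index = bisect_right(CUTOFFS, score) - 1
    if index < 0:
        # Excel's LOOKUP gives #N/A when the value is below the first cutoff.
        raise ValueError("score is below the lowest cutoff")
    return GRADES[index]

for score in (59, 60, 89.5, 97):
    print(score, letter_grade(score))
```

Both lists must stay in ascending cutoff order, just like the first array in the LOOKUP formula.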

Thursday 13 January 2011

a very simple table using CTables


                 group Universe vs sample
                 1 Universe           2 Sample             Total Respondents
                 Column N %   Count   Column N %   Count   Column N %   Count
1 Female         53.0%        904     61.0%        153     54.0%        1057
2 Male           46.3%        789     39.0%        98      45.3%        887
Not specified    .7%          12      .0%          0       .6%          12


To get the above table use the following syntax:


CTABLES /TABLE gender2 by group [colpct count]
/CATEGORIES VARIABLES=group TOTAL=YES LABEL='Total Respondents'.

&


                   Main groups
                   Our big univ         our sample           Total
                   Column N %   Count   Column N %   Count   Column N %   Count
1 Female           53.0%        904     61.0%        153     54.0%        1057
2 Male             46.3%        789     39.0%        98      45.3%        887
3 Not specified    .7%          12      .0%          0       .6%          12
Total              100.0%       1705    100.0%       251     100.0%       1956


For the above, here is the syntax (notice the columns also have totals now):
CTABLES /TABLE gender2 by group [colpct count]
/CATEGORIES VARIABLES=group TOTAL=YES LABEL='Total Respondents'
/CATEGORIES VARIABLES= gender2 TOTAL=YES POSITION=AFTER.
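
To check what Column N % means, the percentages can be recomputed from the counts in the table. This Python sketch is only a sanity check, not a replacement for CTABLES; the dictionary layout and function names are made up.

```python
# Counts taken from the table above, one inner dict per column.
counts = {
    "Universe": {"Female": 904, "Male": 789, "Not specified": 12},
    "Sample": {"Female": 153, "Male": 98, "Not specified": 0},
}

def with_total(columns):
    """Add a Total column that sums each category across the groups."""
    total = {}
    for column in columns.values():
        for category, n in column.items():
            total[category] = total.get(category, 0) + n
    return {**columns, "Total": total}

def column_percentages(column):
    """Column N %: each count as a percentage of its column's total."""
    column_n = sum(column.values())
    return {cat: round(100 * n / column_n, 1) for cat, n in column.items()}

table = with_total(counts)
for group, column in table.items():
    print(group, sum(column.values()), column_percentages(column))
```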