Thursday 26 January 2012

Entering quantitative data into the computer

Well there are many many ways you can enter the quantitative data (generated from surveys) into a computer. Most of the people I know use either use a simple Excel table or very very complex MS Access data entry file. However both of these programs are not created for data entry purposes and hence are not so handy for that purpose.

There are several software out there in the market which you can buy to do the job, or you can outsource the data for entry or create a program for the very purpose yourself. All of these options require extra resources. SPSS, the most popular data analysis program also has a data entry module which is great but that is not free.

The best option would be to have a data entry program which is made just for that purpose, is easy to learn and use and most importantly is free. Thankfully there are 2 such software:

1) CSPro
"The Census and Survey Processing System (CSPro) is a public domain software package used by hundreds of organizations and tens of thousands of individuals for entering, editing, tabulating, and disseminating census and survey data."

2) EpiData
"EpiData Entry is used for simple or programmed data entry and data documentation. Entry handles simple forms or related systems Optimised documentation."

In my view using using Excel and Access have their strength but for any survey with more than a page long questionnaire and more than 40-50 respondents, it is better to use one or the other of the above.

Range checks, value lengths, logical jumps, automatically calculated fields, data validation on double entry, find duplicates, assign values for missing or not applicable values etc. are some the things that needs to be done to maintain the integrity of data. Both of the above software can do that.

The files that are created in these programs can easily be exported to major analytical packages like SPSS and STATA.

Thursday 12 January 2012

Converting a table to Data File

I have been looking for this for so long.....

Thank you Mr. John Walkenbach


Here is a nifty Macro in Excel to do that!!!
Sub ReversePivotTable()
' Before running this, make sure you have a summary table with column headers.
' The output table will have three columns.
Dim SummaryTable As Range, OutputRange As Range
Dim OutRow As Long
Dim r As Long, c As Long
On Error Resume Next
Set SummaryTable = ActiveCell.CurrentRegion
If SummaryTable.Count = 1 Or SummaryTable.Rows.Count < 3 Then
MsgBox "Select a cell within the summary table.", vbCritical
Exit Sub
End If
SummaryTable.Select
Set OutputRange = Application.InputBox(prompt:="Select a cell for the 3-column output", Type:=8) '
Convert the range OutRow = 2 Application.ScreenUpdating = False
OutputRange.Range("A1:C3") = Array("Column1", "Column2", "Column3")
For r = 2 To SummaryTable.Rows.Count
For c = 2 To SummaryTable.Columns.Count
OutputRange.Cells(OutRow, 1) = SummaryTable.Cells(r, 1)
OutputRange.Cells(OutRow, 2) = SummaryTable.Cells(1, c)
OutputRange.Cells(OutRow, 3) = SummaryTable.Cells(r, c)
OutputRange.Cells(OutRow, 3).NumberFormat = SummaryTable.Cells(r, c).NumberFormat
OutRow = OutRow + 1
Next c
Next r
End Sub

VlookUp in Excel

Learnt a lot about VlookUp today. It is really one of the most powerful features of MS Excel.

Few things to remember.
The formula will return a "0" (without quotes) if the index variable was matched but there as not corresponding value to return. If the index variable does not match, Excel will return #NA.

-remember the column number you want to retrieve. This column number is counted from the column where you "looked" for the index variable.

-You can lookup in another file.
-Linked files get updated even if they are not open.
-To keep track of what changes has been made it is a good idea to use the "TRACK CHANGES" feature.
-Excel cares about trailing spaces but does not care for difference in CASE and the type of the cell.
-Cells cannot be protected, hence you have to be very careful when you use vlookup.

Example of VLookup with data in 2 different sheets.

=VLOOKUP(A:A,Sheet3!A:B,2,FALSE)

Another one,
=VLOOKUP(O2,B2:K41,5)

In this example O2 (the first condition) is the address of the value that you want to MATCH, for example registration number, ssn etc. (the index) with  the source data file. You can also provide a value here, but giving a cell address makes it easier to copy a range. The second term, B2:K41 is the range of source data in which the first column includes your INDEX variable. The third value, 5 is the column number after the index. For example if your INDEX is in column B and your VALUE (ie name) is in column E, you should write 4 here). Make sure to remember that you need to select the complete range of columns for the lookup. For example, if you have school names in column A,  freq of particular symptom in column B and you want to get the total number of health room visits from columns E (while D has the corresponding school name), you need to select both column D and E for the second condition (not just one column), then put 1 for the index number.

Comparing/Updating multiple files against a Master file

UPDATE FILE command can be used for that. More text to add later.

Meanwhile check this page.
http://www.ats.ucla.edu/stat/spss/faq/update.htm

Tuesday 10 January 2012

Excel oddities

Check this page for some oddities and quirks related to Excel.

http://spreadsheetpage.com/index.php/oddities

Thursday 5 January 2012

Working with a sub-group of data

Sometimes you want to include only a sub-set or sample of the data in your analysis. There are different ways of doing that.

1. Using a sample of the data
If you just want to randomly choose a percentage of the data to work on it, here is the formula to select 30% of the responses:
TEMPORARY. SAMPLE .30.
You can also select a number of cases, ie:
SAMPLE 50 from 100.
will choose 50 cases randomly from the first 100 cases.

Note: Adding the "TEMPORARY" before makes sure that your original data does not change.

2. Selected group of cases

You can use select if (by itself or with temporary) on the basis of a criteria to run your statistics. For example, if you want to see the sleep time for mothers with under 2 year old children, you may have to select Female in the sex field (if there is one) and the age of child. This command can also be used with SYSMIS (system missing values) also.

3. Filter

You can filter the data according to a criteria assigned in a variable ie work only with boys or girls sample or children who are in school or not in school. The only thing to remember is that the variable has to be a dummy variable (with 0 and 1 values). The filter "turns off" the zeros, ie if the variable gender has assigned 0 to boys and 1 to girls, when you use filter it will generate statistics only for girls. To do so,

FILTER BY variable name.


When you are done, don't forget to use:
FILTER OFF.

4. Split file

First of all sort the cases by the variable you want to split the file with.

SORT CASES BY variable name.
Now use the split command.

SPLIT FILE BY variable name.

Don't forget to do turn the split function off.

SPLIT FILE OFF.


(thanks to UCLA SPSS Learning Modules for their online guide)