tag:blogger.com,1999:blog-72276238722302315082024-03-16T11:52:59.084-07:00Life with SPSS and other similar thingsI will try to collect all my SPSS/STATA and other stats notes here!!Azeema Faizunnisahttp://www.blogger.com/profile/02947902937912894228noreply@blogger.comBlogger77125tag:blogger.com,1999:blog-7227623872230231508.post-19525398033840896922016-07-30T00:15:00.001-07:002016-07-30T00:15:51.725-07:00Creating new codes from one meta codeThis is another life saver syntax that I learnt from Minhaj. Some datasets come with only one large code with various information. For example it is common to have PSU code which includes the information about the respondent's location, sex etc.<br />
<br />
Here is a syntax that can be modified.<br />
<br />
SET OVars Both ONumbers Both TVars Both TNumbers Both.<br />
SET OVars Labels ONumbers Labels TVars Labels TNumbers Labels.<br />
<br />
<br />
comp province = trunc (psu/100000).<br />
comp temp = trunc (psu/10000).<br />
*comp locality = temp - (province*10).<br />
var lab province 'Province'.<br />
val lab province 1'Punjab' 2'Sindh' 3'NWFP' 4'Balochistan'.<br />
<br />
format province (f1.0).<br />
<div>
fre province.</div>
<div>
<br /></div>
<div>
In this syntax the point to note is that the variable "psu" has 6 digits and the code for "province" is stored in position 1. Hence, "psu" is divided by 100000 and truncated, which drops the last 5 digits and leaves only the first one. </div>
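<div>
The digit arithmetic is easy to check outside SPSS. Here is a minimal Python sketch of the same idea. The PSU value 231045 is made up for illustration, and treating the second digit as "locality" is only an assumption based on the commented-out line in the syntax above.</div>

```python
def split_psu(psu: int) -> dict:
    """Split a 6-digit PSU code into its leading-digit components."""
    province = psu // 100000         # first digit, same as trunc(psu/100000)
    temp = psu // 10000              # first two digits, same as trunc(psu/10000)
    locality = temp - province * 10  # second digit on its own (assumed meaning)
    return {"province": province, "locality": locality}


# Value labels copied from the VAL LAB line in the post.
PROVINCE_LABELS = {1: "Punjab", 2: "Sindh", 3: "NWFP", 4: "Balochistan"}

# Hypothetical PSU code, not taken from any real dataset.
example = split_psu(231045)
print(example, PROVINCE_LABELS[example["province"]])
```

<div>
Integer division plays the role of TRUNC here, so the same pattern works for any position: divide by a power of 10 to drop the trailing digits, then subtract the digits you have already extracted.</div>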
<div>
<br /></div>
<div class="blogger-post-footer">Author: Azeema</div>Azeema Faizunnisahttp://www.blogger.com/profile/02947902937912894228noreply@blogger.com8tag:blogger.com,1999:blog-7227623872230231508.post-69157792425770358622016-02-03T04:46:00.004-08:002016-02-03T04:46:59.613-08:00More sources of social science dataHere is something you may consider to explore to see what social science data is available.<br />
<br />
<a href="https://github.com/caesar0301/awesome-public-datasets#social-sciences" target="_blank">Github Social Science data</a>.<br />
<br />
<br /><div class="blogger-post-footer">Author: Azeema</div>Azeema Faizunnisahttp://www.blogger.com/profile/02947902937912894228noreply@blogger.com1tag:blogger.com,1999:blog-7227623872230231508.post-55598158042493332532016-01-31T21:39:00.001-08:002016-01-31T21:44:22.628-08:00"LIST" command to display the dataYesterday I was at a loss as I could not remember the "LIST" command to show or display the data "as is" rather than in aggregated form. I searched the internet and racked my brain for hours, but all in vain. Thanks to Muhammad Ali, who sometimes helps with data entry, I was relieved of this torture this morning.<br />
<br />
It is a really simple command, and that is why it was hard to find through a search. To show the data "as is" in the data file, use:<br />
<br />
LIST [variable name(s)].<br />
<br />
This command is used a lot for data cleaning and for looking at string variables, where the responses are free text and rarely match each other. It is also useful for listing all the responses of one respondent side by side.<br />
<br />
<br />
<br />
<br /><div class="blogger-post-footer">Author: Azeema</div>Azeema Faizunnisahttp://www.blogger.com/profile/02947902937912894228noreply@blogger.com1tag:blogger.com,1999:blog-7227623872230231508.post-76227230189276144052014-12-20T05:06:00.003-08:002014-12-20T05:06:59.109-08:00Weighting in SPSSI learnt this from Minhaj.<br />
<br />
It is common in surveys to over- or under-sample and use weights later to compensate for such decisions. For example, in Pakistan the largest province by population is undersampled while the smallest one, Balochistan, is oversampled. In such cases, for data analysis, it is critical that the cases are "weighted."<br />
<br />
In the data files I have been using, the weights come in a variable that can be applied directly. In the following example, the weight variable is named "weight."<br />
<br />
If you just weight the cases by the raw weight variable:<br />
WEIGHT BY weight.
<div>
<br /></div>
With only the raw weight applied, the proportions in your data analysis come out right but not the Ns. This means you need to rescale the weight variable so that the weighted number of cases equals the actual number of cases (it seems baffling until you do it). To do that, create a new weight variable; let's call it "whi." Divide "weight" by the SUM of the weights over all cases and multiply by the total number of cases. In this case the SUM=13557999999 and total cases=13558.<br /><br />
comp whi = (weight/13557999999) * 13558.<br />
format whi (f16.11).<br />
var lab whi "Reduced weighted N to 13558".<br />
<br />
<br />
Now WEIGHT your file again with the new variable. Both the percentages and the Ns will come out right.<br />
<br />
weight BY whi.<br />
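<div>
To see why the rescaled weight fixes the Ns, here is a small Python sketch of the same calculation. The four raw weights are invented for illustration; only the formula (weight divided by the sum of weights, times the number of cases) comes from the syntax above.</div>

```python
import math


def rescale_weights(weights):
    """Rescale weights so they sum to the number of cases."""
    total = sum(weights)  # plays the role of the SUM in the post
    n = len(weights)      # total number of cases
    return [w / total * n for w in weights]


# Hypothetical raw weights for four cases.
raw = [1200000.0, 800000.0, 500000.0, 1500000.0]
whi = rescale_weights(raw)

print(whi)       # relative sizes are unchanged
print(sum(whi))  # weighted N now equals the number of cases
```

<div>
Because every weight is multiplied by the same constant, the proportions stay exactly as they were under the raw weight; only the weighted total changes.</div>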
<br />
To turn off weight, use the following,<br />
<br />
WEIGHT OFF.<br />
<div>
<br /></div>
<div class="blogger-post-footer">Author: Azeema</div>Azeema Faizunnisahttp://www.blogger.com/profile/02947902937912894228noreply@blogger.com3tag:blogger.com,1999:blog-7227623872230231508.post-23021997852049717802013-05-01T02:21:00.001-07:002013-05-01T02:21:38.095-07:00Creating dummy variables (from UCLA site)<pre><b>* make dummies, method 1 .
COMPUTE race1=(race=1).
COMPUTE race2=(race=2).
COMPUTE race3=(race=3).
crosstabs /tables = race by race1
/tables = race by race2
/tables = race by race3.
</b>
<b>* make dummies, method 2 .
DO REPEAT A=race1 race2 race3 /B=1 2 3.
COMPUTE A=(race=B).
END REPEAT.
crosstabs /tables = race by race1
/tables = race by race2
/tables = race by race3.</b></pre>
<pre><b>http://www.ats.ucla.edu/stat/spss/code/dummy.htm</b></pre>
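<div>
For comparison, the same dummy coding can be sketched in plain Python. This is only an illustration with made-up data; it assumes, as in SPSS, that a missing value of race (None here) gives a missing dummy rather than a 0.</div>

```python
def make_dummies(values, categories, prefix="race"):
    """Return one 0/1 indicator list per category, keeping missing as None."""
    return {
        f"{prefix}{cat}": [None if v is None else int(v == cat) for v in values]
        for cat in categories
    }


# Hypothetical data: five respondents, one with a missing value.
race = [1, 2, 3, 2, None]
dummies = make_dummies(race, categories=[1, 2, 3])

for name, column in dummies.items():
    print(name, column)
```

<div>
The dictionary comprehension does the job of DO REPEAT in method 2: one pass per category, each producing its own indicator variable.</div>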
<pre><b> </b></pre>
<div class="blogger-post-footer">Author: Azeema</div>Azeema Faizunnisahttp://www.blogger.com/profile/02947902937912894228noreply@blogger.com3tag:blogger.com,1999:blog-7227623872230231508.post-15399442994579091772013-04-11T10:55:00.002-07:002013-04-11T10:55:39.556-07:00PSPP another data analysis softwareyou can download it for free <a href="http://sourceforge.net/projects/pspp4windows/files/" target="_blank">here</a>. <div class="blogger-post-footer">Author: Azeema</div>Azeema Faizunnisahttp://www.blogger.com/profile/02947902937912894228noreply@blogger.com1tag:blogger.com,1999:blog-7227623872230231508.post-49931297018277285962013-03-28T20:16:00.001-07:002013-03-28T20:18:04.327-07:00to explore more: Create a pivot table using Python<pre class="syntaxblock">#Create a pivot table
table = spss.BasePivotTable("Group Means",
"OMS table subtype")
table.Append(spss.Dimension.Place.row,
spss.GetVariableLabel(groupIndex))
table.Append(spss.Dimension.Place.column,
spss.GetVariableLabel(sumIndex))
category2 = spss.CellText.String("Mean")
for cat in sorted(Counts):
category1 = spss.CellText.Number(cat)
table[(category1,category2)] = \
spss.CellText.Number(Totals[cat]/Counts[cat])</pre>
<pre class="syntaxblock"> </pre>
<pre class="syntaxblock">Source: <a href="http://pic.dhe.ibm.com/infocenter/spssstat/v21r0m0/index.jsp?topic=%2Fcom.ibm.spss.statistics.python.help%2Fpython_package_baseprocedure.htm" target="_blank">SPSS online help</a>. </pre>
<div class="blogger-post-footer">Author: Azeema</div>Azeema Faizunnisahttp://www.blogger.com/profile/02947902937912894228noreply@blogger.com2tag:blogger.com,1999:blog-7227623872230231508.post-49197013137099790572013-03-22T12:11:00.002-07:002013-03-22T12:11:40.282-07:00Creating new variables with R<code># Three examples for doing the same computations<br />
<br />
mydata$sum &lt;- mydata$x1 + mydata$x2<br />
mydata$mean &lt;- (mydata$x1 + mydata$x2)/2<br />
<br />
attach(mydata)<br />
mydata$sum &lt;- x1 + x2<br />
mydata$mean &lt;- (x1 + x2)/2<br />
detach(mydata)<br />
<br />
mydata &lt;- transform( mydata,<br />
sum = x1 + x2,<br />
mean = (x1 + x2)/2 <br />
) </code><br />
<br />
<code><a href="http://www.statmethods.net/management/variables.html" target="_blank">Source </a>for the above code. </code><div class="blogger-post-footer">Author: Azeema</div>Azeema Faizunnisahttp://www.blogger.com/profile/02947902937912894228noreply@blogger.com1tag:blogger.com,1999:blog-7227623872230231508.post-34578966050554903482013-03-21T20:57:00.001-07:002013-03-21T20:57:06.824-07:00R help with read<a href="http://127.0.0.1:20062/library/base/html/Paren.html" target="_blank">help</a><div class="blogger-post-footer">Author: Azeema</div>Azeema Faizunnisahttp://www.blogger.com/profile/02947902937912894228noreply@blogger.com2tag:blogger.com,1999:blog-7227623872230231508.post-80047047439344601252013-03-19T14:59:00.000-07:002013-03-19T14:59:52.478-07:00Copying value labels from an existing variable...<!--[if gte mso 9]><xml>
</xml><![endif]--><br />
<br />
<div class="MsoNormal" style="line-height: 150%; mso-layout-grid-align: none; text-autospace: none;">
Or you could copy the value labels from another variable that already has the labels you need (here, from b9a to occupf).</div>
<div class="MsoNormal" style="line-height: 150%; mso-layout-grid-align: none; text-autospace: none;">
APPLY DICTIONARY</div>
<div class="MsoNormal" style="line-height: 150%; mso-layout-grid-align: none; text-autospace: none;">
<span style="mso-spacerun: yes;"> </span>/FROM *</div>
<div class="MsoNormal" style="line-height: 150%; mso-layout-grid-align: none; text-autospace: none;">
<span style="mso-spacerun: yes;"> </span>/SOURCE VARIABLES
= <b style="mso-bidi-font-weight: normal;">b9a</b></div>
<div class="MsoNormal" style="line-height: 150%; mso-layout-grid-align: none; text-autospace: none;">
<span style="mso-spacerun: yes;"> </span>/TARGET VARIABLES
=<span style="mso-spacerun: yes;"> </span><b style="mso-bidi-font-weight: normal;">occupf</b></div>
<div class="MsoNormal" style="line-height: 150%; mso-layout-grid-align: none; text-autospace: none;">
<span style="mso-spacerun: yes;"> </span>/FILEINFO</div>
<div class="MsoNormal" style="line-height: 150%; mso-layout-grid-align: none; text-autospace: none;">
<span style="mso-spacerun: yes;"> </span>/VARINFO ALIGNMENT
FORMATS LEVEL MISSING VALLABELS = REPLACE VARLABEL WIDTH .</div>
<div class="blogger-post-footer">Author: Azeema</div>Azeema Faizunnisahttp://www.blogger.com/profile/02947902937912894228noreply@blogger.com0tag:blogger.com,1999:blog-7227623872230231508.post-85517675979315052502012-12-13T13:16:00.001-08:002012-12-13T13:16:25.443-08:00Missing Values<pre class="syntaxblock">You may have used the command MISSING VALUES in a file to declare a certain value(s) </pre>
<pre class="syntaxblock">as missing, for example:</pre>
<pre class="syntaxblock">
</pre>
<pre class="syntaxblock">MISSING VALUES V1 (8,9).</pre>
<pre class="syntaxblock"> </pre>
<pre class="syntaxblock">Here the values 8,9 will be considered missing in the data and will not be included </pre>
<pre class="syntaxblock">in computations. </pre>
<pre class="syntaxblock"> </pre>
<pre class="syntaxblock"> But, sometimes you want those values back. One way would be to close the file and </pre>
<pre class="syntaxblock">reload but that is cumbersome. The short way to do that is:</pre>
<pre class="syntaxblock"> </pre>
<pre class="syntaxblock">MISSING VALUES V1 ().</pre>
<div class="bullet">
<br /></div>
<div class="bullet">
The above command clears any previously declared missing values for V1, so those values are treated as valid data again.</div>
<div class="blogger-post-footer">Author: Azeema</div>Azeema Faizunnisahttp://www.blogger.com/profile/02947902937912894228noreply@blogger.com0tag:blogger.com,1999:blog-7227623872230231508.post-6563946760879155252012-11-08T12:49:00.003-08:002012-11-08T12:53:10.196-08:00Non-parametric testsUseful reviews of Non-parametric tests<br />
<br />
<a href="http://www.une.edu.au/WebStat/unit_materials/c6_common_statistical_tests/nonparametric_test.html">http://www.une.edu.au/WebStat/unit_materials/c6_common_statistical_tests/nonparametric_test.html</a><br />
<br />
<a href="http://www.graphpad.com/support/faqid/1790/">http://www.graphpad.com/support/faqid/1790/</a><br />
<br />
<br />
<a href="http://www.healthknowledge.org.uk/public-health-textbook/research-methods/1b-statistical-methods/parametric-nonparametric-tests" target="_blank">http://www.healthknowledge.org.uk/public-health-textbook/research-methods/1b-statistical-methods/parametric-nonparametric-tests </a><br />
<br />
<a href="http://changingminds.org/explanations/research/analysis/parametric_non-parametric.htm">http://changingminds.org/explanations/research/analysis/parametric_non-parametric.htm</a><br />
<br />
<br /><div class="blogger-post-footer">Author: Azeema</div>Azeema Faizunnisahttp://www.blogger.com/profile/02947902937912894228noreply@blogger.com0tag:blogger.com,1999:blog-7227623872230231508.post-23908550882919780312012-08-16T14:19:00.002-07:002012-08-16T14:19:59.668-07:00Data Cleaning (draft entry)My next post will be about Data Cleaning. I am not an expert on this but I know a few things. One simple way to do this is to compare data entered by 2 different people. The command in SPSS is called<br />
<br />
UPDATE FILE<br />
<br />
Here is an example from <a href="http://www.ats.ucla.edu/stat/spss/faq/update.htm" target="_blank">UCLA site</a>:<br />
<br />
<pre><b>update file = "D:\person1.sav"
/in = flag1
/file = "D:\person2.sav"
/by all.
exe.</b></pre>
<pre><b> </b></pre>
<pre><b>More valuable information in this pdf. </b></pre>
<pre><a href="http://www.ats.ucla.edu/stat/sas/library/nesug99/ss123.pdf"><b>http://www.ats.ucla.edu/stat/sas/library/nesug99/ss123.pdf</b></a></pre>
<pre><b> </b></pre>
<pre><b>
</b></pre>
<pre><b>Need to update this blog!! </b></pre>
<div class="blogger-post-footer">Author: Azeema</div>Azeema Faizunnisahttp://www.blogger.com/profile/02947902937912894228noreply@blogger.com0tag:blogger.com,1999:blog-7227623872230231508.post-10833998784367667712012-08-15T14:53:00.003-07:002012-08-15T14:53:51.769-07:00Effect Size<br />
"Statistical significance only tells the researcher
how likely it is that an observed finding could have occurred by chance. It
does not say anything about magnitude of the effect observed. <i>Effect
size</i> is a name given to a group of statistics that measure the magnitude of
a treatment effect. In many cases, effect size is a better measure of research
outcomes than the significance level. This is because with large samples, one
can observe statistically significant group differences even when only a tiny
effect is present. Unlike significance tests, effect size indices are
independent of sample size." source: <a href="http://www.umdnj.edu/idsweb/shared/effect_size.htm">http://www.umdnj.edu/idsweb/shared/effect_size.htm</a><br />
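<div>
As a concrete example of one such statistic, here is a minimal Python sketch of Cohen's d (difference in group means divided by the pooled standard deviation). Cohen's d is just one member of the effect size family, chosen here for illustration, and the two groups of scores are invented.</div>

```python
import math
from statistics import mean, stdev


def cohens_d(group1, group2):
    """Cohen's d using the pooled sample standard deviation."""
    n1, n2 = len(group1), len(group2)
    pooled_var = (
        (n1 - 1) * stdev(group1) ** 2 + (n2 - 1) * stdev(group2) ** 2
    ) / (n1 + n2 - 2)
    return (mean(group1) - mean(group2)) / math.sqrt(pooled_var)


# Hypothetical scores for a treatment and a control group.
treatment = [2, 4, 6]
control = [1, 3, 5]

print(cohens_d(treatment, control))
```

<div>
The result is expressed in standard deviation units, which is why it can be compared across studies with very different sample sizes.</div>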
<br />
<a href="http://www.cognitiveflexibility.org/effectsize/" target="_blank">Effect size calculator</a><br />
<br />
<a href="http://www.campbellcollaboration.org/resources/effect_size_input.php" target="_blank">another calculator</a><br />
<br />
<a href="http://www.polyu.edu.hk/mm/effectsizefaqs/calculator/calculator.html" target="_blank">another calculator</a><br />
<br /><div class="blogger-post-footer">Author: Azeema</div>Azeema Faizunnisahttp://www.blogger.com/profile/02947902937912894228noreply@blogger.com0tag:blogger.com,1999:blog-7227623872230231508.post-78360430039779958422012-08-14T12:44:00.003-07:002021-06-16T17:12:35.799-07:00Data/Software/information sources (free) <div class="post-header">
</div>
<div class="post-body entry-content" id="post-body-758926861485580633" itemprop="articleBody">
This is a loose compilation of sources of meta data/journals/software
etc. related to population and health, concerning international issues
in general but in particular the USA and Pakistan. I think this can be
potentially very useful for graduate students of these two countries.<br />
<br />
<a href="http://www.asianbarometer.org/newenglish/surveys/DataRelease.htm" target="_blank">Asian Barometer</a> <span style="font-size: small;">"The Asian Barometer (ABS) is an applied research program on
public opinion on political values, democracy, and
governance around the region. The regional network
encompasses research teams from 13 East Asian political
systems (Japan, Mongolia, South Korea, Taiwan, Hong Kong,
China, the Philippines, Thailand, Vietnam, Cambodia,
Singapore, Indonesia, and Malaysia), and 5 South Asian countries
(India, Pakistan, Bangladesh, Sri Lanka, and Nepal)."</span><br />
<br />
<br />
Databases/software (free) for social sciences and public health:<br />
<a href="http://en.citizendium.org/wiki/Free_statistical_software#_note-idams">http://en.citizendium.org/wiki/Free_statistical_software#_note-idams</a><br />
<a href="http://www.bls.census.gov/ferretftp.htm"><br />Current Population Survey</a> (CPS) Datasets for download (free; SAS format only)<br />
<br />
<a href="http://www.hhs-stat.net/scripts/datafinder.cfm"><br />Department of Health and Human Services</a> (HHS) Data Finder<br />
<a href="http://www.hhs-stat.net/scripts/datafinder.cfm"><br /></a></div>
<div class="post-body entry-content" id="post-body-758926861485580633" itemprop="articleBody">
<a href="http://data.worldbank.org/"><br /></a><span id="hpcContent">The <a href="http://www.norc.org/GSS+Website/">General Social Survey</a>
(GSS) contains a standard 'core' of demographic, behavioral, and
attitudinal questions, plus topics of special interest. Many of the core
questions have remained unchanged since 1972 to facilitate time-trend
studies as well as replication of earlier findings. The GSS takes the
pulse of America, and is a unique and valuable resource. It has tracked
the opinions of Americans over the last four decades.</span><br />
<br />
Download data (SPSS format) from <a href="http://www.norc.org/GSS+Website/Download/SPSS+Format/">here</a>.<br />
<a href="http://www.worldvaluessurvey.org/"><br /></a>
Univ of Michigan Database of data files<br />
<a href="http://www.icpsr.umich.edu/icpsrweb/ICPSR/themes/index.jsp">http://www.icpsr.umich.edu/icpsrweb/ICPSR/themes/index.jsp</a><br />
<br />
<br />
Princeton university <a href="http://dss.priceton.edu/cgi-bin/dataresources/newdataresources.cgi?term=62"><span class="blsp-spelling-error" id="SPELLING_ERROR_0">Dataset</span> sources for Pakistan</a><br />
</div><div class="post-body entry-content" id="post-body-758926861485580633" itemprop="articleBody"></div><div class="post-body entry-content" id="post-body-758926861485580633" itemprop="articleBody"><br /><span>The <a href=" https://dash.nichd.nih.gov/" target="_blank">NICHD Data </a>and Specimen Hub (DASH) is a centralized resource
that allows researchers to share and access de-identified data from
studies funded by NICHD. DASH also serves as a portal for requesting
biospecimens from selected DASH studies.</span></div> <br /><div class="post-body entry-content" id="post-body-758926861485580633" itemprop="articleBody"> </div><div class="post-body entry-content" id="post-body-758926861485580633" itemprop="articleBody">The <a href="http://dolphn.aimglobalhealth.org/Default.asp?page=SearchFrame.asp"><b>Data Online for Population, Health and Nutrition</b></a> (<span class="blsp-spelling-error" id="SPELLING_ERROR_1">DOLPHN</span>)
system is an online statistical data resource containing selected
current and historical country-level demographic and health indicator
data. The <span class="blsp-spelling-error" id="SPELLING_ERROR_2">DOLPHN</span>
system is designed to provide users with quick and easy access to
frequently used statistics and can be helpful as both a reference and
analytical tool.<br />
<br />
Stanford University Data sets (free) <a href="http://data.stanford.edu/">http://data.stanford.edu/</a><br />
<br />
Interesting link for PhD students<br />
<a href="http://www2.hud.ac.uk/research/gradcentre/links.php">http://www2.hud.ac.uk/research/gradcentre/links.php</a><br />
<a href="http://www.wssinfo.org/home/introduction.html"><br />UNICEF/WHO </a>sanitation and water<br />
<br />
<br />
<a href="http://www.arl.org/sparc/">Open Source Publishing</a><br />
<br />
<br />
<span class="blsp-spelling-error" id="SPELLING_ERROR_3">Jstor</span> Data<br />
<a href="http://dfr.jstor.org/">http://dfr.jstor.org/</a><br />
<a href="http://www3.interscience.wiley.com/crossref.html"><br />Google and Wiley <span class="blsp-spelling-error" id="SPELLING_ERROR_4">Interscience</span></a><br />
<br />
<br />
<a href="http://www.google.com/publicdata/home">Google public data visualization</a><br />
The Google Public Data Explorer makes large datasets easy to explore, visualize and communicate.<br />
<a href="http://dvn.iq.harvard.edu/dvn/dv/dataconnect"><br />Harvard data </a>related to public health<br />
The
purpose of this website is to provide public health professionals,
researchers, policy makers and students with a comprehensive catalog of
Maternal and Child Health (<span class="blsp-spelling-error" id="SPELLING_ERROR_5">MCH</span>) data sets, interactive tools and other resources.<br />
<br />
<a href="http://wonder.cdc.gov/">CDC Wonder</a><br />
Wide-ranging Online Data for Epidemiologic Research<br />
<br />
<br />
CDC newborn feeding practices <span class="blsp-spelling-error" id="SPELLING_ERROR_7">datasets</span><br />
<a href="http://www.cdc.gov/ifps/data/index.htm">http://www.cdc.gov/ifps/data/index.htm</a><br />
<br />
CDC <span class="blsp-spelling-error" id="SPELLING_ERROR_8">datasets</span> on breastfeeding practices:<br />
<a href="http://www.cdc.gov/breastfeeding/data/index.htm">http://www.cdc.gov/breastfeeding/data/index.htm</a><br />
<br />
The <span class="blsp-spelling-error" id="SPELLING_ERROR_9">Cochrane</span> Library (great for public health publications)<br />
<a href="http://www.thecochranelibrary.com/view/0/index.html?gclid=CPWu8trNy6ACFdNA6wodqn_u0A">http://www.thecochranelibrary.com/view/0/index.html?gclid=CPWu8trNy6ACFdNA6wodqn_u0A</a><br />
<br />
<span class="blsp-spelling-error" id="SPELLING_ERROR_10">JHUCCP</span> research tool database<br />
<a href="http://new.jhuccp.org/research/researchDB/">http://new.jhuccp.org/research/researchDB/</a><br />
<br />
<br />
<a href="http://pewresearch.org/databank/datasets/">Pew Research Center Databases</a><br />
You
can download the data collected by Pew Research Center from here for
their various national and international surveys (the religion project
includes Pakistan).<br />
<br />
<a href="http://www.prb.org/datafinder.aspx"><br /><span class="blsp-spelling-error" id="SPELLING_ERROR_11">PRB</span> Data Finder</a><br />
<a href="https://www.researchgate.net/application.Index.html"><br />Research Gate</a><br />
Professional network for scientists.<br />
<br />
<br />
<a href="http://www.rand.org/about/tools.html" target="_blank">RAND <span class="blsp-spelling-error" id="SPELLING_ERROR_6">data</span></a><br />
<br />
<br />
A UH student analysis on different meta-data sources:<br />
<a href="http://www2.hawaii.edu/%7Ejacso/extra/">http://www2.hawaii.edu/~jacso/extra/</a><br />
<br />
<a href="http://www.un.org/esa/population/unpop.htm">UN population data</a><br />
<br />
<a href="http://www.census.gov/ipc/www/idb/informationGateway.php"><br />US Census, international population statistics</a><br />
<br />
<a href="http://data.worldbank.org/data-catalog">World Bank Datasets</a><br />
<br />
<a href="http://data.worldbank.org/"><br />World Bank Data</a><br />
<a href="http://data.worldbank.org/"><br /></a>
<a href="http://www.worldvaluessurvey.org/">World Values Survey</a><br />
<br />
<a href="http://www.ipums.org/">http://www.ipums.org/ </a><i>Integrated Public Use Microdata Series</i> from the University of Minnesota<br />
<br />
<a href="http://www.nber.org/data/">National Bureau of Economic Research</a> data from different sources related to American demographics and economics</div>
<div class="post-footer-line post-footer-line-1">
<span class="post-author vcard">
Posted by
<span class="fn">
<a href="http://www.blogger.com/profile/02947902937912894228" itemprop="author" rel="author" title="author profile">
azeema
</a>
</span>
</span>
<span class="post-timestamp">
at
<a class="timestamp-link" href="http://spss-statistics.blogspot.com/2010/05/datasoftwareiformation-sources-free.html" itemprop="url" rel="bookmark" title="permanent link"><abbr class="published" itemprop="datePublished" title="2010-05-14T18:43:00-07:00">18:43</abbr></a>
</span>
</div>
<div class="post-footer-line post-footer-line-2">
<span class="post-labels">
Labels:
<a href="http://spss-statistics.blogspot.com/search/label/analysis" rel="tag">analysis</a>,
<a href="http://spss-statistics.blogspot.com/search/label/data" rel="tag">data</a>,
<a href="http://spss-statistics.blogspot.com/search/label/journals" rel="tag">journals</a>,
<a href="http://spss-statistics.blogspot.com/search/label/software" rel="tag">software</a>,
<a href="http://spss-statistics.blogspot.com/search/label/sources" rel="tag">sources</a> </span><br />
<br />
<span class="post-labels">The <a href="http://www.socialexplorer.com/pub/ReportData/Home.aspx" target="_blank">Social Explorer </a>(free edition) has the RCMS (religious congregation) dataset. </span><br />
<br />
<span class="post-labels"><a href="https://international.ipums.org/international/index.shtml" target="_blank">IPUMS</a> Data </span>
</div>
<br /><div class="blogger-post-footer">Author: Azeema</div>Azeema Faizunnisahttp://www.blogger.com/profile/02947902937912894228noreply@blogger.com1tag:blogger.com,1999:blog-7227623872230231508.post-61615074781808360442012-07-25T20:31:00.000-07:002012-07-26T12:27:10.821-07:00To ADD (or SUM) in SPSS<div style="font-family: Verdana,sans-serif;">
<span style="font-size: small;">In SPSS you can add a series of variables in two different ways. The first is the plus sign, e.g., adding sons and daughters to get the total number of children. The second is the SUM function, which is handy when you want to create an index from a series of scores and some respondents have a MISSING value on one or more of the variables in the series. <br /> </span></div>
<div style="font-family: Verdana,sans-serif;">
<span style="font-size: small;">compute t_child=sons+daughters.</span></div>
<div style="font-family: Verdana,sans-serif;">
<span style="font-size: small;">OR</span></div>
<div style="font-family: Verdana,sans-serif;">
<span style="font-size: small;">compute t_child=sum (sons, daughters).<br /><br />"The difference between the two procedures above is that in the first procedure, the case on total would be missing if any one of the four variables had missing values on a case; in the second procedure, the total would be computed while ignoring missing values on the four variables." <b>No cases will be dropped</b> due to a missing value in any of the variables. "Essentially SPSS treats the missing value as ZERO." </span></div>
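<div>
The difference between the two forms can be mimicked in a few lines of Python. This is only a sketch of the behaviour described above, with None standing in for a missing value; the function names are made up and are not part of SPSS. Note that SUM still returns a missing result when every argument is missing.</div>

```python
def spss_plus(*values):
    """Mimic a + b + ...: the result is missing if any value is missing."""
    if any(v is None for v in values):
        return None
    return sum(values)


def spss_sum(*values):
    """Mimic SUM(a, b, ...): add the valid values, skipping missing ones."""
    valid = [v for v in values if v is not None]
    return sum(valid) if valid else None  # all missing -> missing


# Hypothetical respondent with 2 sons and a missing value for daughters.
sons, daughters = 2, None
print(spss_plus(sons, daughters))  # None
print(spss_sum(sons, daughters))   # 2
```

<div>
So the plus sign is the stricter choice: it only gives a total for respondents who answered every item.</div>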
<div style="font-family: Verdana,sans-serif;">
<br /></div>
<div style="font-family: Verdana,sans-serif;">
<span style="font-size: small;">In the SUM function the variables must be separated by commas, but if there are many adjacent variables you can use the keyword TO to give a range (TO follows the order of the variables in the file). For example, if you want to construct a happiness index based on 12 indicators/variables hap1 to hap12, you can use the following syntax: </span><span style="font-size: small;"><br /></span></div>
<div style="font-family: Verdana,sans-serif;">
<span style="font-size: small;"><br /></span> </div>
<div style="font-family: Verdana,sans-serif;">
<span style="font-size: small;">compute happiness=sum (hap1 to hap12).</span> </div>
<div style="font-family: Verdana,sans-serif;">
<br /></div>
<div style="font-family: Verdana,sans-serif;">
<span style="font-size: small;">Source: <a href="http://kb.iu.edu/data/afsh.html" target="_blank">Indiana University IT Services</a> and others.<br /><br /> </span></div>
<div style="font-family: Verdana,sans-serif;">
<span style="font-size: small;">Another point to note is that "the SUM() function is
evidently flexible enough to respect more complex statements like
SUM(Var1+Var2, Var3-Var4, Var5*Var6)." Hence, <b>do not use </b>the addition symbol between variables inside SUM unless the expression is meant to be a single argument, because a missing value anywhere in that expression makes the whole argument missing. Source: <a href="http://spssx-discussion.1045642.n5.nabble.com/Computing-Variables-Missing-Data-td1091945.html" target="_blank">SPSSX Discussion group</a></span></div>
<div style="font-family: Verdana,sans-serif;">
<span style="font-size: small;"></span></div>
<br />
<span style="font-size: small;">While on the flexibility of SUM, there is another neat feature to take note of. If you want to control how many MISSING values are tolerated, you can append a number to the function name to TELL SPSS to compute the total for a CASE/RESPONDENT only if at least that many of the variables have valid values. So, </span><br />
<br />
<div style="font-family: Verdana,sans-serif;">
<span style="font-size: small;"><span class="example">COMPUTE V3 = SUM.2(V1, V2).
EXECUTE .</span></span>
</div>
<div style="font-family: Verdana,sans-serif;">
<br /></div>
<div style="font-family: Verdana,sans-serif;">
<span style="font-size: small;">"The <code>.2</code> appended to the end of the <code>SUM</code>
function in the above example can be any integer. Use it to indicate
the minimum number of valid cases necessary to perform a given
calculation." </span><span style="font-size: small;">Source: <a href="http://kb.iu.edu/data/afsh.html" target="_blank">Indiana University IT Services</a> </span></div>
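<div>
The minimum-valid rule can be sketched in Python as follows. The function name and the min_valid parameter are made up for illustration (None stands in for a missing value); the rule itself is the one quoted above.</div>

```python
def spss_sum_n(values, min_valid=1):
    """Mimic SUM.n(...): total the valid values only if at least
    min_valid of them are present; otherwise return missing (None)."""
    valid = [v for v in values if v is not None]
    if len(valid) < min_valid:
        return None
    return sum(valid)


# Hypothetical cases for COMPUTE V3 = SUM.2(V1, V2).
print(spss_sum_n([4, 5], min_valid=2))     # both valid, total computed
print(spss_sum_n([4, None], min_valid=2))  # only one valid, result missing
print(spss_sum_n([4, None], min_valid=1))  # plain SUM behaviour
```

<div>
With two arguments, SUM.2 therefore behaves like the plus sign; the suffix becomes more useful on longer scales, where you might accept, say, 10 valid answers out of 12.</div>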
<div style="font-family: Verdana,sans-serif;">
<span style="font-size: small;"><br /></span></div>
<div style="font-family: Verdana,sans-serif;">
<span style="font-size: small;"></span></div>
<div style="font-family: Verdana,sans-serif;">
<span style="font-size: small;"></span></div>
<span style="font-size: small;">Also keep in mind listwise and pairwise deletion, two ways SPSS procedures can handle cases with missing values. According to a <a href="http://www.statisticsmentor.com/vbforum/showthread.php?t=14" target="_blank">discussion group </a>they are defined as:</span><br />
<span style="font-size: small;"><br />
<b>Listwise -</b> if the respondent has a missing value on any variable,
the respondent is omitted from all of the analysis. <br />
<br />
<b>Pairwise </b>- not as harsh as listwise in that the respondent is dropped
only on analyses involving variables that have missing values.</span><br />
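<div>
A small Python sketch can make the difference concrete by counting how many respondents survive under each rule. The four respondents and the variable names q1 to q3 are invented for illustration, and None stands in for a missing value.</div>

```python
from itertools import combinations


def listwise_n(rows, variables):
    """Cases with valid values on every variable in the analysis."""
    return sum(all(r[v] is not None for v in variables) for r in rows)


def pairwise_n(rows, variables):
    """For each pair of variables, cases with valid values on both."""
    return {
        (a, b): sum(r[a] is not None and r[b] is not None for r in rows)
        for a, b in combinations(variables, 2)
    }


# Hypothetical responses from four respondents.
rows = [
    {"q1": 1, "q2": 2, "q3": 3},
    {"q1": 1, "q2": None, "q3": 3},
    {"q1": None, "q2": 2, "q3": 3},
    {"q1": 4, "q2": 5, "q3": None},
]
variables = ["q1", "q2", "q3"]

print(listwise_n(rows, variables))
print(pairwise_n(rows, variables))
```

<div>
Listwise deletion keeps a single N for the whole analysis, while pairwise deletion keeps more data but gives each pair of variables its own N.</div>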
<br />
<span style="font-size: small;">Also check the <a href="http://www-01.ibm.com/support/docview.wss?uid=swg21475199" target="_blank">IBM site </a>and <a href="http://www.psychwiki.com/wiki/Dealing_with_Missing_Data" target="_blank">Psychwiki</a> for more on list and pairwise deletion. </span><br />
<br />
<div style="font-family: Verdana,sans-serif;">
</div><div class="blogger-post-footer">Author: Azeema</div>Azeema Faizunnisahttp://www.blogger.com/profile/02947902937912894228noreply@blogger.com5tag:blogger.com,1999:blog-7227623872230231508.post-46822123377539805612012-07-24T19:44:00.003-07:002012-07-24T19:44:32.134-07:00Factor Analysis in short (not my writing)What is Factor Analysis?*<br />
"Factor analysis is a form of exploratory multivariate analysis that is used to either reduce the number of variables in a model or to detect relationships among variables. All variables involved in the factor analysis need to be interval and are assumed to be normally distributed."<br />
<br />
SPSS syntax:<br />
<br />
factor<br />
/variables read write math science socst <br />
/criteria factors(2) <br />
/extraction pc<br />
/rotation varimax<br />
/plot eigen.<br />
<br />
Here is the syntax in SPSS from ANU course notes:<br />
<br />
FACTOR<br />
/VARIABLES q34_1 to q34_12<br />
/MISSING LISTWISE /ANALYSIS q34_1 to q34_12<br />
/PRINT INITIAL KMO REPR EXTRACTION ROTATION<br />
/CRITERIA MINEIGEN(1) ITERATE(25)<br />
/FORMAT SORT<br />
/EXTRACTION PAF<br />
/CRITERIA ITERATE(25)<br />
/ROTATION VARIMAX<br />
/METHOD=CORRELATION .<br />
<br />
Create SCALE using FA<br />
<br />
FACTOR<br />
/VARIABLES q34_1 to q34_12<br />
/MISSING LISTWISE /ANALYSIS q34_1 to q34_12<br />
/PRINT INITIAL KMO REPR EXTRACTION ROTATION<br />
/CRITERIA MINEIGEN(1) ITERATE(25)<br />
/FORMAT SORT<br />
/EXTRACTION PAF<br />
/PLOT EIGEN<br />
/CRITERIA ITERATE(25)<br />
/ROTATION VARIMAX<br />
/SAVE REG (2)<br />
/METHOD=CORRELATION .<br />
<br />
<br />
<br />
*Introduction to SAS. UCLA: Academic Technology Services, Statistical Consulting Group. from <a href="http://www.ats.ucla.edu/stat/sas/notes2/">http://www.ats.ucla.edu/stat/sas/notes2/</a> (accessed November 24, 2007).<br />
<br /><div class="blogger-post-footer">Author: Azeema</div>Azeema Faizunnisahttp://www.blogger.com/profile/02947902937912894228noreply@blogger.com2tag:blogger.com,1999:blog-7227623872230231508.post-51715042748061678762012-07-24T19:41:00.000-07:002012-07-24T19:41:26.263-07:00<div>
<b><span style="font-size: medium;">Measuring unmet need for family planning.</span></b><br />
<br />
<b>Preamble</b><br />
“Millions of women would prefer to avoid becoming pregnant either right away or ever, but are not using contraception. These women have an unmet need for family planning. Programs can serve many of these women by developing strategies that respond directly to their concern.” Ref: Population Reports, Sept 1996.<br />
<br />
Unmet need is defined on the basis of women’s responses to survey questions, and the following are some of the definitions that have been used since the 1970s.<br />
<br />
<b>The KAP-Gap</b><br />
Definition one: Women who wanted to have no more children but were not using contraception. (Ignored spacers, exposure to risk of pregnancy)<br />
<br />
<b>The World Fertility Survey (WFS 1972-1984)</b><br />
Definition two: Same as above but excluded pregnant and amenorrheic women, because they did not currently need contraception. (Ignored spacers)<br />
<br />
In 1981, John Anderson and Leo Morris measured the percentage of women of reproductive age who are “exposed to the risk of unintended pregnancy and are not using contraceptive”. (Included spacers). The following year, Nortman and Gary developed a model that included pregnant, breastfeeding, or amenorrheic women in the definition of unmet need.<br />
<br />
After ICPD 1994, Sinding and Fathalla suggested measuring unmet need more broadly, including unmet need among people who are using contraception but may be dissatisfied with their method. They suggest that, by using both qualitative and quantitative data, experience with side effects, discontinuation and other problems of contraception could help extend the focus of unmet need from use of any method to the quality of care.<br />
<br />
Arguments continue over who is at risk, and over whether inappropriate method use and method failure should be included. DHS started asking questions on intentions about the current pregnancy, thereby including pregnant women. A recently included category is unmarried women. In short, the aim is to include all women who are “at risk” of an unintended or mistimed pregnancy.<br />
<br />
Considering the importance of measuring unmet need, all DHS and FP/RH survey questionnaires now ask about the extended definition of unmet need. </div>
<br />
<div>
<br />
Casterline (1997) pointed out that there can be inaccuracies in the reporting of contraceptive use and in the reporting of fertility preferences, and both pieces of information are required for estimating unmet need. Furthermore, his work shows that unmet need is subject to different definitions, and its measurement is not straightforward. Therefore, any survey undertaken for the measurement of unmet need must consider issues of definition in advance.</div>
<br />
<div>
</div>
<br />
<div>
<b>Following chart shows the standard formulation of unmet need.</b></div>
<br />
<div>
</div><div class="blogger-post-footer">Author: Azeema</div>Azeema Faizunnisahttp://www.blogger.com/profile/02947902937912894228noreply@blogger.com0tag:blogger.com,1999:blog-7227623872230231508.post-13816826587551837582012-07-24T19:36:00.002-07:002012-07-24T19:36:55.376-07:00Naming multiple variables at the same time, with syntaxOf course, it has to be with SYNTAX... I like to do everything with Syntax because of so many reasons but mostly to keep a log of what I am doing and secondly to reduce the key strokes I have to make for repeated jobs!!!<br />
<br />
<br />
So, in case you have some variables for which you have to assign VALUE AND VARIABLE LABELS, wouldn't it be handy if you were able to do them with one command? I know it is a small thing and most people who use SPSS would laugh at me for even writing a blog entry on this, but believe me, it is easy to forget little things, especially if you go out of touch for a year or two. So, here is the command:<br />
<br />
VAR LAB<br />
q29a '(RHC)Number of hours facility open for consultation'<br />
/q29b 'Number of hours facility open for consultation BHU'<br />
/q29c 'Number of hours facility open for consultation MCH center'<br />
/q29d 'Number of hours facility open for consultation Dispensary'<br />
/q29e 'Number of hours facility open for consultation govt hospital'<br />
/q29f 'Number of hours facility open for consultation Pvt hospital'<br />
/q29g 'Number of hours facility open for consultation Dispensary/Compounder'<br />
/q29h 'Number of hours facility open for consultation Nurse/LHV'<br />
/q29i 'Number of hours facility open for consultation Hakeem/ Homeopath'<br />
/q29j 'Number of hours facility open for consultation FWC'<br />
/q29k 'Number of hours facility open for consultation RHS-A'<br />
/q29l '(Others)Number of hours facility open for consultation'.<br />
<br />
<br />
Please note in the above syntax that after VAR LAB the first variable name is written as is, but each of the rest is preceded by a forward slash "/".<div class="blogger-post-footer">Author: Azeema</div>Azeema Faizunnisahttp://www.blogger.com/profile/02947902937912894228noreply@blogger.com0tag:blogger.com,1999:blog-7227623872230231508.post-78526383664502471132012-07-24T19:31:00.000-07:002012-07-26T19:14:21.363-07:00Data transfer (migration) from Access to SPSS<b>Problem:</b><br />
<br />
Your data has been entered in MS Access, where all the variables (fields) have defined names, widths, types etc., and there are lookup tables linked to each of the fields to describe the Response Values. But then you need to run some stats, so you basically EXPORT the file to some data analysis software like SPSS. One way to do it is to export to MS Excel format and open/import the Excel sheet into SPSS, which is pretty straightforward and simple. But when you examine the file you will notice that in this transition all the nifty labels of fields and values are gone, and you will have to either make guesses or look at your data collection instruments to make sense of the numbers.<br />
<br />
So, the choice you have is either to keep doing that or manually assign all the labels in SPSS. It is fine if you have only a handful of variables, but if you have a long list it is a lot of work!!!<br />
<br />
What do you do? I have been trying to get around this problem for months now with no success. There is a Script on my favorite SPSS site:<br />
<a href="http://www.spsstools.net/Scripts/ImportExport/ExportLabelsFromAccessToSPSS.txt">http://www.spsstools.net/Scripts/ImportExport/ExportLabelsFromAccessToSPSS.txt</a><br />
<br />
which should do the needful, but I am certainly not doing it right. Need help. All the Google searches and various discussion groups have proven to be of no use either. Apparently I can create a link through ODBC but I am too lame to figure that out...<br />
<br />
Any help?<br />
<br />
Update 1: No luck with VB or Python or ODBC etc. because I am too dumb to learn them on my own! However, I learnt that if you have to do this a lot, there is a handy program that can do it for you. It is called Stat/Transfer. One caveat is that it is not FREE. The student version costs $59. In future I would buy it if I am stuck with multiple transfers between various databases. In the past there was also DBMS/Copy for such things but it does not exist anymore. I have tried to search for an open-source alternative to Stat/Transfer but no luck yet!!<div class="blogger-post-footer">Author: Azeema</div>Azeema Faizunnisahttp://www.blogger.com/profile/02947902937912894228noreply@blogger.com0tag:blogger.com,1999:blog-7227623872230231508.post-36948841869482470422012-01-26T00:29:00.000-08:002012-01-26T01:08:47.144-08:00Entering quantitative data into the computerWell, there are many, many ways you can enter quantitative data (generated from surveys) into a computer. Most of the people I know use either a simple Excel table or a very, very complex MS Access data entry file. However, neither of these programs was created for data entry, and hence they are not so handy for that purpose.<br /><br />There are several software packages on the market which you can buy to do the job, or you can outsource the data entry, or create a program for the very purpose yourself. All of these options require extra resources. SPSS, the most popular data analysis program, also has a data entry module which is great, but it is not free.<br /><br />The best option would be to have a data entry program which is made just for that purpose, is easy to learn and use, and most importantly is free. 
Thankfully there are 2 such programs:<br /><br />1) <a href="http://www.census.gov/population/international/software/cspro/index.html">CSPro</a><br />"The Census and Survey Processing System (CSPro) is a public domain software package used by hundreds of organizations and tens of thousands of individuals for entering, editing, tabulating, and disseminating census and survey data."<br /><br />2) <a href="http://www.epidata.dk/index.htm">EpiData<br /></a>"<span class="st"><em>EpiData</em> Entry is used for simple or programmed data entry and data documentation. Entry handles simple forms or related systems Optimised documentation."</span><br /><br />In my view Excel and Access have their strengths, but for any survey with a questionnaire more than a page long and more than 40-50 respondents, it is better to use one or the other of the above.<br /><br />Range checks, value lengths, logical jumps, automatically calculated fields, data validation on double entry, finding duplicates, assigning values for missing or not applicable responses etc. are some of the things that need to be done to maintain the integrity of data. Both of the above programs can do that.<br /><br />The files that are created in these programs can easily be exported to major analytical packages like SPSS and STATA.<div class="blogger-post-footer">Author: Azeema</div>Azeema Faizunnisahttp://www.blogger.com/profile/02947902937912894228noreply@blogger.com1tag:blogger.com,1999:blog-7227623872230231508.post-11172678857168475202012-01-12T20:28:00.000-08:002012-01-26T01:12:31.014-08:00Converting a table to Data FileI have been looking for this for so long.....<br /><br />Thank you Mr. John Walkenbach<br /><br /><br />Here is a nifty Macro in Excel to do that!!!<br /><pre>Sub ReversePivotTable()<br />' Before running this, make sure you have a summary table with column headers.<br />' The output table will have three columns. 
<br />Dim SummaryTable As Range, OutputRange As Range <br />Dim OutRow As Long <br />Dim r As Long, c As Long <br />On Error Resume Next <br />Set SummaryTable = ActiveCell.CurrentRegion <br />If SummaryTable.Count = 1 Or SummaryTable.Rows.Count < 3 Then <br />MsgBox "Select a cell within the summary table.", vbCritical <br />Exit Sub <br />End If <br />SummaryTable.Select <br />Set OutputRange = Application.InputBox(prompt:="Select a cell for the 3-column output", Type:=8) <br />' Convert the range <br />OutRow = 2 <br />Application.ScreenUpdating = False <br />OutputRange.Range("A1:C3") = Array("Column1", "Column2", "Column3") <br />' Write one output row (row label, column header, value) per data cell <br />For r = 2 To SummaryTable.Rows.Count <br />For c = 2 To SummaryTable.Columns.Count <br />OutputRange.Cells(OutRow, 1) = SummaryTable.Cells(r, 1) <br />OutputRange.Cells(OutRow, 2) = SummaryTable.Cells(1, c) <br />OutputRange.Cells(OutRow, 3) = SummaryTable.Cells(r, c) <br />OutputRange.Cells(OutRow, 3).NumberFormat = SummaryTable.Cells(r, c).NumberFormat <br />OutRow = OutRow + 1 <br />Next c <br /> Next r<br />End Sub</pre><div class="blogger-post-footer">Author: Azeema</div>Azeema Faizunnisahttp://www.blogger.com/profile/02947902937912894228noreply@blogger.com0tag:blogger.com,1999:blog-7227623872230231508.post-28507021657743100692012-01-12T13:43:00.000-08:002020-03-10T18:15:53.505-07:00VlookUp in ExcelLearnt a lot about VlookUp today. It is really one of the most powerful features of MS Excel.<br />
<br />
A few things to remember.<br />
The formula will return a "0" (without quotes) if the index variable was matched but there was no corresponding value to return. If the index variable does not match, Excel will return #N/A.<br />
<br />
-Remember the column number you want to retrieve. This column number is counted from the column where you "looked" for the index variable, with that column counting as 1.<br />
<br />
-You can lookup in another file.<br />
-Linked files get updated even if they are not open.<br />
-To keep track of what changes have been made it is a good idea to use the "TRACK CHANGES" feature.<br />
-Excel cares about trailing spaces but does not care about differences in CASE or in the formatting of the cell. (A number stored as text will not match a true number, though.)<br />
-Cells cannot be protected, hence you have to be very careful when you use vlookup.<br />
<br />
Example of VLookup with data in 2 different sheets.<br />
<br />
=VLOOKUP(A:A,Sheet3!A:B,2,FALSE)<br />
<br />
Another one,<br />
=VLOOKUP(O2,B2:K41,5)<br />
<br />
In this example, O2 (the first argument) is the address of the value that you want to MATCH (the index), for example a registration number or SSN, against the source data. You can also type a value here, but giving a cell address makes it easier to copy the formula down a range. The second argument, B2:K41, is the range of source data, whose first column must contain your INDEX variable. The third argument, 5, is the column number to return, counted from the index column, which is column 1. For example, if your INDEX is in column B and your VALUE (i.e. name) is in column E, you should write 4 here. Because the fourth argument is omitted, Excel does an approximate match and expects the first column of the range to be sorted in ascending order; add FALSE as the fourth argument, as in the first example, for an exact match. Remember that you need to select the complete range of columns for the lookup. For example, if you have school names in column A, the frequency of a particular symptom in column B, and you want to get the total number of health room visits from column E (while D has the corresponding school name), you need to select both columns D and E for the second argument (not just one column), then put 2 for the column number.<br />
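The matching rules can be sketched in Python. This is only a rough imitation of an exact-match lookup (the fourth argument set to FALSE), not Excel's actual implementation; the function name <code>vlookup_exact</code> and the school data are made up for illustration, and the rule that a number stored as text does not match a true number is Excel behaviour as I understand it.<br />

```python
def vlookup_exact(lookup_value, table, col_index):
    """Rough imitation of =VLOOKUP(value, range, col_index, FALSE).

    table is a list of rows; the first item of each row is the index column.
    col_index is 1-based, counting the index column as 1.
    """
    for row in table:
        key = row[0]
        if isinstance(key, str) and isinstance(lookup_value, str):
            # Text is compared without regard to case; spaces still count.
            matched = key.lower() == lookup_value.lower()
        elif isinstance(key, str) or isinstance(lookup_value, str):
            # Text never matches a number, even if it looks like one.
            matched = False
        else:
            matched = key == lookup_value
        if matched:
            return row[col_index - 1]
    return "#N/A"


# Hypothetical source range: school name (index column) and health room visits.
visits = [
    ["Lincoln", 120],
    ["Roosevelt ", 85],   # note the trailing space in the source data
    ["Adams", 40],
]

print(vlookup_exact("lincoln", visits, 2))    # case difference is ignored
print(vlookup_exact("Roosevelt", visits, 2))  # trailing space prevents a match
print(vlookup_exact(101, [["101", "text key"]], 2))  # number vs text key
```

The column index of 2 here corresponds to selecting two columns (index plus value) as the lookup range, as in the school example.<br />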
<br /><div class="blogger-post-footer">Author: Azeema</div>Azeema Faizunnisahttp://www.blogger.com/profile/02947902937912894228noreply@blogger.com5tag:blogger.com,1999:blog-7227623872230231508.post-36437892937283979962012-01-12T13:41:00.000-08:002012-01-12T20:32:23.130-08:00Comparing/Updating multiple files against a Master fileUPDATE FILE command can be used for that. More text to add later.<br /><br />Meanwhile check this page.<br /><a href="http://www.ats.ucla.edu/stat/spss/faq/update.htm">http://www.ats.ucla.edu/stat/spss/faq/update.htm</a><div class="blogger-post-footer">Author: Azeema</div>Azeema Faizunnisahttp://www.blogger.com/profile/02947902937912894228noreply@blogger.com0tag:blogger.com,1999:blog-7227623872230231508.post-39564139958012314522012-01-10T13:13:00.000-08:002012-01-10T13:14:41.782-08:00Excel odditiesCheck this page for some oddities and quirks related to Excel.<br /><br /><a href="http://spreadsheetpage.com/index.php/oddities">http://spreadsheetpage.com/index.php/oddities</a><div class="blogger-post-footer">Author: Azeema</div>Azeema Faizunnisahttp://www.blogger.com/profile/02947902937912894228noreply@blogger.com0