<?xml version='1.0' encoding='UTF-8'?><rss xmlns:atom='http://www.w3.org/2005/Atom' xmlns:openSearch='http://a9.com/-/spec/opensearchrss/1.0/' xmlns:georss='http://www.georss.org/georss' version='2.0'><channel><atom:id>tag:blogger.com,1999:blog-265057525197038547</atom:id><lastBuildDate>Sun, 15 Nov 2009 14:03:09 +0000</lastBuildDate><title>Randomness</title><description></description><link>http://www.jeremymiles.co.uk/randomness/</link><managingEditor>noreply@blogger.com (J)</managingEditor><generator>Blogger</generator><openSearch:totalResults>7</openSearch:totalResults><openSearch:startIndex>1</openSearch:startIndex><openSearch:itemsPerPage>25</openSearch:itemsPerPage><item><guid isPermaLink='false'>tag:blogger.com,1999:blog-265057525197038547.post-1067009773272032896</guid><pubDate>Wed, 11 Nov 2009 17:43:00 +0000</pubDate><atom:updated>2009-11-11T09:43:58.369-08:00</atom:updated><title>test ignore</title><description>test&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/265057525197038547-1067009773272032896?l=www.jeremymiles.co.uk%2Frandomness'/&gt;&lt;/div&gt;</description><link>http://www.jeremymiles.co.uk/randomness/2009/11/test-ignore.html</link><author>noreply@blogger.com (J)</author><thr:total xmlns:thr='http://purl.org/syndication/thread/1.0'>0</thr:total></item><item><guid isPermaLink='false'>tag:blogger.com,1999:blog-265057525197038547.post-1588723201620761353</guid><pubDate>Wed, 11 Nov 2009 17:20:00 +0000</pubDate><atom:updated>2009-11-11T09:43:03.380-08:00</atom:updated><title>Ten Statisticians and their Impacts for Psychologists</title><description>&lt;a href="http://www.fiu.edu/%7Edwright/"&gt;Dan Wright&lt;/a&gt; has written an article about &lt;a href="http://www.fiu.edu/%7Edwright/pdf/tenstats.pdf"&gt;Ten statisticians and their impacts for psychologists&lt;/a&gt;, which is well worth reading.  
As psychologists, we learn about, and use, statistics, but we don't learn enough (in my opinion) about the historical and philosophical underpinnings, which are linked to the individuals.  How many psychologists know, for example, that Fisher and Pearson did not get on?&lt;br /&gt;&lt;br /&gt;As well as describing these statisticians' early work, he gives some snippets of detail about their lives - I didn't know, for example, that Jerzy Neyman had ever been imprisoned by the Bolsheviks after the Russian Revolution (it's Neyman that we can thank for people using 0.05 as a stringent cutoff).&lt;br /&gt;&lt;br /&gt;There are two snippets he didn't mention, perhaps for lack of space.  He mentions that there are only four females in a list of 100 prominent statisticians - but one of these (F N David) is named after one of the others (Florence Nightingale).  In addition, he mentions Ronald Fisher (obviously), George Box, and Fisher's daughter, Joan Fisher Box (who wrote Fisher's biography), but does not mention that Joan Fisher Box acquired her name (you know where this is going, don't you) by marrying George Box.&lt;br /&gt;&lt;br /&gt;I've also heard it said that George Box and David Cox knew each other for a long time, and thought it would be cool to publish a paper together - but they worked in very different areas.  Eventually they did publish one on a method for transformations, now known as Box-Cox transformations, and sometimes just called Box-Coxing, as in "Have you tried Box-Coxing that dataset?"  
If anyone has any confirmation that this story is true, I'd be interested.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/265057525197038547-1588723201620761353?l=www.jeremymiles.co.uk%2Frandomness'/&gt;&lt;/div&gt;</description><link>http://www.jeremymiles.co.uk/randomness/2009/11/dan-wright-has-written-article-about.html</link><author>noreply@blogger.com (J)</author><thr:total xmlns:thr='http://purl.org/syndication/thread/1.0'>1</thr:total></item><item><guid isPermaLink='false'>tag:blogger.com,1999:blog-265057525197038547.post-416215724988656003</guid><pubDate>Fri, 06 Nov 2009 19:26:00 +0000</pubDate><atom:updated>2009-11-06T11:40:37.613-08:00</atom:updated><title>No, REALLY don't do statistics with Excel</title><description>There was a query on the SAS mailing list today - someone got inconsistent results for confidence intervals between Excel and SAS.  In Excel, they were using the confidence() function, which I'd not come across before.  And I'm glad about that.&lt;br /&gt;&lt;br /&gt;See, to calculate a confidence interval, you multiply the standard error by the critical value from the t-distribution. &lt;br /&gt;&lt;br /&gt;You can find that value using (say) R, with the qt() function, or Excel, with the tinv() function.  The t-distribution approaches the normal distribution as the sample size increases - you need a sample size of infinity for them to be exactly the same, but if the sample size is large enough, then it's close.  With the normal distribution, if you want a 95% confidence interval, the critical value is 1.96, which is so close to 2 that you can pretty much use 2 and get away with it.  (Around 95% of cases lie within 2 SDs of the mean in a normal distribution).&lt;br /&gt;&lt;br /&gt;If you have a sample of 500, then the critical value for t is 1.96 - the same, to two decimal places, as for the normal distribution.  
But even if your sample is as low as 50, the critical value is 2.0, which is close enough for almost anything (if you're using a computer it will work it out, and use it, to 16 decimal places, and we don't need to worry anyway).  &lt;br /&gt;&lt;br /&gt;However, this person had a sample size of 6.  With a sample size of 6, the critical value is 2.6, which is a pretty long way off.  Well, it turns out that Excel's confidence() function doesn't use the t distribution; it uses the normal distribution.  If you have a large sample, this is going to cause you no problems, but if you've got a small sample, you are going to be off - off by almost 30%, which is quite a long way for a confidence interval to be off.&lt;br /&gt;&lt;br /&gt;I googled (not Binged, of course) around this a bit, because I don't believe I'm the only person to have noticed, and I found &lt;a href="http://blogs.msdn.com/excel/archive/2009/09/14/function-consistency-improvements-in-excel-2010.aspx"&gt;this web page&lt;/a&gt;, which is called "Function Consistency Improvements in Excel 2010", renames the confidence() function to confidence.t(), and says: "Consistent definition with industry best practices. Confidence function assuming a Student’s t distribution."&lt;br /&gt;&lt;br /&gt;Oh, that's great.  It wasn't wrong or a cock up or a mistake.  It was just inconsistent with best industry practices.  Well, I'll try that at home next time I tread dog turd into the living room / spill tomato soup on the sofa / allow the children to play with the rat poison.  That wasn't wrong, or a mistake, I will claim, as I am reprimanded. 
In the future I will eat tomato soup at the table, in order to ensure that my eating behavior is consistent with best dining practice.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/265057525197038547-416215724988656003?l=www.jeremymiles.co.uk%2Frandomness'/&gt;&lt;/div&gt;</description><link>http://www.jeremymiles.co.uk/randomness/2009/11/no-really-dont-do-statistics-with-excel.html</link><author>noreply@blogger.com (J)</author><thr:total xmlns:thr='http://purl.org/syndication/thread/1.0'>2</thr:total></item><item><guid isPermaLink='false'>tag:blogger.com,1999:blog-265057525197038547.post-343154256926685081</guid><pubDate>Thu, 19 Feb 2009 04:52:00 +0000</pubDate><atom:updated>2009-02-18T20:53:59.428-08:00</atom:updated><title>SAS Macro for Standard Error of Skewness and Standard Error of Kurtosis</title><description>For reasons that are too dull to go into, I wanted to calculate the standard error of skew and standard error of kurtosis in SAS.  I did a bit of trawling on the interwebz, and failed to find anything.  
So I wrote my own, and put it here, in case anyone needs the same thing.&lt;br /&gt;&lt;br /&gt;In the last line, change dataset and variable to match your dataset and variable.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;%macro seskewkurt(data, variable);&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;    &lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;    proc means data = &amp;amp;data n skew kurtosis;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;          var &amp;amp;variable;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;        output out=outmeans n=n skew=skew kurtosis=kurtosis;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;      proc print data = outmeans;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;    data _null_; set outmeans;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;      call symput('getn', n);   &lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;      call symput('getkurtosis', kurtosis);&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;      call symput('getskew', skew);&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;    run;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;    &lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;    %let seskew=%sysevalf((((6*&amp;amp;getn)*(&amp;amp;getn-1))/((&amp;amp;getn-2)*(&amp;amp;getn+1)*(&amp;amp;getn+3)))**0.5);&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;    %let sekurt=%sysevalf(2*&amp;amp;seskew*((&amp;amp;getn**2-1)/((&amp;amp;getn-3)*(&amp;amp;getn+5)))**0.5) ;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;    %let zkurt =  %sysevalf(&amp;amp;getkurtosis/&amp;amp;sekurt);&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;    %let zskew 
=  %sysevalf(&amp;amp;getskew/&amp;amp;seskew);&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;    %put N is &amp;amp;getn ;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;    %put Skew is &amp;amp;getskew;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;    %put SE of skew is &amp;amp;seskew ;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;    %put Z score of skew is &amp;amp;zskew ;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;    %put Kurtosis is &amp;amp;getkurtosis ;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;    %put SE of kurtosis is &amp;amp;sekurt ;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;    %put Z score of Kurtosis is &amp;amp;zkurt ;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;%mend;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;%seskewkurt(dataset, variable);&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/265057525197038547-343154256926685081?l=www.jeremymiles.co.uk%2Frandomness'/&gt;&lt;/div&gt;</description><link>http://www.jeremymiles.co.uk/randomness/2009/02/sas-macro-for-standard-error-of.html</link><author>noreply@blogger.com (J)</author><thr:total xmlns:thr='http://purl.org/syndication/thread/1.0'>2</thr:total></item><item><guid isPermaLink='false'>tag:blogger.com,1999:blog-265057525197038547.post-3218688779565848382</guid><pubDate>Mon, 22 Dec 2008 18:29:00 +0000</pubDate><atom:updated>2008-12-22T10:38:52.469-08:00</atom:updated><title>Prediction vs Explanation in Regression</title><description>One issue in using regression analysis is to determine whether you are developing a model to &lt;span style="font-style: italic;"&gt;predict&lt;/span&gt; an outcome, or to &lt;span style="font-style: 
italic;"&gt;explain&lt;/span&gt; an outcome.  It's often a little bit hazy which one you are actually doing - in science, we like to say that we are explaining, but it's difficult (though not impossible) to argue that we're doing much more than predicting.&lt;br /&gt;&lt;br /&gt;However, one place where prediction is all that matters is finance.  Credit card companies like to lend people money, but they only like to lend money to people who are going to give it back.  And they don't care &lt;span style="font-style: italic;"&gt;why&lt;/span&gt; people don't give them their money back; they just want to predict who will give them their money back.&lt;br /&gt;&lt;br /&gt;But this policy often leads to confusion, as in &lt;a href="http://consumerist.com/5115522/amex-lowers-your-credit-limit-if-you-shop-where-deadbeats-shop"&gt;Amex lowers your credit limit if you shop where deadbeats shop&lt;/a&gt;.  At the moment, credit card companies are feeling kind of nervous - they think that a lot of people might not be giving them their money back, and so they are running regression models to predict who those people might be.  They find that people who shop in some stores are less likely to pay them back, and so if you look like one of those people, they might lower your credit limit - this is pure prediction. &lt;br /&gt;&lt;br /&gt;In science, it's common to mistake prediction for explanation - I've found a correlation, and so you think I've found the reason something happens.  But in this credit card example, it's the other way around - all they have is a prediction.  
It doesn't mean anything, but people interpret it as some sort of slight against them.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/265057525197038547-3218688779565848382?l=www.jeremymiles.co.uk%2Frandomness'/&gt;&lt;/div&gt;</description><link>http://www.jeremymiles.co.uk/randomness/2008/12/prediction-vs-explanation-in-regression.html</link><author>noreply@blogger.com (J)</author><thr:total xmlns:thr='http://purl.org/syndication/thread/1.0'>0</thr:total></item><item><guid isPermaLink='false'>tag:blogger.com,1999:blog-265057525197038547.post-3815564283789941152</guid><pubDate>Mon, 04 Aug 2008 05:09:00 +0000</pubDate><atom:updated>2008-08-03T22:19:05.825-07:00</atom:updated><title>Running out of adjectives</title><description>One of the problems that we have when trying to measure things like trauma or depression is that we run out of adjectives very fast.&lt;br /&gt;&lt;br /&gt;For example, we might ask:&lt;br /&gt;How upset were you that your computer crashed?&lt;br /&gt; - Very upset.&lt;br /&gt;How upset were you that your dog died?&lt;br /&gt; - Very, very upset.&lt;br /&gt;How upset were you that your spouse and children were killed in the volcano?&lt;br /&gt; - Very, very, very upset.&lt;br /&gt;How upset are you that a large asteroid is going to wipe out all of life on earth next week?&lt;br /&gt;&lt;br /&gt;Banyard and Shevlin wrote a &lt;a href="http://www.ijpm.org/content/pdf/47/football.pdf"&gt;short paper&lt;/a&gt; a few years ago which reported high levels of psychological distress in supporters of football (soccer) teams that were demoted.  
I've always suspected that this effect is simply one of people using the extremes of the scale - they were very, very upset that that goal was disallowed.&lt;br /&gt;&lt;br /&gt;My favorite blog written by an anonymous ER doctor is &lt;a href="http://whitecoatrants.wordpress.com"&gt;WhiteCoatRants&lt;/a&gt;, and in &lt;a href="http://whitecoatrants.wordpress.com/2008/02/23/describing-the-pain-scale/"&gt;this post&lt;/a&gt; he describes a similar problem when trying to get a patient to describe how much pain they are in:&lt;br /&gt;&lt;blockquote&gt;&lt;p&gt;The one [description] I use is that 10 out of 10 pain is pain that is bad enough that you are “on the ground wailing and pounding your fists on the floor because the pain is so bad.” This gives me an objective way to follow up the subjective ratings of “10.”&lt;/p&gt; &lt;p&gt;“So using my description, how bad is your pain from 1-10?”&lt;br /&gt;The patient, sitting on the bed munching Doritos and watching TV, says “Oh, it’s definitely a 10.”&lt;br /&gt;I reply, “That’s funny, because you’re still sitting on the bed, you’re not pounding your fists on the floor, and you’re not wailing. 
In fact, you appear to be rather comfortable.”&lt;br /&gt;The usual response?&lt;/p&gt; &lt;p&gt;“Oh, then it’s a nine and a half.”&lt;/p&gt;&lt;/blockquote&gt;&lt;p&gt;&lt;/p&gt;Some interesting discussion about this problem followed.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/265057525197038547-3815564283789941152?l=www.jeremymiles.co.uk%2Frandomness'/&gt;&lt;/div&gt;</description><link>http://www.jeremymiles.co.uk/randomness/2008/08/running-out-of-adjectives.html</link><author>noreply@blogger.com (J)</author><thr:total xmlns:thr='http://purl.org/syndication/thread/1.0'>1</thr:total></item><item><guid isPermaLink='false'>tag:blogger.com,1999:blog-265057525197038547.post-7479064389017717864</guid><pubDate>Mon, 04 Aug 2008 04:42:00 +0000</pubDate><atom:updated>2008-08-03T22:02:05.824-07:00</atom:updated><title>Google for Statistics</title><description>I like to use Google to conduct little surveys on how things are done, or what people think.  For example, I was wondering about the relative popularity of SPSS and SAS for teaching statistics to psychologists, in UK and US universities.  So I searched for&lt;br /&gt;[&lt;a href="http://www.google.com/search?q=SAS+psychology+anova+site%3Aedu&amp;amp;sourceid=navclient-ff&amp;amp;ie=UTF-8&amp;amp;rlz=1B3GGGL_enUS276US276"&gt;SAS psychology anova site:edu&lt;/a&gt;] and got (about) 10,000 hits.&lt;br /&gt;[&lt;a href="http://www.google.com/search?q=SPSS+psychology+anova+site%3Aedu&amp;amp;sourceid=navclient-ff&amp;amp;ie=UTF-8&amp;amp;rlz=1B3GGGL_enUS276US276"&gt;SPSS psychology anova site:edu&lt;/a&gt;]  6,000 hits. &lt;br /&gt;(I added anova to make sure it didn't pick up other meanings for SAS).  Conclusion: SPSS is a little more popular than SAS, in US university psychology departments.  
(I also ran it with 'regression' instead of 'anova', with a similar result, but SPSS's lead was a lot smaller).&lt;br /&gt;&lt;br /&gt;Do it again for the UK and we find:&lt;br /&gt;[&lt;a href="http://www.google.com/search?hl=en&amp;amp;rlz=1B3GGGL_enUS276US276&amp;amp;q=SAS+psychology+anova+site%3Aac.uk&amp;amp;btnG=Search"&gt;SAS psychology anova site:ac.uk&lt;/a&gt;] 607 hits.&lt;br /&gt;[&lt;a href="http://www.google.com/search?hl=en&amp;amp;rlz=1B3GGGL_enUS276US276&amp;amp;q=SPSS+psychology+anova+site%3Aac.uk&amp;amp;btnG=Search"&gt;SPSS psychology anova site:ac.uk&lt;/a&gt;]  1,930 hits. &lt;br /&gt;&lt;br /&gt;So SPSS has the lead in both, but it's got a bigger lead in the UK than the US.  (For regression, it's also got about three times more).&lt;br /&gt;&lt;br /&gt;However, here's a comic from &lt;a href="http://xkcd.com"&gt;xkcd&lt;/a&gt; which uses the same approach in a much more interesting way:&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://xkcd.com/458/"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 400px;" src="http://imgs.xkcd.com/comics/regrets.png" alt="" border="0" /&gt;&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/265057525197038547-7479064389017717864?l=www.jeremymiles.co.uk%2Frandomness'/&gt;&lt;/div&gt;</description><link>http://www.jeremymiles.co.uk/randomness/2008/08/google-for-statistics.html</link><author>noreply@blogger.com (J)</author><thr:total xmlns:thr='http://purl.org/syndication/thread/1.0'>0</thr:total></item></channel></rss>