Stat Counter

Friday, June 25, 2010

Error messages and their value

I use R for most of my statistical work these days. Why, you ask? Well, it's open source, free, and has the most comprehensive set of add-on packages I have ever seen.

It's also sometimes incomprehensible and annoying. Take, for instance, an error message I got a while back: Error in cov.wt(z) : 'x' must contain finite values only

I was attempting to do a factor analysis, and the above popped up. Naturally I was a little confused, as I hadn't allowed for participants in my surveys to respond "infinite". However, upon Googling, I discovered that factor analysis (along with most statistical methods) is highly sensitive to missing values, and R's finiteness check treats its NA values as non-finite - I would prefer the error to say "missing", but I didn't write R.
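
For anyone who hits the same wall, here's roughly what happened, with made-up data (the item names q1-q4 are invented for illustration):

    # A toy version of the problem: survey items with one missing response
    set.seed(1)
    g <- rnorm(100)  # a latent factor the items share
    z <- data.frame(q1 = g + rnorm(100), q2 = g + rnorm(100),
                    q3 = g + rnorm(100), q4 = g + rnorm(100))
    z$q1[5] <- NA    # one participant skipped a question

    factanal(z, factors = 1)
    # Error in cov.wt(z) : 'x' must contain finite values only

    factanal(na.omit(z), factors = 1)  # drop incomplete rows first and it runs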

The same thing happened to me today when working out some correlations: I kept getting NA as the result. Given that I knew these correlations should exist in some form, I was confused. However, the problem was again missing values, and when I passed use = "pairwise.complete.obs" to cor() I got some sensible results.
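
If you want to see it for yourself, something like this does the trick (numbers invented):

    x <- c(1, 2, 3, 4, NA)
    y <- c(2, 1, 4, 3, 5)
    cor(x, y)                                 # NA: one missing value poisons the lot
    cor(x, y, use = "pairwise.complete.obs")  # drops incomplete pairs, gives 0.6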

The point (insofar as I have one) is that I had been using SPSS for many years and never really copped on to what an issue missing values are. The convenience of the GUI was preventing me from learning about the methods I was using.

And that, ladies and gentlemen, is one of the many reasons why I will continue to use R. (Don't worry, I'll go into excruciating detail about the benefits thereof at another time.)

Monday, June 21, 2010

On publishing and journals

So, I'm currently writing my first paper for publication. Woo hoo, and what not.

Therefore, I've started to pay attention to things like impact factors. Impact factors, for those of you who don't know, are numbers that reflect how often the average paper in a journal has been cited - classically over a two-year window, though a five-year version is also published. Think of it as a journal's reputation, if you will.
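
To make that concrete, the classic two-year calculation is just a ratio; the numbers below are entirely hypothetical:

    # Two-year impact factor for 2009, schematically
    cites_2009_to_2007_08 <- 300  # citations this year to the previous two years' papers
    items_2007_08 <- 100          # citable items published in those two years
    cites_2009_to_2007_08 / items_2007_08  # impact factor = 3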

Many people claim that the bigger the impact factor, the better the journal. This came up most recently in the Chronicle of Higher Education, where a number of people moaned about the amount of research that goes uncited. Of course, this doesn't control for the number of people in a field or the "sexiness" of a topic, so obviously it's not the whole story.

Now, the other major factor (for me at least) is the time taken to review. Psychology apparently has long, long review times. I've seen many papers that show a two-year lag between submission and print publication. Most journals offer some form of advance online publication these days, which means that one might wait only a year before others see your work.

So, when choosing a journal, I find myself making a trade-off. Should I go for a lower-impact journal that reviews quickly, or a higher-impact journal that will take longer but make my research more visible?

Another point to remember is that you can't submit to multiple journals at once, so the reviewing time is an opportunity cost for the researcher. This is of particular relevance for students like myself, who need to get papers published quickly in order to show them on a CV and thus get a job (and the opportunity to do more research).

I'm still undecided on which route to go, and time is running out. The deadline is somewhat external and somewhat self-imposed: my funders want a report in ten days, and I'd like to be able to claim that a paper is under review by then. If anyone is reading and has advice, it would be greatly appreciated.

Thursday, June 17, 2010

On the absurdity of marking schemes

So, I'm a PhD student somewhere in the South of Ireland.
Recently, I taught my very first class, which was nice.

Even more recently, I had to mark all the scripts, which was not so nice.

You see, in my university, psychology (which I have been assured is, in fact, a science) is examined like a liberal arts degree, i.e. with essay questions.

All well and good, you say. However, the marking scheme - which is handed down from on high - is crazy. And not in a good, funny, sort-of-entertaining way, but in a hair-pulling, chair-destroying, data-falsifying kind of way.

Here's a breakdown of how the marks work:
A (or First): 70-100%
B (or 2.1): 60-69%
C (or 2.2): 50-59%
D (or 3.1): 45-49%
E (or Pass): 40-44%
F (or Fail): 0-39%

Now, I'm sure that many of you can spot the issues here, but I'll illustrate anyway. The A band covers 30% of the scale and is subdivided into three (as are the others). However, the A grades are separated by 10% each (75, 85, 95), while the E grades are separated by only 1% each.
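
A quick back-of-the-envelope in R (my tool of choice, as noted above) makes the unevenness plain:

    # Width of each grade band, per the scheme above
    widths <- c(A = 100 - 70, B = 69 - 60, C = 59 - 50,
                D = 49 - 45, E = 44 - 40, F = 39 - 0)
    round(widths / 3, 1)  # rough spacing if each band is split into three sub-grades
    #    A    B    C    D    E    F
    # 10.0  3.0  3.0  1.3  1.3 13.0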

So essentially, what the marking scheme dictates is that there is ten times more difference between the A grades than between the E grades. It's absurd, and yet it occurs everywhere on this emerald isle (and also in the UK, but don't quote me on that).

The worst part, for me at least, is that the A is rare, very rare in fact, and most of the marks are squashed into the 55-69% range, which gives students a very misleading idea of their relative standing in the class and between classes.

There's a sizeable majority of the scale (the A and F bands) that is used perhaps 3-5% of the time, and everyone else just gets pushed into the dank and unwholesome middle. Now, personally, I'd prefer it if the scale were divided into 15 points per grade and if As were a real possibility rather than a carrot used to urge undergraduates into insane amounts of study for very little reward.

Unfortunately, it's not up to me, but rather up to the NUI, and I hardly think they'll change it because of this blog post. In the unlikely event that they do change it, I would, of course, accept recompense from any grateful students or teachers.