Thursday, September 16, 2010

Sweave, LaTeX and R

Yesterday, I finally got the hang of Sweave, R and LaTeX.

This essentially means that I can write my scientific paper in LaTeX, insert code chunks in the text, feed it to R (through Sweave) and get perfectly formatted output in APA style for any paper I choose to write. It's taken me a few months of on and off trying, but I've finally done it. That being said, I'd like to share some of the things that caught me out, so that others can benefit.


Before Installation:
If you've been using LaTeX or R on Windows, they were probably installed in the Program Files folder. This will cause you no end of problems.
Re-install these programs on C:, in a path with no spaces, as LaTeX doesn't like spaces in path names. For example, install LaTeX in C:\Miktex and install R in C:\bin\R. This will head off a lot of the problems that you would otherwise encounter.

When installing LaTeX, be aware that there are a number of editors which you can use to write it. Of these, I am using TeXnicCenter, as it came with my distribution. It's also open source, which is a plus. Others include WinEdt, which is shareware and apparently quite good. As far as I know, Vim and Emacs (with Emacs Speaks Statistics) are the only text editors that provide completion for R code, but both of these programs take a lot of effort to learn. In any case, working entirely in raw LaTeX from the start is very difficult.

The next step is learning to use R. If you are a psychologist, download the arm package and the psych package: arm will give you useful regression diagnostics, and psych provides all the psychometric tools one could need (see here for the author's website, which I devoured when I started learning R). Unfortunately psych doesn't provide IRT methods, but these can be accessed through the ltm and eRm packages, which are also easy to obtain (select the install packages option in the R menus, select a mirror site close to you and select the package name, and you're done).
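
If you'd rather do this from the R console than from the menus, something like the following should work. It's only a sketch: the mirror URL is an assumption (any CRAN mirror will do), and the install line is commented out so that nothing is downloaded until you want it to be.

```r
# The four packages mentioned above.
pkgs <- c("arm", "psych", "ltm", "eRm")

# Work out which of them are not installed yet.
to_install <- setdiff(pkgs, rownames(installed.packages()))
print(to_install)

# Uncomment to install whatever is missing (needs an internet connection).
# The mirror URL is an assumption; pick one close to you.
# install.packages(to_install, repos = "https://cloud.r-project.org")
```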

For exporting to LaTeX, there are a number of packages which do different things. I'm currently using xtable, but this doesn't have a defined output method for factor matrices, which is a pain. The manual does show you how to define methods for new classes though, and I will share my results when I have made this work.
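
For a plain data frame, the basic xtable workflow looks roughly like this. It's a minimal sketch that assumes xtable is installed (the code checks first and does nothing otherwise); the caption and label are made up for illustration, and mtcars is just a dataset that ships with R.

```r
# A small data frame to tabulate; mtcars ships with R.
dat <- head(mtcars[, c("mpg", "wt", "hp")])

# xtable is a CRAN package and may not be installed, so check first.
if (requireNamespace("xtable", quietly = TRUE)) {
  # Build the table object; caption and label are illustrative only.
  tab <- xtable::xtable(dat, caption = "First rows of mtcars", label = "tab:cars")
  # Printing the object writes the LaTeX table code to the console.
  print(tab)
}
```

Building the object first and printing it afterwards makes it easy to inspect the table in R before any of it reaches LaTeX.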

One extremely important thing to remember (and something that stumped me for a while) is the syntax for inserting R code.
<<echo=FALSE, results=tex>>=
some R code
@
The double arrows and the equals sign signal the start of the R code chunk, and the options within the arrows define how the output looks (echo=FALSE means that the R code will not be shown, and results=tex tells Sweave to pass the results straight through as LaTeX code rather than wrapping them as verbatim text). The @ sign ends the code chunk. Now, the part that got me was this: the arrows and the @ need to start at the very beginning of the line (and, to be safe, keep the R code left justified too), otherwise this does not work. This means that if you want to insert a table from your results, do this after running the code through Sweave.
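
To see the whole round trip in one go, here is a rough sketch that writes a tiny Sweave file and weaves it from R. The file name example.Rnw and its contents are made up for illustration; Sweave() itself comes with R (in the utils package), so no LaTeX installation is needed for this step.

```r
# Lines of a minimal Sweave document. Note that the chunk header
# and the closing @ both start in the first column.
rnw <- c(
  "\\documentclass{article}",
  "\\begin{document}",
  "The mean of mpg is:",
  "<<echo=FALSE, results=tex>>=",
  "cat(round(mean(mtcars$mpg), 2))",
  "@",
  "\\end{document}"
)

# Work in a temporary folder so nothing is left lying around.
old <- setwd(tempdir())
writeLines(rnw, "example.Rnw")

# Sweave runs the chunk and writes example.tex next to the input file.
Sweave("example.Rnw")
tex <- readLines("example.tex")
setwd(old)

# Show the woven LaTeX, with the chunk replaced by its output.
print(tex)
```

The resulting example.tex can then be compiled with LaTeX in the usual way, provided your LaTeX installation can find Sweave.sty.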

At the moment (since I am neither an expeRt nor a TeXpert) I am creating the LaTeX objects in R, and then telling R to print them to LaTeX. This allows me to ensure that the objects are created properly before I send them to LaTeX. Sweave will tell you if it has a coding problem and where that occurred, but some things look OK until you actually see them.

The next step is to download the apa package for LaTeX, which will allow you to format the paper in APA style. This is the part that tends not to work if your LaTeX distribution path has spaces in it, so make sure that doesn't happen (I actually reinstalled R and LaTeX on my machine in the recommended places, and now it works like a dream).

You will probably need to learn a little LaTeX, but if you use WinEdt, TeXnicCenter or LyX, then there is a GUI with menus that can aid in this. There are some Sweave templates scattered about the web, and you should probably use one of these. It's probably worth reading either this or this (or both) guide to using LaTeX.

With R, as long as you understand some statistics, it's easy enough to Google and then read the recommended files. The introduction is extremely terse and focuses on fundamentals rather than applied analysis, but it's useful for its description of summary, plot, lm and the other really useful generic functions.
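
As a taste of those generic functions, here is a small sketch using mtcars, a dataset that ships with R. The choice of variables is arbitrary and only there for illustration.

```r
# Fit a simple linear regression of fuel economy on car weight.
fit <- lm(mpg ~ wt, data = mtcars)

# summary() adapts to what it is given: a numeric vector gets
# quartiles and the mean, a fitted model gets a coefficient table.
print(summary(mtcars$mpg))
print(summary(fit))

# plot() draws a scatterplot here, and abline() adds the fitted line.
plot(mtcars$wt, mtcars$mpg, xlab = "Weight", ylab = "Miles per gallon")
abline(fit)
```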

Friday, September 3, 2010

My blogging motivations

1. Sum up your blogging motivation, philosophy and experience in exactly 10 words:

A ranty blog about life and science which I feel gets ignored.

Pass it on to 10 others. 

If you read this and are blogging and have not yet done this, consider yourself tagged.

From: Dr. Girlfriend

LaTeX success!

Today, I successfully typeset my first paper using LaTeX and BibTeX.

I know that no-one else cares, but nonetheless, I feel much better about my life.

LaTeX, for those of you who don't know, is an open source typesetting program which allows you to create all kinds of text documents through the use of simple scripting commands and output the results in a variety of file formats (I tend to use PDF).

LaTeX can also be used with the open source software R, to embed the results of analyses neatly in one file, which then creates your paper (I haven't made this work yet, but tomorrow is another day). 

The major advantages (as I see it) are that you can update analyses and the finished paper much more easily, and LaTeX draws all the tables for you (which is good, because R is not good at producing pretty tables).

LaTeX was originally invented to produce mathematical documents, so equations and the like are very easy to do.

LaTeX is awesome and the future of science and reproducible research.

R is also awesome, but I've talked about that before.

With these tools, I shall be a publication machine! (If I can collect enough usable data.)

Another advantage is that the three tools are available on all platforms, and so soon (perhaps next month) I shall delete Windows and go completely free software. You should too.