Thursday, September 16, 2010

Sweave, LaTeX and R

Yesterday, I finally got the hang of Sweave, R and LaTeX.

This essentially means that I can write my scientific paper in LaTeX, insert code chunks in the text, feed it to R (through Sweave) and get perfectly formatted output in APA style for any paper I choose to write. It's taken me a few months of on-and-off trying, but I've finally done it. That being said, I'd like to share some of the things that caught me out, so that others can benefit.


Before Installation:
If you've been using LaTeX or R on Windows, they were probably installed in the Program Files folder. This will cause you no end of problems.
Re-install these programs on the C: drive, in a path with no spaces, as LaTeX doesn't like spaces in path names. For example, install LaTeX in C:\Miktex and R in C:\bin\R. This will head off a lot of problems that you would otherwise encounter.
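A quick way to see whether your current setup is affected is to ask R where it lives and where it finds LaTeX. This is only a minimal sketch: has_space is a helper name I made up for illustration, and it assumes pdflatex is the LaTeX program on your PATH.

```r
# Hypothetical helper: TRUE if a path contains at least one space.
has_space <- function(path) grepl(" ", path, fixed = TRUE)

r_home <- R.home()                            # directory of this R installation
latex_path <- unname(Sys.which("pdflatex"))   # "" if pdflatex is not on the PATH

cat("R home:  ", r_home,
    if (has_space(r_home)) "(contains spaces)" else "(no spaces)", "\n")

if (nzchar(latex_path)) {
  cat("pdflatex:", latex_path,
      if (has_space(latex_path)) "(contains spaces)" else "(no spaces)", "\n")
} else {
  cat("pdflatex was not found on the PATH\n")
}
```

If either line reports spaces, that is the situation described above, and reinstalling to a path without spaces is the safer route.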

When installing LaTeX, be aware that there are a number of editors you can use to write your text. Of these, I am using TeXnicCenter, as it came with my distribution. It's also open source, which is a plus. Others include WinEdt, which is shareware and apparently quite good. Vim and Emacs (with Emacs Speaks Statistics) are the only text editors that provide completion for R code, but both of these programs take a lot of effort to learn. In any case, working entirely from LaTeX from the start is very difficult.

The next step is learning to use R. If you are a psychologist, download the arm package and the psych package: arm will give you useful regression diagnostics, and psych provides all the psychometric tools one could need (see here for the author's website, which I devoured when I started learning R). Unfortunately psych doesn't provide IRT methods, but these can be accessed through the ltm and eRm packages, which are also easy to obtain (select the install packages option in the R menus, select a mirror site close to you and select the package name, and you're done).
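If you prefer the console to the menus, the same installation can be sketched as follows. The variable names are mine, and the install step is guarded so that it only runs in an interactive session, where R can ask you to pick a mirror.

```r
# Packages mentioned above.
wanted <- c("arm", "psych", "ltm", "eRm")

# Work out which of them are not yet installed.
installed <- rownames(installed.packages())
missing_pkgs <- setdiff(wanted, installed)

if (length(missing_pkgs) > 0 && interactive()) {
  # R prompts for a CRAN mirror if none has been chosen yet.
  install.packages(missing_pkgs)
} else {
  cat("Packages still to install:", length(missing_pkgs), "\n")
}
```

After installation, library(psych) (and so on) loads each package for the current session.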

For exporting to LaTeX, there are a number of packages which do different things. I'm currently using xtable, but this doesn't have a predefined output format for factor matrices, which is a pain. The manual does show you how to define new classes though, and I will share my results when I have made this work.
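Until a proper method exists, one workaround is to strip the special class from the loadings so that they become an ordinary matrix, which xtable does handle. This is a sketch and not the new-class approach from the manual: it uses factanal and the ability.cov data that ship with R in place of a real analysis, and it assumes the xtable package is installed (the last step is skipped otherwise).

```r
# Two-factor model on a covariance matrix that ships with R.
fit <- factanal(factors = 2, covmat = ability.cov)

# loadings() returns an object of class "loadings";
# unclass() turns it into a plain numeric matrix.
load_mat <- round(unclass(loadings(fit)), 2)
print(load_mat)

if (requireNamespace("xtable", quietly = TRUE)) {
  # Prints LaTeX code for a table of the loadings.
  print(xtable::xtable(load_mat, caption = "Factor loadings"))
}
```

The trade-off is that the plain matrix shows every loading, including the small ones that the default print method for loadings would blank out.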

One extremely important thing to remember (and something that stumped me for a while) is the syntax for inserting R code.
<<echo=FALSE, results=tex>>=
some R code
@
The double arrows (<< and >>) and the equals sign signal the start of the R code chunk, and the options within the arrows define how the output looks (echo=FALSE means that the R code will not be shown, and results=tex tells Sweave that the chunk's output is already LaTeX code and should be passed through as it is). The @ sign ends the code chunk. Now, the part that got me was this: the arrows and the @ need to be left justified, starting in the first column of the line, otherwise this does not work. This means that if you want to insert a table from your results, do this after running the code through Sweave.
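To see the whole round trip in one go, here is a small sketch that writes a tiny Sweave file from R, runs it through Sweave, and reads back the resulting .tex file. The file name demo.Rnw and the temporary folder are made up for the example; note that the chunk header and the @ each start at the very beginning of their line.

```r
# Create a scratch folder for the demo (names are hypothetical).
work_dir <- tempfile("sweave_demo")
dir.create(work_dir)
rnw_file <- file.path(work_dir, "demo.Rnw")

# Each element below becomes one line of the .Rnw file.
writeLines(c(
  "\\documentclass{article}",
  "\\begin{document}",
  "The mean of the numbers 1 to 10 is",
  "<<echo=FALSE, results=tex>>=",
  "cat(mean(1:10))",
  "@",
  "\\end{document}"
), rnw_file)

# Sweave writes its output into the working directory.
old_wd <- setwd(work_dir)
Sweave(rnw_file)
setwd(old_wd)

# The chunk has been replaced by its output in demo.tex.
tex_file <- file.path(work_dir, "demo.tex")
tex_lines <- readLines(tex_file)
cat(tex_lines, sep = "\n")
```

The resulting demo.tex is an ordinary LaTeX file that you then compile in your editor as usual.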

At the moment (since I am neither an expeRt nor a TeXpert) I am creating the LaTeX objects in R, and then telling R to print them to LaTeX. This allows me to ensure that the objects are created properly before I send them to LaTeX. Sweave will tell you if it has a coding problem and where that occurred, but some things look OK until you actually see them.
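That two-step habit might look like the sketch below: build the object, look at it in the R console, and only then print it as LaTeX. The regression on the built-in mtcars data is a stand-in for a real analysis, and the last step assumes the xtable package is installed.

```r
# Step 1: fit a model and pull out the coefficient table as a plain matrix.
fit <- lm(mpg ~ wt, data = mtcars)
coef_table <- coef(summary(fit))

# Step 2: check in the console that the object looks right.
print(round(coef_table, 3))

# Step 3: only then turn it into LaTeX.
if (requireNamespace("xtable", quietly = TRUE)) {
  latex_table <- xtable::xtable(coef_table, digits = 3)
  print(latex_table)   # this is the part that would sit in a results=tex chunk
}
```

Inside a Sweave file, steps 1 and 3 would go in the chunk, while step 2 is something you do interactively beforehand.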

The next step is to download the apa package for LaTeX, which will allow you to format the paper in APA style. This is the part that tends not to work if your LaTeX distribution path has spaces in it, so make sure that doesn't happen (I actually reinstalled R and LaTeX on my machine in the recommended places, and now it works like a dream).

You will probably need to learn a little LaTeX, but if you use WinEdt, TeXnicCenter or LyX, then there is a GUI with menus that can aid in this. There are some Sweave templates scattered about the web, and you should probably use one of these. It's probably worth reading either this or this (or both) guide to using LaTeX.

With R, as long as you understand some statistics, it's easy enough to Google and then read the recommended files. The introduction is extremely terse and focuses on fundamentals rather than applied analysis, but it's useful for its description of summary, plot, lm and the other really useful generic functions.
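As a small illustration of what "generic" means here, the sketch below calls summary and plot on different kinds of objects and lets R pick the right behaviour. It uses the built-in cars data, and the plots go to a temporary PDF file (my choice for the example) so that it also runs without a graphics window.

```r
# lm fits a linear model; the result has class "lm".
fit <- lm(dist ~ speed, data = cars)

# summary() adapts to what it is given.
print(summary(cars$dist))        # numeric vector: quartiles and mean
fit_summary <- summary(fit)      # lm object: coefficients, R-squared, ...
print(class(fit_summary))

# plot() adapts too. Send the plots to a temporary PDF file.
plot_file <- tempfile(fileext = ".pdf")
pdf(plot_file)
plot(cars)             # data frame with two columns: a scatterplot
plot(fit, which = 1)   # lm object: residuals against fitted values
dev.off()
```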

2 comments:

  1. I hope you're planning on continuing the placebo series eventually.

  2. Why yes, yes I am. Unfortunately, my desire to get published in journals outweighs my desire to blog. Will try to move on to neurobiology this week though.
