Resources for R
R is a versatile, powerful and free software environment for statistical computing and graphics. R consists of the base software and a myriad of packages developed by enthusiasts.
Why R?
Need some motivation to start? Want to be convinced that R is worth learning? Here are a few links (among many others):
- An econ grad student gives his reasons, most of which are really good
- Some more objective reasons, with a focus on economists
- Big online courses have already made the switch, which shows where the industry is going
Setting it up
To set up R, you need to install R on your computer, as well as an editor.
- The R website offers installers for all major operating systems. You will also find all packages on this website
- RStudio is an integrated development environment (IDE), ideal for beginners
Getting started in R
There are many tutorials and introductions to get started in R. Here are some of them:
- R For Data Science: an excellent presentation of the modern R tools to import, clean, transform and analyse data
- UCLA Introduction to R: a classic for the very first steps
- swirl: a course to learn R, in R
- R For Stata Users: R coding advice for those of you with a good knowledge of Stata. Matthieu Gomez’s site is especially recommended because it includes state-of-the-art coding practices in R (using dplyr, readr and ggplot2)
Getting serious with R
Programming, optimising your code, debugging…
- R Programming For Data Science: what you need to know about R’s under-the-hood mechanics (object classes, control structures, loops) to write good R code
- Follow Google’s R Code Style Guide to write clean and easily shareable code
- Debugging in RStudio
- Rcpp is a family of packages, initiated by Romain François and Dirk Eddelbuettel, that makes it easy to integrate C++ code into R functions. A great vignette (by Hadley Wickham) motivates it well and explains how to get started
- purrr is an effort to make functional programming easier in R. See this tutorial (or that one) to understand how the very useful map function can replace tedious loops
- memoise is a convenient function wrapper that keeps a cache of previous calls to a function with given arguments. Great to speed up scripts where functions are called many times with the same arguments, and to avoid scraping the same URL several times
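To make the Rcpp entry concrete, here is a minimal sketch in the spirit of such introductions (not taken from the vignette itself). It assumes Rcpp is installed and a working C++ compiler is available; the function name sumC is purely illustrative. cppFunction() compiles the C++ source and exposes it as an ordinary R function.

```r
library(Rcpp)

# Compile a small C++ function and make it callable from R.
# sumC is a hypothetical name chosen for this example.
cppFunction('
double sumC(NumericVector x) {
  double total = 0;
  // Plain C++ loop: no R-level interpretation overhead per iteration
  for (int i = 0; i < x.size(); ++i) {
    total += x[i];
  }
  return total;
}')

sumC(c(1, 2, 3.5))  # 6.5, same result as sum(c(1, 2, 3.5))
```

Loops like this one are slow in interpreted R but fast in compiled C++, which is the main motivation for reaching for Rcpp.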
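As a minimal sketch of how purrr’s map can replace a loop (assuming purrr is installed), here are a for loop and its map() equivalent producing the same result. The typed variant map_dbl() returns a numeric vector instead of a list.

```r
library(purrr)

# Loop version: pre-allocate a list, then fill it element by element
squares_loop <- vector("list", 5)
for (i in 1:5) {
  squares_loop[[i]] <- i^2
}

# map() applies the function to each element and returns a list
squares_map <- map(1:5, function(x) x^2)

# map_dbl() does the same but returns a numeric vector; ~ .x^2 is purrr's
# shorthand for an anonymous function of one argument
squares_dbl <- map_dbl(1:5, ~ .x^2)  # 1 4 9 16 25
```

The map version states what is computed rather than how to index and store it, which is the core idea behind functional programming in purrr.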
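A minimal sketch of caching with the CRAN package memoise (spelled with an "s"), assuming it is installed. The slow function and the call counter are hypothetical, added only to make the caching visible: the body of the wrapped function runs once per distinct argument.

```r
library(memoise)

n_calls <- 0  # hypothetical counter, only to show when the body really runs

slow_square <- function(x) {
  n_calls <<- n_calls + 1  # increment the global counter
  Sys.sleep(0.5)           # stand-in for an expensive computation or web request
  x^2
}

fast_square <- memoise(slow_square)

first  <- fast_square(4)  # computed: the body runs (n_calls becomes 1)
second <- fast_square(4)  # same argument: returned from the cache
third  <- fast_square(5)  # new argument: the body runs again (n_calls becomes 2)

# forget(fast_square) would clear the cache if needed
```

For scraping, the same pattern applies to a function that downloads a page: repeated requests for the same URL are served from memory instead of hitting the server again.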
The best R packages (subjective list)
One of R’s best features is that coding practices and possibilities evolve rapidly, thanks to the great packages that enthusiastic programmers and statisticians bring to the community. Here is a personal list of favourites.
- ggplot2 has made R the single best software to produce high-quality graphs of all kinds: the documentation is fantastic, and there is a dedicated Springer book if you want to go further
- dplyr has changed the way we clean and manipulate data in R. Recommended both for transforming a dataset and for merging several datasets, it also includes window operations, helpful for panel data for instance. Panel data users should not miss the tidyr package either, which makes long-to-wide and wide-to-long transformations incredibly easy
- haven makes it easy to import from/export to Stata/SPSS/SAS, and readr provides a fast and friendly way to import flat/tabular files (e.g. csv). Like dplyr, ggplot2 and tidyr, these packages are due to the prolific Hadley Wickham
- stringr offers a wrapper around the very rich stringi package to make the manipulation of character strings easier in R. Regular expressions are well supported
- lfe has become, for me, the go-to package to run reduced-form econometric regressions. felm is a versatile function that can include multi-way fixed effects and instrumental variables (IVs) in linear models, and computes clustered standard errors. The results can be tidied/viewed with the very convenient broom package
- stargazer is not ideal and the aesthetics of the generated tables won’t please everybody, but it is still a versatile and convenient package to get LaTeX tables out of regression outputs
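A minimal sketch of a typical ggplot2 workflow, assuming ggplot2 is installed. The built-in mtcars dataset and the output file name are illustrative choices: variables are mapped to aesthetics, layers are added with +, and the figure is saved as a PDF that can be included in a LaTeX document.

```r
library(ggplot2)

# Map variables to aesthetics, then add layers
p <- ggplot(mtcars, aes(x = wt, y = mpg, colour = factor(cyl))) +
  geom_point() +                               # scatter plot
  geom_smooth(method = "lm", se = FALSE) +     # one linear fit per cylinder group
  labs(x = "Weight (1000 lbs)", y = "Miles per gallon", colour = "Cylinders") +
  theme_minimal()

# Save as PDF (size in inches); the file name is only an example
out_file <- file.path(tempdir(), "mpg_vs_weight.pdf")
ggsave(out_file, p, width = 6, height = 4)
```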
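The dplyr and tidyr entries are easiest to appreciate on a small panel. The sketch below uses a made-up two-firm dataset and assumes recent versions of both packages: pivot_wider() and pivot_longer() are the current tidyr reshaping functions, which replaced the older spread() and gather(). lag() inside a grouped mutate() is an example of a window operation.

```r
library(dplyr)
library(tidyr)

# Made-up panel: two firms observed over three years
panel <- tibble(
  firm  = rep(c("A", "B"), each = 3),
  year  = rep(2001:2003, times = 2),
  sales = c(10, 12, 15, 20, 18, 25)
)

# Window operation within each firm: lagged sales and growth rate
panel_growth <- panel %>%
  group_by(firm) %>%
  arrange(year, .by_group = TRUE) %>%
  mutate(sales_lag = lag(sales),            # NA for each firm's first year
         growth    = sales / sales_lag - 1) %>%
  ungroup()

# Long-to-wide: one column per year
wide <- panel %>%
  pivot_wider(names_from = year, values_from = sales)

# Wide-to-long: back to one row per firm-year, with year as an integer again
long <- wide %>%
  pivot_longer(-firm, names_to = "year", values_to = "sales",
               names_transform = list(year = as.integer))
```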
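A minimal round-trip sketch with readr and haven, assuming both are installed. A small made-up data frame is written to a csv file and to a Stata .dta file in a temporary directory, then read back.

```r
library(readr)
library(haven)

df <- data.frame(id = 1:3, income = c(1200.5, 980, 1530.25))

csv_path <- file.path(tempdir(), "example.csv")
dta_path <- file.path(tempdir(), "example.dta")

# readr: fast csv export/import; read_csv() reports the column types it guessed
write_csv(df, csv_path)
from_csv <- read_csv(csv_path)

# haven: Stata export/import; read_dta() returns a tibble
write_dta(df, dta_path)
from_dta <- read_dta(dta_path)
```

Columns imported with haven can carry extra attributes (such as Stata display formats or value labels), so it is worth inspecting them before further processing.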
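A minimal sketch of common stringr operations with regular expressions, assuming stringr is installed. The input strings are made up. Note the doubled backslash in R string literals and the lookahead (?=%), which the underlying ICU regex engine supports.

```r
library(stringr)

x <- c("GDP 2015: 1.2%", "GDP 2016: 1.5%", "CPI 2016: 0.3%")

is_gdp  <- str_detect(x, "^GDP")                          # TRUE TRUE FALSE
years   <- str_extract(x, "\\d{4}")                       # "2015" "2016" "2016"
values  <- as.numeric(str_extract(x, "[0-9.]+(?=%)"))    # 1.2 1.5 0.3 (number before "%")
cleaned <- str_replace_all(x, "%", " percent")            # "GDP 2015: 1.2 percent" ...
```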
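To illustrate the felm formula syntax, here is a sketch on simulated data (the data-generating process, with a true effect of 2, is an assumption made for the example). It assumes lfe and broom are installed. The four parts of the formula, separated by |, are: regressors, fixed effects, the IV specification (0 for none), and the clustering variable.

```r
library(lfe)
library(broom)

set.seed(42)
n <- 1000
df <- data.frame(
  firm = factor(sample(1:50, n, replace = TRUE)),
  year = factor(sample(2001:2010, n, replace = TRUE)),
  x    = rnorm(n)
)

# Simulated outcome: effect of x is 2, plus firm and year effects and noise
firm_fe <- rnorm(50)
year_fe <- rnorm(10)
df$y <- 2 * df$x + firm_fe[df$firm] + year_fe[df$year] + rnorm(n)

# y ~ regressors | fixed effects | IV spec (0 = none) | cluster variable
est <- felm(y ~ x | firm + year | 0 | firm, data = df)

summary(est)
tidy(est)  # coefficient table as a data frame, via broom
```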
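A minimal sketch of getting a LaTeX regression table out of stargazer, assuming the package is installed. The two models on the built-in mtcars data are only illustrative. stargazer() prints the LaTeX code and also returns it invisibly as a character vector.

```r
library(stargazer)

m1 <- lm(mpg ~ wt, data = mtcars)
m2 <- lm(mpg ~ wt + hp, data = mtcars)

# Side-by-side LaTeX table of both models; adding out = "regressions.tex"
# would also write the table to that file
tex_lines <- stargazer(m1, m2, type = "latex",
                       title = "Fuel efficiency regressions",
                       dep.var.labels = "Miles per gallon")
```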
Getting help
R has a very active community of users.
- Stack Overflow is where most people go nowadays. Create a profile and be sure to search existing questions and answers thoroughly before posting your own. If your question is more on the statistical side, no problem: Cross Validated is the place you need.
Resources for LaTeX
LaTeX is a document preparation system, the best one to prepare scientific documents, books or reports. Writers edit a plain-text document, which is then compiled to produce the final output, usually a pdf file.
Online editors
These editors are ideal to begin with LaTeX, because they do not require installing LaTeX on your computer: compilation is done automatically on a remote server. Both ShareLaTeX and Overleaf display the LaTeX source and the compiled output side by side (with a slight delay). They also store your documents online and allow several users to edit the same document at the same time without conflicts. Choosing one or the other is a matter of taste.
Getting help
- TeX is the Stack Exchange forum dedicated to LaTeX, a very good place to obtain information (most questions have already been asked)
Miscellaneous
- Classeur is a lightweight in-browser Markdown editor. Interesting features include a sharing and collaboration mode, and easy export of documents to LaTeX, HTML (particularly helpful to write clean content for your webpages) and PDF. Best of all is the offline mode: documents edited with Classeur are also stored locally
- Pandoc is a simple but powerful tool to convert documents between formats. For instance, it is quite helpful to convert Markdown notes into LaTeX
- Matt Gentzkow and Jesse Shapiro have put together a document with incredibly helpful advice on code and data management that everyone should read
- Ethics in scraping: before scraping, read this general reflection by James Densmore and this entry of the great rud.is blog about how to set wait times between requests