Just got back from JSM2009 in Washington, D.C. (we were newsworthy), where I also attended an R graphics workshop given by Hadley Wickham, Di Cook, and Heike Hofmann. (Next offering is October in Austin. Take your laptop.) Wow, was I behind the curve in using new R packages! Here are the goodies I picked up:
- ggplot2: sophisticated multivariate graphics based on the ideas from Leland Wilkinson’s The Grammar of Graphics. GoG annoyed the crap out of me when it first came out in 1999, because the implementation only existed in a prohibitively priced SPSS add-on. Wickham has built an R implementation and distributed it free to the world. Hadley’s book version, ggplot2, is freshly available from Springer, but if you hurry you can download it from his website.
- rggobi: GGobi is a great interactive visual tool for exploring high-dimensional multivariate data sets. Di and Heike walked a mixed gang of quants (me included) through the details of ggobi-ing via R. Di has a book on R and GGobi, Interactive and Dynamic Graphics for Data Analysis, also from Springer.
- RExcel: irate that Excel can’t do a simple boxplot, let alone any really useful statistics? Complain no more. One of my earliest sessions at JSM was also the most productive. Richard Heiberger and Erich Neuwirth demonstrated RExcel, gleefully invoking R functions and running R scripts from the comfort of an Excel worksheet. You can get a thorough rundown in Heiberger and Neuwirth’s new book R Through Excel, part of Springer’s new Use R! series. If you’re unfamiliar with R, download the RAndFriends package at the RExcel link; it installs EVERYTHING.
But wait! There’s more. At least two universities and two prominent government agencies have constructively used your tax dollars to put official data online in a more-than-just-accessible form.
- The Minnesota Population Center at the University of Minnesota maintains an extensive population database which includes the Integrated Public Use Microdata Series, containing samples from the US census and the annual American Community Survey; the American Time Use Survey; the Current Population Survey; IPUMS International; the National Historical Geographic Information System with both statistical and GIS data; and the North Atlantic Population Project. The icing on the cake is that most of these have embedded query and analysis tools which allow you to compute statistics and generate graphs online.
- The Institute for Social Research at the University of Michigan maintains the Substance Abuse and Mental Health Data Archive for the US Department of Health and Human Services. This site, too, has integrated query and analysis tools.
- The Centers for Disease Control have upgraded their online retrieval tools with Health, United States, 2008 and Birth Data.
- Finally, the Census Bureau has added a site on Local Employment Dynamics, which generates some great maps, and DataFerrett, a whole world of data search in itself.