Wednesday, December 7, 2011

A prerequisite for being a Data Scientist

So what should be in the toolkit of people who call themselves data scientists?

A fundamental skill is the ability to manipulate data. A data scientist should be familiar and comfortable with a number of platforms and scripting tools to get the job done. What is difficult in Excel might be trivial in R. When R struggles, switch to Unix command-line tools (or a programming language such as Python) to get that portion of the data munging done. Along the way, you pick up a lot of tips and tricks. For example: how do you read a big data file in R?
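As one illustration of handing a munging step to Python when a file is too big to load comfortably, here is a minimal sketch that streams a CSV one row at a time with the standard-library csv module and keeps only running totals in memory. It is a Python technique, not the R answer to the question above, and the file name and column names (big_file.csv, region, sales) are hypothetical placeholders.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, TextIO


def sum_by_key(handle: TextIO, key_col: str, value_col: str) -> Dict[str, float]:
    """Stream a CSV row by row and sum value_col for each distinct key_col.

    Only the current row plus the running totals are held in memory,
    so the input can be much larger than available RAM.
    """
    totals: Dict[str, float] = defaultdict(float)
    for row in csv.DictReader(handle):
        value = row[value_col].strip()
        if not value:  # skip blank cells instead of failing on float("")
            continue
        totals[row[key_col]] += float(value)
    return dict(totals)


if __name__ == "__main__":
    # Tiny in-memory stand-in for a large file; with a real (hypothetical)
    # file you would pass open("big_file.csv", newline="") instead.
    sample = io.StringIO("region,sales\nnorth,10\nsouth,5\nnorth,2.5\nsouth,\n")
    print(sum_by_key(sample, "region", "sales"))  # {'north': 12.5, 'south': 5.0}
```

The same pass-once idea is what Unix tools such as sort and awk exploit: nothing forces the whole file into memory, so the aggregated result can then be loaded into R or Excel for the rest of the analysis.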

The goal is to get the job done. Familiarity with a wide variety of tools, and expertise in some, is the hallmark of any good would-be data scientist.

Friday, December 2, 2011

O'Reilly's Data Science Kit - Books

It is not as if I don't have enough books (and material on the web) to read. But this list compiled by the O'Reilly team should make any data analyst salivate.

The books and video included in the set are:

  1. Data Analysis with Open Source Tools
  2. Designing Data Visualizations
  3. An Introduction to Machine Learning with Web Data (Video)
  4. Beautiful Data
  5. Think Stats
  6. R Cookbook
  7. R in a Nutshell
  8. Programming Collective Intelligence