fread now accepts shell commands, and it's handy

When it comes to reading text files, it is well known that the fread function in the data.table package is faster than read.table and its wrapper read.csv.
http://www.slideshare.net/sfchaos/datatable

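Well, fread can now take a shell command as its input.
https://r-forge.r-project.org/scm/viewvc.php/pkg/NEWS?view=markup&root=datatable
Looking inside the function, all it does is concatenate strings with paste0 and hand the result to shell (or system), but it is quietly useful.
Here is what using it looks like.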
enjoy!!!

# install.packages("data.table",repos="http://R-Forge.R-project.org", type="source")
library(data.table)

# Sample data
DT <- data.table(a = 1:1000,
                 d = rep(c("foo", "bar"), 500))
# write.csv also writes row names, which is why a V1 column shows up when reading back
write.csv(DT, "test.csv")

# Reading the whole file gives 1,000 rows
fread("test.csv")

        V1    a   d
   1:    1    1 foo
   2:    2    2 bar
   3:    3    3 foo
   4:    4    4 bar
   5:    5    5 foo
  ---              
 996:  996  996 bar
 997:  997  997 foo
 998:  998  998 bar
 999:  999  999 foo
1000: 1000 1000 bar

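# Extract only the lines containing foo and read them with fread
# (the header line does not match grep, so the columns get default names)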
fread("grep foo test.csv")

      V1  V2  V3
  1:   1   1 foo
  2:   3   3 foo
  3:   5   5 foo
  4:   7   7 foo
  5:   9   9 foo
 ---            
496: 991 991 foo
497: 993 993 foo
498: 995 995 foo
499: 997 997 foo
500: 999 999 foo
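
As far as I am aware, more recent data.table releases also let you pass the command explicitly through fread's cmd= argument, which makes it clear that a command, not a file name, is intended. The following is a minimal sketch under that assumption. It also assumes a Unix-like shell with grep on the PATH. The temporary file path and the names csv_path and foo_rows are only for illustration, and col.names is used to restore the column names that grep drops along with the header line.

```r
library(data.table)

# Write the same sample data to a temporary file (illustrative path)
csv_path <- tempfile(fileext = ".csv")
DT <- data.table(a = 1:1000, d = rep(c("foo", "bar"), 500))
write.csv(DT, csv_path)

# Pass the shell command via cmd= (assumes a data.table version that has it).
# grep removes the header line, so supply the column names by hand.
foo_rows <- fread(cmd = paste("grep foo", shQuote(csv_path)),
                  col.names = c("V1", "a", "d"))

nrow(foo_rows)
```

shQuote is used so that a path containing spaces still reaches grep as a single argument.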

# Setting verbose=TRUE lets you see what is going on

fread("grep foo test.csv", verbose=TRUE)

Input contains no \n. Taking this to be a filename to open
Detected eol as \n only (no \r afterwards), the UNIX and Mac standard.
Using line 30 to detect sep (the last non blank line in the first 'autostart') ... sep=','
Found 3 columns
First row with 3 fields occurs on line 1 (either column names or first row of data)
Some fields on line 1 are not type character (or are empty). Treating as a data row and using default column names.
Count of eol after first data row: 501
Subtracted 1 for last eol and any trailing empty lines, leaving 500 data rows
Type codes: 414 (first 5 rows)
Type codes: 414 (+middle 5 rows)
Type codes: 414 (+last 5 rows)
Type codes: 414 (after applying colClasses and integer64)
Type codes: 414 (after applying drop or select (if supplied)
Allocating 3 column slots (3 - 0 NULL)
   0.000s (  0%) Memory map (rerun may be quicker)
   0.000s (  0%) sep and header detection
   0.000s (  0%) Count rows (wc -l)
   0.001s (100%) Column type detection (first, middle and last 5 rows)
   0.000s (  0%) Allocation of 500x3 result (xMB) in RAM
   0.000s (  0%) Reading data
   0.000s (  0%) Allocation for type bumps (if any), including gc time if triggered
   0.000s (  0%) Coercing data already read in type bumps (if any)
   0.000s (  0%) Changing na.strings to NA
   0.001s        Total
      V1  V2  V3
  1:   1   1 foo
  2:   3   3 foo
  3:   5   5 foo
  4:   7   7 foo
  5:   9   9 foo
 ---            
496: 991 991 foo
497: 993 993 foo
498: 995 995 foo
499: 997 997 foo
500: 999 999 foo
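
To make the "paste0 and hand it to shell (or system)" description above concrete, here is a rough sketch of that mechanism. It is not data.table's actual source: read_via_shell is a hypothetical helper, and base R's read.csv stands in for fread's own parser so that the snippet runs without the package. It assumes a Unix-like environment where grep is available.

```r
# Hypothetical helper illustrating the idea, not fread's real implementation:
# run the command, redirect its output to a temp file, then read that file.
read_via_shell <- function(cmd) {
  tmp <- tempfile()
  on.exit(unlink(tmp), add = TRUE)
  # Build the full command line by plain string concatenation
  full_cmd <- paste0(cmd, " > ", shQuote(tmp))
  # shell() exists only on Windows; system() is used elsewhere
  if (.Platform$OS.type == "windows") shell(full_cmd) else system(full_cmd)
  read.csv(tmp, header = FALSE, stringsAsFactors = FALSE)
}

# Usage with the same kind of sample data as above (illustrative temp path)
csv_path <- tempfile(fileext = ".csv")
write.csv(data.frame(a = 1:1000, d = rep(c("foo", "bar"), 500)), csv_path)

foo_rows <- read_via_shell(paste("grep foo", shQuote(csv_path)))
nrow(foo_rows)
```

Because the filtering happens in the shell before R parses anything, only the matching lines are ever loaded into memory.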