fread now accepts shell commands, which is handy
When it comes to reading text files, it is well known that the fread function in the data.table package is faster than read.table or its wrapper read.csv.
http://www.slideshare.net/sfchaos/datatable
Well, fread can now take a shell command as its input.
https://r-forge.r-project.org/scm/viewvc.php/pkg/NEWS?view=markup&root=datatable
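Looking inside the function, all it does is build a string with paste0 and hand it to shell (or system), but it is handy in a low-key way.
Usage looks something like the following.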
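To make the "paste0 plus shell/system" idea concrete, here is a minimal sketch in base R. It is not data.table's actual code: `read_via_command` is a hypothetical helper, the redirect-to-a-temporary-file step is my assumption about the general approach, and `read.csv` stands in for fread's parser so the sketch runs without extra packages. It assumes `grep` is on the PATH.

```r
# Hypothetical helper (not part of data.table): run a shell command,
# capture its output in a temporary file, then parse that file.
read_via_command <- function(cmd, ...) {
  tmp <- tempfile()
  on.exit(unlink(tmp), add = TRUE)  # remove the temp file when done

  # Build "(<cmd>) > <tmp>" by plain string concatenation
  full_cmd <- paste0("(", cmd, ") > ", shQuote(tmp))

  # shell() exists only on Windows; elsewhere use system()
  if (.Platform$OS.type == "windows") {
    shell(full_cmd)
  } else {
    system(full_cmd)
  }

  # read.csv is a stand-in here for fread's own parser
  read.csv(tmp, header = FALSE, ...)
}

# Usage: a small illustrative file, filtered with grep before parsing
csv <- tempfile(fileext = ".csv")
write.csv(data.frame(a = 1:10, d = rep(c("foo", "bar"), 5)),
          csv, row.names = FALSE)
res <- read_via_command(paste("grep foo", shQuote(csv)))
nrow(res)
```

The point of the design is that the filtering happens before R ever parses the data, so only the matching lines are read in.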
enjoy!!!
# install.packages("data.table",repos="http://R-Forge.R-project.org", type="source")
library(data.table)

# Sample data
DT <- data.table(
  a=1:1000,
  d=rep(c("foo", "bar"), 500)
)
write.csv(DT, "test.csv")

# Reading the whole file gives 1,000 rows
fread("test.csv")
#>         V1    a   d
#>    1:    1    1 foo
#>    2:    2    2 bar
#>    3:    3    3 foo
#>    4:    4    4 bar
#>    5:    5    5 foo
#>   ---
#>  996:  996  996 bar
#>  997:  997  997 foo
#>  998:  998  998 bar
#>  999:  999  999 foo
#> 1000: 1000 1000 bar

# Extract only the lines containing foo and read them with fread
fread("grep foo test.csv")
#>       V1  V2  V3
#>   1:   1   1 foo
#>   2:   3   3 foo
#>   3:   5   5 foo
#>   4:   7   7 foo
#>   5:   9   9 foo
#>  ---
#> 496: 991 991 foo
#> 497: 993 993 foo
#> 498: 995 995 foo
#> 499: 997 997 foo
#> 500: 999 999 foo

# Setting verbose=TRUE shows what is going on
fread("grep foo test.csv", verbose=TRUE)
#> Input contains no \n. Taking this to be a filename to open
#> Detected eol as \n only (no \r afterwards), the UNIX and Mac standard.
#> Using line 30 to detect sep (the last non blank line in the first 'autostart') ... sep=','
#> Found 3 columns
#> First row with 3 fields occurs on line 1 (either column names or first row of data)
#> Some fields on line 1 are not type character (or are empty). Treating as a data row and using default column names.
#> Count of eol after first data row: 501
#> Subtracted 1 for last eol and any trailing empty lines, leaving 500 data rows
#> Type codes: 414 (first 5 rows)
#> Type codes: 414 (+middle 5 rows)
#> Type codes: 414 (+last 5 rows)
#> Type codes: 414 (after applying colClasses and integer64)
#> Type codes: 414 (after applying drop or select (if supplied)
#> Allocating 3 column slots (3 - 0 NULL)
#>    0.000s (  0%) Memory map (rerun may be quicker)
#>    0.000s (  0%) sep and header detection
#>    0.000s (  0%) Count rows (wc -l)
#>    0.001s (100%) Column type detection (first, middle and last 5 rows)
#>    0.000s (  0%) Allocation of 500x3 result (xMB) in RAM
#>    0.000s (  0%) Reading data
#>    0.000s (  0%) Allocation for type bumps (if any), including gc time if triggered
#>    0.000s (  0%) Coercing data already read in type bumps (if any)
#>    0.000s (  0%) Changing na.strings to NA
#>    0.001s        Total
#>       V1  V2  V3
#>   1:   1   1 foo
#>   2:   3   3 foo
#>   3:   5   5 foo
#>   4:   7   7 foo
#>   5:   9   9 foo
#>  ---
#> 496: 991 991 foo
#> 497: 993 993 foo
#> 498: 995 995 foo
#> 499: 997 997 foo
#> 500: 999 999 foo
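One thing the grep example shows is that the header line gets filtered out too, so the columns come back as V1, V2, V3. Here is a hedged variation that puts the names back with `col.names`. It assumes a newer data.table release in which fread has an explicit `cmd` argument and `fwrite` is available (the version discussed in this post predates both, as far as I know), and that `grep` is on the PATH. The temporary file path and column names are illustrative.

```r
library(data.table)

# Small sample file in a temporary location (illustrative path)
csv <- tempfile(fileext = ".csv")
fwrite(data.table(a = 1:1000, d = rep(c("foo", "bar"), 500)), csv)

# Build the command as a string; shQuote() guards against spaces in the path
cmd <- paste("grep foo", shQuote(csv))

# grep drops the header line, so turn off header detection
# and supply the column names by hand
res <- fread(cmd = cmd, header = FALSE, col.names = c("a", "d"))
nrow(res)
```

Passing the command through `cmd` rather than the first argument also makes it explicit at the call site that a shell command is being run.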