State of sas7bdat file readers

Published Sep 23, 2017. Last updated Feb 26, 2018.

Notes on all packages (that I know of) that can read sas7bdat files into R, Python, or Julia.

Packages list

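| Package | CRAN (R only) | Compressed (B) | Compressed (C) | Chunks | Specific Columns | Speed Rank | Write |
|---|---|---|---|---|---|---|---|
| SASLib.jl | NA | Y | Y | Y | Y | 1 | N |
| haven | Y | N | Y | N | Y | 2 | Y |
| pandas/saslib | NA | Y | Y | Y | N | 3 | |
| sas7bdat | Y | N | N | N | N | ? | N |
| sas7bdat.parso | N | N | N | N | N | ? | N |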

Meaning of columns

  • CRAN: is the package available on CRAN?
  • Compressed: can the package read compressed sas7bdat files? (B) is binary compression and (C) is char compression. SAS's COMPRESS=YES option produces (C).
  • Chunks: can the package read the file in chunks instead of trying to load the whole dataset into memory at once?
  • Specific Columns: can the package read only the columns that the user specifies instead of every column?
  • Speed Rank: a ranking of how quickly the package can read/write sas7bdat files.
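To make the "Chunks" and "Specific Columns" features concrete, here is a minimal sketch in Python using pandas' `read_sas` with its `chunksize` argument. The helper names `iter_sas_chunks` and `total_rows`, the file name, and the column names are all hypothetical. Note that the column selection here happens after each chunk has been parsed, so it only trims memory per chunk; it is not true column-selective reading, which is consistent with the "N" for pandas in the table above.

```python
from typing import Iterable, Iterator, List, Optional, Sized


def iter_sas_chunks(
    path: str,
    chunksize: int = 100_000,
    columns: Optional[List[str]] = None,
) -> Iterator["pandas.DataFrame"]:
    """Yield DataFrames of at most `chunksize` rows from a sas7bdat file.

    Hypothetical helper: `columns` is applied after each chunk is read,
    so every column is still parsed from disk.
    """
    import pandas as pd  # imported lazily so total_rows() works without pandas

    # With chunksize set, read_sas returns an iterable reader, not a DataFrame.
    reader = pd.read_sas(path, format="sas7bdat", chunksize=chunksize)
    try:
        for chunk in reader:
            yield chunk if columns is None else chunk[columns]
    finally:
        reader.close()


def total_rows(chunks: Iterable[Sized]) -> int:
    """Count rows across chunks without holding more than one in memory."""
    return sum(len(chunk) for chunk in chunks)


# Usage (assumes a file named "data.sas7bdat" with columns "id" and "amount"):
# n = total_rows(iter_sas_chunks("data.sas7bdat", chunksize=50_000,
#                                columns=["id", "amount"]))
```

Because the chunks come from a generator, only one chunk is in memory at a time, which is the property that matters when the dataset is larger than the laptop's RAM.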

Background

Recently I again had the displeasure of using Base SAS on a laptop to process large amounts of data as part of a consulting engagement. I wanted to read SAS files in chunks and process them using R; however, none of the R packages has all the features that I think are crucial, including the ability to read the data one chunk at a time. I therefore want to keep a record of the current state of SAS readers in the Julia/R/Python-verse and hopefully raise awareness of important (but missing) features in these packages.
