State of sas7bdat file readers
Notes on all packages (that I know of) that can read sas7bdat files into R, Python, or Julia.
Packages list
Package | CRAN (R only) | Compressed (B) | Compressed (C) | Chunks | Specific Columns | Speed Rank | Write |
---|---|---|---|---|---|---|---|
SASLib.jl | NA | Y | Y | Y | Y | 1 | N |
haven | Y | N | Y | N | Y | 2 | Y |
pandas/saslib | NA | Y | Y | Y | N | 3 | |
sas7bdat | Y | N | N | N | N | ? | N |
sas7bdat.parso | N | N | N | N | N | ? | N |
Meaning of columns
- CRAN: is the package available on CRAN? (R packages only)
- Compressed: can the package read compressed sas7bdat files? (B) is binary and (C) is char. SAS compress = Yes corresponds to (C).
- Chunks: can the package read the file in chunks instead of trying to load the whole dataset into memory at once?
- Specific Columns: can the package read only the columns that the user specifies instead of every column?
- Speed Rank: a ranking of how quickly the package can read/write sas7bdat files.
- Write: can the package write sas7bdat files?
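To make the Chunks and Specific Columns features concrete, here is a minimal Python sketch of chunked reading with pandas. It assumes a reasonably recent pandas (1.2 or later, so that the reader returned by `read_sas` can be used in a `with` block). The helper names `iter_sas_chunks` and `process_chunks` are made up for illustration and are not part of any package. Since the table marks pandas as unable to read specific columns, any column selection in this sketch would have to happen after each chunk has been read.

```python
from typing import Any, Callable, Iterable, Iterator


def iter_sas_chunks(path: str, chunksize: int = 100_000) -> Iterator[Any]:
    """Yield DataFrames of at most `chunksize` rows from a sas7bdat file."""
    import pandas as pd  # imported lazily so process_chunks has no dependencies

    # With chunksize set, read_sas returns an iterable reader, not a DataFrame.
    # Assumption: pandas >= 1.2, where the reader works as a context manager.
    with pd.read_sas(path, format="sas7bdat", chunksize=chunksize) as reader:
        for chunk in reader:
            yield chunk


def process_chunks(chunks: Iterable[Any], handle: Callable[[Any], None]) -> int:
    """Pass each chunk to `handle` and return the total number of rows seen."""
    total_rows = 0
    for chunk in chunks:
        handle(chunk)            # only one chunk is held in memory at a time
        total_rows += len(chunk)
    return total_rows


# Hypothetical usage (file and column names are placeholders):
#   sums = []
#   n = process_chunks(
#       iter_sas_chunks("big_table.sas7bdat"),
#       lambda df: sums.append(df["amount"].sum()),  # select columns after reading
#   )
```

Keeping the chunk source separate from the processing loop means the same loop can be reused with another reader that yields chunks.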
Background
Recently, as part of a consulting engagement, I once again had the displeasure of using Base SAS on a laptop to process large amounts of data. I wanted to read SAS files in chunks and process them in R; however, none of the R packages has all the features I consider crucial, including the ability to read the data one chunk at a time. I therefore want to keep a record of the current state of SAS readers in the Julia/R/Python-verse and, hopefully, raise awareness of important (but missing) features in these packages.
Thank you for this post. I’m currently working on a C++17 reader (https://github.com/olivia76/cpp-sas7bdat).