| Title: | Efficient Serialization of R Objects |
|---|---|
| Description: | Streamlines and accelerates the process of saving and loading R objects, improving speed and compression compared to other methods. The package provides two compression formats: the 'qs2' format, which uses R serialization via the C API while optimizing compression and disk I/O, and the 'qdata' format, featuring custom serialization for slightly faster performance and better compression. Additionally, the 'qs2' format can be directly converted to the standard 'RDS' format, ensuring long-term compatibility with future versions of R. |
| Authors: | Travers Ching [aut, cre, cph], Yann Collet [ctb, cph] (Yann Collet is the author of the bundled zstd), Facebook, Inc. [cph] (Facebook is the copyright holder of the bundled zstd code), Reichardt Tino [ctb, cph] (Contributor/copyright holder of zstd bundled code), Skibinski Przemyslaw [ctb, cph] (Contributor/copyright holder of zstd bundled code), Mori Yuta [ctb, cph] (Contributor/copyright holder of zstd bundled code), Francesc Alted [ctb, cph] (Shuffling routines derived from Blosc library) |
| Maintainer: | Travers Ching <[email protected]> |
| License: | GPL-3 |
| Version: | 0.2.2 |
| Built: | 2026-06-03 08:29:01 UTC |
| Source: | https://github.com/qsbase/qs2 |
Decodes a Z85 encoded string back to binary
base85_decode(encoded_string)
encoded_string |
A string. |
The original raw vector.
Encodes binary data (a raw vector) as ASCII text using Z85 encoding format.
base85_encode(rawdata)
rawdata |
A raw vector. |
Z85 is a binary-to-ASCII encoding format created by Pieter Hintjens in 2010 and specified in ZeroMQ RFC 32. Its dictionary uses 85 of the 94 printable ASCII characters. Other base 85 encoding schemes exist, including Ascii85, which was popularized by Adobe.
Z85 is distinguished by its choice of dictionary, which makes encoded strings easy to include in source code for many programming languages. The dictionary excludes all quote marks and the backslash, so encoded strings need no escaping in R and most other languages.
Note: although the official specification restricts input length to multiples of four bytes, the implementation here works with any input length.
The overhead (extra bytes used relative to binary) is 25%. In comparison, base 64 encoding has an overhead of 33.33%.
A string representation of the raw vector.
https://rfc.zeromq.org/spec/32/
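The overhead figures quoted above follow from the block sizes of each encoding. A quick arithmetic check in base R, assuming the standard block sizes (Z85 maps every 4 input bytes to 5 characters, base 64 maps every 3 input bytes to 4 characters):

```r
# Overhead = (encoded characters / input bytes) - 1
z85_overhead    <- 5 / 4 - 1   # 4 bytes -> 5 characters
base64_overhead <- 4 / 3 - 1   # 3 bytes -> 4 characters

# Express both as percentages, rounded to two decimals
round(100 * c(z85 = z85_overhead, base64 = base64_overhead), 2)
```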
Decodes a basE91 encoded string back to binary
base91_decode(encoded_string)
encoded_string |
A string. |
The original raw vector.
Encodes binary data (a raw vector) as ASCII text using basE91 encoding format.
base91_encode(rawdata, quote_character = "\"")
rawdata |
A raw vector. |
quote_character |
The character to use in the encoding, replacing the double quote character. Must be either a single quote ( |
basE91 (capital E for stylization) is a binary to ASCII encoding format created by Joachim Henke in 2005.
The overhead (extra bytes used relative to binary) is 22.97% on average. In comparison, base 64 encoding has an overhead of 33.33%.
The original encoding uses a dictionary of 91 out of 94 printable ASCII characters excluding - (dash), \ (backslash) and ' (single quote).
The original encoding does include double quote characters, which are less than ideal for strings in R. Therefore,
you can use the quote_character parameter to substitute dash or single quote.
A string representation of the raw vector.
https://base91.sourceforge.net/
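The 22.97% average can be reproduced with a little arithmetic. The sketch below rests on assumptions that are not stated in this manual: that, as in the reference basE91 implementation, each pair of output characters carries 13 input bits, or 14 bits when the 13-bit value falls in 0..88, and that the input bytes are uniformly random.

```r
# Assumption: two output characters (16 bits of ASCII) encode 13 input bits,
# or 14 bits in the 89 out of 2^13 cases where the 13-bit value is <= 88.
p14      <- 89 / 2^13
avg_bits <- 13 * (1 - p14) + 14 * p14

# Average overhead for random input, as a percentage
avg_overhead <- 16 / avg_bits - 1
round(100 * avg_overhead, 2)

# Bounds under the same assumption: all 14-bit pairs vs. all 13-bit pairs
round(100 * c(best = 16 / 14 - 1, worst = 16 / 13 - 1), 2)
```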
Shuffles a raw vector using BLOSC shuffle routines.
blosc_shuffle_raw(data, bytesofsize)
data |
A raw vector to be shuffled. |
bytesofsize |
Either |
The shuffled vector.
x <- serialize(1L:1000L, NULL)
xshuf <- blosc_shuffle_raw(x, 4)
xunshuf <- blosc_unshuffle_raw(xshuf, 4)
Un-shuffles a raw vector using BLOSC un-shuffle routines.
blosc_unshuffle_raw(data, bytesofsize)
data |
A raw vector to be unshuffled. |
bytesofsize |
Either |
The unshuffled vector.
x <- serialize(1L:1000L, NULL)
xshuf <- blosc_shuffle_raw(x, 4)
xunshuf <- blosc_unshuffle_raw(xshuf, 4)
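To illustrate what byte shuffling does, here is a conceptual sketch in base R. It is not the package's implementation: `shuffle_bytes()` and `unshuffle_bytes()` are hypothetical helpers that regroup the bytes of fixed-width elements so that all first bytes come first, then all second bytes, and so on. Grouping similar bytes together tends to help a compressor.

```r
# Hypothetical helpers, for illustration only (not part of qs2).
shuffle_bytes <- function(bytes, size) {
  stopifnot(length(bytes) %% size == 0)
  # Index matrix with one column per element; transposing groups by byte position
  idx <- as.vector(t(matrix(seq_along(bytes), nrow = size)))
  bytes[idx]
}

unshuffle_bytes <- function(bytes, size) {
  stopifnot(length(bytes) %% size == 0)
  idx <- as.vector(t(matrix(seq_along(bytes), nrow = size)))
  out <- raw(length(bytes))
  out[idx] <- bytes   # invert the permutation applied by shuffle_bytes()
  out
}

x <- writeBin(1:1000, raw(), endian = "little")  # 4 bytes per integer
xshuf <- shuffle_bytes(x, 4)
identical(unshuffle_bytes(xshuf, 4), x)

# Compare gzip-compressed sizes before and after shuffling
c(plain    = length(memCompress(x, "gzip")),
  shuffled = length(memCompress(xshuf, "gzip")))
```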
A helper function for decoding a string produced by encode_source(), reversing its base91_encode() and qs_serialize() steps to recover the original object.
decode_source(string)
string |
A string to decode. |
The original (decoded) object.
encode_source() for more details.
A helper function for encoding and compressing a file or string to ASCII using base91_encode() and qs_serialize() with the highest compression level.
encode_source(x = NULL, file = NULL, width = 120)
x |
The object to encode (if |
file |
The file to encode (if |
width |
The output will be broken up into individual strings, with |
The encode_source() and decode_source() functions are useful for storing small amounts of data or text inline to a .R or .Rmd file.
A character vector in base91 representing the compressed original file or object.
set.seed(1); data <- sample(500)
result <- encode_source(data)
# Note: the result string is not guaranteed to be consistent between qs or zstd versions
# but will always properly decode regardless
print(result)
result <- decode_source(result)
identical(result, data) # expected to return TRUE
Creates a small deterministic data frame with string, numeric, and integer columns. The numeric and integer columns combine seeded random noise with a mild linear trend so downstream smoke tests exercise both repetition and variation.
generate_test_data(nrows, seed = 1L)
nrows |
Number of rows to generate. |
seed |
Integer seed used to make the output reproducible. |
A data frame with columns string, numeric, and int.
Deserializes a raw vector to an object using the qdata format.
qd_deserialize(input, use_alt_rep = qopt("use_alt_rep"), validate_checksum = qopt("validate_checksum"), nthreads = qopt("nthreads"))
input |
The raw vector to deserialize. |
use_alt_rep |
Request ALTREP when reading qdata string data. This option is temporarily disabled; if TRUE, qs2 warns and falls back to ordinary character vectors (the initial value is FALSE). |
validate_checksum |
If TRUE, validate checksum before deserialization and error on mismatch (or missing checksum). If FALSE, checksum is computed during read and mismatches (or missing checksum) produce a warning after reading (the initial value is FALSE). |
nthreads |
The number of threads to use when reading data (the initial value is 1L). When TBB is not available, values greater than 1 emit a warning and fall back to 1. |
The deserialized object.
x <- data.frame(int = sample(1e3, replace=TRUE), num = rnorm(1e3),
                char = sample(state.name, 1e3, replace=TRUE), stringsAsFactors = FALSE)
xserialized <- qd_serialize(x)
x2 <- qd_deserialize(xserialized)
identical(x, x2) # returns TRUE
Reads an object that was saved to disk in the qdata format.
qd_read(file, use_alt_rep = qopt("use_alt_rep"), validate_checksum = qopt("validate_checksum"), nthreads = qopt("nthreads"))
file |
The file name/path. |
use_alt_rep |
Request ALTREP when reading qdata string data. This option is temporarily disabled; if TRUE, qs2 warns and falls back to ordinary character vectors (the initial value is FALSE). |
validate_checksum |
If TRUE, validate checksum before deserialization and error on mismatch (or missing checksum). If FALSE, checksum is computed during read and mismatches (or missing checksum) produce a warning after reading (the initial value is FALSE). |
nthreads |
The number of threads to use when reading data (the initial value is 1L). When TBB is not available, values greater than 1 emit a warning and fall back to 1. |
The object stored in file.
x <- data.frame(int = sample(1e3, replace=TRUE), num = rnorm(1e3),
                char = sample(state.name, 1e3, replace=TRUE), stringsAsFactors = FALSE)
myfile <- tempfile()
qd_save(x, myfile)
x2 <- qd_read(myfile)
identical(x, x2) # returns TRUE
Saves an object to disk using the qdata format.
qd_save(object, file, compress_level = qopt("compress_level"), shuffle = qopt("shuffle"), warn_unsupported_types = qopt("warn_unsupported_types"), nthreads = qopt("nthreads"))
object |
The object to save. |
file |
The file name/path. |
compress_level |
The compression level used (the initial value is 3L). The maximum and minimum possible values depend on the version of the ZSTD library used. As of ZSTD 1.5.7 the maximum compression level is 22, and the minimum is -131072. Usually, values in the low positive range offer very good performance in terms of speed and compression. |
shuffle |
Whether to allow byte shuffling when compressing data (the initial value is TRUE). |
warn_unsupported_types |
Whether to warn when saving an object with an unsupported type (the initial value is TRUE). |
nthreads |
The number of threads to use when compressing data (the initial value is 1L). When TBB is not available, values greater than 1 emit a warning and fall back to 1. |
No value is returned. The file is written to disk.
x <- data.frame(int = sample(1e3, replace=TRUE), num = rnorm(1e3),
                char = sample(state.name, 1e3, replace=TRUE), stringsAsFactors = FALSE)
myfile <- tempfile()
qd_save(x, myfile)
x2 <- qd_read(myfile)
identical(x, x2) # returns TRUE
Serializes an object to a raw vector using the qdata format.
qd_serialize(object, compress_level = qopt("compress_level"), shuffle = qopt("shuffle"), warn_unsupported_types = qopt("warn_unsupported_types"), nthreads = qopt("nthreads"))
object |
The object to save. |
compress_level |
The compression level used (the initial value is 3L). The maximum and minimum possible values depend on the version of the ZSTD library used. As of ZSTD 1.5.7 the maximum compression level is 22, and the minimum is -131072. Usually, values in the low positive range offer very good performance in terms of speed and compression. |
shuffle |
Whether to allow byte shuffling when compressing data (the initial value is TRUE). |
warn_unsupported_types |
Whether to warn when saving an object with an unsupported type (the initial value is TRUE). |
nthreads |
The number of threads to use when compressing data (the initial value is 1L). When TBB is not available, values greater than 1 emit a warning and fall back to 1. |
The serialized object as a raw vector.
x <- data.frame(int = sample(1e3, replace=TRUE), num = rnorm(1e3),
                char = sample(state.name, 1e3, replace=TRUE), stringsAsFactors = FALSE)
xserialized <- qd_serialize(x)
x2 <- qd_deserialize(xserialized)
identical(x, x2) # returns TRUE
Get or set a global qs2 option.
qopt(parameter, value = NULL)
parameter |
A character string specifying the option to access. Must be one of "compress_level", "shuffle", "nthreads", "validate_checksum", "warn_unsupported_types", or "use_alt_rep". |
value |
If |
This function provides an interface to retrieve or update internal qs2 options such as compression level, shuffle flag, number of threads, checksum validation, warning for unsupported types, and requested ALTREP usage. It directly calls the underlying C-level functions.
The default settings are:
compress_level: 3L
shuffle: TRUE
nthreads: 1L
validate_checksum: FALSE
warn_unsupported_types: TRUE (used only by the qdata writers, qd_save and qd_serialize)
use_alt_rep: FALSE (accepted by qd_read and qd_deserialize, but temporarily disabled)
When parameter = "use_alt_rep" is set to TRUE, qdata reads currently
warn and fall back to ordinary character vectors.
When value is NULL, the current value of the specified option is returned.
Otherwise, the option is set to value and the new value is returned invisibly.
If value is NULL, returns the current value of the specified option.
Otherwise, sets the option and returns the new value invisibly.
# Get the current compression level:
qopt("compress_level")
# Set the compression level to 5:
qopt("compress_level", value = 5)
# Get the current shuffle setting:
qopt("shuffle")
# Get the current setting for warn_unsupported_types (used in qd_save):
qopt("warn_unsupported_types")
# Get the current setting for use_alt_rep:
qopt("use_alt_rep")
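Because qopt() changes a global setting, it can be useful to restore the previous value after a temporary change. A minimal sketch using only the functions documented here (the variable names are illustrative, and it assumes qs2 is installed):

```r
library(qs2)

# Remember the current global compression level
old_level <- qopt("compress_level")

# Temporarily raise it; qs_serialize() picks up the new default
qopt("compress_level", value = 9L)
raw_high <- qs_serialize(mtcars)

# Restore the previous setting
qopt("compress_level", value = old_level)
```

Passing compress_level directly to qs_serialize() avoids touching the global state at all, which is usually simpler for one-off calls.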
A helper function for caching objects produced by long-running tasks.
qs_cache(
  expr,
  name,
  envir = parent.frame(),
  cache_dir = ".cache",
  clear = FALSE,
  prompt = TRUE,
  qs_save_params = list(),
  qs_read_params = list(),
  verbose = TRUE
)
expr |
The expression to evaluate. |
name |
The cached expression name (see details). |
envir |
The environment to evaluate |
cache_dir |
The directory to store cached files in. |
clear |
Set to |
prompt |
Whether to prompt before clearing. |
qs_save_params |
List of parameters passed on to |
qs_read_params |
List of parameters passed on to |
verbose |
Boolean. If |
This is a (very) simple helper function to cache results of long running calculations. There are other packages specializing in caching data that are more feature complete.
The evaluated expression is saved with qs_save() in <cache_dir>/<name>.qs2.
If the file already exists, the expression is not evaluated; instead, the cached result is read using qs_read() and returned.
To clear a cached result, you can manually delete the associated .qs2 file, or you can call qs_cache() with clear = TRUE.
If prompt is also TRUE a prompt will be given asking you to confirm deletion.
If name is not specified, all cached results in cache_dir will be removed.
cache_dir <- tempdir()
a <- 1
b <- 5
# not cached
result <- qs_cache({a + b}, name="aplusb", cache_dir = cache_dir, qs_save_params = list(compress_level = 5))
# cached
result <- qs_cache({a + b}, name="aplusb", cache_dir = cache_dir, qs_save_params = list(compress_level = 5))
# clear cached result
qs_cache(name="aplusb", clear=TRUE, prompt=FALSE, cache_dir = cache_dir)
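The caching behavior described above (evaluate once, then read from disk) can be sketched in a few lines of base R. This is not how qs_cache() is implemented: `simple_cache()` is a hypothetical helper, and saveRDS()/readRDS() stand in for qs_save()/qs_read() so the sketch runs without the package.

```r
# Hypothetical helper illustrating the evaluate-or-read logic only.
simple_cache <- function(expr, name, cache_dir = ".cache") {
  dir.create(cache_dir, showWarnings = FALSE, recursive = TRUE)
  path <- file.path(cache_dir, paste0(name, ".rds"))
  if (file.exists(path)) {
    return(readRDS(path))  # cached: expr is never evaluated (lazy evaluation)
  }
  result <- expr           # first use forces evaluation of the expression
  saveRDS(result, path)
  result
}

cache_dir <- tempfile("cache_sketch")
calls <- 0
first  <- simple_cache({calls <- calls + 1; 1 + 5}, "aplusb", cache_dir)
second <- simple_cache({calls <- calls + 1; 1 + 5}, "aplusb", cache_dir)
calls  # counts how many times the expression actually ran
```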
Deserializes a raw vector to an object using the qs2 format.
qs_deserialize(input, validate_checksum = qopt("validate_checksum"), nthreads = qopt("nthreads"))
input |
The raw vector to deserialize. |
validate_checksum |
If TRUE, validate checksum before deserialization and error on mismatch (or missing checksum). If FALSE, checksum is computed during read and mismatches (or missing checksum) produce a warning after reading (the initial value is FALSE). |
nthreads |
The number of threads to use when reading data (the initial value is 1L). When TBB is not available, values greater than 1 emit a warning and fall back to 1. |
The deserialized object.
x <- data.frame(int = sample(1e3, replace=TRUE), num = rnorm(1e3),
                char = sample(state.name, 1e3, replace=TRUE), stringsAsFactors = FALSE)
xserialized <- qs_serialize(x)
x2 <- qs_deserialize(xserialized)
identical(x, x2) # returns TRUE
Reads an object that was saved to disk in the qs2 format.
qs_read(file, validate_checksum = qopt("validate_checksum"), nthreads = qopt("nthreads"))
file |
The file name/path. |
validate_checksum |
If TRUE, validate checksum before deserialization and error on mismatch (or missing checksum). If FALSE, checksum is computed during read and mismatches (or missing checksum) produce a warning after reading (the initial value is FALSE). |
nthreads |
The number of threads to use when reading data (the initial value is 1L). When TBB is not available, values greater than 1 emit a warning and fall back to 1. |
The object stored in file.
x <- data.frame(int = sample(1e3, replace=TRUE), num = rnorm(1e3),
                char = sample(state.name, 1e3, replace=TRUE), stringsAsFactors = FALSE)
myfile <- tempfile()
qs_save(x, myfile)
x2 <- qs_read(myfile)
identical(x, x2) # returns TRUE
Reads an object in a file serialized to disk using qs_savem().
qs_readm(file, env = parent.frame(), ...)
file |
The file name/path. |
env |
The environment where the data should be loaded. Default is the calling environment ( |
... |
Additional arguments passed to qs_read(). |
This function extends qs_read to replicate the functionality of base::load() to load multiple saved objects into your workspace.
Nothing is explicitly returned, but the function will load the saved objects into the workspace.
x1 <- data.frame(int = sample(1e3, replace=TRUE), num = rnorm(1e3),
                 char = sample(starnames$`IAU Name`, 1e3, replace=TRUE), stringsAsFactors = FALSE)
x2 <- data.frame(int = sample(1e3, replace=TRUE), num = rnorm(1e3),
                 char = sample(starnames$`IAU Name`, 1e3, replace=TRUE), stringsAsFactors = FALSE)
myfile <- tempfile()
qs_savem(x1, x2, file=myfile)
rm(x1, x2)
qs_readm(myfile)
exists('x1') && exists('x2') # returns TRUE
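The env argument makes it possible to load saved objects without overwriting variables in the calling environment. A small sketch, assuming qs2 is installed; the object names `a` and `b` are illustrative:

```r
library(qs2)

a <- 1:10
b <- letters
myfile <- tempfile()
qs_savem(a, b, file = myfile)

# Load into a fresh environment instead of the workspace
e <- new.env()
qs_readm(myfile, env = e)
ls(e)
```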
Saves an object to disk using the qs2 format.
qs_save(object, file, compress_level = qopt("compress_level"), shuffle = qopt("shuffle"), nthreads = qopt("nthreads"))
object |
The object to save. |
file |
The file name/path. |
compress_level |
The compression level used (the initial value is 3L). The maximum and minimum possible values depend on the version of the ZSTD library used. As of ZSTD 1.5.7 the maximum compression level is 22, and the minimum is -131072. Usually, values in the low positive range offer very good performance in terms of speed and compression. |
shuffle |
Whether to allow byte shuffling when compressing data (the initial value is TRUE). |
nthreads |
The number of threads to use when compressing data (the initial value is 1L). When TBB is not available, values greater than 1 emit a warning and fall back to 1. |
No value is returned. The file is written to disk.
x <- data.frame(int = sample(1e3, replace=TRUE), num = rnorm(1e3),
                char = sample(state.name, 1e3, replace=TRUE), stringsAsFactors = FALSE)
myfile <- tempfile()
qs_save(x, myfile)
x2 <- qs_read(myfile)
identical(x, x2) # returns TRUE
Saves (serializes) multiple objects to disk.
qs_savem(...)
... |
Objects to serialize. Named arguments will be passed to |
This function extends qs_save() to replicate the functionality of base::save() to save multiple objects. Read them back with qs_readm().
x1 <- data.frame(int = sample(1e3, replace=TRUE), num = rnorm(1e3),
                 char = sample(starnames$`IAU Name`, 1e3, replace=TRUE), stringsAsFactors = FALSE)
x2 <- data.frame(int = sample(1e3, replace=TRUE), num = rnorm(1e3),
                 char = sample(starnames$`IAU Name`, 1e3, replace=TRUE), stringsAsFactors = FALSE)
myfile <- tempfile()
qs_savem(x1, x2, file=myfile)
rm(x1, x2)
qs_readm(myfile)
exists('x1') && exists('x2') # returns TRUE
Serializes an object to a raw vector using the qs2 format.
qs_serialize(object, compress_level = qopt("compress_level"), shuffle = qopt("shuffle"), nthreads = qopt("nthreads"))
object |
The object to save. |
compress_level |
The compression level used (the initial value is 3L). The maximum and minimum possible values depend on the version of the ZSTD library used. As of ZSTD 1.5.7 the maximum compression level is 22, and the minimum is -131072. Usually, values in the low positive range offer very good performance in terms of speed and compression. |
shuffle |
Whether to allow byte shuffling when compressing data (the initial value is TRUE). |
nthreads |
The number of threads to use when compressing data (the initial value is 1L). When TBB is not available, values greater than 1 emit a warning and fall back to 1. |
The serialized object as a raw vector.
x <- data.frame(int = sample(1e3, replace=TRUE), num = rnorm(1e3),
                char = sample(state.name, 1e3, replace=TRUE), stringsAsFactors = FALSE)
xserialized <- qs_serialize(x)
x2 <- qs_deserialize(xserialized)
identical(x, x2) # returns TRUE
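To get a feel for how compress_level trades size against effort, one can serialize the same object at a few levels and compare the resulting lengths. This is a rough sketch, assuming qs2 is installed; the chosen levels and test data are arbitrary, and the sizes will vary with the data and the zstd version.

```r
library(qs2)

set.seed(1)
x <- data.frame(int = sample(1e3, replace = TRUE), num = rnorm(1e3),
                char = sample(state.name, 1e3, replace = TRUE),
                stringsAsFactors = FALSE)

# Serialized size in bytes at a few compression levels
levels <- c(1L, 3L, 9L)
sizes <- vapply(levels,
                function(lvl) length(qs_serialize(x, compress_level = lvl)),
                integer(1))
data.frame(compress_level = levels, bytes = sizes)
```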
Converts a file saved in the qs2 format to the RDS format.
qs_to_rds(input_file, output_file, compress_level = 6)
input_file |
The |
output_file |
The |
compress_level |
The gzip compression level to use when writing the RDS file (a value between 0 and 9). |
No value is returned. The converted file is written to disk.
qs_tmp <- tempfile(fileext = ".qs2")
rds_tmp <- tempfile(fileext = ".RDS")
x <- runif(1e6)
qs_save(x, qs_tmp)
qs_to_rds(input_file = qs_tmp, output_file = rds_tmp)
x2 <- readRDS(rds_tmp)
stopifnot(identical(x, x2))
Exports the uncompressed binary serialization to a list of raw vectors for both qs2 and qdata formats.
For testing and exploratory purposes mainly.
qx_dump(file)
file |
A file name/path. |
A list containing uncompressed binary serialization and metadata.
x <- data.frame(int = sample(1e3, replace=TRUE), num = rnorm(1e3),
                char = sample(state.name, 1e3, replace=TRUE), stringsAsFactors = FALSE)
myfile <- tempfile()
qs_save(x, myfile)
binary_data <- qx_dump(myfile)
Converts a file saved in the RDS format to the qs2 format.
rds_to_qs(input_file, output_file, compress_level = 3)
input_file |
The |
output_file |
The |
compress_level |
The zstd compression level to use when writing the |
rds_to_qs() currently supports only gzip-compressed RDS input files.
Files that are uncompressed or use another compression format are rejected.
The shuffle parameter is currently not supported when converting from RDS to qs2.
No value is returned. The converted file is written to disk.
qs_tmp <- tempfile(fileext = ".qs2")
rds_tmp <- tempfile(fileext = ".RDS")
x <- runif(1e6)
saveRDS(x, rds_tmp, compress = "gzip")
rds_to_qs(input_file = rds_tmp, output_file = qs_tmp)
x2 <- qs_read(qs_tmp, validate_checksum = TRUE)
stopifnot(identical(x, x2))
Data from the International Astronomical Union. An official list of the 336 internationally recognized named stars, updated as of June 1, 2018.
data(starnames)
A data.frame with official IAU star names and several properties, such as coordinates.
Naming Stars | International Astronomical Union.
E. Mamajek et al. (2018), WG Triennial Report (2015-2018) - Star Names, Reports on Astronomy, 22 Mar 2018.
data(starnames)
Calculates a 64-bit XXH3 hash.
xxhash_raw(data)
data |
The data to hash. |
The 64-bit hash.
x <- as.raw(c(1,2,3))
xxhash_raw(x)
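One possible use of the hash is a quick integrity check on a raw payload. The sketch below assumes qs2 is installed; it compares hashes with identical() so it does not depend on how the 64-bit value is represented in R.

```r
library(qs2)

payload <- serialize(mtcars, NULL)
h1 <- xxhash_raw(payload)

# Flip one bit in a copy of the payload
tampered <- payload
tampered[1] <- as.raw(bitwXor(as.integer(tampered[1]), 1L))
h2 <- xxhash_raw(tampered)

identical(h1, xxhash_raw(payload))  # same input gives the same hash
identical(h1, h2)                   # modified input is expected to give a different hash
```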
Exports the compress bound function from the zstd library. Returns the
maximum potential compressed size of an object of length size.
zstd_compress_bound(size)
size |
A single non-negative whole number. Values larger than
|
A numeric scalar giving the maximum compressed size.
zstd_compress_bound(100000)
zstd_compress_bound(2^31)
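The bound is a worst-case size, so even poorly compressible input should compress to no more than that many bytes. A small check, assuming qs2 is installed and that zstd_compress_raw() performs a single-pass zstd compression to which the bound applies:

```r
library(qs2)

set.seed(1)
# Random bytes are close to incompressible, a near worst case for zstd
x <- as.raw(sample(0:255, 1e5, replace = TRUE))

bound <- zstd_compress_bound(length(x))
compressed <- zstd_compress_raw(x, compress_level = 3)

c(input = length(x), compressed = length(compressed), bound = bound)
length(compressed) <= bound
```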
Compresses to a raw vector using the zstd algorithm. Exports the main zstd compression function.
zstd_compress_raw(data, compress_level = qopt("compress_level"))
data |
Raw vector to be compressed. |
compress_level |
The compression level used. |
The compressed data as a raw vector.
x <- 1:1e6
xserialized <- serialize(x, connection=NULL)
xcompressed <- zstd_compress_raw(xserialized, compress_level = 1)
xrecovered <- unserialize(zstd_decompress_raw(xcompressed))
Decompresses a zstd compressed raw vector.
zstd_decompress_raw(data)
data |
A raw vector to be decompressed. |
The decompressed data as a raw vector.
x <- 1:1e6
xserialized <- serialize(x, connection=NULL)
xcompressed <- zstd_compress_raw(xserialized, compress_level = 1)
xrecovered <- unserialize(zstd_decompress_raw(xcompressed))
Helpers for compressing and decompressing zstd files.
A utility function to compress a file with zstd.
A utility function to decompress a zstd file to disk.
zstd_compress_file(input_file, output_file, compress_level = qopt("compress_level"))
zstd_decompress_file(input_file, output_file, max_output_bytes = NULL)
compress_level |
The compression level used. |
input_file |
Path to the input file. |
output_file |
Path to the output file. |
max_output_bytes |
Optional maximum number of decompressed output bytes. When supplied, decompression stops with an error before writing a chunk that would exceed this limit. |
Neither function returns a value. The output file is written to disk.
infile <- tempfile()
writeBin(as.raw(1:5), infile)
outfile <- tempfile()
zstd_compress_file(infile, outfile, compress_level = 1)
stopifnot(file.exists(outfile))

infile <- tempfile()
writeBin(as.raw(1:5), infile)
zfile <- tempfile()
zstd_compress_file(infile, zfile, compress_level = 1)
outfile <- tempfile()
zstd_decompress_file(zfile, outfile)
stopifnot(identical(readBin(infile, what = "raw", n = 5), readBin(outfile, what = "raw", n = 5)))
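The max_output_bytes argument can act as a guard when decompressing files of unknown origin. A sketch based on the documented behavior (an error when the limit would be exceeded), assuming qs2 is installed; the 1000-byte limit and status strings are arbitrary choices for illustration:

```r
library(qs2)

# One megabyte of zeros compresses to a very small file
infile <- tempfile()
writeBin(raw(1e6), infile)
zfile <- tempfile()
zstd_compress_file(infile, zfile, compress_level = 1)

# Refuse to write more than 1000 decompressed bytes
outfile <- tempfile()
status <- tryCatch({
  zstd_decompress_file(zfile, outfile, max_output_bytes = 1000)
  "completed"
}, error = function(e) "stopped: limit exceeded")
status
```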
zstd_in() substitutes a zstd-compressed file for a regular input file: the compressed file is decompressed and the result is passed to FUN as its input.
zstd_out() substitutes a zstd-compressed file for a regular output file: the output of FUN is compressed and written to the target zstd file path.
zstd_in(FUN, ..., envir = parent.frame(), tmpfile = tempfile(), max_output_bytes = NULL)
zstd_out(FUN, ..., envir = parent.frame(), tmpfile = tempfile())
FUN |
Function to call. |
... |
Arguments passed to |
envir |
Environment for |
tmpfile |
Intermediate file path used during conversion. It is removed on exit, whether supplied or auto-generated. |
max_output_bytes |
Optional maximum number of decompressed output bytes
passed through to |
zstd_in() is a generic wrapper that works with any function that reads from a file.
zstd_out() is a generic wrapper that works with any function that writes to a file.
zstd_in() returns the value returned by FUN.
zstd_out() returns the value returned by FUN, with its visibility preserved.
if (requireNamespace("data.table", quietly = TRUE)) {
  zfile <- tempfile(fileext = ".csv.zst")
  zstd_out(data.table::fwrite, mtcars, file = zfile)
  dt <- zstd_in(data.table::fread, file = zfile)
  print(nrow(dt))
}