Title: | Efficient Serialization of R Objects |
---|---|
Description: | Streamlines and accelerates the process of saving and loading R objects, improving speed and compression compared to other methods. The package provides two compression formats: the 'qs2' format, which uses R serialization via the C API while optimizing compression and disk I/O, and the 'qdata' format, featuring custom serialization for slightly faster performance and better compression. Additionally, the 'qs2' format can be directly converted to the standard 'RDS' format, ensuring long-term compatibility with future versions of R. |
Authors: | Travers Ching [aut, cre, cph], Yann Collet [ctb, cph] (Yann Collet is the author of the bundled zstd), Facebook, Inc. [cph] (Facebook is the copyright holder of the bundled zstd code), Reichardt Tino [ctb, cph] (Contributor/copyright holder of zstd bundled code), Skibinski Przemyslaw [ctb, cph] (Contributor/copyright holder of zstd bundled code), Mori Yuta [ctb, cph] (Contributor/copyright holder of zstd bundled code), Francesc Alted [ctb, cph] (Shuffling routines derived from Blosc library) |
Maintainer: | Travers Ching <[email protected]> |
License: | GPL-3 |
Version: | 0.1.2 |
Built: | 2024-11-11 21:14:03 UTC |
Source: | https://github.com/qsbase/qs2 |
Decodes a Z85 encoded string back to binary
base85_decode(encoded_string)
encoded_string |
A string. |
The original raw vector.
Encodes binary data (a raw vector) as ASCII text using Z85 encoding format.
base85_encode(rawdata)
rawdata |
A raw vector. |
Z85 is a binary-to-ASCII encoding format created by Pieter Hintjens in 2010 and is part of the ZeroMQ RFC. The encoding uses a dictionary of 85 out of the 94 printable ASCII characters. There are other base 85 encoding schemes, including Ascii85, which was popularized by Adobe. Z85 is distinguished by its choice of dictionary, which makes encoded strings easy to include in source code for many programming languages. The dictionary excludes quote marks and other characters that need escaping, such as the backslash, so encoded strings require no special treatment in R and most other languages. Note: although the official specification restricts input length to multiples of four bytes, the implementation here works with any input length. The overhead (extra bytes used relative to binary) is 25%. In comparison, base 64 encoding has an overhead of 33.33%.
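The overhead figures quoted above follow from the block sizes of each encoding. The following is a minimal sketch in base R; it does not call the package, and the variable names are illustrative only.

```r
# Z85 maps each 4-byte group to 5 characters; base 64 maps 3 bytes to 4 characters.
z85_overhead    <- (5 - 4) / 4
base64_overhead <- (4 - 3) / 3

# Five base-85 digits are enough to cover every 32-bit value;
# five base-84 digits would not be.
fits_85 <- 85^5 >= 2^32
fits_84 <- 84^5 >= 2^32

cat(sprintf("Z85 overhead: %.2f%%\n", 100 * z85_overhead))
cat(sprintf("base 64 overhead: %.2f%%\n", 100 * base64_overhead))
```

The `fits_85` and `fits_84` checks show why 85 is the smallest dictionary size that lets four bytes fit into five characters.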
A string representation of the raw vector.
https://rfc.zeromq.org/spec/32/
Decodes a basE91 encoded string back to binary
base91_decode(encoded_string)
encoded_string |
A string. |
The original raw vector.
Encodes binary data (a raw vector) as ASCII text using basE91 encoding format.
base91_encode(rawdata, quote_character = "\"")
rawdata |
A raw vector. |
quote_character |
The character to use in the encoding, replacing the double quote character. Must be either a single quote ( |
basE91 (capital E for stylization) is a binary to ASCII encoding format created by Joachim Henke in 2005.
The overhead (extra bytes used relative to binary) is 22.97% on average. In comparison, base 64 encoding has an overhead of 33.33%.
The original encoding uses a dictionary of 91 out of 94 printable ASCII characters, excluding - (dash), \ (backslash) and ' (single quote).
The original encoding does include double quote characters, which are less than ideal for strings in R. Therefore, you can use the quote_character parameter to substitute dash or single quote.
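The average overhead quoted above can be reproduced with a short back-of-the-envelope calculation. This sketch assumes the reference basE91 scheme (two output characters per group of 13 or 14 input bits) and uniformly random input; it is not taken from the package source, and the variable names are illustrative.

```r
# Two characters carry 16 bits of text and can represent 91^2 = 8281 values,
# which is 89 more than 2^13 = 8192. A 14th input bit can therefore be packed
# only when the 13-bit value is one of the 89 smallest.
p14 <- (91^2 - 2^13) / 2^13                  # chance that a group carries 14 bits
bits_per_pair <- 13 * (1 - p14) + 14 * p14   # expected input bits per 2 characters
base91_overhead <- 16 / bits_per_pair - 1

print(round(100 * base91_overhead, 2))
```

Under these assumptions the result rounds to the 22.97% figure; highly regular input can behave differently.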
A string representation of the raw vector.
https://base91.sourceforge.net/
Shuffles a raw vector using BLOSC shuffle routines.
blosc_shuffle_raw(data, bytesofsize)
data |
A raw vector to be shuffled. |
bytesofsize |
Either |
The shuffled vector
x <- serialize(1L:1000L, NULL)
xshuf <- blosc_shuffle_raw(x, 4)
xunshuf <- blosc_unshuffle_raw(xshuf, 4)
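To illustrate what byte shuffling does in general, here is a minimal base R sketch. The helpers `shuffle_sketch()` and `unshuffle_sketch()` are hypothetical and only demonstrate the idea; they are not the package's optimized routines, and they assume the input length is a multiple of the element size.

```r
# Eight 32-bit integers as 32 little-endian bytes: 01 00 00 00 02 00 00 00 ...
bytes <- writeBin(1:8, raw(), endian = "little")

# Each matrix column holds the bytes of one element. Transposing groups the
# first bytes of all elements together, then the second bytes, and so on.
shuffle_sketch <- function(x, bytesofsize) {
  as.vector(t(matrix(x, nrow = bytesofsize)))
}
unshuffle_sketch <- function(x, bytesofsize) {
  as.vector(t(matrix(x, ncol = bytesofsize)))
}

shuffled <- shuffle_sketch(bytes, 4)
restored <- unshuffle_sketch(shuffled, 4)
print(shuffled)
print(identical(restored, bytes))
```

After shuffling, the varying low-order bytes sit next to each other and the zero bytes form one long run, which is the kind of layout a compressor tends to handle well.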
Un-shuffles a raw vector using BLOSC un-shuffle routines.
blosc_unshuffle_raw(data, bytesofsize)
data |
A raw vector to be unshuffled. |
bytesofsize |
Either |
The unshuffled vector.
x <- serialize(1L:1000L, NULL)
xshuf <- blosc_shuffle_raw(x, 4)
xunshuf <- blosc_unshuffle_raw(xshuf, 4)
A helper function for decoding and decompressing a string produced by encode_source() back to the original object.
decode_source(string)
string |
A string to decode. |
The original (decoded) object.
See encode_source() for more details.
A helper function for encoding and compressing a file or string to ASCII using base91_encode() and qs_serialize() with the highest compression level.
encode_source(x = NULL, file = NULL, width = 120)
x |
The object to encode (if |
file |
The file to encode (if |
width |
The output will be broken up into individual strings, with |
The encode_source() and decode_source() functions are useful for storing small amounts of data or text inline in a .R or .Rmd file.
A character vector in base91 representing the compressed original file or object.
set.seed(1); data <- sample(500)
result <- encode_source(data)
# Note: the result string is not guaranteed to be consistent between qs or zstd versions
# but will always properly decode regardless
print(result)
result <- decode_source(result) # recovers the original data vector
Deserializes a raw vector to an object using the qdata format.
qd_deserialize(input, use_alt_rep = FALSE, validate_checksum = FALSE, nthreads = 1L)
input |
The raw vector to deserialize. |
use_alt_rep |
Use ALTREP when reading in string data (default |
validate_checksum |
Whether to validate the stored checksum in the file (default |
nthreads |
The number of threads to use when reading data (default: |
The deserialized object.
x <- data.frame(int = sample(1e3, replace=TRUE),
                num = rnorm(1e3),
                char = sample(state.name, 1e3, replace=TRUE),
                stringsAsFactors = FALSE)
xserialized <- qd_serialize(x)
x2 <- qd_deserialize(xserialized)
identical(x, x2) # returns true
Reads an object that was saved to disk in the qdata format.
qd_read(file, use_alt_rep = FALSE, validate_checksum=FALSE, nthreads = 1L)
file |
The file name/path. |
use_alt_rep |
Use ALTREP when reading in string data (default |
validate_checksum |
Whether to validate the stored checksum in the file (default |
nthreads |
The number of threads to use when reading data (default: |
The object stored in file.
x <- data.frame(int = sample(1e3, replace=TRUE),
                num = rnorm(1e3),
                char = sample(state.name, 1e3, replace=TRUE),
                stringsAsFactors = FALSE)
myfile <- tempfile()
qd_save(x, myfile)
x2 <- qd_read(myfile)
identical(x, x2) # returns true
Saves an object to disk using the qdata format.
qd_save(object, file, compress_level = 3L, shuffle = TRUE, warn_unsupported_types=TRUE, nthreads = 1L)
object |
The object to save. |
file |
The file name/path. |
compress_level |
The compression level used (default 3). The maximum and minimum possible values depend on the version of the ZSTD library used. As of ZSTD 1.5.6, the maximum compression level is 22 and the minimum is -131072. Usually, values in the low positive range offer a very good balance of speed and compression. |
shuffle |
Whether to allow byte shuffling when compressing data (default: |
warn_unsupported_types |
Whether to warn when saving an object with an unsupported type (default |
nthreads |
The number of threads to use when compressing data (default: |
No value is returned. The file is written to disk.
x <- data.frame(int = sample(1e3, replace=TRUE),
                num = rnorm(1e3),
                char = sample(state.name, 1e3, replace=TRUE),
                stringsAsFactors = FALSE)
myfile <- tempfile()
qd_save(x, myfile)
x2 <- qd_read(myfile)
identical(x, x2) # returns true
Serializes an object to a raw vector using the qdata format.
qd_serialize(object, compress_level = 3L, shuffle = TRUE, warn_unsupported_types = TRUE, nthreads = 1L)
object |
The object to save. |
compress_level |
The compression level used (default 3). The maximum and minimum possible values depend on the version of the ZSTD library used. As of ZSTD 1.5.6, the maximum compression level is 22 and the minimum is -131072. Usually, values in the low positive range offer a very good balance of speed and compression. |
shuffle |
Whether to allow byte shuffling when compressing data (default: |
warn_unsupported_types |
Whether to warn when saving an object with an unsupported type (default |
nthreads |
The number of threads to use when compressing data (default: |
The serialized object as a raw vector.
x <- data.frame(int = sample(1e3, replace=TRUE),
                num = rnorm(1e3),
                char = sample(state.name, 1e3, replace=TRUE),
                stringsAsFactors = FALSE)
xserialized <- qd_serialize(x)
x2 <- qd_deserialize(xserialized)
identical(x, x2) # returns true
Deserializes a raw vector to an object using the qs2 format.
qs_deserialize(input, validate_checksum = FALSE, nthreads = 1L)
input |
The raw vector to deserialize. |
validate_checksum |
Whether to validate the stored checksum in the file (default |
nthreads |
The number of threads to use when reading data (default: |
The deserialized object.
x <- data.frame(int = sample(1e3, replace=TRUE),
                num = rnorm(1e3),
                char = sample(state.name, 1e3, replace=TRUE),
                stringsAsFactors = FALSE)
xserialized <- qs_serialize(x)
x2 <- qs_deserialize(xserialized)
identical(x, x2) # returns true
Reads an object that was saved to disk in the qs2 format.
qs_read(file, validate_checksum=FALSE, nthreads = 1L)
file |
The file name/path. |
validate_checksum |
Whether to validate the stored checksum in the file (default |
nthreads |
The number of threads to use when reading data (default: |
The object stored in file.
x <- data.frame(int = sample(1e3, replace=TRUE),
                num = rnorm(1e3),
                char = sample(state.name, 1e3, replace=TRUE),
                stringsAsFactors = FALSE)
myfile <- tempfile()
qs_save(x, myfile)
x2 <- qs_read(myfile)
identical(x, x2) # returns true
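The validate_checksum argument can be switched on when reading a file. The sketch below assumes that files written by qs_save() carry a stored checksum; the `requireNamespace()` guard and the variable names are additions for illustration.

```r
if (requireNamespace("qs2", quietly = TRUE)) {
  x <- data.frame(id = 1:100, value = sqrt(1:100))
  myfile <- tempfile()
  qs2::qs_save(x, myfile)

  # Ask qs_read() to check the stored checksum while reading.
  x_checked <- qs2::qs_read(myfile, validate_checksum = TRUE)
  print(identical(x, x_checked))
  unlink(myfile)
}
```

Validation adds work on read, which may be why it is off by default; enabling it is a reasonable choice for files that have been copied between machines.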
Reads the objects in a file saved to disk using qs_savem().
qs_readm(file, env = parent.frame(), ...)
file |
The file name/path. |
env |
The environment where the data should be loaded. Default is the calling environment ( |
... |
Additional arguments passed to qs_read(). |
This function extends qs_read() to replicate the functionality of base::load(), loading multiple saved objects into your workspace.
Nothing is explicitly returned, but the function will load the saved objects into the workspace.
x1 <- data.frame(int = sample(1e3, replace=TRUE),
                 num = rnorm(1e3),
                 char = sample(starnames$`IAU Name`, 1e3, replace=TRUE),
                 stringsAsFactors = FALSE)
x2 <- data.frame(int = sample(1e3, replace=TRUE),
                 num = rnorm(1e3),
                 char = sample(starnames$`IAU Name`, 1e3, replace=TRUE),
                 stringsAsFactors = FALSE)
myfile <- tempfile()
qs_savem(x1, x2, file=myfile)
rm(x1, x2)
qs_readm(myfile)
exists('x1') && exists('x2') # returns true
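The env argument makes it possible to load saved objects somewhere other than the calling environment. A minimal sketch, in which the object names, the fresh environment `e`, and the `requireNamespace()` guard are illustrative choices:

```r
if (requireNamespace("qs2", quietly = TRUE)) {
  a <- 1:10
  b <- letters
  myfile <- tempfile()
  qs2::qs_savem(a, b, file = myfile)

  # Load into a fresh environment so the current workspace is left untouched.
  e <- new.env()
  qs2::qs_readm(myfile, env = e)
  print(sort(ls(e)))
  unlink(myfile)
}
```

Loading into a dedicated environment avoids silently overwriting objects of the same name in the workspace, the same concern that applies to base::load().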
Saves an object to disk using the qs2 format.
qs_save(object, file, compress_level = 3L, shuffle = TRUE, nthreads = 1L)
object |
The object to save. |
file |
The file name/path. |
compress_level |
The compression level used (default 3). The maximum and minimum possible values depend on the version of the ZSTD library used. As of ZSTD 1.5.6, the maximum compression level is 22 and the minimum is -131072. Usually, values in the low positive range offer a very good balance of speed and compression. |
shuffle |
Whether to allow byte shuffling when compressing data (default: |
nthreads |
The number of threads to use when compressing data (default: |
No value is returned. The file is written to disk.
x <- data.frame(int = sample(1e3, replace=TRUE),
                num = rnorm(1e3),
                char = sample(state.name, 1e3, replace=TRUE),
                stringsAsFactors = FALSE)
myfile <- tempfile()
qs_save(x, myfile)
x2 <- qs_read(myfile)
identical(x, x2) # returns true
Saves (serializes) multiple objects to disk.
qs_savem(...)
... |
Objects to serialize. Named arguments will be passed to |
This function extends qs_save() to replicate the functionality of base::save() to save multiple objects. Read them back with qs_readm().
x1 <- data.frame(int = sample(1e3, replace=TRUE),
                 num = rnorm(1e3),
                 char = sample(starnames$`IAU Name`, 1e3, replace=TRUE),
                 stringsAsFactors = FALSE)
x2 <- data.frame(int = sample(1e3, replace=TRUE),
                 num = rnorm(1e3),
                 char = sample(starnames$`IAU Name`, 1e3, replace=TRUE),
                 stringsAsFactors = FALSE)
myfile <- tempfile()
qs_savem(x1, x2, file=myfile)
rm(x1, x2)
qs_readm(myfile)
exists('x1') && exists('x2') # returns true
Serializes an object to a raw vector using the qs2 format.
qs_serialize(object, compress_level = 3L, shuffle = TRUE, nthreads = 1L)
object |
The object to save. |
compress_level |
The compression level used (default 3). The maximum and minimum possible values depend on the version of the ZSTD library used. As of ZSTD 1.5.6, the maximum compression level is 22 and the minimum is -131072. Usually, values in the low positive range offer a very good balance of speed and compression. |
shuffle |
Whether to allow byte shuffling when compressing data (default: |
nthreads |
The number of threads to use when compressing data (default: |
The serialized object as a raw vector.
x <- data.frame(int = sample(1e3, replace=TRUE),
                num = rnorm(1e3),
                char = sample(state.name, 1e3, replace=TRUE),
                stringsAsFactors = FALSE)
xserialized <- qs_serialize(x)
x2 <- qs_deserialize(xserialized)
identical(x, x2) # returns true
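To get a feel for the compress_level argument, the serialized size can be compared at two levels. This is a rough sketch: the test data, the chosen levels (1 and 19), and the `requireNamespace()` guard are illustrative, and the sizes depend on the data and on the bundled zstd version.

```r
if (requireNamespace("qs2", quietly = TRUE)) {
  x <- rep(state.name, 200)

  fast  <- qs2::qs_serialize(x, compress_level = 1L)
  small <- qs2::qs_serialize(x, compress_level = 19L)

  # Serialized size in bytes at each level.
  print(c(level_1 = length(fast), level_19 = length(small)))

  # The compression level does not affect what is read back.
  x_back <- qs2::qs_deserialize(small)
  print(identical(x, x_back))
}
```

Higher levels generally trade compression time for size, so timing both calls on representative data is the more reliable way to pick a level.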
Converts a file saved in the qs2 format to the RDS format.
qs_to_rds(input_file, output_file, compress_level = 6)
input_file |
The |
output_file |
The |
compress_level |
The gzip compression level to use when writing the RDS file (a value between 0 and 9). |
No value is returned. The converted file is written to disk.
qs_tmp <- tempfile(fileext = ".qs2")
rds_tmp <- tempfile(fileext = ".RDS")
x <- runif(1e6)
qs_save(x, qs_tmp)
qs_to_rds(input_file = qs_tmp, output_file = rds_tmp)
x2 <- readRDS(rds_tmp)
stopifnot(identical(x, x2))
Exports the uncompressed binary serialization to a list of raw vectors for both qs2 and qdata formats. Intended mainly for testing and exploration.
qx_dump(file)
file |
A file name/path. |
The uncompressed serialization.
x <- data.frame(int = sample(1e3, replace=TRUE),
                num = rnorm(1e3),
                char = sample(state.name, 1e3, replace=TRUE),
                stringsAsFactors = FALSE)
myfile <- tempfile()
qs_save(x, myfile)
binary_data <- qx_dump(myfile)
Converts a file saved in the RDS format to the qs2 format.
rds_to_qs(input_file, output_file, compress_level = 3)
input_file |
The |
output_file |
The |
compress_level |
The zstd compression level to use when writing the |
The shuffle parameter is currently not supported when converting from RDS to qs2.
When reading the resulting qs2 file, validate_checksum must be set to FALSE.
No value is returned. The converted file is written to disk.
qs_tmp <- tempfile(fileext = ".qs2")
rds_tmp <- tempfile(fileext = ".RDS")
x <- runif(1e6)
saveRDS(x, rds_tmp)
rds_to_qs(input_file = rds_tmp, output_file = qs_tmp)
x2 <- qs_read(qs_tmp, validate_checksum = FALSE)
stopifnot(identical(x, x2))
Data from the International Astronomical Union. An official list of the 336 internationally recognized named stars, updated as of June 1, 2018.
data(starnames)
A data.frame with official IAU star names and several properties, such as coordinates.
Naming Stars | International Astronomical Union.
E Mamajek et al. (2018), WG Triennial Report (2015-2018) - Star Names, Reports on Astronomy, 22 Mar 2018.
data(starnames)
Calculates the 64-bit XXH3 hash.
xxhash_raw(data)
data |
The data to hash |
The 64-bit hash
x <- as.raw(c(1,2,3))
xxhash_raw(x)
Exports the compress bound function from the zstd library. Returns the maximum potential compressed size of an object of length size.
zstd_compress_bound(size)
size |
An integer size |
maximum compressed size
zstd_compress_bound(100000)
zstd_compress_bound(1e9)
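The bound can be compared with an actual compressed size. A small sketch, with illustrative test data and a `requireNamespace()` guard added for safety; random doubles are used because they compress poorly, which brings the compressed size closer to the bound.

```r
if (requireNamespace("qs2", quietly = TRUE)) {
  raw_data <- serialize(rnorm(1e4), connection = NULL)

  bound <- qs2::zstd_compress_bound(length(raw_data))
  compressed <- qs2::zstd_compress_raw(raw_data, compress_level = 3)

  # The compressed size should never exceed the bound for this input length.
  print(c(input = length(raw_data), bound = bound, compressed = length(compressed)))
}
```

A bound like this is typically used to size an output buffer before compressing, since it covers the worst case of incompressible input.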
Compresses to a raw vector using the zstd algorithm. Exports the main zstd compression function.
zstd_compress_raw(data, compress_level)
data |
Raw vector to be compressed. |
compress_level |
The compression level used. |
The compressed data as a raw vector.
x <- 1:1e6
xserialized <- serialize(x, connection=NULL)
xcompressed <- zstd_compress_raw(xserialized, compress_level = 1)
xrecovered <- unserialize(zstd_decompress_raw(xcompressed))
Decompresses a zstd compressed raw vector.
zstd_decompress_raw(data)
data |
A raw vector to be decompressed. |
The decompressed data as a raw vector.
x <- 1:1e6
xserialized <- serialize(x, connection=NULL)
xcompressed <- zstd_compress_raw(xserialized, compress_level = 1)
xrecovered <- unserialize(zstd_decompress_raw(xcompressed))