Base class for OpenZGY compression plug-ins.
If anybody wants to add additional compression algorithms it
is recommended but not required to use this base class. See
CompressFactoryImpl.register{Compressor,Decompressor} for how to
use plain functors (C++) or callables (Python) instead.
This class performs triple duty: it holds both the compression
and decompression static methods (which need not have been together),
and an instance of the class can be used as a compressor functor
if a lambda is too limiting. To invoke the methods:
MyCompressPlugin.factory(...)(data)
MyCompressPlugin.compress(data, ...) (NOT recommended)
MyCompressPlugin.decompress(cdata,status,shape,file_dtype,user_dtype)
The following will also work but should only be used for very simple
compressors that have no parameters. In the first case MyCompressPlugin
won't have the option to return None for certain parameters, and in the
second case handling a variable argument list becomes trickier.
To register this class:
CompressFactoryImpl.registerCompressor("My",MyCompressPlugin.factory)
CompressFactoryImpl.registerDecompressor("My",MyCompressPlugin.decompress)
To use the compression part from client code:
compressor = ZgyCompressFactory("My", ...)
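The layout described above might be sketched as follows. This is an illustration only, not code from OpenZGY: the class name MyCompressPlugin is taken from the examples above, zlib stands in for a real algorithm, the "level" parameter and the rule that level 0 yields None are assumptions, and the sketch works on bytes rather than numpy arrays so that it needs nothing beyond the standard library. It does not derive from CompressPlugin or touch CompressFactoryImpl.

```python
import zlib


class MyCompressPlugin:
    """Hypothetical plug-in; zlib stands in for a real algorithm."""

    def __init__(self, level=6):
        self._level = level

    def __call__(self, data):
        # An instance acts as the compressor functor.
        return self.compress(data, self._level)

    @staticmethod
    def factory(level=6):
        # Assumption: level <= 0 means "no compression wanted",
        # so the factory returns None instead of a functor.
        if level <= 0:
            return None
        return MyCompressPlugin(level)

    @staticmethod
    def compress(data, level=6):
        return zlib.compress(bytes(data), level)

    @staticmethod
    def decompress(cdata, status, shape, file_dtype, user_dtype):
        # The arguments mirror the documented signature; this toy
        # version ignores all but cdata and returns plain bytes.
        return zlib.decompress(bytes(cdata))
```

Calling MyCompressPlugin.factory(6)(data) then follows the recommended invocation pattern, while factory(0) shows how a factory can decline to create a compressor for certain parameters.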
def openzgy.impl.compress.CompressPlugin.decompress(cdata, status, shape, file_dtype, user_dtype)
static
This is an abstract method.
Decompress bytes or similar into a numpy.ndarray.
Arguments:
cdata -- bytes or bytes-like compressed data,
possibly with trailing garbage.
status -- Currently always BrickStatus.Compressed,
in the future the status might be used to
distinguish between different compression
algorithms instead of relying on magic numbers.
shape -- Rank and size of the result in case this is
not encoded by the compression algorithm.
file_dtype -- Original value type before compression,
in case the decompressor cannot figure it out.
This will exactly match the dtype of the
data buffer passed to the compressor.
user_dtype -- Required value type of returned array.
Passing an uncompressed brick to this function is an error.
We don't have enough context to handle uncompressed bricks
that might require byteswapping and fix for legacy quirks.
Also cannot handle constant bricks, missing bricks, etc.
The reason user_dtype is needed is to avoid additional
quantization noise when the user requests integer compressed data
to be read as float. Without it, the decompressor might need to convert
float data to int, only to have it converted back to float later.
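A toy illustration of that round trip, assuming a hypothetical lossy decoder that naturally produces floats. The sample values are made up and no scaling between integer storage and float values is modelled:

```python
# Values as a hypothetical lossy decoder might produce them.
decoded = [0.4, 1.6, -2.3]

# Without user_dtype: round to the integer file type first,
# then convert back to float because the user asked for float.
via_int = [float(round(v)) for v in decoded]

# With user_dtype == float: hand the decoded values over directly.
direct = list(decoded)

# Largest error introduced by the needless float -> int -> float detour.
extra_noise = max(abs(a - b) for a, b in zip(via_int, direct))
```

Here extra_noise is nonzero only because of the intermediate rounding, which is what passing user_dtype lets the decompressor avoid.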
Current assumptions made of all candidate algorithms:
- The compressed data stream may have trailing garbage;
this will be silently ignored by the decompressor.
- The compressed data stream will never be longer than
the uncompressed data. This needs to be enforced by
the compressor. The compressor is allowed to give up
and tell the caller to not compress this brick.
- The reason for the two assumptions above is an
implementation detail; the reported size of a
compressed brick is not completely reliable.
This might change in the next version.
- The compressed data stream must start with a magic
number so the decompressor can figure out whether
this is the correct algorithm to use.
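A compressor / decompressor pair that honours these assumptions could look roughly like the sketch below. It is not the OpenZGY implementation: the MAGIC value is invented, zlib stands in for a real algorithm, the functions work on bytes instead of numpy arrays, and returning None from decompress() on a magic mismatch is an assumed convention.

```python
import zlib

MAGIC = b"MYZ1"  # hypothetical 4-byte magic number


def compress(data):
    """Return compressed bytes, or None to tell the caller
    to leave this brick uncompressed."""
    raw = bytes(data)
    cdata = MAGIC + zlib.compress(raw)
    # Enforce "never longer than the uncompressed data".
    if len(cdata) > len(raw):
        return None
    return cdata


def decompress(cdata):
    """Return the original bytes, or None if the magic number
    shows that another algorithm produced the stream."""
    cdata = bytes(cdata)
    if not cdata.startswith(MAGIC):
        return None
    decoder = zlib.decompressobj()
    raw = decoder.decompress(cdata[len(MAGIC):])
    # Bytes after the end of the zlib stream land in
    # decoder.unused_data and are silently ignored.
    return raw
```

The decompressobj interface is used instead of zlib.decompress() because it stops at the end of the compressed stream, which is what makes trailing garbage harmless in this sketch.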
If the assumptions cannot be met, the compressor / decompressor
for this particular type could be modified to add an extra header
with the compressed size and a magic number. Or we might add a
(size, algorithm number) header to every compressed block to
relieve the specific compressor / decompressor from worrying
about this. Or the brick status could be used to encode which
algorithm was used, picked up from the MSB of the lup entry.
This would also require the compressor to return both the
actual compressed data and the code to identify the decompressor.
That is the main reason we are also passed the "status" arg.
Caveat: If adding an extra header, keep in mind that this header
must be included when checking that the compressed stream is not
too big.
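One way the extra-header variant might look, as a sketch under stated assumptions: the header layout (two little-endian 32-bit fields holding a magic number and the payload size), the MAGIC value, and the function names are all invented for illustration, and zlib again stands in for a real algorithm.

```python
import struct
import zlib

MAGIC = 0x4D595A32               # hypothetical magic number
HEADER = struct.Struct("<II")    # (magic, payload size), 8 bytes


def compress_with_header(data):
    """Return header + payload, or None if the result,
    header included, would exceed the uncompressed size."""
    raw = bytes(data)
    payload = zlib.compress(raw)
    cdata = HEADER.pack(MAGIC, len(payload)) + payload
    # The size check covers the header as well as the payload.
    return cdata if len(cdata) <= len(raw) else None


def decompress_with_header(cdata):
    """Return the original bytes, or None on a magic mismatch."""
    cdata = bytes(cdata)
    magic, size = HEADER.unpack_from(cdata, 0)
    if magic != MAGIC:
        return None
    # The stored size tells us exactly where the payload ends,
    # so trailing garbage is cut off before decoding.
    payload = cdata[HEADER.size:HEADER.size + size]
    return zlib.decompress(payload)
```

With the size stored explicitly, the decompressor no longer depends on the algorithm tolerating trailing bytes, at the cost of eight extra bytes that count against the size limit.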
Reimplemented in openzgy.impl.zfp_compress.ZfpCompressPlugin.