OpenZGY/C++ API and Internals (ALPHA)
Access seismic data stored in ZGY format.
#include <genlod.h>
Public Member Functions | |
| GenLodImpl (const index3_t &size, const index3_t &bricksize, RawDataType dtype, const std::array< double, 2 > &histogram_range, std::int32_t nlods_in, const std::vector< LodAlgorithm > &decimation, const std::shared_ptr< HistogramData > &histogram, double defaultvalue, const std::function< bool(std::int64_t, std::int64_t)> &progress, bool verbose) | |
| std::tuple< std::shared_ptr< StatisticData >, std::shared_ptr< HistogramData > > | call () |
Public Member Functions inherited from InternalZGY::GenLodBase | |
| GenLodBase (const index3_t &size, const index3_t &bricksize, RawDataType dtype, const std::array< double, 2 > &histogram_range, std::int32_t nlods_in, const std::vector< LodAlgorithm > &decimation, const std::shared_ptr< HistogramData > &histogram, double defaultvalue, const std::function< bool(std::int64_t, std::int64_t)> &progress, bool verbose) | |
Protected Member Functions | |
| template<typename T > | |
| void | _accumulateT (const std::shared_ptr< const DataBuffer > &data_in) |
| void | _accumulate (const std::shared_ptr< const DataBuffer > &data) |
| std::shared_ptr< DataBuffer > | _calculate (const index3_t &readpos_in, std::int32_t readlod) |
| std::shared_ptr< DataBuffer > | _decimate (const std::shared_ptr< const DataBuffer > &data, std::int64_t lod) |
| std::shared_ptr< DataBuffer > | _paste1 (const std::shared_ptr< DataBuffer > &result, const std::shared_ptr< const DataBuffer > &more, std::int64_t ioff, std::int64_t joff) |
| std::shared_ptr< const DataBuffer > | _paste4 (const std::shared_ptr< const DataBuffer > &d00, const std::shared_ptr< const DataBuffer > &d01, const std::shared_ptr< const DataBuffer > &d10, const std::shared_ptr< const DataBuffer > &d11) |
Protected Member Functions inherited from InternalZGY::GenLodBase | |
| virtual std::shared_ptr< DataBuffer > | _read (std::int32_t lod, const index3_t &pos, const index3_t &size) |
| virtual void | _write (std::int32_t lod, const index3_t &pos, const std::shared_ptr< const DataBuffer > &data) |
| virtual void | _savestats () |
| void | _report (const DataBuffer *data) |
| std::string | _prefix (std::int32_t lod) |
Static Protected Member Functions | |
| static std::array< double, 2 > | suggestHistogramRange (const std::array< double, 2 > &writtenrange, RawDataType dtype) |
Static Protected Member Functions inherited from InternalZGY::GenLodBase | |
| static std::string | _format_result (const std::shared_ptr< DataBuffer > &data) |
Additional Inherited Members | |
Protected Attributes inherited from InternalZGY::GenLodBase | |
| std::int32_t | _nlods |
| std::int64_t | _total |
| std::int64_t | _done |
| index3_t | _surveysize |
| index3_t | _bricksize |
| RawDataType | _dtype |
| std::array< double, 2 > | _histogram_range |
| std::vector< LodAlgorithm > | _decimation |
| std::shared_ptr< HistogramData > | _wa_histogram |
| double | _wa_defaultstorage |
| std::function< bool(std::int64_t, std::int64_t)> | _progress |
| bool | _verbose |
Abstract class for generating low resolution bricks, histogram, and statistics. The inherited methods for I/O are still stubs.
| InternalZGY::GenLodImpl::GenLodImpl | ( | const index3_t & | size, |
| const index3_t & | bricksize, | ||
| RawDataType | dtype, | ||
| const std::array< double, 2 > & | histogram_range, | ||
| std::int32_t | nlods_in, | ||
| const std::vector< LodAlgorithm > & | decimation, | ||
| const std::shared_ptr< HistogramData > & | histogram, | ||
| double | defaultstorage, | ||
| const std::function< bool(std::int64_t, std::int64_t)> & | progress, | ||
| bool | verbose | ||
| ) |
Abstract class for generating low resolution bricks, histogram, and statistics. The inherited methods for I/O are still stubs. See doc/lowres.html for details. This class implements plan C or D which is good for compressed data and acceptable for uncompressed. The ordering of low resolution bricks in the file will not be optimal. For optimal ordering but working only for uncompressed data consider implementing plan B in addition to the plan C already implemented. The implementation can be used as-is in a unit test with mocked I/O.
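| void InternalZGY::GenLodImpl::_accumulate | ( | const std::shared_ptr< const DataBuffer > & | data | ) | protected |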
Keep a running tally of statistics and histogram.
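The idea of a running tally can be illustrated with a minimal sketch. It is not the library's implementation: `RunningTally` is a hypothetical stand-in for `StatisticData` plus `HistogramData`, and the policy of skipping non-finite samples and not binning out-of-range values is an assumption made for the example.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for StatisticData + HistogramData (not the real types).
struct RunningTally {
  std::int64_t count = 0;          // number of finite samples seen so far
  double sum = 0.0;                // sum of finite samples
  double sumsq = 0.0;              // sum of squares of finite samples
  double lo, hi;                   // histogram range, both ends inclusive
  std::vector<std::int64_t> bins;  // histogram bin counts

  RunningTally(double lo_in, double hi_in, std::size_t nbins)
    : lo(lo_in), hi(hi_in), bins(nbins, 0)
  {
    assert(nbins > 0 && hi_in > lo_in);
  }

  // Add one chunk of samples. Assumed policy: non-finite samples are skipped;
  // samples outside [lo, hi] update the statistics but are not binned.
  void accumulate(const std::vector<float>& chunk)
  {
    for (float v : chunk) {
      if (!std::isfinite(v))
        continue;
      ++count;
      sum += v;
      sumsq += static_cast<double>(v) * v;
      if (v < lo || v > hi)
        continue;
      std::size_t bin =
        static_cast<std::size_t>((v - lo) / (hi - lo) * bins.size());
      if (bin >= bins.size())
        bin = bins.size() - 1;     // v == hi lands in the last bin
      ++bins[bin];
    }
  }
};
```

Calling `accumulate()` once per chunk, as each chunk is read, keeps the totals current without ever holding the whole survey in memory.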
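| std::shared_ptr< DataBuffer > InternalZGY::GenLodImpl::_calculate | ( | const index3_t & | readpos_in, |
| std::int32_t | readlod | ||
| ) | protected |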
Read data from the specified (readpos, readlod) and store it back. The function decides for itself how much to read, subject to several constraints: always read full traces, and use a size in i and j of 2 * bs * 2^N, where bs is the file's brick size in that dimension, clipped to the survey boundaries. Clipping might give an empty result.
TODO-Performance: Allow the application to configure how much memory we are allowed to use. Increase the block size accordingly. Larger bricks might help the bulk layer to become more efficient.
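The size constraint can be sketched as follows. This is an illustration only: `clippedReadSize` is a hypothetical helper, `index3_t` is assumed here to be a 3-element array of 64-bit integers, and `N` is simply passed in rather than chosen by the function.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstdint>

// Assumption: the real index3_t is taken to be equivalent to this alias.
using index3_t = std::array<std::int64_t, 3>;

// Hypothetical helper: size of the block to read at position "pos" in a
// survey whose size at the current lod is "lodsize".
index3_t clippedReadSize(const index3_t& pos, const index3_t& lodsize,
                         const index3_t& bricksize, int N)
{
  assert(N >= 0 && N < 30);
  const std::int64_t factor = std::int64_t(2) << N;   // 2 * 2^N
  // Full traces vertically; 2 * bs * 2^N in the i and j directions.
  const index3_t want{bricksize[0] * factor, bricksize[1] * factor, lodsize[2]};
  index3_t result;
  for (int d = 0; d < 3; ++d)
    result[d] = std::max<std::int64_t>(0, std::min(want[d], lodsize[d] - pos[d]));
  return result;   // any zero entry means an empty read
}
```

With a 200 x 100 x 300 survey, 64^3 bricks and N = 0, a read at i = 128 is clipped to 72 samples in i, and a read at i = 256 is empty.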
When readlod is 0 and the data was read from the ZGY file, the writing part is skipped since the data is obviously there already.
In addition to reading and writing at the readlod level, the method will compute a single decimated buffer at readlod+1 and return it. As with the read/write the buffer might be smaller at the survey edge. Note that the caller is responsible for storing the decimated data.
Full resolution data (lod 0) will be read from file (plan C) or the application (plan D). Low resolution is computed by a recursive call to this function (plans C and D) or by reading the file (plan B). Note that currently only plan C is fully implemented.
For plans C and D a single call needs to be made to read the brick (there is by definition just one) at the highest LOD, i.e. the lowest resolution. This will end up computing all possible low resolution bricks and storing them. For plan B the caller must iterate.
The function is also responsible for collecting statistics and histogram data. Note that some of the decimation algorithms use the histogram of the entire file. Ideally the histogram of the entire file should be available before decimation starts, but that is impractical. At least make sure the histogram update is done early enough and the decimation late enough that the chunk of data being decimated has already been added to the histogram.
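The recursive structure described above can be sketched in one dimension, tracking only sizes instead of sample data. Everything here is a simplification made for illustration: the names `Write`, `lodSize` and `calculate` are hypothetical, the real method works on 3D buffers with four children in i and j, and the way child positions are derived (doubling the position) is an assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// One record per simulated write of low resolution data.
struct Write { int lod; std::int64_t pos; std::int64_t size; };

// Survey size at a given lod: halved and rounded up once per level.
std::int64_t lodSize(std::int64_t size0, int lod)
{
  for (; lod > 0; --lod)
    size0 = (size0 + 1) / 2;
  return size0;
}

// 1D sketch: process the block of 2*bs samples starting at "pos" in "lod"
// and return the size of its decimated version for the caller to store.
std::int64_t calculate(std::int64_t pos, int lod, std::int64_t size0,
                       std::int64_t bs, std::vector<Write>& log)
{
  assert(lod >= 0 && bs > 0);
  const std::int64_t block = 2 * bs;
  const std::int64_t avail = lodSize(size0, lod) - pos;
  if (avail <= 0)
    return 0;                              // entirely outside the survey
  std::int64_t have = 0;
  if (lod == 0) {
    have = std::min(block, avail);         // "read" full resolution data
  } else {
    // Two children one level down (four in the real 3D case), pasted together.
    have = calculate(2 * pos, lod - 1, size0, bs, log)
         + calculate(2 * pos + block, lod - 1, size0, bs, log);
    log.push_back(Write{lod, pos, have});  // store data at this lod
  }
  return (have + 1) / 2;                   // decimated size, rounded up
}
```

A single call at the top level, for example `calculate(0, 3, 10, 2, log)` for a 10-sample survey with brick size 2, visits every block and logs the low resolution writes in the order they would be issued.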
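| std::shared_ptr< DataBuffer > InternalZGY::GenLodImpl::_decimate | ( | const std::shared_ptr< const DataBuffer > & | data, |
| std::int64_t | lod | ||
| ) | protected |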
Return a decimated version of the input buffer with half the size (rounded up) in each dimension. In total the result will be ~1/8 the size of the input.
Lod refers to the level being generated. Must be >= 1.
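The size arithmetic can be shown with a short sketch. The decimation used here simply keeps every second sample in each direction; that choice is made only for brevity and is not a claim about which `LodAlgorithm` the library applies. `index3_t` is assumed to be a 3-element array of 64-bit integers, and `decimatedSize` and `decimateKeepFirst` are hypothetical names.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumption: the real index3_t is taken to be equivalent to this alias.
using index3_t = std::array<std::int64_t, 3>;

// Half the size in each dimension, rounded up.
index3_t decimatedSize(const index3_t& size)
{
  return {(size[0] + 1) / 2, (size[1] + 1) / 2, (size[2] + 1) / 2};
}

// Simplest possible decimation (illustrative only): keep the samples whose
// i, j and k indices are all even. Input is a flat buffer, k fastest.
std::vector<float> decimateKeepFirst(const std::vector<float>& in,
                                     const index3_t& size)
{
  assert(size[0] >= 0 && size[1] >= 0 && size[2] >= 0);
  assert(in.size() == static_cast<std::size_t>(size[0] * size[1] * size[2]));
  const index3_t out = decimatedSize(size);
  std::vector<float> result;
  result.reserve(static_cast<std::size_t>(out[0] * out[1] * out[2]));
  for (std::int64_t i = 0; i < size[0]; i += 2)
    for (std::int64_t j = 0; j < size[1]; j += 2)
      for (std::int64_t k = 0; k < size[2]; k += 2)
        result.push_back(
          in[static_cast<std::size_t>((i * size[1] + j) * size[2] + k)]);
  return result;
}
```

For a 3 x 2 x 5 input (30 samples) the result is 2 x 1 x 3 (6 samples); rounding up is why small or odd-sized inputs shrink by less than a factor of 8.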
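| std::shared_ptr< DataBuffer > InternalZGY::GenLodImpl::_paste1 | ( | const std::shared_ptr< DataBuffer > & | result, |
| const std::shared_ptr< const DataBuffer > & | more, | ||
| std::int64_t | ioff, | ||
| std::int64_t | joff | ||
| ) | protected |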
See _paste4() for details.
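| std::shared_ptr< const DataBuffer > InternalZGY::GenLodImpl::_paste4 | ( | const std::shared_ptr< const DataBuffer > & | d00, |
| const std::shared_ptr< const DataBuffer > & | d01, | ||
| const std::shared_ptr< const DataBuffer > & | d10, | ||
| const std::shared_ptr< const DataBuffer > & | d11 | ||
| ) | protected |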
Combine 4 buffers into one. Input buffers may be None (do not paste) or ScalarBuffer (paste a constant value). If all not-None buffers are just scalars then the return from this function will also be a scalar. d01 adds more data in the J direction, so it starts at i=0, j>0 in the target. Similarly d10 adds more in the I direction, so it starts at i>0, j=0. And d11 adds data on the diagonal.
Performance note: It is in theory possible to avoid some buffer copies, this one in particular, by passing our 4 or 8 buffers to the decimation algorithm instead of a combined buffer. The algorithms remain just as efficient. BUT if sizes can vary or bricks can be missing or number of bricks differs in level 1 because we read directly from the file then things can get really complicated really fast.
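The layout of the four inputs in the target can be sketched in two dimensions. This is a simplified illustration, not the library code: `Tile` is a hypothetical stand-in for `DataBuffer`, the vertical dimension and the scalar-buffer case are left out, missing inputs are modelled as null pointers, and `d00` is assumed to always be present.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical 2D stand-in for DataBuffer: ni rows of nj samples, j fastest.
struct Tile {
  std::int64_t ni, nj;
  std::vector<float> data;
  Tile(std::int64_t ni_in, std::int64_t nj_in, float value)
    : ni(ni_in), nj(nj_in),
      data(static_cast<std::size_t>(ni_in * nj_in), value) {}
  float& at(std::int64_t i, std::int64_t j)
  { return data[static_cast<std::size_t>(i * nj + j)]; }
  float at(std::int64_t i, std::int64_t j) const
  { return data[static_cast<std::size_t>(i * nj + j)]; }
};

// Copy "more" into "result" with its corner at (ioff, joff).
// A null pointer means there is nothing to paste.
void paste1(Tile& result, const Tile* more, std::int64_t ioff, std::int64_t joff)
{
  if (more == nullptr)
    return;
  assert(ioff + more->ni <= result.ni && joff + more->nj <= result.nj);
  for (std::int64_t i = 0; i < more->ni; ++i)
    for (std::int64_t j = 0; j < more->nj; ++j)
      result.at(ioff + i, joff + j) = more->at(i, j);
}

// d01 extends d00 in the j direction, d10 in the i direction, d11 diagonally.
// Areas not covered by any input keep the "fill" value.
Tile paste4(const Tile& d00, const Tile* d01, const Tile* d10, const Tile* d11,
            float fill)
{
  const std::int64_t ni = d00.ni + (d10 ? d10->ni : 0);
  const std::int64_t nj = d00.nj + (d01 ? d01->nj : 0);
  Tile result(ni, nj, fill);
  paste1(result, &d00, 0, 0);
  paste1(result, d01, 0, d00.nj);
  paste1(result, d10, d00.ni, 0);
  paste1(result, d11, d00.ni, d00.nj);
  return result;
}
```

The copy made in `paste1` is the one the performance note refers to; the sketch makes no attempt to avoid it.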
| std::tuple< std::shared_ptr< StatisticData >, std::shared_ptr< HistogramData > > InternalZGY::GenLodImpl::call | ( | ) |
Generate and store statistics, histogram, and all low resolution bricks. Works for plans C and D. If we also need an implementation of plan B then this method would need to iterate over all bricks and lods, and _calculate would not make any recursive calls.
TODO-Performance: If the bulk layer is made thread safe for writing it is possible to do parallel processing at this high level. E.g. do this for just one level: Split into 4 sub-tasks that each execute in one thread. Special handling will be needed for the single highest-level brick. All 4 threads need to be joined before that one can be done. With one level here and with 4 threads used in _calculate() this means we will be reading with 16 threads. But there are serious caveats:
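| std::array< double, 2 > InternalZGY::GenLodImpl::suggestHistogramRange | ( | const std::array< double, 2 > & | writtenrange, |
| RawDataType | dtype | ||
| ) | static protected |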
Choose the histogram range to use.
The range of all sample values seen until now, i.e. everything written, is passed in. The possibility to overwrite data or (future) append to an existing file makes this not accurate but still probably good enough.
If writtenrange is invalid this means that no finite samples have been written. Choose an arbitrary range in that case instead of ending up with a lot of obscure corner cases. TODO-Low: the range needs to include defaultvalue, as there might be unwritten bricks.
TODO-Low: If data is being appended, we will still re-compute the entire histogram. Include the current value range stored on file in case the update only consists of smaller values. Problem is, the first write might not have bothered to finalize and thus did not save this information. I probably need to "finalize" with an empty histogram. Also bear in mind that the stored range will be in user and not storage values.
Widen the histogram range for integral types to cover the entire possible range, not just the sample range written. For int8/uint8 this is a no-brainer because having less than one possible sample value in each bin inevitably leads to some empty bins even for a completely smooth distribution of the input. For int16/uint16 it is still workable, but it would have been better to use a compromise: use a narrower range, but narrowed using an integer factor. Or sidestep the issue by internally using a 64k histogram that will get trimmed down to 256 entries later.
TODO-Low: should the suggested range for float data include defaultvalue, which for floats is always zero? Technically the code should keep track of whether all bricks have been explicitly written and if not include zero. It doesn't. This "bug" is probably in the "extreme nitpicking" category.
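The range selection described above can be sketched as follows. This is a hedged illustration, not the actual function: the real method takes a `RawDataType` argument, whereas the hypothetical `suggestRange` below is a template on the sample type, and the fallback range of [-1, 1] for an invalid input is an arbitrary value chosen for the example.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <limits>
#include <type_traits>

// Hypothetical sketch: choose a histogram range for samples stored as T,
// given the range of values written so far.
template <typename T>
std::array<double, 2> suggestRange(const std::array<double, 2>& written)
{
  static_assert(std::is_arithmetic<T>::value, "T must be a numeric sample type");
  std::array<double, 2> range = written;
  if (std::is_integral<T>::value) {
    // Integral storage: cover every value the type can hold.
    range = {static_cast<double>(std::numeric_limits<T>::lowest()),
             static_cast<double>(std::numeric_limits<T>::max())};
  } else if (!std::isfinite(written[0]) || !std::isfinite(written[1]) ||
             written[0] > written[1]) {
    // No finite samples written yet. The fallback is arbitrary (assumption).
    range = {-1.0, 1.0};
  }
  assert(range[0] <= range[1]);
  return range;
}
```

For `std::int8_t` this always yields [-128, 127], which gives exactly one possible sample value per bin in a 256-bin histogram. For float data with a valid written range the input is returned unchanged.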