This is version 0.3 of the document, last updated 2020-09-07.

Copyright

Copyright 2017-2020, Schlumberger

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

     http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

ZGY File Format

The ZGY file format exists in two variants:

There exists a closed source library able to read and write uncompressed ZGY and read compressed ZGY. This is freely available in binary form. It is delivered as a zip or tar archive containing API documentation, binary libraries, and headers. No source is provided except for a few examples on how to use the API.

Work is in progress on an open source library called OpenZGY that will be able to read and write the same ZGY format as the closed source library. At least the uncompressed ZGY format is planned to be fully supported, although our legal department probably won't allow us to promise anything more than "what you see is what you get".

No decision has been made whether to open source the existing compressed format.

OpenZGY extends the existing "uncompressed" format to optionally apply ZFP compression to some or all individual bricks, without compressing the file as a whole. This is why the existing compressed format is being deprecated and might well be left as closed source.

The reference implementation of an OpenZGY reader will be written in Python. Focus will be on writing code that can explain any subtle issues that are not clear enough in the documentation. A C++ version might be implemented later.

There are a number of reasons for starting fresh. The closed source code is old and has accrued quite a bit of technical debt, obscure APIs, and unused features.

Initial tests of the pure Python accessor show that performance is comparable between the native C++ library, the Python wrapper around native C++, and the new pure Python implementation.

Note that this was a single simple test, so the comparison is very rough. The test case was a single-threaded read from a local file, which is all the pure Python version can do for the moment. The test was run on a fairly old machine using a 20 GB file of float32 samples. The machine had 12 GB of RAM. Each test was run multiple times, so large parts of the file would have been in memory.

To test the raw bandwidth of the machine, "dd" was used to copy the zgy file to /dev/null with a 1 MB block size. For the ZGY library, all the full resolution samples were read, requesting 64x64x896 samples (14 MB) at a time and the result was discarded. At the lowest level this still ends up reading 1 MB blocks.

Raw "dd"        110 MB/s
Native C++      102 MB/s
Python wrapper  101 MB/s
Pure Python     101 MB/s

This file contains the documentation of the uncompressed ZGY format. This applies both to the old ZGY library and OpenZGY.

File layout

The following describes the format of the physical file.

A ZGY file contains seismic samples for a fixed size 3D cube, and meta information such as the cube annotation and location. Samples are stored in bricks of 64*64*64 samples of int8, int16, or float32 data. So, the size of each brick is either 256 KB, 512 KB, or 1 MB depending on the data type. The enum for data type also includes int32, uint8, uint16, uint32, and ibm_float. Support for those types is limited and they may not work or may not be accessible from the API.

The ZGY file also has room for storing a boolean for each vertical trace telling whether this trace has live data or not. This feature is deprecated and should not be used. As long as these booleans (called alpha tiles) are not written out, the space penalty in the ZGY file is negligible.

ZGY automatically stores decimated versions of the data to speed up access. Level of Detail 0 (LOD 0) is the full resolution. The bricks always contain the same number of samples. This means that each low resolution brick is computed from several higher resolution (lower LOD number) bricks.

The uncompressed format version 2 and onwards stores data inside each brick in ZeroOneTwo order, with the vertical (last) dimension varying fastest and the inline (first) dimension varying slowest. Uncompressed format version 1 stores sub-bricks of 8*8*8 samples inside each regular brick, both ordered TwoOneZero, i.e. the other way around. Compressed data and the order of entries in lookup tables also use TwoOneZero: the vertical (last) dimension varying slowest and the inline (first) dimension varying fastest.
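The ZeroOneTwo ordering for version 2 and later can be illustrated with a small index calculation. This is only a sketch: the function names are hypothetical and not part of any ZGY library, and it covers only the layout of samples inside one uncompressed brick.

```python
def sample_index_v2(i, j, k, bricksize=(64, 64, 64)):
    """Linear index of sample (i, j, k) inside a version 2+ brick.

    i = inline (varies slowest), j = crossline, k = vertical (varies fastest).
    """
    ni, nj, nk = bricksize
    if not (0 <= i < ni and 0 <= j < nj and 0 <= k < nk):
        raise IndexError("sample position is outside the brick")
    return (i * nj + j) * nk + k


def sample_byte_offset_v2(i, j, k, sample_bytes, bricksize=(64, 64, 64)):
    """Byte offset of a sample relative to the start of its brick."""
    return sample_index_v2(i, j, k, bricksize) * sample_bytes
```

Stepping one sample down a trace moves one sample forward in the brick, while stepping to the next inline skips 64*64 samples.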

Some of the meta information is important because it must be known before the size and offset of several other sections can be known. Any code that wants to read ZGY files must implement a function that computes the number of tiles and bricks as described below.

Both the number of bricks and the number of LOD levels are calculated from the cube size. Enough levels of detail will be added to make the highest level fit in a single brick. The number of bricks in LOD 0 is simply the survey size divided by the brick size and rounded up. The number of bricks in LOD n+1 is the number of bricks in LOD n divided by (2,2,2) and rounded up. The total number of bricks is found by summing the number of bricks in LOD 0, 1, 2, etc. up to and including the LOD that only contains a single brick.
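The brick count rule can be written down in a few lines. The following is a minimal sketch of the calculation described above; the function names are hypothetical, and it assumes every dimension of the survey holds at least one sample.

```python
def bricks_per_lod(size, bricksize=(64, 64, 64)):
    """Return a list of (ni, nj, nk) brick counts, one tuple per LOD."""
    if any(s < 1 for s in size):
        raise ValueError("this sketch assumes a non-empty survey")
    # LOD 0: survey size divided by brick size, rounded up.
    count = tuple((s + b - 1) // b for s, b in zip(size, bricksize))
    result = [count]
    # Halve (rounding up) until a single brick holds the whole survey.
    while count != (1, 1, 1):
        count = tuple((c + 1) // 2 for c in count)
        result.append(count)
    return result


def total_bricks(size, bricksize=(64, 64, 64)):
    """Total number of bricks over all LODs, i.e. entries in BrickLup."""
    return sum(ni * nj * nk for ni, nj, nk in bricks_per_lod(size, bricksize))
```

For example, a survey of 112 x 64 x 176 samples gives (2,1,3) bricks in LOD 0, (1,1,2) in LOD 1, and (1,1,1) in LOD 2, for a total of 9 bricks.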

The same calculation is used for the number of alpha tiles, except that the number of LOD levels has already been determined by the bricks and the vertical size is always 1, since the alpha tiles are 2D only. Readers of v2 and v3 need to be aware of how the deprecated alpha tiles (for flagging dead traces) are computed because this affects where the other headers are stored. For versions of ZGY that contain an offset table (v1, and presumably v4) this is not an issue.
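A sketch of the alpha tile count, following the rule just stated: the number of LODs comes from the 3D brick calculation, and the tile count per LOD uses only the two horizontal dimensions. The helper names are hypothetical.

```python
def ceil_div(a, b):
    return (a + b - 1) // b


def count_lods(size, bricksize=(64, 64, 64)):
    """Number of LOD levels, as determined by the 3D bricks."""
    count = [ceil_div(s, b) for s, b in zip(size, bricksize)]
    nlods = 1
    while any(c > 1 for c in count):
        count = [ceil_div(c, 2) for c in count]
        nlods += 1
    return nlods


def total_alpha_tiles(size, bricksize=(64, 64, 64)):
    """Total number of alpha tiles over all LODs, i.e. entries in AlphaLup."""
    ni = ceil_div(size[0], bricksize[0])
    nj = ceil_div(size[1], bricksize[1])
    total = 0
    for _ in range(count_lods(size, bricksize)):
        total += ni * nj  # the vertical count is always 1
        ni, nj = ceil_div(ni, 2), ceil_div(nj, 2)
    return total
```

Note that a tall, narrow survey still gets one alpha tile for every LOD. A 64 x 64 x 1000 survey has 5 LODs and therefore 5 alpha tiles, so a v2 or v3 reader must skip 5 * 8 bytes of AlphaLup before reaching BrickLup.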

The precise physical layout of the file depends on the version, but can be divided into:

Survey location

Notes on gpiline, gpxline, gpx, gpy:

The grid definition in gpiline, gpxline, gpx, and gpy defines an affine transform from inline, crossline to world X, Y.

Since an affine transform is used it is possible to define a coordinate system where the X and Y axes are not precisely perpendicular to each other. It is up to the application reading the files to decide whether to accept such non-orthogonal coordinate systems. When writing files, think long and hard about whether to allow the user to create a non-orthogonal ZGY file. If you do, you might regret it after a few decades of supporting that feature in all your applications that read ZGY.

On write, gpiline and gpxline should contain the 4 corners of the survey in annotation values. This means they are to be trivially computed from orig, inc, and size. The order of the corners should be:

The gpx and gpy arrays should be set to the world (X,Y) coordinates corresponding to those 4 corner points.

On read, trust that the first 3 annotation points and the first 3 world coordinates refer to the same three locations, and that those three points are distinct and not collinear. Do not trust that the points are in fact corners of the lattice. There is still enough information to reconstruct the lattice and calculate the actual corners, which the reader should do as soon as possible, so only the "correct" corner points are available in the API.
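One way to reconstruct the lattice on read is to solve for the affine transform defined by the first three control points. The sketch below does this with plain arithmetic; the function name is hypothetical, and the exact comparison with zero is a simplification that a real reader would likely replace with a tolerance.

```python
def make_annot_to_world(gpiline, gpxline, gpx, gpy):
    """Build a function mapping (inline, crossline) to world (X, Y).

    Only the first three control points are used.
    """
    # Annotation differences relative to the first control point.
    di1, dx1 = gpiline[1] - gpiline[0], gpxline[1] - gpxline[0]
    di2, dx2 = gpiline[2] - gpiline[0], gpxline[2] - gpxline[0]
    det = di1 * dx2 - di2 * dx1
    if det == 0:
        raise ValueError("control points coincide or are collinear")

    def steps(w):
        # Solve a 2x2 system (Cramer's rule) for the change in one world
        # coordinate per unit inline and per unit crossline.
        dw1, dw2 = w[1] - w[0], w[2] - w[0]
        per_il = (dw1 * dx2 - dw2 * dx1) / det
        per_xl = (di1 * dw2 - di2 * dw1) / det
        return per_il, per_xl

    x_il, x_xl = steps(gpx)
    y_il, y_xl = steps(gpy)

    def annot_to_world(il, xl):
        a, b = il - gpiline[0], xl - gpxline[0]
        return (gpx[0] + a * x_il + b * x_xl,
                gpy[0] + a * y_il + b * y_xl)

    return annot_to_world
```

Applying the returned function to the annotation of the real survey corners, computed from orig, inc, and size, yields corner coordinates that do not depend on what the writer happened to store as the fourth point.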

ZGY Uncompressed Format version 2 and 3

All integer data is stored as little-endian. Unlike format 1, only one level of bricking is used. The brick size is explicitly specified, but currently only bricks of 64*64*64 samples will work.

FileHeader Located at the start of the file.
offset  size    type     name            remarks
0       4       uint8    magic[4]        Always VBS\0 when viewed as a char[4].
4       4       uint32   version         Current version is 3.
8       end.
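Reading the FileHeader is a matter of unpacking 8 bytes. A minimal sketch using Python's standard struct module follows; the function name is hypothetical.

```python
import struct


def parse_file_header(data):
    """Return the format version stored in the first 8 bytes of a ZGY file."""
    if len(data) < 8:
        raise ValueError("file is too short to hold a FileHeader")
    # "<" selects little-endian with no padding: 4 raw bytes, then a uint32.
    magic, version = struct.unpack_from("<4sI", data, 0)
    if magic != b"VBS\0":
        raise ValueError("bad magic, this is not a ZGY file")
    return version
```

A reader would typically branch on the returned version to choose between the version 1 layout and the version 2 and 3 layout.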
OffsetHeader Consecutive, so this is offset 8 from the start of the file.
offset  size    type     name            remarks
0       1       uint8    padding         Write as 0, ignore on read.
1       end.
InfoHeader Consecutive, so this is offset 9 from the start of the file.
offset  size    type     name            remarks
0       12      int32    bricksize[3]    Brick size. Values other than (64,64,64) will likely not work.
12      1       uint8    datatype        Type of samples in each brick: int8 = 0, int16 = 2, float32 = 6.
13      8       float32  codingrange[2]  If datatype is integral (note that int8 and int16 are the only integral types supported), this is the value range samples will be scaled to when read as float. In this case it must be specified on file creation. If datatype is float then this is the value range of the data and should be set automatically when writing the file.
21      16      uint8    dataid[16]      GUID set on file creation.
37      16      uint8    verid[16]       GUID set each time the file is changed.
53      16      uint8    previd[16]      GUID before last change.
*       *       char*    srcname         Optional name of this data set. Rarely used.
*       *       char*    srcdesc         Optional description of this data set. Rarely used.
69      1       uint8    srctype         Optional datatype the samples had before being stored in this file.
70      12      float32  orig[3]         First inline, crossline, time/depth. Unlike v1 these are now floating point.
82      12      float32  inc[3]          Increment in inline, crossline, vertical directions.
94      12      int32    size[3]         Size in inline, crossline, vertical directions.
106     12      int32    curorig[3]      Unused. Set to (0,0,0) on write and ignore on read.
118     12      int32    cursize[3]      Unused. Set to size on write and ignore on read.
130     8       int64    scnt            Count of values used to compute statistics.
138     8       float64  ssum            Sum of all "scnt" values.
146     8       float64  sssq            Sum of squared "scnt" values.
154     4       float32  smin            Statistical (computed) minimum value.
158     4       float32  smax            Statistical (computed) maximum value.
162     12      float32  srvorig[3]      Unused. Set equal to orig on write. Ignore on read.
174     12      float32  srvsize[3]      Unused. Set to inc*size on write. Ignore on read.
186     1       uint8    gdef            Grid definition type. Set to 3 (enum: "FourPoint") on write. Ignored on read. See notes for a longer explanation.
187     16      float64  gazim[2]        Unused.
203     16      float64  gbinsz[2]       Unused.
219     16      float32  gpiline[4]      Inline component of 4 control points.
235     16      float32  gpxline[4]      Crossline component of 4 control points.
251     32      float64  gpx[4]          X coordinate of 4 control points.
283     32      float64  gpy[4]          Y coordinate of 4 control points.
*       *       char*    hprjsys         Free form description of the projection coordinate system. Usually not parseable into a well known CRS. Petrel neither sets nor uses this field. Keep that in mind when reading files exported from Petrel. For files that will be imported into Petrel you will need another way to help the users load the data correctly.
315     1       uint8    hdim            Horizontal dimension. Unknown = 0, Length = 1, ArcAngle = 2. Few applications support ArcAngle. Petrel neither sets nor uses the unit and dimension fields. Both horizontal and vertical get left as Unknown, 1.0, and the empty string respectively. Technically this qualifies as a bug.
316     8       float64  hunitfactor     Multiply by this factor to convert from storage units to SI units. Applies to gpx, gpy.
*       *       char*    hunitname       For annotation only. Use hunitfactor, not the name, to convert to or from SI.
324     1       uint8    vdim            Vertical dimension. Unknown = 0, Depth = 1, SeismicTWT = 2, SeismicOWT = 3.
325     8       float64  vunitfactor     Multiply by this factor to convert from storage units to SI units. Applies to orig[2], inc[2].
*       *       char*    vunitname       For annotation only. Use vunitfactor, not the name, to convert to or from SI.
333     4       uint32   slbufsize       Size of the StringList section.
337     end.
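The fixed-size fields of the InfoHeader are tightly packed, with no alignment padding, so they can be unpacked directly. The sketch below transcribes the table into struct format codes and checks that they add up to the documented 337 bytes. The names INFO_HEADER_V3 and parse_info_header are hypothetical; the variable length strings (marked * in the table) are not part of this block since they live in the StringList.

```python
import struct

# (struct format code, field name) in file order, transcribed from the table.
INFO_HEADER_V3 = [
    ("3i", "bricksize"), ("B", "datatype"), ("2f", "codingrange"),
    ("16s", "dataid"), ("16s", "verid"), ("16s", "previd"),
    ("B", "srctype"), ("3f", "orig"), ("3f", "inc"), ("3i", "size"),
    ("3i", "curorig"), ("3i", "cursize"),
    ("q", "scnt"), ("d", "ssum"), ("d", "sssq"), ("f", "smin"), ("f", "smax"),
    ("3f", "srvorig"), ("3f", "srvsize"), ("B", "gdef"),
    ("2d", "gazim"), ("2d", "gbinsz"),
    ("4f", "gpiline"), ("4f", "gpxline"), ("4d", "gpx"), ("4d", "gpy"),
    ("B", "hdim"), ("d", "hunitfactor"), ("B", "vdim"), ("d", "vunitfactor"),
    ("I", "slbufsize"),
]

# "<" means little-endian and no padding between fields.
INFO_HEADER_SIZE = sum(struct.calcsize("<" + code) for code, _ in INFO_HEADER_V3)


def parse_info_header(data, offset=9):
    """Unpack a version 2 or 3 InfoHeader into a dict of field values."""
    result = {}
    for code, name in INFO_HEADER_V3:
        fmt = "<" + code
        values = struct.unpack_from(fmt, data, offset)
        result[name] = values[0] if len(values) == 1 else values
        offset += struct.calcsize(fmt)
    return result
```

The default offset of 9 is the 8-byte FileHeader plus the 1-byte OffsetHeader.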
StringList Consecutive, so this is offset 346 from the start of the file.
offset  size    type     name            remarks
0       varies  char*    -               The 5 entries in the InfoHeader above that are variable length strings are stored here, to allow the InfoHeader to have a constant size. The strings are all null terminated and stored consecutively. The total size of the StringList section is stored in slbufsize.
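Splitting the StringList back into its five strings can be sketched as follows. Two things here are assumptions, not statements from this document: that the strings appear in the same order as the corresponding rows of the InfoHeader table, and that they can be decoded as UTF-8. The names are hypothetical.

```python
# Assumed order: the order in which the string fields appear in the InfoHeader.
STRING_FIELDS = ("srcname", "srcdesc", "hprjsys", "hunitname", "vunitname")


def parse_string_list(buf):
    """Split a StringList buffer (slbufsize bytes) into named strings."""
    parts = buf.split(b"\0")
    # Five terminated strings produce at least six pieces, because the
    # final terminator leaves an empty piece at the end.
    if len(parts) < len(STRING_FIELDS) + 1:
        raise ValueError("StringList holds fewer than 5 terminated strings")
    # The encoding is an assumption; undecodable bytes are replaced.
    return {name: raw.decode("utf-8", errors="replace")
            for name, raw in zip(STRING_FIELDS, parts)}
```

A file where all five strings are empty has a StringList of exactly five null bytes.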
Histogram Consecutive. Location in file depends on size of previous entries.
offset  size    type     name            remarks
0       8       int64    cnt             Total number of samples.
8       4       float32  min             Center point of first bin.
12      4       float32  max             Center point of last bin.
16      2048    int64    bin[256]        Histogram.
2064    end.
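The version 2 and 3 histogram section can be unpacked in one call. This sketch uses hypothetical names; the helper that lists bin centers additionally assumes that the 256 bins are evenly spaced between min and max, which the table suggests but does not state outright.

```python
import struct

# int64 cnt, float32 min, float32 max, 256 x int64 bins; 2064 bytes in total.
HISTOGRAM_V3 = struct.Struct("<qff256q")


def parse_histogram(data, offset=0):
    """Return (cnt, min, max, bins) from a version 2 or 3 Histogram section."""
    fields = HISTOGRAM_V3.unpack_from(data, offset)
    cnt, lo, hi = fields[:3]
    return cnt, lo, hi, fields[3:]


def bin_centers(lo, hi, nbins=256):
    """Center value of each bin, assuming evenly spaced bins."""
    step = (hi - lo) / (nbins - 1)
    return [lo + i * step for i in range(nbins)]
```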
AlphaLup Consecutive. Location in file depends on size of previous entries.
offset  size    type     name            remarks
0       varies  int64[]  -

Offsets into the file for each alpha tile. The size of this section depends on "size" and "bricksize". In version 1, "bricksize" is implicitly (64,64,64).

This section should be written as all zeros and ignored on read. Had it been in use, entries would have the same meaning as in BrickLup except that bricksize[2]=1 and valuetype is uint8. With the default bricksize this means that alpha tiles are 4 KB each.

BrickLup Consecutive. Location in file depends on size of previous entries.
offset  size    type     name            remarks
0       varies  int64[]  -

Offsets into the file for each data brick. The size of this section depends on "size".

On write, all offsets should be a multiple of the brick size. On read, misaligned offsets should be tolerated but might result in significantly reduced performance. Version 1 files were usually written misaligned.

Zero means brick does not exist, i.e. was never written. Reading such a brick should return the default value, which is the sample value that after conversion to float is the one closest to zero.
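The default value for integral files can be worked out as follows. This is a sketch under an explicit assumption: that integer samples are converted to float by a linear mapping that sends the lowest storage value to codingrange[0] and the highest to codingrange[1]. The document only says that codingrange is the range samples are scaled to, so treat the formula and the function names as illustrative.

```python
# datatype code -> (lowest, highest) storage value
INT_RANGES = {0: (-128, 127), 2: (-32768, 32767)}
FLOAT32 = 6


def storage_to_float(value, datatype, codingrange):
    """Convert a stored sample to float (assumed linear mapping)."""
    if datatype == FLOAT32:
        return float(value)
    lo, hi = INT_RANGES[datatype]
    slope = (codingrange[1] - codingrange[0]) / (hi - lo)
    return codingrange[0] + slope * (value - lo)


def default_storage_value(datatype, codingrange):
    """Storage value whose float conversion lies closest to zero."""
    if datatype == FLOAT32:
        return 0.0
    lo, hi = INT_RANGES[datatype]
    slope = (codingrange[1] - codingrange[0]) / (hi - lo)
    if slope == 0:
        raise ValueError("codingrange must span a non-empty interval")
    # Storage value that would convert to exactly 0.0, then rounded and
    # clipped to what the integer type can actually hold.
    ideal = lo - codingrange[0] / slope
    return max(lo, min(hi, round(ideal)))
```

If zero lies outside the coding range, the result is clipped to the nearest end of the storage range, so a missing brick reads back as the edge of the coding range rather than as zero.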

Entries with the most significant bit set signify a brick where all samples have the same value. That value is stored in the least significant byte or bytes of the entry. The value type of the constant is the same as that of regular samples, and the constant is subject to the same conversion when the application requests float data. As with the missing bricks, these constant-value bricks do not take up space in the file apart from the 8-byte entry in the lookup table.

An entry of 1 is treated the same as 0x8000000000000000, i.e. constant value zero before conversion to float. New version 3 files should write 0x8000000000000000 instead of 1, but both alternatives should be recognized on read.

Any other entries are used as the file offset to the start of the brick. The size of the brick is given by bricksize * sizeof(valuetype).
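The rules for BrickLup entries can be summarized in a small decoder. The sketch below reads entries as unsigned 64-bit values so the most significant bit can be tested directly, and it assumes that a float32 constant is stored as its raw bit pattern in the low 4 bytes. All names are hypothetical.

```python
import struct

MISSING, CONSTANT, NORMAL = "missing", "constant", "normal"
TOP_BIT = 1 << 63
SAMPLE_FORMAT = {0: "<b", 2: "<h", 6: "<f"}  # datatype code -> struct format


def read_brick_lup(data, offset, count):
    """Read `count` little-endian lookup entries as unsigned integers."""
    return struct.unpack_from("<%dQ" % count, data, offset)


def decode_brick_entry(entry, datatype):
    """Classify one BrickLup entry as (kind, payload)."""
    if entry == 0:
        return MISSING, None  # never written; reads as the default value
    if entry == 1:
        entry = TOP_BIT  # older spelling of "constant zero"
    if entry & TOP_BIT:
        # The constant sits in the least significant bytes of the entry.
        raw = (entry & (TOP_BIT - 1)).to_bytes(8, "little")
        (value,) = struct.unpack_from(SAMPLE_FORMAT[datatype], raw, 0)
        return CONSTANT, value
    return NORMAL, entry  # payload is the file offset of the brick


def brick_bytes(datatype, bricksize=(64, 64, 64)):
    """Size in bytes of one stored brick."""
    n = bricksize[0] * bricksize[1] * bricksize[2]
    return n * struct.calcsize(SAMPLE_FORMAT[datatype])
```

The value returned for a constant brick is still in storage units, so an integral file needs the usual conversion to float afterwards.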

ZGY Uncompressed Format version 1

Most numeric data is stored as little-endian, but 64-bit integers are stored as two 32-bit little-endian integers with the most significant half first. So, these are half big-endian, half little-endian. 64-bit integers are used in a couple of discrete properties and in all file offsets.
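Decoding the mixed-endian 64-bit integers of version 1 can be sketched like this. The function name and the signed parameter are hypothetical.

```python
import struct


def read_v1_int64(data, offset=0, signed=True):
    """Decode a version 1 64-bit integer stored as two 32-bit halves."""
    # Each half is little-endian, but the most significant half comes first.
    high, low = struct.unpack_from("<II", data, offset)
    value = (high << 32) | low
    if signed and value >= 1 << 63:
        value -= 1 << 64  # reinterpret the bit pattern as two's complement
    return value
```

Passing signed=False keeps the raw bit pattern, which is convenient when testing the most significant bit of a lookup table entry.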

Two levels of bricking are used in version 1. The primary brick size is 64 in each direction. The data inside each brick is further subdivided into 8*8(*8) bricks.

FileHeader Located at the start of the file.
offset  size    type     name            remarks
0       4       uint8    magic[4]        Always VBS\0 when viewed as a char[4].
4       4       uint32   version         In this case, 1.
8       end.
OffsetHeader Consecutive, so this is offset 8 from the start of the file.
offset  size    type     name            remarks
0       8       int64    infoheader_off  The offsets are stored as two little-endian 32-bit integers, with the most significant half first.
8       8       int64    alphalup_off
16      8       int64    bricklup_off
24      8       int64    histogram_off
32      end.
InfoHeader Location in the file is specified in the OffsetHeader.
offset  size    type     name            remarks
0       12      int32    size[3]         Integer size in inline, crossline, vertical directions.
12      12      int32    orig[3]         First inline, crossline, time/depth. Only integral values allowed.
24      12      int32    inc[3]          Integer increment in inline, crossline, vertical directions.
36      12      float32  incfactor[3]    Unused. Write as (1,1,1), ignore on read.
48      16      int32    gpiline[4]      Inline component of 4 control points.
64      16      int32    gpxline[4]      Crossline component of 4 control points.
80      32      float64  gpx[4]          X coordinate of 4 control points.
112     32      float64  gpy[4]          Y coordinate of 4 control points.
144     1       uint8    datatype        Type of samples in each brick: int8 = 0, int16 = 2, float32 = 6.
145     1       uint8    coordtype       Coordinate type: unknown = 0, meters = 1, feet = 2, degrees*3600 = 3, degrees = 4, DMS = 5.
146     end.
Histogram Location in the file is specified in the OffsetHeader.
offset  size    type     name            remarks
0       4       float32  max             Center point of first bin.
4       4       float32  min             Center point of last bin.
8       1024    uint32   bin[256]        Histogram.
1032    end.
AlphaLup Location in the file is specified in the OffsetHeader.
offset  size    type     name            remarks
0       varies  int64[]  -               As AlphaLup in version 2 and 3, except that entries are stored using a mix of big- and little-endian as described above.
BrickLup Location in the file is specified in the OffsetHeader.
offset  size    type     name            remarks
0       varies  int64[]  -               As BrickLup in version 2 and 3, except that entries are stored using a mix of big- and little-endian as described above.

The deprecated DMS format for coordtype is degrees, minutes, seconds encoded in a decimal format, so e.g. 3°12'59" becomes 31259. Add one second of arc to that number and you get 3°13'00" or 31300 (i.e. not consecutive numbers).
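Converting such a DMS value to decimal degrees might look like the sketch below. The function name is hypothetical. Treating the sign as applying to the whole value, and allowing a fractional part in the seconds, are assumptions that this document does not confirm.

```python
def dms_to_degrees(value):
    """Convert a DMS-encoded number (e.g. 31259 for 3°12'59") to degrees."""
    sign = -1.0 if value < 0 else 1.0
    magnitude = abs(value)
    degrees = magnitude // 10000          # everything above the last 4 digits
    minutes = (magnitude % 10000) // 100  # the next two digits
    seconds = magnitude % 100             # the last two digits (and fraction)
    return sign * (degrees + minutes / 60.0 + seconds / 3600.0)
```

With this, 31259 and 31300 differ by exactly one second of arc even though the encoded numbers are 41 apart.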

LOD generation

The ZGY library is responsible for generating the smaller, subsampled data at LOD>0. Spatially, each sample in LOD N maps to 2x2x2=8 samples of LOD N-1. The algorithms used for subsampling are non-trivial.

LOD level 1 generation discards 3 of the 4 vertical traces that form the input. In the vertical direction a low pass filter with length 10 is applied, before discarding every other sample of its result.

LOD level 2 and above use weighted averaging of all 8 source samples. Sample values that are common in the cube (as reported by the histogram) are presumably less interesting, so they receive less weight.

An unfortunate effect of these algorithms is that the LOD>1 bricks cannot be generated until all the LOD=0 bricks have been written, since the histogram of the entire file is needed to produce LOD=2 and above. This is not an issue if the LOD bricks are generated in a separate pass after all full resolution data has been written.

Other rules

Application code is allowed to read from or write to the padding area between the survey edge and the rest of the brick. This is even encouraged because writing full bricks can be more efficient. It is unspecified whether the padding samples written in this manner are retained when the data is read back.

Figure showing physical layout