
Introduction

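This tutorial explains how to write WellLog bulk data to the Wellbore DDMS, either all at once or by chunks through sessions, how WellLog record versions are created, and how to read the bulk data back.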

Prerequisites

Required Python packages

Before you start writing bulk data through the Wellbore DDMS APIs, install the Python packages below:

  • The pandas module, whose pandas.DataFrame (serialized to JSON or Parquet) structures the log bulk data written to the Wellbore DDMS.
  • The numpy module, used to generate sample data.
  • The pyarrow module, used as the engine that serializes a pandas.DataFrame to Parquet.
  • The httpx module, used to send HTTP requests to the Wellbore DDMS.
# Prerequisite to run this notebook
!python -m pip install pip --upgrade
!pip install pandas numpy httpx pyarrow

Authorization

Every call to the Wellbore DDMS APIs must pass a valid bearer token in the request header. You can obtain this token from any API catalog on the developer portal: first request a developer base subscription, then pick any API from that subscription and execute it. A valid bearer token is returned in the Curl section of the response. Copy the token value and assign it to the TOKEN variable below.

TOKEN = '' # Paste here the token without the bearer prefix
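Since the token must be pasted without its prefix, a small guard can catch the most common copy/paste mistake. The sketch below uses a hypothetical helper, build_auth_headers (not part of httpx or of the Wellbore DDMS), that builds the same two headers the client in the Settings section sends and strips an accidental "Bearer " prefix. The partition and token values are placeholders.

```python
def build_auth_headers(token: str, data_partition_id: str) -> dict:
    """Hypothetical helper: build the headers sent with every Wellbore DDMS call."""
    token = token.strip()
    # Tolerate a token pasted together with its "Bearer " prefix
    if token.lower().startswith("bearer "):
        token = token[len("bearer "):].strip()
    if not token:
        raise ValueError("TOKEN is empty: paste a valid bearer token first")
    return {
        "data-partition-id": data_partition_id,
        "Authorization": f"Bearer {token}",
    }


# "my-partition" and "abc123" are placeholder values
print(build_auth_headers("Bearer abc123", "my-partition"))
# {'data-partition-id': 'my-partition', 'Authorization': 'Bearer abc123'}
```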

Utility methods

Helper functions used in the different sample scripts of this tutorial.
from typing import List
import httpx
import pandas as pd
import numpy as np
import io
from IPython.display import display_html, display, HTML
from itertools import chain, cycle

def generate_df_typed(columns, index):
    def gen_values(col_name, size):
        if col_name.startswith('float'):
            return np.random.random_sample(size=size)
        if col_name.startswith('str'):
            return [f'string_value_{i}' for i in range(size)]
        if col_name.startswith('bool'):
            return np.random.choice(a=[False, True], size=size) 
        if col_name.startswith('date'):
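            # One date per row, one day apart (a list so that pandas accepts it as a column)
            return [np.datetime64('2021-01-01') + np.timedelta64(days, 'D') for days in range(size)]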
        return np.random.randint(-100, 1000, size=size)

    df = pd.DataFrame({c: gen_values(c, len(index))
                      for c in columns}, index=index)
    return df

def multi_table(table_list):
    '''Accepts a list of pandas Styler objects and returns an HTML table that contains each one in its own cell'''
    return HTML(
        '<table><tr style="background-color:white;">' + 
        ''.join(['<td>' + table._repr_html_() + '</td>' for table in table_list]) +
        '</tr></table>'
    )

def gen_color(color):
    def fct(val=None):
        return f'color: {color}'
    return fct

def display_operation(before, sent, after):
    colors = ['blue', 'green', 'orange', 'red']
    color_fct = [gen_color(c) for c in colors]
    sent_st = [sent[i].style.set_caption(f'chunk {i+1} sent').applymap(color_fct[i]) for i in range(len(sent))]
    def color_output(s):
        res = []
        for r in s.index:
            c = ''
            for i in range(len(sent)):
                if s.name in sent[i] and int(r) in sent[i][s.name]:
                    c = color_fct[i]()
            res.append(c)
        return res

    margin = '65'
    after_st = after.style.set_table_attributes(f"style='margin-left:{margin}px'").apply(color_output).highlight_null(color='lightyellow').set_caption('Final data - After session commit')
    display(multi_table([before.style.set_table_attributes(f"style='margin-right:{margin}px'").set_caption('Initial data - Before session'), *sent_st, after_st]))
    
def display_side_by_side(dfs:list, captions:list):
    """Display tables side by side to save vertical space
    Input:
        dfs: list of pandas.DataFrame
        captions: list of table captions
    """
    output = ""
    combined = dict(zip(captions, dfs))
    for caption, df in combined.items():
        output += df.style.set_table_attributes("style='display:inline'").set_caption(caption)._repr_html_()
        output += "\xa0\xa0\xa0"
    display(HTML(output))
    
def generate_df(columns: List[str], index):
    nbrows = len(index)
    df = pd.DataFrame(
        np.random.randint(-100, 1000, size=(nbrows, len(columns))), index=index)
    df.columns= columns
    return df


def print_response(resp):
    print(f'{resp.request.method} : {resp.url} -> {resp.status_code}')
    if resp.status_code != httpx.codes.OK:
        display(resp.content)

        
def create_df_from_response(response):
    """Returns a dataframe created from the WellLog bulk data response
    Input:
        response: a httpx.response object
    Output:
        dataframe: a pandas.dataframe object
    """
    content_type = response.headers.get('content-type')
    
    if content_type == 'application/json':
        return pd.DataFrame.from_dict(response.json())
    
    elif content_type == 'application/x-parquet':
        f = io.BytesIO(response.content)
        f.seek(0)
        return pd.read_parquet(f)
    
    raise ValueError(f"Unknown content-type: '{content_type}'")
    
def display_previous_and_current_well_log_data_versions(record_id):
    """Display the previous and current WellLog data versions for a given record id and highlight differences between them.
    Input:
        record_id: a WellLog record id
    """
    # list record version
    results_response = client.get(f'{welllog_dms_url}/{record_id}/versions')
    wellLog_versions_response = results_response.json()
    versions = wellLog_versions_response['versions']
    
    is_previous_results = False
    is_current_results = False
    if len(versions) >= 2:
        previous_version_id = versions[len(versions)-2]
        curl = f'{welllog_dms_url}/{record_id}/versions/{previous_version_id}/data'
        results_response = client.get(curl)
        if results_response.status_code == 200:
            previous_results = create_df_from_response(results_response)
            is_previous_results = True
    
        current_version_id = versions[len(versions)-1]
        curl = f'{welllog_dms_url}/{record_id}/versions/{current_version_id}/data'
        results_response = client.get(curl)
        if results_response.status_code == 200:
            current_results = create_df_from_response(results_response)
            is_current_results = True
    
    colors = ['blue', 'red']
    color_fct = [gen_color(c) for c in colors]
    def color_output(s):
        res = []        
        for r in s.index:
            c = ''
            if s.name in previous_results and int(r) in previous_results[s.name]:
                c = color_fct[0]()
            else:
                c = color_fct[1]()
            res.append(c)
        return res
    
    margin = '65'
    tables = []
    if is_previous_results:
        previous_results_st = previous_results.style.set_table_attributes(f"style='margin-left:{margin}px'").highlight_null(color='lightyellow').set_caption('Previous WellLog data version').applymap(color_fct[0])
        tables.append(previous_results_st)
        
    if is_current_results:
        if is_previous_results:
            current_results_st = current_results.style.set_table_attributes(f"style='margin-left:{margin}px'").apply(color_output).highlight_null(color='lightyellow').set_caption('Current WellLog data version with data chunks added in red')
            tables.append(current_results_st)
        else:
            current_results_st = current_results.style.set_table_attributes(f"style='margin-left:{margin}px'").highlight_null(color='lightyellow').set_caption('Current WellLog data version')
            tables.append(current_results_st)
        
    display(multi_table(tables))

Settings

The settings below, such as the base URL endpoint and the data partition id, are used to create a WellLog in the Wellbore DDMS. Change them to match the environment you want to target.

base_url = "" # set a base URL value
data_partition_id = "" # set a data partition id
legal_tag = "" # set a valid legal tag in the data partition 
acl_domain = "" # set an Access Control Lists (ACL) domain

welllog_dms_url = f'{base_url}/api/os-wellbore-ddms/ddms/v3/welllogs'

client = httpx.Client(verify=False,  # verify=False disables TLS certificate verification: use it only against test environments
    headers={
        "data-partition-id": f"{data_partition_id}",
        "Authorization": f"Bearer {TOKEN}",
    },
    timeout=120
)

# Create a new WellLog. Here is a fake body just to illustrate the API use
record = {
    "kind": "osdu:wks:work-product-component--WellLog:1.0.0",
    "acl": {
        "viewers": [f"data.default.viewers@{data_partition_id}.{acl_domain}"],
        "owners": [f"data.default.owners@{data_partition_id}.{acl_domain}"]
      },
    "legal": {
        "legaltags": [f"{legal_tag}"],
        "otherRelevantDataCountries": ["US"],
    },
    "data": {
        "WellboreID": "namespace:master-data--Wellbore:SomeUniqueWellboreID:",
        "Curves": [
            {
                "CurveID": "MD",
            },
            {
                "CurveID": "X",
            }
        ]
    },
    "version" : 0
}

Create a WellLog record

The script below creates the WellLog record used throughout this tutorial to demonstrate how to write WellLog bulk data to the Wellbore DDMS.

response = client.post(welllog_dms_url, json=[record])
print_response(response)
record_id = response.json()["recordIds"][0]
record_id

Write bulk data - all at once

Each time data are written to a WellLog, the Wellbore DDMS creates a new version. This is true whether the entire bulk data are written at once or by chunks (covered in a later section of this tutorial). When writing all bulk data at once, the payload must therefore contain the entire bulk data: it replaces the previous bulk data by creating a new version, which becomes the latest version and the one returned by the GET WellLog bulk data API for the given record id.

The Wellbore DDMS bulk data API supports both Parquet and JSON formats. To use one of them, set the 'Content-Type' header of the HTTP POST request accordingly. The Wellbore DDMS API also supports HTTP chunked transfer encoding.
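The header choice can be kept in one place. This is a sketch: write_headers is a hypothetical helper, and the two media types are the ones used in this tutorial ('application/x-parquet' for Parquet; for JSON, httpx already sets 'application/json' when the payload is passed with json=).

```python
# Content-Type values used in this tutorial for each write format
CONTENT_TYPES = {
    "parquet": "application/x-parquet",
    "json": "application/json",
}


def write_headers(fmt: str) -> dict:
    """Hypothetical helper: headers for a bulk-data POST in the given format."""
    try:
        return {"content-type": CONTENT_TYPES[fmt.lower()]}
    except KeyError:
        raise ValueError(f"unsupported format {fmt!r}, expected one of {sorted(CONTENT_TYPES)}") from None


print(write_headers("parquet"))  # {'content-type': 'application/x-parquet'}
```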

First, generate a pandas.DataFrame with 2 columns and 5 rows using the code below:

generated_dataframe = generate_df(['COLUMN_MD', 'COLUMN_X'], range(5))
generated_dataframe
   COLUMN_MD  COLUMN_X
0        986       712
1        311       348
2        -27       339
3        230       191
4        162       740

All at once - Parquet

Send the whole DataFrame as the WellLog bulk data in Parquet format:

data_to_send_parquet = generated_dataframe.to_parquet(path=None, engine="pyarrow")
headers = { 'content-type': 'application/x-parquet'}

print_response(client.post(f'{welllog_dms_url}/{record_id}/data', content=data_to_send_parquet, headers=headers))

All at once - JSON

With the JSON format, the orient parameter must match the orientation of the serialized pandas.DataFrame. The orient value is passed through the params argument of the HTTP POST request. The supported orient values are split and columns; the default is split.

Here is the same pandas.DataFrame (5 rows and 2 columns) in each orientation:

split: {"columns":["COLUMN_MD","COLUMN_X"],"index":[0,1,2,3,4],"data":[[0.0,1001],[0.5,1002],[1.0,1003],[1.5,1004],[2.0,1005]]}

columns: {"COLUMN_MD":{"0":0.0,"1":0.5,"2":1.0,"3":1.5,"4":2.0},"COLUMN_X":{"0":1001,"1":1002,"2":1003,"3":1004,"4":1005}}
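pandas produces both layouts itself (DataFrame.to_json or DataFrame.to_dict with orient='split' or orient='columns'). To make the two shapes concrete without pandas, the stdlib-only sketch below rebuilds both example payloads from the same rows; to_split and to_columns are hypothetical names used only for illustration. The row labels become strings in the columns layout because JSON object keys are always strings.

```python
import json


def to_split(columns, index, rows):
    """'split' layout: column names, row labels, and row-major data."""
    return {"columns": list(columns), "index": list(index), "data": [list(r) for r in rows]}


def to_columns(columns, index, rows):
    """'columns' layout: {column: {row_label: value}} with string row labels."""
    return {
        col: {str(label): row[j] for label, row in zip(index, rows)}
        for j, col in enumerate(columns)
    }


columns = ["COLUMN_MD", "COLUMN_X"]
index = [0, 1, 2, 3, 4]
rows = [[0.0, 1001], [0.5, 1002], [1.0, 1003], [1.5, 1004], [2.0, 1005]]

# Compact separators reproduce the exact text of the two examples above
print(json.dumps(to_split(columns, index, rows), separators=(",", ":")))
print(json.dumps(to_columns(columns, index, rows), separators=(",", ":")))
```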

data_to_send_json = {
    'index': [0, 1, 2, 3, 4],
    'columns': ['COLUMN_MD', 'COLUMN_X'],
    'data': [[265, 845], [92, 246], [804, 268], [645, 877], [-20, -28]]
}

params = {'orient':'split'}
print_response(client.post(f'{welllog_dms_url}/{record_id}/data', params=params, json=data_to_send_json))

Write bulk data - by chunk

To write WellLog bulk data to the Wellbore DDMS by chunks, follow these three steps:

  1. Create a WellLog session - POST /ddms/v3/welllogs/{record_id}/sessions
  2. Send data by chunk in the session - POST /ddms/v3/welllogs/{record_id}/sessions/{session_id}/data
  3. Commit the session once all chunks are sent - PATCH /ddms/v3/welllogs/{record_id}/sessions/{session_id}

In step 3 you can also update or abandon the session instead of committing it. This is controlled by the state attribute passed in the JSON body of the session PATCH API:

{ "state": "commit" }, { "state": "abandon" } or { "state": "update" }
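The three steps differ only in the URL suffix and in the JSON body. The sketch below gathers them in two hypothetical helpers, session_endpoints and session_state_body; the paths follow the list above, the allowed states are the three values named in this section, and https://example.com is a placeholder base URL.

```python
from typing import Optional

# States accepted in the PATCH body of step 3, as listed in this section
SESSION_STATES = {"commit", "abandon", "update"}


def session_endpoints(welllog_dms_url: str, record_id: str, session_id: Optional[str] = None) -> dict:
    """Hypothetical helper: URLs of the session steps for one WellLog record."""
    base = f"{welllog_dms_url.rstrip('/')}/{record_id}/sessions"
    urls = {"create": base}                          # step 1: POST {"mode": ...}
    if session_id is not None:
        urls["data"] = f"{base}/{session_id}/data"   # step 2: POST each chunk
        urls["patch"] = f"{base}/{session_id}"       # step 3: PATCH {"state": ...}
    return urls


def session_state_body(state: str) -> dict:
    """PATCH body for step 3; rejects states other than the three listed above."""
    if state not in SESSION_STATES:
        raise ValueError(f"state must be one of {sorted(SESSION_STATES)}, got {state!r}")
    return {"state": state}


urls = session_endpoints("https://example.com/ddms/v3/welllogs", "rec-1", "sess-9")
print(urls["patch"], session_state_body("commit"))
# https://example.com/ddms/v3/welllogs/rec-1/sessions/sess-9 {'state': 'commit'}
```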

Flow to send JSON

Open a new session > Send JSON chunks > Commit the session

Session mode: update or overwrite

A session can be created with two different modes:

  • update: when the session is committed, the existing data of the previous WellLog version are merged with the data sent during the session.
  • overwrite: the existing data of the previous WellLog version are ignored; after the commit, the new version only contains the data sent during the session. The previous data can then only be retrieved by querying the previous WellLog version.
SESSION_MODE = 'update' # 'update' | 'overwrite'
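The difference between the two modes can be illustrated with a simplified local model, not the service implementation. Bulk data are modelled as {column: {row_index: value}}, and the sketch assumes that in update mode a chunk value replaces the existing value at the same column and row, as the outputs of this tutorial suggest. The real service returns a table in which missing cells are NaN; the model simply leaves them out. commit_session is a hypothetical name.

```python
def commit_session(previous: dict, chunks: list, mode: str) -> dict:
    """Simplified model of the data produced when a session is committed."""
    if mode not in ("update", "overwrite"):
        raise ValueError(f"unknown session mode {mode!r}")
    # update starts from a copy of the previous version, overwrite from nothing
    result = {} if mode == "overwrite" else {c: dict(v) for c, v in previous.items()}
    for chunk in chunks:
        for column, values in chunk.items():
            result.setdefault(column, {}).update(values)
    return result


previous = {"COLUMN_MD": {0: 1.0, 1: 2.0}, "COLUMN_X": {0: 10, 1: 20}}
chunk = {"COLUMN_MD": {2: 3.0}, "COLUMN_X": {2: 30}}

print(commit_session(previous, [chunk], "update"))
# {'COLUMN_MD': {0: 1.0, 1: 2.0, 2: 3.0}, 'COLUMN_X': {0: 10, 1: 20, 2: 30}}
print(commit_session(previous, [chunk], "overwrite"))
# {'COLUMN_MD': {2: 3.0}, 'COLUMN_X': {2: 30}}
```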

Add data by rows

In the sample scripts below, the WellLog data are ingested in chunks of rows. Within the same session, WellLog data can be sent in both JSON and Parquet formats; the scripts below show a JSON flow followed by a Parquet flow:

# Create a session
create_session_response = client.post(f'{welllog_dms_url}/{record_id}/sessions', json={'mode': SESSION_MODE})

print_response(create_session_response)
session_data = create_session_response.json()
session_id = session_data['id']
print(f"Session created: {session_data['state']} with id {session_id}\n")
                               
# append first chunk - JSON
chunk_1 = generate_df(['COLUMN_MD', 'COLUMN_X'], range(5,10))
response_chunk_1 = client.post(f'{welllog_dms_url}/{record_id}/sessions/{session_id}/data', json=chunk_1.to_dict(orient='split'))
print_response(response_chunk_1)

# append second chunk - JSON
chunk_2 = generate_df(['COLUMN_MD', 'COLUMN_X'], range(10,15))
response_chunk_2 = client.post(f'{welllog_dms_url}/{record_id}/sessions/{session_id}/data', json=chunk_2.to_dict(orient='split'))
print_response(response_chunk_2)

Once all the WellLog data have been sent through the session, commit the session with a session PATCH API call whose 'state' attribute is set to 'commit'.

# Commit session
commit_session_response = client.patch(f'{welllog_dms_url}/{record_id}/sessions/{session_id}', json={'state': 'commit'})

print_response(commit_session_response)
session = commit_session_response.json()
print('Session after commit =', session['state'])

Alternatively, the session can be abandoned by calling the session PATCH API with the 'state' attribute set to 'abandon'.

# OR else, ABANDON session
abandon_session_response = client.patch(f'{welllog_dms_url}/{record_id}/sessions/{session_id}', json={'state': 'abandon'})
print_response(abandon_session_response)
if abandon_session_response.status_code == httpx.codes.OK:
    print('Session after abandon =', abandon_session_response.json()['state'])
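Because an open session should always end in either commit or abandon, the two PATCH calls can be wrapped in a context manager. The sketch below is a hypothetical helper, welllog_session, built with contextlib: it commits when the block finishes normally and abandons when the block raises. It only assumes a client with httpx-style post()/patch() methods, such as the httpx.Client created in the Settings section.

```python
from contextlib import contextmanager


@contextmanager
def welllog_session(client, welllog_dms_url, record_id, mode="update"):
    """Hypothetical helper: open a session, commit on success, abandon on error.

    Yields the URL to which data chunks are POSTed.
    """
    base = f"{welllog_dms_url}/{record_id}/sessions"
    response = client.post(base, json={"mode": mode})
    response.raise_for_status()
    session_id = response.json()["id"]
    try:
        yield f"{base}/{session_id}/data"
    except BaseException:
        # Something failed while sending chunks: abandon, then re-raise
        client.patch(f"{base}/{session_id}", json={"state": "abandon"})
        raise
    else:
        client.patch(f"{base}/{session_id}", json={"state": "commit"}).raise_for_status()
```

For example, `with welllog_session(client, welllog_dms_url, record_id, mode=SESSION_MODE) as data_url:` followed by `client.post(data_url, json=chunk_1.to_dict(orient='split'))` sends one chunk and commits the session when the block exits.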

Flow to send Parquet

Open a new session > Send Parquet chunks > Commit the session

SESSION_MODE = 'update'
# Create a session to send Parquet chunks
create_session_response = client.post(f'{welllog_dms_url}/{record_id}/sessions', json={'mode': SESSION_MODE})

print_response(create_session_response)
session_data = create_session_response.json()
session_id = session_data['id']
print(f"Session created: {session_data['state']} with id {session_id}\n")
headers = {'content-type': 'application/x-parquet'}
# append first chunk - PARQUET
chunk_3 = generate_df(['COLUMN_MD', 'COLUMN_X'], range(15,20))
response_chunk_3 = client.post(f'{welllog_dms_url}/{record_id}/sessions/{session_id}/data', content=chunk_3.to_parquet(engine="pyarrow"), headers=headers)
print_response(response_chunk_3)
# append second chunk - PARQUET
chunk_4 = generate_df(['COLUMN_MD', 'COLUMN_X'], range(20,25))
response_chunk_4 = client.post(f'{welllog_dms_url}/{record_id}/sessions/{session_id}/data', content=chunk_4.to_parquet(engine="pyarrow"), headers=headers)
print_response(response_chunk_4)
# commit session for Parquet
print_response(client.patch(f'{welllog_dms_url}/{record_id}/sessions/{session_id}', json={'state': 'commit'}))

The code below displays the WellLog data before the sessions, the row chunks that were sent, and the final WellLog data version after the sessions have been committed.

# Display result
results_response = client.get(f'{welllog_dms_url}/{record_id}/data')
results_cols_md_x = create_df_from_response(results_response)
# Before the sessions, the latest data are the JSON payload posted in "All at once - JSON"
display_operation(pd.DataFrame(**data_to_send_json), [chunk_1, chunk_2, chunk_3, chunk_4], results_cols_md_x)
Initial data - Before session
COLUMN_MDCOLUMN_X
0957190
1649907
2598697
33968
457297
chunk 1 sent
COLUMN_MDCOLUMN_X
546295
6275946
7-79965
81745
9848344
chunk 2 sent
COLUMN_MDCOLUMN_X
10252929
11390629
12449986
13-34400
14607272
chunk 3 sent
COLUMN_MDCOLUMN_X
15390915
16-73368
17277-21
18543-78
1975494
chunk 4 sent
COLUMN_MDCOLUMN_X
20-8227
21431933
22318465
23-3593
24256130
Final data - After session commit
COLUMN_MDCOLUMN_X
0265845
192246
2804268
3645877
4-20-28
546295
6275946
7-79965
81745
9848344
10252929
11390629
12449986
13-34400
14607272
15390915
16-73368
17277-21
18543-78
1975494
20-8227
21431933
22318465
23-3593
24256130

You can list all the versions created for a given WellLog id (GET /ddms/v3/welllogs/{record_id}/versions) and then read the WellLog data of a given version (GET /ddms/v3/welllogs/{record_id}/versions/{version}/data). The function called below does exactly this: it reads the WellLog data of the previous and current versions and highlights the differences between them. Comparing the two versions clearly shows the difference between sending WellLog data in a session in update mode and in overwrite mode.
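The version bookkeeping used by that function can be isolated in two small helpers. This sketch assumes, as display_previous_and_current_well_log_data_versions does, that the latest version is the last entry of the versions list; previous_and_current and version_data_url are hypothetical names, and the version numbers come from the versioning example later in this tutorial.

```python
def previous_and_current(versions):
    """Return (previous, current) version ids; previous is None for a single version."""
    if not versions:
        raise ValueError("the record has no versions")
    previous = versions[-2] if len(versions) >= 2 else None
    return previous, versions[-1]


def version_data_url(welllog_dms_url: str, record_id: str, version) -> str:
    """URL of the bulk data of one record version."""
    return f"{welllog_dms_url}/{record_id}/versions/{version}/data"


print(previous_and_current([1627640423310341, 1627640424041113]))
# (1627640423310341, 1627640424041113)
```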

display_previous_and_current_well_log_data_versions(record_id)
Initial data - Before session
COLUMN_MDCOLUMN_X
0265845
192246
2804268
3645877
4-20-28
5-29832
6-15107
7339212
8823240
9-97349
1018389
11194276
12-7-7
13446829
1432706
15914740
16593279
17304-57
18697145
19775247
chunk 1 sent
COLUMN_Y
5192
6816
761
8658
9104
10704
11681
12393
13329
14402
15418
16-9
17857
18845
1978
20484
21384
22658
23622
24459
chunk 2 sent
COLUMN_Z
10141
11478
1272
13476
14434
Final data - After session commit
COLUMN_MDCOLUMN_XCOLUMN_YCOLUMN_Z
0265.000000845.000000nannan
192.000000246.000000nannan
2804.000000268.000000nannan
3645.000000877.000000nannan
4-20.000000-28.000000nannan
5-29.000000832.000000192.000000nan
6-15.000000107.000000816.000000nan
7339.000000212.00000061.000000nan
8823.000000240.000000658.000000nan
9-97.000000349.000000104.000000nan
10183.00000089.000000704.000000141.000000
11194.000000276.000000681.000000478.000000
12-7.000000-7.000000393.00000072.000000
13446.000000829.000000329.000000476.000000
1432.000000706.000000402.000000434.000000
15914.000000740.000000418.000000nan
16593.000000279.000000-9.000000nan
17304.000000-57.000000857.000000nan
18697.000000145.000000845.000000nan
19775.000000247.00000078.000000nan
20nannan484.000000nan
21nannan384.000000nan
22nannan658.000000nan
23nannan622.000000nan
24nannan459.000000nan

The call below compares the current WellLog data version, in which new columns were added by chunk, with the previous version of the WellLog data.

display_previous_and_current_well_log_data_versions(record_id)
Previous WellLog data version
COLUMN_MDCOLUMN_XCOLUMN_YCOLUMN_Z
0265.000000845.000000nannan
192.000000246.000000nannan
2804.000000268.000000nannan
3645.000000877.000000nannan
4-20.000000-28.000000nannan
5-29.000000832.000000192.000000nan
6-15.000000107.000000816.000000nan
7339.000000212.00000061.000000nan
8823.000000240.000000658.000000nan
9-97.000000349.000000104.000000nan
10183.00000089.000000704.000000141.000000
11194.000000276.000000681.000000478.000000
12-7.000000-7.000000393.00000072.000000
13446.000000829.000000329.000000476.000000
1432.000000706.000000402.000000434.000000
15914.000000740.000000418.000000nan
16593.000000279.000000-9.000000nan
17304.000000-57.000000857.000000nan
18697.000000145.000000845.000000nan
19775.000000247.00000078.000000nan
20nannan484.000000nan
21nannan384.000000nan
22nannan658.000000nan
23nannan622.000000nan
24nannan459.000000nan
Current WellLog data version with data chunks added in red
COLUMN_MDCOLUMN_XCOLUMN_YCOLUMN_Z
0614964108.000000nan
1887155979.000000nan
2865179533.000000nan
3343167235.000000nan
4212100497.000000nan
5-52-98608.000000nan
6738573781.000000nan
7151138646.000000nan
8-21378157.000000nan
9178266895.000000nan
10172596705.000000141.000000
11521618873.000000478.000000
12592832298.00000072.000000
13560831-82.000000476.000000
14926179484.000000434.000000
15901486446.000000nan
16610472456.000000nan
17587325776.000000nan
18463653208.000000nan
199923236.000000nan
20138460795.000000nan
21715362760.000000nan
22590-91160.000000nan
23642-18667.000000nan
2467954447.000000nan

Add array data by chunk to a WellLog

As a prerequisite, a new WellLog record is created below to store array data. It is created with a COLUMN_MD column storing the reference values and a COLUMN_X column storing single WellLog values.

# Create new record for 2D curves
record_2d_response = client.post(welllog_dms_url, json=[record])
print_response(record_2d_response)
record_2d_id = record_2d_response.json()["recordIds"][0]
print(f"2D record created '{record_2d_id}'")

initial_df = generate_df(['COLUMN_MD', 'COLUMN_X'], range(10))
headers = { 'content-type': 'application/x-parquet'}
print_response(client.post(f'{welllog_dms_url}/{record_2d_id}/data', content=initial_df.to_parquet(engine="pyarrow"), headers=headers))

By convention, array data are added to the WellLog record through a pandas DataFrame whose column names contain the array name followed by the column number in square brackets, for example 2D[0] and 2D[1]. The orient value has to be set to columns.
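The naming convention can be handled with a regular expression. The sketch below builds the column names for an array of a given width and groups a list of column names back into array curves and single-value curves; both helpers and the regular expression are illustrative assumptions based on the "name[number]" convention described above.

```python
import re

# "<array name>[<column number>]", for example "2D[0]"
ARRAY_COLUMN = re.compile(r"^(?P<name>.+)\[(?P<idx>\d+)\]$")


def array_column_names(name: str, width: int) -> list:
    """Column names for an array curve with `width` columns."""
    return [f"{name}[{i}]" for i in range(width)]


def group_array_columns(columns):
    """Split column names into array curves {name: [numbers]} and single-value curves."""
    arrays, scalars = {}, []
    for col in columns:
        m = ARRAY_COLUMN.match(col)
        if m:
            arrays.setdefault(m["name"], []).append(int(m["idx"]))
        else:
            scalars.append(col)
    return arrays, scalars


print(array_column_names("2D", 2))  # ['2D[0]', '2D[1]']
print(group_array_columns(["2D[0]", "2D[1]", "COLUMN_MD", "COLUMN_X"]))
# ({'2D': [0, 1]}, ['COLUMN_MD', 'COLUMN_X'])
```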

# Create a session
create_2d_session_response = client.post(f'{welllog_dms_url}/{record_2d_id}/sessions', json={'mode': 'update'})

print_response(create_2d_session_response)
session_id_2d = create_2d_session_response.json()['id']

# Send chunk data for 2D
arr_data_dataframe = generate_df(['2D[0]', '2D[1]'], range(15))

print_response(client.post(f'{welllog_dms_url}/{record_2d_id}/sessions/{session_id_2d}/data',
                           params={"orient": 'columns'},
                           headers={ 'content-type': 'application/json'},
                           content=arr_data_dataframe.to_json(orient='columns')))

# Commit session
print_response(client.patch(f'{welllog_dms_url}/{record_2d_id}/sessions/{session_id_2d}', json={'state': 'commit'}))

The script below displays the initial WellLog data before the session, the array data that were sent, and the final WellLog data version after the session has been committed.

# Display result
bulk_2d_data_response = client.get(f'{welllog_dms_url}/{record_2d_id}/data')
bulk_2d_data = create_df_from_response(bulk_2d_data_response)
display_operation(initial_df, [arr_data_dataframe], bulk_2d_data)
Previous WellLog data version
COLUMN_MDCOLUMN_X
0752700
1-36241
2883107
3177159
4156801
5277597
6-1202
7-21669
8334291
9771-56
Current WellLog data version with data chunks added in red
2D[0]2D[1]COLUMN_MDCOLUMN_X
0676702752.000000700.000000
1983588-36.000000241.000000
2948422883.000000107.000000
3272-59177.000000159.000000
4986869156.000000801.000000
5563131277.000000597.000000
670331-1.000000202.000000
7375538-21.000000669.000000
8244416334.000000291.000000
9761580771.000000-56.000000
10825222nannan
11174644nannan
12871857nannan
13880780nannan
14783883nannan

Update existing WellLog data by chunk

This section explains how to replace the values of specific curves over a specific range of rows for a given WellLog record id. First, the sample script below creates a new WellLog record and posts some bulk data to it as a JSON DataFrame.
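A chunk that targets specific curves over a specific range only needs the columns to replace and the row index of each value, for example COLUMN_Z on rows 3 and 4 in the output below. The hypothetical split_chunk helper sketches how such a chunk can be built in the split orientation from plain Python lists; the values are placeholders. Such a chunk would then be POSTed to the data endpoint of a session opened in update mode, as in the previous sections.

```python
def split_chunk(curves: dict, start: int) -> dict:
    """Hypothetical helper: 'split' chunk carrying only `curves`, for rows start, start+1, ...

    curves maps a curve name to its list of values; all lists must have the same length.
    """
    lengths = {len(values) for values in curves.values()}
    if len(lengths) != 1:
        raise ValueError("a chunk needs at least one curve, and all curves must have the same length")
    (n,) = lengths
    columns = list(curves)
    return {
        "columns": columns,
        "index": list(range(start, start + n)),
        "data": [[curves[c][i] for c in columns] for i in range(n)],
    }


# Replace COLUMN_Z on rows 3 and 4 only (placeholder values)
print(split_chunk({"COLUMN_Z": [1, 2]}, start=3))
# {'columns': ['COLUMN_Z'], 'index': [3, 4], 'data': [[1], [2]]}
```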

# Create new record
response = client.post(welllog_dms_url, json=[record])
print_response(response)
record_id = response.json()["recordIds"][0]
record_id

# Add first bulk data to the record
df_cols_md_x_y_z = generate_df(['COLUMN_MD', 'COLUMN_X', 'COLUMN_Y', 'COLUMN_Z'], range(5))
print_response(client.post(f'{welllog_dms_url}/{record_id}/data', json=df_cols_md_x_y_z.to_dict(orient='split')))

check_data_response = client.get(f'{welllog_dms_url}/{record_id}/data')
print_response(check_data_response)
df_cols_md_x_y_z = create_df_from_response(check_data_response)
df_cols_md_x_y_z
Initial data - Before session
COLUMN_MDCOLUMN_XCOLUMN_YCOLUMN_Z
0-15-21283768
1643659-3437
2674988739530
3-40244311171
4989989710541
chunk 1 sent
COLUMN_MDCOLUMN_Y
0-91877
1-28336
2971648
3458-50
456989
chunk 2 sent
COLUMN_Z
3964
4991
chunk 3 sent
COLUMN_X
5587
6818
7768
Final data - After session commit
COLUMN_MDCOLUMN_XCOLUMN_YCOLUMN_Z
0-91.000000-21877.000000768.000000
1-28.000000659336.000000437.000000
2971.000000988648.000000530.000000
3458.000000244-50.000000964.000000
4569.00000098989.000000991.000000
5nan587nannan
6nan818nannan
7nan768nannan

WellLog record versioning

Each time the WellLog record metadata or its associated bulk data are updated, a new version of the WellLog record is created. As a consequence, the first version of a WellLog record never has bulk data associated with it, as the script below demonstrates:

# creating a new record
response = client.post(welllog_dms_url, json=[record])
print_response(response)
record_id = response.json()["recordIds"][0]
record_id

# posting bulk data to the WellLog record
initial_df = generate_df(['COLUMN_MD', 'COLUMN_X'], range(10))
headers = { 'content-type': 'application/x-parquet'}
print_response(client.post(f'{welllog_dms_url}/{record_id}/data', content=initial_df.to_parquet(engine="pyarrow"), headers=headers))

# checking for versions = 2 versions of the WellLog record with only the last one with associated bulk data
results_response = client.get(f'{welllog_dms_url}/{record_id}/versions')
wellLog_versions_response = results_response.json()
versions = wellLog_versions_response['versions']
for index, version in enumerate(versions):
    print(f'{index}. version number: {version}')
    version_data_response = client.get(f'{welllog_dms_url}/{record_id}/versions/{version}/data')
    #print_response(version_data_response)
    if version_data_response.status_code == 200:
        version_df = create_df_from_response(version_data_response)
        version_df_st = version_df.style.set_table_attributes("style='margin-left:65px'").highlight_null(color='lightyellow').set_caption(f'WellLog data version {version}')
        display(multi_table([version_df_st]))
    else:
        print(f'\tNo bulk data associated to version {version}')
0. version number: 1627640423310341
    No bulk data associated to version 1627640423310341
1. version number: 1627640424041113
WellLog data version 1627640424041113
COLUMN_MDCOLUMN_X
0265970
1643-22
2-87926
3710432
4977225
5997880
6997806
73380
8517650
9514792

Write bulk data from a given WellLog record version

Through the Wellbore DDMS API, bulk data can be written starting from a given version of the WellLog record. The example below creates a WellLog record with two different versions of the bulk data:

  1. The first version contains only column X.
  2. The second version contains columns X and Y.

If column Z is then written starting from the first version, only columns X and Z remain in the final version of the WellLog bulk data.
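The rule above can be reproduced with a simplified local model (not the service implementation). Each bulk version is modelled as {column: values}; a session created with fromVersion starts from that version instead of the latest one, merges the new columns, and appends the result as the new latest version. write_from_version is a hypothetical name. In the script below, versions[1] is used as fromVersion because versions[0] is the record creation, which has no bulk data.

```python
def write_from_version(history: list, from_index: int, new_columns: dict) -> list:
    """Return a new history whose latest version is history[from_index] plus new_columns."""
    base = dict(history[from_index])   # start from the chosen version, not the latest
    base.update(new_columns)
    return history + [base]            # earlier versions stay untouched


v1 = {"COLUMN_MD": [1, 2], "COLUMN_X": [10, 20]}
v2 = {**v1, "COLUMN_Y": [5, 6]}
history = write_from_version([v1, v2], from_index=0, new_columns={"COLUMN_Z": [7, 8]})
print(sorted(history[-1]))  # ['COLUMN_MD', 'COLUMN_X', 'COLUMN_Z']
```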

# creating a new record
response = client.post(welllog_dms_url, json=[record])
print_response(response)
record_id = response.json()["recordIds"][0]
record_id

# sending data for column X
generated_A_dataframe = generate_df(['COLUMN_MD','COLUMN_X'], range(10))
headers = { 'content-type': 'application/x-parquet'}
print_response(client.post(f'{welllog_dms_url}/{record_id}/data', content=generated_A_dataframe.to_parquet(engine="pyarrow"), headers=headers))


SESSION_MODE = 'update' # 'update' | 'overwrite'

# adding column Y to the WellLog by chunk through a session
create_session_response = client.post(f'{welllog_dms_url}/{record_id}/sessions', json={'mode': SESSION_MODE})
print_response(create_session_response)
session_id = create_session_response.json()['id']

generated_B_dataframe = generate_df(['COLUMN_Y'], range(10))
print_response(client.post(f'{welllog_dms_url}/{record_id}/sessions/{session_id}/data', json=generated_B_dataframe.to_dict(orient='split')))

# Commit session
print_response(client.patch(f'{welllog_dms_url}/{record_id}/sessions/{session_id}', json={'state': 'commit'}))

results_response = client.get(f'{welllog_dms_url}/{record_id}/versions')
wellLog_versions_response = results_response.json()
version = wellLog_versions_response['versions'][1]  # versions[0] is the record creation, which has no bulk data

# Create a session from the previous version, which contains only column X
session_json = {
    'mode': SESSION_MODE,
    'fromVersion': version
}
create_session_response = client.post(f'{welllog_dms_url}/{record_id}/sessions', json=session_json)
print_response(create_session_response)
session_id = create_session_response.json()['id']


# adding column Z to the WellLog by chunk through a session, starting from the previous version
generated_C_dataframe = generate_df(['COLUMN_Z'], range(10))
print_response(client.post(f'{welllog_dms_url}/{record_id}/sessions/{session_id}/data', json=generated_C_dataframe.to_dict(orient='split')))


# Commit session
print_response(client.patch(f'{welllog_dms_url}/{record_id}/sessions/{session_id}', json={'state': 'commit'}))


# Display result
results_response = client.get(f'{welllog_dms_url}/{record_id}/versions')
wellLog_versions_response = results_response.json()
versions = wellLog_versions_response['versions']
titles = []
dataframes = []
for index, version in enumerate(versions):
    version_data_response = client.get(f'{welllog_dms_url}/{record_id}/versions/{version}/data')
    if version_data_response.status_code == 200:
        if index == 3:
            titles.append(f'{index}. version number {version} created from version {versions[1]}')
        else:
            titles.append(f'{index}. version number {version}')
        version_df = create_df_from_response(version_data_response)
        dataframes.append(version_df)
        

display_side_by_side(dataframes, titles)
1. version number 1627640429377696
COLUMN_MDCOLUMN_X
034518
1845863
2290-62
3947698
4562825
579450
6809153
753450
8121793
9352-97
2. version number 1627640431304081
COLUMN_MDCOLUMN_XCOLUMN_Y
034518750
1845863499
2290-62114
3947698637
4562825368
579450219
680915346
753450628
8121793267
9352-97990
3. version number 1627640433479387 created from version 1627640429377696
COLUMN_MDCOLUMN_XCOLUMN_Z
034518-31
1845863431
2290-62322
39476985
4562825-53
579450949
6809153-47
753450195
8121793291
9352-97-95

List sessions for a record id

The Wellbore DDMS provides an API that lists the sessions used to write data for a given WellLog record id. For each session, the response includes information such as the version from which the WellLog data were written (fromVersion).

sessions_response = client.get(f'{welllog_dms_url}/{record_id}/sessions')
sessions_response.json()

[{'id': '23854a8c-9051-48c2-b3f0-2a3c632f85fc', 'recordId': 'data-partition-id:work-product-component--WellLog:30f8f5173cc444cca28582ee7814cc0d', 'fromVersion': 1627640429377696, 'mode': 'update', 'expiry': '2021-07-31T10:20:32.187305', 'createdTime': '2021-07-30T10:20:32.187305', 'updatedTime': '2021-07-30T10:20:34.001277', 'state': 'committed', 'meta': None}, {'id': 'd28ad3ff-30e2-40e1-ac96-a4efedd6b15e', 'recordId': 'data-partition-id:work-product-component--WellLog:30f8f5173cc444cca28582ee7814cc0d', 'fromVersion': 1627640429377696, 'mode': 'update', 'expiry': '2021-07-31T10:20:29.915170', 'createdTime': '2021-07-30T10:20:29.915170', 'updatedTime': '2021-07-30T10:20:31.832840', 'state': 'committed', 'meta': None}]
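The session list can be summarized locally, for example to see which version each committed session started from. The sketch below copies a subset of the fields from the sample response above and orders the committed sessions by creation time; sessions_by_creation is a hypothetical helper, and with a live service you would pass sessions_response.json() instead of the copied list.

```python
from datetime import datetime

# Subset of the fields of the sample GET .../sessions response shown above
sessions = [
    {"id": "23854a8c-9051-48c2-b3f0-2a3c632f85fc", "fromVersion": 1627640429377696,
     "mode": "update", "createdTime": "2021-07-30T10:20:32.187305", "state": "committed"},
    {"id": "d28ad3ff-30e2-40e1-ac96-a4efedd6b15e", "fromVersion": 1627640429377696,
     "mode": "update", "createdTime": "2021-07-30T10:20:29.915170", "state": "committed"},
]


def sessions_by_creation(sessions, state="committed"):
    """(createdTime, id, mode, fromVersion) for sessions in `state`, oldest first."""
    selected = [s for s in sessions if s.get("state") == state]
    selected.sort(key=lambda s: datetime.fromisoformat(s["createdTime"]))
    return [(s["createdTime"], s["id"], s["mode"], s["fromVersion"]) for s in selected]


for row in sessions_by_creation(sessions):
    print(row)
```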

Read bulk data

As for writing, you can specify the format returned when reading WellLog bulk data. This is done through the Accept header passed to the HTTP GET request.

headers = {
    'Accept': 'application/parquet' # 'application/parquet' | 'application/json'
}

Read all data at once

The whole WellLog bulk data can be read in one API call as follows:

response = client.get(f'{welllog_dms_url}/{record_id}/data', headers=headers)
print_response(response)
create_df_from_response(response)
   COLUMN_MD  COLUMN_X  COLUMN_Z
0        345        18       -31
1        845       863       431
2        290       -62       322
3        947       698         5
4        562       825       -53
5         79       450       949
6        809       153       -47
7         53       450       195
8        121       793       291
9        352       -97       -95

Read single curves from the bulk

The GET WellLog data API lets you pass the list of curves (WellLog data column names) to be returned in the response, as follows:

response = client.get(f'{welllog_dms_url}/{record_id}/data', params={'curves': 'COLUMN_MD,COLUMN_Z'}, headers=headers)
print_response(response)
create_df_from_response(response)
   COLUMN_MD  COLUMN_Z
0        345       -31
1        845       431
2        290       322
3        947         5
4        562       -53
5         79       949
6        809       -47
7         53       195
8        121       291
9        352       -95

Read array columns from the bulk

For array data, you can pass to the GET WellLog data API the array name followed by the column index in square brackets (for example 2D[0]) to specify which array columns are returned in the response:

response = client.get(f'{welllog_dms_url}/{record_2d_id}/data', params={'curves': '2D[0],2D[1]'}, headers=headers)
print_response(response)
create_df_from_response(response)
2D[0]2D[1]
0676702
1983588
2948422
3272-59
4986869
5563131
670331
7375538
8244416
9761580
10825222
11174644
12871857
13880780
14783883
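Plain curve names and array columns both end up in a single comma-separated curves query parameter. A small sketch of assembling that string; build_curves_param and the (array_name, indices) tuple convention are hypothetical, chosen only for illustration:

```python
from typing import Iterable, Sequence, Tuple, Union

# A curve is either a plain column name or (array_name, column_indices).
CurveSpec = Union[str, Tuple[str, Sequence[int]]]


def build_curves_param(curves: Iterable[CurveSpec]) -> str:
    """Build the value of the `curves` query parameter."""
    names = []
    for curve in curves:
        if isinstance(curve, str):
            names.append(curve)
        else:
            array_name, indices = curve
            # Array columns are addressed as NAME[index], e.g. 2D[0].
            names.extend(f'{array_name}[{i}]' for i in indices)
    return ','.join(names)
```

For the request above, params={'curves': build_curves_param([('2D', [0, 1])])} produces the same '2D[0],2D[1]' value.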

Additional filtering options to read bulk data

Additional filtering options are available when reading WellLog bulk data:

  • offset: the index of the first row to read from the WellLog bulk data.
  • limit: the maximum number of rows to be returned.
response = client.get(f'{welllog_dms_url}/{record_id}/data', 
                      params={'limit': 4, 'offset': 4, 'curves': 'COLUMN_MD,COLUMN_Z'}, 
                      headers=headers)

print_response(response)
create_df_from_response(response)
   COLUMN_MD  COLUMN_Z
4        562       -53
5         79       949
6        809       -47
7         53       195
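Combining offset and limit makes it possible to read a long log in fixed-size pages. A minimal sketch, assuming a hypothetical fetch_page(offset, limit) callable that wraps the GET request and returns a DataFrame, and assuming a page shorter than the requested limit marks the end of the data:

```python
from typing import Callable, Iterator

import pandas as pd


def iter_pages(fetch_page: Callable[[int, int], pd.DataFrame],
               page_size: int = 1000) -> Iterator[pd.DataFrame]:
    """Yield successive pages of bulk data using offset/limit paging."""
    offset = 0
    while True:
        page = fetch_page(offset, page_size)
        if len(page) > 0:
            yield page
        # A short (or empty) page means there is nothing left to read.
        if len(page) < page_size:
            break
        offset += page_size
```

With the tutorial's client, fetch_page could call client.get(f'{welllog_dms_url}/{record_id}/data', params={'offset': offset, 'limit': limit}, headers=headers) and return create_df_from_response(response); pd.concat(iter_pages(fetch_page)) then rebuilds the full log.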

WellLog consistency rules

WellLog entity: metadata-only (record) consistency

Rules

See the WellLog schema.

  • rule 1: Each CurveID listed in data.Curves.CurveID must be unique.

  • rule 2: data.ReferenceCurveID must exist in the data.Curves.CurveID list.

Example

WellLog record:

{
  "id": "...",
  "data": {
    "ReferenceCurveID": "MD",
    "SamplingStart": 7627.0,
    "SamplingStop": 7627.6,
    "Curves": [
      {
        "CurveID": "CSHG",
        "Mnemonic": "CSHG",
        "LogCurveFamilyID": "data-partition-id:reference-data--LogCurveFamily:Core%20Mercury%20Saturation:",
        "NumberOfColumns": 4
      },
      {
        "CurveID": "MD",
        "CurveUnit": "data-partition-id:reference-data--UnitOfMeasure:ft:",
        "Mnemonic": "MD",
        "LogCurveFamilyID": "data-partition-id:reference-data--LogCurveFamily:Measured%20Depth:",
        "NumberOfColumns": 1
      }
    ]
  }
}
  • rule 1: Each Curves.CurveID is unique: here MD and CSHG.

  • rule 2: ReferenceCurveID is set to MD, and MD exists in the Curves.CurveID list.
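Rules 1 and 2 only look at the record, so they can be checked client-side before any API call. A minimal sketch; check_welllog_record is a hypothetical helper, and skipping rule 2 when ReferenceCurveID is absent is an assumption of this sketch:

```python
from collections import Counter
from typing import List


def check_welllog_record(record: dict) -> List[str]:
    """Return rule 1/2 violations for a WellLog record (empty list when consistent)."""
    data = record.get('data', {})
    curve_ids = [curve.get('CurveID') for curve in data.get('Curves', [])]
    errors = []
    # Rule 1: every data.Curves[].CurveID must be unique.
    duplicates = sorted((cid for cid, n in Counter(curve_ids).items() if n > 1), key=str)
    if duplicates:
        errors.append(f'rule 1: duplicated CurveID(s) {duplicates}')
    # Rule 2: data.ReferenceCurveID must appear among the CurveIDs.
    # Assumption: a record without ReferenceCurveID is not checked here.
    reference = data.get('ReferenceCurveID')
    if reference is not None and reference not in curve_ids:
        errors.append(f'rule 2: ReferenceCurveID {reference!r} not found in data.Curves')
    return errors
```

Applied to the example record above, it returns an empty list.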

WellLog entity: metadata (record) & bulk data consistency

A WellLog record can exist without bulk data.

Rules

When bulk data are added or edited, the following checks are done:

  • rule 3: The Curves.CurveID values listed in the record must match the column names in the bulk data.

  • rule 4: For each curve, NumberOfColumns must match the number of bulk columns for that curve.

Example

WellLog bulk data:

MD      CSHG[0]  CSHG[1]  CSHG[2]  CSHG[3]
7627.0  0.573    0.573    0.573    0.573
7627.1  0.531    0.531    0.531    0.531
7627.2  0.653    0.653    0.653    0.653
7627.3  0.788    0.788    0.788    0.788
7627.4  0.034    0.034    0.034    0.034
7627.5  0.035    0.035    0.035    0.035
7627.6  0.607    0.607    0.607    0.607

Using the WellLog record from the previous section:

  • rule 3: The Curves.CurveID list, MD and CSHG, matches the column names in the bulk. Here CSHG is an array with 4 columns: CSHG[0], CSHG[1], CSHG[2], CSHG[3].

  • rule 4: MD.NumberOfColumns matches the column count in the bulk ==> 1. CSHG.NumberOfColumns matches the column count in the bulk ==> 4 (CSHG[0], CSHG[1], CSHG[2], CSHG[3]).

Additional rules when the reference is of type "Measured Depth"

The following rules are only applied if the reference is of type "Measured Depth".

Rules

  • rule 5: The bulk values of the curve referenced by ReferenceCurveID must be monotonic.

  • rule 6: The top and bottom bulk values of the curve referenced by ReferenceCurveID should match data.SamplingStart and data.SamplingStop in the record, respectively.

Example

From the previous record and bulk data:

Record:

{
  "id": "...",
  "data": {
    "ReferenceCurveID": "MD",
    "SamplingStart": 7627.0,
    "SamplingStop": 7627.6,
    ...
  }
}

Bulk:

MD      ...
7627.0  ...
7627.1  ...
7627.2  ...
7627.3  ...
7627.4  ...
7627.5  ...
7627.6  ...
  • rule 5: The values of the reference curve, MD, are monotonic: no duplicates, strictly increasing, no missing values.
  • rule 6: data.SamplingStart matches the bulk MD top value ==> 7627.0. data.SamplingStop matches the bulk MD bottom value ==> 7627.6.
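Rules 3 to 6 can be checked locally against a DataFrame before writing it. A minimal sketch under stated assumptions: check_welllog_bulk is hypothetical, a curve without NumberOfColumns is treated as a single column, the caller states whether the reference is of type "Measured Depth", and rule 6 uses exact equality where a real validator might allow a tolerance:

```python
import re
from typing import Dict, List

import pandas as pd


def curve_of(column) -> str:
    """Map a bulk column name to its curve: 'CSHG[2]' -> 'CSHG', 'MD' -> 'MD'."""
    match = re.fullmatch(r'(.+)\[(\d+)\]', str(column))
    return match.group(1) if match else str(column)


def check_welllog_bulk(record: dict, bulk: pd.DataFrame,
                       reference_is_measured_depth: bool = True) -> List[str]:
    """Return rule 3-6 violations for a WellLog record and its bulk DataFrame."""
    data = record['data']
    errors = []
    # Count bulk columns per curve, e.g. CSHG[0]..CSHG[3] -> {'CSHG': 4}.
    counts: Dict[str, int] = {}
    for column in bulk.columns:
        counts[curve_of(column)] = counts.get(curve_of(column), 0) + 1
    # Assumption: a curve without NumberOfColumns has a single column.
    declared = {c['CurveID']: c.get('NumberOfColumns', 1) for c in data['Curves']}
    # Rule 3: the record's CurveIDs and the bulk curves must be the same set.
    if set(declared) != set(counts):
        errors.append(f'rule 3: record curves {sorted(declared)} != bulk curves {sorted(counts)}')
    # Rule 4: NumberOfColumns must equal the bulk column count of each curve.
    for curve_id, n in declared.items():
        if curve_id in counts and counts[curve_id] != n:
            errors.append(f'rule 4: {curve_id} declares {n} column(s), bulk has {counts[curve_id]}')
    # Rules 5 and 6 only apply when the reference is of type "Measured Depth".
    ref_id = data.get('ReferenceCurveID')
    if reference_is_measured_depth and ref_id in bulk.columns and not bulk.empty:
        ref = bulk[ref_id]
        # Rule 5: no missing values and strictly increasing (hence no duplicates).
        if ref.isna().any() or not (ref.diff().iloc[1:] > 0).all():
            errors.append(f'rule 5: {ref_id} values are not strictly increasing')
        # Rule 6: exact comparison of the top/bottom values with the record.
        if ref.iloc[0] != data.get('SamplingStart') or ref.iloc[-1] != data.get('SamplingStop'):
            errors.append(f'rule 6: {ref_id} top/bottom do not match SamplingStart/SamplingStop')
    return errors
```

Renaming the MD column of the example bulk to anything else, such as DEPTH, makes rule 3 fail, because the bulk column names must match the record's CurveIDs.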

WellboreTrajectory consistency rules

Wellbore trajectory entity: metadata-only (record) consistency

Rules

See the Wellbore trajectory schema.

  • rule 1: Each Name listed in data.AvailableTrajectoryStationProperties.Name must be unique.
Example

Wellbore trajectory record:

{
  "id": "...",
  "data": {
    "Name": "Index",
    "WellboreID": "data-partition-id:master-data--Wellbore:71612d776:",
    "TopDepthMeasuredDepth": 0.0,
    "AzimuthReferenceType": "data-partition-id:reference-data--AzimuthReferenceType:truenorth:",
    "BaseDepthMeasuredDepth": 7628.0,
    "AvailableTrajectoryStationProperties": [
      {
        "TrajectoryStationPropertyTypeID": "data-partition-id:reference-data--TrajectoryStationPropertyType:BOREHOLE_AZIMUTH:",
        "StationPropertyUnitID": "data-partition-id:reference-data--UnitOfMeasure:dega:",
        "Name": "BOREHOLE_AZIMUTH"
      },
      {
        "TrajectoryStationPropertyTypeID": "data-partition-id:reference-data--TrajectoryStationPropertyType:BOREHOLE_DEVIATION:",
        "StationPropertyUnitID": "data-partition-id:reference-data--UnitOfMeasure:dega:",
        "Name": "BOREHOLE_DEVIATION"
      },
      {
        "TrajectoryStationPropertyTypeID": "data-partition-id:reference-data--TrajectoryStationPropertyType:MD:",
        "StationPropertyUnitID": "data-partition-id:reference-data--UnitOfMeasure:ft:",
        "Name": "MD"
      }
    ]
  }
}
  • rule 1: Each AvailableTrajectoryStationProperties.Name is unique: here BOREHOLE_AZIMUTH, BOREHOLE_DEVIATION and MD.

Wellbore trajectory entity: metadata (record) & bulk data consistency

A Wellbore trajectory record can exist without bulk data.

Rules

When bulk data are added or edited, the following checks are done:

  • rule 2: The AvailableTrajectoryStationProperties.Name values listed in the record must match the column names in the bulk data.
Example

Wellbore trajectory bulk data:

MDBOREHOLE_AZIMUTHBOREHOLE_DEVIATION
0.0360.5730.573
0.5360.5310.531
1.0360.6530.653
.........
7627.5360.0350.035
7628.0360.6070.607

Using the Wellbore trajectory record from the previous section:

  • rule 2: The AvailableTrajectoryStationProperties.Name values listed in the record match the column names in the bulk: here BOREHOLE_AZIMUTH, BOREHOLE_DEVIATION and MD.
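Trajectory rules 1 and 2 can be checked together from the record and the bulk column names. A minimal sketch; check_trajectory is hypothetical, and comparing the names as sets (ignoring column order) is an assumption of this sketch:

```python
from collections import Counter
from typing import Iterable, List


def check_trajectory(record: dict, bulk_columns: Iterable[str]) -> List[str]:
    """Return rule 1/2 violations for a Wellbore trajectory record and its bulk columns."""
    properties = record['data'].get('AvailableTrajectoryStationProperties', [])
    names = [prop.get('Name') for prop in properties]
    columns = set(bulk_columns)
    errors = []
    # Rule 1: each AvailableTrajectoryStationProperties.Name must be unique.
    duplicates = sorted((n for n, k in Counter(names).items() if k > 1), key=str)
    if duplicates:
        errors.append(f'rule 1: duplicated Name(s) {duplicates}')
    # Rule 2: property names and bulk column names must match (order ignored).
    if set(names) != columns:
        errors.append(f'rule 2: record names {sorted(set(names), key=str)} '
                      f'!= bulk columns {sorted(columns)}')
    return errors
```

With a trajectory DataFrame, pass bulk.columns as the second argument.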

Additional rules in case of TrajectoryStationPropertyType:MD

The following rules are only applied for TrajectoryStationPropertyType:MD.

Rules

  • rule 3: The bulk values associated with the reference (the MD station property) must be monotonic.

  • rule 4: The top and bottom bulk values associated with the reference should match data.TopDepthMeasuredDepth and data.BaseDepthMeasuredDepth in the record, respectively.

Example

From the previous record and bulk data:

Record:

{
  "id": "...",
  "data": {
    "WellboreID": "data-partition-id:master-data--Wellbore:71612d776:",
    "TopDepthMeasuredDepth": 0.0,
    "BaseDepthMeasuredDepth": 7628.0,
    ...
  }
}

Bulk:

MD      ...
0.0     ...
0.5     ...
...     ...
7627.5  ...
7628.0  ...
  • rule 3: The values of MD are monotonic: no duplicates, strictly increasing, no missing values.

  • rule 4: data.TopDepthMeasuredDepth matches bulk MD top value ==> 0.0. data.BaseDepthMeasuredDepth matches bulk MD bottom value ==> 7628.0.
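Rules 3 and 4 only apply when a station property is typed TrajectoryStationPropertyType:MD, so a check first has to find that property. A minimal sketch: md_property_name and check_trajectory_md are hypothetical, recognizing the MD type by the suffix of TrajectoryStationPropertyTypeID is an assumption based on the IDs in the example record, and top/bottom values are compared exactly:

```python
from typing import List, Optional

import pandas as pd


def md_property_name(record: dict) -> Optional[str]:
    """Name of the station property typed TrajectoryStationPropertyType:MD, if any."""
    for prop in record['data'].get('AvailableTrajectoryStationProperties', []):
        type_id = prop.get('TrajectoryStationPropertyTypeID', '')
        # Assumption: IDs look like '...TrajectoryStationPropertyType:MD:'.
        if type_id.rstrip(':').endswith('TrajectoryStationPropertyType:MD'):
            return prop.get('Name')
    return None


def check_trajectory_md(record: dict, bulk: pd.DataFrame) -> List[str]:
    """Return rule 3/4 violations; nothing is checked without an MD property."""
    name = md_property_name(record)
    # A missing MD column is a rule 2 problem, so it is not reported here.
    if name is None or name not in bulk.columns or bulk.empty:
        return []
    data = record['data']
    md = bulk[name]
    errors = []
    # Rule 3: no missing values, no duplicates, strictly increasing.
    if md.isna().any() or not (md.is_monotonic_increasing and md.is_unique):
        errors.append(f'rule 3: {name} values are not strictly increasing')
    # Rule 4: top/bottom values must equal Top/BaseDepthMeasuredDepth.
    if md.iloc[0] != data.get('TopDepthMeasuredDepth'):
        errors.append(f'rule 4: top {name} {md.iloc[0]} != TopDepthMeasuredDepth')
    if md.iloc[-1] != data.get('BaseDepthMeasuredDepth'):
        errors.append(f'rule 4: bottom {name} {md.iloc[-1]} != BaseDepthMeasuredDepth')
    return errors
```

Here is_monotonic_increasing alone would accept repeated depths, which is why it is combined with is_unique to express "strictly increasing".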