
Log statistics

Introduction

Goal: provide statistical information for bulk log data.

Use case 1: data consumers who need specific statistics for a specific log.

Use case 2: data ingestors can use the statistics to verify that bulk data was transferred correctly to Wellbore DMS.

When bulk data is transferred to Wellbore DMS, it can happen that, for various reasons (human or system errors), the data is not transferred correctly, and the problem may not be discovered until the data is actually used. By computing the statistics on the application side and comparing them with the results returned by this service, you can check the transfer quickly.

The service calculates the following statistics for each channel (column) of the log:

  • Count: number of values, excluding missing values (NaN)

  • Min & Max: smallest and largest values in the channel

  • Mean & Std: arithmetic mean and standard deviation (sample standard deviation, normalized by N - 1)

  • Percentiles: 25%, 50% (median) and 75%
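As an illustration of what each statistic means for a single channel, the sketch below computes them with the Python standard library only. It assumes the pandas defaults used by describe(): NaN values are skipped, the standard deviation is the sample one (N - 1), and percentiles are linearly interpolated between sorted values. The names channel_statistics and _percentile are hypothetical and are not part of the service.

```python
import math
import statistics


def _percentile(sorted_data, q):
    """Linear interpolation between the two nearest sorted values (assumed pandas default)."""
    position = (len(sorted_data) - 1) * q
    lower = math.floor(position)
    upper = min(lower + 1, len(sorted_data) - 1)
    fraction = position - lower
    return sorted_data[lower] + (sorted_data[upper] - sorted_data[lower]) * fraction


def channel_statistics(values):
    """Hypothetical helper: describe()-style statistics for one channel.

    Assumes pandas defaults: NaN values are skipped, std is the sample
    standard deviation (N - 1) and percentiles are linearly interpolated.
    """
    # Exclude missing values (NaN) before computing anything.
    data = sorted(float(v) for v in values if not math.isnan(v))
    if not data:
        raise ValueError("channel contains no non-NaN values")
    return {
        "count": float(len(data)),  # pandas reports count as a float
        "mean": statistics.fmean(data),
        # Sample std is undefined for a single value; pandas returns NaN there.
        "std": statistics.stdev(data) if len(data) > 1 else math.nan,
        "min": data[0],
        "25%": _percentile(data, 0.25),
        "50%": _percentile(data, 0.50),
        "75%": _percentile(data, 0.75),
        "max": data[-1],
    }
```

For example, channel_statistics([10, 58, 2]) yields the same values as the col_1 column in the describe example below, up to floating-point rounding.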

Methodology

To calculate the statistics, the service uses the describe method of the pandas library (DataFrame.describe).

Link: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.describe.html

Example:

Input pandas DataFrame:

   Ref  col_1  col_2  col_3
0  1.0     10    NaN     11
1  1.5     58   20.0     21
2  2.0      2   30.0     31

Result of the describe method:

         Ref      col_1     col_2  col_3
count      3          3         2      3
mean     1.5  23.333333        25     21
std      0.5  30.287511  7.071068     10
min        1          2        20     11
25%     1.25          6      22.5     16
50%      1.5         10        25     21
75%     1.75         34      27.5     26
max        2         58        30     31

Descriptive statistics include those that summarize the central tendency, dispersion and shape of a dataset's distribution, excluding NaN values.

For numeric data, the result's index will include count, mean, std, min, max as well as lower, 50 and upper percentiles. By default the lower percentile is 25 and the upper percentile is 75. The 50 percentile is the same as the median.
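The example above can be reproduced with pandas. This sketch rebuilds the sample DataFrame by hand and calls describe() with its default arguments; it illustrates the method and is not the service's own code.

```python
import pandas as pd

# Sample log data from the example; NaN marks a missing value in col_2.
df = pd.DataFrame(
    {
        "Ref": [1.0, 1.5, 2.0],
        "col_1": [10, 58, 2],
        "col_2": [float("nan"), 20.0, 30.0],
        "col_3": [11, 21, 31],
    }
)

# describe() with default arguments: NaN values are skipped, std uses N - 1,
# and the 25%, 50% and 75% percentiles are linearly interpolated.
stats = df.describe()
print(stats)
```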

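The service returns the describe result as JSON, with one object per column (channel) keyed by statistic name. Reference for converting a DataFrame to JSON with to_json: https://www.w3resource.com/pandas/dataframe/dataframe-to_json.php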
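The link above describes DataFrame.to_json. Assuming the service serializes the describe() result with to_json and its default settings (an assumption: the column-keyed layout and the 10-decimal values in the responses below are consistent with it, but this page does not confirm it), the JSON payload can be reproduced like this:

```python
import json

import pandas as pd

df = pd.DataFrame(
    {
        "Ref": [1.0, 1.5, 2.0],
        "col_1": [10, 58, 2],
        "col_2": [float("nan"), 20.0, 30.0],
        "col_3": [11, 21, 31],
    }
)

# Assumption: default to_json settings. For a DataFrame the default orient is
# "columns" ({column -> {index -> value}}) and double_precision is 10.
payload = df.describe().to_json()
result = json.loads(payload)
print(result["col_1"])
```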

Example with Curl:


curl -X GET "http://${host}/osdu/wdms/wellbore/v3/log/${logid}/statistics" -H "accept: application/json" -H "data-partition-id: ${data_partition_id}" -H "Authorization: Bearer $TOKEN"

Response:
{"Ref":{"count":3.0,"mean":1.5,"std":0.5,"min":1.0,"25%":1.25,"50%":1.5,"75%":1.75,"max":2.0},
"col_1":{"count":3.0,"mean":23.3333333333,"std":30.2875111776,"min":2.0,"25%":6.0,"50%":10.0,"75%":34.0,"max":58.0},
"col_2":{"count":2.0,"mean":25.0,"std":7.0710678119,"min":20.0,"25%":22.5,"50%":25.0,"75%":27.5,"max":30.0},
"col_3":{"count":3.0,"mean":21.0,"std":10.0,"min":11.0,"25%":16.0,"50%":21.0,"75%":26.0,"max":31.0}}

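For scripts, the same call can be made from Python with the standard library. A minimal sketch: the function names and arguments are hypothetical, and host, log id, data partition id and token are placeholders, as in the curl example. build_statistics_request only builds the request, so it can be inspected without network access.

```python
import json
import urllib.request


def build_statistics_request(host, log_id, data_partition_id, token):
    """Hypothetical helper: build the same GET request as the curl example."""
    url = f"http://{host}/osdu/wdms/wellbore/v3/log/{log_id}/statistics"
    return urllib.request.Request(
        url,
        method="GET",
        headers={
            "accept": "application/json",
            "data-partition-id": data_partition_id,
            "Authorization": f"Bearer {token}",
        },
    )


def fetch_statistics(host, log_id, data_partition_id, token):
    """Send the request and decode the JSON body: {channel: {statistic: value}}."""
    request = build_statistics_request(host, log_id, data_partition_id, token)
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

fetch_statistics returns a dict with the same structure as the curl response above.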
The same response as displayed in the Swagger UI:


{
  "Ref": {
    "count": 3,
    "mean": 1.5,
    "std": 0.5,
    "min": 1,
    "25%": 1.25,
    "50%": 1.5,
    "75%": 1.75,
    "max": 2
  },
  "col_1": {
    "count": 3,
    "mean": 23.3333333333,
    "std": 30.2875111776,
    "min": 2,
    "25%": 6,
    "50%": 10,
    "75%": 34,
    "max": 58
  },
  "col_2": {
    "count": 2,
    "mean": 25,
    "std": 7.0710678119,
    "min": 20,
    "25%": 22.5,
    "50%": 25,
    "75%": 27.5,
    "max": 30
  },
  "col_3": {
    "count": 3,
    "mean": 21,
    "std": 10,
    "min": 11,
    "25%": 16,
    "50%": 21,
    "75%": 26,
    "max": 31
  }
}
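For use case 2, the statistics computed on the application side (for example with pandas describe()) can be compared with the service response channel by channel. A minimal sketch, assuming both sides are dicts shaped like the response above ({channel: {statistic: value}}); compare_statistics is a hypothetical helper name. Because the response values are rounded to 10 decimal places, it compares with a tolerance rather than exact equality, and it treats NaN (or null in JSON) on both sides as a match.

```python
import math


def _is_missing(value):
    # NaN locally, or None once NaN has been serialized to JSON null.
    return value is None or (isinstance(value, float) and math.isnan(value))


def compare_statistics(local, remote, abs_tol=1e-9):
    """Hypothetical helper: list (channel, statistic, local, remote) mismatches.

    A channel or statistic present on only one side is reported with None
    on the other side. Values are compared with math.isclose, using the
    default relative tolerance (1e-9) plus the given absolute tolerance.
    """
    mismatches = []
    for channel in sorted(set(local) | set(remote)):
        local_stats = local.get(channel, {})
        remote_stats = remote.get(channel, {})
        for name in sorted(set(local_stats) | set(remote_stats)):
            a = local_stats.get(name)
            b = remote_stats.get(name)
            if _is_missing(a) and _is_missing(b):
                continue
            if _is_missing(a) or _is_missing(b) or not math.isclose(a, b, abs_tol=abs_tol):
                mismatches.append((channel, name, a, b))
    return mismatches
```

Typical use: parse the service response with json.loads, pass it as remote, and treat an empty result list as a successful check of the transfer.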