Introduction

After performing the basic user management procedures through the Entitlements Service, such as creating users and groups and assigning users to groups, OSDU developers can use the Data Platform Storage Service to ingest metadata generated by OSDU applications into the Data Platform. The Storage Service provides a set of APIs to manage the entire metadata life cycle: ingestion (persistence), modification, deletion, versioning, and data schema.

Record structure

From the Storage Service perspective, the metadata to ingest is called a record. The following is a basic example of a Data Platform record with a brief explanation of each field:

{
   "id": "opendes:hello:123456",
   "kind": "opendes:test:hello:1.0.0",
   "acl": {
     "viewers": ["data.default.viewers@{datapartition}.{domain}.com"],
     "owners": ["data.default.owners@{datapartition}.{domain}.com"]
   },
   "legal": {
     "legaltags": ["opendes-sample-legaltag"],
     "otherRelevantDataCountries": ["FR","US","CA"]
   },
   "data": {
     "msg": "Hello World, Data Platform!",
     "ElevM": 1000
   },
   "meta": [
     {
       "kind": "Unit",
       "persistableReference": "{\"scaleOffset\":{\"scale\":1.0,\"offset\":0.0},\"symbol\":\"m\",\"baseMeasurement\":{\"ancestry\":\"Length\",\"type\":\"UM\"},\"type\":\"USO\"}",
       "propertyNames": [
         "ElevM"
       ],
       "name": "m"
     }
   ]
}
  • id: (optional) The unique record identifier in the Data Platform. Record IDs can be system generated or user generated (provided in the request) and must be no longer than 512 bytes. An ID must be composed of three components separated by colons, and the first component must be the data-partition-id. Each of the first two components consists of one or more word characters (a-zA-Z_0-9) or the symbols "-" and ".", as defined by the [\w-\.]+ expression group, and is followed by the ":" symbol. As a best practice, compose the record ID as {data-partition-id}:{entity-type}:{unique-identifier} so that it matches the pattern ^[\w\-\.]+:[\w-\.]+:[\w\-\.\:\%]+$ defined by the OSDU Data Definition. See Known issues/limitations for best practices on user-generated IDs. When no ID is provided, the service creates and assigns an ID to the record and defaults the second component to "doc", which composes the record ID as {data-partition-id}:doc:{unique-identifier}.
    Example: opendes:work-product-component--WellLog:a6d6-63fc9a0bbac1
  • kind: (mandatory) The kind of data being ingested. It must follow the naming convention {authority/data-partition-id}:{source}:{entity-type}:{major}.{minor}.{patch}. Kind is case-sensitive in Storage. Note: The entity-type is composed of groupType--individualType.
    Example: opendes:wks:work-product-component--WellLog:1.0.0
  • acl: (mandatory) The group of users who have access to the record.
    • acl.viewers: The list of valid groups that have view and read privileges for the record. By convention, data group names begin with "data.".
    • acl.owners: The list of valid groups that have view, read, and write privileges for the record. By convention, data group names begin with "data.".
  • legal: (mandatory) The attributes which represent the legal constraints associated with the record.
    • legal.legaltags: The list of legal tag names associated with the record. Refer to Compliance Service for legal tag creation.
    • legal.otherRelevantDataCountries: The list of other relevant data countries. It must have at least 2 values: where the data was ingested from and where Data Platform stores the data. Refer to Compliance Service for LegalTag properties.
  • data: (mandatory) The record payload represented as a list of key-value pairs.
  • meta: An array of elements collecting the frame of reference (FoR) information in records.
  • tags: The record label.
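
The ID rules described for the id field can be checked on the client before a request is sent. The following is a minimal sketch, not the service's own validation: the helper name is_valid_record_id is hypothetical, the pattern is the documented one with the hyphen escaped in every group (Python's re module rejects the unescaped [\w-\.] form), and re.ASCII restricts \w to a-zA-Z_0-9.

```python
import re

# Documented pattern with the hyphen escaped in every group so that
# Python's re module accepts it. re.ASCII limits \w to a-zA-Z0-9_.
RECORD_ID_PATTERN = re.compile(r"^[\w\-\.]+:[\w\-\.]+:[\w\-\.\:\%]+$", re.ASCII)
MAX_ID_BYTES = 512  # documented upper bound for a record ID


def is_valid_record_id(record_id: str, data_partition_id: str) -> bool:
    """Client-side check of the record ID rules (hypothetical helper)."""
    if len(record_id.encode("utf-8")) > MAX_ID_BYTES:
        return False
    if not RECORD_ID_PATTERN.fullmatch(record_id):
        return False
    # The first colon-separated component must be the data partition ID.
    return record_id.split(":", 1)[0] == data_partition_id


print(is_valid_record_id(
    "opendes:work-product-component--WellLog:a6d6-63fc9a0bbac1", "opendes"))
print(is_valid_record_id("opendes:hello", "opendes"))
```

The first call passes all three checks; the second fails because the ID has only two components.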

Schema structure

Another important concept in the Data Platform Storage Service is the schema. A schema is a structure, also defined in JSON, that provides data type information for the record fields. In other words, the schema defines whether a given field in the record is a string, integer, long, float, double, boolean, datetime, link, core:dl:geopoint:1.0.0, or core:dl:geoshape:1.0.0. Arrays of these data types are also supported.

Note that only fields with associated schema information are indexed by the Search Service. For this reason, OSDU developers must create the respective schema for their record kinds before they start ingesting records into the Data Platform.

For example, a record with a string value for a field defined as integer is created successfully. However, due to the type mismatch, the Indexer Service fails to index it.

Note: The Storage Service persists all cases of null or empty variations of the data, even when they are not being indexed. For details on how search handles null values, see Search service tutorial.

For example:

"data": { "country": " ", "name": "", "uwi": null, "wellHeadElevation.value": "NaN" }

Schemas and records are tied together by the kind attribute. Additionally, a given kind can have zero or exactly one schema associated with it. With that concept in mind, the OSDU developer can use the Schema Service APIs for schema management.

Note: All schema APIs in the Storage Service are now deprecated; the Schema Service is now used to manage schemas.

Record Tagging

The OSDU storage solution supports adding tags as metadata to storage records, the ability to query the records based on the tags filter (keys and values), and the ability to patch (append, override, delete) existing tags. You can tag records at record creation time using the 'tags' attribute with PUT /api/storage/v2/records API. See Create Record with tags for more details.

What are Derivatives?

Often, the data ingested into the Data Platform is the raw data itself. In these scenarios, a single LegalTag is associated with this data.

However, when the data to ingest comes from multiple sources, it is considered derivative data. For instance, what if you take multiple records from the Data Platform and create a whole new record based on them all? Or what if you run an algorithm over your seismic data and create an attribute associated with this data that you want to ingest?

At this point, you have derivative data: data derived from data. In these scenarios, you need to assign to this new data the union of the LegalTags associated with all the source data from which it was created.

For instance, I have Data A associated with LegalTag 1, and Data B associated with LegalTag 2. If I create Data C from Data A and Data B, then Data C will inherit LegalTag 1 from Data A and LegalTag 2 from Data B.
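
The inheritance rule in this example amounts to a set union over the parents' legal tags. The sketch below only illustrates that rule; the helper name inherited_legal_tags and the tag names are hypothetical, and in practice the Storage Service computes this on ingestion.

```python
def inherited_legal_tags(parents):
    """Return the union of the parents' legal tags, keeping first-seen order."""
    tags = []
    for parent in parents:
        for tag in parent["legal"]["legaltags"]:
            if tag not in tags:  # drop duplicates shared by several parents
                tags.append(tag)
    return tags


# Data A and Data B from the example above, with placeholder tag names.
data_a = {"legal": {"legaltags": ["LegalTag1"]}}
data_b = {"legal": {"legaltags": ["LegalTag2"]}}

print(inherited_legal_tags([data_a, data_b]))  # ['LegalTag1', 'LegalTag2']
```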

If one or more parent legal tags expire, then the derived/child record becomes invalid and is soft-deleted from the Data Platform. For more information on legal tag validation, see LegalTag properties in the Compliance documentation.

You can find more details at Creating derivative records.

Ingestion workflow

To demonstrate the schema and record concept, as well as their respective APIs, consider the following case:

The OSDU developer wants to ingest metadata information related to a well dataset. The metadata contains the following pieces of information: name of the well, company name, year when it was drilled, total depth, and the well location.

In summary, to execute the above workflow, the OSDU developer must:

  1. Be a valid Data Platform user;
  2. Define which partition to use;
  3. Determine the data access control list (ACL) for the data being ingested using data group membership. If the ACL does not already exist or needs adjustments, create or assign users to an existing partition data group;
  4. Agree on the kind attribute which will represent the developer's wells. For this example: opendes:welldb:wellbore:1.0.0;
  5. Create the legal tag that represents the legal constraints for the metadata to ingest. If there is already an existing legal tag that describes the data to ingest, we encourage you to use the existing one instead of creating a new legal tag;
  6. Create a schema for the kind opendes:welldb:wellbore:1.0.0 via the schema service;
  7. Create and ingest records using the PUT /api/storage/v2/records API.

Becoming a Data Platform user

Refer to Entitlements Service to learn how to become a valid Data Platform user.

Choosing a partition

The Data Platform stores data in different data partitions, depending on the access to those data partitions in the OSDU system. When using the Storage Service APIs, specify the active account as the data-partition-id.

Creating data groups

Refer to Entitlements Service to learn how to create data groups (the ones that start with data.) and assign users to them. For data access authorization purposes in this example, assume the groups data.default.viewers@{datapartition}.{domain}.com and data.default.owners@{datapartition}.{domain}.com were previously created with the Entitlements Service.

Creating the schema

The schema creation is done with the schema service.

Note: geoshape and geopoint are Elasticsearch concepts and data types. The "kinds" core:dl:geopoint:1.0.0 and core:dl:geoshape:1.0.0 are actually primary data types, not kinds that you can use with the GET /api/storage/v2/schemas/{kind} Schema API.

Refer to Compliance Service for legal tag creation. For this example, assume a legal tag called opendes-well-legal was previously created.

Creating records

After legal tag creation and schema definition, the records of the kind opendes:welldb:wellbore:1.0.0 can be created. They must follow the same structure and fields' naming convention as defined in the schema. Here is a sample record:

curl
{
  "kind": "opendes:welldb:wellbore:1.0.0",
  "acl": {
    "viewers": ["data.default.viewers@{datapartition}.{domain}.com"],
    "owners": ["data.default.owners@{datapartition}.{domain}.com"]
  },
  "legal": {
    "legaltags": ["opendes-sample-legaltag"],
    "otherRelevantDataCountries": ["FR","US","CA"]
  },
  "data": {
    "name": "well1",
    "company": "OSDU",
    "drillingYear": 1983,
    "depth": 1208.84,
    "location": {
      "latitude": 29.7512026,
      "longitude": -95.4812934
    }
  },
  "meta": [
    {
      "kind": "CRS",
      "propertyNames": [
        "Longitude",
        "Latitude"
      ],
      "name": "GCS_WGS_1984",
      "persistableReference": "{\"wkt\":\"GEOGCS[\\\"GCS_WGS_1984\\\",DATUM[\\\"D_WGS_1984\\\",SPHEROID[\\\"WGS_1984\\\",6378137.0,298.257223563]],PRIMEM[\\\"Greenwich\\\",0.0],UNIT[\\\"Degree\\\",0.0174532925199433],AUTHORITY[\\\"EPSG\\\",4326]]\",\"ver\":\"PE_10_3_1\",\"name\":\"GCS_WGS_1984\",\"authCode\":{\"auth\":\"EPSG\",\"code\":\"4326\"},\"type\":\"LBC\"}"
    }
  ]
}

Creating derivative Records

When creating records that represent derivative data, you must assign the following:

  • The Record ID and version of all the records that are the direct parents of the new derivative. This is added to the ancestry section.
  • The Alpha-2 country code where the derivative was created.

Below is an example of the minimum number of fields required to ingest a derivative Record.

Sample payload
        [{
                "acl": {
                        "owners": [ 
                            "data.default.owners@{datapartition}.{domain}.com" 
                        ],
                        "viewers": [ 
                            "data.default.viewers@{datapartition}.{domain}.com"
                        ]
                },
                "data": {
                        "count": 123456789
                },
                "id": "opendes:id:123456789",
                "kind": "opendes:welldb:wellbore:1.0.0",
                "legal" :{
                        "otherRelevantDataCountries": ["US"] //the physical location of where the derivative was created
                },
                "ancestry" :{
                       "parents": ["opendes:id:1:version", "opendes:id:2:version"] //the record ids and versions of the Records this derivative was created from
                }    
        }]

As shown in the example, the parent records are provided as well as the "otherRelevantDataCountries" (ORDC) of where the derivative was created. The Storage Service takes responsibility for populating the full LegalTag and ORDC values based on the parents.

Therefore, the child record would look something like this:

Sample derived record
        {
          "records": [
           {
                "acl": {
                        "owners": [ 
                            "data.default.owners@{datapartition}.{domain}.com" 
                        ],
                        "viewers": [ 
                            "data.default.viewers@{datapartition}.{domain}.com"
                        ]
                },
                "data": {
                        "count": 123456789
                },
                "id": "opendes:id:123456789",
                "kind": "opendes:welldb:wellbore:1.0.0",
                "legal": {
                    "legaltags": ["Parenttag1", "Parenttag2"],
                    "otherRelevantDataCountries": ["Parent1ORDC","Parent2ORDC","US"]
                },
                "ancestry" :{
                       "parents": ["opendes:id:1:version", "opendes:id:2:version"] //the record ids and versions of the Records this derivative was created from
                }    
           }],
          "notFound": [],
          "conversionStatuses": []
        }
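
The minimal derivative request payload described in this section can be assembled programmatically. The following is a sketch under stated assumptions: build_derivative_payload and parent_reference are hypothetical helpers, the version numbers are made up for illustration, and only the fields from the minimal sample payload are included.

```python
import json


def parent_reference(record_id, version):
    """Join a parent record ID and its version as '<id>:<version>'."""
    return f"{record_id}:{version}"


def build_derivative_payload(record_id, kind, acl, data, parents, created_in):
    """Build the minimal PUT body for one derivative record.

    parents: list of (record_id, version) tuples for the direct parents.
    created_in: Alpha-2 code of the country where the derivative was created.
    Legal tags are left out on purpose: the service populates them
    from the parents.
    """
    return [{
        "id": record_id,
        "kind": kind,
        "acl": acl,
        "data": data,
        "legal": {"otherRelevantDataCountries": [created_in]},
        "ancestry": {
            "parents": [parent_reference(pid, ver) for pid, ver in parents]
        },
    }]


payload = build_derivative_payload(
    record_id="opendes:id:123456789",
    kind="opendes:welldb:wellbore:1.0.0",
    acl={
        "owners": ["data.default.owners@{datapartition}.{domain}.com"],
        "viewers": ["data.default.viewers@{datapartition}.{domain}.com"],
    },
    data={"count": 123456789},
    parents=[("opendes:id:1", 1001), ("opendes:id:2", 1002)],  # made-up versions
    created_in="US",
)
print(json.dumps(payload, indent=2))
```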

Tags

The Storage OSDU Solution supports adding tags as metadata to storage records and the ability to query the records based on the tags filter. Use the PUT /api/storage/v2/records API to ingest records with tags metadata such as the following:

curl
curl --request PUT \
  --url '/api/storage/v2/records' \
  --header 'accept: application/json' \
  --header 'authorization: Bearer <JWT>' \
  --header 'content-type: application/json' \
  --header 'data-partition-id: opendes' \
  --data '[
  {
    "kind": "opendes:welldb:wellbore:1.0.0",
    "acl": {
      "viewers": ["data.default.viewers@opendes.{domain}.com"],
      "owners": ["data.default.owners@opendes.{domain}.com"]
    },
    "tags": {
      "dataflowId":"test-uploader-test-1_final.lac-2021-04-23T09:51:27.287Z"
    },
    "legal": {
      "legaltags": ["opendes-default-legal"],
      "otherRelevantDataCountries": ["FR","US","CA"]
    },
    "data": {
      "msg": "hello from OSDU"
    }
  }
]'

After ingesting, you can use the query or query_with_cursor API from the Search Service with the following payload to search for the record ingested.

curl
curl --request POST \
--url '/api/search/v2/query' \
--header 'accept: application/json' \
--header 'authorization: Bearer <JWT>' \
--header 'content-type: application/json' \
--header 'data-partition-id: opendes' \
--data '{
    "kind" : "opendes:welldb:wellbore:1.0.0",
    "query" : "tags.dataflowId:test-uploader-test-1_final.lac-2021-04-23T09:51:27.287Z"
}'

Note: The tags feature was introduced in R2. Therefore, record tagging does not work with kinds created before this release. Re-indexing (with force_clean=true) from the Indexer Service may be required.

Ingesting records

After defining the record structure, the OSDU developer must use the PUT /api/storage/v2/records API to ingest the records, as follows:

curl
curl --request PUT \
  --url '/api/storage/v2/records' \
  --header 'accept: application/json' \
  --header 'authorization: Bearer <JWT>' \
  --header 'content-type: application/json' \
  --header 'data-partition-id: opendes' \
  --data '[
  {
    "kind": "opendes:welldb:wellbore:1.0.0",
    "acl": {
      "viewers": ["data.default.viewers@{datapartition}.{domain}.com"],
      "owners": ["data.default.owners@{datapartition}.{domain}.com"]
    },
    "legal": {
      "legaltags": ["opendes-sample-legaltag"],
      "otherRelevantDataCountries": ["FR","US","CA"]
    },
    "data": {
      "name": "well1",
      "company": "OSDU",
      "drillingYear": 1983,
      "depth": 1208.84,
      "location": {
        "latitude": 29.7512026,
        "longitude": -95.4812934
      }
    },
  "meta": [
    {
      "kind": "CRS",
      "propertyNames": [
        "Longitude",
        "Latitude"
      ],
      "name": "GCS_WGS_1984",
      "persistableReference": "{\"wkt\":\"GEOGCS[\\\"GCS_WGS_1984\\\",DATUM[\\\"D_WGS_1984\\\",SPHEROID[\\\"WGS_1984\\\",6378137.0,298.257223563]],PRIMEM[\\\"Greenwich\\\",0.0],UNIT[\\\"Degree\\\",0.0174532925199433],AUTHORITY[\\\"EPSG\\\",4326]]\",\"ver\":\"PE_10_3_1\",\"name\":\"GCS_WGS_1984\",\"authCode\":{\"auth\":\"EPSG\",\"code\":\"4326\"},\"type\":\"LBC\"}"
    }
    ]
  },
  {
    "kind": "opendes:welldb:wellbore:1.0.0",
    "acl": {
      "viewers": ["data.default.viewers@{datapartition}.{domain}.com"],
      "owners": ["data.default.owners@{datapartition}.{domain}.com"]
    },
    "legal": {
      "legaltags": ["opendes-sample-legaltag"],
      "otherRelevantDataCountries": ["IN","BR","CA"]
    },
    "data": {
      "name": "well12312",
      "company": "shell",
      "drillingYear": 2001,
      "depth": 208.84,
      "location": {
        "latitude": 49.7512026,
        "longitude": -65.4812934
      }
    },
  "meta": [
    {
      "kind": "CRS",
      "propertyNames": [
        "Longitude",
        "Latitude"
      ],
      "name": "GCS_WGS_1984",
      "persistableReference": "{\"wkt\":\"GEOGCS[\\\"GCS_WGS_1984\\\",DATUM[\\\"D_WGS_1984\\\",SPHEROID[\\\"WGS_1984\\\",6378137.0,298.257223563]],PRIMEM[\\\"Greenwich\\\",0.0],UNIT[\\\"Degree\\\",0.0174532925199433],AUTHORITY[\\\"EPSG\\\",4326]]\",\"ver\":\"PE_10_3_1\",\"name\":\"GCS_WGS_1984\",\"authCode\":{\"auth\":\"EPSG\",\"code\":\"4326\"},\"type\":\"LBC\"}"
    }
    ]
  },
   ...]'

Note: A legal record requires information about countryOfOrigin (where the data originated) and otherRelevantDataCountries (any other countries where the data was ingested, accessed, consumed, or stored). otherRelevantDataCountries is only provided when creating a record and should contain at least the country where the data is ingested. The location of the data center where the record is stored is added automatically. otherRelevantDataCountries is only relevant per record because data may originate from the same country (countryOfOrigin) but be ingested or accessed from different countries. When creating a legal tag, otherRelevantDataCountries is not a required property. The legal tag itself only contains countryOfOrigin.

Note: The PUT /api/storage/v2/records API can handle up to 500 records per request, with a 32MB size limit.
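
When more than 500 records need to be ingested, the client has to split them across several PUT calls. The sketch below shows one way to do that by count only; chunk_records is a hypothetical helper, and it does not check the 32MB size limit.

```python
MAX_RECORDS_PER_PUT = 500  # documented per-request record limit


def chunk_records(records, max_per_request=MAX_RECORDS_PER_PUT):
    """Yield consecutive slices of at most max_per_request records."""
    for start in range(0, len(records), max_per_request):
        yield records[start:start + max_per_request]


# 1200 placeholder records end up in three request bodies.
records = [{"data": {"name": f"well{i}"}} for i in range(1200)]
batches = list(chunk_records(records))
print([len(batch) for batch in batches])  # [500, 500, 200]
```

Each batch can then be sent as the JSON array body of one PUT /api/storage/v2/records call.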

Storage service APIs

The Data Platform Storage Service has two categories of APIs for schema and record management: Records and Query.

Query

Query all kinds

The API returns a list of all kinds in the specific {data-partition-id}.

 GET /api/storage/v2/query/kinds

Parameters

Parameter | Description
limit | Page size limit. The number of rows to be returned per page. If not provided, the default limit is 1000. Use the cursor to paginate through the results. Currently, there is no restriction on the maximum number of rows per page. However, a proxy time-out is set via Apigee, so a very large request may time out. It is therefore best practice to paginate through the results.
curl
curl --request GET \
  --url '/api/storage/v2/query/kinds?limit=10' \
  --header 'accept: application/json' \
  --header 'authorization: Bearer <JWT>' \
  --header 'content-type: application/json' \
  --header 'data-partition-id: opendes'

Fetch Records

The API fetches multiple records (maximum 20) from the Storage Service at a time. It lets you request that the data be converted to the openDES standard by using the custom header {frame-of-reference}. The openDES standard defines units in SI, CRS in WGS84, elevation in MSL, azimuth in true north, and dates in UTC. Currently, only "none" and "units=SI;crs=wgs84;elevation=msl;azimuth=true north;dates=utc;" are valid values for the header {frame-of-reference}.

For now, only conversion of units, CRS, and dates is supported. Elevation and azimuth will be available later. Returned records contain either the original values (none) or the converted values (units=SI;crs=wgs84;elevation=msl;azimuth=true north;dates=utc), depending on the user's request and the conversion status. The original values are returned when the user does not request conversion, or when conversion is requested but fails. If conversion is requested, the response includes, in addition to the requested records, a list with the conversion status of each record, indicating whether the conversion was successful and, if not, what errors occurred. Refer to Frame of Reference and Reference Normalization for Storage Fetch Record Frame of Reference.

How conversion is handled for AsIngested and Wgs84

  1. If you specify the Wgs84 block, the API just takes the values from the block. The fetched record contains only the Wgs84 block.
  2. If you only specify the AsIngested block, then the API performs conversion for the coordinates provided. The fetched record includes both the AsIngested and Wgs84 blocks.
  3. If you specify BOTH the AsIngested and Wgs84 blocks, then the API ignores the AsIngested block and takes only the values from the Wgs84 block. The fetched record includes both the AsIngested and Wgs84 blocks.

The CRS conversion process can be time-consuming because it relies on the Esri projection engine to perform the geo-spatial operations. Up to 90 seconds have been measured for transformations with the largest known parameter file (ESRI, 108109). Simple conversions with a few points have response times around 0.5 seconds while 500000 points require 180 to 200 seconds. Complex operations have response times of around 90 seconds for a few points to ~260 seconds for 500000 points.

For details on how the record is indexed in regard to frame of reference and normalization (coordinate conversion), refer to Search service tutorial.

If some records were not found or the user doesn't have access to view them, their ids are returned in the notFound section of the response.

POST /api/storage/v2/query/records:batch
curl
curl --request POST \
  --url '/api/storage/v2/query/records:batch' \
  --header 'Authorization: Bearer <JWT>' \
  --header 'Content-Type: application/json' \
  --header 'data-partition-id: opendes' \
  --header 'frame-of-reference: units=SI;crs=wgs84;elevation=msl;azimuth=true north;dates=utc;' \
  --data '{
    "records": [
        "opendes:well:123456789",
        "opendes:wellTop:abc789456",
        "opendes:wellLog:4531wega22"
    ]
}'
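
Because this API accepts at most 20 record IDs per call, a longer ID list has to be split on the client. The following is a minimal sketch of preparing those calls: build_batch_requests is a hypothetical helper that only builds headers and bodies (no request is sent), and the authorization header is left out for brevity.

```python
# The two header values the text lists as valid for frame-of-reference.
CONVERTED = "units=SI;crs=wgs84;elevation=msl;azimuth=true north;dates=utc;"
ORIGINAL = "none"
MAX_IDS_PER_CALL = 20  # documented limit for records:batch


def build_batch_requests(record_ids, data_partition_id, convert=True):
    """Return one request description per group of at most 20 record IDs."""
    headers = {
        "Content-Type": "application/json",
        "data-partition-id": data_partition_id,
        "frame-of-reference": CONVERTED if convert else ORIGINAL,
    }
    return [
        {
            "headers": dict(headers),  # copy so callers can modify each one
            "body": {"records": record_ids[i:i + MAX_IDS_PER_CALL]},
        }
        for i in range(0, len(record_ids), MAX_IDS_PER_CALL)
    ]


ids = [f"opendes:well:{n}" for n in range(45)]
requests_to_send = build_batch_requests(ids, "opendes")
print([len(r["body"]["records"]) for r in requests_to_send])  # [20, 20, 5]
```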

Fetch Multiple Records

The API fetches multiple records in bulk of up to 100 records at a time.

If some records were not found, their ids are returned in the invalidRecords section of the response. And if the user doesn't have access to view some records, the ids are returned in the retryRecords section of the response.

POST /api/storage/v2/query/records
curl
curl --request POST \
  --url '/api/storage/v2/query/records' \
  --header 'Authorization: Bearer <JWT>' \
  --header 'Content-Type: application/json' \
  --header 'data-partition-id: opendes' \
  --header 'accept: application/json' \
  --data '{
    "records": [
        "opendes:well:123456789",
        "opendes:wellTop:abc789456",
        "opendes:wellLog:4531wega22"
    ]
}'
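
A client usually wants to treat the three outcomes of this call differently. The sketch below separates them from a parsed response; invalidRecords and retryRecords are the section names given above, while the records key for the fetched records, the helper name split_fetch_response, and the sample response are assumptions made for illustration.

```python
def split_fetch_response(response):
    """Separate fetched records from IDs that were not found or not viewable."""
    return {
        "found": response.get("records", []),  # assumed key name
        "not_found": response.get("invalidRecords", []),
        "no_access": response.get("retryRecords", []),
    }


# Hand-written sample response, not captured from a real service.
sample_response = {
    "records": [{"id": "opendes:well:123456789"}],
    "invalidRecords": ["opendes:wellTop:abc789456"],
    "retryRecords": ["opendes:wellLog:4531wega22"],
}

result = split_fetch_response(sample_response)
print(result["not_found"])  # ['opendes:wellTop:abc789456']
print(result["no_access"])  # ['opendes:wellLog:4531wega22']
```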

Fetch records by kind

The API fetches records found by the given kind. It also uses page size limit and the cursor to paginate through the results.

Storage honors the case sensitivity while querying the records. Storage considers these two kinds to be different: "slb:OSDU:USER:1.1.0" and "slb:OSDU:user:1.1.0".

GET /api/storage/v2/query/records

Parameters

Parameter | Description
kind | Kind of the records to fetch.
limit | Page size limit. The number of rows to be returned per page. If not provided, the default limit is 1000.
cursor | Not required. Returned with each response; pass it in the next call to continue paginating.
curl
curl --request GET \
  --url '/api/storage/v2/query/records?kind={kind}&limit={limit}&cursor={cursor}' \
  --header 'Authorization: Bearer <JWT>' \
  --header 'Content-Type: application/json' \
  --header 'data-partition-id: opendes' \
  --header 'accept: application/json'
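
Cursor-based pagination for this endpoint can be wrapped in a small generator. The following is a sketch, not a client implementation: iter_records is a hypothetical helper, the HTTP call is injected as a fetch_page function so the loop can be shown without a network, and the response field names results and cursor are assumptions.

```python
def iter_records(fetch_page, kind, limit=1000):
    """Yield every item for a kind by following the cursor until it runs out.

    fetch_page(kind=..., limit=..., cursor=...) must return the parsed JSON
    response of GET /api/storage/v2/query/records as a dict.
    """
    cursor = None
    while True:
        page = fetch_page(kind=kind, limit=limit, cursor=cursor)
        yield from page.get("results", [])  # assumed field name
        cursor = page.get("cursor")         # assumed field name
        if not cursor:
            break


def fake_fetch_page(kind, limit, cursor):
    """Stand-in for the HTTP call: two pages linked by the cursor 'c1'."""
    pages = {
        None: {"results": ["opendes:wellbore:1", "opendes:wellbore:2"],
               "cursor": "c1"},
        "c1": {"results": ["opendes:wellbore:3"]},
    }
    return pages[cursor]


all_ids = list(iter_records(fake_fetch_page, "opendes:welldb:wellbore:1.0.0", limit=2))
print(all_ids)
```

Injecting fetch_page keeps the pagination logic independent of any particular HTTP library.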

Records

Create or Update records

The API represents the main ingestion mechanism into the Data Platform. It allows you to create and update records. When no record ID is provided or when the provided ID is not already present in the Data Platform, a new record is created. If the ID belongs to an existing record in the Data Platform, an update operation occurs and a new version of the record is created.

Key details to note when creating or updating records:

  1. The record version only applies to the data block. Therefore, the record update trace is kept only for changes in the data block. Changes to any other root property, such as legal tag, ancestry, or ACL, are applied to the entire record (in all versions of the record). There is only one version of the metadata for a record.
  2. Updating root properties, such as acl, legal tag, or tags, using the PATCH API does NOT update the record version, but the changes are applied to all versions of the record.
  3. Updating root properties, such as acl, legal tag, tags, or ancestry, using the PUT API DOES update the record version, and the changes are applied to all versions of the record. Essentially, any update using the PUT API results in creating a new version of the record.
  4. Entitlements service creates the groups with all lowercase, even if the input has mixed case. In order to properly assign the ACL, record ACL provided upon record creation must be in all lowercase.

More details are available in the Creating records and Ingesting records sections.

Record size

A record's size can become very large due to the number of versions a record has, not just because of the data it contains. There is a 2MB record size limit that includes data and non-data properties. For each record, the maximum number of versions is 2000.

Note: The Storage Service works with English characters and supports UTF-8 encoding. International language characters may not be supported and may cause inconsistent behavior in the Storage and Search Services.

Using skipdupes

The skipdupes parameter applies only to update operations, that is, calls to the API with record IDs that are already present in the Data Platform. The default value of skipdupes is false. If skipdupes == true, the service does not update a record whose payload is the same as the existing one (a duplicate); if the payload differs, a new version of the record is created. If skipdupes == false, the service does not check whether the payload is the same and always creates a new version, even if it is identical to a previous version. In the PUT response, skippedRecordIds lists the record IDs that were not updated (skipped) because skipdupes == true and the payload was the same. A record ID is never duplicated in the response: it appears either in recordIds or in skippedRecordIds.

For clarity, this is the current behavior of skipdupes:

If skipdupes is true

  • if the record does not exist at all, then create a new record.
  • if the record was soft-deleted, then make the record active again if the data is the same, or create a new version if data is different.
  • if the record exists,
    • if the data is the same, then skip it.
    • if data is different, then create a new version

If skipdupes is false

  • if the record does not exist at all, then create a new record.
  • if the record was soft-deleted, then create a new version of the record.
  • if the record exists, then a new version of the record is created, regardless of whether the data is the same or different.
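
The two lists above can be read as a small decision table. The following sketch encodes that table for reference; put_outcome, the state labels, and the returned strings are illustrative names, not part of the API.

```python
def put_outcome(skipdupes, state, same_data):
    """Describe what a PUT does to one record.

    state: 'missing', 'soft-deleted', or 'active'.
    same_data: True when the payload equals the stored data.
    """
    if state not in ("missing", "soft-deleted", "active"):
        raise ValueError(f"unknown record state: {state!r}")
    if state == "missing":
        return "create record"
    if not skipdupes:
        # skipdupes=false never compares payloads.
        return "create new version"
    if same_data:
        return "reactivate record" if state == "soft-deleted" else "skip"
    return "create new version"


for state in ("missing", "soft-deleted", "active"):
    for same in (True, False):
        print(state, same, "->", put_outcome(True, state, same))
```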

Get record version

The API retrieves the specific version of the given record.

Note: If the Record ID contains encoded characters (e.g., %5B, %5D, etc.), ensure that you url-encode the Record ID to avoid issues with URL decoding. For example, if your Record ID is opendes:test:%5BUS%5D, you should encode it as opendes%3Atest%3A%255BUS%255D.
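
The encoding described in this note can be reproduced with Python's standard library. This is a small sketch; encode_record_id is a hypothetical helper name. Passing safe="" makes quote encode the colons as well as the percent signs.

```python
from urllib.parse import quote


def encode_record_id(record_id):
    """Percent-encode a record ID for use as a URL path segment."""
    # safe="" also encodes ':' and '%', so an existing '%5B' becomes '%255B'.
    return quote(record_id, safe="")


print(encode_record_id("opendes:test:%5BUS%5D"))  # opendes%3Atest%3A%255BUS%255D
```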

GET /api/storage/v2/records/{id}/{version}
curl

 curl --request GET \
  --url '/api/storage/v2/records/{id}/{version}?attribute={attribute}' \
  --header 'accept: application/json' \
  --header 'authorization: Bearer <JWT>' \
  --header 'content-type: application/json' \
  --header 'data-partition-id: opendes'

Get all record versions

The API returns a list containing all versions for the given record ID.

Note: If the Record ID contains encoded characters (e.g., %5B, %5D, etc.), ensure that you url-encode the Record ID to avoid issues with URL decoding. For example, if your Record ID is opendes:test:%5BUS%5D, you should encode it as opendes%3Atest%3A%255BUS%255D.

GET /api/storage/v2/records/versions/{id}
curl
curl --request GET \
  --url '/api/storage/v2/records/versions/{id}'\
  --header 'accept: application/json' \
  --header 'authorization: Bearer <JWT>' \
  --header 'content-type: application/json' \
  --header 'data-partition-id: opendes' 

Get record

This API returns the latest version of the given record.

Note: If the Record ID contains encoded characters (e.g., %5B, %5D, etc.), ensure that you url-encode the Record ID to avoid issues with URL decoding. For example, if your Record ID is opendes:test:%5BUS%5D, you should encode it as opendes%3Atest%3A%255BUS%255D.

GET /api/storage/v2/records/{id}
curl
curl --request GET \
  --url '/api/storage/v2/records/{id}?attribute={attribute}'\
  --header 'accept: application/json' \
  --header 'authorization: Bearer <JWT>' \
  --header 'content-type: application/json' \
  --header 'data-partition-id: opendes'

Delete record

The API performs a logical deletion of the given record. You can undo this operation by ingesting the record again with the same ID. The deleted (inactive) record is removed from the index, and therefore is not returned in the search result. This operation can be performed by the owner of the record.

Note: If the Record ID contains encoded characters (e.g., %5B, %5D, etc.), ensure that you url-encode the Record ID to avoid issues with URL decoding. For example, if your Record ID is opendes:test:%5BUS%5D, you should encode it as opendes%3Atest%3A%255BUS%255D.

POST /api/storage/v2/records/{id}:delete
curl
curl --request POST \
   --url '/api/storage/v2/records/{id}:delete' \
   --header 'accept: application/json' \
   --header 'authorization: Bearer <JWT>' \
   --header 'content-type: application/json'\
   --header 'data-partition-id: opendes'

Delete records in bulk

The API performs a logical deletion of a batch of records (the maximum batch size is 500 records). You can undo this operation by ingesting the records again with the same IDs. The deleted (inactive) records are removed from the index, and therefore are not returned in the search result. This operation can be performed by the owner of the records.

POST /api/storage/v2/records/delete
curl
curl --request POST \
   --url '/api/storage/v2/records/delete' \
   --header 'accept: application/json' \
   --header 'authorization: Bearer <JWT>' \
   --header 'content-type: application/json' \
   --header 'data-partition-id: common' \
   --data-raw '[
          "tenant:type:unique-identifier",
          "tenant:type:unique-identifier",
          "tenant:type:unique-identifier"
     ]'     

Note: Record-change events are sent in batches of 50 records at a time. So for a maximum batch of 500 records per bulk delete call, 10 record-change events are sent.
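
The arithmetic in this note generalizes to smaller batches as a ceiling division. The sketch below assumes a partially filled last group still produces one event; record_change_event_count is a hypothetical helper.

```python
EVENT_BATCH_SIZE = 50   # records per record-change event
MAX_BULK_DELETE = 500   # maximum records per bulk delete call


def record_change_event_count(deleted_records):
    """Number of record-change events expected for one bulk delete call."""
    if not 0 <= deleted_records <= MAX_BULK_DELETE:
        raise ValueError("a bulk delete call accepts 0 to 500 records")
    # Ceiling division: a partial last group is assumed to send one event.
    return -(-deleted_records // EVENT_BATCH_SIZE)


print(record_change_event_count(500))  # 10
print(record_change_event_count(120))  # 3
```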

Purge record

The API performs the permanent physical deletion of the given record and all of its versions, not including any linked records or files if they exist. We recommend that you clean up all linked records, such as child records, records in the relationship block, and the actual data (files ingested via the Submit API in the Ingestion Service), to avoid leaving orphaned data behind after using the Purge API. This operation cannot be undone. This operation can be performed by the owner of the record.

Note:

  • If the record ID contains percent-encoded characters (e.g., %5B, %5D), URL-encode the record ID before placing it in the request path so that it survives URL decoding. For example, the record ID `opendes:test:%5BUS%5D` should be encoded as `opendes%3Atest%3A%255BUS%255D`.
  • The Purge Record API works on active and inactive (soft-deleted) records.
DELETE /api/storage/v2/records/{id}
curl
curl --request DELETE \
   --url '/api/storage/v2/records/{id}' \
   --header 'accept: application/json' \
   --header 'authorization: Bearer <JWT>' \
   --header 'content-type: application/json' \
   --header 'data-partition-id: opendes'

Purge record versions

The API performs the permanent physical deletion of versions of the given record, excluding the latest version and any linked records or files. If the 'limit' query parameter is used, the API deletes the oldest versions, as many as 'limit' specifies. This operation cannot be undone.

Note: If the record ID contains percent-encoded characters (e.g., %5B, %5D), URL-encode the record ID before placing it in the request path so that it survives URL decoding. For example, the record ID opendes:test:%5BUS%5D should be encoded as opendes%3Atest%3A%255BUS%255D.

DELETE /api/storage/v2/records/{id}/versions

Parameters

| Parameter | Description |
|-----------|-------------|
| limit | The API deletes the oldest versions defined by 'limit', excluding the latest record version |

curl
curl --request DELETE \
   --url '/api/storage/v2/records/{id}/versions?limit=2' \
   --header 'accept: application/json' \
   --header 'authorization: Bearer <JWT>' \
   --header 'content-type: application/json' \
   --header 'Data-Partition-Id: common'
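
To make the effect of 'limit' concrete, the sketch below models which versions would be selected for purging. It is an illustration under stated assumptions, not the service's implementation: the function name is hypothetical, versions are assumed to be integers where a larger number means a newer version, and a 'limit' larger than the number of purgeable versions is assumed to select all of them.

```python
from typing import List, Optional


def versions_to_purge(versions: List[int], limit: Optional[int] = None) -> List[int]:
    """Return the versions that a purge-versions call would remove."""
    if limit is not None and limit < 1:
        raise ValueError("limit must be a positive integer")
    ordered = sorted(versions)   # oldest first (assumption: larger means newer)
    candidates = ordered[:-1]    # the latest version is always kept
    if limit is None:
        return candidates
    return candidates[:limit]    # only the oldest `limit` versions


versions = [5, 1, 3, 4, 2]
print(versions_to_purge(versions))           # [1, 2, 3, 4]
print(versions_to_purge(versions, limit=2))  # [1, 2]
```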

Metadata Bulk Update

The Bulk Update API allows you to update a record's metadata in batch for Record tags, Legal Tags, ACL owners, and ACL viewers. It takes an array of record IDs, with or without version numbers, with a maximum number of 500, and updates the properties specified in the operation path with the value and operation type provided. Users must specify the corresponding data partition ID in the header as well.

Users must provide op (operation type), path, and value in the "ops" field. Currently, the "add", "replace", and "remove" operation types are supported. Specify the property to update in the "path" field; for example, "/acl/viewers" indicates that the ACL viewers of the records are updated. Provide the new values in the "value" field:

  • In the "replace" operation, the property value in "path" is fully replaced by the values provided in the "value" field.
  • In the "add" operation, the values provided in the "value" field are appended to the property value in "path".
  • In the "remove" operation, the values provided in the "value" field are removed from the property value in "path".

To apply an optimistic lock on the records, you must provide each record's version number. Before the metadata is updated, the version number is checked to detect update operations happening at the same time. When a conflict is discovered, the corresponding records are locked and returned in 'lockedRecordIds' in the response body, and their metadata is not updated.

You can only update record tags, legal tags, and ACLs. Record versions do not change and cannot change with this operation.
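
The three operation types can be illustrated on a plain Python list. This is a rough sketch for list-valued properties such as /acl/viewers only; the function name is hypothetical, tags (which are key/value pairs) are not modeled, and whether the service de-duplicates appended values is not covered here.

```python
from typing import List


def apply_list_op(current: List[str], op: str, values: List[str]) -> List[str]:
    """Apply a bulk-update style operation to a list-valued property."""
    if op == "replace":
        return list(values)                 # old values are discarded
    if op == "add":
        return current + list(values)       # new values are appended
    if op == "remove":
        return [v for v in current if v not in values]
    raise ValueError(f"unsupported op: {op}")


viewers = ["data.default.viewers", "test1.viewers"]
print(apply_list_op(viewers, "add", ["test2.viewers"]))
print(apply_list_op(viewers, "remove", ["test1.viewers"]))
print(apply_list_op(viewers, "replace", ["only.viewers"]))
```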

The Bulk Update API has 2 different success response codes:

| Code | Description |
|------|-------------|
| 200 | The update operation succeeds fully, and all records' metadata are updated. |
| 206 | The update operation succeeds partially. Some records are not updated for various reasons, including records not found or unauthorized. Records whose version number was provided in the request may be locked during a metadata update because of the optimistic lock. In this case, the version you provided is not the latest one, and other users might be updating the record. If the record version is locked, the 'lockedRecordIds' field is returned. You can retry later with the record's latest version number, after the record is no longer locked. |

PATCH /api/storage/v2/records
curl
curl --request PATCH \
--url '/api/storage/v2/records' \
--header 'data-partition-id: <data-partition-id>' \
--header 'Authorization: <Bearer Token>' \
--header 'Content-Type: application/json' \
--data-raw '{
    "query": {
      "ids": [
          "data-partition:entity-type:unique-identifier",
          "data-partition:entity-type:unique-identifier",
          "data-partition:entity-type:unique-identifier"
      ]
    },
    "ops": [
      {
        "op":"replace",
        "path":"/acl/viewers",
        "value":[
            "data.default.viewers@<DataPartition>.<Domain>.com",
            "test1.viewers@<DataPartition>.<Domain>.com"
        ]
      },
      {
        "op":"replace",
        "path":"/legal/legaltags",
        "value":[
            "<DataPartition>-legaltag-1",
            "<DataPartition>-legaltag-2"
        ]
      },
      {
        "op":"replace",
        "path":"/acl/owners",
        "value":[
            "data.default.owners@<DataPartition>.<Domain>.com",
            "test1.owners@<DataPartition>.<Domain>.com"
        ]
      },
      {
        "op":"add",
        "path":"/acl/viewers",
        "value":[
            "test2.viewers@<DataPartition>.<Domain>.com"
        ]
      },
      {
        "op":"add",
        "path":"/legal/legaltags",
        "value":[
            "<DataPartition>-legaltag-3"
        ]
      },
      {
        "op":"add",
        "path":"/acl/owners",
        "value":[
            "test2.owners@<DataPartition>.<Domain>.com"
        ]
      },
      {
        "op":"remove",
        "path":"/tags",
        "value":[
            "dataflowId"
        ]
      }
    ]
  }'

The response body contains a total count of the updated records, an array of updated record IDs, an array of not found record IDs, an array of unauthorized record IDs, and an array of locked record IDs.

Records patch

This API updates record data and/or metadata in batch. It takes an array of record IDs (without version numbers), with a maximum number of 100, and updates the properties specified in the operation path with the value and operation type provided. Users must specify the corresponding data partition ID in the header as well. The API response contains the list of record IDs that were patched successfully, as well as the list of record IDs that failed to be patched, with the list of errors.

Note: The input record IDs must not contain record versions. However, the record IDs returned in the response have the <recordId>:<version> format. This is because any data update increases the record version, whereas metadata updates do not: there is only one set of metadata per record, not per record version. The version returned in the response is the latest version of each record.

  • This API supports the PATCH operation in compliance with the Patch RFC spec.
  • Users need to provide a list of record IDs and a list of operations to be performed on each record.
  • Each operation has op (operation type), path, and value in the 'ops' field (unless the operation is remove, in which case value must not be provided).
  • The currently supported operations are "replace", "add", and "remove".
  • The supported properties for metadata update are tags, acl/viewers, acl/owners, legal/legaltags, ancestry/parents, kind, and meta (the meta attribute outside the data block).
  • The supported property for data update is data.
  • If the ACL is being updated, the user should be a member of the groups that are being replaced, added, or removed as ACL.

Records patch API has the following response codes:

| Code | Description |
|------|-------------|
| 200 | The update operation succeeds fully; all records' data and/or metadata are updated. |
| 206 | The update operation succeeds partially. Some records are not updated for various reasons, including records not found or the user not having permission to edit the records. |
| 400 | The update operation fails because the input validation fails. See the Input Validation section below for more details. |

Input Validation

To remain compliant with the domain data models and business requirements, the service performs certain input validation on the request payload. See the table below for details:

| Path | Add | Replace | Remove | Remarks |
|------|-----|---------|--------|---------|
| /kind | Bad Request | Replaces kind | Bad Request | kind can only be replaced; value must be a raw string and a valid kind. Path must match /kind exactly |
| /tags | Replaces tags with value. Creates /tags if it doesn't exist | Replaces tags with value. /tags must exist | Removes tags, value is ignored. /tags must exist | add and replace behave similarly because /tags is an object member |
| /tags/key | Adds "key" : "value" to tags. /tags must exist | Replaces /tags/key with value. /tags/key must exist | Removes "key" : "value" from tags. /tags/key must exist | |
| /acl/viewers OR /acl/owners OR /legal/legaltags OR /ancestry/parents | Replaces the target array with value. Creates the attribute if it doesn't exist | Replaces the target attribute with the new value. Target location must exist | Only /ancestry/parents can be removed | For add or replace, path must be an exact match and value must be an array of string values |
| /acl/viewers/0 OR /acl/owners/0 OR /legal/legaltags/0 OR /ancestry/parents/0 | Adds value at the target index in the array. The index cannot be greater than the array length, otherwise an error results | Replaces the value at the target index in the array. Target location must exist | Removes the value at the target index in the array. Target location must exist | The character - can be used to refer to the last index of the target array. For acl and legaltags, the target value must not be an empty array after applying the patch |
| /data | | | | /data doesn't adhere to a rigid structure, therefore users must be cautious when modifying /data attributes. The value type must adhere to the attribute type defined in the Schema service. Any type change can potentially cause indexing/search issues |
| /meta | | | | An update to /meta must comply with its structure (i.e., an array of Map<String, Object>) |

Check out some examples below, but refer to the Patch RFC spec for comprehensive documentation on JsonPatch and more examples.

Note: The examples below only highlight the ops array from the input payload; a full curl sample is provided at the end.

Add Operation

Note that the add operation performs either an add or a replace, depending on the target location. Refer to Patch RFC spec - add for the explanation.

  1. Add legaltag abc to a record, at the end of the legaltags array. This performs an addition because the path points to an index in an array.

    add legaltag
    "ops": [
            { 
              "op": "add", 
              "path": "/legal/legaltags/-",
              "value": "abc"
            }
          ]
  2. Add/Replace tags for a record. Note that although the operation is add, this adds /tags if it doesn't exist, or replaces the current value of /tags with the given value if it does. This is because the target location is an object member that already exists. Read the RFC spec for more details.

    replace tags
    "ops": [
            { 
              "op": "add", 
              "path": "/tags",
              "value": {
                "tag1": "value1"
              }
            }
          ]
  3. Add a new property child to the data block. Note that parent must exist. This operation adds child under parent with the value specified:

    add to data block
    "ops": [
            {
              "op": "add", 
              "path": "/data/parent/child",
              "value": {
                "grandchild": {
                  "key": "value"
                }
              }
            }
          ]
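
The three add cases above follow the general JSON Patch add rules: insert into an array when the path ends in an index or `-`, and set (or overwrite) a member when the target is an object. The sketch below is a simplified model of those rules for illustration only. The function name `patch_add` is hypothetical, JSON Pointer escape sequences are ignored, and none of the service-side validation from the table above is applied.

```python
import copy
from typing import Any


def patch_add(doc: dict, path: str, value: Any) -> dict:
    """Apply a simplified JSON Patch 'add' to a copy of doc."""
    result = copy.deepcopy(doc)
    *parents, last = path.lstrip("/").split("/")
    target: Any = result
    for key in parents:
        # Every parent must already exist; a missing one raises KeyError.
        target = target[int(key)] if isinstance(target, list) else target[key]
    if isinstance(target, list):
        index = len(target) if last == "-" else int(last)
        if not 0 <= index <= len(target):
            raise IndexError("index is greater than the array length")
        target.insert(index, value)   # array target: insert, never overwrite
    else:
        target[last] = value          # object member: create or replace
    return result


record = {
    "legal": {"legaltags": ["existing-tag"]},
    "tags": {"old": "value"},
    "data": {"parent": {}},
}
print(patch_add(record, "/legal/legaltags/-", "abc")["legal"])
print(patch_add(record, "/tags", {"tag1": "value1"})["tags"])
print(patch_add(record, "/data/parent/child", {"grandchild": {"key": "value"}})["data"])
```

Because the helper works on a deep copy, the original `record` is left unchanged, which makes it easy to compare the outcome of different operations.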

Replace Operation

The replace operation is straightforward: it replaces the value at the target location with a new value.

  1. Replace /acl/owners array for a record.

    replace acl owners
    "ops": [
            { 
              "op": "replace", 
              "path": "/acl/owners",
              "value": [
                "newacl1",
                "newacl2"
              ]
            }
          ]

Remove Operation

The remove operation removes the value at the target location. The field value must not be provided for this operation.

  1. Remove /data/parent/child from the data block

    remove data property
    "ops": [
            { 
              "op": "remove", 
              "path": "/data/parent/child"
            }
          ]
  2. Remove the first value from /acl/viewers array

    remove first acl viewer
    "ops": [
            { 
              "op": "remove", 
              "path": "/acl/viewers/0"
            }
          ]

Below is a complete sample curl which performs multiple operations on a list of record IDs.

complete curl example
curl --request PATCH \
   --url '/api/storage/v2/records' \
   --header 'accept: application/json' \
   --header 'authorization: Bearer <JWT>' \
   --header 'content-type: application/json-patch+json' \
   --header 'Data-Partition-Id: common' \
   --data-raw '{
      "query": {
        "ids": [
          "tenant1:type:unique-identifier",
          "tenant2:type:unique-identifier",
          "tenant3:type:unique-identifier"
        ]
      },
      "ops": [
        {
          "op": "remove",
          "path": "/legal/legaltags/0"
        },
        {
          "op": "remove",
          "path": "/ancestry/parents"
        },
        {
          "op": "add",
          "path": "/acl/viewers/-",
          "value": "data.default.viewer1@opendes.enterprisedata.cloud.slb-ds.com"
        },
        {
          "op": "replace",
          "path": "/kind",
          "value": "newKind"
        },
        {
          "op": "add",
          "path": "/tags",
          "value": {
            "tag1": "value1",
            "tag2": "value2"
          }
        },
        {
          "op": "replace",
          "path": "/data/someProperty/targetProperty",
          "value": {
            "newValue": {
              "subProperty": "subValue"
            }
          }
        }
      ]
    }'
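
When the request body is assembled programmatically, a few of the rules stated in this section can be checked on the client before sending. The sketch below is one possible way to do that; the function name `build_patch_payload` is hypothetical, it only builds the JSON body (it does not send the request), and it checks just the rules named here: at most 100 record IDs, supported operation types, and no value on remove.

```python
import json
from typing import Any, Dict, List

SUPPORTED_OPS = {"add", "replace", "remove"}
MAX_PATCH_RECORDS = 100  # maximum number of record IDs per Patch request


def build_patch_payload(record_ids: List[str], ops: List[Dict[str, Any]]) -> str:
    """Build the JSON body for PATCH /api/storage/v2/records."""
    if len(record_ids) > MAX_PATCH_RECORDS:
        raise ValueError(f"at most {MAX_PATCH_RECORDS} record IDs are allowed")
    for op in ops:
        if op.get("op") not in SUPPORTED_OPS:
            raise ValueError(f"unsupported operation: {op.get('op')}")
        if "path" not in op:
            raise ValueError("every operation needs a path")
        if op["op"] == "remove" and "value" in op:
            raise ValueError("remove must not carry a value")
        if op["op"] != "remove" and "value" not in op:
            raise ValueError(f"{op['op']} needs a value")
    return json.dumps({"query": {"ids": record_ids}, "ops": ops})


body = build_patch_payload(
    ["tenant1:type:unique-identifier"],
    [
        {"op": "remove", "path": "/legal/legaltags/0"},
        {"op": "add", "path": "/tags", "value": {"tag1": "value1"}},
    ],
)
print(body)
```

The resulting string is what would be passed as the request body, together with the `content-type: application/json-patch+json` header shown in the curl sample.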

Differences compared to the Metadata Update API

| | Metadata Update API | Patch API |
|---|---------------------|-----------|
| Header Content-Type | application/json | application/json-patch+json |
| Supported Record properties | acl, tags, legaltags | acl, tags, legaltags, ancestry, kind, data, meta |
| ops field in payload | array of PatchOperation | JsonPatch |
| Maximum number of records | 500 | 100 |
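
Since the two APIs accept different maximum numbers of records per request, a client working with a longer list of record IDs needs to split it up. A minimal sketch of such a helper follows; the names `MAX_IDS` and `chunk_ids` are hypothetical and not part of the service.

```python
from typing import Iterator, List

# Per-request limits taken from the comparison above.
MAX_IDS = {"metadata_update": 500, "patch": 100}


def chunk_ids(record_ids: List[str], api: str) -> Iterator[List[str]]:
    """Yield slices of record_ids that fit the limit of the chosen API."""
    size = MAX_IDS[api]
    for start in range(0, len(record_ids), size):
        yield record_ids[start:start + size]


ids = [f"opendes:test:{n}" for n in range(250)]
print([len(chunk) for chunk in chunk_ids(ids, "patch")])            # [100, 100, 50]
print([len(chunk) for chunk in chunk_ids(ids, "metadata_update")])  # [250]
```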

Back to table of contents

Known issues/limitations

  • There's currently an inconsistency between search and storage kinds. Storage honors the case sensitivity for the kind parameter when querying the records, but Search does not.
  • Due to limitations of the underlying blob storage on Azure, record IDs ending with a dot (.) are discouraged. Because of these limitations, the PUT API only partially supports such record IDs: a payload that mixes record IDs ending with a dot and record IDs not ending with a dot is rejected with a 4xx error code. The API honors such record IDs if they are split into two different PUT requests.
  • Storage record ID length of 512 bytes is enforced:
    • Impacts PUT & PATCH data update operations
    • No impact for PATCH metadata update & read/delete/purge operations
    • 400 Bad Request error code is returned if record ID is longer than 512 bytes (note that the length enforcement is in bytes, not number of characters)
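
Because the limit is counted in bytes rather than characters, a record ID containing non-ASCII characters can exceed it while looking short. The check below is a sketch that assumes the byte length is measured on the UTF-8 encoding of the ID, which this document does not state; the function name is hypothetical.

```python
MAX_RECORD_ID_BYTES = 512


def record_id_within_limit(record_id: str) -> bool:
    """Check the record ID length in bytes (assumption: UTF-8 encoding)."""
    return len(record_id.encode("utf-8")) <= MAX_RECORD_ID_BYTES


ascii_id = "opendes:hello:" + "a" * 498      # 512 characters, 512 bytes
accented_id = "opendes:hello:" + "é" * 256   # 270 characters, 526 bytes
print(record_id_within_limit(ascii_id))      # True
print(record_id_within_limit(accented_id))   # False
```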