- Introduction
- Record structure
- Schema structure
- Record tagging
- What are Derivatives?
- Ingestion workflow
- Storage service APIs
- Using service accounts to access Storage APIs
- Known issues/limitations
After performing the basic user management procedures through the Entitlements Service, such as creating users and groups and assigning users to groups, OSDU developers can use the Data Platform Storage Service to ingest metadata generated by OSDU applications into the Data Platform. The Storage Service provides a set of APIs to manage the entire metadata life cycle, including ingestion (persistence), modification, deletion, versioning, and data schema.
From the Storage Service perspective, the metadata to ingest is called a record. The following is a basic example of a Data Platform record with a brief explanation of each field:
{
"id": "opendes:hello:123456",
"kind": "opendes:test:hello:1.0.0",
"acl": {
"viewers": ["data.default.viewers@{datapartition}.{domain}.com"],
"owners": ["data.default.owners@{datapartition}.{domain}.com"]
},
"legal": {
"legaltags": ["opendes-sample-legaltag"],
"otherRelevantDataCountries": ["FR","US","CA"]
},
"data": {
"msg": "Hello World, Data Platform!",
"ElevM": 1000
},
"meta": [
{
"kind": "Unit",
"persistableReference": "{\"scaleOffset\":{\"scale\":1.0,\"offset\":0.0},\"symbol\":\"m\",\"baseMeasurement\":{\"ancestry\":\"Length\",\"type\":\"UM\"},\"type\":\"USO\"}",
"propertyNames": [
"ElevM"
],
"name": "m"
}
]
}
- id: (optional) The unique record identifier in the Data Platform. Record IDs can be system-generated or user-generated (provided in the request) and must be no longer than 512 bytes. An ID must be composed of three components separated by colons, and the first component must be the data-partition-id. Each of the first two components must consist of one or more characters from a-z, A-Z, 0-9, "_", "-", or "." (the [\w-\.]+ group) and be followed by ":". As a best practice, compose the record ID as `{data-partition-id}:{entity-type}:{unique-identifier}` so that it matches the `^[\w\-\.]+:[\w-\.]+:[\w\-\.\:\%]+$` pattern defined by the OSDU Data Definition. See Known issues/limitations for best practices on user-generated IDs. When no ID is provided, the service creates and assigns one and defaults the second component to "doc", so the generated ID takes the form `{data-partition-id}:doc:{unique-identifier}`.
Example: opendes:work-product-component--WellLog:a6d6-63fc9a0bbac1
- kind: (mandatory) The kind of data being ingested. It must follow the naming convention `{authority/data-partition-id}:{source}:{entity-type}:{major}.{minor}.{patch}`. Kind is case-sensitive in Storage. Note: The entity-type is composed of groupType--individualType.
Example: opendes:wks:work-product-component--WellLog:1.0.0
- acl: (mandatory) The group of users who have access to the record.
- acl.viewers: The list of valid groups which have view and read privileges for the record. By convention, data group names begin with `data.`
- acl.owners: The list of valid groups which have view, read, and write privileges for the record. By convention, data group names begin with `data.`
- legal: (mandatory) The attributes which represent the legal constraints associated with the record.
- legal.legaltags: The list of legal tag names associated with the record. Refer to Compliance Service for legal tag creation.
- legal.otherRelevantDataCountries: The list of other relevant data countries. It must have at least 2 values: where the data was ingested from and where Data Platform stores the data. Refer to Compliance Service for LegalTag properties.
- data: (mandatory) The record payload represented as a list of key-value pairs.
- meta: An array of elements collecting the frame of reference (FoR) information in records.
- tags: The record label.
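
The naming rules for id and kind can be checked on the client before a record is sent. The following is a minimal sketch, not the service's own validation: the function names are hypothetical, and the pattern is a Python-compatible rewrite of the documented one (Python's `re` rejects the unescaped `[\w-\.]` range, so the hyphen is escaped).

```python
import re

# Python-friendly equivalent of ^[\w\-\.]+:[\w-\.]+:[\w\-\.\:\%]+$ .
# re.ASCII limits \w to a-z, A-Z, 0-9 and "_", as described in the text.
RECORD_ID_PATTERN = re.compile(r"[\w\-.]+:[\w\-.]+:[\w\-.:%]+", re.ASCII)
MAX_ID_BYTES = 512


def is_valid_record_id(record_id: str, data_partition_id: str) -> bool:
    """Check the size limit, the pattern, and the partition prefix (hypothetical helper)."""
    if len(record_id.encode("utf-8")) > MAX_ID_BYTES:
        return False
    if RECORD_ID_PATTERN.fullmatch(record_id) is None:
        return False
    # The first colon-separated component must be the data-partition-id.
    return record_id.split(":", 1)[0] == data_partition_id


def parse_kind(kind: str) -> dict:
    """Split a kind into its four components; raises ValueError if malformed."""
    authority, source, entity_type, version = kind.split(":")
    major, minor, patch = (int(part) for part in version.split("."))
    return {
        "authority": authority,
        "source": source,
        "entity_type": entity_type,
        "version": (major, minor, patch),
    }


print(is_valid_record_id("opendes:hello:123456", "opendes"))
print(parse_kind("opendes:wks:work-product-component--WellLog:1.0.0"))
```

A client-side check like this only catches obvious formatting mistakes early; the Storage Service remains the authority on what it accepts.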
Another important concept in the Data Platform Storage Service is schema. A schema is a structure, also defined in JSON, that provides data type information for the record fields. In other words, the schema defines whether a given field in the record is a string, integer, long, float, double, boolean, datetime, link, core:dl:geopoint:1.0.0, or core:dl:geoshape:1.0.0. Arrays of these data types are also supported.
Note that only fields with associated schema information are indexed by the Search Service. For this reason, OSDU developers must create the schema for each record kind before they start ingesting records of that kind into the Data Platform.
For example, a record with a string value in a field that the schema defines as an integer is created successfully. However, because of the type mismatch, indexing fails in the Indexer Service.
Note: The Storage Service persists all cases of null or empty variations of the data, even when they are not being indexed. For details on how search handles null values, see Search service tutorial.
For example:
"data": { "country": " ", "name": "", "uwi": null, "wellHeadElevation.value": "NaN" }
Schemas and records are tied together by the kind attribute. A given kind can have zero or exactly one schema associated with it. With that concept in mind, the OSDU developer can use the Schema Service APIs for schema management.
Note: All schema APIs in the Storage Service are now deprecated; the Schema Service is used to manage schemas.
The OSDU storage solution supports adding tags as metadata to storage records, the ability to query the records based on the tags filter (keys and values), and the ability to patch (append, override, delete) existing tags. You can tag records at record creation time using the 'tags' attribute with PUT /api/storage/v2/records API. See Create Record with tags for more details.
Often, the data you ingest into the Data Platform is raw data. In these scenarios, a single LegalTag is associated with the data.
However, when the data to ingest comes from multiple sources, it is considered derivative data. For instance, what if you take multiple records from the Data Platform and create a whole new record based on them all? Or what if you run an algorithm over your seismic data and create an attribute associated with this data that you want to ingest?
At this point, you have derivative data: data derived from data. In these scenarios, you need to assign LegalTags to this new data which is the union of the LegalTags associated with all the source data from which it was created.
For instance, I have Data A associated with LegalTag 1, and Data B associated with LegalTag 2. If I create Data C from Data A and Data B, then Data C will inherit LegalTag 1 from Data A and LegalTag 2 from Data B.
If one or more parent legal tags expire, the derived (child) record becomes invalid and is soft-deleted from the Data Platform. For more information on LegalTag validation, see the LegalTag properties section of the Compliance documentation.
You can find more details at Creating derivative records.
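
The union rule for derivative data can be illustrated with a short sketch. It only demonstrates the rule on plain dictionaries; it is not how the Storage Service is implemented, and the function name and the sample values are hypothetical.

```python
def merge_parent_legal(parents: list, created_in: str) -> dict:
    """Union the parents' legal tags and countries, keeping first-seen order."""
    legaltags = []
    countries = []
    for parent in parents:
        legal = parent["legal"]
        for tag in legal.get("legaltags", []):
            if tag not in legaltags:
                legaltags.append(tag)
        for country in legal.get("otherRelevantDataCountries", []):
            if country not in countries:
                countries.append(country)
    # The country where the derivative itself was created is added as well.
    if created_in not in countries:
        countries.append(created_in)
    return {"legaltags": legaltags, "otherRelevantDataCountries": countries}


# Illustrative values only.
data_a = {"legal": {"legaltags": ["LegalTag1"], "otherRelevantDataCountries": ["FR"]}}
data_b = {"legal": {"legaltags": ["LegalTag2"], "otherRelevantDataCountries": ["CA"]}}
print(merge_parent_legal([data_a, data_b], created_in="US"))
```

With Data A and Data B as parents, Data C ends up carrying both LegalTag 1 and LegalTag 2, which is why the expiry of either parent tag affects the derived record.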
To demonstrate the schema and record concept, as well as their respective APIs, consider the following case:
The OSDU developer wants to ingest metadata information related to a well dataset. The metadata contains the following pieces of information: name of the well, company name, year when it was drilled, total depth, and the well location.
In summary, to execute the above workflow, the OSDU developer must:
- Be a valid Data Platform user;
- Define which partition to use;
- Determine the data access control list (ACL) for the data being ingested using data group membership. If the ACL does not already exist or needs adjustments, create or assign users to an existing partition data group;
- Agree on the kind attribute which will represent the developer's wells. For this example: opendes:welldb:wellbore:1.0.0;
- Create the legal tag that represents the legal constraints for the metadata to ingest. If there is already an existing legal tag that describes the data to ingest, we encourage you to use the existing one instead of creating a new legal tag;
- Create a schema for the kind opendes:welldb:wellbore:1.0.0 via the Schema Service;
- Create and ingest records using the PUT /api/storage/v2/records API.
Refer to Entitlements Service to learn how to become a valid Data Platform user.
The Data Platform stores data in different data partitions, depending on the access to those data partitions in the OSDU system. When using the Storage Service APIs, specify the active account as the data-partition-id.
Refer to Entitlements Service to learn how to create data groups (the ones that start with data.) and assign users to them. For data access authorization purposes in this example, assume the groups data.default.viewers@{datapartition}.{domain}.com and data.default.owners@{datapartition}.{domain}.com were previously created with the [Entitlements Service](solutions/compliance-service/tutorial/Entitlements-Service.md).
The schema creation is done with the schema service.
Note: Here, geoshape and geopoint are Elasticsearch concepts and data types. The "kinds" core:dl:geopoint:1.0.0 and core:dl:geoshape:1.0.0 are actually primary data types, not kinds that you can use in the GET /api/storage/v2/schemas/{kind} Schema API.
Refer to Compliance Service for legal tag creation. For this example, assume a legal tag called opendes-well-legal was previously created.
After legal tag creation and schema definition, the records of the kind opendes:welldb:wellbore:1.0.0 can be created. They must follow the same structure and fields' naming convention as defined in the schema. Here is a sample record:
curl
{
"kind": "opendes:welldb:wellbore:1.0.0",
"acl": {
"viewers": ["data.default.viewers@{datapartition}.{domain}.com"],
"owners": ["data.default.owners@{datapartition}.{domain}.com"]
},
"legal": {
"legaltags": ["opendes-sample-legaltag"],
"otherRelevantDataCountries": ["FR","US","CA"]
},
"data": {
"name": "well1",
"company": "OSDU",
"drillingYear": 1983,
"depth": 1208.84,
"location": {
"latitude": 29.7512026,
"longitude": -95.4812934
}
},
"meta": [
{
"kind": "CRS",
"propertyNames": [
"Longitude",
"Latitude"
],
"name": "GCS_WGS_1984",
"persistableReference": "{\"wkt\":\"GEOGCS[\\\"GCS_WGS_1984\\\",DATUM[\\\"D_WGS_1984\\\",SPHEROID[\\\"WGS_1984\\\",6378137.0,298.257223563]],PRIMEM[\\\"Greenwich\\\",0.0],UNIT[\\\"Degree\\\",0.0174532925199433],AUTHORITY[\\\"EPSG\\\",4326]]\",\"ver\":\"PE_10_3_1\",\"name\":\"GCS_WGS_1984\",\"authCode\":{\"auth\":\"EPSG\",\"code\":\"4326\"},\"type\":\"LBC\"}"
}
]
}
When creating records that represent derivative data, you must assign the following:
- The Record ID and version of all the records that are the direct parents of the new derivative. This is added to the ancestry section.
- The Alpha-2 country code where the derivative was created.
Below is an example of the minimum number of fields required to ingest a derivative Record.
Sample payload
[{
"acl": {
"owners": [
"data.default.owners@{datapartition}.{domain}.com"
],
"viewers": [
"data.default.viewers@{datapartition}.{domain}.com"
]
},
"data": {
"count": 123456789
},
"id": "opendes:id:123456789",
"kind": "opendes:welldb:wellbore:1.0.0",
"legal" :{
"otherRelevantDataCountries": ["US"] //the physical location of where the derivative was created
},
"ancestry" :{
"parents": ["opendes:id:1:version", "opendes:id:2:version"] //the record ids and versions of the Records this derivative was created from
}
}]
As shown in the example, the parent records are provided, as well as the "otherRelevantDataCountries" (ORDC) value for where the derivative was created. The Storage Service takes responsibility for populating the full LegalTag and ORDC values based on the parents.
Therefore, the child record would look something like this:
Sample derived record
{
"records": [
{
"acl": {
"owners": [
"data.default.owners@{datapartition}.{domain}.com"
],
"viewers": [
"data.default.viewers@{datapartition}.{domain}.com"
]
},
"data": {
"count": 123456789
},
"id": "opendes:id:123456789",
"kind": "opendes:welldb:wellbore:1.0.0",
"legal": {
"legaltags": ["Parenttag1", "Parenttag2"],
"otherRelevantDataCountries": ["Parent1ORDC","Parent2ORDC","US"]
},
"ancestry" :{
"parents": ["opendes:id:1:version", "opendes:id:2:version"] //the record ids and versions of the Records this derivative was created from
}
}],
"notFound": [],
"conversionStatuses": []
}
The Storage OSDU Solution supports adding tags as metadata to storage records and the ability to query the records based on the tags filter. Use the PUT /api/storage/v2/records API to ingest records with tags metadata such as the following:
curl
curl --request PUT \
--url '/api/storage/v2/records' \
--header 'accept: application/json' \
--header 'authorization: Bearer <JWT>' \
--header 'content-type: application/json' \
--header 'data-partition-id: opendes' \
--data '[
{
"kind": "opendes:welldb:wellbore:1.0.0",
"acl": {
"viewers": ["data.default.viewers@opendes.{domain}.com"],
"owners": ["data.default.owners@opendes.{domain}.com"]
},
"tags": {
"dataflowId":"test-uploader-test-1_final.lac-2021-04-23T09:51:27.287Z"
},
"legal": {
"legaltags": ["opendes-default-legal"],
"otherRelevantDataCountries": ["FR","US","CA"]
},
"data": {
"msg": "hello from OSDU"
}
}
]'
After ingesting, you can use the query or query_with_cursor API from the Search Service with the following payload to search for the ingested record.
curl
curl --request POST \
--url '/api/search/v2/query' \
--header 'accept: application/json' \
--header 'authorization: Bearer <JWT>' \
--header 'content-type: application/json' \
--header 'data-partition-id: opendes' \
--data '{
"kind" : "opendes:welldb:wellbore:1.0.0",
"query" : "tags.dataflowId:test-uploader-test-1_final.lac-2021-04-23T09:51:27.287Z"
}'
Note: The tags feature was introduced in R2, so record tagging does not work with kinds created before that release. Re-indexing (with force_clean=true) from the Indexer Service may be required.
After defining the record structure, the OSDU developer must use the PUT /api/storage/v2/records API to ingest the records, as follows:
curl
curl --request PUT \
--url '/api/storage/v2/records' \
--header 'accept: application/json' \
--header 'authorization: Bearer <JWT>' \
--header 'content-type: application/json' \
--header 'data-partition-id: opendes' \
--data '[
{
"kind": "opendes:welldb:wellbore:1.0.0",
"acl": {
"viewers": ["data.default.viewers@{datapartition}.{domain}.com"],
"owners": ["data.default.owners@{datapartition}.{domain}.com"]
},
"legal": {
"legaltags": ["opendes-sample-legaltag"],
"otherRelevantDataCountries": ["FR","US","CA"]
},
"data": {
"name": "well1",
"company": "OSDU",
"drillingYear": 1983,
"depth": 1208.84,
"location": {
"latitude": 29.7512026,
"longitude": -95.4812934
}
},
"meta": [
{
"kind": "CRS",
"propertyNames": [
"Longitude",
"Latitude"
],
"name": "GCS_WGS_1984",
"persistableReference": "{\"wkt\":\"GEOGCS[\\\"GCS_WGS_1984\\\",DATUM[\\\"D_WGS_1984\\\",SPHEROID[\\\"WGS_1984\\\",6378137.0,298.257223563]],PRIMEM[\\\"Greenwich\\\",0.0],UNIT[\\\"Degree\\\",0.0174532925199433],AUTHORITY[\\\"EPSG\\\",4326]]\",\"ver\":\"PE_10_3_1\",\"name\":\"GCS_WGS_1984\",\"authCode\":{\"auth\":\"EPSG\",\"code\":\"4326\"},\"type\":\"LBC\"}"
}
]
},
{
"kind": "opendes:welldb:wellbore:1.0.0",
"acl": {
"viewers": ["data.default.viewers@{datapartition}.{domain}.com"],
"owners": ["data.default.owners@{datapartition}.{domain}.com"]
},
"legal": {
"legaltags": ["opendes-sample-legaltag"],
"otherRelevantDataCountries": ["IN","BR","CA"]
},
"data": {
"name": "well12312",
"company": "shell",
"drillingYear": 2001,
"depth": 208.84,
"location": {
"latitude": 49.7512026,
"longitude": -65.4812934
}
},
"meta": [
{
"kind": "CRS",
"propertyNames": [
"Longitude",
"Latitude"
],
"name": "GCS_WGS_1984",
"persistableReference": "{\"wkt\":\"GEOGCS[\\\"GCS_WGS_1984\\\",DATUM[\\\"D_WGS_1984\\\",SPHEROID[\\\"WGS_1984\\\",6378137.0,298.257223563]],PRIMEM[\\\"Greenwich\\\",0.0],UNIT[\\\"Degree\\\",0.0174532925199433],AUTHORITY[\\\"EPSG\\\",4326]]\",\"ver\":\"PE_10_3_1\",\"name\":\"GCS_WGS_1984\",\"authCode\":{\"auth\":\"EPSG\",\"code\":\"4326\"},\"type\":\"LBC\"}"
}
]
},
...]'
Note: A legal record requires information about the countryOfOrigin (where the data originated) and otherRelevantDataCountries (any other countries where the data was ingested, accessed, consumed, or stored). otherRelevantDataCountries is provided only when creating a record and should contain at least the country where the data is ingested. The location of the data center where the record is stored is added automatically. otherRelevantDataCountries is relevant per record because data may originate from the same country (countryOfOrigin) but be ingested or accessed from different countries. When creating a legal tag, otherRelevantDataCountries is not a required property; the legal tag itself contains only countryOfOrigin.
Note: The PUT /api/storage/v2/records API can handle up to 500 records per request, subject to a 32MB size limit.
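
When there are more than 500 records to ingest, the client has to split them across several PUT calls. The sketch below shows one way to do that with the Python standard library; the helper names, the base URL, and the token are placeholders, and the request is only built, not sent. It does not check the 32MB limit.

```python
import json
import urllib.request
from typing import Iterator

MAX_RECORDS_PER_REQUEST = 500  # limit stated in the note above


def chunk_records(records: list, size: int = MAX_RECORDS_PER_REQUEST) -> Iterator[list]:
    """Yield consecutive slices of at most `size` records."""
    for start in range(0, len(records), size):
        yield records[start:start + size]


def build_put_request(base_url: str, token: str, data_partition_id: str,
                      batch: list) -> urllib.request.Request:
    """Build (but do not send) a PUT /api/storage/v2/records request."""
    return urllib.request.Request(
        url=f"{base_url}/api/storage/v2/records",
        data=json.dumps(batch).encode("utf-8"),
        method="PUT",
        headers={
            "accept": "application/json",
            "authorization": f"Bearer {token}",
            "content-type": "application/json",
            "data-partition-id": data_partition_id,
        },
    )


# Placeholder records, host, and token for illustration only.
records = [{"kind": "opendes:welldb:wellbore:1.0.0", "data": {"n": i}} for i in range(1200)]
requests = [
    build_put_request("https://example.invalid", "<JWT>", "opendes", batch)
    for batch in chunk_records(records)
]
print(len(requests))
```

Each built request could then be passed to `urllib.request.urlopen`; error handling and retries are left out of this sketch.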
The Data Platform Storage Service has two categories of APIs for schema and record management: Records and Query.
The API returns a list of all kinds in the specific {data-partition-id}.
GET /api/storage/v2/query/kinds

| Parameter | Description |
|---|---|
| limit | Page size limit: the number of rows to be returned per page. If not provided, the default limit is 1000. Use the cursor to paginate through the results. Currently, there is no restriction on the maximum number of rows per page. However, a proxy time-out is set via Apigee, so a very large request may time out. It is therefore best practice to paginate through the results. |
curl
curl --request GET \
--url '/api/storage/v2/query/kinds?limit=10' \
--header 'accept: application/json' \
--header 'authorization: Bearer <JWT>' \
--header 'content-type: application/json' \
--header 'data-partition-id: opendes'
The API fetches multiple records (maximum 20) from the Storage Service at a time. It lets you request that the data be converted to the openDES standard by using the custom header frame-of-reference. The openDES standard defines units in SI, CRS in WGS84, elevation in MSL, azimuth in true north, and dates in UTC. Currently, only "none" and "units=SI;crs=wgs84;elevation=msl;azimuth=true north;dates=utc;" are valid values for the frame-of-reference header.
For now, conversion is supported only for units, CRS, and dates; elevation and azimuth will be available later. Returned records contain either the original values (none) or the converted values (units=SI;crs=wgs84;elevation=msl;azimuth=true north;dates=utc), depending on the user's request and the conversion status. The original values are returned when the user does not request conversion, or when conversion is requested but fails. If conversion is requested, the response also includes a list with the conversion status of each record, indicating whether the conversion succeeded and, if not, which errors occurred. Refer to Frame of Reference and Reference Normalization for Storage Fetch Record Frame of Reference.
- If you specify the Wgs84 block, the API just takes the values from the block. The fetched record contains only the Wgs84 block.
- If you only specify the AsIngested block, then the API performs conversion for the coordinates provided. The fetched record includes both the AsIngested and Wgs84 blocks.
- If you specify BOTH the AsIngested and Wgs84 blocks, then API ignores the AsIngested block and takes only the values from the Wgs84 block. The fetched record includes both the AsIngested and Wgs84 blocks.
The CRS conversion process can be time-consuming because it relies on the Esri projection engine to perform the geo-spatial operations. Up to 90 seconds have been measured for transformations with the largest known parameter file (ESRI, 108109). Simple conversions with a few points have response times around 0.5 seconds while 500000 points require 180 to 200 seconds. Complex operations have response times of around 90 seconds for a few points to ~260 seconds for 500000 points.
For details on how the record is indexed in regard to frame of reference and normalization (coordinate conversion), refer to Search service tutorial.
If some records were not found or the user doesn't have access to view them, their ids are returned in the notFound section of the response.
POST /api/storage/v2/query/records:batch
curl
curl --request POST \
--url '/api/storage/v2/query/records:batch' \
--header 'Authorization: Bearer <JWT>' \
--header 'Content-Type: application/json' \
--header 'data-partition-id: opendes' \
--header 'frame-of-reference: units=SI;crs=wgs84;elevation=msl;azimuth=true north;dates=utc;' \
--data '{
"records": [
"opendes:well:123456789",
"opendes:wellTop:abc789456",
"opendes:wellLog:4531wega22"
]
}'
The API fetches up to 100 records at a time in bulk.
If some records were not found, their ids are returned in the invalidRecords section of the response. And if the user doesn't have access to view some records, the ids are returned in the retryRecords section of the response.
POST /api/storage/v2/query/records
curl
curl --request POST \
--url '/api/storage/v2/query/records' \
--header 'Authorization: Bearer <JWT>' \
--header 'Content-Type: application/json' \
--header 'data-partition-id: opendes' \
--header 'accept: application/json' \
--data '{
"records": [
"opendes:well:123456789",
"opendes:wellTop:abc789456",
"opendes:wellLog:4531wega22"
]
}'
The API fetches records of the given kind. It uses the page size limit and the cursor to paginate through the results.
Storage honors the case sensitivity while querying the records. Storage considers these two kinds to be different: "slb:OSDU:USER:1.1.0" and "slb:OSDU:user:1.1.0".
GET /api/storage/v2/query/records

| Parameter | Description |
|---|---|
| kind | The kind of the records to fetch. |
| limit | Page size limit. The number of rows to be returned per page. If not provided, the default limit is 1000. |
| cursor | Not required. Returned with each response to use it for pagination for next calls. |
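
Cursor-based pagination usually follows the same loop regardless of the endpoint. The following sketch assumes that each response is a JSON object with a `results` list and an optional `cursor` field; those field names, the function name, and the `fetch_page` callable are assumptions made for illustration, so check them against the actual response.

```python
from typing import Callable, Iterator, Optional


def iterate_results(fetch_page: Callable[[Optional[str]], dict]) -> Iterator:
    """Yield every item across pages until no cursor is returned.

    fetch_page(cursor) is a caller-supplied function that performs one
    request (cursor is None for the first page) and returns the parsed body.
    """
    cursor = None
    while True:
        page = fetch_page(cursor)
        yield from page.get("results", [])
        cursor = page.get("cursor")
        if not cursor:
            break


# Stand-in for real HTTP calls: two canned pages keyed by cursor.
fake_pages = {
    None: {"results": ["opendes:well:1", "opendes:well:2"], "cursor": "next-1"},
    "next-1": {"results": ["opendes:well:3"]},
}
print(list(iterate_results(lambda cursor: fake_pages[cursor])))
```

Keeping the HTTP call behind `fetch_page` means the same loop can serve both the kinds query and the records-by-kind query.
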
curl
curl --request GET \
--url '/api/storage/v2/query/records?kind={kind}&limit={limit}&cursor={cursor}' \
--header 'Authorization: Bearer <JWT>' \
--header 'Content-Type: application/json' \
--header 'data-partition-id: opendes' \
--header 'accept: application/json'
The API represents the main ingestion mechanism into the Data Platform. It allows you to create and update records. When no record ID is provided, or when the provided ID is not already present in the Data Platform, a new record is created. If the ID belongs to an existing record in the Data Platform, an update operation occurs and a new version of the record is created.
Key details to note when creating or updating records:
- The record version only applies to the data block. Therefore, the record update trace is kept only for changes in the data block. When any other root property changes, such as the legal tags, ancestry, or ACL, the change is applied to the entire record (that is, to all versions of the record). There is only one version of the metadata for a record.
- Updating root properties, such as acl, legal tags, or tags, using the PATCH API does NOT update the record version, but the changes are applied to all versions of the record.
- Updating root properties, such as acl, legal tags, tags, or ancestry, using the PUT API DOES update the record version, and the changes are applied to all versions of the record. Essentially, any update using the PUT API results in a new version of the record.
- The Entitlements Service creates group names in all lowercase, even if the input has mixed case. To assign the ACL properly, the record ACL provided upon record creation must be in all lowercase.
More details available at Creating records and Ingesting records sections.
A record's size can become very large due to the number of versions a record has, not just because of the data it contains. There is a 2MB record size limit that includes data and non-data properties. For each record, the maximum number of versions is 2000.
Note: The Storage Service works with English characters and supports UTF-8 encoding. International language characters may not be supported and may cause inconsistent behavior in the Storage and Search Services.
The skipdupes parameter applies only to update operations, that is, calls to the API with record IDs that are already present in the Data Platform. The default value of skipdupes is false. If skipdupes is true, the service does not update a record whose payload is identical to the existing one (a duplicate); if the payload differs, a new version of the record is created. If skipdupes is false, the service does not compare payloads and always creates a new version, even if it is identical to a previous version. In the response, skippedRecordIds lists the record IDs that were not updated (skipped) because skipdupes was true and the payload was the same. A PUT response never lists a record ID twice: each ID appears in either recordIds or skippedRecordIds.
For clarity, this is the current behavior of skipdupes:
If skipdupes is true
- if the record does not exist at all, then create a new record.
- if the record was soft-deleted, then make the record active again if the data is the same, or create a new version if data is different.
- if the record exists,
- if the data is the same, then skip it.
- if data is different, then create a new version
If skipdupes is false
- if the record does not exist at all, then create a new record.
- if the record was soft-deleted, then create a new version of the record.
- if the record exists, then a new version of the record is created, regardless of whether the data is the same or different.
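
The behavior listed above can be condensed into a small decision function. This is only a model of the documented rules for reasoning about expected outcomes; the function name, the state labels, and the outcome labels are made up for this sketch.

```python
def put_outcome(state: str, same_data: bool, skipdupes: bool) -> str:
    """Return the documented outcome of a PUT for one record.

    state is "missing", "soft-deleted", or "active" (labels chosen here).
    """
    if state == "missing":
        # Nothing to compare against: a new record is always created.
        return "create"
    if state == "soft-deleted":
        # Only skipdupes=true with identical data reactivates the record.
        return "reactivate" if skipdupes and same_data else "new-version"
    if state == "active":
        # Only skipdupes=true with identical data skips the update.
        return "skip" if skipdupes and same_data else "new-version"
    raise ValueError(f"unknown record state: {state}")


for skipdupes in (True, False):
    for state in ("missing", "soft-deleted", "active"):
        for same_data in (True, False):
            print(skipdupes, state, same_data, put_outcome(state, same_data, skipdupes))
```

Only the "skip" outcome would place a record ID in skippedRecordIds; every other outcome corresponds to an entry in recordIds.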
The API retrieves the specific version of the given record.
Note: If the Record ID contains encoded characters (e.g., %5B, %5D, etc.), ensure that you url-encode the Record ID to avoid issues with URL decoding. For example, if your Record ID is opendes:test:%5BUS%5D, you should encode it as opendes%3Atest%3A%255BUS%255D.
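
The encoding described in this note can be produced with the Python standard library. This is a small sketch with a hypothetical helper name; passing `safe=""` makes `quote` escape every reserved character, including ":" and "%".

```python
from urllib.parse import quote


def encode_record_id(record_id: str) -> str:
    """Percent-encode a record ID for use as a URL path segment."""
    return quote(record_id, safe="")


print(encode_record_id("opendes:test:%5BUS%5D"))
# prints opendes%3Atest%3A%255BUS%255D
```

Because the "%" characters already present in the ID are themselves encoded as "%25", the service can decode the path once and recover the original ID.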
GET /api/storage/v2/records/{id}/{version}
curl
curl --request GET \
--url '/api/storage/v2/records/{id}/{version}?attribute={attribute}' \
--header 'accept: application/json' \
--header 'authorization: Bearer <JWT>' \
--header 'content-type: application/json' \
--header 'data-partition-id: opendes'
The API returns a list containing all versions for the given record ID.
Note: If the Record ID contains encoded characters (e.g., %5B, %5D, etc.), ensure that you url-encode the Record ID to avoid issues with URL decoding. For example, if your Record ID is opendes:test:%5BUS%5D, you should encode it as opendes%3Atest%3A%255BUS%255D.
GET /api/storage/v2/records/versions/{id}
curl
curl --request GET \
--url '/api/storage/v2/records/versions/{id}' \
--header 'accept: application/json' \
--header 'authorization: Bearer <JWT>' \
--header 'content-type: application/json' \
--header 'data-partition-id: opendes'
This API returns the latest version of the given record.
Note: If the Record ID contains encoded characters (e.g., %5B, %5D, etc.), ensure that you url-encode the Record ID to avoid issues with URL decoding. For example, if your Record ID is opendes:test:%5BUS%5D, you should encode it as opendes%3Atest%3A%255BUS%255D.
GET /api/storage/v2/records/{id}
curl
curl --request GET \
--url '/api/storage/v2/records/{id}?attribute={attribute}' \
--header 'accept: application/json' \
--header 'authorization: Bearer <JWT>' \
--header 'content-type: application/json' \
--header 'data-partition-id: opendes'
The API performs a logical deletion of the given record. You can undo this operation by ingesting the record again with the same ID. The deleted (inactive) record is removed from the index, and therefore is not returned in the search result. This operation can be performed by the owner of the record.
Note: If the Record ID contains encoded characters (e.g., %5B, %5D, etc.), ensure that you url-encode the Record ID to avoid issues with URL decoding. For example, if your Record ID is opendes:test:%5BUS%5D, you should encode it as opendes%3Atest%3A%255BUS%255D.
POST /api/storage/v2/records/{id}:delete
curl
curl --request POST \
--url '/api/storage/v2/records/{id}:delete' \
--header 'accept: application/json' \
--header 'authorization: Bearer <JWT>' \
--header 'content-type: application/json' \
--header 'data-partition-id: opendes'
The API performs a logical deletion of a batch of records (the maximum batch size is 500 records). You can undo this operation by ingesting the records again with the same IDs. The deleted (inactive) records are removed from the index, and therefore are not returned in the search result. This operation can be performed by the owner of the records.
POST /api/storage/v2/records/delete
curl
curl --request POST \
--url '/api/storage/v2/records/delete' \
--header 'accept: application/json' \
--header 'authorization: Bearer <JWT>' \
--header 'content-type: application/json' \
--header 'data-partition-id: common' \
--data-raw '[
"tenant:type:unique-identifier",
"tenant:type:unique-identifier",
"tenant:type:unique-identifier"
]'
Note: A record-change event is sent in batches of 50 records at a time. So for a maximum batch of 500 records per call, 10 record-change events are sent.
The API performs the permanent physical deletion of the given record and all of its versions, not including any linked records or files if they exist. We recommend that you clean up all the linked records, such as child records, records in the relationship block, and actual data (files ingested via the Submit API in the Ingestion Service), to avoid having orphaned data after using the Purge API. This operation cannot be undone. This operation can be performed by the owner of the record.
Note:
- If the Record ID contains encoded characters (e.g., %5B, %5D, etc.), ensure that you url-encode the Record ID to avoid issues with URL decoding. For example, if your Record ID is `opendes:test:%5BUS%5D`, you should encode it as `opendes%3Atest%3A%255BUS%255D`.
- The Purge Record API works on active and inactive (soft-deleted) records.
DELETE /api/storage/v2/records/{id}
curl
curl --request DELETE \
--url '/api/storage/v2/records/{id}' \
--header 'accept: application/json' \
--header 'authorization: Bearer <JWT>' \
--header 'content-type: application/json' \
--header 'data-partition-id: opendes'
The API performs the permanent physical deletion of the versions of the given record, excluding the latest version and any linked records or files. If the 'limit' query parameter is used, the API deletes the number of oldest versions defined by 'limit'. This operation cannot be undone.
Note: If the Record ID contains encoded characters (e.g., %5B, %5D, etc.), ensure that you url-encode the Record ID to avoid issues with URL decoding. For example, if your Record ID is opendes:test:%5BUS%5D, you should encode it as opendes%3Atest%3A%255BUS%255D.
DELETE /api/storage/v2/records/{id}/versions

| Parameter | Description |
|---|---|
| limit | The API deletes the oldest versions, up to the number defined by 'limit', excluding the latest record version. |

curl
curl --request DELETE \
--url '/api/storage/v2/records/{id}/versions?limit=2' \
--header 'accept: application/json' \
--header 'authorization: Bearer <JWT>' \
--header 'content-type: application/json'\
--header 'Data-Partition-Id: common'The Bulk Update API allows you to update a record's metadata in batch for Record tags, Legal Tags, ACL owners, and ACL viewers. It takes an array of record IDs, with or without version numbers, with a maximum number of 500, and updates the properties specified in the operation path with the value and operation type provided. Users must specify the corresponding data partition ID in the header as well.
Users must provide op (operation type), path, and value in the 'ops' field. Currently, the "add", "replace", and "remove" operation types are supported. Users specify the property to update in the "path" field. For example, "/acl/viewers" indicates that the values of the ACL viewers metadata are updated. Provide the new values in the "value" field. In a "replace" operation, the property value at "path" is fully replaced by the values provided in the "value" field. In an "add" operation, the values provided in the "value" field are appended to the property value at "path". In a "remove" operation, the values provided in the "value" field are removed from the property value at "path". Note that you must provide a record's version number if you want to apply an optimistic lock on the record: before the metadata is updated, the version number is checked to see whether any other update operation is happening at the same time. When a conflict is discovered, the corresponding record is locked and returned in 'lockedRecordIds' in the response body without its metadata being updated.
You can only update record tags, legal tags, and ACLs. Record versions do not change and cannot change with this operation.
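The three operation types can be illustrated on a plain list of values. The sketch below only mirrors the semantics described above for array properties such as "/acl/viewers"; it is not the service implementation, the function name `apply_bulk_op` is hypothetical, and details such as duplicate handling are not covered by the description and are left out.

```python
from typing import List


def apply_bulk_op(current: List[str], op: str, value: List[str]) -> List[str]:
    """Model how one bulk-update operation changes an array property."""
    if op == "replace":
        return list(value)                              # fully replaced
    if op == "add":
        return list(current) + list(value)              # appended
    if op == "remove":
        return [v for v in current if v not in value]   # listed values removed
    raise ValueError(f"unsupported operation type: {op}")


viewers = ["group-a", "group-b"]
print(apply_bulk_op(viewers, "add", ["group-c"]))      # ['group-a', 'group-b', 'group-c']
print(apply_bulk_op(viewers, "remove", ["group-a"]))   # ['group-b']
print(apply_bulk_op(viewers, "replace", ["group-z"]))  # ['group-z']
```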
The Bulk Update API has 2 different success response codes:
| Code | Description |
|---|---|
| 200 | The update operation succeeds fully, and all records’ metadata are updated. |
| 206 | The update operation succeeds partially. Some records are not updated for various reasons, including records not found or unauthorized. Records whose version number was also provided in the request may be locked during a metadata update, due to the optimistic lock. In this case, the version you provided is not the latest one, and other users might be updating the record. If the record version is locked, the 'lockedRecordIds' field is returned. You can retry later with the record's latest version number, after the record is no longer locked. |
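A client can use the two success codes to decide which records to retry. The following is a minimal sketch, assuming the response body has already been parsed into a dictionary; only the 'lockedRecordIds' field name comes from the description above, and the function name `locked_ids_to_retry` is hypothetical.

```python
from typing import Any, Dict, List


def locked_ids_to_retry(status_code: int, body: Dict[str, Any]) -> List[str]:
    """Return the record IDs worth retrying after a bulk update call."""
    if status_code == 200:
        return []  # every record was updated
    if status_code == 206:
        # Partial success: locked records can be retried later with their
        # latest version number. The field may be absent when nothing is locked.
        return list(body.get("lockedRecordIds") or [])
    raise RuntimeError(f"bulk update failed with HTTP {status_code}")


print(locked_ids_to_retry(206, {"lockedRecordIds": ["opendes:hello:123456"]}))
# ['opendes:hello:123456']
```

Records reported as not found or unauthorized are deliberately not retried here, since repeating the same request would not change the outcome.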
PATCH /api/storage/v2/records
curl
curl --request PATCH \
--url '/api/storage/v2/records' \
--header 'data-partition-id: <data-partition-id>' \
--header 'Authorization: <Bearer Token>' \
--header 'Content-Type: application/json' \
--data-raw '{
"query": {
"ids": [
"data-partition:type:uentity-typ:unique-identifier",
"data-partition:type:uentity-typ:unique-identifier",
"data-partition:type:uentity-typ:unique-identifier"
]
},
"ops": [
{
"op":"replace",
"path":"/acl/viewers",
"value":[
"data.default.viewers@<DataPartition>.<Domain>.com",
"test1.viewers@<DataPartition>.<Domain>.com"
]
},
{
"op":"replace",
"path":"/legal/legaltags",
"value":[
"<DataPartition>-legaltag-1",
"<DataPartition>-legaltag-2"
]
},
{
"op":"replace",
"path":"/acl/owners",
"value":[
"data.default.owners@<DataPartition>.<Domain>.com",
"test1.owners@<DataPartition>.<Domain>.com"
]
},
{
"op":"add",
"path":"/acl/viewers",
"value":[
"test2.viewers@<DataPartition>.<Domain>.com"
]
},
{
"op":"add",
"path":"/legal/legaltags",
"value":[
"<DataPartition>-legaltag-3"
]
},
{
"op":"add",
"path":"/acl/owners",
"value":[
"test2.owners@<DataPartition>.<Domain>.com"
]
},
{
"op":"remove",
"path":"/tags",
"value":[
"dataflowId"
]
}
]
}'

The response body contains a total count of the updated records, an array of updated record IDs, an array of not found record IDs, an array of unauthorized record IDs, and an array of locked record IDs.
The Records Patch API allows you to update record data and/or metadata in batch. It takes an array of up to 100 record IDs (without version numbers) and updates the properties specified in the operation path with the value and operation type provided. Users must also specify the corresponding data partition ID in the header. The API response contains a list of record IDs that were patched successfully, as well as a list of record IDs that failed to be patched, with the list of errors.
Note: The input record IDs must not contain record versions. However, the record IDs returned in the response have the <recordId>:<version> format. This is because any data update increases the record version, whereas metadata updates do not. There is only one metadata per record, not per record version. The version returned in the response is the latest version of each record.
- This API supports the PATCH operation in compliance with the Patch RFC spec.
- Users need to provide a list of record IDs and a list of operations to be performed on each record.
- Each operation has `op` (operation type), `path`, and `value` in the field 'ops' (unless the operation is `remove`, in which case the field `value` shouldn't be provided).
- The currently supported operations are "replace", "add", and "remove".
- The supported properties for metadata update are `tags`, `acl/viewers`, `acl/owners`, `legal/legaltags`, `ancestry/parents`, `kind` and `meta` (the `meta` attribute outside the data block).
- The supported properties for data update are `data`.
- If `acl` is being updated, the user should be part of the groups that are being replaced/added/removed as ACL.
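The rules in the list above lend themselves to a client-side pre-check before the request is sent. The sketch below is an assumption-laden illustration rather than the service's own validation: the function name `validate_patch_request` is hypothetical, the payload shape (`query.ids` and `ops`) follows the curl samples in this document, and the 100-record limit comes from the API description.

```python
from typing import Any, Dict, List

SUPPORTED_OPS = {"add", "replace", "remove"}
SUPPORTED_ROOTS = ("/tags", "/acl/viewers", "/acl/owners", "/legal/legaltags",
                   "/ancestry/parents", "/kind", "/meta", "/data")
MAX_PATCH_RECORDS = 100


def validate_patch_request(payload: Dict[str, Any]) -> List[str]:
    """Return a list of problems found in a patch payload (empty if none)."""
    errors: List[str] = []
    ids = payload.get("query", {}).get("ids", [])
    if len(ids) > MAX_PATCH_RECORDS:
        errors.append(f"too many record IDs: {len(ids)} > {MAX_PATCH_RECORDS}")
    for i, op in enumerate(payload.get("ops", [])):
        name, path = op.get("op"), op.get("path", "")
        if name not in SUPPORTED_OPS:
            errors.append(f"ops[{i}]: unsupported op {name!r}")
            continue
        # A path is accepted if it is a supported property or lies below one.
        if not any(path == r or path.startswith(r + "/") for r in SUPPORTED_ROOTS):
            errors.append(f"ops[{i}]: unsupported path {path!r}")
        if name == "remove" and "value" in op:
            errors.append(f"ops[{i}]: 'value' must not be provided for remove")
        if name != "remove" and "value" not in op:
            errors.append(f"ops[{i}]: 'value' is required for {name}")
    return errors


payload = {
    "query": {"ids": ["opendes:hello:123456"]},
    "ops": [{"op": "remove", "path": "/ancestry/parents", "value": []}],
}
print(validate_patch_request(payload))
# ["ops[0]: 'value' must not be provided for remove"]
```

A check like this only catches structural mistakes early; the service remains the authority on whether a given patch is accepted.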
Records patch API has the following response codes:
| Code | Description |
|---|---|
| 200 | The update operation succeeds fully, all records’ data and/or metadata are updated. |
| 206 | The update operation succeeds partially. Some records are not updated due to different reasons, including records not found or user does not have permission to edit the records. |
| 400 | The update operation fails because input validation failed. See the validation table below for details. |
To remain compliant with the domain data models and business requirements, the API performs input validation on the request payload, as detailed in the following table:
| Path | Add | Replace | Remove | Remarks |
|---|---|---|---|---|
| /kind | Bad Request | Replaces kind | Bad Request | kind can only be replaced; value must be a raw string and a valid kind. Path must exactly match /kind |
| /tags | Replaces tags with value. Creates /tags if it doesn't exist | Replaces tags with value. /tags must exist | Removes tags, value is ignored. /tags must exist | add and replace behave similarly because /tags is an object member |
| /tags/key | Adds "key" : "value" to tags, /tags must exist | Replaces /tags/key with value. /tags/key must exist | Removes "key" : "value" from tags, /tags/key must exist | |
| /acl/viewers OR /acl/owners OR /legal/legaltags OR /ancestry/parents | Replaces the target array with value. Creates the attribute if it doesn't exist | Replaces the target attribute with new value. Target location must exist | Only /ancestry/parents can be removed | For add or replace, the path should be an exact match and value must be an array of string values |
| /acl/viewers/0 OR /acl/owners/0 OR /legal/legaltags/0 OR /ancestry/parents/0 | Adds value at the target index in the array. The index cannot be greater than the array length, otherwise an error is returned | Replaces value at the target index in the array. Target location must exist | Removes value at the target index in the array. Target location must exist | The character - can be used to refer to the last index of the target array. For acl and legaltag, the target value must not be an empty array after applying the patch |
| /data | | | | /data doesn't adhere to a rigid structure, therefore users must be cautious when modifying /data attributes. Value type must adhere to the attribute type defined in the Schema service. Any type change can potentially cause indexing/search issues. |
| /meta | | | | An update to /meta must comply with its structure (i.e., an array of Map<String, Object>) |
Check out some examples below, but refer to the Patch RFC spec for comprehensive documentation on JsonPatch and more examples.
Note: The examples below only highlight the ops array from the input payload. A full curl sample is provided at the end.
Please note that the add operation performs either an add or a replace operation, depending on the target location. Refer to Patch RFC spec - add for the explanation.
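How the target location changes the outcome of an operation can be seen in a small, self-contained model of JSON Patch. This is a simplified sketch of the generic RFC semantics for `add`, `replace`, and `remove` on nested dictionaries and lists; it is not the Storage Service implementation, it does not apply the service's extra validation rules, and the function name `apply_op` is hypothetical.

```python
import copy
from typing import Any, Dict, List


def _tokens(path: str) -> List[str]:
    # JSON Pointer escaping: "~1" stands for "/" and "~0" for "~".
    return [t.replace("~1", "/").replace("~0", "~") for t in path.split("/")[1:]]


def apply_op(doc: Dict[str, Any], op: Dict[str, Any]) -> Dict[str, Any]:
    """Apply one patch operation to a copy of doc and return the copy."""
    doc = copy.deepcopy(doc)
    *parents, last = _tokens(op["path"])
    target: Any = doc
    for tok in parents:  # every parent on the path must already exist
        target = target[int(tok)] if isinstance(target, list) else target[tok]
    kind = op["op"]
    if kind not in ("add", "replace", "remove"):
        raise ValueError(f"unsupported op: {kind}")
    if isinstance(target, list):
        if kind == "add":  # arrays: insert at the index, "-" means append
            index = len(target) if last == "-" else int(last)
            if index > len(target):
                raise IndexError(op["path"])
            target.insert(index, op["value"])
        elif kind == "replace":
            target[int(last)] = op["value"]
        else:
            del target[int(last)]
    else:
        if kind != "add" and last not in target:
            raise KeyError(op["path"])  # replace/remove need an existing member
        if kind == "remove":
            del target[last]
        else:  # objects: add creates the member or replaces an existing one
            target[last] = op["value"]
    return doc


record = {"legal": {"legaltags": ["x"]}, "tags": {"old": "1"}}
record = apply_op(record, {"op": "add", "path": "/legal/legaltags/-", "value": "abc"})
record = apply_op(record, {"op": "add", "path": "/tags", "value": {"tag1": "value1"}})
print(record)
# {'legal': {'legaltags': ['x', 'abc']}, 'tags': {'tag1': 'value1'}}
```

The first `add` appends because its path ends in an array index, while the second replaces the whole `tags` object because its path names an object member that already exists.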
Add legaltag `abc` to a record, at the end of the `legaltags` array. This performs an addition because `path` points to an index in an array.
add legaltag
"ops": [ { "op": "add", "path": "/legal/legaltags/-", "value": "abc" } ]

Add/Replace `tags` for a record. Note that although the operation is `add`, this adds `/tags` if it doesn't exist, or replaces the current value of `/tags` with the given value if it does. The replacement happens because the target location is an object member that already exists. Please read the RFC spec for more details.
replace tags
"ops": [ { "op": "add", "path": "/tags", "value": { "tag1": "value1" } } ]

Add a new property `child` to the `data` block. Note that `parent` must exist. This operation adds `child` under `parent` with the value specified:
add to data block
"ops": [ { "op": "add", "path": "/data/parent/child", "value": { "grandchild": { "key": "value" } } } ]
The replace operation is fairly straightforward: it replaces the value at the target location with a new value.
Replace the `/acl/owners` array for a record.
replace acl owners
"ops": [ { "op": "replace", "path": "/acl/owners", "value": [ "newacl1", "newacl2" ] } ]
The remove operation removes the value at the target location. The field value must not be provided for this operation.
Remove `/data/parent/child` from the data block.
remove data property
"ops": [ { "op": "remove", "path": "/data/parent/child" } ]

Remove the first value from the `/acl/viewers` array.
remove first acl viewer
"ops": [ { "op": "remove", "path": "/acl/viewers/0" } ]
Below is a complete sample curl which performs multiple operations on a list of record IDs.
complete curl example
curl --request PATCH \
--url '/api/storage/v2/records' \
--header 'accept: application/json' \
--header 'authorization: Bearer <JWT>' \
--header 'content-type: application/json-patch+json' \
--header 'Data-Partition-Id: common' \
--data-raw '{
"query": {
"ids": [
"tenant1:type:unique-identifier",
"tenant2:type:unique-identifier",
"tenant3:type:unique-identifier"
]
},
"ops": [
{
"op": "remove",
"path": "/legal/legaltags/0"
},
{
"op": "remove",
"path": "/ancestry/parents"
},
{
"op": "add",
"path": "/acl/viewers/-",
"value": "data.default.viewer1@opendes.enterprisedata.cloud.slb-ds.com"
},
{
"op":"replace",
"path":"/kind",
"value":"newKind"
},
{
"op":"add",
"path":"/tags",
"value":{
"tag1":"value1",
"tag2":"value2"
}
},
{
"op":"replace",
"path":"/data/someProperty/targetProperty",
"value": {
"newValue": {
"subProperty":"subValue"
}
}
}
]
}'

| | Metadata Update API | Patch API |
|---|---|---|
| Header Content-Type | application/json | application/json-patch+json |
| Supported Record properties | acl, tags, legaltags | acl, tags, legaltags, ancestry, kind, data, meta |
| ops field in payload | array of PatchOperation | JsonPatch |
| Maximum number of records | 500 | 100 |
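Because the two APIs accept different batch sizes and content types, a client that updates many records has to split its ID list accordingly. The following sketch takes the limits and content types from the table above; the dictionary keys `metadata_update` and `patch` and the function name `batches` are hypothetical labels chosen for this example.

```python
from typing import Dict, Iterator, List

# Values taken from the comparison table; the keys are illustrative labels.
MAX_RECORDS: Dict[str, int] = {"metadata_update": 500, "patch": 100}
CONTENT_TYPE: Dict[str, str] = {
    "metadata_update": "application/json",
    "patch": "application/json-patch+json",
}


def batches(record_ids: List[str], api: str) -> Iterator[List[str]]:
    """Yield slices of record_ids that respect the per-request limit of api."""
    size = MAX_RECORDS[api]
    for start in range(0, len(record_ids), size):
        yield record_ids[start:start + size]


ids = [f"opendes:hello:{n}" for n in range(250)]
print([len(chunk) for chunk in batches(ids, "patch")])            # [100, 100, 50]
print([len(chunk) for chunk in batches(ids, "metadata_update")])  # [250]
```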
- There's currently an inconsistency between search and storage kinds. Storage honors the case sensitivity for the kind parameter when querying the records, but Search does not.
- Due to limitations of the underlying blob storage on Azure, using record IDs that end with a dot (.) is discouraged. Because of these limitations, the PUT API only partially supports such record IDs: a payload that combines record IDs ending with a dot (.) and record IDs not ending with a dot (.) is rejected with a 4xx error code. The API honors such record IDs if they are split into two separate PUT requests.
- Storage record ID length of 512 bytes is enforced:
  - Impacts PUT & PATCH data update operations
  - No impact for PATCH metadata update & read/delete/purge operations
  - 400 Bad Request error code is returned if record ID is longer than 512 bytes (note that the length enforcement is in bytes, not number of characters)
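Two of the limitations above can be checked on the client before a PUT request is sent. The sketch below is illustrative only: the helper names are hypothetical, and measuring the ID length as UTF-8 bytes is an assumption, since the text states that the limit is in bytes but does not name the encoding.

```python
from typing import List, Tuple

MAX_RECORD_ID_BYTES = 512


def record_id_too_long(record_id: str) -> bool:
    """True if the ID exceeds the byte limit (UTF-8 encoding is assumed)."""
    return len(record_id.encode("utf-8")) > MAX_RECORD_ID_BYTES


def split_put_batches(record_ids: List[str]) -> Tuple[List[str], List[str]]:
    """Separate IDs ending with a dot so each group goes in its own PUT request."""
    plain = [r for r in record_ids if not r.endswith(".")]
    dotted = [r for r in record_ids if r.endswith(".")]
    return plain, dotted


# Characters and bytes differ for non-ASCII text: 264 characters, 514 bytes.
long_id = "opendes:hello:" + "é" * 250
print(len(long_id), record_id_too_long(long_id))
# 264 True

print(split_put_batches(["opendes:hello:a", "opendes:hello:b."]))
# (['opendes:hello:a'], ['opendes:hello:b.'])
```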