{"templateId":"markdown","sharedDataIds":{"sidebar":"sidebar-guides/sidebars.yaml"},"props":{"metadata":{"markdoc":{"tagList":[]},"type":"markdown"},"seo":{"title":"Table of contents","description":"Accelerate E&P application development and protect your innovation by consuming our Data and Domain APIs / Platform APIs.","lang":"en-US","meta":[{"name":"robots","content":"noindex"}],"llmstxt":{"hide":true,"excludeFiles":[]}},"dynamicMarkdocComponents":[],"compilationErrors":[],"ast":{"$$mdtype":"Tag","name":"article","attributes":{},"children":[{"$$mdtype":"Tag","name":"Heading","attributes":{"level":2,"id":"table-of-contents","__idx":0},"children":["Table of contents ",{"$$mdtype":"Tag","name":"a","attributes":{"name":"TOC"},"children":[]}]},{"$$mdtype":"Tag","name":"ul","attributes":{},"children":[{"$$mdtype":"Tag","name":"li","attributes":{},"children":[{"$$mdtype":"Tag","name":"MarkdownLink","attributes":{"href":"#introduction"},"children":["Introduction"]}]},{"$$mdtype":"Tag","name":"li","attributes":{},"children":[{"$$mdtype":"Tag","name":"MarkdownLink","attributes":{"href":"#clean-data"},"children":["Steps to clean data"]}]},{"$$mdtype":"Tag","name":"li","attributes":{},"children":[{"$$mdtype":"Tag","name":"MarkdownLink","attributes":{"href":"#roles"},"children":["Required permissions/roles"]}]},{"$$mdtype":"Tag","name":"li","attributes":{},"children":[{"$$mdtype":"Tag","name":"MarkdownLink","attributes":{"href":"#script"},"children":["Example script"]}]}]},{"$$mdtype":"Tag","name":"Heading","attributes":{"level":2,"id":"introduction","__idx":1},"children":["Introduction ",{"$$mdtype":"Tag","name":"a","attributes":{"name":"introduction"},"children":[]}]},{"$$mdtype":"Tag","name":"p","attributes":{},"children":["When shards in the ECK reach capacity, one option is to clean up indices that contain zero documents, or to delete data that is no longer needed, to bring down shard usage. 
The other option is, of course, to increase the shard capacity, which has cost implications."]},{"$$mdtype":"Tag","name":"Heading","attributes":{"level":2,"id":"steps-to-clean-up-data","__idx":2},"children":["Steps to clean up data ",{"$$mdtype":"Tag","name":"a","attributes":{"name":"clean-data"},"children":[]}]},{"$$mdtype":"Tag","name":"p","attributes":{},"children":["Before the indices can be cleaned up, the data associated with the schema must be deleted. The steps below delete the storage records that belong to that schema."]},{"$$mdtype":"Tag","name":"ol","attributes":{},"children":[{"$$mdtype":"Tag","name":"li","attributes":{},"children":[{"$$mdtype":"Tag","name":"p","attributes":{},"children":["Query for all the records in a kind/schema using ",{"$$mdtype":"Tag","name":"MarkdownLink","attributes":{"href":"/solutions/core-service/tutorial/search-service"},"children":["Search Service"]},"."," ",{"$$mdtype":"Tag","name":"br","attributes":{},"children":[]},"a. If the number of records is less than 10k, it is best to use the ",{"$$mdtype":"Tag","name":"code","attributes":{},"children":["Search Query API"]},"."," ",{"$$mdtype":"Tag","name":"br","attributes":{},"children":[]},"b. If the number of records exceeds 10k, use the ",{"$$mdtype":"Tag","name":"code","attributes":{},"children":["Search Query with Cursor API"]},"."," ",{"$$mdtype":"Tag","name":"br","attributes":{},"children":[]},{"$$mdtype":"Tag","name":"strong","attributes":{},"children":["Note that records that failed to index correctly are not returned when querying via Search service. 
In this case, please reach out to an OSDU SRE for help querying for all the records belonging to a kind from Storage service (this is a privileged API)."," ",{"$$mdtype":"Tag","name":"br","attributes":{},"children":[]},"Paging with the cursor has to be scripted: repeat the query until the returned cursor is null."]}]}]},{"$$mdtype":"Tag","name":"li","attributes":{},"children":[{"$$mdtype":"Tag","name":"p","attributes":{},"children":["Delete all the records retrieved in Step 1 using the ",{"$$mdtype":"Tag","name":"code","attributes":{},"children":["Storage Purge API"]}," from ",{"$$mdtype":"Tag","name":"MarkdownLink","attributes":{"href":"/solutions/core-service/tutorial/storage-service"},"children":["Storage Service"]},". This operation should also delete the index entries for those records."," ",{"$$mdtype":"Tag","name":"br","attributes":{},"children":[]},{"$$mdtype":"Tag","name":"strong","attributes":{},"children":["Since only one record can be purged at a time, use a script to iterate through all the records."]}]}]},{"$$mdtype":"Tag","name":"li","attributes":{},"children":[{"$$mdtype":"Tag","name":"p","attributes":{},"children":["Verify that all the records have been purged properly by searching for them again using the ",{"$$mdtype":"Tag","name":"code","attributes":{},"children":["Search Query API"]},"."]}]}]},{"$$mdtype":"Tag","name":"Heading","attributes":{"level":2,"id":"required-roles-for-using-api","__idx":3},"children":["Required roles for using API 
",{"$$mdtype":"Tag","name":"a","attributes":{"name":"roles"},"children":[]}]},{"$$mdtype":"Tag","name":"div","attributes":{"className":"md-table-wrapper"},"children":[{"$$mdtype":"Tag","name":"table","attributes":{"className":"md"},"children":[{"$$mdtype":"Tag","name":"thead","attributes":{},"children":[{"$$mdtype":"Tag","name":"tr","attributes":{},"children":[{"$$mdtype":"Tag","name":"th","attributes":{"data-label":"Step"},"children":["Step"]},{"$$mdtype":"Tag","name":"th","attributes":{"data-label":"API"},"children":["API"]},{"$$mdtype":"Tag","name":"th","attributes":{"data-label":"Required roles"},"children":["Required roles"]}]}]},{"$$mdtype":"Tag","name":"tbody","attributes":{},"children":[{"$$mdtype":"Tag","name":"tr","attributes":{},"children":[{"$$mdtype":"Tag","name":"td","attributes":{},"children":["1"]},{"$$mdtype":"Tag","name":"td","attributes":{},"children":["POST search/query_with_cursor"]},{"$$mdtype":"Tag","name":"td","attributes":{},"children":["users.datalake.viewers or users.datalake.editors or users.datalake.admins"]}]},{"$$mdtype":"Tag","name":"tr","attributes":{},"children":[{"$$mdtype":"Tag","name":"td","attributes":{},"children":["2"]},{"$$mdtype":"Tag","name":"td","attributes":{},"children":["DELETE storage/records/{id}"]},{"$$mdtype":"Tag","name":"td","attributes":{},"children":["users.datalake.admins"]}]},{"$$mdtype":"Tag","name":"tr","attributes":{},"children":[{"$$mdtype":"Tag","name":"td","attributes":{},"children":["3"]},{"$$mdtype":"Tag","name":"td","attributes":{},"children":["POST search/query"]},{"$$mdtype":"Tag","name":"td","attributes":{},"children":["users.datalake.viewers or users.datalake.editors or users.datalake.admins"]}]}]}]}]},{"$$mdtype":"Tag","name":"Heading","attributes":{"level":2,"id":"example-code","__idx":4},"children":["Example code ",{"$$mdtype":"Tag","name":"a","attributes":{"name":"script"},"children":[]}]},{"$$mdtype":"Tag","name":"p","attributes":{},"children":["Here is the Python script that follows 
the above-mentioned steps."]},{"$$mdtype":"Tag","name":"CodeBlock","attributes":{"header":{"controls":{"copy":{}}},"source":"import requests\nimport time\n\n# Credentials for the client-credentials token request\ntenant_id = \"\"\nresource_id = \"\"\nclient_id = \"\"\nclient_secret = \"\"\n\ndata_partition_id = \"opendes\"\nschema_id = \"opendes:test:facet:1.0.5\"\n\nBASE_URL = \"https://evd.managed-osdu.cloud.slb-ds.com/api\"\n\n# Delay can vary between a minute and up to half an hour, depending on the number of records to be deleted\ndelay_before_verification_in_seconds = 60\n# Limit (number of record ids returned per request) for the search/query_with_cursor API\nlimit = 1000\n\nSTORAGE_URL = f\"{BASE_URL}/storage/v2\"\nSEARCH_URL = f\"{BASE_URL}/search/v2\"\n\ndata = {\n    'grant_type': 'client_credentials',\n    'client_id': client_id,\n    'client_secret': client_secret,\n    'resource': resource_id\n}\n\nauth_response = requests.post(\n    f\"https://login.microsoftonline.com/{tenant_id}/oauth2/token\",\n    headers={'Content-Type': 'application/x-www-form-urlencoded'},\n    data=data\n)\n# Stop here if the token request was rejected\nauth_response.raise_for_status()\n\nauth_token = auth_response.json()['access_token']\n\nheaders = {\n    \"Content-Type\": \"application/json\",\n    \"data-partition-id\": data_partition_id,\n    \"Authorization\": f\"Bearer {auth_token}\"\n}\n\n# Step 1: Get the ids of all records to delete\nprint(\"Search all the records related to the kind/schema\")\nexit_from_loop = False\ncursor = \"\"\nrecords_to_delete = []\nnew_elements = []\nwhile not exit_from_loop:\n    data = {\n        \"cursor\": cursor,\n        \"limit\": limit,\n        \"kind\": schema_id,\n        \"returnedFields\": [\"id\"]\n    }\n\n    # Note: verify=False disables TLS certificate verification\n    response = requests.post(f\"{SEARCH_URL}/query_with_cursor\", json=data, headers=headers, verify=False)\n    response.raise_for_status()\n    RECORDS_BY_KIND = response.json()\n\n    cursor = RECORDS_BY_KIND.get(\"cursor\")\n    new_elements = [obj[\"id\"] for obj in RECORDS_BY_KIND[\"results\"]]\n\n    records_to_delete.extend(new_elements)\n    # A null cursor means the last page has been read\n    if not cursor:\n        exit_from_loop = True\nprint(\"Search completed. 
Number of records to delete: \", len(records_to_delete))\n\n# Step 2: Delete the records one at a time\nfor record_id in records_to_delete:\n    response = requests.delete(f\"{STORAGE_URL}/records/{record_id}\", headers=headers, verify=False)\n    print(f\"Deleting record with id {record_id}. Response status: \", response.status_code)\n\n# Step 3: Wait, then search again to verify the deletion\nprint(f\"Waiting for {delay_before_verification_in_seconds} seconds...\")\ntime.sleep(delay_before_verification_in_seconds)\n\ndata = {\n    \"kind\": schema_id,\n    \"returnedFields\": [\"id\"]\n}\nresponse = requests.post(f\"{SEARCH_URL}/query\", headers=headers, json=data, verify=False)\nprint(f\"Searching records with kind {schema_id}. Total count: \", response.json().get(\"totalCount\"))\n"},"children":[]},{"$$mdtype":"Tag","name":"ol","attributes":{},"children":[{"$$mdtype":"Tag","name":"li","attributes":{},"children":["Save the script as a file with a .py extension (for example, filename.py)."]},{"$$mdtype":"Tag","name":"li","attributes":{},"children":["Fill in the appropriate values for tenant_id, resource_id, client_id, client_secret, data_partition_id, schema_id and BASE_URL. Open a terminal or command prompt, navigate to the directory containing the Python file, and run the following command: ",{"$$mdtype":"Tag","name":"code","attributes":{},"children":["python filename.py"]},"."," ","If everything went well, you should see ",{"$$mdtype":"Tag","name":"code","attributes":{},"children":["Total count:  0"]}," at the end (similar to this one):"]}]},{"$$mdtype":"Tag","name":"CodeBlock","attributes":{"header":{"controls":{"copy":{}}},"source":"Search all the records related to the kind/schema\nSearch completed. Number of records to delete:  4\nDeleting record with id opendes:test:well1... Response status:  204\nDeleting record with id opendes:test:well2... Response status:  204\nDeleting record with id opendes:test:well3... Response status:  204\nDeleting record with id opendes:test:well4... 
Response status:  204\nWaiting for 60 seconds...\nSearching records with kind slb:test:well:1.0.0. Total count:  0\n"},"children":[]}]},"headings":[{"value":"Table of contents","id":"table-of-contents","depth":2},{"value":"Introduction","id":"introduction","depth":2},{"value":"Steps to clean up data","id":"steps-to-clean-up-data","depth":2},{"value":"Required roles for using API","id":"required-roles-for-using-api","depth":2},{"value":"Example code","id":"example-code","depth":2}],"frontmatter":{"seo":{"title":"Table of contents"}},"lastModified":"2025-04-10T19:06:30.000Z","pagePropGetterError":{"message":"","name":""}},"slug":"/solutions/core-service/tutorial/how-to-remove-records-related-to-schema","userData":{"isAuthenticated":false,"teams":["anonymous"]},"isPublic":true}