Overview

How do we execute computations in a cloud that is mostly tailored towards web applications and lightweight microservices? The energy industry is driven by advances in science and technology, and SLB has always been at the forefront of that development. The cloud opens new opportunities: computational resources are vast, a treasure trove of data awaits in the data ecosystem, and live information may be streamed from IoT devices deployed in the field. The Engine EcoSystem, or EESy (pronounced "easy", as in "We make engines in the cloud easy"), was developed by SLB to bring computational engines into the cloud. It provides the APIs and framework for computational engines in this environment: pre-existing ones like SLB's HPC simulators, as well as future ones that will be born in the cloud to produce previously unattainable insights.

Engines in context


The Delfi™ digital platform’s cloud architecture consists of three major components: a data ecosystem, an engine ecosystem, and workflows. The data ecosystem liberates the data by enhancing their access, association, management and utilization to enable workflows previously unimaginable. Workflow implementations leverage the data ecosystem to identify and deliver relevant data as well as the engine ecosystem to perform computations. The engine ecosystem lives between the two as a connecting link.

Engines, Data, Workflows


The figure shows the scope of EESy and its relationship to other components. Workflows and the data ecosystem are displayed on the left, engines are on the right, and EESy is sandwiched in between. EESy implements all cross-cutting concerns that are needed to provision and execute computational engines in the cloud and connects them to workflows through a consistent, generic web service API. It abstracts engine details away from workflows, and workflow details away from engines. Workflows discover engines and consume them like a service, leaving deployment, orchestration, storage management, security, and similar concerns to the ecosystem. Computational engines developed before the cloud do not need to worry about the new environment. Cloud-born engines, on the other hand, may leverage cloud technologies like Google's Dataflow through EESy's service SPI (Service Provider Interface).

In a typical scenario, the workflow obtains data from the data ecosystem and passes it as input to the engine ecosystem. It then starts an engine to process the data. The engine produces its output, which the ecosystem returns to the workflow. The engine ecosystem provisions and manages disk storage that the engine may use for its input data, its output data, its log information, or just temporary files. A workflow may choose to upload input data and download the output. Alternatively, it may provide URL references and access credentials to avoid costly file transfers.
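The scenario above can be sketched as a small in-memory model. This is only an illustration of the upload, run, download sequence; the names `Session`, `upload`, `run`, `download`, and `word_count_engine` are hypothetical and are not EESy's actual API, and the "storage" here is a plain dictionary rather than a managed disk.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict

# Hypothetical engine signature: takes named input files, returns named output files.
Engine = Callable[[Dict[str, bytes]], Dict[str, bytes]]


@dataclass
class Session:
    """In-memory stand-in for one engine session and its managed storage."""
    engine: Engine
    inputs: Dict[str, bytes] = field(default_factory=dict)
    outputs: Dict[str, bytes] = field(default_factory=dict)

    def upload(self, name: str, data: bytes) -> None:
        # The workflow places input data into the session's storage.
        self.inputs[name] = data

    def run(self) -> None:
        # The ecosystem starts the engine; the engine sees only this session's files.
        self.outputs = self.engine(dict(self.inputs))

    def download(self, name: str) -> bytes:
        # The workflow retrieves the engine's output from the session's storage.
        return self.outputs[name]


def word_count_engine(files: Dict[str, bytes]) -> Dict[str, bytes]:
    """Toy engine (an assumption for illustration): counts words in input.txt."""
    text = files["input.txt"].decode("utf-8")
    return {"count.txt": str(len(text.split())).encode("utf-8")}


def run_workflow(data: bytes) -> bytes:
    """Workflow side: pass input, start the engine, collect the output."""
    session = Session(engine=word_count_engine)
    session.upload("input.txt", data)
    session.run()
    return session.download("count.txt")
```

Note that the engine function knows nothing about how the workflow obtained its data, and the workflow never touches the engine's internals; both only see the session.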

Diagram 1

Efficient, effective, adaptable


EESy has been designed to fully leverage the cloud. Storage, engines, and hence execution expand elastically, performance scales horizontally with the number of users and workflows, communication is encrypted, and access is controlled through secure tokens. Below we provide some highlights of the solution's scalability, security, and separation of concerns.

Scalability

EESy's storage service takes advantage of Kubernetes' elastic storage scaling. EESy creates a data volume container pod for each session in the Kubernetes cluster. Using the Kubernetes dynamic storage provisioning feature, the pod allocates disk storage as needed for file stores and exports these disks as shared file systems to the engines. File stores backed by external storage are mounted into the pod and proxied to the engines. An nginx server implements file upload and download on the workflow side.
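With Kubernetes dynamic provisioning, disk storage is typically requested through a PersistentVolumeClaim that names a storage class. The sketch below builds such a claim for one session as a plain dictionary. The function name, the claim naming scheme, the `session` label, and the `standard` storage class are assumptions made for illustration; the manifests EESy actually uses are not described here.

```python
import json


def session_volume_claim(session_id: str, size_gi: int,
                         storage_class: str = "standard") -> dict:
    """Build a PersistentVolumeClaim manifest for one session (hypothetical naming)."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {
            "name": f"session-{session_id}-store",
            # Label is an assumption; it lets all resources of a session be found together.
            "labels": {"session": session_id},
        },
        "spec": {
            "accessModes": ["ReadWriteOnce"],
            # Naming a storage class asks the cluster to provision the disk on demand.
            "storageClassName": storage_class,
            "resources": {"requests": {"storage": f"{size_gi}Gi"}},
        },
    }


if __name__ == "__main__":
    # JSON manifests can be applied the same way as YAML ones.
    print(json.dumps(session_volume_claim("abc123", 50), indent=2))
```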

Security

Running a volume pod for each session delivers horizontal scalability and separates data access without the need for user accounts, ensuring that an engine can only see the data belonging to its session. Thus, security is baked into the architecture from the beginning. File system export from the volume pod uses the SSH protocol and SSHFS. EESy generates new encryption keys for login and network communication with every instance and forces client processes into a chroot jail that limits file system visibility to exported paths. Other security hardening settings include process execution under dedicated system accounts, enforcement of strict host key checking, and disabling every form of user authentication other than cryptographic keys. In addition, cloud storage buckets are protected from unauthorized access with cryptographic tokens over the secure HTTPS protocol.
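As a rough illustration of the client-side hardening described above, the sketch below assembles an `sshfs` command line that authenticates with a per-instance key, enforces strict host key checking, and turns off password authentication. The function name, paths, and host names are hypothetical; this is not EESy's actual mount logic, and the server-side chroot jail (configured in the SSH daemon) is not shown.

```python
from typing import List


def sshfs_mount_command(user: str, host: str, remote_path: str,
                        mount_point: str, identity_file: str,
                        known_hosts_file: str, port: int = 22) -> List[str]:
    """Build an sshfs command with key-only auth and strict host key checking."""
    ssh_options = [
        f"IdentityFile={identity_file}",           # per-instance private key
        "StrictHostKeyChecking=yes",               # refuse unknown or changed host keys
        f"UserKnownHostsFile={known_hosts_file}",  # trust only the expected host key
        "PasswordAuthentication=no",               # cryptographic keys only
    ]
    command = ["sshfs", "-p", str(port),
               f"{user}@{host}:{remote_path}", mount_point]
    for option in ssh_options:
        command += ["-o", option]
    return command


if __name__ == "__main__":
    # All values below are placeholders for illustration.
    print(" ".join(sshfs_mount_command(
        "engine", "volume-pod.example", "/export/input", "/mnt/input",
        "/run/keys/id_ed25519", "/run/keys/known_hosts", port=2222)))
```

Returning the command as a list rather than a single string keeps it safe to pass to `subprocess.run` without shell quoting.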

Separation of Concerns

The EESy architecture embraces the idea that workflows are abstracted from engines, engines are abstracted from workflows, and the two communicate through simple common APIs. As a result, engines deployed in EESy can focus on science and computation rather than on cloud underpinnings, HTTP response codes, or web service security, as these are taken care of by the EESy infrastructure.
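One way to picture this separation is a thin adapter that owns the web-facing details while the engine stays a plain function. The sketch below is an assumption-laden illustration: `serve`, `doubling_engine`, and the choice of status codes are hypothetical and do not reflect EESy's real interfaces.

```python
from typing import Callable, Dict, Tuple

# Hypothetical engine type: pure computation, no knowledge of HTTP.
Engine = Callable[[Dict], Dict]


def serve(engine: Engine, request: Dict) -> Tuple[int, Dict]:
    """Infrastructure side: run the engine and translate outcomes into status codes."""
    try:
        return 200, engine(request)
    except ValueError as err:
        # Bad input from the workflow maps to a client error.
        return 400, {"error": str(err)}
    except Exception:
        # Anything else is reported as a server-side failure without leaking details.
        return 500, {"error": "engine failure"}


def doubling_engine(request: Dict) -> Dict:
    """Toy engine: validates its input and computes a result, nothing more."""
    if "value" not in request:
        raise ValueError("missing 'value'")
    return {"result": 2 * request["value"]}
```

The engine raises ordinary exceptions and returns ordinary data; mapping those to response codes happens entirely in the adapter.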