
Introduction

Data Sharing Options

The last step in most data pipelines is publishing the final data product to entitled internal and external consumers. With Oracle Autonomous Database (ADB), you have several ways to publish data securely:

  • SQL - consumers access data directly via SQL from JDBC or ODBC clients.
  • REST API - data is published as a REST API, using Oracle REST Data Services (ORDS).
  • GraphQL - data is published via GraphQL queries, also supported by ORDS.
  • Streaming - database transactions are streamed via the GoldenGate service to Kafka or other messaging systems.
  • Object Storage - data is exported to Object Storage and accessed via the S3 API or OCI API.
  • Delta Sharing - data is shared from ADB using the open-source Delta Sharing protocol.

In this post I will demonstrate how you can easily use Delta Sharing for automated publishing of a General Ledger star schema from the Autonomous Database.

My intention is to show data sharing on a realistic use case with a reasonably large data set. The General Ledger star schema is an example of a typical dimensional model found in many data warehouses, with a large fact table and multiple smaller dimensions. I simulate a daily load frequency for this schema and show how data sharing can be incorporated into the loading pipeline.

Benefits of Delta Sharing

Why did I choose Delta Sharing? It has several advantages over the other options:

  • It is designed for publishing large data sets.
  • It supports versioning of data sets and incremental publishing.
  • It shares both data and metadata (data schemas).
  • It supports fast access to shared data.
  • Accessing the shared data does not impact the data provider.
  • It is designed for heterogeneous environments, across technologies, clouds, and organizations.
  • Delta Sharing is an open-source protocol, supported by many technologies.

Post Topics

As this is quite a long post, here are the individual sections:

  • Use Case
  • Concept
  • Creating Data Share
  • Publishing Data Share
  • Managing Data Share Versions
  • Creating Data Share Recipient
  • Accessing Data Share
  • Resources

Use Case

Data Model

The data to be shared represents a simple star schema with General Ledger transactions (called journals) as the fact table.

Data Model

  • GL_JOURNALS - fact table with the granularity of GL journal lines.
  • GL_ACCOUNTS - dimension containing GL account hierarchy.
  • GL_CURRENCIES - dimension containing currencies.
  • GL_ORGANIZATIONS - dimension containing organizational hierarchy.
  • GL_JOURNAL_LINE_TYPES - dimension with transaction types.
  • GL_PERIODS - dimension containing accounting periods.
  • GL_PROJECTS - dimension containing project hierarchy.

The tables are located in the database schema SHARE_PROVIDER.

Data Pipeline

The General Ledger star schema is loaded via the Data Integration pipeline gl-pipeline, running with daily frequency.

Data Pipeline

Dimensions are loaded in parallel as they do not depend on each other. Once all the dimensions are up-to-date, the pipeline proceeds with the load of the fact table. The fact table GL_JOURNALS is interval partitioned by JOURNAL_POSTED_DATE. Every daily load produces a new partition in the fact table.
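
For illustration, here is a minimal sketch of the fact table DDL. The columns shown are my assumption (the real table has many more); the important parts are the interval partitioning by JOURNAL_POSTED_DATE, which creates a new daily partition automatically, and the HCC compression discussed later in this post.

create table gl_journals (
  journal_id          number        not null,
  journal_posted_date date          not null,
  account_code        varchar2(30),
  journal_amount      number
)
compress for query high
partition by range (journal_posted_date)
interval (numtodsinterval(1, 'DAY'))
(
  partition p_init values less than (date '2023-04-01')
)
/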

Data Sharing Requirements

  1. All the tables belonging to the General Ledger star schema (i.e., the fact table and all the dimensions) must be shared with the consumers.

  2. The new data loaded by the pipeline gl-pipeline must be shared after the whole pipeline is finished. The shared data must be consistent and valid as of the end of the pipeline. If the pipeline run does not finish for the day, the consumers must see data as of the previous day.

  3. Data must be shared only with the consumers registered by the data steward of the General Ledger star schema. The data steward must be able to add new consumers or revoke access from existing consumers at any time.

  4. The ADB instance with the General Ledger star schema should not be impacted by consumers accessing and analyzing the shared data. In other words, access to the shared data should be decoupled from the ADB usage.

  5. And finally, it must be possible to automate all the steps for publishing and managing shared data. The data publishing step should be added to the end of the data pipeline.

Concept

Delta Sharing

Delta Sharing is a modern data sharing protocol, initially developed by Databricks and later open-sourced. It enables secure and efficient data sharing across locations and organizations. It provides a high-performance and cost-effective solution for sharing large data sets without the need for data duplication or complex ETL processes.

Oracle Autonomous Database supports the Delta Sharing protocol, both as a Data Provider and as a Data Recipient. In this post I will show how to use ADB as the Data Provider.

Delta Sharing Architecture in ADB

Oracle Autonomous Database supports the creation and management of data shares and recipients. It also contains an implementation of the Delta Sharing server, which receives requests from Delta Sharing clients, authenticates and authorizes the requests, and generates short-term pre-authenticated requests (PARs) to reach the shared data. The data is stored in OCI Object Storage, which clients access directly via the PARs.

Note that Delta Sharing support is currently available with Autonomous Database Serverless. It is not yet available with Autonomous Database on Dedicated Exadata Infrastructure.

Terms

  • Cloud Storage Link - location where shared data is exported (Object Storage bucket and credential).
  • Data Share - named collection of tables or views that are shared with recipients.
  • Data Share Recipient - internal or external consumer who can access a data share.
  • Data Share Version - a data share is versioned; recipients usually access only the latest version.
  • Data Share Publishing - process that exports a data share and makes the new version available to recipients.
  • Activation Link - link used to download the data share profile with the authentication token.
  • Data Share Profile - JSON document with the Delta Sharing endpoint and authentication token.

Meeting the Requirements

Delta Sharing in Autonomous Database nicely fits the scenario and meets all the requirements.

  1. To share data, you have to create a data share and specify tables and/or views in the data share. Consumers (recipients in Delta Sharing terminology) will see all the tables and views that are part of the data share. In our scenario, the data share will contain all the tables belonging to the General Ledger star schema.

  2. Shared data is not visible until it is published. ADB supports sharing of versioned data - every time a data share is published, a new version is created containing all the data in shared tables as of the time of publishing. In our scenario, the data share will be published every day, after all the tables in the General Ledger star schema are loaded.

  3. The publisher must specify the recipients of the data share. The publisher may choose to share the profile link with recipients via email or other methods. The profile contains the data share endpoint and the authentication token. Existing recipients may be removed from the data share at any time if required.

  4. When recipients access the data share, they do not access data in the Autonomous Database. Instead, they access the Parquet files in the Object Storage bucket, which are created when the data share is published. Reading from Object Storage does not impact the database.

  5. All the data sharing steps may be scripted and automated with the PL/SQL package DBMS_SHARE. This post shows how to do so. You may also try the excellent LiveLab Implement Data Sharing with PL/SQL.

Creating Data Share

Create Object Storage Bucket

Data shared from ADB via Delta Sharing will be stored in an Object Storage bucket. The bucket must exist before the data share is configured.

oci os bucket create --compartment-id <compartment OCID> --namespace-name <namespace> --name adb-delta-sharing

This command uses the OCI CLI to create a new bucket adb-delta-sharing that will store data shared from ADB.

Enable User

In our scenario, data will be shared by the user SHARE_PROVIDER (owner of the General Ledger schema). Before sharing data, the user must be enabled to use the DBMS_SHARE package and the data share tool in Data Studio.

begin
  dbms_share.enable_schema ( schema_name => 'SHARE_PROVIDER', privileges => dbms_share.priv_ords_acl );
  dbms_cloud_admin.enable_resource_principal ( username => 'SHARE_PROVIDER' );
  ords_admin.enable_schema ( p_schema => 'SHARE_PROVIDER' );
end;
/

The privilege PRIV_ORDS_ACL is used later to generate bearer tokens for data share recipients by calling a local REST endpoint.

Note that I also enabled Resource Principal authentication for accessing OCI services and the usage of Oracle REST Data Services (ORDS). Instead of the Resource Principal credential, you can use an API key based credential.

The statement above has to be executed by an ADB administrator, i.e., the user ADMIN. From this step onward, all other SQL and PL/SQL commands should be executed by the user SHARE_PROVIDER.

Create Cloud Storage Link

The Object Storage bucket is linked with a data share via an object called a Cloud Storage Link. Once the Cloud Storage Link is created, it must be assigned a credential used to authenticate access to the bucket.

begin
  dbms_share.create_or_replace_cloud_storage_link (
    storage_link_name => 'DATA_SHARE_CLOUD_STORE',
    uri => 'https://objectstorage.uk-london-1.oraclecloud.com/n/<namespace>/b/adb-delta-sharing/o/'
  );
  dbms_share.set_storage_credential (
    storage_link_name => 'DATA_SHARE_CLOUD_STORE',
    credential_name => 'OCI$RESOURCE_PRINCIPAL'
  );
  commit;
end;
/

This command creates the Cloud Storage Link DATA_SHARE_CLOUD_STORE for the bucket adb-delta-sharing, with the Resource Principal credential. You can validate the Cloud Storage Link by querying the dictionary view USER_LINEAGE_CLOUD_STORAGE_LINKS.

select storage_link_name, credential_name, uri
from user_lineage_cloud_storage_links
order by storage_link_name
/

Create Data Share

Once the Cloud Storage Link is created, you can create the data share and assign the Cloud Storage Link to it.

begin
  dbms_share.create_share (
     share_name => 'GL_JOURNALS_SHARE',
     share_type => dbms_share.share_type_versioned,
     storage_link_name => 'DATA_SHARE_CLOUD_STORE',
     description => 'Sample share containing General Ledger Journals star schema',
     log_level => dbms_share.log_level_detail
  );
  commit;
end;
/

This command creates the versioned data share GL_JOURNALS_SHARE, with data stored in the location defined by the Cloud Storage Link DATA_SHARE_CLOUD_STORE. You can validate the created data share by querying the dictionary view USER_SHARES.

select share_name, share_type, current_version, description
from user_shares
where share_name = 'GL_JOURNALS_SHARE'
/

Add Tables to Data Share

Once the data share is created, you can add tables or views to the Share. Note that adding tables to the data share does not export any data; for that, the Share must be published.

begin
  dbms_share.add_to_share (
    share_name => 'GL_JOURNALS_SHARE',
    table_name => 'GL_ACCOUNTS'
  );
  dbms_share.add_to_share (
    share_name => 'GL_JOURNALS_SHARE',
    table_name => 'GL_CURRENCIES'
  );
  dbms_share.add_to_share (
    share_name => 'GL_JOURNALS_SHARE',
    table_name => 'GL_JOURNAL_LINE_TYPES'
  );
  dbms_share.add_to_share (
    share_name => 'GL_JOURNALS_SHARE',
    table_name => 'GL_ORGANIZATIONS'
  );
  dbms_share.add_to_share (
    share_name => 'GL_JOURNALS_SHARE',
    table_name => 'GL_PERIODS'
  );
  dbms_share.add_to_share (
    share_name => 'GL_JOURNALS_SHARE',
    table_name => 'GL_PROJECTS'
  );
  dbms_share.add_to_share (
    share_name => 'GL_JOURNALS_SHARE',
    table_name => 'GL_JOURNALS'
  );
  commit;
end;
/

The above statement adds all the tables of the General Ledger star schema to the data share GL_JOURNALS_SHARE. You can check the tables belonging to the Share by querying the dictionary view USER_SHARE_TABLES.

select share_name, share_schema_name, share_table_name, object_type, table_owner, table_name, status
from user_share_tables
where share_name = 'GL_JOURNALS_SHARE'
order by share_schema_name, share_table_name
/

Modify Data Share Property

Optionally, you may also need to modify some properties of the data share.

begin
  dbms_share.update_share_property (
    share_name      => 'GL_JOURNALS_SHARE',
    share_property  => 'SPLIT_SIZE',
    new_value       => '64000000'
  );
  commit;
end;
/

This statement changes the property SPLIT_SIZE from the default of 128 MB to 64 MB. The property specifies the maximum size of a single file created during publishing of the Share.

Publishing Data Share

Publish Share

Once the data share is created and tables are added to it, you can publish the Share by calling the PUBLISH_SHARE or PUBLISH_SHARE_WAIT procedure. Publishing exports all the tables in the Share to the bucket defined by the Cloud Storage Link. Once the Share is published, recipients may read data from it.

begin
  dbms_share.publish_share_wait(
    share_name => 'GL_JOURNALS_SHARE'
  );
end;
/

This statement publishes the Share and waits until the publishing process is finished. If you do not want to wait, use PUBLISH_SHARE instead. You can check the published versions by querying the dictionary view USER_SHARE_VERSIONS.

select share_name, share_version, status, num_files, export_time
from user_share_versions
where share_name = 'GL_JOURNALS_SHARE'
order by share_version desc
/

Add Publish Share to Pipeline

To automate publishing of the General Ledger star schema at the end of the loading process, you can create a new SQL task PUBLISH_GL and add the task to the end of the pipeline.

Data Pipeline with Publish Task

Note that if the pipeline is implemented with the OCI Data Integration service, it is necessary to encapsulate the call to DBMS_SHARE in a standalone PL/SQL procedure, as OCI Data Integration does not support calling procedures from packages.

create or replace procedure publish_share_wait (
  p_share_name in varchar2
) authid current_user is
begin
  dbms_share.publish_share_wait(
    share_name => p_share_name
  );
end publish_share_wait;
/
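
The PUBLISH_GL task then only needs to execute a simple anonymous block calling the wrapper, for example:

begin
  publish_share_wait(p_share_name => 'GL_JOURNALS_SHARE');
end;
/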

Size of Published Data

To understand the Object Storage space required for a published data share, I executed the loading pipeline for every day of April 2023. After every load I measured the size of the Object Storage files used by GL_JOURNALS_SHARE and compared it with the size of the General Ledger star schema in the ADB database.

Size of Published Data

Every daily load adds about 7.6 million rows to the GL_JOURNALS fact table, which corresponds to about 500 MB of data allocated by each daily partition. The total for 30 days is about 230 million rows and 16 GB of data. The loading process does not delete any data - the data volumes grow constantly.

The table is compressed by default with Hybrid Columnar Compression (HCC) using COMPRESS FOR QUERY HIGH. Without compression, a single partition would consume about 2.3 GB, i.e., the compression ratio for our data is about 4.5x. The data added by the dimension tables is negligible compared to the fact table.

Data published by GL_JOURNALS_SHARE consumes about 1.4 GB per day and about 42 GB for the whole month. This means that (for our scenario) the data share consumes about 2.6x more space in Object Storage than the same data allocated in ADB storage.

The higher storage requirements are caused by the different storage mechanisms - HCC compressed tables in ADB versus the Parquet files used by Delta Sharing. Also, Parquet files are compressed with Snappy compression, which has a significantly lower compression ratio than HCC.

Note that the figures presented here are specific to our scenario. The compression and storage ratios will differ depending on your data sets, so I strongly recommend testing them on your own data.

Duration of Publish Process

I also measured the duration of the publish process (i.e., the duration of DBMS_SHARE.PUBLISH_SHARE_WAIT) to understand the impact of this step on the overall duration of the data loading pipeline.

Duration of Publish Process

Publishing took about 80 seconds per day, and the duration remained constant throughout the month even though the total amount of data grew every day. This is an important observation: it means the duration depends on the size of the increment, not on the overall size of the data share.

I believe the publishing process implemented in ADB takes advantage of the fact that, in our scenario, the fact table GL_JOURNALS is partitioned, with each daily load inserting data into a single partition. The process tracks changes per partition and can therefore export only the data from the new partition.

If the table is not partitioned, or if the daily load also updates previously loaded data, the publishing process will possibly perform worse, as it has to do more work to identify the changed data. Note this is my assumption only; I did not test the performance for different data models and load patterns.

Debugging the Publish Process

If there are errors in the publishing process, it is useful to examine the log records by querying the dictionary view USER_SHARE_EVENTS.

select share_name, time, status, object_name, partition_name, details
from user_share_events
where share_name = 'GL_JOURNALS_SHARE'
and status = 'ERROR'
order by share_name, time
/

The log or error message is stored in the DETAILS column in JSON format. Here is an example of an error message:

{
  "message": "ODI_STATEMENT_FAILED",
  "session_id": "752907641",
  "trace": null,
  "jobID": "32",
  "objectOwner": "SHARE_PROVIDER",
  "objectName": "GL_JOURNALS",
  "odiJobID": "6",
  "odiTaskStatus": "F",
  "partitionName": "SYS_P3131",
  "splitOrder": "2",
  "executionInfo": {
    "status": "FAILED",
    "temp_folder": "/tmp",
    "retry_count": 0,
    "logs": [
      "UPLOAD:File(GL_JOURNALS_SHARE_667631/SHARE_PROVIDER/GL_JOURNALS/SYS_P3131/V55_385S-385.parquet) exceeds max size(256MB)."
    ],
    "file_details": [
      {
        "path_uri": "GL_JOURNALS_SHARE_667631/SHARE_PROVIDER/GL_JOURNALS/SYS_P3131/V55_385S-385.parquet",
        "file_size": 309993764,
        "generated": true,
        "uploaded": false
      }
    ]
  }
}

Note the above error occurred in an older version of ADB and has since been corrected.

Structure of Files in Object Storage

And finally, it is interesting to review the structure of the files generated when publishing a data share. Although you cannot influence the naming pattern and should not manage or access the files manually, this information helps you understand how different objects are treated when published.

SQL> select object_name
  2  from dbms_cloud.list_objects (
  3    credential_name => 'OCI$RESOURCE_PRINCIPAL',
  4    location_uri => 'https://objectstorage.uk-london-1.oraclecloud.com/n/<namespace>/b/adb-delta-sharing/o/'
  5  )
  6  where object_name like 'GL_JOURNALS_SHARE%'
  7  order by object_name
  8* /

OBJECT_NAME
_______________________________________________________________________________
GL_JOURNALS_SHARE_667631/SHARE_PROVIDER/GL_ACCOUNTS/V55_380S-380.parquet
GL_JOURNALS_SHARE_667631/SHARE_PROVIDER/GL_ACCOUNTS/V56_413S-413.parquet ...
GL_JOURNALS_SHARE_667631/SHARE_PROVIDER/GL_CURRENCIES/V55_379S-379.parquet
GL_JOURNALS_SHARE_667631/SHARE_PROVIDER/GL_CURRENCIES/V56_414S-414.parquet ...
GL_JOURNALS_SHARE_667631/SHARE_PROVIDER/GL_JOURNALS/SYS_P3131/V55_385S-385.parquet
GL_JOURNALS_SHARE_667631/SHARE_PROVIDER/GL_JOURNALS/SYS_P3131/V55_386S-386.parquet ...
GL_JOURNALS_SHARE_667631/SHARE_PROVIDER/GL_JOURNALS/SYS_P3132/V56_419S-419.parquet
GL_JOURNALS_SHARE_667631/SHARE_PROVIDER/GL_JOURNALS/SYS_P3132/V56_420S-420.parquet ...
GL_JOURNALS_SHARE_667631/SHARE_PROVIDER/GL_JOURNAL_LINE_TYPES/V55_381S-381.parquet
GL_JOURNALS_SHARE_667631/SHARE_PROVIDER/GL_JOURNAL_LINE_TYPES/V56_415S-415.parquet ...
GL_JOURNALS_SHARE_667631/SHARE_PROVIDER/GL_ORGANIZATIONS/V55_382S-382.parquet
GL_JOURNALS_SHARE_667631/SHARE_PROVIDER/GL_ORGANIZATIONS/V56_416S-416.parquet ...
GL_JOURNALS_SHARE_667631/SHARE_PROVIDER/GL_PERIODS/V55_383S-383.parquet
GL_JOURNALS_SHARE_667631/SHARE_PROVIDER/GL_PERIODS/V56_417S-417.parquet ...
GL_JOURNALS_SHARE_667631/SHARE_PROVIDER/GL_PROJECTS/V55_384S-384.parquet
GL_JOURNALS_SHARE_667631/SHARE_PROVIDER/GL_PROJECTS/V56_418S-418.parquet ...

1,020 rows selected.

SQL>

As you can see, the files use the following naming pattern:

<Share Storage Folder>/<Share Schema>/<Share Table>[/<Partition Name>]/<Share Version>_<File ID>.parquet

  • Share Storage Folder - top level folder for the data share (see USER_SHARES.STORAGE_FOLDER)
  • Share Schema - owner of the data share
  • Share Table - name of table in the data share
  • Partition Name - name of the partition (for partitioned tables only)
  • Share Version - published version
  • File ID - internal file identifier to ensure uniqueness of files

Managing Data Share Versions

Drop Old Shares

Whenever you publish a versioned data share, a new version is created, and each version consumes space in Object Storage. It is therefore a good practice to regularly purge unused versions to save space. There are several procedures you can use to drop old versions; note that the current version cannot be dropped.

  • DBMS_SHARE.DROP_SHARE_VERSION - drop a single version.
  • DBMS_SHARE.DROP_SHARE_VERSIONS - drop a range of versions.
  • DBMS_SHARE.DROP_UNUSED_SHARE_VERSIONS - drop all versions except the current one.

If you want to keep the option of returning to older versions, you might consider a more complex scenario in which you keep the latest N versions and drop all the others. Below is the procedure DROP_OLD_VERSIONS that implements this logic.

create or replace procedure drop_old_versions (
  p_share_name in varchar2,
  p_keep_versions in number
) authid current_user is
  v_delete_from_version number;
  v_delete_to_version number;
begin
  /* get versions to drop */
  select
    min(share_version), max(share_version)
  into
    v_delete_from_version, v_delete_to_version
  from (
    select share_version, status, rank() over (order by share_version desc) as version_rank
    from user_share_versions
    where share_name = p_share_name
  )
  where version_rank > p_keep_versions;
  /* drop old versions */
  if (v_delete_from_version is not null) then
    dbms_share.drop_share_versions (
      share_name => p_share_name,
      from_share_version => v_delete_from_version,
      to_share_version => v_delete_to_version,
      destroy_objects => true,
      force_drop => true
    );
  end if;
end drop_old_versions;
/

Add Drop Old Share to Pipeline

To automate the deletion of old versions, you can create a new SQL task DROP_OLD_VERSIONS and add it to the end of the pipeline, directly after the PUBLISH_GL task, as shown in the sketch below.

Data Pipeline with Publish and Drop Task
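
The DROP_OLD_VERSIONS task can then execute an anonymous block like the one below. Keeping the latest 7 versions is just an example; choose the number according to your retention needs.

begin
  drop_old_versions(
    p_share_name    => 'GL_JOURNALS_SHARE',
    p_keep_versions => 7
  );
end;
/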

Return to Previous Version

Sometimes you might need to revert the publish process and return to the previous version. You can do this easily using the procedure:

  • DBMS_SHARE.SET_CURRENT_SHARE_VERSION - set the version as current.

For example, the following procedure REVERT_TO_PREVIOUS_VERSION returns to the previous version by setting the previous version as current and dropping the latest version.

create or replace procedure revert_to_previous_version (
  p_share_name in varchar2
) authid current_user is
  v_latest_version number;
  v_previous_version number;
begin
  /* get latest and previous versions */
  select
    min(share_version), max(share_version)
  into
    v_previous_version, v_latest_version
  from (
    select share_version, status, rank() over (order by share_version desc) as version_rank
    from user_share_versions
    where share_name = p_share_name
  )
  where version_rank <= 2;
  /* set previous version as current and drop the latest version */
  if (v_previous_version < v_latest_version) then
    dbms_share.set_current_share_version (
      share_name => p_share_name,
      share_version => v_previous_version
    );
    dbms_share.drop_share_version (
      share_name => p_share_name,
      share_version => v_latest_version,
      destroy_objects => true
    );
  end if;
end revert_to_previous_version;
/
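
For example, to return GL_JOURNALS_SHARE to its previous version:

begin
  revert_to_previous_version(p_share_name => 'GL_JOURNALS_SHARE');
end;
/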

Visibility of Versions

By default, recipients see the current version of the data share. When you publish a new version or revert to the previous version, recipients automatically access the new current version. In most scenarios this is the desired behaviour, as it means recipients always get the correct data.

However, you can change the default behaviour and allow access to all versions. This is done by modifying the VERSION_ACCESS property of both the data share and the recipient, as shown below.

begin
  dbms_share.update_recipient_property (
    recipient_name => 'SAMPLE_RECIPIENT',
    recipient_property => 'VERSION_ACCESS',
    new_value => dbms_share.version_access_any
  );
  dbms_share.update_share_property (
    share_name => 'GL_JOURNALS_SHARE',
    share_property => 'VERSION_ACCESS',
    new_value => dbms_share.version_access_any
  );
end;
/

Creating Data Share Recipient

Create Recipient

A recipient must be created for every consumer of the data share. The email address is used by Data Studio to send the activation link.

begin
  dbms_share.create_share_recipient (
    recipient_name => 'SAMPLE_RECIPIENT',
    description => 'Recipient used by OAC reporting',
    email => 'some.user@example.com'
  );
  commit;
end;
/

You can validate the created recipient by querying the dictionary view USER_SHARE_RECIPIENTS. Note that the email address is not visible through this view.

select recipient_name, recipient_id, recipient_type, description
from user_share_recipients
where recipient_name = 'SAMPLE_RECIPIENT'
/

Allow Recipient to Access Data Share

A newly created recipient does not have access to any data share; access has to be granted explicitly.

begin
  dbms_share.grant_to_recipient (
    share_name => 'GL_JOURNALS_SHARE',
    recipient_name => 'SAMPLE_RECIPIENT'
  );
end;
/

The data shares a recipient can access are visible in the dictionary view USER_SHARE_RECIPIENT_GRANTS.

select recipient_name, recipient_id, share_name, share_id
from user_share_recipient_grants
where recipient_name = 'SAMPLE_RECIPIENT'
order by share_name
/

Modify Recipient Property

Optionally, you may also need to modify some properties of the recipient.

begin
  dbms_share.update_recipient_property (
    recipient_name => 'SAMPLE_RECIPIENT',
    recipient_property => 'TOKEN_LIFETIME',
    new_value => '90 00:00:00'
  );
  commit;
end;
/

This statement sets the property TOKEN_LIFETIME to 90 days. This property specifies how long the token remains valid. After this time, the recipient loses access to the data share until they request a new token.

Get Data Share Profile

To use a data share, the recipient needs a data share profile. The profile is a small JSON file containing the authentication token, the endpoint of the Delta Sharing server, and a few other attributes. The recipient may either download the profile from the activation link, or you may create the profile and share it with the recipient.

Note that possession of the activation link or the profile enables a person to access the data share. Therefore, you should distribute activation links and profiles securely and protect them like passwords, API keys, and other secrets.

Example of the profile file:

{
  "shareCredentialsVersion" : 1,
  "endpoint" : "https://xxxxxxxxxxxxxxx-myadw.adb.uk-london-1.oraclecloudapps.com/ords/share_provider/_delta_sharing/",
  "bearerToken" : "xxxxxxxxxxxxxxxxxxxxxx",
  "expirationTime": "2024-04-23T07:34:53.702Z",
  "tokenEndpoint" : "https://xxxxxxxxxxxxxxx-myadw.adb.uk-london-1.oraclecloudapps.com/ords/share_provider/oauth/token",
  "clientID" : "xxxxxxxxxxxxxxxxxxxxxxxx",
  "clientSecret" : "xxxxxxxxxxxxxxxxxxxxxxxx"
}

You can generate an activation link and share it with the recipient. When the recipient opens the link in a browser, they can download the profile file. Note that the activation link may be used only once, and it expires after the specified number of seconds.

declare
  v_activation_link varchar2(1000);
begin
  v_activation_link := dbms_share.get_activation_link(
    recipient_name => 'SAMPLE_RECIPIENT',
    expiration => 1*24*60*60
  );
  dbms_output.put_line(v_activation_link);
end;
/

The activation link looks like this:

https://xxxxxxxxxxxxxxx-myadw.adb.uk-london-1.oraclecloudapps.com/ords/_adpshr/delta-sharing/download?key=xxxxxxxxxxxxxxx=

Generate the Profile

As an alternative to the activation link, you can create the profile directly using the POPULATE_SHARE_PROFILE procedure.

declare
  v_share_profile sys.json_object_t;
  v_share_profile_text varchar2(4000);
begin
  dbms_share.populate_share_profile(
    recipient_name => 'SAMPLE_RECIPIENT',
    share_profile => v_share_profile
  );
  v_share_profile_text := v_share_profile.to_string;
  dbms_output.put_line (json_query(v_share_profile_text, '$' pretty));
end;
/

Accessing Data Share

Connect to Data Share

A Delta Sharing client requires the profile file to connect to the share. The following screenshot shows how to create a Pandas dataframe from the data share using the delta_sharing Python client, running in an OCI Data Science notebook.

Access Share from Pandas

Note that I restricted the size of the Pandas dataframe to 10 million rows. The reason is that Pandas loads the whole data set into memory and crashes if the data set is larger than the available memory. For larger data sets it is therefore better to use Spark instead of Pandas, or a larger instance with more memory than the 16 GB I had available.
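
For reference, here is a minimal sketch of the client code from the screenshot. The profile file name and the row limit are my assumptions; the table URL follows the <profile>#<share>.<schema>.<table> convention of the delta_sharing client, with the share, schema, and table names from this post.

import delta_sharing

# profile file downloaded from the activation link
profile_file = 'gl_journals_share_profile.json'

# list the tables available in the share
client = delta_sharing.SharingClient(profile_file)
print(client.list_all_tables())

# load the fact table into a Pandas dataframe, limiting the
# number of rows so that the dataframe fits into memory
table_url = profile_file + '#GL_JOURNALS_SHARE.SHARE_PROVIDER.GL_JOURNALS'
df = delta_sharing.load_as_pandas(table_url, limit=10000000)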

Refresh Authentication Token

You may wonder what happens when the authentication token in the profile expires. In this case, the recipient may request a new token from the Delta Sharing server with a REST POST call to the token endpoint.

Refresh Authentication Token

Note that both the token endpoint and the authentication parameters are provided in the profile. Once you retrieve the new token from the field access_token, you can modify the profile and replace the value of bearerToken with the new token.

The above example uses the Python requests library to call the REST service. You could do the same with Postman or curl.
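
As a sketch, assuming the token endpoint follows the standard OAuth client credentials flow, the request might look like this (the endpoint and credentials below are the placeholder values from the profile above):

import requests

# values taken from the data share profile
token_endpoint = 'https://xxxxxxxxxxxxxxx-myadw.adb.uk-london-1.oraclecloudapps.com/ords/share_provider/oauth/token'
client_id = 'xxxxxxxxxxxxxxxxxxxxxxxx'
client_secret = 'xxxxxxxxxxxxxxxxxxxxxxxx'

# request a new bearer token using the client credentials grant
response = requests.post(
    token_endpoint,
    data={'grant_type': 'client_credentials'},
    auth=(client_id, client_secret),
)
response.raise_for_status()

# replace the bearerToken value in the profile with this token
new_token = response.json()['access_token']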

Resources