High Frequency Inserts into Oracle Database
Introduction
To bulk load large amounts of data into Oracle Database, you typically use Direct Path Load. This approach uses an optimized, direct path API that bypasses the conventional INSERT path and formats and writes database blocks directly to the database. For example, Direct Path Load may be used to load large data sets from Object Storage into Autonomous Data Warehouse via external tables or the DBMS_CLOUD.COPY_DATA utility.
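For illustration, a DBMS_CLOUD.COPY_DATA call loading files from Object Storage into an existing table might look like the sketch below; the credential name and file URI are hypothetical, and `cursor` is assumed to be an open python-oracledb cursor on the Autonomous Database.

# Sketch: Direct Path Load from Object Storage with DBMS_CLOUD.COPY_DATA.
# Credential name and file URI are hypothetical placeholders.
cursor.execute("""
    begin
        dbms_cloud.copy_data(
            table_name      => 'TARGET_TABLE',
            credential_name => 'OBJ_STORE_CRED',
            file_uri_list   => 'https://objectstorage.../b/<bucket>/o/events.csv',
            format          => '{"type" : "csv"}'
        );
    end;""")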
But there are cases when Direct Path Load is not appropriate. Consider an Internet of Things (IoT) workload, with devices generating huge amounts of events. These events have to be inserted into Oracle Database with minimal latency, to support operational analytics and data services. If we used Direct Path Load, we would need to batch the events before loading, and latency would suffer.
In this blog post I compare four methods of inserting data into Oracle Database: three standard methods that use the conventional INSERT path, and the “Fast Ingest” method introduced in Oracle Database 19c. I am interested in the insert performance and scalability of these methods, whether they support compression, and for which use cases they are suitable.
Note that this post is not a benchmark. My objective is a comparison of methods with an out-of-the-box setup, not finding the fastest insert performance on a tuned system. Your performance will differ, depending on your data model, data distribution, data sizes, configuration, and other parameters. I highly recommend you test performance on your own data.
Summary
With an out-of-the-box configuration of Autonomous Data Warehouse (ADW), with a data model and loading program not specifically tuned for high-frequency load, and with a real-life data model with JSON payload, I observed:
- Insert throughput of 173k records per second with Fast Ingest and 16 parallel threads.
- Insert throughput of 118k records per second with Array Interface and 24 parallel threads.
This throughput was achieved with the following configuration:
- Autonomous Data Warehouse Serverless, with 16 ECPUs, Autoscaling disabled, and public access.
- Compute VM with 16 OCPUs, running Oracle Linux and the load generator client.
- Python 3 based load generator, using the python-oracledb package and the LOW service.
- Single non-partitioned, non-indexed target table with JSON payload in a VARCHAR2(4000) column.
- Average record size of 1360 bytes with randomly generated data.
- No other workloads were running on either the ADW or the Compute instance.
With the Array Interface, data was compressed with HCC, achieving a compression ratio of 5-6x. Fast Ingest does not support compression in 19c; it is coming with Oracle Database 23c.
Based on the observations described in this post, I recommend:
- Using the Fast Ingest method if your workload does not require a durability guarantee. Fast Ingest provides the best insert throughput, but there is a chance of data loss if, for example, the instance crashes before newly inserted data is written from the large pool to the target table.
- Using the Array Interface if you require the standard durability guarantee and transaction handling. Inserting data with the Array Interface has lower throughput than Fast Ingest, but there are no limitations regarding the transaction behaviour, structure of the target table, or the insert statement.
- Not using single row inserts (the Single and Batch scenarios), as this method has by far the lowest throughput and provides no other benefits compared to the Array Interface. (The reason I included it in the comparison is mostly to show the difference in throughput.)
Design
Concept
The hypothetical use case I tested is depicted below:
- Devices - sensors, equipment, machines, and other devices that produce events.
- Gateways - gateways on the edge or in the cloud, which collect events from devices, optionally apply transformations or filtering, and insert events into the data store.
- Operational Data Store - database storing the history of events; querying, analyzing, and serving the events to analytical tools or data services.
- Consumers - various applications using device data, such as analytical tools, data services, data sharing tools, AI/ML platforms, or other downstream applications.
This post focuses on the methods for high frequency inserts from gateways to the Operational Data Store. It assumes the Operational Data Store is implemented with Oracle Database, particularly with Autonomous Data Warehouse Serverless.
Simulation
To test the high frequency inserts, I simulated gateways with a Python based load-generator available on GitHub.
- Worker - program that generates data and inserts it into the database. Every worker opens a single connection to the database and inserts data sequentially according to the selected scenario. The number of workers (threads) defines the parallelism of the load.
- Driver - program that initializes the environment, creates and runs the requested number of workers, and produces output statistics after all workers finish.
- Client VM - single compute instance running the driver and workers. Note that the load-generator does not support distributing workers across multiple compute instances.
Scenarios
I tested the following insert methods:
- Single - Records are inserted into the database one-by-one and immediately committed.
- Batch - Records are inserted into the database one-by-one and committed after a batch of records is inserted.
- Array - Records are inserted into the database in arrays and committed after every array insert.
- Fast - Records are inserted into the database in arrays using Fast Ingest, available since Oracle Database 19c.
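All the scenario snippets below reference connection and cursor objects. A minimal sketch of this shared setup with python-oracledb might look like the following; the schema name, password, DSN, and wallet paths are placeholders:

import datetime
import json

import oracledb

# Placeholders: supply your own credentials, DSN, and wallet location.
# The LOW service is used, as described under Execution Parameters.
connection = oracledb.connect(
    user="LOAD_TEST",                     # hypothetical schema
    password="<password>",
    dsn="myadw_low",                      # TNS alias from the ADW wallet
    config_dir="/opt/oracle/wallet",      # wallet directory (mandatory mTLS)
    wallet_location="/opt/oracle/wallet",
    wallet_password="<wallet password>",
)
cursor = connection.cursor()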
Single
This is the “transactional” approach, which inserts data using a single row INSERT statement and immediately commits after the insert. The simplified Python code looks like this:
start_ts = datetime.datetime.today()
while (datetime.datetime.today() - start_ts).total_seconds() <= 600:
    data_array = data_generator()
    for data_record in data_array:
        data_record_js = json.dumps(data_record)
        cursor.execute('insert into target_table (ts, payload) values (:ts, :payload)', ts=datetime.datetime.today(), payload=data_record_js)
        connection.commit()  # commit immediately after every single row insert
Batch
This approach also inserts data using a single row INSERT statement, but it commits the transaction after all records returned by the generator (i.e., the batch) are inserted. Compared to the previous option, this method significantly reduces the frequency of COMMITs.
start_ts = datetime.datetime.today()
while (datetime.datetime.today() - start_ts).total_seconds() <= 600:
    data_array = data_generator()
    for data_record in data_array:
        data_record_js = json.dumps(data_record)
        cursor.execute('insert into target_table (ts, payload) values (:ts, :payload)', ts=datetime.datetime.today(), payload=data_record_js)
    connection.commit()  # commit once per batch
Array
This approach inserts the whole array of rows in a single call, using the Oracle Database array interface. Data is committed after the array insert. Compared to the previous option, this method significantly reduces generated redo and the network round-trips between the client and Oracle Database.
Note that to use the array interface in Python, it is necessary to use executemany() instead of execute().
start_ts = datetime.datetime.today()
while (datetime.datetime.today() - start_ts).total_seconds() <= 600:
    data_array = data_generator()
    insert_array = []
    for data_record in data_array:
        data_record_js = json.dumps(data_record)
        insert_array.append((datetime.datetime.today(), data_record_js))
    cursor.executemany('insert into target_table (ts, payload) values (:ts, :payload)', insert_array)
    connection.commit()  # commit once per array insert
Fast
This approach uses Fast Ingest, available since Oracle Database 19c. Fast Ingest does not use the conventional path. It buffers data in a special area in the large pool and writes it to the target table asynchronously via background processes. This should allow Fast Ingest to support higher data insertion bandwidth than the other methods.
There is no commit and no rollback with Fast Ingest. Data is also not durable until it is written from the large pool to the table by the background processes, hence this approach is not suitable if you need a durability guarantee. See Using the Fast Ingest for more details.
To use Fast Ingest, the table must be created or altered with the MEMOPTIMIZE FOR WRITE clause. Also, you need to use the /*+ MEMOPTIMIZE_WRITE */ hint in the INSERT statement. Note that in Autonomous Database, optimizer hints are ignored by default. Therefore, you must enable hints with alter session set optimizer_ignore_hints = false prior to using Fast Ingest.
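A minimal sketch of this session preparation, assuming the connection setup shown earlier (the final DBMS_MEMOPTIMIZE.WRITE_END call is my assumption of how to force a flush of the session's buffered data; see the DBMS_MEMOPTIMIZE reference in Resources):

# Enable optimizer hints for this session, so that the MEMOPTIMIZE_WRITE
# hint in the INSERT statement below takes effect.
cursor.execute('alter session set optimizer_ignore_hints = false')

# ... run the Fast Ingest insert loop shown below ...

# Assumption: flush this session's buffered Fast Ingest data to the table.
cursor.callproc('dbms_memoptimize.write_end')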
start_ts = datetime.datetime.today()
while (datetime.datetime.today() - start_ts).total_seconds() <= 600:
    data_array = data_generator()
    insert_array = []
    for data_record in data_array:
        data_record_js = json.dumps(data_record)
        insert_array.append((datetime.datetime.today(), data_record_js))
    cursor.executemany('insert /*+ MEMOPTIMIZE_WRITE */ into target_table (ts, payload) values (:ts, :payload)', insert_array)
    # no commit - Fast Ingest data is flushed asynchronously by background processes
Data Model
Target Table
For the test I used a single table with a few metadata columns and a JSON payload. Note the payload is relatively small, allowing the use of the VARCHAR2(4000) data type, which is stored inline in the table and which supports HCC compression of JSON columns.
create table target_table (
    id       varchar2(40) not null,
    run_id   varchar2(40) not null,
    scenario varchar2(20) not null,
    ts       timestamp not null,
    payload  varchar2(4000),
    constraint gl_stream_single_is_json check (payload is json)
)
/
- id - unique identifier of the record.
- run_id - identifier of the testing run.
- scenario - scenario being tested in the testing run.
- ts - timestamp when the record was inserted.
- payload - JSON payload.
As I used an Autonomous Data Warehouse (ADW) instance for the test, the table supports HCC compression out of the box. When you check the table DDL, you can see it is implicitly created with the COMPRESS FOR QUERY HIGH ROW LEVEL LOCKING clause.
However, as compression is not supported with Fast Ingest, to test the Fast Ingest method I had to create the target table with the NOCOMPRESS MEMOPTIMIZE FOR WRITE clause instead.
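For the Fast Ingest runs, the table creation therefore might look like the following sketch (same columns as above, with the compression and memoptimize clauses swapped in):

# Sketch: Fast Ingest variant of the target table, without HCC compression.
cursor.execute("""
    create table target_table (
        id       varchar2(40) not null,
        run_id   varchar2(40) not null,
        scenario varchar2(20) not null,
        ts       timestamp not null,
        payload  varchar2(4000),
        constraint gl_stream_single_is_json check (payload is json)
    ) nocompress memoptimize for write""")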
Payload
The JSON payload mimics General Ledger transactions, with randomly generated data. The JSON structure is simple, without arrays and nested structures. An example payload looks like this:
{
"journal_header_source_code": "NWJ",
"journal_external_reference": "I9GD11AQ6SGZCIPPV52J",
"journal_header_description": "easdtoqyoldrilnkdupgozmsulhgtmvckddihewacjtxvovvz",
"period_code": "202404",
"period_date": "2024-04-30T00:00:00",
"currency_code": "EUR",
"journal_category_code": "LFD",
"journal_posted_date": "2024-04-12T00:00:00",
"journal_created_date": "2024-04-12T00:00:00",
"journal_created_timestamp": "2024-04-12T14:10:54.918070",
"journal_actual_flag": "Y",
"journal_status": "DXO",
"journal_header_name": "emsvrgdockdeesyltygsx",
"reversal_flag": "N",
"reversal_journal_header_source_code": "",
"journal_line_number": 60,
"account_code": "A766",
"organization_code": "R714",
"project_code": "P595",
"journal_line_type": "CR",
"entered_debit_amount": 0,
"entered_credit_amount": 8286.994542015054,
"accounted_debit_amount": 0,
"accounted_credit_amount": 8286.994542015054,
"journal_line_description": "ydhyvsakqhhldvpzlhtufnbtchnubwngmadavupxtmgjwpzegai"
}
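The snippets above call data_generator() without showing it; the actual implementation is in the load-generator repository on GitHub. A minimal sketch of what such a generator might look like (the batch size, field subset, and value ranges below are illustrative, not the actual ones):

import random
import string

def data_generator(batch_size=1000):
    """Return a batch of randomly generated General Ledger-like records."""
    def rand_text(n):
        return ''.join(random.choices(string.ascii_lowercase, k=n))

    records = []
    for _ in range(batch_size):
        records.append({
            "journal_header_source_code": rand_text(3).upper(),
            "journal_header_description": rand_text(50),
            "period_code": "202404",
            "currency_code": random.choice(["EUR", "USD", "GBP"]),
            "journal_line_type": random.choice(["DR", "CR"]),
            "entered_credit_amount": random.uniform(0, 10000),
            # ... remaining fields generated along the same lines ...
        })
    return records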
Configuration
The test configuration consists of an instance of Autonomous Data Warehouse Serverless (ADW) with the target table, and a Compute VM with the load generator program. Both components are deployed in the OCI region UK South (London).
- Autonomous Data Warehouse Serverless - configured with 16/8/4 ECPUs (without Autoscaling), 1 TB of storage, and Secure access from anywhere (i.e., without a private endpoint and with mandatory mTLS). Note that 4 ECPUs correspond approximately to 1 OCPU, i.e., 16 ECPUs to 4 OCPUs.
- Compute VM - configured with the VM.Standard.E4.Flex shape with 16 OCPUs, running Oracle Linux 7.9. The VM is placed in a public subnet and communicates with ADW over a public IP.
- Connectivity - the Compute VM accesses ADW over the SQL*Net protocol using TCP port 1522. The traffic is routed over the Internet Gateway.
Execution Parameters
I tested the insert performance for the following combinations of parameters:
- Test Duration - 10 minutes (600 seconds).
- Service - LOW (as we generate transactional, not analytical load).
- ADW Shape - 16 ECPUs, 8 ECPUs, 4 ECPUs.
- Scenarios - Single, Batch, Array, Fast.
- Number of Workers (Threads) - 1, 2, 4, 8, 16, 24, 32, 40.
The observed results for 16 ECPUs are in the table Test Results below.
Observations
Insert Throughput
The first metric I was interested in is the overall insert throughput, which I calculated as the number of records inserted per second. For 16 ECPUs, the insert throughput looks like this:
- Fast scenario provides the best throughput, with a peak of 173k records per second with 16 threads, and 218k records per second with 40 threads.
- Array scenario comes second, with a peak of 118k records per second with 24 threads. The Fast scenario has approximately 2x higher throughput than the Array scenario for the same number of threads.
- Batch scenario has significantly lower throughput, with a peak of over 22k records per second with 40 threads. The Array scenario has almost 10x higher throughput than the Batch scenario for the same number of threads.
- Single scenario provides (unsurprisingly) the worst throughput, with a peak of over 5k records per second with 40 threads. The Batch scenario has 3-4x higher throughput than the Single scenario for the same number of threads.
Insert Throughput per Thread
The second metric is the average throughput per thread, measured as the number of records inserted per second by one thread, averaged over the threads. For 16 ECPUs, the insert throughput per thread looks like this:
- Fast scenario supports over 12k inserts per second per thread, but only up to 4 threads. With higher parallelism the throughput decreases, with a significant drop at 24 and more threads.
- Array scenario supports over 6k inserts per second per thread, but only up to 8 threads. With more parallelism the throughput gradually decreases, with a bigger drop at 32 and more threads.
- Batch scenario supports over 570 inserts per second per thread. This throughput is consistent across the whole range of threads; there is no significant drop.
- Single scenario supports over 200 inserts per second per thread. This throughput is consistent up to 8 threads; after that the throughput gradually decreases.
Maximum Parallelism for ADW Shape
The next chart shows the maximum level of parallelism that an ADW shape can support, tested with 16 ECPU, 8 ECPU, and 4 ECPU shapes. The maximum level of parallelism is defined as the maximum number of threads for which the throughput still scales.
For the Array scenario, you can clearly see that the maximum parallelism is 8 threads for 4 ECPUs, 16 threads for 8 ECPUs, and 24 threads for 16 ECPUs:
For the Fast scenario, the results are not so obvious. The throughput scales up to 8 threads for 4 ECPUs, 16 threads for 8 ECPUs, and also 16 threads for 16 ECPUs. However, after a dip it seems to grow again.
CPU Utilization of ADW Instance
A possible explanation of the throughput behavior observed in the previous sections is that with higher parallelism we reach the ADW CPU utilization maximum. The following chart shows the median ADW CPU utilization for 16 ECPUs.
- Array scenario is clearly constrained by ADW CPU utilization. Somewhere between 24 and 32 threads the ADW reaches 80% CPU utilization. This observation is well aligned with the maximum parallelism of 24 threads, i.e., the point before the ADW CPU is fully utilized.
- Fast, Batch, and Single scenarios are constrained by some other resource, as there is plenty of spare ADW CPU capacity. Curiously, even though the Fast scenario provides the highest throughput, it needs the least amount of ADW CPU.
Data Compression
The final chart shows how much storage is allocated. To compare the results between scenarios with different amounts of generated data, I measured the average storage required by a single record.
- Fast, Batch, and Single scenarios store data without HCC compression, with an average row size of 1360 bytes. This is no surprise - Fast Ingest does not support compression in 19c and the target table must be explicitly created with NOCOMPRESS. And the Batch and Single scenarios do not support HCC compression by design (single row inserts).
- Array scenario, on the other hand, does support HCC compression. I admit I did not expect that. The average row size is 250 bytes, with a compression ratio of 5-6x. The compression ratio highly depends on the data - in our case, descriptive fields are randomly generated, which probably leads to a lower ratio than normal data.
Note that Fast Ingest in 23c will support compression, as outlined in the blog Oracle Database 23c Fast Ingest Enhancements.
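To check the compression on your own data, you can compare the allocated segment size with the row count; a minimal sketch, assuming the target table above:

# Sketch: derive average bytes per row from the allocated segment size.
cursor.execute("select bytes from user_segments where segment_name = 'TARGET_TABLE'")
used_bytes, = cursor.fetchone()

cursor.execute("select count(*) from target_table")
row_count, = cursor.fetchone()

print(f"bytes per row: {used_bytes / row_count:.0f}")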
Test Results
The table shows results that I used for the charts above. Note these results were produced with 16 ECPUs allocated for the ADW instance, with Autoscaling disabled.
Scenario | Threads | Total Inserts | Inserts per Thread | Inserts per Second | Inserts per Second and Thread | Used Bytes | Bytes per Row | ADW CPU Utilization % | ADW Client Sent Bytes | Client CPU Utilization % |
---|---|---|---|---|---|---|---|---|---|---|
single | 1 | 161476 | 161476 | 269 | 269 | 218103808 | 1351 | 1.31 | 17471416 | 0.17 |
single | 2 | 297206 | 148603 | 495 | 248 | 402653184 | 1355 | 2.68 | 31151767 | 0.3 |
single | 4 | 514973 | 128743 | 858 | 215 | 696254464 | 1352 | 5.12 | 56239732 | 0.49 |
single | 8 | 1006353 | 125794 | 1674 | 209 | 1409286144 | 1400 | 9.92 | 108626727 | 0.92 |
single | 16 | 1909146 | 119322 | 3182 | 199 | 2616197120 | 1370 | 19.51 | 205437397 | 1.94 |
single | 24 | 2665583 | 111066 | 4443 | 185 | 3587178496 | 1346 | 27.88 | 286919810 | 2.92 |
single | 32 | 3024726 | 94523 | 5041 | 158 | 4091543552 | 1353 | 33.19 | 320723761 | 3.53 |
single | 40 | 3447097 | 86177 | 5745 | 144 | 4673503232 | 1356 | 38.78 | 374464698 | 4.05 |
batch | 1 | 312210 | 312210 | 520 | 520 | 419430400 | 1343 | 1.25 | 33544602 | 0.22 |
batch | 2 | 697947 | 348974 | 1163 | 582 | 939524096 | 1346 | 2.69 | 74481470 | 0.47 |
batch | 4 | 1371920 | 342980 | 2287 | 572 | 1866465280 | 1360 | 5.09 | 147525388 | 0.89 |
batch | 8 | 2807957 | 350995 | 4680 | 585 | 3811573760 | 1357 | 10.28 | 302819487 | 1.91 |
batch | 16 | 5699053 | 356191 | 9498 | 594 | 7702839296 | 1352 | 21.1 | 613625382 | 4.14 |
batch | 24 | 8473112 | 353046 | 14122 | 588 | 11460935680 | 1353 | 31.73 | 914679629 | 6.76 |
batch | 32 | 11150937 | 348467 | 18585 | 581 | 15014559744 | 1346 | 43 | 1202846932 | 9.16 |
batch | 40 | 13715453 | 342886 | 22859 | 571 | 18433966080 | 1344 | 54.23 | 1483715019 | 11.62 |
array | 1 | 3905174 | 3905174 | 6509 | 6509 | 998244352 | 256 | 2.62 | 392426992 | 0.97 |
array | 2 | 7449168 | 3724584 | 12436 | 6218 | 1946157056 | 261 | 4.98 | 790648022 | 1.77 |
array | 4 | 15012417 | 3753104 | 25021 | 6255 | 3759144960 | 250 | 10.3 | 1590553386 | 3.55 |
array | 8 | 29071138 | 3633892 | 48452 | 6056 | 7314866176 | 252 | 20.93 | 3074097328 | 7.16 |
array | 16 | 54995783 | 3437236 | 91660 | 5729 | 13689159680 | 249 | 44.71 | 5870053062 | 14.52 |
array | 24 | 71142564 | 2964274 | 118769 | 4949 | 17716740096 | 249 | 70.3 | 7700615372 | 21.44 |
array | 32 | 70129137 | 2191536 | 116882 | 3653 | 17445158912 | 249 | 98.05 | 7291388140 | 24.19 |
array | 40 | 63360903 | 1584023 | 105602 | 2640 | 15777923072 | 249 | 100 | 5820054045 | 21.5 |
fast | 1 | 7375197 | 7375197 | 12292 | 12292 | 10092544000 | 1368 | 1.02 | 774367576 | 1.75 |
fast | 2 | 15095665 | 7547833 | 25201 | 12601 | 20561526784 | 1362 | 2.16 | 1593179223 | 3.39 |
fast | 4 | 30282148 | 7570537 | 50470 | 12618 | 41224765440 | 1361 | 4.18 | 3189663856 | 6.75 |
fast | 8 | 55831396 | 6978925 | 93052 | 11632 | 75959894016 | 1361 | 8.28 | 5875185001 | 14.08 |
fast | 16 | 104297635 | 6518602 | 173829 | 10864 | 141947830272 | 1361 | 16.49 | 11196136570 | 27.05 |
fast | 24 | 102562062 | 4273419 | 170369 | 7099 | 139527716864 | 1360 | 17.38 | 10778883774 | 32.92 |
fast | 32 | 103225392 | 3225794 | 171186 | 5350 | 140403277824 | 1360 | 18.82 | 11678089935 | 32.05 |
fast | 40 | 130837615 | 3270940 | 218063 | 5452 | 177978998784 | 1360 | 23.22 | 13739915894 | 47.56 |
- Scenario - testing scenario.
- Threads - number of processes (workers), inserting data into the target database in parallel.
- Total Inserts - total number of records inserted into the target database by all threads during 10 minutes.
- Inserts per Thread - average number of records inserted into the target database by a single thread during 10 minutes.
- Inserts per Second - average number of records inserted into the target database by all threads during 1 second.
- Inserts per Second and Thread - average number of records inserted into the target database by a single thread during 1 second.
- Used Bytes - number of bytes allocated by all inserted records in the database, as reported by the dictionary view USER_SEGMENTS.
- Bytes per Row - average number of bytes allocated by a single record in the database, as reported by the dictionary view USER_SEGMENTS.
- ADW CPU Utilization % - average CPU utilization of the ADW instance, retrieved from the Monitoring service with the query CpuUtilization[1m]{resourceId = "<ADW OCID>"}.mean().
- ADW Client Sent Bytes - average number of bytes per minute sent from the client VM to the ADW instance, retrieved from the Monitoring service with the query SQLNetBytesFromClient[1m]{resourceId = "<ADW OCID>"}.mean().
- Client CPU Utilization % - average CPU utilization of the client VM instance, retrieved from the Monitoring service with the query CPUUtilization[1m]{resourceId = "<Client VM OCID>"}.mean().
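As a side note, these MQL queries can also be run programmatically; a sketch using the OCI Python SDK, with the compartment OCID and time window as placeholders:

from datetime import datetime, timedelta

import oci

# Placeholders: compartment OCID and the time window of the test run.
config = oci.config.from_file()
client = oci.monitoring.MonitoringClient(config)

response = client.summarize_metrics_data(
    compartment_id="ocid1.compartment.oc1..<placeholder>",
    summarize_metrics_data_details=oci.monitoring.models.SummarizeMetricsDataDetails(
        namespace="oci_autonomous_database",
        query='CpuUtilization[1m]{resourceId = "<ADW OCID>"}.mean()',
        start_time=datetime.utcnow() - timedelta(minutes=10),
        end_time=datetime.utcnow(),
    ),
)
for metric in response.data:
    print(metric.name, [point.value for point in metric.aggregated_datapoints])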
Considerations
Changes in Data Model
For your workload, you might want to consider some changes in the data model that will impact throughput. You should test the throughput with your data model to get realistic figures.
- Change the data model for your data.
- Add primary or unique key constraints.
- Add analytical indexes.
- Change JSON data type to BLOB if you have large JSON documents.
- Partition the target table by time (see the sketch after this list).
- Insert data into multiple target tables.
- Add materialized views with refresh ON COMMIT.
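As an illustration of the partitioning option above, a time-partitioned variant of the target table might be sketched as follows; daily interval partitioning and the initial boundary date are my assumptions:

# Sketch: time-partitioned variant of the target table (daily interval
# partitioning on ts; the initial boundary date is illustrative).
cursor.execute("""
    create table target_table (
        id       varchar2(40) not null,
        run_id   varchar2(40) not null,
        scenario varchar2(20) not null,
        ts       timestamp not null,
        payload  varchar2(4000),
        constraint gl_stream_single_is_json check (payload is json)
    )
    partition by range (ts)
    interval (numtodsinterval(1, 'DAY'))
    (partition p0 values less than (timestamp '2024-01-01 00:00:00'))""")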
Tuning
For large data volumes, requiring performance beyond the figures achieved with out-of-the-box setup, you might need additional tuning of the data model, data loader, or configuration. Some options to consider are below.
- Scale up the ADW instance and/or enable Compute Autoscaling.
- Eliminate gateways in the traffic between clients and database.
- Analyze the contention bottlenecks for Fast and/or Array scenarios.
- Optimize the array size and commit frequency (see the sketch after this list).
- Partition for scalability.
- Use Direct Path Load with micro-batching.
- Shard with Oracle Sharding or manually.
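For illustration, decoupling the array size from the commit frequency (one of the options above) could be sketched as follows; the constants and the batches() helper are hypothetical, and insert_array is the row list built in the Array scenario:

# Sketch: tunable array size and commit frequency (values are illustrative).
ARRAY_SIZE = 5000       # rows per executemany() call
COMMIT_EVERY = 4        # commit once per N array inserts

def batches(rows, size):
    # Hypothetical helper: split rows into chunks of at most `size`.
    for i in range(0, len(rows), size):
        yield rows[i:i + size]

arrays_since_commit = 0
for batch in batches(insert_array, ARRAY_SIZE):
    cursor.executemany('insert into target_table (ts, payload) values (:ts, :payload)', batch)
    arrays_since_commit += 1
    if arrays_since_commit >= COMMIT_EVERY:
        connection.commit()
        arrays_since_commit = 0
connection.commit()  # commit the remaining tail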
I also recommend reviewing the Oracle White Paper by Andy Rivenes, Best Practices For High Volume IoT workloads with Oracle Database 19c, for ideas on other optimization methods.
Resources
- Using Oracle Autonomous Database Serverless: Use Fast Ingest on Autonomous Database.
- Oracle Database 19c, Database Performance Tuning Guide, Tuning the System Global Area: Using the Fast Ingest.
- Oracle Database 19c, Utilities: Data Loading Methods.
- Oracle Database 19c, PL/SQL Packages and Types Reference: DBMS_MEMOPTIMIZE.
- Oracle Database In-Memory blog by Andy Rivenes: New in Oracle Database 19c: Memoptimized Rowstore – Fast Ingest.
- Oracle Database In-Memory blog by Andy Rivenes: Memoptimized Rowstore - Fast Ingest Updates.
- Oracle Database In-Memory blog by Andy Rivenes: Oracle Database 23c Fast Ingest Enhancements.
- Oracle White Paper by Andy Rivenes: Best Practices For High Volume IoT workloads with Oracle Database 19c.
- Program to test high frequency inserts: load-generator.