Infrastruktur & Datenbanktechnologie06. Juni 2023Tom Lieber

Exasol Virtual Schema Performancetest

In today’s world, where companies manage vast amounts of data, efficient data management is increasingly crucial. One solution to optimize data handling and improve performance is the use of Virtual Schemas in Exasol, a high-performance in-memory database.

This performance test demonstrates how Exasol Virtual Schemas can reduce the active data footprint in an Exasol cluster environment. The underlying concept is to offload data to a smaller Exasol cluster equipped with a storage (memory) license, enabling better resource utilization.

Offloading data to a smaller cluster enhances performance by reducing the volume of active data stored on the primary Exasol instance. This approach also offers greater flexibility, as data does not need to remain on the main cluster at all times, leading to more efficient resource allocation.

For this test, the Virtual Schema Adapter—written in Java—was used to interact with Exasol via JDBC, showcasing how Virtual Schemas can simplify data management while maintaining high performance.

First we create the schema and the adapter:

CREATE SCHEMA X_SCHEMA_FOR_VS_SCRIPT;


CREATE OR REPLACE JAVA ADAPTER SCRIPT
X_SCHEMA_FOR_VS_SCRIPT.ADAPTER_SCRIPT_EXASOL AS
	%scriptclass com.exasol.adapter.RequestDispatcher;
	%jar /buckets/bfsdefault/vschema/virtual-schema-dist-10.5.0-exasol-
7.1.1.jar;
/

We then define two connections to our Exasol instance, one JDBC and one native Exasol connection:

CREATE OR REPLACE CONNECTION JDBC_CONNECTION_EXA_DEV1
TO 'jdbc:exa:1.112.32.331..333/FINGERPRINT:8565'
USER 'SYS'
IDENTIFIED BY 'xx';


CREATE OR REPLACE CONNECTION EXA_CONNECTION_DEV1
TO '1.112.32.331..333/FINGERPRINT:8565'
USER 'SYS'
IDENTIFIED BY 'xx';

After that we create the Virtual Schema with the JDBC connection we created earlier:

CREATE VIRTUAL SCHEMA VIRTUAL_EXASOL_DEV1
USING X_SCHEMA_FOR_VS_SCRIPT.ADAPTER_SCRIPT_EXASOL WITH
	CONNECTION_NAME = 'JDBC_CONNECTION_EXA_DEV1'
	SCHEMA_NAME = 'HISTORY_ARCHIVE'
	IMPORT_FROM_EXA = 'true'
	EXA_CONNECTION = 'EXA_CONNECTION_DEV1'
	MAX_TABLE_COUNT = '10000';

After that, the data will be merged with the data from the other source:

Since we do not want that another query is necessary than before. It should remain transparent for our customer!

create or replace view FULL_DATA
as select * from ACTUAL_DATA
where datadate > sysdate-interval '1' MONTH
union all
select * from VIRTUAL_EXASOL_DEV1.ARCHIVE_DATA
where datadate < sysdate- interval '1' MONTH;

When we do a performance test we see that there is a clear difference if we do a query on a large database or on 2 databases when they are linked via Union All and Virtual Schema.

[IMAGE]

DEV1 is really slow – but cheap!

The optimizer of course passes the Where Clause, but still fetches all the necessary data over the network. And that takes time. Note: Runtime in seconds.

The performance of queries using Virtual Schemas in Exasol depends heavily on the complexity and size of the query. For calculations requiring only a small amount of data, the impact on performance is less significant. However, in our tests, we observed a slowdown factor ranging from 11 to 157, even with simple queries. This highlights the importance of understanding these limitations, as it provides valuable insights for optimizing data management and query strategies in similar environments.

SPHINX.AT

YAITCON