Skip to main content

Materialized Views and Stream Processing

The core functionality of RisingWave is stream processing, and the way stream processing is presented in streaming databases is through materialized views. This section explains the purpose and usage of RisingWave materialized views.

Characteristics of RisingWave's Materialized Views

Materialized views are not unique to streaming databases. In fact, traditional databases such as PostgreSQL, data warehouses like Redshift and Snowflake, and OLAP databases like ClickHouse and Apache Pinot, all have materialized views. However, RisingWave's materialized views have several important features compared to those in other databases:

Real-time: Many databases update materialized views asynchronously or require manual updates by users. However, materialized views in RisingWave are updated synchronously, ensuring users always query the freshest results. Even for complex queries involving joins and windowing, RisingWave efficiently handles synchronous processing to maintain the freshness of materialized views.

Consistency: Some databases only provide eventually consistent materialized views, meaning the results on materialized views are approximate or contain errors. Especially when users create multiple materialized views with different refresh strategies, it becomes challenging to see consistent results across materialized views. Materialized views in RisingWave are consistent, ensuring that users always see correct results even when accessing multiple materialized views.

High availability: RisingWave persists materialized views and implements frequent checkpoints for fast failure recovery. When a physical node running RisingWave experiences a failure, RisingWave achieves recovery within seconds and updates the calculation results to the latest state in seconds.

High concurrency: RisingWave supports high-concurrency ad-hoc queries. Since RisingWave persistently stores data in remote object storage in real-time, users can dynamically configure the number of query nodes based on the workload, effectively supporting business requirements.

Stream processing semantics: In stream processing, users can apply high-level syntax such as time windows and watermarks to process data streams. Traditional databases do not have these semantics, so users often rely on external systems to handle these semantics. RisingWave is a stream processing system that comes with various complex stream processing semantics, allowing users to operate on data streams using SQL statements.

Resource isolation: Materialized views involve continuous stream computation and consume a significant amount of computing resources. To avoid interference between materialized view computations and other computations, some users transfer the materialized view functionality from OLTP or OLAP databases to RisingWave, achieving resource isolation.

Stream Processing without Materialized Views

In RisingWave, although materialized views are an important presentation of stream processing, it does not mean that users can only create materialized views to perform stream processing. In fact, for simple ETL computations, i.e., using RisingWave merely as a stream processing pipeline to process data generated by upstream systems and send it to downstream systems, there is no need to use materialized views. Users can simply use the create sink statement to directly perform stream processing and export the results.

Sample Code

Most of you are likely familiar with how to create materialized views in PostgreSQL. Here, we demonstrate how to create cascading materialized views in RisingWave, that is, materialized views on top of other materialized views.

We want to create tables t1 and t2, as well as sources s1 and s2. After that, we create a materialized view mv1 on top of t1 and t2, then materialized view mv2 on top of mv1 and mv2. Finally, we create a materialized view mv on mv1 and mv2.

Let's first create t1 t2 s1 s2:

CREATE TABLE t1 (v1 int, v2 int) 
WITH (
connector = 'datagen',

fields.v1.kind = 'sequence',
fields.v1.start = '1',

fields.v2.kind = 'random',
fields.v2.min = '-10',
fields.v2.max = '10',
fields.v2.seed = '1',

datagen.rows.per.second = '10'
) ROW FORMAT JSON;

CREATE TABLE t2 (v3 int, v4 int)
WITH (
connector = 'datagen',

fields.v3.kind = 'sequence',
fields.v3.start = '1',

fields.v4.kind = 'random',
fields.v4.min = '-10',
fields.v4.max = '10',
fields.v4.seed = '1',

datagen.rows.per.second = '10'
) ROW FORMAT JSON;

CREATE SOURCE s1 (w1 int, w2 int)
WITH (
connector = 'datagen',

fields.w1.kind = 'sequence',
fields.w1.start = '1',

fields.w2.kind = 'random',
fields.w2.min = '-10',
fields.w2.max = '10',
fields.w2.seed = '1',

datagen.rows.per.second = '10'
) ROW FORMAT JSON;


CREATE SOURCE s2 (w3 int, w4 int)
WITH (
connector = 'datagen',

fields.w3.kind = 'sequence',
fields.w3.start = '1',

fields.w4.kind = 'random',
fields.w4.min = '-10',
fields.w4.max = '10',
fields.w4.seed = '1',

datagen.rows.per.second = '10'
) ROW FORMAT JSON;

Then create materialized views mv1 and mv2:

create materialized view mv1 as select v2, w2 from t1, s1 where v1 = w1;
create materialized view mv2 as select v4, w4 from t2, s2 where v3 = w3;

Finally we create mv:

create materialized view mv as select w2, w4 from mv1, mv2 where v2 = v4;

Let's verify whether the cascading materialized views have been updated in a timely manner. We will repeatedly perform the following query:

select count(*) from mv;

We should see the results continuously changing. Below is the sample result:

 count
-------
8092
(1 row)