Summary: Information about Consolidation queries, including what they are, how they work, scheduling/running, and more.
Here are some common questions about consolidation queries.
A consolidation query is a query Alooma uses to generate a 1:1 replica of a data source in a target data warehouse after raw event data is loaded to the warehouse. Consolidation queries were "born" due to the append-only nature of some data warehouses, namely Redshift, to simulate "upsert" behavior.
A common use case: A company has an updateable application database (e.g. PostgreSQL, MySQL), as part of their production environment. The company's analytics solutions and applications that rely on the database both suffer from performance degradation as they churn the database — an issue that gets worse as the data grows. The company's data scientists would like their tables to be replicated to Redshift, with all the INSERTs, UPDATEs and DELETEs accounted for, to lessen the load on the DB and improve their analytics solution performance.
But with Redshift as an append-only-oriented data warehouse, how can this be done in a performant and reliable way? By using consolidation queries to bridge the gap.
The 1:1 replica creation process has two stages:
Data is pulled from the source along with metadata describing the action type the data originated from. e.g. INSERT statement, DELETE statement, UPDATE statement, or others.
This data is then loaded along with the metadata into a "log table" in the data warehouse, which then contains a superset of all the versions of a row in the data source. e.g. if a row is created and then deleted, while the source will now have 0 versions of said row, the data warehouse will contain 2 - one version corresponding to the INSERT, and one to the DELETE.
A consolidation query is run on said log table, selecting only the latest version of each and every database row, and INSERTing that version into a final table - the "consolidated" table. This table always contains an exact 1:1 replica of the data from the point in time at which the query ran.
This process allows Alooma to create a 1:1 replica of the source database in a performant and reliable way using only INSERT statements, without any usage of DELETEs or UPDATEs.
The consolidation interval for BigQuery is every 8 hours. For Redshift and Snowflake outputs the default frequency is every 15 minutes. There are no consolidation queries for Azure.
Depending on the amount and frequency of data coming in, and the data warehouse cost of running consolidations, you may wish to change the interval. To change the interval, contact us.
Lucky for you - you don't need to! When creating an input that requires consolidation (like MySQL, PostgreSQL, or Salesforce) a consolidation query is generated and run on your target data warehouse automatically by Alooma.
You can see your existing queries via the Queries tab.
Since Alooma is all about real time (and, while close, even a 15 minute delay does not always cut it!), as a part of the consolidation process, we create a SQL VIEW we call the "NRTVIEW" (NRT for Near Real-Time). When querying this view, you will always get a real-time representation of your source data (with only a 2 minute or less delay).