You’ve got data in your transactional database - MySQL, Oracle, Microsoft SQL Server, PostgreSQL, MongoDB, or Elasticsearch. Awesome, we can handle that!
When you create a new transactional database input, Alooma gives you up to 3 ways to replicate, each optimized for different data use cases:
- Full dump and load
- Change data capture (aka log replication)
What's the difference? Let’s dive in!
Full Dump and Load Replication
With full dump and load, you define a replication interval (such as every 4 hours). On that interval, we query each of the tables you’re replicating, take a full snapshot (dump) of the table, and replace the previous snapshot (load) in your target data warehouse.
Advantage: Setup is simple, and this method works best for small tables.
Disadvantage: Tables only update every few hours due to the time it takes to perform the dump. This can also be resource heavy, which may slow down your database, especially for larger tables.
With incremental, you define an “update indicator column” (such as
updated_at) for each of the tables you're replicating. Every time a row in your database is updated, it gets the next largest value in the updated indicator column.
Every few seconds, we’ll query your tables for what’s changed since the last time we checked, leveraging the update indicator column. We’ll copy those incremental changes to your target data warehouse, and merge them in every 15 minutes. We'll also provide you with a view to see the changes in near real time.
Advantage: Setup is simple, doesn’t create a lot of load on your database, and is low latency.
Disadvantage: Requires an update indicator column for each table. Requires additional work to capture deletes.
Change Data Capture (CDC) aka Log Replication
The state of the art in database replication, CDC is the fastest and most reliable way to replicate. For the databases that support it (MySQL, self-hosted PostgreSQL, and MongoDB), we query the database’s internal log every few seconds to get the latest changes. We’ll copy that log of changes to your target data warehouse, and merge them in every 15 minutes. We'll also provide you with a view to see the changes in near real time.
Advantage: State of the art – enables near real time replication with very low query impact.
Disadvantage: Requires more initial setup work.
To support replication, Alooma creates 2 tables and one view in your target data warehouse:
Final (aka “Consolidated”) Table
This is the table which represents the source table from your input. Generally given the same name as your source table, the final table is updated every 15 minutes for incremental and CDC replication, or on the replication interval you defined for full dump and load replication.
When should I query the final table? When query speed is super important to you, and a 15 minute data delay is acceptable.
This is where we write new data (be it the latest dump data, or the freshest incremental or log data) before it’s merged into the final table. For each final table, there’s a matching
_log table. Every time we merge, we clean out some old data, to keep the log table small and the merges fast.
When should I query the log table? Never! Leave the querying to us :)
Sometimes the 15 minute wait for the final table to be updated is just not fast enough. For that we created the Near Realtime (NRT) view, which for each table appears as a matching
_nrt view. It gives you the same data as the final table, but on demand! The view can take up to a couple minutes to generate, which lowers the data latency from 15 minutes to just a couple of minutes.
When should I query the NRT view? When the freshness of the data is the most important, and it’s ok that the query can take a couple more minutes to run.
As explained above, Alooma utilizes a log table as a temporary holding place for new data from your source tables. Then, at a regular interval of 15 (configurable) minutes for incremental and CDC, or the replication interval of your choice for full dump and load, Alooma merges that data into your final table using what we call “consolidation”.
This process is due to the limitations of data warehouses, which were built for appends and not for updates like transactional databases. Check out our consolidation documentation for more info on this process, and why it's required.
Want to learn more about our replication options and what’s best for your data? Talk to us!