You’ve got data in your transactional database - MySQL, Oracle, Microsoft SQL Server, PostgreSQL, MongoDB, or Elasticsearch. Awesome, we can handle that!
When you create a new transactional database input, Alooma gives you up to 3 ways to replicate, each optimized for different data use cases:
Full dump and load
Change data capture (CDC aka log replication)
What's the difference? Let’s dive in!
With full dump and load, you define a replication interval (such as every 4 hours). On that interval, we query each of the tables you’re replicating, take a full snapshot (dump) of the table, and replace the previous snapshot (load) in your target data warehouse.
Advantage: Setup is simple, and this method works best for small tables.
Disadvantage: Tables only update every few hours due to the time it takes to perform the dump. This can also be resource heavy, which may slow down your database, especially for larger tables.
With incremental, you define an “update indicator column” (such
updated_at) for each of the tables you're
replicating. Every time a row in your database is updated, it gets
the next largest value in the updated indicator column.
Every few seconds, we’ll query your tables for what’s changed since the last time we checked, leveraging the update indicator column. We’ll copy those incremental changes to your target data warehouse, and merge them in every 15 minutes. We'll also provide you with a view to see the changes in near real time.
Advantage: Setup is simple, doesn’t create a lot of load on your database, and is low latency.
Disadvantage: Requires an update indicator column for each table. Requires additional work to capture deletes.
The state of the art in database replication, CDC is the fastest and most reliable way to replicate. For the databases that support it (Microsoft SQL Server, MongoDB, MySQL, and PostgreSQL), we query the database’s internal log every few seconds to get the latest changes. We’ll copy that log of changes to your target data warehouse, and merge them in every 15 minutes. We'll also provide you with a view to see the changes in near real time.
Advantage: State of the art – enables near real time replication with very low query impact.
Disadvantage: Requires more initial setup work.
The type of replication you use will dictate some of the fields included in the event metadata. See this article for more information.
To support replication, Alooma creates 2 tables and one view in your target data warehouse:
This is the table which represents the source table from your input. Generally given the same name as your source table, the final table is updated every 15 minutes for incremental and CDC replication, or on the replication interval you defined for full dump and load replication.
When should I query the final table? When query speed is super important to you, and a 15 minute data delay is acceptable.
This is where we write new data (be it the latest dump data, or
the freshest incremental or log data) before it’s merged into the final table. For each final table,
there’s a matching
_log table. Every time we merge, we
clean out some old data, to keep the log table small and the merges
When should I query the log table? Never! Leave the querying to us :)
Sometimes the 15 minute wait for the final table to be updated
is just not fast enough. For that we created the Near Realtime
(NRT) view, which for each table appears as a matching
_nrt view. It gives you the same data as the final
table, but on demand! The view can take up to a couple minutes to
generate, which lowers the data latency from 15 minutes to just a
couple of minutes.
When should I query the NRT view? When the freshness of the data is the most important, and it’s ok that the query can take a couple more minutes to run.
As explained above, Alooma utilizes a log table as a temporary holding place for new data from your source tables. Then, at a regular interval of 15 (configurable) minutes for incremental and CDC, or the replication interval of your choice for full dump and load, Alooma merges that data into your final table using what we call “consolidation”.
This process is due to the limitations of data warehouses, which were built for appends and not for updates like transactional databases. Check out our Consolidation documentation for more info on this process, and why it's required.
Want to learn more about our replication options and what’s best for your data? Talk to us!