MongoDB for Data-Intensive Applications

MongoDB is a scalable, schema-less NoSQL document database, which makes it a handy persistence layer for applications. The two MongoDB features covered in this post, views and change streams, can turn your application into a full-fledged back-end engine for data-intensive workloads.

Change Streams

Since MongoDB 3.6 you can use change streams to create event-driven applications for data processing. Whenever a document in a watched collection is inserted, updated, replaced, or deleted, MongoDB emits a change event describing the modification. With this functionality you can implement the change data capture (CDC) design pattern, so that other systems or applications automatically consume the changes in real time. This is useful in a variety of scenarios, such as keeping a distributed cache up to date, triggering notifications, or synchronizing data across different systems.
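
As a rough illustration, consuming change events from Python could look like the sketch below. It assumes the PyMongo driver, whose Collection.watch() returns a change stream that can be iterated inside a with block, and a MongoDB deployment that runs as a replica set or sharded cluster. The function name consume_changes, the handler callback, and the max_events limit are made up for this example.

```python
from typing import Any, Callable, Optional

# Only react to inserts, updates and replacements; other events are filtered out.
PIPELINE = [{"$match": {"operationType": {"$in": ["insert", "update", "replace"]}}}]


def consume_changes(collection: Any,
                    handler: Callable[[dict], None],
                    max_events: Optional[int] = None) -> int:
    """Pass each change event of `collection` to `handler`.

    `collection` is assumed to be a pymongo Collection (or any object with a
    compatible `watch()` method). Returns the number of handled events.
    """
    handled = 0
    # full_document="updateLookup" asks MongoDB to attach the current version
    # of the document to update events, not only the changed fields.
    with collection.watch(PIPELINE, full_document="updateLookup") as stream:
        for event in stream:
            handler(event)
            handled += 1
            if max_events is not None and handled >= max_events:
                break
    return handled
```

Passing the collection in as an argument keeps the sketch independent of how the connection is created, and the $match stage shows that events can already be filtered on the server side.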

One of the key benefits of using change streams is that they are integrated into MongoDB natively, so there is no need to set up a separate event processing engine or an orchestration tool. Additionally, change streams are available on replica sets and sharded clusters, so they scale with the deployment and can handle high volumes of updates.

A common way to run ETL jobs is through an orchestration tool, which triggers one task after another on a periodic schedule and manages the dependencies between them, as in the illustration below.

Orchestration with database

Replacing this approach with real-time processing means that every transformation script watches the output collection of the previous step. The dependencies are then managed by MongoDB, and each step runs as soon as new data arrives. The data is written from one collection to the next. To minimize the disk footprint, only the derived or updated data is added to the output collection, together with a reference to the original document.

MongoDB change stream
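
One step of such a chain could be sketched as follows. This is only an outline under a few assumptions: the events come from a change stream on the previous step's collection (opened so that each event carries a fullDocument), the output is a PyMongo collection, and the field name source_id used for the reference to the original document is invented for this example.

```python
from typing import Any, Callable, Iterable


def run_step(events: Iterable[dict],
             output: Any,
             transform: Callable[[dict], dict]) -> int:
    """Write derived fields for each changed source document to `output`.

    `events` is an iterable of change events (for example a pymongo change
    stream on the previous step's collection) and `output` is assumed to be
    a pymongo Collection. Returns the number of documents written.
    """
    written = 0
    for event in events:
        source = event.get("fullDocument")
        if source is None:  # e.g. delete events carry no document
            continue
        derived = transform(source)
        # Store only the derived fields plus a reference to the original.
        # The upsert overwrites earlier results for the same source document,
        # so replaying an event does not create duplicates.
        output.replace_one(
            {"source_id": source["_id"]},
            {"source_id": source["_id"], **derived},
            upsert=True,
        )
        written += 1
    return written
```

Because the output collection is itself watchable, the next script in the chain can run the same function on it with its own transform.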

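If a transformation script fails, it needs to be restarted and resume its work where it left off. MongoDB supports this: every change event carries a resume token, and a new change stream can be opened from that token as long as the corresponding entry is still in the oplog.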
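
A restartable consumer could be sketched like this. It assumes PyMongo, where Collection.watch() accepts a resume_after argument and the _id field of each change event serves as the resume token. The load_token and save_token callbacks are placeholders for whatever storage the script uses (a file, another collection), and max_events exists only to keep the example easy to try out.

```python
from typing import Any, Callable, Optional


def watch_with_resume(collection: Any,
                      handler: Callable[[dict], None],
                      load_token: Callable[[], Optional[dict]],
                      save_token: Callable[[dict], None],
                      max_events: Optional[int] = None) -> int:
    """Handle change events, remembering the position after each one.

    `collection` is assumed to be a pymongo Collection. Returns the number
    of events handled in this run.
    """
    handled = 0
    # resume_after=None starts a fresh stream; otherwise the stream continues
    # right after the event identified by the stored token.
    with collection.watch(resume_after=load_token()) as stream:
        for event in stream:
            handler(event)
            # The `_id` of a change event is its resume token. It is saved
            # only after the handler succeeded, so a crash replays the event.
            save_token(event["_id"])
            handled += 1
            if max_events is not None and handled >= max_events:
                break
    return handled
```

Saving the token after the handler means an event may be processed twice after a crash, so the handler should be idempotent.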

Views

MongoDB offers a feature familiar from SQL databases: views.

A MongoDB view is a read-only queryable object whose contents are defined by an aggregation pipeline on other collections or views. MongoDB does not persist the view contents to disk. A view’s content is computed on-demand when a client queries the view.

These properties make views useful for hiding joins from the main query or for deriving computed fields at query time. Reporting and dashboarding typically have these needs. Instead of building a read-only API, you can provide a view.
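
A view that hides a join and adds a computed field could be defined roughly as follows. The sketch assumes PyMongo, where a view is created by passing viewOn and pipeline to Database.create_collection(). The collections orders and customers, their fields, and the view name order_report are hypothetical.

```python
from typing import Any

# Hypothetical schema: `orders` documents reference `customers` through
# `customer_id` and carry `quantity` and `unit_price` fields.
ORDER_REPORT_PIPELINE = [
    # Join each order with its customer document.
    {"$lookup": {
        "from": "customers",
        "localField": "customer_id",
        "foreignField": "_id",
        "as": "customer",
    }},
    # $lookup produces an array; turn it into a single embedded document.
    {"$unwind": "$customer"},
    # Computed field, evaluated whenever the view is queried.
    {"$addFields": {"total": {"$multiply": ["$quantity", "$unit_price"]}}},
    # Expose only what the report needs.
    {"$project": {"customer_name": "$customer.name", "total": 1}},
]


def create_order_report_view(db: Any, name: str = "order_report") -> None:
    """Create a read-only view on `orders`.

    `db` is assumed to be a pymongo Database.
    """
    db.create_collection(name, viewOn="orders", pipeline=ORDER_REPORT_PIPELINE)
```

A dashboard can then query the view like a normal collection, for example with db["order_report"].find(), without knowing about the join behind it.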

Summary

In conclusion, MongoDB is a powerful NoSQL database that provides various features to support data-intensive applications. The change streams functionality enables real-time processing and event-driven applications without the need for external tools or engines. Views are also a useful tool for reporting and dashboarding purposes. Overall, MongoDB’s scalability and flexibility make it a great choice for modern data applications.