Cross Aggregate Validations: Exploring Set-Based Validation Techniques in Event Sourcing

In event-sourced systems, set-based validation means enforcing a business rule that spans a set of aggregates rather than a single one; uniqueness is the classic example. This kind of cross-aggregate validation is challenging because each aggregate guards only its own consistency boundary, so no single aggregate can see whether a value is already in use elsewhere. This article explores several approaches to cross-aggregate validation and weighs their pros and cons, using a uniqueness constraint as the running example.

The Challenge: Enforcing Unique Instagram Page IDs in an Advertising Platform

Our case study is an advertising platform to which customers add Instagram pages. Each Instagram page added to the platform must have a unique InstagramPageId. The techniques below are different ways to enforce that requirement.

  • Loading All Event Streams: The most basic approach to set-based validation is to load every event stream of the aggregate type in question. For Instagram pages, this means loading millions of event streams and applying all of their events to reconstitute each Instagram page object, then checking the new page ID against them.

    • Pros:

      • Simplicity: Loading all event streams is a straightforward solution to implement, especially for simpler or academic projects.

      • Fits the Event Sourcing Model: Because it only loads and applies events in sequence, it reuses the mechanism an event-sourced system already has for rebuilding state.

    • Cons:

      • Low Performance: Loading and processing millions of event streams can introduce significant performance bottlenecks, especially as the size of the event streams grows over time.

      • Scalability Challenges: As the number of event streams increases, the approach becomes less scalable, potentially impacting the responsiveness of the system.

      • Resource Intensive: Loading all event streams can consume substantial memory and processing resources, further impacting system performance.
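The approach above can be sketched as follows. This is a minimal illustration under stated assumptions, not a production design: the event store is assumed to be a plain dictionary mapping a stream ID to its list of events, and the `Event`, `InstagramPage`, `PageAdded`, and `PageRemoved` names are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Event:
    type: str      # hypothetical event names: "PageAdded", "PageRemoved"
    page_id: str   # the InstagramPageId the event refers to


@dataclass
class InstagramPage:
    instagram_page_id: Optional[str] = None
    removed: bool = False

    def apply(self, event: Event) -> None:
        # Fold one event into the current state of the aggregate.
        if event.type == "PageAdded":
            self.instagram_page_id = event.page_id
            self.removed = False
        elif event.type == "PageRemoved":
            self.removed = True


def reconstitute(events: list[Event]) -> InstagramPage:
    # Rebuild one aggregate by replaying its whole stream in order.
    page = InstagramPage()
    for event in events:
        page.apply(event)
    return page


def is_page_id_unique(streams: dict[str, list[Event]], candidate: str) -> bool:
    # Replays EVERY stream on every check; this is the expensive part.
    for events in streams.values():
        page = reconstitute(events)
        if not page.removed and page.instagram_page_id == candidate:
            return False
    return True
```

Each call replays every stream, so the cost of a single check grows with the total number of stored events, which is the performance problem listed in the cons.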

  • Uniqueness Validation using In-Memory Read Models: To validate the uniqueness of attributes like Instagram page IDs, an in-memory read model can be created. This read model stores the necessary information and allows quick access for validation during write operations. When adding a new Instagram page, the read model is consulted to check whether the desired page ID is already in use.

    • Pros:

      • Ease of Implementation: An in-memory read model for uniqueness validation is relatively straightforward to implement, and lookups against it are fast.

    • Cons:

      • Eventual Consistency: Read models are usually updated asynchronously from the event stream, so they are eventually consistent and may lag behind the latest writes. Until the read model catches up, a validation can return an incorrect result.

      • Time-Dependent Validations: The result of a check depends on when it runs relative to other writes, which makes it subject to race conditions. If multiple write operations try to add Instagram pages with the same ID at the same time, each can pass the check before the read model reflects the others, and duplicates are written.
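A minimal sketch of the read-model check, including the race it is exposed to. The class and function names are hypothetical, events are plain dictionaries, and the event log is an in-memory list standing in for the event store; the projection is applied explicitly to make the lag visible.

```python
class PageIdReadModel:
    """In-memory set of page IDs, kept up to date by projecting events."""

    def __init__(self):
        self._page_ids = set()

    def project(self, event):
        # Called by the projection some time AFTER the event was written.
        if event["type"] == "PageAdded":
            self._page_ids.add(event["page_id"])

    def is_taken(self, page_id):
        return page_id in self._page_ids


def handle_add_page(read_model, event_log, page_id):
    # Write-side check: consult the read model, then append the event.
    if read_model.is_taken(page_id):
        return False
    event_log.append({"type": "PageAdded", "page_id": page_id})
    return True


read_model = PageIdReadModel()
event_log = []

# Two commands arrive before the projection has run: both pass the check.
first = handle_add_page(read_model, event_log, "ig-1")
second = handle_add_page(read_model, event_log, "ig-1")

# The projection catches up; only now does the read model reject the ID.
for event in event_log:
    read_model.project(event)
third = handle_add_page(read_model, event_log, "ig-1")
```

Here `first` and `second` are both accepted because the read model is stale, so the log ends up with a duplicate; `third` is rejected once the projection has caught up.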

  • Index Tables: To validate uniqueness, we can create an index table in a relational database that stores Instagram page IDs as its primary key. Before appending events to the event stream, we insert a new record into this table. If the insert does not raise a DuplicatePrimaryKey error, the page ID is unique and event creation can proceed; if it does, the command is rejected.

    • Cons:

      • Relational Database Scaling Problem: Every new page requires an insert into the index table, so as the number of Instagram pages grows, the relational database may become the limiting factor for scalability.

      • Increased Infrastructure Maintenance Cost: Using a relational database for uniqueness validation introduces an additional component that must be maintained and managed, potentially increasing infrastructure costs.
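A minimal sketch of the index-table check, using Python's built-in `sqlite3` module as a stand-in for the relational database. The table name, function names, and the in-memory event log are assumptions for illustration; in SQLite the duplicate-key failure surfaces as `sqlite3.IntegrityError`.

```python
import sqlite3


def create_index(conn):
    # The page ID is the primary key, so the database enforces uniqueness.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS instagram_page_ids (page_id TEXT PRIMARY KEY)"
    )


def try_reserve_page_id(conn, page_id):
    try:
        with conn:  # commits on success, rolls back if the insert fails
            conn.execute(
                "INSERT INTO instagram_page_ids (page_id) VALUES (?)",
                (page_id,),
            )
        return True
    except sqlite3.IntegrityError:
        # Duplicate primary key: the ID is already in use.
        return False


def add_page(conn, event_log, page_id):
    # Reserve the ID first; only append the event if the insert succeeded.
    if not try_reserve_page_id(conn, page_id):
        return False
    event_log.append({"type": "PageAdded", "page_id": page_id})
    return True
```

In this sketch the index insert and the event append are two separate steps. If the append fails after the insert succeeds, the ID stays reserved, so a real system would need to handle that case.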

  • Delayed Uniqueness Validation: Rather than performing immediate uniqueness checks, we allow customers to add potentially duplicate Instagram pages to our write side. Subsequently, these duplications can be identified and handled on the read side, either through manual interventions or automated processes.

    • Manual Handling for Occasional Problems: Not every problem requires a software solution. When duplicates occur infrequently, it may be more cost-effective to handle them manually, for example by contacting the customer directly to remove the duplicate page, or by sending a notification that informs them of the duplication and asks them to resolve it.

    • Weighing Software Costs: Every software solution has a development and maintenance cost. When duplicates or similar issues are rare, that cost may outweigh the benefit, and manual handling can be the more practical choice.
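For delayed uniqueness validation, the read-side detection step can be as small as a scan that groups added pages by ID. The sketch below assumes events are dictionaries carrying a hypothetical `customer_id` field alongside the page ID; what happens with the reported duplicates (a notification, a support ticket, an automated cleanup) is left open.

```python
from collections import defaultdict


def find_duplicate_pages(events):
    """Return {page_id: [customer_id, ...]} for IDs added more than once."""
    owners = defaultdict(list)
    for event in events:
        if event["type"] == "PageAdded":
            owners[event["page_id"]].append(event["customer_id"])
    # Keep only the page IDs that appear more than once.
    return {
        page_id: customers
        for page_id, customers in owners.items()
        if len(customers) > 1
    }
```

The write side stays unchanged and accepts every page; this function would run later, for example on a schedule, and its result feeds the manual or automated follow-up.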

Conclusion

We have examined four approaches to cross-aggregate validation in event sourcing, using unique Instagram page IDs in an advertising platform as the example: loading all event streams, in-memory read models, index tables, and delayed uniqueness validation. Each has its own advantages and disadvantages.

Choosing the most appropriate technique depends on the specific requirements and constraints of the project. Evaluate the pros and cons of each approach against factors such as performance, scalability, consistency, and maintenance cost. Understanding these trade-offs lets developers and architects make informed decisions about how to ensure data consistency and enforce business rules.