How Agile Data Engine uses trunk-based development in data platforms

Dec 16, 2024 11:13:52 AM

How much do you enjoy reviewing other people’s code? If you're like most of us, probably not so much.

Now imagine doing this after weeks of isolated work, where branches clash during merges, resulting in conflicting changes and frustrating rework. These challenges are why many teams lean toward trunk-based development, which promotes frequent, smaller integrations that help prevent such issues before they snowball. 

Introduction to trunk-based and GitFlow approaches

Trunk-based development is a version control strategy in which all engineers commit and integrate their changes directly into a single, shared branch, known as the trunk, while keeping the codebase continuously deployable. In contrast, GitFlow uses multiple long-lived branches, allowing teams to work in parallel on different features or releases. While GitFlow offers flexibility, it can lead to more complex merges and delayed integration, especially in fast-paced environments.

Trunk-based development aligns well with DataOps principles by fostering communication, automation, and continuous improvement through frequent, small code changes committed directly to the main branch. In the dynamic DataOps environment, where data pipelines and workflows are constantly evolving, this approach enables teams to swiftly implement and deploy changes to production.

Consistency and Quality

Trunk-based development helps standardize work and maintain system consistency and quality:

  • Trunk-based development pushes teams and team members to establish well-functioning communication channels, which successful development depends on.

  • Frequent integration of changes into the main branch and regular reviews allow early testing and feedback, which helps identify and address issues as soon as possible in the development process. Trunk-based development is also well-suited for continuous integration (CI) practices. With every commit to the main branch, automated tests can be run, ensuring that code quality is maintained throughout development.

  • Faster development cycles combined with frequent deployments ensure that changes are continuously tested and validated, reducing the risk of data quality issues and minimizing the need for rework. Instead of merging features into a dev branch and scheduling production deployments, teams shorten their delivery cycles so that production deployments can occur even on a daily basis.

  • Simplified merging helps maintain a consistent codebase by reducing the likelihood of merge conflicts, which in turn improves the speed and quality of development. Trunk-based development also removes the risk of long-lived feature branches affecting downstream dependencies.

Agile Data Engine’s trunk-based development approach

Agile Data Engine (ADE) offers a web-based graphical user interface accessible to all users within the same installation. This collaborative interface provides real-time updates, so all developers and development teams work with the most current version and can track each other's activities.

Unified Collaboration on the Master Branch

In Agile Data Engine, a fundamental principle is the adoption of a single "master branch" for development. This means all developers actively contribute to the same branch directly through the user interface. While this might appear to be an unusual approach, it has been carefully designed to enhance DataOps efficiency and collaboration.

Package-Based Development

Enabling multiple engineers to work simultaneously within the same "master branch" requires organizing the development process into smaller, manageable units called "packages". Each package serves as a container for entities and their related data loads, and it is the deployable unit in the CI/CD workflow. Additionally, a change log view summarizes the changes, and detailed changes can be compared between any versions in JSON and YAML formats, ensuring full transparency of modifications.
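To make the idea of a package as a deployable container more concrete, here is a minimal Python sketch. It is not Agile Data Engine's data model or API: the names Entity, Load, Package, and diff_packages are hypothetical, and the JSON layout is an assumption chosen only for illustration. It shows how two versions of a package could be serialized to JSON and compared line by line.

```python
import difflib
import json
from dataclasses import asdict, dataclass, field
from typing import List, Tuple


@dataclass(frozen=True)
class Entity:
    """Hypothetical entity, e.g. a table with its column names."""
    name: str
    columns: Tuple[str, ...]


@dataclass(frozen=True)
class Load:
    """Hypothetical data load that populates one entity."""
    name: str
    target_entity: str
    sql: str


@dataclass
class Package:
    """Hypothetical package: a container for entities and their loads."""
    name: str
    entities: List[Entity] = field(default_factory=list)
    loads: List[Load] = field(default_factory=list)

    def to_json(self) -> str:
        # Sorted keys keep the output stable, so diffs show real changes only.
        return json.dumps(asdict(self), indent=2, sort_keys=True)


def diff_packages(old: Package, new: Package) -> List[str]:
    """Return a unified diff between the JSON forms of two package versions."""
    return list(
        difflib.unified_diff(
            old.to_json().splitlines(),
            new.to_json().splitlines(),
            fromfile="old",
            tofile="new",
            lineterm="",
        )
    )


load = Load("load_customer", "customer", "SELECT id, name FROM src.customer")
v1 = Package("crm", [Entity("customer", ("id", "name"))], [load])
v2 = Package("crm", [Entity("customer", ("id", "name", "email"))], [load])

for line in diff_packages(v1, v2):
    print(line)
```

Keeping the comparison on a serialized form is one simple design choice: any change to an entity or a load shows up as added or removed lines, which is roughly what a detailed version comparison needs to surface.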

Version Control

When a package is committed and deployed within Agile Data Engine, it is assigned a unique version number. The versioning system creates a historical record of the package's evolution, enabling developers to easily track changes over time. This not only makes progress easier to follow but also supports trunk-based development's focus on maintaining a continuously deployable state.

Furthermore, ADE’s version control mechanism empowers developers with the flexibility to restore older package versions when needed. This capability provides a safety net, ensuring that prior configurations can be reinstated. It's a powerful feature for troubleshooting, rollbacks, and maintaining data integrity.
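The versioning and restore behavior described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not ADE's implementation: PackageHistory is a hypothetical class, version numbers are assumed to be sequential integers starting at 1, and restoring is modeled as committing the old content again as a new version so that history is never rewritten.

```python
import copy
from typing import Any, Dict, List


class PackageHistory:
    """Hypothetical append-only version history for one package."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._versions: List[Dict[str, Any]] = []

    def commit(self, content: Dict[str, Any]) -> int:
        """Store a snapshot and return its version number (1-based)."""
        self._versions.append(copy.deepcopy(content))
        return len(self._versions)

    def get(self, version: int) -> Dict[str, Any]:
        """Return a copy of the snapshot stored under the given version."""
        if not 1 <= version <= len(self._versions):
            raise KeyError(f"{self.name}: unknown version {version}")
        return copy.deepcopy(self._versions[version - 1])

    def restore(self, version: int) -> int:
        """Re-commit an older snapshot as the newest version.

        Assumption: a restore adds a new version instead of deleting
        later ones, so the full history stays available.
        """
        return self.commit(self.get(version))


history = PackageHistory("crm")
first = history.commit({"entities": ["customer"]})
second = history.commit({"entities": ["customer", "order"]})
restored = history.restore(first)
print(first, second, restored)
```

Modeling a restore as a new version means the trunk only ever moves forward, which fits the goal of keeping a single, continuously deployable line of history.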

Resolving Dependencies

During deployment, Agile Data Engine resolves upstream and downstream dependencies for the deployable package to ensure that all required changes are deployed to the correct environment. Agile Data Engine uses a late build at deployment, which keeps changes to the existing database schema minimal and limits disruption to data loads and data availability.
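As a rough illustration of what resolving upstream and downstream dependencies can involve, the sketch below collects everything a package's entities depend on, plus everything that depends on them, and orders the result so that upstream objects come first. It is a simplified sketch, not how Agile Data Engine resolves dependencies: the function names, the entity names, and the shape of the depends_on mapping are all assumptions. It uses TopologicalSorter from Python's standard graphlib module (Python 3.9 or later).

```python
from graphlib import TopologicalSorter
from typing import Dict, Iterable, List, Set


def invert(depends_on: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Turn 'node -> its upstream nodes' into 'node -> its downstream nodes'."""
    dependents: Dict[str, Set[str]] = {}
    for node, upstream in depends_on.items():
        for up in upstream:
            dependents.setdefault(up, set()).add(node)
    return dependents


def closure(start: Iterable[str], edges: Dict[str, Set[str]]) -> Set[str]:
    """All nodes reachable from the start nodes, including the start nodes."""
    seen: Set[str] = set()
    stack = list(start)
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        stack.extend(edges.get(node, ()))
    return seen


def deployment_order(package_entities: Iterable[str],
                     depends_on: Dict[str, Set[str]]) -> List[str]:
    """Order the affected entities so that each one follows its upstream."""
    targets = list(package_entities)
    affected = closure(targets, depends_on) | closure(targets, invert(depends_on))
    # Keep only edges between affected nodes; the sorter treats the
    # mapped values as predecessors and emits them first.
    graph = {n: depends_on.get(n, set()) & affected for n in affected}
    return list(TopologicalSorter(graph).static_order())


# Hypothetical dependency graph: each entity maps to the entities it reads from.
depends_on = {
    "stg_orders": set(),
    "dv_order": {"stg_orders"},
    "mart_sales": {"dv_order"},
    "unrelated": set(),
}
order = deployment_order({"dv_order"}, depends_on)
print(order)
```

In this example the package contains only dv_order, yet the computed order also covers the staging entity it reads from and the mart that reads from it, while the unrelated entity is left out.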

Summary

Agile Data Engine leverages trunk-based development by having all developers collaborate on a single master branch, using packages for code management and version control. As a result, it shortens the feedback loop between business and data development, promotes a stronger focus on value, enhances transparency and trust in data, and increases predictability and clarity regarding development and maintenance costs.