Moving only valid DAG files to Cloud Composer

  airflow, cicd, google-cloud-build, gsutil, python-3.x

I have a Cloud Build trigger that fires whenever a new commit is pushed to our GitHub repo, which is mirrored in Google Cloud Source Repositories.

The Cloud Build YAML file is as follows:

- name: ubuntu
  id: Initialization
  args: ['bash', '-c', "echo '$COMMIT_SHA' > REVISION.txt"]
- name: gcr.io/cloud-builders/gsutil
  id: Deployment
  # Exclude every path that does not end in ".py" (the dot is escaped so it matches literally)
  args: ['rsync', '-r', '-x', '^(?!.*\.py$).*', '.', '${_GCS_BUCKET}/dags/']
- name: gcr.io/cloud-builders/gsutil
  id: Check
  args: ['ls', '${_GCS_BUCKET}/dags/']

However, our repo contains some non-DAG Python files and some invalid DAG files that can't be removed because they are used for other purposes. I want to add a DAG validation step to the Cloud Build, and I have the following DAG validation Python script:

import unittest
from airflow.models import DagBag

class TestDagIntegrity(unittest.TestCase):

    LOAD_SECOND_THRESHOLD = 2

    def setUp(self):
        self.dagbag = DagBag()

    def test_import_dags(self):
        self.assertFalse(
            len(self.dagbag.import_errors),
            'DAG import failures. Errors: {}'.format(
                self.dagbag.import_errors
            )
        )

    def test_alert_email_present(self):

        # dict.iteritems() exists only in Python 2; Python 3 uses items()
        for dag_id, dag in self.dagbag.dags.items():
            emails = dag.default_args.get('email', [])
            msg = 'Alert email not set for DAG {id}'.format(id=dag_id)
            self.assertIn('[email protected]', emails, msg)


suite = unittest.TestLoader().loadTestsFromTestCase(TestDagIntegrity)
unittest.TextTestRunner(verbosity=2).run(suite)

However, I can't figure out how to implement this check. Should it run before or after the Python files are copied from the source repo to the Cloud Composer DAG bucket? gsutil rsync has no include flag, so it can't be told to copy only a specified list of files. Please help.
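
To make the "validate before copying" option concrete, here is a rough sketch of one possible approach, not a confirmed solution: copy only the files that pass validation into a separate staging directory, then point gsutil rsync at that directory instead of the repo root. The function names stage_valid_dags and airflow_dag_check are hypothetical. The sketch also assumes that DagBag accepts a single file path as dag_folder and that a file counts as a valid DAG file when it imports without errors and defines at least one DAG.

```python
import shutil
from pathlib import Path


def airflow_dag_check(path):
    """Return True if Airflow imports `path` cleanly and finds at least one DAG."""
    # Imported lazily so the staging logic below works without Airflow installed.
    from airflow.models import DagBag

    # Assumption: dag_folder may point at a single file rather than a directory.
    bag = DagBag(dag_folder=str(path), include_examples=False)
    return not bag.import_errors and len(bag.dags) > 0


def stage_valid_dags(repo_dir, staging_dir, is_valid_dag=airflow_dag_check):
    """Copy the .py files accepted by `is_valid_dag` into `staging_dir`.

    `staging_dir` should live outside `repo_dir` so staged copies are not rescanned.
    Returns the staged paths relative to `staging_dir`.
    """
    repo_dir, staging_dir = Path(repo_dir), Path(staging_dir)
    staged = []
    for src in sorted(repo_dir.rglob("*.py")):
        if not is_valid_dag(src):
            continue  # skip non-DAG modules and DAG files that fail to import
        dest = staging_dir / src.relative_to(repo_dir)
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dest)
        staged.append(dest.relative_to(staging_dir).as_posix())
    return staged
```

A build step with Airflow installed could call stage_valid_dags('.', '/workspace/valid_dags') (the staging path is only an example), and the Deployment step could then rsync that directory to ${_GCS_BUCKET}/dags/. Note that helper modules imported by the DAGs would be filtered out by this check, so a different predicate would be needed if they must be deployed too.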

Source: Python-3x Questions
