Unit testing AWS Glue jobs is challenging because the Glue environment is hard to replicate locally. Fortunately, AWS publishes Glue container images that make it possible to run unit tests outside the Glue service, as described in detail in the official documentation. In this blog post, we will walk through running AWS Glue job unit tests within an Azure DevOps pipeline and discuss how to calculate and publish code coverage for these tests.
To begin with, the Glue container image runs as a dedicated user named glue_user, which is set in the image's Dockerfile:
USER glue_user
Suppose you have developed your Glue job in a Python script named myawesomegluejob.py, which is stored in an Azure DevOps (AzDO) Git repository. Creating a pipeline for it might initially seem straightforward. However, running the build steps directly inside the Glue container is not feasible because of permission constraints on glue_user.
To overcome this limitation, the pipeline uses Docker commands to pull the Glue image and then mounts the pipeline's source directory inside the Glue container. Because the directory is shared, the test results and code coverage data written inside the container remain available to later steps in the Azure DevOps pipeline.
By default, the Azure DevOps pipeline file system is not writable by glue_user. To address this, we grant access to all users by running chmod -R 0777 $(Build.SourcesDirectory).
Next, we can execute the following command:
docker run -v $(Build.SourcesDirectory):/home/glue_user/workspace -w /home/glue_user/workspace public.ecr.aws/glue/aws-glue-libs:glue_libs_4.0.0_image_01 -c "pip install pytest pytest-azurepipelines pytest-cov; python3 -m pytest test --doctest-modules --junitxml=junit/test-results.xml --cov=main --cov-report=xml"
This command mounts $(Build.SourcesDirectory) at /home/glue_user/workspace inside the container and sets that folder as the working directory. It then installs the required Python libraries and runs the unit tests. As a result, a coverage.xml file is generated in $(Build.SourcesDirectory). However, because this file is created inside the container, its sources node contains paths as seen from the container (/home/glue_user/workspace) rather than paths on the build agent. To correct this, we run a string replacement with the sed command.
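For readers who prefer not to rely on sed, the same path rewrite can be sketched in Python. This is only an illustrative alternative, not the approach used in the pipeline: the function name rewrite_coverage_sources and the sample host path are assumptions, and the sample XML is a trimmed-down stand-in for a real coverage.xml.

```python
import xml.etree.ElementTree as ET

# Mount point used in the docker run command above.
CONTAINER_ROOT = "/home/glue_user/workspace"


def rewrite_coverage_sources(xml_text, host_root, container_root=CONTAINER_ROOT):
    """Map container paths in <source> nodes to paths on the build agent.

    Hypothetical helper: only <source> elements are touched, unlike the
    sed command, which replaces every occurrence in the file.
    """
    root = ET.fromstring(xml_text)
    for source in root.iter("source"):
        path = source.text or ""
        # Rewrite only paths that are the mount point or live beneath it.
        if path == container_root or path.startswith(container_root + "/"):
            source.text = host_root + path[len(container_root):]
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Trimmed-down sample; a real coverage.xml has many more nodes.
    sample = (
        "<coverage><sources>"
        "<source>/home/glue_user/workspace/main</source>"
        "</sources></coverage>"
    )
    # "/agent/_work/1/s" is a made-up stand-in for $(Build.SourcesDirectory).
    print(rewrite_coverage_sources(sample, "/agent/_work/1/s"))
```

Note that ElementTree does not preserve the XML declaration or comments when serializing, so the sed one-liner remains the simpler choice inside the pipeline itself.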
Here is the Azure DevOps pipeline snippet that combines these steps:
- job: 'Scan_and_Build'
  steps:
  - script: |
      docker pull public.ecr.aws/glue/aws-glue-libs:glue_libs_4.0.0_image_01
    displayName: 'Pull Glue Image'
  - script: |
      chmod -R 0777 $(Build.SourcesDirectory)
      docker run -v $(Build.SourcesDirectory):/home/glue_user/workspace -w /home/glue_user/workspace public.ecr.aws/glue/aws-glue-libs:glue_libs_4.0.0_image_01 -c "pip install pytest pytest-azurepipelines pytest-cov; python3 -m pytest test --doctest-modules --junitxml=junit/test-results.xml --cov=main --cov-report=xml"
      sed -i "s|/home/glue_user/workspace|$(Build.SourcesDirectory)|g" $(Build.SourcesDirectory)/coverage.xml
    displayName: 'Run tests'
  - task: PublishTestResults@2
    condition: succeeded()
    inputs:
      testResultsFiles: '**/test-*.xml'
    displayName: 'Publish unit test results'
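To make the pytest invocation above more concrete, here is a minimal sketch of the kind of code it would pick up. Everything in it is hypothetical: the function normalize_column_names, its logic, and the file names in the comments are illustrative assumptions, chosen only to match the test directory and the --cov=main target used in the command. It deliberately avoids Spark so that it runs anywhere; a real Glue job test would typically exercise Spark code inside the container.

```python
# Hypothetical helper that would live in the package measured by --cov=main,
# for example main/transforms.py. The name and logic are illustrative only.
def normalize_column_names(columns):
    """Strip whitespace, lower-case, and replace spaces with underscores."""
    return [c.strip().lower().replace(" ", "_") for c in columns]


# Hypothetical test that would live under the test/ directory, for example
# test/test_transforms.py. pytest discovers it through the test_ prefix.
def test_normalize_column_names():
    result = normalize_column_names([" Order ID", "Customer Name "])
    assert result == ["order_id", "customer_name"]
```

With a layout like this, the junit/test-results.xml file would list test_normalize_column_names, and coverage.xml would report coverage for the main package.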
Conclusion
This blog post showed how to run unit tests for AWS Glue jobs within an Azure DevOps pipeline. By pulling the Glue container image and driving it with Docker commands, the pipeline can run the tests inside the Glue environment and produce test results and code coverage data for publishing, which helps keep your Glue jobs reliable.