Like almost everyone these days, Splitgraph has a pretty elaborate CI/CD system. It handles everything from building our Docker images to running tests to staging/production deployments. To achieve this, we use the extensive GitLab CI/CD tool. Through a combination of a base .gitlab-ci.yml configuration file coupled with arbitrary scripts, it lets us perform a variety of potential DevOps actions.
One such example that recently attracted our attention was something GitLab calls review apps. The concept is pretty straightforward: a CI/CD job that deploys code changes from a branch, integrated with GitLab's UI. This deployment could be a work-in-progress or a proof-of-concept. GitLab review apps let developers, designers or product owners preview and iterate on their work faster.
We decided to make use of this functionality in our own workflow. We call it preview environments, as the name is a better fit for our use case (more on that below).
Our proprietary product, Splitgraph Cloud, can be deployed to private as well as public clouds. In the latter case, we use a set of multi-cloud Terraform templates that cover the public cloud triumvirate of Azure, AWS and GCP.
Terraform allows us to write maintainable and re-usable infrastructure-as-code in a cloud-agnostic way. Thus our preview environments not only have to deal with application-level deployments, but also with provisioning the cloud infrastructure during the initial setup (as well as cleaning up on tear-down). This makes them veritable environments, in parallel to staging and production.
Terraform also allows us to persist the state of all our existing preview environments between CI/CD job runs. For this, it uses the remote backend configuration (which in our case is set to a GCS bucket).
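To illustrate the remote state setup, here is a minimal sketch of initializing Terraform against a GCS bucket. It assumes the Terraform code declares an empty `backend "gcs" {}` block, and the variable names `TF_STATE_BUCKET` and `TF_STATE_PREFIX` as well as the function name are hypothetical, not taken from our actual scripts.

```shell
# Hypothetical helper: point Terraform at a GCS bucket for remote state.
# Assumes a partial `backend "gcs" {}` block in the Terraform configuration.
init_state_backend() {
  terraform init \
    -backend-config="bucket=${TF_STATE_BUCKET}" \
    -backend-config="prefix=${TF_STATE_PREFIX:-preview}"
}
```

With the GCS backend, each workspace gets its own state object under the configured prefix, which is what lets every preview environment keep a separate state between job runs.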
Here's an example of the preview environment job spec in the .gitlab-ci.yml configuration file:
deploy_preview:
  image: registry.gitlab.com/splitgraph/splitgraph-cloud/cd-environment:development
  stage: preview
  rules:
    - if: $CI_PIPELINE_SOURCE == "push" && $CI_COMMIT_BRANCH != $CI_DEFAULT_BRANCH
      when: manual
      allow_failure: true
  environment:
    name: preview/$CI_COMMIT_REF_SLUG
    url: https://www.$CI_COMMIT_REF_SLUG.splitgraph.io
    deployment_tier: development
    auto_stop_in: 3 days
    on_stop: terminate_preview
  script:
    - source "$CI_PROJECT_DIR"/.ci/envs/preview.env
    - $CI_PROJECT_DIR/.ci/preview_env.sh deploy
A couple of noteworthy points:
- We rely on tools such as yq and our open source sgr client to seed all our preview environments with a whole lot of data upon the initial deploy.
- The environment name and URL are derived from the CI_COMMIT_REF_SLUG predefined variable.
- The actual provisioning and deployment is handled through the preview_env.sh script. Before that, we source a .env file to populate some environment variables. These include Terraform variables, as well as some variables controlling app-level settings. One more thing we had to resort to here is trimming the name of the preview environment for branches with long names. This avoids the length constraints imposed by the cloud providers or Let's Encrypt, since we use this name for various resources too.
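The name trimming mentioned above could look roughly like the following. This is a minimal sketch rather than our actual code: the function name `trim_env_name` and the default limit of 30 characters are assumptions chosen for illustration.

```shell
# Hypothetical helper: cut a branch slug down to a maximum length and drop
# any trailing hyphens, so the result is still a valid resource/DNS label.
# $1: slug (e.g. "$CI_COMMIT_REF_SLUG"), $2: max length (illustrative default: 30)
trim_env_name() {
  printf '%s\n' "$1" | cut -c1-"${2:-30}" | sed 's/-*$//'
}
```

For example, `trim_env_name "fix-bug-123" 4` prints `fix`: the slug is first cut to `fix-`, and the dangling hyphen is then removed.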
When provisioning/deploying, in the preview_env.sh script we first check whether a Terraform workspace for the given branch already exists:
workspace_exists=0
terraform workspace select "$CI_COMMIT_REF_SLUG" && workspace_exists=1
If the workspace doesn't exist, it means that the environment has not been provisioned, so the job then executes:
terraform workspace new "$CI_COMMIT_REF_SLUG"
terraform apply -auto-approve
source ./first_deploy.sh
Terraform then creates the required resources. This includes the VPC, firewalls, subnetwork, virtual machines, disks, as well as Cloudflare domain records and GitLab project variables (for per-environment secrets, e.g. SSH keys).
For preview environments we always deploy a 2-node cluster. The main VM hosts our DBs and the backend services, while the lighter, auxiliary VM hosts analytics and monitoring services.
The script first_deploy.sh subsequently takes care of a number of other important tasks.
If the workspace does exist, then the job has been triggered on a pre-existing preview environment. This means we only need to upgrade our app to pick up the latest changes for the given branch:
"$CI_PROJECT_DIR"/.ci/deploy/deploy.sh
The complete initial setup takes about 15 minutes. A redeploy (if all services are updated) takes about 10 minutes.
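Putting the two paths together, the deploy logic can be sketched as below. The Terraform and deploy.sh calls mirror the snippets above, while the function names `provision_and_first_deploy`, `redeploy` and `deploy_preview` are hypothetical and only serve to structure the example.

```shell
# First-time setup: create the workspace, provision the infrastructure,
# then run the one-off initial deployment tasks.
provision_and_first_deploy() {
  terraform workspace new "$CI_COMMIT_REF_SLUG" &&
    terraform apply -auto-approve &&
    . ./first_deploy.sh
}

# Existing environment: only upgrade the app to the latest branch state.
redeploy() {
  "$CI_PROJECT_DIR"/.ci/deploy/deploy.sh
}

deploy_preview() {
  # `terraform workspace select` exits non-zero when the workspace is missing,
  # which we treat as "environment not provisioned yet".
  if terraform workspace select "$CI_COMMIT_REF_SLUG"; then
    redeploy
  else
    provision_and_first_deploy
  fi
}
```

Using the exit status of `terraform workspace select` directly in the `if` avoids a separate flag variable, at the cost of treating any failure of that command as a missing workspace.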
Finally, in either case we use ChatOps to distribute/persist some environment-specific user details (e.g. URL, user credentials, etc.) in a Markdown-formatted message. If a merge request is open, the job posts a note on it with the details. If not, the job pushes them directly to a corresponding Mattermost channel through a hook.
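As an illustration of this notification step, here is a hedged sketch using curl against the GitLab merge request notes API and a Mattermost incoming webhook. The variables `MR_IID`, `GITLAB_API_TOKEN` and `MATTERMOST_HOOK_URL` and the function name `post_details` are assumptions made for the example, as is the availability of jq in the job image; `CI_API_V4_URL` and `CI_PROJECT_ID` are GitLab predefined variables.

```shell
# Hypothetical helper: post the Markdown message either as a merge request
# note (when MR_IID is set) or to a Mattermost incoming webhook otherwise.
post_details() {
  msg="$1"
  if [ -n "${MR_IID:-}" ]; then
    # GitLab API: create a note on the merge request.
    curl --silent --fail --request POST \
      --header "PRIVATE-TOKEN: $GITLAB_API_TOKEN" \
      --data-urlencode "body=$msg" \
      "$CI_API_V4_URL/projects/$CI_PROJECT_ID/merge_requests/$MR_IID/notes"
  else
    # Build the JSON payload with jq so quotes and newlines are escaped.
    jq -n --arg text "$msg" '{text: $text}' |
      curl --silent --fail --request POST \
        --header "Content-Type: application/json" \
        --data @- "$MATTERMOST_HOOK_URL"
  fi
}
```

How the job discovers the merge request IID is left out here, since it depends on the pipeline type.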
There is one more job, called terminate_preview, which is used for de-provisioning the environment. Besides being executable manually, it is specified as the environment:on_stop reference for the deploy_preview job. This means that GitLab triggers it automatically once the environment:auto_stop_in period has elapsed since the last deploy job ran.
It sources the same preview.env and then executes $CI_PROJECT_DIR/.ci/preview_env.sh terminate:
terraform destroy -auto-approve
terraform workspace select default
terraform workspace delete "$CI_COMMIT_REF_SLUG"
thereby ensuring that all resources are cleaned up, including the Terraform workspace.
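For completeness, the terminate_preview job spec might look something like the fragment below. This is a sketch under assumptions: the image, stage and rules are simply copied from deploy_preview, and only the `action: stop` setting is something GitLab requires of a job referenced through on_stop.

```yaml
terminate_preview:
  image: registry.gitlab.com/splitgraph/splitgraph-cloud/cd-environment:development
  stage: preview
  rules:
    - if: $CI_PIPELINE_SOURCE == "push" && $CI_COMMIT_BRANCH != $CI_DEFAULT_BRANCH
      when: manual
      allow_failure: true
  environment:
    # Must match the name used by deploy_preview so GitLab can pair the jobs.
    name: preview/$CI_COMMIT_REF_SLUG
    action: stop
  script:
    - source "$CI_PROJECT_DIR"/.ci/envs/preview.env
    - $CI_PROJECT_DIR/.ci/preview_env.sh terminate
```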
In summary, we utilize preview environments for smoke testing and rapid iteration, in an on-demand, fully fledged production-grade Splitgraph instance. The infrastructure provisioning and application deployment done in this case exercise the same code that we run for our customer environments and production releases. For this reason we view preview environments as an extension of integration tests and, in fact, are planning to start using them for automated end-to-end tests.