Deploying to AWS#
AWS Admin Privileges Required
This deployment can only be run by users with AWS administrator privileges, so contact your organization's cloud admins if you need access. In addition, you should have a basic understanding of code development, Docker containers, and the AWS cloud.
Overview#
The goal of this guide is to help you deploy your Tsdat pipeline(s) to AWS so that you can process data on-the-fly as it enters S3 buckets (Ingest applications), or on a regular cron schedule (VAP applications), and store the processed results in an output S3 bucket in a structured format.
Your Tsdat pipelines will be deployed via an AWS CloudFormation stack. The stack creates an AWS CodeBuild project that is connected to your GitHub repository so that an AWS deployment is triggered every time you commit code changes. The following images illustrate different aspects of the deployment: the first shows how the deployed pipelines will function on AWS, the second shows the specific AWS resources included in the stack, and the third shows how the CodeBuild build/deploy cycle works.
Ingest and VAP pipelines can be set up to run on S3 event triggers or on a cron schedule.
The following resources will be used or set up during the AWS deployment.
The following image shows how code changes are deployed to AWS and the steps that make up the AWS CodeBuild process to update or create lambda functions and containers for each pipeline.
Prerequisites#
Create code repos#
Make sure that you have two repositories in your GitHub organization/account created from the following templates:

- `aws-template`
- `pipeline-template`

If you are using an existing `pipeline-template` repository, make sure that the `requirements.txt` file specifies a `tsdat` version of at least `tsdat==0.7.1`. The AWS build will not work with earlier versions of `tsdat`.
Clone these repos to the same parent folder on your computer.
Warning: Windows Users
If you are using WSL on Windows, make sure you run the `git clone` command from a WSL terminal to prevent git from converting all the file line endings to CRLF. If your files have CRLF line endings, it will cause the AWS pipeline to crash.
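For reference, cloning both repos into the same parent folder might look like this (a sketch; `your-org` is a placeholder for your GitHub organization or username):

```bash
# Clone both template-based repos side by side under one parent folder
mkdir tsdat && cd tsdat
git clone https://github.com/your-org/aws-template.git
git clone https://github.com/your-org/pipeline-template.git
```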
Install docker#
We use a Docker container with VSCode to make setting up your development environment a snap. We assume users have a basic familiarity with Docker containers. If you are new to Docker, there are many free online tutorials to get you started.
Note
Docker Desktop can be flaky, especially on Windows, and it requires a license so we recommend not using it. Instead, we are providing alternative, non-Docker Desktop installation instructions for each platform. The Docker Desktop install is easier and requires fewer steps, so it may be fine for your needs, but keep in mind it may crash if you update it (requiring a full uninstall/reinstall, and then you lose all your container environments).
We also recommend installing VSCode and using the ms-vscode-remote.vscode-remote-extensionpack extension, which includes support for editing code in Docker Containers.
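Whichever install route you choose, you can sanity-check the setup before continuing (a minimal check, assuming the docker CLI and compose plugin are on your PATH):

```bash
# Verify the docker engine and compose plugin are installed and working
docker --version
docker compose version
docker run --rm hello-world
```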
Development Environment#
Open the aws-template repo#
Open the `aws-template` repository in VSCode. You can either use the command line for this (i.e., `code path/to/aws-template`), or just open it using File -> Open Folder.
Windows Users
Make sure you have the WSL extension by Microsoft (ms-vscode-remote.remote-wsl) installed. Then press Ctrl+Shift+P and enter the command WSL: Reopen Folder in WSL.
Start the container#
From your VSCode window, start a terminal (Main Menu -> Terminal -> New, OR you can press Ctrl+`).
Then from the VSCode terminal, run:

```bash
docker compose up -d
```

- In our testing we found that just `docker compose up -d` works fine on our team's Windows, Linux, and Intel macOS systems, but the `--platform` argument was needed for M1/M2 MacBooks. Mileage may vary.
Attach to the container#
- Type the key combination: Ctrl+Shift+P to bring up the VSCode command palette.
- Then from the input box type: "Dev-Containers: Attach to Running Container..." and select it
- Then choose the tsdat-cdk container.
This will start up a new VSCode window that is running from inside your tsdat-cdk container.
Open the VSCode workspace#
From the VSCode window that is attached to the tsdat-cdk container, click Main Menu -> File -> Open Workspace from File. In the file chooser dialog, select `/root/aws-template/.vscode/cdk.code-workspace`.
Tip
A box should pop up in the bottom right corner that asks if you want to install the recommended extensions. Select "Install".
Once the extensions are installed, your workspace is ready! In the Explorer, you will see two top-level folders and a directory structure like so:
- aws-template/
    - .vscode/
    - .build_utils/
    - ...
    - pipelines_config.yml
- .aws/
    - config
    - credentials
Deploying the AWS Stack#
Configure account settings#
The top part of the `aws-template/pipelines_config.yml` file contains settings related to the AWS-GitHub integration, where data should be pulled from and placed, and which AWS account should be used. Open this file and fill out the configuration options, using your own values as needed. This section only needs to be filled out once.
```yaml
github_org: tsdat # (1)!
pipelines_repo_name: pipeline-template
aws_repo_name: aws-template
account_id: "XXXXXXXXXXX" # (2)!
region: us-west-2
input_bucket_name: tsdat-input
output_bucket_name: tsdat-output
create_buckets: True # (3)!
github_codestar_arn: arn:aws:codestar-connections:us-west-2:... # (4)!
```
1. The name of the organization or user that cloned the `aws-template` and `pipeline-template` repos.
2. Your AWS account ID. You can get this from the AWS console: in the navigation bar at the upper right, choose your username and then copy the Account ID. It should be a 12-digit number.
3. If you have existing buckets that you would like to use for your pipeline inputs and outputs, then set `create_buckets: False`. If `create_buckets` is set to `True` and the buckets already exist, then the deployment will throw an error.
4. This is the ARN of the CodeStar connection to GitHub. Check out the AWS guide for setting up a CodeStar connection, then copy the ARN of your CodeStar connection here.
Tip
Generally it is best practice to limit read/write access to your GitHub account, so we recommend giving CodeStar access to just your `pipeline-template` and `aws-template` repositories. You can always change this later in GitHub if you want.
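If you have already created a connection and just need its ARN, one way to look it up (an optional sketch, assuming working AWS CLI credentials) is:

```bash
# List CodeStar connections in this region, including their ARNs
aws codestar-connections list-connections --region us-west-2
```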
Configure AWS profile#
From a terminal inside your VSCode window attached to the docker container, run the following command. You may leave the prompts blank aside from the region. You only need to do this once.

```bash
aws configure --profile tsdat
# AWS Access Key ID [None]:
# AWS Secret Access Key [None]:
# Default region name [None]: us-west-2
# Default output format [None]:
```
Your `~/.aws/config` file should now look like this:
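Something like the following, assuming you left the access keys and output format blank:

```ini
[profile tsdat]
region = us-west-2
```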
Edit aws credentials#
Warning
You will need to do this step BEFORE you deploy your stack and any time the credentials expire (usually after about 12 hours).
If you entered your access keys in the last step then you are good to go; otherwise, open your `~/.aws/credentials` file and update your credentials. (1)
1. You can find your AWS credentials using the following steps:
    - Go to your AWS login page
    - Then click PROJECT -> Administrator -> Command line or programmatic access (use whatever project you are an admin for)
    - In the section "Option 2: Manually add a profile to your AWS credentials file (Short-term credentials)", click on the box to copy the text.
Your credentials file should look like this (with real values instead of the `XXXX`):

```ini
[tsdat]
aws_access_key_id=XXXXXXX
aws_secret_access_key=XXXXXX
aws_session_token=XXXXXX
```
Bootstrap AWS resources#
Warning
Check your CloudFormation stacks first to see if you need to deploy the bootstrap. If you see a stack named `CDKToolkit` then you can SKIP this step.
This should only be run ONCE for your AWS Account/Region. It won't break anything if you run it more than once, but it's just not recommended.
Bootstrapping is the process of provisioning resources for the AWS CDK before you can deploy AWS CDK apps into an AWS environment. An AWS environment is a combination of an AWS account and region.
These resources include an Amazon S3 bucket for storing files and IAM roles that grant permissions needed to perform deployments.
The required resources are defined in an AWS CloudFormation stack, called the bootstrap stack, which is usually named CDKToolkit. Like any AWS CloudFormation stack, it appears in the AWS CloudFormation console once it has been deployed.
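A minimal sketch of the bootstrap command (assuming the CDK CLI is available inside the tsdat-cdk container; substitute your 12-digit account ID and region):

```bash
# Provision the CDKToolkit bootstrap stack in your account/region
cdk bootstrap aws://XXXXXXXXXXXX/us-west-2 --profile tsdat
```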
Deploy the stack#
You can re-run this for each branch you want to deploy (e.g., dev, prod, etc.) and any time you make changes to the stack (e.g., if you add a new permission to your lambda role).
Most deployments will not need to change anything in the stack, but advanced users are free to customize.
Note
You will need to commit and push all of your changes for this to work correctly.
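From the container terminal, run the deploy script with the branch you want to deploy:

```bash
./deploy_stack.sh main # (1)!
```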
1. Here `main` refers to the `main` branch of the `pipeline-template` repo. We recommend deploying the `main` branch because it is slightly easier to maintain. You could also create a `release` branch and deploy that instead if you prefer to have a separate branch for production releases.
Tip
The very first time you run `./deploy_stack.sh` for a given branch, you will need to manually release a CodePipeline change in AWS to get it to build the initial container images and lambda functions.
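If you prefer the CLI over the console for this one-time step, a hypothetical equivalent (the pipeline name is a placeholder; look up the real one in the CodePipeline UI) is:

```bash
# Manually trigger the first CodePipeline run for the deployed branch
aws codepipeline start-pipeline-execution --name <your-codepipeline-name> --profile tsdat
```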
Deploying pipeline-template Changes#
Adding an ingest or vap#
The steps to deploy an existing pipeline at a new site, or to deploy an entirely new pipeline are the same:
1. Commit and push your `pipeline-template` changes (to whichever branch you set up for deployment).

2. Update the `aws-template/pipelines_config.yml` file for the new pipeline.

    The second half of the `aws-template/pipelines_config.yml` file contains configurations for each deployed pipeline, including the type of pipeline (i.e., `Ingest` or `VAP`), the trigger (i.e., `S3` or `Cron`), and a collection of configuration files for the different sites where the pipeline is deployed (the `configs` section).

    aws-template/pipelines_config.yml
    ```yaml
    pipelines:
      - name: lidar # (1)!
        type: Ingest # (2)!
        trigger: S3 # (3)!
        configs:
          humboldt:
            input_bucket_path: lidar/humboldt/ # (4)!
            config_file_path: pipelines/lidar/config/pipeline_humboldt.yaml # (5)!
          morro: # (6)!
            input_bucket_path: lidar/morro/
            config_file_path: pipelines/lidar/config/pipeline_morro.yaml
      - name: lidar_vap
        type: VAP
        trigger: Cron
        schedule: Hourly # (7)!
        configs:
          humboldt:
            config_file_path: pipelines/lidar_vap/config/pipeline.yaml
    ```
1. A useful name to give the pipeline in AWS. We recommend naming this like the folder names underneath the `pipelines/` folder in the `pipeline-template` repo. E.g., if your config file is `pipelines/imu/config/pipeline_humboldt.yaml`, then `imu` would be the recommended name for it.
2. The type of pipeline, either `Ingest` or `VAP`.
3. The type of trigger, either `S3` to trigger when a file enters the input bucket path, or `Cron` to run on a regular schedule.
4. The subpath within the input bucket that should be watched. When new files enter this bucket, the pipeline will run with those files as input.
5. The path to the pipeline configuration file in the `pipeline-template` repo.
6. Each `pipeline.yaml` config file needs to be registered so it can be deployed. Here we define one for Morro Bay, CA in addition to the ingest for the Humboldt, CA site.

    Note
    You can keep adding new sites, or versions of this pipeline, to the `configs` section. Just make sure that the key (e.g., "morro", "humboldt") is unique for each pipeline config you add.

7. If the `Cron` trigger is selected, then you must also specify the schedule. The schedule should be one of the following values: `Hourly`, `Daily`, `Weekly`, or `Monthly`.
3. Go to the CodePipeline UI in AWS and find the CodePipeline for this project, then click 'Release change'.
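After the release completes, an `S3`-triggered ingest like the `lidar` example above runs whenever a file lands under its watched prefix. For instance (the filename is hypothetical; `tsdat-input` is the bucket name from the account settings above):

```bash
# Uploading under lidar/humboldt/ triggers the lidar ingest for the humboldt config
aws s3 cp ./lidar_data_20231201.bin s3://tsdat-input/lidar/humboldt/ --profile tsdat
```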
Updating an ingest or vap#
Changes to the deployed branch(es) in the `pipeline-template` repo are released automatically via the CodePipeline build process in AWS, which was set up to watch for branch changes during the `./deploy_stack.sh main` step. This means that any time you push changes to the `main` branch (or whatever branch you specified), CodePipeline will automatically update and re-deploy any modified ingests or VAPs.
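In practice an update is just an ordinary push (a sketch; the commit message is illustrative):

```bash
cd pipeline-template
git add .
git commit -m "Tweak lidar ingest config"
git push origin main
```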
Success
You've now deployed a pipeline stack to AWS and you know how to update and add new pipelines on-the-fly!
Viewing Resources in AWS#
You can use the AWS UI to view the resources that were created during the build.
- Code Pipeline: From here you can check the status of your code build to make sure it is running successfully.
- ECR Container Repository: From here you can check the status of your built images.
- S3 Buckets: From here you can check the contents of your input and output buckets.
- Lambda Functions: You can see the lambda functions that were created for each pipeline here.
- Event Bridge Cron Rules: From here you can check what cron events have been set up for any cron-triggered pipelines.
- CloudFormation Stack: You can see the resources that were created via the CDK deploy. You can also delete the stack from here to clean up those resources. Note that any lambda functions and Event Bridge cron rules created via the CodePipeline build are NOT part of the stack, so these would have to be removed by hand.
Removing the AWS Stack#
If for some reason you would like to completely remove everything that's been deployed, then follow the steps below for each branch you deployed.
1. Make sure the input and output S3 buckets are completely empty (see the CLI sketch after this list).
2. Delete the CloudFormation stack. It should be named like `pipeline-template-main-CodePipelineStack`.
3. Navigate to the Lambda UI and delete any lambda functions named like `pipeline-template-[BRANCH]-lambda-*`. There should be one lambda function for every config location entry in your `aws-template/pipelines_config.yml` file, for each deployed branch.
4. Navigate back to the CloudFormation UI and delete the `CDKToolkit` stack.
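For step 1, the buckets can be emptied from the CLI with something like the following (bucket names are the ones from `pipelines_config.yml`; this permanently deletes data, so double-check first):

```bash
# Remove all objects from the input and output buckets (destructive!)
aws s3 rm s3://tsdat-input --recursive --profile tsdat
aws s3 rm s3://tsdat-output --recursive --profile tsdat
```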