Federated credentials from GitHub and GitLab pipelines to Azure

Azure access from GitHub and GitLab pipelines - without secrets

This post also appeared in the FastTrack Blog.

tl;dr

  • Federated credentials / workload identity federation allows your CI/CD pipelines in GitHub and GitLab to access your Azure subscription without any secrets stored in the pipeline config.

  • GitHub's azure/login@v1 action handles this transparently, but I also explain how it works under the hood. GitLab supplies the necessary token directly to your pipeline run.

  • Both GitHub and GitLab are easy to set up and federate securely with your Azure subscription.

  • BitBucket can't be set up that way, because tokens issued by BitBucket don't have a predictable subject identifier.

Overview

This article demonstrates how to configure, and securely access, an Azure environment from within a GitHub and a GitLab CI/CD pipeline, without having to store credentials on the GitHub/GitLab side. The article also briefly explains why BitBucket currently doesn't support that capability.

CI/CD pipelines often have to interact with an Azure cloud environment, e.g. to upload artifacts to a storage account, read values from a Key Vault or deploy resources. Service principal credentials are a well-established way for such access, but have the disadvantage of using secrets and passwords, which have to be managed securely. Using 'federated identity credentials' (also sometimes called 'workload identity federation'), such cloud access can happen without having to store secrets in the CI/CD pipeline.

Old fashioned service principals: Traditionally, a service principal credential consists of three values: The client_id of the service principal or app registration, the Azure AD tenant_id where the service principal exists, and a client_secret password needed to fetch a token. Ideally, we want to avoid having to handle a client_secret; these secrets often have a lifetime (they expire and have to be updated after a couple of months), and access to these secrets must be protected so only authorized parties can access these secrets.

Federated credentials: Modern CI/CD systems, such as GitHub or GitLab, allow their users to run pipelines on container-based runners in their infrastructure. As part of these environments, they also expose a service-provider-specific OAuth2 identity provider (IdP) that the code in the CI/CD pipeline can fetch tokens from. The idea behind a federated credential is to say: "In Azure, there is a user-assigned managed identity (or a service principal), and the CI/CD pipeline should be able to use a token from the local IdP to sign in to that UAMI/SP".

So there are two token exchanges:

  • The CI/CD pipeline somehow talks to the 'local' IdP, says "I am the main branch within this project, please issue me a JWT token which I can then use to sign-in to Azure". That security token has an issuer being GitHub or GitLab, an audience of Azure AD, and the token's subject being information about which CI/CD pipeline is currently running.

  • The CI/CD pipeline then talks to Azure Active Directory, and exchanges the GitHub- or GitLab-issued token for one that can be used to access the desired Azure resource. The exchange basically says "Here's a security token, showing that I'm this CI/CD pipeline, please give me a token to call into Azure KeyVault (or ARM, or Storage, or whatever it might be)".
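To look at the iss, aud and sub claims of such a token, it's enough to decode the middle segment of the JWT. The following is a minimal sketch, assuming GNU coreutils (base64 -d); the function name jwt_claims is hypothetical. A JWT segment is base64url-encoded without padding, so the sketch first maps the URL-safe characters back and restores the padding. It does not verify the signature, so it's only meant for inspection.

```shell
#!/bin/bash

# Hypothetical helper: print the claims (second segment) of a JWT as JSON.
# The signature is NOT verified; this is for inspection only.
jwt_claims() {
  local payload
  # Take the middle segment and map base64url characters to standard base64.
  payload="$(printf '%s' "$1" | cut -d '.' -f 2 | tr '_-' '/+')"
  # base64url omits the trailing '=' padding; restore it for `base64 -d`.
  case $(( ${#payload} % 4 )) in
    2) payload="${payload}==" ;;
    3) payload="${payload}=" ;;
  esac
  printf '%s' "${payload}" | base64 -d
}
```

Inside a pipeline, a call like jwt_claims "${ID_TOKEN_FOR_AZURE}" | jq . would then pretty-print the claims.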

The Microsoft Entra 'Workload Identity Federation' docs show in depth how the flow works in general.

In our scenario, the 'external identity provider' is the GitHub/GitLab-internal IdP. Simply speaking,

  1. the CI/CD pipeline fetches the token from GitHub,

  2. fetches the Azure token from Azure AD,

  3. during that request, Azure AD validates the GitHub-issued token by retrieving the external IdP's signing credentials and checking the token signature, and

  4. finally the CI/CD code can access the Azure resources.
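Step 3 can be made a bit more tangible: the signing keys are found via the IdP's OpenID Connect discovery document. The sketch below derives that URL from an issuer and reads the standard jwks_uri field from it. The function names are hypothetical, and fetch_jwks_uri needs network access, curl and jq; it only illustrates what Azure AD does on its side and is not something your pipeline has to run.

```shell
#!/bin/bash

# Hypothetical helper: build the OpenID Connect discovery URL for an issuer.
discovery_url() {
  # Strip a single trailing slash so we never produce a double slash.
  printf '%s/.well-known/openid-configuration\n' "${1%/}"
}

# Hypothetical helper: print the URL of the issuer's signing keys (JWKS).
# Requires network access, curl and jq.
fetch_jwks_uri() {
  curl --silent --fail "$(discovery_url "$1")" | jq -r '.jwks_uri'
}
```

For example, discovery_url "https://gitlab.com" prints https://gitlab.com/.well-known/openid-configuration.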

Service Principals and user-assigned managed identities

Federated credentials are supported by both service principals and user-assigned managed identities (UAMI). In the end, a UAMI under the hood is represented by a service principal in Azure AD, too. The identities of both a service principal and UAMI can be granted access to Azure resources, so both are a fit here, too.

However, the lifecycle management and API surface for these two identities are very different:

A service principal is created and configured within Azure Active Directory (for example by calling az ad app create); the federated credential is then added by POSTing to https://graph.microsoft.com/beta/applications/${applicationObjectId}/federatedIdentityCredentials/ via the Microsoft Graph API. Depending on where you work, writing to Azure AD and Graph API might be tightly regulated; many companies prevent regular users from directly creating a service principal, so this route might be challenging.

A user-assigned managed identity on the other hand can be completely handled in the Azure Resource Manager (ARM) control plane. A UAMI (and its federated identity configuration) is a first-party ARM object, so that might be more approachable for teams who have full control over their Azure subscription (but lack Azure AD privileges).

The federated credential configuration for Azure

Both the service principal configuration (via Microsoft Graph API), as well as the UAMI configuration (via ARM API) require the same configuration data:

  • name: Each SP or UAMI might have up to 20 different federated credentials configured, and each credential must have a name attribute.

  • issuer: Each federated credential must have the issuer URL configured.

    • The issuer is something like "https://token.actions.githubusercontent.com" or "https://gitlab.com", or a custom domain name in case of a dedicated GitLab instance. It must be equivalent to the iss claim in the security token.

    • How is it used? Azure AD appends the path .well-known/openid-configuration to the issuer URL, to retrieve the IdP's signing credentials, used to check the signature on the tokens.

  • audiences: An array (with exactly one string) holding the expected aud claim of the federated identity token.

    • By default, this is "api://AzureADTokenExchange", but you can customize that if desired.

  • subject: The subject value is the sub claim of the security token, and is determined by the CI/CD environment.

    • For GitHub, this subject for example looks like "repo:chgeuer/azure-workload-identity-github:ref:refs/heads/main", in which chgeuer/azure-workload-identity-github represents the user or organization (chgeuer), and the repository (azure-workload-identity-github), while ref:refs/heads/main indicates a CI/CD pipeline running on the main branch.

    • For GitLab, this subject looks similar, like "project_path:chgeuer/azure-workload-identity-federation-demo:ref_type:branch:ref:main", i.e. chgeuer being the user, azure-workload-identity-federation-demo being the repository and main being the branch.

    • Unfortunately, BitBucket handles this differently: For federated credential sign-in to work well, the expected sub claim in the security token must have a predictable value. BitBucket's sub claims look like this: "{ad073b2b-7126-4f19-9eed-1c9b10abe160}:{2b2ac083-d564-4064-8ea1-43e6aeff2b96}:{37416cfa-3260-4c31-bea4-a6b2f29272a7}". The three GUIDs are "{repositoryUuid}:{deploymentEnvironmentUuid}:{stepUuid}". The repositoryUuid and the deploymentEnvironmentUuid are stable, but unfortunately the third element in the tuple, the stepUuid, is re-generated with each pipeline run. Therefore, each new security token has a different sub claim. Given that an Azure federated identity credential expects a fixed subject, and does not allow semantics like 'Subject claim starts with ... or conforms to a regular expression', BitBucket's tokens can't be used for federated credential flows.
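Because Azure compares the subject as an exact string, it can help to generate the expected value rather than typing it by hand. The following sketch reproduces the two branch-based formats shown above; the function names are hypothetical, and other triggers (tags, pull requests, environments) use different subject formats that are not covered here.

```shell
#!/bin/bash

# Hypothetical helper: expected subject for a GitHub workflow running on a branch.
# Arguments: owner (user or organization), repository, branch
github_subject() {
  printf 'repo:%s/%s:ref:refs/heads/%s\n' "$1" "$2" "$3"
}

# Hypothetical helper: expected subject for a GitLab job running on a branch.
# Arguments: namespace (user or group), project, branch
gitlab_subject() {
  printf 'project_path:%s/%s:ref_type:branch:ref:%s\n' "$1" "$2" "$3"
}
```

For instance, github_subject chgeuer azure-workload-identity-github main yields exactly the GitHub subject used in this article.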

These four values, represented in JSON, in a GitHub configuration would look like this:

{
  "name": "githubfedcred",
  "issuer": "https://token.actions.githubusercontent.com",
  "audiences": [ "api://AzureADTokenExchange" ],
  "subject": "repo:chgeuer/azure-workload-identity-github:ref:refs/heads/main"
}

while a GitLab config would look like this:

{
  "name": "gitlabcred",
  "issuer": "https://gitlab.com",
  "audiences": [ "api://AzureADTokenExchange" ],
  "subject": "project_path:chgeuer/azure-workload-identity-federation-demo:ref_type:branch:ref:main"
}

Example on creating a service principal with a federated credential (using a script)

The following bash script gives you an idea of how the service principal would be created (az ad app create), and how you can add the federatedIdentityCredentials JSON via the Graph API:

#!/bin/bash

# The display name of the app registration is passed as first argument
appDisplayName="${1:?usage: $0 <app display name>}"

# Create the app registration and capture its object ID
applicationObjectId="$( az ad app create --display-name "${appDisplayName}" --query id --output tsv )"

# Attach the federated identity credential via Microsoft Graph
az rest \
   --method POST \
   --uri "https://graph.microsoft.com/beta/applications/${applicationObjectId}/federatedIdentityCredentials/" \
   --body '{"name": "github","issuer": "https://token.actions.githubusercontent.com",
            "audiences": [ "api://AzureADTokenExchange" ],
            "subject": "repo:chgeuer/azure-workload-identity-github:ref:refs/heads/main"}'

Example creating a user-assigned managed identity with a federated credential (using infra-as-code)

When creating a UAMI via Bicep, these two steps could be combined in a single representation:

resource uami 'Microsoft.ManagedIdentity/userAssignedIdentities@2023-01-31' = {
  name: uamiName
  location: location
  resource federatedCred 'federatedIdentityCredentials' = {
    name: 'githubcred'
    properties: {
      issuer: 'https://token.actions.githubusercontent.com'
      audiences: [ 'api://AzureADTokenExchange' ]
      subject: 'repo:chgeuer/azure-workload-identity-github:ref:refs/heads/main'
    }
  }
}
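If you prefer a script over Bicep, the same result can be sketched with the Azure CLI, using az identity create and az identity federated-credential create. The function names, the DRY_RUN switch and the argument order are assumptions made for illustration; the issuer, audience and credential name mirror the Bicep sample above. The script assumes you're already logged in via az login and that the resource group exists.

```shell
#!/bin/bash

# Hypothetical helper: print the command instead of running it when DRY_RUN is set.
run() {
  if [ -n "${DRY_RUN:-}" ]; then echo "$*"; else "$@"; fi
}

# Hypothetical helper: create a UAMI and attach a GitHub federated credential.
# Arguments: resource group, UAMI name, expected subject
create_uami_with_fedcred() {
  local resource_group="$1" uami_name="$2" subject="$3"
  run az identity create \
    --resource-group "${resource_group}" \
    --name "${uami_name}"
  run az identity federated-credential create \
    --resource-group "${resource_group}" \
    --identity-name "${uami_name}" \
    --name "githubcred" \
    --issuer "https://token.actions.githubusercontent.com" \
    --subject "${subject}" \
    --audiences "api://AzureADTokenExchange"
}
```

Running it with DRY_RUN=1 first lets you review the two commands before anything is created in your subscription.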

Information from Azure, needed for the CI/CD environment

Once the service principal or the UAMI is created on the Azure side, you need two pieces of information from Azure:

  • The Azure Active Directory's Tenant ID, i.e. either the tenant's GUID (something like 942023a6-efbe-4d97-a72d-532ef7337595), or one of the configured domain names (such as chgeuerfte.onmicrosoft.com).

  • The service principal's or UAMI's client ID, which is always a GUID.

These two values must be configured in the CI/CD environment, as environment variables, or secrets (even though they're strictly speaking not secret).
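As a hedged sketch, both values could be read with the Azure CLI and sanity-checked before they're copied into the CI/CD settings. The function names are hypothetical; show_ci_values assumes a UAMI, an existing az login session, and that the resource group and identity names are passed as arguments. Note that is_guid rejects a tenant given as a domain name, which is also a valid way to address a tenant.

```shell
#!/bin/bash

# Hypothetical helper: succeeds if the argument looks like a GUID.
is_guid() {
  printf '%s\n' "$1" | grep -Eq '^[0-9a-fA-F]{8}(-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}$'
}

# Hypothetical helper: print the two values needed on the CI/CD side.
# Arguments: resource group, UAMI name. Requires a prior `az login`.
show_ci_values() {
  local resource_group="$1" uami_name="$2"
  echo "AZURE_TENANT_ID=$(az account show --query tenantId --output tsv)"
  echo "AZURE_CLIENT_ID=$(az identity show --resource-group "${resource_group}" --name "${uami_name}" --query clientId --output tsv)"
}
```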

GitHub-side setup with full integration

The following sample shows how to ZIP the repo's source code and upload it into a storage account:

name: ZIP the source and upload
on:
  workflow_dispatch:
permissions:
  id-token: write
  contents: read
jobs:
  build:
    name: Zip and upload
    runs-on: ubuntu-latest
    env:
      account_name: 'isvreleases'
      container_name: 'backendrelease'
      filename: 'src.zip'
    steps:
      - uses: actions/checkout@v3
      - name: 'Create ZIP'
        run: |
          zip -r "${filename}" src/
      - name: 'Login via azure/login@v1'
        uses: azure/login@v1
        with:
          tenant-id: ${{ secrets.AZURE_TENANT_ID }}
          client-id: ${{ secrets.AZURE_CLIENT_ID }}
          allow-no-subscriptions: true
          environment: azurecloud
          # audience: api://AzureADTokenExchange
      - name: 'Use the token to upload to storage'
        uses: azure/CLI@v1
        with:
          azcliversion: 2.37.0
          inlineScript: |
            blob_name="src-azure-workload-identity-github-${GITHUB_SHA}.zip"
            blob_url="https://${account_name}.blob.core.windows.net/${container_name}/${blob_name}"
            az storage blob upload --auth-mode login --account-name "${account_name}" --container-name "${container_name}" --overwrite --file "${filename}" --name "${blob_name}"
            echo "Uploaded to ${blob_url}"

Azure Storage is just one example of an Azure resource that we can access from within GitHub. You can see a few interesting parts in that YAML file:

The YAML file must contain the following lines, so that the GitHub IdP is able to issue a token:

permissions:
  id-token: write
  contents: read

The azure/login@v1 action on GitHub automagically handles the entire federated sign-in to an SP or a UAMI. In the sample below, we set tenant-id and client-id based on GitHub secret values. If the expected audience on the Azure side were different from "api://AzureADTokenExchange", it would also be possible to tweak that value:

      - name: 'Login via azure/login@v1'
        uses: azure/login@v1
        with:
          tenant-id: ${{ secrets.AZURE_TENANT_ID }}
          client-id: ${{ secrets.AZURE_CLIENT_ID }}
          allow-no-subscriptions: true
          environment: azurecloud
          # audience: api://AzureADTokenExchange

A full example of that configuration can be found here: https://github.com/chgeuer/azure-workload-identity-github/

GitHub-side setup - The hard way

For those interested in understanding what the azure/login@v1 action on GitHub does under the hood, we can mimic it all in a bash script with curl and jq. This also allows us to inspect the security tokens along the way. Let's tweak the step in the CI/CD pipeline to just run a shell script:

name: ZIP the source and upload
on:
  workflow_dispatch:
permissions:
  id-token: write
  contents: read
jobs:
  build:
    name: Build the stuff
    runs-on: ubuntu-latest
    env:
      account_name: 'isvreleases'
      container_name: 'backendrelease'
      filename: 'src.zip'
    steps:
      ...
      - name: 'Interact with the Github IDP and Azure Workload Identity Federation from shell'
        env: 
          AZURE_TENANT_ID: ${{ secrets.AZURE_TENANT_ID }}
          AZURE_CLIENT_ID: ${{ secrets.AZURE_CLIENT_ID }}
        run: |
          ./action.sh

In this code, we copy the tenant ID and client ID from the secrets into environment variables. In action.sh, we now manually do the two token exchanges, and print out the claims from the JWT:

#!/bin/bash

encodedAudience="api%3A%2F%2FAzureADTokenExchange"
gh_access_token="$( curl \
     --silent \
     --url "${ACTIONS_ID_TOKEN_REQUEST_URL}&audience=${encodedAudience}" \
     --header "Authorization: Bearer ${ACTIONS_ID_TOKEN_REQUEST_TOKEN}" \
     | jq -r ".value" )"

azure_access_token="$( curl \
    --silent \
    --request POST \
    --data-urlencode "response_type=token" \
    --data-urlencode "grant_type=client_credentials" \
    --data-urlencode "client_assertion_type=urn:ietf:params:oauth:client-assertion-type:jwt-bearer" \
    --data-urlencode "client_id=${AZURE_CLIENT_ID}" \
    --data-urlencode "client_assertion=${gh_access_token}" \
    --data-urlencode "scope=https://storage.azure.com/.default" \
    "https://login.microsoftonline.com/${AZURE_TENANT_ID}/oauth2/v2.0/token" \
    | jq -r ".access_token" )"

gh_claims="$( jq -R 'split(".") | .[1] | @base64d | fromjson' <<< "${gh_access_token}" )"
aad_claims="$( jq -R 'split(".") | .[1] | @base64d | fromjson' <<< "${azure_access_token}" )"

echo "# Tokens

| Token Issuer | Claim    |    Value                                    |
| ------------ | -------- | ------------------------------------------- |
| GitHub       | Issuer   | \`iss=$( echo "${gh_claims}"  | jq .iss )\` |
| GitHub       | Audience | \`aud=$( echo "${gh_claims}"  | jq .aud )\` |
| GitHub       | Subject  | \`sub=$( echo "${gh_claims}"  | jq .sub )\` |
| Azure        | Issuer   | \`iss=$( echo "${aad_claims}" | jq .iss )\` |
| Azure        | Audience | \`aud=$( echo "${aad_claims}" | jq .aud )\` |
| Azure        | Subject  | \`sub=$( echo "${aad_claims}" | jq .sub )\` |
" >> "${GITHUB_STEP_SUMMARY}"

First, we fetch a token from GitHub's IdP: the environment variable ACTIONS_ID_TOKEN_REQUEST_URL contains the URL of the IdP. The URL-encoded audience goes into the query string, and we supply a GitHub-internal security token from the environment variable ACTIONS_ID_TOKEN_REQUEST_TOKEN as bearer token.

Second, we issue a token issuance request against our Azure AD tenant, in which we specify the UAMI's or SP's client_id, and supply the GitHub-issued JWT as client_assertion.

Then we use jq -R 'split(".") | .[1] | @base64d | fromjson' to extract the claims portion from both JWTs, and write out a Markdown-formatted table with the iss, aud and sub claims of both tokens to the file "${GITHUB_STEP_SUMMARY}", so that the table shows up in the CI/CD pipeline's output.

In that output table, GitHub blanks out (***) the Azure AD tenant in the Azure/Issuer value, because that string comes from the AZURE_TENANT_ID secret.
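The encodedAudience value in action.sh is hard-coded. If you customize the audience, a small percent-encoding helper avoids doing that by hand. This is a minimal sketch that assumes ASCII-only input; the function name urlencode is hypothetical.

```shell
#!/bin/bash

# Hypothetical helper: percent-encode a string for use in a query string.
# Assumes ASCII-only input; unreserved characters (RFC 3986) pass through.
urlencode() {
  local rest="$1" out="" tail char
  while [ -n "${rest}" ]; do
    tail="${rest#?}"            # everything after the first character
    char="${rest%"${tail}"}"    # the first character itself
    case "${char}" in
      [a-zA-Z0-9.~_-]) out="${out}${char}" ;;
      # printf with a leading quote yields the character's numeric code
      *) out="${out}$(printf '%%%02X' "'${char}")" ;;
    esac
    rest="${tail}"
  done
  printf '%s\n' "${out}"
}
```

With that in place, encodedAudience="$(urlencode "api://AzureADTokenExchange")" produces the same value as the hard-coded string.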

In the rest of the shell script, the azure_access_token shell variable can be used to call Azure services, in this sample Azure Storage.
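As an illustration of that last step, the upload could be done directly against the Blob Storage REST API with curl. This is a sketch under assumptions: the function names are hypothetical, the x-ms-version value is just one service version that supports Azure AD bearer tokens, and the identity needs a data-plane role such as Storage Blob Data Contributor on the container.

```shell
#!/bin/bash

# Hypothetical helper: build the URL of a blob.
# Arguments: storage account name, container name, blob name
blob_url() {
  printf 'https://%s.blob.core.windows.net/%s/%s\n' "$1" "$2" "$3"
}

# Hypothetical helper: upload a local file as a block blob (Put Blob operation).
# Arguments: access token, blob URL, local file. Requires network access.
upload_blob() {
  local token="$1" url="$2" file="$3"
  curl --silent --fail \
    --request PUT \
    --header "Authorization: Bearer ${token}" \
    --header "x-ms-version: 2020-04-08" \
    --header "x-ms-blob-type: BlockBlob" \
    --upload-file "${file}" \
    "${url}"
}
```

In action.sh, this could be called as upload_blob "${azure_access_token}" "$(blob_url "${account_name}" "${container_name}" "${filename}")" "${filename}".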

GitLab-side setup

On the GitLab side, we don't have a task like GitHub's azure/login@v1 action, so we follow a pure script-based approach in our GitLab YAML:

image: mcr.microsoft.com/azure-cli:latest

build1:
  stage: build
  id_tokens:
    ID_TOKEN_FOR_AZURE:
      aud: "api://AzureADTokenExchange"
  script:
    - echo "##### Logging-in to the Azure user-assigned managed identity"
    - az login --service-principal --tenant "${AZURE_TENANT_ID}" --username "${AZURE_UAMI_CLIENT_ID}" --federated-token "${ID_TOKEN_FOR_AZURE}" --allow-no-subscriptions
    - echo "##### Demo: fetching a secret from Key Vault"
    - az keyvault secret show --vault-name "${AZURE_KEYVAULT_NAME}" --name "${AZURE_KEYVAULT_SECRET_NAME}" | jq .
    - echo "##### Full token contents"
    - jq -R 'split(".") | .[1] | @base64d | fromjson' <<< "${ID_TOKEN_FOR_AZURE}"
    - echo "##### Config necessary for Azure"
    - jq -R 'split(".") | .[1] | @base64d | fromjson | {issuer:.iss,audiences:[.aud],subject:.sub}' <<< "${ID_TOKEN_FOR_AZURE}"

A difference you can see above is that token issuance within GitLab is handled differently: You don't need to use curl to request a GitLab-issued token from some token endpoint. Instead, you just specify an id_tokens section, in which you name a desired environment variable (ID_TOKEN_FOR_AZURE in the above example) and the audience for that token, and GitLab stores the JWT in your environment variable of choice prior to running your job.

We run the whole job in the Azure CLI Docker image (mcr.microsoft.com/azure-cli:latest), so we can use commands like az login in our script. Inside that script, we can then use

az login --service-principal \
   --tenant "${AZURE_TENANT_ID}" \
   --username "${AZURE_UAMI_CLIENT_ID}" \
   --federated-token "${ID_TOKEN_FOR_AZURE}" \
   --allow-no-subscriptions

to log in to a given UAMI or service principal, using a federated token from GitLab.

For illustration purposes, we can fetch a secret from a Key Vault (assuming our UAMI is authorized to read that secret), and print out the token contents on screen.

Given that a service principal, or a UAMI, can have up to 20 federated credentials configured, one can also hook up multiple pipelines (from different providers) to the same Azure identity.

What about Azure DevOps?

Since March '23, Managed Identity support for Azure DevOps is in public preview. When you're running ADO-based CI/CD pipelines on a compute resource (such as a VM) in your own subscription, you can bind a system-assigned or user-assigned managed identity to that compute resource, and from within your pipeline run access all the Azure resources the managed identity has access to.

Federated identity credentials are currently targeted to allow external identity providers (non-Azure AD) to facilitate sign-in to Azure environments. Given that both Azure DevOps and your Azure resources are all governed by Azure Active Directory, there's not necessarily a need to use a federated credential.

As of now (June '23), federated identity credentials unfortunately don't work across Azure AD tenant boundaries (error message AADSTS700222). Should you want to allow 'inbound' connections from a managed identity in another tenant into resources in your Azure DevOps tenant, check the team's guidance on "Can I add a managed identity from a different tenant to my organization?"

Examples

In case this article sparked your interest, and you'd like to go deeper with end-to-end samples, check out the following resources...
