Occam's razor

Platform Engineering for Small Teams: Building a Lightweight Internal Developer Portal

12 May 2024

Platform engineering is having a moment. Large companies build dedicated teams to create “golden paths” that let developers deploy without thinking about infrastructure. But what if you’re a team of 5-10 engineers? You don’t have the headcount for a platform team, yet you’re drowning in the same toil: inconsistent environments, flaky CI pipelines, and tribal knowledge about how to deploy things.

This post shows how to build a lightweight Internal Developer Portal (IDP) using Kubernetes-native tools. The goal is maximum automation with minimum maintenance - something a small team can actually sustain.


What We’re Building

An IDP that handles three things:

  1. Service Catalog: What services exist, who owns them, how to use them
  2. Environment Setup: Spin up dev/staging/prod with one command
  3. CI/CD with Gates: Automated pipelines that enforce quality standards

The stack:

  - Crossplane to provision environments as Kubernetes resources
  - ArgoCD for GitOps deployment
  - GitHub Actions for CI, with reusable workflows and quality gates
  - A handful of small scripts for the service catalog and CLI


The Service Catalog: Keep It Simple

Backstage is powerful but complex. For small teams, a convention-based catalog works better.


Convention: The service.yaml File

Every service repo has a service.yaml at the root:

# service.yaml
apiVersion: platform.example.com/v1
kind: Service
metadata:
  name: user-api
  team: backend
  slack: "#backend-alerts"
spec:
  description: "User authentication and profile management"
  language: go
  framework: gin

  dependencies:
    - name: postgres
      type: database
    - name: redis
      type: cache
    - name: auth-service
      type: service

  endpoints:
    production: https://api.example.com/users
    staging: https://staging-api.example.com/users

  runbook: ./docs/runbook.md

  slos:
    availability: 99.9%
    latency_p99: 200ms

This file is the source of truth. It’s versioned with the code, so it stays up to date.
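Since the file is the contract, it's worth failing fast when one is malformed. Here is a minimal validation sketch; the REQUIRED field list below is an assumption, so align it with whatever your catalog tooling actually reads:

```python
#!/usr/bin/env python3
# scripts/validate-service-yaml.py
# Sketch of a pre-merge service.yaml check. The REQUIRED map is an
# assumption -- extend it to match the fields your tooling consumes.
import sys

REQUIRED = {
    "metadata": ["name", "team", "slack"],
    "spec": ["description", "language"],
}

def validate(doc):
    """Return a list of human-readable errors; empty means valid."""
    errors = []
    for section, fields in REQUIRED.items():
        block = doc.get(section) or {}
        for field in fields:
            if field not in block:
                errors.append(f"missing {section}.{field}")
    return errors

if __name__ == "__main__" and len(sys.argv) > 1:
    import yaml  # PyYAML, only needed when checking a real file
    with open(sys.argv[1]) as f:
        problems = validate(yaml.safe_load(f) or {})
    for p in problems:
        print(f"{sys.argv[1]}: {p}")
    sys.exit(1 if problems else 0)
```

Run it as a CI step and a service with no owner or Slack channel never reaches main.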


Catalog Generation

A simple script aggregates all service.yaml files into a searchable catalog:

#!/usr/bin/env python3
# scripts/generate-catalog.py

import base64
import json
import subprocess

import yaml

def get_repos():
    """Get all repos from the GitHub org"""
    result = subprocess.run(
        ["gh", "repo", "list", "your-org", "--json", "name,url", "-L", "200"],
        capture_output=True, text=True, check=True
    )
    return json.loads(result.stdout)

def fetch_service_yaml(repo_name):
    """Fetch service.yaml from a repo; None if the repo has none"""
    result = subprocess.run(
        ["gh", "api", f"/repos/your-org/{repo_name}/contents/service.yaml"],
        capture_output=True, text=True
    )
    if result.returncode != 0:
        return None

    content = json.loads(result.stdout)
    return yaml.safe_load(base64.b64decode(content["content"]))

def main():
    catalog = {"services": []}

    for repo in get_repos():
        service = fetch_service_yaml(repo["name"])
        if service:
            service["_repo"] = repo["url"]
            catalog["services"].append(service)

    # Output as JSON for the catalog UI
    print(json.dumps(catalog, indent=2))

    # Also generate markdown for docs
    with open("docs/catalog.md", "w") as f:
        f.write("# Service Catalog\n\n")
        for svc in catalog["services"]:
            meta = svc["metadata"]
            spec = svc["spec"]
            f.write(f"## {meta['name']}\n")
            f.write(f"**Team:** {meta['team']} | **Slack:** {meta['slack']}\n\n")
            f.write(f"{spec['description']}\n\n")
            f.write(f"- **Language:** {spec['language']}\n")
            f.write(f"- **Runbook:** [{spec['runbook']}]({svc['_repo']}/blob/main/{spec['runbook']})\n\n")

if __name__ == "__main__":
    main()

Run this on a cron job (or GitHub Action) to keep the catalog fresh. The output feeds a simple static site or even just a markdown file in your docs repo.
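The JSON output also makes ad-hoc questions cheap to answer: who owns this service, and what breaks if postgres goes down? A couple of query helpers over the generated catalog (field names follow the service.yaml convention above):

```python
# Query helpers over the generated catalog, e.g. the JSON emitted by
# scripts/generate-catalog.py.

def services_by_team(catalog, team):
    """Names of services owned by the given team."""
    return [
        svc["metadata"]["name"]
        for svc in catalog["services"]
        if svc["metadata"].get("team") == team
    ]

def dependents_of(catalog, dep_name):
    """Names of services that declare a dependency on dep_name."""
    return [
        svc["metadata"]["name"]
        for svc in catalog["services"]
        if any(d.get("name") == dep_name
               for d in svc["spec"].get("dependencies", []))
    ]

if __name__ == "__main__":
    sample = {"services": [{
        "metadata": {"name": "user-api", "team": "backend"},
        "spec": {"dependencies": [{"name": "postgres", "type": "database"}]},
    }]}
    print(services_by_team(sample, "backend"))  # ['user-api']
    print(dependents_of(sample, "postgres"))    # ['user-api']
```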


Environment Setup with Crossplane

Crossplane lets you define infrastructure as Kubernetes resources. This means your environment definitions live in Git and deploy through the same ArgoCD pipeline as your applications.


Composite Resource Definition

First, define what an “environment” means for your organization:

# platform/apis/environment-definition.yaml
apiVersion: apiextensions.crossplane.io/v1
kind: CompositeResourceDefinition
metadata:
  name: xenvironments.platform.example.com
spec:
  group: platform.example.com
  names:
    kind: XEnvironment
    plural: xenvironments
  versions:
    - name: v1
      served: true
      referenceable: true
      schema:
        openAPIV3Schema:
          type: object
          properties:
            spec:
              type: object
              properties:
                name:
                  type: string
                  description: "Environment name (dev, staging, prod)"
                size:
                  type: string
                  enum: [small, medium, large]
                  default: small
                services:
                  type: array
                  items:
                    type: string
                  description: "List of services to deploy"
              required:
                - name
                - services


Composition: What Gets Created

The composition defines what infrastructure to provision for each environment:

# platform/compositions/environment-composition.yaml
apiVersion: apiextensions.crossplane.io/v1
kind: Composition
metadata:
  name: environment-aws
spec:
  compositeTypeRef:
    apiVersion: platform.example.com/v1
    kind: XEnvironment

  resources:
    # Namespace for the environment
    - name: namespace
      base:
        apiVersion: kubernetes.crossplane.io/v1alpha1
        kind: Object
        spec:
          forProvider:
            manifest:
              apiVersion: v1
              kind: Namespace
              metadata:
                name: "" # patched
      patches:
        - fromFieldPath: spec.name
          toFieldPath: spec.forProvider.manifest.metadata.name

    # RDS PostgreSQL instance
    - name: database
      base:
        apiVersion: database.aws.crossplane.io/v1beta1
        kind: RDSInstance
        spec:
          forProvider:
            region: us-west-2
            dbInstanceClass: db.t3.micro  # patched based on size
            engine: postgres
            engineVersion: "15"
            masterUsername: admin
            allocatedStorage: 20
            skipFinalSnapshot: true
          writeConnectionSecretToRef:
            namespace: "" # patched
            name: "" # patched
      patches:
        - fromFieldPath: spec.name
          toFieldPath: spec.forProvider.dbName
        - fromFieldPath: spec.name
          toFieldPath: spec.writeConnectionSecretToRef.namespace
        - fromFieldPath: spec.name
          toFieldPath: spec.writeConnectionSecretToRef.name
          transforms:
            - type: string
              string:
                fmt: "%s-db-creds"
        # Size mapping
        - fromFieldPath: spec.size
          toFieldPath: spec.forProvider.dbInstanceClass
          transforms:
            - type: map
              map:
                small: db.t3.micro
                medium: db.t3.small
                large: db.t3.medium

    # Redis ElastiCache
    - name: cache
      base:
        apiVersion: cache.aws.crossplane.io/v1beta1
        kind: ReplicationGroup
        spec:
          forProvider:
            region: us-west-2
            engine: redis
            cacheNodeType: cache.t3.micro  # patched based on size
            numCacheClusters: 1
      patches:
        - fromFieldPath: spec.name
          toFieldPath: spec.forProvider.replicationGroupDescription
        - fromFieldPath: spec.size
          toFieldPath: spec.forProvider.cacheNodeType
          transforms:
            - type: map
              map:
                small: cache.t3.micro
                medium: cache.t3.small
                large: cache.t3.medium


Creating an Environment

Now developers can spin up environments with a simple YAML file:

# environments/staging.yaml
apiVersion: platform.example.com/v1
kind: XEnvironment
metadata:
  name: staging
spec:
  name: staging
  size: medium
  services:
    - user-api
    - order-service
    - notification-service

Commit this to Git and ArgoCD syncs it; Crossplane then provisions the namespace, database, and cache, and the listed services get deployed.
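Provisioning isn't instant (an RDS instance alone takes several minutes), so it helps to block until the environment is actually usable. Crossplane reports this through a Ready condition in the composite's status; a small polling helper, in the same subprocess-over-CLI style as the catalog script, might look like this:

```python
#!/usr/bin/env python3
# scripts/env-ready.py - wait until an XEnvironment reports Ready.
import json
import subprocess
import sys
import time

def is_ready(status):
    """True if the status carries a Ready condition with status True."""
    return any(
        c.get("type") == "Ready" and c.get("status") == "True"
        for c in status.get("conditions", [])
    )

def wait_for(env_name, timeout=900):
    """Poll the composite resource until Ready or the timeout elapses."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        result = subprocess.run(
            ["kubectl", "get", "xenvironment", env_name, "-o", "json"],
            capture_output=True, text=True,
        )
        if result.returncode == 0:
            obj = json.loads(result.stdout)
            if is_ready(obj.get("status", {})):
                return True
        time.sleep(15)
    return False

if __name__ == "__main__" and len(sys.argv) > 1:
    ok = wait_for(sys.argv[1])
    print("ready" if ok else "timed out")
    sys.exit(0 if ok else 1)
```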


CLI Wrapper

Make it even simpler with a CLI:

#!/bin/bash
# platform-cli
set -euo pipefail

case "${1:-}" in
  env:create)
    ENV_NAME=$2
    SIZE=${3:-small}

    cat <<EOF | kubectl apply -f -
apiVersion: platform.example.com/v1
kind: XEnvironment
metadata:
  name: $ENV_NAME
spec:
  name: $ENV_NAME
  size: $SIZE
  services: []
EOF
    echo "Environment '$ENV_NAME' created. Add services with: platform-cli env:add-service $ENV_NAME <service>"
    ;;

  env:add-service)
    ENV_NAME=$2
    SERVICE=$3
    kubectl patch xenvironment "$ENV_NAME" --type=json \
      -p="[{\"op\": \"add\", \"path\": \"/spec/services/-\", \"value\": \"$SERVICE\"}]"
    ;;

  env:delete)
    ENV_NAME=$2
    kubectl delete xenvironment "$ENV_NAME"
    echo "Environment '$ENV_NAME' and all resources will be deleted."
    ;;

  env:list)
    kubectl get xenvironments -o wide
    ;;

  *)
    echo "Usage: platform-cli <command>"
    echo "Commands:"
    echo "  env:create <name> [size]  - Create new environment"
    echo "  env:add-service <env> <svc> - Add service to environment"
    echo "  env:delete <name>         - Delete environment"
    echo "  env:list                  - List all environments"
    ;;
esac


CI/CD with Quality Gates

GitHub Actions handles CI. The key is defining reusable workflows that enforce standards without slowing teams down.


Reusable CI Workflow

Create a shared workflow that all services use:

# .github/workflows/ci-template.yaml (in your platform repo)
name: CI Template

on:
  workflow_call:
    inputs:
      language:
        required: true
        type: string
      run_e2e:
        required: false
        type: boolean
        default: false
    secrets:
      SONAR_TOKEN:
        required: false

jobs:
  lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Lint (Go)
        if: inputs.language == 'go'
        run: |
          go install github.com/golangci/golangci-lint/cmd/golangci-lint@latest
          golangci-lint run

      - name: Lint (Python)
        if: inputs.language == 'python'
        run: |
          pip install ruff
          ruff check .

      - name: Lint (TypeScript)
        if: inputs.language == 'typescript'
        run: |
          npm ci
          npm run lint

  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Test (Go)
        if: inputs.language == 'go'
        run: go test -race -coverprofile=coverage.out ./...

      - name: Test (Python)
        if: inputs.language == 'python'
        run: |
          pip install pytest pytest-cov
          pytest --cov=. --cov-report=xml

      - name: Test (TypeScript)
        if: inputs.language == 'typescript'
        run: |
          npm ci
          npm test -- --coverage

      - name: Upload coverage
        uses: codecov/codecov-action@v3

  security:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Run Trivy vulnerability scanner
        uses: aquasecurity/trivy-action@master
        with:
          scan-type: 'fs'
          scan-ref: '.'
          severity: 'CRITICAL,HIGH'
          exit-code: '1'

      - name: Run Semgrep
        uses: returntocorp/semgrep-action@v1
        with:
          config: p/default

  build:
    needs: [lint, test, security]
    runs-on: ubuntu-latest
    outputs:
      image_tag: ${{ steps.meta.outputs.version }}
    steps:
      - uses: actions/checkout@v4

      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v3

      - name: Login to Registry
        uses: docker/login-action@v3
        with:
          registry: ghcr.io
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}

      - name: Extract metadata
        id: meta
        uses: docker/metadata-action@v5
        with:
          images: ghcr.io/${{ github.repository }}
          tags: |
            type=sha,prefix=
            type=ref,event=branch
            type=ref,event=pr

      - name: Build and push
        uses: docker/build-push-action@v5
        with:
          context: .
          push: true
          tags: ${{ steps.meta.outputs.tags }}
          cache-from: type=gha
          cache-to: type=gha,mode=max


Service-Specific CI

Each service calls the template:

# .github/workflows/ci.yaml (in each service repo)
name: CI

on:
  push:
    branches: [main]
  pull_request:
    branches: [main]

jobs:
  ci:
    uses: your-org/platform/.github/workflows/ci-template.yaml@main
    with:
      language: go
    secrets:
      SONAR_TOKEN: ${{ secrets.SONAR_TOKEN }}

That’s it. All the lint, test, security, and build logic is centralized. Update the template, and all services get the improvement.


Quality Gates

Add gates that block merges if standards aren’t met:

# In the template, add a gate job
  gate:
    needs: [lint, test, security, build]
    runs-on: ubuntu-latest
    steps:
      - name: Check coverage threshold
        run: |
          COVERAGE=$(curl -s "https://codecov.io/api/v2/github/your-org/repos/${{ github.event.repository.name }}/commits/${{ github.sha }}" \
            -H "Authorization: Bearer ${{ secrets.CODECOV_TOKEN }}" | jq '.totals.coverage')

          if (( $(echo "$COVERAGE < 70" | bc -l) )); then
            echo "Coverage is $COVERAGE%, must be at least 70%"
            exit 1
          fi

      - name: Check for breaking changes
        if: github.event_name == 'pull_request'
        run: |
          # Use oasdiff or similar for API breaking change detection
          # This is a placeholder - implement based on your API spec format
          echo "Checking for breaking API changes..."
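One caveat with the bc one-liner: if the coverage provider hasn't processed the commit yet, jq emits null, the arithmetic errors out, and the gate can quietly pass with no data at all. If you hit that, the decision logic is easy to harden in a few lines of Python that fail closed:

```python
# Coverage gate decision that fails closed on missing or bad data.
import sys

def passes_gate(coverage, threshold=70.0):
    """True only when coverage is a real number at or above threshold."""
    if coverage in (None, "null", ""):
        return False
    try:
        return float(coverage) >= threshold
    except (TypeError, ValueError):
        return False

if __name__ == "__main__" and len(sys.argv) > 1:
    cov = sys.argv[1]
    if not passes_gate(cov):
        print(f"Coverage {cov!r} does not meet the 70% threshold")
        sys.exit(1)
    print(f"Coverage {cov}% meets the threshold")
```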


GitOps Deployment with ArgoCD

ArgoCD watches Git and deploys changes automatically. The pattern is:

  1. CI builds image, pushes to registry
  2. CI updates image tag in GitOps repo
  3. ArgoCD detects change, deploys to cluster


Application Definition

# gitops/applications/user-api.yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: user-api
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/your-org/gitops
    targetRevision: HEAD
    path: services/user-api
  destination:
    server: https://kubernetes.default.svc
    namespace: production
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    syncOptions:
      - CreateNamespace=true


Kustomize Overlays

Structure your GitOps repo with Kustomize for environment-specific config:

gitops/
├── services/
│   └── user-api/
│       ├── base/
│       │   ├── kustomization.yaml
│       │   ├── deployment.yaml
│       │   └── service.yaml
│       └── overlays/
│           ├── staging/
│           │   ├── kustomization.yaml
│           │   └── replicas-patch.yaml
│           └── production/
│               ├── kustomization.yaml
│               └── replicas-patch.yaml

Base deployment:

# gitops/services/user-api/base/deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: user-api
spec:
  replicas: 1
  selector:
    matchLabels:
      app: user-api
  template:
    metadata:
      labels:
        app: user-api
    spec:
      containers:
        - name: user-api
          image: ghcr.io/your-org/user-api:latest  # tag overridden per environment by the overlay
          ports:
            - containerPort: 8080
          env:
            - name: DATABASE_URL
              valueFrom:
                secretKeyRef:
                  name: user-api-db-creds
                  key: url

Production overlay with more replicas:

# gitops/services/user-api/overlays/production/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - ../../base
patches:
  - path: replicas-patch.yaml
images:
  - name: ghcr.io/your-org/user-api
    newTag: abc123  # Updated by CI


CI Updates GitOps Repo

Add a step to CI that updates the image tag:

  deploy:
    needs: [build]
    if: github.ref == 'refs/heads/main'
    runs-on: ubuntu-latest
    steps:
      - name: Checkout GitOps repo
        uses: actions/checkout@v4
        with:
          repository: your-org/gitops
          token: ${{ secrets.GITOPS_TOKEN }}  # PAT with write access to the GitOps repo

      - name: Update image tag
        run: |
          cd services/${{ github.event.repository.name }}/overlays/production
          kustomize edit set image ghcr.io/your-org/${{ github.event.repository.name }}:${{ needs.build.outputs.image_tag }}

      - name: Commit and push
        run: |
          git config user.name "CI Bot"
          git config user.email "ci@example.com"
          git add .
          git commit -m "Deploy ${{ github.event.repository.name }}:${{ needs.build.outputs.image_tag }}"
          git push

ArgoCD picks up the change and rolls out the new version.
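The deploy step assumes the kustomize binary is on the runner. If you'd rather avoid that dependency, the tag bump is just a rewrite of the overlay's images list, sketched here in the same spirit:

```python
# Minimal stand-in for `kustomize edit set image`: update (or add)
# the newTag for an image entry in a kustomization document.

def bump_image_tag(kustomization, image, new_tag):
    """Set images[name == image].newTag, appending an entry if absent."""
    images = kustomization.setdefault("images", [])
    for entry in images:
        if entry.get("name") == image:
            entry["newTag"] = new_tag
            break
    else:
        images.append({"name": image, "newTag": new_tag})
    return kustomization

if __name__ == "__main__":
    import sys
    if len(sys.argv) >= 4:
        import yaml  # PyYAML, only needed for the file round-trip
        path, image, tag = sys.argv[1:4]
        with open(path) as f:
            doc = yaml.safe_load(f)
        with open(path, "w") as f:
            yaml.safe_dump(bump_image_tag(doc, image, tag), f, sort_keys=False)
```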


Putting It Together

The full flow:

  1. Developer creates service.yaml in new repo
  2. Catalog script picks it up, adds to documentation
  3. Developer runs platform-cli env:create dev for a dev environment
  4. Crossplane provisions database, cache, namespace
  5. Developer pushes code
  6. CI runs lint, test, security scan, builds image
  7. CI updates GitOps repo with new image tag
  8. ArgoCD deploys to the environment
  9. Quality gates enforce standards before merge

All of this runs on three core tools (Crossplane, ArgoCD, GitHub Actions) plus a few small scripts, and requires no dedicated platform team to maintain.


Maintenance Tips

The goal isn’t to build a perfect platform. It’s to eliminate the most common sources of toil so your small team can focus on building product.
