Platform engineering is having a moment. Large companies build dedicated teams to create “golden paths” that let developers deploy without thinking about infrastructure. But what if you’re a team of 5-10 engineers? You don’t have the headcount for a platform team, yet you’re drowning in the same toil: inconsistent environments, flaky CI pipelines, and tribal knowledge about how to deploy things.
This post shows how to build a lightweight Internal Developer Portal (IDP) using Kubernetes-native tools. The goal is maximum automation with minimum maintenance: something a small team can actually sustain.
An IDP that handles three things:

- A service catalog, so everyone knows what exists and who owns it
- Self-service environments that developers can create on demand
- Standardized CI/CD, so every service builds and deploys the same way

The stack:

- Crossplane for provisioning infrastructure
- ArgoCD for GitOps deployments
- GitHub Actions for CI
- A shell script and a few small helpers to glue it together
Backstage is powerful but complex. For small teams, a convention-based catalog works better.
The service.yaml File

Every service repo has a service.yaml at the root:
# service.yaml
apiVersion: platform.example.com/v1
kind: Service
metadata:
  name: user-api
  team: backend
  slack: "#backend-alerts"
spec:
  description: "User authentication and profile management"
  language: go
  framework: gin
  dependencies:
    - name: postgres
      type: database
    - name: redis
      type: cache
    - name: auth-service
      type: service
  endpoints:
    production: https://api.example.com/users
    staging: https://staging-api.example.com/users
  runbook: ./docs/runbook.md
  slos:
    availability: 99.9%
    latency_p99: 200ms
This file is the source of truth. It’s versioned with the code, so it stays up to date.
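Because service.yaml is load-bearing, it is worth failing CI when one is malformed rather than letting a bad file silently break the catalog build. A minimal stdlib-only check might look like this (field names are taken from the example above; the required-field lists are an assumption you would tune to your own schema, and the dict would normally come from yaml.safe_load):

```python
# scripts/validate_service_yaml.py
# Minimal schema check for service.yaml, intended to run in CI.
# Required-field lists below are illustrative; extend as your schema grows.

REQUIRED_METADATA = ("name", "team", "slack")
REQUIRED_SPEC = ("description", "language")

def validate_service(doc):
    """Return a list of human-readable errors; an empty list means valid.

    `doc` is the parsed service.yaml (typically from yaml.safe_load).
    """
    if not isinstance(doc, dict):
        return ["service.yaml must be a mapping"]
    errors = []
    if doc.get("kind") != "Service":
        errors.append("kind must be 'Service'")
    meta = doc.get("metadata") or {}
    for field in REQUIRED_METADATA:
        if not meta.get(field):
            errors.append(f"metadata.{field} is required")
    spec = doc.get("spec") or {}
    for field in REQUIRED_SPEC:
        if not spec.get(field):
            errors.append(f"spec.{field} is required")
    # Each dependency entry needs a name and a type.
    for dep in spec.get("dependencies", []):
        if not isinstance(dep, dict) or "name" not in dep or "type" not in dep:
            errors.append(f"each dependency needs 'name' and 'type': {dep!r}")
    return errors
```

Wire it into the shared CI template as a one-line step and a missing owner or Slack channel becomes a failed PR check instead of a stale catalog entry.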
A simple script aggregates all service.yaml files into a searchable catalog:
#!/usr/bin/env python3
# scripts/generate-catalog.py
import base64
import json
import subprocess

import yaml


def get_repos():
    """Get all repos in the GitHub org via the gh CLI."""
    result = subprocess.run(
        ["gh", "repo", "list", "your-org", "--json", "name,url", "-L", "200"],
        capture_output=True, text=True, check=True,
    )
    return json.loads(result.stdout)


def fetch_service_yaml(repo_name):
    """Fetch and parse service.yaml from a repo; return None if absent."""
    result = subprocess.run(
        ["gh", "api", f"/repos/your-org/{repo_name}/contents/service.yaml"],
        capture_output=True, text=True,
    )
    if result.returncode != 0:
        return None
    content = json.loads(result.stdout)
    return yaml.safe_load(base64.b64decode(content["content"]))


def main():
    catalog = {"services": []}
    for repo in get_repos():
        service = fetch_service_yaml(repo["name"])
        if service:
            service["_repo"] = repo["url"]
            catalog["services"].append(service)

    # Output as JSON for the catalog UI
    print(json.dumps(catalog, indent=2))

    # Also generate markdown for docs
    with open("docs/catalog.md", "w") as f:
        f.write("# Service Catalog\n\n")
        for svc in catalog["services"]:
            meta = svc["metadata"]
            spec = svc["spec"]
            f.write(f"## {meta['name']}\n")
            f.write(f"**Team:** {meta['team']} | **Slack:** {meta['slack']}\n\n")
            f.write(f"{spec['description']}\n\n")
            f.write(f"- **Language:** {spec['language']}\n")
            f.write(f"- **Runbook:** [{spec['runbook']}]({svc['_repo']}/blob/main/{spec['runbook']})\n\n")


if __name__ == "__main__":
    main()
Run this on a cron job (or GitHub Action) to keep the catalog fresh. The output feeds a simple static site or even just a markdown file in your docs repo.
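A scheduled GitHub Actions workflow is one way to wire that up. This is a sketch: the repo layout, bot identity, and output path are assumptions, and the gh CLI (preinstalled on GitHub-hosted runners) picks up credentials from GH_TOKEN:

```yaml
# .github/workflows/catalog.yaml (in your docs or platform repo)
name: Refresh service catalog
on:
  schedule:
    - cron: "0 6 * * *"  # daily
  workflow_dispatch: {}
jobs:
  refresh:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: pip install pyyaml
      - name: Regenerate catalog
        run: python scripts/generate-catalog.py > docs/catalog.json
        env:
          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
      - name: Commit updated catalog
        run: |
          git config user.name "catalog-bot"
          git config user.email "bot@example.com"
          git add docs/
          git diff --cached --quiet || git commit -m "Refresh service catalog"
          git push
```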
Crossplane lets you define infrastructure as Kubernetes resources. This means your environment definitions live in Git and deploy through the same ArgoCD pipeline as your applications.
First, define what an “environment” means for your organization:
# platform/apis/environment-definition.yaml
apiVersion: apiextensions.crossplane.io/v1
kind: CompositeResourceDefinition
metadata:
  name: xenvironments.platform.example.com
spec:
  group: platform.example.com
  names:
    kind: XEnvironment
    plural: xenvironments
  versions:
    - name: v1
      served: true
      referenceable: true
      schema:
        openAPIV3Schema:
          type: object
          properties:
            spec:
              type: object
              properties:
                name:
                  type: string
                  description: "Environment name (dev, staging, prod)"
                size:
                  type: string
                  enum: [small, medium, large]
                  default: small
                services:
                  type: array
                  items:
                    type: string
                  description: "List of services to deploy"
              required:
                - name
                - services
The composition defines what infrastructure to provision for each environment:
# platform/compositions/environment-composition.yaml
apiVersion: apiextensions.crossplane.io/v1
kind: Composition
metadata:
  name: environment-aws
spec:
  compositeTypeRef:
    apiVersion: platform.example.com/v1
    kind: XEnvironment
  resources:
    # Namespace for the environment
    - name: namespace
      base:
        apiVersion: kubernetes.crossplane.io/v1alpha1
        kind: Object
        spec:
          forProvider:
            manifest:
              apiVersion: v1
              kind: Namespace
              metadata:
                name: "" # patched
      patches:
        - fromFieldPath: spec.name
          toFieldPath: spec.forProvider.manifest.metadata.name
    # RDS PostgreSQL instance
    - name: database
      base:
        apiVersion: database.aws.crossplane.io/v1beta1
        kind: RDSInstance
        spec:
          forProvider:
            region: us-west-2
            dbInstanceClass: db.t3.micro # patched based on size
            engine: postgres
            engineVersion: "15"
            masterUsername: admin
            allocatedStorage: 20
            skipFinalSnapshot: true
          writeConnectionSecretToRef:
            namespace: "" # patched
            name: "" # patched
      patches:
        - fromFieldPath: spec.name
          toFieldPath: spec.forProvider.dbName
        - fromFieldPath: spec.name
          toFieldPath: spec.writeConnectionSecretToRef.namespace
        - fromFieldPath: spec.name
          toFieldPath: spec.writeConnectionSecretToRef.name
          transforms:
            - type: string
              string:
                fmt: "%s-db-creds"
        # Size mapping
        - fromFieldPath: spec.size
          toFieldPath: spec.forProvider.dbInstanceClass
          transforms:
            - type: map
              map:
                small: db.t3.micro
                medium: db.t3.small
                large: db.t3.medium
    # Redis ElastiCache
    - name: cache
      base:
        apiVersion: cache.aws.crossplane.io/v1beta1
        kind: ReplicationGroup
        spec:
          forProvider:
            region: us-west-2
            engine: redis
            cacheNodeType: cache.t3.micro # patched based on size
            numCacheClusters: 1
      patches:
        - fromFieldPath: spec.name
          toFieldPath: spec.forProvider.replicationGroupDescription
        - fromFieldPath: spec.size
          toFieldPath: spec.forProvider.cacheNodeType
          transforms:
            - type: map
              map:
                small: cache.t3.micro
                medium: cache.t3.small
                large: cache.t3.medium
Now developers can spin up environments with a simple YAML file:
# environments/staging.yaml
apiVersion: platform.example.com/v1
kind: XEnvironment
metadata:
  name: staging
spec:
  name: staging
  size: medium
  services:
    - user-api
    - order-service
    - notification-service
Commit this to Git, and ArgoCD provisions everything: namespace, database, cache, and deploys the services.
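For that to happen, something has to apply the files in environments/ to the cluster. One way (a sketch; the application name and repo layout here are assumptions) is a single ArgoCD Application that watches the directory, so every committed XEnvironment gets synced automatically:

```yaml
# gitops/applications/environments.yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: environments
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/your-org/gitops
    targetRevision: HEAD
    path: environments
  destination:
    server: https://kubernetes.default.svc
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
```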
Make it even simpler with a CLI:
#!/bin/bash
# platform-cli
case "$1" in
  env:create)
    ENV_NAME=$2
    SIZE=${3:-small}
    cat <<EOF | kubectl apply -f -
apiVersion: platform.example.com/v1
kind: XEnvironment
metadata:
  name: $ENV_NAME
spec:
  name: $ENV_NAME
  size: $SIZE
  services: []
EOF
    echo "Environment '$ENV_NAME' created. Add services with: platform-cli env:add-service $ENV_NAME <service>"
    ;;
  env:add-service)
    ENV_NAME=$2
    SERVICE=$3
    kubectl patch xenvironment "$ENV_NAME" --type=json \
      -p="[{\"op\": \"add\", \"path\": \"/spec/services/-\", \"value\": \"$SERVICE\"}]"
    ;;
  env:delete)
    ENV_NAME=$2
    kubectl delete xenvironment "$ENV_NAME"
    echo "Environment '$ENV_NAME' and all resources will be deleted."
    ;;
  env:list)
    kubectl get xenvironments -o wide
    ;;
  *)
    echo "Usage: platform-cli <command>"
    echo "Commands:"
    echo "  env:create <name> [size]      - Create new environment"
    echo "  env:add-service <env> <svc>   - Add service to environment"
    echo "  env:delete <name>             - Delete environment"
    echo "  env:list                      - List all environments"
    ;;
esac
GitHub Actions handles CI. The key is defining reusable workflows that enforce standards without slowing teams down.
Create a shared workflow that all services use:
# .github/workflows/ci-template.yaml (in your platform repo)
name: CI Template

on:
  workflow_call:
    inputs:
      language:
        required: true
        type: string
      run_e2e:
        required: false
        type: boolean
        default: false
    secrets:
      SONAR_TOKEN:
        required: false

jobs:
  lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Lint (Go)
        if: inputs.language == 'go'
        run: |
          go install github.com/golangci/golangci-lint/cmd/golangci-lint@latest
          golangci-lint run
      - name: Lint (Python)
        if: inputs.language == 'python'
        run: |
          pip install ruff
          ruff check .
      - name: Lint (TypeScript)
        if: inputs.language == 'typescript'
        run: |
          npm ci
          npm run lint

  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Test (Go)
        if: inputs.language == 'go'
        run: go test -race -coverprofile=coverage.out ./...
      - name: Test (Python)
        if: inputs.language == 'python'
        run: |
          pip install pytest pytest-cov
          pytest --cov=. --cov-report=xml
      - name: Test (TypeScript)
        if: inputs.language == 'typescript'
        run: |
          npm ci
          npm test -- --coverage
      - name: Upload coverage
        uses: codecov/codecov-action@v3

  security:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Run Trivy vulnerability scanner
        uses: aquasecurity/trivy-action@master
        with:
          scan-type: 'fs'
          scan-ref: '.'
          severity: 'CRITICAL,HIGH'
          exit-code: '1'
      - name: Run Semgrep
        uses: returntocorp/semgrep-action@v1
        with:
          config: p/default

  build:
    needs: [lint, test, security]
    runs-on: ubuntu-latest
    outputs:
      image_tag: ${{ steps.meta.outputs.version }}
    steps:
      - uses: actions/checkout@v4
      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v3
      - name: Login to Registry
        uses: docker/login-action@v3
        with:
          registry: ghcr.io
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}
      - name: Extract metadata
        id: meta
        uses: docker/metadata-action@v5
        with:
          images: ghcr.io/${{ github.repository }}
          tags: |
            type=sha,prefix=
            type=ref,event=branch
            type=ref,event=pr
      - name: Build and push
        uses: docker/build-push-action@v5
        with:
          context: .
          push: true
          tags: ${{ steps.meta.outputs.tags }}
          cache-from: type=gha
          cache-to: type=gha,mode=max
Each service calls the template:
# .github/workflows/ci.yaml (in each service repo)
name: CI

on:
  push:
    branches: [main]
  pull_request:
    branches: [main]

jobs:
  ci:
    uses: your-org/platform/.github/workflows/ci-template.yaml@main
    with:
      language: go
    secrets:
      SONAR_TOKEN: ${{ secrets.SONAR_TOKEN }}
That’s it. All the lint, test, security, and build logic is centralized. Update the template, and all services get the improvement.
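One caveat: referencing the template at @main means every service picks up template changes, good or bad, on the next run. If you want a release valve, tag releases of the platform repo and have services pin a tag (the @v1 tag here is illustrative):

```yaml
# Pin to a tagged release of the platform repo instead of main
jobs:
  ci:
    uses: your-org/platform/.github/workflows/ci-template.yaml@v1
```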
Add gates that block merges if standards aren’t met:
# In the template, add a gate job
gate:
  needs: [lint, test, security, build]
  runs-on: ubuntu-latest
  steps:
    - name: Check coverage threshold
      run: |
        COVERAGE=$(curl -s "https://codecov.io/api/v2/github/your-org/repos/${{ github.event.repository.name }}/commits/${{ github.sha }}" \
          -H "Authorization: Bearer ${{ secrets.CODECOV_TOKEN }}" | jq '.totals.coverage')
        if (( $(echo "$COVERAGE < 70" | bc -l) )); then
          echo "Coverage is $COVERAGE%, must be at least 70%"
          exit 1
        fi
    - name: Check for breaking changes
      if: github.event_name == 'pull_request'
      run: |
        # Use oasdiff or similar for API breaking change detection
        # This is a placeholder - implement based on your API spec format
        echo "Checking for breaking API changes..."
ArgoCD watches Git and deploys changes automatically. The pattern is:
# gitops/applications/user-api.yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: user-api
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/your-org/gitops
    targetRevision: HEAD
    path: services/user-api/overlays/production
  destination:
    server: https://kubernetes.default.svc
    namespace: production
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    syncOptions:
      - CreateNamespace=true
Structure your GitOps repo with Kustomize for environment-specific config:
gitops/
├── services/
│   └── user-api/
│       ├── base/
│       │   ├── kustomization.yaml
│       │   ├── deployment.yaml
│       │   └── service.yaml
│       └── overlays/
│           ├── staging/
│           │   ├── kustomization.yaml
│           │   └── replicas-patch.yaml
│           └── production/
│               ├── kustomization.yaml
│               └── replicas-patch.yaml
Base deployment:
# gitops/services/user-api/base/deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: user-api
spec:
  replicas: 1
  selector:
    matchLabels:
      app: user-api
  template:
    metadata:
      labels:
        app: user-api
    spec:
      containers:
        - name: user-api
          image: ghcr.io/your-org/user-api:latest # tag overridden per environment by the overlay
          ports:
            - containerPort: 8080
          env:
            - name: DATABASE_URL
              valueFrom:
                secretKeyRef:
                  name: user-api-db-creds
                  key: url
Production overlay with more replicas:
# gitops/services/user-api/overlays/production/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - ../../base
patches:
  - path: replicas-patch.yaml
images:
  - name: ghcr.io/your-org/user-api
    newTag: abc123 # Updated by CI
Add a step to CI that updates the image tag:
deploy:
  needs: [build]
  if: github.ref == 'refs/heads/main'
  runs-on: ubuntu-latest
  steps:
    - name: Checkout GitOps repo
      uses: actions/checkout@v4
      with:
        repository: your-org/gitops
        token: ${{ secrets.GITOPS_TOKEN }} # PAT with write access to the gitops repo
    - name: Update image tag
      run: |
        cd services/${{ github.event.repository.name }}/overlays/production
        kustomize edit set image ghcr.io/your-org/${{ github.event.repository.name }}:${{ needs.build.outputs.image_tag }}
    - name: Commit and push
      run: |
        git config user.name "CI Bot"
        git config user.email "ci@example.com"
        git add .
        git commit -m "Deploy ${{ github.event.repository.name }}:${{ needs.build.outputs.image_tag }}"
        git push
ArgoCD picks up the change and rolls out the new version.
The full flow:

- Create a new repo and add a service.yaml; the next catalog run picks it up
- Run platform-cli env:create dev for a dev environment
- Push code; the shared CI template lints, tests, scans, and builds the image
- CI bumps the image tag in the GitOps repo, and ArgoCD rolls out the change

All of this runs on ~4 tools (Crossplane, ArgoCD, GitHub Actions, a shell script) and requires no dedicated platform team to maintain.
One last tip: version the apiVersion in your CRDs. When you need breaking changes, create a v2.

The goal isn’t to build a perfect platform. It’s to eliminate the most common sources of toil so your small team can focus on building product.