owen.bio

Caesars Platform Optimisation

Caesars previously managed custom CI agents manually and struggled to respond to cyber events quickly. I led the set-up of simpler self-hosted CI agents and internal tools to manage cyber incidents faster.

What shipped

  • Business Web Utilities
    • Next.js internal app that generates reports of cyber incidents
    • Terraform with a simple CI pipeline deploying to AWS serverless infrastructure
  • Self-hosted GitHub Enterprise runners
    • MVP: EC2 instances on AWS with a user data script that automates provisioning of all CI agents (previously manual)
    • Post-MVP proof-of-concept using GitHub ARC on EKS

Impact

  • Cost saving: $30k/yr saved in CI costs
  • Time-to-market: ~2 min to spin up new CI agents
  • Efficiency: 2x faster builds with ARM64 architecture
  • Time-to-triage: Double-digit reduction via new report tool
  • Multi-team adoption: Newer, simpler tech matching teams' needs

My role & scope

  • Led the proposal and designed the tech stack for Business Web Utilities
  • Sole individual contributor on self-hosted GitHub agents
  • Transferred knowledge of the CI agents to the DevOps team so they could take ownership and move forward with the ephemeral-runner plan

Key decisions & trade-offs

  • Chose EC2 over EKS to get reliable CI running quickly, but it created large job queues, increasing wait times for devs
  • Adopted a GitHub ARC PoC to enable ephemeral runners, removing job queues and speeding up dev work
  • Used Next.js with React rather than a React Native monorepo to reduce time to market, at the cost of missed reuse opportunities
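
The queueing trade-off between the first two decisions can be illustrated with a toy model. This is a rough sketch, not the actual Caesars setup: the Job type, both function names, and every number are hypothetical, and it assumes each ephemeral runner adds only a fixed startup delay.

```typescript
// Hypothetical model: times are in minutes.
interface Job {
  arrival: number;  // when the job is queued
  duration: number; // how long it runs once started
}

// Fixed pool (EC2-style): each job waits for whichever runner frees up first.
function fixedPoolWaits(jobs: Job[], runners: number): number[] {
  const freeAt = new Array<number>(runners).fill(0);
  const sorted = [...jobs].sort((a, b) => a.arrival - b.arrival);
  return sorted.map((job) => {
    // Find the runner that becomes free earliest.
    let idx = 0;
    for (let i = 1; i < freeAt.length; i++) {
      if (freeAt[i] < freeAt[idx]) idx = i;
    }
    const start = Math.max(job.arrival, freeAt[idx]);
    freeAt[idx] = start + job.duration;
    return start - job.arrival; // time spent queued
  });
}

// Ephemeral (ARC-style): one fresh runner per job, so the only wait is
// the assumed startup delay.
function ephemeralWaits(jobs: Job[], startup: number): number[] {
  return jobs.map(() => startup);
}

// Four 10-minute jobs queued at once, two fixed runners.
const burst: Job[] = [0, 0, 0, 0].map((arrival) => ({ arrival, duration: 10 }));
console.log(fixedPoolWaits(burst, 2)); // [0, 0, 10, 10]
console.log(ephemeralWaits(burst, 1)); // [1, 1, 1, 1]
```

In this model the fixed pool makes half the burst wait a full job length, while ephemeral runners trade that queue for a small, constant startup cost per job.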

Tech stack

  • Next.js, React
  • AWS with ECS, Fargate, CloudWatch
  • EC2s for GitHub runners, then EKS proof-of-concept for GitHub ARC