Caesars previously managed custom CI agents manually and struggled to respond to cyber events quickly. I led the setup of simpler self-hosted CI agents and internal tools to manage cyber incidents faster.
What shipped
- Business Web Utilities
  - Next.js internal app that generates reports of cyber incidents
  - Terraform with a simple CI pipeline deploying to AWS serverless infrastructure
- Self-hosted GitHub Enterprise runners
  - MVP: EC2 instances on AWS, with a user data script automating provisioning of all CI agents (previously manual)
  - Post-MVP: proof of concept using GitHub ARC on EKS
Impact
- Cost savings: $30k/yr in CI costs
- Time-to-market: ~2 min to spin up new CI agents
- Efficiency: 2x faster builds with ARM64 architecture
- Time-to-triage: Double-digit reduction via the new reporting tool
- Multi-team adoption: Newer, simpler tech matching teams' needs
My role & scope
- Led the proposal and designed the tech stack for Business Web Utilities
- Sole individual contributor on self-hosted GitHub agents
- Transferred knowledge of the CI agents to the DevOps team so they could take ownership and move forward with the ephemeral runner plan
Key decisions & trade-offs
- Chose EC2 over EKS to get reliable CI running quickly, but this created large job queues, increasing wait times for devs
- Adopted a GitHub ARC PoC to enable ephemeral runners and remove job queues, enabling faster dev work
- Used Next.js with React rather than a React Native monorepo to reduce time to market, at the cost of missed reuse opportunities
Tech stack
- Next.js, React
- AWS with ECS, Fargate, CloudWatch
- EC2 for GitHub runners, then an EKS proof of concept for GitHub ARC