Sarah Murphy

Principal SRE, independent practice — San Francisco

Reliability engineering, incident management, and AI-native infrastructure work, for engineering orgs that need principal-level judgment without the full-time headcount.

Previously: Google · Facebook · Microsoft · WePay / JPMorgan Chase

What I do

Scoped diagnostic day: a written diagnostic with prioritized recommendations, delivered within one week. Named formats include On-Call Audit, Postmortem Corpus Review, and AI-Coding Adoption Risk Assessment.

Retainers: ongoing access to principal-level SRE judgment, structured as Advisory, Fractional, or Embedded.

Specialized engagements: incident facilitation, on-call restructuring, observability adoption, SRE practice standup, AI-augmented development workflow.

Why me

15+ years of production SRE experience: Google (twice), Facebook (pre-IPO, 1B users), Microsoft Azure, and WePay. Recruited by Chuck Rossi as Facebook’s second release engineer; helped prove the continuous high-volume release doctrine (multiple deploys per day at 100,000-node scale) that is now standard across the tech industry.
Current practice is AI-native: I ship production Go and Rust systems using an LLM-augmented workflow, acting as architect, reviewer, and technical director. Projects in flight include Signatory (supply-chain trust analysis, launching soon) and several technical curricula.
Principal-level, hands-on: I solve site reliability and DevOps problems you aren’t staffed to solve, and I ship production Go and Rust systems, MCP servers, and agent tooling alongside the diagnostic and advisory work.

Recent writing

All posts →

Stay in the loop

Writing on reliability, incident management, and AI-native engineering practice. Roughly every other week, no filler.

Ready to talk?

sarah@gitupandgo.com