Sarah Murphy
Principal SRE, independent practice — San Francisco
Reliability engineering, incident management, and AI-native infrastructure work, for engineering orgs that need principal-level judgment without the full-time headcount.
Previously: Google · Facebook · Microsoft · WePay / JPMorgan Chase
What I do
Scoped diagnostic day — a written diagnostic with prioritized recommendations, delivered within one week. Named shapes include On-Call Audit, Postmortem Corpus Review, and AI-Coding Adoption Risk Assessment.
Retainers — ongoing access to principal-level SRE judgment, structured as Advisory, Fractional, or Embedded.
Specialized engagements — incident facilitation, on-call restructuring, observability adoption, SRE practice standup, AI-augmented development workflow.
Why me
15+ years of production SRE experience — Google (twice), Facebook (pre-IPO, 1B users), Microsoft Azure, and WePay. Recruited by Chuck Rossi as Facebook’s second release engineer; helped prove the doctrine of continuous, high-volume release (multiple deploys per day at 100,000-node scale) that is now standard across the tech industry.
Current practice is AI-native — I ship production Go and Rust systems using an LLM-augmented workflow, acting as architect, reviewer, and technical director. Projects in flight include Signatory (supply-chain trust analysis, launching soon) and several technical curricula.
Senior-IC only — I solve the problems your current team can’t yet solve. I don’t ship feature code, write runbooks, or carry a pager. If your need is implementation capacity, I can help you hire for it.
Recent writing
- The Shadow Glass
- What we talk about when we talk about code
- Fixing broken postmortems
- Release engineering lessons
Stay in the loop
Writing on reliability, incident management, and AI-native engineering practice. Roughly every other week, no filler.