Sarah Murphy

sarah@gitupandgo.com · San Francisco, CA · github.com/sarahmaeve

Site reliability engineer with over a decade of experience in release engineering, deployment automation, and incident management, including over seven years at Google and Facebook. Focused on production stability, observability, and infrastructure as code. Currently shipping Go and Rust systems independently with an AI-augmented development workflow.

Experience

Principal Engineer, Independent Practice · Feb 2026 – Present

Morrigan Tech

Shipping production-quality Go and Rust systems using LLM code generation; acting as architect, reviewer, and technical director across projects in reliability tooling, developer education, and language-learning platforms.
Designed, built, and launched Signatory, a supply chain analysis tool for AI-native developers, going from zero to MVP in four weeks: a Go CLI, MCP server, and local trust database that let humans and LLMs query evidence before adopting a code dependency.
Built an append-only change registry in Go and SQLite for correlating incidents with deploys and infrastructure changes.
Shipped two Rust language-learning platforms (rusty-french.com and Intreccio) with generated audio.

Career Break · Jul 2022 – Jan 2026

Personal sabbatical: travel, French language study, and family.

Principal Site Reliability Engineer · Oct 2021 – Jul 2022

Treasure Financial

Recruited directly by the incoming CTO as the lead IC; stabilized infrastructure and automated deployments, accelerating release cadence from biweekly (2+ week sprints) to daily, with on-demand hotfixes.
Identified the need for production observability, selected Honeycomb as the vendor, and led its adoption; restructured alert systems and on-call rotations.
Converted Kubernetes configurations into infrastructure as code, enabling scalability and streamlining regulatory compliance.

Kubernetes, GKE, GCP, Docker, Helm, ArgoCD, Flux, CI/CD, Python, PagerDuty

Sr. Staff Site Reliability Engineer · Oct 2020 – Sep 2021

WePay (JPMorgan Chase)

Introduced blameless postmortems and restructured the incident management process, creating a formal incident commander rotation to reduce recurrence.
Served as site commander and incident commander in on-call rotations; expanded and formalized rotation structure for faster failovers.
Tech lead for an SRE team; trained junior and mid-level SREs on process changes, code reviews, and on-call participation.

Kubernetes, GCP, Terraform, Docker, Python, Prometheus

Principal Site Reliability Engineer · Jun 2018 – Aug 2019

Microsoft

Championed SRE best practices – including SLO-driven reliability, blameless postmortems, and automated release processes – across Microsoft Azure teams globally.
Created release engineering and site reliability training videos distributed on Microsoft’s internal TV network; consulted with Azure teams on adopting modern production practices.

Azure, AKS, Azure DevOps, Python, Go

Infrastructure Engineer · Oct 2016 – Jul 2017

Lever

Modernized configuration management by migrating from manual Chef configs to Terraform, improving reliability and reproducibility.
Added Test Kitchen and CI/CD support for infrastructure validation.
Moved releases from a manual system to rapid ChatOps-driven deployment.

Terraform, Chef, AWS, EC2, Docker, CI/CD, ChatOps, ELK stack, Grafana, Python, Ansible

Senior Release Engineer · Sep 2012 – Sep 2015

Google

Stabilized Google Play releases, automating builds and reducing deployment time from 20+ hours to 4 hours.
Provided pre-launch release engineering consulting for Google Fi phone service, acting as SRE and Release Engineer for Android Metrics.

Python, Go, Kubernetes, Borg, Linux, TCP/IP

Release Engineer · May 2010 – Sep 2011

Facebook

Recruited by Chuck Rossi as Facebook’s second release engineer, pre-IPO; worked with Rossi and others on the team that created and proved the continuous high-volume release doctrine – multi-deploy-daily at 100,000-node scale – now standard across Silicon Valley and the wider tech industry.
Managed production releases at pre-IPO Facebook, deploying multiple times daily to an infrastructure of over 100,000 nodes serving 500M to 1B users.
Maintained 24/7 on-call responsibility for facebook.com with high-visibility escalation duties.

Python, Linux scripting, internal cloud infrastructure

Release Engineer · Aug 2006 – Mar 2009

Google

Deployed the web front-end for Google Search (GWS / google.com), participating in the on-call rotation.
Automated and deployed Google Ads, Google’s primary revenue-generating product.

Python, Borg (predecessor to Kubernetes), Blaze (predecessor to Bazel)

Education

Machine Learning for Business with Python (CS68) – Stanford Continuing Studies

Other Interests

Freelance photojournalist in North Africa and SW Asia, 2002–2005
French (upper-intermediate), Spanish (elementary-intermediate), and Arabic (elementary)
Historical (14th–17th century) fencing with longsword and side sword