Rubin Observatory / SLAC - SRE / Software Engineer
Location: Menlo Park, CA (Remote, US)
Type: Full-time
About the Role
The Vera C. Rubin Observatory in Chile is a new astronomy facility creating a 10-year time-lapse map of the southern sky. It features the world’s largest digital camera, which produces 20 GB of raw pixels per minute. The team is developing the Prompt Processing System, which distributes real-time alerts on every astrophysical object that has moved, changed, or appeared. (https://rubinobservatory.org/news/first-alerts)
This position is with SLAC National Accelerator Laboratory, a DOE-funded national laboratory that hosts the Rubin data facility and many of the 80 scientists and engineers on the worldwide Rubin Data Management team. (https://www6.slac.stanford.edu/)
The SRE will be responsible for the reliability of the Prompt Processing Framework, a Kubernetes-based, event-driven system.
Responsibilities
- Own the reliability of the Prompt Processing Framework.
- Write software to improve the framework's resilience.
- Operate and evolve core infrastructure services.
- Build monitoring, alerting, and on-call practices to ensure system robustness during nightly observing operations.
Stack
- Python
- Kubernetes
- Helm
- ArgoCD
- Kafka
- Redis
- KEDA
- InfluxDB
- PostgreSQL
- Cassandra
Apply
Full details and application: ls.st/sre-ad