We’re looking for a Site Reliability Engineer to design, implement, and deliver software and infrastructure solutions to improve the scalability, availability, and efficiency of Pinterest’s services. The Core team operates the most fundamental layers of Pinterest’s global infrastructure, which handles billions of requests per month.
What you'll do:
- Influence and create new designs, architectures, standards and methods for large-scale distributed systems with a focus on operability
- Collaborate with developers in the deployment and scaling of new product features
- Perform deep dives into reliability issues, partnering with software and systems engineers across the organization to produce and roll out fixes
What we're looking for:
- Proficiency in a scripting language, Python preferred. Experience with systems languages (Go, C) is a plus
- Strong knowledge of Linux/Unix/BSD internals and shell scripting. Production experience with JVM, Python, and Golang runtimes is a plus
- Deep knowledge of a configuration management tool (e.g. Puppet, Chef, Ansible, Salt, CFEngine). Experience with containers is a plus
- Experience operating in a modern cloud environment such as AWS, GCP, or Azure, or in large-scale data centers
- Familiarity with distributed systems, including service discovery, pub/sub, search indexing, storage, and caching. For these we use ZooKeeper, Kafka, Elasticsearch, MySQL and HBase, and Memcache, respectively.