New York, NY, USA

Site Reliability Engineer, Cloud Services

New York City

Our Site Reliability Engineer will help build the best database management service for the leading document database server in the world. MongoDB’s Cloud Management service runs databases holding petabytes of data and processes over a billion metrics and tens of billions of backup operations every day. But we have barely begun. In the future, our online database service will auto-scale, self-heal and hide nearly all of the complexity of running a large scalable system.


Responsibilities

  • Manage the infrastructure for a cloud service that processes over a billion metrics per day and replicates tens of billions of database writes to our backup service
  • Design, implement, operate and troubleshoot the automation and monitoring of a service that seamlessly spans several data centers and several cloud providers
  • Become an expert in MongoDB performance, helping us optimize from the application level all the way through the firmware
  • Participate in a weekly on-call rotation and travel to our data centers as needed
  • Troubleshoot and resolve issues in multiple environments
  • Improve our infrastructure capabilities, optimizing for cost, simplicity, and maintainability


Requirements

  • You are passionate about the revolution going on in information technology as core services migrate to the cloud
  • You have experience running a mission-critical service at scale
  • Working knowledge of information security issues
  • Prior experience as a systems administrator in a Linux environment
  • Firm grasp of at least one modern programming language, beyond basic scripting
  • Solid experience using configuration management frameworks (e.g., Chef, Puppet)
  • Working knowledge of web and network protocols and standards (HTTP, TLS, DNS, etc.)
  • Bachelor’s degree in Computer Science or equivalent experience
  • Experience with Amazon Web Services

Nice-to-haves

  • Experience building large applications from scratch, complete with deployment tools
  • Experience writing automation tools and eagerness to "automate all the things"
  • Experience in networking, security, hardware or OS performance tuning
  • Experience with Google Compute Engine, Microsoft Azure and other cloud services

How to apply

Please email your resume directly to Tom Cirri at
