Our Team: Bloomberg Law SRE combines software and systems engineering to champion the use of sound engineering principles, operational discipline, and automation. We focus on improving the reliability, stability, and scalability of the Bloomberg Law (BLAW) product, with an interest in fault-tolerant distributed system design. Our culture of diversity, intellectual curiosity, methodical problem solving and openness in a blameless environment is key to our success.
What's in it for you: As a Site Reliability Engineer (SRE) at Bloomberg Law, your mission is to improve the reliability, scalability and performance of the BLAW Platform, which runs in a hybrid environment (on-premises and AWS). You will be empowered to promote and implement industry-wide SRE best practices. You will have the opportunity to work alongside application engineers across a full stack that uses modern open-source web and data-processing technologies.
We'll trust you to:
Implement systems that are highly available, scalable and self-healing in Bloomberg data centers and on AWS
Design infrastructure and implement automation using infrastructure-as-code solutions (Terraform)
Improve overall observability by implementing monitoring, metrics, logs and Service Level Objectives (SLO)
Work alongside application engineers as they build/migrate applications on your infrastructure
Troubleshoot production problems as they occur, and drive the post-mortem process
Measure current capacity, predict future capacity needs and make suggestions accordingly
You need to have:
3+ years of experience working on highly available, fault-tolerant distributed systems
Experience developing automated infrastructure on AWS or other cloud providers
A mindset focused on the stability of the production environment, applying software engineering solutions to run and manage applications
Understanding of Linux operating systems and networking
BS/MS/PhD in Computer Science, Engineering or related technology field
We'd love to see:
Prior experience with AWS infrastructure and related DevOps practices
A deep understanding of site reliability engineering (SRE) principles and practices
Expertise in designing, analyzing, and troubleshooting large-scale distributed systems
Ability to generate project ideas and implement them through effective collaboration and communication
Familiarity with Kubernetes, Docker, and containers
Ability to work with diverse teams and personalities
Bloomberg is an equal opportunities employer, and we value diversity at our company. We do not discriminate on the basis of race, religion, color, national origin, gender, sexual orientation, age, marital status, veteran status, or disability status.