The DataHub Engineering team provides a distributed platform for hosting datasets, complete with managed data stores, search, discovery, batch analytics and real-time stream processing capabilities. The platform provides a single place in the company to discover, access, publish and subscribe to data.
We introduced the abstraction of a "dataset" and invented a schema language to formally define all data at Bloomberg, complete with schema evolution, versioning, and true point-in-time semantics. We were the first at Bloomberg to introduce Kafka, Avro, Schema Registry, Mesos, clustered MySQL, Vitess, and Spark for ETL in designing this new data-intensive platform.
DataHub is a platform as a service composed of many distributed systems, including Kafka, Vitess, Spark, a data lakehouse, and Kubernetes. Our goal is to ensure that high-quality content, which is indispensable to financial markets, is cataloged, standardized, discoverable, and accessible in one place. We are looking to hire for a variety of skill sets - backend engineers, Spark and Kafka experts, and MySQL/MyRocks specialists!
You'll work on one of the following:
Distributed datastores built on engines like MySQL/MyRocks and scaled with Vitess to petabytes of data, delivering millisecond responses on billions of reads and writes for the thousands of financial applications that use us as their foundational store.
Kafka-based data pipes for the company-wide data mesh architecture and data streaming platform. DataHub was the first team at Bloomberg to introduce Kafka as the foundation for data pipes and has been collaborating with the founders of Kafka since 2013. We'd love to learn from your experience building large-scale, real-time data streaming platforms used by thousands of engineers across the company.
Engineering a data lakehouse for large analytics workloads on financial datasets hosted on the DataHub platform. You will work with Apache Airflow, Parquet, Avro, and Spark, on query optimization and S3 storage optimization at the scale of trillions of IOPS.
Engineering, deploying, and managing Kubernetes clusters on-prem and on public clouds: building system control planes, writing Kubernetes operators, and designing for observability, monitoring, performance, and debuggability across all of the above DataHub distributed systems. You will help the DataHub platform rapidly move to a cloud-native, Kubernetes-based foundation.
You'll need to have:
4+ years of experience working with Java, Scala, or Go.
A degree in Computer Science, Engineering, Mathematics, or a similar field of study, or equivalent work experience.
Experience designing and building Software Infrastructure APIs.
Experience designing scalable systems using object stores, messaging, and databases.
Experience engineering systems for observability.
We'd love to see any one of:
Open source contributions to Kafka, Spark, Vitess, MySQL, RocksDB, or MyRocks.
Production experience with Vitess.
Experience with MySQL clustering protocols such as Galera Replication and/or Group Replication.
Experience debugging cluster managers such as YARN or Kubernetes for Spark.
Experience designing S3-based database storage engines and query optimizers for analytics.
Experience building custom resource controllers and Kubernetes operators.
About Us: Meet the DataHub Engineering Team https://www.techatbloomberg.com/blog/meet-the-team-datahub-engineering/
Bloomberg is an equal opportunity employer, and we value diversity at our company. We do not discriminate on the basis of age, ancestry, color, gender identity or expression, genetic predisposition or carrier status, marital status, national or ethnic origin, race, religion or belief, sex, sexual orientation, sexual and other reproductive health decisions, parental or caring status, physical or mental disability, pregnancy or maternity/parental leave, protected veteran status, status as a victim of domestic violence, or any other classification protected by applicable law.
Bloomberg is a disability inclusive employer. Please let us know if you require any reasonable adjustments to be made for the recruitment process. If you would prefer to discuss this confidentially, please email email@example.com.