The Scientific Computing Group (SCG) in the Information Technology Division at Lawrence Berkeley National Laboratory (LBNL) is looking for a versatile Linux systems administrator / DevOps Engineer / Site Reliability Engineer to provide computing support to the Berkeley Lab research community. We manage the Lab's High Performance Computing infrastructure and provide state-of-the-art Linux solutions in support of the science at Berkeley Lab. We help enable some of the most advanced fundamental research in the world by providing the computing tools, networks, and expertise that pioneering science depends on.
Under the supervision of the Group Lead or senior team members, the successful candidate will participate in building, integrating, and supporting Linux-based resources and end-users to meet the computing needs of various scientific disciplines. In addition, this position will revamp and automate complex sysadmin processes to make them more robust. Depending on the individual's experience, aptitude, and skill set, this person may also support large high performance computing cluster systems. The successful candidate should exhibit a passion for learning, the ability to integrate new computing technologies, the ability to comprehensively re-engineer sysadmin processes, and a deep desire to support scientific research.
What You Will Do:
Within defined policies, procedures, and practices, provide Linux systems administration and user support for LBNL scientific research groups. This includes:
Linux system and HPC cluster maintenance and installations, operating system upgrades, system security hardening and intrusion detection, storage and file system management, system hardware and peripheral management, customization of user group working environment, troubleshooting, network monitoring, and crash recovery.
Design and implement build, deployment, and configuration management; build and test automation tools for infrastructure provisioning; handle code deployments; monitor metrics and develop ways to improve them; build and manage CI/CD tools.
Assist users with program compilation, commercial and public domain software installation, and use of Linux tools.
Configure, administer, and troubleshoot desktop, server and storage infrastructures as well as racking, installing, and maintaining systems in a datacenter.
Plan, organize, prioritize and complete assigned tasks and projects in a timely manner.
Frequently and clearly communicate task or project status to customers to either set or negotiate expectations.
Market IT Division services to the scientific community by providing excellent customer service coupled with competent technical support skills.
Participate in developing system administration, security, and network policies, documentation, and tools oriented towards efficient systems management.
In addition to the above, the Level 3 Engineer will:
Provide cluster support to LBNL and UC researchers. This includes travel to remote sites if necessary, initial installation, integration, and the ongoing maintenance of Linux High Performance Computing cluster systems.
Lead technical efforts in one or more areas of HPC technologies such as job schedulers, high performance interconnects, parallel file systems, cybersecurity, cluster management, VM infrastructure, networking, performance tuning, support of scientific applications, or data center planning.
Lead group projects of small to medium size and complexity to implement and deploy new computing technologies and associated services for the research community.
What is Required:
Bachelor's degree and a minimum of 5 years of related experience or an equivalent combination of education and work experience.
Linux system administration experience in a large distributed computing environment. Experience providing systems and end-user support for multiple scientific or computational research groups.
Experience with Red Hat Enterprise Linux (including derivatives such as CentOS and Scientific Linux), Debian, and Ubuntu, and with large-scale system administration and configuration management tools such as Kickstart, Ansible, Puppet, Chef, CFEngine, or in-house systems management tools. Experience supporting common services such as NFS, LDAP, CIFS, MySQL, and Apache httpd or Nginx web servers.
Moderate knowledge of Linux internals, TCP/IP networking, software programming, and cybersecurity concepts. Must demonstrate technical understanding of Linux internals including the boot process, kernel versions, and the differences between major Linux distributions. Experience with building, patching, and modifying Linux RPMs is required. Able to quickly troubleshoot computer and storage hardware problems such as RAID devices, and be familiar with procedures to expedite or coordinate vendor service and bring resolution to outstanding problems.
Must be able to demonstrate programming proficiency in Python and Bash. Must understand how to build, optimize, and debug scientific codes written in C, C++, Fortran, and Java. Must have experience with popular compilers (e.g., GCC, Intel), program debugging tools, Makefiles, and version-control systems such as Git and Subversion.
Experience implementing solutions based on virtual machine (VM) technologies such as KVM, VMware, and OpenStack, as well as container technologies such as Docker and Singularity.
Excellent interpersonal, communication, and customer service skills, exercised with tact and good judgment. Must be able to work with multiple end-user groups, each of which may have different needs and requirements. Able to plan, organize, prioritize, and complete assigned tasks and projects with general supervision while providing timely updates on work progress to end-users and co-workers.
Climb stairs, ladders, and scaffolds; work at heights on above-rack cabling; work in confined spaces and under fluorescent lights; bend, stoop, kneel, and crawl; manual dexterity in both hands; able to lift 60 lbs. to chest height; distinguish colors.
In addition to the above, the Level 3 Engineer requires the following qualifications:
Typically requires a minimum of 8 years of related experience with a Bachelor's degree, 6 years with a Master's degree, or equivalent experience in a large distributed computing environment, including 2 years of experience providing support for Linux HPC clusters used for scientific research.
In-depth expertise in two or more areas of HPC technologies such as Linux operating systems, job schedulers, high performance interconnects, parallel file systems, cybersecurity, cluster management, VM infrastructure, networking, performance tuning, support of scientific applications, or data center planning.
Ability to plan, organize and successfully implement group projects for deploying new technologies and services.
Ability to work on complex issues where analysis of situations or data requires an in-depth evaluation of variable factors.
Experience supporting HPC systems and end-users. Expertise in HPC Linux clustering technology (job schedulers, MPI, InfiniBand, parallel file systems, parallel programming).
Software engineering or development experience.
Previous experience supporting research at a National Lab or academic institution.
This is a full-time career appointment, exempt (monthly paid) from overtime pay.
This position will be hired at a level commensurate with the business needs, skills, knowledge, and abilities of the successful candidate.
This position may be subject to a background check. Any convictions will be evaluated to determine if they directly relate to the responsibilities and requirements of the position. Having a conviction history will not automatically disqualify an applicant from being considered for employment.
Work will be performed primarily at Lawrence Berkeley National Lab, 1 Cyclotron Road, Berkeley, CA, though this may be modified to meet COVID-19 restrictions on onsite work. Some early-morning, evening, and weekend work will be required to support critical systems, sometimes on very short notice.
Equal Employment Opportunity: Berkeley Lab is an Equal Opportunity/Affirmative Action Employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, age, or protected veteran status. Berkeley Lab is in compliance with the Pay Transparency Nondiscrimination Provision under 41 CFR 60-1.4. See the poster and supplement: "Equal Employment Opportunity is the Law."
Internal Number: 92363
About Lawrence Berkeley National Laboratory
In the world of science, Lawrence Berkeley National Laboratory (Berkeley Lab) is synonymous with excellence. Thirteen scientists associated with Berkeley Lab have won the Nobel Prize. Fifty-seven Lab scientists are members of the National Academy of Sciences (NAS), one of the highest honors for a scientist in the United States. Thirteen of our scientists have won the National Medal of Science, our nation's highest award for lifetime achievement in fields of scientific research. Eighteen of our engineers have been elected to the National Academy of Engineering, and three of our scientists have been elected to the Institute of Medicine. In addition, Berkeley Lab has trained thousands of university science and engineering students who are advancing technological innovations across the nation and around the world. Berkeley Lab is a member of the national laboratory system supported by the U.S. Department of Energy through its Office of Science. It is managed by the University of California (UC) and is charged with conducting unclassified research across a wide range of scientific disciplines. Located on a 200-acre site in the hills above the UC Berkeley campus that offers spectacular views of the San Francisco Bay, Berkeley Lab employs approximately 4,200 scientists, engineers, support staff, and students. Its budget for 2011 was $735 million, with an additional $101 million in funding from the American Recovery and Reinvestment Act, for a total of $836 million. A recent study estimates the Laboratory's overall economic impact through direct, indirect, and induced spending on the nine counties that make up the San Francisco Bay Area to be nearly $700 million annually. The Lab was also responsible for creating 5,600 jobs locally and 12,000 nationally. The overall economic impact on the national economy is estimated at $1.6 billion a year. Technologies developed at Berkeley Lab have generated billions of dollars in revenues and thousands of jobs.
Savings as a result of Berkeley Lab developments in lighting and windows, and other energy-efficient technologies, have also been in the billions of dollars. Berkeley Lab was founded in 1931 by Ernest Orlando Lawrence, a UC Berkeley physicist who won the 1939 Nobel Prize in physics for his invention of the cyclotron, a circular particle accelerator that opened the door to high-energy physics. It was Lawrence's belief that scientific research is best done through teams of individuals with different fields of expertise, working together. His teamwork concept is a Berkeley Lab legacy that continues today.