The Senior IT Systems Engineer will oversee a range of mission-critical Linux servers, services, and multi-tiered storage platforms at a high-performance computing facility for Columbia's Institute for Genomic Medicine (IGM) and Precision Genomic Laboratory. The incumbent will serve as the primary system administrator for a large Sun Grid Engine cluster (hosted), a virtualization and container environment, and a growing AWS cloud presence. The position will report directly to the Director of IT in the IGM but will work closely with the Bioinformatics and Software Engineering teams.
Responsibilities
Maintain and extend a pervasive automation solution capable of managing the complete lifecycle of all IGM Linux systems (Ansible/Puppet).
Oversee mission-critical virtualization and container environments, ensuring system availability, recovery, performance tuning, and information security.
Administer, validate, and review user and system accounts, access controls, audit logs, and system integrity to maximize system security and ensure data confidentiality.
Provide technical support for, and monitor, all IGM systems, including in-house and open-source tools.
Design, deploy, and maintain the IGM HPC environment (SGE/OGE) and cloud services (AWS).
Research, deploy, and optimize resource management and scheduling software and policies. Develop and implement storage policies.
Serve as the primary contact in an on-call rotation covering mission-critical functions.
Minimum Qualifications
Bachelor's degree or equivalent in education and experience, plus three years of related experience.
Preferred Qualifications
Bachelor's degree in Computer Science/Information Systems.
Experience working in a research or related environment with both on-premises and cloud-based Linux systems.
5+ years of experience managing Linux servers (Red Hat/CentOS) and services. Solid understanding of high-availability systems, schedulers, and performance tuning.
Proven experience in Python and Shell scripting. Familiarity with code versioning systems.
Extensive experience designing and maintaining Linux virtual environments (RHV/oVirt/KVM), containers, and system automation.
Thorough understanding of network design and configuration, with the ability to troubleshoot and solve network bottlenecks.
Experience working in a diverse data center environment consisting of an HPC cluster and high-volume storage appliances (NFS/CIFS). Experience managing NetApp, Quantum ActiveScale, or Avere appliances is a plus.
Extensive experience with Identity Management solutions governing users, hosts, services, authentication procedures, and authorization (FreeIPA, LDAP, Kerberos).
Database management (MySQL) and AWS VPC management experience.
Familiarity with ticketing systems (ServiceNow, Redmine), project management, and data protection policies.
Proven ability to read, understand, and apply technical documentation, and to learn new technologies quickly.
Ability to communicate effectively with team members and customers, both verbally and through documentation.
Equal Opportunity Employer / Disability / Veteran
Columbia University is committed to the hiring of qualified local residents.
Columbia University is one of the world's most important centers of research and at the same time a distinctive and distinguished learning environment for undergraduates and graduate students in many scholarly and professional fields. The University recognizes the importance of its location in New York City and seeks to link its research and teaching to the vast resources of a great metropolis. It seeks to attract a diverse and international faculty and student body, to support research and teaching on global issues, and to create academic relationships with many countries and regions. It expects all areas of the university to advance knowledge and learning at the highest level and to convey the products of its efforts to the world.