The Maryland Advanced Research Computing Center (MARCC) is a state-of-the-art High Performance Computing (HPC) facility that provides resources (HPC, storage, and analytics) for researchers at Johns Hopkins University, the University of Maryland at College Park and, eventually, all other schools in the state of Maryland. The Sr. Software Engineer (Research Facilitator) provides direct support to all users on the effective utilization of resources, code development, debugging, optimization, and the installation and maintenance of open source scientific applications. The incumbent will enable faculty to advance research-computing agendas by providing direct technical support and training modules on HPC topics to the research community. There will also be many opportunities to establish scientific collaborations and partnerships with research groups.
Describe the position’s roles & interactions: What is the position’s role in interacting with clients? Describe the types of clients supported. What is the position’s role in system/application maintenance and/or development, i.e. which part of system/project is position responsible for, what function does position perform? Participate? Collaborate? Lead? What degree of supervision does the position receive, i.e. how work is assigned, carried out, and reviewed? At what level of independence does position function?
The Sr. Software Engineer will interact directly with the user community to support their research computing activities. There will be many opportunities to teach, train, and educate users in traditional HPC topics and, perhaps more importantly, in new HPC techniques and methodologies. These topics may include programming, parallel and GPU programming, data-intensive applications, scripting, and best practices in advanced computing. The Sr. Software Engineer will also interact with the systems group to assist them in troubleshooting and fixing potential problems. The Sr. Software Engineer is expected to work independently; the director supervises projects, reviews progress, and may assign projects or tasks as necessary.
Describe the specific systems, applications, projects for which the position is responsible: Describe the type of system/application and its functionality. How does it work? Who uses it? What is the systems/applications impact on an employee’s/client’s work? What makes the system/application complex?
MARCC houses a set of HPC clusters used by a wide variety of researchers. Their HPC skills are very diverse and many need extensive support.
Computational clusters. A set of computers with a high speed interconnect that will run mostly parallel jobs. The clusters are used by computational scientists that need to generate and analyze large amounts of data.
High Performance storage. Managed storage mounted directly to the compute nodes to allow intensive Input/Output (I/O) for all applications. Used by most of the computational scientists.
Unmanaged storage. A set of storage enclosures dedicated for big data projects.
Middleware management. A set of packages that need to be installed so the interaction of users with the cluster is controlled and more effective.
Accounting and statistics. Create and manage authentication to the system.
Describe scale/size of area, project and/or system supported(# of users, # of servers, # of machines, # of systems supported, transaction volume, # of schools/areas that use system, # of environments, geography, # of interfaces/integration with other systems, etc.): Explain how these factors contribute to the complexity of the job.
MARCC provides services to over 1000 active researchers (approximately 450 research groups) from these schools: KSAS, SOM, WSE, PH, and UMCP. Currently, we have two HPC clusters: Bluecrab, with over 23,000 cores, and the new Rockfish cluster, with over 18,000 cores and expected to double in size within one year. Most of these researchers use different applications and conduct different types of computing. We expect a multitude of large parallel jobs (256 cores per job), high-throughput computing jobs that use job arrays (thousands of serial jobs, such as parameter sweeps), serial and parallel jobs that require large memory, and several users with applications that use GPUs. The system is located at the Bayview facility and will interact with other HPC systems at JHU and at UMCP. Although this is a shared facility, the diverse set of applications and users is a significant factor in the complexity of the system, along with the different applications that need to be synchronized for seamless functioning of the facility.
List required & preferred skills specific to position:
Proficient in scientific programming languages, C, C++, or Fortran
Experience in parallel programming, MPI and/or OpenMP
In-depth knowledge of the design and organization of cutting-edge technology in HPC environments.
Advanced knowledge of Linux, PHP/Python/Perl technology/toolkits.
In-depth understanding of HPC Cluster management software.
Understanding of massive high performance parallel storage and methodologies.
Understand, implement, troubleshoot, and support batch and workload management systems, including diagnosis of failed jobs, implementation of policies, and investigations of new features and services.
Experience installing and configuring infrastructure applications by following industry best practices to deliver effective solutions.
Experience designing, developing, debugging and optimizing scientific applications
Proficiency in scientific applications such as MATLAB, R, and others per discipline.
In-depth understanding of data management best practices
Understanding of data architecture
Must have the ability to multi-task and prioritize.
Must be adaptable and able to meet conflicting deadlines.
Exceptional organizational skills.
The ability to interact effectively with peer institutions to support HPC directives, furthering the goals of the MARCC facility.
Excellent oral, written, and interpersonal skills for customer service, training, evangelism of new technologies, negotiation, and persuasion.
Produce effective and thorough technical documentation.
Provide outstanding direct and indirect user support.
Research, recommend, and implement new technologies based on the value to the research facility.
GPU and CUDA programming desired but not required
Familiarity with visualization packages such as VisIt and ParaView desired but not required
Experience building containers to facilitate workflows and software pipelines desired but not required
Experience with SLURM desired but not required.
On call requirements (if applicable): yes
Preferred Qualifications:
Minimum 5 years’ experience providing user support in an HPC environment
Responsible for the creation, implementation, maintenance, performance, production support and documentation of various departmental and enterprise-wide application systems. This includes but is not limited to the installation, modification, and testing of new and/or upgraded applications (packages or home grown), operating systems, file structures, hardware, communication devices, and productivity tools. Applies analysis techniques and procedures to gather and then translate business requirements into functional/technical specifications and designs. Using functional specifications and designs, produces all or part of the deliverables. Maintains databases and application system code.
Responsible for full life-cycle of large/long-term highly complex projects. Typically manages multiple projects of varying complexities. Based on expert technical knowledge, skills and experience, develops broad-based solutions involving multifaceted technologies, and business processes. Leads overall strategy, design & architecture for solutions.
Specific duties & responsibilities:
The responsibilities listed below are typical examples of the work performed by this position. Not all duties assigned to this position are included, nor is it expected that everyone in this position will be assigned every job responsibility.
ANALYSIS AND REQUIREMENTS GATHERING
Define highly complex business/clinical/education problems by meeting with clients to observe and understand current processes and the issues related to those processes. Provide written documentation of findings to share with the client and other IT colleagues.
Gather highly complex system requirements by meeting with clients and researching existing technology to understand the business requirements and possible solutions for new applications.
DESIGN AND DEVELOPMENT
Develop detailed tasks and project plans by analyzing project scope and milestones for highly complex projects in order to ensure product is delivered in a timely fashion according to software lifecycle standards. Direct lower level staff by reviewing tasks and milestones for adherence to quality of deliverables.
Write functional/technical specifications from the highly complex system requirements, putting them into functional and technical descriptions for use by programmers and business analysts to develop technical solutions. Direct lower level staff by reviewing their completed work.
Develop/change data input, files/database structures, data transformation, algorithms, and data output by using appropriate computer language/tools to provide technical solutions for highly complex application development tasks. Direct lower level staff by reviewing their work.
Document code and associated processes by adhering to development methodologies, adding code comments and appropriate documentation to various knowledge-base system(s) to simplify code maintenance and to improve support. Direct lower level staff by reviewing their work.
Provide monitoring and guidance in application design and development to more junior staff. Give direction and leadership in techniques and tools to lower level staff.
Provide experienced leadership for strategic planning in designing and developing comprehensive innovative integrated solutions.
TESTING AND DOCUMENTATION
Create and document highly complex test scenarios using the appropriate testing tools to validate and verify application functionality.
Test all changes by using the appropriate highly complex test scenarios to ensure all delivered solutions work as expected and errors are handled in a meaningful way.
Author and maintain documentation by writing audience-appropriate materials to serve as technical and/or end-user references.
Mentor junior staff in testing tools and technologies by reviewing their work.
IMPLEMENTATION AND MAINTENANCE
Implement changes by adhering to the change management policies and procedures for any given project to communicate to all parties the nature, significance, and risk factors of the solution.
Monitor changes and resolve highly complex problems requiring the highest level of technical expertise by responding as they occur, by reviewing all processing and output of the newly implemented solution, and by proactively ensuring the solution works successfully in order to satisfy the customer requirements and to provide a smooth transition to the new solution.
Provide support by investigating and resolving highly complex issues to ensure prompt, effective service.
Minimum qualifications (mandatory):
Bachelor’s degree required.
Six years of related work experience with computer systems and applications.
Additional experience may be substituted for education. Additional education may substitute for experience.
Preferred Job Qualifications:
Knowledge in the assigned application as well as the platform on which it runs.
Special knowledge, skills, and abilities:
Must possess all requisite knowledge, skills, and abilities as posted in the supplemental section.
Must demonstrate strong critical thinking and analytical reasoning skills.
Ability to work on multiple priorities effectively.
Ability to prioritize conflicting demands.
Ability to execute assigned project tasks within established schedule.
Ability to work collaboratively in a team environment.
Ability to communicate effectively in the service of users and colleagues.
Writes and communicates clearly and concisely.
Possesses sound documentation skills.
Ability to maintain confidentiality.
Must demonstrate exemplary customer service skills
The successful candidate(s) for this position will be subject to a pre-employment background check.
If you are interested in applying for employment with The Johns Hopkins University and require special assistance or accommodation during any part of the pre-employment process, please contact the HR Business Services Office at email@example.com. For TTY users, call via Maryland Relay or dial 711.
The following additional provisions may apply depending on the campus at which you will work. Your recruiter will advise accordingly.
During the Influenza ("the flu") season, as a condition of employment, The Johns Hopkins Institutions require all employees who provide ongoing services to patients or work in patient care or clinical care areas to have an annual influenza vaccination or possess an approved medical or religious exception. Failure to meet this requirement may result in termination of employment.
The pre-employment physical for positions in clinical areas, laboratories, working with research subjects, or involving community contact requires documentation of immune status against Rubella (German measles), Rubeola (Measles), Mumps, Varicella (chickenpox), Hepatitis B and documentation of having received the Tdap (Tetanus, diphtheria, pertussis) vaccination. This may include documentation of having two (2) MMR vaccines; two (2) Varicella vaccines; or antibody status to these diseases from laboratory testing. Blood tests for immunities to these diseases are ordinarily included in the pre-employment physical exam except for those employees who provide results of blood tests or immunization documentation from their own health care providers. Any vaccinations required for these diseases will be given at no cost in our Occupational Health office.
Equal Opportunity Employer
Note: Job Postings are updated daily and remain online until filled.
Johns Hopkins University remains committed to its founding principle, that education for all students should be grounded in exploration and discovery. Hopkins students are challenged not just to learn but also to advance learning itself. Critical thinking, problem solving, creativity, and entrepreneurship are all encouraged and nourished in this unique educational environment. After more than 130 years, Johns Hopkins remains a world leader in both teaching and research. Faculty members and their research colleagues at the university's Applied Physics Laboratory have each year since 1979 won Johns Hopkins more federal research and development funding than any other university. The university has nine academic divisions and campuses throughout the Baltimore-Washington area. The Krieger School of Arts and Sciences, the Whiting School of Engineering, the School of Education and the Carey Business School are based at the Homewood campus in northern Baltimore. The schools of Medicine, Public Health, and Nursing share a campus in east Baltimore with The Johns Hopkins Hospital. The Peabody Institute, a leading professional school of music, is located on Mount Vernon Place in downtown Baltimore. The Paul H. Nitze School of Advanced International Studies is located in Washington's Dupont Circle area.