The Stanford Research Computing Center (SRCC) is seeking outstanding applicants for the position of HPC System Engineer. Embedded with world-class researchers in the School of Earth, Energy and Environmental Sciences, you will join a dynamic and growing team of technology specialists supporting the computational and data needs of Stanford's research community. This position will specifically focus on managing and supporting HPC clusters.
The successful candidate will be someone who:
Has built, managed, secured and supported HPC clusters before and is comfortable handling all aspects of that work, from racking servers, to configuring networking, to installing software for end-users, to providing one-on-one instruction and support
Thrives when working in an academic environment
Is passionate about technology and is driven by challenge and intellectual curiosity
Is self-motivated to learn, sometimes on their own time
Has user support experience and actually likes working with end-users on a daily basis
Is extremely detail-oriented, documents well, and understands the importance of documentation
Isn't afraid of hardware
Understands the need to ensure the usability of systems from the end-users' perspectives
The SRCC is jointly sponsored by University IT (UIT) and the Office of the Dean of Research. The SRCC team of 18 cyberinfrastructure professionals offers research computing platforms, consultation, tool and software development, system engineering, and system administration in support of computational and data-intensive research across the Stanford campus.
This position will provide system administration, engineering and specialized technical consultation for existing and future systems and services for research computing workloads. The position will also specifically have responsibilities for managing high performance computing infrastructure in the School of Earth, Energy and Environmental Sciences and for providing technical consultation to researchers there. The work will include hands-on installation, management and support of complex compute environments, including filesystems and storage platforms, Linux server environments, containers, job schedulers, scientific tools, and application software.
Support and administration of research computing clusters, servers and storage systems, including installation, network and security configuration, monitoring, producing and maintaining system documentation for users, maintenance, application software build/configuration, upgrading, patching, and complex user problem solving. Those systems will be housed in Stanford data centers.
Provision computing platforms and associated storage and networking for research environments, incorporating novel technical solutions as needed to meet research requirements. Install, test and configure software tools, libraries and compilers to meet researchers' needs.
Customize environments as requested by research teams, with specific focus on the optimization of end-users' experiences
Provide advanced cyberinfrastructure training and consultation for faculty, postdocs and graduate students in the School of Earth, Energy and Environmental Sciences.
Ensure systems are configured and managed in accordance with Stanford policies and any regulatory requirements specific to data sources and classifications.
Conceive, design, develop, optimize, integrate, and maintain information technology at a complex level.
Troubleshoot highly complex problems for which the analysis and resolution require extensive knowledge of many diverse system components
Develop long range technology plans.
Provide leadership and IT solutions for complex problems
Bachelor's degree and eight years of related increasingly technical work experience or a combination of education and relevant experience. Strong, demonstrated knowledge of Linux and demonstrated experience managing multiuser compute clusters and associated storage environments are required as well.
Knowledge, Skills and Abilities
Advanced knowledge of Linux and of HPC cluster management and operation is required; experience managing, using, supporting and consulting on research computing cyberinfrastructure in an academic or research environment is strongly preferred. Proven ability to deliver outstanding system and service administration and end-user support in a thorough and timely manner is needed. This position requires that you be able to juggle multiple competing priorities, work quickly and accurately, and demonstrate initiative in conceptualizing and moving technical projects successfully to completion. The position must be able to perform independent analysis, troubleshooting and problem resolution, but must also work collaboratively with other team members and across organizational group boundaries. An essential component of the job is keeping up with and mastering current and emerging technologies that facilitate researchers' computing work and that streamline and automate system administration tasks; that requires a demonstrated passion for and curiosity about the breadth of HPC technologies and tools and about technology trends in general.
This position requires hands-on experience building and supporting multi-tenant Linux servers/clusters and their associated networks, file systems and storage devices in production research environments. Specifically, the technical knowledge needed to be successful in this position includes:
Expert demonstrated knowledge of Linux and managing Linux-based environments, including securing systems, and day-to-day troubleshooting, monitoring, support, software packaging, and working within industry-wide best practices
Experience administering, configuring, and supporting systems with accelerators, shared file systems, and large-scale storage platforms. This includes hardware installation, configuration, upgrades and repairs
Knowledge of and experience utilizing data and system security techniques, practices and standards as they relate to multi-user systems, storage and networks
Hands-on experience installing, configuring and supporting job schedulers and resource managers (e.g., SLURM, OGE, LSF, Torque, Maui) is desirable.
Familiarity with deploying virtualization technologies and basic knowledge of container technologies
Exceptional written and verbal communication skills
Experience using shell scripts (bash), programming languages (Python), and programming automated system management tools (e.g., Puppet)
Familiarity with TCP/IP, Internet Routing Protocols, private and public networks, VLANs, Firewalls, Load Balancers, addressing schemes, subnet creation and subnet masking. Proven ability to troubleshoot basic network issues and communicate and work with a team of network engineers to solve possible network design issues
Familiarity with the intersection of storage and networking disciplines: transport media, speeds of media, storage networks, IP based storage delivery, other storage delivery technologies
Experience with some of the following applications: Git, Apache, Kerberos, LDAP
Software installation and maintenance experience supporting research codes and clients
Exceptional client service and communication, focusing on proactive system administrator actions and interactions to reduce or remove barriers to clients' efficient use of resources to advance research
This position requires the ability to lift and manipulate storage and compute servers up to 40 pounds, rack and unrack equipment, and occasionally climb ladders. The position will support equipment in off-campus locations, so having a valid driver's license is necessary. The position is expected to respond to critical system problems off-hours and must also be available for routine on-site system maintenance and patching, typically scheduled for evenings and weekends so as to minimize the disruption of research work. The position is expected to rotate on-call duties during winter break and other closures.
Interpersonal Skills: Demonstrates the ability to work well with Stanford colleagues and clients and with external organizations.
Promote Culture of Safety: Demonstrates commitment to personal responsibility and value for safety; communicates safety concerns; uses and promotes safe behaviors based on training and lessons learned.
Subject to and expected to comply with all applicable University policies and procedures, including but not limited to the personnel policies and other policies found in the University's Administrative Guide, http://adminguide.stanford.edu/.
Why Stanford is for You:
Stanford University has revolutionized the way we live and enrich the world. Supporting this mission is our diverse and dedicated 17,000 staff. We seek talent driven to impact the future of our legacy. Our culture and unique perks empower you with:
Freedom to grow. We offer career development programs, tuition reimbursement, and the option to audit a course. Attend a TED Talk or film screening, or listen to a renowned author or global leader speak.
A caring culture. We provide superb retirement plans, generous time-off, and family care resources.
A healthier you. Climb our rock wall, or choose from hundreds of health or fitness classes at our world-class exercise facilities. We also provide excellent health care benefits.
Discovery and fun. Stroll through historic sculptures, trails, and museums.
Enviable resources. Enjoy free commuter programs, ridesharing incentives, discounts and more.
Stanford is an equal employment opportunity and affirmative action employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, protected veteran status, or any other characteristic protected by law. Stanford welcomes applications from all who would bring additional dimensions to the University's research, teaching and clinical missions.
Consistent with its obligations under the law, the University will provide reasonable accommodation to any employee with a disability who requires accommodation to perform the essential functions of the job.
Location: Business Affairs: University IT (UIT), Stanford, California, United States
Copyright 2017 Jobelephant.com Inc. All rights reserved.
Located between San Francisco and San Jose in the heart of Silicon Valley, Stanford University is recognized as one of the world's leading research and teaching institutions. Leland and Jane Stanford founded the University to "promote the public welfare by exercising an influence on behalf of humanity and civilization." Stanford opened its doors in 1891, and more than a century later, it remains dedicated to finding solutions to the great challenges of the day and to preparing students for leadership in a complex world. The University's thriving diverse community is comprised of nearly 7000 undergraduate students, 9000 graduate students, 2000 faculty members, 1900 postdoctoral scholars, and over 11,000 academic and administrative staff in seven schools including several interdisciplinary research centers and institutes. The campus spreads over 8000 contiguous acres and nearly all undergraduates live on campus. Stanford offers bachelor's and master's degrees in addition to doctoral degrees (PhD, MD, DMA and JD) plus a number of professional and continuing education programs and certifications. More at http://facts.stanford.edu and http://www.stanford.edu. Stanford University is an ...equal opportunity employer and is committed to increasing the diversity of its faculty. It welcomes nominations of and applications from women, members of minority groups, protected veterans and individuals with disabilities, as well as from others who would bring additional dimensions to the university’s research, teaching and clinical missions.