Altair Grid Engine Advanced Training 2021
Detailed aspects of functionality and commands
Live Interactive Online Training
Starts Tuesday, November 16, 2021, 10:00 CET
4 Hours per day, Tuesday to Thursday for two weeks
€ 2,500 per person
This course is designed to extend the system administrator's and end user's knowledge of Grid Engine by covering the detailed aspects of its functionality and commands. The course empowers administrators to translate business goals into a Grid Engine configuration and enables advanced end users to create workflows to efficiently use the workload management system.
The class provides valuable experience with gathering site-defined shared resources (such as licenses), configuring job submission and execution environments for classic and containerized workloads (e.g. Docker), managing GPUs, setting up dynamic cluster configurations, and more.
Hands-on exercises and practical troubleshooting tips are integrated into the course.
Who should attend the course?
This advanced course is designed for system administrators and advanced end users who are responsible for extending the role of Grid Engine in site-defined cluster resource management and who need to implement job and resource controls.
The course content is applicable to all versions of Grid Engine.
Prerequisites
- Basic knowledge of the Linux or Unix operating system
- Basic knowledge of a Unix shell (such as bash, csh, or ksh) and the vi editor
- Basic knowledge of system administration concepts and parallel programming models (shared memory/distributed memory)
- Practical Grid Engine (or similar) administration skills or advanced Grid Engine user experience are advantageous but not required
What customers are saying
Training brought us many new insights... Just a few weeks later, when we experienced a small issue, we were able to solve it instantly.
- Ralf Nolte, Systems Administrator, CeBiTec, Bielefeld University
The benefits of training are significant, especially in managing the risk the business is exposed to.
- Mike Twelves, Supply Chain Solutions, Tata Steel
Course outline
- Concepts Review
  - Grid Engine concepts and components
- Advanced Configurations
  - Global configuration
  - Host configuration
  - Queue configuration
  - Load sensors and resources
- Job Types and Environments
  - Parallel jobs and environments
    - Multi-threaded, MPI, etc.
    - Loose vs. tight parallel job integration
  - Array jobs
  - Interactive jobs
- Diagnostics and Performance Tuning
  - Debugging and failure diagnosis
  - Tuning for high throughput
  - Data spooling and implications
- Scheduler Configuration
  - Scheduling policies (Entitlement, Urgency, and Priority policies)
  - Resource Reservation (RR) and backfilling
  - Introduction to Advance Reservations (ARs) and Standing Reservations (SRs)
  - Resource Quota Sets (RQS) for flexible execution limits
- Managing Different Types of Workloads
  - Job Classes (JCs) for encapsulating complex job submissions
  - Managing Job Submission Verifiers (JSVs)
  - Core/memory binding, Linux cgroups
  - Managing GPUs and integration with NVIDIA Data Center GPU Manager
- Using Docker with Altair Grid Engine
  - Submitting Docker jobs and requesting Docker run options
- Questions and Answers