RDMA Engineer - Supercomputing

xAI

Full-time

Posted: Dec 29, 2025

Job Description

About xAI

xAI’s mission is to create AI systems that can accurately understand the universe and aid humanity in its pursuit of knowledge. Our team is small, highly motivated, and focused on engineering excellence. This organization is for individuals who appreciate challenging themselves and thrive on curiosity. We operate with a flat organizational structure. All employees are expected to be hands-on and to contribute directly to the company’s mission. Leadership is given to those who show initiative and consistently deliver excellence. Work ethic and strong prioritization skills are important. All engineers are expected to have strong communication skills. They should be able to concisely and accurately share knowledge with their teammates.

About the Role

RDMA Engineers on xAI’s Supercomputing team design and optimize low-latency, high-bandwidth networking solutions using NVIDIA’s RDMA-capable technologies to support some of the world’s largest GPU supercomputing clusters. These clusters drive AI training and inference workloads, demanding cutting-edge performance and scalability.

Focus

  • Develop and tune RDMA-based communication systems leveraging NVIDIA GPUs and Mellanox NICs (InfiniBand, RoCE) for ultra-fast data transfer between nodes.
  • Implement and optimize GPUDirect RDMA to enable direct memory access between GPUs and network interfaces, minimizing CPU overhead.
  • Integrate RDMA solutions with Kubernetes-based workloads, ensuring seamless operation across distributed compute and storage systems.
  • Collaborate with AI researchers and infrastructure teams to accelerate data pipelines and collective communications using NCCL and MPI.
  • Troubleshoot and resolve performance bottlenecks in high-throughput, low-latency networking environments.

Ideal Experience

  • Hands-on experience with NVIDIA RDMA technologies (e.g., GPUDirect RDMA, RoCE, InfiniBand) in HPC or AI supercomputing environments.
  • Proficiency in programming with Rust, C, or C++ for low-level networking and system optimization.
  • Familiarity with NVIDIA’s networking stack, including Mellanox drivers, libraries (e.g., libibverbs), and tools (e.g., NVPeerMemory).
  • Experience optimizing distributed systems with MPI, NCCL, or similar frameworks for GPU-accelerated workloads.
  • Knowledge of Kubernetes networking and integrating RDMA into containerized environments.
  • Bonus: Background in AI/ML training workflows and their networking demands (e.g., large-scale parameter synchronization).

Tech Stack

  • NVIDIA GPUs and Mellanox networking (InfiniBand, RoCE)
  • RDMA protocols (e.g., GPUDirect RDMA, RoCEv2)
  • Kubernetes
  • Rust and C/C++
  • MPI (Message Passing Interface) and NCCL (NVIDIA Collective Communications Library)

Annual Salary Range

$180,000 - $440,000 USD

Benefits

Base salary is just one part of our total rewards package at xAI, which also includes equity, comprehensive medical, vision, and dental coverage, access to a 401(k) retirement plan, short- and long-term disability insurance, life insurance, and various other discounts and perks.

xAI is an equal opportunity employer. For details on data processing, view our Recruitment Privacy Notice.

Locations

  • Palo Alto, CA
  • San Francisco, CA

Skills Required

  • NVIDIA RDMA technologies (GPUDirect RDMA, RoCE, InfiniBand): intermediate
  • Rust, C, C++: intermediate
  • NVIDIA networking stack (Mellanox drivers, libibverbs, NVPeerMemory): intermediate
  • MPI, NCCL: intermediate
  • Kubernetes networking: intermediate
  • HPC or AI supercomputing: intermediate
  • Distributed systems optimization: intermediate
  • Low-level networking and system optimization: intermediate
