Infrastructure Engineering - Traffic

xAI

Full-time | Posted: Dec 29, 2025

Job Description

About xAI

xAI’s mission is to create AI systems that can accurately understand the universe and aid humanity in its pursuit of knowledge. Our team is small, highly motivated, and focused on engineering excellence. This organization is for individuals who appreciate challenging themselves and thrive on curiosity. We operate with a flat organizational structure. All employees are expected to be hands-on and to contribute directly to the company’s mission. Leadership is given to those who show initiative and consistently deliver excellence. Work ethic and strong prioritization skills are important. All engineers are expected to have strong communication skills. They should be able to concisely and accurately share knowledge with their teammates.

About the Role

In this role, you will be a key contributor to xAI’s Supercomputing team, focusing on building and optimizing scalable, high-performance traffic platforms that power our production inference engines. You will work on critical systems that manage traffic flow, service discovery, and network reliability across both on-premise and cloud-based Kubernetes clusters. Collaborating closely with Network Fabric Engineers and other technical teams, you will drive projects that enhance the stability and efficiency of our AI infrastructure, including support for large-scale training runs for advanced models like Grok 4 and beyond. This role demands deep technical expertise in Kubernetes, L4/L7 proxies like Envoy, and service discovery systems, along with a proactive approach to debugging and optimizing complex network performance issues from L3 to L7.

What you’ll do

  • Build and optimize traffic platforms that automate and simplify the lifecycle of production inference engines across dozens of on-premise and cloud clusters, managing core traffic primitives such as load balancing, routing, overload control, authentication / authorization, and encryption in transit.
  • Manage, extend, and optimize xAI’s production inference capabilities with L4/L7 proxies such as Envoy and NGINX.
  • Manage and extend xAI’s Service Discovery systems, both inside and outside of Kubernetes (DNS, xDS control planes).
  • Collaborate with Network Fabric Engineers to improve host networking and fabric stability for large-scale training runs (e.g., Grok 4 and beyond).
  • Work with a fast, small technical team to execute projects in the critical path of xAI.

What we’d like to see

  • 2+ years of experience operating Kubernetes clusters, or experience writing and deploying controllers.
  • 2+ years of experience configuring and deploying Envoy, NGINX, HAProxy, or another L7 software load balancer.
  • 1+ years of experience deploying and configuring Kubernetes CNI plugins (Calico, Cilium, Flannel) or experience with IPAM.
  • 1+ years of experience with DNS systems (e.g., CoreDNS, Unbound) or service discovery control planes (xDS).
  • 1+ years of experience with cloud networking primitives (VPC Route Tables, Cloud NAT, Peering / Transit Gateways, CDN, Cloudflare Workers or equivalent).
  • Experience with host-level network proxies (iptables, nftables, IPVS, eBPF programs) is a plus.
  • Deep experience with gRPC client libraries (grpcio / grpc-go / grpc-java) is a plus.
  • Experience with service mesh (Istio, Linkerd) is a plus.
  • Demonstrated experience working with Kubernetes and Envoy internals: can you tell us how k8s cached clients work? Can you tell us how Envoy scales and manages state?
  • Demonstrated experience debugging performance and reliability issues that span from L3 to L7 (e.g., how would a gRPC client in a cloud environment call a gRPC server in an on-prem environment? Describe the entire network path and any issues to watch out for, including Service Discovery / DNS, gRPC channel management, egress proxies, VPC routing, peering/PNI, edge caching / CDN, L4 load-balancing devices, host networking and virtualization, k8s networking, L7 routing, TLS / authnz, and TCP/IP).

Location

This role is based in the Bay Area (San Francisco and Palo Alto). Candidates are expected to be located near the Bay Area or open to relocation.

Tech Stack

  • Kubernetes
  • Envoy / xDS
  • Golang and Rust

Interview Process

  1. Application Review: Submit your CV and a statement of exceptional work. Our team will review your application to assess fit.

  2. Phone Interview (45 minutes): A brief conversation with a team member to discuss your background, key accomplishments, and motivation.

  3. Main Interview Process:

     • 2 Coding Assessments: Solve problems in a language of your choice.

     • Systems Hands-On: Demonstrate practical skills in a live problem-solving session.

     • Project Deep-Dive: Present your past exceptional work to a small audience.

Annual Salary Range

$180,000 - $440,000 USD

Benefits

Base salary is just one part of our total rewards package at xAI, which also includes equity, comprehensive medical, vision, and dental coverage, access to a 401(k) retirement plan, short & long-term disability insurance, life insurance, and various other discounts and perks.

xAI is an equal opportunity employer. For details on data processing, view our Recruitment Privacy Notice.

Locations

  • Palo Alto, CA
  • San Francisco, CA

Skills Required

  • Kubernetes (intermediate)
  • L4/L7 proxies (Envoy, NGINX, HAProxy) (intermediate)
  • Kubernetes CNI plugins (Calico, Cilium, Flannel) (intermediate)
  • IPAM (intermediate)
  • DNS systems (CoreDNS, Unbound) (intermediate)
  • Service discovery control planes (xDS) (intermediate)
  • Cloud networking primitives (VPC Route Tables, Cloud NAT, Peering / Transit Gateways, CDN, Cloudflare Workers) (intermediate)
  • Host-level network proxies (iptables, nftables, IPVS, eBPF programs) (intermediate)
  • gRPC client libraries (grpcio / grpc-go / grpc-java) (intermediate)
  • Service mesh (Istio, Linkerd) (intermediate)
  • Golang (intermediate)
  • Rust (intermediate)

Required Qualifications

  • 2+ years of experience operating Kubernetes clusters, or experience writing and deploying controllers
  • 2+ years of experience configuring and deploying Envoy, NGINX, HAProxy, or another L7 software load balancer
  • 1+ years of experience deploying and configuring Kubernetes CNI plugins (Calico, Cilium, Flannel) or experience with IPAM
  • 1+ years of experience with DNS systems (e.g., CoreDNS, Unbound) or service discovery control planes (xDS)
  • 1+ years of experience with cloud networking primitives (VPC Route Tables, Cloud NAT, Peering / Transit Gateways, CDN, Cloudflare Workers or equivalent)
  • Demonstrated experience working with Kubernetes and Envoy internals: can you tell us how k8s cached clients work? Can you tell us how Envoy scales and manages state?
  • Demonstrated experience debugging performance and reliability issues that span from L3 to L7 (e.g., how would a gRPC client in a cloud environment call a gRPC server in an on-prem environment? Describe the entire network path and any issues to watch out for, including Service Discovery / DNS, gRPC channel management, egress proxies, VPC routing, peering/PNI, edge caching / CDN, L4 load-balancing devices, host networking and virtualization, k8s networking, L7 routing, TLS / authnz, and TCP/IP)

Preferred Qualifications

  • Experience with host-level network proxies (iptables, nftables, IPVS, eBPF programs)
  • Deep experience with gRPC client libraries (grpcio / grpc-go / grpc-java)
  • Experience with service mesh (Istio, Linkerd)
