AI Agent Infrastructure Engineers

Mercor

3 months ago

Role details

Contract type

Permanent contract

Employment type

Full-time (> 32 hours)

Working hours

Regular working hours

Languages

English

Job location

Tech stack

Application Programming Interfaces (APIs)

Artificial Intelligence

Amazon Web Services

Microsoft Azure

C++ (Programming Language)

Cloud Computing

Distributed Systems

Fault Tolerance

Python (Programming Language)

Software Engineering

Data Streaming

Reinforcement Learning

Rust (Programming Language)

Large Language Models

Multi-Agent Systems

Caching

Reliability of Systems

Containerization

Kubernetes

Information Technology

Docker

Network Optimization

Microservices

Job description

Design, build, and optimize infrastructure for training, deploying, and scaling AI agents across distributed systems.
Develop robust backend services, APIs, and orchestration frameworks that support multi-agent workflows and high-performance compute environments.
Collaborate closely with research and product teams to integrate model-serving pipelines, memory systems, and reasoning components.
Implement monitoring, observability, and failover mechanisms to ensure high system reliability and fault tolerance.
Evaluate and refine infrastructure performance, identifying bottlenecks and improving efficiency across data, compute, and model layers.
Participate in synchronous collaboration sessions (4-hour windows, 2-3 times per week) to review architecture decisions, troubleshoot distributed systems, and iterate on design improvements.

Requirements

Do you have experience in Rust (programming language)?
Do you have a Master's degree?

Strong background in Computer Science, Software Engineering, or Systems Design, with a focus on large-scale distributed infrastructure.
Experience with cloud computing (AWS, GCP, or Azure) and containerization/orchestration tools such as Docker and Kubernetes.
Proficiency in backend programming languages such as Go, Rust, Python, or C++.
Familiarity with LLM inference pipelines, multi-agent architectures, or reinforcement learning environments is a strong plus.
Knowledge of network optimization, data streaming, and caching architectures preferred.
Excellent collaboration and communication skills.
Ability to commit 20-30 hours per week, including required synchronous collaboration sessions.

About the company

Why Join

Work directly with a world-class AI research lab building the infrastructure behind tomorrow's intelligent agent ecosystems.
Influence the foundations of AI scalability, reliability, and deployment, enabling complex agents to operate in real-world environments.
Enjoy schedule flexibility - select your own 4-hour collaboration windows and manage your 20-30 hour work week.
Be engaged as an hourly contractor through Mercor, giving you autonomy while contributing to mission-critical AI infrastructure projects.
Collaborate with top systems engineers, researchers, and AI developers working at the intersection of distributed systems and advanced intelligence.
Join a global network of technical experts shaping how the next generation of AI agents reason, interact, and evolve at scale.