AI System Performance Optimization

10 professional roles

AI Benchmark & Evaluation Engineer
Design rigorous AI model benchmarks and evaluation frameworks to measure performance, track regressions, and guide optimization decisions.
AI Cost-Per-Query Optimizer
Systematically reduce AI API and inference costs through model selection, caching strategies, prompt compression, and intelligent routing.
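One of the cheapest wins this role covers is exact-match response caching: a repeated prompt should never trigger a second paid API call. A minimal sketch (the `fake_model` backend and all names are illustrative, not any real API):

```python
import hashlib

class ResponseCache:
    """Exact-match cache for model responses, keyed on a prompt hash."""

    def __init__(self):
        self._store = {}
        self.hits = 0
        self.misses = 0

    def get_or_compute(self, prompt, compute):
        key = hashlib.sha256(prompt.encode("utf-8")).hexdigest()
        if key in self._store:
            self.hits += 1
            return self._store[key]
        self.misses += 1
        result = compute(prompt)
        self._store[key] = result
        return result

# Hypothetical backend; in practice this would be a paid API call.
calls = []
def fake_model(prompt):
    calls.append(prompt)
    return "answer to: " + prompt

cache = ResponseCache()
cache.get_or_compute("What is a KV cache?", fake_model)
cache.get_or_compute("What is a KV cache?", fake_model)  # served from cache
```

In production this is usually paired with semantic (embedding-based) caching so that near-duplicate prompts also hit the cache, but exact-match hashing is the zero-risk baseline.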
AI Hardware Accelerator Tuning Engineer
Maximize AI workload performance on GPUs, TPUs, and specialized accelerators through hardware-aware tuning, kernel selection, and memory optimization.
AI Model Profiling Analyst
Identify AI model performance bottlenecks using GPU profiling, memory tracing, and operator-level analysis to guide targeted optimizations.
AI Model Quantization Specialist
Expert guidance on model quantization techniques — INT8, INT4, GPTQ, AWQ, GGUF — to compress AI models without sacrificing accuracy.
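The core idea behind all of these formats is mapping floating-point weights onto a small integer grid via a scale factor. A toy sketch of symmetric per-tensor INT8 quantization (pure Python for clarity; real toolchains like GPTQ and AWQ operate per-channel or per-group with calibration data):

```python
def quantize_int8(weights):
    """Symmetric per-tensor INT8: w_q = clamp(round(w / scale), -127, 127)."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127.0 if max_abs else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize_int8(q, scale):
    """Recover approximate floats; error is bounded by scale / 2 per weight."""
    return [x * scale for x in q]

weights = [0.31, -1.27, 0.05, 0.98]
q, scale = quantize_int8(weights)      # 4 bytes instead of 16 at fp32
restored = dequantize_int8(q, scale)
```

The accuracy question this role answers is precisely how much of that per-weight rounding error a given model can absorb, and where per-channel scales or mixed precision are needed.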
AI Throughput Scaling Architect
Design high-throughput AI serving systems that scale under load — covering load balancing, replica management, and concurrency optimization.
KV Cache Optimization Specialist
Expert in KV cache tuning for transformer models — maximize memory efficiency, reduce recomputation overhead, and improve serving throughput.
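To see why this matters, it helps to compute the cache footprint from first principles: two tensors (keys and values) per layer, each of shape heads × head_dim × sequence length, per request. A back-of-envelope sizing helper (the 7B-style shape below is illustrative):

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim,
                   seq_len, batch_size, bytes_per_elem=2):
    """Total KV cache size: 2 tensors (K and V) per layer, fp16 by default."""
    return (2 * num_layers * num_kv_heads * head_dim
            * seq_len * batch_size * bytes_per_elem)

# A 7B-class model: 32 layers, 32 KV heads, head_dim 128, fp16.
size = kv_cache_bytes(32, 32, 128, seq_len=4096, batch_size=1)
gib = size / 2**30  # 2.0 GiB for a single 4k-token request
```

At 2 GiB per 4k-token request, a handful of concurrent long-context requests exhausts a 24 GiB GPU, which is why techniques like grouped-query attention (fewer KV heads), paged allocation, and cache quantization sit at the center of this role.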
LLM Inference Latency Optimizer
Reduce LLM inference latency with expert strategies for batching, quantization, caching, and deployment architecture tuning.
Prompt Efficiency Engineer
Optimize AI prompts to reduce token consumption, cut API costs, and improve response quality without changing the model or infrastructure.
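A representative technique here is trimming conversation history to a token budget while always preserving the system prompt and the most recent turns. A minimal sketch (the whitespace-split counter is a crude stand-in for a real tokenizer such as tiktoken):

```python
def count_tokens(text):
    # Crude stand-in for a real tokenizer: whitespace split.
    return len(text.split())

def trim_history(system, turns, budget):
    """Keep the system prompt plus the newest turns that fit the budget."""
    used = count_tokens(system)
    kept = []
    for turn in reversed(turns):          # walk newest-to-oldest
        cost = count_tokens(turn)
        if used + cost > budget:
            break                         # oldest turns are dropped first
        kept.append(turn)
        used += cost
    return [system] + list(reversed(kept))

system = "you are helpful"
turns = ["a b", "c d e", "f"]
trimmed = trim_history(system, turns, budget=7)
```

More aggressive variants summarize dropped turns instead of discarding them, trading a small summarization cost for retained context.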
Speculative Decoding Engineer
Implement and tune speculative decoding for LLM inference — select draft models, configure acceptance rates, and achieve significant latency gains.
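The control flow of one speculative step is simple to sketch with greedy (exact-match) verification: the draft model proposes a block of tokens, the target model accepts the longest matching prefix, then emits one token of its own. The toy integer "models" below are illustrative stand-ins; production systems verify under the target's full distribution via rejection sampling:

```python
def speculative_step(draft_next, target_next, context, k=4):
    """One speculative-decoding step with greedy verification.

    draft_next / target_next map a token list to the next token.
    Returns the tokens committed this step (>= 1, up to k + 1).
    """
    # Draft phase: propose k tokens autoregressively with the cheap model.
    proposal, ctx = [], list(context)
    for _ in range(k):
        t = draft_next(ctx)
        proposal.append(t)
        ctx.append(t)

    # Verification phase: target keeps the matching prefix, then corrects.
    accepted, ctx = [], list(context)
    for t in proposal:
        expected = target_next(ctx)
        if expected == t:
            accepted.append(t)
            ctx.append(t)
        else:
            accepted.append(expected)   # target's correction ends the step
            break
    else:
        accepted.append(target_next(ctx))  # bonus token: all k accepted
    return accepted

# Toy models: both count upward, but the target resets after token 3.
draft_next = lambda ctx: ctx[-1] + 1
target_next = lambda ctx: ctx[-1] + 1 if ctx[-1] < 3 else 0
out = speculative_step(draft_next, target_next, [1], k=4)
```

The latency win comes from verifying all k proposals in a single target-model forward pass; the acceptance rate (here, how long the two toy models agree) determines how many tokens each expensive pass commits.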