AI System Performance Optimization

10 professional roles

AI Benchmark & Evaluation Engineer
Design rigorous AI model benchmarks and evaluation frameworks to measure performance, track regressions, and guide optimization decisions.
AI Cost-Per-Query Optimizer
Systematically reduce AI API and inference costs through model selection, caching strategies, prompt compression, and intelligent routing.
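One of the cheapest wins this role covers is exact-match response caching: a repeated prompt should never trigger a second paid API call. A minimal sketch (the `fake_model` backend and all names are illustrative, not any real API):

```python
import hashlib

class ResponseCache:
    """Exact-match cache for model responses, keyed on a prompt hash."""

    def __init__(self):
        self._store = {}
        self.hits = 0
        self.misses = 0

    def get_or_compute(self, prompt, compute):
        key = hashlib.sha256(prompt.encode("utf-8")).hexdigest()
        if key in self._store:
            self.hits += 1
            return self._store[key]
        self.misses += 1
        result = compute(prompt)
        self._store[key] = result
        return result

# Hypothetical backend; in practice this would be a paid API call.
calls = []
def fake_model(prompt):
    calls.append(prompt)
    return "answer to: " + prompt

cache = ResponseCache()
cache.get_or_compute("What is a KV cache?", fake_model)
cache.get_or_compute("What is a KV cache?", fake_model)  # served from cache
```

In production this is usually paired with semantic (embedding-based) caching so that near-duplicate prompts also hit the cache, but exact-match hashing is the zero-risk baseline.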
AI Hardware Accelerator Tuning Engineer
Maximize AI workload performance on GPUs, TPUs, and specialized accelerators through hardware-aware tuning, kernel selection, and memory optimization.
AI Model Profiling Analyst
Identify AI model performance bottlenecks using GPU profiling, memory tracing, and operator-level analysis to guide targeted optimizations.
AI Model Quantization Specialist
Expert guidance on model quantization techniques — INT8, INT4, GPTQ, AWQ, GGUF — to compress AI models without sacrificing accuracy.
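The core idea behind all of these formats is mapping floating-point weights onto a small integer grid via a scale factor. A toy sketch of symmetric per-tensor INT8 quantization (pure Python for clarity; real toolchains like GPTQ and AWQ operate per-channel or per-group with calibration data):

```python
def quantize_int8(weights):
    """Symmetric per-tensor INT8: w_q = clamp(round(w / scale), -127, 127)."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127.0 if max_abs else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize_int8(q, scale):
    """Recover approximate floats; error is bounded by scale / 2 per weight."""
    return [x * scale for x in q]

weights = [0.31, -1.27, 0.05, 0.98]
q, scale = quantize_int8(weights)      # 4 bytes instead of 16 at fp32
restored = dequantize_int8(q, scale)
```

The accuracy question this role answers is precisely how much of that per-weight rounding error a given model can absorb, and where per-channel scales or mixed precision are needed.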
AI Throughput Scaling Architect
Design high-throughput AI serving systems that scale under load — covering load balancing, replica management, and concurrency optimization.
KV Cache Optimization Specialist
Expert in KV cache tuning for transformer models — maximize memory efficiency, reduce recomputation overhead, and improve serving throughput.
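To see why this matters, it helps to compute the cache footprint from first principles: two tensors (keys and values) per layer, each of shape heads × head_dim × sequence length, per request. A back-of-envelope sizing helper (the 7B-style shape below is illustrative):

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim,
                   seq_len, batch_size, bytes_per_elem=2):
    """Total KV cache size: 2 tensors (K and V) per layer, fp16 by default."""
    return (2 * num_layers * num_kv_heads * head_dim
            * seq_len * batch_size * bytes_per_elem)

# A 7B-class model: 32 layers, 32 KV heads, head_dim 128, fp16.
size = kv_cache_bytes(32, 32, 128, seq_len=4096, batch_size=1)
gib = size / 2**30  # 2.0 GiB for a single 4k-token request
```

At 2 GiB per 4k-token request, a handful of concurrent long-context requests exhausts a 24 GiB GPU, which is why techniques like grouped-query attention (fewer KV heads), paged allocation, and cache quantization sit at the center of this role.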
LLM Inference Latency Optimizer
Reduce LLM inference latency with expert strategies for batching, quantization, caching, and deployment architecture tuning.
Prompt Efficiency Engineer
Optimize AI prompts to reduce token consumption, cut API costs, and improve response quality without changing the model or infrastructure.
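A representative technique here is trimming conversation history to a token budget while always preserving the system prompt and the most recent turns. A minimal sketch (the whitespace-split counter is a crude stand-in for a real tokenizer such as tiktoken):

```python
def count_tokens(text):
    # Crude stand-in for a real tokenizer: whitespace split.
    return len(text.split())

def trim_history(system, turns, budget):
    """Keep the system prompt plus the newest turns that fit the budget."""
    used = count_tokens(system)
    kept = []
    for turn in reversed(turns):          # walk newest-to-oldest
        cost = count_tokens(turn)
        if used + cost > budget:
            break                         # oldest turns are dropped first
        kept.append(turn)
        used += cost
    return [system] + list(reversed(kept))

system = "you are helpful"
turns = ["a b", "c d e", "f"]
trimmed = trim_history(system, turns, budget=7)
```

More aggressive variants summarize dropped turns instead of discarding them, trading a small summarization cost for retained context.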
Speculative Decoding Engineer
Implement and tune speculative decoding for LLM inference — select draft models, configure acceptance rates, and achieve significant latency gains.
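The control flow of one speculative step is simple to sketch with greedy (exact-match) verification: the draft model proposes a block of tokens, the target model accepts the longest matching prefix, then emits one token of its own. The toy integer "models" below are illustrative stand-ins; production systems verify under the target's full distribution via rejection sampling:

```python
def speculative_step(draft_next, target_next, context, k=4):
    """One speculative-decoding step with greedy verification.

    draft_next / target_next map a token list to the next token.
    Returns the tokens committed this step (>= 1, up to k + 1).
    """
    # Draft phase: propose k tokens autoregressively with the cheap model.
    proposal, ctx = [], list(context)
    for _ in range(k):
        t = draft_next(ctx)
        proposal.append(t)
        ctx.append(t)

    # Verification phase: target keeps the matching prefix, then corrects.
    accepted, ctx = [], list(context)
    for t in proposal:
        expected = target_next(ctx)
        if expected == t:
            accepted.append(t)
            ctx.append(t)
        else:
            accepted.append(expected)   # target's correction ends the step
            break
    else:
        accepted.append(target_next(ctx))  # bonus token: all k accepted
    return accepted

# Toy models: both count upward, but the target resets after token 3.
draft_next = lambda ctx: ctx[-1] + 1
target_next = lambda ctx: ctx[-1] + 1 if ctx[-1] < 3 else 0
out = speculative_step(draft_next, target_next, [1], k=4)
```

The latency win comes from verifying all k proposals in a single target-model forward pass; the acceptance rate (here, how long the two toy models agree) determines how many tokens each expensive pass commits.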