Model Evaluation Jobs – Page 3

How to use Model Evaluation in applications

If a job requires Model Evaluation, back the skill with a project, course, or portfolio example. The application check reviews whether your CV actually provides evidence of the skill.

Document a small project or notebook
Name relevant tools and methods
Connect the evidence to the target role

Results

Average score: 94.2

10 results on this page. 67 results in total.

Software Engineer, GenAI

Abridge · SF Office

95/100

SF Office · USA · Full-time · ashby · 2026-07-31

Why this is a real AI job: The role is explicitly focused on designing, building, and evaluating LLM-powered systems for a core healthcare application. The job description heavily emphasizes LLM APIs, agentic workflows, and model evaluation, indicating a primary focus on AI/ML.

About Abridge: Abridge was founded in 2018 with the mission of powering deeper understanding in healthcare. Our AI-powered platform was purpose-built for medical conversations, improving clinical documentation efficiencies while enabling clinicians to focus on what matters most: their patients. Our e…

AI Engineer, Model Quality and Performance

Cerebras Systems · Headquarters/Sunnyvale Office

95/100

Headquarters/Sunnyvale Office · USA · Full-time · ashby · 2026-07-31

AI Agents · Machine Learning · LLMs · Data Science · Model Evaluation · Performance Tuning · NLP

Why this is a real AI job: The role is fundamentally focused on building and automating systems for model quality, performance evaluation, and benchmarking using AI agents. The description explicitly centers around leveraging AI to solve core problems related to inference and model dep…

Cerebras Systems builds the world's largest AI chip, 56 times larger than GPUs. This architecture allows Cerebras to deliver industry-leading training and inference speeds, over 10 times faster than GPU-based hyperscale cloud inference services. This order-of-magnitude increase in speed is transfor…

Member of Technical Staff (ML Engineer, Recommendations & User Modeling)

Perplexity AI · San Francisco

95/100

San Francisco · USA · Full-time · ashby · 2026-07-31

Machine Learning · Recommendation Systems · User Modeling · LLMs · Data Science · MLOps · Model Evaluation · Online Experimentation · Data Pipelines

Why this is a real AI job: The role explicitly focuses on building and optimizing recommendation systems powered by LLMs, requiring deep ML expertise. The job description heavily emphasizes ML fundamentals, model building, and data foundations for learning and improvement.

Perplexity is seeking experienced ML engineers to design, build, and optimize the recommendation systems that power core experiences on Perplexity. Perplexity builds AI for those who expect more. Our products are designed to help people find answers, make their most consequential decisions, and com…

Senior Research Scientist, Model Evaluation

Cohere · Toronto

95/100

Toronto · Canada · Full-time · ashby · 2026-07-31

LLM · Machine Learning · Data Science · Model Evaluation · NLP · Python · Software Engineering

Why this is a real AI job: The role is explicitly focused on researching, developing, and implementing new evaluation methods for large language models (LLMs). The core responsibilities revolve around advancing the state-of-the-art in LLM evaluation, building tools for model performanc…

Who are we? Cohere is the leading security-first enterprise AI company. We build cutting-edge foundation AI models and end-to-end products that are designed to solve real-world business problems. We’re training and deploying frontier models for enterprises who are building AI systems. We believe th…

Senior Machine Learning Engineer, Agent Oversight

Scale AI · San Francisco, CA

95/100

San Francisco, CA · USA · greenhouse · 2026-07-31

Machine Learning · LLM · Agentic Systems · Data Science · MLOps · Model Evaluation · RLHF · NLP

Why this is a real AI job: The role is explicitly focused on building and scaling production ML/LLM systems, specifically agentic applications. The job description heavily emphasizes tasks directly related to machine learning engineering, model evaluation, and improvement loops.

About Scale: Scale’s mission is to develop reliable AI systems for the world’s most important decisions. As the leading AI data foundry, we provide the high-quality data and full-stack technologies that power the world’s most advanced models, fueling breakthroughs in generative AI, defense, and aut…

Staff Applied AI Engineer

Scale AI · London, UK

95/100

London, UK · USA · greenhouse · 2026-07-31

MLOps · Machine Learning · Data Science · LLM · Model Evaluation · Agent Architectures · Fine-tuning · AI Governance · Responsible AI

Why this is a real AI job: The role explicitly focuses on building, governing, and evaluating AI systems in production. The description details responsibilities directly related to core AI/ML engineering tasks like model evaluation, MLOps practices, agent architectures, and addressing…

About the role: Scale's Global Public Sector team is focused on using AI to address critical challenges facing the public sector around the world. As a Staff Applied AI Engineer, you'll raise the bar for how AI gets built, governed, and evaluated across the team. You'll define standards that other e…

Applied AI Engineer, Global Public Sector

Scale AI · London, UK

95/100

London, UK · USA · greenhouse · 2026-07-31

AI · ML · LLM · Deep Learning · Python · Data Annotation · Model Evaluation · MLOps · Generative AI · Data Science

Why this is a real AI job: The role explicitly focuses on building and deploying AI applications, generating training data for LLMs, and upskilling in AI. The required experience includes applying AI/ML in production environments. The core responsibilities are heavily AI-focused.

Scale’s rapidly growing Global Public Sector team is focused on using AI to address critical challenges facing the public sector around the world. Our core work consists of: creating custom AI applications that will impact millions of citizens; generating high-quality training data for national LLMs…

Open-Source Machine Learning Engineer - US Remote

Hugging Face · USA

95/100

USA · workable · 2026-07-31

Machine Learning · Python · Open Source · Deep Learning · Model Training · Model Evaluation

Why this is a real AI job: The role explicitly focuses on machine learning engineering within an open-source context, directly contributing to ML models and frameworks. Hugging Face is a leading AI company.

Staff Software Engineer - Machine Learning (Search)

Databricks · Bengaluru, India

95/100

Bengaluru, India · USA · greenhouse · 2026-07-31

Machine Learning · NLP · LLM · Data Science · MLOps · Model Evaluation · Query Understanding · Text Mining · Search Relevance

Why this is a real AI job: The role explicitly focuses on developing and deploying ML-based search relevance models, NLP pipelines, and evaluating search ranking improvements. The core responsibilities are heavily centered around AI/ML techniques.

P-1408 The Applied AI team at Databricks sits at the forefront of advancing AI/ML-powered products. Databricks’ customers are continuously creating new assets (tables, notebooks, dashboards, datarooms, pipelines, SQL queries, ML models, etc.) on the platform. Some of them can have hundreds of milli…

Staff Machine Learning Engineer, CustomerLake (ML/LLM)

Databricks · New York City, New York

95/100

New York City, New York · USA · greenhouse · 2026-07-31

Machine Learning · LLM · MLOps · Python · PyTorch · Data Science · Model Evaluation · Generative AI · RAG

Why this is a real AI job: The role explicitly focuses on building and improving ML/LLM models for a customer data platform. The job description details tasks like model evaluation, production monitoring, and pushing the boundaries of personalization algorithms.

RDQ427R109 At Databricks, we are passionate about enabling data teams to solve the world's toughest problems — from making the next mode of transportation a reality to accelerating the development of medical breakthroughs. We do this by building and running the world's best Data Intelligence Platfo…
