AI Job Radar

Model Evaluation Jobs – Page 3

Current AI jobs involving Model Evaluation, with matching learning paths and application guidance.

How to use Model Evaluation in applications

If a job requires Model Evaluation, back the skill with a project, course, or portfolio example. The application check verifies whether your CV actually demonstrates the skill.

Results: 39
Companies: 18
Average score: 94.2
Remote: 11

10 results on this page. 39 results in total. More results are available via pagination, company pages, skill pages and job detail pages.

Bengaluru, India · USA · greenhouse · 2026-06-16

Why this is a real AI job: The role explicitly focuses on developing and deploying ML-based search relevance models, NLP pipelines, and evaluating search ranking improvements. The core responsibilities are heavily centered around AI/ML techniques.

P-1408 The Applied AI team at Databricks sits at the forefront of advancing AI/ML-powered products. Databricks’ customers are continuously creating new assets (tables, notebooks, dashboards, datarooms, pipelines, SQL queries, ML models etc.) on the platform. Some of them can have hundreds of milli…

Senior Staff Applied AI Engineer - Context Retrieval

Databricks · Mountain View, California; San Francisco, California

95/100
Mountain View, California; San Francisco, California · USA · greenhouse · 2026-06-16

Why this is a real AI job: The role is entirely focused on building and improving context retrieval systems for AI agents, encompassing all core AI/ML/LLM aspects. The job description explicitly mentions building the retrieval stack, search subagents, and optimizing for LLMs, indicatin…

P-1549 At Databricks, we are passionate about enabling data teams to solve the world's toughest problems — from making the next mode of transportation a reality to accelerating the development of medical breakthroughs. We do this by building and running the world's best data and AI infrastructure p…

PhD GenAI Research Scientist Intern

Databricks · Mountain View, California

95/100
Mountain View, California · USA · greenhouse · 2026-06-16

Why this is a real AI job: The role is explicitly focused on research and development of LLMs and AI systems for enterprise applications. The job description details tasks directly related to adapting, improving, and evaluating these models.

Company Description: At Databricks, we are obsessed with enabling data teams to solve the world’s toughest problems, from security threat detection to cancer drug development. We do this by building and running the world’s best data and AI platform, so our customers can focus on the high value chal…

San Francisco · USA · Full-time · ashby · 2026-06-16

Why this is a real AI job: The role is explicitly focused on AI fairness, bias testing, and evaluation of AI-assisted systems. The job description details extensive work with ML models, data analysis, and building evaluation infrastructure. This is a core AI research and application ro…

ABOUT THE TEAM OpenAI’s People team hires, engages, and retains world-class talent to safely build and deploy AGI that benefits all of humanity. The People Analytics team helps leaders make rigorous, evidence-based talent decisions and ensures that the systems supporting those decisions are valid,…

Data Scientist, Safety

OpenAI · London, UK

95/100
London, UK · USA · Full-time · ashby · 2026-06-16

Why this is a real AI job: The role is explicitly focused on building analytical foundations for responsible AI deployment, evaluating and improving safety systems, and quantifying safety risks. The tasks directly involve applying data science and ML techniques to address core AI safet…

About the Team OpenAI’s Safety teams work to ensure our products are safe, trusted, and resilient as frontier AI systems scale globally. We tackle some of the company’s most important challenges across understanding and preventing misuse and misalignment, intercepting fraud and abuse, and protectin…

Data Scientist, Safety

OpenAI · San Francisco

95/100
San Francisco · USA · Full-time · ashby · 2026-06-16

Why this is a real AI job: The role is explicitly focused on building analytical foundations for responsible AI deployment, evaluating and improving safety classifiers, and quantifying safety risks. The job description heavily emphasizes AI/ML-related tasks and requires expertise in ar…

About the Team OpenAI’s Safety teams work to ensure our products are safe, trusted, and resilient as frontier AI systems scale globally. We tackle some of the company’s most important challenges across understanding and preventing misuse and misalignment, intercepting fraud and abuse, and protectin…

Data Scientist, Preparedness

OpenAI · San Francisco

95/100
San Francisco · USA · Full-time · ashby · 2026-06-16

Why this is a real AI job: The role is explicitly focused on building, evaluating, and improving mitigations for harms from AI systems. It requires deep technical skills in data science, machine learning, and model evaluation, with a strong emphasis on identifying and addressing risks…

About the Team The Preparedness team is an important part of the Safety Systems (https://openai.com/safety/safety-systems) org at OpenAI, and is guided by OpenAI’s Preparedness Framework (https://openai.com/index/updating-our-preparedness-framework/). Frontier AI models have the potential to benefit al…

San Francisco · USA · Full-time · ashby · 2026-06-16

Why this is a real AI job: The role is explicitly focused on building and improving AI agents (Codex), working directly with LLMs, and turning research into production systems. The tasks described are overwhelmingly centered around AI/ML concepts.

About the Team The Codex Core Agent team builds the kernel of Codex. We own making the agent better, accelerating research, and making those improvements real in production for our users. That means working across the systems that make Codex actually function as an agent in the real world: the prod…

Research Engineer, Model Evaluations

Anthropic · San Francisco, CA

95/100
San Francisco, CA · USA · greenhouse · 2026-06-15

Why this is a real AI job: The role is explicitly focused on evaluating and improving large language models (Claude), building evaluation infrastructure, and monitoring model health. The core responsibilities directly involve AI/ML concepts and techniques.

About Anthropic Anthropic’s mission is to create reliable, interpretable, and steerable AI systems. We want AI to be safe and beneficial for our users and for society as a whole. Our team is a quickly growing group of committed researchers, engineers, policy experts, and business leaders working to…
