Researcher, AI Evaluation (San Francisco)

Source: reddit-r-forhire

About the Role

Build benchmarks that measure the real-world value of AI models.
Publish LLM evaluation papers at top conferences with the support of the Mercor Applied AI and Operations teams.
Push the frontier of understanding data ROI in model development, including multi-modality, code, tool use, and more.
Design and validate novel data collection and annotation offerings for leading industry labs and big tech companies.

Qualifications

PhD or M.S. and 2+ years of work experience in computer science, electrical engineering, econometrics, or another STEM field that provides a solid understanding of ML and model evaluation.
Strong publication record in AI research, ideally in LLM evaluation. Dataset and evaluation papers are preferred.
Strong understanding of LLMs and the data on which they are trained and evaluated.
Strong communication skills and the ability to present findings clearly and concisely.
Familiarity with data annotation workflows.
Good understanding of statistics.

DM for Referral Link