Job Title: AI Policy Reviewer
Location: Remote (Worldwide)
Job Summary: The AI Policy Reviewer is responsible for evaluating AI-generated and user-generated content to ensure compliance with internal governance standards, regulatory requirements, and responsible AI principles. This role plays a key part in safeguarding model integrity by reviewing outputs for safety risks, bias, misinformation, harmful content, and policy violations, while ensuring consistent enforcement of AI usage guidelines.
Responsibilities
- Review and score AI-generated responses against detailed policy rubrics. Assess outputs for safety, truthfulness, fairness, and alignment with community guidelines.
- Act as a quality assurance checkpoint for automated systems. Identify instances where the AI misinterprets policy (e.g., being over-sensitive and censoring benign content, or under-sensitive and allowing harmful content).
- Handle complex “edge cases” where policy application is ambiguous. Make nuanced judgement calls regarding context, satire, or emerging risks that the AI model struggles to process.
- Analyze and review data to identify systematic flaws in the AI’s reasoning. Report patterns of bias, hallucination, or policy gaps to the Product and Engineering teams.
- Collaborate with Policy teams to test and refine evaluation rubrics. Provide feedback on whether current policies are “teachable” to AI models or if they require human-only judgement.
- Participate in adversarial testing (red teaming) by attempting to “jailbreak” the model or provoke unsafe responses to identify vulnerabilities before launch.
- Work closely with Machine Learning Engineers to explain the “why” behind your ratings, helping them adjust model behavior.
- Write high-quality examples (prompts and ideal responses) that serve as “golden sets” for training the AI on how to handle difficult policy scenarios.
Requirements
- Minimum of 3 years of professional experience in Trust & Safety Operations, Content Policy, Risk Analysis, or Legal/Compliance review.
- Deep understanding of content moderation principles, including hate speech, harassment, misinformation, and graphic violence policies.
- Strong ability to deconstruct complex AI responses and identify logical flaws, hallucinations, or subtle biases.
- Clear and concise written communication skills. You must be able to explain why an AI response was wrong in a way that engineers and policy experts can understand.
- This role involves exposure to disturbing AI-generated text and images designed to test safety limits. Proven emotional resilience and self-care strategies are required.
- Comfortable working with dashboards, spreadsheets, and specialized review tools. Familiarity with LLMs (ChatGPT, Gemini, etc.).
- Proven ability to follow complex, detailed instructions and scoring rubrics with high consistency and accuracy.
- Understanding of global cultural and political nuances to assess whether AI responses are appropriate for diverse international audiences.