The OpenAI RFT adapter lets you reuse Eval Protocol evaluation tests as Python graders for OpenAI Reinforcement Fine-Tuning (RFT). Because your grading logic lives in an Eval Protocol @evaluation_test, you can reuse the exact same code as an OpenAI Python grader. This makes it easy to start with OpenAI RFT and later move to other Eval Protocol-supported training workflows (or vice versa) without rewriting your evals.
For a minimal working example, clone the openai-rft-quickstart repository, which contains the example_rapidfuzz.py and test_openai_grader.py files used in the examples below.
High-Level Overview
The core helper function, build_python_grader_from_evaluation_test, lives in eval_protocol/integrations/openai_rft.py. It:
- Takes your Eval Protocol @evaluation_test function that operates on an EvaluationRow.
- Wraps it into a self-contained {"type": "python", "source": ...} grader module with a grade(sample, item) entrypoint.
- Builds a minimal EvaluationRow from the OpenAI RFT inputs by:
  - Mapping item["reference_answer"] to row.ground_truth
  - Mapping item["messages"] (if present) to row.messages
  - Mapping sample["output_text"] to the last assistant message
- Removes any runtime dependency on eval-protocol inside the grader by using simple duck-typed stand-ins for EvaluationRow, EvaluateResult, and Message.
- Normalizes whatever your evaluation returns (e.g., EvaluateResult, EvaluationRow with .evaluation_result, or a bare number) into a single float score.
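To make the mapping and normalization steps concrete, here is a minimal sketch of what such a wrapper could look like. It is not the source that build_python_grader_from_evaluation_test actually generates: the stand-in classes, build_row, to_float_score, and make_grade are hypothetical names, a closure stands in for the module-level grade(sample, item) entrypoint, and appending output_text as the final assistant message is an assumption about how the "last assistant message" mapping is done.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List, Optional


# Hypothetical duck-typed stand-ins; the generated grader module defines its own.
@dataclass
class Message:
    role: str
    content: str


@dataclass
class EvaluateResult:
    score: float
    reason: str = ""


@dataclass
class EvaluationRow:
    messages: List[Message] = field(default_factory=list)
    ground_truth: Any = None
    evaluation_result: Optional[EvaluateResult] = None


def build_row(sample: dict, item: dict) -> EvaluationRow:
    """Map OpenAI RFT inputs onto a minimal EvaluationRow-like object."""
    messages = [
        Message(role=m.get("role", "user"), content=m.get("content", ""))
        for m in (item.get("messages") or [])  # item["messages"] is optional
    ]
    # Assumption: the model output becomes the final assistant message.
    messages.append(Message(role="assistant", content=sample.get("output_text", "")))
    return EvaluationRow(messages=messages, ground_truth=item.get("reference_answer"))


def to_float_score(result: Any) -> float:
    """Collapse a bare number, an EvaluateResult, or a row with .evaluation_result into a float."""
    if isinstance(result, (int, float)):
        return float(result)
    # Row-like objects carry the result in .evaluation_result; otherwise keep the object itself.
    result = getattr(result, "evaluation_result", result)
    score = getattr(result, "score", None)
    return float(score) if score is not None else 0.0


def make_grade(evaluate: Callable[[EvaluationRow], Any]) -> Callable[[dict, dict], float]:
    """Wrap an evaluation function into a grade(sample, item) -> float callable."""
    def grade(sample: dict, item: dict) -> float:
        return to_float_score(evaluate(build_row(sample, item)))
    return grade


# Usage: a toy exact-match evaluation wrapped as a grader.
def exact_match(row: EvaluationRow) -> EvaluateResult:
    answer = row.messages[-1].content.strip()
    return EvaluateResult(score=1.0 if answer == row.ground_truth else 0.0)


grade = make_grade(exact_match)
print(grade({"output_text": " Paris "}, {"reference_answer": "Paris"}))  # 1.0
```

Normalizing to a single float in one place lets the evaluation function return whichever shape is convenient, which is the same flexibility the list above describes.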
Grader Constraints
When you convert an @evaluation_test into an OpenAI Python grader, the resulting grader must run within OpenAI's grader runtime limits: no network access and only a fixed set of available packages (e.g., numpy, pandas, rapidfuzz). For more details, see the OpenAI graders documentation.
Basic Usage
1. Write an Eval Protocol @evaluation_test
In example_rapidfuzz.py (from the openai-rft-quickstart repo), we define a simple evaluation test, rapidfuzz_eval, that uses rapidfuzz to score how close a model's answer is to the ground truth.
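The scoring core of such a test might look roughly like the sketch below. It is not the quickstart's rapidfuzz_eval: the @evaluation_test decorator and its dataset and model configuration are omitted because their exact arguments are not shown here, and Message, EvaluateResult, EvaluationRow, similarity, and rapidfuzz_score are hypothetical stand-ins. It uses rapidfuzz.fuzz.ratio (which returns a value in [0, 100]) when rapidfuzz is installed and falls back to the standard library's difflib.SequenceMatcher otherwise, so partial-match scores differ between the two paths. Returning the row with evaluation_result attached is one of the return shapes the adapter normalizes.

```python
from dataclasses import dataclass, field
from difflib import SequenceMatcher
from typing import Any, List, Optional

try:
    from rapidfuzz import fuzz  # fuzz.ratio returns a similarity in [0, 100]
except ImportError:  # fall back to the stdlib so the sketch runs without rapidfuzz
    fuzz = None


# Minimal stand-ins for Eval Protocol's Message / EvaluateResult / EvaluationRow (hypothetical).
@dataclass
class Message:
    role: str
    content: str


@dataclass
class EvaluateResult:
    score: float
    reason: str = ""


@dataclass
class EvaluationRow:
    messages: List[Message] = field(default_factory=list)
    ground_truth: Any = None
    evaluation_result: Optional[EvaluateResult] = None


def similarity(a: str, b: str) -> float:
    """Return a normalized similarity in [0, 1]."""
    if fuzz is not None:
        return fuzz.ratio(a, b) / 100.0
    return SequenceMatcher(None, a, b).ratio()


def rapidfuzz_score(row: EvaluationRow) -> EvaluationRow:
    """Score the last assistant message against row.ground_truth and attach the result."""
    answer = next((m.content for m in reversed(row.messages) if m.role == "assistant"), "")
    expected = "" if row.ground_truth is None else str(row.ground_truth)
    score = similarity(answer.strip(), expected.strip())
    row.evaluation_result = EvaluateResult(score=score, reason=f"fuzzy similarity {score:.2f}")
    return row


row = EvaluationRow(
    messages=[Message("user", "Capital of France?"), Message("assistant", "Paris")],
    ground_truth="Paris",
)
print(rapidfuzz_score(row).evaluation_result.score)  # 1.0
```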
2. Convert to a Python grader and call /graders/*
In test_openai_grader.py (also in the openai-rft-quickstart repo), we show how to:
- Build a Python grader spec from rapidfuzz_eval
- Validate it via /fine_tuning/alpha/graders/validate
- Run it once via /fine_tuning/alpha/graders/run
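For orientation, the two HTTP calls could be made with the standard library alone, as in the sketch below. The base URL, the {"grader": ...} body for validate, and the model_sample / item fields for run are assumptions to confirm against OpenAI's graders API reference; graders_request, send, and the "rapidfuzz_grader" name are hypothetical. The placeholder grader_spec stands in for the dict that build_python_grader_from_evaluation_test would produce. Building the request separately from sending it makes the payload easy to inspect offline.

```python
import json
import os
import urllib.request

# Assumed base URL and payload shapes; confirm them against OpenAI's graders API reference.
API_BASE = "https://api.openai.com/v1"


def graders_request(action: str, payload: dict, api_key: str) -> urllib.request.Request:
    """Build a POST to /fine_tuning/alpha/graders/{action} ("validate" or "run")."""
    return urllib.request.Request(
        url=f"{API_BASE}/fine_tuning/alpha/graders/{action}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Authorization": f"Bearer {api_key}", "Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request) -> dict:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=120) as response:
        return json.loads(response.read().decode("utf-8"))


# Placeholder spec; in practice this dict comes from build_python_grader_from_evaluation_test.
grader_spec = {
    "type": "python",
    "name": "rapidfuzz_grader",  # hypothetical name
    "source": "def grade(sample, item):\n    return 1.0\n",
}

if __name__ == "__main__":
    api_key = os.environ.get("OPENAI_API_KEY")
    if api_key:
        print(send(graders_request("validate", {"grader": grader_spec}, api_key)))
        run_payload = {
            "grader": grader_spec,
            "model_sample": "Paris",  # assumed field for the model output
            "item": {"reference_answer": "Paris"},  # assumed field for the dataset item
        }
        print(send(graders_request("run", run_payload, api_key)))
```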
End-to-End Example
To see an end-to-end example that takes an @evaluation_test (rapidfuzz_eval), converts it into a {"type": "python", "source": ...} grader spec with build_python_grader_from_evaluation_test, and validates and runs it against the OpenAI /graders/* HTTP APIs, clone the quickstart repo and run the tests in test_openai_grader.py. The example demonstrates that:
- Your Eval Protocol @evaluation_test (rapidfuzz_eval) runs as a normal eval via pytest.
- The same function can be converted into a type: "python" grader spec and validated or run through the OpenAI RFT graders API.

