Evaluator-Optimizer | Edgardo Ponce

Evaluator-Optimizer (The Writer and the Demanding Editor Analogy)

Imagine writing an important article for a magazine. The first draft is rarely perfect, so the work generally follows a cycle of creation and review.

The Writer (Optimizer/Generator): Receives a task ("Write an article about the vineyards of San Rafael") and produces a first draft.
The Editor (Evaluator): Receives that draft. Their job is not to write from scratch, but to evaluate the text based on clear criteria: Is the style correct? Is it easy to read? Does it cover all the requested points? Is the title appealing?
The Feedback: The editor doesn't just say "I don't like it." They give specific and constructive feedback: "The content is good, but the second paragraph is confusing. Rework it. Also, the title is boring; find something more catchy."
The Cycle (Loop): The writer takes that feedback, improves the draft, and returns it to the editor. This cycle repeats until the editor says, "Perfect! Publish it."

The Evaluator-Optimizer pattern works exactly like this:

An "Optimizer" LLM (the Writer) generates an initial response to a request.
A second "Evaluator" LLM (the Editor), armed with a prompt containing very clear evaluation criteria (a kind of checklist), reviews the Optimizer's work.
The Evaluator generates feedback to improve the response.
The Optimizer uses that feedback to refine its work in a loop, until the response meets the defined quality standard.

This method is ideal when the final quality is critical and you have a clear way to measure how "good" a response is.
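
The four steps above can be sketched as a short loop. This is a minimal illustration rather than a definitive implementation: the names `Evaluation`, `evaluator_optimizer`, `stub_generate`, and `stub_evaluate` are hypothetical, the two stub functions stand in for real LLM calls, and the `max_iterations` cap is an added assumption so the loop cannot run forever if the Evaluator never approves.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class Evaluation:
    approved: bool  # True when the draft meets every criterion
    feedback: str   # specific notes for the Optimizer when not approved


# Hypothetical signatures. In practice each function would wrap an LLM call.
GenerateFn = Callable[[str, Optional[str], Optional[str]], str]
EvaluateFn = Callable[[str, str], Evaluation]


def evaluator_optimizer(
    task: str,
    generate: GenerateFn,
    evaluate: EvaluateFn,
    max_iterations: int = 3,  # assumption of this sketch, not part of the pattern
) -> Tuple[str, int]:
    """Run generate -> evaluate -> refine; return (last draft, iterations used)."""
    if max_iterations < 1:
        raise ValueError("max_iterations must be at least 1")
    draft: Optional[str] = None
    feedback: Optional[str] = None
    for iteration in range(1, max_iterations + 1):
        # First pass: draft and feedback are None, so this is the initial response.
        draft = generate(task, draft, feedback)
        result = evaluate(task, draft)
        if result.approved:
            return draft, iteration
        feedback = result.feedback
    # Cap reached without approval: return the latest draft anyway.
    return draft, max_iterations


# Stand-ins for the two LLMs so the loop can run without a model.
def stub_generate(task: str, previous_draft: Optional[str], feedback: Optional[str]) -> str:
    if previous_draft is None:
        return f"First draft for: {task}"
    return f"{previous_draft} [revised per: {feedback}]"


def stub_evaluate(task: str, draft: str) -> Evaluation:
    if "revised" in draft:
        return Evaluation(approved=True, feedback="")
    return Evaluation(approved=False, feedback="tone too abrupt")


final_draft, iterations = evaluator_optimizer(
    "reschedule email", stub_generate, stub_evaluate
)
print(iterations)
print(final_draft)
```

Passing the two roles in as plain functions keeps the loop independent of any particular model or provider; swapping the stubs for real calls should not require changing the loop itself.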

Real-World Example: "Professional Email Writing Assistant"

Imagine an AI tool that helps you draft important emails, ensuring they sound professional and courteous.

The User's Request:

"Help me write an email to an important client to reschedule next week's meeting. I want to move it to two weeks from now."

Refinement Cycle (Loop):

Iteration #1

Optimizer LLM (The Rushed Writer):
- Receives the request and generates a first draft that is functional but unpolished.
- Output v1 (too direct):
  
  "Hi [Client], I won't be able to make it next week. Should we move the meeting to two weeks from now? Regards, Edgardo."
Evaluator LLM (The Communications Editor):
- Receives "Output v1." Its prompt contains a quality checklist: 1. Is the tone courteous and professional? 2. Does it apologize for the inconvenience? 3. Does it actively propose solutions (specific dates/times)? 4. Is the closing formal?
- Generated Feedback: "The draft is too informal and abrupt. You should add a professional greeting, offer a brief apology, propose specific times to facilitate coordination, and use a more cordial closing."
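
One way the Evaluator's checklist prompt might be assembled is sketched below. The checklist items come from the example above; everything else is an assumption made for illustration, including the helper names `build_evaluator_prompt` and `parse_verdict` and the reply convention (`APPROVED` or `REVISE: ...`) that the prompt asks the model to follow.

```python
from typing import List, Tuple

CHECKLIST = [
    "Is the tone courteous and professional?",
    "Does it apologize for the inconvenience?",
    "Does it actively propose solutions (specific dates/times)?",
    "Is the closing formal?",
]


def build_evaluator_prompt(draft: str, checklist: List[str]) -> str:
    """Turn the checklist into a numbered prompt for the Evaluator LLM."""
    numbered = "\n".join(
        f"{i}. {item}" for i, item in enumerate(checklist, start=1)
    )
    return (
        "You are a communications editor. Review the email draft below "
        "against this checklist:\n"
        f"{numbered}\n\n"
        # Hypothetical reply format, chosen so the verdict is easy to parse.
        "If every item passes, reply with the single word APPROVED.\n"
        "Otherwise reply with REVISE: followed by specific feedback.\n\n"
        f"Draft:\n{draft}"
    )


def parse_verdict(reply: str) -> Tuple[bool, str]:
    """Return (approved, feedback) from the Evaluator's reply."""
    text = reply.strip()
    upper = text.upper()
    if upper.startswith("APPROVED"):
        return True, ""
    if upper.startswith("REVISE:"):
        return False, text[len("REVISE:"):].strip()
    # Unrecognized format: treat as not approved and pass the text on as feedback.
    return False, text


prompt = build_evaluator_prompt("Hi [Client], I won't be able to make it.", CHECKLIST)
print(prompt)
```

Asking for a fixed leading keyword is a design choice of this sketch: it lets the loop decide whether to stop without having to interpret free-form praise such as "Excellent."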

Iteration #2

Optimizer LLM (receiving the feedback):
- Receives a new instruction: "Improve this email draft [Output v1] based on the following feedback: [Generated Feedback]."
- Output v2 (polished and professional):
  
  "Dear [Client's Name],
  
  I hope this email finds you well.
  
  I am writing to ask whether it would be possible to reschedule next week's meeting, due to an unavoidable last-minute matter.
  
  I would like to propose moving it to two weeks from now. I have availability on Tuesday, October 28th at 10:00 AM or Thursday, October 30th at 2:00 PM. Please let me know if either of those times works for you or if you prefer another time.
  
  I sincerely apologize for any inconvenience this may cause.
  
  Kind regards, Edgardo"
Evaluator LLM (reviewing again):
- Receives "Output v2" and runs it through its checklist again.
- Final Feedback: "Excellent. This draft meets all criteria for professionalism, courtesy, and clarity. Approved."
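
The refinement instruction the Optimizer receives in Iteration #2 can be built from a simple template. The sketch below is illustrative only: `REFINE_TEMPLATE` and `build_refinement_prompt` are hypothetical names, and the exact wording of the template is an assumption modeled on the instruction quoted above.

```python
# Template modeled on: "Improve this email draft [Output v1] based on the
# following feedback: [Generated Feedback]."
REFINE_TEMPLATE = (
    "Improve this email draft based on the following feedback.\n\n"
    "Draft:\n{draft}\n\n"
    "Feedback:\n{feedback}"
)


def build_refinement_prompt(draft: str, feedback: str) -> str:
    """Combine the previous draft and the Evaluator's feedback into one prompt."""
    return REFINE_TEMPLATE.format(draft=draft, feedback=feedback)


output_v1 = (
    "Hi [Client], I won't be able to make it next week. "
    "Should we move the meeting to two weeks from now? Regards, Edgardo."
)
generated_feedback = (
    "The draft is too informal and abrupt. You should add a professional "
    "greeting, offer a brief apology, propose specific times to facilitate "
    "coordination, and use a more cordial closing."
)

refinement_prompt = build_refinement_prompt(output_v1, generated_feedback)
print(refinement_prompt)
```

Sending the previous draft together with the feedback means the Optimizer revises existing text instead of starting over, which mirrors the writer-and-editor cycle described earlier.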

Final Result: The tool gives the user not the first draft but the final, polished version that the Evaluator approved, so the email sent to the client has passed every item on the checklist.