Fine-tuning GPT-4o-mini for Spam Detection
What We Built
We fine-tuned a small LLM (GPT-4o-mini) on our historical spam tags to triage inbound messages. It's working remarkably well.
While it's currently tailored to our needs, the approach could be generalized for broader email spam detection. Our schema is roughly as follows:
type Contact = {
  company: string;
  email: string;
  message: string;
  // ...
}
The Problem
We get lots of inbound messages. Most are spam. Our workflow for triaging them was simple but inefficient:
- Someone submits a message on rubriclabs.com/contact
- We get a Slack notification
- Someone on our team manually flags it as spam or legitimate
Here's the kicker: even at just 30 seconds per day, that works out to roughly three hours per year of pure triage (not to mention the lingering cost of context-switching), which makes it well worth automating in a post-Cursor world.
The Solution: Fine-tuning
The data from our spam flags was simply stored in Postgres, creating what would become our training dataset:
{ "message": "We sell the best leather couches", "status" : "spam" }
// ...
{ "message": "Looking to build an agentic flight booking system", "status": "legit" }
With a few hundred of these upvotes/downvotes deduped and cleaned (a 10-minute job, thanks to the simple schema), fine-tuning on OpenAI was straightforward.
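As a rough sketch of that export step (assuming a hypothetical contact_flags table with message and status columns), the dedupe looks something like this:

import { Client } from "pg";

const client = new Client({ connectionString: process.env.DATABASE_URL });
await client.connect();

// Pull every flagged message
const { rows } = await client.query<{ message: string; status: "spam" | "legit" }>(
  "SELECT message, status FROM contact_flags"
);

// Dedupe on message text and drop empty entries
const seen = new Set<string>();
const examples = rows.filter(({ message }) => {
  const key = message.trim().toLowerCase();
  if (!key || seen.has(key)) return false;
  seen.add(key);
  return true;
});

await client.end();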
The Technical Details
The fine-tuning schema follows a standard chat message format:
type Message = {
  role: "user" | "assistant" | "system";
  content: string;
}
The actual examples are stored as JSONL, a file format where each line is valid JSON.
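Concretely, each example becomes one line in the chat format — something like this (the system prompt wording here is illustrative, not our exact prompt):

{"messages": [{"role": "system", "content": "Classify the message as spam or legit."}, {"role": "user", "content": "We sell the best leather couches"}, {"role": "assistant", "content": "spam"}]}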
The process itself was refreshingly simple:
- Write our array of examples to a .jsonl file
- Upload the file
- Wait ~10 minutes
- Pay ~$10
- ...
- Profit?
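For the curious, here's roughly what that looks like with the OpenAI Node SDK (the system prompt and base model snapshot are illustrative placeholders):

import fs from "node:fs";
import OpenAI from "openai";

const openai = new OpenAI();

// The deduped examples from Postgres
const examples: { message: string; status: "spam" | "legit" }[] = [
  { message: "We sell the best leather couches", status: "spam" },
  { message: "Looking to build an agentic flight booking system", status: "legit" }
  // ...
];

// 1. Write one chat-format example per line
fs.writeFileSync(
  "spam.jsonl",
  examples
    .map(({ message, status }) =>
      JSON.stringify({
        messages: [
          { role: "system", content: "Classify the message as spam or legit." },
          { role: "user", content: message },
          { role: "assistant", content: status }
        ]
      })
    )
    .join("\n")
);

// 2. Upload the file
const file = await openai.files.create({
  file: fs.createReadStream("spam.jsonl"),
  purpose: "fine-tune"
});

// 3. Kick off the fine-tuning job
await openai.fineTuning.jobs.create({
  training_file: file.id,
  model: "gpt-4o-mini-2024-07-18" // example base snapshot
});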
Does It Work?
Quantitative Evaluation
We did a head-to-head comparison between GPT-4o and ros-spam:
- We held back 10% of our dataset for testing
- We ran comparisons in both the OpenAI playground and OpenPipe evals
The result: ros-spam achieved 100% accuracy vs ~80% for the frontier model, even with prompt engineering.
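If you want to run the same check yourself, a minimal eval just loops over the held-out split and counts exact matches (the fine-tuned model ID below is a placeholder):

import OpenAI from "openai";

const openai = new OpenAI();

type Example = { message: string; status: "spam" | "legit" };

// Score a model against the held-out 10%
async function accuracy(model: string, testSet: Example[]) {
  let correct = 0;
  for (const { message, status } of testSet) {
    const res = await openai.chat.completions.create({
      model,
      messages: [
        { role: "system", content: "Classify the message as spam or legit." },
        { role: "user", content: message }
      ]
    });
    if (res.choices[0].message.content?.trim() === status) correct++;
  }
  return correct / testSet.length;
}

// e.g. await accuracy("ft:gpt-4o-mini-2024-07-18:rubric::abc123", heldOutExamples)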
Qualitative Assessment
We shipped it to prod with a feedback loop:
- Each run posts the message to Slack, tagged as spam or legit
- We can immediately spot and correct errors
- When needed, we re-tune the model ♺
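The Slack side is just an incoming webhook — a minimal sketch, assuming the webhook URL lives in SLACK_WEBHOOK_URL:

// Post each classification to Slack so the team can spot mistakes at a glance
async function notifySlack(message: string, verdict: "spam" | "legit") {
  await fetch(process.env.SLACK_WEBHOOK_URL!, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      text: `${verdict === "spam" ? "🚫 spam" : "✅ legit"}: ${message}`
    })
  });
}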
Deployment
The implementation was surprisingly painless. Accessing the model requires just a single-line change from standard GPT-4o calls (see the sketch below), whether you're using:
- the Node.js fetch API,
- the OpenAI SDK,
- the AI SDK,
- @rubriclab/agents, or any other standard method.
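With the OpenAI SDK, for example, the only thing that changes is the model ID (the fine-tuned ID below is a placeholder):

import OpenAI from "openai";

const openai = new OpenAI();

const res = await openai.chat.completions.create({
  model: "ft:gpt-4o-mini-2024-07-18:rubric::abc123", // the single-line change
  messages: [{ role: "user", content: "We sell the best leather couches" }]
});

console.log(res.choices[0].message.content); // expected: "spam"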
For those interested in alternatives, you could also host this on:
- Fireworks
- Together
- OpenPipe
or self-host on bare metal.
Conclusion
The ROI of this exercise was clear: human-level spam tagging running 24/7, for a couple of hours of dev work.
Have questions or feedback? Drop us a message at rubriclabs.com/contact.