Guide

Fine-tuning GPT-4o-mini for Spam Detection

Ted Spare

November 29, 2024

TL;DR: We fine-tuned a small LLM on a few hundred spam tags and it's working well.

Disclaimer on naming: we called it ros-spam, which is hopefully more scalable than OpenAI's gpt-3-3.5-turbo-4-4o-06-23-2024-11-20 or Anthropic's sonnet-3-3.5-3.5-but-better. Hindsight is 20/20.

What We Built

We created a purpose-built spam detection model specifically for our schema:

type Contact = {
  company: string;
  email: string;
  message: string;
  // ...
}

While it's currently tailored to our needs, the approach could be generalized for broader email spam detection.

The Problem

We get lots of inbound messages. Most are spam. Our workflow for triaging them was simple but inefficient:

  1. Someone submits a message on rubriclabs.com/contact

  2. We get a Slack notification

  3. Someone on our team manually flags it as spam or legitimate

Here's the kicker: even at just 30 seconds per day, this adds up to hours per year (not to mention the lingering cost of context-switching), making it worthwhile to automate in a post-Cursor world.

The Solution: Fine-tuning

The data from our spam flags was simply stored in Postgres, creating what would become our training dataset:

{ "message": "We sell the best leather couches", "status" : "πŸ‘Ž" }
// ...
{ "message": "Looking to build an agentic flight booking system", "status": "πŸ‘" }

With hundreds of upvotes and downvotes, deduplicated and cleaned (a 10-minute job, given the simple schema), fine-tuning on OpenAI was straightforward.
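The cleaning step was nothing fancy; deduplicating by message text covers most of it. A minimal sketch, assuming rows come out of Postgres oldest-first (the Row type and sample data here are illustrative, not our actual schema):

type Row = { message: string; status: "πŸ‘" | "πŸ‘Ž" };

// Illustrative rows, as if fetched from Postgres (oldest first)
const rows: Row[] = [
  { message: "We sell the best leather couches", status: "πŸ‘Ž" },
  { message: "We sell the best leather couches", status: "πŸ‘Ž" }, // duplicate submission
  { message: "Looking to build an agentic flight booking system", status: "πŸ‘" },
];

// Keep one example per unique message; later (newer) flags win
const deduped = [...new Map(rows.map((r) => [r.message, r])).values()];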

The Technical Details

The fine-tuning schema follows a standard chat message format:

type Message = {
  role: "user" | "assistant" | "system";
  content: string;
}

The actual examples are stored as JSONL, a file format where each line is valid JSON.
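In OpenAI's fine-tuning format, each line wraps a full conversation in a messages array. A single training example from our dataset might look like this (the system prompt shown is illustrative, not our exact wording):

{"messages": [{"role": "system", "content": "Classify the contact-form message as spam (πŸ‘Ž) or legitimate (πŸ‘)."}, {"role": "user", "content": "We sell the best leather couches"}, {"role": "assistant", "content": "πŸ‘Ž"}]}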

The process itself was refreshingly simple (steps 1 and 2 are sketched in code after this list):

  1. Write our array of examples to a .jsonl file

  2. Upload the file

  3. Wait ~10 minutes

  4. Pay ~$1

  5. Profit?
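For steps 1 and 2, here's a minimal sketch with the OpenAI Node SDK. The file name and the examples variable are our own choices; the model snapshot is the fine-tunable gpt-4o-mini:

import fs from "node:fs";
import OpenAI from "openai";

const openai = new OpenAI();

// Chat-formatted training records (see the JSONL example above)
const examples = [
  {
    messages: [
      { role: "user", content: "We sell the best leather couches" },
      { role: "assistant", content: "πŸ‘Ž" },
    ],
  },
  // ...hundreds more
];

// Step 1: write one JSON object per line (JSONL)
fs.writeFileSync("spam.jsonl", examples.map((e) => JSON.stringify(e)).join("\n"));

// Step 2: upload the file, then kick off the fine-tuning job
const file = await openai.files.create({
  file: fs.createReadStream("spam.jsonl"),
  purpose: "fine-tune",
});

await openai.fineTuning.jobs.create({
  training_file: file.id,
  model: "gpt-4o-mini-2024-07-18",
});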

Does It Work?

Quantitative Evaluation

We did a head-to-head comparison between GPT-4o and ros-spam:

  • We held back 10% of our dataset for testing

  • We ran comparisons in both OpenAI playground and OpenPipe evals

The result: ros-spam achieved 100% accuracy vs ~80% for a frontier model, even with prompt engineering.
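For the curious, the holdout check amounts to a loop like this sketch (the ft: model ID is a made-up placeholder; our real comparison also ran through the OpenAI playground and OpenPipe):

import OpenAI from "openai";

const openai = new OpenAI();

// The held-back 10%: labeled examples never seen in training
const holdout = [
  { message: "We sell the best leather couches", status: "πŸ‘Ž" },
  { message: "Looking to build an agentic flight booking system", status: "πŸ‘" },
  // ...
];

let correct = 0;
for (const { message, status } of holdout) {
  const res = await openai.chat.completions.create({
    model: "ft:gpt-4o-mini-2024-07-18:rubric::abc123", // placeholder fine-tuned model ID
    messages: [{ role: "user", content: message }],
  });
  if (res.choices[0].message.content?.trim() === status) correct++;
}

console.log(`Accuracy: ${((100 * correct) / holdout.length).toFixed(1)}%`);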

Qualitative Assessment

We shipped it to prod with a feedback loop:

  • Each run appears with the message in Slack as πŸ‘/πŸ‘Ž

  • We can immediately spot and correct errors

  • When needed, we re-tune the model πŸ”ƒ

Deployment

The implementation was surprisingly painless. Accessing the model requires just a single-line change from standard GPT-4o calls, whether you're using the official SDK, raw HTTP, or any other standard method: swap the model name for your fine-tuned model's ID.
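As a sketch with the OpenAI Node SDK (again, the ft: model ID below is a placeholder), the single-line change is the model string:

import OpenAI from "openai";

const openai = new OpenAI();

const res = await openai.chat.completions.create({
  // Before: model: "gpt-4o"
  model: "ft:gpt-4o-mini-2024-07-18:rubric::abc123", // placeholder fine-tuned model ID
  messages: [{ role: "user", content: "Looking to build an agentic flight booking system" }],
});

console.log(res.choices[0].message.content); // e.g. "πŸ‘"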

For those interested in alternatives, you could also host this on:

  • Fireworks

  • Together

  • OpenPipe

or self-host on bare metal.

Conclusion

The ROI of this exercise was clear: human-level spam tagging running 24/7, in exchange for a couple of hours of dev work.

Have questions or feedback? Drop us a message at hello@rubriclabs.com.