Fine-tuning GPT-4o-mini for Spam Detection
November 29, 2024
TLDR: We fine-tuned a small LLM on a few dozen spam tags and it's working well.
Disclaimer on naming: we called it ros-spam, which is hopefully more scalable than OpenAI's gpt-3-3.5-turbo-4-4o-06-23-2024-11-20 or Anthropic's sonnet-3-3.5-3.5-but-better. Hindsight is 20/20.
What We Built
We created a purpose-built spam detection model specifically for our schema:
type Contact = {
  company: string;
  email: string;
  message: string;
  // ...
}
While it's currently tailored to our needs, the approach could be generalized for broader email spam detection.
The Problem
We get lots of inbound messages. Most are spam. Our workflow for triaging them was simple but inefficient:
Someone submits a message on rubriclabs.com/contact
We get a Slack notification
Someone on our team manually flags it as spam or legitimate
Here's the kicker: even at just 30 seconds per day, this adds up to about three hours per year (not to mention the lingering cost of context-switching), making it worthwhile to automate in a post-Cursor world.
The Solution: Fine-tuning
The data from our spam flags was simply stored in Postgres, creating what would become our training dataset:
{ "message": "We sell the best leather couches", "status": "👎" }
// ...
{ "message": "Looking to build an agentic flight booking system", "status": "👍" }
With hundreds of upvotes/downvotes on hand, deduped and cleaned (a 10-minute job, given the simple schema), fine-tuning on OpenAI was straightforward.
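A minimal sketch of what that dedupe-and-clean pass can look like. The `FlaggedRow` shape and its `isSpam` boolean are assumptions made for illustration (the stored rows above only show `message` and `status`), and the normalization rules are a guess at reasonable cleaning, not the exact steps we ran.

```typescript
// Hypothetical row shape: the real table columns are not shown in this post.
type FlaggedRow = { message: string; isSpam: boolean };

// Collapse runs of whitespace and trim so near-identical submissions compare equal.
function normalize(message: string): string {
  return message.replace(/\s+/g, " ").trim();
}

// Drop empty messages and keep the first flag seen for each distinct message
// (compared case-insensitively).
function dedupeAndClean(rows: FlaggedRow[]): FlaggedRow[] {
  const seen = new Set<string>();
  const cleaned: FlaggedRow[] = [];
  for (const row of rows) {
    const message = normalize(row.message);
    const key = message.toLowerCase();
    if (message === "" || seen.has(key)) continue;
    seen.add(key);
    cleaned.push({ message, isSpam: row.isSpam });
  }
  return cleaned;
}
```

Keeping the first flag for a duplicated message is an arbitrary choice; if the same message was ever flagged both ways, it is probably worth reviewing by hand instead.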
The Technical Details
The fine-tuning schema follows a standard chat message format:
type Message = {
  role: "user" | "assistant" | "system";
  content: string;
}
The actual examples are stored as JSONL, a file format where each line is valid JSON.
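To make the format concrete, here is a sketch of turning labeled messages into JSONL. OpenAI's chat fine-tuning format wraps each example in a `messages` array; the system prompt wording and the `spam` / `not_spam` labels below are assumptions chosen for illustration, not the exact strings we trained on.

```typescript
type Message = { role: "user" | "assistant" | "system"; content: string };
type TrainingExample = { messages: Message[] };

// Assumed prompt and labels, for illustration only.
const SYSTEM_PROMPT = "Classify the contact form message as spam or not_spam.";

// One training example: the prompt, the inbound message, and the desired answer.
function toExample(message: string, isSpam: boolean): TrainingExample {
  return {
    messages: [
      { role: "system", content: SYSTEM_PROMPT },
      { role: "user", content: message },
      { role: "assistant", content: isSpam ? "spam" : "not_spam" },
    ],
  };
}

// JSONL: one JSON document per line. JSON.stringify escapes any newlines
// inside a message, so each example stays on a single line.
function toJsonl(examples: TrainingExample[]): string {
  return examples.map((example) => JSON.stringify(example)).join("\n");
}
```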
The process itself was refreshingly simple:
Write our array of examples to a .jsonl file
Upload the file
Wait ~10 minutes
Pay ~$1
Profit?
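The upload and training steps can be sketched against OpenAI's REST API with `fetch`. This assumes Node 18+ (for the global `fetch`, `FormData`, and `Blob`), an API key passed in by the caller, and `gpt-4o-mini-2024-07-18` as the base snapshot; the `train.jsonl` file name is arbitrary. Treat it as an outline, not the exact script we ran.

```typescript
const OPENAI_BASE = "https://api.openai.com/v1";

// Request body for POST /fine_tuning/jobs. The base model is an assumed snapshot.
function buildJobRequest(trainingFileId: string, baseModel = "gpt-4o-mini-2024-07-18") {
  return { training_file: trainingFileId, model: baseModel };
}

// Upload the JSONL content with purpose "fine-tune"; resolves to the file id.
async function uploadTrainingFile(apiKey: string, jsonl: string): Promise<string> {
  const form = new FormData();
  form.append("purpose", "fine-tune");
  form.append("file", new Blob([jsonl], { type: "application/jsonl" }), "train.jsonl");
  const res = await fetch(`${OPENAI_BASE}/files`, {
    method: "POST",
    headers: { Authorization: `Bearer ${apiKey}` },
    body: form,
  });
  if (!res.ok) throw new Error(`Upload failed: ${res.status}`);
  const file = (await res.json()) as { id: string };
  return file.id;
}

// Start the fine-tuning job; resolves to the job id, which can be polled
// or watched in the OpenAI dashboard until training finishes.
async function createFineTuneJob(apiKey: string, trainingFileId: string): Promise<string> {
  const res = await fetch(`${OPENAI_BASE}/fine_tuning/jobs`, {
    method: "POST",
    headers: { Authorization: `Bearer ${apiKey}`, "Content-Type": "application/json" },
    body: JSON.stringify(buildJobRequest(trainingFileId)),
  });
  if (!res.ok) throw new Error(`Job creation failed: ${res.status}`);
  const job = (await res.json()) as { id: string };
  return job.id;
}
```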
Does It Work?
Quantitative Evaluation
We did a head-to-head comparison between GPT-4o and ros-spam:
We held back 10% of our dataset for testing
We ran comparisons in both OpenAI playground and OpenPipe evals
The result: on the held-out test set, ros-spam achieved 100% accuracy vs ~80% for a frontier model, even with prompt engineering.
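For reference, the hold-out split and the accuracy number can be computed along these lines. The `Labeled` shape is hypothetical, the predictions are assumed to have been collected from whichever model is being evaluated, and the split is a plain slice of an already-shuffled array, not necessarily the exact procedure we used.

```typescript
// Hypothetical shape for a human-labeled message.
type Labeled = { message: string; isSpam: boolean };

// Hold back the last `testFraction` of the examples for testing.
// Assumes `examples` has already been shuffled.
function holdOut(examples: Labeled[], testFraction = 0.1) {
  const testSize = Math.max(1, Math.round(examples.length * testFraction));
  const cut = examples.length - testSize;
  return { train: examples.slice(0, cut), test: examples.slice(cut) };
}

// Fraction of test examples where the model's verdict matches the human flag.
// `predictions[i]` is the model's spam verdict for `test[i]`.
function accuracy(test: Labeled[], predictions: boolean[]): number {
  if (test.length === 0) return 0;
  let correct = 0;
  test.forEach((example, i) => {
    if (predictions[i] === example.isSpam) correct++;
  });
  return correct / test.length;
}
```

With a 10% hold-out of a few hundred examples, the test set is only a few dozen messages, so each misclassification moves the accuracy figure by several percentage points.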
Qualitative Assessment
We shipped it to prod with a feedback loop:
Each run appears with the message in Slack as 👍/👎
We can immediately spot and correct errors
When needed, we re-tune the model
Deployment
The implementation was surprisingly painless. Accessing the model requires just a single-line change from standard GPT-4o calls, whether you're using:
the Node.js fetch API
or any other standard method.
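In practice the single-line change is the `model` field. The sketch below uses `fetch` against the Chat Completions endpoint; the fine-tuned model id is a placeholder in OpenAI's `ft:` naming format, and the system prompt and `spam` label are assumptions that would need to match whatever was used during training.

```typescript
type Contact = { company: string; email: string; message: string };

// Placeholder id: substitute the model name returned by your fine-tuning job.
const ROS_SPAM_MODEL = "ft:gpt-4o-mini-2024-07-18:your-org:ros-spam:abc123";

// Build the Chat Completions request body for one inbound contact.
function buildChatRequest(contact: Contact, model = ROS_SPAM_MODEL) {
  return {
    model, // the only field that differs from a stock GPT-4o call
    messages: [
      { role: "system", content: "Classify the contact form message as spam or not_spam." },
      { role: "user", content: contact.message },
    ],
  };
}

// Ask the model for a verdict; resolves to true when it answers "spam".
async function classifyContact(apiKey: string, contact: Contact): Promise<boolean> {
  const res = await fetch("https://api.openai.com/v1/chat/completions", {
    method: "POST",
    headers: { Authorization: `Bearer ${apiKey}`, "Content-Type": "application/json" },
    body: JSON.stringify(buildChatRequest(contact)),
  });
  if (!res.ok) throw new Error(`Request failed: ${res.status}`);
  const data = (await res.json()) as { choices: { message: { content: string } }[] };
  return data.choices[0]?.message.content.trim() === "spam";
}
```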
For those interested in alternatives, you could also host this on:
Fireworks
Together
OpenPipe
or self-host on bare metal.
Conclusion
The ROI of this exercise was clear: human-level spam tagging running 24/7 for a couple of hours of dev work.
Have questions or feedback? Drop us a message at hello@rubriclabs.com.