Fine-tuning GPT-4o-mini for Spam Detection
What We Built
We fine-tuned a small LLM (GPT-4o-mini) on our historical spam tags to triage inbound messages. It's working remarkably well.
While it's currently tailored to our needs, the approach could be generalized for broader email spam detection. Our schema is roughly as follows:
type Contact = {
  company: string;
  email: string;
  message: string;
  // ...
}
The Problem
We get lots of inbound messages. Most are spam. Our workflow for triaging them was simple but inefficient:
- Someone submits a message on rubriclabs.com/contact
- We get a Slack notification
- Someone on our team manually flags it as spam or legitimate
Here's the kicker: even at just 30 seconds per day, that works out to roughly three hours per year of pure triage (not to mention the lingering cost of context-switching), which makes it well worth automating in a post-Cursor world.
The Solution: Fine-tuning
The data from our spam flags was simply stored in Postgres, creating what would become our training dataset:
{ "message": "We sell the best leather couches", "status" : "spam" }
// ...
{ "message": "Looking to build an agentic flight booking system", "status": "legit" }
With a few hundred of these upvotes/downvotes deduped and cleaned (a 10-minute job, thanks to the simple schema), fine-tuning on OpenAI was straightforward.
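As a rough sketch of that export step (assuming a hypothetical contact_flags table with message and status columns), the dedupe looks something like this:

import { Client } from "pg";

const client = new Client({ connectionString: process.env.DATABASE_URL });
await client.connect();

// Pull every flagged message
const { rows } = await client.query<{ message: string; status: "spam" | "legit" }>(
  "SELECT message, status FROM contact_flags"
);

// Dedupe on message text and drop empty entries
const seen = new Set<string>();
const examples = rows.filter(({ message }) => {
  const key = message.trim().toLowerCase();
  if (!key || seen.has(key)) return false;
  seen.add(key);
  return true;
});

await client.end();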
The Technical Details
The fine-tuning schema follows a standard chat message format:
type Message = {
  role: "user" | "assistant" | "system";
  content: string;
}
The actual examples are stored as JSONL, a file format where each line is valid JSON.
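Concretely, each example becomes one line in the chat format — something like this (the system prompt wording here is illustrative, not our exact prompt):

{"messages": [{"role": "system", "content": "Classify the message as spam or legit."}, {"role": "user", "content": "We sell the best leather couches"}, {"role": "assistant", "content": "spam"}]}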
The process itself was refreshingly simple:
- Write our array of examples to a .jsonl file
- Upload the file
- Wait ~10 minutes
- Pay ~$10
- ...
- Profit?
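For the curious, here's roughly what that looks like with the OpenAI Node SDK (the system prompt and base model snapshot are illustrative placeholders):

import fs from "node:fs";
import OpenAI from "openai";

const openai = new OpenAI();

// The deduped examples from Postgres
const examples: { message: string; status: "spam" | "legit" }[] = [
  { message: "We sell the best leather couches", status: "spam" },
  { message: "Looking to build an agentic flight booking system", status: "legit" }
  // ...
];

// 1. Write one chat-format example per line
fs.writeFileSync(
  "spam.jsonl",
  examples
    .map(({ message, status }) =>
      JSON.stringify({
        messages: [
          { role: "system", content: "Classify the message as spam or legit." },
          { role: "user", content: message },
          { role: "assistant", content: status }
        ]
      })
    )
    .join("\n")
);

// 2. Upload the file
const file = await openai.files.create({
  file: fs.createReadStream("spam.jsonl"),
  purpose: "fine-tune"
});

// 3. Kick off the fine-tuning job
await openai.fineTuning.jobs.create({
  training_file: file.id,
  model: "gpt-4o-mini-2024-07-18" // example base snapshot
});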
Does It Work?
Quantitative Evaluation
We did a head-to-head comparison between GPT-4o and ros-spam:
- We held back 10% of our dataset for testing
- We ran comparisons in both the OpenAI playground and OpenPipe evals
The result: ros-spam achieved 100% accuracy vs ~80% for the frontier model, even with prompt engineering.
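If you want to run the same check yourself, a minimal eval just loops over the held-out split and counts exact matches (the fine-tuned model ID below is a placeholder):

import OpenAI from "openai";

const openai = new OpenAI();

type Example = { message: string; status: "spam" | "legit" };

// Score a model against the held-out 10%
async function accuracy(model: string, testSet: Example[]) {
  let correct = 0;
  for (const { message, status } of testSet) {
    const res = await openai.chat.completions.create({
      model,
      messages: [
        { role: "system", content: "Classify the message as spam or legit." },
        { role: "user", content: message }
      ]
    });
    if (res.choices[0].message.content?.trim() === status) correct++;
  }
  return correct / testSet.length;
}

// e.g. await accuracy("ft:gpt-4o-mini-2024-07-18:rubric::abc123", heldOutExamples)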
Qualitative Assessment
We shipped it to prod with a feedback loop:
- Each run posts the message to Slack, tagged as spam or legit
- We can immediately spot and correct errors
- When needed, we re-tune the model ♺
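The Slack side is just an incoming webhook — a minimal sketch, assuming the webhook URL lives in SLACK_WEBHOOK_URL:

// Post each classification to Slack so the team can spot mistakes at a glance
async function notifySlack(message: string, verdict: "spam" | "legit") {
  await fetch(process.env.SLACK_WEBHOOK_URL!, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      text: `${verdict === "spam" ? "🚫 spam" : "✅ legit"}: ${message}`
    })
  });
}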
Deployment
The implementation was surprisingly painless. Accessing the model requires just a single-line change from standard GPT-4o calls (see the sketch below), whether you're using:
- the Node.js fetch API,
- the OpenAI SDK,
- the AI SDK,
- @rubriclab/agents, or any other standard method.
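With the OpenAI SDK, for example, the only thing that changes is the model ID (the fine-tuned ID below is a placeholder):

import OpenAI from "openai";

const openai = new OpenAI();

const res = await openai.chat.completions.create({
  model: "ft:gpt-4o-mini-2024-07-18:rubric::abc123", // the single-line change
  messages: [{ role: "user", content: "We sell the best leather couches" }]
});

console.log(res.choices[0].message.content); // expected: "spam"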
For those interested in alternatives, you could also host this on:
- Fireworks
- Together
- OpenPipe
or self-host on bare metal.
Conclusion
The ROI of this exercise was clear: human-level spam tagging running 24/7, for a couple of hours of dev work.
Have questions or feedback? Drop us a message at rubriclabs.com/contact.