The Scoring Model Trap
“I spent six months building a sophisticated VC scoring engine. Then six months figuring out why it was making me worse at evaluating companies.”
I spent six months building the most sophisticated scoring model I could design.
Then I spent the next six months figuring out why it was making me worse at evaluating companies.
NUVC runs an 8-agent AI pipeline across 13 intelligence layers. I designed the whole thing from scratch — the architecture, the weighting logic, the signal hierarchy. The model was trained on 172 real VC deal memos from investments with known outcomes. It had more data about VC decision quality than most investors accumulate in a decade.
And I kept watching it get things wrong in ways that felt embarrassingly obvious from the outside.
Not because the model was bad. Because scoring models, at their core, are a category error.
Why Every Investor Builds One
The appeal is real.
Consistency. Apply the same criteria across every opportunity. No mood-based decisions, no availability heuristics.
Speed. Score more companies in less time, triage the pipeline, allocate attention where it matters.
Defensibility. Explain your decisions to LPs with something that looks like evidence.
Each of these answers a legitimate problem, and the model promises to solve all of them at once.
The problem is what it does to the quality of what you see.
What the Data Actually Showed
When I analysed the 172 deal memos against known outcomes — companies like Canva, Stripe, and SpaceX on the win side, Theranos and FTX on the loss side — the predictive relationships were not what the standard VC rubric would suggest.
Product execution quality predicted outcomes with an r² of 0.77. Team composition — the thing most scoring models weight most heavily — came in at 0.31.
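The underlying check is simple enough to sketch. Below is a minimal illustration of the kind of analysis described here — squaring the correlation between a per-memo factor score and the realised outcome — not NUVC's actual pipeline; the factor scores and outcomes are made up for the example.

```python
# Minimal sketch of the comparison described above: r² between a factor's
# per-memo score and the realised outcome. All data here is illustrative.
import numpy as np

def r_squared(factor_scores, outcomes):
    """Square of the Pearson correlation between a factor score and outcomes."""
    r = np.corrcoef(factor_scores, outcomes)[0, 1]
    return r ** 2

# Hypothetical memo-level scores (0-10) and outcomes (1 = win, 0 = loss).
product_execution = np.array([9, 8, 9, 3, 4, 7, 2, 8])
team_composition  = np.array([8, 7, 6, 7, 8, 9, 6, 6])
outcomes          = np.array([1, 1, 1, 0, 0, 1, 0, 1])

print(f"product execution r²: {r_squared(product_execution, outcomes):.2f}")
print(f"team composition  r²: {r_squared(team_composition, outcomes):.2f}")
```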
That gap should stop every investor who uses a scoring model.
We're weighting team because team is legible. You can score it: pedigree, track record, domain expertise, references. You can put it in a spreadsheet and defend it to an IC.
Product execution quality is harder. It requires actually using the product. Reading the technical architecture. Talking to early users not on the reference list. Sitting with the gap between what a founder says the product does and what it actually does.
That work is uncomfortable and time-consuming. So we score team and call it rigorous.
What Gets Measured Gets Gamed
As soon as founders understand your rubric, they optimise for it.
This doesn't mean they're building better companies. It means they're getting better at scoring well.
The pitch improves. The narrative hits all the right triggers. The team slides get more impressive. The market size calculation is done exactly the way you like.
And somewhere underneath all of that, the product is still mediocre — because product execution quality isn't on the rubric, or it's buried under team and market and traction.
The model creates a feedback loop that validates itself. You see more companies that look like your model's preferences, not because they're better companies, but because they've learned to present themselves that way.
The Precision Illusion
A company scores 8.2 out of 10.
What does that mean?
It means you've taken subjective judgments — is this market big enough, is this team strong enough, is this product defensible — and converted them into numbers that look precise while remaining fundamentally subjective.
The number doesn't sharpen your judgment. It hides it.
When a great opportunity scores 7.8 (below your threshold of 8.0), you pass — not because you disagree with the investment, but because the model says so. The model becomes the investor, and you become its administrator.
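The mechanics are worth seeing in miniature. This is a hedged sketch of what a weighted rubric with a cutoff actually does — the weights, fields, and threshold here are hypothetical, not NUVC's — but it shows how four subjective judgments become a number that looks decisive.

```python
# Sketch of the mechanics described above: subjective 1-10 judgments,
# arbitrary weights, and a hard cutoff. Weights, fields, and threshold are
# hypothetical; the point is that 7.8 vs 8.0 is a weighting artefact,
# not a sharper judgment.
WEIGHTS = {"team": 0.35, "market": 0.25, "traction": 0.25, "product": 0.15}
THRESHOLD = 8.0

def composite_score(judgments: dict[str, float]) -> float:
    return round(sum(WEIGHTS[k] * judgments[k] for k in WEIGHTS), 1)

company = {"team": 9, "market": 8, "traction": 7, "product": 6}  # all subjective
score = composite_score(company)
decision = "advance" if score >= THRESHOLD else "pass"
print(score, decision)  # 7.8 pass -- the threshold decided, not the investor
```

Note what the weights encode: product execution, the factor with the strongest relationship to outcomes, gets the smallest say, because it's the hardest to score.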
The best investments I've studied almost never would have scored well at the time of the initial check. They were doing something too early, too weird, too different from whatever recently worked. That's exactly what makes them interesting.
Scoring models are calibrated on the past. The best investments point toward futures the model hasn't seen.
What Models Cannot See
The factors that most reliably predict investment outcomes resist quantification. Not because we lack the right metrics — because they're fundamentally about human judgment in context.
Conviction. Is this person building this because they have to, or because venture looked like a good career move? You can't score this. You feel it through conversation, through the gaps in their answers, through what they bring up without being asked.
The NUVC research found conviction was the single most underweighted signal in VC decision-making. Investors report caring about it but don't consistently operationalise it in their process. The scoring models they use don't have a conviction field — or if they do, it maps to proxies (years spent on the problem, previous pivots) that miss the point entirely.
Narrative coherence under pressure. Does their story hold together when you introduce information that challenges it? This requires a conversation, not a rubric. The "conviction test" — asking questions they haven't prepared for, introducing contrary evidence — tells you more than their deck does.
Strategic clarity. Can they articulate the two or three things that matter right now and explain why everything else doesn't? This is a signal of focus and cognitive quality, not just intelligence. You find it by watching what they choose not to talk about.
The Right Role for Models
This isn't an argument against structure. It's an argument against letting structure replace judgment.
Models are useful for screening — for prioritising which opportunities deserve deep attention. They're useful for known domains with well-understood success factors. They're useful as input to a conversation, not as output that ends one.
At NUVC, the system doesn't make investment decisions. It surfaces signal, flags inconsistencies, and identifies where human attention is most needed. The judgment happens downstream. The model's job is to make that judgment better-informed, not to replace it.
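One way to make that design choice concrete: the screening layer's output can carry signal, inconsistencies, and attention pointers and deliberately omit any decision field. The sketch below is illustrative only — the names are hypothetical, not NUVC's actual schema.

```python
# Illustrative sketch of the "screen, don't decide" shape described above.
# Class and field names are hypothetical: the output ends in questions for
# the investor, not an answer from the model.
from dataclasses import dataclass, field

@dataclass
class ScreeningResult:
    company: str
    surfaced_signals: list[str] = field(default_factory=list)   # what the model noticed
    inconsistencies: list[str] = field(default_factory=list)    # claims that don't line up
    attention_items: list[str] = field(default_factory=list)    # where human judgment goes next
    # Deliberately no `decision` or `score_vs_threshold` field.

result = ScreeningResult(
    company="ExampleCo",
    surfaced_signals=["weekly active usage growing faster than sign-ups"],
    inconsistencies=["deck claims 40 enterprise customers; billing shows 12"],
    attention_items=["use the product; talk to two users not on the reference list"],
)
```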
Use models for screening, not deciding.
Use quantitative analysis as the beginning of a question, not the end of one.
When a model tells you to pass on something that feels right, treat that as a prompt to investigate harder — not as permission to stop thinking.
The Real Discipline
The discipline isn't building a better scoring model. It's building a decision process that amplifies judgment.
That means asking different questions. Not "does this score well?" but "what needs to be true about this company for it to succeed — and do I believe those things?" Not "is this team credentialed?" but "is this founder building something they'd build regardless of whether I invested?"
The goal is clarity, not precision. Those are different things.
Precision is an 8.2. Clarity is knowing what you actually believe about a company and why.
The best investors I've observed don't have better models. They have better questions.
Related: What 172 Deal Memos Taught Me About Pattern Recognition — the full empirical findings from NUVC's research. Reading Founder Conviction — on the signal most scoring models miss entirely.
Tick Jiang is the technical co-founder of NUVC (nuvc.ai), an AI-native venture capital intelligence platform built in Melbourne. She writes on capital, decision quality, and building across the Asia-Pacific.