What the assessment actually measures
Why three dimensions and not four. Why we score 0–100 instead of stars. Why the assessment ends at a report URL rather than a sales call. The methodology behind the 10-minute organizational AI literacy assessment.
The assessment is the centerpiece of the 174 site. Not as a marketing instrument — though it does also work as one — but as a deliberate test of whether a buyer should engage us at all. The shortest description: ten minutes, twelve questions, three dimensions, one leadership-shareable report.
This essay covers why each piece is what it is. The full rubric — every question, every option weight, every bucket threshold — lives in the assessment rubric resource. The point of this piece is to talk about the design decisions, not just describe the output.
Why three dimensions
When we drafted the assessment, we tested it at three, four, five, and seven dimensions. Five was a mess (the dimensions overlapped); seven was unreadable; four nearly survived but the fourth dimension was always either a duplicate of governance or a duplicate of capability. Three turned out to be the smallest set that captured what actually mattered for a rollout.
Adoption is the breadth signal. Are AI tools in the work, day-to-day, across the people who would benefit? You can’t have a literacy program for a workforce that isn’t using AI, and you can’t measure capability you don’t have access to.
Capability is the depth signal. Is the AI work being done any good? Documented prompts, structured evaluation, multi-step workflows that actually ship — versus people pasting questions into a general assistant and accepting the first answer.
Governance is the durability signal. Will the program survive scale? A written policy, a clear owner, quality controls before customer-facing output ships, executive sponsorship. Without this, programs reach 100 seats and unwind under the first incident.
The case for splitting governance into “policy” and “controls” is real — they are conceptually different. But mid-market buyers reliably think of them together, and conflating them costs us very little. Three dimensions, equally weighted, scored 0–100 each.
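To make “equally weighted, scored 0–100 each” concrete, here is a minimal sketch of how a dimension score could be computed: sum the weights of the options the buyer selected for that dimension and normalize against the strongest possible answers. The type names, field names, and normalization step are our assumptions for illustration; the real questions and weights live in the published rubric.

```typescript
// Hypothetical sketch only. Real question weights live in the published rubric;
// the names and the normalization step here are illustrative assumptions.

type Dimension = "adoption" | "capability" | "governance";

interface Answer {
  dimension: Dimension; // which of the three dimensions the question feeds
  weight: number;       // weight of the option the buyer actually selected
  maxWeight: number;    // weight of the strongest option for that question
}

function dimensionScore(answers: Answer[], dimension: Dimension): number {
  const relevant = answers.filter((a) => a.dimension === dimension);
  const earned = relevant.reduce((sum, a) => sum + a.weight, 0);
  const possible = relevant.reduce((sum, a) => sum + a.maxWeight, 0);
  // Normalize to 0-100 so every dimension reads on the same scale.
  return possible === 0 ? 0 : Math.round((earned / possible) * 100);
}
```

Equal weighting shows up as the absence of a multiplier: no dimension counts more than another in the report.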
Why 0–100 instead of stars
We tested a five-star rating early. It produced cleaner-looking reports. It also produced a number nobody trusted.
Mid-market L&D buyers need to forward the assessment report to their CHRO, who will probably forward it to their CEO. Each handoff is a chance for the report to be dismissed. A five-star rating invites the reader to argue with the math: “Why is this a 3 and not a 4? What’s the difference?” A 0–100 score invites the reader to interpret the math: “We scored 43 on Capability — what does that mean?”
The second conversation is more useful. A number out of 100 is also more legible to CFOs, who have spent careers reading percentage scores in similar formats. Clean, unsexy, defensible.
Why the assessment ends at a URL
Most “assessments” on AI vendor sites are lead capture in a costume. You answer some questions; you give your email; a salesperson calls you. The “report” is a thinly disguised list of features the vendor offers.
We made a different bet. The artifact is the value. If the report is genuinely useful — if a buyer can show it to their CHRO and say “this is where we are” — then the buyers who become customers are the ones who already understand they need a program. The buyers who don’t are never pressed by a salesperson; they keep their report and go on with their lives. We get fewer leads, of higher intent. The buyers get more value, sooner.
This is not a sales-funnel optimization argument. It’s a brand argument. The vendors who treat their lead-magnet assessment as an actual instrument become trusted; the vendors who use it as a manipulation tool become an annoyance the moment the buyer realizes what’s happening. We’d rather be in the first category.
Why honest answers matter
The single biggest failure mode of the assessment is a buyer who picks aspirational answers because the score feels low.
A 25 on Capability is a clear instruction: curriculum first, before any rollout work. A 50 fudged into existence is misleading. The recommendation engine — which maps (dimension, bucket) pairs to specific gap copy and a recommended next move — produces useful output only when the inputs are honest. A program designed against a flattering score is a program designed for a company that doesn’t exist.
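A sketch of what that mapping could look like, assuming a simple lookup keyed by dimension and bucket. The two entries shown are placeholder copy paraphrased from this essay, not the authored report text, and the key format and function name are ours.

```typescript
// Hypothetical sketch of the (dimension, bucket) lookup described above.
// The real report has authored gap copy and next moves for all nine pairs;
// the two entries here are placeholders that only show the shape.

type Dimension = "adoption" | "capability" | "governance";
type Bucket = "emerging" | "developing" | "mature";

interface Recommendation {
  gap: string;      // what the score says is missing
  nextMove: string; // the single recommended next step
}

const recommendations = new Map<string, Recommendation>([
  [
    "capability:emerging",
    {
      gap: "AI use is ad hoc: no documented prompts, no structured evaluation.",
      nextMove: "Curriculum first, before any rollout work.",
    },
  ],
  [
    "governance:developing",
    {
      gap: "A policy exists, but ownership and quality controls are inconsistent.",
      nextMove: "Name an owner and standardize review before customer-facing output ships.",
    },
  ],
  // ...the remaining seven pairs are authored the same way.
]);

function recommend(dimension: Dimension, bucket: Bucket): Recommendation | undefined {
  return recommendations.get(`${dimension}:${bucket}`);
}
```

The point of the sketch is the dependency chain: a fudged score changes the bucket, the bucket changes the lookup, and the lookup changes the program you end up running.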
The instructions inside the assessment say “honest answers beat aspirational” twice for this reason. We considered building in cross-checks (do your A1 and A3 answers contradict?), and we may eventually add them. For now we’ve decided to trust the buyer to self-report honestly, on the grounds that the buyers who fudge their scores aren’t going to take 174’s recommendations seriously anyway.
Why three buckets
Inside each dimension, the 0–100 score collapses into one of three buckets:
- Emerging (below 40): the program for that dimension is not yet running.
- Developing (40–69): the program is running but inconsistent.
- Mature (70+): the program is durable and ready to scale or deepen.
The thresholds aren’t tuned to a normal distribution — they’re tuned to the rollout decisions a buyer needs to make. We don’t care about how a customer compares to other customers; we care about what the customer should do next. Three buckets give us three meaningfully different recommendations. More buckets would give us more cases to author copy for, with no clearer guidance.
A consequence of this design: the rubric is unforgiving at the edges. A 39 reads “Emerging” and a 41 reads “Developing.” We’ve debated softening this — adding pluses and minuses, or a four-bucket scheme. The case against is that the buckets are decision instruments, not ego instruments. A 39 is a clear instruction to start the program; a 41 is a clear instruction to standardize what’s already happening. That’s the work. The two-point gap doesn’t change which work to do.
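The thresholds reduce to a few lines, which is also where that hard edge is easiest to see. A minimal sketch: the cutoffs are the published ones, the function name is ours.

```typescript
// Bucket thresholds as published: below 40 is Emerging, 40-69 is Developing,
// 70 and above is Mature. The function name is ours; the cutoffs are not.

type Bucket = "emerging" | "developing" | "mature";

function bucketFor(score: number): Bucket {
  if (score < 40) return "emerging";   // a 39 lands here
  if (score < 70) return "developing"; // a 41 lands here
  return "mature";
}
```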
Why we publish the rubric
The full assessment rubric lists every question, every option, every score weight, and the bucket thresholds. We publish it on purpose, even though publishing it makes the assessment more “gameable” by buyers who want to engineer a particular score.
Three reasons. First, a methodology that’s been argued in public is more credible than one that’s been protected in private. Second, the buyers who would game their score to look better are not buyers we want — they’re buying a program for their own appearances rather than for their company’s actual capability lift. Third, the assessment isn’t really about the score in isolation; it’s about the recommendation that follows from it. Gaming the score gets you the wrong recommendation, which gets you the wrong program, which costs you more than the embarrassment of a low score would have.
Methodology in the open is also a brand position. We’re a studio for applied AI education in a category dominated by demos and vendor decks. We win by showing our work.
What happens after the report
Three things, in roughly equal measure:
- The buyer keeps the URL and uses it internally to greenlight a pilot. Some of these become customers later; some don’t. Either way, the assessment did the work it was supposed to do.
- The buyer reaches out for a Concierge conversation. These are the highest-intent leads we get. They’ve taken the assessment, they’ve read the report, they know roughly what they want to do, and they want a human to help them shape the rollout.
- The buyer self-onboards onto the Self-paced tier and starts a 1-seat pilot the same day. We don’t talk to most of these buyers until they’re ready to expand, and that’s fine.
If you haven’t taken the assessment yet, it’s here. Ten minutes. The same instrument we’d run with any mid-market buyer.
Where does your org actually stand?
Ten minutes. Three dimensions. A leadership-shareable baseline.