Inside 174: we took our own assessment
A studio building an AI literacy program should be able to take its own assessment and publish the results. This is that — real numbers, real gaps, the rollout shape our own report points us toward.
We took the 10-minute organizational AI literacy assessment — the same one we ask mid-market buyers to take — and answered it as honestly as we could about ourselves. Then we let the same scoring engine produce the same report it would produce for any customer. Below: the actual numbers, the bucket per dimension, the gap copy and recommendation copy from the rubric, what we agree with, what surprised us, and what we’re going to do about it.
Methodology
A note on how the answers were chosen, before any of the scores.
174 is a small studio. Two of us, plus periodic contractors. We ship AI tooling daily — the marketing site you’re reading is one such artifact. The team is unusual in that everyone is a senior practitioner using AI heavily, but we have approximately none of the program infrastructure (written policies, formal review processes, funded learning curriculum) that we’d recommend to any of our mid-market customers.
That asymmetry is exactly the kind of thing the assessment is built to surface — so we expected the report to show high adoption and capability with weak governance. We answered each question by picking the option that most accurately described us, even when an option further up the scale would have been flattering. The point of the exercise was to see clearly, not to look good.
The full answers are listed at the bottom of this piece for transparency.
The scores
These are the numbers the live scoreAnswers() function returned for our answers. You can verify the math by reading the assessment rubric — every option weight is published.
- Adoption: 96 / 100. Gap copy: "Adoption is strong — the lever is now depth, not breadth." Recommendation: "Move to depth: agentic workflows curriculum + custom curriculum tuning."
- Capability: 62 / 100. Gap copy: "Patterns exist but aren't reliably reused. Standardize the wins and ship them as a curriculum." Recommendation: "Concierge ($199/seat/mo) for the rollout team — they'll cascade the lift."
- Governance: 42 / 100. Gap copy: "Policy exists but enforcement is thin. Tie controls to actual review steps people use." Recommendation: "Concierge runs a governance workshop with leadership and IT."

Overall: 67 / 100 — Developing, three points shy of the Mature threshold at 70.

Recommended rollout shape: Department-wide rollout — starting with governance.
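For readers who want to see the shape of the calculation rather than just its output, here is a minimal sketch of how the dimension scores, the overall score, and the bucket could compose. It is illustrative only: the type and function names are ours for this post, not the internals of scoreAnswers(), and the real rubric may define more buckets than the two shown; the only threshold our report confirms is Mature at 70.

```ts
// Illustrative sketch only, not the production scoreAnswers() implementation.
// Assumption: dimension scores are 0-100 and the overall score is their plain average.
type Dimension = "adoption" | "capability" | "governance";

interface DimensionScore {
  dimension: Dimension;
  score: number; // 0-100
}

// The report confirms a Mature threshold at 70; everything below lands in
// Developing here. The real rubric may define additional buckets.
function toBucket(score: number): "Mature" | "Developing" {
  return score >= 70 ? "Mature" : "Developing";
}

function overall(scores: DimensionScore[]): number {
  const sum = scores.reduce((acc, s) => acc + s.score, 0);
  return Math.round(sum / scores.length);
}

const ours: DimensionScore[] = [
  { dimension: "adoption", score: 96 },
  { dimension: "capability", score: 62 },
  { dimension: "governance", score: 42 },
];

console.log(overall(ours));           // 67
console.log(toBucket(overall(ours))); // "Developing"
```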
What the report actually says about us
A few honest reactions.
The Adoption score (96) is roughly right. Both of us use AI tools daily, across general assistants, code copilots, custom GPTs, and internal automations. The 4-point gap from 100 reflects “we don’t actively use the AI features inside every SaaS we touch” — which is true. That’s not a gap worth closing for its own sake. The bucket — Mature — is the right call, and the recommendation (“move to depth, not breadth”) is the right next move.
The Capability score (62) is also roughly right, and we agree with the diagnosis. We have prompts we reuse. We don’t have prompts we evaluate. Our multi-step systems are strong (we ship them) but our individual prompt evaluation discipline is informal. The bucket gap copy — “patterns exist but aren’t reliably reused” — describes us accurately. The recommendation is calibrated for a customer who’d pay us, which we obviously won’t, but the underlying instruction is right: ship our wins as a documented internal curriculum.
The Governance score (42) is the most interesting result, and the one where the rubric’s recommendation is least applicable to us. We answered “No” to “do you have a written AI usage policy?” and “No one in particular” to “who owns AI enablement?” — both are accurate for a 2-person studio where the founders are also the practitioners. The score (42) lands in Developing because our G3 (review controls — 66) and G4 (executive sponsorship — 100, since we are the executives) pulled the dimension up.
The bucket gap copy (“policy exists but enforcement is thin”) doesn’t really fit us — we have no policy at all, and the 42 is averaged out of mixed answers. This is a real limitation of bucket-based assessments at the per-question level. A future iteration might surface single-question outliers explicitly. For now, the dimension score is honest, even if the bucket copy is slightly over-fitted.
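To make that limitation concrete, here is one way a single-question outlier check might look. This is a hypothetical sketch, not a committed feature: the flagOutliers name and the 40-point deviation threshold are invented for illustration.

```ts
// Hypothetical sketch of the outlier check mentioned above, not shipped code.
interface Answer {
  question: string; // e.g. "G1"
  weight: number;   // 0-100 option weight from the rubric
}

// Flag any question whose weight sits far from its dimension's average, so the
// report can say "your dimension score hides a zero on G1" instead of
// over-fitting the bucket copy.
function flagOutliers(answers: Answer[], threshold = 40): Answer[] {
  const avg = answers.reduce((acc, a) => acc + a.weight, 0) / answers.length;
  return answers.filter((a) => Math.abs(a.weight - avg) >= threshold);
}

const governance: Answer[] = [
  { question: "G1", weight: 0 },
  { question: "G2", weight: 0 },
  { question: "G3", weight: 66 },
  { question: "G4", weight: 100 },
];

// With our answers this flags G1, G2 and G4, which is the point:
// the 42 is an average of extremes, not a description of any one answer.
console.log(flagOutliers(governance).map((a) => a.question));
```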
The overall recommendation (“department-wide rollout starting with governance”) doesn’t apply to us in literal terms — we have no department, just two people. But the underlying instruction translates: governance is our weakest dimension and the next move should address it. For us, that means writing the same kind of policy we recommend our customers write. We’ve been operating without one because we never needed to bind anyone else’s behavior; that’s no longer a sufficient reason now that we’re publicly telling other organizations to write one.
What we’re doing about it
The report’s recommendation, translated for a 2-person studio:
Write the policy. This week. Using our own governance starter template. It will be embarrassingly short — 2-person studios don’t need 5,000 words of governance — but the act of writing it is the work. Everything else follows.
Stand up an internal curriculum index. We have prompts, evaluation rubrics, and agent patterns scattered across notes, Slack, and individual heads. The Capability recommendation is right: we should ship them as a documented internal artifact. We’re going to use the same MDX content collection we used for the resources library on this site, because it works. A sketch of what that collection could look like is below.
Stop adding adoption surface; deepen the existing one. The Adoption recommendation is the one that’s easiest to follow. We don’t need to add more AI tools to our daily work; we need to use the ones we have more carefully. That’s a posture, not a project.
Re-run the assessment in three months. The URL will be the same; the report will be dated. We’ll publish the second report alongside this one and let you see whether anything actually changed.
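For the curriculum index, here is a minimal sketch of what the collection schema could look like, assuming an Astro-style content collection like the one behind the resources library. The collection name and frontmatter fields are placeholders rather than a finalized schema.

```ts
// src/content/config.ts — a sketch, assuming an Astro-style content collection.
// The collection name and the frontmatter fields below are placeholders.
import { defineCollection, z } from "astro:content";

const curriculum = defineCollection({
  type: "content", // MDX files in src/content/curriculum/
  schema: z.object({
    title: z.string(),
    // What kind of artifact this entry is.
    kind: z.enum(["prompt", "eval-rubric", "agent-pattern"]),
    // Which assessment dimension the artifact strengthens.
    dimension: z.enum(["adoption", "capability", "governance"]),
    lastReviewed: z.coerce.date(),
    owner: z.string(),
  }),
});

export const collections = { curriculum };
```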
Why publish this
Three reasons.
First, methodology that survives its own application is more credible than methodology that doesn’t. If we hadn’t been willing to publish our own scores, we shouldn’t have published the rubric. Either we believe the assessment is useful or we don’t; you don’t get to claim it’s useful for other people but exempt yourself.
Second, we’re a brand-new studio with no customer case studies yet. The honest thing to do, in lieu of customer cases, is to make ourselves the case. The numbers above are real, the recommendations are real, the next moves are real. When customer pilots ship, those will replace this — but in the meantime, the case study slot isn’t empty.
Third, we wanted to show what the report’s output actually looks like in long form. Most of the buyers who’ll take the assessment will see their own report. But for the buyers who are still deciding whether to spend ten minutes with us, this case study is a transparent preview of what they’d get.
If the report shape, the recommendations, and the honest discussion above feel useful — the 10-minute version for your own organization is the next step. If you’re a buyer comparing 174 to other vendors, the same comparison will be available to you: take their assessment, take ours, see whose output you’d actually forward to your CHRO.
The full answers, for transparency
For anyone who wants to verify the math themselves against the rubric:
- A1. What share of your team uses AI tools in their day-to-day work? → More than 60% (100)
- A2. Which AI tools are in active use? (multi) → ChatGPT/Claude (20), Copilot for code (20), Custom prompts/GPTs (20), Internal agents (25) = 85 / 100
- A3. How often does your team use AI for real work? → Daily (100)
- A4. Satisfaction with current AI tools? → Strong — it’s changed how we work (100)
- C1. Prompting skill? → Documented prompts that get reused (66)
- C2. Evaluation? → Spot checks by a senior person (33)
- C3. Multi-step / agentic workflows? → It’s a core part of how we work (100)
- C4. How do you learn? (multi) → YouTube/Twitter (10), Internal Slack (15), Internal lunch-and-learns (25) = 50 / 100
- G1. Written AI usage policy? → No (0)
- G2. Who owns AI enablement? → No one in particular (0)
- G3. Quality controls before customer-facing AI output ships? → Documented review steps (66)
- G4. Exec sponsorship? → Funded program with exec accountability (100)
Adoption: (100 + 85 + 100 + 100) / 4 = 96.25, rounded to 96. Capability: (66 + 33 + 100 + 50) / 4 = 62.25, rounded to 62. Governance: (0 + 0 + 66 + 100) / 4 = 41.5, rounded to 42. Overall: (96 + 62 + 42) / 3 = 66.7, rounded to 67.
The same answers, run through the same engine, will produce the same numbers. If anything in the math doesn’t reconcile, that’s a bug we want to know about.
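If you’d rather reconcile the arithmetic without the live engine, a few lines are enough. This is a standalone sketch of the published weights above, not the engine itself; it assumes the same per-dimension rounding shown in the math.

```ts
// Standalone reconciliation of the numbers above, not the live engine.
// Each dimension is the average of its four option weights, rounded;
// the overall score is the rounded average of the three dimensions.
function scoreDimension(weights: number[]): number {
  return Math.round(weights.reduce((a, b) => a + b, 0) / weights.length);
}

const adoption = scoreDimension([100, 85, 100, 100]);  // 96
const capability = scoreDimension([66, 33, 100, 50]);  // 62
const governance = scoreDimension([0, 0, 66, 100]);    // 42
const overall = Math.round((adoption + capability + governance) / 3); // 67

console.log({ adoption, capability, governance, overall });
```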
Become the next case study.
A pilot can start with a single seat. We’ll publish what we ship together.