AI on Item Response Theory (IRT)

I’m learning about and working with Item Response Theory (IRT). I’d heard of it before but didn’t know much about it, so I asked Gemini, which created a useful summary for me:

1. The Core Concept: The “High Jump” Analogy

Think of your math course as a stadium full of high-jump bars.

The Item (The Bar): Every math problem in your course is a bar set at a specific height. A “2+2” bar is 1 inch off the ground. A “Multivariable Calculus” bar is 7 feet high.

The Student (The Athlete): Every student has a “vertical leap” score.

The Prediction: If a student has a 5-foot leap and approaches a 4-foot bar, IRT predicts they have a very high chance of clearing it. If they approach a 6-foot bar, they will likely fail.

The IRT Magic: In old-fashioned testing, if you get 10/10 on an easy test, you look like a genius. In IRT, the system says, “You cleared ten 1-inch bars. That doesn’t mean you’re a genius; it just means we know your leap is at least 1 inch.”

2. How to “Embed” Difficulty in Your Course

In IRT, we give everything a numerical value (usually between -3 and +3, but you can use any scale).

The Step-by-Step Design:

A. Calibrate the “Items.” For every item in your math course (e.g., “Integration by Parts”), you need a Difficulty Parameter.

Analogy: Think of this as the “Gravity” of that item.

How to get it: You don’t just guess. You look at how thousands of other students performed. If students who are generally good at math keep failing this specific item, its “Gravity” (Difficulty) score goes up.
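
To make that calibration concrete, here is a minimal sketch, assuming a simple Rasch-style model and a batch of past answers from students whose ability estimates we already trust. The `estimate_difficulty` helper, learning rate, and step count are illustrative choices, not part of any particular IRT package.

```python
import math

def p_correct(ability, difficulty):
    """Rasch model: chance a student at this ability clears this item."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))

def estimate_difficulty(abilities, outcomes, b=0.0, lr=0.5, steps=200):
    """Fit one item's difficulty from (ability, 1-or-0 outcome) pairs by
    gradient ascent on the Rasch log-likelihood."""
    for _ in range(steps):
        # d(log-likelihood)/db = sum(p - y): when students outperform the
        # prediction, the gradient is negative and difficulty drifts down.
        grad = sum(p_correct(a, b) - y for a, y in zip(abilities, outcomes))
        b += lr * grad / len(outcomes)
    return b

# Generally strong students keep missing this item, so its "Gravity" lands high:
abilities = [1.5, 1.8, 2.0, 1.2, 1.7, 2.2]
outcomes  = [0,   0,   1,   0,   1,   0]
print(round(estimate_difficulty(abilities, outcomes), 2))
```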

B. Create the Student “Skill Profile.” Instead of a grade (like an A or B), the student’s profile has a single number representing their Latent Ability.

Analogy: This is the student’s “Power Level.”

How it works: Every time a student interacts with an item, their Power Level shifts. If they solve a “Difficulty 2.0” problem, their Power Level rises toward 2.0. If they fail a “Difficulty 0.5” problem, their Power Level sinks.
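
A minimal sketch of what that shifting Power Level could look like in code, assuming the same Rasch-style curve; the `update_ability` name and the step size are illustrative, not from any specific system.

```python
import math

def update_ability(theta, item_difficulty, correct, lr=0.3):
    """Nudge a student's ability estimate after one answer (correct = 1 or 0).
    The nudge is proportional to the surprise: an unexpected success on a hard
    item moves theta a lot; an expected success barely moves it."""
    p = 1.0 / (1.0 + math.exp(-(theta - item_difficulty)))
    return theta + lr * (correct - p)

theta = 1.0
theta = update_ability(theta, 2.0, correct=1)   # beat a hard item: big jump up
theta = update_ability(theta, 0.5, correct=0)   # missed an easy item: drops back
print(round(theta, 2))
```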

3. Making the Prediction (The “Probability Curve”)

IRT doesn’t just say “Yes, they will pass” or “No, they will fail.” It gives you a probability.

Imagine an “S-shaped” curve.

If the student’s Skill is exactly equal to the Item’s Difficulty, the IRT model predicts a 50% chance of success.

This is the “Learning Sweet Spot.” If a student has a 50-70% chance of getting a problem right, they are challenged but not frustrated. This is exactly where you want to point them in your course.
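
The S-shaped curve is just a logistic function of the gap between Skill and Difficulty. Here is a minimal sketch using the one-parameter (Rasch) form, with a check for the 50–70% sweet spot; the function and variable names are mine, not from any specific library.

```python
import math

def p_success(skill, difficulty):
    """The S-shaped curve: when skill == difficulty, this returns exactly 0.5."""
    return 1.0 / (1.0 + math.exp(-(skill - difficulty)))

# A student with skill 1.0 looking at items of various difficulties:
for difficulty in [-1.0, 0.5, 1.0, 1.5, 3.0]:
    p = p_success(1.0, difficulty)
    sweet_spot = 0.5 <= p <= 0.7
    print(f"difficulty {difficulty:+.1f}: P(success) = {p:.2f}  sweet spot: {sweet_spot}")
```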

4. Prerequisites Enhance IRT

Traditional IRT treats questions as if they exist in a vacuum. But your course has “Prerequisites.”

The “Bridge” Metaphor: In a course, you know that “Basic Derivatives” is a bridge you must cross to get to “Optimization Problems.”

Recommendation: If a student fails an “Optimization” item (Difficulty 2.5), your system shouldn’t just say “You’re bad at math.” It should look back across the course to the “Derivative” item (Difficulty 1.2). If the student’s Skill is below the Derivative item’s difficulty, the system realizes the “bridge” is broken and recommends they go back and fix that specific item first.
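
A sketch of that recommendation rule, using the difficulties from the example above; the `recommend_next` helper and the dictionaries are hypothetical, not a prescribed design.

```python
def recommend_next(student_skill, failed_item, prerequisites, difficulties):
    """After a failure, walk back through the item's prerequisites and flag
    the first one whose difficulty is above the student's current skill."""
    for prereq in prerequisites.get(failed_item, []):
        if student_skill < difficulties[prereq]:
            return f"The bridge is broken: review '{prereq}' first"
    return f"Prerequisites look solid: retry '{failed_item}'"

difficulties = {"Basic Derivatives": 1.2, "Optimization Problems": 2.5}
prerequisites = {"Optimization Problems": ["Basic Derivatives"]}

# A skill of 0.9 is below the 1.2 prerequisite, so the system points back to it:
print(recommend_next(0.9, "Optimization Problems", prerequisites, difficulties))
```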

5. Practical Implementation Tips for a Non-Technical Designer

The “Start-up” Problem: When you first build the course, you won’t know how hard the items are.

Solution: Give them a “Community Guess” difficulty (e.g., Algebra = 1, Calculus = 3). As real students use the system, the IRT math will “auto-correct” those numbers based on actual success rates.
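
One simple way to blend a Community Guess with early data, sketched below. The blending weight, and the rough data-driven estimate (cohort average ability minus the logit of the observed success rate, which is what the Rasch curve would imply if every student sat exactly at that average), are illustrative assumptions.

```python
import math

def logit(p):
    return math.log(p / (1.0 - p))

def recalibrate(seed_b, cohort_mean_ability, observed_success_rate, weight):
    """Blend a hand-seeded difficulty with a rough data-driven estimate.
    `weight` (0..1) is how much to trust the data collected so far."""
    data_b = cohort_mean_ability - logit(observed_success_rate)
    return (1.0 - weight) * seed_b + weight * data_b

# Seeded at 1.5, but 80% of a cohort averaging ability 0.2 got it right,
# so the estimate drifts toward "easier than we guessed":
print(round(recalibrate(seed_b=1.5, cohort_mean_ability=0.2,
                        observed_success_rate=0.80, weight=0.5), 2))
```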

The Student “Warm-up”: Don’t give a new student a Calculus problem immediately.

Solution: Give them a “Calibration Test” (3 easy, 3 medium, 3 hard). This quickly finds their “Power Level” so the course can light up the items that are in their “Sweet Spot.”
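
A sketch of scoring that warm-up: grid-search the ability value on the usual −3 to +3 scale that makes the nine observed answers most likely under the Rasch curve. The item difficulties and answer pattern below are made up.

```python
import math

def p_success(theta, difficulty):
    return 1.0 / (1.0 + math.exp(-(theta - difficulty)))

def initial_ability(item_difficulties, answers):
    """Grid-search the theta (-3..+3) that best explains the warm-up answers
    (a crude maximum-likelihood estimate)."""
    def log_likelihood(theta):
        return sum(
            math.log(p_success(theta, b)) if y else math.log(1.0 - p_success(theta, b))
            for b, y in zip(item_difficulties, answers)
        )
    grid = [x / 10.0 for x in range(-30, 31)]
    return max(grid, key=log_likelihood)

# 3 easy, 3 medium, 3 hard items; the student stumbles mostly on the hard ones:
warmup_items = [-2.0, -2.0, -2.0, 0.0, 0.0, 0.0, 2.0, 2.0, 2.0]
answers      = [1,    1,    1,    1,   1,   0,   1,   0,   0]
print(initial_ability(warmup_items, answers))
```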

The “Discrimination” Factor (The “Telling” Question): Some questions are better at testing skill than others.

Analogy: A “Trick Question” is a bad test of skill because even geniuses get it wrong. In IRT, you can label items with a “Discrimination” score. High discrimination means: “If you get this right, it almost certainly means you’ve mastered this level.”
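
In the two-parameter version of the model, the Discrimination score simply controls how steep the S-curve is around the item’s difficulty. A small sketch with arbitrary parameter values:

```python
import math

def p_success(skill, difficulty, discrimination=1.0):
    """Two-parameter logistic (2PL): higher discrimination = steeper curve."""
    return 1.0 / (1.0 + math.exp(-discrimination * (skill - difficulty)))

# Same item difficulty (1.0); compare a student slightly below vs. slightly above.
for a in [0.5, 1.0, 2.5]:
    below = p_success(0.5, 1.0, a)
    above = p_success(1.5, 1.0, a)
    print(f"discrimination {a}: P(below) = {below:.2f}, P(above) = {above:.2f}")
# A highly discriminating item separates the two students sharply;
# a low-discrimination item barely tells them apart.
```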

Nano Banana, Close Enough

Dr. Philippa Hardman has written an excellent article on using Google’s Nano Banana AI image generation tool to support learning. The piece outlines six evidence-based use cases that go far beyond simple infographics: visualization, analogy, worked examples, contrasting cases, elaboration, and generation. Each strategy is grounded in decades of cognitive and educational research, and Hardman provides concrete prompts that instructional designers can immediately put to use.

The article also reinforces a critical lesson I’ve learned from my own experiences with AI: often it’s close enough, but it’s critical to review the outputs carefully.

Nano Banana’s worked example for tying a bowline knot, a 5-step visual guide (from Hardman’s article)

Take, for example, the worked example image that Hardman includes in their article—a 5-step visual guide for tying a bowline knot. The bowline is a fundamental knot used in countless situations, from sailing to rescue operations to everyday tasks. When tied correctly, it’s reliable and secure. When tied incorrectly, it can fail catastrophically.

The Nano Banana-generated image contains errors in the knot-tying sequence. This isn’t a criticism of Hardman’s work; they use the image to illustrate the tool’s capabilities, not as a knot-tying tutorial. It is, however, a reminder that even when AI produces something that looks professional and well-organized, domain expertise and careful review remain essential. As a sailor, I spotted the mistake immediately.

So yes, use Nano Banana to create worked examples, visualizations, and contrasting cases. But always review the outputs with the same professional rigor you’d apply to any instructional material. Because when it comes to teaching and learning, “close enough” isn’t good enough.

AIxED Recap

I attended the AIxED conference in Boston on November 21, 2025.

We’re all in this together

Education is currently navigating a period of Future Shock. Institutions, administrators, and faculty are struggling to keep pace with the acceleration of AI technology. Nobody has the complete answer. The policy landscape is lagging behind student and faculty usage. However, this struggle is universal. The overwhelming message from the conference is that we are all on this journey together, learning from each other.

Courseware is about to get interesting

Artificial Intelligence has fundamentally changed the value chain of content. The lecture hall model is fading, replaced by a focus on durable skills: competencies like ethical use, critical thinking, and lifelong learning. The core job of educators is shifting. They are no longer content creators simply delivering facts; they are experienced content designers who craft impactful learning environments. The curriculum must evolve to teach students how to engage with this new, AI-accelerated knowledge.

Higher-ed is lagging behind

The practical adoption of AI is hampered by two factors: administrative inertia and a lack of specific training. Students and faculty are rapidly adopting general-purpose tools, often far ahead of administrative policy. To close this gap, there is an urgent need for two things. First, comprehensive teacher training to ensure AI is used intentionally and ethically. Second, the development of pedagogy-first tools that integrate data and design learning experiences (like AI tutors or “guided learning modes”) rather than simply replacing existing systems.