
AI vs. Manual: Which Produces Better ESL Conversation Questions?

Mar 3, 2026 · 5 min read

I ran an informal experiment with my B1 and B2 classes in Minneapolis. Half the weeks I used AI-generated discussion questions; the other half I wrote them myself. I tracked five things: student talk time, question variety, cultural relevance, vocabulary match, and my own prep time. Here's what happened.

Speed: AI Wins, Obviously

Writing 10 good discussion questions manually takes me 20-30 minutes. Generating them with ChalkLab takes under two minutes, including the review scan. This isn't even close. At roughly 25 minutes saved per weekly set, a 16-week semester works out to roughly 7 hours saved on conversation questions alone.

Variety: AI Wins (Surprisingly)

When I write questions manually, I fall into patterns. I tend to ask opinion questions starting with "What do you think about..." because that format is comfortable. AI-generated sets gave me more variety -- hypothetical questions, comparison questions, experience-sharing questions, prioritization questions ("If you could only keep three apps on your phone, which would they be?"). My students responded well to the variety, and conversations were less predictable.

Cultural Relevance: Manual Wins

This is where hand-written questions still matter. My Minneapolis class includes students from Somalia, Myanmar, and Mexico. I know their backgrounds. I know which topics are sensitive. I know that a question about "your family's holiday traditions" lands differently when some students have lost family members during displacement.

AI can't know your specific students. It generates culturally neutral questions, which is fine most of the time but misses opportunities for culturally responsive teaching. When I write questions myself, I can reference specific student experiences and community contexts.

Vocabulary Match: Tie

Both approaches produced questions with appropriate vocabulary when I specified the level correctly. AI occasionally used words slightly above the target level -- nothing catastrophic, just a word here or there that needed swapping. My own questions sometimes undershot the level because I was being cautious.

Student Talk Time: Tie (With a Caveat)

Average student talk time was roughly equal across both question sources. The caveat: AI-generated questions produced more evenly distributed participation. When I wrote questions, I unconsciously tailored them to my strongest students' interests. AI doesn't have that bias.

The Verdict

Use AI for the bulk of your conversation questions. It's faster, more varied, and surprisingly less biased than hand-written questions. But write your own questions when the topic touches on students' personal or cultural experiences. That's where your knowledge of your specific classroom makes a difference no algorithm can match.

My current ratio is about 80% AI, 20% manual -- and my conversation classes have never been better. For the tools I use, see my full AI tools breakdown.