Ep 67. Cognitive load theory and learning math with John Sweller
This transcript was created with speech-to-text software. It was reviewed before posting but may contain errors. Credit to Canadian Podcasting Productions.
In this episode, Anna is joined by Dr. John Sweller, emeritus professor at the University of New South Wales and the researcher best known for developing Cognitive Load Theory. Together, they explore how cognitive load theory should guide classroom practice, particularly in mathematics.
John explains the limits of working memory, how experts and novices approach problem solving differently, and how strategies like worked examples can help manage cognitive load. They also discuss whether productive failure is supported by research and the strong experimental evidence supporting explicit instruction, particularly when students are learning new content.
This episode will be extremely valuable for educators, especially math teachers, who want to better understand how students become expert problem solvers and what that means for effective instruction.
This episode is also available in video at www.youtube.com/@chalktalk-stokke
SHORT COURSE
La Trobe Short Course: Evidence-informed Mathematics Teaching – An Introduction https://shortcourses.latrobe.edu.au/evidence-informed-mathematics-teaching
TIMESTAMPS
[00:00:22] Introduction
[00:03:53] Biologically primary and biologically secondary knowledge
[00:09:34] Element interactivity
[00:15:37] Two characteristics of working memory
[00:16:52] Understanding long-term memory
[00:21:06] Does working memory capacity vary for different people?
[00:21:44] Can working memory capacity be altered?
[00:22:45] How can you measure working memory?
[00:23:49] Explaining cognitive load theory
[00:27:55] Can you measure cognitive load?
[00:31:51] Sweller’s definition of problem solving
[00:37:28] Understanding schemas
[00:44:26] The way novices and experts categorize problems differently
[00:46:11] The expertise reversal effect
[00:50:13] How to identify when students are ready for problem solving
[00:52:12] Thoughts on productive failure
[00:55:40] Why is there still debate about prioritizing inquiry-based approaches in math instruction?

[00:00:00] Anna Stokke: Welcome to Chalk & Talk, a podcast about education and math. I'm Anna Stokke, a math professor and your host.
Welcome back to another episode of Chalk & Talk. Today I'm joined by one of the most influential researchers in educational psychology, John Sweller, the originator of cognitive load theory. Cognitive load theory has shaped how we think about working memory, worked examples, and the role of explicit instruction when students are learning new material. His work has had a profound impact on instructional design and on the broader conversation about evidence-based teaching.
This episode is important for educators at all levels, particularly those teaching mathematics who want to understand how human cognitive architecture should guide classroom practice. In this conversation, we talk about biologically primary and secondary knowledge, the limits of working memory, the expertise reversal effect, and what that means for learning and designing instruction. We talk about John Sweller's research on problem solving with a particular focus on math.
For example, how does an expert approach a math problem differently from a novice and what does that mean for teaching? We also discuss the worked example effect for managing cognitive load in the classroom and whether productive failure is supported by research. Finally, we address a core question. If decades of controlled experimental research support explicit instruction when students are learning new content, why does the controversy persist? If you're an educator who wants a clearer understanding of how students become expert problem solvers and what that means for the way we teach, this episode is for you.
I hope you enjoy it. Before we get started, I want to let you know that I'll be co-delivering a four-session short course on evidence-based math teaching through La Trobe University's School of Education starting in April 2026. The course is open to teachers anywhere in the world.
I'll include a link in the show notes for registration. It would be great to see you there. Now, without further ado, let's get started.
I am honoured to be joined by Dr. John Sweller today. He is an emeritus professor at the University of New South Wales in Sydney, Australia. He is best known for formulating cognitive load theory, which is one of the most highly cited educational psychology theories.
He holds a PhD in psychology from the University of Adelaide. He has authored over 180 academic publications with an emphasis on the instructional implications of working memory limitations. And he is also a fellow of the Academy of Social Sciences in Australia.
And I am thrilled to have him here today to talk all about cognitive load theory. Welcome to the podcast.
[00:03:19] John Sweller: Thank you, and thank you for your introduction and for inviting me.
It's a pleasure and a privilege to be here.
[00:03:25] Anna Stokke: Yeah, I'm really thrilled to meet you. And it's an honour to have you on.
Of course, we talk about your work all the time on this podcast. It's really nice to hear from you about it. So, I'm excited about that.
I thought we'd kind of start with the background. So, let's start by talking about cognitive architecture. It seems to me that that's the right place to start.
I hope you agree. And maybe you can explain cognitive architecture for the listeners.
[00:03:53] John Sweller: Sure. I use human cognitive architecture as the base for cognitive load theory, which is an instructional theory. Several basic, I guess you could call them items, that constitute human cognitive architecture. So, let's go through those.
The first item concerns categories of information. We can categorize information in a near infinity of ways. Most of the categorization systems don't really have any consequences for instruction.
But there's one that's really, really important, and that was devised relatively recently, say about 15 or 20 years ago, by David Geary in the United States. He distinguished between biologically primary and biologically secondary knowledge. Now, biologically primary knowledge is knowledge we've evolved to acquire over thousands of generations.
It can be very, very complex, but we don't find it complex. We find it very, very easy to deal with. I'll give you the most obvious example of biologically primary knowledge.
Our ability to listen to and speak our native language. It's an enormously complex skill. Not taught.
We don't need a teacher. We don't have courses in schools at any level which indicate, look, this is how you learn to, for example, speak. In order to speak, we have to organize our mouth, our tongue, our lips, our breath, our voice.
It's immensely complex, and most of us would have no idea how to teach that. We don't need to. We just pick it up easily and automatically.
That's an example of biologically primary skill. I guess, despite the fact that we want to talk about mathematics, the most obvious example of a biologically secondary skill, because it corresponds to listening and speaking, is reading and writing. Reading and writing is biologically secondary.
We invented reading and writing about 5,000 years ago, and during the vast bulk of that time, most people on earth could not read and write. Only a tiny minority of people could read and write. It was only with the advent of mass education about 150 or so years ago that most people in at least some societies could read and write.
It's a different skill. It's biologically secondary. It's a skill we can pick up, but we don't pick it up in or anywhere near the same way as we pick up a biologically primary skill.
Until that distinction was made between biologically primary and secondary, we had a problem in education because a lot of people said, and this was understandable in a way, look how difficult it is for somebody to learn this particular skill, mathematics, as opposed to how easy it is to learn something as complex as listening and speaking. We're teaching it all wrong.
What we ought to be doing is simply putting people in a mathematics environment where they will easily and effortlessly pick up the skill, just as they pick up that complicated skill of listening and speaking. We had two generations of this. It caused immense problems because learning mathematics is not the same as learning how to listen and speak.
In many ways, learning how to listen and speak in information processing terms is far more complex than even a complex topic like mathematics. But we don't find it complex, not when acquiring our native language. So that distinction is critical.
We invented schools. We invented education in order to teach biologically secondary knowledge, not biologically primary knowledge. So that's the first distinction that needs to be made.
We're dealing with biologically secondary knowledge, and that is acquired in a certain way, and we need to assist people in acquiring it.
[00:08:35] Anna Stokke: Yeah. So essentially, all of the subjects that are taught in school are biologically secondary knowledge.
[00:08:43] John Sweller: Yeah, exactly. Because if they were biologically primary, we wouldn't need to teach them. We just pick them up easily and effortlessly.
[00:08:52] Anna Stokke: That's correct. You wouldn't learn to do mathematics on your own if no one taught it to you. But if you were just around humans, you would just pick up how to speak your native language.
You would figure out how to walk, right? Without someone actually telling you how to do it. That's biologically primary. So over time, humans have evolved just to be able to do those things.
[00:09:18] John Sweller: Yeah. And we have not evolved to easily automatically pick up mathematics. It's an entirely different subject.
[00:09:27] Anna Stokke: Absolutely. You have to be taught. Agreed.
Okay, let's go to the next part. So that's cognitive architecture.
[00:09:34] John Sweller: Next part of cognitive architecture that we need to discuss, we've labeled in cognitive load theory, element interactivity.
This is what we mean by element interactivity. Some subject matter consists of large numbers of elements. Let's talk about something like learning to acquire a second language as an adult.
But not a primary language, a second language. You need to learn a large number of translations of nouns. But you can learn each of those individually.
You can learn the translation of the word dog without learning the translation of the word cat and vice versa. It's hard to learn the vocabulary of a second language, not because each individual piece that you have to learn is difficult. It's just there are so many items of information.
Huge number. Other categories of information consist of probably far fewer elements, but the elements interact. You can't pick up one without considering a whole lot of other elements simultaneously.
And those areas have high element interactivity, and mathematics is probably the best example of a high element interactivity subject. If you think of an equation like A plus B all over C equals D, solve for A, make A the subject of the equation, you've got a whole lot of elements there. The elements consist of the various symbols.
They also consist of the relevant rules of mathematics and how those relevant rules correspond to the symbols. There's a lot of elements, and you really can't in any adequate way consider them in isolation. You can't change anything on that equation without considering the entire equation.
You have to consider all of the elements simultaneously, and mathematics can be a difficult subject, not because of a huge number of elements. There probably aren't all that many elements, but the elements that are there, they all interact. You have to consider a substantial number of elements simultaneously.
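[For reference, the rearrangement John describes, written out. Note that every step involves the whole equation at once, which is the element interactivity he's pointing to:]

```latex
\frac{a+b}{c} = d
\;\Longrightarrow\; a + b = cd
\;\Longrightarrow\; a = cd - b
```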
So, element interactivity is important when you're dealing with instruction, and I'll indicate the reasons why it's important in the next few minutes. We've divided up knowledge into biologically primary and secondary. We're only dealing with secondary.
We can now divide up secondary knowledge into low and high element interactivity knowledge. Now, I've categorized them, but element interactivity is not a category, it's a continuum. You go from very low to very high, but when talking, it's easier to talk of categories rather than a continuum.
Let's talk about high and low element interactivity. In mathematics, we're talking about high element interactivity information, biologically secondary, high element interactivity information. The next item of human cognitive architecture: how do we acquire information? There are two basic ways we can acquire information, and humans are pretty good at both of them.
One way is to solve problems. You acquire new information by solving a problem. There's something you have to find.
How do I go about finding it? What do I do? You can acquire information by problem solving. And we're, as humans compared to other mammalian species, we're quite good at that. But there's another way we can obtain information, a way that is unique to humans.
We can obtain information from other people. We're really, really good at that. In fact, in terms of the amount of information we can obtain from other people, we're unique.
We can obtain enormous amounts of information from other people. That is probably the defining characteristic of being a human. There's no other species.
Other species can obtain information from other members of the species, but tiny amounts. We can obtain enormous amounts of information from other people, and we're unique in that respect. We stand alone amongst the mammalian species, and probably the reason we've become the dominant species is because of that skill.
No other species does that. So, there are those two ways of obtaining information, either by problem solving, which we only use in the normal course of events when we can't obtain information from other people. Problem solving is a slow, inefficient way of obtaining information, and anybody who's conducted research can confirm that.
It's a long, slow, difficult process. It can take us years and years and years to discover something by problem solving, and what we discover can be transmitted to somebody else, literally in a few minutes sometimes, depending on the nature of what it is we're talking about. The easiest, most efficient way of obtaining information, and we've evolved to obtain information in both those ways, but the most efficient way is by getting it from other people.
Irrespective of how you get it, that information has to be processed, and it's processed in a structure called working memory. We're pushing on with human cognitive architecture here. Working memory has two, in some senses, peculiar characteristics when we're dealing with novel information.
First characteristic, it's extremely limited in capacity. We can memorize about seven items of information at any given time. Novel information, I'll be talking about familiar information later, but novel information, about seven items, no more than about that.
We can process no more than possibly two, three, maybe four items of information, where processing means putting them together in some way, relating them, dealing with them simultaneously. Two, three, four items of information, that's it. No more than that.
So, working memory, when dealing with novel information, is extremely limited.
Working memory can hold information for about 18 seconds, and after that, it's gone. We can hold it there longer by repeating it to ourselves. If there's something you need to remember, something new, then you keep saying it to yourself.
In that case, you can hold it indefinitely. If you don't repeat it to yourself, after 18 seconds, it's gone. Let's talk about long-term memory.
We can transfer information from working memory, with its limited duration and limited capacity, to long-term memory. And long-term memory, unlike working memory, it may have limits in duration and capacity, but if it does, we don't know where they are. Enormous capacity, enormous duration.
In effect, we can consider both of them as being unlimited. Now, the last item or aspect of human cognitive architecture, which is the most important one of them all, and which all of the preceding items, aspects, led up to: we can transfer information from long-term memory, which is now familiar information, it's not novel anymore, it's in long-term memory, so it's familiar information, back to working memory. And when we transfer information from long-term memory back to working memory, a miracle of sorts happens.
Instead of having the capacity and duration limits of working memory that I discussed a few seconds ago, there are no known limits of either capacity or duration for information transferred from long-term memory back to working memory to govern action that's appropriate to the extant context. The limitations disappear. What's the consequence of that? Well, the consequence of that is we're transformed.
We become different people. We can do things. We can think of things we couldn't dream of doing or thinking of previously.
We all know education is transformational. What I've just described indicates the reason for that transformation. If nothing has gone into long-term memory, nothing has been learned, and pretty much nothing happens.
The aim of education is to have information go into long-term memory. Let me just put that into a mathematical context. I described before when dealing with an equation A plus B all over C equals D, solve for A, make A the subject of the equation.
For somebody for whom that equation is novel, they're not familiar with it, each of those elements is a separate element. They have to try to process all of those elements in a working memory that is extremely limited in capacity and in duration.
It's a very difficult thing to do. In contrast, somebody like you, a mathematician, that equation A plus B all over C equals D, solve for A, the equation is familiar to you. You've seen it before.
It's in long-term memory.
The problem is familiar to you. You've seen that sort of problem before. The solution is familiar to you.
In effect, that entire equation and its various elements and the problem and its solution, for you, they're a single element. Your working memory is not overwhelmed by that equation, by that problem. It's not overwhelmed.
It's simple, straightforward. It's a single element. You can deal with a single element in your working memory without any difficulty whatsoever.
For a child who's just beginning to learn algebra, that's a huge number of elements. As I said, we're transformed once information goes into long-term memory. We can do things we otherwise couldn't.
That's pretty much human cognitive architecture.
[00:20:36] Anna Stokke: Okay. So, I have a couple of questions about that.
All right. So, working memory is limited in the amount it can hold, and it's also limited in duration. In other words, things disappear rather quickly.
You said within 18 seconds, if we don't repeat them. Do some people have working memories that can hold more than others? That's my first question. And my second question is, can you alter working memory? Can you make your working memory stronger?
[00:21:06] John Sweller: The first question, do people vary? The answer is probably, it's extremely difficult to measure.
And the reason it's difficult to measure is that, as we discussed when going through human cognitive architecture, working memory changes depending on what's in long-term memory. If you give people something which they're a little bit more familiar with, their working memory is going to appear to be massively greater than that of somebody who's not familiar with that material. And that makes it difficult to deal with.
And on the second question, can we change working memory? Some people have suggested we can. I really doubt it. What's really being changed is what's in long-term memory.
Because once something's in long-term memory, your working memory changes. It's like comparing you to somebody who's just starting to learn algebra. You can't use those materials to compare your working memory with a novice algebra student.
Because it's dramatically changed by what's in your long-term memory. So there's no simple answer to your question there.
[00:22:22] Anna Stokke: Yeah. In fact, it would be rather hard to measure in some sense. I guess what you'd have to do is find a topic that the person really knows nothing about and see how much they can hold. Or you could just start throwing numbers out of order that don't make any sense to someone and see how much they could hold at a time.
Could you do something like that to measure working memory?
[00:22:45] John Sweller: Even that sounds as though you ought to be able to. But somebody like you has been dealing with numbers for decades because that's what you do. You're just so familiar with them.
Somebody else might not be as familiar with numbers; people's familiarity with numbers varies enormously. You would have to find something where you can be very confident that people don't vary all that much in their knowledge of that material. They're not easy things to find.
We vary in just about everything in terms of our exposure to whatever issue it is that we're talking about.
[00:23:47] Anna Stokke: Yeah, I understand. And, you know, I spend my spare time memorizing the digits of π.
We've talked about cognitive architecture. So, let's talk about cognitive load theory. Like that's the part that really applies to instruction and teaching, right? So, what exactly is it? Like what do people mean when they talk about cognitive load theory?
[00:23:49] John Sweller: Cognitive load theory begins with that architecture that we've just discussed.
And you can use that architecture to form hypotheses concerning instruction. You can hypothesize, well, look, this form of instruction ought to be better than this other form of instruction, given this cognitive architecture. So cognitive load theory uses that architecture to generate hypotheses.
And those hypotheses are tested by randomized controlled trials. So, we get one group of students who are taught one way, another group of students who are taught in a different way. And then we give everybody a test to see which way was better.
So, the theory is used to generate hypotheses, and those are tested using randomized controlled trials. And if a hypothesis turns out to be correct, we've got a new cognitive load theory effect. We've developed about one and a half dozen such effects.
So cognitive load theory is a combination of that cognitive architecture, which we've just described, the instructional hypotheses that are generated by that architecture, and the results of testing those hypotheses. Ultimately, the effectiveness or otherwise of the theory is measured by the extent to which it can generate hypotheses that are of interest to people who are instructing, and which can provide us with novel instructional procedures. That's essentially what the theory consists of: the cognitive architecture and the consequences of that architecture in terms of empirical evidence for the effectiveness of particular instructional procedures.
[00:25:38] Anna Stokke: Okay. So, there's the cognitive architecture part, which is really a theory. The theory allows you to form hypotheses, which you can then test using experimental research, right? And then that's sort of what we kind of go with, like what do the RCTs show, right?
[00:25:57] John Sweller: Yeah, that's correct. Let me just make one point clear. We talk about long-term memory and working memory.
There's a lot of data on that as well. Some of that data was derived decades ago, in the 1950s and 60s. So, there is data on working memory and its characteristics.
There's some data, less, but there's some data on long-term memory and its characteristics. And look, there's also data on the distinction between biologically primary and biologically secondary knowledge that David Geary demonstrated to us. So, for the components of human cognitive architecture, there's independent data on those, independent of instructional issues.
Those issues were not discussed with instruction in mind. Cognitive load theory has brought together those issues, but they're components of cognitive architecture for which there's a lot of data.
[00:26:58] Anna Stokke: Okay, perfect. Thank you for making that part clear. And then I wanted to ask about cognitive load. So cognitive load is essentially when your working memory gets overloaded. Is that correct?
[00:27:11] John Sweller: Yeah, we have a cognitive load because of the limitations of working memory when dealing with novel information as discussed before. So, the theory is called cognitive load theory because that's so central. In some way, I wonder whether it shouldn't have been called the long-term memory theory or something, because long-term memory is, in some ways, it's more central than working memory.
But anyway, for historical reasons, it ended up being called cognitive load theory.
[00:27:36] Anna Stokke: Got it. Okay, and so the idea really with cognitive load theory is to kind of minimize cognitive overload, right? So, design instruction to minimize cognitive overload.
And is there a way to measure cognitive load, like to measure when someone's cognitively overloaded?
[00:27:55] John Sweller: We use what are called subjective ratings. It's extremely simple. You give somebody something to learn and then you ask them afterwards on a rating scale from one to nine, how difficult did you find this? Where one is very, very easy and nine is very, very difficult.
You can get a subjective measure of it in that way. And that's also useful because when we put forward a hypothesis, this instructional procedure is going to result in a heavy working memory load, a heavier cognitive load than this other procedure, we can not only determine on theoretical grounds that it ought to be a heavier working memory load, but we can also get some measurement of it by using that subjective rating.
Get several dozen participants in an experiment and ask them, how difficult did you find this? And if you find instructional procedure A gets a rating of one or two indicating it's easy and instructional procedure B gets a rating of seven or eight indicating it's very difficult, that gives you an idea of the difference between the procedures.
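[A minimal sketch of the comparison John describes, with hypothetical ratings; the numbers and the two conditions are illustrative only, not data from any study:]

```python
# Compare mean subjective difficulty ratings (1 = very easy,
# 9 = very difficult) between two instructional procedures.
from statistics import mean

procedure_a = [2, 1, 3, 2, 2, 1, 3, 2]   # e.g., worked examples
procedure_b = [7, 8, 6, 7, 8, 7, 6, 8]   # e.g., unguided problem solving

print(f"Procedure A mean rating: {mean(procedure_a):.1f}")  # 2.0
print(f"Procedure B mean rating: {mean(procedure_b):.1f}")  # 7.1
```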
[00:29:14] Anna Stokke: And then I was wondering, like, do you think, could you also sort of measure it in the same way? I've talked to some people that do precision teaching and things like that. So, they're really keen on rate measures and fluency. You can measure if someone's fluent with a skill by really seeing how quickly they can do it.
Would that sort of thing help measure cognitive overload? Could you tell if someone's cognitively overloaded if they're taking a really long time to complete a skill that really shouldn't take that long?
[00:29:52] John Sweller: We sometimes use that. If you use this procedure, people take a long time to learn. If you use this other procedure, people learn it much more quickly.
And we attribute that to cognitive load. It's not as direct a measure of cognitive load as a rating scale, but it acts as a very good proxy for working memory load. If somebody's taking a long time to learn something, there's probably a heavy cognitive load associated with it.
[00:30:18] Anna Stokke: So, I'd like to talk about some of your work on problem solving, particularly in math because actually, a lot of your work has been done on math or science, right? If I remember correctly.
[00:30:31] John Sweller: Yeah, the majority of it has. More recently, we've started working on humanities areas and essay writing, and exactly the same principles apply. But much of the work has been carried out using mathematics here. In a way, that's almost been coincidental.
The PhD students I've had who have come along tended to be people from mathematics backgrounds.
[00:30:52] Anna Stokke: It's true. Greg Ashman has a background in physics, right?
[00:30:56] John Sweller: Yeah, that's right. And as a teacher, he seems to be teaching more maths than physics.
[00:31:01] Anna Stokke: Yeah, I've had him on the show before. So just so listeners know that Greg Ashman was your PhD student, right? And he does great work, especially just bringing this stuff to the public and teachers in general and explaining it in ways that people can understand.
So, we're grateful for his work too.
[00:31:23] John Sweller: Yeah, he's able to do that in a manner better than anybody else I know. He's superb at it.
[00:31:29] Anna Stokke: Let's talk about problem solving. So, before we do that, I actually read your paper from, I think, 1988, on problem solving, just very recently.
You use that term, and I just want to make sure I understand what you mean by the phrase problem solving, so that the listeners know what we're talking about too.
[00:31:51] John Sweller: Problem solving occurs when you give somebody a problem and ask them to solve it rather than showing them how to solve it. Perhaps the easiest way to talk about it is to talk about one of the cognitive load theory effects, easily the most commonly studied one, called the worked example effect.
And the worked example effect occurs when you get, let's say, two groups, frequently more than two groups, but for simplicity's sake, let's talk about two groups, one group of students and they're learning new material and they've been taught the material in whatever way they're taught. And then they're given a series of problems to solve. That's not an unusual way of teaching mathematics.
You teach people something and then you get them to solve a whole lot of problems on it. That's the first group. Let's call that the problem solving group.
And the second group taught exactly the same material in exactly the same way, but instead of being given a whole lot of problems to practice on, they're given exactly the same problems along with their solutions. And the solution is a worked-out solution. You indicate to people, okay, this is the problem.
Here's how you solve it. And you present the solution in whatever detail is normal in that area. And then you give everybody a test.
Let's see, is there any difference? And if there's a difference, who's better at solving the test problems? And the usual result, and this experiment has been replicated goodness knows how many times all over the world, is that the group who are shown the worked examples virtually invariably obtain better results on the test than the group who are just given the problems to solve. And that occurs almost every single time an experiment like that is run.
And it occurs not only on problems that are very similar to the original problems, but also on near-transfer problems, not far-transfer, but near-transfer problems that bear some similarity to the original problems; they're solved better as well. So that's the worked example effect.
And most of those results have come from mathematics because mathematics uses problems. But while not directly relevant to this particular discussion, you need to note that the same results are obtained in areas like humanities, where the problem may not be a mathematics problem, might be something like, what are the causes of World War I and discuss them. In other words, essay-type questions.
People are better off if they read ideal answers as worked examples than if they try to answer that themselves in the first instance. Once they've studied a lot of these worked examples, they're much better at dealing with any of the issues associated with that area in education.
[00:35:08] Anna Stokke: OK. So, in math, by problem solving, you mean a type of instruction, really. You're not giving them worked examples that show them how to solve the problems.
[00:35:18] John Sweller: The instruction consists of, well, you've got to teach them something explicitly to begin with. But after that, you give people lots and lots of problems to solve.
It's really the way most of us, depending on the era in which we were educated, most of us would have learned mathematics. You know, the old textbooks would consist of, you'd explain some mathematical principle. You'd give one or two worked examples.
And then at the end of the chapter, you'd have a whole lot of problems. And the variations of that are still being used.
[00:35:58] Anna Stokke: We're not talking about problem solving being constructivism, say.
That would be an extreme form of problem solving. We're mostly just talking about not giving people worked examples or sort of scaffolding the instruction, at least in some way, so that the students know how to solve these examples.
[00:36:19] John Sweller: Absolutely. In fact, your term, extreme form of problem solving or constructivism, that's accurate. Some people take the view you should not really give any form of instruction. All you should do is put somebody into, I guess, a problem-solving environment and let them sort of work out things for themselves.
And again, that goes right back to what I started with, the biologically primary and biologically secondary, because constructivism arose precisely from that sort of scenario. You know, we learned to listen and speak so easily just being put in a listening and speaking environment. You should learn mathematics in the same way.
Put somebody into a mathematics environment and they'll learn mathematics just by looking at problems and solving them and playing around with them. And that's all they have to do. And they'll pick it all up. It works for listening and speaking.
It doesn't work for mathematics.
[00:37:19] Anna Stokke: So now we've established what we mean by problem solving. A really important thing to talk about here are schemas. So, what are schemas?
[00:37:28] John Sweller: I'll say a few words about them. I tend to no longer use the term, and I just use knowledge. A schema is simply knowledge.
But you can think of a schema as a situation where you see something and you understand the various elements that go to make up whatever it is you're looking at. I used the example before, and I'll keep using the same equation.
A plus B all over C equals D, solve for A. You've got a schema for that. As a mathematician, you've got a schema for that. You look at that, you immediately recognize the problem.
You know what sort of a problem it is. You know what the elements are that constitute the problem. You know what the elements are that constitute the solution.
You know the solution. That's the schema. And you've got literally hundreds of thousands of those schemas.
Some of the evidence for schemas came from mathematics education. A lot of it came from the game of chess. Decades ago, back in the late 1940s, a Dutch psychologist wanted to know why chess grandmasters always win when they're playing weekend players.
And he couldn't find any reason for it. They don't look ahead more moves. They don't consider a greater range of moves.
He couldn't find any difference between chess grandmasters and weekend players except for one difference. He gave chess grandmasters a board configuration taken from a real game, showed it to them for five seconds, took it away, asked them, put the pieces back in the way you've just seen. And they would do that with an 80 to 90 percent accuracy.
They were very good at it. He did the same with weekend players. They had about a 25 percent accuracy.
Now, does that mean that the chess grandmasters have a much better working memory? And this goes back to the question you asked before, how do we measure working memory? Turns out they don't have a better working memory because if you repeat the experiment with random board configurations, everybody's at about 25 percent. Chess grandmasters always win because each board they come to, let's say they're playing a dozen weekend players simultaneously, they come to each board, they look at the configuration, they recognize it, they know the best move for that configuration, and they make that move. That's a schema.
It's probably easier and more sensible to just talk about, they've got knowledge in long-term memory concerning board configurations, and mathematicians like you have exactly the same knowledge but in mathematics. You've got schemas or knowledge concerning hundreds of thousands of problem states. You tend not to be aware of them because we're not aware of what's in long-term memory, we're only aware of what's in working memory.
Working memory is consciousness and most of what's in long-term memory we're not aware of.
[00:40:27] Anna Stokke: It's interesting and I think it's really instructive to think about this, to think about the difference between the way a novice thinks and the way an expert thinks. And I think that sometimes in instruction, people really overestimate what novices can do. And I think they get it wrong.
They give them problems that they don't have the knowledge to solve. And what happens is they start sort of applying these trial and error approaches, right? And then they start making mistakes and those errors get committed to long-term memory, right? The wrong ways of doing things. I've seen this a lot.
And so, the example I was going to mention, just because I think it will be kind of instructive for math, is a system of equations in two unknowns. So, the first equation is x plus y is five and the second equation is two x minus y is eight.
So, if I look at that system of equations, I will immediately see that the best thing to do is to add the two equations together because the y's are going to cancel. Then I'm going to solve for the x and I'm going to plug it back in to get the y. And there are a number of ways to solve a system like that. But what a novice learner will do if they don't have these techniques in long-term memory, they'll start trying to plug in values for x and y and come up with some solution or do something really complicated.
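[Written out, the elimination Anna describes; adding the two equations cancels the y terms:]

```latex
\begin{aligned}
x + y &= 5\\
2x - y &= 8\\
\text{adding:}\qquad 3x &= 13
\;\Rightarrow\; x = \tfrac{13}{3},\quad y = 5 - x = \tfrac{2}{3}
\end{aligned}
```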
The first thing I was talking about, when you look at it and you can just see it, that's kind of like what we call mathematical intuition, which you used to call schemas, right? Which now you're calling knowledge, but that's fine. And I think it's really important because we need to understand that novice learners really need to get the right instruction, like what you're talking about, the worked example, right? So that they can develop those schemas.
[00:42:26] John Sweller: Absolutely. And once they've developed the schema or knowledge for that particular problem, it's a simple problem at that point. It's an impossible problem before and a simple problem afterwards. It makes all the difference.
And that knowledge is held in long-term memory. We hold enormous numbers of those schemas or that knowledge in long-term memory. And you often hear people saying, oh, look, I've got a terrible memory.
They don't. We're just unaware. We just take for granted what's there.
And the other thing we take for granted is how long it can take to acquire that knowledge. In the days when I used to be training teachers, I'd tell the trainees the first time they went out teaching, look, don't go into the classroom and tell the students everything you've ever learned about this particular topic. We forget how long it takes us to acquire all that knowledge.
It takes a long time. There's an enormous amount. And that knowledge that we've got can be an impediment to us transmitting that knowledge to other people because it's so simple for us.
We've forgotten how complex it used to be when we first started learning it.
[00:43:46] Anna Stokke: Yes, exactly. And so, the other thing I found interesting in that paper I was reading of yours is that when you give students a list of math problems, so you can have a list of math problems and maybe some of them involve triangles.
So, you could have trigonometry and you could have a similar triangle. You would solve one by trigonometry, one by similar triangles. One's maybe an area problem.
And then maybe you have some other shapes, maybe you have some ladders and that sort of thing. And the difference between the way that the novices would categorize those problems versus the experts. So, can you talk a little bit about that?
[00:44:26] John Sweller: Novices will almost invariably categorize problems according to, I guess, the visual parameters of the problem.
This problem looks similar to this other problem. And sometimes from a mathematical perspective, two problems are really similar, but look really different. And of course, an expert mathematician knows this problem doesn't look like anything like this other problem, but they're really the same problem.
And alternatively, these two problems look as though they're the same, but they're not the same. They're dramatically different. Once you're highly knowledgeable in an area and you know the solutions, you know the solution to this problem, which looks unlike this other problem, is exactly the same for both problems.
The solution is the same. So, you categorize them differently as a consequence of that.
[00:45:27] Anna Stokke: Yeah. I mean, it's interesting and it really lends to that other thing you were talking about. The experts really think in a different way than the novices. And so, of course, the corollary to that is that you can't teach novices and experts the same way, right?
[00:45:46] John Sweller: Yeah.
[00:45:47] Anna Stokke: Or you shouldn't.
[00:45:58] John Sweller: No, you shouldn't. And even someone who's more expert may wish to become even more expert and the procedures you use would be different.
[00:46:00] Anna Stokke: And I think sometimes the logic goes, though, that if you want to teach someone to solve problems, you give the problems to solve, right?
[00:46:11] John Sweller: Yeah. Look, I should talk about another of the effects because that relates directly to the point you're making, and that's called the expertise reversal effect. If you've got a novice, let's go back to worked examples, you should start by giving them lots of worked examples.
As their expertise increases, they still need to learn more, but they don't need to study problems anymore. They need to solve problems. In other words, you switch from studying problems to solving problems.
And you find that at certain levels of expertise, problem solving is better than studying worked examples. So, you run that worked example experiment with somebody who's a little bit more expert, you find that for those people, what they need is lots and lots of practice at solving problems. Once you've looked at a worked example and solved one or two problems, you now know how to solve those types of problems.
But the word that's used in this area is automatize. You have a certain set of procedures that you've learned. You've now learned them, you've understood them, but you still need to think about them.
You know, A plus B equals C. How do I make A the subject of the equation? Do I divide everything by B? No, that doesn't work. No, no. I subtract B from both sides.
You now understand that. You know how to do it, but you've got to think about it. Now at some point, when you're faced with something like that, you don't think about it.
You know, what do I do with this B? You just know automatically, okay, if I want to make A the subject of the equation, I subtract B from both sides. You don't try to work it out. Well, before you reach that point, you need a lot of practice at those sorts of problems, just so it can become automatic without you having to think about it.
In other words, if you've got to think about something, you're using working memory. If it's automatic, you're no longer using working memory and you can use working memory for other things. So, at a certain point, you need to stop studying worked examples and start practicing solving problems just to make things automatic.
Because until they're automatic, you can't think of other things. Because at some point, you don't just want to know how do I solve a problem like A plus B equals C, solve for A. You have to know the mathematics behind it automatically, effortlessly, so that you can use that on more complicated problems. That way you can use working memory on the more complex issues because you've automated how do I get rid of that addend on the left-hand side and put it on the right-hand side.
Once that's automated, then you know how to solve that sort of problem easily and effortlessly. But that requires a lot of practice at problem solving. You don't have to study worked examples which show you how to subtract B from both sides in order to make A the subject of the equation.
You don't have to do that anymore. You know that, but you still have to think about it. At some point, if you're going to use that information, you need to be able to use it without thinking about it to give you the spare working memory capacity to think about whatever else you're now learning.
[00:49:53] Anna Stokke: Yeah, precisely. And that's like fluency, being able to do something effortlessly and automatically, right? That's a really important stage of the learning process.
How does a teacher know when they can move a student past the worked examples onto problem solving?
[00:50:13] John Sweller: That is a major research question that we've been studying for quite a while. We're making some progress on it. But ultimately, at this point, the best advice I can give is, once you're an experienced teacher, you realize, you understand.
In other words, you're learning to become a teacher, let's say. And then you as a teacher realize, look, at this point, they shouldn't be studying worked examples anymore. They're not learning anything further. They understand this. They need to be automatizing it.
And you switch over to problem solving. The purpose of problem solving ought to be to automatize the information you've acquired. If you're using problem solving in order to work out what you ought to be doing, you should still be studying worked examples.
[00:51:08] Anna Stokke: And I guess, I mean, you could assess the students to see if you're now at that point where you can move them ahead.
[00:51:14] John Sweller: An answer to your earlier question, how do you know when to shift? The most common answer we came up with is assessment. And we've worked out some rapid means of assessment, which are there, in effect, not to tell the student anything, but to tell you, so you can tell the student, this is what you ought to now be doing.
[00:51:36] Anna Stokke: The other thing I wanted to ask about was this theory of productive failure or productive struggle, because your work suggests that instructional design should actually minimize cognitive overload. But there's this work on productive failure by Kapur.
I'm not really sure how to say the name. The idea that getting students to struggle with a problem, which obviously would overburden working memory, and then later providing instruction after that might improve learning. Do you know about that work and do you have any thoughts on it?
[00:52:12] John Sweller: Yeah. Look, we can't get those experiments to work. I think that's the shortest answer. This idea that we ought to be giving people problems to solve right from the beginning,
it's such a long, slow way of acquiring information. Again, going right back to the beginning, biologically primary, biologically secondary: giving people problems to solve is not an effective way of learning. For most students, it's not motivating.
There's no shortage of students out there who have been problem solving mathematics all the time they've been doing mathematics. And the main consequence of that is an "I don't want to ever see mathematics again" attitude, which is the last thing we want. It simply doesn't work.
Experiments which demonstrate it working tend to be experiments where people have sufficient knowledge to the point where they really need to practice. And again, that goes back to the expertise reversal effect. Yeah, you need to solve problems, but only once you have sufficient knowledge to be able to solve them and you're practicing at solving them.
Attempting to figure something out for yourself, that's problem solving, and as I said earlier, we don't need to teach people how to solve problems. It's a biologically primary skill. Everybody solves problems using a particular technique. It's called means-ends analysis.
And that just means you look at where you are, you look at where you have to go, you extract the differences between them, and you find a problem solving operator that will reduce those differences. Nobody teaches us that. Biologically primary, we do it automatically.
And having people do that repeatedly on a problem they can't solve, it gains them nothing.
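[A minimal sketch of the means-ends loop John describes, on a hypothetical toy problem; the operators and numbers are illustrative, not from the episode:]

```python
# Means-ends analysis on a toy numeric problem: look at where you
# are, look at the goal, and repeatedly apply whichever operator
# most reduces the difference between the two.
def means_ends_analysis(start, goal, operators, max_steps=50):
    state, path = start, []
    for _ in range(max_steps):
        if state == goal:
            return path
        # Extract the difference; pick the operator that reduces it most.
        best = min(operators, key=lambda op: abs(op(state) - goal))
        if abs(best(state) - goal) >= abs(state - goal):
            return None  # no operator reduces the difference; stuck
        state = best(state)
        path.append((best.__name__, state))
    return None

# Hypothetical operators, for illustration only.
def add_three(n): return n + 3
def double(n): return n * 2
def subtract_one(n): return n - 1

# From 2, reach 10: add three (-> 5), then double (-> 10).
print(means_ends_analysis(2, 10, [add_three, double, subtract_one]))
```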
[00:54:10] Anna Stokke: Okay. So, you're not a fan of productive failure.
[00:54:13] John Sweller: No, I'm afraid I'm not. Nah.
[00:54:15] Anna Stokke: When you said you haven't been able to get those experiments to work, I guess you're being very polite.
But I think if I understood correctly, you've done research on that and it did not demonstrate that it was effective. Is that right?
[00:54:28] John Sweller: Yeah. In fact, you mentioned Greg Ashman before. A lot of his PhD was precisely on that issue.
[00:54:34] Anna Stokke: I think he talked about that when I had him on too.
[00:54:37] John Sweller: He couldn't get it to work. And I've done work with other people after him and we couldn't get it to work either.
[00:54:45] Anna Stokke: Let's move on to our final question. So, your work on cognitive load theory has been around since the 80s.