Wednesday, March 21, 2018

Testing the Untestable: Standardized Testing and Higher-Level Thought

The perpetual thorn in the flesh of modern education is the standardized, high-stakes test. It is deeply flawed, and yet it is necessary.

It is necessary because there are more than 20,000 high schools in the United States, and students must be measured and evaluated across all of them. College and university admissions offices need a practical way to assess applicants.

Standardized testing is essentially flawed, yet it remains the best option, because any other conceivable procedure would be even more deeply flawed.

The flaw - or, to be kinder, the limit - in any form of standardized testing is that it is forced to choose between measuring low-level knowledge more accurately and measuring high-level knowledge less accurately. Bloom’s taxonomy may be used to roughly approximate what is meant by ‘low-level’ or ‘high-level’ knowledge in this context.

It would be a great advantage to accurately measure higher-level thought, but it is exactly this type of thinking which is most difficult to measure.

Todd Farley, an employee at the Educational Testing Corporation (ETC), explained from an insider’s perspective the processes used to write tests and process the millions of answer sheets. ETC is the parent of Educational Testing Service (ETS), which produces a whole alphabet soup of tests: AP, SAT, GRE, PSAT/NMSQT, and others.

Farley explains that, when such tests venture away from low-level knowledge, they don’t accurately measure higher-level thought.

In the words of Joseph Farrell and Gary Lawrence, attempts at assessing higher-level knowledge don’t establish “whether a student had any real competence about the subject.” Instead, such tests merely verify “that certain ‘keywords’ or concepts occurred in student responses, whether or not the student actually understood their meaning.”

Standardized tests can do very well at measuring a student’s ability to do arithmetic calculations and a student’s mastery of basic facts in history. They can also measure a student’s command of basic English spelling and grammar - including punctuation and capitalization.

Questions about analysis, synthesis, application, and evaluation - i.e., the upper levels of Bloom’s taxonomy - are, however, intrinsically ambiguous and arguable. Such questions cannot be structured for large-scale standardized testing.

Joseph Farrell and Gary Lawrence write:

Recall the critiques addressed so far with respect to some questions, namely, that the test-taker - particularly the more informed one - is put into the position of having to read the mind of the test-maker, which, as it turns out, is not one mind, but several. The test-taker, in other words, must guess at a consensus of minds, not just at one mind. Finally, the test is “pre-tested,” or to use Farley’s term, “validated,” as the test is given to a group of people to compile individual statistics for each question. Once these statistics have been compiled, they are gathered into a descriptive manual along with much other descriptive and technical matter about the test: for example, its aims, the formulas used in computing the statistics, instructions to the prospective user on how to administer the test,” and so on.

Farley offers, as do Farrell and Lawrence, numerous examples of individual test questions. Another scholar, Banesh Hoffmann, gives further instances, e.g., a question about ‘intentionally’ and ‘intensionally’ and another about various uses of the word ‘emperor.’

It is demonstrable that, in many cases, badly formed questions favor the less knowledgeable student. The better-informed student is aware of nuances and counterexamples which speak against the answer which the test’s author hoped to elicit.

Perhaps the best tactic would be to explicitly define standardized testing as limited to lower-level history, mathematics, and English mechanics. A thorough mastery of these would be one goal of secondary education.

A second goal of secondary education would be those higher-level skills which do not lend themselves to mass testing. These critical thinking skills are best learned and practiced in extended reading assignments, in classroom discussion, and in essay-writing.

Schools should not abandon teaching these skills merely because they can’t be well tested. University admissions should be based more on the lower-level skills, because they can be more accurately measured.