What is the exam-grading process?
THE GRADING PROCESS (How ASQ examinations are graded)
Just as great care is taken in developing an exam, ASQ goes to great lengths to ensure that the grading process provides an accurate assessment of a candidate’s proficiency.
ASQ uses procedures that meet the Standards for Educational and Psychological Testing, which were developed jointly by the American Educational Research Association (AERA), the American Psychological Association (APA) and the National Council on Measurement in Education (NCME).
Cut Score Process (The process to determine the passing score)
The passing grade, or cut point, is established through a procedure called a “cut score study.” The methodology used for ASQ exams is called the Modified Angoff method and is based on the work of the late William Angoff, a renowned measurement research statistician in Princeton, NJ.
The cut point for an ASQ exam is established each time a body of knowledge (BOK) is created or revised. For this process, a panel of 12 to 15 subject matter experts, also called judges, is convened. The panel’s first task is to set the performance standard for the exam. Through consensus they determine a set of characteristics that they expect of a minimally qualified or “borderline” candidate in relation to the BOK. The distinction regarding borderline candidates is key to understanding the cut score process, as it ultimately draws a very fine line between candidates who are qualified to be certified and those who are not. The expectations for performance, therefore, need to be clearly stated and agreed to by all of the participants in the study.
Once that list of characteristics is developed, the subject matter experts use it as a guide to help them rate each question on the test in terms of what proportion of 100 such borderline candidates will get the answer right. For example, the judges may agree that borderline candidates will know a particular topic in the BOK very well when asked a definition question, and therefore they may estimate that 85% to 90% will get it right. But the same candidates will be much more challenged in that topic when required to apply a specific formula to get the correct answer (resulting in estimates of 35% to 45% correct).
The results of this two-day cut score study are then presented to the Certification Board. Along with the written expectation of performance that the panel developed, the summary of the judges’ combined estimate of the difficulty of the exam is presented as the recommended cut point for the exam. Once that raw cut score point is established by Board approval, it is converted to a scaled score, which becomes the minimum score necessary to earn certification in that BOK.
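The arithmetic behind the judges’ combined estimate is not spelled out here. A minimal sketch, assuming the common Angoff convention of averaging the judges’ ratings for each question and summing those averages, might look like the following. The function name and the ratings are hypothetical and are not ASQ data.

```python
from statistics import mean


def angoff_cut_score(ratings):
    """Return the raw cut score implied by a table of judge ratings.

    ratings[j][i] is judge j's estimate, as a proportion between 0 and 1,
    of borderline candidates who would answer question i correctly.
    Assumption: ratings are averaged per question, then summed.
    """
    n_items = len(ratings[0])
    if any(len(judge) != n_items for judge in ratings):
        raise ValueError("every judge must rate every question")
    # Average the judges' estimates for each question.
    item_means = [mean(judge[i] for judge in ratings) for i in range(n_items)]
    # The sum is the expected number correct for a borderline candidate.
    return sum(item_means)


# Hypothetical ratings: 3 judges, 4 questions (illustration only).
example_ratings = [
    [0.90, 0.40, 0.70, 0.60],
    [0.85, 0.35, 0.75, 0.60],
    [0.85, 0.45, 0.65, 0.60],
]
print(round(angoff_cut_score(example_ratings), 2))  # 2.57
```

On a real exam the table would have one row per judge and one column per question, so the sum would be a raw score on the scale of the full test.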
Although the raw cut score is established for a specific number of questions correct for the first exam under a BOK, the scaled score is what is reported to the candidates. This scaled score allows adjustments for exam difficulty on subsequent forms of the test.
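The conversion formula is not given here. As an illustration only, the sketch below assumes a piecewise-linear conversion in which the raw cut score always maps to the same scaled passing score. The function name and the scale values (500 to pass, 800 maximum) are placeholders and are not ASQ’s actual scale.

```python
def to_scaled(raw, raw_cut, n_items, scaled_pass=500.0, scaled_max=800.0):
    """Convert a raw score to a scaled score (hypothetical scale).

    Assumption: the raw cut score maps to scaled_pass, a perfect raw
    score maps to scaled_max, and a raw score of 0 maps to 0, with
    straight-line interpolation in between.
    """
    if not 0 < raw_cut < n_items:
        raise ValueError("raw_cut must lie strictly between 0 and n_items")
    if not 0 <= raw <= n_items:
        raise ValueError("raw must lie between 0 and n_items")
    if raw >= raw_cut:
        # At or above the cut: interpolate between scaled_pass and scaled_max.
        span = (scaled_max - scaled_pass) / (n_items - raw_cut)
        return scaled_pass + (raw - raw_cut) * span
    # Below the cut: interpolate between 0 and scaled_pass.
    return raw * scaled_pass / raw_cut


# Two hypothetical forms of a 150-question exam with different raw cuts:
# meeting the raw cut yields the same scaled score on both forms.
print(to_scaled(100, raw_cut=100, n_items=150))  # 500.0
print(to_scaled(97, raw_cut=97, n_items=150))    # 500.0
```

Under this assumption the scaled passing score stays fixed while the raw cut moves with the difficulty of each form.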
The goal of ensuring that two versions of the same exam have an equivalent degree of difficulty is achieved through a process known as common item equating. Here, ASQ selects a set of questions from the previous exam and embeds them in the next exam. This set of questions, called equaters, is a kind of mini-exam in that the questions are representative of the previous exam’s difficulty level (some easy, some hard, some in the middle) and cover areas of the BOK proportionately. ASQ then develops the rest of the test with different questions, some new and some previously used. This way, ASQ can administer almost entirely new tests each time and still maintain the established standard of performance.
For example, on Test 1, the mean score of the candidates is 111; on Test 2, their mean score is 108. This difference could mean either that Test 1 was easier than Test 2, or that the candidates who took Test 1 were better prepared than the candidates who took Test 2. Before making any adjustments to the cut point based on differences in exam difficulty, more information is needed about the two candidate groups. To gather that information, comparisons are made between the performances of the two groups on the common items (equaters) in the two tests. If the two groups perform equally well on the equaters, then it is safe to conclude that Test 2 is in fact harder than Test 1. Only then is the cut point adjusted to offset the effects of that more difficult exam. Through this method, both tests will fairly assess the candidates’ abilities while maintaining a consistent scaled score to pass.
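The reasoning in this example can be sketched in code. The version below is a deliberately simplified mean-based check, not ASQ’s actual equating procedure: it uses the mean scores of 111 and 108 from the example, while the raw cut of 100, the equater means, and the tolerance are hypothetical values chosen for illustration.

```python
def adjusted_cut(cut_old, mean_old, mean_new,
                 equater_mean_old, equater_mean_new, tolerance=0.5):
    """Return the raw cut for the new test, or None if no conclusion is safe.

    Assumption: if the two groups score about the same on the equaters
    (within `tolerance` raw points), the groups are equally able, so the
    whole difference in mean total score is attributed to test difficulty.
    """
    if abs(equater_mean_old - equater_mean_new) > tolerance:
        # The groups differ in ability; this simple sketch does not
        # attempt to separate ability from difficulty in that case.
        return None
    # Shift the cut by the same amount the mean score moved.
    return cut_old - (mean_old - mean_new)


# Means from the example above; the cut and equater means are hypothetical.
print(adjusted_cut(100, 111, 108, 21.0, 21.0))  # 97
print(adjusted_cut(100, 111, 108, 21.0, 19.0))  # None
```

In the first call the groups match on the equaters, so Test 2 is treated as three raw points harder and its cut drops from 100 to 97. In the second call the groups differ on the equaters, so the sketch declines to adjust.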
This means candidates shouldn’t worry about whether they will get a hard test or an easy test. If they get a hard test, they won’t have to get as many questions right to meet the standard. If they get an easy test, they will have to get more of those easy questions right in order to meet the standard.