Computer Adaptive Testing

Presentation Transcript

Slide 1 : 1 by Derya ÇOKAL & Ferit KILIÇKAYA

Terminology : 2 Terminology Computer-assisted testing; Computer-based testing; Linear; Computer-adaptive testing; Non-adaptive; Computerized assessment

Computer-based Testing : 3 Computer-based Testing Computer-based Testing: Linear (Non-adaptive); Partly Adaptive; Adaptive

Computer-adaptive Testing (CAT) : 4 Computer-adaptive Testing (CAT) A form of computerized assessment. Selects items according to the test taker’s performance on previous items. Accommodates the test taker’s estimated ability. Confronts the examinee with items that best measure that ability.

Computer-adaptive Testing: How does it work? : 5 Computer-adaptive Testing: How does it work?

CAT Item Bank : 6 CAT Item Bank pool of items with established content specifications and item parameters intended to measure examinees’ abilities at various levels

Issues to consider in creating Item Banks : 7 Issues to consider in creating Item Banks Identify and describe the L2 aspects being measured (L2 content domain); test specifications; coverage of content at all ability levels; items and testlets; pilot-testing; calibration (difficulty level, discrimination power and guessing index); Item Response Theory (IRT)

Item Response Theory (IRT) vs. Classical True Score Theory (CTS) : 8 Item Response Theory (IRT) vs. Classical True Score Theory (CTS) (Bachman, 1990, p. 210)

Item Characteristic Curves (ICC) : 9 Item Characteristic Curves (ICC) (a) The degree to which the item discriminates among individuals of differing levels of ability (the ‘discrimination’ parameter). (b) The level of difficulty of the item (the ‘difficulty’ parameter). (c) The probability that an individual of low ability can answer the item correctly (the ‘guessing index’). Please visit http://www.assess.com for Psychometric Software and Books and Electronic Tests
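
The three parameters on this slide are those of the three-parameter logistic (3PL) IRT model. The slide itself gives no formula, so the following is only a minimal sketch of how such a curve is commonly computed, assuming the plain logistic form without the optional scaling constant (some texts multiply `a` by 1.7). The function name `icc_3pl` is hypothetical.

```python
import math

def icc_3pl(theta, a, b, c):
    """Probability of a correct response under a 3PL model (assumed form).

    theta: examinee ability
    a: discrimination parameter
    b: difficulty parameter
    c: guessing index (lower asymptote of the curve)
    """
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

# When ability equals difficulty, the probability is halfway between c and 1.
print(icc_3pl(theta=0.0, a=1.2, b=0.0, c=0.2))
```

In this form, a very low-ability examinee still answers correctly with probability close to `c`, which is why `c` is called the guessing index.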

Item Selection Algorithm : 10 Item Selection Algorithm A procedure that selects the most appropriate item from the CAT Item pool for each test taker depending on the questions seen and answers given.
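
The slide does not say which selection rule is used. One common choice is to pick the unseen item with the greatest Fisher information at the current ability estimate; the sketch below assumes that rule and a 3PL model without a scaling constant. The item bank, its parameter values and all function names are hypothetical.

```python
import math

def prob_correct(theta, a, b, c):
    """3PL probability of a correct response (assumed model)."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

def item_information(theta, a, b, c):
    """Fisher information of a 3PL item at ability theta."""
    p = prob_correct(theta, a, b, c)
    return a ** 2 * ((1.0 - p) / p) * ((p - c) / (1.0 - c)) ** 2

def select_next_item(theta, bank, administered):
    """Return the id of the most informative item not yet administered."""
    candidates = [item_id for item_id in bank if item_id not in administered]
    return max(candidates, key=lambda item_id: item_information(theta, *bank[item_id]))

# Hypothetical bank: item id -> (a, b, c)
bank = {
    "easy": (1.0, -1.5, 0.2),
    "medium": (1.0, 0.0, 0.2),
    "hard": (1.0, 1.5, 0.2),
}

print(select_next_item(0.0, bank, administered=set()))
print(select_next_item(0.0, bank, administered={"medium"}))
```

After each response the ability estimate would be updated and the selection repeated; the estimation step is left out here to keep the sketch short.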

CAT Entry & Exit Points : 11 CAT Entry & Exit Points Entry Point: items of average difficulty; self-assessment; sample items and then real ones; demographic information. Exit Point: test length (variable or fixed).
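
As a rough illustration of an entry point based on average difficulty and an exit point based on test length, the sketch below starts with the item whose difficulty is closest to the scale midpoint and stops either at a fixed maximum length or, for variable-length tests, once the standard error of the ability estimate is small enough. The thresholds (`max_items=30`, `se_target=0.3`), the midpoint of 0.0 and the function names are assumptions, not values from the slides.

```python
import math

def first_item(difficulties):
    """Entry point: the item whose difficulty is closest to average (0.0)."""
    return min(difficulties, key=lambda item_id: abs(difficulties[item_id]))

def standard_error(total_information):
    """Standard error of the ability estimate from accumulated item information."""
    return 1.0 / math.sqrt(total_information)

def should_stop(items_given, total_information, max_items=30, se_target=0.3):
    """Exit point: fixed-length cap, or variable length once precision is reached."""
    if items_given >= max_items:
        return True
    if total_information <= 0:
        return False
    return standard_error(total_information) <= se_target

# Hypothetical difficulties: item id -> b
print(first_item({"q1": -1.2, "q2": 0.1, "q3": 0.9}))
print(should_stop(items_given=10, total_information=4.0))
print(should_stop(items_given=10, total_information=16.0))
```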

Test Score Comparability : 12 Test Score Comparability Paper and pencil test scores were often higher than scores from the CBTs. This difference in scores was generally quite small and of little practical significance (Inouye and Olson, 1989). There is no medium effect for carefully constructed power tests. No effect was found for adaptivity (Mead and Drasgow, 1993).

Content balancing : 13 Content balancing Content balancing assures that the content domain is covered adequately and represented appropriately (It provides critical content validity evidence and maintains the primacy of content over all other considerations).
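
The slide does not describe a particular balancing procedure. One simple approach, sketched here under that caveat, is to draw the next item from whichever content area is furthest below its target share of the test. The content areas, target proportions and function name are hypothetical.

```python
def next_content_area(target_proportions, administered_counts):
    """Return the content area with the largest shortfall against its target share."""
    total = sum(administered_counts.values())

    def shortfall(area):
        observed = administered_counts.get(area, 0) / total if total else 0.0
        return target_proportions[area] - observed

    return max(target_proportions, key=shortfall)

# Hypothetical test specification and items administered so far
targets = {"grammar": 0.5, "vocabulary": 0.3, "reading": 0.2}
counts = {"grammar": 3, "vocabulary": 1, "reading": 0}

print(next_content_area(targets, counts))
```

The item selection step would then be restricted to items from the returned area, so content coverage takes priority over statistical criteria.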

Test Score Comparability : 14 Test Score Comparability Test takers’ previous experiences with and attitudes towards computers, as well as their backgrounds, need to be considered (Fulcher). Results show no practical differences between computer-familiar and computer-unfamiliar test takers on TOEFL and its subparts (Taylor et al., 1998). However, more research is needed. Local settings should also be considered.

Item Exposure : 15 Item Exposure Item exposure occurs when students see any given item. Exposure means that other students who might take the test in the future may know the exact content of the items that have been exposed; such items should not be used again. Legal disclosure laws (New York State has a “truth in testing” disclosure law).
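
The slides do not name an exposure-control method. As a minimal sketch, an item bank could track how often each item has been shown and withhold items whose exposure rate exceeds a cap. The cap of 0.2, the counts and the function name are assumptions for illustration only.

```python
def eligible_items(exposure_counts, tests_administered, max_rate=0.2):
    """Return ids of items whose exposure rate does not exceed max_rate."""
    if tests_administered == 0:
        return sorted(exposure_counts)
    return sorted(
        item_id
        for item_id, count in exposure_counts.items()
        if count / tests_administered <= max_rate
    )

# Hypothetical counts: how many of 100 test sessions showed each item
exposure_counts = {"q1": 50, "q2": 10, "q3": 15}

print(eligible_items(exposure_counts, tests_administered=100))
```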

Innovations : 16 Innovations Innovative features that can be used by computer-administered items include sound, graphics, animation and video. COMPASS (writing ability test): Examinees are presented with a writing passage and asked to edit any or all segments of the passage for grammar, organization, or style. E-RATER is trained to duplicate the performance of human raters by being fed samples of open-ended essays (GMAT essays) that have previously been scored by human raters (http://www.ets.org/research/erater.html).

Innovations : 17 Innovations PhonePass: Users of PhonePass are given sample tasks in advance, and then have to respond to similar tasks over the telephone in “interaction” with a computer. Tasks include reading aloud, repeating sentences, saying opposite words, and giving short answers to questions. Studies show the test to exhibit a reliability coefficient of 0.91 and a correlation of 0.88 with the ETS Test of Spoken English (http://www.ordinate.com).

TOEFL (CAT) : 18 TOEFL (CAT) Test purpose: To evaluate the English proficiency of people whose native language is not English (Educational Testing Service, ETS). Primary uses: As part of the standard admissions procedure for non-native speakers of English seeking admission to undergraduate and postgraduate programmes in colleges and universities where English is the language of instruction (in Canada and the USA) (ETS).

Listening : 19 Listening has two parts, the first consisting of one- or two-turn dialogues or casual conversations and the second of a series of mini-lectures and/or group discussions, presented only once. No note-taking. Multiple-choice (selecting one or two options or a visual). Matching exercise (matching or ordering objects or lists). Materials in the academic domain in two broad categories: academic issues and longer monologues or exchanges in academic listening situations. To identify main ideas, supporting ideas and/or important details or to infer the meaning of one or both of the speakers. Inferences including the pragmatic implications of what one has said. Idiomatic expressions and grammatical constructions. 15 minutes to answer 30 questions.

BREAK!.. : 20 BREAK!..

Structure : 21 Structure Prompts: Single sentences. Items: The first type of items consists of incomplete sentences and the second requires identifying the error from among four possibilities. Multiple-choice. Prompts from non-specialist expository texts about a variety of topics. To check whether the test-takers can recognize appropriate standard written English. 15 minutes, 20 questions.

Reading : 22 Reading Varied in topic (human and social geography, sciences and art history). Most of the texts are expository and quite complex within the limits of their length. Multiple choice, selecting one or two options or a visual, inserting an additional sentence. “Gist” questions, ability to make inferences, read for detail and knowledge of specific vocabulary items, understanding of the principles of discourse structure and textual cohesion. Non-adaptive. Returning to previously answered items. 70 minutes, 44 questions (4 passages/11 questions for each).

Writing : 23 Writing A single prompt. Composed on screen or handwritten on paper. Paper for notes, but notes are not taken into account. Domain of the task is academic. Test-takers are expected to take a position and then defend it by developing ideas and supporting them with examples or evidence; to organize their ideas logically; to use a variety of syntactic forms available in standard written English; and to select appropriate formal vocabulary to express their meaning. No word limit. 30-minute time limit including planning, organizing and writing the essay.

CAT (Advantages) : 24 CAT (Advantages) Testing considerations: Computers are much more accurate at scoring selected-response tests than human beings are. Computers are more accurate at reporting scores. Computers can give immediate feedback in the form of a report of test scores, complete with a printout of basic testing statistics. IRT and computer-adaptive testing allow testers to target the specific ability levels of individual students and can therefore provide more precise estimates of those abilities. The use of different tests for each student should minimize any practice effects, studying for the test, and cheating. Diagnostic feedback can be provided very quickly to each student on those items answered incorrectly if that is the purpose of the test. Such feedback can even be fairly descriptive if artificial intelligence is used (Brown, 1997).

CAT (Advantages) : 25 CAT (Advantages) Human considerations: The use of computers allows students to work at their own pace. CATs generally take less time to finish than traditional paper-and-pencil tests and are therefore more efficient. In CATs, students should experience less frustration than on paper-and-pencil tests because they will be working on test items that are appropriate for their own ability levels. Students may find that CATs are less overwhelming (as compared to equivalent paper-and-pencil tests) because the questions are presented one at a time on the screen rather than in a test booklet with hundreds of test items. Many students like computers and even enjoy the testing process. (??)

CAT (Disadvantages) : 26 CAT (Disadvantages) Physical considerations: CAT development is quite involved and costly. A high level of expertise and sophistication in computer technology and psychometrics related to CAT is required; as Dunkel (1997) points out, “It takes expertise, time, money and persistence to launch and sustain a CAT development project.” Computer equipment may not always be available or in working order. Reliable sources of electricity are not universally available. Screen capacity is another physical consideration. Screen size limitations could be a problem, for example, for a group of teachers who wanted to develop a reading test based on relatively long passages. In addition, the graphics capabilities of many computers (especially older ones) may be limited, and even those machines that do have graphics may be slow (especially the cheaper machines). Thus, tests involving even basic graphs or animation may not be feasible at the moment in many language teaching situations.

CAT (Disadvantages) : 27 CAT (Disadvantages) Performance considerations: The presentation of a test on computer may lead to different results from those that would be obtained if the same test were administered in a paper-and-pencil format (Henning, 1991). Some limited research indicates that there is little difference for math or verbal items presented on computer as compared with paper-and-pencil versions (Green, 1988) or on a medical technology examination (Lunz & Bergstrom, 1994), but much more research needs to be done on various types of language tests and items. Differences in the degree to which students are familiar with using computers or typewriter keyboards may lead to discrepancies in their performances on computer-assisted or computer-adaptive tests (Hicks, 1989; Henning, 1991; Kirsch, Jamieson, Taylor, & Eignor, 1997). Computer anxiety and its potential effects on test performance are another possible disadvantage (Henning, 1991).

CAT : 28 CAT Is computer-adaptive testing just for proficiency exams? TOEFL will be linear (non-adaptive) in September 2005. Why?

THE NEXT GENERATION OF TOEFL : 29 THE NEXT GENERATION OF TOEFL Presentation of the next generation of TOEFL. While watching the presentation, please keep in mind the limitations/drawbacks we have discussed. Are they avoided, or do we face new ones? Any innovation/change? Impact?
