An Introduction to Computerized Adaptive Testing

Nathan A. Thompson, Ph.D. Adjunct Faculty, University of Cincinnati Vice President, Assessment Systems Corporation

Welcome! CAT: tests that adapt to each examinee. The purpose of this webinar is to provide an introduction to item response theory as used in CAT, to CAT algorithms, and to implementing CAT.

Welcome! There will be four parts: (1) an introduction to item response theory (IRT); (2) basic principles of CAT (the five components); (3) benefits of CAT; and (4) implementing CAT.

Part 1 Introduction to item response theory

What is IRT? There are two main psychometric theories: classical test theory and IRT. IRT offers distinct advantages; the most important with regard to CAT is that items and examinees are placed on the same scale.

What is IRT? IRT assumes that we can specify a mathematical function, the item response function (IRF), that models the probability of answering an item correctly. The next slide presents a figure from a classical item analysis.

Classical item statistics: The line for the correct answer (blue) should go up as the score group increases, while the distractor lines go down. The line for the correct answer is usually of primary importance.

Classical item statistics: What if we had 10 groups?

Classical item statistics: The general idea of IRT is to find a mathematical model for the line of the correct response (previous slide). This is a special form of regression: we need a curve rather than a straight line.

The item response function: The IRF reflects the probability of a given response as a function of the latent trait (theta).

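The item response function: The x-axis is the latent trait, scaled like the standard z score you learned about in statistics classes. IRFs can slide left or right, which defines item difficulty: an IRF further to the left is an easier item, and one further to the right is a more difficult item.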

The item response function: The location (difficulty) of an item is where the middle of its IRF falls on the x-axis. Therefore, both items and examinees are on the z scale.
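The slides do not commit to a particular IRT model, so the following is only a minimal Python sketch of one widely used IRF, the three-parameter logistic (3PL) model. It assumes the conventional parameters a (discrimination, the slope), b (difficulty, the location), and c (pseudo-guessing, the lower asymptote), plus a scaling constant D = 1.7; the function name irf_3pl and the example items are hypothetical.

```python
import math


def irf_3pl(theta: float, a: float, b: float, c: float = 0.0, D: float = 1.7) -> float:
    """Probability of a correct response under the 3PL model (assumed model).

    theta: examinee trait level, on the same z-like scale as b
    a: discrimination (steepness of the curve)
    b: difficulty (where the curve is centered on the theta axis)
    c: pseudo-guessing lower asymptote
    D: scaling constant so the logistic approximates the normal ogive
    """
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))


# A hypothetical easy item (b = -1) and hard item (b = +1), same slope, no guessing.
for theta in (-2.0, -1.0, 0.0, 1.0, 2.0):
    easy = irf_3pl(theta, a=1.0, b=-1.0)
    hard = irf_3pl(theta, a=1.0, b=1.0)
    print(f"theta={theta:+.1f}  P(easy)={easy:.3f}  P(hard)={hard:.3f}")
```

With c = 0 the probability at theta = b is exactly 0.5 (with guessing it is (1 + c)/2), which is the sense in which b marks the middle of the curve. Raising b slides the curve to the right, so the item is harder at every theta.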

Part 2 Basic principles of CAT (The Five Components)

What is CAT? A Computerized Adaptive Test (CAT) is a test administered by computer that dynamically adjusts itself to the trait level of each examinee as the test is being administered

CAT components: 1. Calibrated item bank; 2. Starting rule; 3. Item selection rule; 4. Scoring rule; 5. Stopping rule. The rules are algorithms inside your testing engine. Given 1 and 2, we repeat 3 and 4 until 5 is satisfied. All CATs follow this basic format; we just modify the details for whatever testing situation we have.
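The loop "given 1 and 2, repeat 3 and 4 until 5 is satisfied" can be sketched as a small simulation. This is a minimal illustration, not how any particular testing engine implements it; it assumes a 2PL model (each item has only a slope a and a difficulty b, with D = 1.7), maximum-information item selection, EAP scoring with a standard normal prior, and a standard-error stopping rule with a cap on test length. The bank, the simulated examinee's true theta, and all thresholds are hypothetical values.

```python
import math
import random

D = 1.7  # logistic scaling constant (assumed)


def p_correct(theta, a, b):
    # 2PL item response function (no guessing parameter, for brevity)
    return 1.0 / (1.0 + math.exp(-D * a * (theta - b)))


def info(theta, a, b):
    # Fisher information of a 2PL item at theta
    p = p_correct(theta, a, b)
    return (D * a) ** 2 * p * (1.0 - p)


def eap(responses, grid):
    # Posterior mean and SD of theta under a standard normal prior, on a grid;
    # the posterior SD is used as the standard error
    weights = []
    for t in grid:
        w = math.exp(-0.5 * t * t)
        for (a, b), u in responses:
            p = p_correct(t, a, b)
            w *= p if u == 1 else 1.0 - p
        weights.append(w)
    total = sum(weights)
    mean = sum(t * w for t, w in zip(grid, weights)) / total
    var = sum((t - mean) ** 2 * w for t, w in zip(grid, weights)) / total
    return mean, math.sqrt(var)


def run_cat(bank, true_theta, se_target=0.3, max_items=30, seed=0):
    rng = random.Random(seed)
    grid = [i / 10.0 for i in range(-40, 41)]   # theta from -4 to +4
    available = list(range(len(bank)))          # 1. calibrated item bank: (a, b) pairs
    theta_hat, se = 0.0, float("inf")           # 2. starting rule: everyone starts at 0.0
    responses = []
    # 5. stopping rule: target SE reached, maximum length reached, or bank exhausted
    while available and len(responses) < max_items and se > se_target:
        # 3. item selection rule: most informative remaining item at the current estimate
        best = max(available, key=lambda i: info(theta_hat, *bank[i]))
        available.remove(best)
        # Simulated examinee: answers correctly with probability P(true_theta)
        u = 1 if rng.random() < p_correct(true_theta, *bank[best]) else 0
        responses.append((bank[best], u))
        # 4. scoring rule: re-estimate theta (EAP) and its standard error
        theta_hat, se = eap(responses, grid)
    return theta_hat, se, len(responses)


# Hypothetical bank: 61 items with difficulties from -3 to +3 and slopes 1.0, 1.5, 2.0
bank = [(1.0 + 0.5 * (i % 3), -3.0 + 0.1 * i) for i in range(61)]
theta_hat, se, n_items = run_cat(bank, true_theta=0.8)
print(f"estimate = {theta_hat:.2f}, SE = {se:.2f}, items used = {n_items}")
```

Each numbered comment maps to one of the five components, so changing the testing situation means swapping one function (for example, a different stopping rule) while the loop itself stays the same.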

1. Calibrated item bank: While it is possible to design CATs with classical test theory (Frick, 1992), IRT is more appropriate because it puts items and examinees on the same scale. Therefore, the items need to be calibrated with IRT.

1. Calibrated item bank: CAT algorithms work with any IRT model; the choice of model depends on the characteristics of the test and your goals.

1. Calibrated item bank: The bank for the CAT should be constructed with the purposes of the test in mind. Should the bank's information be flat or peaked across theta? If peaked, where?
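One way to look at the flat-versus-peaked question is through the bank information function, which is the sum of the item information functions. A minimal sketch, assuming a 2PL model with D = 1.7 and two hypothetical 21-item banks that differ only in how their difficulties are spread:

```python
import math

D = 1.7  # logistic scaling constant (assumed)


def item_info(theta, a, b):
    # 2PL item information: (D a)^2 P (1 - P), largest at theta = b
    p = 1.0 / (1.0 + math.exp(-D * a * (theta - b)))
    return (D * a) ** 2 * p * (1.0 - p)


def bank_info(bank, theta):
    # Bank (pool) information is the sum of the item informations
    return sum(item_info(theta, a, b) for a, b in bank)


# Hypothetical banks, all items with slope a = 1.0
flat = [(1.0, -3.0 + 0.3 * i) for i in range(21)]     # b spread from -3 to +3
peaked = [(1.0, -0.6 + 0.06 * i) for i in range(21)]  # b clustered near 0

for theta in (-2.0, 0.0, 2.0):
    print(f"theta={theta:+.1f}  flat={bank_info(flat, theta):5.2f}  "
          f"peaked={bank_info(peaked, theta):5.2f}")
```

The flat bank supports precise point estimates across the whole trait range, while the peaked bank concentrates precision near one theta value, which suits a test whose main purpose is a decision at a cutscore located there.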


2. Starting rule: (1) Start everyone with the same theta estimate (e.g., theta = 0.0). Everyone then gets the same first item, which could be an exposure problem in a high-stakes test. (2) Assign a random theta estimate within an interval, e.g., between theta = -0.5 and +0.5. This improves exposure levels and has little effect on a properly implemented CAT.

2. Starting rule: (3) Use prior information available for a given examinee, such as subjective evaluations (e.g., below average or above average), theta estimates from tests previously administered in the same or a prior test session, or a theta estimate from the same test administered at a previous time.
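The three options can be sketched as a single function. This is a hypothetical helper, not part of any testing engine; the rule names, the default interval of -0.5 to +0.5 (taken from the example above), and the fallback to 0.0 when no prior estimate exists are assumptions.

```python
import random
from typing import Optional


def starting_theta(rule: str, rng: random.Random, prior_theta: Optional[float] = None,
                   low: float = -0.5, high: float = 0.5) -> float:
    """Return the initial theta estimate for a new examinee (hypothetical helper).

    rule = "fixed":  everyone starts at theta = 0.0 (same first item for all)
    rule = "random": uniform draw from [low, high], spreading out first-item exposure
    rule = "prior":  use a theta already known for this examinee (e.g., from an
                     earlier test or session), falling back to 0.0 if none exists
    """
    if rule == "fixed":
        return 0.0
    if rule == "random":
        return rng.uniform(low, high)
    if rule == "prior":
        return prior_theta if prior_theta is not None else 0.0
    raise ValueError(f"unknown starting rule: {rule!r}")


rng = random.Random(42)
print(starting_theta("fixed", rng))                   # 0.0
print(starting_theta("random", rng))                  # some value in [-0.5, 0.5]
print(starting_theta("prior", rng, prior_theta=1.2))  # 1.2
```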

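3. Item selection rule: Each next item is selected to maximize information at the examinee's current theta estimate (information describes how precisely an item measures at a given trait level). Information is a function of the slope of the IRF: an item provides more information where its IRF is steeper.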
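In IRT terms, information is the Fisher information, I(theta) = P'(theta)^2 / [P(theta)(1 - P(theta))], which is why a steeper IRF gives more information. A minimal Python sketch for the 3PL model, assuming D = 1.7; the item parameters are hypothetical:

```python
import math

D = 1.7  # logistic scaling constant (assumed)


def irf_3pl(theta, a, b, c=0.0):
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))


def info_3pl(theta, a, b, c=0.0):
    # Fisher information P'(theta)^2 / (P (1 - P)); for the 3PL this simplifies to
    # (D a)^2 * ((1 - P) / P) * ((P - c) / (1 - c))^2
    p = irf_3pl(theta, a, b, c)
    return (D * a) ** 2 * ((1.0 - p) / p) * ((p - c) / (1.0 - c)) ** 2


# A steep item (a = 2.0) versus a flat item (a = 0.5), both centered at b = 0
for theta in (-2.0, -1.0, 0.0, 1.0, 2.0):
    print(f"theta={theta:+.1f}  steep={info_3pl(theta, 2.0, 0.0):.3f}  "
          f"flat={info_3pl(theta, 0.5, 0.0):.3f}")
```

Maximum-information selection then picks the remaining item with the largest information at the examinee's current theta estimate. The steep item dominates near b = 0, but far from its location the flatter item can be the better choice.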


3. Item selection (figure: example with five items)

3. Item selection: There are also usually practical constraints on item selection, such as item exposure, content area (domain), cognitive level, and so on.
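The sketch below adds two of these constraints to maximum-information selection: a content-area requirement, and a simple exposure control that chooses at random among the k most informative eligible items (the "randomesque" approach, one of many published exposure-control methods). It assumes a 2PL model with D = 1.7; the five items, their content areas, and the function name select_item are hypothetical, and how an engine decides which content area is needed next is outside this sketch.

```python
import math
import random

D = 1.7  # logistic scaling constant (assumed)


def info_2pl(theta, a, b):
    p = 1.0 / (1.0 + math.exp(-D * a * (theta - b)))
    return (D * a) ** 2 * p * (1.0 - p)


def select_item(theta, items, used, needed_area=None, top_k=1, rng=None):
    """Pick the next item by maximum information, subject to simple constraints.

    items: dict mapping item id -> (a, b, content_area)
    used: set of item ids already administered (items are never repeated)
    needed_area: if given, only items from this content area are eligible
    top_k: choose at random among the k most informative eligible items;
           top_k = 1 is pure maximum-information selection
    """
    eligible = [i for i, (a, b, area) in items.items()
                if i not in used and (needed_area is None or area == needed_area)]
    if not eligible:
        return None
    ranked = sorted(eligible, key=lambda i: info_2pl(theta, *items[i][:2]), reverse=True)
    shortlist = ranked[:top_k]
    return (rng or random.Random()).choice(shortlist)


items = {
    "A": (1.0, -2.0, "algebra"),
    "B": (1.5, -0.5, "geometry"),
    "C": (1.2, 0.2, "algebra"),
    "D": (0.8, 0.0, "geometry"),
    "E": (1.8, 1.5, "algebra"),
}
print(select_item(0.0, items, used=set()))                         # B
print(select_item(0.0, items, used=set(), needed_area="algebra"))  # C
print(select_item(0.0, items, used={"B"}))                         # C
print(select_item(0.0, items, used=set(), top_k=2, rng=random.Random(7)))
```

A larger top_k spreads exposure across more items at the cost of some measurement efficiency, which is the trade-off noted later under the disadvantages of CAT.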

4. Scoring rule: IRT scores students with a form of maximum likelihood estimation. Basically, the IRFs are multiplied together.

4. Scoring rule: IRT uses the IRFs to score examinees; it does not use number-correct scores. If an examinee answers a question correctly, they get the item's IRF; if they answer it incorrectly, they get 1 - IRF. These curves are multiplied across all items to get a final curve called the likelihood function.

4. Scoring rule: Here is an example IRF.

4. Scoring rule: A 1 - IRF curve.

4. Scoring rule: We multiply those to get a curve like this one.
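The multiplication can be sketched in a few lines. This minimal example assumes a 2PL IRF with D = 1.7 and two hypothetical items, one answered correctly (contributing its IRF) and one answered incorrectly (contributing 1 - IRF); the likelihood is evaluated on a grid of theta values.

```python
import math

D = 1.7  # logistic scaling constant (assumed)


def irf(theta, a, b):
    # 2PL item response function
    return 1.0 / (1.0 + math.exp(-D * a * (theta - b)))


def likelihood(theta, responses):
    """Product over items of IRF (correct answer) or 1 - IRF (incorrect answer).

    responses: list of (a, b, u) where u = 1 for correct and 0 for incorrect
    """
    value = 1.0
    for a, b, u in responses:
        p = irf(theta, a, b)
        value *= p if u == 1 else 1.0 - p
    return value


# Correct on an easier item (b = -0.5), incorrect on a harder one (b = +0.5)
responses = [(1.0, -0.5, 1), (1.0, 0.5, 0)]
grid = [i / 10.0 for i in range(-30, 31)]
values = [likelihood(t, responses) for t in grid]
peak = grid[values.index(max(values))]
print(f"likelihood peaks at theta = {peak:+.1f}")  # likelihood peaks at theta = +0.0
```

Because the correct answer was on the easier item and the incorrect answer on the harder item, with equal slopes, the likelihood is symmetric and peaks halfway between the two difficulties.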

4. Scoring rule: Because the score is the highest point of the likelihood function, this is called maximum likelihood estimation (MLE). There are also two Bayesian methods, maximum a posteriori (MAP) and expected a posteriori (EAP), as well as weighted MLE.
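The estimators named above can be compared on a theta grid. This sketch assumes a 2PL model with D = 1.7 and a standard normal prior for the two Bayesian methods; it works with the log-likelihood (a sum rather than a product) to avoid numerical underflow on long tests. Weighted MLE is omitted, and the response patterns are hypothetical.

```python
import math

D = 1.7  # logistic scaling constant (assumed)
GRID = [i / 100.0 for i in range(-400, 401)]  # theta from -4 to +4 in steps of 0.01


def irf(theta, a, b):
    return 1.0 / (1.0 + math.exp(-D * a * (theta - b)))


def log_likelihood(theta, responses):
    # Sum of logs of IRF (correct) or 1 - IRF (incorrect) for each (a, b, u)
    return sum(math.log(irf(theta, a, b) if u == 1 else 1.0 - irf(theta, a, b))
               for a, b, u in responses)


def mle(responses):
    # Maximum likelihood: theta at the highest point of the likelihood function
    return max(GRID, key=lambda t: log_likelihood(t, responses))


def map_estimate(responses):
    # MAP: mode of the posterior, with a standard normal prior on theta
    return max(GRID, key=lambda t: log_likelihood(t, responses) - 0.5 * t * t)


def eap(responses):
    # EAP: mean of the posterior, with a standard normal prior on theta
    weights = [math.exp(log_likelihood(t, responses) - 0.5 * t * t) for t in GRID]
    return sum(t * w for t, w in zip(GRID, weights)) / sum(weights)


mixed = [(1.0, -1.0, 1), (1.2, 0.0, 1), (0.9, 1.0, 0)]
all_right = [(1.0, -1.0, 1), (1.0, 0.0, 1)]
print(mle(mixed), map_estimate(mixed), round(eap(mixed), 3))
print(mle(all_right), map_estimate(all_right), round(eap(all_right), 3))
```

For an all-correct pattern the likelihood keeps rising, so the MLE runs to the edge of the grid (in principle it does not exist), whereas MAP and EAP stay finite because the prior pulls them back toward the middle. This is one reason Bayesian methods are commonly used early in a CAT, when only a few items have been answered.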

5. Stopping rule: This depends primarily on the purpose of the test: point estimation or classification? Point estimation: we want an accurate score for each student. Classification: we do NOT need an accurate score, just a classification into categories such as pass/fail.

5. Stopping rule: Point estimation methods work with the actual score estimate and stop when it has been zeroed in on precisely enough. Classification methods check after every item whether a classification can be made with a specified degree of accuracy.
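For classification, one simple rule checks after each item whether a confidence interval around the current estimate lies entirely above or below the cutscore. This is a minimal sketch of that confidence-interval idea, assuming a 95% interval (z = 1.96) and a hypothetical cutscore at theta = 0.0; the sequential probability ratio test (SPRT) is a common alternative that the sketch does not cover.

```python
def classify_or_continue(theta_hat, se, cutscore, z=1.96):
    """Return "pass", "fail", or "continue" after each item.

    Stops as soon as the interval theta_hat +/- z * se lies entirely
    on one side of the cutscore.
    """
    lower, upper = theta_hat - z * se, theta_hat + z * se
    if lower > cutscore:
        return "pass"
    if upper < cutscore:
        return "fail"
    return "continue"


print(classify_or_continue(0.9, 0.4, cutscore=0.0))   # pass (interval 0.116 to 1.684)
print(classify_or_continue(0.3, 0.4, cutscore=0.0))   # continue
print(classify_or_continue(-1.2, 0.5, cutscore=0.0))  # fail
```

An examinee far from the cutscore is classified after only a few items, which is why classification CATs can be very short.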

5. Stopping rule: For educational tests, this is usually point estimation. A common stopping rule is to stop the test when the examinee's standard error of measurement falls below a target level. This means all examinees are scored with equal precision.
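With maximum likelihood scoring, the standard error of measurement is approximately one over the square root of the test information (the sum of the information of the items administered so far). A minimal sketch of a variable-length stopping rule built on that approximation, assuming a 2PL model with D = 1.7, a hypothetical target SE of 0.30, and a hypothetical 40-item cap:

```python
import math

D = 1.7  # logistic scaling constant (assumed)


def info_2pl(theta, a, b):
    p = 1.0 / (1.0 + math.exp(-D * a * (theta - b)))
    return (D * a) ** 2 * p * (1.0 - p)


def standard_error(theta, administered):
    # SEM(theta) is approximately 1 / sqrt(test information), where test
    # information is the sum of the information of the (a, b) items given so far
    total = sum(info_2pl(theta, a, b) for a, b in administered)
    return 1.0 / math.sqrt(total) if total > 0 else float("inf")


def should_stop(theta, administered, se_target=0.30, max_items=40):
    # Variable-length rule: stop at the target precision, with a cap on test length
    return standard_error(theta, administered) <= se_target or len(administered) >= max_items


three = [(2.0, 0.0)] * 3
four = [(2.0, 0.0)] * 4
print(round(standard_error(0.0, three), 3), should_stop(0.0, three))  # 0.34 False
print(round(standard_error(0.0, four), 3), should_stop(0.0, four))    # 0.294 True
```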

5. Stopping rule: Either type of CAT can be designed with a fixed number of items, but this is a bad idea from a psychometric perspective; variable-length testing is much more efficient.

The big picture (figure: components 1 through 5 in sequence)

The big picture (figure: item-by-item graph)

Part 3 Benefits of CAT

Benefits of CAT: Efficiency. CATs are more efficient than conventional tests: they generally reduce test length by 50% or more (Weiss & Kingsbury, 1984). See the research literature for examples; simulations can estimate the reduction for your own test. CAT is even more efficient for classification: classification CATs can average test lengths in the single digits.

Benefits of CAT: Control of measurement precision. A properly designed CAT can measure or classify all examinees with the same degree of precision.

Benefits of CAT: Equal precision is impossible with conventional tests. So the question is: is it fairer that all students see the same items, or that all students are measured with the same accuracy?

Benefits of CAT: Added security. If everyone receives a standard test with the same 50 items, the items will become well known. This effect is reduced when everyone receives a different set of items. We could also make multiple fixed forms, but is that better than CAT? It depends on the case.

Benefits of CAT: Immediate score reporting. Paper-and-pencil (P&P) testing requires the test papers to come back and be scored. If immediate feedback for students is desirable, P&P testing is not an option.

Disadvantages of CAT: Public relations. You need to explain to examinees and parents why certain things can happen, such as failing after only 10 questions or passing with a 50% correct score.

Disadvantages of CAT: Sophistication. CAT requires specially designed software and a lot of expertise and effort, so it is often out of reach for small testing programs. Some say it is too expensive, but really: ~$3000 for an administrator and testing center? The major cost in test development is the same for CAT and P&P: item development.

Disadvantages of CAT: Item exposure. Some items will be used far more often than others, which needs to be addressed. Plenty of methods have been suggested, but they reduce the efficiency of the CAT process.

Part 4 Implementing CAT

So, you want a CAT? Well, you've decided to use CAT and you've built a nice item bank; what next? You need a test development system and a delivery engine that support CAT. I'll show you what it looks like in FastTEST Pro. Late this year there will be a FastTEST Web.

FastTEST Pro: A common source of confusion: FastTEST is the item banker and test development system; FastTEST Pro is that plus the delivery engine.

FastTEST Pro: 1. Bank items

FastTEST Pro: 2. Design the item pool for your CAT

FastTEST Pro: 3. Define CAT modules

FastTEST Pro: Now I'll show a real CAT with FastTEST Pro. You can download it and use it free for 30 days at http://assess.com/xcart/product.php?productid=273&cat=1&page=1

Thank you! Questions? For any questions in the future: nthompson@assess.com

Resources:
CAT on Wikipedia: http://en.wikipedia.org/wiki/computerized_adaptive_testing
CAT Tutorial: http://edres.org/scripts/cat/
CAT Central: http://www.psych.umn.edu/psylabs/catcentral/
PARE Online: http://pareonline.net/ (see Vol. 12, #1)
Item exposure: Georgiadou, E., Triantafillou, E., & Economides, A. (2007). A review of item exposure control strategies for computerized adaptive testing developed from 1983 to 2005. Journal of Technology, Learning, and Assessment, 5(8). http://www.jtla.org
Want a book to learn more? I recommend Wainer (2000), Computerized Adaptive Testing: A Primer, 2nd ed.