StaffTrac Lexicon of Critical Terms

A   B   C   D   E   F   G   H   I   J   L   M   N   O   P   Q   R   S   T   U   V   Z   References

Ability - A characteristic that is indicative of competence in a field. (See also aptitude.)

Ability Testing - Use of standardized tests to evaluate an individual’s performance in a specific area (e.g., cognitive, psychomotor, or physical functioning).

Achievement tests - Standardized tests that measure knowledge and skills in academic subject areas (e.g., math, spelling, and reading).

Accommodations - Describe changes in format, response, setting, timing, or scheduling that do not alter in any significant way what the test measures or the comparability of scores. Accommodations are designed to ensure that an assessment measures the intended construct, not the child’s disability. Accommodations affect three areas of testing: 1) the administration of tests, 2) how students are allowed to respond to the items, and 3) the presentation of the tests (how the items are presented to the students on the test instrument).
Accommodations may include Braille forms of a test for blind students or tests in native languages for students whose primary language is other than English.

Age Equivalent - The chronological age in a population for which a score is the median (middle) score. If children who are 10 years and 6 months old have a median score of 17 on a test, the score 17 has an age equivalent of 10-6.
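
The median-based computation described above can be sketched in a few lines of Python; the norming data here is hypothetical (real norms come from large standardization samples):

```python
import statistics

# Hypothetical norming data: test scores for two age groups (years-months).
scores_by_age = {
    "10-0": [12, 14, 15, 16, 18],
    "10-6": [15, 17, 17, 18, 20],
}

# The age equivalent of a score is the age whose median score matches it.
medians = {age: statistics.median(s) for age, s in scores_by_age.items()}
print(medians["10-6"])  # 17, so a score of 17 has an age equivalent of 10-6
```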

Alternate Forms - Two or more versions of a test that are considered interchangeable, in that they measure the same constructs in the same ways, are intended for the same purposes, and are administered using the same directions.

Aptitude - An individual’s ability to learn or to develop proficiency in an area if provided with appropriate education or training. Aptitude tests include tests of general academic (scholastic) ability; tests of special abilities (e.g., verbal, numerical, mechanical); tests that assess “readiness” for learning; and tests that measure ability and previous learning that are used to predict future performance.

Aptitude tests - Tests that measure an individual’s collective knowledge; often used to predict learning potential. See also ability test.

Accountability - The demand by a community (public officials, employers, and taxpayers) for school officials to prove that money invested in education has led to measurable learning. "Accountability testing" is an attempt to sample what students have learned, how well teachers have taught, and/or the effectiveness of a school principal as an instructional leader. School budgets and personnel promotions, compensation, and awards may be affected. Most school districts make this kind of assessment public; it can affect policy and public perception of the effectiveness of taxpayer-supported schools and be the basis for comparison among schools.

Accountability is often viewed as an important factor in education reform. An assessment system connected to accountability can help identify the needs of schools so that resources can be equitably distributed. In this context, accountability assessment can include such indicators as equity, competency of teaching staff, physical infrastructure, curriculum, class size, instructional methods, existence of tracking, number of higher cost students, dropout rates, and parental involvement as well as student test scores. It has been suggested that test scores analyzed in a disaggregated format can help identify instructional problems and point to potential solutions.

Achievement Test - A standardized test designed to efficiently measure the amount of knowledge and/or skill a person has acquired, usually as a result of classroom instruction. Such testing produces a statistical profile used as a measurement to evaluate student learning in comparison with a standard or norm.

Action Research - School and classroom-based studies initiated and conducted by teachers and other school staff. Action research involves teachers, aides, principals, and other school staff as researchers who systematically reflect on their teaching or other work and collect data that will answer their questions. It offers staff an opportunity to explore issues of interest to them in an effort to improve classroom instruction and educational effectiveness. (Source: Bennett, C. K."Promoting teacher reflection through action research: What do teachers think?" Journal of Staff Development, 1994, 15, 34-38.)

Affective - Outcomes of education involving feelings more than understanding: likes, pleasures, ideals; dislikes, annoyances; values.

Alternative Assessment - Many educators prefer the description "assessment alternatives" to describe alternatives to traditional, standardized, norm- or criterion-referenced traditional paper and pencil testing. An alternative assessment might require students to answer an open-ended question, work out a solution to a problem, perform a demonstration of a skill, or in some way produce work rather than select an answer from choices on a sheet of paper. Portfolios and instructor observation of students are also alternative forms of assessment.

Analytic Scoring - A type of rubric scoring that separates the whole into categories of criteria that are examined one at a time. Student writing, for example, might be scored on the basis of grammar, organization, and clarity of ideas. Useful as a diagnostic tool. An analytic scale is useful when there are several dimensions on which the piece of work will be evaluated. (See Rubric.)
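
As a minimal sketch, analytic scoring can be represented as one score per criterion, reported separately rather than collapsed into a single total; the criteria and the 1-4 scale here are hypothetical:

```python
# Each dimension of a writing sample scored separately on a
# hypothetical 1-4 scale (analytic, not holistic, scoring).
essay_scores = {"grammar": 3, "organization": 4, "clarity of ideas": 2}

for criterion, score in sorted(essay_scores.items()):
    print(f"{criterion}: {score}/4")
```

Keeping the dimensions separate is what makes the result useful as a diagnostic tool: a low "clarity of ideas" score points to a different instructional need than a low "grammar" score.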

Anchor - An example of student work at a specific level on a scoring rubric.

Aptitude Test - A test intended to measure the test-taker's innate ability to learn, given before receiving instruction.

Assessment - The Latin root assidere means to sit beside. In an educational context, the process of observing learning; describing, collecting, recording, scoring, and interpreting information about a student's or one's own learning. At its most useful, assessment is an episode in the learning process; part of reflection and autobiographical understanding of progress. Traditionally, student assessments are used to determine placement, promotion, graduation, or retention.

In the context of institutional accountability, assessments are undertaken to determine the principal's performance, effectiveness of schools, etc. In the context of school reform, assessment is an essential tool for evaluating the effectiveness of changes in the teaching-learning process.

Assessment for improvement - Assessment that feeds directly, and often immediately, back into revising the course, program or institution to improve student learning results.

Assessment Literacy - The possession of knowledge about the basic principles of sound assessment practice, including terminology, the development and use of assessment methodologies and techniques, and familiarity with standards of quality in assessment. Increasingly, it also includes familiarity with alternatives to traditional measurements of learning.

Assessment plan - A document that outlines the student learning outcomes and program objectives, the direct and indirect assessment methods used to demonstrate the attainment of each outcome/objective, a brief explanation of the assessment methods, an indication of which outcome(s)/objectives is/are addressed by each method, the intervals at which evidence is collected and reviewed, and the individual(s) responsible for the collection/review of evidence.

Assessment Task - An illustrative task or performance opportunity that closely targets defined instructional aims, allowing students to demonstrate their progress and capabilities.

Authentic Assessment - Evaluating by asking for the behavior the learning is intended to produce. The concept of model, practice, feedback in which students know what excellent performance is and are guided to practice an entire concept rather than bits and pieces in preparation for eventual understanding. A variety of techniques can be employed in authentic assessment.

The goal of authentic assessment is to gather evidence that students can use knowledge effectively and be able to critique their own efforts. Authentic tests can be viewed as "assessments of enablement," in Robert Glaser's words, ideally mirroring and measuring student performance in a "real-world" context. Tasks used in authentic assessment are meaningful and valuable, and are part of the learning process.

Authentic assessment can take place at any point in the learning process. Authentic assessment implies that tests are central experiences in the learning process, and that assessment takes place repeatedly. Patterns of success and failure are observed as learners use knowledge and skills in slightly ambiguous situations that allow the assessor to observe the student applying knowledge and skills in new situations over time.

Backload (--ed, --ing) - Amount of effort required after the data collection.

Benchmark - Student performance standards (the level(s) of student competence in a content area).

An actual measurement of group performance against an established standard at defined points along the path toward the standard. Subsequent measurements of group performance use the benchmarks to measure progress toward achievement.

Examples of student achievement that illustrate points on a performance scale, used as exemplars. (See Descriptor, Cohort.)

Benchmark Assessment - A formative (timely) assessment based on a district curriculum and related State learner expectations.

Battery - A group or series of tests or subtests administered; the most common test batteries are achievement tests that include subtests in different areas.

Bell curve - See normal distribution curve.

Capstone Courses - A senior seminar or designated assessment course; program learning outcomes can be integrated into its assignments.

Case Studies - Systematic inquiries into a specific phenomenon, e.g., an individual, event, program, or process. Data are collected via multiple methods, often utilizing both qualitative and quantitative approaches.

CBM - "Curriculum Based Measurement."

Ceiling - The highest level of performance or score that a test can reliably measure.

Classroom Assessment - Assessment often designed for individual faculty who wish to improve their teaching of a specific course. Data collected can be analyzed to assess student learning outcomes for a program.

Cohort - A group whose progress is followed by means of measurements at different points in time.

Collective Portfolios - Faculty assemble samples of student work from various classes and use the “collective” to assess specific program learning outcomes. Portfolios can be assessed by using scoring rubrics; expectations should be clarified before portfolios are examined.

Competency - (1) Level at which performance is acceptable.

Competency - (2) A group of characteristics, native or acquired, which indicate an individual's ability to acquire skills in a given area.

Competency Test - A test intended to establish that a student has met established minimum standards of skills and knowledge and is thus eligible for promotion, graduation, certification, or other official acknowledgment of achievement.

Composite score - The practice of combining two or more subtest scores to create an average or composite score. For example, a reading performance score may be an average of vocabulary and reading comprehension subtest scores.

Confounded - The situation in which the effect of a controlled variable is inextricably mixed with that of another, uncontrolled variable.

Content Analysis - A procedure that categorizes the content of written documents. The analysis begins with identifying the unit of observation, such as a word, phrase, or concept, and then creating meaningful categories to which each item can be assigned. For example, a student’s statement that “I learned that I could be comfortable with someone from another culture” could be assigned to the category of “Positive Statements about Diversity.” The number of times this type of response occurs can then be quantified and compared with neutral or negative responses addressing the same category.
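
Once statements have been assigned to categories, the quantification step is a simple tally; a sketch with hypothetical coded responses:

```python
from collections import Counter

# Hypothetical responses, already assigned to categories by a trained reader.
coded_responses = ["positive", "neutral", "positive", "negative", "positive"]

tallies = Counter(coded_responses)
print(tallies["positive"], tallies["negative"])  # 3 1
```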

Concept - An abstract, general notion -- a heading that characterizes a set of behaviors and beliefs.

Content Standard - A broad statement of learning as it applies to a specific subject area or learning strand.

Convergent validity - General agreement among ratings, gathered independently of one another, where measures should be theoretically related.

Conversion table - A chart used to translate test scores into different measures of performance (e.g., grade equivalents and percentile ranks).
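
A conversion table is essentially a lookup; a sketch with hypothetical values (real tables come from the test publisher's norming study):

```python
# Hypothetical raw-score-to-percentile-rank conversion table.
raw_to_percentile = {20: 25, 25: 40, 30: 55, 35: 72, 40: 88}

raw_score = 30
print(raw_to_percentile[raw_score])  # 55: a raw score of 30 is the 55th percentile
```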

Core curriculum - Fundamental knowledge that all students are required to learn in school.

Criteria - Guidelines or rules that are used to judge performance.

Criterion-referenced tests usually cover relatively small units of content and are closely related to instruction. Their scores have meaning in terms of what the student knows or can do, rather than in (or in addition to) their relation to the scores made by some norm group. Frequently, the meaning is given in terms of a cutoff score: people who score above that point are considered to have scored adequately (“mastered” the material), while those who score below it are thought to have inadequate scores.

Course-embedded assessment - Course-embedded assessment refers to techniques that can be utilized within the context of a classroom (one class period, several or over the duration of the course) to assess students' learning, as individuals and in groups. When used in conjunction with other assessment tools, course-embedded assessment can provide valuable information at specific points of a program. For example, faculty members teaching multiple sections of an introductory course might include a common pre-test to determine student knowledge, skills and dispositions in a particular field at program admission. There are literally hundreds of classroom assessment techniques, limited only by the instructor's imagination (see also embedded assessment).

Criterion-referenced - Criterion-referenced tests determine what test-takers can do and what they know, not how they compare to others. Criterion-referenced tests report on how well students are doing relative to a predetermined performance level on a specified set of educational goals or outcomes included in the curriculum. For example, scores may be reported as the percentage of curriculum objectives a student has mastered.

Criterion Referenced Tests - A test in which the results can be used to determine a student's progress toward mastery of a content area. Performance is compared to an expected level of mastery in a content area rather than to other students' scores. Such tests usually include questions based on what the student was taught and are designed to measure the student's mastery of designated objectives of an instructional program. The "criterion" is the standard of performance established as the passing score for the test. Scores have meaning in terms of what the student knows or can do, rather than how the test-taker compares to a reference or norm group. Criterion referenced tests can have norms, but comparison to a norm is not the purpose of the assessment.

Criterion referenced tests have also been used to provide information for program evaluation, especially to track the success or progress of schools and student populations that have been involved in change or that are at risk of inequity. In this case, the tests are not used to compare teachers, teams or buildings within a district but rather to give feedback on progress of groups and individuals.

Curriculum Alignment - The degree to which a curriculum's scope and sequence matches a testing program's evaluation measures, thus ensuring that teachers will use successful completion of the test as a goal of classroom instruction.

Curriculum-embedded or Learning-embedded Assessment - Assessment that occurs simultaneously with learning such as projects, portfolios and "exhibitions." Occurs in the classroom setting, and, if properly designed, students should not be able to tell whether they are being taught or assessed. Tasks or tests are developed from the curriculum or instructional materials.

Curriculum - Instructional plan of skills, lessons, and objectives on a particular subject; may be authored by a state or textbook publisher. A teacher typically executes this plan.

Curriculum-Based Measurement (CBM) - A method to measure student progress in academic areas including math, reading, writing, and spelling. The child is tested briefly (1 to 5 minutes) each week. Scores are recorded on a graph and compared to the expected performance on the content for that year. The graph allows the teacher and parents to see quickly how the child’s performance compares to expectations.

Cut Score - Score used to determine the minimum performance level needed to pass a competency test. (See Descriptor for another type of determiner.)

Descriptor - A set of signs used as a scale against which a performance or product is placed in an evaluation. An example from Grant Wiggins' Glossary of Useful Terms Related to Authentic and Performance Assessments is taken from "the CAP writing test where a 5 out of a possible 6 is described: 'The student describes the problem adequately and argues convincingly for at least one solution . . . without the continual reader awareness of the writer of a 6.'"

Descriptors allow assessment to include clear guidelines for what is and is not valued in student work. Wiggins adds that "[t]he word 'descriptor' reminds us that justifiable value judgments are made by knowing how to empirically describe the traits of work we do and do not value." (Emphasis his.)

Derived Score - A score to which raw scores are converted by numerical transformation (e.g., conversion of raw scores to percentile ranks or standard scores).
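
One common numerical transformation converts a raw score to a z-score and then onto a standard-score scale; the mean-100, SD-15 scale below is an assumption for illustration (scale parameters vary by test):

```python
import statistics

def to_standard_score(raw, raw_scores, scale_mean=100, scale_sd=15):
    """Convert a raw score to a derived standard score.

    The mean-100, SD-15 scale is a common convention, but the actual
    scale parameters depend on the test being used.
    """
    z = (raw - statistics.mean(raw_scores)) / statistics.pstdev(raw_scores)
    return scale_mean + scale_sd * z

# A raw score equal to the group mean lands at the scale mean.
print(to_standard_score(50, [40, 50, 60]))  # 100.0
```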

Diagnostic Test - A test used to diagnose, analyze or identify specific areas of weakness and strength; to determine the nature of weaknesses or deficiencies; diagnostic achievement tests are used to measure skills.

Dimension - Aspects or categories in which performance in a domain or subject area will be judged. Separate descriptors or scoring methods may apply to each dimension of the student's performance assessment.

Direct assessment methods - These methods involve students' displays of knowledge and skills (e.g. test results, written assignments, presentations, classroom assignments) resulting from learning experiences in the class/program.

Embedded assessment - A means of gathering information about student learning that is built into and a natural part of the teaching-learning process. Often used for assessment purposes in classroom assignments that are evaluated to assign students a grade. Can assess individual student performance or aggregate the information to provide information about the course or program; can be formative or summative, quantitative or qualitative. Example: as part of a course, expecting each senior to complete a research paper that is graded for content and style, but is also assessed for advanced ability to locate and evaluate Web-based information (as part of a college-wide outcome to demonstrate information literacy).

eportfolio (electronic portfolio) - An electronic format of a collection of work developed across varied contexts over time. The eportfolio can advance learning by providing students and/or faculty with a way to organize, archive and display pieces of work. The electronic format allows faculty and other professionals to evaluate student portfolios using technology, which may include the Internet, CD-ROM, video, animation or audio. Electronic portfolios are becoming a popular alternative to traditional paper-based portfolios because they offer practitioners and peers the opportunity to review, communicate and assess portfolios in an asynchronous manner (see also portfolios; also called course-embedded assessment).

Essay Test - A test that requires students to answer questions in writing. Responses can be brief or extensive. Tests for recall, ability to apply knowledge of a subject to questions about the subject, rather than ability to choose the least incorrect answer from a menu of options.

Expected Growth - The average change in test scores that occurs over a specific time for individuals at a given age or grade level.

Evaluation - Both qualitative and quantitative descriptions of pupil behavior plus value judgments concerning the desirability of that behavior. Using collected information (assessments) to make informed decisions about continued instruction, programs, and activities.

Exemplar - Model of excellence. (See Benchmark, Norm, Rubric, Standard.)

Evaluation - (1) Depending on the context, evaluation may mean either assessment or test. Many test manufacturers and teachers use these three terms interchangeably, which means you have to pay close attention to how the terms are being used and why they are being used that way. For instance, tests that do not provide any immediate, helpful feedback to students and teachers should never be called “assessments,” but many testing companies and some administrators use this term to describe tests that return only score numbers to students and/or teachers.

Evaluation - (2) When used for most educational settings, evaluation means to measure, compare, and judge the quality of student work, schools, or specific educational programs.

Evaluation - (3) A value judgment about the results of assessment data. For example, evaluation of student learning requires that educators compare student performance to a standard to determine how the student measures up. Depending on the result, decisions are made regarding whether and how to improve student performance.

Exit and other interviews - Asking individuals to share their perceptions of their own attitudes and/or behaviors or those of others; evaluating student reports of their attitudes and/or behaviors in a face-to-face dialogue.

External Assessment - Use of criteria (rubric) or an instrument developed by an individual or organization external to the one being assessed.

External examiner - Using an expert in the field from outside your program, usually from a similar program at another institution, to conduct, evaluate, or supplement assessment of your students. Information can be obtained from external evaluators using many methods, including surveys, interviews, etc.

External validity - External validity refers to the extent to which the results of a study are generalizable or transferable to other settings. Generalizability is the extent to which assessment findings and conclusions from a study conducted on a sample population can be applied to the population at large. Transferability is the ability to apply the findings in one context to another similar context.

Fairness - (1) Assessment or test that provides an even playing field for all students. Absolute fairness is an impossible goal because all tests privilege some test takers over others; standardized tests provide one kind of fairness while performance tests provide another. The highest degree of fairness can be achieved when students can demonstrate their understanding in a variety of ways.

Fairness - (2) Teachers, students, parents and administrators agree that the instrument has validity, reliability, and authenticity, and they therefore have confidence in the instrument and its results.

Floor - The lowest score that a test can reliably measure.

Frequency distribution - A method of displaying test scores.
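
One simple way to display scores is a tally of how often each occurs; a sketch with hypothetical scores:

```python
from collections import Counter

scores = [88, 92, 75, 88, 92, 92, 75, 60]  # hypothetical test scores
distribution = Counter(scores)

# Print a crude text histogram, lowest score first.
for score in sorted(distribution):
    print(score, "*" * distribution[score])
```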

Forced-choice - The respondent only has a choice among given responses (e.g., very poor, poor, fair, good, very good).

Formative Assessment - Observations which allow one to determine the degree to which students know or are able to do a given learning task, and which identify the part of the task that the student does not know or is unable to do. Outcomes suggest future steps for teaching and learning. (See Summative Assessment.)

Frontload (--ed, --ing) - Amount of effort required in the early stage of assessment method development or data collection.

Generalization (generalizability) - The extent to which assessment findings and conclusions from a study conducted on a sample population can be applied to the population at large.

Goal-free evaluation - Goal-free evaluation focuses on actual outcomes rather than intended program outcomes. Evaluation is done without prior knowledge of the goals of the program.

Grade Equivalent - A score that describes student performance in terms of the statistical performance of an average student at a given grade level. A grade equivalent score of 5.5, for example, might indicate that the student's score is what could be expected of an average student doing average work in the fifth month of the fifth grade. The scale ranges from September of the kindergarten year (K.0) to June of the senior year in high school (12.9). Useful as a ranking score, grade equivalents allow only a theoretical or approximate comparison across grades; a grade equivalent of 5.5 does not necessarily indicate what the student would actually score on a test given to a midyear fifth-grade class.

High stakes test - A test whose results have important, direct consequences for the examinees, programs, or institutions tested.

“High stakes” use of assessment - The decision to use the results of assessment to set a hurdle that needs to be cleared for completing a program of study, receiving certification, or moving to the next level. Most often the assessment so used is externally developed, based on set standards, carried out in a secure testing situation, and administered at a single point in time. Examples: at the secondary school level, statewide exams required for graduation; in postgraduate education, the bar exam.

Holistic Method - In assessment, assigning a single score based on an overall assessment of performance rather than by scoring or analyzing dimensions individually. The product is considered to be more than the sum of its parts and so the quality of a final product or performance is evaluated rather than the process or dimension of performance. A holistic scoring rubric might combine a number of elements on a single scale. Focused holistic scoring may be used to evaluate a limited portion of a learner's performance.

Indirect assessment of learning - Gathers reflection about the learning or secondary evidence of its existence. Example: a student survey about whether a course or program helped develop a greater sensitivity to issues of diversity.

Intelligence tests - Tests that measure aptitude or intellectual capacities (examples: the Wechsler Intelligence Scale for Children (WISC-III-R) and the Stanford-Binet (SB:IV)).

Intelligence quotient (IQ) - Score achieved on an intelligence test that identifies learning potential.

Item - A question or exercise in a test or assessment.

Inter-rater reliability - The degree to which different raters/observers give consistent estimates of the same phenomenon.
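
The simplest estimate of inter-rater reliability is percent agreement (more rigorous statistics, such as Cohen's kappa, also correct for chance agreement); a sketch with hypothetical ratings:

```python
def percent_agreement(rater_a, rater_b):
    """Proportion of items on which two raters gave identical ratings."""
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return matches / len(rater_a)

# Hypothetical ratings of five portfolios on a 1-4 rubric scale.
print(percent_agreement([3, 4, 2, 4, 1], [3, 4, 3, 4, 1]))  # 0.8
```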

Internal validity - Internal validity refers to (1) the rigor with which the study was conducted (e.g., the study's design, the care taken to conduct measurements, and decisions concerning what was and wasn't measured) and (2) the extent to which the designers of a study have taken into account alternative explanations for any causal relationships they explore.

Interviews - are conversations or direct questioning with an individual or group of people. The interviews can be conducted in person or on the telephone. The length of an interview can vary from 20 minutes to over an hour. Interviewers should be trained to follow agreed-upon procedures (protocols).

Interpreting - An approach to analysis that explains the meaning or significance of student assessment data. (See Analysis.)

I. Q. Tests - The first of the standardized norm-referenced tests, developed around the turn of the twentieth century. Traditional psychologists believe that neurological and genetic factors underlie "intelligence" and that scoring the performance of certain intellectual tasks can provide assessors with a measurement of general intelligence. There is a substantial body of research suggesting that I.Q. tests measure only certain analytical skills, missing many areas of human endeavor considered to be intelligent behavior. I.Q. is considered by some to be fixed or static, whereas an increasing number of researchers are finding that intelligence is an ongoing process that continues to change throughout life.

Item Analysis - Analyzing each item on a test to determine the proportions of students selecting each answer. Can be used to evaluate student strengths and weaknesses; may point to problems with the test's validity and to possible bias.

Item Difficulty - The percentage of students taking the test who answered the item correctly: the higher this difficulty index, the easier the item. To compute item difficulty, divide the number of people answering the item correctly by the total number of people answering the item. The resulting proportion is usually denoted p and is called the item difficulty. An item answered correctly by 85% of the examinees has an item difficulty, or p value, of .85, whereas an item answered correctly by 50% of the examinees has a lower item difficulty, or p value, of .50.
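
The computation described above, as a one-line function:

```python
def item_difficulty(num_correct, num_answering):
    """p value: the proportion of examinees who answered the item correctly."""
    return num_correct / num_answering

print(item_difficulty(170, 200))  # 0.85 -- an easier item
print(item_difficulty(100, 200))  # 0.5  -- a harder item
```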

Journals - Students' personal records and reactions to various aspects of learning and developing ideas. A reflective process often found to consolidate and enhance learning.

Large Scale Assessment - Standardized assessment program designed to evaluate the achievement of large groups of students.

Longitudinal studies - Data collected from the same population at different points in time.

Mastery Level - The cutoff score on a criterion-referenced or mastery test; people who score at or above the cutoff score are considered to have mastered the material; mastery may be an arbitrary judgment.

Mastery Test - A test that determines whether an individual has mastered a unit of instruction or skill; a test that provides information about what an individual knows, not how his or her performance compares to the norm group.

Mean - Average score; sum of individual scores divided by the total number of scores.

Median - The middle score in a distribution or set of ranked scores; the point (score) that divides a group into two equal parts; the 50th percentile. Half the scores are below the median, and half are above it.

Miscue Analysis - Analysis of the test item options selected as the basis for determining student control of related processes and concepts.

Mode - The score or value that occurs most often in a distribution.
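
Python's standard library computes all three of these summary statistics (mean, median, mode) directly; a sketch with hypothetical scores:

```python
import statistics

scores = [72, 85, 85, 90, 68]  # hypothetical test scores

print(statistics.mean(scores))    # 80 -- sum of scores divided by their count
print(statistics.median(scores))  # 85 -- middle score when ranked
print(statistics.mode(scores))    # 85 -- most frequently occurring score
```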

Modifications - Changes in the content, format, and/or administration of a test to accommodate test takers who are unable to take the test under standard test conditions. Modifications alter what the test is designed to measure or the comparability of scores.

Matrices - are used to summarize the relationship between program objectives and courses, course assignments, or course syllabus objectives to examine congruence and to ensure that all objectives have been sufficiently structured into the curriculum.

Mean - One of several ways of representing a group with a single, typical score. It is figured by adding up all the individual scores in a group and dividing the sum by the number of people in the group. Can be affected by extremely low or high scores.

Measurement - Quantitative description of student learning and qualitative description of student attitude.

Median - The point on a scale that divides a group into two equal subgroups. Another way to represent a group's scores with a single, typical score. The median is not affected by low or high scores as is the mean. (See Norm.)

Metacognition - The knowledge of one's own thinking processes and strategies, and the ability to consciously reflect and act on the knowledge of cognition to modify those processes and strategies.

Multidimensional Assessment - Assessment that gathers information about a broad spectrum of abilities and skills (as in Howard Gardner's theory of Multiple Intelligences).

Multiple Choice Tests - Tests in which students are presented with a question or an incomplete sentence or idea and are expected to choose the correct or best answer or completion from a menu of alternatives.

National percentile rank - Indicates the relative standing of one child when compared with others in the same grade; percentile ranks range from a low score of 1 to a high score of 99.

Normal distribution curve - A distribution of scores used to scale a test. Normal distribution curve is a bell-shaped curve with most scores in the middle and a small number of scores at the low and high ends.

Norm - A distribution of scores obtained from a norm group. The norm is the midpoint (or median) of scores or performance of the students in that group. Fifty percent will score above and fifty percent below the norm.

Norm Group - A random group of students selected by a test developer to take a test to provide a range of scores and establish the percentiles of performance for use in establishing scoring standards.

Norm Referenced Tests - Tests in which a student's or a group's performance is compared to that of a norm group. The results are relative to the performance of an external group and are designed to be compared with the norm group's performance standard. An individual's or group's scores may fall anywhere relative to the median established by the original test takers. Often used to measure and compare students, schools, districts, and states on the basis of norm-established scales of achievement.

Normal Curve Equivalent - A score that ranges from 1-99, often used by testers to manipulate data arithmetically. Used to compare different tests for the same student or group of students and between different students on the same test. An NCE is a normalized test score with a mean of 50 and a standard deviation of 21.06. NCEs should be used instead of percentiles for comparative purposes. Required by many categorical funding agencies, e.g., Chapter I or Title I.

Objective Test - A test for which the scoring procedure is completely specified, enabling agreement among different scorers; a correct-answer test.

Observations - can be of any social phenomenon, such as student presentations, students working in the library, or interactions at student help desks. Observations can be recorded as a narrative or in a highly structured format, such as a checklist, and they should be focused on specific program objectives.

Observer effect - The degree to which the assessment results are affected by the presence of an observer.

On-Demand Assessment - An assessment process that takes place as a scheduled event outside the normal routine. An attempt to summarize what students have learned that is not embedded in classroom activity.

Open-ended - Assessment questions that are designed to permit spontaneous and unguided responses.

Operational (-ize) - Defining a term or object so that it can be measured. Generally states the operations or procedures used that distinguish it from others.

Oral examination - An assessment of student knowledge levels through a face-to-face dialogue between the student and the examiner (usually faculty).

Outcome - An operationally defined educational goal, usually a culminating activity, product, or performance that can be measured.

Objectives - Stated, desirable outcomes of education.

Out-of-Level Testing - Means assessing students in one grade level using versions of tests that were designed for students in other (usually lower) grade levels; may not assess the same content standards at the same levels as are assessed in the grade-level assessment.

P Value - A p value is basically a behavioral measure: rather than defining difficulty in terms of some intrinsic characteristic of the item, difficulty is defined in terms of the relative frequency with which those taking the test choose the correct response (Thorndike et al., 1991).

Percentile - A ranking scale ranging from a low of 1 to a high of 99, with 50 as the median score. A percentile rank indicates the percentage of a reference or norm group obtaining scores equal to or less than the test-taker's score. A percentile score does not refer to the percentage of questions answered correctly; it indicates the test-taker's standing relative to the norm group standard.
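As a sketch of the definition above (the function name and norm-group scores here are hypothetical; real test publishers use smoothed norm tables rather than raw counts), a percentile rank can be computed as the percentage of the norm group scoring at or equal to a given score:

```python
def percentile_rank(score, norm_group):
    """Percent of the norm group scoring at or below `score`.

    Published percentile ranks are typically truncated to the 1-99 range;
    this sketch omits that step for clarity.
    """
    at_or_below = sum(1 for s in norm_group if s <= score)
    return round(100 * at_or_below / len(norm_group))

# Hypothetical norm-group raw scores.
norm = [55, 60, 62, 70, 71, 75, 80, 85, 90, 95]

print(percentile_rank(75, norm))  # 6 of 10 scores are <= 75 -> 60
```

A raw score of 75 thus corresponds to the 60th percentile for this norm group, even though 75 is not 60% of the maximum possible score.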

Performance appraisals - A competency-based method whereby abilities are measured in the most direct, real-world manner; the systematic measurement of an overt demonstration of acquired skills.

Performance-Based Assessment - Direct, systematic observation and rating of student performance of an educational objective, often an ongoing observation over a period of time, and typically involving the creation of products. The assessment may be a continuing interaction between teacher and student and should ideally be part of the learning process. The assessment should be a real-world performance with relevance to the student and learning community. Assessment of the performance is done using a rubric, or analytic scoring guide to aid in objectivity. Performance-based assessment is a test of the ability to apply knowledge in a real-life setting. Performance of exemplary tasks in the demonstration of intellectual ability.

Evaluation of the product of a learning experience can also be used to evaluate the effectiveness of teaching methods.

Stiggins defines performance-based assessment as the use of performance criteria to determine the degree to which a student has met an achievement target. Important elements of performance-based assessment include: clear goals or performance criteria, clearly articulated and communicated to the learner; a sound sampling plan that clearly envisions the scope of an achievement target and the type of learning involved (use of problem-solving skills, knowledge acquisition, etc.); attention to extraneous interference (cultural biases, language barriers, testing environment, tester biases); and establishment of a clear purpose for the data collected during the assessment before the assessment is undertaken, keeping in mind the needs of the groups involved (teachers, students, parents, etc.). (From an article by Richard J. Stiggins, "The Key to Unlocking High-Quality Performance Assessments," in Assessment: How Do We Know What They Know? ASCD, 1992.)

Performance Criteria - The standards by which student performance is evaluated. Performance criteria help assessors maintain objectivity and provide students with important information about expectations, giving them a target or goal to strive for.

Performance Expectation - A specific student task to achieve a unit result.

Performance Indicator (PI) - A state-prescribed statement of performance related to a specific content or process standard.

Portfolio - A systematic and organized collection of a student's work that exhibits to others the direct evidence of a student's efforts, achievements, and progress over a period of time. The collection should involve the student in selection of its contents, and should include information about the performance criteria, the rubric or criteria for judging merit, and evidence of student self-reflection or evaluation. It should include representative work, providing a documentation of the learner's performance and a basis for evaluation of the student's progress. Portfolios may include a variety of demonstrations of learning and have been gathered in the form of a physical collection of materials, videos, CD-ROMs, reflective journals, etc.

Portfolio Assessment - Portfolios may be assessed in a variety of ways. Each piece may be individually scored, or the portfolio might be assessed merely for the presence of required pieces, or a holistic scoring process might be used and an evaluation made on the basis of an overall impression of the student's collected work. It is common that assessors work together to establish consensus of standards or to ensure greater reliability in evaluation of student work. Established criteria are often used by reviewers and students involved in the process of evaluating progress and achievement of objectives.

Primary Trait Method - A type of rubric scoring constructed to assess a specific trait, skill, behavior, or format, or the evaluation of the primary impact of a learning process on a designated audience.

Process - A generalizable method of doing something, generally involving steps or operations which are usually ordered and/or interdependent. Process can be evaluated as part of an assessment, as in the example of evaluating a student's performance during prewriting exercises leading up to the final production of an essay or paper.

Process Standard - A broad statement addressing procedures and connections as they apply to a subject.

Product - The tangible and stable result of a performance or task. An assessment is made of student performance based on evaluation of the product of a demonstration of learning.

Profile - A graphic compilation of the performance of an individual on a series of assessments.

Power Test - Measures performance unaffected by speed of response; time not critical; items usually arranged in order of increasing difficulty.

Profile - A graphic representation of an individual’s scores on several tests or subtests; allows for easy identification of strengths or weaknesses across different tests or subtests.

Progress Monitoring - A scientifically based practice used to assess students' academic performance and evaluate the effectiveness of instruction; can be implemented with individual students or an entire class.

Project - A complex assignment involving more than one type of activity and production. Projects can take a variety of forms; examples include a mural construction, a shared service project, or another collaborative or individual effort.

Qualitative methods of assessment - Methods that rely on descriptions rather than numbers. Examples: ethnographic field studies, logs, journals, participant observations, open-ended questions on interviews and surveys.

Quantitative methods of assessment - Methods that rely on numerical scores or ratings. Examples: surveys, inventories, institutional/departmental data, departmental/course-level exams (locally constructed, standardized, etc.)

Quartile - The breakdown of an aggregate of percentile rankings into four categories: the 0-25th percentile, 26-50th percentile, etc.

Quintile - The breakdown of an aggregate of percentile rankings into five categories: the 0-20th percentile, 21-40th percentile, etc.

Rating Scale - A scale based on descriptive words or phrases that indicate performance levels. Qualities of a performance are described (e.g., advanced, intermediate, novice) in order to designate a level of achievement. The scale may be used with rubrics or descriptions of each level of performance.

Raw score - A raw score is the number of questions answered correctly on a test or subtest. For example, if a test has 59 items and the student gets 23 items correct, the raw score would be 23. Raw scores are converted to percentile ranks, standard scores, grade equivalent and age equivalent scores.

Reflective Essays - generally are brief (five to ten minute) essays on topics related to identified learning outcomes, although they may be longer when assigned as homework. Students are asked to reflect on a selected issue. Content analysis is used to analyze results.

Regression Analysis - In statistics, regression analysis refers to techniques for modeling and analyzing several variables, when the focus is on the relationship between a dependent variable and one or more independent variables. More specifically, regression analysis helps us understand how the typical value of the dependent variable changes when any one of the independent variables is varied, while the other independent variables are held fixed. Most commonly, regression analysis estimates the conditional expectation of the dependent variable given the independent variables - that is, the average value of the dependent variable when the independent variables are held fixed. Less commonly, the focus is on a quantile, or other location parameter of the conditional distribution of the dependent variable given the independent variables. In all cases, the estimation target is a function of the independent variables called the regression function. In regression analysis, it is also of interest to characterize the variation of the dependent variable around the regression function, which can be described by a probability distribution.
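A minimal sketch of the simplest case, ordinary least squares with one independent variable (the data here are hypothetical: hours studied versus test score). The fitted line is the regression function, estimating the conditional expectation of y given x:

```python
# Least-squares fit of y = a + b*x, the simplest regression function.
x = [1, 2, 3, 4, 5]       # hypothetical hours studied
y = [52, 60, 63, 71, 74]  # hypothetical test scores

n = len(x)
mx = sum(x) / n           # mean of x
my = sum(y) / n           # mean of y

# Slope: covariance of x and y divided by variance of x.
b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
a = my - b * mx           # intercept: the line passes through (mx, my)

print(a, b)  # 47.5 5.5
```

For these data the model predicts about 5.5 additional score points per hour studied; residual scatter around the line is what the final sentence of the definition refers to.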

Reliability - The measure of consistency for an assessment instrument. The instrument should yield similar results over time with similar populations in similar circumstances.

Response to Intervention (RTI) - The use of research-based instruction and interventions for students who are at risk and who are suspected of having specific learning disabilities.

Rigor - The academic or cognitive challenge of a test item or task, with four levels: 1 = prior knowledge/recall; 2 = routine task for the grade/course; 3 = reasoning task; 4 = stretch.

Rubric - Some of the definitions of rubric are contradictory. In general a rubric is a scoring guide used in subjective assessments. A rubric implies that a rule defining the criteria of an assessment system is followed in evaluation. A rubric can be an explicit description of performance characteristics corresponding to a point on a rating scale. A scoring rubric makes explicit expected qualities of performance on a rating scale or the definition of a single scoring point on a scale.

Salience - A striking point or feature.

Sampling - A way to obtain information about a large group by examining a smaller, randomly chosen selection (the sample) of group members. If the sampling is conducted correctly, the results will be representative of the group as a whole. Sampling may also refer to the choice of smaller tasks or processes that will be valid for making inferences about the student's performance in a larger domain. "Matrix sampling" asks different groups to take small segments of a test; the results will reflect the ability of the larger group on a complete range of tasks.

Scale - A classification tool or counting system designed to indicate and measure the degree to which an event or behavior has occurred.

Scale Scores - Scores based on a scale ranging from 001 to 999. Scale scores are useful in comparing performance in one subject area across classes, schools, districts, and other large populations, especially in monitoring change over time.

Score - A rating of performance based on a scale or classification.

Scoring Criteria - Rules for assigning a score or the dimensions of proficiency in performance used to describe a student's response to a task. May include rating scales, checklists, answer keys, and other scoring tools. In a subjective assessment situation, a rubric.

Scoring - A package of guidelines intended for people scoring performance assessments. May include instructions for raters, notes on training raters, rating scales, and samples of student work exemplifying various levels of performance.

Self-Assessment - A process in which a student engages in a systematic review of a performance, usually for the purpose of improving future performance. May involve comparison with a standard, established criteria. May involve critiquing one's own work or may be a simple description of the performance. Reflection, self-evaluation, metacognition, are related terms.

Senior Project - An extensive project planned and carried out during the senior year of high school as the culmination of the secondary school experience. Senior projects require higher-level thinking skills, problem-solving, and creative thinking. They are often interdisciplinary and may require extensive research. Projects culminate in a presentation of the project to a panel of people, usually faculty and community mentors, sometimes students, who evaluate the student's work at the end of the year.

Simulations - A competency-based measure where a person's abilities are measured in a situation that approximates a "real world" setting. Simulation is primarily used when it is impractical to observe a person performing a task in a real world situation (e.g. on the job).

Speed Test - A test in which performance is measured by the number of tasks performed in a given time. Examples are tests of typing speed and reading speed.

Standard score - A score on a norm-referenced test that is based on the bell curve and its equal distribution of scores around the average of the distribution. Standard scores are especially useful because they allow comparisons between students and comparisons of one student over time.

Standard deviation (SD) - A measure of the variability of a distribution of scores. The more the scores cluster around the mean, the smaller the standard deviation. In a normal distribution, 68% of the scores fall within one standard deviation above and one standard deviation below the mean.
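A small sketch of the definition above, using hypothetical scores. With only ten scores the proportion inside the one-SD band only roughly approximates the 68% expected for a true normal distribution:

```python
import statistics

# Hypothetical test scores.
scores = [60, 65, 70, 70, 75, 75, 80, 80, 85, 90]

mean = statistics.mean(scores)
sd = statistics.pstdev(scores)  # population standard deviation

# Scores falling within one standard deviation of the mean.
within_one_sd = [s for s in scores if mean - sd <= s <= mean + sd]

print(round(sd, 2))        # 8.66
print(len(within_one_sd))  # 6 of the 10 scores fall within one SD of the mean
```

Here 60% of the sample falls within one SD, close to (but not exactly) the theoretical 68%; the smaller the SD, the tighter the scores cluster around the mean of 75.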

Stakeholder - Anyone who has a vested interest in the outcome of the program/project. In a high stakes standardized test (a graduation requirement, for example), when students' scores are aggregated and published in the paper by school, the stakeholders include students, teachers, parents, school and district administrators, lawmakers (including the governor), and even real estate agents. It is always interesting to note which stakeholders seem to have the most at risk and which stakeholders seem to have the most power; these groups are seldom the same.

Standard - The performance level associated with a particular rating or grade on a test. For instance, 90% may be the standard for an A in a particular course; on a standardized test, a cutting score or cut point is used to determine the difference between one standard and the next.

Standards-based assessment - An assessment that measures learner achievement in relation to set standards.

Standardized Test - An objective test that is given and scored in a uniform manner. Standardized tests are carefully constructed, and items are selected after trials for appropriateness and difficulty. Tests are issued with a manual giving complete guidelines for administration and scoring. The guidelines attempt to eliminate extraneous interference that might influence test results. Scores are often norm-referenced.

Standards - Agreed upon values used to measure the quality of student performance, instructional methods, curriculum, etc.

Status report - A description of the implementation of the plan's assessment methods, the findings (evidence) from assessment methods, how the findings were used in decisions to maintain or improve student learning (academic programs) or unit outcomes (support units), the results of previous changes to improve outcomes, and the need for additional information and/or resources to implement an approved assessment plan or gather additional evidence.

Summative assessment - Assessment that is done at the conclusion of a course or some larger instructional period (e.g., at the end of the program). The purpose is to determine success or to what extent the program/project/course met its goals.

Subtest - A group of test items that measure a specific area (i.e., math calculation and reading comprehension). Several subtests make up a test.

Surveys - are commonly used with open-ended and closed-ended questions. Closed ended questions require respondents to answer the question from a provided list of responses. Typically, the list is a progressive scale ranging from low to high, or strongly agree to strongly disagree.

Subjective Test - A test in which the impression or opinion of the assessor determines the score or evaluation of performance. A test in which the answers cannot be known or prescribed in advance.

Summative Assessment - Evaluation at the conclusion of a unit or units of instruction or an activity or plan to determine or judge student skills and knowledge or effectiveness of a plan or activity. Outcomes are the culmination of a teaching/learning process for a unit, subject, or year's study. (See Formative Assessment.)

Test - A formal assessment of student achievement. Teacher-made tests can take many forms; external tests are always standardized. A portfolio can be used as a test, as can a project or exhibition.

Third party - Person(s) other than those directly involved in the educational process (e.g., employers, parents, consultants).

Topology - Mapping of the relationships among subjects.

Transcript Analysis - Transcripts are examined to see if students followed expected enrollment patterns or to examine specific research questions, such as exploring differences between transfer students and students who enrolled as freshmen.

Triangulate (triangulation) - The use of a combination of assessment methods in a study. An example of triangulation would be an assessment that incorporated surveys, interviews, and observations.

T-Score - A standard score with a mean of 50 and a standard deviation of 10. A T-score of 60 represents a score that is 1 standard deviation above the mean.

Test bias - The difference in test scores that is attributable to demographic variables (e.g., gender, ethnicity, and age).

Utility - (1) Usefulness of assessment results.

Utility - (2) The relative value of an outcome with respect to a set of other possible outcomes. Hence test utility refers to an evaluation, often in cost-benefit form, of the relative value of using a test vs. not using it, of using a test in one manner vs. another, or of using one test vs. another test.

Validity - Validity refers to the degree to which a study accurately reflects or assesses the specific concept that the researcher is attempting to measure. Validity has three components:

     • relevance - the option measures your educational objective as directly as possible

     • accuracy - the option measures your educational objective as precisely as possible

     • utility - the option provides formative and summative results with clear implications for educational program evaluation and improvement

Value added - The increase in learning that occurs during a course, program, or undergraduate education. Can either focus on the individual student (how much better a student can write, for example, at the end than at the beginning) or on a cohort of students (whether senior papers demonstrate more sophisticated writing skills, in the aggregate, than freshman papers). Requires a baseline measurement for comparison.

Variable (variability) - Observable characteristics that vary among individuals' responses.

Voluntary System of Accountability (VSA) - A joint accountability initiative by the American Association of State Colleges and Universities (AASCU) and the Association of Public and Land-grant Universities (APLU) aimed at making institutional data transparent.

z-Score - A standard score with a mean of 0 (zero) and a standard deviation of 1.
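The z-score, T-score, and Normal Curve Equivalent defined in this glossary are all rescalings of the same standardized distance from the mean (the NCE relationship below holds for scores from a normal distribution). A sketch with a hypothetical raw score, mean, and SD:

```python
def z_score(raw, mean, sd):
    # Distance from the mean in standard-deviation units.
    return (raw - mean) / sd

def t_score(z):
    # T-score scale: mean 50, standard deviation 10.
    return 50 + 10 * z

def nce(z):
    # Normal Curve Equivalent scale: mean 50, standard deviation 21.06.
    return 50 + 21.06 * z

z = z_score(85, mean=75, sd=10)  # hypothetical raw score of 85
print(z)           # 1.0 (one SD above the mean)
print(t_score(z))  # 60.0
print(nce(z))      # approximately 71.06
```

The same underlying performance (one SD above the mean) thus appears as z = 1, T = 60, or NCE of about 71, depending on which scale a report uses.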

The definitions of most terms in this glossary were derived from several sources, including:

  • Glossary of Useful Terms Related to Authentic and Performance Assessments. Grant Wiggins
  • SCASS Arts Assessment Project Glossary of Assessment Terms
  • Working Definitions for Assessment Technology
  • A True Test: Toward a More Authentic and Equitable Assessment. Grant Wiggins. Phi Delta Kappan, May 1989, pp. 703-713.
  • The ERIC Review: Performance-Based Assessment. Vol. 3 Issue 1, Winter, 1994.
  • Assessment: How Do We Know What They Know? ASCD. 1992.
  • Assessment as an episode of learning. Dennie Palmer Wolf
  • Dissolving the Boundaries: Assessment that Enhances Learning. Dee Dickinson
  • Results-Based Education Model (R-BEM), Bruce H. Crowder