Ability - A characteristic that is indicative of competence in a
field. (See also aptitude.)
Ability Testing - Use of standardized tests to evaluate an individual’s
performance in a specific area (e.g., cognitive, psychomotor, or physical
functioning).
Achievement tests - Standardized tests that measure knowledge and skills in
academic subject areas (e.g., math, spelling, and reading).
Accommodations - Changes in format, response, setting, timing,
or scheduling that do not alter in any significant way what the test measures
or the comparability of scores. Accommodations are designed to ensure that an
assessment measures the intended construct, not the child’s disability.
Accommodations affect three areas of testing: 1) the administration of tests,
2) how students are allowed to respond to the items, and 3) the presentation of
the tests (how the items are presented to the students on the test instrument).
Accommodations may include Braille forms of a test for blind students or tests
in native languages for students whose primary language is other than English.
Age Equivalent - The chronological age in a population for which a score
is the median (middle) score. If children who are 10 years and 6 months old
have a median score of 17 on a test, the score 17 has an age equivalent of
10-6.
Alternate Forms - Two or more versions of a test that are considered
interchangeable, in that they measure the same constructs in the same ways, are
intended for the same purposes, and are administered using the same directions.
Aptitude - An individual’s ability to learn or to develop
proficiency in an area if provided with appropriate education or training.
Aptitude tests include tests of general academic (scholastic) ability; tests of
special abilities (e.g., verbal, numerical, mechanical); tests that assess
“readiness” for learning; and tests that measure ability and previous learning
that are used to predict future performance.
Aptitude tests - Tests that measure an individual’s collective knowledge;
often used to predict learning potential. See also ability test.
Accountability - The
demand by a community (public officials, employers, and taxpayers) for school
officials to prove that money invested in education has led to measurable
learning. "Accountability testing" is an attempt to sample what
students have learned, or how well teachers have taught, and/or the
effectiveness of a school's principal's performance as an instructional leader.
School budgets and personnel promotions, compensation, and awards may be
affected. Most school districts make this kind of assessment public; it can
affect policy and public perception of the effectiveness of taxpayer-supported
schools and be the basis for comparison among schools.
Accountability is often viewed as
an important factor in education reform. An assessment system connected to
accountability can help identify the needs of schools so that resources can be
equitably distributed. In this context, accountability assessment can include
such indicators as equity, competency of teaching staff, physical
infrastructure, curriculum, class size, instructional methods, existence of
tracking, number of higher cost students, dropout rates, and parental
involvement as well as student test scores. It has been suggested that test
scores analyzed in a disaggregated format can help identify instructional
problems and point to potential solutions.
Achievement Test - A
standardized test designed to efficiently measure the amount of knowledge
and/or skill a person has acquired, usually as a result of classroom
instruction. Such testing produces a statistical profile used as a measurement
to evaluate student learning in comparison with a standard or norm.
Action Research - School
and classroom-based studies initiated and conducted by teachers and other
school staff. Action research involves teachers, aides, principals, and other
school staff as researchers who systematically reflect on their teaching or
other work and collect data that will answer their questions. It offers staff
an opportunity to explore issues of interest to them in an effort to improve
classroom instruction and educational effectiveness. (Source: Bennett, C. K.
"Promoting teacher reflection through action research: What do teachers
think?" Journal of Staff Development, 1994, 15, 34-38.)
Affective - Outcomes
of education involving feelings more than understanding: likes, pleasures,
ideals, dislikes, annoyances, values.
Alternative Assessment
- Many educators prefer the description "assessment
alternatives" to describe alternatives to traditional, standardized, norm-
or criterion-referenced paper-and-pencil testing. An alternative
assessment might require students to answer an open-ended question, work out a
solution to a problem, perform a demonstration of a skill, or in some way
produce work rather than select an answer from choices on a sheet of paper.
Portfolios and instructor observation of students are also alternative forms of
assessment.
Analytic Scoring - A
type of rubric scoring that separates the whole into categories of criteria
that are examined one at a time. Student writing, for example, might be scored
on the basis of grammar, organization, and clarity of ideas. Useful as a
diagnostic tool. An analytic scale is useful when there are several dimensions
on which the piece of work will be evaluated. (See Rubric.)
Anchor - An
example of student work at a specific level on a scoring rubric.
Aptitude Test - A
test intended to measure the test-taker's innate ability to learn, given before
receiving instruction.
Assessment - The
Latin root assidere means to sit beside. In an educational context, the
process of observing learning; describing, collecting, recording, scoring, and
interpreting information about a student's or one's own learning. At its most
useful, assessment is an episode in the learning process; part of reflection
and autobiographical understanding of progress. Traditionally, student
assessments are used to determine placement, promotion, graduation, or
retention.
In the context of institutional
accountability, assessments are undertaken to determine the principal's
performance, effectiveness of schools, etc. In the context of school reform,
assessment is an essential tool for evaluating the effectiveness of changes in
the teaching-learning process.
Assessment for
improvement - Assessment that feeds directly, and often immediately,
back into revising the course, program or institution to improve student
learning results.
Assessment Literacy - The
possession of knowledge about the basic principles of sound assessment
practice, including terminology, the development and use of assessment
methodologies and techniques, familiarity with standards of quality in
assessment. Increasingly, familiarity with alternatives to traditional
measurements of learning.
Assessment plan - A document that outlines
the student learning outcomes and program objectives, the direct and indirect
assessment methods used to demonstrate the attainment of each outcome/objective,
a brief explanation of the assessment methods, an indication of which
outcome(s)/objectives is/are addressed by each method, the intervals at which
evidence is collected and reviewed, and the individual(s) responsible for the
collection/review of evidence.
Assessment Task - An
illustrative task or performance opportunity that closely targets defined
instructional aims, allowing students to demonstrate their progress and
capabilities.
Authentic Assessment -
Evaluating by asking for the behavior the learning is intended to
produce. It follows a model-practice-feedback cycle in which students know what
excellent performance looks like and are guided to practice an entire concept
rather than bits and pieces in preparation for eventual understanding. A variety
of techniques can be employed in authentic assessment.
The goal of authentic assessment is
to gather evidence that students can use knowledge effectively and be able to
critique their own efforts. Authentic tests can be viewed as "assessments
of enablement," in Robert Glaser's words, ideally mirroring and measuring
student performance in a "real-world" context. Tasks used in
authentic assessment are meaningful and valuable, and are part of the learning
process.
Authentic assessment can take place
at any point in the learning process. Authentic assessment implies that tests
are central experiences in the learning process, and that assessment takes
place repeatedly. Patterns of success and failure are observed as learners use
knowledge and skills in slightly ambiguous situations that allow the assessor
to observe the student applying knowledge and skills in new situations over
time.
Backload (--ed, --ing) - Amount of effort
required after the data collection.
Benchmark - Student performance standards (the
level(s) of student competence in a content area.)
An actual measurement of group
performance against an established standard at defined points along the path
toward the standard. Subsequent measurements of group performance use the
benchmarks to measure progress toward achievement.
Examples of student achievement
that illustrate points on a performance scale, used as exemplars. (See Descriptor,
Cohort.)
Benchmark Assessment
- A formative (timely) assessment based on a district curriculum and related State
learner expectations.
Battery - A group or series of tests or subtests administered together; the
most common test batteries are achievement tests that include subtests in
different areas.
Bell curve - See normal
distribution curve.
Capstone Courses -
could be a senior seminar or designated assessment course. Program learning
outcomes can be integrated into assignments.
Case Studies - involve
a systematic inquiry into a specific phenomenon, e.g. individual, event, program,
or process. Data are collected via multiple methods often utilizing both
qualitative and quantitative approaches.
CBM - "Curriculum Based
Measurement."
Ceiling - The highest level
of performance or score that a test can reliably measure.
Classroom Assessment
- is often designed for individual faculty who wish to improve their teaching
of a specific course. Data collected can be analyzed to assess student learning
outcomes for a program.
Cohort - A group whose progress is followed by means
of measurements at different points in time.
Collective Portfolios
- Faculty assemble samples of student work from various classes and use the
“collective” to assess specific program learning outcomes. Portfolios can be
assessed by using scoring rubrics; expectations should be clarified before
portfolios are examined.
Competency
- (1) Level at which performance is acceptable.
Competency - (2) A
group of characteristics, native or acquired, which indicate an individual's ability
to acquire skills in a given area.
Competency Test - A test intended to establish that a
student has met established minimum standards of skills and knowledge and is
thus eligible for promotion, graduation, certification, or other official
acknowledgment of achievement.
Composite score - A score formed by combining two or more subtest scores to
create an average or composite. For example, a reading performance score
may be an average of vocabulary and reading comprehension subtest scores.
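As a sketch of how a composite might be computed under the simple-averaging convention described above (the subtest names and scores here are hypothetical; real test batteries may use scaled or weighted combinations):

```python
def composite_score(subtest_scores):
    """Combine two or more subtest scores into one composite by averaging."""
    return sum(subtest_scores.values()) / len(subtest_scores)

# Hypothetical reading battery: two subtests averaged into one reading score
reading = {"vocabulary": 82, "reading_comprehension": 90}
print(composite_score(reading))  # 86.0
```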
Confounded - The
situation in which the effect of a controlled variable is inextricably mixed
with that of another, uncontrolled variable.
Content Analysis -
is a procedure that categorizes the content of written documents. The analysis
begins with identifying the unit of observation, such as a word, phrase, or
concept, and then creating meaningful categories to which each item can be
assigned. For example, a student’s statement that “I learned that I could be
comfortable with someone from another culture” could be assigned to the
category of “Positive Statements about Diversity.” The number of times that
this type of response occurred can then be quantified and compared with neutral
or negative responses addressing the same category.
Concept - An abstract, general notion -- a heading
that characterizes a set of behaviors and beliefs.
Content Standard
- Broad statement of learning as it applies to a specific subject area or
learning strand.
Convergent validity
- General agreement among ratings, gathered independently of one another, where
measures should be theoretically related.
Conversion table - A chart used to translate test scores into different
measures of performance (e.g., grade equivalents and percentile ranks).
Core curriculum - Fundamental knowledge that all students are required to
learn in school.
Criteria - Guidelines or rules that are used to judge performance.
Such tests usually cover
relatively small units of content and are closely related to instruction. Their
scores have meaning in terms of what the student knows or can do, rather than
in (or in addition to) their relation to the scores made by some norm group.
Frequently, the meaning is given in terms of a cutoff score, for which people
who score above that point are considered to have scored adequately (“mastered”
the material), while those who score below it are thought to have inadequate
scores.
Course-embedded
assessment - Course-embedded assessment refers to techniques that can
be utilized within the context of a classroom (one class period, several or
over the duration of the course) to assess students' learning, as individuals
and in groups. When used in conjunction with other assessment tools,
course-embedded assessment can provide valuable information at specific points
of a program. For example, faculty members teaching multiple sections of an
introductory course might include a common pre-test to determine student
knowledge, skills and dispositions in a particular field at program admission.
There are literally hundreds of classroom assessment techniques, limited only
by the instructor's imagination (see also embedded assessment).
Criterion-referenced
- Criterion-referenced tests determine what test-takers can do and what they
know, not how they compare to others. Criterion-referenced tests report on how
well students are doing relative to a predetermined performance level on a
specified set of educational goals or outcomes included in the curriculum. For
example, a criterion-referenced score reports how well a student performed
against the curriculum's goals, not how the student ranked among other test
takers.
Criterion Referenced Tests - A test in which the
results can be used to determine a student's progress toward mastery of a
content area. Performance is compared to an expected level of mastery in a
content area rather than to other students' scores. Such tests usually include
questions based on what the student was taught and are designed to measure the
student's mastery of designated objectives of an instructional program. The
"criterion" is the standard of performance established as the passing
score for the test. Scores have meaning in terms of what the student knows or
can do, rather than how the test-taker compares to a reference or norm group.
Criterion referenced tests can have norms, but comparison to a norm is not the
purpose of the assessment.
Criterion referenced tests have
also been used to provide information for program evaluation, especially to
track the success or progress of schools and student populations that have been
involved in change or that are at risk of inequity. In this case, the tests are
not used to compare teachers, teams or buildings within a district but rather
to give feedback on progress of groups and individuals.
Curriculum Alignment - The degree to which a
curriculum's scope and sequence matches a testing program's evaluation
measures, thus ensuring that teachers will use successful completion of the
test as a goal of classroom instruction.
Curriculum-embedded or Learning-embedded Assessment -
Assessment that occurs simultaneously with learning such as projects,
portfolios and "exhibitions." Occurs in the classroom setting, and,
if properly designed, students should not be able to tell whether they are
being taught or assessed. Tasks or tests are developed from the curriculum or
instructional materials.
Curriculum - Instructional plan of skills, lessons,
and objectives on a particular subject; may be authored by a state or a textbook
publisher. A teacher typically executes this plan.
Curriculum-Based Measurement (CBM) - A method to measure student
progress in academic areas including math, reading, writing, and spelling. The
child is tested briefly (1 to 5 minutes) each week. Scores are recorded on a
graph and compared to the expected performance on the content for that year.
The graph allows the teacher and parents to see quickly how the child’s
performance compares to expectations.
Cut Score - Score used to determine the minimum
performance level needed to pass a competency test. (See Descriptor for
another type of determiner.)
Descriptor - A set of signs used as a scale against
which a performance or product is placed in an evaluation. An example from
Grant Wiggins' Glossary of Useful Terms Related to Authentic and Performance
Assessments is taken from "the CAP writing test where a 5 out of a
possible 6 is described: 'The student describes the problem adequately and argues
convincingly for at least one solution . . . without the continual reader
awareness of the writer of a 6.'"
Descriptors allow assessment to
include clear guidelines for what is and is not valued in student work. Wiggins
adds that "[t]he word 'descriptor' reminds us that justifiable value
judgments are made by knowing how to empirically describe the traits of work we
do and do not value." (Emphasis his.)
Derived Score - A score to which raw scores are converted by numerical
transformation (e.g., conversion of raw scores to percentile ranks or standard
scores).
Diagnostic Test - A test used to diagnose, analyze or identify specific
areas of weakness and strength; to determine the nature of weaknesses or
deficiencies; diagnostic achievement tests are used to measure skills.
Dimension - Aspects or categories in which
performance in a domain or subject area will be judged. Separate descriptors or
scoring methods may apply to each dimension of the student's performance assessment.
Direct
assessment methods - These methods involve students' displays of
knowledge and skills (e.g. test results, written assignments, presentations,
classroom assignments) resulting from learning experiences in the
class/program.
Embedded
assessment - A means of gathering information about student learning
that is built into and a natural part of the teaching learning process. Often
used for assessment purposes in classroom assignments that are evaluated to
assign students a grade. Can assess individual student performance or aggregate
the information to provide information about the course or program; can be
formative or summative, quantitative or qualitative. Example: as part of a
course, expecting each senior to complete a research paper that is graded for
content and style, but is also assessed for advanced ability to locate and
evaluate Web-based information (as part of a college-wide outcome to
demonstrate information literacy).
eportfolio (electronic
portfolio) - An electronic format of a collection of work developed
across varied contexts over time. The eportfolio can advance learning by
providing students and/or faculty with a way to organize, archive and display
pieces of work. The electronic format allows faculty and other professionals to
evaluate student portfolios using technology, which may include the Internet,
CD-ROM, video, animation or audio. Electronic portfolios are becoming a popular
alternative to traditional paper-based portfolios because they offer
practitioners and peers the opportunity to review, communicate and assess
portfolios in an asynchronous manner (see also Portfolios and Course-embedded
assessment).
Essay Test - A test that requires students to answer
questions in writing. Responses can be brief or extensive. Tests for recall,
ability to apply knowledge of a subject to questions about the subject, rather
than ability to choose the least incorrect answer from a menu of options.
Expected Growth - The average change in test scores that occurs over a
specific time for individuals at given age or grade levels.
Evaluation - Both qualitative and quantitative
descriptions of pupil behavior plus value judgments concerning the desirability
of that behavior. Using collected information (assessments) to make informed
decisions about continued instruction, programs, activities.
Exemplar - Model of
excellence. (See Benchmark, Norm, Rubric, Standard.)
Evaluation
- (1) Depending on the context, evaluation may mean either assessment or test.
Many test manufacturers and teachers use these three terms interchangeably, which
means you have to pay close attention to how the terms are being used and why
they are being used that way. For instance, tests that do not provide any
immediate, helpful feedback to students and teachers should never be called
“assessments,” but many testing companies and some administrators use this term
to describe tests that return only score numbers to students and/or teachers.
Evaluation - (2)
When used for most educational settings, evaluation means to measure, compare,
and judge the quality of student work, schools, or specific educational
programs.
Evaluation - (3) A
value judgment about the results of assessment data. For example, evaluation of
student learning requires that educators compare student performance to a
standard to determine how the student measures up. Depending on the result,
decisions are made regarding whether and how to improve student performance.
Exit and other interviews
- Asking individuals to share their perceptions of their own attitudes and/or
behaviors or those of others, evaluating student reports of their attitudes
and/or behaviors in a face-to-face dialogue.
External
Assessment - Use of criteria (rubric) or an instrument developed by an
individual or organization external to the one being assessed.
External examiner
- Using an expert in the field from
outside your program, usually from a similar program at another institution to
conduct, evaluate, or supplement assessment of your students. Information can
be obtained from external evaluators using many methods including surveys,
interviews, etc.
External validity
- External validity refers to the extent to which the results of a study are
generalizable or transferable to other settings. Generalizability is the extent
to which assessment findings and conclusions from a study conducted on a sample
population can be applied to the population at large. Transferability is the
ability to apply the findings in one context to another similar context.
Fairness
- (1) Assessment or test that provides an even playing field for all students.
Absolute fairness is an impossible goal because all tests privilege some test
takers over others; standardized tests provide one kind of fairness while
performance tests provide another. The highest degree of fairness can be achieved
when students can demonstrate their understanding in a variety of ways.
Fairness - (2)
Teachers, students, parents and administrators agree that the instrument has
validity, reliability, and authenticity, and they therefore have confidence in
the instrument and its results.
Floor - The lowest score that a test can reliably measure.
Frequency distribution - A tabulation of test scores showing how many test
takers earned each score or score interval.
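A frequency distribution can be tallied with the standard library's Counter; the class scores below are hypothetical:

```python
from collections import Counter

# Hypothetical class scores; Counter maps each score to how many students earned it
scores = [80, 85, 85, 90, 90, 90, 95]
distribution = Counter(scores)

for score in sorted(distribution):
    print(score, "*" * distribution[score])  # a simple text histogram
```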
Forced-choice - The
respondent only has a choice among given responses (e.g., very poor, poor,
fair, good, very good).
Formative Assessment - Observations which allow one
to determine the degree to which students know or are able to do a given
learning task, and which identify the part of the task that the student does
not know or is unable to do. Outcomes suggest future steps for teaching and
learning. (See
Summative Assessment.)
Frontload (--ed, --ing) - Amount of effort
required in the early stage of assessment method development or data collection.
Generalization
(generalizability) - The extent to which assessment findings and
conclusions from a study conducted on a sample population can be applied to the
population at large.
Goal-free evaluation
- Goal-free evaluation focuses on actual outcomes rather than intended program
outcomes. Evaluation is done without prior knowledge of the goals of the
program.
Grade Equivalent - A score that describes student
performance in terms of the statistical performance of an average student at a
given grade level. A grade equivalent score of 5.5, for example, might indicate
that the student's score is what could be expected of an average student doing
average work in the fifth month of the fifth grade. This score allows for a
theoretical or approximate comparison across grades. It ranges from September
of the kindergarten year (K.0) to June of the senior year in high school
(12.9). Useful as a ranking score, grade equivalents are only a theoretical or
approximate comparison across grades. As such, it may not indicate what
the student would actually score on a test given to a midyear fifth grade
class.
High
stakes test - A test whose results have important, direct consequences
for examinees, programs, or institutions tested.
“High
stakes” use of assessment - The decision to use the results of
assessment to set a hurdle that needs to be cleared for completing a program of
study, receiving certification, or moving to the next level. Most often the
assessment so used is externally developed, based on set standards, carried out
in a secure testing situation, and administered at a single point in time.
Examples: at the secondary school level, statewide exams required for
graduation; in postgraduate education, the bar exam.
Holistic Method - In assessment, assigning a single
score based on an overall assessment of performance rather than by scoring or
analyzing dimensions individually. The product is considered to be more than
the sum of its parts and so the quality of a final product or performance is
evaluated rather than the process or dimension of performance. A holistic
scoring rubric might combine a number of elements on a single scale. Focused
holistic scoring may be used to evaluate a limited portion of a learner's
performance.
Indirect
assessment of learning - Gathers reflection about the learning or
secondary evidence of its existence. Example: a student survey about whether a
course or program helped develop a greater sensitivity to issues of diversity.
Intelligence tests - Tests that measure aptitude or intellectual capacities
(Examples: the Wechsler Intelligence Scale for Children (WISC-III-R) and the
Stanford-Binet (SB:IV)).
Intelligence quotient (IQ) - Score achieved on an intelligence test that identifies
learning potential.
Item - A question or exercise in a test or assessment.
Inter-rater reliability
- The degree to which different raters/observers give consistent estimates of
the same phenomenon.
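One way to quantify inter-rater reliability is simple percent agreement, optionally corrected for chance agreement using Cohen's kappa; a minimal sketch with hypothetical ratings from two raters:

```python
from collections import Counter

def percent_agreement(r1, r2):
    """Proportion of items on which two raters gave the same rating."""
    return sum(a == b for a, b in zip(r1, r2)) / len(r1)

def cohens_kappa(r1, r2):
    """Cohen's kappa: observed agreement corrected for the agreement
    expected by chance, estimated from each rater's marginal frequencies."""
    n = len(r1)
    p_o = percent_agreement(r1, r2)
    c1, c2 = Counter(r1), Counter(r2)
    p_e = sum((c1[c] / n) * (c2[c] / n) for c in set(r1) | set(r2))
    return (p_o - p_e) / (1 - p_e)

# Hypothetical pass/fail ratings of five student papers by two raters
rater_a = ["pass", "pass", "fail", "pass", "fail"]
rater_b = ["pass", "fail", "fail", "pass", "fail"]
print(percent_agreement(rater_a, rater_b))  # 0.8
```

Kappa is lower than raw agreement here because some of the raters' matches would be expected by chance alone.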
Internal validity
- Internal validity refers to (1) the rigor with which the study was conducted
(e.g., the study's design, the care taken to conduct measurements, and
decisions concerning what was and wasn't measured) and (2) the extent to which
the designers of a study have taken into account alternative explanations for
any causal relationships they explore.
Interviews - are
conversations or direct questioning with an individual or group of people. The
interviews can be conducted in person or on the telephone. The length of an
interview can vary from 20 minutes to over an hour. Interviewers should be
trained to follow agreed-upon procedures (protocols).
Interpreting - An
approach to analysis that explains the meaning or significance of student
assessment data. (See Analysis.)
I. Q. Tests - The first of the standardized
norm-referenced tests, developed around the turn of the twentieth century. Traditional
psychologists believe that neurological and genetic factors underlie
"intelligence" and that scoring the performance of certain
intellectual tasks can provide assessors with a measurement of general
intelligence. There is a substantial body of research that suggests that I.Q.
tests measure only certain analytical skills, missing many areas of human
endeavor considered to be intelligent behavior. I.Q. is considered by some to
be fixed or static, whereas an increasing number of researchers are finding
that intelligence is an ongoing process that continues to change throughout
life.
Item Analysis - Analyzing each item on a test to
determine the proportions of students selecting each answer. Can be used to
evaluate student strengths and weaknesses; may point to problems with the
test's validity and to possible bias.
Item Difficulty - Item difficulty is simply the percentage
of students taking the test who answered the item correctly. The larger the
percentage getting an item right, the easier the item; the higher the difficulty
index, the easier the item is understood to be. To compute the item difficulty,
divide the number of people answering the item correctly by the total number of
people answering the item. The proportion for the item is usually denoted as p
and is called item difficulty. An item answered correctly by 85% of the
examinees would have an item difficulty, or p value, of .85, whereas an
item answered correctly by 50% of the examinees would have a lower item
difficulty, or p value, of .50.
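The computation described above amounts to a single division; a sketch with hypothetical response data:

```python
def item_difficulty(responses):
    """p value: proportion of examinees answering the item correctly.
    `responses` is a list of booleans (True = answered correctly)."""
    return sum(responses) / len(responses)

# Hypothetical item: 17 of 20 examinees answered correctly -> an easy item
item_responses = [True] * 17 + [False] * 3
print(item_difficulty(item_responses))  # 0.85
```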
Journals - Students' personal records and reactions
to various aspects of learning and developing ideas. A reflective process often
found to consolidate and enhance learning.
Large Scale Assessment -
Standardized assessment program designed to evaluate the achievement of large
groups of students.
Longitudinal
studies - Data collected from the same population at different points
in time.
Mastery Level - The cutoff score on a criterion-referenced or mastery
test; people who score at or above the cutoff score are considered to have
mastered the material; mastery may be an arbitrary judgment.
Mastery Test - A test that determines whether an individual has
mastered a unit of instruction or skill; a test that provides information about
what an individual knows, not how his or her performance compares to the norm
group.
Mean - Average score; sum of individual scores divided by the
total number of scores.
Median - The middle score in a distribution or set of ranked
scores; the point (score) that divides a group into two equal parts; the 50th
percentile. Half the scores are below the median, and half are above it.
Miscue Analysis - Analysis of the test item options selected as the basis
for determining student control of related processes and concepts.
Mode - The score or value that occurs most often in a
distribution.
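The mean, median, and mode defined above can be checked with Python's standard library statistics module (the scores here are hypothetical):

```python
import statistics

# Hypothetical test scores for five students
scores = [70, 85, 85, 90, 100]

print(statistics.mean(scores))    # sum of scores divided by their count (86)
print(statistics.median(scores))  # middle score of the ranked list (85)
print(statistics.mode(scores))    # most frequent score (85)
```

Note how a single very high or very low score would shift the mean but leave the median and mode unchanged, as the Median entry above points out.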
Modifications - Changes in the content, format, and/or administration of
a test to accommodate test takers who are unable to take the test under
standard test conditions. Modifications alter what the test is designed to
measure or the comparability of scores.
Matrices - are
used to summarize the relationship between program objectives and courses,
course assignments, or course syllabus objectives to examine congruence and to
ensure that all objectives have been sufficiently structured into the
curriculum.
Mean - One of several ways of representing a group
with a single, typical score. It is figured by adding up all the individual
scores in a group and dividing the sum by the number of people in the group. Can
be affected by extremely low or high scores.
Measurement - Quantitative description of student
learning and qualitative description of student attitude.
Median - The point on a scale that divides a group
into two equal subgroups. Another way to represent a group's scores with a
single, typical score. The median is not affected by low or high scores as is
the mean. (See Norm.)
Metacognition - The knowledge of one's own thinking
processes and strategies, and the ability to consciously reflect and act on the
knowledge of cognition to modify those processes and strategies.
Multidimensional Assessment - Assessment that gathers
information about a broad spectrum of abilities and skills (as in Howard
Gardner's theory of Multiple Intelligences).
Multiple Choice Tests - Tests in which students are
presented with a question or an incomplete sentence or idea. The students are
expected to choose the correct or best answer/completion from a menu of
alternatives.
National percentile rank - Indicates the relative standing of one child when
compared with others in the same grade; percentile ranks range from a low score
of 1 to a high score of 99.
Normal distribution curve - A distribution of scores used to scale a test. The
normal distribution curve is bell-shaped, with most scores in the middle and a
small number of scores at the low and high ends.
Norm - A distribution of scores obtained from a norm
group. The norm is the midpoint (or median) of scores or performance of the
students in that group. Fifty percent will score above and fifty percent below
the norm.
Norm Group - A random group of students selected by a
test developer to take a test to provide a range of scores and establish the
percentiles of performance for use in establishing scoring standards.
Norm Referenced Tests - A test in which a student or
a group's performance is compared to that of a norm group. The student or group
scores will not fall evenly on either side of the median established by the
original test takers. The results are relative to the performance of an
external group and are designed to be compared with the norm group providing a
performance standard. Often used to measure and compare students, schools,
districts, and states on the basis of norm-established scales of achievement.
Normal Curve Equivalent - A score that ranges from
1-99, often used by testers to manipulate data arithmetically. Used to compare
different tests for the same student or group of students and between different
students on the same test. An NCE is a normalized test score with a mean of 50
and a standard deviation of 21.06. NCEs should be used instead of percentiles
for comparative purposes. Required by many categorical funding agencies, e.g.,
Chapter I or Title I.
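Because the NCE scale has a fixed mean and standard deviation, converting a normalized z-score to an NCE is a one-line formula; a minimal Python sketch (the function name is mine, not a standard API):

```python
def nce_from_z(z):
    """Normal Curve Equivalent: normalized score with mean 50, SD 21.06."""
    return 50 + 21.06 * z

print(nce_from_z(0.0))   # 50.0 -> exactly average performance
print(nce_from_z(1.0))   # 71.06 -> one standard deviation above the mean
```

Unlike percentile ranks, NCE units are equal-interval, which is why they can be averaged and compared arithmetically.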
Objective Test - A test for which the scoring
procedure is completely specified, enabling agreement among different scorers;
a correct-answer test.
Observations - Observations can be made of any social phenomenon, such as
student presentations, students working in the library, or interactions at
student help desks. Observations can be recorded as a narrative or in a highly
structured format, such as a checklist, and they should be focused on specific
program objectives.
Observer effect - The
degree to which the assessment results are affected by the presence of an
observer.
On-Demand Assessment - An assessment process that
takes place as a scheduled event outside the normal routine. An attempt to
summarize what students have learned that is not embedded in classroom
activity.
Open-ended - Assessment
questions that are designed to permit spontaneous and unguided responses.
Operationalize
- To define a term or object so that it can be measured; generally states the
operations or procedures used to distinguish it from others.
Oral examination -
An assessment of student knowledge levels through a face-to-face dialogue
between the student and an examiner, usually a faculty member.
Outcome - An operationally defined educational goal,
usually a culminating activity, product, or performance that can be measured.
Objectives - Stated, desirable outcomes of education.
Out-of-Level Testing - Means assessing students in one grade level using
versions of tests that were designed for students in other (usually lower)
grade levels; may not assess the same content standards at the same levels as
are assessed in the grade-level assessment.
P Value - A p
value is basically a behavioral measure. Rather than defining difficulty in
terms of some intrinsic characteristic of the item, difficulty is defined in
terms of the relative frequency with which those taking the test choose the
correct response (Thorndike et al., 1991).
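Since difficulty is defined behaviorally, an item's p value is simply the proportion of examinees who chose the correct response; a minimal sketch (the data are invented for illustration):

```python
def item_p_value(responses):
    """Item difficulty index: proportion of examinees answering correctly.
    `responses` is a list of booleans, True = correct answer chosen."""
    return sum(responses) / len(responses)

# 7 of 10 hypothetical examinees answered the item correctly
print(item_p_value([True] * 7 + [False] * 3))  # 0.7 -> a fairly easy item
```

Counterintuitively, a higher p value means an easier item, since more test-takers got it right.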
Percentile - A
ranking scale ranging from a low of 1 to a high of 99 with 50 as the median
score. A percentile rank indicates the percentage of a reference or norm group
obtaining scores equal to or less than the test-taker's score. A percentile
score does not refer to the percentage of questions answered correctly; it
indicates the test-taker's standing relative to the norm group standard.
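Using the "equal to or less than" definition above, a percentile rank can be sketched in Python (the norm-group scores are fabricated for illustration):

```python
def percentile_rank(norm_scores, score):
    """Percentage of the norm group scoring equal to or less than `score`."""
    at_or_below = sum(1 for s in norm_scores if s <= score)
    return 100 * at_or_below / len(norm_scores)

norm = [55, 60, 65, 70, 75, 80, 85, 90, 95, 100]  # hypothetical norm group
print(percentile_rank(norm, 80))  # 60.0 -> 6 of 10 scored 80 or below
```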
Performance
appraisals - A competency-based method whereby abilities are measured
in the most direct, real-world manner; the systematic measurement of overt
demonstration of acquired skills.
Performance-Based Assessment - Direct, systematic
observation and rating of student performance of an educational objective,
often an ongoing observation over a period of time, and typically involving the
creation of products. The assessment may be a continuing interaction between
teacher and student and should ideally be part of the learning process. The
assessment should be a real-world performance with relevance to the student and
learning community. Assessment of the performance is done using a rubric, or
analytic scoring guide to aid in objectivity. Performance-based assessment is a
test of the ability to apply knowledge in a real-life setting. Performance of
exemplary tasks in the demonstration of intellectual ability.
Evaluation of the product of a
learning experience can also be used to evaluate the effectiveness of teaching
methods.
Stiggins defines performance-based
assessment as the use of performance criteria to determine the degree to which
a student has met an achievement target. Important elements of
performance-based assessment include: clear goals or performance criteria
articulated and communicated to the learner; the establishment of a
sound sampling that clearly envisions the scope of an achievement target and
the type of learning involved (use of problem-solving skills, knowledge
acquisition, etc.); attention to extraneous interference (cultural biases,
language barriers, testing environment, tester biases); and establishment of a
clear purpose for the data collected during the assessment before the
assessment is undertaken, keeping in mind the needs of the groups involved
(teachers, students, parents, etc.). (From an article by Richard J. Stiggins, "The
Key to Unlocking High-Quality Performance Assessments," in Assessment: How Do
We Know What They Know? ASCD, 1992.)
Performance Criteria - The standards by which student
performance is evaluated. Performance criteria help assessors maintain
objectivity and provide students with important information about expectations,
giving them a target or goal to strive for.
Performance
Expectation - A specific student task to achieve a unit result.
Performance Indicator
(PI) - A state-prescribed statement of performance related to a specific
content or process standard.
Portfolio - A systematic and organized collection of
a student's work that exhibits to others the direct evidence of a student's
efforts, achievements, and progress over a period of time. The collection
should involve the student in selection of its contents, and should include
information about the performance criteria, the rubric or criteria for judging
merit, and evidence of student self-reflection or evaluation. It should include
representative work, providing a documentation of the learner's performance and
a basis for evaluation of the student's progress. Portfolios may include a
variety of demonstrations of learning and have been gathered in the form of a
physical collection of materials, videos, CD-ROMs, reflective journals, etc.
Portfolio Assessment - Portfolios may be assessed in
a variety of ways. Each piece may be individually scored, or the portfolio might
be assessed merely for the presence of required pieces, or a holistic scoring
process might be used and an evaluation made on the basis of an overall
impression of the student's collected work. It is common that assessors work
together to establish consensus of standards or to ensure greater reliability
in evaluation of student work. Established criteria are often used by reviewers
and students involved in the process of evaluating progress and achievement of
objectives.
Primary Trait Method - A type of rubric scoring
constructed to assess a specific trait, skill, behavior, or format, or the
evaluation of the primary impact of a learning process on a designated
audience.
Process - A generalizable method of doing something,
generally involving steps or operations which are usually ordered and/or
interdependent. Process can be evaluated as part of an assessment, as in the
example of evaluating a student's performance during prewriting exercises
leading up to the final production of an essay or paper.
Process Standard
- A broad statement addressing procedures and connections as they apply to a
subject.
Product - The tangible and stable result of a
performance or task. An assessment is made of student performance based on
evaluation of the product of a demonstration of learning.
Profile - A graphic compilation of the performance of
an individual on a series of assessments.
Power Test - Measures performance unaffected by speed of response;
time not critical; items usually arranged in order of increasing difficulty.
Profile - A graphic representation of an individual’s scores on
several tests or subtests; allows for easy identification of strengths or
weaknesses across different tests or subtests.
Progress monitoring - A scientifically based practice used to assess
students' academic performance and evaluate the effectiveness of instruction;
can be implemented with individual students or an entire class.
Project - A complex assignment involving more than
one type of activity and production. Projects can take a variety of forms, some
examples are a mural construction, a shared service project, or other
collaborative or individual effort.
Qualitative
methods of assessment - Methods that rely on descriptions rather than
numbers. Examples: ethnographic field studies, logs, journals, participant
observations, open-ended questions on interviews and surveys.
Quantitative methods of
assessment - Methods that rely on numerical scores or ratings.
Examples: surveys, inventories, institutional/departmental data, departmental/course-level
exams (locally constructed, standardized, etc.)
Quartile - The breakdown of an aggregate of
percentile rankings into four categories: the 0-25th percentile, 26-50th
percentile, etc.
Quintile - The breakdown of an aggregate of percentile
rankings into five categories: the 0-20th percentile, 21-40th percentile, etc.
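Both breakdowns amount to integer bucketing of a percentile rank; a small sketch of the ranges described above (the function names are mine):

```python
def quartile(pct):
    """Quartile (1-4) for a percentile rank in the 1-99 range."""
    return (pct - 1) // 25 + 1

def quintile(pct):
    """Quintile (1-5) for a percentile rank in the 1-99 range."""
    return (pct - 1) // 20 + 1

print(quartile(25), quartile(26))  # 1 2 -> boundary of first and second quartile
print(quintile(40), quintile(41))  # 2 3 -> boundary of second and third quintile
```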
Rating Scale - A scale based on descriptive words or
phrases that indicate performance levels. Qualities of a performance are
described (e.g., advanced, intermediate, novice) in order to designate a level
of achievement. The scale may be used with rubrics or descriptions of each
level of performance.
Raw score - A raw score is the number of questions answered
correctly on a test or subtest. For example, if a test has 59 items and the
student gets 23 items correct, the raw score would be 23. Raw scores are
converted to percentile ranks, standard scores, grade equivalent and age
equivalent scores.
Reflective Essays
- generally are brief (five to ten minute) essays on topics related to
identified learning outcomes, although they may be longer when assigned as
homework. Students are asked to reflect on a selected issue. Content analysis
is used to analyze results.
Regression Analysis - In statistics, regression analysis refers to techniques for modeling
and analyzing several variables, when the focus is on the relationship between
a dependent variable
and one or more independent
variables. More specifically, regression analysis helps us
understand how the typical value of the dependent variable changes when any one
of the independent variables is varied, while the other independent variables
are held fixed. Most commonly, regression analysis estimates the conditional
expectation of the dependent variable given the independent variables; that
is, the average value of the dependent variable when the independent variables
are held fixed. Less commonly, the focus is on a quantile,
or other location
parameter of the conditional distribution of the dependent variable
given the independent variables. In all cases, the estimation target is a
function of the independent variables called the regression function. In
regression analysis, it is also of interest to characterize the variation of
the dependent variable around the regression function, which can be described
by a probability
distribution.
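For the single-predictor case, the regression function reduces to fitting a line y = a + b*x by ordinary least squares; a self-contained sketch with made-up data lying exactly on y = 2 + 3x:

```python
def linear_regression(xs, ys):
    """Ordinary least squares for one independent variable: y = a + b*x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    # slope: covariance of x and y divided by the variance of x
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
    a = my - b * mx  # the fitted line passes through the point of means
    return a, b

a, b = linear_regression([1, 2, 3, 4], [5, 8, 11, 14])
print(a, b)  # 2.0 3.0
```

The fitted value a + b*x is the estimated conditional expectation of the dependent variable described above.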
Reliability - The measure of consistency for an
assessment instrument. The instrument should yield similar results over time
with similar populations in similar circumstances.
Response to Intervention (RTI) - Use of research-based instruction and interventions to
students who are at risk and who are suspected of having specific learning
disabilities.
Rigor - The academic
or cognitive challenge of a test item or task, with four levels: 1 = prior
knowledge/recall; 2 = routine task for the grade/course; 3 = reasoning task;
4 = stretch.
Rubric - Some of the definitions of rubric are
contradictory. In general a rubric is a scoring guide used in subjective assessments.
A rubric implies that a rule defining the criteria of an assessment system is
followed in evaluation. A rubric can be an explicit description of performance
characteristics corresponding to a point on a rating scale. A scoring rubric
makes explicit expected qualities of performance on a rating scale or the
definition of a single scoring point on a scale.
Salience
- A striking point or feature.
Sampling - A way to obtain information about a large
group by examining a smaller, randomly chosen selection (the sample) of group
members. If the sampling is conducted correctly, the results will be
representative of the group as a whole. Sampling may also refer to the choice
of smaller tasks or processes that will be valid for making inferences about
the student's performance in a larger domain. "Matrix sampling" asks
different groups to take small segments of a test; the results will reflect the
ability of the larger group on a complete range of tasks.
Scale - A classification tool or counting system
designed to indicate and measure the degree to which an event or behavior has
occurred.
Scale Scores - Scores based on a scale ranging from
001 to 999. Scale scores are useful in comparing performance in one subject
area across classes, schools, districts, and other large populations,
especially in monitoring change over time.
Score - A rating of performance based on a scale or
classification.
Scoring Criteria - Rules for assigning a score or the
dimensions of proficiency in performance used to describe a student's response
to a task. May include rating scales, checklists, answer keys, and other
scoring tools. In a subjective assessment situation, a rubric.
Scoring Guide - A package of guidelines intended for people
scoring performance assessments. May include instructions for raters, notes on
training raters, rating scales, and samples of student work exemplifying
various levels of performance.
Self-Assessment - A process in which a student
engages in a systematic review of a performance, usually for the purpose of
improving future performance. May involve comparison with a standard,
established criteria. May involve critiquing one's own work or may be a simple
description of the performance. Reflection, self-evaluation, metacognition, are
related terms.
Senior Project - Extensive projects planned and
carried out during the senior year of high school as the culmination of the
secondary school experience, senior projects require higher-level thinking
skills, problem-solving, and creative thinking. They are often
interdisciplinary, and may require extensive research. Projects culminate in a
presentation of the project to a panel of people, usually faculty and community
mentors, sometimes students, who evaluate the student's work at the end of the
year.
Simulations
- A competency-based measure where a person's abilities are measured in a
situation that approximates a "real world" setting. Simulation is
primarily used when it is impractical to observe a person performing a task in
a real world situation (e.g. on the job).
Speed Test - A test in which performance is measured by the number of
tasks performed in a given time. Examples are tests of typing speed and reading
speed.
Standard score - A score on a norm-referenced test that is based on the
bell curve and the equal distribution of scores around the average of the
distribution. Standard scores are especially useful because they allow for
comparisons between students and comparisons of one student over time.
Standard deviation (SD) - A measure of the variability of a distribution of
scores. The more the scores cluster around the mean, the smaller the standard
deviation. In a normal distribution, 68% of the scores fall within one standard
deviation above and one standard deviation below the mean.
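A small Python sketch of the definition (the scores are invented); with so few scores the within-one-SD fraction will not hit 68% exactly:

```python
from statistics import mean, pstdev

scores = [70, 75, 80, 85, 90]  # hypothetical score distribution
m = mean(scores)               # 80
sd = pstdev(scores)            # population standard deviation, ~7.07

# fraction of scores within one standard deviation of the mean
within = sum(1 for s in scores if m - sd <= s <= m + sd) / len(scores)
print(round(sd, 2), within)    # 7.07 0.6
```

A tighter cluster of scores around 80 would shrink `sd`; only in a true normal distribution does the one-SD band capture 68% of scores.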
Stakeholder -
Anyone who has a vested interest in the outcome of the program/project. In a
high stakes standardized test (a graduation requirement, for example), when
students' scores are aggregated and published in the paper by school, the
stakeholders include students, teachers, parents, school and district
administrators, lawmakers (including the governor), and even real estate
agents. It is always interesting to note which stakeholders seem to have the
most at risk and which stakeholders seem to have the most power; these groups
are seldom the same.
Standard - The
performance level associated with a particular rating or grade on a test. For
instance, 90% may be the standard for an A in a particular course; on a
standardized test, a cutting score or cut point is used to determine the
difference between one standard and the next.
Standards-based assessment - A
standards-based assessment assesses learner achievement in relation to set
standards.
Standardized Test - An objective test that is given
and scored in a uniform manner. Standardized tests are carefully constructed
and items are selected after trials for appropriateness and difficulty. Tests
are issued with a manual giving complete guidelines for administration and
scoring. The guidelines attempt to eliminate extraneous interference that might
influence test results. Scores are often norm-referenced.
Standards - Agreed upon values used to measure the
quality of student performance, instructional methods, curriculum, etc.
Status
report - A description of the implementation of the plan's assessment
methods, the findings (evidence) from assessment methods, how the findings were
used in decisions to maintain or improve student learning (academic programs)
or unit outcomes (support units), the results of previous changes to improve
outcomes, and the need for additional information and/or resources to implement
an approved assessment plan or gather additional evidence.
Summative assessment
- Assessment that is done at the conclusion of a course or some larger
instructional period (e.g., at the end of the program). The purpose is to
determine success or to what extent the program/project/course met its goals.
Subtest - A group of test
items that measure a specific area (e.g., math calculation or reading
comprehension). Several subtests make up a test.
Surveys - Commonly
used with open-ended and closed-ended questions. Closed-ended
questions require respondents to answer the question from a provided list of
responses. Typically, the list is a progressive scale ranging from low to high,
or strongly agree to strongly disagree.
Subjective Test - A test in which the impression or
opinion of the assessor determines the score or evaluation of performance. A
test in which the answers cannot be known or prescribed in advance.
Summative Assessment - Evaluation at the conclusion
of a unit or units of instruction or an activity or plan to determine or judge
student skills and knowledge or effectiveness of a plan or activity. Outcomes
are the culmination of a teaching/learning process for a unit, subject, or
year's study. (See Formative Assessment.)
Test
- A formal assessment of student achievement. Teacher-made tests can take many
forms; external tests are always standardized. A portfolio can be used as a test,
as can a project or exhibition.
Third party - Person(s)
other than those directly involved in the educational process (e.g., employers,
parents, consultants).
Topology - Mapping
of the relationships among subjects.
Transcript Analysis
- Transcripts are examined to see if students followed expected enrollment
patterns or to examine specific research questions, such as exploring
differences between transfer students and students who enrolled as freshmen.
Triangulate (triangulation)
- The use of a combination of assessment methods in a study. An example of
triangulation would be an assessment that incorporated surveys, interviews, and
observations.
T-Score - A standard score with a mean of 50 and a standard
deviation of 10. A T-score of 60 represents a score that is 1 standard
deviation above the mean.
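The conversion from a z-score to a T-score is a fixed linear rescaling; a minimal sketch (the function name is mine):

```python
def t_score(z):
    """T-score: standard score with mean 50 and standard deviation 10."""
    return 50 + 10 * z

print(t_score(1.0))   # 60.0 -> one standard deviation above the mean
print(t_score(-2.0))  # 30.0 -> two standard deviations below the mean
```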
Test bias - The difference in test scores that is attributable to
demographic variables (e.g., gender, ethnicity, and age).
Utility - (1)
Usefulness of assessment results.
Utility - (2) The
relative value of an outcome with respect to a set of other possible outcomes.
Hence test utility refers to an evaluation, often in cost-benefit form, of the
relative value of using a test vs. not using it, of using a test in one manner
vs. another, or of using one test vs. another test.
Validity - Validity refers to the degree to
which a study accurately reflects or assesses the specific concept that the
researcher is attempting to measure. Validity has three components:
- relevance: the option measures your educational objective as directly as
possible
- accuracy: the option measures your educational objective as precisely as
possible
- utility: the option provides formative and summative results with clear
implications for educational program evaluation and improvement
Value added - The increase in learning that
occurs during a course, program, or undergraduate education. Can either focus
on the individual student (how much better a student can write, for example, at
the end than at the beginning) or on a cohort of students (whether senior
papers demonstrate more sophisticated writing skills, in the aggregate, than
freshman papers). Requires a baseline measurement for comparison.
Variable (variability) - Observable
characteristics that vary among individuals' responses.
Voluntary System of
Accountability (VSA) - a joint accountability initiative by the
American Association of State Colleges and Universities (AASCU) and the
Association of Public and Land-grant Universities (APLU) aimed at making
institutional data transparent.
z-Score - A standard score with a mean of 0 (zero) and a standard
deviation of 1.
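A z-score is computed directly from a raw score and the distribution's mean and standard deviation; a sketch with invented numbers:

```python
def z_score(x, mean, sd):
    """Distance of raw score x from the mean, in standard-deviation units."""
    return (x - mean) / sd

# raw score of 115 on a hypothetical scale with mean 100 and SD 15
print(z_score(115, 100, 15))  # 1.0 -> one standard deviation above the mean
```

The z-score is the common starting point for the other standard scores in this glossary: multiply by 10 and add 50 for a T-score, or by 21.06 and add 50 for an NCE.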
The definitions of most terms in
this glossary were derived from several sources, including:
- Glossary of Useful Terms
Related to Authentic and Performance Assessments. Grant Wiggins
- SCASS Arts Assessment
Project Glossary of Assessment Terms
- Working Definitions for
Assessment Technology
- A True Test: Toward a More
Authentic and Equitable Assessment. Grant Wiggins. Phi Delta Kappan,
5/89. (703-713)
- The ERIC Review:
Performance-Based Assessment. Vol. 3 Issue 1, Winter, 1994.
- Assessment: How Do We Know
What They Know? ASCD. 1992.
- Assessment as an episode
of learning. Dennie Palmer Wolf
- Dissolving the Boundaries:
Assessment that Enhances Learning. Dee Dickinson
- Results-Based Education Model (R-BEM), Bruce H. Crowder