- Concept and nature of measurement and evaluation: meaning, process, purposes, problems in evaluation and measurement; principles of evaluation; characteristics: objectivity, validity, reliability and usability; formative and summative evaluation; internal and external evaluation; criterion- and norm-referenced evaluation
- Evaluation strategies: meaning, characteristics, construction of tests, administration of tests, scoring, grading vs. marks, item analysis
- Essays, short-answer questions, multiple-choice questions, true/false and completion items
- Tools of evaluation
- Rating scale, checklists, objective structured clinical examination (OSCE), objective structured practical examination (OSPE), viva examination
- Differential scales and summated scales, sociometry, anecdotal record, attitude scale, critical incident technique, question bank preparation, validation, moderation by panel.

Measurement and evaluation are distinct but closely related concepts.
Measurement can be defined as the process of assigning numbers to events based on an established set of rules. Measurement is the collection of quantitative data; a measurement is made by comparing a quantity with a standard unit. In educational measurement, the “events” under consideration are students’ test performances. In the simplest case, the numerals assigned are typically whole numbers, such as a student’s number of correct responses. Educational measurement is closely related to the concepts of testing, assessment, and evaluation. In education, the numerical value of scholastic ability, aptitude, achievement, etc., can be measured and obtained using instruments such as paper-and-pencil tests. This means that the values of the attribute are translated into numbers by measurement.
Educational measurement
is the science and practice of obtaining information about characteristics of
students, such as their knowledge, skills, abilities, and interests. It is a
specialty within the broader discipline of psychometrics. Measurement in
education includes the development of instruments or protocols for obtaining
information, procedures for analyzing and evaluating the quality of the
information gained from the use of instruments or protocols, and strategies for
communicating the resulting information to diverse audiences, such as
educators, policymakers, parents, and students.
Aims of measurement in education:
(1) Arriving at conclusions regarding students’ standing with respect to a specified educational outcome;
(2) Documenting student ability, achievement, or interests;
(3) Gauging student progress toward specified educational goals; and
(4) Improving teaching and learning.
In the evaluation process, information
is interpreted according to established standards so that decisions can be
made. Clearly, the success of evaluation depends on the quality of the data
collected. If test results are not consistent (or reliable) and truthful (or
valid), accurate evaluation is impossible.
The measurement process is the first step in evaluation; improved measurement leads to more accurate evaluation. Measurement determines the degree to which an individual possesses a defined characteristic.
Characteristics
or quality of measurement
The first important characteristic of
measurement is reliability.
The second important characteristic is validity.
The third important characteristic of a
measurement is objectivity.
Evaluation is the process of giving meaning to a measurement by judging it against some standard. The two most widely used types of standards are criterion-referenced and norm-referenced.
The criterion-referenced standard is used to determine whether a student has attained a specified level of skill.
The norm-referenced standard is used to judge an individual’s performance in relation to the performances of other members of a well-defined group. A norm-referenced standard is developed by testing a large number of individuals from a defined group.
Types of Measurement:
Generally, there are three types of measurement: (i) direct, (ii) indirect, and (iii) relative.
Direct: Finding the length and breadth of a table involves direct measurement; this is accurate provided the tool is valid.
Indirect: Finding the quantity of heat contained in a substance involves indirect measurement, for we must first find the temperature of the substance with the help of a thermometer and then calculate the heat it contains.
Relative: Measuring the intelligence of a boy involves relative measurement, for the score he obtains on an intelligence test is compared with norms. It is obvious that psychological and educational measurements are relative.
Levels and Classification of
Educational Measures
A student’s achievement may be viewed at three different levels:
1. Self-referenced: how the student is progressing with reference to himself/herself.
2. Criterion-referenced: how the student is progressing with reference to the criteria set by the teacher. Individual scores are interpreted in terms of the student’s performance relative to some standard or criterion.
3. Norm-referenced: how the student is progressing with reference to his/her peer group. Individual scores are interpreted relative to the scores of others in a well-defined norming group.
EVALUATION
Evaluation is a systematic process of determining to what extent instructional objectives have been achieved. It is a dynamic decision-making process focusing on the changes that have been made.
Definition
of evaluation
1. James M. Bradfield:
Evaluation is the assignment of symbols to phenomena in order to characterise the worth or value of a phenomenon, usually with reference to some social, cultural or scientific standards.
2. Gronlund and Linn:
Evaluation is a systematic process of collecting, analysing and interpreting information to determine the extent to which pupils are achieving instructional objectives.
3. C.E. Beeby (1977), who described evaluation as “the systematic collection and interpretation of evidence leading, as part of the process, to a judgement of value with a view to action.”
In
this definition, there are the following four key elements:
(i) Systematic collection of
evidence.
(ii) Its interpretation.
(iii) Judgement of value.
(iv) With a view to action.
This process of evaluation involves:
(i) collecting suitable data (measurement);
(ii) judging the value of these data according to some standard; and
(iii) making decisions based on the data.
The function of evaluation is to facilitate rational decisions. For the teacher, this can mean facilitating student learning; for the exercise specialist, it could mean helping someone establish scientifically sound weight-reduction goals.
Evaluation
in Education
Evaluation focuses on grades and may reflect classroom components other than course content and mastery level. Evaluation is a final review of instruction to gauge its quality. It is product-oriented: the main question is, “What has been learned?” Finally, evaluation is judgemental.
Principles of Evaluation
1. It must be clearly stated what is to be evaluated:
A teacher must be clear about the purpose of evaluation. He must formulate the instructional objectives and define them clearly in terms of students’ observable behaviour. Before selecting the achievement measures, the intended learning outcomes must be specified clearly.
2. A variety of evaluation techniques should be used for a comprehensive evaluation:
It is not possible to evaluate all aspects of achievement with the help of a single technique. For better evaluation, techniques such as objective tests, essay tests and observational techniques should be used, so that a complete picture of pupil achievement and development can be assessed.
3. An evaluator should know the limitations of different evaluation techniques:
Evaluation can be done with the help of simple observation or highly developed standardized tests. But whatever the instrument or technique may be, it has its own limitations. There may be measurement errors. Sampling error is a common factor in educational and psychological measurement: an achievement test may not include the whole course content. Error can also arise from students guessing on objective tests and from incorrect interpretation of test scores.
4. The technique of evaluation must be appropriate for the characteristics or performance to be measured:
Every evaluation technique is appropriate for some uses and inappropriate for others. Therefore, while selecting an evaluation technique, one must be well aware of the strengths and limitations of the technique.
5. Evaluation is a means to an end, not an end in itself:
Evaluation techniques are used to take decisions about the learner, not merely to gather data about him, because the blind collection of data wastes both time and effort. Evaluation must serve some useful purpose.
CHARACTERISTICS OF EVALUATION
The analysis of all the above definitions enables us to draw the following characteristics of evaluation:
1. Evaluation implies a systematic process.
2. Evaluation is a continuous process.
In an ideal situation, the teaching-learning process on the one hand and the evaluation procedure on the other go together. It is certainly a wrong belief that the evaluation procedure follows the teaching-learning process.
3. Evaluation emphasizes the broad
personality changes and major objectives of an educational programme.
Therefore, it includes not only subject-matter achievements but also attitudes,
interests and ideals, ways of thinking, work habits and personal and social
adaptability.
4. Evaluation always assumes that
educational objectives have previously been identified and defined.
5. A comprehensive programme of
evaluation involves the use of many procedures (for example,
analytico-synthetic, heuristic, experimental, lecture, etc.); a great variety
of tests (for example, essay type, objective type, etc.); and other necessary
techniques (for example, socio-metric, controlled-observation techniques,
etc.).
6. Learning is more important than teaching. Teaching has no value if it does not result in learning on the part of the pupils. Objectives and learning experiences should be so relevant that they ultimately direct the pupils towards the accomplishment of educational goals. Evaluation is the assessment of the students and of the complete development brought about through education.
7. Evaluation is the determination of the congruence between performance and objectives.
STEPS INVOLVED IN EVALUATION
(i) Identifying and Defining
General Objectives
In the evaluation process, the first step is to determine what to evaluate, i.e., to set down educational objectives. The process of identifying and defining educational objectives is a complex one; there is no simple or single procedure which suits all teachers. Some prefer to begin with the course content, some with general aims, and some with lists of objectives suggested by curriculum experts in the area. While stating the objectives, therefore, we can successfully focus our attention on the product, i.e., the pupil’s behaviour at the end of the course of study, and state it in terms of his knowledge, understanding, skill, application, attitudes, interests, appreciation, etc.
(ii) Identifying and Defining Specific Objectives:
The setting of specific objectives will provide direction to the teaching-learning process. It determines two things: first, the various types of learning situations to be provided by the class teacher to his pupils and, second, the methods to be employed to evaluate both the objectives and the learning experiences.
(iii) Selecting Teaching
Points:
The next step in the process of
evaluation is to select teaching points through which the objectives can be
realised. Once the objectives are set up, the next step is to decide the
content (curriculum, syllabus, course) to help in the realisation of
objectives.
(iv) Planning Suitable Learning Activities:
In the fourth step, the teacher has to plan the learning activities, keeping in mind both the objectives and the teaching points. The process then becomes three-dimensional, the three co-ordinates being objectives, teaching points and learning activities. The teacher gets the objectives and content ready-made.
He is completely free to select the
type of learning activities such as analytico-synthetic method;
inducto-deductive reasoning; experimental method or a demonstration method;
discovery method, lecture method; or he may ask the pupils to divide into
groups and to do a sort of group work followed by a general discussion; and so
on. One thing he has to remember is that he should select only such activities
as will make it possible for him to realise his objectives.
(v) Evaluating:
In the fifth step, the teacher
observes and measures the changes in the behaviour of his pupils through
evaluation process.
Here the teacher will construct a test making maximum use of the teaching points already introduced in the class and the learning experiences already acquired by his pupils. He may plan for an oral test or a written test; he may administer an essay-type test or an objective-type test; or he may arrange a practical test.
(vi) Using the Results as
Feedback
If the teacher, after testing his
pupils, finds that the objectives have not been realised to a great extent, he
will use the results in reconsidering the objectives and in organising the
learning activities. He will retrace his steps to find out the drawbacks in the
objectives or in the learning activities he has provided for his students. This
is known as feedback. Whatever results the teacher gets after testing his
pupils should be utilised for the betterment of the students.

Evaluation is a very important requirement for the education system. It fulfils various purposes in systems of education, such as quality control in education and selection or entrance to a higher grade or to the tertiary level.
Functions of Evaluation:
Evaluation plays a vital role in teaching-learning experiences. It is an integral part of instructional programmes. It provides information on the basis of which many educational decisions are taken.
Evaluation
has the following functions:
1. Placement Functions:
- Evaluation helps to study the entry behaviour of the children in all respects.
- It helps to undertake special instructional programmes.
- It provides for individualisation of instruction.
- It also helps to select pupils for higher studies, for different vocations and for specialised courses.
2. Instructional Functions:
- It helps in the systematic determination of a subject's merit, worth and significance, using criteria governed by a set of standards.
- Evaluation helps to build an educational programme, assess its achievements and improve its effectiveness.
- It reviews the progress in learning from time to time.
- It also provides valuable feedback on the design and implementation of the programme.
- Evaluation plays an enormous role in the teaching-learning process. It helps teachers and learners to improve teaching and learning.
- Evaluation is a continuous process and a periodic exercise.
- It helps in forming judgements of the value, educational status, or achievement of students.
- In learning, it contributes to the formulation of objectives, the designing of learning experiences and the assessment of learner performance.
- Besides this, it is very useful for bringing improvement in teaching and curriculum.
- It provides accountability to society, parents, and the education system.
- Improvement in courses/curricula, texts and teaching materials is brought about with the help of evaluation.
- It helps in selecting instructional strategies.
3. Diagnostic Functions:
- Evaluation diagnoses the weak points in the school programme as well as the weaknesses of the students.
- It suggests relevant remedial programmes.
- The aptitude, interest and intelligence of each individual child are recognised so that he may be guided in the right direction.
- It helps to adapt instruction to the different needs of the pupils.
- It evaluates the progress of weak students in terms of their capacity, ability and goals.
4. Predictive Functions:
- To discover potential abilities and aptitudes among the learners.
- To predict the future success of the children.
- It also helps the child in selecting the right electives.
5. Administrative Functions:
- To adopt better educational policy and decision making.
- It helps to classify pupils into different convenient groups.
- To promote students to the next higher class.
- To appraise supervisory practices.
- To make appropriate placements.
- To draw comparative statements on the performance of different children.
- To have sound planning.
- It helps to test the efficiency of teachers in providing suitable learning experiences.
- To mobilise public opinion and to improve public relations.
- It helps in developing comprehensive criterion tests.
6. Guidance Functions:
- It assists a person in making decisions about courses and careers.
- It enables a learner to know his pace of learning and the lapses in his learning.
- It helps a teacher to know the children in detail and to provide the necessary educational, vocational and personal guidance.
7. Motivation Functions:
- To motivate, direct, inspire and involve the students in learning.
- To reward their learning and thus motivate them towards study.
8. Development Functions:
- It gives reinforcement and feedback to teachers, students and the teaching-learning process.
- It assists in the modification and improvement of teaching strategies and learning experiences.
- It helps in the achievement of educational objectives and goals.
9. Research Functions:
- It helps to provide data for research generalisation.
- Evaluation clears up doubts for further studies and research.
- It helps to promote action research in education.
10. Communication Functions:
- To communicate the results of progress to the students.
- To intimate the results of progress to parents.
- To circulate the results of progress to other schools.
Major Differences between Evaluation and Measurement
1. While evaluation is a new concept, measurement is an old one.
2. While evaluation is a technical term, measurement is a simple word.
3. While the scope of evaluation is wide, the scope of measurement is narrow.
4. In evaluation, a pupil’s qualitative progress and behavioural changes are tested. In measurement, only the quantitative progress of the pupil can be explored.
5. In evaluation, the learning experiences provided to the pupils in accordance with predetermined teaching objectives are tested. In measurement, content, skill and achievement are not tested on the basis of objectives; the result of the testing is simply expressed in numerals, scores, averages and percentages.
6. In evaluation, qualities are measured as a whole. In measurement, qualities are measured as separate units.
7. Evaluation is the process by which previous effects and the behavioural changes they have caused are tested. Measurement means only those techniques which are used to test a particular ability of the pupil.
8. In evaluation, various techniques like observation, hierarchy, criteria, and the measurement of interests and tendencies are used for testing behavioural changes. In measurement, personality tests, intelligence tests, achievement tests, etc. are included.
9. Evaluation is the process by which the interests, attitudes, tendencies, mental abilities, ideals, behaviours and social adjustment of pupils are tested. By measurement, interests, attitudes, tendencies, ideals and behaviours cannot be tested.
10. Evaluation aims at the modification of the education system by bringing about a change in behaviour. Measurement aims at measurement only.
Types of Evaluation:
Evaluation can be classified into
different categories in many ways.
Some
important classifications are as follows:
1. FORMATIVE EVALUATION:
Formative evaluations are given at regular and frequent intervals during a course to monitor the learning progress of students during the period of instruction. This helps a teacher to ascertain pupil progress from time to time.
Its main objective is to provide
continuous feedback to both teacher and student, concerning learning successes
and failures while instruction is in process.
Feedback to students provides
reinforcement of successful learning and identifies the specific learning
errors that need correction. The pupil knows his learning progress from time to
time. This type of evaluation is an essential tool to provide feedback to the
learners for improvement of their self-learning. Thus, formative evaluation
motivates the pupils for better learning.
Feedback to the teacher provides information for improving teaching methodology and the nature of instructional materials, for modifying instruction, and for prescribing group and individual remedial work.
Thus, formative evaluation aims at the improvement of instruction. “The idea of generating information to be used for revising or improving educational practices is the core concept of formative evaluation.”
Therefore, evaluation and development must go hand in hand. Evaluation has to take place in every possible situation or activity, throughout the period of a pupil’s formal education.
The functions of formative evaluation are:
(a) Diagnosing:
Diagnosing is concerned with determining the most appropriate methods or instructional materials conducive to learning.
(b) Placement:
Placement is concerned with finding out the position of an individual in the curriculum from which he has to start learning.
(c) Monitoring:
Monitoring is concerned with keeping track of the day-to-day progress of the learners and pointing out changes necessary in the methods of teaching, instructional strategies, etc.
Characteristics of Formative
Evaluation:
The characteristics of formative evaluation are as follows:
1. It is an integral part of the learning process.
2. It occurs frequently during the course of instruction.
3. Its results are made known immediately to the learners.
4. It may sometimes take the form of teacher observation only.
5. It reinforces the learning of the students.
6. It pinpoints difficulties being faced by a weak learner.
7. Its results cannot be used for grading or placement purposes.
8. It helps in the immediate modification of instructional strategies, including methods of teaching.
9. It motivates learners, as it provides them with knowledge of the progress they have made.
10. It sees the role of evaluation as a process.
11. It is generally a teacher-made test.
12. It does not take much time to construct.
Examples:
i. Monthly tests.
ii. Class tests.
iii. Periodical assessment.
iv. Teacher’s observation, etc.
2. Diagnostic Evaluation:
Formative evaluation provides first-aid treatment for simple learning problems, whereas diagnostic evaluation searches for the underlying causes of those problems that do not respond to first-aid treatment. It is concerned with identifying the learning difficulties or weaknesses of pupils during instruction. It tries to locate the specific area of weakness of a pupil in a given course of instruction and also tries to provide remedial measures.
When the teacher finds that, in spite of the use of various alternative methods, techniques and corrective prescriptions, the child still faces learning difficulties, he takes recourse to a detailed diagnosis through specifically designed tests called ‘diagnostic tests’.
Diagnosis can also be made by employing observational techniques. In cases of necessity, the services of psychological and medical specialists can be utilized for diagnosing serious learning handicaps.
3. SUMMATIVE EVALUATION:
Summative evaluation is done at the end of a course of instruction, or at the end of a fairly long period (say, a semester), to determine the extent to which the objectives fixed previously have been accomplished. In other words, it is the evaluation of pupils’ achievement at the end of a course. Traditional examinations are generally summative evaluation tools.
The main objectives of summative evaluation are:
- to determine the degree to which the students have mastered the course content;
- to judge the appropriateness of the instructional objectives;
- to compare one course with another; and
- to make some sort of final comparison of one item or criterion against another.
Summative evaluation is generally the work of standardised tests.
The functions of this type of evaluation are:
(a) Crediting:
Crediting is concerned with collecting evidence that a learner has achieved some instructional goals in content with respect to a defined curricular programme.
(b) Certifying:
Certifying is concerned with giving evidence that the learner is able to perform a job according to previously determined standards.
(c) Promoting:
It is concerned with promoting pupils to the next higher class.
(d) Selecting:
Selecting pupils for different courses after completion of a particular course structure.
Characteristics of Summative
Evaluation:
a. It is terminal in nature, as it comes at the end of a course of instruction (or a programme).
b. It is judgemental in character, in the sense that it judges the achievement of pupils.
c. It views evaluation “as a product”, because its chief concern is to point out the levels of attainment.
d. It cannot be based on teacher observation alone.
e. It does not pinpoint the difficulties faced by the learner.
f. Its results can be used for placement or grading purposes.
g. It reinforces the learning of students who have learnt an area.
h. It may or may not motivate a learner; sometimes it may have a negative effect.
Examples:
1. Traditional school and university examinations,
2. Teacher-made tests,
3. Standardised tests,
4. Practical and oral tests, and
5. Rating scales, etc.
4. NORM-REFERENCED AND CRITERION-REFERENCED EVALUATION:
(i) Criterion-Referenced Evaluation:
Evaluation concerned with the performance of the individual in terms of what he can do is termed criterion-referenced evaluation. There is no reference to the performance of other members of the group; the individual’s performance is referred to a predetermined, well-defined criterion. The purpose of a criterion-referenced evaluation or test is to assess the objectives; it is an objective-based test. The objectives are assessed in terms of behavioural changes among the students. Such a test assesses the ability of the learner in relation to the criterion behaviour.
Examples
(i) Raman got 93 marks in a test of
Mathematics.
(ii) A typist types 60 words per
minute.
(iii) Amit’s score in a reading test
is 70.
(ii) Norm Referenced Evaluation:
A norm-referenced test is used to
ascertain an individual’s status with respect to the performance of other
individuals on that test.
Norm-referenced evaluation is the traditional class-based assignment of numerals to the attribute being measured. It means that the act of measurement relates to some norm, group or typical performance. It is an attempt to interpret the test results in terms of the performance of a certain group. This group is called a norm group because it serves as the referent or norm for making judgements. Test scores are interpreted neither in terms of the individual (self-referenced) nor in terms of a standard of performance or a predetermined acceptable level of achievement called the criterion behaviour (criterion-referenced); the measurement is made in terms of a class or some other norm group.
Almost all our classroom tests,
public examinations and standardised tests are norm-referenced as they are
interpreted in terms of a particular class and judgements are formed with
reference to the class.
Examples:
(i) Raman stood first in the Mathematics test in his class.
(ii) The typist who types 60 words per minute stands above 90 percent of the typists who appeared for the interview.
(iii) Amit surpasses 65% of the students of his class in the reading test.
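The two frames of reference can be made concrete with a small sketch. The following Python snippet is not from the source; the cutoff, group scores and score of 70 are hypothetical illustrations. It judges the same raw score first against a fixed criterion and then against a norm group:

```python
# Criterion- vs norm-referenced interpretation of one raw score.
# Cutoff and group scores are hypothetical illustrations.

def criterion_referenced(score, cutoff=75):
    """Judge a score against a fixed, predetermined criterion."""
    return "mastered" if score >= cutoff else "not yet mastered"

def norm_referenced(score, group_scores):
    """Judge a score against a norm group: the percentage of the
    group that this score surpasses."""
    below = sum(1 for s in group_scores if s < score)
    return 100 * below / len(group_scores)

group = [52, 60, 65, 70, 70, 73, 78, 82, 88, 93]  # hypothetical class scores
print(criterion_referenced(70))        # -> not yet mastered (criterion = 75)
print(norm_referenced(70, group))      # -> 30.0 (surpasses 30% of the group)
```

The same score of 70 thus carries two different meanings: it falls short of the fixed criterion, yet it places the pupil above 30 percent of the norm group.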
INTERNAL ASSESSMENT
Internal assessment is often called a “home examination”, “classroom test” or “teacher-made test”. These are assessments for which all the arrangements are made by the teachers of the same institution. The main aim is to evaluate the progress of students in different classes at different levels. Teachers themselves frame the question papers, conduct the examination, mark the answer scripts and decide on the pass or failure of the students.
Objectives of Internal Assessment:
- To evaluate the mental nourishment of students.
- To estimate the student’s educational progress, speed of achievement and ability to learn.
- On passing the internal examination, promotion is given to the next class.
- Internal assessment creates a competitive environment, which has a positive effect on educational achievement.
- Students and teacher both know the status of each student: who is leading, who is lagging, and by how much.
- The teacher evaluates his own progress and teaching methods and tries to overcome his weaknesses.
- It evaluates the particular curriculum for a particular class.
- Parents of the students are informed about their progress so that they can care for their children.
- The teacher can group students according to ability, hard work and intelligence on the basis of the results and make arrangements for the betterment of weak students.
- The results of these tests act as a motive for further study and encourage or admonish the students accordingly.
- It fulfils the objective of learning and of retaining what is learnt for a long time.
- The teacher comes to know the hidden abilities, capabilities, desires and interests of the students, and becomes able to guide them accordingly.
Types of Internal Assessment
The following are the types of internal assessment:
- Daily test
- Weekly test
- Fortnightly test
- Monthly test
- Three-monthly or terminal test
- Annual examination or annual promotion test
- Entrance test or admission test
Merits:
1. It is direct, flexible and can easily be tied to the unit of instruction.
2. It is economical in terms of time and money and can be conducted frequently.
3. There is little scope for malpractice, and the students get satisfaction (by receiving back their scripts) that they have been accurately graded.
4. It permits the use of a variety of evaluation tools, and the results can be used for the improvement of teaching-learning processes and for providing remedial teaching.
5. The student accepts it as part of the teaching-learning process and faces it without squirming or fear.
6. It provides essential data for the cumulative record, for grouping students according to their ability, and for reporting to parents as well as for making decisions with regard to annual promotion.
7. It has content validity, and scores are sufficiently reliable.
8. Cheaper: hiring an external evaluator often means paying someone with a great deal of graduate education and years of expertise, and that does not come cheap.
9. It does not require collaboration, which makes the process faster.
Demerits
1. Not every teacher is competent to construct and use these techniques of evaluation.
2. Internal assessment tends to lead to indiscreet comparison of students.
3. It is not possible to apply internal evaluation to thousands of private candidates.
4. Teachers can yield to local pressures.
5. Grades will vary from school to school and will not have uniform significance.
6. Pupils and their parents have less faith in internal evaluation.
7. Teachers, having the freedom to evaluate their own students, may tend to be lax in covering the prescribed syllabus.
8. Perceived lack of objectivity.
9. Lack of “outside the box” thinking.
EXTERNAL EVALUATION
External assessment is organized and conducted through standardized tests, observation and other techniques by an external agency other than the school.
Process of Conducting External Assessment:
a. Setting and moderation of question papers.
b. Printing and packing of question papers, given the confidential nature of the printing work.
c. Selection of examination centres.
d. Appointment of superintendents, invigilators and staff for the fair conduct of the examination at the centres.
e. Supply of stationery to the centres.
f. Distribution of question papers to examinees under the supervision of the centre superintendent.
g. Posting of police personnel at the centres.
h. Packing of answer scripts and sending them to the Board’s office or the examining body’s office.
i. Deployment of special squads for checking unfair means.
j. Assignment of fake, fictitious or secret roll numbers to answer books at the Board’s office.
k. On-the-spot evaluation at some specified centres, where the head examiner and examiners mark the scripts.
Importance & Objectives of External Assessment:
External evaluation provides:
1. Degrees/certificates.
2. A standard.
3. Comparison of abilities.
4. Evaluation of the progress of the institution.
5. Selection for higher education.
6. A means to get employment.
7. The popularity/standard of the educational institution.
8. Selection of intelligent students.
9. Competition.
10. Evaluation of teachers’ performance.
11. Evaluation of objectives and curriculum.
12. Creation of good habits in students.
13. Satisfaction and happiness of parents.
Merits
1. It is conducted by experts.
2. Perceived objectivity: having a third party do your evaluation is like a stamp of approval; people tend to take the results more seriously.
3. An outside-the-box perspective: being one step removed, evaluators can see changes that have happened that might have gone unnoticed (or at least unmeasured) by you and your team.
Demerits of External Assessment
1. Use of unfair means in the examination hall.
2. Students aim just to pass the examination or to get the degree.
3. Only part of the curriculum is covered.
4. Incomplete evaluation of personality.
5. Unreliable results.
6. Use of help books and guess papers.
7. Dependence on chance or luck.
8. Corruption.
9. Examinations without specific objectives.
10. Negative effects on the students.
11. It is time consuming.
12. Standards vary from Board to Board and university to university in the same year.
13. Marking is not up to the standard.
14. Expensive: a good evaluator does not come cheap, and you get what you pay for.
15. Requires collaboration: collaboration is excellent when done right, but it takes time and effort from both parties, and there can be miscommunication between two teams just getting to know each other.
Suggestions for Improvement
1. Comprehensive evaluation.
2. Employees of examining bodies should be controlled.
3. Proper invigilating staff.
4. Secrecy sections should be foolproof.
5. Careful appointment of examiners.
6. A change in the view of examinations: the examination should not be the objective; it should be a means to achieve objectives.
7. Reform of question papers.
8. Proper marking of answer scripts.
9. A ban on help books and guess papers.
10. Amalgamation of internal and external examinations.
11. Oral tests should be taken.
12. Amalgamation of subjective and objective type tests.
13. Maintenance of records of students.
14. Question papers should be based on the curriculum rather than on the textbook.
In spite of these flaws, both are necessary for the betterment of the education system. Internal assessment prepares the students for external assessment; therefore we cannot avoid either one. But we have to remove the negative points from both to make these systems more effective.
Characteristic of a good test
1. Reliability
“Reliability refers to the consistency of measurement—that is, how consistent test scores or other evaluation results are from one measurement to another.”
Gronlund and Linn (1995)
2. Reliability is the “worthiness
with which a measuring device measures something; the degree to which a test or
other instrument of evaluation measures consistently whatever it does in fact
measure.”
C.V. Good (1973)
The dictionary meaning of reliability is consistency, dependability or trust. In measurement, reliability is the consistency with which a test yields the same result in measuring whatever it does measure. Reliability can therefore be defined as the degree of consistency between two measurements of the same thing.
For example, suppose we administered an achievement test to Group A and found a mean score of 55. After 3 days we administered the same test to Group A again and once more found a mean score of 55. This indicates that the measuring instrument (the achievement test) is providing a stable or dependable result. On the other hand, if in the second measurement the test had produced a mean score of around 77, we would say that the test scores are not consistent.
Thus reliability answers the following questions:
How similar are the test scores if the test is administered twice?
How similar are the test scores if two equivalent forms of the test are administered?
To what extent do the scores on an essay test differ when it is scored by different teachers?
It is not always possible to obtain perfectly consistent results, because several factors such as physical health, memory, guessing, fatigue and forgetting may affect the results from one measurement to another. These extraneous variables may introduce some error into our test scores. Such errors are called measurement errors. So while determining the reliability of a test we must take into consideration the amount of error present in the measurement.
Methods
of Determining Reliability
Different types of consistency
are determined by different methods. These are as follows:
1. Consistency over a period of time.
2. Consistency over different forms
of instrument.
3. Consistency within the instrument itself.
There are four methods of determining the reliability coefficient:
(a) Test-Retest method.
(b) Equivalent forms/Parallel forms
method.
(c) Split-half method.
(d) Rational
Equivalence/Kuder-Richardson method.
(a) Test-Retest Method:
This is the simplest method of determining test reliability. In this method the test is given and then repeated on the same group, and the correlation between the first set of scores and the second set of scores is obtained. A high coefficient of correlation indicates high stability of test scores. Measures of stability in the .80s and .90s are commonly reported for standardized tests over occasions within the same year.
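As a minimal sketch of the test-retest idea (the score lists below are hypothetical), the correlation between two administrations can be computed directly in Python:

```python
# Test-retest method: correlate two administrations of the same test
# given to the same group. Scores below are hypothetical.
from statistics import correlation  # available from Python 3.10

first  = [55, 62, 47, 70, 58, 66, 51, 74]   # first administration
second = [57, 60, 49, 72, 55, 68, 50, 75]   # same test, same group, 3 days later

r = correlation(first, second)  # Pearson r; values near 1 indicate stable scores
print(round(r, 2))
```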
(b) Equivalent Forms/Parallel Forms Method:
Reliability of test scores can also be estimated by the equivalent forms method, otherwise known as the alternate forms or parallel forms method. When two equivalent forms of a test can be constructed, the correlation between the two may be taken as a measure of the self-correlation of the test. In this process, two parallel forms of the test are administered to the same group of pupils within a short interval of time, and the scores on the two tests are correlated. This correlation provides the index of equivalence. Equivalent forms are usually available for standardized psychological and achievement tests.
Both tests selected for administration should be parallel in terms of content, difficulty, format and length. When a time gap is provided between the administrations of the two forms, the correlation between the scores provides a measure of both reliability and equivalence. The major drawback of this method is obtaining two genuinely parallel forms of a test: when the forms are not exactly equal in content, difficulty and length, comparison between the scores obtained from them may lead to erroneous decisions.
(c) Split-Half Method:
In this method a single test is administered to a group of pupils in the usual manner. The test is then divided into two equivalent halves, and the correlation between the scores on these half-tests is found.
The common procedure for splitting the test is to take all the odd-numbered items (1, 3, 5, etc.) in one half and all the even-numbered items (2, 4, 6, 8, etc.) in the other half.
The scores on the two halves are then correlated, and the reliability of the full test is estimated with the Spearman-Brown formula:
reliability of full test = 2r / (1 + r),
where r is the coefficient of correlation between the two half-tests. For example, if the correlation between the halves is .70, the reliability coefficient of the full test is 2(.70) / (1 + .70) = 1.40 / 1.70 ≈ .82. This coefficient indicates the extent to which the sample of test items is a dependable sample of the content being measured—internal consistency.
“Split-half reliabilities tend to be higher than equivalent-forms reliabilities because the split-half method is based on the administration of a single test form.” This method overcomes the problems of the equivalent forms method that arise from differences between forms in attention, speed of work, effort, fatigue, test content, etc.
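A minimal sketch of the whole split-half procedure, assuming hypothetical 0/1 item scores, might look as follows in Python:

```python
# Split-half method with the Spearman-Brown step-up.
# item_matrix holds one row of 0/1 item scores per pupil (hypothetical data).
from statistics import correlation

def split_half_reliability(item_matrix):
    """Split items into odd/even halves, correlate the half scores,
    then step the correlation up to full-test length (Spearman-Brown)."""
    odd  = [sum(row[0::2]) for row in item_matrix]  # items 1, 3, 5, ...
    even = [sum(row[1::2]) for row in item_matrix]  # items 2, 4, 6, ...
    r_half = correlation(odd, even)
    return (2 * r_half) / (1 + r_half)              # Spearman-Brown formula

pupils = [
    [1, 1, 0, 1, 1, 0, 1, 1],
    [1, 0, 0, 1, 0, 0, 1, 0],
    [1, 1, 1, 1, 1, 1, 1, 1],
    [0, 0, 0, 1, 0, 0, 0, 0],
    [1, 1, 0, 0, 1, 1, 0, 1],
]
print(round(split_half_reliability(pupils), 2))  # ~0.91 for these data
```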
Factors Affecting Reliability:
The major factors which affect the reliability of test scores can be categorized under three headings:
1. Factors related to test.
2. Factors related to testee.
3. Factors related to testing
procedure.
1. Factors related to test:
(a) Length of the test:
The Spearman-Brown formula indicates that the longer the test is, the higher its reliability will be, because a longer test provides a more adequate sample of the behaviour, and the guessing factor is apt to be neutralized in a longer test.
(b) Content of the test:
Content homogeneity is also a factor which results in high reliability.
(c) Characteristics of items:
The difficulty level and clarity of expression of a test item also affect the reliability of test scores. If test items are too easy or too difficult for the group members, the test will tend to produce scores of low reliability, because both such tests have a restricted spread of scores.
(d) Spread of scores:
According to Gronlund and Linn (1995), “other things being equal, the larger the spread of scores, the higher the estimate of reliability will be.” When the spread of scores is large, there is a greater chance of an individual staying in the same relative position in the group from one testing to another. In other words, errors of measurement affect the relative position of the individual less when the spread of scores is large.
For example, suppose students in Group A have secured marks ranging from 30 to 80, and students in Group B have secured marks ranging from 65 to 75. If we administer the test a second time to Group A, the test scores of individuals could vary by several points with very little shifting in the relative positions of the group members, because the spread of scores in Group A is large.
2. Factors related to testee:
(a) Heterogeneity of the group:
When the group is homogeneous, the spread of the test scores is likely to be small; when the group tested is heterogeneous, the spread of scores is likely to be larger. Therefore, the reliability coefficient for a heterogeneous group will be higher than for a homogeneous group.
(b) Test-wiseness of the students:
Experience of test taking also affects the reliability of test scores. Practice in taking sophisticated tests increases test reliability. But when the students in a group do not all have the same level of test-wiseness, this leads to greater measurement errors.
(c) Motivation of the students:
When the students are not motivated to take the test, they will not show their best achievement. This depresses the test scores.
3. Factors related to testing
procedure:
(a) Time limit of the test:
When students get more time to take the test, they can do more guessing, which may inflate the test scores. Therefore, by speeding up a test we can increase the test reliability.
(b) Cheating opportunities given to the students:
Cheating by students during test administration leads to measurement errors, making the observed scores of cheaters higher than their true scores.
2. VALIDITY
Gronlund and Linn (1995): “Validity refers to the appropriateness of the interpretation made from test scores and other evaluation results with regard to a particular use.”
Validity means the truthfulness of a test: the extent to which the test measures what the test maker intends it to measure.
Nature of Validity:
1. Validity refers to the
appropriateness of the test results but not to the instrument itself.
2. Validity does not exist on an
all-or-none basis but it is a matter of degree.
3. Tests are not valid for all purposes; validity is specific to a particular interpretation. For example, the results of a vocabulary test may be highly valid for testing vocabulary but much less valid for testing the composition ability of the student.
4. Validity is not of different types.
It is a unitary concept. It is based on various types of evidence.
Factors
Affecting Validity:
1. Factors in the test:
(i) Unclear directions to the students on how to respond to the test.
(ii) Difficulty of the reading vocabulary and sentence structure.
(iii) Too easy or too difficult test items.
(iv) Ambiguous statements in the test items.
(v) Inappropriate test items for measuring a particular outcome.
(vi) Inadequate time provided to take the test.
(vii) Length of the test being too short.
(viii) Test items not arranged in order of difficulty.
(ix) An identifiable pattern of answers.
2. Factors in test administration and scoring:
(i) Unfair aid to individual students who ask for help.
(ii) Cheating by the pupils during testing.
(iii) Unreliable scoring of essay-type answers.
(iv) Insufficient time to complete the test.
(v) Adverse physical and psychological conditions at the time of testing.
3. Factors related to the testee:
(i) Test anxiety of the students.
(ii) Physical and psychological state of the pupil.
(iii) Response set—a consistent tendency to follow a certain pattern in responding to the items.
3. Objectivity:
Objectivity in testing is “the extent
to which the instrument is free from personal error (personal bias), that is
subjectivity on the part of the scorer”.
C.V. Good (1973)
“Objectivity of a test refers to the degree to which equally competent scorers obtain the same results. So a test is considered objective when it makes for the elimination of the scorer’s personal opinion and biased judgement. In this context there are two aspects of objectivity which should be kept in mind while constructing a test.”
Gronlund and Linn (1995)
(i) Objectivity of Scoring:
Objectivity of scoring means that the same person or different persons scoring the test at any time arrive at the same result without any chance error. The scoring procedure should leave no doubt as to whether an item is right or wrong, or partly right or partly wrong.
(ii) Objectivity of Test Items:
By item objectivity we mean that each item must admit one and only one interpretation by students; the test items should be free from ambiguity. A given test item should mean the same thing to all students, namely what the test maker intends to ask. Sentences with dual meanings and items having more than one correct answer should not be included in the test, as they make the test subjective.
4. Usability:
Usability is another important characteristic of measuring instruments, because the practical considerations of evaluation instruments cannot be neglected. The test must have practical value from the point of view of time, economy and administration. This is termed usability.
So while constructing or selecting a test the following practical aspects
must be taken into account:
(i) Ease of Administration:
The test should be easy to administer, with simple and clear directions, and the timing of the test should not be difficult to manage.
(ii) Time required for
administration:
An appropriate time limit for taking the test should be provided. Gronlund and Linn (1995) are of the opinion that “somewhere between 20 and 60 minutes of testing time for each individual score yielded by a published test is probably a fairly good guide”.
(iii) Ease of Interpretation and Application:
Other important aspects are the interpretation of test scores and the application of test results. If results are misinterpreted, they can be harmful; if they are not applied, they are useless.
(iv) Availability of Equivalent Forms:
Equivalent forms of a test help to verify questionable test scores. They also help to eliminate the factor of memory when retesting pupils on the same domain of learning. Therefore, equivalent forms of the same test, parallel in content, level of difficulty and other characteristics, should be available.
(v) Cost of Testing: It should be economical.
CONSTRUCTION OF TESTS
The four main steps in the construction of a test are:
1. Planning the Test
2. Preparing the Test
3. Trying Out the Test
4. Evaluating the Test.
Step 1. Planning the Test:
Planning of the test is the first
important step in the test construction. The main goal of evaluation process is
to collect valid, reliable and useful data about the student.
It includes
1. Determining the objectives of testing.
2. Preparing test specifications.
3. Selecting appropriate item types.
1. Determining the Objectives of Testing:
A test can be used for different purposes in the teaching-learning process, such as:
1. to measure the entry performance of the students;
2. to carry out formative evaluation;
3. to find out immediate learning difficulties and suggest remedies; and
4. to assign grades or determine the mastery level of the students.
These tests should therefore cover the whole of the instructional objectives and content areas of the course.
2. Preparing Test Specifications:
The second important step in the test
construction is to prepare the test specifications in order to be sure
that the test will measure a representative sample of the instructional
objectives and an elaborate design for test construction. One of the most commonly
used devices for this purpose is ‘Table of Specification’ or ‘Blue Print.’
Preparation of Table of
Specification/Blue Print:
Preparation of the table of specifications is the most important task in the planning stage; it acts as a guide for test construction. A table of specifications, or ‘blue print’, is a three-dimensional chart showing the list of instructional objectives, content areas and types of items in its dimensions.
It includes four major steps:
(i) Determining the weightage to different instructional
objectives.
(ii) Determining the weightage to different content
areas.
(iii) Determining the item types to be included.
(iv) Preparation of the table of specification.
(i) Determining the weightage
to different instructional objectives:
In a written test we cannot measure the psychomotor and affective domains; we can measure only the cognitive domain. It is also true that different subjects do not contain learning objectives such as knowledge, understanding, application and skill in equal proportion. Therefore, it must be planned how much weightage is to be given to each instructional objective, keeping in mind the importance of that objective for the subject or chapter.
For example, weightage may be given to the different instructional objectives in General Science for Class X. [The accompanying weightage table is not reproduced here; an illustrative blueprint is sketched at the end of this section.]
(ii) Determining the weightage
to different content areas:
The second step in preparing the table of specifications is to outline the content area; this also prevents repetition or omission of any unit. The weightage to be given to each unit should be decided by the teacher concerned, keeping in mind the importance of the chapter, the area covered by the topic in the textbook, and the number of items to be prepared.
[Table 3.2, showing the weightage given to different content areas, is not reproduced here.]
(iii) Determining
the item types:
The third important step in preparing the table of specifications is to decide on appropriate item types. Items used in test construction can broadly be divided into two types: objective-type items and essay-type items. For some instructional purposes the objective-type items are most efficient, whereas for others essay questions prove satisfactory. Appropriate item types should be selected according to the learning outcomes to be measured.
(iv) Preparing the Three-Way Chart:
Preparation of the three-way chart is the last step in preparing the table of specifications. This chart relates the instructional objectives to the content areas and the types of items: the instructional objectives are listed across the top of the table, the content areas are listed down the left side, and under each objective the types of items are listed content-wise. [Table 3.3, a model table of specifications for Class X science, is not reproduced here; an illustrative sketch follows.]
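Since the original tables are not reproduced, here is a purely hypothetical illustration of how such a blueprint can be laid out and checked in Python. The content areas, objectives and item counts below are invented for illustration only:

```python
# Hypothetical table of specifications (blueprint):
# content area -> instructional objective -> number of items.
blueprint = {
    "Light":          {"knowledge": 4, "understanding": 3, "application": 2},
    "Electricity":    {"knowledge": 3, "understanding": 4, "application": 3},
    "Life processes": {"knowledge": 5, "understanding": 3, "application": 3},
}

# Row totals (per content area) let the test maker check content weightage.
for area, cells in blueprint.items():
    print(f"{area:15s} total items: {sum(cells.values())}")

# Column totals (per objective) check the weightage of each objective.
objectives = sorted({obj for cells in blueprint.values() for obj in cells})
for obj in objectives:
    print(f"{obj:15s} total items: {sum(c.get(obj, 0) for c in blueprint.values())}")
```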
Step 2. Preparing the Test:
After planning, test items are constructed in accordance with the table of specifications. Each type of test item needs special care in construction.
The preparation stage includes
the following three functions:
(i) Preparing test items.
(ii) Preparing instruction for the test.
(iii) Preparing the scoring key.
(i) Preparing the Test Items:
Preparation of test items is the most important task
in the preparation step. Therefore care must be taken in preparing a test item.
The following principles help in preparing relevant test items.
1. Test items must be
appropriate for the learning outcome to be measured:
The test items should be so designed
that it will measure the performance described in the specific learning
outcomes.
2. Test items should
measure all types of instructional objectives and the whole content area:
The items in the test should be so prepared that they cover all the instructional objectives (knowledge, understanding, thinking skills) and match the specific learning outcomes and subject-matter content being measured. When the items are constructed on the basis of the table of specifications, they become relevant.
3. The test items should
be free from ambiguity:
The item should be clear. Inappropriate vocabulary and awkward sentence structure should be avoided, and the items should be so worded that all pupils understand the task.
Example:
Poor item — Where was Gandhi born?
Better — In which city was Gandhi born?
4. The test items should
be of appropriate difficulty level:
The test items should be of the proper difficulty level, so that they can discriminate properly. If an item is meant for a criterion-referenced test, its difficulty level should be as indicated by the statement of the specific learning outcome: if the learning task is easy, the test item must be easy, and if the learning task is difficult, the test item must be difficult.
In a norm-referenced test the main purpose is to discriminate between pupils according to achievement, so the test should be designed to produce a wide spread of test scores. The items should therefore not be so easy that everyone answers them correctly, nor so difficult that everyone fails to answer them; items of average difficulty are preferred (a minimal item-analysis sketch follows below).
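To make the notions of difficulty and discrimination concrete, here is a minimal item-analysis sketch, not from the source; the 0/1 response data are hypothetical. It computes the classical difficulty index (the proportion answering correctly) and a simple upper-minus-lower discrimination index:

```python
# Item analysis: difficulty index p and discrimination index d per item.
# responses holds one row of 0/1 item scores per pupil (hypothetical data).

def item_analysis(responses):
    ranked = sorted(responses, key=sum, reverse=True)   # rank pupils by total
    k = max(1, len(ranked) // 3)                        # upper/lower third
    upper, lower = ranked[:k], ranked[-k:]
    for i in range(len(responses[0])):
        p = sum(row[i] for row in responses) / len(responses)
        d = (sum(r[i] for r in upper) - sum(r[i] for r in lower)) / k
        print(f"item {i + 1}: difficulty p = {p:.2f}, discrimination d = {d:.2f}")

item_analysis([
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [1, 1, 1, 1],
    [0, 0, 0, 1],
    [1, 1, 0, 0],
    [0, 0, 0, 0],
])
```

Items with p near 0 or 1 spread scores poorly, which is why items of average difficulty are preferred for norm-referenced tests.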
5. The test item must be
free from technical errors and irrelevant clues:
Sometimes there are unintentional clues in the statement of an item which help the pupil to answer correctly, for example grammatical inconsistencies, verbal associations, extreme words (never, seldom, always) and mechanical features (the correct statement being longer than the incorrect ones). While constructing a test item, care must be taken to avoid such clues.
6. Test items should be free from racial, ethnic and sexual bias:
The items should be universal in nature. Care must be taken to make each item culture-fair. While portraying a role, all sections of society should be given equal importance. The terms used in a test item should have a universal meaning for all members of the group.
(ii) Preparing Instruction for the Test:
This is the most neglected aspect of test construction. Generally, everybody gives attention to the construction of the test items, and test makers often do not attach directions to them. But the validity and reliability of the test items depend to a great extent upon the instructions for the test.
N.E. Gronlund has suggested that the test maker should provide clear-cut directions about:
a. The purpose of testing.
b. The time allowed for answering.
c. The basis for answering.
d. The procedure for recording answers.
e. The methods to deal with guessing.
Direction about the Purpose of
Testing:
A written statement about the purpose of the testing maintains the uniformity of the test. Therefore, there must be a written instruction about the purpose of the test placed before the test items.
Instruction about the time allowed for answering:
Clear-cut instructions must be given to the pupils about the time allowed for the whole test. It is also better to indicate the approximate time required for answering each item, especially in the case of essay-type questions. The test maker should carefully judge the amount of time required, taking into account the types of items, the age and ability of the students, and the nature of the learning outcomes expected. Experts are of the opinion that it is better to allow more time than to deprive a slower student of the chance to answer the questions.
Instructions about basis for
answering:
The test maker should provide specific direction about the basis on which the students will answer the items. Directions must clearly state whether the students will select the answer or supply the answer. In matching items, the basis for matching the premises and responses (states with capitals, or countries with products) should be given. Special directions are necessary for interpretive items. In essay-type items, clear direction must be given about the type of response expected from the pupils.
Instruction about recording answers:
Students should be instructed where and how to record their answers. Answers may be recorded on separate answer sheets or on the test paper itself. If they have to answer on the test paper itself, they must be told whether to write the correct answer or to indicate it from among the alternatives. Where separate answer sheets are used, the direction may be given either on the test paper or on the answer sheet.
Instruction about guessing:
In the case of recognition-type test items, directions must tell students whether or not they should guess on uncertain items. If nothing is stated about guessing, bold students will guess on these items while others answer only those of which they are confident; the bold pupils will then answer some items correctly by chance and secure a higher score. Therefore a direction must be given 'to guess, but not to make wild guesses.'
(iii) Preparing the Scoring Key:
A scoring key increases the reliability of a test, so the test maker should provide the procedure for scoring the answer scripts. Directions must be given as to whether the scoring will be done with a scoring key or a scoring stencil and how marks will be awarded to the test items.
Thus a scoring key helps to obtain consistent data about the pupils' performance, so the test maker should prepare a comprehensive scoring procedure along with the test items.
Step 3. Try Out of the Test:
Try out helps us to identify defective and ambiguous
items, to determine the difficulty level of the test and to determine the
discriminating power of the items.
Try out involves two important
functions:
(a) Administration
of the test.
(b) Scoring the
test.
(a) Administration of the test:
Administration means administering the prepared test to a sample of pupils. The effectiveness of the final form of the test depends upon fair administration, which implies that the pupils must be provided a congenial physical and psychological environment during testing. Any other factor that may affect the testing procedure should be controlled.
Physical environment means proper seating arrangement, proper light and ventilation, and adequate space for invigilation. Psychological environment refers to those aspects which influence the mental condition of the pupil; therefore steps should be taken to reduce the anxiety of the students. The test should not be administered just before or after a great occasion like the annual sports or annual drama.
One should follow the following
principles during the test administration:
1. The teacher should talk as little as possible.
2. The teacher should not interrupt the students at
the time of testing.
3. The teacher should not give any hints to any
student who has asked about any item.
4. The teacher should provide proper invigilation in
order to prevent the students from cheating.
(b) Scoring the test:
Once the test is administered and the answer scripts are obtained, the next step is to score them. A scoring key may be provided for scoring when the answer is on the test paper itself. A scoring key is a sample answer script on which the correct answers are recorded.
Step 4. Evaluating the Test:
Evaluating the test is the most important step in the test construction process. Evaluation is necessary to determine the quality of the test and the quality of the responses. Quality of the test means how good and dependable the test is (validity and reliability); quality of the responses means which items are misfits in the test. Evaluation also enables us to judge the usability of the test in the general classroom situation.
Evaluating the test involves
following functions:
(a) Item analysis.
(b) Determining validity of the test.
(c) Determining reliability of the test.
(d) Determining
usability of the test.
(a) Item analysis:
Item analysis is a procedure
which helps us to find out the answers to the following questions:
a. Whether the items function as intended?
b. Whether the test items have appropriate difficulty
level?
c. Whether the item is free from irrelevant clues and
other defects?
d. Whether the distracters in multiple choice type
items are effective?
The item analysis data also
helps us:
a. To provide a basis for efficient class discussion
of the test result
b. To provide a basis for the remedial works
c. To increase skill in test construction
d. To improve class-room discussion.
Item Analysis Procedure:
Item analysis procedure gives special emphasis on item
difficulty level and item discriminating power.
The item analysis procedure
follows the following steps:
1. The test papers should be ranked from highest to
lowest.
2. Select 27% test papers from highest and 27% from
lowest end.
For example, if the test is administered to 60 students, then select 16 test papers from the highest end and 16 test papers from the lowest end.
3. Keep aside the other test papers as they are not
required in the item analysis.
4. Tabulate the number of pupils in the upper and lower groups who selected each alternative for each test item. This can be done on the back of the test paper, or a separate test item card may be used.
5. Calculate item difficulty for each item by using the formula:

Item Difficulty (P) = (R / T) × 100

where R = total number of students who got the item correct, and
T = total number of students who tried the item.

In our example, out of the 32 students from both groups, 20 students answered the item correctly and 30 students tried the item.

The item difficulty is therefore:

P = (20 / 30) × 100 = 66.67%

This implies that the item has a proper difficulty level, because it is customary to follow the 25% to 75% rule for item difficulty: an item with a difficulty above 75% is too easy, and one below 25% is too difficult.
6. Calculate item discriminating power by using the following formula:

Discriminating Power (D) = (RU - RL) / (T/2)

where RU = number of students from the upper group who got the answer correct,
RL = number of students from the lower group who got the answer correct, and
T/2 = half of the total number of pupils included in the item analysis.

In our example, 15 students from the upper group and 5 students from the lower group responded to the item correctly, so D = (15 - 5) / 16 = 0.63.
A high positive ratio indicates high discriminating power; here 0.63 indicates an average discriminating power. If all 16 students from the lower group and all 16 students from the upper group answer the item correctly, the discriminating power will be 0.00, indicating that the item has no discriminating power. If all 16 students from the upper group answer the item correctly and all the students from the lower group answer it incorrectly, the discriminating power will be 1.00, indicating an item with maximum positive discriminating power. Both calculations are illustrated in the short sketch below.
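To make these two computations concrete, here is a minimal Python sketch of the difficulty and discrimination formulas; the function names are ours, and the numbers are those of the worked example (20 correct of 30 attempts; upper and lower groups of 16 with 15 and 5 correct).

```python
def item_difficulty(correct, tried):
    # P = (R / T) * 100: percentage of attempting pupils who got the item right.
    return (correct / tried) * 100

def discriminating_power(upper_correct, lower_correct, group_size):
    # D = (RU - RL) / (T/2), where group_size is one 27% group (T/2).
    return (upper_correct - lower_correct) / group_size

p = item_difficulty(20, 30)           # 66.67 -> within the 25%-75% band
d = discriminating_power(15, 5, 16)   # 0.625, reported as .63 in the text
print(f"P = {p:.2f}%, acceptable: {25 <= p <= 75}")
print(f"D = {d:.3f}")
```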
Preparing a test item file:
Once the item analysis process is over, we get a list of effective items. The next task is to make a file of these items, which can be done with item analysis cards. The items should be arranged in order of difficulty. While filing the items, the objectives and the content area each one measures must be kept in mind; this helps in the future use of the item.
(b) Determining Validity of the Test:
At the time of evaluation, it is estimated to what extent the test measures what the test maker intends it to measure.
(c) Determining Reliability of the Test:
The evaluation process also estimates to what extent a test is consistent from one measurement to another; otherwise the results of the test cannot be dependable.
(d) Determining the Usability of the Test:
The try-out and evaluation process indicates to what extent a test is usable in the general classroom condition, that is, how far a test is usable from the administration, scoring, time and economic points of view.
CONTINUOUS
AND COMPREHENSIVE EVALUATION,
CCE (concept, need and relevance)
The Continuous and Comprehensive Evaluation (CCE) system was introduced by the Central Board of Secondary Education (CBSE) in India for students of the sixth to tenth grades. In this scheme the term 'continuous' means regularity of assessment, frequency of unit testing, diagnosis of learning gaps, use of corrective measures, retesting, and feedback to students for their self-evaluation.
The
second term `comprehensive’ means that the scheme attempts to cover both the
scholastic and the co scholastic aspects of students’ growth and development.
The
main aim is to evaluate every aspect of the child during their presence at the
school.
1. It
assesses all aspects of a student’s development on a continuous basis
throughout the year.
2. The
assessment covers both scholastic subjects as well as co-scholastic areas such
as performance in sports, art, music, dance, drama, and other cultural
activities and social qualities.
3. It is a developmental process of assessment which emphasizes two-fold objectives: continuity in evaluation on the one hand, and assessment of broad-based learning and behavioural outcomes on the other.
4. It is non-threatening for all children, including those with special needs, as it discourages irrational comparison and labelling and removes the fear of examinations.
5. It
brings a change to the usual chalk and talk method.
6. It
finds out the learning difficulties and can give remedial measures.
7. It
brings flexibility to plan academic schedules.
8. It
reduces workload on students and improves overall skill and ability of the
student by means of evaluation of other activities.
9. In this scheme the marks of the students are replaced by grades. Grades are awarded on the basis of work experience, skills, dexterity, innovation, steadiness, teamwork, public speaking, behaviour, etc., to evaluate and present an overall measure of the student's ability. This helps students who are not strong in academics to show their talent in other fields such as arts, humanities, sports, music and athletics.
10. Assessment is done through projects, assignments, practicals, seminars, records and collections, which are graded on the basis of specific grading indicators.
11. Co-scholastic abilities are also considered, in terms of work experience, art education, and health and physical education.
12. It makes children and parents active participants in the learning and development of children.
13. Opportunities for self-assessment and peer assessment enable children to take charge of their learning and gradually progress towards self-learning.
14. Sharing of their learning progress with timely feedback during teaching-learning, and constructive suggestions during quarterly Parent-Teacher Meetings (PTMs), makes them aware of the extent of their accomplishment and prepares them for the further efforts required.
15. Rational
division of the syllabus to be covered in each quarter may be planned in
advance for the yearly academic calendars.
16. Teachers' suggestions and participation in the development of such plans need to be ensured. If possible, such planning may be done at the school level.
17. Resources and activities may only be suggestive, and teachers need to be given freedom to choose or devise new learning aids or strategies.
18. Assessment
questions, exercises, assignments need to be process based and allow children
to think critically and explore.
19. They
should not assess rote memory of children.
20. Written tests, if evaluated using marks or grades, need to be supported with qualitative descriptions, as marks or grades can help decide the learning level but remarks highlight the gaps and give suggestions for improvement.
21. The
levels assigned for different learning outcomes under different curricular
areas provide useful information to the teachers on how many children are
lagging behind on the specific learning outcome(s).
Hence, the data from the quarterly progress reports provide insights not just to students but also to teachers on how to review their teaching and learning and take steps (assessment for learning) for the next quarter.
Types of Grading Systems
There are seven types of grading systems. They are:
1. Percentage
Grading – From 0 to 100 Percent
2. Letter
grading and variations – From A Grade to F Grade
3. Norm-referenced
grading – Comparing students to each other usually letter grades
4. Mastery
grading – Grading students as “masters” or “passers” when their attainment
reaches a pre specified level
5. Pass/Fail
– Using the Common Scale as Pass/Fail
6. Standards
(or Absolute-Standards) grading – Comparing student performance to a pre
established standard (level) of performance
7. Narrative
grading -Writing Comments about students
1. Grading System in India

Percentage | Grade Point | Grade | Classification/Division
60–100 | 3.5–4.0 | A or (O) | First Class/Distinction/Outstanding
55–59 | 3.15–3.49 | B+ | Second Class
50–54 | 2.5–3.14 | B | Second Class
43–49 | 2.15–2.49 | C+ | Third Division
35*–42 | 1.5–2.14 | C | Fail/Third Division
0–34 | 0–1.49 | F | Fail
Grading
Grading in education is the process of applying standardized measurements of varying levels of achievement in a course. A grading system gives a verbal description and symbols to the achievement, rather than scoring it numerically as in the traditional marking scheme.
Grades can be assigned
as letters (for example A through F), as a range (for example 1 to 6), as a
percentage of a total number of questions answered correctly, or as a number
out of a possible total (for example out of 20 or 100). A grading system is a method used by teachers to assess students' educational performance. In early times, a simple marking procedure was used by educators, but now a proper grading system is followed by every educational institute. Grades such as A+, A, A-, B+, B, B-, C, D, E and so on are used to evaluate the performance of a student in a test, presentation or final examination. Each grade corresponds to a range of percentages or marks.
Advantages of
Grading System in Education:
1. Takes
the pressure off from the students at certain levels:
In a general grading system as considered above, a student's exact scores and their associated marks are not recorded on the official transcript, which means their GPA will not affect either a pass or a fail category. This spares the students from becoming preoccupied and fussy about getting an elevated letter grade.
2.
Grading Pattern description:
Students are bundled and grouped according to the different
types of grading scales they get which are entirely based on the marks that
they get in each subject that is taught in school.
In the case of India the general pattern is as follows:
A1: 91 to 100
A2: 81 to 90
B1: 71 to 80
B2: 61 to 70
C1: 51 to 60
C2: 41 to 50
D: 33 to 40, and E for lower scores.
Another advantage of this method is that it has introduced the notion of measuring students' knowledge based on their internal assignments, projects, answering ability in class and overall performance in all the major examinations; it is not a method forced on a single, solitary examination. Earlier, the marks obtained in the exams were the only indicator of whether a child was studying or not, but this system analyzes whether a child understands the concept or not.
3. Gives
the students an obvious idea about their weaknesses and strengths:
Knowing precisely which subject(s) are their weak spots, students can easily decide where to direct their focus. In a grading system where letters are the scale, a grade of C or a grade of D speaks volumes. So when the total grades arrive, these students can easily identify their forte.
4. Make
class work easier:
Students do not need to toil to achieve more than the necessary minimum.
5. Leads
to better ideas:
Classes or courses taught within the confined premises of a school can be highly difficult and are ultimately judged as a pass or a fail on a subject; this builds in students a sense of responsibility to work and train hard on their weak spots.
Disadvantages of Grading System in Education:
The following points are worth considering while examining the disadvantages of the grading system in education:
1. It
doesn’t instill a sense of competition:
When all that is required is a mere pass mark, students have neither the urge to outperform others nor the desire to excel in the overall grades. An A grade says much more about one's calibre than a D or an F; with a D or an F, one can be satisfied with being merely okay in studies, which encourages laziness.
2. Not
an accurate representation of the performance and the knowledge gained:
As we have said already, passing an examination is not plausible evidence that the student has gained an immense amount of knowledge from it. A letter cannot convey the inner knowledge gained by a student, and there is no easy way of gauging a student's level of performance and knowledge from such examinations.
3. It is
not an exact scoring system:
The inner knowledge gained behind these grades can be nil, since students may have attempted to learn without understanding the concept, with the sole aim of getting an A or a C.
4. Demotivation: Grading system demotivates the students who perform
higher because they stand equal to those making less efforts. For instance,
grade A will be assigned to all those scoring from 90 to 100. So students who
made no mistakes and those who made a few, all will stand equally at one grade.
5. Increased Lethargy: As the grading system divides the marks among different tasks such as assignments, presentations and final exams, students can become lethargic: they score enough in assignments and projects and become less active in the final exams.
What is GPA
and CGPA?
Grade Point Average
GPA is an abbreviation for Grade Point Average. It is a standard method of calculating a student's average grade over a stipulated period, such as one term or semester, and is used in school and in undergraduate, graduate and postgraduate courses in most universities. GPA is calculated by dividing the total grade points a student earns by the total credit hours the student attended.
GPA, or Grade Point Average,
is a number that indicates how well or how high you scored in your
courses on average. This number is then used to assess whether you meet
the standards and expectations set by the degree programme or university.
A cumulative grade point average (CGPA) is the average of all of a student's earned grade points divided by the total credit hours attended, calculated over his or her complete educational career.
CGPA refers to ‘Cumulative Grade Point Average’. It is used
to denote a student’s overall average performance throughout their academic
program in high school, Bachelor’s, or Master’s program. To start off
with, credit hours are the total amount of time a student spends in classes.
Grade points are the marks you receive for your subjects.
TO CALCULATE CGPA
Divide your total grade points for all subjects throughout your semesters by the total number of credit hours attended throughout your semesters. GPA and CGPA are indicated by a number, as opposed to percentages, and grades are assigned under the Indian grading system accordingly.
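As an illustration only, here is a minimal Python sketch of this credit-weighted calculation; the grade-point values and credit hours are invented for the example.

```python
# Hypothetical courses: (grade points earned, credit hours attended)
courses = [(4.0, 3), (3.5, 4), (3.0, 2)]

total_points  = sum(gp * credits for gp, credits in courses)   # 32.0
total_credits = sum(credits for _, credits in courses)         # 9

gpa = total_points / total_credits
print(f"GPA = {gpa:.2f}")  # 3.56
```

Applying the same formula across all the courses of every semester yields the CGPA.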
DIRECT GRADING
Performance is assessed in qualitative terms. The evaluator gives grades such as A, B, C, D, E, F according to the standards, without assigning scores. Direct grading is preferred for non-cognitive learning outcomes.
ADVANTAGES
OF DIRECT GRADING
Simplifies the process of
assessment
Makes a raw assessment on a
raw scale
Uses a uniform scale for the
assessment of quality
Separates assessment of
quality and range
INDIRECT GRADING
The evaluator gives grades through marks, converting marks to grades. This is of two types:
Absolute grading
Relative grading
ABSOLUTE GRADING
Absolute grading is based on a predetermined standard which becomes the reference point for assessing students' performance. Marks are converted directly into grades, irrespective of the distribution of marks.
For example, a common absolute grading scale would be
A = 90-100
B = 80-89
C = 70-79
D = 60-69
F = 0-59
Whatever score the student earns is their grade.
There are no adjustments made to their grade. For example, if everyone gets a score between 90 and 100, everyone gets an "A"; if everyone scores below 59, everyone gets an "F." The absolute nature of absolute grading makes it inflexible and constraining in unique situations.
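A minimal Python sketch of this direct conversion, using exactly the cut-offs listed above.

```python
def absolute_grade(score):
    """Convert a 0-100 score to a letter grade using the fixed scale above."""
    if score >= 90:
        return "A"
    if score >= 80:
        return "B"
    if score >= 70:
        return "C"
    if score >= 60:
        return "D"
    return "F"

print([absolute_grade(s) for s in (95, 82, 61, 40)])  # ['A', 'B', 'D', 'F']
```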
RELATIVE GRADING
Range varies in tune with the relative position of the group.
The evaluation is done according to the performance of members. Relative
grading allows for the teacher to interpret the results of an assessment and
determine grades based on student performance.
A = Top 10% of students
B = Next 25% of students
C = Middle 30% of students
D = Next 25% of students
F = Bottom 10% of students
As such, even if the entire class scored between 90 and 100% on an exam, relative grading would still create a distribution that is balanced.
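A minimal Python sketch of this percentile-based assignment, using the 10/25/30/25/10 split above; how ties at a boundary are broken is a design choice, and here they simply follow rank order.

```python
def relative_grades(scores):
    """Assign letters by rank: top 10% A, next 25% B, middle 30% C,
    next 25% D, bottom 10% F."""
    n = len(scores)
    ranked = sorted(range(n), key=lambda i: scores[i], reverse=True)
    cuts = [(0.10, "A"), (0.35, "B"), (0.65, "C"), (0.90, "D"), (1.00, "F")]
    grades = [None] * n
    for rank, idx in enumerate(ranked):
        frac = (rank + 1) / n          # fraction of the class at or above this rank
        for cut, letter in cuts:
            if frac <= cut:
                grades[idx] = letter
                break
    return grades

# For this already-descending list of ten scores:
# ['A', 'B', 'B', 'C', 'C', 'C', 'D', 'D', 'D', 'F']
print(relative_grades([98, 95, 93, 91, 90, 90, 89, 88, 87, 85]))
```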
WEIGHTED AVERAGES
We can calculate the arithmetic mean or elementary average
of the measurements by summing them and dividing by the number of measurements.
However, in certain situations, some measurements count more than others, and
to get a meaningful average, we have to assign weight to the measurements. The
usual way to do this is to multiply each measurement by a factor that indicates
its weight, then sum the new values, and divide by the number of weight units
we assigned.
Mathematically
When calculating an arithmetic average, first sum all the measurements (m) and divide by the number of measurements (n):

∑(m1 ... mn) ÷ n

where the symbol ∑ means "sum all the measurements from 1 to n."

To calculate a weighted mean, multiply each measurement by its weighting factor (w), sum the products, and divide by the sum of the weights. In most cases, the weighting factors add up to 1 or, if you are using percentages, to 100 percent. If they don't add up to 1, use this formula:

∑(m1w1 ... mnwn) ÷ ∑(w1 ... wn), or simply ∑mw ÷ ∑w
Weighted Averages in the Classroom
Teachers typically use weighted averages to assign
appropriate importance to classwork, homework, quizzes and exams when
calculating final grades.
For example, in a
certain physics class, the following weights may be assigned:
- Lab work: 20 percent
- Homework: 20 percent
- Quizzes: 20 percent
- Final Exam: 40 percent
In this case, all the weights add up to 100 percent, so a
student's score can be calculated as follows:
[(Lab work score) * 0.2 + (homework) * 0.2 + (quizzes) *
0.2 + (final exam) * 0.4]
If a student's grades were 75 percent for lab work, 80
percent for homework, 70 percent for quizzes and 75 percent for the final exam,
her final grade would be:
= (75) * 0.2 + (80) * 0.2 + (70) * 0.2 + (75) * 0.4
= 15 + 16 + 14 + 30 = 75 percent.
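The same arithmetic as a small Python sketch, using the physics-class weights and the example scores above.

```python
weights = {"lab": 0.2, "homework": 0.2, "quizzes": 0.2, "final": 0.4}
scores  = {"lab": 75,  "homework": 80,  "quizzes": 70,  "final": 75}

# Weighted average: multiply each score by its weight and sum.
final_grade = sum(weights[k] * scores[k] for k in weights)
print(final_grade)  # 15 + 16 + 14 + 30 = 75.0
```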
WEIGHTED SCORE
A weighted score or weighted grade is merely the average of a set of grades, where each category carries a different amount of importance.
Suppose your final grade will be determined in this manner:
Percentage of your Grade By Category
- Homework:
10%
- Quizzes:
20%
- Essays:
20%
- Midterm:
25%
- Final:
25%
Eg 1
Category Averages:
- Homework
average: 98%
- Quiz
average: 84%
- Essay
average: 91%
- Midterm:
64%
- Final: ?
To work out the math and determine what kind of studying effort is needed, we follow a three-part process:
Step 1:
Set
up an equation with goal percentage
(80%) in mind:
H%*(H
average) + Q%*(Q average) + E%*(E average) + M%*(M average) + F%*(F average) =
80%
Step 2:
Next,
we multiply the percentage of grade by
the average in each category:
- Homework: 10% of grade * 98% in
category = (.10)(.98) = 0.098
- Quiz average: 20% of grade *
84% in category = (.20)(.84) = 0.168
- Essay average: 20% of grade *
91% in category = (.20)(.91) = 0.182
- Midterm: 25% of grade * 64% in
category = (.25)(.64) = 0.16
- Final: 25% of grade * X in
category = (.25)(x) = ?
Step 3:
Finally
we, add them up and solve for x:
0.098 + 0.168 + 0.182 + 0.16 + .25x = .80
0.608 + .25x = .80
.25x = .80 – 0.608
.25x = .192
x = .192/.25
x = .768
x = 77%
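The same solve-for-x step as a small Python sketch, using the category weights and averages of the worked example.

```python
weights  = {"homework": 0.10, "quizzes": 0.20, "essays": 0.20,
            "midterm": 0.25, "final": 0.25}
averages = {"homework": 0.98, "quizzes": 0.84, "essays": 0.91, "midterm": 0.64}

goal = 0.80
# Sum the contributions of the known categories, then solve for the final.
known = sum(weights[k] * averages[k] for k in averages)        # 0.608
needed_final = (goal - known) / weights["final"]               # 0.768
print(f"Needed on the final: {needed_final:.1%}")              # 76.8%
```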
The teacher can thus use weighted scores to set the target for the final exam.
Marks:
- The marks system measures the intelligence of students based on marks and ranks.
- Students in this system aim to get only one thing: good scores in each subject.
- Students in the system are always encouraged to outperform each other, which may not give fruitful results.
- Many times it puts students as well as parents under a lot of pressure, for different reasons such as a low academic score or having to redo a mistaken task.
- Note that the passion to outperform makes students ready to take pressure in higher classes; in the marks system, however, the pressure on students only increases.
Grades:
- One of the best things about the grade system is that it measures students' intelligence based on their performance instead of marks and ranks.
- The grade system aims to encourage the overall development of students rather than academics alone, including personality development, social development, etc.
- There is very little competitiveness among students; instead, students are encouraged to focus on their own aims and success.
- The system relieves students and parents from unnecessary pressure; it sets them free so that students can achieve their aims and parents can learn what their kids love to do in their lives. In the long run, school life becomes easier for everyone.
- On the downside, it can make students face difficulty in coping with the pressure during higher studies.
Non-standardized assessment looks at an individual's performance and does not allow us to compare that performance to another's. It allows us to obtain specific information about that particular student.
Forms of Non-Standardized Testing
Forms
include portfolios, interviews, informal questioning, group discussions, oral
tests, quick pop quizzes, and exhibitions of work, projects and performance
exams.
ESSAY
An essay is generally a short piece
of writing outlining the writer’s perspective or story. It is often considered
synonymous with a story or a paper or an article. Essays can be both formal as
well as informal. Formal essays are generally academic in nature and tackle
serious topics.
Types of Essays
There are broadly four types of
essays.
1.
Narrative Essays: This is when the writer is narrating
an incident or story through the essay. So these are in the first person. The
aim when writing narrative essays is to involve the reader as if they were right there when it was happening, so make them as vivid and real as possible and draw the reader into the story.
2.
Descriptive Essays: Here the writer will describe a place, an
object, an event or maybe even a memory. But it is not just plainly describing
things. The writer must paint a picture through his words. One clever way to do
that is to evoke the senses of the reader. Do not only rely on sight but also
involve the other senses of smell, touch, sound etc. A descriptive essay when
done well will make the reader feel the emotions the writer was feeling at the
moment.
3.
Expository Essays: In such an essay a writer presents a
balanced study of a topic. To write such an essay, the writer must have real
and extensive knowledge about the subject. There is no scope for the writer’s
feelings or emotions in an expository essay. It is completely based on facts,
statistics, examples etc. There are sub-types here like contrast essays, cause
and effect essays etc.
4.
Persuasive Essays: Here the purpose of the essay is to get
the reader to your side of the argument. A persuasive essay is not just a
presentation of facts but an attempt to convince the reader of the writer’s
point of view. Both sides of the argument have to be presented in these essays.
But the ultimate aim is to persuade the readers that the writer’s argument
carries more weight.
Format of an Essay
Now there is no rigid format of an
essay. It is a creative process so it should not be confined within boundaries.
However, a basic structure followed is
Introduction
The writer introduces the topic for the very first time, in about 4-6 lines that give a very brief synopsis of the essay. You can start with a quote, a proverb, a definition or a question.
Body
The body is between the introduction
and the conclusion. So the most vital and important content of the essay will
be here. It can extend to two or more paragraphs according to the content. It
is important to organize your thoughts and content. Write the information in a
systematic flow so that the reader can comprehend. So, for example, you were
narrating an incident. The best manner to do this would be to go in a
chronological order.
Conclusion
This is the last paragraph of the
essay. Sometimes a conclusion will just mirror the introductory paragraph but
make sure the words and syntax are different. A conclusion is also a great
place to sum up a story or an argument. You can round up your essay by
providing some moral or wrapping up a story. Make sure you complete your essays
with the conclusion, leave no hanging threads.
SHORT ANSWER TYPE
Short-answer questions are open-ended questions
used in examinations to assess the basic knowledge and understanding (low
cognitive levels) of a topic before more in-depth assessment questions are
asked on the topic.
Short Answer Questions
do not have a generic structure. Questions may require answers such as complete
the sentence, supply the missing word, short descriptive or qualitative
answers, diagrams with explanations etc. The answer is usually short, from one
word to a few lines. Often students may answer in bullet form.
Example
1.
MHz measures the _________________ of the computer.
2.
List the different types of plastic surgery procedures.
Advantages:
· Short Answer Questions are relatively fast to mark.
· They are also relatively easy to set.
· Short Answer Questions can be used as part of a formative and summative assessment.
· Unlike MCQs, there is no guessing on answers.
Limitations:
- Answers are restricted to short responses.
- The assessor must be very clear on the type of answers expected.
- Students are not free to answer any way they choose.
- Short-answer questions can lead to difficulties in grading if the question is not worded carefully.
- Short Answer Questions are typically used for assessing knowledge only.
- Students may often memorize Short Answer Questions by rote learning.
- Accuracy of assessment may be influenced by handwriting/spelling skills.
- There can be time-management issues when answering Short Answer Questions.
Guidelines for constructing Short Answer Questions:
- Design questions around the learning objectives.
- Make sure the content of the short answer question measures knowledge appropriate to the desired learning goal.
- Express the questions in clear wording and language appropriate to the students.
- Ensure there is only one clearly correct answer to each question.
- Ensure that the item clearly specifies how the question should be answered.
- Consider whether the positioning of the item blank promotes efficient scoring.
- Write the instructions clearly so as to specify the desired knowledge and specificity of response.
- Set the questions explicitly and precisely.
- Direct questions are better than those which require completing a sentence.
- For numerical answers, let the students know if they will receive marks for showing partial work (process based) or only the results (product based); also indicate the importance of units.
- Let the
students know what your marking style is like, is bullet point format
acceptable, or does it have to be an essay format?
- Prepare a
structured marking sheet; allocate marks or part-marks for acceptable
answer(s).
- Be prepared
to accept other equally acceptable answers, some of which you may not have
predicted.
True/False Test Taking Strategies
The following strategies will
enhance your ability to answer true/false questions correctly:
1.
Approach each
statement as if it were true.
Approach each statement as if it were true and then determine if any part of the statement is false. Just one false part in a statement will make the entire statement false.
2.
For a sentence to
be true, every part must be "true".
At first glance, a sentence may appear to be true because it contains facts and statements that are true. However, if just one part of the sentence is false, then the entire sentence is false. A sentence may be mostly true because it contains correct information, but it is ultimately false if it contains any incorrect information.
3.
Pay attention for
"qualifiers".
Qualifier words such as sometimes, seldom, few, always, every, often, frequently, never, generally, and ordinarily restrict or open up the possibilities of making accurate statements. More modest qualifiers, such as "sometimes, often, many, few, generally, etc.", are more likely to reflect a true statement, sentence, or answer. Stricter qualifiers, such as "always" or "never", often reflect a false statement, sentence, or answer.
4.
Don't let
"negatives" confuse you.
Negatives, such as "no, not, cannot", can be confusing within the context of a true/false sentence or statement. If a true/false sentence contains a negative, drop the negative word and then read what remains. Without the negative, determine whether the sentence is true or false. If the sentence (without the negative) is true, then the correct answer would be "false".
5.
Watch for
statements with double negatives.
Statements with two negative words are positive. For example, "It is unlikely the car will not win the race." is the same as "It is likely the car will win the race." Negative words include not and cannot, along with words beginning with the prefixes dis-, il-, im-, in-, ir-, non-, and un-.
6.
Pay attention for
"absolute" qualifiers.
As we already discussed, qualifiers open up or restrict the possibilities of a statement being true or false. Absolute qualifiers, such as all, always, never, entirely, completely, best, worst, none, and absolutely, which do not allow for exceptions, imply that the statement must be true 100% of the time. In most cases, statements that contain absolute qualifiers are false.
7.
Thoroughly examine
long sentences and statements.
Long sentences often contain groups of words and phrases separated or organized by punctuation. Read each word set and phrase individually and carefully. If one word set or phrase in the statement is false (even if the rest are true) then the entire statement is false and the answer is "false".
8.
Make an educated
guess.
If it will not negatively impact your score, and you're unsure of the answer, make an educated guess. You have a 1 in 2 chance of being right. However, true/false tests often contain more true answers than false answers, so if you're completely unsure, guess "true".
9.
Longer statements
may be false.
The longer a true/false statement, the greater the likelihood the statement will be false, since there is more chance that one part of it will be false.
10. Reason statements tend to be false.
Questions that state a reason tend to be false. Words including "because, reason, since, etc." often indicate a "reason" statement.
11. Budget your time.
Before tackling even one true/false question, take a look at the entire test to see how many questions there are. If the test has 60 true/false questions and you have a 1-hour time limit, then you should spend no more than 1 minute on each question. While some questions will require more time than others, remember, you can't spend a lot of time on any one question.
RATING SCALE
Definition
Rating scale is defined as a closed-ended survey question to rate an
attribute or feature. Rating scale is a variant of the popular multiple-choice question which is widely
used to gather information that provides relative information about a specific
topic.
Types of Rating Scale: Ordinal and Interval Scales.
An ordinal scale is a scale that depicts the answer options in an ordered manner.
An interval scale is a scale where not only is the order of
the answer variables established but the magnitude of difference between each
answer variable is also calculable. Absolute or true zero value is not present
in an interval scale. Temperature in Celsius or Fahrenheit is the most popular
example of an interval scale. Net Promoter Score, Likert Scale, Bipolar Matrix Table are some of
the most effective types of interval scale.
There are four primary types of rating scales which can be
suitably used in an online survey:
·
Graphic Rating Scale
·
Numerical Rating Scale
·
Descriptive Rating Scale
·
Comparative Rating Scale
1.
Graphic Rating Scale: Graphic rating scale indicates the answer options on a scale
of 1-3, 1-5, etc. Respondents can select a particular option on a line or
scale to depict rating. Likert Scale is a popular graphic rating
scale example.
2.
Numerical Rating Scale: A numerical rating scale has numbers as answer options.
3.
Descriptive Rating
Scale: In a descriptive
rating scale, each answer option is elaborately explained for the respondents.
For example, a customer satisfaction survey may need to describe all the answer options in detail.
4.
Comparative
Rating Scale: Comparative rating scale, as
the name suggests, expects respondents to answer a particular question in terms
of comparison, i.e. on the basis of relative measurement or keeping other
organizations/products/features as a reference.
Uses of Rating Scale
- Gain relative information about a particular subject
- Compare and analyze data
- Measure one important product/service element
Advantages of rating scale
- Rating scale questions are easy to understand
and implement.
- Offers a comparative analysis of quantitative data
- Using graphic rating scales, it is easy for
researchers to create surveys
- Abundant information can be collected and
analyzed using a rating scale.
- The analysis of answers is quick and less time-consuming.
- Rating scale is a standard for collecting qualitative and quantitative
information
WHAT IS AN ANECDOTAL RECORD?
An anecdotal record (or anecdote) is like a short story
that educators use to record a significant incident that they have observed.
Anecdotal records are usually relatively short and may contain descriptions of
behaviours and direct quotes.
Why use anecdotal records?
Anecdotal records are easy to use and quick to write, so they are the most popular form of record that educators use. Anecdotal records allow educators to record qualitative information, like details about a child's specific behaviour or the conversation between two children. These details can help educators plan activities, experiences and interventions, and they can be written during a break or at the end of the day.
The Critical Incident Technique (or CIT)
is a set of procedures used for collecting, by direct observation, facts about human behaviour that have critical significance and meet methodically defined criteria. These observations are then recorded as incidents, which are used to solve practical problems, develop broad psychological principles, and work out how to improve the performance of the individuals involved.
The investigator may focus on a particular incident or set
of incidents which caused serious loss. Critical events are recorded and stored
in a database or on a spreadsheet. Analysis may show how clusters of
difficulties are related to a certain aspect of the system or human practice.
Investigators then develop possible explanations for the source of the
difficulty.
The method generates a list of good and bad behaviors which
can then be used for performance appraisal.
CIT is a flexible method that usually relies on five major
areas.
1.
The first is determining
and reviewing the incident
2.
Then fact-finding,
which involves collecting the details of the incident from the participants.
3.
The next step is to
identify the issues.
4.
Afterwards a decision
can be made on how to resolve the issues based on various possible solutions.
5.
The final and most
important aspect is the evaluation, which will determine if the solution that
was selected will solve the root cause of the situation and will cause no
further problems.
SOCIOMETRY
Sociometry is the inquiry into the evolution and
organization of groups and the position of individuals within them. Sociometric
explorations reveal the hidden structures that give a group its form: the
alliances, the subgroups, the hidden beliefs, the forbidden agendas, the
ideological agreements, the ‘stars’ of the show.
One of Moreno's innovations in sociometry was the
development of the sociogram, a systematic method for
graphically representing individuals as points/nodes and the relationships between
them as lines/arcs.
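As an illustration only, here is a minimal Python sketch of tallying sociometric choices to find the 'stars' of a group; the names and nomination data are invented for the example.

```python
from collections import Counter

# Hypothetical peer nominations: each child names the peers they choose.
choices = {
    "Asha":  ["Ravi", "Meena"],
    "Ravi":  ["Meena"],
    "Meena": ["Asha", "Ravi"],
    "Sunil": ["Meena"],
}

# Count incoming nominations; high counts mark the sociometric 'stars'.
received = Counter(name for picks in choices.values() for name in picks)
print(received.most_common())  # [('Meena', 3), ('Ravi', 2), ('Asha', 1)]
```

The same choice data, drawn as points (children) and arrows (nominations), is what a sociogram represents graphically.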
The Objective Structured Clinical Examination (OSCE) is a form of performance-based testing used to measure candidates' clinical competence.
Originally it was described as 'a timed examination in which medical students interact with a series of simulated patients in stations that may involve history-taking, physical examination, counselling or patient management'.
The OSCE is a versatile
multipurpose evaluative tool that can be utilized to evaluate health care
professionals in a clinical setting. It assesses competency, based on objective
testing through direct observation. It comprises several "stations" in which examinees are expected to perform a variety of clinical tasks within a specified time period against criteria formulated for the clinical skill, thus demonstrating competency in skills and/or attitudes.
The OSCE has been used to evaluate the ability
to obtain/interpret data, problem-solve, teach, communicate, and handle
unpredictable patient behavior, which are otherwise impossible in the
traditional clinical examination. Any attempt to evaluate these critical areas in the old-fashioned clinical case examination will seem to assess theory rather than simulate practical performance. The OSCE has proved so effective that it is now being adopted in disciplines other than medicine, like dentistry, nursing, midwifery, pharmacy, and even engineering and law.
Features of the Objective Structured Clinical Examination (OSCE)
· Stations are short
· Stations are numerous
· Stations are highly focused
· Candidates are given very specific instructions
· A pre-set structured mark scheme is used, hence…
· Reduced examiner input and discretion
Emphasis on:
· What candidates can do rather than what they know
· The application of knowledge rather than the recall of knowledge
Typically…
· 5 minutes most common (3-20 minutes)
· (minimum) 18-20 stations/2 hours for adequate reliability
· Written answer sheets or observer assessed using checklists
· Mix of station types/competences tested
· Examination hall is a hospital ward
· Atmosphere active and busy
Additional options…
· Double or triple length stations
· Linked stations
· Preparatory stations
· "Must pass" stations
· Rest stations
How is the OSCEs done?
The following are steps
in sequence:
1. Registration: The first step is the registration.
- Show your examination invitation card and identification.
- Be reminded about the exam rules.
- Be checked for permitted items and items that are not allowed.
- Receive your exam envelope, which contains your ID badge, stickers, a pencil, and a notebook or clipboard (both with numbered blank papers), etc.
2. Orientation: The next step
is orientation. An orientation video may be shown. Here:
- Exam format, procedures and policies will be
reviewed.
- You will be introduced to your team and team leader.
- You will be instructed about your starting station and how to proceed.
- Your questions will be answered (questions are not allowed beyond this step).
3. Escorting to exam position: Now
it is exam time.
You will be escorted to your station. You will stop by the
assigned room door until a long bell / buzzer announces the start of the exam.
4. Station Instruction Time:
This is one or two minutes to read the instruction about
this station situation, patient, and required tasks. Read carefully. At the
next bell / buzzer enter the room.
5. The Encounter:
Start your encounter with the SP. This is a 5-20 minute
encounter. Perform the required tasks. Stop at the next bell / buzzer.
6. Post Encounter Period: Next
is a question period.
There are some differences here. Some OSCEs have no post-encounter periods. Some assign one or two minutes of the encounter period to oral questions asked by the examiner inside the exam room; no further communication is allowed with the SP. Others have written questions to be answered on paper or computer outside the exam room for 5-10 minutes. At the next long bell / buzzer, the first station has ended and the next station has started; you have to proceed to the next station quickly, as it is the same long bell / buzzer as in step 4.
7. Repeat Steps 4 to 6:
Steps 4 to 6 will be repeated until you have been in all
the stations. Some OSCEs will offer one or two short rest
periods.
8. Exam ended / Escorting to dismissal area: The exam is over.
You will be escorted back to the dismissal area for signing out. You will be asked to hand back everything you received at sign-in: the ID badge, remaining stickers, all the papers, and the pencil. You may also be asked to stay without outside contact for some time (sometimes hours) for exam security reasons.
Advantages and Disadvantages of OSCE
1.
Versatility and a broadening scope of application.
2.
Objectivity, reproducibility, and easy recall.
3.
All students get examined on predetermined criteria on the same or similar clinical scenarios or tasks, with marks written down against those criteria, thus enabling recall, teaching audit and determination of standards.
4.
In a study from Harvard medical school, second-year students were found to perform better on interpersonal and technical skills than on interpretative or integrative skills. This allows for review of teaching techniques and curricula.
5.
Performance is judged not by two or three examiners but by a team of many examiners in charge of the various stations of the examination. This is to the advantage of both the examinee and the teaching standard of the institution, as the outcome of the examination is not affected by prejudice and standards are determined by many more teachers, each looking at a particular issue in the training.
6.
OSCE takes a much shorter time to execute, examining more students in any given time over a broader range of subjects.
7.
However, no examination method is flawless, and the OSCE has been criticized for using simulated rather than real subjects, even though actual patients can be used according to need.
8.
OSCE is more difficult to organize and requires more material and human resources.
THE OBJECTIVE STRUCTURED PRACTICAL EXAMINATION
(OSPE)
The objective structured practical examination (OSPE) was used as an
objective instrument for assessment of laboratory exercises in preclinical
sciences, particularly physiology.
It was adapted from the objective
structured clinical examination (OSCE). The OSPE was administered to two
consecutive classes in conjunction with the conventional examination in which
the candidate is expected to perform a given experiment. The scores of the
students in the two components of the examination were used to compare the OSPE
with the conventional examination and to evaluate the new instrument of
assessment. The OSPE appears to be a reliable device with a good capacity for
discriminating between different categories of students. It is better in these
respects than the conventional practical examination. Moreover, it has scope
for being structured in such a way that all the objectives of laboratory
teaching can be tested and each aspect can be assigned the desired weightage.
The assessment of practical skills is often neglected. A
contributing factor is the unsatisfactory nature of the assessment instruments
commonly used. The objective structured practical examination (OSPE) is a
practical, reliable and valid alternative.
The main features of the OSPE are:
(1)
separate assessment of process and product through observation
of performance and assessment of end result;
(2)
adequate sampling of skills and content to be tested;
(3)
an analytical approach to the assessment;
(4)
objectivity;
(5)
feedback to teacher and
students.
The OSPE approach merits consideration in any subject where
practical skills should be assessed.
DIFFERENTIAL SCALES (OR THURSTONE-TYPE SCALES)
The name of
L.L. Thurstone is associated with differential scales which have been developed
using consensus scale approach. Under such an approach the selection of items
is made by a panel of judges who evaluate the items in terms of whether they
are relevant to the topic area and unambiguous in implication. The detailed
procedure is as under:
- The researcher
gathers a large number of statements, usually twenty or more, that express
various points of view toward a group, institution, idea, or practice
(i.e., statements belonging to the topic area).
- These statements are then submitted to a panel of judges, each of whom arranges them in eleven groups or piles ranging from one extreme position to the other. Each judge is requested to place in the first pile the statements he thinks are most unfavorable to the issue, in the second pile those he thinks are next most unfavorable, and so on, until in the eleventh pile he puts the statements he considers most favorable.
- This sorting by
each judge yields a composite position for each of the items. In case of
marked disagreement between the judges in assigning a position to an item,
that item is discarded.
- Each item that is retained is given its median scale value, between one and eleven, as established by the panel, and the items are arranged in random order of scale value. If the values are valid and if the opinionnaire deals with only one attitude dimension, the typical respondent will choose one or several contiguous items (in terms of scale values) to reflect his views. However, at times divergence may occur when a statement appears to tap a different attitude dimension.
Thurstone
method has been widely used for developing differential scales which are
utilized to measure attitudes towards varied issues like war, religion, etc.
Such scales are considered most appropriate and reliable when used for
measuring a single attitude.
Limitations:
- It requires more cost and effort.
- The values assigned to various statements by the judges may reflect their own attitudes.
- The method is not completely objective; it ultimately involves a subjective decision process.
Summated Scales (or Likert-type Scales)
Summated
scales consist of a number of statements which express either a favorable or
unfavorable attitude towards the given object to which the respondent is asked
to react. The respondent indicates his agreement or disagreement with each
statement in the instrument. Each response is given a numerical score,
indicating its favorableness or unfavorableness, and the scores are totaled to
measure the respondent’s attitude. For this reason they are often referred to
as Likert-type scales.
In a Likert scale, the respondent is asked to
respond to each of the statements in terms of several degrees, usually five degrees
(but at times 3 or 7 may also be used) of agreement or disagreement.
Eg. i.
strongly agree, ii. agree, iii. undecided, iv. disagree, v. strongly disagree.
We find that these five points constitute the scale. At one extreme of the
scale there is strong agreement with the given statement and at the other,
strong disagreement, and between them lie intermediate points. It assigns a
scale value to each of the five responses. The instrument yields a total score
for each respondent, which would then measure the respondent’s favorableness
toward the given point of view. If the instrument consists of, say 30
statements, the following score values would be revealing. 30 × 5 = 150 Most
favorable response possible 30 × 3 = 90 A neutral attitude 30 × 1 = 30 Most unfavorable
attitude. The scores for any individual would fall between 30 and 150. If the
score happens to be above 90, it shows favorable opinion to the given point of
view, a score of below 90 would mean unfavorable opinion and a score of exactly
90 would be suggestive of a neutral attitude.
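A minimal Python sketch of this summated scoring; the five response options and their values follow the example above, while the sample responses are invented. Reverse-keyed items, which real scales often include, are omitted for brevity.

```python
SCALE = {"strongly agree": 5, "agree": 4, "undecided": 3,
         "disagree": 2, "strongly disagree": 1}

def likert_total(responses):
    """Sum the scale values of one respondent's answers."""
    return sum(SCALE[r] for r in responses)

# Invented responses (only 5 of a 30-statement instrument shown).
responses = ["agree", "strongly agree", "undecided", "agree", "disagree"]
print(likert_total(responses))  # 4 + 5 + 3 + 4 + 2 = 18
```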
Procedure:
- As a first step, the researcher
collects a large number of statements which are relevant to the attitude
being studied
- A trial test should be
administered to a number of subjects.
- Each statement included in the Likert-type
scale is given an empirical test for discriminating ability (a sketch of one such test follows this list).
- Likert-type scales can easily be used in
respondent-centered and stimulus-centered studies.
- Because a Likert-type scale takes much less time to
construct, it is frequently used by students of opinion research.
- It is most useful in a situation wherein it is
possible to compare the respondent’s score with a distribution of scores
from some well defined group.
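One common way to run the empirical test for discriminating ability mentioned above (a sketch, not a procedure prescribed by the source) is to compare an item's mean score among high-total and low-total scorers from the trial administration; items whose difference is small or negative are dropped.

```python
def discrimination_index(item_scores, total_scores, fraction=0.25):
    """Mean item score of the top quarter of total scorers minus that of
    the bottom quarter; item_scores[i] and total_scores[i] belong to
    respondent i. A small or negative value suggests a poor item."""
    ranked = sorted(range(len(total_scores)), key=lambda i: total_scores[i])
    k = max(1, int(len(ranked) * fraction))
    low, high = ranked[:k], ranked[-k:]
    mean = lambda idx: sum(item_scores[i] for i in idx) / len(idx)
    return mean(high) - mean(low)

# Hypothetical trial data: one item's scores and the respondents' totals.
item = [5, 4, 4, 3, 2, 2, 1, 1]
totals = [140, 130, 120, 100, 80, 70, 50, 40]
print(discrimination_index(item, totals))   # 3.5 -- the item discriminates well
```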
Limitations:
- With
this scale, we can simply examine whether respondents are more or less
favorable to a topic, but we cannot tell how much more or less they are.
- There
is no basis for belief that the five positions indicated on the scale are
equally spaced.
- The
interval between ‘strongly agree’ and ‘agree’ may not be equal to the
interval between ‘agree’ and ‘undecided’.
- The
total score of an individual respondent has little clear meaning since a
given total score can be secured by a variety of answer patterns.
- It
is unlikely that the respondent can validly react to a short statement on
a printed form in the absence of real-life qualifying situations.
Moreover, there “remains a possibility that people may answer according to
what they think they should feel rather than how they do feel.”
Standardised Test
A standardised test is one
that has been carefully constructed by experts in the light of acceptable
objectives or purposes; procedures for administering, scoring and interpreting scores are specified in detail so that the results are comparable; and norms or averages for different age or grade levels have been pre-determined. It requires more thinking, planning, exact preparation, scoring, analysis and refinement. It is a complex and multidimensional task.
Standardised tests are those tests which:
1. are constructed by an individual or by a group of individuals;
2. are processed and universalised for all situations and for all purposes;
3. have content that is carefully designed, carefully phrased and pretested;
4. are meant for all situations inside and outside the educational institutions; and
5. are generally norm-referenced tests.
A
standardised test is one which passes through the following process:
(i) Standardisation of the
content and questions:
Due weightage is given to
the content and objectives. Items are to be prepared according to the
blue-print. Relevant items are included and irrelevant items are omitted,
giving due consideration to item difficulty and discriminating value. Internal
consistency is also taken into account.
(ii) Standardisation of the method
of administration:
Procedure of test
administration, conditions for administration, time allowed for the test etc.,
are to be clearly stated.
(iii) Standardisation of
the scoring procedure:
To ensure objective and
uniform scoring, the adequate scoring key and detailed instruction for method
of scoring is to be provided.
(iv) Standardisation of
interpretation:
Adequate norms are to be prepared to interpret the results. The test is administered over a large (representative) sample. Test scores are interpreted with reference to norms.
Derivation of norms is an integral part of the process of standardisation.
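As a minimal sketch of norm-referenced interpretation, suppose the norm sample's raw scores are available (the scores below are invented); an examinee's raw score is then converted to a percentile rank against that sample:

```python
from bisect import bisect_right

# Hypothetical raw scores from a representative norm sample, kept sorted.
norm_sample = sorted([34, 41, 45, 48, 50, 52, 55, 58, 60, 67])

def percentile_rank(raw_score):
    """Percentage of the norm group scoring at or below the raw score."""
    return 100.0 * bisect_right(norm_sample, raw_score) / len(norm_sample)

print(percentile_rank(55))   # 70.0 -- the examinee matched or outscored 70%
```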
Characteristics of Standardised
Tests:
1. They consist of items of high
quality.
The items are pretested and
selected on the basis of difficulty value, discrimination power, and
relationship to clearly defined objectives in behavioural terms.
2. As the directions for
administering, exact time limit, and scoring are precisely stated, any person
can administer and score the test.
3. Norms, based on representative
groups of individuals, are provided as an aid for interpreting the test scores.
These norms are frequently based on age, grade, sex, etc.
4. The reliability and
validity are established.
5. A manual is supplied that
explains the purposes and uses of the test, describes briefly how it was
constructed, provides specific directions for administering, scoring, and
interpreting results, contains tables of norms and summarizes available
research data on the test.
6. No two standardized tests
are exactly alike. Each test measures certain specific aspects of behaviour and
serves a slightly different purpose.
Thus, one has to be careful
in selecting a standardised test.
Uses of Standardised Tests:
1. Standardised test
assesses the rate of development of a student’s ability.
2. It checks and ascertains
the validity of a teacher-made test.
3. These tests are useful in
diagnosing the learning difficulties of the students.
4. It helps the teacher to know the causal factors of learning difficulties of the students.
5. Provides information for curriculum planning and for remedial coaching of educationally backward children.
6. It also helps the teacher
to assess the effectiveness of his teaching and school instructional
programmes.
7. Provides data for tracing
an individual’s growth pattern over a period of years.
8. It helps for organising
better guidance programmes.
9. Evaluates the influences
of courses of study, teacher’s activities, teaching methods and other factors
considered to be significant for educational practices.
Types of standardized tests
1. Achievement – tests of content knowledge or skills
2. Aptitude – tests which are used to predict future cognitive performance
3. Standards-based – criterion-referenced tests based on established standards
4. Domain-referenced
Standardized tests vs. informal teacher-made tests
- Standardized tests assess
broad, general content while teacher-made tests tend to focus on specific
objectives related to the instruction in a class
- Standardized tests are more
technically sound than teacher-made tests.
- Standardized tests are
administered in “standardized” manners while teacher-made tests tend to be
administered informally
- Standardized tests are scored
in consistent, reliable manners and produce sets of standard scores;
teacher-made tests are scored in less reliable manners and generally are
scored as the percentage of correct responses
Questionnaires
A questionnaire is an instrument containing
statements designed to obtain a subject’s perceptions, attitudes, beliefs, values,
opinions, or other non-cognitive traits
Personality inventories
Personality inventories are concerned with psychological orientation (i.e., general psychological adjustment) and educational orientation (i.e., traits such as self-concept or self-esteem).
Attitudes, values, or interests
Attitudes, values, or interests are affective
traits that indicate some degree of preference toward something.
Scales
A scale is a continuum that describes subjects’ responses to a statement.
Likert Scales
Checklists
Ranked items
Observations
Interviews
Advantages (of interviews):
- Establish rapport
- Enhance motivation
- Clarify responses through additional
questioning
- Capture the depth and richness of
responses
- Allow for flexibility
- Reduce “no response” and/or “neutral”
responses
Disadvantages (of interviews):
- Time consuming
- Expensive
- Small samples
- Subjective
Scales of Measurement
Numbers can be grouped into 4 types or levels:
nominal, ordinal, interval, and ratio.
Nominal
Not really a ‘scale’ because it does not scale
objects along any dimension.
Nominal refers to quality more than quantity. A
nominal level of measurement is simply a matter of distinguishing by name,
e.g., 1 = male, 2 = female. Even though we are using the numbers 1 and 2, they
do not denote quantity.
Ordinal
Ordinal refers to order in measurement. In ordinal measurement the attributes can be rank-ordered, but the distances between attributes do not have any meaning. Ordinal refers to quantities that have a natural ordering; for example, we often use rating scales (Likert questions). This is also an easy one to remember: ordinal sounds like order. An ordinal scale indicates direction, in addition to providing nominal information. Low/Medium/High and Faster/Slower are examples of ordinal levels of measurement. Many psychological scales or inventories are at the ordinal level of measurement.
An ordinal scale extends the information of a
nominal scale to show order, i.e. that one unit has more of a certain
characteristic than another unit. For example, an ordinal scale can be used
- to rank job applicants from the best to the worst,
- to categorise people according to their level of education, or
- to measure people’s feelings about some matter using a measure like ‘strongly agree’, ‘agree’, ‘neutral’, ‘disagree’, ‘strongly disagree’.
Interval
An interval scale is a scale on which equal intervals between objects represent equal differences.
Interval scales provide information about order,
and also possess equal intervals. Equal-interval scales of measurement can be
devised for opinions and attitudes. Constructing them involves an understanding
of mathematical and statistical principles.
Interval scales are not simply ordinal. They
give a deeper meaning to order. An interval scale is a scale of measurement in
which the magnitude of difference between measurements of any two units is
meaningful. If weights are measured in kilograms (kg), then the difference in
weights between two people whose weights are respectively 82 kg and 69 kg is
the same as that between people whose respective weights are 64 kg and 51 kg.
That is, the ‘intervals’ are the same (13 kg) and have the same meaning.
Ratio
A ratio scale is a special form of interval
scale that has a true zero. For some interval scales, measurement ratios are
not meaningful. For example, 40 °C does not represent a temperature which has twice the heat of 20 °C, because the zero on the Celsius scale is arbitrary and does not represent an absence of heat. However, on the Kelvin scale there is a true zero (called ‘absolute zero’). Therefore, a measure of 40 K is twice as hot as 20 K.
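The four levels can be summarised in a short sketch; the data are invented, and the comments note which comparisons each level supports:

```python
# Nominal: numbers merely name categories; arithmetic on the codes is meaningless.
sex_codes = {"respondent_1": 1, "respondent_2": 2}   # 1 = male, 2 = female

# Ordinal: order is meaningful, but distances between ranks are not.
agreement = {"strongly disagree": 1, "disagree": 2, "undecided": 3,
             "agree": 4, "strongly agree": 5}
assert agreement["agree"] > agreement["disagree"]    # valid: order comparison

# Interval: equal differences are meaningful; ratios are not (zero is arbitrary).
celsius_hot, celsius_mild = 40.0, 20.0
assert (celsius_hot - celsius_mild) == (30.0 - 10.0) # equal 20-degree intervals
# ...but 40 degrees Celsius is NOT "twice as hot" as 20 degrees Celsius.

# Ratio: a true zero makes ratios meaningful.
kelvin_hot, kelvin_mild = 40.0, 20.0
print(kelvin_hot / kelvin_mild)                      # 2.0 -- 40 K is twice 20 K
```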
SUBJECTIVE AND OBJECTIVE TESTS
Objective test: This is a test consisting of factual questions requiring extremely short answers that can be quickly and unambiguously scored by anyone with an answer key. Objective tests call for a short answer which may consist of one word, a phrase or a sentence.
Subjective test: This is a type of test that is evaluated by the scorer’s judgment or opinion. Subjective tests are more challenging and expensive to prepare, administer and evaluate correctly, though they can be more valid.
TYPES OF OBJECTIVE TEST ITEMS
They include the following:
I. True- false items
II. Matching items
III. Multiple choice items
IV. Completion items
1) True–false test items
Here, a factual statement is made and the learner is required to respond with either true or false depending on the correctness of the statement. They are easy to prepare, can be marked objectively and cover a wide range of topics.
Advantages:
- Can test a large body of material
- They are easy to score
Disadvantages:
- Difficult to construct questions that are definitely or unequivocally true or false
- They are prone to guessing
2) MATCHING ITEMS
Involves connecting contents of one list to contents in another list. The learners are presented with two columns of items, for instance column A and column B to match content in both columns correctly.
Advantages:
a. Measures primarily associations and relationships as well as sequence of events.
b. Can be used to measure questions beginning with who, when, where and what
c. Relatively easy to construct
d. They are easy to score
Disadvantages:
Difficult to construct effective questions that measure higher order thinking and contain a number of plausible distracters.
3) MULTIPLE CHOICE TEST ITEMS
In a multiple choice item, a statement of fact is made. It is followed by four or five alternative responses from which only the best or correct one must be selected. The statement or question is termed the ‘stem’. The alternatives or choices are termed ‘options’, and the ‘key’ is the correct alternative. The other options are called ‘distracters’.
Advantages:
- Measures a variety of levels of learning.
- They are easy to score.
- Can be analyzed to yield a variety of statistics.
- When well constructed, it has proven to be an effective assessment tool.
Disadvantages:
- Difficult to construct effective questions that measure higher-order thinking and contain a number of plausible distracters.
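To make the stem/options/key/distracter vocabulary concrete, here is a minimal sketch (the item content is invented for illustration):

```python
from dataclasses import dataclass

@dataclass
class MultipleChoiceItem:
    stem: str            # the statement or question posed to the learner
    options: list        # all alternatives shown
    key: str             # the correct alternative

    @property
    def distracters(self):
        """Every option other than the key."""
        return [o for o in self.options if o != self.key]

item = MultipleChoiceItem(
    stem="Which scale of measurement has a true zero?",
    options=["Nominal", "Ordinal", "Interval", "Ratio"],
    key="Ratio",
)
print(item.distracters)   # ['Nominal', 'Ordinal', 'Interval']
```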
4) COMPLETION ITEMS OR SHORT ANSWER TEST ITEMS
In this, learners are required to supply the words or figures which have been left out. They may be presented in the form of questions or phrases in which a learner is required to respond with a word or several statements.
Advantages:
- Relatively easy to construct.
- Can cover a wide range of content.
- Reduces guessing.
Disadvantages:
- Primarily used for lower levels of thinking.
- Prone to ambiguity.
- Must be constructed carefully so as not to provide too many clues to the correct answer.
- Scoring is dependent on the judgment of the evaluator.
An intelligence quotient, or IQ, is a score designed to assess intelligence. The term "IQ," from the German Intelligenz-Quotient, was devised by the German psychologist William Stern in 1912 as a proposed method of scoring children's intelligence tests such as those developed by Alfred Binet and Théodore Simon in the early 20th Century.
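Stern's quotient was the ratio of mental age to chronological age; multiplied by 100, as later became standard, it gives the classical ratio IQ. A one-function sketch:

```python
def ratio_iq(mental_age, chronological_age):
    """Classical ratio IQ: 100 x mental age / chronological age."""
    return 100.0 * mental_age / chronological_age

print(ratio_iq(10, 8))   # 125.0 -- a child performing two years ahead of age
```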
IQ scores have been shown to be associated with such factors as morbidity and mortality, parental social status and parental IQ, and to serve as predictors of educational achievement, job performance and income.
Types of Intelligence Tests:
Verbal or Language Tests:
In these tests the subjects make use of language; the instructions are given in words, written or oral. The test content is designed with verbal material which may include varieties of items like:
a. Vocabulary tests:
In these the subject is
required to give the meanings of words or phrases.
b. Memory tests:
These are designed to test the subject’s immediate and long-term memory and include all recall and recognition types of items, like telephone numbers, vehicle numbers, teachers’ names, etc.
c. Comprehension tests:
By means of these, the
subject is tested for the ability to grasp, understand and react to a given
situation.
d. Information tests:
The subject is tested on his
knowledge about the things around him by means of these tests.
e. Reasoning tests:
In these tests the subject is asked to provide answers which demonstrate his ability to reason logically, analytically, systematically, inductively or deductively, as in completing the series 1, 2, 4, 7, 11, 16, 22, 29, …
f. Association tests:
Through these test items the
subject is tested for his ability to point out the similarities or
dissimilarities between two or more concepts or objects.
Non-Verbal and Non-Language
Tests:
These tests involve activities in which the use of language is not necessary. Performance tests are the typical examples of this type of test. Here the individual is tested through material objects: he is instructed orally, and the reactions of the person are assessed with respect to the individual’s approach towards the work. Any needed directions are then provided to him.
Individual Verbal Intelligence
Tests:
Tests involving the use of language are administered to one individual at a time, e.g. the Stanford-Binet scale, individual performance tests, the Arthur point scale, and Bhatia’s battery of performance tests.
Group Verbal Intelligence
Tests:
These tests necessitate the use of language and are applied to a group of individuals at a time. For example:
1. Army Alpha Test (developed during World War I),
2. Army General Classification Test (World War II).
Popular Indian tests of this
nature are:
a. Group tests of
intelligence prepared by Bureau of Psychology, Allahabad (Hindi).
b. Samuhik Budhi Pariksha
(Hindi) prepared by PL Shrimali, Vidya Bhavan GS Teacher College, Udaipur.
Group Non-Verbal Intelligence
Tests:
These tests do not necessitate the use of language and are applicable to a group of individuals at a time. The difference between performance tests (used for an individual) and non-verbal tests (used for a group) is one of degree as far as their non-verbal nature is concerned.
The individual performance tests require the manipulation by the subject of concrete objects or materials supplied in the test. The responses are purely motor in character and seldom require the use of paper and pencil by the testee.
Types of Intelligence Tests:
Intelligence tests may be
classified under three categories:
1. Individual Tests:
These tests are administered
to one individual at a time. These cover age group from 2 years to 18 years.
These are:
(a) The Binet- Simon Tests,
(b) Revised Tests by Terman,
(c) Mental Scholastic Tests
of Burt, and
(d) Wechsler Test.
2. Group Tests:
Group tests are administered to a group of people at a time. Group tests had their birth in America, when the intelligence of the recruits who joined the army in the First World War had to be assessed.
These are:
(a) The Army Alpha and Beta
Test,
(b) Terman’s Group Tests,
and
(c) Otis Self-Administering Tests.
Among the group tests there are
two types:
(i) Verbal, and
(ii) Non-Verbal.
Verbal tests are those which
require the use of language to answer the test items.
3. Performance:
These tests are administered to illiterate persons. They generally involve the construction of certain patterns or solving problems in terms of concrete material.
Some of the famous tests are:
(a) Kohs’ Block Design Test,
(b) The Cube Construction Tests, and
(c) The Pass-along Tests.
1. Individual Tests:
The first tests that were prepared were individual. Binet’s test was individual, and so was the Terman-Merrill Stanford Revision. Individual tests are most reliable, but they consume more time and energy. They are, however, useful in making case studies or individual studies of behaviour problems or backwardness.
The child has to read the
question or listen to the question and answer in language.
But suppose the child is not fully conversant with the language of the examiner, or he is illiterate; for such cases non-verbal or performance tests have been prepared. Here the tasks set require the child to do ‘something’ rather than reply to a question.
The child may, for instance,
fit in a wooden board with depressions in some geometrical forms, some wooden
shapes like triangles or rectangles or circles. He may put some cubes in
descending or ascending order of size. He may assemble certain disintegrated
parts to form full designs or pictures. No language is used here. Instructions can also be given through demonstration or action.
A number of performance tests have been prepared. The most important are:
1. Alexander’s Pass-along test.
2. Kohs’ Block Design test.
3. Wechsler’s Performance Test.
4. Terman and Merrill’s Performance Test.
5. Kent’s Performance Test.
Kent’s test is used for
clinical purposes. It consists of five oral tests and seven written tests, each
requiring one minute.
Individual performance tests
have the disadvantage that these take a lot of time. Their reliability is also
questioned on the ground that temporary response sets or work habits may play a
major role in determining score.
Again, the intelligence
measured by performance tests is not quite the same as tested by Binet and
others.
2. Group Tests:
These are more helpful as
these deal with large masses of subjects such as in schools, industry, army and
public. These are reliable and have high predictive validity, and can be
compared favourably with individual tests.
The Army Alpha and Army Beta tests were the most prominent tests.
Characteristics of group tests:
(i) Most of the group-tests
have been standardised,
(ii) Most of the test items
in group verbal tests are linguistic in character.
(iii) Some group verbal
tests have been used in measuring scholastic aptitude
(iv) These are convenient in
administration and scoring.
3. Comparison of Individual and
Group Test:
4. Performance Tests:
The importance of non-verbal or performance tests was discussed above.
Non-verbal tests include such
items as:
(i) Relationship of figures,
which may be either (a) functional or (b) spatial.
(ii) Drawing figures,
especially human figures,
(iii) Completing pictures
and patterns.
(iv) Analysing space
relationship from diagrams
(v) Analysing cube
relationship.
(vi) Drawing lines through figures to break them up into given sections, as in the Minnesota Paper Form Board test.
(vii) Mechanical relationships, tracing the relationship of interlocking gears and pulleys shown in pictorial form.
(viii) Memory for design.
The following tests are
examples where actual handling is needed:
(i) Assembly of objects from their disconnected parts,
(ii) Kohs’ Block Design,
(iii) Picture completion,
(iv) Cube construction,
(v) Form board paper pencil,
(vi) Pass along test,
(vii) Picture arrangement,
(viii) Mazes, and
(ix) Cube imitation
(tapping).
The Progressive Matrices, prepared by J.C. Raven at Dumfries, are among the most widely used paper-pencil group performance tests.
Advantages:
Performance tests have the
following advantages:
(i) These are generally useful for measuring the abilities of special groups, including deaf persons, those with language difficulties, the educationally backward, and those who are discouraged by verbal tasks.
(ii) These are highly useful
in vocational and educational guidance.
(iii) These are useful for the study of pre-school children, who have not begun reading and writing.
(iv) These are useful for
clinical purposes, for testing neurotics and mentally defective (or
feeble-minded).
(v) These are useful for
adults over 30, who have lost interest in numbers and words.
(vi) Performance tests are
culture-free.
Limitations:
(i) Some test items do not have connection with life situations.
(ii) Some call for speed rather than the solution of problems.
(iii) Enough emphasis is not given to item difficulty.
(iv) Performance tests do not measure exactly what Binet’s tests measure: reasoning, judgment and imagination.
(v) Most of these tests do not require above-average thinking, so they are not suitable for higher levels.
(vi) There are variations in the utility of different tests. Picture completion tests may suffer from poor material. Maze tests require continual adaptation and planning. Form-board tests tend to depend upon speed.
(vii) Performance tests are not so reliable. A battery of tests is needed, which makes the task more complex.
(viii) They are expensive.
Uses of Intelligence Test
a. Use in selection:
Results of intelligence
tests can be used for selection of suitable candidates for training in
educational and professional skills
b. Use in classification:
Intelligence tests help in
classifying individuals as backward, average, bright or gifted, and thus
arrange for homogenous grouping to provide proper educational opportunities.
c. Use in assessment for
promotion:
Intelligence tests can be successfully used for promotion of students to the next higher grades or classes.
d. Use in provision of
guidance:
They are useful in providing training to teachers and for personnel guidance.
e. Use for improving the
learning process:
They are helpful to teachers in planning teaching-learning activities.
f. Use for diagnosis:
They are used to diagnose, distinguish and discriminate the differences in the mental functioning of individuals.
g. Use in research work:
The intelligence tests can
be used in carrying out research in the field of education, psychology and
sociology with different age groups for generalization.
h. For Determining the
optimum level of work:
The mental age gives the
mental level at which a child can be expected to work most efficiently in
academic subjects.
i. Estimating the range of
abilities in a class:
j. Determining the level
of ability:
k. Measuring special
abilities:
l. Predicting success in
particular Academic Subjects:
Readiness and prognosis tests have been designed to give a high prediction of success in specific subjects, and provide a useful basis for the selection of courses.
m. Diagnosing
Subject-Matter Difficulties:
It gives the teacher
information about the areas in which the child needs more training.
ATTITUDE
Perhaps the most straightforward way of finding out about
someone’s attitudes would be to ask them. However, attitudes are
related to self-image and social acceptance (i.e. attitude functions).
Attitude measurement can be divided into two basic
categories
- Direct Measurement (Likert scale and semantic differential)
- Indirect Measurement (projective
techniques)
Evaluation of Direct Methods
An attitude scale is designed to provide a valid, or accurate, measure of an individual’s social attitude. However, as anyone who has ever “faked” an attitude scale knows, there are shortcomings in these self-report scales of attitudes. Various problems affect the validity of attitude scales; however, the most common problem is that of social desirability.
Social desirability refers to the tendency for people to give “socially desirable” responses to questionnaire items.
Projective Techniques
A projective test involves presenting a person with an ambiguous (i.e. unclear) or incomplete stimulus (e.g. a picture or words). The stimulus requires interpretation from the person. Therefore, the person’s attitude is inferred from their interpretation of the ambiguous or incomplete stimulus.
The assumption behind these measures of attitudes is that the person will “project” his or her views, opinions or attitudes into the ambiguous situation, thus revealing the attitudes the person holds.
However, indirect methods only provide general information and do not offer a precise measurement of attitude strength, since they are qualitative rather than quantitative. This kind of attitude measurement is not objective or scientific, which is a major criticism.
Examples of projective techniques include:
• Rorschach Inkblot Test
• Thematic Apperception Test (or TAT)
• Draw a Person Task
Thematic Apperception Test

Here a person is presented with an ambiguous picture which
they have to interpret.
The thematic apperception test (TAT) taps into a person’s
unconscious mind to reveal the repressed aspects of their personality. Although
the picture, illustration, drawing or cartoon that is used must be interesting
enough to encourage discussion, it should be vague enough not to immediately
give away what the project is about.
TAT can be used in a variety of ways, from eliciting
qualities associated with different products to perceptions about the kind of
people that might use certain products or services.
The person must look at the picture(s) and tell a story.
For example:
o What has led up to the event shown
o What is happening at the moment
o What the characters are thinking and feeling, and
o What the outcome of the story was
Draw a Person Test
Figure drawings are projective diagnostic
techniques in which an individual is instructed to draw a person,
an object, or a situation so that cognitive, interpersonal, or psychological
functioning can be assessed. The test can be used to evaluate
children and adolescents for a variety of purposes (e.g.
self-image, family relationships, cognitive ability and personality).
A projective test is one in which a test taker responds to
or provides ambiguous, abstract, or unstructured stimuli, often in the form of
pictures or drawings.
In these tests, there is a consideration of how well a
child draws and the content of a child's drawing. In some tests, the child's
self-image is considered through the use of the drawings.
In other figure drawing tests, interpersonal relationships
are assessed by having the child draw a family or some other situation in which
more than one person is present. Some tests are used for the evaluation of
child abuse. Other tests involve personality interpretation through
drawings of objects, such as a tree or a house, as well as people.
Finally, some figure drawing tests are used as part of the
diagnostic procedure for specific types of psychological or neuropsychological
impairment, such as central nervous system dysfunction or mental retardation.
Despite the flexibility in administration and
interpretation of figure drawings, these tests require skilled and trained
administrators familiar with both the theory behind the tests and the structure
of the tests themselves. Interpretations should be made with caution and
the limitations of projective tests should be considered.
Evaluation of Indirect Methods
The major criticism of indirect methods is their lack of
objectivity.
Such methods are unscientific and do not objectively
measure attitudes in the same way as a Likert scale.
There is also the ethical problem of deception as often the
person does not know that their attitude is actually being studied when using
indirect methods.
The advantages of such indirect techniques of attitude
measurement are that they are less likely to produce socially desirable
responses, the person is unlikely to guess what is being measured and behavior should
be natural and reliable.
An aptitude test is an examination that attempts to determine and measure a person’s ability to acquire, through future training, some specific set of skills (intellectual, motor, and so on). The tests assume that people differ in their special abilities and that these differences can be useful in predicting future achievements.
General, or multiple,
aptitude tests are similar to intelligence tests in that they measure a
broad spectrum of abilities (e.g., verbal comprehension, general reasoning,
numerical operations, perceptual speed, or mechanical knowledge).
Aptitude tests have also been developed to measure professional potential.
The Differential Aptitude
Test (DAT) measures specific abilities such as clerical speed and mechanical
reasoning as well as general academic ability.
An aptitude test is designed to assess what a person is
capable of doing or to predict what a person is able to learn or do given the
right education and instruction. It represents a person's level of competency
to perform a certain type of task. Such aptitude tests are often used to assess
academic potential or career suitability. Such tests may be used to assess
either mental or physical talent in a variety of domains.
A Few Examples of Aptitude Tests
- A test assessing an individual's aptitude to
become a fighter pilot
- A career test evaluating a person's capability
to work as an air traffic controller
- An aptitude test given to high school students to determine which types of careers they might be suited for
- A computer programming test to determine how a
job candidate might solve different hypothetical problems
- A test designed to test a person's physical
abilities needed for a particular job such as a police officer or
firefighter
Meaning of Interest:
An interest is a subjective attitude motivating a person to perform a certain task. It affords pleasure and satisfaction. It results in curiosity towards the object of interest, enthusiasm for the object, strength of will to face difficulties while engaged in the task of one’s interest, and a definite change in behaviour in the presence of the object, characterised by attention and concentration.
Definitions of interest
Jones states, “Interest is a feeling of liking associated with a reaction, either actual or imagined, to a specific thing or situation.”
Bingham defines: “Interest is a
tendency to become absorbed in an experience and to continue it, while an
aversion is a tendency to turn away from it to something else.”
Types of Interest:
Jones mentions two distinct
types of interests- extrinsic and intrinsic.
The former are pleasurable
emotions connected with a purpose or goal of an activity. It may involve fame,
name, money, victory or such external motives of conduct.
But the latter are connected with the activity itself, being a basic and real attraction without any external motive. This intrinsic interest is continuous and permanent, even after the immediate goal is reached. The extrinsic interest dies as soon as the goal is reached.
Super and some other guidance
experts have classified interests into:
(i) Expressed interest,
(ii) Manifest interest, and
(iii) Measured interest.
In the expressed interest the person expresses his personal likings through such sentences as ‘I love sports’. Although it is the first source of knowing the interest of a person, much reliance cannot be placed on it, as such expressions lack permanency and are prone to vary from time to time depending upon the maturity of the person.
Manifest interest is the
interest that is not expressed but observed by others while the person is
engaged and absorbed in an activity. Newton forgot his meals while engaged in
scientific experiments.
The measured interest is the
estimate and account of a person’s interest as revealed by some psychological
tests or interest inventories.
Types of Tools for Measuring
Interest:
The tools for measurement of
interest are of two types – formal and informal.
The formal methods are
specialised and standardised measuring instruments such as interest
inventories, interest test batteries.
The informal methods include the person’s own statement, a record of his activities and observation by the parents and the teachers. The formal methods are usually supplemented by the informal methods.
Three notable formal methods
universally employed are:
1. Strong Vocational
Interest Blank,
2. Kuder Preference Record,
and
3. Thurstone’s Vocational Interest Schedule.
1. Strong Vocational Interest
Blank:
Prof. Strong of Stanford University, California, designed and standardised this checklist. The checklist contains 400 separate items. It is presented to the individual, and he is simply asked to indicate whether he likes, dislikes or is indifferent to each item, on a three-point scale.
The test reveals the interest maturity of the individual, his masculinity or femininity, and his occupational level. The 400 items include 100 occupations, 49 recreations, 36 school subjects, 48 activities and 47 peculiar interests. As such it is useful for both educational and vocational guidance.
2. Kuder Preference Record:
This has been prepared by G. Frederic Kuder. This test covers a wider field, comprising nine separate scales of occupations, viz. mechanical, computational, scientific, persuasive, artistic, literary, musical, social and clerical. Kuder presupposes three major interests, viz. mechanical, literary and artistic. So when the same task is presented to the subject with three related activities, the subject will select the activity that relates to one of the three interests that he possesses.
For instance, three choices
are given about one item viz. building a bird house, writing articles about
birds and drawing sketches about birds. If the subject opts for the first, his
interest is mechanical.
Another example is presented.
The
subject is asked to select the activity that he would prefer the most, and the activity
he would prefer the least out of the following three:
(i) Visit an art gallery.
(ii) Browse in a library.
(iii) Visit a museum.
A triple activity regarding
collections is:
(i) Collect autographs.
(ii) Collect coins.
(iii) Collect butterflies.
A detailed scoring system is employed for analysis and interpretation. A percentile of 75 or above is considered significantly high. If a person goes beyond P75 in any of the areas, all the occupations in that area are attractive for him.
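A minimal sketch of the P75 decision rule just stated, with invented percentile scores across the nine Kuder areas:

```python
# Hypothetical percentile scores for one respondent on the nine Kuder scales.
percentiles = {"mechanical": 82, "computational": 40, "scientific": 77,
               "persuasive": 30, "artistic": 55, "literary": 60,
               "musical": 25, "social": 68, "clerical": 45}

SIGNIFICANT = 75   # the P75 threshold from the text

high_interest = [area for area, p in percentiles.items() if p >= SIGNIFICANT]
print(high_interest)   # ['mechanical', 'scientific'] -- attractive areas
```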
3. Thurstone’s Vocational Interest Schedule:
This test has been devised
by Thurstone. He administered a comprehensive test to 3,400 college students, who expressed their Liking (L), Indifference (I) and Dislike (D) to each of the items in the test.
He analysed the test scores and, through the technique of factor analysis, arrived at 8 factors of interest, viz.:
(i) Commercial Interest,
(ii) Legal,
(iii) Athletic,
(iv) Academic,
(v) Descriptive,
(vi) Biological,
(vii) Physical Science,
(viii) Art.
Limitation of Interest
Inventories:
1. Some of the tests reveal
ability rather than interest. But interest is not the same thing as ability. So
some tests are not fully valid or reliable.
2. The tests presuppose that
the subject possesses a particular interest. But it can reveal the interest
that is present at the time of test, and not afterwards. The interests revealed
may not remain permanent. Moreover the interests are cultivable also. At the
time of testing a particular interest may not have developed fully, but it may
develop afterwards. It has been seen that some interests develop during the
vocation.
3. The interest inventories
reveal facts on the basis of the report given by the subject. The accuracy of
the report is still a problem. Some people do not reveal facts.
4. The questions in the
inventories deal with certain types of activities, and not all these lead to
clear-cut vocations. Again, there is much overlapping between one activity and
another. An occupation is not one interest but a combination of activities or
interests.
5. The predictive side of the inventories has also been tested. On investigation, Proctor found that these have 25% permanence in school studies. Strong found the correlation with future vocation to be 0.75, i.e., less than +1.
In spite of the above limitations, interest inventories are very useful in determining the future trends of the individual’s vocational life.
ACHIEVEMENT TESTS
The achievement tests that most people are familiar with
are the standard exams taken by every student in school. Students are regularly
expected to demonstrate their learning and proficiency in a variety of
subjects. In most cases, certain scores on these achievement tests are needed
in order to pass a class or continue on to the next grade level.
Examples of Achievement Tests
- A math exam covering the latest chapter in your
book
- A test in your social psychology class
- A comprehensive final in your Spanish class
- A skills demonstration in your martial arts
class
Each of these tests is designed to assess how much you know
at a specific point in time about a certain topic. Achievement tests are not
used to determine what you are capable of; they are designed to evaluate what
you know and your level of skill at the given moment.
Achievement tests are widely used in a number of domains,
both academic- and career-related. Students face an array of achievement tests
almost every day. Such tests allow educators and parents to assess how their
kids are doing in school, but also provide feedback to students on their own
performance.
When Are Achievement Tests Used?
Achievement tests are often used in educational and training settings. In schools, for example, achievement tests are frequently used to determine the level of education for which students might be prepared. Students might take such a test to determine if they are ready to enter a particular grade level, or if they are ready to pass out of a particular subject or grade level and move on to the next.
Each grade level has certain educational expectations, and
testing is used to determine if schools, teachers, and students are meeting
those standards.
Measuring
Socioeconomic Status and Subjective Social Status
One objective of the Stop
Skipping Class campaign is to provide best practices for measuring
socioeconomic status (SES) and subjective social status (SSS).
An important determinant of
the approach you will use to measure SES and SSS is the level at which you plan
to assess its effects — the societal level, the community or neighborhood
level, or the individual level.
Education can be measured using continuous variables (e.g., highest year of school completed) or categorical variables (e.g., a 1-6 scale indicating the highest level completed). Higher levels of education are often associated with better economic outcomes, as well as the expansion of social resources.
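As a minimal sketch of the two coding choices (the 1-6 bands below are invented for illustration, not an official scale):

```python
def education_category(years_completed):
    """Collapse a continuous measure (highest year of school completed)
    into a hypothetical 1-6 categorical scale."""
    bands = [(8, 1),    # less than high school
             (12, 2),   # high school
             (14, 3),   # some college
             (16, 4),   # bachelor's degree
             (18, 5)]   # master's degree
    for upper, code in bands:
        if years_completed <= upper:
            return code
    return 6            # doctoral or professional degree

print(education_category(16))   # 4
```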
Income
Income can be measured in a
variety of ways, including family income, assessments of wealth and subjective
assessments of economic pressure. At the neighborhood and societal level,
federal poverty thresholds, supplemental poverty measures and school and
neighborhood-level indicators of poverty can be assessed. Lack of income has been found to be related to poorer health, mainly due to reduced access to goods and services (such as health care) that can be beneficial to health.
Occupation
Occupation can be assessed
by asking participants to note their current or most recent occupation or job
title, or to indicate their occupational category from a list. Aside from
financial benefits, employment can improve one's physical and mental health and
expand social networks. However, the nature of lower SES positions can
undermine these benefits, as the job itself may be hazardous or monotonous