Wednesday, 2 January 2019

Measurement and Evaluation


  • Concept and nature of measurement and evaluation: meaning, process, purposes, problems in evaluation and measurement, principles of evaluation; characteristics: objectivity, validity, reliability and usability; formative and summative evaluation; internal and external evaluation; criterion- and norm-referenced evaluation
  • Evaluation strategies: meaning, characteristics, construction of tests, administration of tests, scoring, grading vs. marks, item analysis
  • Essays, short-answer questions and multiple-choice questions, true/false, completion
  • Tools of evaluation
  • Rating scale, checklists, objective structured clinical examination (OSCE), objective structured practical examination (OSPE), viva examination
  • Differential scales and summated scales, sociometry, anecdotal record, attitude scale, critical incident technique, question bank preparation, validation, moderation by panel.




*      CONCEPT AND NATURE OF MEASUREMENT AND EVALUATION

Measurement and evaluation are distinct but related concepts.
Measurement can be defined as the process of assigning numbers to events based on an established set of rules. Measurement is collection of quantitative data. A measurement is made by comparing a quantity with a standard unit.
In educational measurement, the “events” under consideration are students’ test performances. In the simplest case, the numerals assigned are typically whole numbers, such as a student’s number of correct responses. Educational measurement is closely related to the concepts of testing, assessment, and evaluation. In education, the numerical value of scholastic ability, aptitude, achievement, etc. can be measured and obtained using instruments such as paper-and-pencil tests. It means that the values of the attribute are translated into numbers by measurement.
Educational measurement is the science and practice of obtaining information about characteristics of students, such as their knowledge, skills, abilities, and interests. It is a specialty within the broader discipline of psychometrics. Measurement in education includes the development of instruments or protocols for obtaining information, procedures for analyzing and evaluating the quality of the information gained from the use of instruments or protocols, and strategies for communicating the resulting information to diverse audiences, such as educators, policymakers, parents, and students.
Aims of measurement in education
(1) Arriving at conclusions regarding students’ standing with respect to a specified educational outcome;
(2) Documenting student ability, achievement, or interests;
(3) Gauging student progress toward specified educational goals; and
(4) Improving teaching and learning.
In the evaluation process, information is interpreted according to established standards so that decisions can be made. Clearly, the success of evaluation depends on the quality of the data collected. If test results are not consistent (or reliable) and truthful (or valid), accurate evaluation is impossible.
The measurement process is the first step in evaluation; an improved measurement leads to accurate evaluation. Measurement determines the degree to which an individual possesses a defined characteristic.
Characteristics or quality of measurement
The first important characteristic of measurement is reliability.
The second important characteristic is validity.
The third important characteristic of a measurement is objectivity.

Evaluation is the process of giving meaning to a measurement by judging it against some standard. The two most widely used types of standards are criterion-referenced and norm-referenced.
The criterion-referenced standard is used to determine whether a student has attained a specified level of skill.
The norm-referenced standard is used to judge an individual’s performance in relation to the performances of other members of a well-defined group. A norm-referenced standard is developed by testing a large number of individuals from a defined group.
Types of Measurement:
Generally, there are three types of measurement:
(i) Direct; (ii) Indirect; and (iii) Relative.
Direct: Finding the length and breadth of a table involves direct measurement; this is accurate as long as the tool is valid.
Indirect: Finding the quantity of heat contained in a substance involves indirect measurement, for we must first find the temperature of the substance with the help of a thermometer and then calculate the heat contained in it.
Relative: Measuring the intelligence of a boy involves relative measurement, for the score obtained by the boy in an intelligence test is compared with norms. It is obvious that psychological and educational measurements are relative.
Levels and Classification of Educational Measures
A student’s achievement may be viewed at three different levels (see the sketch after this list):
1.  Self-referenced – how the student is progressing with reference to himself/herself.
2.  Criterion-referenced – how the student is progressing with reference to the criteria set by the teacher; individual scores are interpreted in terms of the student’s performance relative to some standard or criterion.
3.  Norm-referenced – how the student is progressing with reference to his/her peer group; individual scores are interpreted relative to the scores of others in a well-defined norming group.
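A minimal Python sketch, with hypothetical scores, cutoff and class data, contrasting the three interpretations of the same raw score:

```python
# A minimal sketch (hypothetical figures) contrasting the three reference
# frames for interpreting the same raw score.
previous_score = 55   # the student's own earlier score (self-reference)
score_now = 70        # the student's current score
cutoff = 60           # mastery criterion set by the teacher
class_scores = [40, 52, 58, 61, 65, 70, 72, 80, 85, 90]  # peer (norm) group

# Self-referenced: progress relative to the student's own past performance.
print("Improvement:", score_now - previous_score, "marks")

# Criterion-referenced: performance against the predetermined standard.
print("Mastery attained" if score_now >= cutoff else "Mastery not attained")

# Norm-referenced: standing relative to the peer group (percentile rank).
below = sum(s < score_now for s in class_scores)
print(f"Surpasses {100 * below / len(class_scores):.0f}% of the class")
```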


EVALUATION
Evaluation is a systematic process of determining to what extent instructional objectives have been achieved. It is a dynamic decision-making process focusing on changes that have been made.

Definition of evaluation
1. James M. Bradfield:
Evaluation is the assignment of symbols to phenomena, in order to characterise the worth or value of a phenomenon, usually with reference to some social, cultural or scientific standards.
2. Gronlund and Linn:
Evaluation is a systematic process of collecting, analysing and interpreting information to determine the extent to which pupils are achieving instructional objectives.

3.C.E. Beeby (1977), who described evaluation as “the systematic collection and interpretation of evidence leading as a part of process to a judgement of value with a view to action.”
In this definition, there are the following four key elements:
(i) Systematic collection of evidence.
(ii) Its interpretation.
(iii) Judgement of value.
(iv) With a view to action.

This process of evaluation involves
(i) Collecting suitable data (measurement)
(ii) Judging the value of these data according to some standard; and
(iii) Making decisions based on the data.
The function of evaluation is to facilitate rational decisions. For the teacher, this can mean facilitating student learning; for the exercise specialist, this could mean helping someone establish scientifically sound weight-reduction goals.

Evaluation in Education
Evaluation focuses on grades and may reflect classroom components other than course content and mastery level. Evaluation is a final review of your instruction to gauge its quality. It is product-oriented. This means that the main question is: “What’s been learned?” Finally, evaluation is judgemental.

Principles of Evaluation

1. It must be clearly stated what is to be evaluated:

A teacher must be clear about the purpose of evaluation. He must formulate the instructional objectives and define them clearly in terms of students’ observable behaviour. Before selecting the achievement measures, the intended learning outcomes must be specified clearly.

2. A variety of evaluation techniques should be used for a comprehensive evaluation:

It is not possible to evaluate all the aspects of achievement with the help of a single technique. For better evaluation, techniques like objective tests, essay tests, observational techniques etc. should be used so that a complete picture of the pupil’s achievement and development can be assessed.

3. An evaluator should know the limitations of different evaluation techniques:

Evaluation can be done with the help of simple observation or highly developed standardized tests. But whatever the instrument or technique may be, it has its own limitations. There may be measurement errors. Sampling error is a common factor in educational and psychological measurements. An achievement test may not include the whole course content. Error in measurement can also occur due to students guessing on objective tests. Error is also found due to incorrect interpretation of test scores.

4. The technique of evaluation must be appropriate for the characteristics or performance to be measured:

Every evaluation technique is appropriate for some uses and inappropriate for others. Therefore, while selecting an evaluation technique, one must be well aware of the strengths and limitations of the techniques.

5. Evaluation is a means to an end but not an end in itself:

The evaluation technique is used to make decisions about the learner. It is not merely gathering data about the learner, because blind collection of data is a waste of both time and effort. Evaluation is meant to serve some useful purpose.

CHARACTERISTICS OF EVALUATION

The analysis of all the above definitions enables us to draw the following characteristics of evaluation:
1. Evaluation implies a systematic process
2. Evaluation is a continuous process.
In an ideal situation, the teaching- learning process on the one hand and the evaluation procedure on the other hand, go together. It is certainly a wrong belief that the evaluation procedure follows the teaching-learning process.
3. Evaluation emphasizes the broad personality changes and major objectives of an educational programme. Therefore, it includes not only subject-matter achievements but also attitudes, interests and ideals, ways of thinking, work habits and personal and social adaptability.
4. Evaluation always assumes that educational objectives have previously been identified and defined.
5. A comprehensive programme of evaluation involves the use of many procedures (for example, analytico-synthetic, heuristic, experimental, lecture, etc.); a great variety of tests (for example, essay type, objective type, etc.); and other necessary techniques (for example, socio-metric, controlled-observation techniques, etc.).
6. Learning is more important than teaching. Teaching has no value if it does not result in learning on the part of the pupils. Objectives and learning experiences should be so relevant that ultimately they should direct the pupils towards the accomplishment of educational goals. To assess the students and their complete development brought about through education is evaluation.
7. Evaluation is the determination of the congruence between the performance and objectives.

STEPS INVOLVED IN EVALUATION

(i) Identifying and Defining General Objectives

In the evaluation process, the first step is to determine what to evaluate, i.e., to set down educational objectives. The process of identifying and defining educational objectives is a complex one; there is no simple or single procedure which suits all teachers. Some prefer to begin with the course content, some with general aims, and some with lists of objectives suggested by curriculum experts in the area. While stating the objectives, therefore, we can successfully focus our attention on the product, i.e., the pupil’s behaviour at the end of a course of study, and state it in terms of his knowledge, understanding, skill, application, attitudes, interests, appreciation, etc.

(ii) Identifying and Defining Specific Objectives:

The setting of specific objectives will provide direction to the teaching-learning process. It determines two things: first, the various types of learning situations to be provided by the class teacher to his pupils, and second, the method to be employed to evaluate both the objectives and the learning experiences.

(iii) Selecting Teaching Points:

The next step in the process of evaluation is to select teaching points through which the objectives can be realised. Once the objectives are set up, the next step is to decide the content (curriculum, syllabus, course) to help in the realisation of objectives.

 (iv) Planning Suitable Learning Activities:

In the fourth step, the teacher will have to plan the learning activities, keeping in mind the objectives as well as the teaching points. The process then becomes three-dimensional, the three co-ordinates being objectives, teaching points and learning activities. The teacher gets the objectives and content readymade.
He is completely free to select the type of learning activities such as analytico-synthetic method; inducto-deductive reasoning; experimental method or a demonstration method; discovery method, lecture method; or he may ask the pupils to divide into groups and to do a sort of group work followed by a general discussion; and so on. One thing he has to remember is that he should select only such activities as will make it possible for him to realise his objectives.

(v) Evaluating:

In the fifth step, the teacher observes and measures the changes in the behaviour of his pupils through evaluation process.
Here the teacher will construct a test by making the maximum use of the teaching points already introduced in the class and the learning experiences already acquired by his pupils. He may plan for an oral test or a written test; he may administer an essay type test or an objective type test; or he may arrange a practical test.

(vi) Using the Results as Feedback

If the teacher, after testing his pupils, finds that the objectives have not been realised to a great extent, he will use the results in reconsidering the objectives and in organising the learning activities. He will retrace his steps to find out the drawbacks in the objectives or in the learning activities he has provided for his students. This is known as feedback. Whatever results the teacher gets after testing his pupils should be utilised for the betterment of the students.

*                  PURPOSES OF EVALUATION
Evaluation is a very important requirement for the education system. It fulfills various purposes in systems of education, such as quality control in education and selection/entrance to a higher grade or tertiary level.

Functions of Evaluation:

Evaluation plays a vital role in teaching-learning experiences. It is an integral part of the instructional programmes. It provides information on the basis of which many educational decisions are taken.
Evaluation has the following functions:

1. Placement Functions:

·                     Evaluation helps to study the entry behaviour of the children in all respects.
·                     That helps to undertake special instructional programmes.
·                     To provide for individualisation of instruction.
·                     It also helps to select pupils for higher studies, for different vocations and specialised courses.


2. Instructional Functions:
·                     It helps in systematic determination of a subject's merit, worth and significance, using criteria governed by a set of standards.
·                   Evaluation helps to build an educational programme, assess its achievements and improve upon its effectiveness.
·                   It reviews the progress in learning from time to time.
·                   It also provides valuable feedback on the design and the implementation of the programme.
·                   Evaluation plays an enormous role in the teaching-learning process. It helps teachers and learners to improve teaching and learning.
·                   Evaluation is a continuous process and a periodic exercise.
·                   It helps in forming judgements of the value, educational status, or achievement of students.
·                   In learning, it contributes to formulation of objectives, designing of learning experiences and assessment of learner performance.
·                   Besides this, it is very useful to bring improvement in teaching and curriculum.
·                   It provides accountability to the society, parents, and to the education system.
·                   The improvement in courses/curricula, texts and teaching materials is brought about with the help of evaluation.
·                   It helps in selecting instructional strategies.

3. Diagnostic Functions:

·                     Evaluation has to diagnose the weak points in the school programme as well as the weaknesses of the students.
·                     To suggest relevant remedial programmes.
·                     The aptitude, interest and intelligence of each individual child are also to be recognised so that he may be guided in the right direction.
·                     To adapt instruction to the different needs of the pupils.
·                     To evaluate the progress of these weak students in terms of their capacity, ability and goal.

4. Predictive functions:

·                     To discover potential abilities and aptitudes among the learners.
·                     To predict the future success of the children.
·                     And also helps the child in selecting the right electives.

5. Administrative Functions:

·                     To adopt better educational policy and decision making.
·                     Helps to classify pupils in different convenient groups.
·                     To promote students to the next higher class.
·                     To appraise the supervisory practices.
·                     To have appropriate placement.
·                     To draw comparative statement on the performance of different children.
·                     To have sound planning.
·                      Helps to test the efficiency of teachers in providing suitable learning experiences.
·                     To mobilise public opinion and to improve public relations.
·                     Helps in developing comprehensive criterion tests.

6. Guidance Functions:

·                     Assists a person in making decisions about courses and careers.
·                     Enables a learner to know his pace of learning and lapses in his learning.
·                     Helps a teacher to know the children in details and to provide necessary educational, vocational and personal guidance.

7. Motivation Functions:

·                     To motivate, to direct, to inspire and to involve the students in learning.
·                     To reward their learning and thus to motivate them towards study.

8. Development Functions:

·                     Gives reinforcement and feedback to teacher, students and the teaching learning processes.
·                     Assists in the modification and improvement of the teaching strategies and learning experiences.
·                     Helps in the achievement of educational objectives and goals.

9. Research Functions:

·                     Helps to provide data for research generalisation.
·                     Evaluation clears the doubts for further studies and researches.
·                     Helps to promote action research in education.

10. Communication Functions:

·                     To communicate the results of progress to the students.
·                     To intimate the results of progress to parents.
·                     To circulate the results of progress to other schools.

Major Differences between Evaluation and Measurement
  • 1. While evaluation is a new concept, measurement is an old concept.
  • 2. While evaluation is a technical term, measurement is a simple word.
  • 3. While the scope of evaluation is wider, the scope of measurement is narrow.
  • 4. In evaluation pupil’s qualitative progress and behavioural changes are tested. In measurement only quantitative progress of the pupils can be explored.
  • 5. In evaluation, the learning experiences provided to the pupils in accordance with predetermined teaching objectives are tested. In measurement, the content skill and achievement of the ability are not tested on the basis of some objectives; rather, the result of the testing is expressed in numerals, scores, averages and percentages.
  • 6. The qualities are measured in the evaluation as a whole. In measurement, the qualities are measured as separate units.
  • 7. Evaluation is the process by which the previous effects and hence caused behavioural changes are tested. Measurement means only those techniques which are used to test a particular ability of the pupil.
  • 8. In evaluation, various techniques like observation, hierarchy, criteria, interest and tendencies measurement etc. are used for testing the behavioural changes. In measurement, personality test, intelligence test and achievement test etc. are included.
  • 9. Evaluation is that process by which the interests, attitudes, tendencies, mental abilities, ideals, behaviours and social adjustment etc. of pupils are tested. By measurement, the interests, attitudes, tendencies, ideals and behaviours cannot be tested.
  • 10. The evaluation aims at the modification of education system by bringing a change in the behaviour. Measurement aims at measurement only.

*                  Types of Evaluation:

Evaluation can be classified into different categories in many ways.
Some important classifications are as follows:

1. FORMATIVE EVALUATION:

Formative evaluation is carried out at regular and frequent intervals during a course to monitor the learning progress of students during the period of instruction. It helps a teacher to ascertain pupil progress from time to time.
Its main objective is to provide continuous feedback to both teacher and student, concerning learning successes and failures while instruction is in process.
Feedback to students provides reinforcement of successful learning and identifies the specific learning errors that need correction. The pupil knows his learning progress from time to time. This type of evaluation is an essential tool to provide feedback to the learners for improvement of their self-learning. Thus, formative evaluation motivates the pupils for better learning.
Feedback to teacher provides information for the teachers to improve their methodologies of teaching, nature of instructional materials, etc. and to modify instruction and for prescribing group and individual remedial work.
Thus, it aims at improvement of instruction. “The idea of generating information to be used for revising or improving educational practices is the core concept of formative evaluation.”
Therefore, evaluation and development must go hand in hand. The evaluation has to take place in every possible situation or activity and throughout the period of formal education of a pupil.
The functions of formative evaluation are:
(a) Diagnosing:
Diagnosing is concerned with determining the most appropriate method or instructional materials conducive to learning.
(b) Placement:
Placement is concerned with finding out the position of an individual in the curriculum from which he has to start learning.
(c) Monitoring:
Monitoring is concerned with keeping track of the day-to-day progress of the learners and pointing out changes necessary in the methods of teaching, instructional strategies, etc.
Characteristics of Formative Evaluation:
The characteristics of formative evaluation are as follows:
1.                  It is an integral part of the learning process.
2.                  It occurs frequently during the course of instruction.
3.                  Its results are made immediately known to the learners.
4.                  It may sometimes take the form of teacher observation only.
5.                  It reinforces learning of the students.
6.                  It pinpoints difficulties being faced by a weak learner.
7.                  Its results cannot be used for grading or placement purposes.
8.                  It helps in modification of instructional strategies including method of teaching, immediately.
9.                  It motivates learners, as it provides them with knowledge of progress made by them.
10.              It sees the role of evaluation as a process.
11.              It is generally a teacher-made test.
12.              It does not take much time to be constructed.
Examples:
i. Monthly tests.
ii. Class tests.
iii. Periodical assessment.
iv. Teacher’s observation, etc.

2. Diagnostic Evaluation:

Formative evaluation provides first-aid treatment for simple learning problems, whereas diagnostic evaluation searches for the underlying causes of those problems that do not respond to first-aid treatment. It is concerned with identifying the learning difficulties or weaknesses of pupils during instruction. It tries to locate or discover the specific areas of weakness of a pupil in a given course of instruction and also tries to provide remedial measures.
When the teacher finds that in spite of the use of various alternative methods, techniques and corrective prescriptions the child still faces learning difficulties, he takes recourse to a detailed diagnosis through specifically designed tests called ‘diagnostic tests’.
Diagnosis can be made by employing observational techniques, too. In case of necessity the services of psychological and medical specialists can be utilized for diagnosing serious learning handicaps. 

3. SUMMATIVE EVALUATION:

Summative evaluation is done at the end of a course of instruction or at the end of a fairly long period (say, a semester) to know to what extent the objectives previously fixed have been accomplished. In other words, it is the evaluation of pupils’ achievement at the end of a course. The traditional examinations are generally summative evaluation tools.
The main objectives of summative evaluation are:
·                     to assess the degree to which the students have mastered the course content;
·                     to judge the appropriateness of instructional objectives;
·                     to compare one course with another.
It is generally the work of standardised tests and implies some sort of final comparison of one item or criterion against another.
The functions of this type of evaluation are:
(a) Crediting:
Crediting is concerned with collecting evidence that a learner has achieved some instructional goals with respect to the contents of a defined curricular programme.
(b) Certifying:
Certifying is concerned with giving evidence that the learner is able to perform a job according to the previously determined standards.
(c) Promoting:
It is concerned with promoting pupils to the next higher class.
(d) Selecting:
Selecting the pupils for different courses after completion of a particular course structure.
Characteristics of Summative Evaluation:
a. It is terminal in nature as it comes at the end of a course of instruction (or a programme).
b. It is judgemental in character in the sense that it judges the achievement of pupils.
c. It views evaluation “as a product”, because its chief concern is to point out the levels of attainment.
d. It cannot be based on teacher’s observations only.
e. It does not pin-point difficulties faced by the learner.
f. Its results can be used for placement or grading purposes.
g. It reinforces learning of the students who have learnt an area.
h. It may or may not motivate a learner. Sometimes, it may have negative effect.
Examples:
1. Traditional school and university examination,
2. Teacher-made tests,
3. Standardised tests,
4. Practical and oral tests, and 
5. Rating scales, etc.

4. NORM-REFERENCED AND CRITERION-REFERENCED EVALUATION:

 (i) Criterion-Referenced Evaluation:
When evaluation is concerned with the performance of the individual in terms of what he can do, it is termed criterion-referenced evaluation. There is no reference to the performance of other members of the group. In it we refer an individual’s performance to a predetermined criterion which is well defined. The purpose of a criterion-referenced evaluation/test is to assess the objectives. It is an objective-based test. The objectives are assessed in terms of behavioural changes among the students. Such a test assesses the ability of the learner in relation to the criterion behaviour.
Examples
(i) Raman got 93 marks in a test of Mathematics.
(ii) A typist types 60 words per minute.
(iii) Amit’s score in a reading test is 70.
 (ii) Norm Referenced Evaluation:
A norm-referenced test is used to ascertain an individual’s status with respect to the performance of other individuals on that test.
Norm-referenced evaluation is the traditional class-based assignment of numerals to the attribute being measured. It means that the measurement act relates to some norm, group or a typical performance. It is an attempt to interpret the test results in terms of the performance of a certain group. This group is a norm group because it serves as a referent of norm for making judgements. Test scores are neither interpreted in terms of an individual (self-referenced) nor in terms of a standard of performance or a pre-determined acceptable level of achievement called the criterion behaviour (criterion-referenced). The measurement is made in terms of a class or any other norm group.
Almost all our classroom tests, public examinations and standardised tests are norm-referenced as they are interpreted in terms of a particular class and judgements are formed with reference to the class.
Examples:
(i) Raman stood first in Mathematics test in his class.
(ii) The typist who types 60 words per minute stands above 90 percent of the typists who appeared for the interview.
(iii) Amit surpasses 65% of the students of his class in the reading test.
INTERNAL ASSESSMENT

Internal assessment is often called “home examination”, “classroom test” or “teacher-made test”. These are assessments for which all the arrangements are made by the teachers of the same institution. Its main aim is to evaluate the progress of students in different classes at different levels. Teachers themselves frame the question papers, conduct the exam, examine the answer scripts and decide about the pass/fail of the students.

Objectives of Internal Assessment:
·                     To evaluate the mental nourishment of students.
·                     To estimate the student’s educational progress, speed of achievement and ability of learning.
·                     On passing the internal exam, promotion is given to the next class.
·                     Internal assessment creates a competitive environment, which has a pleasant effect on educational achievements.
·                     Students and teacher both know the status of each student – who is leading, who is lagging, and by how much.
·                     The teacher evaluates his progress and his teaching methods and tries to overcome his weaknesses.
·                     It evaluates the particular curriculum for a particular class.
·                     Parents of the students are informed about the progress of students so that they can care for their children.
·                     The teacher can group the students according to ability, hard work and intelligence on the basis of the result and make arrangements for weak students’ betterment.
·                     Results of these tests work as a motive for further study and encourage or admonish the students accordingly.
·                     It fulfills the objective of learning and retaining it for a long time.
·                     The teacher knows the hidden abilities, capabilities, desires and interests of the students, and becomes able to guide them accordingly.

Types of Internal Assessment
                            Following are the types of internal assessment:
·                     Daily Test
·                     Weekly Test
·                     Fortnightly Test
·                     Monthly Test
·                     Three monthly or Terminal Test
·                     Annual exam or Annual Promotion Test
·                     Entrance Test or admission Test
Merits:
1.                  It is direct, flexible and can easily be tied with the unit of instruction.
2.                  It is economical in terms of time and money and can be conducted frequently.
3.                  There is little scope of mal-practices and the students get satisfaction (by receiving back their scripts) that they have been accurately graded.
4.                  It permits the use of a variety of evaluation tools and the results can be used for the improvement of teaching-learning processes and for providing remedial teaching.
5.                  The student accepts it as part of the teaching-learning process and faces it without squirm or fear.
6.                  It provides essential data for the cumulative record, for grouping students according to their ability, and for reporting to parents as well as for making decisions with regard to annual promotion.
7.                  It has content validity and scores are sufficiently reliable.
8.                  Cheaper: hiring an external evaluator often means paying someone with lots of graduate education and years of expertise, and that doesn’t come cheap.
9.                  Doesn’t require collaboration: this makes the process faster.


Demerits
1.                  Every teacher is not competent to construct and use these techniques of evaluation.
2.                  Internal assessment tends to lead to indiscreet comparison of students.
3.                  It is not possible to apply internal evaluation in respect of thousands of private candidates.
4.                  Teacher can yield to local pressures.
5.                  Grades will vary from school to school and will not have uniform significance.
6.                  Pupils and their parents have lesser faith in internal evaluation.
7.                  Teachers having freedom of evaluating their own students, may tend to be lax in covering the prescribed syllabus.
8.                  Perceived lack of objectivity
9.                  Lack of “outside the box” thinking

EXTERNAL EVALUATION

1.                  External Assessment is organized and conducted through standardized test, observation, and other techniques by an external agency, other than the school.
2.                  Process of conducting external assessment:
a.                   Setting and moderation of question papers.
b.                  Printing and packing of question papers (the printing work is of a confidential nature).
c.                   Selection of examination centres
d.                  Appointment of superintendents and invigilators and staff for the fair conduct of examination at centres.
e.                   Supply of stationery to centres.
f.                   Distribution of question papers to examinees under the supervision of  the centre superintendent.
g.                  Posting of police personnel at the centres.
h.                  Packing of answer scripts and sending them to Board’s office or examining body’s office.
i.                    Deployment of special squads for checking unfair means.
j.                    Assignment of fake, fictitious or secret roll numbers to answer books at the Board’s office.
k.                  On the spot evaluation at some specified centres where head examiner and examiners mark the scripts.
Importance & Objectives of External Assessment:
External evaluation provides
1.                  Degree/Certificate
2.                  A standard
3.                  Comparison of abilities.
4.                  To evaluate the progress of Institution
5.                  Selection for Higher education
6.                  To get employment
7.                  Popularity/Standard of educational institution.
8.                  Selection of intelligent students.
9.                  Competition.
10.              Evaluation of teacher’s performance
11.              Evaluation of objectives and curriculum.
12.              Creation of good habits in students
13.              Satisfaction and happiness of parents
Merits
1.                   Conducted by experts
2.                   Perceived objectivity: Having a third-party do your evaluation is like a stamp of approval. People tend to take the results more seriously.
3.                   Outside-the-box perspective: Being one step removed, evaluators can see changes that have happened that might have gone unnoticed (or at least unmeasured) by you and your team.
De-Merits of External Assessment
1.            Use of unfair means in the examination hall.
2.            Just pass the exam/to get degree
3.            Partial curriculum is covered
4.            Incomplete evaluation of personality.
5.            Unreliable results.
6.            Use of helping books & guess papers.
7.            Chance/Luck
8.            Corruption
9.            Exams without specific objectives.
10.        Negative effect/Impact on the students.
11.        It is time consuming.
12.        Standards vary from Board to Board and from University to University in the same year.
13.        Marking is not up to the standard.
14.        Expensive: A good evaluator doesn’t come cheap, and you get what you pay for.
15.        Requires collaboration: Collaboration is awesome when done right, but it does take time and effort on both parties, and there can be miscommunications between two teams just getting to know each other.

Suggestions for Improvement
1.                  Comprehensive Evaluation
2.                  Employees of examining bodies to be controlled.
3.                  Invigilating staff.
4.                  Secrecy sections should be foolproof.
5.                  Appointment of Examiners
6.                  Change in the examination point of view: the examination should not be an objective in itself; it should be a means to achieve objectives.
7.                  Reform in question papers.
8.                  Marking of Answer Scripts.
9.                  Ban on helping books and guess papers.
10.              Amalgamation of Internal and External exam.
11.              Oral test should be taken.
12.              Amalgamation of subjective and objective type test.
13.              Record of students.
14.              Question paper should be based on curriculum rather than text book.

In spite of these flaws, both are necessary for the betterment of the education system. Internal assessment prepares the students for external assessment. Therefore we cannot avoid either one, but we have to remove the negative points from both to make these systems more effective.

Characteristics of a good test

1. Reliability

 “Reliability refers to the consistency of measurement—that is, how consistent test scores or other evaluation results are from one measurement to another.”
 Gronlund and Linn (1995) 
2. Reliability is the “worthiness with which a measuring device measures something; the degree to which a test or other instrument of evaluation measures consistently whatever it does in fact measure.”
 C.V. Good (1973)
The dictionary meaning of reliability is consistency, dependability or trustworthiness. So in measurement, reliability is the consistency with which a test yields the same result in measuring whatever it does measure. Therefore reliability can be defined as the degree of consistency between two measurements of the same thing.
For example, we administered an achievement test on Group A and found a mean score of 55. Again, after 3 days, we administered the same test on Group A and again found a mean score of 55. This indicates that the measuring instrument (the achievement test) is providing a stable or dependable result. On the other hand, if in the second measurement the test provides a mean score around 77, then we can say that the test scores are not consistent.
Thus reliability answers the following questions:
How similar are the test scores if the test is administered twice?
How similar are the test scores if two equivalent forms of the test are administered?
To what extent do the scores of an essay test differ when it is scored by different teachers?

It is not always possible to obtain perfectly consistent results, because there are several factors like physical health, memory, guessing, fatigue, forgetting etc. which may affect the results from one measurement to another. These extraneous variables may introduce some error into our test scores. This error is called measurement error. So while determining the reliability of a test we must take into consideration the amount of error present in the measurement.
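This idea is often formalised in classical test theory as X = T + E, where X is the observed score, T the (unobservable) true score, and E the random measurement error; reliability is then the proportion of observed-score variance that is true-score variance, i.e. reliability = Var(T)/Var(X).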

Methods of Determining Reliability
Different types of consistency are determined by different methods. These are as follows:
1. Consistency over a period of time.
2. Consistency over different forms of instrument.
3. Consistency within the instrument itself
There are four methods of determining the reliability coefficient:
(a) Test-Retest method.
(b) Equivalent forms/Parallel forms method.
(c) Split-half method.
(d) Rational Equivalence/Kuder-Richardson method.
(a) Test-Retest Method:
This is the simplest method of determining test reliability. To determine reliability by this method, the test is given and then repeated on the same group. Then the correlation between the first set of scores and the second set of scores is obtained. A high coefficient of correlation indicates high stability of test scores. Measures of stability in the .80s and .90s are commonly reported for standardized tests over occasions within the same year.
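A minimal sketch in Python (NumPy), with hypothetical scores: the stability coefficient is simply the correlation between the two administrations.

```python
# A minimal sketch of the test-retest method, assuming hypothetical scores:
# correlate two administrations of the same test to the same group.
import numpy as np

first = np.array([55, 62, 48, 70, 66, 51, 59, 73])   # first administration
second = np.array([57, 60, 50, 72, 64, 53, 58, 75])  # retest after an interval

r = np.corrcoef(first, second)[0, 1]  # Pearson correlation = stability coefficient
print(f"Test-retest reliability: {r:.2f}")
```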
 (b) Equivalent Forms/Parallel Forms Method:
Reliability of test scores can also be estimated by the equivalent forms method, otherwise known as the alternate forms or parallel forms method. When two equivalent forms of a test can be constructed, the correlation between the two may be taken as a measure of the self-correlation of the test. In this process two parallel forms of the test are administered to the same group of pupils within a short interval of time; then the scores of both the tests are correlated. This correlation provides the index of equivalence. Usually, in the case of standardized psychological and achievement tests, equivalent forms are available.
Both the tests selected for administration should be parallel in terms of content, difficulty, format and length. When a time gap between the administrations of the two forms is provided, the correlation of test scores provides a measure of reliability and equivalence. But the major drawback of this method is obtaining two parallel forms of a test. When the tests are not exactly equal in terms of content, difficulty and length, comparison between the scores obtained from them may lead to erroneous decisions.
(c) Split-Half Method:
In this method a single test is administered to a group of pupils in the usual manner. Then the test is divided into two equivalent halves and the correlation between these half-tests is found.
The common procedure for splitting the test is to take all odd-numbered items, i.e. 1, 3, 5, etc., in one half and all even-numbered items, i.e. 2, 4, 6, 8 etc., in the other half.
Then the scores of the two halves are correlated, and the reliability of the full test is estimated from this half-test correlation by using the Spearman-Brown formula:
Reliability of full test = (2 × correlation between half-tests) / (1 + correlation between half-tests)
For example, suppose that by correlating the two halves we found a coefficient of .70.
By using this formula we get the reliability coefficient of the full test as (2 × .70) / (1 + .70) = 1.40 / 1.70 ≈ .82. Thus the reliability coefficient is .82 when the coefficient of correlation between the half-tests is .70. It indicates to what extent the sample of test items is a dependable sample of the content being measured—internal consistency.
 “Split-half reliabilities tend to be higher than equivalent-form reliabilities because the split-half method is based on the administration of a single test form.” This method overcomes the problems of the equivalent forms method introduced due to differences from form to form in attention, speed of work, effort, fatigue, test content etc.
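A minimal sketch of this procedure in Python (NumPy), with a hypothetical item-score matrix; the odd/even split and the Spearman-Brown step mirror the description above.

```python
# A minimal sketch of split-half reliability with the Spearman-Brown
# correction; the item-score matrix is hypothetical (rows = students,
# columns = items scored 0/1).
import numpy as np

scores = np.array([
    [1, 1, 1, 0, 1, 1, 0, 1],
    [0, 1, 0, 0, 1, 0, 0, 0],
    [1, 1, 1, 1, 1, 1, 1, 1],
    [0, 0, 1, 0, 0, 1, 0, 0],
    [1, 0, 1, 1, 1, 0, 1, 1],
    [0, 1, 0, 0, 1, 0, 1, 0],
])

odd_half = scores[:, 0::2].sum(axis=1)   # items 1, 3, 5, 7
even_half = scores[:, 1::2].sum(axis=1)  # items 2, 4, 6, 8
r_half = np.corrcoef(odd_half, even_half)[0, 1]  # correlation between halves
r_full = 2 * r_half / (1 + r_half)               # Spearman-Brown correction
print(f"Half-test r = {r_half:.2f}, full-test reliability = {r_full:.2f}")
```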

Factors Affecting Reliability:

The major factors which affect the reliability of test scores can be categorized under three headings:
1. Factors related to test.
2. Factors related to testee.
3. Factors related to testing procedure.
1. Factors related to test:
(a) Length of the test:
The Spearman-Brown formula indicates that the longer the test is, the higher the reliability will be, because a longer test will provide a more adequate sample of the behaviour. Another reason is that the guessing factor is apt to be neutralized in a longer test.
 (b) Content of the test:
Content homogeneity is also a factor which results in high reliability.
(c) Characteristics of items:
The difficulty level and clarity of expression of a test item also affect the reliability of test scores. If test items are too easy or too difficult for the group members, the test will tend to produce scores of low reliability, because both kinds of test have a restricted spread of scores.
(d) Spread of Scores:
According to Gronlund and Linn (1995), “other things being equal, the larger the spread of scores is, the higher the estimate of reliability will be.” When the spread of scores is large, there is a greater chance of an individual staying in the same relative position in a group from one testing to another. We can say that errors of measurement affect the relative position of the individual less when the spread of scores is large.
For example, in Group A students have secured marks ranging from 30 to 80 and in Group B students have secured marks ranging from 65 to 75. If we administer the test a second time in Group A, the test scores of individuals could vary by several points with very little shifting in the relative positions of the group members. It is because the spread of scores in Group A is large.
2. Factors related to testee:
 (a) Heterogeneity of the group:
When the group is a homogeneous group, the spread of the test scores is likely to be less, and when the group tested is a heterogeneous group, the spread of scores is likely to be more. Therefore the reliability coefficient for a heterogeneous group will be higher than that for a homogeneous group.
(b) Test wiseness of the students:
Experience of test taking also affects the reliability of test scores. Practice of the students in taking sophisticated tests increases the test reliability. But when all the students in a group do not have the same level of test-wiseness, it leads to greater measurement errors.
(c) Motivation of the students:
When the students are not motivated to take the test, they will not represent their best achievement. This depresses the test scores.
3. Factors related to testing procedure:
 (a) Time Limit of test:
When the students get more time to take the test they can do more guessing, which may increase the test scores. Therefore by speeding up a test we can increase the test reliability.
(b) Cheating opportunity given to the students:
Cheating by the students during the test administration leads to measurement errors. This will make the observed score of cheaters higher than their true score.

2. VALIDITY

Gronlund and Linn (1995)—”Validity refers to the appropriateness of the interpretation made from test scores and other evaluation results with regard to a particular use.”
Validity means truthfulness of a test. It refers to the extent to which the test measures what the test maker intends it to measure.

Nature of Validity:

1. Validity refers to the appropriateness of the test results but not to the instrument itself.
2. Validity does not exist on an all-or-none basis but it is a matter of degree.
3. Tests are not valid for all purposes. Validity is specific to a particular interpretation. For example, the results of a vocabulary test may be highly valid as a measure of vocabulary but much less valid as a measure of the student’s composition ability.
4. Validity is not of different types. It is a unitary concept. It is based on various types of evidence.

Factors Affecting Validity:

1. Factors in the test:
(i) Unclear directions to the students on how to respond to the test.
(ii) Difficulty of the reading vocabulary and sentence structure.
(iii) Too easy or too difficult test items.
(iv) Ambiguous statements in the test items.
(v) Inappropriate test items for measuring a particular outcome.
(vi) Inadequate time provided to take the test.
(vii) Length of the test is too short.
(viii) Test items not arranged in order of difficulty.
(ix) Identifiable pattern of answers.
2. Factors in Test Administration and Scoring:
(i) Unfair aid to individual students who ask for help.
(ii) Cheating by the pupils during testing.
(iii) Unreliable scoring of essay type answers.
(iv) Insufficient time to complete the test.
(v) Adverse physical and psychological condition at the time of testing.
3. Factors related to the Testee:
(i) Test anxiety of the students.
(ii) Physical and psychological state of the pupil.
(iii) Response set—a consistent tendency to follow a certain pattern in responding to the items.

3. Objectivity:

Objectivity in testing is “the extent to which the instrument is free from personal error (personal bias), that is subjectivity on the part of the scorer”.
C.V. Good (1973)
“Objectivity of a test refers to the degree to which equally competent scorers obtain the same results. So a test is considered objective when it makes for the elimination of the scorer’s personal opinion and bias judgement. In this context there are two aspects of objectivity which should be kept in mind while constructing a test.”
Gronlund and Linn (1995)
 (i) Objectivity of Scoring:
Objectivity of scoring means that the same person or different persons scoring the test at any time arrive at the same result without any chance error. The scoring procedure should be such that there is no doubt as to whether an item is right or wrong or partly right or partly wrong.
(ii) Objectivity of Test Items:
By item objectivity we mean that the item must have one and only one interpretation by students. It means the test items should be free from ambiguity. A given test item should mean the same thing to all the students that the test maker intends to ask. Sentences with dual meanings and items having more than one correct answer should not be included in the test, as they make the test subjective.

4. Usability:

Usability is another important characteristic of measuring instruments, because the practical considerations of evaluation instruments cannot be neglected. The test must have practical value from the point of view of time, economy, and administration. This may be termed usability.
So while constructing or selecting a test the following practical aspects must be taken into account:
(i) Ease of Administration:
It means the test should be easy to administer, with simple and clear directions, and the timing of the test should not be too difficult to manage.
(ii) Time required for administration:
Appropriate time limit to take the test should be provided. Gronlund and Linn (1995) are of the opinion that “Somewhere between 20 and 60 minutes of testing time for each individual score yielded by a published test is probably a fairly good guide”.
(iii) Ease of Interpretation and Application:
Another important aspect is the interpretation of test scores and the application of test results. If the results are misinterpreted, it is harmful; on the other hand, if they are not applied, they are useless.
(iv) Availability of Equivalent Forms:
Equivalent forms of a test help to verify questionable test scores. They also help to eliminate the factor of memory while retesting pupils on the same domain of learning. Therefore equivalent forms of the same test in terms of content, level of difficulty and other characteristics should be available.
(v) Cost of Testing: It should be economical.

CONSTRUCTION OF TESTS
The four main steps in construction of tests are:
1. Planning the Test
2. Preparing the Test
3. Try out the Test
4. Evaluating the Test.

 

Step 1. Planning the Test:

Planning of the test is the first important step in the test construction. The main goal of evaluation process is to collect valid, reliable and useful data about the student.
It includes
1. Determining the objectives of testing.
2. Preparing test specifications.
3. Selecting appropriate item types.

1. Determining the Objectives of Testing:

A test can be used for different purposes in a teaching-learning process, such as:
1.      to measure the entry performance of the students;
2.      for formative evaluation;
3.      to find out the immediate learning difficulties and to suggest their remedies; and
4.      to assign grades or to determine the mastery level of the students.
So these tests should cover the whole of the instructional objectives and content areas of the course.

2. Preparing Test Specifications:

The second important step in test construction is to prepare the test specifications, in order to be sure that the test will measure a representative sample of the instructional objectives, and to make an elaborate design for test construction. One of the most commonly used devices for this purpose is the ‘Table of Specification’ or ‘Blue Print’.
Preparation of Table of Specification/Blue Print:
Preparation of the table of specification is the most important task in the planning stage. It acts as a guide for the test construction. The table of specification or ‘Blue Print’ is a three-dimensional chart showing the list of instructional objectives, content areas and types of items in its dimensions.
It includes four major steps:
(i) Determining the weightage to different instructional objectives.
(ii) Determining the weightage to different content areas.
(iii) Determining the item types to be included.
(iv) Preparation of the table of specification.
(i) Determining the weightage to different instructional objectives:
In a written test we cannot measure the psychomotor domain and the affective domain; we can only measure the cognitive domain. It is also true that all subjects do not contain the different learning objectives like knowledge, understanding, application and skill in equal proportion. Therefore, it must be planned how much weightage is to be given to different instructional objectives, keeping in mind the importance of each particular objective for that subject or chapter.
For example, we may give the weightage to different instructional objectives in General Science for Class X as follows:
Table 3.1. Weightage given to different instructional objectives in a test of 100 marks.
(ii) Determining the weightage to different content areas:
The second step in preparing the table of specification is to outline the content area. This also prevents repetition or omission of any unit. How much weightage should be given to each unit is to be decided by the concerned teacher, keeping in mind the importance of the chapter, the area covered by the topic in the textbook, and the number of items to be prepared.
For example:
Table 3.2. Weightage given to different content areas.
(iii) Determining the item types:
The third important step in preparing the table of specification is to decide the appropriate item types. Items used in test construction can broadly be divided into two types: objective type items and essay type items. For some instructional purposes the objective type items are most efficient, whereas for others the essay questions prove satisfactory.
Appropriate item types should be selected according to the learning outcomes to be measured.
(iv) Preparing the Three-Way Chart:
Preparation of the three-way chart is the last step in preparing the table of specification. This chart relates the instructional objectives to the content areas and types of items. In a table of specification the instructional objectives are listed across the top of the table, content areas are listed down the left side of the table, and under each objective the types of items are listed content-wise. Table 3.3 is a model table of specification for Class X science.
Table 3.3. Table of specification for Science (Biology), Class X.
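As a minimal sketch (in Python, with hypothetical weightages and content areas), the blueprint computation can be expressed directly: each cell of the three-way chart receives marks in proportion to the product of its row (content) and column (objective) weightages.

```python
# A minimal sketch of a table of specification ('blue print'), assuming
# hypothetical weightages: each cell gets (content % x objective % x total marks).
objective_weight = {"Knowledge": 40, "Understanding": 30, "Application": 20, "Skill": 10}
content_weight = {"Cell Biology": 30, "Genetics": 40, "Ecology": 30}
total_marks = 100

print(f"{'Content area':<14}" + "".join(f"{o:>14}" for o in objective_weight) + f"{'Total':>8}")
for area, c_pct in content_weight.items():
    row = [total_marks * (c_pct / 100) * (o_pct / 100) for o_pct in objective_weight.values()]
    print(f"{area:<14}" + "".join(f"{m:>14.1f}" for m in row) + f"{sum(row):>8.1f}")
```

Summing across a row recovers the content-area weightage, and summing down a column recovers the objective weightage, which is what makes the chart a check on coverage.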

Step  2. Preparing the Test:

After planning, test items are constructed in accordance with the table of specification. Each type of test item needs special care in construction.
The preparation stage includes the following three functions:
(i) Preparing test items.
(ii) Preparing instruction for the test.
(iii) Preparing the scoring key.

(i) Preparing the Test Items:

Preparation of test items is the most important task in the preparation step. Therefore care must be taken in preparing a test item. The following principles help in preparing relevant test items.
1. Test items must be appropriate for the learning outcome to be measured:
The test items should be so designed that they measure the performance described in the specific learning outcomes.
2. Test items should measure all types of instructional objectives and the whole content area:
The items in the test should be so prepared that they cover all the instructional objectives—knowledge, understanding, thinking skills—and match the specific learning outcomes and subject-matter content being measured. When the items are constructed on the basis of the table of specification, the items become relevant.
3. The test items should be free from ambiguity:
The item should be clear. Inappropriate vocabulary and awkward sentence structure should be avoided. The items should be so worded that all pupils understand the task.
Example:
Poor item — Where was Gandhi born?
Better — In which city was Gandhi born?
4. The test items should be of appropriate difficulty level:
The test items should be of a proper difficulty level so that they can discriminate properly. If an item is meant for a criterion-referenced test, its difficulty level should match the difficulty indicated by the statement of the specific learning outcome. Therefore, if the learning task is easy the test item must be easy, and if the learning task is difficult the test item must be difficult.
In a norm-referenced test the main purpose is to discriminate among pupils according to achievement, so the test should be designed to produce a wide spread of test scores. The items should therefore not be so easy that everyone answers them correctly, nor so difficult that everyone fails; they should be of average difficulty level.
5. The test item must be free from technical errors and irrelevant clues:
Sometimes there are unintentional clues in the statement of an item which help the pupil to answer correctly: for example, grammatical inconsistencies, verbal associations, extreme words (ever, seldom, always), and mechanical features (the correct statement being longer than the incorrect ones). While constructing a test item, care must be taken to avoid such clues.
6. Test items should be free from racial, ethnic and sexual bias:
The items should be universal in nature. Care must be taken to make each item culture-fair. When roles are portrayed, all sections of society should be given equal importance. The terms used in the test item should have a universal meaning to all members of the group.

(ii) Preparing Instruction for the Test:

This is the most neglected aspect of test construction. Generally everybody gives attention to the construction of the test items, and test makers often do not attach directions to them.
But the validity and reliability of the test depend to a great extent upon the instructions provided with it.
N.E. Gronlund has suggested that the test maker should provide clear-cut directions about:
a. The purpose of testing.
b. The time allowed for answering.
c. The basis for answering.
d. The procedure for recording answers.
e. The methods to deal with guessing.
Direction about the Purpose of Testing:
A written statement about the purpose of the testing maintains the uniformity of the test. Therefore a written instruction about the purpose of the test must appear before the test items.
Instruction about the time allowed for answering:
Clear-cut instructions must be supplied to the pupils about the time allowed for the whole test. It is also better to indicate the approximate time required for answering each item, especially for essay-type questions. The test maker should carefully judge the amount of time needed, taking into account the types of items, the age and ability of the students, and the nature of the learning outcomes expected. Experts are of the opinion that it is better to allow more time than to deprive a slower student of the chance to answer.
Instructions about the basis for answering:
The test maker should provide specific directions on the basis of which the students will answer each item. The directions must clearly state whether the students are to select the answer or supply it. In matching items, the basis for matching the premises and responses (e.g., states with capitals, or countries with products) should be given. Special directions are necessary for interpretive items. For essay-type items, clear directions must be given about the type of response expected from the pupils.
Instruction about recording answers:
Students should be instructed where and how to record their answers. Answers may be recorded on separate answer sheets or on the test paper itself. If they are to answer on the test paper itself, they must be directed whether to write the correct answer or to indicate it from among the alternatives. If separate answer sheets are used, the direction may be given either on the test paper or on the answer sheet.
Instruction about guessing:
For recognition-type test items, students must be told whether or not they should guess on items they are uncertain about. If nothing is stated about guessing, the bold students will guess on these items while the others answer only those items of which they are confident; the bold pupils will then answer some items correctly by chance and secure a higher score. Therefore a direction must be given to guess, but not to make wild guesses.

(iii) Preparing the Scoring Key:

A scoring key increases the reliability of a test, so the test maker should provide the procedure for scoring the answer scripts. Directions must be given as to whether the scoring will be done with a scoring key or a scoring stencil, and how marks will be awarded to the test items.
Thus a scoring key helps to obtain consistent data about the pupils' performance. The test maker should therefore prepare a comprehensive scoring procedure along with the test items.

Step  3. Try Out of the Test:

The try-out helps us to identify defective and ambiguous items, to determine the difficulty level of the test, and to determine the discriminating power of the items.
The try-out involves two important functions:
(a) Administration of the test.
(b) Scoring the test.

(a) Administration of the test:

Administration means administering the prepared test to a sample of pupils. The effectiveness of the final form of the test depends upon fair administration: the pupils must be provided a congenial physical and psychological environment during testing, and any other factor that may affect the testing procedure should be controlled.
Physical environment means a proper seating arrangement, proper light and ventilation, and adequate space for invigilation. Psychological environment refers to those aspects which influence the mental condition of the pupil; steps should therefore be taken to reduce the anxiety of the students. The test should not be administered just before or after a big occasion like the annual sports or annual drama.
The following principles should be observed during test administration:
1. The teacher should talk as little as possible.
2. The teacher should not interrupt the students at the time of testing.
3. The teacher should not give any hints to any student who has asked about any item.
4. The teacher should provide proper invigilation in order to prevent the students from cheating.

(b) Scoring the test:

Once the test is administered and the answer scripts are obtained, the next step is to score them. A scoring key may be provided for scoring when the answer is on the test paper itself. A scoring key is a sample answer script on which the correct answers are recorded.

Step  4. Evaluating the Test:

Evaluating the test is the most important step in the test construction process. Evaluation is necessary to determine the quality of the test and the quality of the responses. Quality of the test implies how good and dependable the test is (its validity and reliability). Quality of the responses means identifying which items are a misfit in the test. Evaluation also enables us to judge the usability of the test in a general classroom situation.
Evaluating the test involves following functions:
(a) Item analysis.
(b) Determining validity of the test.
(c) Determining reliability of the test.
(d) Determining usability of the test.

(a) Item analysis:

Item analysis is a procedure which helps us to answer the following questions:
a. Do the items function as intended?
b. Do the test items have an appropriate difficulty level?
c. Are the items free from irrelevant clues and other defects?
d. Are the distracters in multiple-choice items effective?
The item analysis data also help us:
a. To provide a basis for efficient class discussion of the test results
b. To provide a basis for remedial work
c. To increase skill in test construction
d. To improve classroom discussion.
Item Analysis Procedure:
The item analysis procedure gives special emphasis to item difficulty level and item discriminating power. It involves the following steps:
1. The test papers should be ranked from highest to lowest.
2. Select 27% of the test papers from the highest end and 27% from the lowest end.
For example, if the test is administered to 60 students, then select 16 test papers from the highest end and 16 test papers from the lowest end.
3. Keep aside the other test papers, as they are not required in the item analysis.
4. Tabulate the number of pupils in the upper and lower groups who selected each alternative for each test item. This can be done on the back of the test paper, or a separate test item card may be used.
5. Calculate the item difficulty for each item by using the formula:
Item difficulty = (R / T) × 100
Where R = total number of students who got the item correct.
T = total number of students who tried the item.
In our example, out of the 32 students in the two groups, 20 students answered the item correctly and 30 students tried the item.
The item difficulty is therefore:
Item difficulty = (20 / 30) × 100 ≈ 67%
This implies that the item has a proper difficulty level, because it is customary to follow the 25%-to-75% rule for item difficulty: an item with a difficulty of more than 75% is too easy, and one with a difficulty of less than 25% is too difficult.
6. Calculate the item discriminating power by using the following formula:
Item discriminating power = (RU - RL) / (T/2)
Where RU = number of students from the upper group who got the answer correct.
RL = number of students from the lower group who got the answer correct.
T/2 = half of the total number of pupils included in the item analysis.
In our example, 15 students from the upper group and 5 students from the lower group responded to the item correctly, so:
Item discriminating power = (15 - 5) / 16 ≈ 0.63
A high positive ratio indicates high discriminating power; here 0.63 indicates an average discriminating power. If all 16 students from the lower group and all 16 students from the upper group answer the item correctly, the discriminating power will be 0.00, which indicates that the item has no discriminating power. If all 16 students from the upper group answer the item correctly and all the students from the lower group answer it incorrectly, the discriminating power will be 1.00, indicating an item with maximum positive discriminating power.
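The whole procedure is easy to automate. Below is a minimal Python sketch of the 27% upper/lower-group method described above; the data layout (a list of (total_score, item_response) pairs) and the function name are illustrative assumptions, not a standard API.

```python
# Illustrative sketch of the item analysis procedure described above.
# Each paper is a (total_score, response) pair, where response is
# 1 = answered the item correctly, 0 = answered it wrongly,
# None = did not attempt the item.

def item_analysis(papers):
    ranked = sorted(papers, key=lambda p: p[0], reverse=True)   # step 1
    k = round(len(ranked) * 0.27)            # step 2: 27% rule (16 of 60)
    upper, lower = ranked[:k], ranked[-k:]   # step 3: middle papers set aside

    # Step 5: item difficulty = (R / T) * 100
    attempts = [r for _, r in upper + lower if r is not None]
    difficulty = sum(attempts) / len(attempts) * 100

    # Step 6: discriminating power = (RU - RL) / (T/2); T/2 equals k here
    ru = sum(r or 0 for _, r in upper)
    rl = sum(r or 0 for _, r in lower)
    discriminating_power = (ru - rl) / k
    return difficulty, discriminating_power
```

With 60 papers (so k = 16), 30 attempts of which 20 were correct, RU = 15 and RL = 5, this reproduces the figures above: a difficulty of about 67% and a discriminating power of about 0.63.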
The Item Analysis Card
Preparing a test item file:
Once the item analysis process is over, we have a list of effective items. The next task is to file these effective items, which can be done with the item analysis cards. The items should be arranged in order of difficulty. While filing the items, the objective and the content area each item measures must be noted; this helps in the future use of the item.

(b) Determining Validity of the Test:

At the time of evaluation it is estimated to what extent the test measures what the test maker intends it to measure.

(c) Determining Reliability of the Test:

The evaluation process also estimates to what extent the test is consistent from one measurement to another; otherwise the results of the test cannot be depended upon.

(d) Determining the Usability of the Test:

The try-out and evaluation process indicates to what extent a test is usable under general classroom conditions: how far the test is usable from the administration, scoring, time and economic points of view.
CONTINUOUS AND COMPREHENSIVE EVALUATION, CCE (concept, need and relevance)
The Continuous and Comprehensive Evaluation (CCE) system was introduced by the Central Board of Secondary Education (CBSE) in India for students of the sixth to tenth grades. In this scheme the term 'continuous' means regularity of assessment, frequency of unit testing, diagnosis of learning gaps, use of corrective measures, retesting, and feedback to students for self-evaluation.
The term 'comprehensive' means that the scheme attempts to cover both the scholastic and the co-scholastic aspects of students' growth and development.
The main aim is to evaluate every aspect of the child during their presence at the school.
1.      It assesses all aspects of a student’s development on a continuous basis throughout the year.
2.      The assessment covers both scholastic subjects as well as co-scholastic areas such as performance in sports, art, music, dance, drama, and other cultural activities and social qualities.
3.      It is a developmental process of assessment which emphasizes two objectives: continuity in evaluation on the one hand, and assessment of broad-based learning and behavioural outcomes on the other.
4.      It is non-threatening for all children, including those with special needs, as it discourages irrational comparison and labelling, with no fear of examination.
5.      It brings a change from the usual chalk-and-talk method.
6.      It identifies learning difficulties and enables remedial measures.
7.      It brings flexibility to plan academic schedules.
8.      It reduces workload on students and improves overall skill and ability of the student by means of evaluation of other activities.
9.      The marks of the students are replaced by grades. Grades are awarded on the basis of work experience, skills, dexterity, innovation, steadiness, teamwork, public speaking, behaviour etc. to evaluate and present an overall measure of the student's ability. This helps students who are not good at academics to show their talent in other fields such as the arts, humanities, sports, music and athletics.
10.  Assessment is carried out through projects, assignments, practicals, seminar records and collections, which are graded on specific grading indicators.
11.  Co-scholastic abilities are also considered, in terms of work experience, art education, and health and physical education.
12.  It makes children and parents active participants in the learning and development of children.
13.  Opportunities for self-assessment and peer assessment enable children to take charge of their learning and gradually progress towards self-learning.
14.  Sharing their learning progress, with timely feedback during teaching-learning and constructive suggestions during quarterly Parent-Teacher Meetings (PTMs), makes them aware of the extent of their accomplishment and prepares them for the further efforts required.
15.  Rational division of the syllabus to be covered in each quarter may be planned in advance for the yearly academic calendars.
16.  Teachers' suggestions and participation in the development of such plans need to be ensured. If possible, such planning may be done at the school level.
17.  Resources and activities may only be suggestive, and teachers need to be given freedom to choose or devise new learning aids or strategies.
18.  Assessment questions, exercises and assignments need to be process-based and should allow children to think critically and explore.
19.  They should not assess children's rote memory.
20.  Written tests, if evaluated using marks or grades, need to be supported with qualitative descriptions: marks or grades indicate the learning level, but remarks highlight the gaps and suggestions for improvement.
21.  The levels assigned for different learning outcomes under different curricular areas provide useful information to the teachers on how many children are lagging behind on the specific learning outcome(s).
Hence, the data from the quarterly progress reports provide insights not just to students but also to teachers on how to review their teaching and learning and take steps (assessment for learning) for the next quarter.

Types of Grading Systems

There are seven types of grading systems. They are:
1.      Percentage Grading – From 0 to 100 Percent
2.      Letter grading and variations – From A Grade to F Grade
3.      Norm-referenced grading – Comparing students to each other, usually with letter grades
4.      Mastery grading – Grading students as "masters" or "passers" when their attainment reaches a pre-specified level
5.      Pass/Fail – Using a common scale of Pass/Fail
6.      Standards (or Absolute-Standards) grading – Comparing student performance to a pre-established standard (level) of performance
7.      Narrative grading – Writing comments about students

Grading System in India

Percentage   Grade Point   Grade      Classification/Division
60–100       3.5–4.0       A (or O)   First Class / Distinction / Outstanding
55–59        3.15–3.49     B+         Second Class
50–54        2.5–3.14      B          Second Class
43–49        2.15–2.49     C+         Third Division
35*–42       1.5–2.14      C          Fail / Third Division
0–34         0–1.49        F          Fail

Grading
Grading in education is the process of applying standardized measurements of varying levels of achievement in a course. A grading system attaches verbal descriptions and symbols to achievement instead of the numerical scores of the traditional marking scheme.
Grades can be assigned as letters (for example A through F), as a range (for example 1 to 6), as a percentage of the total number of questions answered correctly, or as a number out of a possible total (for example out of 20 or 100). A grading system is a method used by teachers to assess students' educational performance. In earlier times a simple marking procedure was used by educators, but now a proper grading system is followed by every educational institute. Grades such as A+, A, A-, B+, B, B-, C, D, E and so on are used to evaluate the performance of a student in a test, presentation or final examination. Each grade corresponds to a range of percentages or marks.

Advantages of Grading System in Education:

1. Takes the pressure off from the students at certain levels:

In a general grading system as considered above, a student's real scores and their associated marks are not recorded on the official transcript, which means that the GPA does not depend on anything beyond the pass or fail category. This spares students from becoming preoccupied and fussy about getting an elevated letter grade.

2. Grading Pattern description:

Students are bundled and grouped according to the different grading scales they fall in, which are entirely based on the marks they get in each subject taught in school.
In the case of India, the general pattern is as follows:
A1: 91 to 100
A2: 81 to 90
B1: 71 to 80
B2: 61 to 70
C1: 51 to 60
C2: 41 to 50
D: 33 to 40; E for anything lower.
Another advantage of this method is that it measures students' knowledge through their internal assignments, projects, answering ability in class, and overall performance in all the major examinations; it is not a method resting on a solitary examination. Earlier, the marks obtained in exams were the only indicator of whether a child was studying or not, but this system analyzes whether a child actually understands the concept.

3. Gives the students an obvious idea about their weaknesses and strengths:

Knowing precisely which subjects are their weak spots, students can easily decide where to focus. In a grading system where the alphabets are the scales, a grade of C or a grade of D speaks volumes.
So when the total grades arrive, these students can easily identify their forte.

4. Make class work easier:

Students do not need to toil to achieve the necessary minimum.

5. Leads to better ideas:

Classes or courses taught within the confined premises of a school are often highly difficult and are ultimately framed as getting a pass or a fail in a subject; this builds a sense of responsibility in students' minds to work and train hard on their weak spots.

Disadvantages of Grading System in Education:

The following points are worth considering as disadvantages of the grading system in education:

1. It doesn’t instill a sense of competition:

When all that is required is a mere pass mark, students have neither the urge to outperform others nor the desire to excel in the overall grades.
An A grade says far more about one's calibre than a D or an F. With a D or an F, one can only be satisfied with being just okay in studies, which encourages laziness.

2. Not an accurate representation of the performance and the knowledge gained:

As we have said already, passing an examination is not plausible evidence that the student has gained an immense amount of knowledge from it.
An alphabet cannot express the inner knowledge gained by a student, and there is no easy way of gauging a student's level of performance and knowledge through such examinations.

3. It is not an exact scoring system:

The inner knowledge behind these grades can be nil, as one may have attempted to learn without understanding the concept, with the sole perspective of getting an A or a C.
4. Demotivation: The grading system demotivates students who perform higher, because they stand equal to those making less effort. For instance, grade A will be assigned to all those scoring from 90 to 100, so students who made no mistakes and those who made a few all stand at the same grade.
5. Increased lethargy: As the grading system divides the marks among different tasks such as assignments, presentations and final exams, students may become lethargic: they score enough in assignments and projects and become less active in the final exams.

What is GPA and CGPA?
Grade Point Average
GPA is an abbreviation for Grade Point Average, a standard method of calculating a student's average grade over a stipulated period, like one term or semester; it is used in school and in undergraduate, graduate and postgraduate courses in most universities.
GPA is calculated by dividing the total grade points a student earns by the total credit hours attended by the student.
GPA is a number that indicates how well or how high you scored in your courses on average. This number is then used to assess whether you meet the standards and expectations set by the degree programme or university.

The Cumulative Grade Point Average (CGPA) is the average of all of a student's total earned grade points divided by the total possible, calculated over his or her complete education career.
CGPA refers to 'Cumulative Grade Point Average'. It is used to denote a student's overall average performance throughout their academic programme, whether in high school or a Bachelor's or Master's programme. Credit hours are the total amount of time a student spends in classes; grade points are the marks received for the subjects.


TO CALCULATE CGPA
Divide your total grade points for all subjects across your semesters by the total number of credit hours attended across your semesters. GPA and CGPA are indicated by a number, as opposed to percentages, and grades are assigned under the Indian grading system.
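As a rough illustration of this arithmetic, here is a small Python sketch. It follows the common convention of weighting each course's grade point by its credit hours; the course data are invented.

```python
# Sketch of GPA/CGPA arithmetic: total weighted grade points divided by
# total credit hours. Course data here are invented for illustration.

def grade_point_average(courses):
    """courses: list of (grade_point, credit_hours) pairs."""
    total_points = sum(gp * hours for gp, hours in courses)
    total_hours = sum(hours for _, hours in courses)
    return total_points / total_hours

semester1 = [(4.0, 3), (3.5, 4), (3.0, 2)]
semester2 = [(3.5, 3), (4.0, 3)]

gpa = grade_point_average(semester1)                # one semester's GPA
cgpa = grade_point_average(semester1 + semester2)   # CGPA over all semesters
```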
DIRECT GRADING
Performance is assessed in qualitative terms.
The evaluator awards grades such as A, B, C, D, E, F according to the standards, without assigning scores.
It is preferred for non-cognitive learning outcomes.

ADVANTAGES OF DIRECT GRADING
Simplifies the process of assessment
Makes a raw assessment on a raw scale
Uses a uniform scale for the assessment of quality
Separates the assessment of quality and range
INDIRECT GRADING
The evaluator first awards marks and then converts the marks into grades. This is of two types:
Absolute grading
Relative grading
ABSOLUTE GRADING
Absolute grading is based on a predetermined standard which becomes the reference point for assessing students' performance. Marks are converted directly into grades, irrespective of the distribution of marks.
For example, a common absolute grading scale would be
A = 90-100
B = 80-89
C = 70-79
D = 60-69
F = 0-59
Whatever score the student earns is their grade; no adjustments are made. For example, if everyone scores between 90 and 100, everyone gets an "A", and if everyone scores below 59, everyone gets an "F". The absolute nature of absolute grading makes it inflexible and constraining in unusual situations. A minimal sketch of this conversion follows.
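A minimal Python sketch of absolute grading, using the cut-offs above; the function name is illustrative.

```python
# Absolute grading: marks convert directly to grades against fixed
# cut-offs, regardless of how the rest of the class performed.

def absolute_grade(score):
    if score >= 90: return "A"
    if score >= 80: return "B"
    if score >= 70: return "C"
    if score >= 60: return "D"
    return "F"

print([absolute_grade(s) for s in (95, 92, 91)])   # -> ['A', 'A', 'A']
```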
RELATIVE GRADING
Range varies in tune with the relative position of the group. The evaluation is done according to the performance of members. Relative grading allows for the teacher to interpret the results of an assessment and determine grades based on student performance.
A = Top 10% of students
B = Next 25% of students
C = Middle 30% of students
D = Next 25% of students
F = Bottom 10% of students
As such, even if the entire class scored between 90 and 100% on an exam, relative grading would still create a balanced distribution of grades, as the sketch below illustrates.
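Here is a Python sketch of the same idea, assuming the percentage bands above; the 10/25/30/25/10 split is taken from the example, and the rounding of band boundaries is an implementation choice, not a fixed rule.

```python
# Relative grading: each student's grade depends on their rank within
# the group, so even a narrow band of raw scores spans all five grades.

def relative_grades(scores):
    ranked = sorted(scores, reverse=True)
    n = len(ranked)
    bands = [(0.10, "A"), (0.35, "B"), (0.65, "C"), (0.90, "D"), (1.00, "F")]
    graded = []
    for i, score in enumerate(ranked):
        fraction = (i + 1) / n            # this student's top-fraction rank
        grade = next(g for limit, g in bands if fraction <= limit)
        graded.append((score, grade))
    return graded

# Ten scores all between 90 and 100 still spread across A..F:
print(relative_grades([99, 98, 97, 96, 95, 94, 93, 92, 91, 90]))
```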

WEIGHTED AVERAGES
We can calculate the arithmetic mean, or elementary average, of a set of measurements by summing them and dividing by the number of measurements. In certain situations, however, some measurements count more than others, and to get a meaningful average we have to assign a weight to each measurement. The usual way to do this is to multiply each measurement by a factor that indicates its weight, sum the new values, and divide by the sum of the weights.

Mathematically

When calculating an arithmetic average, first sum all the measurements (m) and divide by the number of measurements (n):
∑(m1 ... mn) ÷ n
where the symbol ∑ means "sum all the measurements from 1 to n."
To calculate a weighted mean, multiply each measurement by its weighting factor (w), sum the products, and divide by the sum of the weights:
∑(m1w1 ... mnwn) ÷ ∑(w1 ... wn), or simply ∑mw ÷ ∑w
In most cases the weighting factors add up to 1 (or, if you are using percentages, to 100 percent), in which case the divisor is simply 1.
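A minimal Python sketch of the weighted mean, demonstrated here with the physics-class figures worked out below:

```python
# Weighted mean: multiply each measurement by its weight, sum the
# products, and divide by the sum of the weights (the divisor is 1
# when the weights are fractions that sum to 1).

def weighted_mean(measurements, weights):
    return sum(m * w for m, w in zip(measurements, weights)) / sum(weights)

scores  = [75, 80, 70, 75]        # lab work, homework, quizzes, final exam
weights = [0.2, 0.2, 0.2, 0.4]
print(weighted_mean(scores, weights))   # -> 75.0
```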


Weighted Averages in the Classroom

Teachers typically use weighted averages to assign appropriate importance to classwork, homework, quizzes and exams when calculating final grades.
 For example, in a certain physics class, the following weights may be assigned:
  • Lab work: 20 percent
  • Homework: 20 percent
  • Quizzes: 20 percent
  • Final Exam: 40 percent
In this case, all the weights add up to 100 percent, so a student's score can be calculated as follows:
[(Lab work score) * 0.2 + (homework) * 0.2 + (quizzes) * 0.2 + (final exam) * 0.4]
If a student's grades were 75 percent for lab work, 80 percent for homework, 70 percent for quizzes and 75 percent for the final exam, her final grade would be:
= (75) * 0.2 + (80) * 0.2 + (70) * 0.2 + (75) * 0.4
= 15 + 16 + 14 + 30 = 75 percent.

WEIGHTED SCORE
A weighted score or weighted grade is the average of a set of grades where each category carries a different amount of importance.
Suppose your final grade will be determined in this manner:
Percentage of your Grade By Category
  • Homework: 10%
  • Quizzes: 20%
  • Essays: 20%
  • Midterm: 25%
  • Final: 25%
Example 1
Category Averages:
  • Homework average: 98%
  • Quiz average: 84%
  • Essay average: 91%
  • Midterm: 64%
  • Final: ?
To determine what score is needed on the final exam, we follow a three-part process:
Step 1:
Set up an equation with the goal percentage (80%) in mind:
H%*(H average) + Q%*(Q average) + E%*(E average) + M%*(M average) + F%*(F average) = 80%
Step 2:
Next, we multiply the weight of each category by the average earned in that category:
  • Homework: 10% of grade * 98% in category = (.10)(.98) = 0.098
  • Quiz average: 20% of grade * 84% in category = (.20)(.84) = 0.168
  • Essay average: 20% of grade * 91% in category = (.20)(.91) = 0.182
  • Midterm: 25% of grade * 64% in category = (.25)(.64) = 0.16
  • Final: 25% of grade * X in category = (.25)(x) = ?
Step 3:
Finally, we add them up and solve for x:
0.098 + 0.168 + 0.182 + 0.16 + .25x = .80
0.608 + .25x = .80
.25x = .80 – 0.608
.25x = .192
x = .192/.25
x = .768
x ≈ 76.8%, i.e. about 77%
Thus a score of about 77% is needed on the final exam to reach an overall grade of 80%. A short sketch of this calculation follows.
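The same three steps can be checked in a few lines of Python; the function name and argument layout are illustrative.

```python
# Solve for the final-exam score x needed to reach a target overall
# grade, given the completed category weights and averages.

def required_final_score(target, weights, averages, final_weight):
    earned = sum(w * a for w, a in zip(weights, averages))
    return (target - earned) / final_weight

x = required_final_score(
    target=0.80,
    weights=[0.10, 0.20, 0.20, 0.25],    # homework, quizzes, essays, midterm
    averages=[0.98, 0.84, 0.91, 0.64],
    final_weight=0.25,
)
print(round(x * 100, 1))                 # -> 76.8, i.e. about 77%
```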

Marks:
  • The marks system measures the intelligence of students on the basis of marks and ranks.
  • Students in this system aim for only one thing – good scores in each subject.
  • Students are always encouraged to outperform each other, which may not give fruitful results.
  • It often puts students as well as parents under a lot of pressure, for reasons such as a low academic score or having to redo a mistaken task.
  • Note, however, that the passion to outperform prepares students to take pressure in higher classes; in the marks system, the pressure on students only increases.
Grades:
  • One of the best things about the grade system is that it measures students' intelligence on the basis of their performance rather than marks and ranks.
  • The grade system aims to encourage the overall development of students rather than academics alone, including personality development, social development etc.
  • Competitiveness among students is low; instead, students are encouraged to focus on their own aims and success.
  • The system relieves students and parents from unnecessary pressure; it sets them free so that students can pursue their aims and parents can discover what their kids love to do in their lives. In the long run, school life becomes easier for everyone.
  • However, it may leave students facing difficulty in coping with the pressure during higher studies.
Non-Standardized Test

Non-standardized assessment looks at an individual's performance and does not allow us to compare that performance to another's. It allows us to obtain specific information about that particular student.

Forms of Non-Standardized Testing

Forms include portfolios, interviews, informal questioning, group discussions, oral tests, quick pop quizzes, and exhibitions of work, projects and performance exams.

ESSAY

An essay is generally a short piece of writing outlining the writer's perspective or story. It is often considered synonymous with a story, a paper or an article. Essays can be both formal and informal. Formal essays are generally academic in nature and tackle serious topics.

Types of Essays

There are broadly four types of essays.
1.      Narrative Essays: This is when the writer narrates an incident or story through the essay, so these are written in the first person. The aim when writing narrative essays is to involve the reader as if they were right there when it was happening, so make the writing as vivid and real as possible.
2.      Descriptive Essays: Here the writer will describe a place, an object, an event or maybe even a memory. But it is not just plainly describing things. The writer must paint a picture through his words. One clever way to do that is to evoke the senses of the reader. Do not only rely on sight but also involve the other senses of smell, touch, sound etc. A descriptive essay when done well will make the reader feel the emotions the writer was feeling at the moment.
3.      Expository Essays: In such an essay a writer presents a balanced study of a topic. To write such an essay, the writer must have real and extensive knowledge about the subject. There is no scope for the writer’s feelings or emotions in an expository essay. It is completely based on facts, statistics, examples etc. There are sub-types here like contrast essays, cause and effect essays etc.
4.      Persuasive Essays: Here the purpose of the essay is to bring the reader over to your side of the argument. A persuasive essay is not just a presentation of facts but an attempt to convince the reader of the writer's point of view. Both sides of the argument have to be presented in these essays, but the ultimate aim is to persuade the reader that the writer's argument carries more weight.

Format of an Essay

Now there is no rigid format for an essay. It is a creative process, so it should not be confined within boundaries. However, a basic structure commonly followed is:

Introduction

The writer introduces the topic for the very first time, in about 4-6 lines that give a very brief synopsis of the essay. You can start with a quote, a proverb, a definition or a question.

Body

The body sits between the introduction and the conclusion, so the most vital and important content of the essay goes here. It can extend to two or more paragraphs according to the content. It is important to organize your thoughts and write the information in a systematic flow so that the reader can comprehend it. If, for example, you were narrating an incident, the best manner would be to proceed in chronological order.

Conclusion

This is the last paragraph of the essay. Sometimes a conclusion will simply mirror the introductory paragraph, but make sure the words and syntax are different. A conclusion is also a great place to sum up a story or an argument. You can round off your essay by providing a moral or wrapping up the story; complete your essay with the conclusion, leaving no hanging threads.

SHORT ANSWER TYPE
Short-answer questions are open-ended questions used in examinations to assess the basic knowledge and understanding (low cognitive levels) of a topic before more in-depth assessment questions are asked on the topic.
Structure of Short Answer Questions

Short answer questions do not have a generic structure. Questions may require answers such as completing a sentence, supplying a missing word, short descriptive or qualitative answers, or diagrams with explanations. The answer is usually short, from one word to a few lines, and students may often answer in bullet form.
Example
1.      MHz measures the _________________ of the computer.
2.      List the different types of plastic surgery procedures.
Advantages of Short Answer Questions
·         Short answer questions are relatively fast to mark
·         They are also relatively easy to set
·         Short answer questions can be used as part of both formative and summative assessment
·         Unlike MCQs, there is no guessing of answers
Disadvantages of Short Answer Questions
  1. They elicit only short responses.
  2. They require the assessor to be very clear on the type of answers expected.
  3. Students are not free to answer in any way they choose.
  4. Short answer questions can lead to difficulties in grading if the question is not worded carefully.
  5. Short answer questions are typically used for assessing knowledge only.
  6. Students may often memorize short answer questions through rote learning.
  7. Accuracy of assessment may be influenced by handwriting and spelling skills.
  8. There can be time management issues when answering short answer questions.
How to design a good Short Answer Question
  1. Design learning objective
  2. Make sure the content of the short answer question measures knowledge appropriate to the desired learning goal
  3. Express the questions with clear wordings and language which are appropriate to the students
  4. Ensure there is only one clearly correct answer in each question
  5. Ensure that the item clearly specifies how the question should be answered
  6. Consider whether the positioning of the item blank promotes efficient scoring
  7. Write the instructions clearly so as to specify the desired knowledge and specificity of response
  8. Set the questions explicitly and precisely.
  9. Direct questions are better than those which require completing a sentence.
  10. For numerical answers, let the students know whether they will receive marks for showing partial work (process-based) or only for the result (product-based); also indicate the importance of units.
  11. Let the students know what your marking style is like, is bullet point format acceptable, or does it have to be an essay format?
  12. Prepare a structured marking sheet; allocate marks or part-marks for acceptable answer(s).
  13. Be prepared to accept other equally acceptable answers, some of which you may not have predicted.

True/False Test Taking Strategies

The following strategies will enhance your ability to answer true/false questions correctly:
1.      Approach each statement as if it were true.
Approach each statement as if it were true and then determine if any part of the statement is false. Just one false part in a statement will make the entire statement false.
2.      For a sentence to be true, every part must be "true".
At first glance, a sentence may appear to be true because it contains facts and statements that are true. However, if just one part of the sentence is false, then the entire sentence is false. A sentence may be mostly true because it contains correct information, but it is ultimately false if it contains any incorrect information.
3.      Pay attention to "qualifiers".
Qualifiers are words like sometimes, seldom, few, always, every, often, frequently, never, generally and ordinarily that restrict or open up the possibilities of making accurate statements. More modest qualifiers, such as "sometimes, often, many, few, generally", are more likely to reflect a true statement, sentence, or answer. Stricter qualifiers, such as "always" or "never", often reflect a false statement, sentence, or answer.
4.      Don't let "negatives" confuse you.
Negatives, such as "no, not, cannot", can be confusing within the context of a true/false sentence or statement. If a true/false sentence contains a negative, drop the negative word and then read what remains. Without the negative, determine whether the sentence is true or false. If the sentence (without the negative) is true, then the correct answer would be "false".
5.      Watch for statements with double negatives.
Statements with two negative words are positive. For example, "It is unlikely the car will not win the race." is the same as "It is likely the car will win the race." Negative words include not and cannot, along with words beginning with the prefixes dis-, il-, im-, in-, ir-, non-, and un-.
6.      Pay attention to "absolute" qualifiers.
As already discussed, qualifiers open up or restrict the possibilities of a statement being true or false. Absolute qualifiers, such as all, always, never, entirely, completely, best, worst, none and absolutely, do not allow for exceptions and imply that the statement must be true 100% of the time. In most cases, statements that contain absolute qualifiers are false.
7.      Thoroughly examine long sentences and statements.
Long sentences often contain groups of words and phrases separated or organized by punctuation. Read each word set and phrase individually and carefully. If one word set or phrase in the statement is false (even if the rest are true) then the entire statement is false and the answer is "false".
8.      Make an educated guess.
If it will not negatively impact your score and you're unsure of the answer, make an educated guess; you have a 1 in 2 chance of being right. Moreover, true/false tests often contain more true answers than false answers, so if you're completely unsure, guess "true".
9.      Longer statements may be false.
The longer a true/false statement, the greater the likelihood the statement will be false. The longer the statement, the more chance one part will be false.
10.  Reason statements tend to be false.
Questions that state a reason tend to be false. Words including "because, reason, since, etc" often indicate a "reason" statement.
11.  Budget your time.
Before tackling even one true/false question, take a look at the entire test to see how many questions there are. If the test has 60 true/false questions, and you have a 1 hour time limit, then you should spend no more than 1 minute on each question. While some questions will require more time than others, remember, you can't spend a lot of time on any one question.
RATING SCALE
Definition
A rating scale is defined as a closed-ended survey question used to rate an attribute or feature. It is a variant of the popular multiple-choice question and is widely used to gather relative information about a specific topic.

Types of Rating Scale: Ordinal and Interval Scales.

An ordinal scale is a scale that depicts the answer options in an ordered manner.
An interval scale is a scale where not only is the order of the answer variables established, but the magnitude of difference between each answer variable is also calculable; an absolute or true zero is not present in an interval scale. Temperature in Celsius or Fahrenheit is the most popular example of an interval scale. The Net Promoter Score, the Likert scale and the bipolar matrix table are some of the most effective types of interval scale.
There are four primary types of rating scales which can be suitably used in an online survey:
·         Graphic Rating Scale
·         Numerical Rating Scale
·         Descriptive Rating Scale
·         Comparative Rating Scale
1.      Graphic Rating Scale: A graphic rating scale indicates the answer options on a scale of 1-3, 1-5, etc. Respondents select a particular option on a line or scale to depict their rating. The Likert scale is a popular example of a graphic rating scale.
2.      Numerical Rating Scale: Numerical rating scale has numbers as answer options
3.      Descriptive Rating Scale: In a descriptive rating scale, each answer option is elaborately explained for the respondents; for example, a customer satisfaction survey may need to describe all the answer options in detail.
4.       Comparative Rating Scale: Comparative rating scale, as the name suggests, expects respondents to answer a particular question in terms of comparison, i.e. on the basis of relative measurement or keeping other organizations/products/features as a reference.

Uses of Rating Scale

  1. To gain relative information about a particular subject
  2. To compare and analyze data
  3. To measure one important product/service element
Advantages of rating scale
  1. Rating scale questions are easy to understand and implement.
  2. They offer a comparative analysis of quantitative data.
  3. Using graphic rating scales, it is easy for researchers to create surveys.
  4. Abundant information can be collected and analyzed using a rating scale.
  5. The analysis of answers is quick and less time-consuming.
  6. The rating scale is a standard tool for collecting both qualitative and quantitative information.

WHAT IS AN ANECDOTAL RECORD?

An anecdotal record (or anecdote) is like a short story that educators use to record a significant incident that they have observed. Anecdotal records are usually relatively short and may contain descriptions of behaviours and direct quotes.

Why use anecdotal records?

Anecdotal records are easy to use and quick to write, so they are the most popular form of record among educators. They allow educators to record qualitative information, like details about a child's specific behaviour or the conversation between two children, and these details can help educators plan activities, experiences and interventions. Because anecdotal records are brief, they can be written during a break or at the end of the day.

The Critical Incident Technique (CIT) is a set of procedures used for collecting facts, through direct observation of human behaviour, that have critical significance and meet methodically defined criteria. These observations are then recorded as incidents, which are used to solve practical problems, develop broad psychological principles, and work out how to improve the performance of the individuals involved.
The investigator may focus on a particular incident or set of incidents which caused serious loss. Critical events are recorded and stored in a database or on a spreadsheet. Analysis may show how clusters of difficulties are related to a certain aspect of the system or human practice. Investigators then develop possible explanations for the source of the difficulty.
The method generates a list of good and bad behaviors which can then be used for performance appraisal.
CIT is a flexible method that usually relies on five major areas.
1.       The first is determining and reviewing the incident
2.       Then fact-finding, which involves collecting the details of the incident from the participants.
3.       The next step is to identify the issues.
4.       Afterwards a decision can be made on how to resolve the issues based on various possible solutions.
5.       The final and most important aspect is the evaluation, which will determine if the solution that was selected will solve the root cause of the situation and will cause no further problems.
SOCIOMETRY
Sociometry is the inquiry into the evolution and organization of groups and the position of individuals within them. Sociometric explorations reveal the hidden structures that give a group its form: the alliances, the subgroups, the hidden beliefs, the forbidden agendas, the ideological agreements, the ‘stars’ of the show.
One of Moreno's innovations in sociometry was the development of the sociogram, a systematic method for graphically representing individuals as points/nodes and the relationships between them as lines/arcs.
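A sociogram is naturally modelled as a directed graph. The small Python sketch below, with invented names and choices, counts how often each pupil is chosen in order to find the group's 'stars':

```python
# Each pupil is a node; each sociometric choice ("whom would you like
# to sit with?") is a directed arc. Names and choices are invented.

choices = {
    "Asha":   ["Bina", "Chitra"],
    "Bina":   ["Asha"],
    "Chitra": ["Asha", "Bina"],
    "Dev":    ["Asha"],
}

# Count how often each pupil is chosen; the most-chosen are the 'stars'.
received = {name: 0 for name in choices}
for chooser, chosen in choices.items():
    for name in chosen:
        received[name] += 1

star = max(received, key=received.get)   # -> 'Asha', chosen 3 times
```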

The Objective Structured Clinical Examination (OSCE) is a form of performance-based testing used to measure candidates' clinical competence.
Originally it was described as 'a timed examination in which medical students interact with a series of simulated patients in stations that may involve history-taking, physical examination, counselling or patient management'.
The OSCE is a versatile, multipurpose evaluative tool that can be utilized to evaluate health care professionals in a clinical setting. It assesses competency based on objective testing through direct observation. It comprises several "stations" at which examinees are expected to perform a variety of clinical tasks within a specified time period against criteria formulated for each clinical skill, thus demonstrating competency in skills and/or attitudes.
The OSCE has been used to evaluate the ability to obtain and interpret data, solve problems, teach, communicate, and handle unpredictable patient behaviour, which is otherwise impossible in the traditional clinical examination. Any attempt to evaluate these critical areas in the old-fashioned clinical case examination ends up assessing theory rather than simulating practical performance.
It has proved so effective that it is now being adopted in disciplines other than medicine, like dentistry, nursing, midwifery, pharmacy, and even engineering and law.

Features of the Objective Structured Clinical Examination (OSCEs)

·         Stations are short
·         Stations are numerous
·         Stations are highly focused
·         Candidates are given very specific instructions
·         A pre-set structured mark scheme is used (see the scoring sketch after these lists), hence…
·         reduced examiner input and discretion
Emphasis on:  
·         What candidates can do rather than what they know
·         The application of knowledge rather than the recall of knowledge
Typically…
·         5 minutes most common (3-20 minutes)
·         (minimum) 18-20 stations/2 hours for adequate reliability
·         Written answer sheets or observer assessed using checklists
·         Mix of station types/competences tested
·         Examination hall is a hospital ward
·         Atmosphere active and busy
Additional options…
·         Double or triple length stations
·         Linked stations
·         Preparatory stations
·         “Must pass” stations
·         Rest stations
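Since observer-assessed stations use pre-set checklists, station scoring reduces to simple arithmetic. The Python sketch below is illustrative only; the checklist items and their weights are invented.

```python
# Illustrative sketch of checklist-based OSCE station scoring: the
# observer ticks the steps the candidate performed, and the station
# score is the earned fraction of the pre-set mark scheme.

station_checklist = {
    "introduces self and confirms patient identity": 1,
    "takes focused history": 2,
    "performs examination correctly": 3,
    "explains findings to patient": 2,
}

def station_score(ticked_items):
    earned = sum(station_checklist[item] for item in ticked_items)
    return earned / sum(station_checklist.values()) * 100   # percent

print(station_score(["introduces self and confirms patient identity",
                     "performs examination correctly"]))     # -> 50.0
```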

How is the OSCEs done?

 The following are steps in sequence:
   1. Registration: The first step is the registration.
  • Show your examination invitation card and identification.
  • Be reminded about the exam rules.
  • Be checked for permitted and prohibited items.
  • Receive your exam envelope, which contains your ID badge, stickers, a pencil, and a notebook or clipboard (both with numbered blank papers), etc.
   2. Orientation: The next step is orientation. An orientation video may be shown. Here:
  • Exam format, procedures and policies will be reviewed.
  • You will be introduced to your team and team leader.
  • You will be instructed about your starting station and how to proceed.
  • Your questions will be answered (questions are not allowed beyond this step).
   3. Escorting to exam position: Now it is exam time.
You will be escorted to your station, where you wait by the assigned room door until a long bell/buzzer announces the start of the exam.
   4. Station Instruction Time: 
This is one or two minutes to read the instructions about the station's situation, patient, and required tasks. Read carefully. At the next bell/buzzer, enter the room.
   5. The Encounter: 
Start your encounter with the SP (simulated patient). This is a 5-20 minute encounter; perform the required tasks and stop at the next bell/buzzer.
   6. Post Encounter Period: Next is a question period.
There are some differences here. Some OSCEs have no post-encounter period. Some assign one or two minutes of the encounter period to oral questions asked by the examiner inside the exam room; no further communication with the SP is allowed. Others have written questions to be answered on paper or computer outside the exam room for 5-10 minutes. At the next long bell/buzzer the first station ends and the next station begins; proceed to the next station quickly, as this is the same long bell/buzzer as in step 4.
   7. Repeat Steps 4 to 6:
Steps 4 to 6 are repeated until you have been through all the stations. Some OSCEs offer one or two short rest periods.
   8. Exam ended / Escorting to dismissal area: The exam is over.
You will be escorted back to the dismissal area for signing out and asked to hand back everything you received on signing in: the ID badge, the remaining stickers, all the papers, and the pencil. You may also be asked to remain without outside contact for some time (sometimes hours) for exam security reasons.

The Objective Structured Clinical Examination is a versatile, multipurpose evaluative tool that can be utilized to assess health care professionals in a clinical setting. It assesses competency based on objective testing through direct observation. It is precise, objective and reproducible, allowing uniform testing of students across a wide range of clinical skills. Unlike the traditional clinical exam, the OSCE can evaluate areas most critical to the performance of health care professionals, such as communication skills and the ability to handle unpredictable patient behaviour.

Advantages and Disadvantages of OSCE

1.       The advantages of the OSCE, apart from its versatility and broadening scope, are its objectivity, reproducibility, and easy recall.
2.       All students are examined on predetermined criteria on the same or similar clinical scenarios or tasks, with marks written down against those criteria, enabling recall, teaching audit and the determination of standards.
3.       In a study from Harvard Medical School, second-year students were found to perform better on interpersonal and technical skills than on interpretative or integrative skills; such findings allow for review of teaching techniques and curricula.
4.       Performance is judged not by two or three examiners but by a team of many examiners, each in charge of one of the various stations of the examination. This benefits both the examinee and the teaching standard of the institution, as the outcome of the examination is not affected by prejudice and standards are determined by many more teachers, each looking at a particular issue in the training.
5.       The OSCE takes much less time to execute, examining more students in a given time over a broader range of subjects.
6.       However, no examination method is flawless, and the OSCE has been criticized for using unreal subjects, even though actual patients can be used according to need.
7.       The OSCE is also more difficult to organize and requires more material and human resources.
THE OBJECTIVE STRUCTURED PRACTICAL EXAMINATION (OSPE)
The objective structured practical examination (OSPE) is an objective instrument for the assessment of laboratory exercises in the preclinical sciences, particularly physiology.
It was adapted from the objective structured clinical examination (OSCE). The OSPE was administered to two consecutive classes in conjunction with the conventional examination, in which the candidate is expected to perform a given experiment, and the students' scores in the two components were used to compare the OSPE with the conventional examination and to evaluate the new instrument of assessment. The OSPE appears to be a reliable device with a good capacity for discriminating between different categories of students, and it is better in these respects than the conventional practical examination. Moreover, it can be structured in such a way that all the objectives of laboratory teaching are tested and each aspect is assigned the desired weightage.
The assessment of practical skills is often neglected; a contributing factor is the unsatisfactory nature of the assessment instruments commonly used. The OSPE is a practical, reliable and valid alternative.
The main features of the OSPE are:
(1)    separate assessment of process and product through observation of performance and assessment of end result;
(2)    adequate sampling of skills and content to be tested;
(3)    an analytical approach to the assessment;
(4)    objectivity;
(5)     feedback to teacher and students.
The OSPE approach merits consideration in any subject where practical skills should be assessed.
DIFFERENTIAL SCALES (OR THURSTONE-TYPE SCALES)
The name of L.L. Thurstone is associated with differential scales which have been developed using consensus scale approach. Under such an approach the selection of items is made by a panel of judges who evaluate the items in terms of whether they are relevant to the topic area and unambiguous in implication. The detailed procedure is as under:
  • The researcher gathers a large number of statements, usually twenty or more, that express various points of view toward a group, institution, idea, or practice (i.e., statements belonging to the topic area).
  • These statements are then submitted to a panel of judges, each of whom arranges them in eleven groups or piles ranging from one extreme position to the other. Each judge is requested to place in the first pile the statements he thinks most unfavourable to the issue, in the second pile those he thinks next most unfavourable, and so on, until in the eleventh pile he puts the statements he considers most favourable.
  • This sorting by each judge yields a composite position for each of the items. In case of marked disagreement between the judges in assigning a position to an item, that item is discarded.
  • Each item that is retained is given its median scale value (between one and eleven) as established by the judges, and the selected items are arranged in random order of scale value. If the scale values are valid and the opinionnaire deals with only one attitude dimension, the typical respondent will choose one or several contiguous items (in terms of scale values) to reflect his views. At times, however, divergence may occur when a statement appears to tap a different attitude dimension.
The Thurstone method has been widely used for developing differential scales to measure attitudes towards varied issues like war, religion, etc. Such scales are considered most appropriate and reliable when used for measuring a single attitude. A small sketch of the scale-value computation follows; the limitations of the method are listed after it.
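The sketch below assumes each judge's pile number (1-11) is recorded per statement; the spread threshold used to discard items is an illustrative choice, not a fixed rule.

```python
# Each statement's scale value is the median of the pile numbers (1-11)
# the judges assigned to it; statements the judges disagree on widely
# are discarded. The max_spread threshold is illustrative.
from statistics import median, quantiles

def scale_value(judge_piles, max_spread=3.0):
    q1, _, q3 = quantiles(judge_piles, n=4)
    if q3 - q1 > max_spread:      # marked disagreement between judges
        return None               # discard the statement
    return median(judge_piles)

print(scale_value([3, 4, 4, 5, 3, 4, 6]))   # -> 4 (judges broadly agree)
print(scale_value([1, 10, 2, 11, 5, 9]))    # -> None (judges disagree)
```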
Limitations:
  1. It requires more cost and effort.
  2. The values assigned to the various statements by the judges may reflect their own attitudes.
  3. The method is not completely objective; it ultimately involves a subjective decision process.
SUMMATED SCALES (OR LIKERT-TYPE SCALES)
Summated scales consist of a number of statements which express either a favorable or unfavorable attitude towards the given object to which the respondent is asked to react. The respondent indicates his agreement or disagreement with each statement in the instrument. Each response is given a numerical score, indicating its favorableness or unfavorableness, and the scores are totaled to measure the respondent’s attitude. For this reason they are often referred to as Likert-type scales.
 In a Likert scale, the respondent is asked to respond to each of the statements in terms of several degrees, usually five degrees (but at times 3 or 7 may also be used) of agreement or disagreement.
E.g., (i) strongly agree, (ii) agree, (iii) undecided, (iv) disagree, (v) strongly disagree. These five points constitute the scale. At one extreme of the scale there is strong agreement with the given statement and at the other, strong disagreement, and between them lie intermediate points. A scale value is assigned to each of the five responses. The instrument yields a total score for each respondent, which then measures the respondent’s favorableness toward the given point of view. If the instrument consists of, say, 30 statements, the following score values would be revealing:
30 × 5 = 150 (most favorable response possible)
30 × 3 = 90 (a neutral attitude)
30 × 1 = 30 (most unfavorable attitude)
The scores for any individual would fall between 30 and 150. A score above 90 shows a favorable opinion toward the given point of view, a score below 90 means an unfavorable opinion, and a score of exactly 90 suggests a neutral attitude.
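The summation just described is mechanical; the short Python sketch below shows it, including the usual adjustment for reverse-keyed (unfavorably worded) statements. The responses and item positions are hypothetical.

    # A minimal sketch of Likert summated scoring on a 5-point, 30-statement
    # instrument. Unfavorably worded statements are reverse-keyed so that a
    # high total always means a favorable attitude.
    responses = [4, 5, 3, 2, 5, 4] * 5         # 30 raw answers, coded 1..5
    reverse_keyed = {3, 17, 24}                # hypothetical item positions

    total = sum((6 - r) if i in reverse_keyed else r
                for i, r in enumerate(responses, start=1))

    neutral = 3 * len(responses)               # 90 on this instrument
    print(f"total = {total} (neutral point = {neutral}, possible range 30-150)")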
Procedure:
    1. As a first step, the researcher collects a large number of statements which are relevant to the attitude being studied.
    2. A trial test is then administered to a number of subjects.
    3. Each statement included in the Likert-type scale is given an empirical test for discriminating ability.
Advantages:
    1. The Likert-type scale can easily be used in respondent-centered and stimulus-centered studies.
    2. Because the Likert-type scale takes much less time to construct, it is frequently used by students of opinion research.
    3. It is most useful in a situation wherein it is possible to compare the respondent’s score with a distribution of scores from some well-defined group.
Limitations:
  1. With this scale, we can simply examine whether respondents are more or less favorable to a topic, but we cannot tell how much more or less they are.
  2. There is no basis for believing that the five positions indicated on the scale are equally spaced.
  3. The interval between ‘strongly agree’ and ‘agree’ may not be equal to the interval between ‘agree’ and ‘undecided’.
  4. The total score of an individual respondent has little clear meaning since a given total score can be secured by a variety of answer patterns.
  5. It is unlikely that the respondent can validly react to a short statement on a printed form in the absence of real-life qualifying situations. Moreover, there “remains a possibility that people may answer according to what they think they should feel rather than how they do feel.”

Standardised Test

A standardised test is one that has been carefully constructed by experts in the light of acceptable objectives or purposes; procedures for administering, scoring and interpreting scores are specified in detail so that results are comparable; and norms or averages for different age or grade levels have been pre-determined. Constructing one requires careful thinking, planning, exact preparation, scoring, analysis and refinement; it is a complex and multidimensional task.
Standardised tests are those tests which:
1.      are constructed by an individual or by a group of individuals;
2.      are processed and universalised for all situations and for all purposes;
3.      have content that is carefully designed, carefully phrased and pretested;
4.      are meant for all situations inside and outside the educational institutions; and
5.      are generally norm-referenced tests.

A standardised test is one which passes through the following process:
(i) Standardisation of the content and questions:
Due weightage is given to the content and objectives. Items are to be prepared according to the blue-print. Relevant items are included and irrelevant items are omitted, giving due consideration to item difficulty and discriminating value. Internal consistency is also taken into account.
(ii) Standardisation of the method of administration:
Procedure of test administration, conditions for administration, time allowed for the test etc., are to be clearly stated.
(iii) Standardisation of the scoring procedure:
To ensure objective and uniform scoring, the adequate scoring key and detailed instruction for method of scoring is to be provided.
(iv) Standardisation of interpretation:
Adequate norms are to be prepared to interpret the results. The test is administered over a large, representative sample, and test scores are interpreted with reference to these norms. Derivation of norms is an integral part of the process of standardisation.
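To make step (iv) concrete, the sketch below converts a raw score to a z-score and then to a T-score against norm-group figures. The mean and standard deviation used here are hypothetical stand-ins for the values a real test manual would publish.

    # A minimal sketch of norm-referenced interpretation:
    # raw score -> z-score -> T-score.
    norm_mean, norm_sd = 52.0, 8.0   # hypothetical norm-group values

    raw = 62
    z = (raw - norm_mean) / norm_sd  # distance from the norm mean, in SD units
    t = 50 + 10 * z                  # T-score: mean 50, SD 10 by convention
    print(f"raw {raw} -> z = {z:.2f}, T = {t:.0f}")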

Characteristics of Standardised Tests:

1. They consist of items of high quality.
The items are pretested and selected on the basis of difficulty value, discrimination power, and relationship to clearly defined objectives in behavioural terms.
2. As the directions for administering, exact time limit, and scoring are precisely stated, any person can administer and score the test.
3. Norms, based on representative groups of individuals, are provided as an aid for interpreting the test scores. These norms are frequently based on age, grade, sex, etc.
4. The reliability and validity are established.
5. A manual is supplied that explains the purposes and uses of the test, describes briefly how it was constructed, provides specific directions for administering, scoring, and interpreting results, contains tables of norms and summarizes available research data on the test.
6. No two standardized tests are exactly alike. Each test measures certain specific aspects of behaviour and serves a slightly different purpose.
Thus, one has to be careful in selecting a standardised test.

Uses of Standardised Tests:

1. Standardised test assesses the rate of development of a student’s ability.
2. It checks and ascertains the validity of a teacher-made test.
3. These tests are useful in diagnosing the learning difficulties of the students.
4. It helps the teacher to know the causal factors of students’ learning difficulties.
5. Provides information for curriculum planning and for remedial coaching of educationally backward children.
6. It also helps the teacher to assess the effectiveness of his teaching and school instructional programmes.
7. Provides data for tracing an individual’s growth pattern over a period of years.
8. It helps for organising better guidance programmes.
9. Evaluates the influences of courses of study, teacher’s activities, teaching methods and other factors considered to be significant for educational practices.
Types of standardized tests
1-Achievement – tests of content knowledge or skills
2-Aptitude - tests which are used to predict future cognitive performance
3-Standards-based - criterion-referenced tests based on established standards
4-Domain-referenced – tests that sample performance from a clearly specified domain of content
Standardized tests vs. informal teacher-made tests
  1. Standardized tests assess broad, general content while teacher-made tests tend to focus on specific objectives related to the instruction in a class
  2. Standardized tests are more technically sound than teacher-made tests.
  3. Standardized tests are administered in “standardized” manners while teacher-made tests tend to be administered informally
  4. Standardized tests are scored in consistent, reliable manners and produce sets of standard scores; teacher-made tests are scored in less reliable manners and generally are scored as the percentage of correct responses
Questionnaires
A questionnaire is an instrument containing statements designed to obtain a subject’s perceptions, attitudes, beliefs, values, opinions, or other non-cognitive traits
Personality inventories
Personality inventories are concerned with psychological orientation (i.e., general psychological adjustment) and Educational orientation (i.e., traits such as self-concept or self-esteem)
Attitudes, values, or interests
Attitudes, values, or interests are affective traits that indicate some degree of preference toward something.
Scales
Scales are continua that describe a subject’s responses to a statement. Common formats include:
Likert Scales
Checklists
Ranked items
Observations
Interviews
Advantages (of interviews)
  • Establish rapport
  • Enhance motivation
  •  Clarify responses through additional questioning
  •  Capture the depth and richness of responses
  •  Allow for flexibility
  •  Reduce “no response” and/or “neutral” responses
Disadvantages
  •  Time consuming
  •  Expensive
  •  Small samples
  • Subjective
Scales of Measurement
Numbers can be grouped into 4 types or levels: nominal, ordinal, interval, and ratio.
Nominal
Not really a ‘scale’ because it does not scale objects along any dimension.
Nominal refers to quality more than quantity. A nominal level of measurement is simply a matter of distinguishing by name, e.g., 1 = male, 2 = female. Even though we are using the numbers 1 and 2, they do not denote quantity.
Ordinal
Ordinal refers to order in measurement. In ordinal measurement the attributes can be rank-ordered, but the distances between attributes have no meaning. Ordinal refers to quantities that have a natural ordering; rating scales (e.g., Likert questions) are a common example. This one is easy to remember: ordinal sounds like order. An ordinal scale indicates direction, in addition to providing nominal information. Low/medium/high and faster/slower are examples of ordinal levels of measurement. Many psychological scales or inventories are at the ordinal level of measurement.
An ordinal scale extends the information of a nominal scale to show order, i.e. that one unit has more of a certain characteristic than another unit. For example, an ordinal scale can be used
•             to rank job applicants from the best to the worst,
•             to categorise people according to their level of education, or
  • to measure people’s feelings about some matter using a measure like ‘strongly agree’, ‘agree’, ‘neutral’, ‘disagree’, ‘strongly disagree’
Interval
An interval scale is a scale on which equal intervals between objects, represent equal differences.
Interval scales provide information about order, and also possess equal intervals. Equal-interval scales of measurement can be devised for opinions and attitudes. Constructing them involves an understanding of mathematical and statistical principles.
Interval scales are not simply ordinal. They give a deeper meaning to order. An interval scale is a scale of measurement in which the magnitude of difference between measurements of any two units is meaningful. If weights are measured in kilograms (kg), then the difference in weights between two people whose weights are respectively 82 kg and 69 kg is the same as that between people whose respective weights are 64 kg and 51 kg. That is, the ‘intervals’ are the same (13 kg) and have the same meaning.
Ratio
A ratio scale is a special form of interval scale that has a true zero. For some interval scales, measurement ratios are not meaningful. For example, 40 °C does not represent twice the heat of 20 °C, because the zero on the Celsius scale is arbitrary and does not represent an absence of heat. On the Kelvin scale, however, there is a true zero (absolute zero), so a measure of 40 K is twice as hot as 20 K.
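One practical consequence of these four levels is which summary statistics are meaningful at each. The sketch below records the common textbook mapping; it is a convention rather than an ironclad rule.

    # A minimal sketch of the usual mapping from measurement level to
    # permissible summary statistics.
    permissible = {
        "nominal":  ["mode", "frequency counts"],
        "ordinal":  ["mode", "median", "percentiles"],
        "interval": ["mode", "median", "mean", "standard deviation"],
        "ratio":    ["mode", "median", "mean", "standard deviation",
                     "meaningful ratios (e.g. twice as heavy)"],
    }
    for level, stats in permissible.items():
        print(f"{level:>8}: {', '.join(stats)}")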
SUBJECTIVE AND OBJECTIVE TESTS
Objective test: this is a test consisting of factual questions requiring extremely short answers that can be quickly and unambiguously scored by anyone with an answer key. They are tests that call for short answer which may consist of one word, a phrase or a sentence.
Subjective test: this is a type of test that is evaluated by giving opinion. They are more challenging and expensive to prepare, administer and evaluate correctly, though they can be more valid.
TYPES OF OBJECTIVE TEST ITEMS
They include the following:
I. True- false items
II. Matching items
III. Multiple choice items
IV. Completion items
1) True –false test items
Here, a factual statement is made and the learner is required to respond with either true or false depending on the correctness of the statement. True-false items are easy to prepare, can be marked objectively, and cover a wide range of topics.
ADVANTAGES
 Can test a large body of material.
 They are easy to score.
DISADVANTAGES
 Difficult to construct questions that are definitely or unequivocally true or false.
 They are prone to guessing (a classical correction formula is sketched below).
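One classical remedy for the guessing problem (a general testing convention, not something specific to this text) is the correction-for-guessing formula, score = R − W/(k − 1), where R is the number right, W the number wrong and k the number of options; for true-false items k = 2. A sketch with hypothetical numbers:

    # A minimal sketch of the classical correction-for-guessing formula
    # applied to true-false items (k = 2 options).
    rights, wrongs, options = 40, 10, 2   # hypothetical results

    corrected = rights - wrongs / (options - 1)
    print(f"corrected score = {corrected:.0f}")   # 40 - 10/1 = 30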
2) MATCHING ITEMS
Involves connecting contents of one list to contents in another list. The learners are presented with two columns of items, for instance column A and column B, and asked to match the content in both columns correctly.
Advantages:
a. Measures primarily associations and relationships as well as sequence of events.
b. Can be used to measure questions beginning with who, when, where and what
c. Relatively easy to construct
d. They are easy to score
Disadvantages:
 Difficult to construct effective questions that measure higher order thinking and contain a number of plausible distracters.
3) MULTIPLE CHOICE TEST ITEMS
In a multiple choice item, a statement of fact is made, followed by four or five alternative responses from which the best or correct one must be selected. The statement or question is termed the ‘stem’, the alternatives or choices are termed ‘options’, the correct alternative is the ‘key’, and the other options are called ‘distracters’.
Advantages:
 Measures a variety of levels of learning.
 They are easy to score.
 Can be analyzed to yield a variety of statistics (see the sketch after this section).
 When well constructed, it has proven to be an effective assessment tool.
Disadvantages:
Difficult to construct effective questions that measure higher-order thinking and contain a number of plausible distracters.
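Since multiple choice items can be analyzed statistically, the sketch below computes two classical item-analysis indices for a single item: the difficulty index p (proportion answering correctly) and the discrimination index D (upper-group minus lower-group proportion correct). The group counts are hypothetical.

    # A minimal sketch of classical item analysis for one multiple-choice item.
    upper_correct, upper_n = 22, 27   # top ~27% of examinees by total score
    lower_correct, lower_n = 9, 27    # bottom ~27%

    p = (upper_correct + lower_correct) / (upper_n + lower_n)   # difficulty
    D = upper_correct / upper_n - lower_correct / lower_n       # discrimination
    print(f"difficulty p = {p:.2f}, discrimination D = {D:.2f}")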
4) COMPLETION ITEMS OR SHORT ANSWER TEST ITEMS
In this, learners are required to supply the words or figures which have been left out. They may be presented in the form of questions or phrases in which a learner is required to respond with a word or several statements.
Advantages:
• Relatively easy to construct.
• Can cover a wide range of content.
• Reduces guessing.
Disadvantages:
 Primarily used for lower levels of thinking.
 Prone to ambiguity.
 Must be constructed carefully so as not to provide too many clues to the correct answer.
 Scoring is dependent on the judgment of the evaluator.

An intelligence quotient, or IQ, is a score designed to assess intelligence. The term "IQ," from the German Intelligenz-Quotient, was devised by the German psychologist William Stern in 1912 as a proposed method of scoring children's intelligence tests such as those developed by Alfred Binet and Théodore Simon in the early 20th century.
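Stern’s quotient was the ratio of mental age to chronological age; Lewis Terman later multiplied this ratio by 100, giving the familiar formula IQ = (mental age ÷ chronological age) × 100. A minimal sketch with hypothetical ages:

    # The classical ratio IQ: (mental age / chronological age) x 100.
    mental_age, chronological_age = 10.0, 8.0   # hypothetical ages in years

    iq = 100 * mental_age / chronological_age
    print(f"ratio IQ = {iq:.0f}")   # -> 125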
IQ scores have been shown to be associated with factors such as morbidity and mortality, parental social status and parental IQ, and to serve as predictors of educational achievement, job performance and income.

Types of Intelligence Tests:

Verbal or Language Tests:
In these the subject makes use of language; the instructions are given in words, written or oral.
The test content consists of verbal material, which may include varieties of items like:
a. Vocabulary tests:
In these the subject is required to give the meanings of words or phrases.
b. Memory tests:
These are designed to test the subject’s immediate and long-term memory and include all recall and recognition types of items, such as telephone numbers, vehicle numbers, teachers’ names, etc.
c. Comprehension tests:
By means of these, the subject is tested for the ability to grasp, understand and react to a given situation.
d. Information tests:
The subject is tested on his knowledge about the things around him by means of these tests.
e. Reasoning tests:
In these tests the subject is asked to provide answers which demonstrate his ability to reason logically, analytically, systematically, inductively or deductively, as, for example, continuing the series 1, 2, 4, 7, 11, 16, 22, 29, … (the differences increase by one, so the next term is 37).
f. Association tests:
Through these test items the subject is tested for his ability to point out the similarities or dissimilarities between two or more concepts or objects.
Non-Verbal and Non-Language Tests:
These tests involve activities in which the use of language is not necessary. Performance tests are the typical examples of this type. Here the individual is tested with material objects: he is instructed orally (or by demonstration), his reactions are assessed with respect to his approach towards the task, and any needed directions are then provided to him.
Individual Verbal Intelligence Tests:
Tests involving the use of language are administered to one individual at a time, e.g. the Stanford-Binet scale; individually administered performance counterparts include the Arthur point scale and Bhatia’s battery of performance tests.
Group Verbal Intelligence Tests:
These tests necessitate the use of language and are applied to a group of individuals at a time. For example,
1. Army alpha test (developed during World War I)
2. Army general classification Test (World War II).
Popular Indian tests of this nature are:
a. Group tests of intelligence prepared by Bureau of Psychology, Allahabad (Hindi).
b. Samuhik Budhi Pariksha (Hindi) prepared by PL Shrimali, Vidya Bhavan GS Teacher College, Udaipur.
Group Non-Verbal Intelligence Tests:
These tests do not necessitate the use of language and are applicable to a group of individuals at a time. The difference between performance tests (used for an individual) and non-verbal tests (used for a group) is one of degree as far as their non-verbal nature is concerned.
The individual performance tests require the subject to manipulate concrete objects or materials supplied in the test. The responses are purely motor in character and seldom require the use of paper and pencil by the testee.

Types of Intelligence Tests:

Intelligence tests may be classified under three categories:
1. Individual Tests:
These tests are administered to one individual at a time. They cover the age group from 2 to 18 years.
These are:
(a) The Binet- Simon Tests,
(b) Revised Tests by Terman,
(c) Mental Scholastic Tests of Burt, and
(d) Wechsler Test.
2. Group Tests:
Group tests are administered to a group of people. Group tests had their birth in America, when the intelligence of the recruits who joined the army in the First World War had to be assessed.
These are:
(a) The Army Alpha and Beta Test,
(b) Terman’s Group Tests, and
(c) Otis Self- Administrative Tests.
Among the group tests there are two types:
(i) Verbal, and
(ii) Non-Verbal.
Verbal tests are those which require the use of language to answer the test items.
3. Performance:
These tests are administered to the illiterate persons. These tests generally involve the construction of certain patterns or solving problems in terms of concrete material.
Some of the famous tests are:
(a) Koh’s Block Design Test,
(b) The Cube Construction Tests, and
(c) The Pass along Tests.
Types of Intelligence Tests
1. Individual Tests:
The first tests that were prepared were individual. Binet’s test was individual, and so was the Terman-Merrill Stanford Revision. Individual tests are the most reliable, but they consume more time and energy. They are, however, useful in making case studies or individual studies of behaviour problems or backwardness.
The child has to read the question or listen to the question and answer in language.
But suppose the child is not fully conversant with the language of the examiner, or is illiterate; for such cases non-verbal or performance tests have been prepared. Here the tasks set require the child to do ‘something’ rather than reply to a question.
The child may, for instance, fit in a wooden board with depressions in some geometrical forms, some wooden shapes like triangles or rectangles or circles. He may put some cubes in descending or ascending order of size. He may assemble certain disintegrated parts to form full designs or pictures. No language is used here. Instructions also can be had through demonstration or action.
A number of performance tests have been prepared. The most important are:
1.      Alexander’s Pass-along test.
2.      Koh’s Block Design test.
3.      Wechsler’s Performance Test.
4.      Terman and Merrill’s Performance Test.
5.      Kent’s Performance Test.
Kent’s test is used for clinical purposes. It consists of five oral tests and seven written tests, each requiring one minute.
Individual performance tests have the disadvantage that these take a lot of time. Their reliability is also questioned on the ground that temporary response sets or work habits may play a major role in determining score.
Again, the intelligence measured by performance tests is not quite the same as tested by Binet and others.
2. Group Tests:
These are more helpful as they deal with large masses of subjects, such as in schools, industry, the army and the public. They are reliable, have high predictive validity, and compare favourably with individual tests.
The Army Alpha and Army Beta tests were the most prominent.
Characteristics of group tests:
(i) Most of the group-tests have been standardised,
(ii) Most of the test items in group verbal tests are linguistic in character.
(iii) Some group verbal tests have been used in measuring scholastic aptitude
(iv) These are convenient in administration and scoring.
3. Comparison of Individual and Group Tests:
[Comparison table: individual vs. group tests]
[Comparison table: Binet test vs. Wechsler scale]

4. Performance Tests:
The importance of non-verbal or performance tests was discussed above.
Non-verbal tests include such items as:
(i) Relationship of figures, which may be either (a) functional or (b) spatial.
(ii) Drawing figures, especially human figures,
(iii) Completing pictures and patterns.
(iv) Analysing space relationship from diagrams
(v) Analysing cube relationship.
(vi) Drawing lines through figures to break them up into given sections, as in the Minnesota paper form board test.
(vii) Mechanical relationship, tracing relationship of interlocking gears-pulleys, shown in pictorial form.
(viii) Memory for design.
The following tests are examples where actual handling is needed:
(i) Assembly of objects from their disconnected parts,
(ii) Koh’s Block Design,
(iii) Picture completion,
(iv) Cube construction,
(v) Form board paper pencil,
(vi) Pass along test,
(vii) Picture arrangement,
(viii) Mazes, and 
(ix) Cube imitation (tapping).
Progressive Matrices, prepared by J.C. Raven at Dumfries, is one of the most widely used paper-pencil group performance tests.
Advantages:
Performance tests have the following advantages:
(i) These are generally useful for measuring the abilities of persons who cannot be tested verbally: deaf persons, those with language difficulties, the educationally backward, and those who are discouraged by verbal tasks.
(ii) These are highly useful in vocational and educational guidance.
(iii) These are useful for the study of pre-school children, who have not begun reading and writing.
(iv) These are useful for clinical purposes, for testing neurotics and mentally defective (or feeble-minded).
(v) These are useful for adults over 30, who have lost interest in numbers and words.
(vi) Performance tests are culture-free.



Limitations:
(i) Some test items have no connection with life situations.
(ii) Some call for speed rather than the solution of problems.
(iii) Not enough emphasis is given to item difficulty.
(iv) Performance tests do not measure exactly what Binet’s tests measure: reasoning, judgment and imagination.
(v) Most of these tests do not require above-average thinking, so they are not suitable for higher levels.
(vi) There are variations in the utility of different tests: picture completion tests may suffer from poor material, maze tests require continual adaptation and planning, and form-board tests tend to depend upon speed.
(vii) Performance tests are not so reliable; a battery of tests is needed, which makes the task more complex.
(viii) They are expensive.

Uses of Intelligence Test

a. Use in selection:
Results of intelligence tests can be used for selection of suitable candidates for training in educational and professional skills
b. Use in classification:
Intelligence tests help in classifying individuals as backward, average, bright or gifted, and thus arrange for homogenous grouping to provide proper educational opportunities.
c. Use in assessment for promotion:
Intelligence tests can be successfully used for the promotion of students to the next higher grade.
d. Use in provision of guidance:
in providing training to teachers and for personnel guidance.
e. Use for improving the learning process:
helpful to teachers to plan the teaching-learning skills.
f. Use for diagnosis:
to diagnose, distinguish and discriminate the differences in the mental functioning of individuals.
g. Use in research work:
The intelligence tests can be used in carrying out research in the field of education, psychology and sociology with different age groups for generalization.
h. For Determining the optimum level of work:
The mental age gives the mental level at which a child can be expected to work most efficiently in academic subjects.
i. Estimating the range of abilities in a class:
j. Determining the level of ability:
k. Measuring special abilities:
l. Predicting success in particular Academic Subjects:
Readiness and prognoses tests have been designed to give a high prediction of success in specific subjects, and provide useful basis for the selection of courses.
m. Diagnosing Subject-Matter Difficulties:
It gives the teacher information about the areas in which the child needs more training.
ATTITUDE
Perhaps the most straightforward way of finding out about someone’s attitudes would be to ask them. However, attitudes are related to self-image and social acceptance (i.e. attitude functions), so direct answers cannot always be taken at face value.
Attitude measurement can be divided into two basic categories
  • Direct Measurement (likert scale and semantic differential)
  • Indirect Measurement (projective techniques)

Evaluation of Direct Methods

An attitude scale is designed to provide a valid, or accurate, measure of an individual’s social attitude. However, as anyone who has ever “faked” an attitude scale knows, there are shortcomings in these self-report scales of attitudes. Various problems affect the validity of attitude scales; the most common is social desirability.
Social desirability refers to the tendency for people to give “socially desirable” responses to questionnaire items.

Projective Techniques

A projective test involves presenting a person with an ambiguous (i.e. unclear) or incomplete stimulus (e.g. a picture or words). The stimulus requires interpretation from the person, so the person’s attitude is inferred from his or her interpretation of the ambiguous or incomplete stimulus.
The assumption behind these measures of attitude is that the person will “project” his or her views, opinions or attitudes into the ambiguous situation, thus revealing the attitudes the person holds. However, indirect methods only provide general information and do not offer a precise measurement of attitude strength, since they are qualitative rather than quantitative. A major criticism is that this method of attitude measurement is neither objective nor scientific.
Examples of projective techniques include:
• Rorschach Inkblot Test
• Thematic Apperception Test (or TAT)
• Draw a Person Task

Thematic Apperception Test

Here a person is presented with an ambiguous picture which they have to interpret.
The thematic apperception test (TAT) taps into a person’s unconscious mind to reveal the repressed aspects of their personality. Although the picture, illustration, drawing or cartoon that is used must be interesting enough to encourage discussion, it should be vague enough not to immediately give away what the project is about.
TAT can be used in a variety of ways, from eliciting qualities associated with different products to perceptions about the kind of people that might use certain products or services.
The person must look at the picture(s) and tell a story. For example:
o What has led up to the event shown
o What is happening at the moment
o What the characters are thinking and feeling, and
o What the outcome of the story will be

Draw a Person Test

Figure drawings are projective diagnostic techniques in which an individual is instructed to draw a person, an object, or a situation so that cognitive, interpersonal, or psychological functioning can be assessed.  The test can be used to evaluate children and adolescents for a variety of purposes (e.g. self-image, family relationships, cognitive ability and personality).
A projective test is one in which a test taker responds to or provides ambiguous, abstract, or unstructured stimuli, often in the form of pictures or drawings.
In these tests, there is a consideration of how well a child draws and the content of a child's drawing.  In some tests, the child's self-image is considered through the use of the drawings.
In other figure drawing tests, interpersonal relationships are assessed by having the child draw a family or some other situation in which more than one person is present. Some tests are used for the evaluation of child abuse.  Other tests involve personality interpretation through drawings of objects, such as a tree or a house, as well as people.
Finally, some figure drawing tests are used as part of the diagnostic procedure for specific types of psychological or neuropsychological impairment, such as central nervous system dysfunction or mental retardation.
Despite the flexibility in administration and interpretation of figure drawings, these tests require skilled and trained administrators familiar with both the theory behind the tests and the structure of the tests themselves.  Interpretations should be made with caution and the limitations of projective tests should be considered.

Evaluation of Indirect Methods

The major criticism of indirect methods is their lack of objectivity.
Such methods are unscientific and do not objectively measure attitudes in the same way as a Likert scale.
There is also the ethical problem of deception as often the person does not know that their attitude is actually being studied when using indirect methods.
The advantages of such indirect techniques of attitude measurement are that they are less likely to produce socially desirable responses, the person is unlikely to guess what is being measured and behavior should be natural and reliable.
An aptitude test is an examination that attempts to determine and measure a person’s ability to acquire, through future training, some specific set of skills (intellectual, motor, and so on). Such tests assume that people differ in their special abilities and that these differences can be useful in predicting future achievements.
General, or multiple, aptitude tests are similar to intelligence tests in that they measure a broad spectrum of abilities (e.g., verbal comprehension, general reasoning, numerical operations, perceptual speed, or mechanical knowledge).
Aptitude tests also have been developed to measure professional potential
The Differential Aptitude Test (DAT) measures specific abilities such as clerical speed and mechanical reasoning as well as general academic ability.
An aptitude test is designed to assess what a person is capable of doing or to predict what a person is able to learn or do given the right education and instruction. It represents a person's level of competency to perform a certain type of task. Such aptitude tests are often used to assess academic potential or career suitability. Such tests may be used to assess either mental or physical talent in a variety of domains.

A Few Examples of Aptitude Tests

  • A test assessing an individual's aptitude to become a fighter pilot
  • A career test evaluating a person's capability to work as an air traffic controller
  • An aptitude test given to high school students to determine which types of careers they might be good at
  • A computer programming test to determine how a job candidate might solve different hypothetical problems 
  • A test designed to test a person's physical abilities needed for a particular job such as a police officer or firefighter

Meaning of Interest:

An interest is a subjective attitude motivating a person to perform a certain task. It affords pleasure and satisfaction. It results in curiosity towards the object of interest, enthusiasm for what is attached to the object, strength of will to face difficulties while engaged in the task of one’s interest, and a definite change in behaviour in the presence of the object, characterised by attention and concentration.
Definitions of interest
Jones states, “Interest is a feeling of liking associated with a reaction, either actual or imagined, to a specific thing or situation.”
Bingham defines: “Interest is a tendency to become absorbed in an experience and to continue it, while an aversion is a tendency to turn away from it to something else.”

Types of Interest:

Jones mentions two distinct types of interests- extrinsic and intrinsic.
The former are pleasurable emotions connected with a purpose or goal of an activity. They may involve fame, name, money, victory or such external motives of conduct.
The latter are connected with the activity itself, being a basic and real attraction without any external motive. Intrinsic interest is continuous and permanent even after the immediate goal is reached; extrinsic interest dies as soon as the goal is reached.
Super and some other guidance experts have classified interests into:
(i) Expressed interest,
 (ii) Manifest interest, and
(iii) Measured interest.
In the expressed interest the person expresses his personal likings through such sentences as ‘I love sports’. Although it is the first source of knowing the interest of a person, much reliance cannot be placed on it, as such expressions lack permanency and are prone to vary from time to time depending upon the maturity of the person.
Manifest interest is the interest that is not expressed but observed by others while the person is engaged and absorbed in an activity. Newton forgot his meals while engaged in scientific experiments.
The measured interest is the estimate and account of a person’s interest as revealed by some psychological tests or interest inventories.

Types of Tools for Measuring Interest:

The tools for measurement of interest are of two types – formal and informal.
The formal methods are specialised and standardised measuring instruments such as interest inventories, interest test batteries.
The informal methods include the person’s own statement, a record of his activities, and observation by the parents and the teachers. The formal methods are usually supplemented by the informal ones.
Three notable formal methods universally employed are:
1. Strong Vocational Interest Blank,
2. Kuder Preference Record, and
3. Thurstone’s Vocational Interest Schedule.
1. Strong Vocational Interest Blank:
Prof. Strong of Stanford University, California, designed and standardised this check list. The check list contains 400 separate items. It is presented to the individual, who is simply asked to indicate whether he likes, dislikes or is indifferent to each item, on a three-point scale.
The test reveals the interest maturity of the individual, his masculinity–femininity, and his occupational level. The 400 items include 100 occupations, 49 recreations, 36 school subjects, 48 activities and 47 peculiar interests. As such it is useful for both educational and vocational guidance.
2. Kuder Preference Record:
This has been prepared by G. Frederic Kuder. The test covers a wide field, comprising nine separate scales of occupations, viz. mechanical, computational, scientific, persuasive, artistic, literary, musical, social and clerical. Each item presents the same task in three related activities, keyed to different interests (for example, mechanical, literary and artistic), and the subject selects the activity that relates to the interest he possesses.
For instance, three choices are given about one item viz. building a bird house, writing articles about birds and drawing sketches about birds. If the subject opts for the first, his interest is mechanical.
 Another example is presented.
The subject is asked to select the activity that he would prefer the most, and the activity he would prefer the least out of the following three:
(i) Visit an art gallery.
(ii) Browse in a library.
(iii) Visit a museum.
A triple activity regarding collections is:
(i) Collect autographs.
(ii) Collect coins.
(iii) Collect butterflies.
A detailed scoring system is employed for analysis and interpretation. A percentile of 75 or above is considered significantly high. If a person scores beyond P75 in any of the areas, all the occupations in that area are attractive to him.
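As an illustration of the P75 rule just mentioned, the sketch below computes a simple percentile rank for a subject’s area score against a norm group. The norm-group scores and the subject’s score are hypothetical.

    # A minimal sketch of a percentile-rank check like the P75 rule described
    # for the Kuder Preference Record.
    norm_scores = sorted([12, 15, 18, 20, 22, 25, 27, 30, 33, 36, 40, 44])
    subject_score = 34                 # hypothetical area score

    below = sum(s < subject_score for s in norm_scores)
    percentile = 100 * below / len(norm_scores)
    flag = "significantly high" if percentile >= 75 else "not significant"
    print(f"P{percentile:.0f} -> {flag}")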
3. Thurstone’s Vocational Interest Schedule:
This test was devised by Thurstone, who administered a comprehensive test to 3,400 college students who expressed their Liking (L), Indifference (I) or Dislike (D) for each of the items in the test.
He analysed the test scores and through the techniques of factor analysis, arrived at 8 factors of interest viz.;
(i) Commercial Interest,
(ii) Legal,
(iii) Athletic,
(iv) Academic,
(v) Descriptive,
(vi) Biological,
(vii) Physical Science,
(viii) Art.

Limitation of Interest Inventories:

1. Some of the tests reveal ability rather than interest. But interest is not the same thing as ability. So some tests are not fully valid or reliable.
2. The tests presuppose that the subject possesses a particular interest, but they can reveal only the interest present at the time of the test, not afterwards. The interests revealed may not remain permanent. Moreover, interests can also be cultivated: at the time of testing a particular interest may not have developed fully, yet it may develop afterwards. It has been seen that some interests develop during the vocation itself.
3. The interest inventories reveal facts on the basis of the report given by the subject. The accuracy of the report is still a problem. Some people do not reveal facts.
4. The questions in the inventories deal with certain types of activities, and not all these lead to clear-cut vocations. Again, there is much overlapping between one activity and another. An occupation is not one interest but a combination of activities or interests.
5. The predictive side of the inventories has also been tested. On investigation Proctor found that these have 25% permanence in school studies. Strong finds the correlation with future vocation to be 0.75, i.e., less than +1.
In spite of the above limitations, interest inventories are very useful in determining the future trends of the individual’s vocational life.
ACHIEVEMENT TESTS
The achievement tests that most people are familiar with are the standard exams taken by every student in school. Students are regularly expected to demonstrate their learning and proficiency in a variety of subjects. In most cases, certain scores on these achievement tests are needed in order to pass a class or continue on to the next grade level.

Examples of Achievement Tests

  • A math exam covering the latest chapter in your book
  • A test in your social psychology class
  • A comprehensive final in your Spanish class
  • A skills demonstration in your martial arts class
Each of these tests is designed to assess how much you know at a specific point in time about a certain topic. Achievement tests are not used to determine what you are capable of; they are designed to evaluate what you know and your level of skill at the given moment.
Achievement tests are widely used in a number of domains, both academic- and career-related. Students face an array of achievement tests almost every day. Such tests allow educators and parents to assess how their kids are doing in school, but also provide feedback to students on their own performance.

When Are Achievement Tests Used?

Achievement tests are often used in educational and training settings. In schools, for example, achievement tests are frequently used to determine the level of education for which students might be prepared. Students might take such a test to determine if they are ready to enter a particular grade level, or if they are ready to pass out of a particular subject or grade level and move on to the next.
Each grade level has certain educational expectations, and testing is used to determine if schools, teachers, and students are meeting those standards.

Measuring Socioeconomic Status and Subjective Social Status

One objective of the Stop Skipping Class campaign is to provide best practices for measuring socioeconomic status (SES) and subjective social status (SSS). 
An important determinant of the approach you will use to measure SES and SSS is the level at which you plan to assess its effects — the societal level, the community or neighborhood level, or the individual level.
Education can be measured using continuous variables (e.g., highest year of school completed) or categorical variables (e.g., a 1–6 scale indicating the highest level completed). Higher levels of education are often associated with better economic outcomes, as well as the expansion of social resources.

Income

Income can be measured in a variety of ways, including family income, assessments of wealth, and subjective assessments of economic pressure. At the neighborhood and societal level, federal poverty thresholds, supplemental poverty measures, and school- and neighborhood-level indicators of poverty can be assessed. Lack of income has been found to be related to poorer health, mainly due to reduced access to goods and services (such as health care) that can be beneficial to health.

Occupation

Occupation can be assessed by asking participants to note their current or most recent occupation or job title, or to indicate their occupational category from a list. Aside from financial benefits, employment can improve one’s physical and mental health and expand social networks. However, the nature of lower-SES positions can undermine these benefits, as the job itself may be hazardous or monotonous.
