Saturday 4 June 2011

TYPES OF TEST ITEMS


A written test is composed of a number of test items.  Items differ on the basis of the type of response they involve.  In free response type test items, the respondent has the freedom to respond in his own way; in fixed response type test items he does not have this freedom.  Different types of test items come under these two categories.  Generally there are three types of test items, viz. objective type items, short answer type items and essay type items.
1.  Objective Type items
            An objective type test item is one in which the response and hence the scoring will be objective.  In order to ensure such objectivity, the responses are made fixed whereby the freedom of the respondent to deviate subjectively is restricted.  Objective type test items can be broadly classified into
                                 i.        Supply type (Recall type)
                               ii.        Selection type (Recognition type)
For supply type test items the respondents have to supply the response, whereas for selection type they have to select the response from among the given responses.  The generally used objective type items are true-false items, multiple choice items, matching type and completion type.  Of these, completion type items are of the supply type while the remaining belong to the selection type.
a)  True-false items: In this form, the respondent is asked to read a statement and indicate, in the specific manner suggested, whether it is true or false. When a number of such items are included in a test, some of them are true and the remaining ones false, arranged in random order.
It is advisable to avoid ambiguity in the statement as much as possible. For this, avoid words like all, never, always, frequently, in most cases, etc. Another point to be kept in mind is to use positive statements as far as possible. It is also desirable to make the statements given in a set of true-false items almost equal in length. Likewise, the number of true items and false items should be balanced.
The greatest drawback of true-false items is that the examinee can guess; even when he does not possess knowledge of the subject matter, he has a ½ probability of getting the item right.
b) Multiple Choice items: The multiple choice item has been designed to reduce the limitations of the true-false item to the minimum.  In this type, a question has to be answered by the respondent by selecting the correct response out of the alternative responses provided.  If carefully prepared, such items can be used to appraise even the most complex behavioural changes associated with instructional objectives.
            A multiple choice item has two parts: the stem and the options.  The stem is presented in the form of an incomplete sentence or an interrogative sentence. The number of options is generally four or five, out of which one is the correct answer and hence called the keyed response; all the other options are called distractors.
            It should be kept in mind that the distractors must be meaningfully connected to the stem, in the sense that they really distract the thought and mode of reasoning of the respondent. Another point is that the stem should be short and, at the same time, reflect the real problem in a precise manner.  Avoid unnecessary adjectives and complications that might confuse the examinee.  It is advisable to avoid negative statements as stems, as far as possible.
c) Matching type items: This is a modified form of the multiple choice item.  In fact, the matching type is an economical form of combining a number of multiple choice items in the same question.  It is more appropriate where the options for a number of stems are mutually related and sometimes repeated.  In this type the student has to associate a word or a phrase given in one column with an entry in another column containing the options.
d) Completion type items: A test item which requires the examinee to fill in a blank is called a completion test item.  He has to supply the word or phrase omitted. Care should be taken to avoid ambiguous statements.  The answer should be only a key word rather than unimportant details.
Advantages of Objective Type Items
  1. It will ensure the objectivity of the test.
  2. It will ensure the coverage of the content area.
  3. It reduces the influence of the learner's language expertise, speed of writing, handwriting, neatness, etc. on the score.
  4. It promotes economy of time, and ensures the practicability of the test.
  5. It will ensure the reliability of the test.
  6. It will ensure the validity of the test.
  7. A well prepared objective type item will be able to evaluate the higher order objectives also.
Limitations of Objective Type Items
  1. It is very difficult to prepare a good objective type test item. It demands much time, talent and imagination from the examiner.
  2. It creates a problem of guessing. 
  3. It creates an opportunity for cheating (copying answers from other scripts).
  4. This type of test item is not able to check the depth of the knowledge of the learner.
  5. It is very difficult to test complicated skills of the examinee with objective type items.
  6. It demands high printing cost.
2. Short Answer type items
            A question having two or three value points at the most may be regarded as a short answer type test item.  The term value point indicates a point in the expected answer to which credit is given.
Advantages
  1. A relatively large proportion of the content can be covered in a test.
  2. It is easy to construct.
  3. It provides little opportunity for guessing because the examinee is required to supply specific information.
  4. If proper care is given at the time of preparing the items, objectivity can be ensured to a certain extent.
Limitations
  1. It is more subjective than the objective type test item.
  2. Excessive use of this type of test item may encourage rote memorization by learners.
  3. Mechanical scoring is not possible.



3. Essay type items
            This is the conventional type of question, calling for a rather long answer covering a number of points and a variety of objectives.  This type of test item helps in evaluating complex skills and other behavioural patterns.
Advantages
  1. It is easy to prepare.
  2. The method of administering the test items is rather simple.
  3. It is useful to measure complex skills.
  4. It is very useful to evaluate the depth of knowledge of the examinee.
  5. The possibility of guess work and spot copying can be reduced to a great extent.
  6. Comprehensiveness of the subject matter is ensured.
Limitations
  1. They are very subjective in nature; mark allotment will vary widely.
  2. It is very difficult to ensure the comprehensiveness of the content area.  It is very difficult to prepare essay type items from every content area.
  3. Linguistic ability, speed of writing, handwriting, neatness, etc. of the examinee greatly influence the scores in this type of test item.
  4. It is very difficult to administer every time.
  5. Proper evaluation of the specific abilities is very difficult.
  6. It is very difficult to ensure the reliability and validity of this type of test item.


Reliability



Reliability of a tool refers to the degree of consistency and accuracy with which it measures what it is intended to measure.  If the evaluation gives more or less the same result every time it is used, such evaluation is said to be reliable. Consistency of a tool can be improved by limiting subjectivity of all kinds.  Making items on the basis of pre-determined specific objectives, ensuring that the expected answers are definite and objective, providing a clearly spelt-out scheme for scoring and conducting evaluation under identical and ideal conditions will help in enhancing reliability.
            Reliability is expressed numerically, usually as a coefficient. A high coefficient indicates high reliability and vice versa. If a test were perfectly reliable, the coefficient would be 1. However, perfect reliability is impossible in practical situations.
Following are the most commonly used methods for assessing the reliability of a test. 
1.      The Test-Retest method
In this method the same test is re-administered shortly after the first administration, and the two sets of scores are correlated to obtain the reliability of the test.  The chief disadvantage of this method is that if the time interval between the two administrations is short, memory effects, practice, and confidence induced by familiarity with the test material may lead to an overestimate of the reliability of the test.  On the other hand, if the time interval is long, real changes in terms of growth may lead to an underestimate of the reliability.  The test-retest method is generally less useful than the other methods. The procedure for determining test-retest reliability is basically very simple (a short computational sketch is given after the steps below):
a.       Administer the test to an appropriate group.
b.      After some time has passed, say a week, administer the same test to the same group.
c.       Correlate the two sets of scores.
d.      Evaluate the results.
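As an illustration of step c, here is a minimal computational sketch; the scores are hypothetical, and the helper is simply an ordinary Pearson correlation coefficient, not a formula prescribed by the text:

```python
# Minimal sketch: test-retest reliability as the Pearson correlation
# between two administrations of the same test (hypothetical data).

def pearson_r(x, y):
    # Pearson product-moment correlation between two score lists.
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

first_administration = [42, 37, 55, 48, 60, 33, 51, 45]    # scores, first sitting
second_administration = [44, 35, 57, 47, 58, 36, 50, 46]   # same pupils, a week later

reliability = pearson_r(first_administration, second_administration)
print(f"Test-retest reliability coefficient: {reliability:.2f}")
```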
2.         The Equivalent/ Alternative/ Parallel Form Method
In this method two equivalent or parallel tests are prepared and administered to the same group of subjects, and the two sets of test scores are correlated to obtain the reliability of the test.
Here the two forms of the test are identical in every respect: they measure the same objectives, are drawn from the same content area, and have the same number of items, the same structure, the same difficulty level and the same directions for administration and scoring; only the actual items differ.
Care should be taken to match test materials for content, objectives, difficulty and form and precautions must be taken not to have the items in the two forms too similar.
Even though it is quite difficult to prepare two forms of a test that are exactly parallel in every sense, this is one of the widely used methods of establishing reliability.
The following procedure is adopted for this method:
a.       Prepare two forms of the test with the same design and blueprint.
b.      Administer one form of the test to an appropriate group.
c.       At the same session, or shortly thereafter, administer the second test to the same group.
d.      Correlate the two sets of scores.
e.       Evaluate the results.
3.      The Split-half method
In this method, the test is divided into two equivalent halves, both of which are administered to the same group in a single sitting. The scores on the two halves are correlated, and this half-test coefficient is then converted into the reliability of the whole test by using the Spearman-Brown Prophecy formula.




Spearman-Brown Prophecy formula:  r(whole test) = 2 × r(half test) / (1 + r(half test)), where r(half test) is the correlation between the scores on the two halves.

The items of the test can be divided into two sets in a variety of ways.  This method measures the internal reliability of the test; if the two halves do not correlate highly, it suggests that they are not measuring the same thing.
The procedure for determining reliability under the split-half method is as follows (a brief computational sketch follows the steps):
a.       Split the test into two equivalent halves according to any approved or logical order (the most common approach is to divide the items into odd-numbered and even-numbered halves)
b.      Administer the whole test in a single sitting to the same group
c.       Compute each subject's two half-test scores separately
d.      Correlate the two sets of scores.
e.       Convert the coefficient of the half test into that of the whole test by using the formula.
f.        Evaluate the result
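To make the odd-even split and the Spearman-Brown step concrete, here is a minimal sketch along the same lines as the test-retest example; the item-wise scores are hypothetical:

```python
# Minimal sketch: split-half reliability with the Spearman-Brown correction.
# Each row holds one pupil's item-wise scores (1 = correct, 0 = wrong); hypothetical data.

def pearson_r(x, y):
    # Pearson correlation, same helper as in the test-retest sketch above.
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

item_scores = [
    [1, 0, 1, 1, 0, 1, 1, 0, 1, 1],
    [0, 0, 1, 0, 1, 0, 1, 0, 0, 1],
    [1, 1, 1, 1, 1, 0, 1, 1, 1, 1],
    [0, 1, 0, 0, 1, 0, 0, 1, 0, 0],
    [1, 1, 1, 0, 1, 1, 1, 1, 0, 1],
    [0, 0, 0, 1, 0, 1, 0, 0, 1, 0],
]

# Odd-even split: items 1, 3, 5, ... form one half; items 2, 4, 6, ... the other.
odd_half = [sum(row[0::2]) for row in item_scores]
even_half = [sum(row[1::2]) for row in item_scores]

r_half = pearson_r(odd_half, even_half)
r_whole = (2 * r_half) / (1 + r_half)   # Spearman-Brown Prophecy formula
print(f"Half-test correlation: {r_half:.2f}, whole-test reliability: {r_whole:.2f}")
```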

Characteristics of a Good Evaluation Tool


1.  Objective-basedness:  Evaluation is making a judgement about some phenomenon or performance on the basis of some pre-determined objectives.  Therefore a tool meant for evaluation should measure attainment in terms of criteria determined by instructional objectives.  This is possible only if the evaluator is definite about the objectives, the degree of realization of which he is going to evaluate. Therefore each item of the tool should represent an objective.
2.  Comprehensiveness: A tool should cover all the points expected to be learnt by the pupils.  It should also cover all the pre-determined objectives.  This is referred to as comprehensiveness.
3.  Discriminating power: A good evaluation tool should be able to discriminate among the respondents on the basis of the phenomena measured.  Hence, while constructing a tool for evaluation, the discriminating power has to be taken care of. This may be at two levels: first for the test as a whole and then for each item included.
4.  Reliability: Reliability of a tool refers to the degree of consistency and accuracy with which it measures what it is intended to measure.  If the evaluation gives more or less the same result every time it is used, such evaluation is said to be reliable. Consistency of a tool can be improved by limiting subjectivity of all kinds.  Making items on the basis of pre-determined specific objectives, ensuring that the expected answers are definite and objective, providing a clearly spelt-out scheme for scoring and conducting evaluation under identical and ideal conditions will help in enhancing reliability. The test-retest method, the split-half method and the equivalent form or parallel form method are the important methods generally used to determine the reliability of a tool.
5.  Validity: Validity is the most important quality needed for an evaluation tool.  If the tool is able to measure what it is intended to measure, it can be said that the tool is valid.  It should fulfil the objectives for which it is developed.  Validity can be defined as "the accuracy with which it measures what it is intended to measure", or as the degree to which it approaches infallibility in measuring what it purports to measure. Content validity, predictive validity, construct validity, concurrent validity, congruent validity, factorial validity, criterion-related validity, etc. are some of the important types of validity which a tool for evaluation needs to satisfy.
6.  Objectivity: A tool is said to be objective if it is free from personal bias in interpreting its scope as well as in scoring the responses. Objectivity is one of the primary pre-requisites for maintaining all other qualities of a good tool.
7.  Practicability: A tool, however well it satisfies all the above criteria, may be useless if it is not practically feasible. For example, suppose, in order to ensure comprehensiveness, it was felt that a thousand items should be given, to be answered in ten hours.  This might yield a valid result, but from a practical point of view it is quite impossible.

TOOLS FOR EVALUATION


1.  Check-list: It is a two-dimensional chart; on one side the traits or other phenomena intended to be measured are noted, and on the other the name or identification number of the student is written. Observations are made by the teacher with predetermined objectives, both within and outside the class. The results of the observations are recorded against each of the behaviours noted, in the case of the particular student.
2.  Rating Scale: It is a modified check-list.  In a check-list, the variables observed are merely checked as either yes or no.  There is no provision for determining the degree to which a variable exists.  This limitation is overcome here.  Rating means the judging of one person by another.  A rating scale is a scale with a set of points which describe varying degrees of the dimensions of the attribute being observed.  It shows how much, or how well, a particular behaviour is exhibited by the pupil. Such precise rating will help the teacher to select the most appropriate students who can be entrusted with responsible tasks, and also to record the levels at the time of certification. It may be a three point scale, a five point scale, a seven point scale, a nine point scale or even an eleven point scale.
3.  Inventory: An inventory is a tool usually used to assess personality traits, interests, attitudes, etc. through the responses of the subjects evaluated to questions or statements depicting opinions on the issues concerned. It is a self-appraising tool. It may be in the form of a questionnaire, an opinionnaire or an attitude scale. (A questionnaire is used when factual information is desired.  When opinions rather than facts are desired, an opinionnaire or attitude scale is used.) In certain cases, the respondent is asked to arrange a set of given items in order of preference. This type of inventory is normally used to determine the interests of the individual in a specific area.
4.  Test: It is the most popular tool for evaluation and the most important tool for the classroom teacher. A test is defined as a series of questions on the basis of which some information is sought.  It may be constructed for the purpose of measuring the achievement of the pupil, or to identify his weaknesses, or to measure his intelligence, aptitude, attitude and so on.
Tests may be practical or performance tests, oral tests or written tests.
            Performance tests include activities involving psychomotor behaviour.  In such a test, the examinees are generally required to do something, using objects or apparatus given to them.  Setting up an apparatus, conducting an experiment, developing a pattern using geometrical shapes, arranging given objects according to a required sequence or pattern, etc. may be cited as examples. Over and above these, performance tests are used for measuring intelligence, interests, personality traits, etc. as well.
            In oral tests the examiner asks questions or presents test items prepared in advance, demanding explanations from the examinees, to which they respond orally.  Measurement of the desired objective is done on the basis of these oral responses.  There are many skills that can be assessed by such tools.  The ability to read with speed, accuracy and good pronunciation can be measured only orally.  The same is the case with speaking with correct stress, accent, etc. These can never be tested effectively by written tests.
            The written test is obviously the most popular type of test, universally used for evaluating students.  This is because of its convenience and economy.  The scripts can be scored and evaluated at the evaluator's convenience.  It is because of these practical considerations that written tests have become so popular.

PURPOSE OF EDUCATIONAL EVALUATION


In the realm of education, evaluation has to serve a number of functions. The following are some of the important ones:
1.     Assessment Function: Developmental education aims at maximizing the output in the form of total and transferable development of the learner.  Evaluation will have ultimately to assess the final performance of the learner, the value judgement being made in terms of the quantity and quality of the total attainment with respect to a specific curriculum area.
2.     Diagnosis Function: Development is dynamic, as it is a progressive movement towards the desired goal. The goal can be reached only if the progress is checked and monitored at every stage, so that immediate goals are appropriately realized, which in turn will ensure the realization of the final goal.  This verification of progress is attempted through immediate feedback leading to knowledge of results, and is intended to identify or detect a variety of factors. On the basis of this detection, difficulties can be spotted and remedial measures can be taken to overcome them.
3.     Placement Function: Evaluation may be conducted to determine whether the student can be promoted to a further stage, based on the realization of the expected level.  This function of the final evaluation is called the placement function.
4.     Prediction Function: Another important function of evaluation is to predict whether the student is capable of attaining the anticipated objective or not.  If a student succeeds in the evaluation, it can be predicted that he will be a successful candidate. On the other hand, if a student is found to be too poor, it can be predicted that he will fail.

TYPES OF EVALUATION


Formative Vs Summative Evaluation
            The content to be taught is presented in the form of small units at the time of teaching, with the intention of facilitating easy assimilation. At the end of each such unit, learners have to be evaluated with respect to the anticipated objectives.  Evaluation of this type is meant for maximizing the output by giving immediate feedback. This ensures diagnosis and remediation, wherever needed. This type of evaluation is developmental or formative in nature and hence is often designated as formative evaluation.
            The success of the working of a system is decided by how far the expected outcome could be realized.  It is this function of evaluating the final outcome that is served by summative evaluation.  As the term indicates, summative evaluation is done at the end of something attempted.  It may be conducted at the end of a unit, at the end of a term, at the end of an academic year and so on.



Internal Vs External Evaluation
            Value judgement of the performance of the learners may be made by internal or external agencies. Internal evaluation of pupils' performance refers to the evaluation made by the individuals or institutions responsible for instruction or curriculum transaction. Class tests, term-end evaluations, etc. are examples of internal evaluation.  On the other hand, external evaluation is the evaluation done by an external agency at the end of a course for the purpose of certification, placement, etc.  The S.S.L.C. examination, the Higher Secondary examination, etc. are examples of external evaluation.

Criterion referenced Vs Norm referenced Evaluation
            Sometimes the value judgement has to be made with reference to predetermined and well-defined specific goals or criteria.  The objective of such evaluation is to check whether the criteria are attained or not. Evaluation done on the basis of selected criteria indicating specific changes brought about in the learner is said to be criterion referenced evaluation. Here the importance is given to mastery learning.
            Whenever the attainment of the individual has to be compared with that of a group, or such a comparison is to be made among different sub-groups within a given group, we look for expected standards or norms that would help to judge the performance of the individuals or sub-groups against that of the target group.  These norms will be widely accepted and well defined in the form of grades or scores.  An evaluation based on these approved norms is said to be norm referenced evaluation. For example, a student may be certified to have passed the S.S.L.C. examination with distinction, first class or second class on the basis of norms given as score percentages (80%, 60%, 50%) or grades (A+, A, B+, etc.).
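As a simple illustration of applying such norms, the sketch below maps a percentage score to a class label; the cut-offs follow the example above, and the function name is purely illustrative:

```python
# Minimal sketch: judging a score against fixed, publicly known norms.
# Cut-offs follow the example above: 80% distinction, 60% first class, 50% second class.

def classify(percentage):
    if percentage >= 80:
        return "Distinction"
    elif percentage >= 60:
        return "First Class"
    elif percentage >= 50:
        return "Second Class"
    return "Below Second Class"

print(classify(72))   # -> First Class
```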

Revised Taxonomy

Later, Bloom's Taxonomy was revised by his disciple Anderson and his fellow worker Krathwohl.......

It can be viewed in the following link