# Validity And Reliability Of The Likert Scale Psychology Essay

Measurement is commonly defined as the process of determining the magnitude of a quantity, for example length, temperature, or mass. This magnitude is expressed in a unit of measurement, such as a metre, a degree Celsius, or a kilogram. Measurement is also described as a procedure for assigning symbols, letters, or numbers to empirical properties of variables. In research, however, measurement is defined as the process of observing and recording the observations that are collected as part of a research effort (http://www.socialresearchmethods.net/kb/measure.php). The terms “levels of measurement” and “scales of measure” normally refer to the theory of scale types developed by the psychologist Stanley Smith Stevens in 1946 (http://en.wikipedia.org/wiki/Level_of_measurement). In his article “On the theory of scales of measurement”, Stevens proposed that all measurement in science can be classified into four types of scale, namely “nominal”, “ordinal”, “interval” and “ratio”.

Measurement scales were first developed in psychology for measuring attitudes. Approaches to attitude measurement differ from one scholar to another. Early attitude measurement was championed by, among others, Louis Leon Thurstone in 1928, Louis Guttman in 1944, and Rensis Likert in 1932. Their measurement techniques are known as the Thurstone scale, the Guttman scale, and the Likert scale respectively.

In psychology, Thurstone was the first to develop a formal technique for measuring an attitude, and it was first applied to measure people's attitudes towards religion. The scale is built from issues that have been translated into statements, each of which possesses a numerical value. These numerical values are used to indicate respondents' judgments as either favorable or unfavorable. The Thurstone scale is normally used with agree-disagree responses. It is a scale of equal-appearing intervals constructed from a large pool of statements about an attitude, ranging from strongly negative through neutral to strongly positive. Besides being one of the first, Thurstone was one of the most renowned and productive scaling theorists. He invented three well-known methods: the method of equal-appearing intervals, the method of successive intervals, and the method of paired comparisons. These methods are used to develop a unidimensional scale.

Guttman, by contrast, measures attitude differently from Thurstone. The Guttman scale emphasizes the hierarchical order of the items measured: agreement with an item in the hierarchy normally implies agreement with the items below it. The scale is constructed with the scalogram analysis method, which analyses a large pool of attitude statements administered to a group of respondents. The respondents mark the items with which they agree, and the set of responses is then arranged into a hierarchy. A respondent who agrees with one particular item will also agree with the items ranked below it. The Guttman scale applies to series of items that have binary results, such as an achievement test. Also known as cumulative scaling or scalogram analysis, the Guttman scale establishes a one-dimensional continuum. It is used mostly in short questionnaires with constructs that are hierarchical and highly structured, such as surveys on relationship hierarchies.
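The cumulative property described above can be checked mechanically. The following is a minimal sketch, assuming the items are already ordered from lowest to highest rank and that responses are coded 1 (agree) and 0 (disagree). The function name `is_cumulative` and the sample respondents are hypothetical illustrations, not part of Guttman's scalogram procedure itself.

```python
def is_cumulative(pattern):
    """Return True if a 0/1 response pattern is a perfect Guttman pattern.

    Items are assumed to be ordered from lowest to highest rank, so a
    perfect pattern is a run of 1s followed only by 0s (e.g. 1, 1, 0, 0).
    """
    seen_zero = False
    for answer in pattern:
        if answer == 0:
            seen_zero = True
        elif seen_zero:
            # An agreement after a disagreement breaks the hierarchy.
            return False
    return True


# Hypothetical respondents answering four ordered items.
responses = {
    "respondent_a": [1, 1, 1, 0],
    "respondent_b": [1, 0, 0, 0],
    "respondent_c": [1, 0, 1, 0],  # agrees with item 3 but not item 2
}

consistent = {name: is_cumulative(p) for name, p in responses.items()}
```

In this sketch only `respondent_c` departs from the hierarchy, because agreement with a higher-ranked item is not accompanied by agreement with every item below it.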

Notwithstanding the above, one measurement that has been used widely by researchers is the Likert scale. It was first introduced to measure attitudes, under the title “A Technique for the Measurement of Attitudes”. The technique presents a set of attitude statements or issues. It is a popular scale in questionnaires and in survey research. Since the focus of this essay is the Likert scale, it is discussed further in the following sections.

## 2. Levels of measurement

In research, the level of measurement refers to the relationship among the values that are assigned to the attributes of a variable. The values representing those attributes are normally numbers. For instance, let us assume that school association is the variable and that its attributes are the English, Mathematics, Physics, and Geography associations. For the purpose of analyzing the outcome of this variable, we assign these attributes the numerical values 1, 2, 3, and 4. The level of measurement then describes the relationship among these four values, as depicted in Figure 1 below:

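Figure 1. Relationship among variable, attributes and values

| Variable | Attribute | Value |
| --- | --- | --- |
| School association | English association | 1 |
| School association | Mathematics association | 2 |
| School association | Physics association | 3 |
| School association | Geography association | 4 |

The relationship among the values 1 to 4 is what the level of measurement describes.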

In statistics, scales of measurement refer to the ways in which variables or numbers are defined and categorized, each with certain properties. These properties determine which statistical analyses may be used (http://www.webster.edu/~woolflm/statwhatis.html). In other words, the appropriateness of a statistical analysis depends on these properties. Typically, there are four scales of measurement: nominal, ordinal, interval, and ratio.

2.1. Nominal scale

This type of scaling is said to be the most basic form of measurement. It is a categorical scale whose basic rule is that different entities receive different values (Meyer, Gamst & Guarino, n.d). This type of scale is commonly used to classify individuals, firms, brands, products or any other entities into categories where order is not important (http://www.fao.org/docrep/W3241E/w3241e04.htm). For this reason the nominal scale is also known as a categorical scale, since it groups entities into clusters.

Taking Figure 1 as an example, the numbers only represent the attributes. It cannot be assumed that a higher number means “more” of something, or the reverse. For instance, in a nominal scale the number 4, which represents the Geography association, does not have a higher actual value than the number 2, which represents the Mathematics association. In a nominal scale the numbers have no arithmetic properties; they merely act as labels (http://www.fao.org/docrep/W3241E/w3241e04.htm). No quantitative dimension is implied in this type of scaling, nor is there any implication that one entity is in any way “more” than another (Meyer, Gamst & Guarino, n.d). To sum up, each observation belongs to its own category, and no observation represents “more” or “less” than another.
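The point that nominal codes are only labels can be illustrated with a short sketch. The codes follow the school association example, while the list of recorded codes is made-up data and the variable names are hypothetical. Counting and the mode are meaningful here, whereas the arithmetic mean of the codes, although computable, names no category.

```python
from collections import Counter

# Codes taken from the school association example; the data below are made up.
ASSOCIATION_LABELS = {1: "English", 2: "Mathematics", 3: "Physics", 4: "Geography"}

# Hypothetical association codes recorded for eight students.
codes = [1, 4, 2, 4, 3, 4, 2, 1]

# Counting and the mode are meaningful for nominal data.
counts = Counter(ASSOCIATION_LABELS[c] for c in codes)
most_common_label, most_common_count = counts.most_common(1)[0]

# The arithmetic mean can be computed, but it corresponds to no category.
mean_code = sum(codes) / len(codes)
```

Relabelling the associations in a different order would change `mean_code` but not the counts, which is one way to see that the numbers carry no arithmetic meaning.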

Upon completion of the questionnaires, each item response is analysed or, in certain cases, summed with the others to obtain a score. Individual Likert items can be treated as interval-level data, ordinal data, or in some cases nominal data, depending on how the items are arranged. When a Likert item has only five levels, the researcher cannot assume that respondents perceive all adjacent levels as equidistant, and the item is therefore regarded as ordinal data (http://en.wikipedia.org/wiki/Likert_scale). This agrees with Jamieson (2004), who states that Likert scales fall within the ordinal level of measurement because the response categories have a rank order but the intervals between values cannot be presumed equal. On the other hand, the wording of the response levels around a middle category (as in the example above) often clearly implies symmetry, so that such an item falls between ordinal-level and interval-level measurement. Treating it as merely ordinal would then be inappropriate, because information would be lost (http://en.wikipedia.org/wiki/Likert_scale). The advantage of treating individual Likert items as ordinal data is that the responses can be displayed in several kinds of statistical chart, including bar charts. For ordinal data, the median or the mode can be used as the measure of central tendency (Clegg 1998). The mean and the standard deviation are inappropriate for ordinal data (Blaikie 2003) because the numbers merely stand for verbal statements (Jamieson 2004).
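As a minimal sketch of the ordinal treatment described above, the following computes the median, the mode, and the frequency of each response level for a single Likert item. The five response labels, their coding from 1 to 5, and the nine responses are assumptions made for illustration only.

```python
import statistics

# Assumed coding of a five-level item; the labels are illustrative.
LEVELS = {
    1: "Strongly disagree",
    2: "Disagree",
    3: "Neither agree nor disagree",
    4: "Agree",
    5: "Strongly agree",
}

# Hypothetical responses of nine respondents to one Likert item.
item_responses = [4, 5, 2, 4, 3, 4, 1, 5, 4]

# Measures of central tendency that respect rank order only.
median_code = statistics.median(item_responses)
mode_code = statistics.mode(item_responses)

# Frequencies per level, the kind of summary a bar chart would display.
frequencies = {label: item_responses.count(code) for code, label in LEVELS.items()}
```

Both summaries can be reported as a verbal category (`LEVELS[median_code]`) rather than as a number, which keeps the interpretation at the ordinal level.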

Treating ordinal scales as interval scales has long been controversial. Responses to several Likert questions may be summed, provided that all the questions use the same Likert scale and that the scale is an approximation to an interval scale (http://en.wikipedia.org/wiki/Likert_scale). Santina (2003) and Hren (2004) published papers that described Likert-scale data using means and standard deviations, thereby assuming that Likert-type data are measured at the interval level. This controversy has led many researchers to argue about whether ordinal scales may be treated as interval scales; Jamieson (2004) is among the researchers who oppose this assumption. In addition, data from Likert scales are sometimes reduced to the nominal level by combining all “agree” and “disagree” responses into the two categories “accept” and “reject”. The chi-square, Cochran Q, and McNemar tests are common statistical procedures used after this transformation (http://en.wikipedia.org/wiki/Likert_scale).
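The two transformations mentioned above, summing items into a scale score and collapsing responses to nominal categories, can be sketched as follows. The response matrix is made-up data, the function name `collapse` is hypothetical, and keeping the midpoint as a separate "neutral" category is an assumption of this sketch, since the text mentions only "accept" and "reject".

```python
# Hypothetical responses: one row per respondent, one column per Likert item,
# all items coded 1 (strongly disagree) to 5 (strongly agree).
responses = [
    [4, 5, 4],
    [2, 1, 2],
    [3, 4, 5],
    [5, 5, 4],
]

# Summated (Likert scale) score: the sum of each respondent's item responses.
scale_scores = [sum(row) for row in responses]


def collapse(code):
    """Reduce a 1-5 response to a nominal category.

    Treating the midpoint as its own category is an assumption of this
    sketch; the text only mentions 'accept' and 'reject'.
    """
    if code >= 4:
        return "accept"
    if code <= 2:
        return "reject"
    return "neutral"


first_item_collapsed = [collapse(row[0]) for row in responses]
```

The collapsed categories could then be cross-tabulated for a test such as chi-square, at the cost of discarding the rank information in the original five levels.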

3.4. Validity and reliability

According to Thanasegaran (2009), reliability is the degree to which measures are free from error and therefore yield consistent results. In a nutshell, reliability is the consistency of a measurement procedure or measuring instrument. A measurement is considered reliable if the test procedure consistently yields the same result or score, so that the researcher can expect consistent outcomes, or outcomes with a constant deviation, across testing scenarios on the same instrument. Validity, by contrast, was defined by Gregory (1992) as “the extent to which (a test) measures what it claims to measure”. In other words, a measure is valid if it measures what it is supposed to measure. In scientific terms, validity refers to the degree to which evidence and theory support the interpretations of test scores. In classical measurement theory, the observed score is made up of a true score and an error score. The true score exists for the concept being measured but is rarely observable; the observed score is the score actually obtained, and it consists of the true score plus the error score. According to Bidin (2010), the error score consists of method error and trait error and is the difference between the observed score and the true score, whereas the true score is a perfect reflection of the true value for an individual and is also known as the theoretical score. Method error arises from problems in the testing situation or unfavourable characteristics of the test, whereas trait error arises from characteristics of the individual. Measurement error can therefore greatly reduce the ability to find the expected results in one's data and can greatly damage the interpretability of the scores derived from the testing instrument. For this reason the reliability and validity of measurement are crucial in research.
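The score decomposition described above can be written compactly. The symbols $X$ (observed score), $T$ (true score) and $E$ (error score) are notational assumptions for this sketch and do not appear in the cited sources. The final line is the standard classical-test-theory expression of reliability as a variance ratio, which assumes that true and error scores are uncorrelated.

```latex
\begin{align}
X &= T + E, \\
E &= E_{\text{method}} + E_{\text{trait}}, \\
\text{reliability} &= \frac{\operatorname{Var}(T)}{\operatorname{Var}(X)}
  = \frac{\operatorname{Var}(T)}{\operatorname{Var}(T) + \operatorname{Var}(E)}.
\end{align}
```

Read this way, a perfectly reliable measure has no error variance, and reliability falls towards zero as error variance comes to dominate the observed scores.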

Error of measurement has long been recognized as a major problem in social psychological research. It was noticed as early as Likert's own time, by the inventor of the scale himself (Bardo, Yeager, & Klingsporn, 1982). Their research on 4-, 5-, and 7-position Likert formats revealed that systematic error varied among formats; in particular, central tendency errors are likely to increase as the number of categories increases. A further issue with the Likert scale is scale reliability. Bardo, Yeager, & Klingsporn (1982) noted that scale reliability decreases as the number of choice-points exceeds two. Lissitz and Green (1975), on the other hand, found that Likert scale reliability increased as the number of scale points rose from 2 to 5. In addition, according to Gliem and Gliem (2003), a single-item scale is unreliable compared with a multi-item scale. As for validity, a study by Chang (1994) indicates that 4-point and 6-point formats are equally valid. According to Jacoby and Matell (1971), validity need not be considered when determining the number of steps in a Likert scale, because it has no consistent relationship with the number of scale steps used. Dawes (2008) noted that both simulation and empirical studies have concurred that reliability and validity are improved by using 5- to 7-point scales rather than scales with fewer points, but that more finely graded scales do not improve reliability and validity further.
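One common way to quantify the reliability of a multi-item scale is Cronbach's alpha. The text above does not name a specific coefficient, so its use here is an assumption of this sketch, and the function name `cronbach_alpha` and the response data are hypothetical. The formula used is the standard one: $\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_i s_i^2}{s_{\text{total}}^2}\right)$, where $k$ is the number of items, $s_i^2$ the sample variance of item $i$, and $s_{\text{total}}^2$ the sample variance of the summed scores.

```python
import statistics


def cronbach_alpha(rows):
    """Cronbach's alpha for rows of item scores (one row per respondent)."""
    k = len(rows[0])  # number of items
    if k < 2:
        raise ValueError("alpha is undefined for a single-item scale")
    items = list(zip(*rows))  # transpose: one tuple of scores per item
    item_variances = [statistics.variance(item) for item in items]
    total_variance = statistics.variance([sum(row) for row in rows])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical 5-point responses from five respondents to three items.
data = [
    [4, 5, 4],
    [2, 2, 3],
    [3, 3, 3],
    [5, 4, 5],
    [1, 2, 1],
]

alpha = cronbach_alpha(data)
```

The sketch raises an error for a single item because the coefficient needs at least two items, which is consistent with the point that a single-item scale gives no basis for estimating internal consistency.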

## 4. Conclusion

Measurement is very important in conducting research, and it is used widely in statistics in the form of scales of measurement. A scale of measurement specifies how variables or numbers are defined and categorized, and each scale has its own properties that determine which statistical analyses are appropriate. Stevens, the founder of the theory of scale types or “levels of measurement”, established that the scales of measurement are nominal, ordinal, interval, and ratio, in hierarchical order. The lowest in the hierarchy, nominal, has fewer mathematical properties than those higher up. Nominal gives data on categories, ordinal on sequences, interval reveals the magnitude between points on the scale, and ratio describes both order and the absolute distance between any two points on the scale. The Likert scale is a unidimensional scaling method often known as a “summative response scale”. It is commonly used in questionnaires and widely used in survey research. The Likert scale is often confused with the Likert item: a Likert scale is the sum of the responses to several Likert items. A Likert item is often presented as a horizontal line, in the manner of a visual analogue scale, on which the respondent indicates a response by circling or ticking a position. Normally the item has five ordered response levels, although 7-level and 10-level ordered responses are also used. Regardless of the number of levels, the items showed similar means, variances, skewness, and kurtosis when transformation was applied. Scale reliability has been reported to decrease as the number of choice-points exceeds two. Validity, on the other hand, need not be considered when determining the number of steps in a Likert scale, because it has no consistent relationship with the number of scale steps used. In addition, more finely graded scales do not further improve scale reliability and validity.