RESEARCH METHODS AND BIOMETRY

 SA’ADATU RIMI COLLEGE OF EDUCATION, KUMBOTSO

SCHOOL OF SCIENCE EDUCATION


DEPARTMENT OF BIOLOGY


RESEARCH METHODS AND BIOMETRY

Meaning, purpose and relevance of research methods and biometry

Research methods are the techniques used to investigate phenomena through experimentation. An experiment is carried out to investigate a phenomenon, which can lead to the acquisition of new knowledge, or to confirm or correct a previous finding.

Biometry, biometrics or biostatistics is the application of statistics to a wide range of topics in biology and its related disciplines, particularly agriculture and medicine. The name distinguishes it from statistics used in other fields of human endeavour, e.g. economics, business and education. The subject encompasses the design, layout and conduct of experiments; the collection, compilation and analysis of numerical data; and the interpretation of the results so that valid conclusions can be drawn for a particular situation.

Biometry or biostatistics is relevant to biology, agriculture and medicine, where the subjects of inquiry are variable in nature and the primary concerns are the extent of variation and the relative importance of its different causes. Three eminent scientists, Sir R.A. Fisher, S.D. Wright and J.B.S. Haldane, were the founding fathers of biometry.

TYPES OF RESEARCH

Generally, research is divided into two main types:

Qualitative Research and 

Quantitative Research 

Qualitative Research: a type of research that aims to investigate a question without attempting to measure variables quantitatively or to look for potential relationships between variables. An example of qualitative research is the case study.

Quantitative Research: a systematic empirical investigation of quantitative properties and phenomena and their relationships. It asks a narrow question and collects numerical data that are analysed using statistical methods. Examples include experimental, correlational and survey research.

Choice of Research Topic 

The most successful research topics are narrowly focused and carefully defined. Before a researcher chooses a research topic, several factors must be taken into consideration. Some have to do with the researcher's particular interests, capabilities and motivation. Others concern areas that will be of great interest to both the academic and private sectors.

The chemistry professor and author Robert Smith, in his book Graduate Research: A Guide for Students in the Sciences (1984), listed some points to consider in finding and developing a research topic:

Can it be enthusiastically pursued? 

Can interest be sustained by it? 

Is the problem solvable? 

Is it worth doing? 

Will it lead to other research problems? 

Is it manageable in size? 

If the problem is solved, will the result be reviewed well by scholars in your field?

Are you, or will you become, competent to solve it?

What is the potential for making an original contribution to the literature in the field? 

Hypothesis (Types, Sources, Formulation)

A hypothesis is one of the concepts most frequently used by scientists and other researchers in their scientific writing and work.

A hypothesis is a proposed explanation for a phenomenon. It is simply defined as an educated guess based on observation of an event or phenomenon.

That is, it offers an explanation of why or how an observed event happens, or what makes it happen, that has not yet been proved. It is therefore based on intuition or reasoning. Most hypotheses can be supported or rejected by experimentation or continued observation.

Types of Hypothesis 

The types of hypothesis include the null hypothesis, the alternate hypothesis and the scientific hypothesis. The null and alternate hypotheses are the two types found in statistical hypothesis testing; one is often just the contradiction of the other. A scientific hypothesis is commonly known as an educated guess: it is a proposed scientific explanation that has not yet been proven to be true.

The null hypothesis always predicts the absence of a relationship between two variables, e.g. “there is no relationship between education and income”.

The alternate hypothesis states an actual expectation, such as “higher levels of education increase the likelihood of earning a higher income”.

Sources of Hypothesis 

Hypothesis can be derived from many sources: 

Theory: A theory on the subject can act as a source of hypotheses. For example, from the theory that providing employment opportunities is an indicator of the social responsibility of a government enterprise, many hypotheses can be deduced:

Public enterprises have greater social concern than other enterprises.

People’s perception of government enterprises is a social concern.

Observation: People’s behaviour is observed, and the observed behaviour is used to infer attitudes.

Past experience: Here the researcher draws on past experience to formulate the hypothesis.

Case studies: Published case studies can be used as a source of hypotheses.

Formulating Hypothesis 

A hypothesis can be explained as a statement that can be used to predict the outcome of future observations. It states what you predict will happen based on changes in variables. Variables are things that change. A hypothesis is stated in order to be tested by an experiment. It can come as an answer or explanation to the question you raised at the beginning. It can also be a statement that explains or describes how you think the observations you made came about, that is, the mechanism underlying the phenomenon.

Once you have identified your research question, it is time to formulate your hypothesis. An experiment is then carried out to test the validity of the hypothesis, leading to its acceptance or rejection. Hypotheses are formed in order to be challenged, and when they are not rejected they are accepted.

Formulating a valid hypothesis takes practice, but it is important to the success of any experiment. The following steps need to be considered when formulating a hypothesis.

State your hypothesis in the form of a question, e.g. “Does smoking cause lung cancer?”

Formulate the hypothesis by making it a conditional statement, e.g. “Smoking may cause lung cancer”.

Write a formalized hypothesis, e.g. “If smoking causes lung cancer, then individuals who smoke will have a higher frequency of lung cancer”. This “if ... then” form is considered the most useful.

Make sure that your hypothesis contains variables. The researcher is always in control of the independent variable in the experiment, while the dependent variables are observed in the context of the experiment. For an experiment to be valid, it must contain at least two variables.

Make sure that your hypothesis includes a subject group. A subject group defines who or what the researcher is studying. In our example the subject group is smokers.

Include a treatment or exposure in the experiment. A treatment is literally what is being done to the subject group. In our example, the exposure is smoking.

Prepare an outcome measure, which is concerned with how the effect of the treatment is going to be assessed. In our example the outcome measure is the frequency of smokers developing cancer in the subject population.

Understand your control group. A control group is a group similar to the subject group that does not receive the treatment, i.e. the population to which the subject group is compared. In our example, the control group is non-smokers.





DATA COLLECTION (TYPES AND SOURCES)

Data (singular: datum) are numerical statements of fact in a specific field of enquiry. Simply put, data means numbers, and these numbers are obtained from research carried out in a specific area of investigation. For example, a researcher may be interested in finding out how many students were admitted into the various departments in S.R.C.O.E Kano annually over the last five years (2008 – 2013). This will involve checking the records at students’ affairs and counting the number of students admitted into each department during the stated period. The number counted for each department is then recorded, and this record forms the data.

Research may also involve distributing questionnaires and collecting responses from the respondents. For example, the ministry of health, in an attempt to popularize the use of treated nets, may want to know how many people use them in an area known to have a high prevalence of malaria. The researcher may have to cover the area with questionnaires and use them to count how many people use the nets, how many use other methods of prevention and how many do not use any method. This also forms data on which further analysis can be carried out. A study may likewise involve taking measurements of blood pressure, parasite count, height, weight, yield, temperature, humidity, etc. The measurements recorded for these parameters all constitute data.

Therefore, data may be collected by two methods: counting or taking measurements. Data collected by counting (e.g. the number of students in SRCOE) are whole numbers (1, 2, 5, 8, 11 and so on) and are referred to as DISCRETE or DISCONTINUOUS data. Data collected by taking measurements (such as blood pressure), on the other hand, are termed CONTINUOUS data and may take values such as 2.3, 2.34 or 2.345, depending on the accuracy of the measuring device. It is very important to know whether the data being collected are discrete or continuous, because this is an important determinant of the kind of statistical test to be used for data analysis.

Population, Sample and Sampling Techniques

Due to limitations of resources and time, as well as variation in biological materials, it is difficult to collect information from the whole population during research. Therefore, during an investigation, information or facts are collected from a sample. A sample is defined as a true representation of the entire population. When collecting information, sampling must be done randomly and without bias, and the sample must be large enough to be a good representation of the whole population. The sample size must be determined before the commencement of the experiment or investigation.

Sampling Techniques 

This refers to the techniques of taking a portion of the population as a representative of that population. The researcher has to decide whether every member of the population will be studied or only a part of it, i.e. a sample. In most areas, sampling is done when the researcher wants to generalize the findings of the research to the entire population. The following techniques are employed when sampling:

Simple random sampling 

Systematic sampling 

Stratified sampling 

Cluster sampling 

Simple Random Sampling 

This is a method of selecting a sample from a population so that all members of the population have an equal chance of being selected. No member of the population is omitted deliberately; a member is left out only by chance. It is mostly applied when the population is homogeneous. For example, suppose a researcher wants to choose 10 students at random out of a population of 50 within a given class: each student has the same chance of being included in the sample.
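The class example above can be sketched in Python as follows. This is only an illustration: the student numbers 1 to 50 and the seed value are assumptions made for the sketch, not part of the course material.

```python
import random

# Hypothetical class list: students numbered 1 to 50 (the population).
population = list(range(1, 51))

# A seeded generator makes the draw repeatable; the seed value is arbitrary.
rng = random.Random(2013)

# Draw 10 different students without replacement.
# Every student has the same chance (10 out of 50) of being selected.
sample = rng.sample(population, k=10)

print(sorted(sample))
```

Drawing lots from a hat or using a table of random numbers achieves the same thing by hand.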

  

Stratified Sampling: Stratified sampling is applied when you are dealing with a heterogeneous population. It involves grouping members of the population into mutually exclusive groups before sampling. The researcher first identifies the strata of interest and divides the population into sub-groups or strata, depending on the number and types of sub-groups that exist in the population and the objectives of the study. For example, stratification may be based on gender (male and female), income, tribe, religion, etc. The idea behind stratified sampling is to examine each sub-group separately.
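A minimal sketch of stratified sampling is given below. The population of 50 students, the gender labels and the 20% sampling fraction are all hypothetical, and taking the same fraction from every stratum (proportional allocation) is only one possible choice.

```python
import random

# Hypothetical population: 50 students, each tagged with a stratum (gender).
# This rule gives 20 females and 30 males.
population = [(f"S{i:02d}", "female" if i % 5 < 2 else "male") for i in range(1, 51)]


def stratified_sample(units, fraction, rng):
    """Draw the same fraction at random from every stratum."""
    strata = {}
    for name, stratum in units:
        strata.setdefault(stratum, []).append(name)
    chosen = {}
    for stratum, members in strata.items():
        k = round(len(members) * fraction)
        chosen[stratum] = rng.sample(members, k)
    return chosen


result = stratified_sample(population, fraction=0.2, rng=random.Random(1))
for stratum, members in result.items():
    print(stratum, len(members), members)
```

Each stratum is sampled separately, so both sub-groups are guaranteed to appear in the sample.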

Systematic Sampling: Systematic sampling is best applied to a homogeneous population. It involves selecting sample units at regular, equally spaced intervals from a population that is systematically arranged, for instance in alphabetical order or in some natural sequence. It is employed when all the members of the population can be put in order. For example, systematic sampling can be used to select 100 people out of 300 by taking every third person.
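The example of selecting 100 people out of 300 can be sketched as below. The numbering of people from 1 to 300 and the seed are assumptions; the random starting point within the first interval is a common refinement rather than something stated in these notes.

```python
import random

# Hypothetical ordered population: 300 people numbered 1 to 300.
population = list(range(1, 301))
sample_size = 100

# Sampling interval k = population size / sample size = 300 / 100 = 3.
interval = len(population) // sample_size

# Pick a random start within the first interval, then take every k-th person.
start = random.Random(7).randrange(interval)
sample = population[start::interval]

print(len(sample), sample[:5])
```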

Cluster Sampling: Cluster sampling is used when the population to be sampled is vast and spread over a wide geographical area. The population, habitat or field is first divided into smaller groups or clusters, for example states within the country or local government areas within a state, and some of these clusters are randomly selected to form the sample. Every member of each selected cluster is taken as part of the sample and included in the study. Each cluster chosen must be representative of the population.
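Cluster sampling can be sketched in the same way. The four clusters and their members below are invented labels used purely for illustration.

```python
import random

# Hypothetical clusters: four local government areas and their households.
clusters = {
    "LGA-A": ["A1", "A2", "A3"],
    "LGA-B": ["B1", "B2"],
    "LGA-C": ["C1", "C2", "C3", "C4"],
    "LGA-D": ["D1", "D2", "D3"],
}

rng = random.Random(3)

# Step 1: select whole clusters at random (here 2 out of 4).
chosen_clusters = rng.sample(sorted(clusters), k=2)

# Step 2: every member of each selected cluster enters the sample.
sample = [unit for name in chosen_clusters for unit in clusters[name]]

print(chosen_clusters, sample)
```

The random choice is made among clusters, not among individual members, which is what distinguishes this method from simple random sampling.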


DATA PRESENTATION

Raw data, that is, data recorded directly from the laboratory or field in the record book, are unorganized, which makes it difficult to comprehend them and draw information from them. Therefore, raw data collected from an experiment have to be properly organized, summarized, presented and statistically analysed before any meaningful interpretation can be made.

The first step in organizing data is the preparation of an ordered array, which means arranging the data in ascending order. An ordered array enables one to determine quickly the smallest and the largest measurements, as well as other facts that may be needed from the given data. Data may be organized and summarized through the following processes.

Organize the data by putting them in tables such as frequency tables, and other forms of summary tables. 

Illustrate the data using graphs such as histograms, bar charts, line graphs, etc.

Analyse the data using a mathematical approach.

Once the data are summarized, an appropriate statistical test can be carried out in order to draw conclusions about the investigation.

 Consider the following example. 

Example 1.0: Number of pods produced per plant by a certain species of legume at Kadawa research station.

69 74 71 74 71 71 

69 72 72 78 72 68 

70 67 72 70 75 72

71 70 74 73 75 75

75 74 74 73 74 72

67 76 75 75 68 78

69 68 75 78 69 78

73 69 75 68 71 78

Because the above data are unorganized, it is difficult to draw information from them, so an ordered array can be prepared as follows:

Table 1.0 

67 67 68 68 68 68

69 69 69 69 69 70

70 70 71 71 71 71

71 72 72 72 72 72

72 73 73 73 74 74

74 74 74 74 75 75

75 75 75 75 75 75

76 78 78 78 78 78

From this ordered array one can easily find out the minimum number of pods/plant (67) or the maximum (78). 
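As an illustration, the ordered array can be produced from the raw data of Example 1.0 with a short Python sketch. The variable names are arbitrary.

```python
# Raw data of Example 1.0: number of pods per plant for 48 plants.
pods = [
    69, 74, 71, 74, 71, 71,
    69, 72, 72, 78, 72, 68,
    70, 67, 72, 70, 75, 72,
    71, 70, 74, 73, 75, 75,
    75, 74, 74, 73, 74, 72,
    67, 76, 75, 75, 68, 78,
    69, 68, 75, 78, 69, 78,
    73, 69, 75, 68, 71, 78,
]

# The ordered array is the data sorted in ascending order.
ordered = sorted(pods)

# Print the array six values per row, as in Table 1.0.
for i in range(0, len(ordered), 6):
    print(*ordered[i:i + 6])

print("minimum:", ordered[0], "maximum:", ordered[-1])  # 67 and 78
```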

However, an ordered array may not be adequate to provide all the information needed from the data. Further summary may therefore be required, and one way is to tabulate the data according to frequencies.



The Frequency Distribution 

Classifying data according to frequencies enables further summary and facilitates the computation of various descriptive measures such as percentages, averages and so on. To get the frequencies, the first step is to find the range of the figures in the table; you then take a tally and record the score for each number, that is, the frequency of occurrence of each number. In Example 1.0 the range is from 67 to 78, and the frequencies are obtained as follows:

Table 1.2: A tally of the frequency of occurrence of number of pods/plant

No. of pods/plant   Tally       Frequency
67                  II          2
68                  IIII        4
69                  IIIII       5
70                  III         3
71                  IIIII       5
72                  IIIII I     6
73                  III         3
74                  IIIII I     6
75                  IIIII III   8
76                  I           1
77                  -           0
78                  IIIII       5
TOTAL                           48



This frequency can then be presented in a table which is referred to as frequency distribution table. 

Table 1.3: Frequency distribution of the number of pods produced per plant

No. of pods/plant (x)   Frequency (f)   Relative frequency (%)
67                      2               2/48 x 100 = 4.17
68                      4               4/48 x 100 = 8.33
69                      5               5/48 x 100 = 10.42
70                      3               3/48 x 100 = 6.25
71                      5               5/48 x 100 = 10.42
72                      6               6/48 x 100 = 12.50
73                      3               3/48 x 100 = 6.25
74                      6               6/48 x 100 = 12.50
75                      8               8/48 x 100 = 16.67
76                      1               1/48 x 100 = 2.08
77                      0               0/48 x 100 = 0.00
78                      5               5/48 x 100 = 10.42
Total                   48              100
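The tally and the relative frequencies can be checked with a short Python sketch. It uses the raw data of Example 1.0; `collections.Counter` does the counting that the tally does by hand.

```python
from collections import Counter

# Raw data of Example 1.0: number of pods per plant for 48 plants.
pods = [
    69, 74, 71, 74, 71, 71,
    69, 72, 72, 78, 72, 68,
    70, 67, 72, 70, 75, 72,
    71, 70, 74, 73, 75, 75,
    75, 74, 74, 73, 74, 72,
    67, 76, 75, 75, 68, 78,
    69, 68, 75, 78, 69, 78,
    73, 69, 75, 68, 71, 78,
]

counts = Counter(pods)  # frequency of each value
n = len(pods)           # total number of observations (48)

# Walk through the whole range so that values with zero frequency (77) also appear.
for value in range(min(pods), max(pods) + 1):
    f = counts[value]
    print(f"{value:>3} {f:>3} {100 * f / n:6.2f}%")
```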



Cumulative Frequency 

This is derived from the frequency distribution table by successively summing the observed frequencies. It provides a way of obtaining the frequency or relative frequency of values falling within two or more contiguous observations or groups. From Example 1.0 the following cumulative frequency distribution can be obtained.

Table 1.4: Cumulative frequency of number of pods/plant

No. of pods/plant (x)   Frequency (f)   Cumulative frequency   Cumulative relative frequency (%)
67                      2               2                      4.2
68                      4               6                      12.5
69                      5               11                     22.9
70                      3               14                     29.2
71                      5               19                     39.6
72                      6               25                     52.1
73                      3               28                     58.3
74                      6               34                     70.8
75                      8               42                     87.5
76                      1               43                     89.6
77                      0               43                     89.6
78                      5               48                     100.0
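The running totals can be reproduced with the sketch below, which starts from the frequencies of the frequency distribution table. Rounding to one decimal place is an assumption made here for display.

```python
from itertools import accumulate

values = list(range(67, 79))                  # 67, 68, ..., 78
freqs = [2, 4, 5, 3, 5, 6, 3, 6, 8, 1, 0, 5]  # frequencies of Example 1.0

# Each cumulative frequency is the sum of all frequencies up to that value.
cumulative = list(accumulate(freqs))
total = cumulative[-1]

# Cumulative relative frequency as a percentage of the total.
cum_rel = [round(100 * c / total, 1) for c in cumulative]

for x, f, c, r in zip(values, freqs, cumulative, cum_rel):
    print(f"{x:>3} {f:>3} {c:>3} {r:6.1f}")
```

For instance, the row for 72 shows that 25 of the 48 plants (about 52.1%) produced 72 pods or fewer.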


 

THE FREQUENCY DISTRIBUTION FOR GROUP DATA

Another method of summarising data is to classify them into groups or classes, from which the various descriptive measures can easily be calculated. To group data, the first step is to select a class interval such that each value is placed in only one interval, bearing in mind that there should be not fewer than 6 intervals and not more than 15. There are two methods of classification: exclusive, which is best for discrete data, and inclusive, which is ideal for continuous data.

Example 2.0: The weights of all NCE II research methods and biometry students are measured and the results expressed to the nearest kilogram (kg) as follows: 100, 95, 85, 75, 80, 78, 90, 93, 109, 104, 88, ...... (the dots represent values not shown). Assume we have 100 such measurements. The measurements range from a lowest value of 75 kg to a highest value of 109 kg. If we collect the values together in groups such that each value is placed in only one class interval, we obtain a frequency distribution.

Table 2.0: Frequency distribution of weight of NCE II Research Methods and Biometry students.

Class interval of weight of students (kg)   Frequency (f)   Class mark
75 – 79                                     2               77
80 – 84                                     2               82
85 – 89                                     7               87
90 – 94                                     11              92
95 – 99                                     28              97
100 – 104                                   30              102
105 – 109                                   20              107
Total                                       100
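The grouping step can be sketched as follows. Only the eleven weights actually listed in Example 2.0 are used, so the frequencies printed here are not those of Table 2.0, which is based on all 100 measurements. The class width of 5 kg and the starting value of 75 kg are taken from the table.

```python
# The eleven weights (kg) shown in Example 2.0; the remaining values are not given.
weights = [100, 95, 85, 75, 80, 78, 90, 93, 109, 104, 88]

lowest, highest, width = 75, 109, 5

# Build the class intervals 75-79, 80-84, ..., 105-109.
classes = [(lo, lo + width - 1) for lo in range(lowest, highest + 1, width)]

# Place each weight in exactly one class interval.
freq = {c: 0 for c in classes}
for w in weights:
    index = (w - lowest) // width
    freq[classes[index]] += 1

# The class mark is the mid-point of the lower and upper class limits.
class_marks = {c: (c[0] + c[1]) / 2 for c in classes}

for c in classes:
    print(f"{c[0]:>3} - {c[1]:<3} f = {freq[c]}  class mark = {class_marks[c]}")
```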




Histograms: A histogram is a graph of a frequency or relative frequency distribution plotted in the form of rectangles or bars. Continuous data are best represented in the form of a histogram.







Bar-chart: This represents the frequency distribution of discrete data. Because the data are discontinuous, separate bars or columns are formed, and the exclusive type of classification is used.

Example: Let us assume we have 37 observations on the number of flowers produced per plant as follows:

Table 3.0: Frequency distribution of number of flowers/plant

Class interval of no. of flowers/plant   Frequency (f)   Class mark
1 – 5                                    5               3
6 – 10                                   8               8
11 – 15                                  4               13
16 – 20                                  7               18
21 – 25                                  10              23
26 – 30                                  3               28
Total                                    37



A bar chart can be drawn from the above illustration 


Line Graphs: These show changes in one variable in relation to another variable. In this case we have:

Y = vertical axis, the dependent variable. The dependent variable is the one that is not under the control of the experimenter; it changes in response to changes in the independent variable.

X = horizontal axis, the independent variable. The independent variable is the one that is under the control of the experimenter.

Example: Measurements of the height of maize plants were taken and the data are presented in the table below:





Table 4.0: Height of maize plant at different number of days after planting.

Time in days, x (independent variable)   Height (cm), y (dependent variable)
20                                       25
40                                       50
60                                       75
80                                       100
100                                      125



A line graph of maize plant height against time can be drawn from the data in Table 4.0.



CLASS ACTIVITY

The numbers of flowers produced per plant by rose plants in a garden are as follows:

6 3 6 7 8 4 6 7 8 4

9 8 9 5 7 8 9 5 7 8

8 5 7 8 10 6 7 8 6 7

The data above can be organized by constructing 

A frequency distribution table 

Relative frequency distribution 

MEASURES OF CENTRAL TENDENCY

They are also known as measures of location. The three most commonly used measures are the mean, the median and the mode. These measures provide a general idea of the position of the centre of a distribution.

The Mean 

The most common measure of central tendency is the average or, in mathematical terms, the arithmetic mean or simply the mean. It is obtained by adding up all the observations and dividing by the total number of observations; thus:

$$\text{Mean} = \frac{\text{Sum of observations}}{\text{Total number of observations}}$$

The mean is denoted by $\bar{X}$ (x-bar). The general formula for the calculation of the sample mean is:

$$\bar{X} = \frac{\sum X}{n}$$

where $X$ represents the variable, $\sum$ is the summation sign and $n$ is the number of observations, so that

$$\sum X = x_1 + x_2 + x_3 + \dots + x_n$$

Example 3.0: To compute the mean number of pods produced per plant for the 48 plants in Table 1.0, which is a set of unsummarised measurements, we simply add them all up and divide by the total number.

$$\bar{X} = \frac{\sum X}{n} = \frac{69 + 74 + 71 + \dots + 71 + 78}{48} = \frac{3477}{48} = 72.438$$

$$\bar{X} \approx 72.44$$
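The calculation can be checked with a few lines of Python. The sketch computes the mean directly from the formula and also with the standard library function `statistics.mean`.

```python
from statistics import mean

# Raw data of Example 1.0: number of pods per plant for 48 plants.
pods = [
    69, 74, 71, 74, 71, 71,
    69, 72, 72, 78, 72, 68,
    70, 67, 72, 70, 75, 72,
    71, 70, 74, 73, 75, 75,
    75, 74, 74, 73, 74, 72,
    67, 76, 75, 75, 68, 78,
    69, 68, 75, 78, 69, 78,
    73, 69, 75, 68, 71, 78,
]

total = sum(pods)   # sum of observations = 3477
n = len(pods)       # number of observations = 48
x_bar = total / n   # 72.4375

print(total, n, round(x_bar, 2))
print(mean(pods))   # same value from the standard library
```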

Median 

To calculate the median, the data have to be arranged in an array in increasing order of size. The median is then the value that divides the set of values into two equal parts. When the number of observations is odd, the median is the central or middle value; when it is even, the median is the mean of the two central values. In other words, the median is the value at position:

$$\frac{n + 1}{2}$$

Therefore, if we have 15 observations, the median is the $\frac{15 + 1}{2}$ = 8th ordered observation. If we have 16 observations, it is the $\frac{16 + 1}{2}$ = 8.5th ordered observation, which implies that the value is halfway between the 8th and the 9th observations. You therefore take the average of the 8th and 9th observations as your median.

Example 4.0: Taking the data presented in Table 1.0 for illustration, since the values are already arranged in sequence and their number is even, the median is the:

$$\frac{n + 1}{2} = \frac{48 + 1}{2} = \frac{49}{2} = 24.5\text{th value}$$

Therefore we take the two middle values (the 24th and 25th), which are 72 and 72.

The median therefore is $\frac{72 + 72}{2} = 72$

The Mode 

It is defined as the most commonly occurring value in a set of data. Thus if all the figures are different, e.g. 5, 6, 11, 15, 8, then there is no mode. Alternatively, it is possible to get more than one mode, e.g. 5, 5, 6, 8, 8, 9; here 5 and 8 are the modes.

Example 5.0: The mode of the data presented in table 1.0 is 75. This is because it occurs most frequently (8 times). 
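The median and the mode of the same data can be verified with the sketch below. The helper `modes` is a hypothetical function written for this illustration; it follows the convention used in these notes that a data set in which no value repeats has no mode.

```python
from collections import Counter
from statistics import median

# Raw data of Example 1.0: number of pods per plant for 48 plants.
pods = [
    69, 74, 71, 74, 71, 71,
    69, 72, 72, 78, 72, 68,
    70, 67, 72, 70, 75, 72,
    71, 70, 74, 73, 75, 75,
    75, 74, 74, 73, 74, 72,
    67, 76, 75, 75, 68, 78,
    69, 68, 75, 78, 69, 78,
    73, 69, 75, 68, 71, 78,
]


def modes(values):
    """Return all most frequent values, or [] if no value repeats (no mode)."""
    counts = Counter(values)
    top = max(counts.values())
    if top == 1:
        return []
    return sorted(v for v, c in counts.items() if c == top)


ordered = sorted(pods)
n = len(ordered)

# For an even n, the median is the average of the two middle values
# (the 24th and 25th values here; list indices start at 0).
middle_pair = (ordered[n // 2 - 1], ordered[n // 2])

print(middle_pair, median(pods))   # (72, 72) and 72.0
print(modes(pods))                 # [75]
print(modes([5, 6, 11, 15, 8]))    # [] : no mode
print(modes([5, 5, 6, 8, 8, 9]))   # [5, 8] : two modes
```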


MEASURES OF DISPERSION

The observations making up a set of data are scattered about the mean. This scatter or dispersion reveals the degree of variability present in a set of measurements. If all the values are the same, there is no dispersion in the data. If, on the other hand, the values are not the same, there is dispersion, and the amount of dispersion depends on the spread of the figures, that is, how close together or how scattered they are.

The main measures of dispersion encountered in biological estimates include the following:

Mean deviation 

Variance 

Standard deviation 

The Mean Deviation 

Dispersion can also be assessed by calculating deviations from the mean, since the mean is a very useful measure of central tendency. The sum of the deviations from the mean is always zero, so the negative signs have to be ignored, that is, the absolute values of the deviations are used. The contribution of an observation ($X$) to the scatter is equivalent to its distance from the mean, i.e. $|X - \bar{X}|$. The mean deviation (MD) can be calculated using this equation:

$$MD = \frac{\sum |X - \bar{X}|}{n}$$

where $X$ = observation, $\bar{X}$ = mean and $n$ = number of observations.

Example 6.0: Let us select 5 figures from the observations on the number of pods produced per plant in Table 1.0. These are 67, 70, 75, 78 and 68. To calculate the mean deviation (MD), we first calculate the mean:

$$\bar{X} = \frac{\sum X}{n} = \frac{67 + 70 + 75 + 78 + 68}{5} = \frac{358}{5} = 71.6$$

$$MD = \frac{|67 - 71.6| + |70 - 71.6| + |75 - 71.6| + |78 - 71.6| + |68 - 71.6|}{5}$$

$$= \frac{|-4.6| + |-1.6| + |3.4| + |6.4| + |-3.6|}{5}$$

Ignoring the minus signs:

$$MD = \frac{19.6}{5} = 3.92$$
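The same calculation can be sketched in Python. The sketch also shows why the signs must be ignored: the signed deviations cancel out.

```python
values = [67, 70, 75, 78, 68]   # the five observations of Example 6.0
n = len(values)

x_bar = sum(values) / n         # mean = 358 / 5 = 71.6

# Signed deviations from the mean always add up to zero
# (apart from tiny floating-point error).
signed_sum = sum(x - x_bar for x in values)

# Mean deviation: average of the absolute deviations.
mean_deviation = sum(abs(x - x_bar) for x in values) / n

print(round(x_bar, 1), round(signed_sum, 10), round(mean_deviation, 2))
```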

The Variance

As already mentioned, the amount of variation in a given set of data depends on how close to or how scattered from the mean the values are. The closer they are to the mean, the less the dispersion; the further they are from the mean, the greater the variation. It is therefore common to measure dispersion relative to the scatter of the values about their mean. One measure through which this is achieved is the VARIANCE. The variance ($S^2$) is the sum of the squares of the deviations of the values from the mean, divided by the size of the sample ($n$) minus 1. If it is a population, you divide by the size of the population ($N$). The procedure for computing the variance is therefore to compute the mean, subtract the mean from each of the values, square the resulting differences, sum the squared differences and divide by $n - 1$ for a sample or by $N$ for a population. The variance of a sample can be represented as follows:

$$S^2 = \frac{\sum (X - \bar{X})^2}{n - 1} \qquad (1)$$

When you have a large data set, computing the sum of squares from the above equation can be tedious, so an alternative formula, which may be less familiar, can be adopted. This is:

$$S^2 = \frac{\sum X^2 - \frac{(\sum X)^2}{n}}{n - 1} \qquad (2)$$

Example 7.0: Let us compute the variance of the sample discussed in Example 6.0. We substitute the values into the first equation.

$$S^2 = \frac{(67 - 71.6)^2 + (70 - 71.6)^2 + (75 - 71.6)^2 + (78 - 71.6)^2 + (68 - 71.6)^2}{5 - 1}$$

$$S^2 = \frac{(-4.6)^2 + (-1.6)^2 + (3.4)^2 + (6.4)^2 + (-3.6)^2}{4}$$

$$S^2 = \frac{89.2}{4} = 22.3$$

The fact that n – 1 is used in the division instead of n, as might have been expected, is due to a concept known as degrees of freedom (df).
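As a check, the sketch below computes the sample variance of the five observations in two ways: from the squared deviations about the mean, and from the usual shortcut form based on the sum of squares and the square of the sum. It also compares both with `statistics.variance`, which divides by n – 1 as well.

```python
from statistics import variance

values = [67, 70, 75, 78, 68]   # the sample of Examples 6.0 and 7.0
n = len(values)
x_bar = sum(values) / n         # 71.6

# Definition: sum of squared deviations from the mean, divided by n - 1.
s2_definition = sum((x - x_bar) ** 2 for x in values) / (n - 1)

# Shortcut: (sum of X squared - (sum of X) squared / n) / (n - 1).
sum_x = sum(values)                        # 358
sum_x_squared = sum(x * x for x in values)  # 25722
s2_shortcut = (sum_x_squared - sum_x ** 2 / n) / (n - 1)

print(round(s2_definition, 2), round(s2_shortcut, 2), round(variance(values), 2))
```

All three values agree at 22.3, so the shortcut saves work without changing the answer.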

The Standard Deviation 

This is the positive square root of the variance and, unlike the variance, it has the same unit as the original measurements. The standard deviation of a sample is given as:

$$S = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}}$$

Or, using the alternative formula for large data sets:

$$S = \sqrt{\frac{\sum X^2 - \frac{(\sum X)^2}{n}}{n - 1}}$$

Example 8.0: To compute the standard deviation of the sample discussed in Example 7.0: since the variance ($S^2$) of the same sample has already been calculated in Example 7.0 as 22.3, it follows that the standard deviation ($S$) is:

$$S = \sqrt{22.3}$$

$$S = 4.722$$
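A short Python check of this result is given below, taking the square root of the variance and comparing it with `statistics.stdev`, which computes the sample standard deviation directly from the data.

```python
import math
from statistics import stdev

values = [67, 70, 75, 78, 68]   # the sample of Examples 6.0 to 8.0

s_from_variance = math.sqrt(22.3)   # square root of the variance found in Example 7.0
s_direct = stdev(values)            # sample standard deviation (divides by n - 1)

print(round(s_from_variance, 3), round(s_direct, 3))
```

Because the standard deviation is in the same unit as the data, it can be read as "about 4.7 pods per plant".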

MEAN, VARIANCE AND STANDARD DEVIATION FOR A GROUPED DATA

You may wish to compute the mean and variance from a frequency distribution table. In this case you assume that all values falling in a particular class interval are located at the mid-point of the class interval. To determine the mid-point ($x$) of a class interval, the average of its lower and upper limits is calculated. It is good to prepare a work table for convenience and ease of calculation.

The Mean

To compute the mean, the mid-point ($x$) of each class is multiplied by its corresponding frequency ($f$), the products are summed, and the sum is divided by the sum of the frequencies ($\sum f$). This is represented as follows:

$$\bar{X} = \frac{\sum fx}{\sum f} \quad \text{for grouped data}$$

$$MD = \frac{\sum f\,|x - \bar{X}|}{\sum f} \quad \text{for grouped data}$$

Variance for grouped data

$$S^2 = \frac{\sum f (x - \bar{X})^2}{\sum f - 1}$$

Standard deviation for grouped data

$$S = \sqrt{\frac{\sum f (x - \bar{X})^2}{\sum f - 1}}$$

Example 8.0: Let us compute the mean, mean deviation, variance and standard deviation of the weights of NCE II Research Methods and Biometry students presented in the frequency distribution table (Table 2.0). The work table is shown as follows:

Table 5.0: Work table for calculating the mean, mean deviation, variance and standard deviation from the frequency data of Table 2.0.

Class interval   f     x     fx     x - x̄    (x - x̄)²   (x - x̄)² f   (x - x̄) f
75 – 79          2     77    154    -21.5    462.3      924.6        -43.0
80 – 84          2     82    164    -16.5    272.3      544.6        -33.0
85 – 89          7     87    609    -11.5    132.3      926.1        -80.5
90 – 94          11    92    1012   -6.5     42.3       465.3        -71.5
95 – 99          28    97    2716   -1.5     2.3        64.4         -42.0
100 – 104        30    102   3060   3.5      12.3       369.0        105.0
105 – 109        20    107   2140   8.5      72.3       1446.0       170.0
Total            100         9855                       4740.0       545.0 (signs ignored)

The deviations in the work table are taken from the mean rounded to 98.5, and the squared deviations are rounded to one decimal place.

1. $\bar{X} = \frac{\sum fx}{\sum f} = \frac{9855}{100} = 98.55 \approx 98.5$

2. $MD = \frac{\sum f\,|x - \bar{X}|}{\sum f} = \frac{545.0}{100} = 5.45$

3. $S^2 = \frac{\sum f (x - \bar{X})^2}{\sum f - 1} = \frac{4740}{100 - 1} = \frac{4740}{99} = 47.88$

4. $S = \sqrt{47.88} = 6.92$
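The work table can be cross-checked with the sketch below, which applies the grouped-data formulas to the class mid-points and frequencies of Table 2.0. Note one assumption: the code uses the unrounded mean (98.55) and unrounded squares, so its variance comes out near 47.83 rather than the 47.88 obtained in the hand calculation with the mean rounded to 98.5. The standard deviation is about 6.92 either way.

```python
import math

# Frequencies and class mid-points from Table 2.0.
freqs = [2, 2, 7, 11, 28, 30, 20]
mids = [77, 82, 87, 92, 97, 102, 107]

n = sum(freqs)                                                 # sum of f = 100
sum_fx = sum(f * x for f, x in zip(freqs, mids))               # sum of fx = 9855
grouped_mean = sum_fx / n                                      # 98.55

# Mean deviation: frequency-weighted average of absolute deviations.
grouped_md = sum(f * abs(x - grouped_mean) for f, x in zip(freqs, mids)) / n

# Variance: frequency-weighted sum of squared deviations, divided by n - 1.
grouped_var = sum(f * (x - grouped_mean) ** 2 for f, x in zip(freqs, mids)) / (n - 1)
grouped_sd = math.sqrt(grouped_var)

print(round(grouped_mean, 2), round(grouped_md, 2),
      round(grouped_var, 2), round(grouped_sd, 2))
```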

MEASURES OF RELATIONSHIP

The Chi-Squared Test 

The statistical tests discussed in the previous sections are used to analyse differences between means or to determine the relationship between two variables collected from every member of the population. They are appropriate when the data are collected from continuous variables such as height, weight, blood pressure, age and yield, alternatively referred to as quantitative data. There are, however, discrete data that cannot be measured but only counted. For such data a ratio known as chi-squared (symbol $\chi^2$) is computed to test goodness of fit. There are also experiments that collect qualitative data, where the research seeks to determine the association between two factors, such as seedling vigour and leaf colour, eye colour and hair colour, or flower pigmentation and seed germination. In this case a contingency table is formed and $\chi^2$ is used to determine whether or not there is an association between the factors. The $\chi^2$ test is also used to compare two proportions or percentages. It may be regarded as a non-parametric test, as it does not require detailed information about the population from which the samples are drawn.

Test of Goodness of Fit 

The test of goodness of fit is used where a hypothesis that specifies the frequency of occurrence of an observation is to be tested. It is therefore frequently used in genetic studies, for example to test whether the outcome of a breeding experiment is in line with the predicted Mendelian ratios. In testing goodness of fit, $\chi^2$ measures how close the observed frequencies (O) are to those expected (E) on the basis of the hypothesis. It is calculated using the following expression:

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$

The computed value is compared with the table value of $\chi^2$ at n – 1 degrees of freedom, where n is the number of classes. The goodness of fit test may involve a one-way classification in which the frequencies fall into only two mutually exclusive classes, which is the simplest test, or it may involve more than two classes.

Example 9.0: Test of goodness of fit with two classes. 

In determining the sex ratio in a sample of a species of birds, it was found that out of 500 birds, 378 were male. Does this sample conform to the ratio of one male to one female, that is 1:1?





Solution 

Sex                           Male      Female    Total
Observed no. of birds (O)     378       122       500
Expected no. of birds (E)     250       250       500
O – E                         128       -128      0
(O – E)²/E                    65.536    65.536    131.072

$$\chi^2 = \sum \frac{(O - E)^2}{E} = 65.536 + 65.536 = 131.072$$

The calculated value (131.072) exceeds the table value at 5% and 1 d.f. (that is 3.841) and even at 1% on the same d.f. (that is 6.635), so the difference is highly significant. We therefore conclude that this sample does not conform to the 1:1 ratio.
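The two-class calculation can be sketched in Python as follows. The critical values 3.841 and 6.635 are the standard chi-squared table values for 1 degree of freedom at the 5% and 1% levels; the variable names are arbitrary.

```python
observed = [378, 122]   # males, females
ratio = [1, 1]          # hypothesised 1:1 sex ratio

total = sum(observed)   # 500 birds

# Expected frequencies: share the total according to the hypothesised ratio.
expected = [total * r / sum(ratio) for r in ratio]

# Chi-squared: sum of (O - E)^2 / E over the classes.
chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

df = len(observed) - 1   # degrees of freedom = number of classes - 1
critical_5pct = 3.841    # table value, 1 d.f., 5% level
critical_1pct = 6.635    # table value, 1 d.f., 1% level

print(expected, round(chi_sq, 3), df)
print("significant at 5%:", chi_sq > critical_5pct)
print("significant at 1%:", chi_sq > critical_1pct)
```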


Example 10.0: Test of goodness of fit with more than two classes. 

Suppose a cross involving two cowpea varieties (smooth white and rough brown) is expected to show the ratio 9:3:3:1. A sample of 550 observations gives the following frequencies: 305 smooth white (sw), 110 smooth brown (sb), 105 rough white (rw) and 30 rough brown (rb). Test whether this agrees with the given ratio.

Solution

If the given ratio applies, the expected frequencies in the four groups will be:

psw = 9/16 x 550; psb = 3/16 x 550; prw = 3/16 x 550; prb = 1/16 x 550 respectively, where p stands for probability. The sum of the expected frequencies must be the same as that of the observed frequencies. The next step is to construct the table of observed and expected frequencies in the following way.

Phenotype category         1 (sw)     2 (sb)     3 (rw)     4 (rb)     Total
Ratio                      9          3          3          1          16
Observed frequency (O)     305        110        105        30         550
Expected frequency (E)     309.375    103.125    103.125    34.375     550
O – E                      -4.375     6.875      1.875      -4.375     0
(O – E)²/E                 0.06       0.46       0.03       0.56       1.11

$$\chi^2 = \sum \frac{(O - E)^2}{E} = 1.11$$
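The four-class calculation can be checked with the sketch below. The comparison at the end uses 7.815, the standard chi-squared table value for 3 degrees of freedom at the 5% level; reading the result as agreement with the 9:3:3:1 ratio follows the same decision rule as in Example 9.0.

```python
observed = {"sw": 305, "sb": 110, "rw": 105, "rb": 30}
ratio = {"sw": 9, "sb": 3, "rw": 3, "rb": 1}   # hypothesised 9:3:3:1

total = sum(observed.values())   # 550 observations
parts = sum(ratio.values())      # 16 parts in the ratio

# Expected frequency of each phenotype under the hypothesised ratio.
expected = {k: total * ratio[k] / parts for k in observed}

# Chi-squared: sum of (O - E)^2 / E over the four classes.
chi_sq = sum((observed[k] - expected[k]) ** 2 / expected[k] for k in observed)

df = len(observed) - 1   # 4 classes - 1 = 3 degrees of freedom
critical_5pct = 7.815    # table value, 3 d.f., 5% level

print(expected)
print(round(chi_sq, 2), df)
print("differs significantly from 9:3:3:1:", chi_sq > critical_5pct)
```

Since the calculated value (about 1.11) is well below 7.815, the observed frequencies do not differ significantly from those expected under the 9:3:3:1 ratio.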

