SA’ADATU RIMI COLLEGE OF EDUCATION, KUMBOTSO
SCHOOL OF SCIENCE EDUCATION
DEPARTMENT OF BIOLOGY
RESEARCH METHODS AND BIOMETRY
Meaning, purpose and relevance of research methods and biometry
Research method refers to the techniques for investigating phenomena through experimentation. An experiment is carried out to investigate a phenomenon, which can lead to the acquisition of new knowledge, or to confirm or correct a previous finding.
Biometry, Biometrics or Biostatistics is the application of statistics to a wide range of topics in biology and its related disciplines, particularly agriculture and medicine; the name distinguishes it from statistics used in other fields of human endeavour, e.g. economics, business and education. The subject encompasses the design, layout and conduct of experiments; the collection, compilation and analysis of numerical data; and the interpretation of the results and the drawing of valid conclusions for a particular situation and time.
Biometry or Biostatistics is relevant to the fields of Biology, Agriculture and Medicine, where the subjects of inquiry are variable in nature, and where the extent of variation and the relative importance of its different causes are the primary concerns. Three eminent scientists, Sir R.A. Fisher, Sewall Wright and J.B.S. Haldane, were the founding fathers of Biometry.
TYPES OF RESEARCH
Generally, research is divided into two main types:
Qualitative Research and
Quantitative Research
Qualitative Research: This is a type of research that aims to investigate a question without attempting to measure variables quantitatively or to look for potential relationships between variables. An example of qualitative research is the case study.
Quantitative Research: This is a systematic empirical investigation of quantitative properties and phenomena and their relationships, which asks a narrow question and collects numerical data to be analysed using statistical methods. Examples include experimental, correlational and survey research.
Choice of Research Topic
The most successful research topics are narrowly focused and carefully defined. Before a researcher chooses a research topic, several factors are taken into consideration. Some of them have to do with the researcher's particular interests, capabilities and motivation. Others focus on areas that will be of great interest to both the academic and private sectors.
The chemistry professor and author Robert Smith, in his book Graduate Research: A Guide for Students in the Sciences (1984), lists some points to consider in finding and developing a research topic:
Can it be enthusiastically pursued?
Can interest be sustained by it?
Is the problem solvable?
Is it worth doing?
Will it lead to other research problems?
Is it manageable in size?
If the problem is solved, will the results be well received by scholars in your field?
Are you, or will you become, competent to solve it?
What is the potential for making an original contribution to the literature in the field?
Hypothesis: Types, Sources and Formulation
Hypothesis is one of the concepts frequently used by scientists and other researchers in their scientific writings and works.
A hypothesis is a proposed explanation for a phenomenon. It is simply defined as an educated guess based upon observation of an event or phenomenon. It is a possible explanation of an event.
That is, it provides an explanation of why or how an observed event happens, or what makes it happen, but it has not been proved. It is therefore based on intuition or reasoning. Most hypotheses can be supported or rejected by experimentation or continued observation.
Types of Hypothesis
Types of hypothesis include the null hypothesis, the alternative hypothesis and the scientific hypothesis. The null and alternative hypotheses are the two types of hypothesis used in statistical hypothesis testing; one is usually the contradiction of the other. A scientific hypothesis is not merely an educated guess; it is a scientific theory that has not yet been proven to be true.
The null hypothesis always predicts the absence of a relationship between two variables, e.g. “there is no relationship between education and income”, while the alternative hypothesis states an actual expectation, such as “higher levels of education increase the likelihood of earning a higher income”.
Sources of Hypothesis
Hypotheses can be derived from many sources:
Theory: Theory on a subject can act as a source of hypotheses. For example, from the proposition that providing employment opportunities is an indicator of the social responsibility of government enterprises, many hypotheses can be deduced, such as:
Public enterprises have greater social concern than private enterprises.
People’s perception of government enterprises is a social concern.
Observation: People’s behaviour is observed, and the observed behaviour is used to infer their attitudes.
Past experience: Here the researcher draws on past experience to formulate the hypothesis.
Case studies: Published case studies can be used as a source of hypotheses.
Formulating Hypothesis
A hypothesis can be explained as a statement that can be used to predict the outcome of future observations. It states what you predict will happen based on changes in variables; variables are things that change. A hypothesis is stated in order to be tested by an experiment. It can come as an answer or explanation to the question you raised at the beginning. It can also be a statement that explains or describes how you think the observations you made are brought about, that is, the mechanism underlying the phenomenon.
Once you have identified your research question, it is time to formulate your hypothesis. An experiment is carried out to test the validity of the formulated hypothesis, leading to its acceptance or rejection. Hypotheses are framed so that they can be tested and possibly discredited; when they are not rejected, they are accepted.
Formulating a valid hypothesis takes practice, but it is important to the success of any experiment. The following steps should be considered when formulating a hypothesis:
State your hypothesis in the form of a question, e.g. “Does smoking cause lung cancer?”
Formulate the hypothesis by making it a conditional statement e.g. “smoking may cause lung cancer”.
Write a formalized hypothesis, e.g. “If smoking causes lung cancer, then individuals who smoke will have a higher frequency of lung cancer than non-smokers.” This “if ... then” form is considered the most useful.
Make sure that your hypothesis contains variables. The researcher is always in control of the independent variable in the experiment, while the dependent variables are observed in the context of the experiment. For an experiment to be valid, it must contain at least two variables.
Make sure that your hypothesis includes a subject group. A subject group defines who or what the researcher is studying; in our example, the subject group is the smokers.
Include a treatment or exposure in the experiment. A treatment is literally what is being done to the subject group. In our example, the exposure is smoke or smoking.
Prepare an outcome measure, which is concerned with how the effect of the treatment is going to be assessed. In our example, the outcome of smoking is the frequency of smokers developing lung cancer in the subject population.
Understand your control group. A control group is a group similar to the subject group that does not receive the treatment, i.e. it is the population that the subject group is compared to. In our example, the control group is non-smokers.
DATA COLLECTION (TYPES AND SOURCES)
Data (singular: datum) are numerical statements of fact in a specific field of enquiry. Simply put, data means numbers, and these numbers are obtained from research carried out in a specific area of investigation. For example, a researcher may be interested in finding out how many students were admitted into the various departments of S.R.C.O.E Kano annually for the last five years (2008 – 2013). This will involve checking the records at Students’ Affairs and counting the number of students admitted into each department during the stated period. The number counted for each department is then recorded, and this record forms your data.
The research may also involve distributing questionnaires and collecting responses from respondents. For example, the Ministry of Health may want to know how many people use treated nets in an area known to have a high prevalence of malaria, in an attempt to popularize their use. The researcher may cover the area with questionnaires and use them to count how many people use the nets, how many use other methods of prevention and how many use no method at all. These counts also form data on which further analysis can be carried out.
A study may also involve taking measurements of blood pressure, parasite count, height, weight, yield, temperature, humidity, etc.; the measurements recorded for these parameters all comprise data.
Therefore, data may be collected by two methods: counting or taking measurements. Data collected by counting (e.g. the number of students in SRCOE) are whole numbers (1, 2, 5, 8, 11 and so on) and are referred to as DISCRETE or DISCONTINUOUS data. On the other hand, data collected by taking measurements (such as blood pressure) are termed CONTINUOUS data and can take values such as 2.3, 2.34 or 2.345, depending on the accuracy of the measuring device. It is very important to know whether the data being collected are discrete or continuous, because this is an important determinant of the statistical test to be used for data analysis.
Population, Sample and Sampling Techniques
Due to limited resources and time, as well as variation in biological materials, it is difficult to collect information from the whole population during research. Therefore, during an investigation, information or facts are collected from a sample. By definition, a sample is a true representation of the entire population. When collecting information, sampling must be random and free of bias, and the sample must be large enough to be a good representation of the whole population. The sample size must be determined before the commencement of the experiment or investigation.
Sampling Techniques
This refers to the techniques of taking a portion of the population as a representative of that population. The researcher has to decide whether every member of the population will be studied or only a sample. In most areas, sampling is done when the researcher wants to generalize the findings of his/her research to the entire population. The following techniques are employed when sampling:
Simple random sampling
Systematic sampling
Stratified sampling
Cluster sampling
Simple Random Sampling
This is a method of selecting a sample from a population so that all members of the population have an equal chance of being selected. This means that no member of the population is omitted deliberately, only by chance. It is mostly applied when the population is homogeneous. For example, suppose a researcher wants to choose 10 students at random out of a class of 50; each student has the same chance (10 in 50) of being included in the sample.
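The 10-out-of-50 example can be sketched with Python's standard `random` module. This is a minimal illustration, assuming the students are identified by hypothetical roll numbers 1 to 50; `random.Random.sample` draws without replacement, so every student has the same chance of selection.

```python
import random

# Hypothetical sampling frame: the 50 students identified by roll numbers 1-50.
population = list(range(1, 51))

def simple_random_sample(frame, n, seed=None):
    """Draw n members without replacement; every member is equally likely to be chosen."""
    rng = random.Random(seed)  # a fixed seed only makes the draw repeatable
    return rng.sample(frame, n)

sample = simple_random_sample(population, 10, seed=1)
print(sorted(sample))
```

Each student's chance of inclusion is 10/50 = 0.2 whichever seed is used; the seed only fixes which 10 are drawn.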
Stratified Sampling: Stratified sampling is applied when you are dealing with a heterogeneous population. It involves dividing the population into mutually exclusive groups before sampling. In stratified sampling, the researcher first identifies the strata of interest and divides the population into sub-groups or strata, depending on the number and types of sub-groups that exist in the population and the objectives of the study. For example, stratification may be by gender (male and female), income, tribe, religion, etc. The idea behind stratified sampling is to examine each sub-group separately.
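A minimal Python sketch of stratified sampling, assuming proportional allocation (the same sampling fraction in every stratum) and a hypothetical class of 30 female and 20 male students. The notes do not prescribe how many members to draw from each stratum, so the allocation rule here is an assumption.

```python
import random

def stratified_sample(population, stratum_of, fraction, seed=None):
    """Group members into strata, then draw a simple random sample from each stratum."""
    rng = random.Random(seed)
    strata = {}
    for member in population:
        strata.setdefault(stratum_of(member), []).append(member)
    sample = {}
    for name, members in strata.items():
        k = round(len(members) * fraction)  # proportional allocation (assumption)
        sample[name] = rng.sample(members, k)
    return sample

# Hypothetical population: (gender, roll number) pairs.
students = [("F", i) for i in range(1, 31)] + [("M", i) for i in range(1, 21)]
result = stratified_sample(students, stratum_of=lambda s: s[0], fraction=0.2, seed=1)
print({name: len(members) for name, members in result.items()})  # {'F': 6, 'M': 4}
```

Because each stratum is sampled separately, both genders are guaranteed to appear, which a single simple random sample cannot promise.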
Systematic Sampling: The systematic sampling technique is best applied to a homogeneous population. It involves selecting samples at regular, equally spaced intervals from a population that is arranged systematically, in alphabetical order or in some natural sequence. It is employed when all the members of the population are put in order; for example, systematic sampling can be used to select 100 people out of 300 by taking every third person on the list.
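The 100-out-of-300 example can be sketched as follows. It assumes a hypothetical ordered list of 300 people; the sampling interval is k = 300/100 = 3, and choosing a random starting point within the first interval is a common convention rather than something stated in the notes.

```python
import random

def systematic_sample(ordered_frame, n, seed=None):
    """Select every k-th member (k = N // n) after a random start within the first interval."""
    k = len(ordered_frame) // n                # sampling interval, here 300 // 100 = 3
    start = random.Random(seed).randrange(k)   # random start: position 0, 1 or 2
    return ordered_frame[start::k][:n]

# Hypothetical register of 300 people, already in a natural order.
people = list(range(1, 301))
chosen = systematic_sample(people, 100, seed=2)
print(len(chosen), chosen[:5])
```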
Cluster Sampling: In cluster sampling, the population, habitat or field is divided into smaller groups or clusters, and some of these clusters are randomly selected to form the sample. In each selected cluster, every member is taken to be part of the sample and included in the study. Each cluster chosen should be representative of the population. Cluster sampling is used when the population to be sampled is vast and spread over a wide geographical area. The population is first divided into sampling units or clusters, for example states within a country or local government areas within a state.
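A minimal Python sketch of cluster sampling, assuming hypothetical local government areas as the clusters and made-up household labels as their members. Whole clusters are chosen at random, and every member of a chosen cluster enters the sample.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly select whole clusters; every member of a selected cluster is included."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)  # sample cluster names only
    return {name: list(clusters[name]) for name in chosen}

# Hypothetical clusters: local government areas and the households listed in each.
lgas = {
    "LGA-A": ["A1", "A2", "A3"],
    "LGA-B": ["B1", "B2"],
    "LGA-C": ["C1", "C2", "C3", "C4"],
    "LGA-D": ["D1", "D2"],
}
sample = cluster_sample(lgas, 2, seed=3)
print(sample)
```

Contrast with stratified sampling: here randomness acts on the groups, not on the individuals within them.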
DATA PRESENTATION
Raw data, that is, data recorded directly from the laboratory or field in the record book, are unorganized, which makes them difficult to comprehend and to draw information from. Therefore, raw data collected from an experiment have to be properly organized, summarized, presented and statistically analysed for any meaningful interpretation to be made.
The first step in organizing data is the preparation of an ordered array, which means arranging the data in ascending order. An ordered array enables one to determine quickly the smallest and the largest measurements, as well as other facts or information that may be needed from the given data. Data may be organized and summarized through the following processes:
Organize the data by putting them in tables such as frequency tables, and other forms of summary tables.
Illustrate the data using graphs such as histograms, bar charts line graphs etc.
Analyse the data using a mathematical approach.
Once the data are summarized, an appropriate statistical test can be carried out in order to draw conclusions about the investigation.
Consider the following example.
Example 1.0: Number of pods produced per plant by a certain species of legume at Kadawa Research Station.
69 74 71 74 71 71
69 72 72 78 72 68
70 67 72 70 75 72
71 70 74 73 75 75
75 74 74 73 74 72
67 76 75 75 68 78
69 68 75 78 69 78
73 69 75 68 71 78
Because the above data are unorganized, it is difficult to draw information from them, so an ordered array can be prepared as follows:
Table 1.0
67 67 68 68 68 68
69 69 69 69 69 70
70 70 71 71 71 71
71 72 72 72 72 72
72 73 73 73 74 74
74 74 74 74 75 75
75 75 75 75 75 75
76 78 78 78 78 78
From this ordered array one can easily find out the minimum number of pods/plant (67) or the maximum (78).
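Preparing the ordered array is a simple sort. The Python sketch below re-enters the 48 values of Example 1.0 as recorded and confirms the minimum and maximum read off Table 1.0.

```python
# Number of pods per plant for the 48 plants of Example 1.0, row by row as recorded.
pods = [
    69, 74, 71, 74, 71, 71,
    69, 72, 72, 78, 72, 68,
    70, 67, 72, 70, 75, 72,
    71, 70, 74, 73, 75, 75,
    75, 74, 74, 73, 74, 72,
    67, 76, 75, 75, 68, 78,
    69, 68, 75, 78, 69, 78,
    73, 69, 75, 68, 71, 78,
]

ordered = sorted(pods)          # the ordered array, in ascending order
print(ordered[0], ordered[-1])  # smallest and largest: 67 78
```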
However, an ordered array may not be adequate in offering all the information needed from the data obtained. Further summary may therefore be required, and one way is to tabulate the data according to frequencies.
The Frequency Distribution
Classifying data according to frequencies allows further summary and facilitates the computation of various descriptive measures such as percentages, averages and so on. To obtain the frequency of each value, first find the range of the values in the table, then take a tally and record the score for each value, that is, its frequency of occurrence. In Example 1.0, the values range from 67 to 78 and the frequencies are obtained as follows:
Table 1.2: Tally of the frequency of occurrence of the number of pods/plant
No. of pods/plant | Tally | Frequency
67 | II | 2
68 | IIII | 4
69 | IIIII | 5
70 | III | 3
71 | IIIII | 5
72 | IIIII I | 6
73 | III | 3
74 | IIIII I | 6
75 | IIIII III | 8
76 | I | 1
77 | - | 0
78 | IIIII | 5
TOTAL | | 48
These frequencies can then be presented in a table referred to as a frequency distribution table.
Table 1.3: Frequency distribution of the number of pods produced per plant
No. of pods/plant (x) | Frequency (f) | Relative frequency (%)
67 | 2 | 2/48 x 100 = 4.17%
68 | 4 | 4/48 x 100 = 8.33%
69 | 5 | 5/48 x 100 = 10.42%
70 | 3 | 3/48 x 100 = 6.25%
71 | 5 | 5/48 x 100 = 10.42%
72 | 6 | 6/48 x 100 = 12.50%
73 | 3 | 3/48 x 100 = 6.25%
74 | 6 | 6/48 x 100 = 12.50%
75 | 8 | 8/48 x 100 = 16.67%
76 | 1 | 1/48 x 100 = 2.08%
77 | 0 | 0/48 x 100 = 0.00%
78 | 5 | 5/48 x 100 = 10.42%
Total | 48 | 100%
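The tally amounts to counting how often each value occurs. A short Python sketch using `collections.Counter` reproduces the frequencies and relative frequencies of Table 1.3 from the raw data of Example 1.0; values that never occur (77 here) are kept by looping over the whole range.

```python
from collections import Counter

# Raw data of Example 1.0 (number of pods per plant, 48 plants).
pods = [
    69, 74, 71, 74, 71, 71, 69, 72, 72, 78, 72, 68,
    70, 67, 72, 70, 75, 72, 71, 70, 74, 73, 75, 75,
    75, 74, 74, 73, 74, 72, 67, 76, 75, 75, 68, 78,
    69, 68, 75, 78, 69, 78, 73, 69, 75, 68, 71, 78,
]

counts = Counter(pods)           # value -> frequency
n = len(pods)                    # 48
for x in range(min(pods), max(pods) + 1):
    f = counts[x]                # Counter returns 0 for values never seen (e.g. 77)
    print(f"{x}\t{f}\t{f / n * 100:.2f}%")
```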
Cumulative Frequency
This is usually derived from the frequency distribution table by successively adding up the observed frequencies. It provides a way of obtaining the frequency or relative frequency of values falling within two or more contiguous observations or groups. From Example 1.0, the following cumulative frequency distribution can be obtained:
Table 1.4: Cumulative frequency of number of pods/plant
No. of pods/plant (x) | Frequency (f) | Cumulative frequency | Cumulative relative frequency (%)
67 | 2 | 2 | 4.2
68 | 4 | 6 | 12.5
69 | 5 | 11 | 22.9
70 | 3 | 14 | 29.2
71 | 5 | 19 | 39.6
72 | 6 | 25 | 52.1
73 | 3 | 28 | 58.3
74 | 6 | 34 | 70.8
75 | 8 | 42 | 87.5
76 | 1 | 43 | 89.6
77 | 0 | 43 | 89.6
78 | 5 | 48 | 100.0
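Cumulative frequencies are running totals of the frequency column. A Python sketch with `itertools.accumulate`, taking the frequencies from Table 1.3:

```python
from itertools import accumulate

values = list(range(67, 79))                        # 67 ... 78 pods/plant
freqs = [2, 4, 5, 3, 5, 6, 3, 6, 8, 1, 0, 5]        # frequencies from Table 1.3

cum = list(accumulate(freqs))                       # running totals of f
total = cum[-1]                                     # 48
cum_rel = [round(c / total * 100, 1) for c in cum]  # cumulative relative frequency (%)

for x, f, c, r in zip(values, freqs, cum, cum_rel):
    print(x, f, c, r)
```

The last cumulative frequency always equals the total number of observations, which is a quick check on the table.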
THE FREQUENCY DISTRIBUTION FOR GROUPED DATA
Another method of summarising data is by classifying them into groups or classes, from which the various descriptive measures can easily be calculated. To group data, the first step is to select class intervals such that each value is placed in only one interval, bearing in mind that there should be no fewer than 6 and no more than 15 intervals. There are two methods of classification: exclusive, which is best for discrete data, and inclusive, which is ideal for continuous data.
Example 2.0: The weights of all NCE II Research Method and Biometry students are measured and the results are expressed to the nearest kilogram (kg) as follows: 100, 95, 85, 75, 80, 78, 90, 93, 109, 104, 88, ...... (the dots represent values not shown), assuming we have 100 such measurements. The measurements range from a lowest value of 75 kg to a highest value of 109 kg. If we collect the values into groups such that each value is placed in only one class interval, we obtain a frequency distribution.
Table 2.0: Frequency distribution of weight of NCE II Research Method and Biometry students
Class interval of weight of students (kg) | Frequency (f) | Class mark (kg)
75 – 79 | 2 | 77
80 – 84 | 2 | 82
85 – 89 | 7 | 87
90 – 94 | 11 | 92
95 – 99 | 28 | 97
100 – 104 | 30 | 102
105 – 109 | 20 | 107
Total | 100 |
Histograms: A histogram is a graph of a frequency or relative frequency distribution plotted in the form of adjoining rectangles or bars. Continuous data are best represented in the form of a histogram.
Bar-chart: This represents the frequency distribution of discrete data. Because the data are discontinuous, separate bars or columns are drawn, and the exclusive type of classification is used.
Example: Let us assume we have 37 observations on the number of flowers produced per plant, as follows:
Table 3.0: Frequency distribution of number of flowers/plant
Class interval of no. of flowers/plant | Frequency (f) | Class mark
1 – 5 | 3 | 3
6 – 10 | 5 | 8
11 – 15 | 8 | 13
16 – 20 | 4 | 18
21 – 25 | 7 | 23
26 – 30 | 10 | 28
Total | 37 |
A bar chart can be drawn from the frequency distribution above.
Line Graphs: These show changes in one variable in relation to another variable. In this case we have:
Y = vertical axis as dependent variable. Dependent variable is the one which is not under the control of the experimenter, it changes in response to changes in the independent variable.
X = horizontal axis, and is the independent variable. Independent variable is the variable which is under the control of the experimenter.
Example: Measurements of the height of maize plants were taken and the data are presented in the table below:
Table 4.0: Height of maize plants at different numbers of days after planting
Time in days, x (independent variable) | Height (cm), y (dependent variable)
20 | 25
40 | 50
60 | 75
80 | 100
100 | 125
A line graph of maize plant height (y) against time in days (x) can be drawn from these data.
CLASS ACTIVITY
The numbers of flowers produced per plant by rose plants in a garden are as follows:
6 3 6 7 8 4 6 7 8 4
9 8 9 5 7 8 9 5 7 8
8 5 7 8 10 6 7 8 6 7
The data above can be organized by constructing
A frequency distribution table
Relative frequency distribution
MEASURES OF CENTRAL TENDENCY
They are also known as measures of location. The three most commonly used measures are the mean, median and mode. These measures give a general idea of the position of the center of a distribution.
The Mean
The most common measure of central tendency is the average or, in mathematical terms, the arithmetic mean or simply the mean. It is obtained by adding up all the observations and dividing by the total number of observations; thus:
$$\text{Mean} = \frac{\text{Sum of observations}}{\text{Total number of observations}}$$
The mean is denoted by x̄ (x-bar). The general formula for calculating the sample mean is:
$$\bar{x} = \frac{\sum x}{n}$$
where x represents the variable, ∑ = summation sign and n = number of observations, so that
$$\sum x = x_1 + x_2 + x_3 + \cdots + x_n$$
Example 3.0: To compute the mean number of pods produced per plant by the 48 plants in Table 1.0, which is a set of unsummarised measurements, we simply add them all up and divide by the total number:
$$\bar{x} = \frac{69 + 74 + 71 + \cdots + 71 + 78}{48} = \frac{3477}{48} = 72.438$$
x̄ = 72.44
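The same calculation in Python, as a check on Example 3.0. For brevity the sketch rebuilds the 48 observations from the frequencies in Table 1.3 (the order of the values does not affect the mean); `statistics.mean` is from the standard library.

```python
from statistics import mean

# Frequencies of 67-78 pods/plant (Table 1.3), expanded back into the 48 observations.
freq = {67: 2, 68: 4, 69: 5, 70: 3, 71: 5, 72: 6, 73: 3, 74: 6, 75: 8, 76: 1, 77: 0, 78: 5}
pods = [x for x, f in freq.items() for _ in range(f)]

total = sum(pods)                     # Σx
n = len(pods)                         # n
print(total, n, round(total / n, 3))  # 3477 48 72.438
print(mean(pods))                     # same result from the standard library
```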
Median
To calculate the median, the data have to be arranged in an array in increasing order of size. The median is then the value which divides the set of values into two equal parts. When the number of values is odd, the median is the central or middle value; when it is even, the median is the mean of the two central values. In other words, the median is the value at position
$$\frac{n + 1}{2}$$
in the ordered array. Therefore, if we have 15 observations, the median is the (15 + 1)/2 = 8th ordered observation. If we have 16 observations, it is the (16 + 1)/2 = 8.5th ordered observation, which means the value is halfway between the 8th and the 9th observations. You therefore take the average of the 8th and 9th observations as your median.
Example 4.0: Taking the data presented in Table 1.0 for illustration, since the values are already arranged in sequence and their number is even, the median is the
$$\frac{n + 1}{2} = \frac{48 + 1}{2} = \frac{49}{2} = 24.5\text{th value}$$
Therefore, we take the two middle values (the 24th and 25th), which are 72 and 72.
The median is therefore (72 + 72)/2 = 72.
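The (n + 1)/2 rule can be written as a small Python function. This is a sketch (the name `median_by_position` is hypothetical), checked against the standard library's `statistics.median`.

```python
from statistics import median

def median_by_position(values):
    """Median via the (n + 1)/2 rule on the ordered array."""
    ordered = sorted(values)
    n = len(ordered)
    if n % 2 == 1:
        return ordered[(n + 1) // 2 - 1]                # the ((n + 1)/2)th value; lists start at 0
    return (ordered[n // 2 - 1] + ordered[n // 2]) / 2  # mean of the (n/2)th and (n/2 + 1)th values

# The 48 observations of Example 1.0, rebuilt from Table 1.3.
freq = {67: 2, 68: 4, 69: 5, 70: 3, 71: 5, 72: 6, 73: 3, 74: 6, 75: 8, 76: 1, 77: 0, 78: 5}
pods = [x for x, f in freq.items() for _ in range(f)]
print(median_by_position(pods), median(pods))  # 72.0 72.0
```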
The Mode
The mode is defined as the most commonly occurring value in a set of data. Thus, if all the figures are different, e.g. 5, 6, 11, 15, 8, then there is no mode. On the other hand, it is possible to have more than one mode, e.g. 5, 5, 6, 8, 8, 9; here 5 and 8 are the modes.
Example 5.0: The mode of the data presented in table 1.0 is 75. This is because it occurs most frequently (8 times).
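Python's `statistics.multimode` (Python 3.8+) returns every most-frequent value, which covers the case of more than one mode. Note one difference from the convention above: when all values are different, `multimode` returns all of them rather than reporting no mode.

```python
from statistics import multimode

# The 48 observations of Example 1.0, rebuilt from Table 1.3.
freq = {67: 2, 68: 4, 69: 5, 70: 3, 71: 5, 72: 6, 73: 3, 74: 6, 75: 8, 76: 1, 77: 0, 78: 5}
pods = [x for x, f in freq.items() for _ in range(f)]

print(multimode(pods))                # [75]
print(multimode([5, 5, 6, 8, 8, 9]))  # [5, 8]  (two modes)
```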
MEASURES OF DISPERSION
The observations making up a set of data are scattered about the mean. This scatter or dispersion reveals the degree of variability present in a set of measurements. If all the values are the same, then there is no dispersion in the data. If, on the other hand, the values are not the same, then there is dispersion, and the amount of dispersion depends on the spread of the figures, that is, how close together or how scattered they are.
The main measures of dispersion encountered in biological estimates include the following:
Mean deviation
Variance
Standard deviation
The Mean Deviation
Dispersion can also be assessed by calculating deviations from the mean, since the mean is a very useful measure of central tendency. The contribution of an observation (x) to the scatter is equivalent to its distance from the mean, i.e. x – x̄. Because the deviations from the mean always sum to zero, the negative signs are ignored and the absolute deviations are used. The mean deviation (MD) can be calculated using this equation:
$$MD = \frac{\sum |x - \bar{x}|}{n}$$
where x = observation, x̄ = mean and n = number of observations.
Example 6.0: Let us select 5 figures from the observations on the number of pods produced/plant in Table 1.0: 67, 70, 75, 78 and 68. To calculate the mean deviation (MD), we first calculate the mean:
$$\bar{x} = \frac{\sum x}{n} = \frac{67 + 70 + 75 + 78 + 68}{5} = \frac{358}{5} = 71.6$$
$$MD = \frac{|67 - 71.6| + |70 - 71.6| + |75 - 71.6| + |78 - 71.6| + |68 - 71.6|}{5}$$
The deviations are (-4.6), (-1.6), (3.4), (6.4) and (-3.6). Ignoring the minus signs:
$$MD = \frac{4.6 + 1.6 + 3.4 + 6.4 + 3.6}{5} = \frac{19.6}{5} = 3.92$$
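The mean deviation of Example 6.0 can be checked with a few lines of Python; the helper name `mean_deviation` is hypothetical, and `abs()` does the job of "ignoring the minus signs".

```python
def mean_deviation(values):
    """Mean of the absolute deviations |x - mean| (signs ignored)."""
    n = len(values)
    m = sum(values) / n
    return sum(abs(x - m) for x in values) / n

sample = [67, 70, 75, 78, 68]            # the five values of Example 6.0
print(round(mean_deviation(sample), 2))  # 3.92
```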
The Variance
As already mentioned, the amount of variation in a given set of data depends on how close to or how scattered from the mean the values are. The closer they are to the mean, the less the dispersion; the further they are from the mean, the greater the variation. It is therefore common to measure dispersion by the scatter of the values about their mean. One way of achieving this is the computation of the VARIANCE. The sample variance (s²) is the sum of the squares of the deviations of the values from the mean divided by the sample size minus one (n – 1); for a whole population, you divide by the size of the population (N). The procedure for computing the variance is therefore to compute the mean, subtract the mean from each value, square the resulting differences, sum up the squared differences and divide by n – 1 for a sample (or by N for a population). The variance of a sample can be represented as follows:
$$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1} \qquad (1)$$
When you have a large data set, computing the sum of squares from the above equation can be tedious, so an alternative formula, which may be less familiar, can be adopted. This is:
$$s^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1} \qquad (2)$$
Example 7.0: Let us compute the variance of the sample discussed in Example 6.0 by substituting the values in equation (1):
$$s^2 = \frac{(67 - 71.6)^2 + (70 - 71.6)^2 + (75 - 71.6)^2 + (78 - 71.6)^2 + (68 - 71.6)^2}{5 - 1}$$
$$s^2 = \frac{(-4.6)^2 + (-1.6)^2 + (3.4)^2 + (6.4)^2 + (-3.6)^2}{4}$$
$$s^2 = \frac{89.2}{4} = 22.3$$
The fact that n – 1 is used in the division instead of n as would have been expected is due to a concept known as degree of freedom (df).
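Equations (1) and (2) give the same answer. The Python sketch below (helper names `variance_eq1` and `variance_eq2` are hypothetical) checks this on the sample of Example 6.0 and against `statistics.variance`, which also divides by n – 1.

```python
from statistics import variance

def variance_eq1(values):
    """Equation (1): sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    m = sum(values) / n
    return sum((x - m) ** 2 for x in values) / (n - 1)

def variance_eq2(values):
    """Equation (2): [Σx² - (Σx)²/n] / (n - 1); no individual deviations needed."""
    n = len(values)
    sum_x = sum(values)
    sum_x2 = sum(x * x for x in values)
    return (sum_x2 - sum_x ** 2 / n) / (n - 1)

sample = [67, 70, 75, 78, 68]
print(round(variance_eq1(sample), 2), round(variance_eq2(sample), 2), variance(sample))  # 22.3 22.3 22.3
```

Here Σx² = 25722 and (Σx)²/n = 358²/5 = 25632.8, so equation (2) gives 89.2/4 = 22.3 without computing any deviation.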
The Standard Deviation
This is the positive square root of the variance, and therefore, unlike the variance, it has the same units as the original measurements. The standard deviation of a sample is given as:
$$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$$
or, using the alternative formula for a large data set:
$$s = \sqrt{\frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1}}$$
Example 8.0: To compute the standard deviation of the sample discussed in Example 7.0: since the variance (s²) of the same sample has already been calculated in Example 7.0 as 22.3, it follows that the standard deviation (s) is
$$s = \sqrt{22.3} = 4.722$$
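In Python the standard deviation is the square root of the sample variance, and `statistics.stdev` computes it directly; a short check on the same five values:

```python
import math
from statistics import stdev, variance

sample = [67, 70, 75, 78, 68]
s2 = variance(sample)        # sample variance (divisor n - 1): 22.3
s = math.sqrt(s2)            # standard deviation = positive square root
print(round(s, 3), round(stdev(sample), 3))  # 4.722 4.722
```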
MEAN, VARIANCE AND STANDARD DEVIATION FOR A GROUPED DATA
You may wish to compute the mean and variance from a frequency distribution table. In this case you assume that all values falling in a particular class interval are located at the mid-point of the class interval. To determine the mid-point (x) of a class interval, calculate the average of its lower and upper limits. It is good to prepare a worktable, as follows, for convenience and ease of calculation.
The Mean
To compute the mean, the mid-point (x) of each class is multiplied by its corresponding frequency (f), and the sum of these products is divided by the sum of the frequencies (∑f). This is represented as follows:
$$\bar{x} = \frac{\sum fx}{\sum f} \qquad \text{for grouped data}$$
$$MD = \frac{\sum f\,|x - \bar{x}|}{\sum f} \qquad \text{for grouped data}$$
Variance for grouped data
$$s^2 = \frac{\sum f(x - \bar{x})^2}{\sum f - 1}$$
Standard deviation for grouped data
$$s = \sqrt{\frac{\sum f(x - \bar{x})^2}{\sum f - 1}}$$
Example 8.0: Let us compute the mean, mean deviation, variance and standard deviation of the weights of NCE II Research Method and Biometry students presented in the frequency distribution table (Table 2.0). The worktable is shown as follows:
Table 5.0: Worktable for calculating the mean, mean deviation, variance and standard deviation from the frequency data of Table 2.0 (deviations taken from the rounded mean, 98.5)
Class interval (kg) | Frequency (f) | Mid-point (x) | fx | x – x̄ | (x – x̄)² | (x – x̄)²f | (x – x̄)f
75 – 79 | 2 | 77 | 154 | -21.5 | 462.3 | 924.6 | -43
80 – 84 | 2 | 82 | 164 | -16.5 | 272.3 | 544.6 | -33
85 – 89 | 7 | 87 | 609 | -11.5 | 132.3 | 926.1 | -80.5
90 – 94 | 11 | 92 | 1012 | -6.5 | 42.3 | 465.3 | -71.5
95 – 99 | 28 | 97 | 2716 | -1.5 | 2.3 | 64.4 | -42.0
100 – 104 | 30 | 102 | 3060 | 3.5 | 12.3 | 369 | 105
105 – 109 | 20 | 107 | 2140 | 8.5 | 72.3 | 1446 | 170
Total | ∑f = 100 | | ∑fx = 9855 | | | ∑(x – x̄)²f = 4740 | ∑(x – x̄)f = 545.0 (signs ignored)
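1. $\bar{x} = \frac{\sum fx}{\sum f} = \frac{9855}{100} = 98.55$ (the deviations in Table 5.0 use 98.5)
2. $MD = \frac{\sum f|x - \bar{x}|}{\sum f} = \frac{545.0}{100} = 5.45$
3. $s^2 = \frac{\sum f(x - \bar{x})^2}{\sum f - 1} = \frac{4740}{100 - 1} = \frac{4740}{99} = 47.88$
4. $s = \sqrt{47.88} = 6.92$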
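A Python sketch of the grouped-data formulas applied to Table 2.0. It works with the unrounded mean 98.55 rather than 98.5, so the variance comes out at about 47.83 (s ≈ 6.92) instead of the 47.88 obtained by hand from rounded deviations and squares; the mean deviation is 5.45 either way.

```python
import math

# Table 2.0: class limits (kg) and frequencies.
classes = [(75, 79), (80, 84), (85, 89), (90, 94), (95, 99), (100, 104), (105, 109)]
freqs = [2, 2, 7, 11, 28, 30, 20]

mids = [(lo + hi) / 2 for lo, hi in classes]  # class mid-points: 77.0, 82.0, ...
n = sum(freqs)                                # Σf = 100
mean = sum(f * x for f, x in zip(freqs, mids)) / n              # Σfx / Σf
md = sum(f * abs(x - mean) for f, x in zip(freqs, mids)) / n    # Σf|x - x̄| / Σf
ss = sum(f * (x - mean) ** 2 for f, x in zip(freqs, mids))      # Σf(x - x̄)²
var = ss / (n - 1)                                              # divisor Σf - 1
sd = math.sqrt(var)
print(round(mean, 2), round(md, 2), round(var, 2), round(sd, 2))  # 98.55 5.45 47.83 6.92
```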
MEASURES OF RELATIONSHIP
The Chi-Squared Test
The statistical tests discussed in the previous sections are used to analyse differences between means or to determine the relationship between two variables collected from every member of the population. These tests are used when the data collected come from continuous variables such as height, weight, blood pressure, age, yield, etc., alternatively referred to as quantitative data. However, there are discrete data which cannot be measured but are counted. For such data, a statistic known as chi-squared (symbol χ²) is computed to test goodness of fit. There are also experiments that collect qualitative data; such research seeks to determine the association between two factors, such as seedling vigor and leaf colour, eye colour and hair colour, or flower pigmentation and seed germination. In this case a contingency table is formed and χ² is used to determine whether or not there is an association between the factors. The χ²-test is also used to compare two proportions or percentages. It may be regarded as a non-parametric test, as it does not require detailed information about the population from which the samples are drawn.
Test of Goodness of Fit
Where a hypothesis specifies the expected frequency of occurrence of observations, the test of goodness of fit is used to test it. It is therefore frequently used in genetic studies, for example to test whether the outcome of a breeding experiment is in line with the predicted Mendelian ratios. In testing goodness of fit, χ² measures how close the observed frequencies are to those expected on the basis of the hypothesis. It is calculated using the following expression:
$$\chi^2 = \sum \frac{(O - E)^2}{E}$$
where O = observed frequency and E = expected frequency.
The computed value is compared with the table value of χ² at n – 1 degrees of freedom, where n is the number of classes. A goodness-of-fit test may involve a one-way classification in which the frequencies fall into only two mutually exclusive classes; this is the simplest case. The test may also involve more than two classes.
Example 9.0: Test of goodness of fit with two classes.
In determining the sex ratio in a sample of a species of bird, it was discovered that out of 500 birds, 378 were male. Does this sample conform to the ratio of one male to one female, that is 1:1?
Solution
Sex | Observed no. of birds (O) | Expected no. of birds (E) | O – E | (O – E)²/E
Male | 378 | 250 | 128 | 65.54
Female | 122 | 250 | -128 | 65.54
Total | 500 | 500 | 0 | χ² = 131.07
The calculated value of χ² (about 131) exceeds the table value at 5% and 1 d.f. (3.841) and even at 1% and 1 d.f. (6.635), so the difference is highly significant. We therefore conclude that this sample does not conform to the 1:1 ratio.
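The arithmetic of Example 9.0 can be checked in Python. This is a sketch: `chi_square` is a hypothetical helper, and the critical values are copied from standard χ² tables rather than computed. If SciPy is available, `scipy.stats.chisquare(f_obs, f_exp)` returns the same statistic together with a p-value.

```python
def chi_square(observed, expected):
    """Goodness-of-fit statistic: sum of (O - E)² / E over all classes."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

observed = [378, 122]                  # males, females
n = sum(observed)                      # 500
expected = [n / 2, n / 2]              # 1:1 ratio -> 250 each
chi2 = chi_square(observed, expected)
df = len(observed) - 1                 # 1

# Critical values copied from standard chi-squared tables for 1 d.f.
CRITICAL_5PCT, CRITICAL_1PCT = 3.841, 6.635
print(round(chi2, 2), df, chi2 > CRITICAL_1PCT)  # 131.07 1 True
```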
Example 10.0: Test of goodness of fit with more than two classes.
Suppose a cross involving two cowpea varieties (smooth white and rough brown) is expected to show the ratio 9:3:3:1, and a sample of 550 observations gives the following frequencies: 305 smooth white (sw), 110 smooth brown (sb), 105 rough white (rw) and 30 rough brown (rb). Test whether this agrees with the given ratio.
Solution
If the given ratio applies, the expected frequencies in the four groups will be:
E(sw) = 9/16 x 550 = 309.375; E(sb) = 3/16 x 550 = 103.125; E(rw) = 3/16 x 550 = 103.125; E(rb) = 1/16 x 550 = 34.375, where 9/16, 3/16 and 1/16 are the probabilities of the respective classes. The sum of the expected frequencies must be the same as that of the observed frequencies (550). The next step is to construct the table of observed and expected frequencies as follows:
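Phenotype category | Ratio | Observed frequency (O) | Expected frequency (E) | O – E | (O – E)²/E
1 (sw) | 9 | 305 | 309.375 | -4.375 | 0.06
2 (sb) | 3 | 110 | 103.125 | 6.875 | 0.46
3 (rw) | 3 | 105 | 103.125 | 1.875 | 0.03
4 (rb) | 1 | 30 | 34.375 | -4.375 | 0.56
Total | 16 | 550 | 550 | 0 | χ² = 1.11
Since the calculated value (1.11) is less than the table value at 5% and 3 d.f. (7.815), the difference is not significant, and we conclude that the observed frequencies agree with the 9:3:3:1 ratio.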
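A Python sketch of Example 10.0 that derives the expected frequencies from the 9:3:3:1 ratio and computes χ² with 3 degrees of freedom. The 5% critical value 7.815 is copied from standard χ² tables.

```python
observed = {"sw": 305, "sb": 110, "rw": 105, "rb": 30}
ratio = {"sw": 9, "sb": 3, "rw": 3, "rb": 1}

n = sum(observed.values())            # 550
parts = sum(ratio.values())           # 16
expected = {k: n * r / parts for k, r in ratio.items()}  # E = (ratio share) x n
chi2 = sum((observed[k] - expected[k]) ** 2 / expected[k] for k in observed)
df = len(observed) - 1                # 3

CRITICAL_5PCT_3DF = 7.815             # from standard chi-squared tables
print(expected)                       # {'sw': 309.375, 'sb': 103.125, 'rw': 103.125, 'rb': 34.375}
print(round(chi2, 2), df, chi2 < CRITICAL_5PCT_3DF)  # 1.11 3 True
```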