Crop Genebank Knowledge Base

Characterization standards

Contact person for characterization standards: Hari D Upadhyaya, ICRISAT, India

Contributors to this page: ICRISAT, Patancheru, India (Hari D Upadhyaya, Shivali Sharma); Bioversity International, Montpellier (Elizabeth Arnaud); Bioversity International, Rome (Adriana Alercia); CIMMYT, Mexico (Suketoshi Taba); CIP, Peru (David Tay); ICARDA, Syria (Kenneth Street); IRRI, Los Baños, Philippines (Ruaraidh Sackville Hamilton) and crop experts.
External reviewer: Murari Singh.

Introduction

Plant genetic resources include landraces, obsolete varieties and wild species. They provide crop experts with the basic material and genetic variability needed to develop high-yielding cultivars with a broad genetic base. However, the use of these genetic resources depends on their efficient and adequate characterization and evaluation, which in turn requires sound characterization standards and appropriate strategies.

Characterization and evaluation document the diversity in descriptive traits, which vary with the species. To facilitate standardization of the information obtained during characterization, Bioversity International has been coordinating the development, publication and updating of plant descriptor lists in close cooperation with crop experts and genebank curators (see also the crop descriptors on the Bioversity website). Descriptor lists have been developed for more than 90 crops, and guidelines are available for developing crop descriptor lists. Characterization is also increasingly carried out with complementary methods to capture fuller information on a broad range of traits.

Phenotypic characterization strategies and standards should be reviewed at regular intervals to determine their value and usefulness in characterization. This is best done as a joint process with collaborators and crop experts:

Establish a task force of crop experts for the crop.
Share existing descriptor lists and guidelines for developing descriptors with the task force.
Obtain opinions and feedback from task force members and collaborators on the descriptors and their usefulness, and propose changes.
Modify the descriptor states and the stage at which data are recorded, based on the responses.

Based on this process, crop experts developed modified descriptor lists for chickpea, rice, maize, potato, Musa, pigeonpea, sorghum and sweet sorghum. These lists are available and remain open for further discussion.

Analysis of diversity data

Trait data generated from characterization and evaluation studies are analyzed to understand and use diversity. Many distance measures are available for assessing similarity/dissimilarity among accessions, and different traits represent different types of variables, so selecting the most appropriate distance measure for each trait is a prerequisite for diversity analysis. One approach is to form clusters such that accessions in different clusters are more diverse than accessions within the same cluster. Clustering algorithms require a distance/similarity matrix between accessions, which is calculated according to the nature or type of the traits, such as morphological and agronomic traits and/or molecular markers.
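As a minimal sketch of the distance matrix that clustering algorithms take as input, the Python example below computes pairwise Euclidean and Manhattan distances for continuous traits. The accession names and trait values are invented for illustration, and the helper names (`euclidean`, `manhattan`, `distance_matrix`) are assumptions of this sketch, not part of any genebank software.

```python
from math import sqrt

# Hypothetical accessions with two continuous traits:
# (plant height in cm, 100-grain weight in g). All values are made up.
accessions = {
    "ACC-1": (100.0, 20.0),
    "ACC-2": (103.0, 24.0),
    "ACC-3": (140.0, 35.0),
}

def euclidean(a, b):
    """Straight-line distance between two trait vectors."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    """Sum of absolute trait differences between two trait vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))

def distance_matrix(data, metric):
    """Return {(name_p, name_q): distance} for every ordered pair of accessions."""
    names = list(data)
    return {(p, q): metric(data[p], data[q]) for p in names for q in names}

euclid = distance_matrix(accessions, euclidean)
city = distance_matrix(accessions, manhattan)

print(euclid[("ACC-1", "ACC-2")])  # 5.0
print(city[("ACC-1", "ACC-2")])    # 7.0
```

Both measures depend on the units in which the traits are recorded, so in practice traits are often standardized before the matrix is computed; that step is left out here to keep the sketch short.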

Types of data

A. Morphological traits: Data recorded on morphological traits such as flower colour, pigmentation and seed colour represent discrete or categorical variables, which can be grouped as:
a. Binary: presence or absence of a characteristic.
b. Nominal: colour or shape of a trait.
c. Ordinal: a visual scale arranged to represent the intensity of a trait.

B. Agronomic traits: Data recorded on agronomic traits such as plant height, 100-grain weight, yield per plant, etc. represent continuous variables.

C. Molecular markers: Data on molecular markers are recorded in the following two forms:
a. Binary data: presence or absence of molecular marker bands.
b. Allelic data: the size of the alleles detected.
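For binary marker data of the kind described under C.a, similarity is commonly expressed with the simple matching, Jaccard or Dice coefficients. The sketch below shows how the three differ in their treatment of shared absences. The band profiles are invented, and the function names are assumptions of this example.

```python
def match_counts(x, y):
    """Count band pairs: a = present in both, b/c = present in one only, d = absent in both."""
    pairs = list(zip(x, y))
    a = pairs.count((1, 1))
    b = pairs.count((1, 0))
    c = pairs.count((0, 1))
    d = pairs.count((0, 0))
    return a, b, c, d

def simple_matching(x, y):
    """Shared presences and shared absences both count as matches."""
    a, b, c, d = match_counts(x, y)
    return (a + d) / (a + b + c + d)

def jaccard(x, y):
    """Shared absences are ignored. Assumes at least one band is present in x or y."""
    a, b, c, _ = match_counts(x, y)
    return a / (a + b + c)

def dice(x, y):
    """Like Jaccard, but shared presences get double weight."""
    a, b, c, _ = match_counts(x, y)
    return 2 * a / (2 * a + b + c)

# Hypothetical band profiles (1 = band present, 0 = band absent) for two accessions.
acc_1 = [1, 1, 0, 0, 1, 0, 0, 0]
acc_2 = [1, 0, 0, 0, 1, 1, 0, 0]

print(simple_matching(acc_1, acc_2))  # 0.75
print(jaccard(acc_1, acc_2))          # 0.5
```

Here four of the eight bands are absent in both accessions, which is why simple matching reports a higher similarity than Jaccard or Dice. The corresponding dissimilarity is usually taken as one minus the coefficient.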

Strategy for data analysis

Each distance measure has its own properties and assumptions.
The genetic context and mathematical properties of a similarity/dissimilarity measure should be given due weight when choosing it.
Different distance measures give different estimates of mean, minimum and maximum diversity.
Ward’s method is useful for clustering accessions on morphological and agronomic traits.
Different distance measures resulted in different numbers of clusters for different traits; however, a relatively large number of accessions tended to cluster together even when different matrices/methods were used.
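To illustrate how Ward’s method and UPGMA differ mechanically, the following sketch implements agglomerative clustering with the Lance-Williams update formulas. It is a simplified illustration under stated assumptions, not the procedure used in the studies summarized here: the trait values are invented, Ward’s update is applied to squared Euclidean distances, UPGMA is applied to plain Euclidean distances, and the function name `agglomerate` is hypothetical.

```python
from itertools import combinations

def squared_euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def agglomerate(points, method, n_clusters):
    """Merge the closest pair of clusters until n_clusters remain.

    method="ward":  Lance-Williams update on squared Euclidean distances.
    method="upgma": size-weighted average of the merged clusters' distances.
    """
    clusters = {i: [i] for i in range(len(points))}  # cluster id -> member indices
    dist = {frozenset((i, j)): squared_euclidean(points[i], points[j])
            for i, j in combinations(clusters, 2)}
    if method == "upgma":
        dist = {pair: d ** 0.5 for pair, d in dist.items()}  # plain Euclidean
    next_id = len(points)
    while len(clusters) > n_clusters:
        pair = min(dist, key=dist.get)       # closest pair of clusters
        i, j = tuple(pair)
        d_ij = dist.pop(pair)
        ni, nj = len(clusters[i]), len(clusters[j])
        for m in clusters:
            if m in (i, j):
                continue
            nm = len(clusters[m])
            d_im = dist.pop(frozenset((i, m)))
            d_jm = dist.pop(frozenset((j, m)))
            if method == "ward":
                new = ((ni + nm) * d_im + (nj + nm) * d_jm - nm * d_ij) / (ni + nj + nm)
            else:
                new = (ni * d_im + nj * d_jm) / (ni + nj)
            dist[frozenset((next_id, m))] = new
        clusters[next_id] = clusters.pop(i) + clusters.pop(j)
        next_id += 1
    return sorted(sorted(members) for members in clusters.values())

# Hypothetical standardized trait values for six accessions (two obvious groups).
points = [(1.0, 1.0), (1.5, 1.2), (1.2, 0.8), (8.0, 8.0), (8.3, 7.9), (7.8, 8.4)]

print(agglomerate(points, "ward", 2))   # [[0, 1, 2], [3, 4, 5]]
print(agglomerate(points, "upgma", 2))  # [[0, 1, 2], [3, 4, 5]]
```

On this well-separated toy data both methods recover the same two groups. With real accession data, where groups overlap, the two update rules can produce different cluster structures.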

A Helpdesk is available at http://220.227.242.211:9905/ to facilitate system-wide common procedures for diversity analysis across genebanks. The Helpdesk also serves the global genebank community by providing basic information on various aspects of diversity analysis, such as selecting appropriate similarity/dissimilarity matrices, cluster analysis, and analyzing diversity using individual traits and/or combinations of traits.

Suggested analyses for trait types

Traits	Distance measure	Remarks
Morphological traits	Simple matching	Only studied distance measure for nominal traits.
Agronomic traits	Euclidean	Identified the same pair of accessions exhibiting minimum diversity but different pairs of genotypes exhibiting maximum diversity.
	Manhattan
	Mahalanobis D²	Takes into account the correlations among traits and is scale-invariant, i.e. not dependent on the scale of measurement.
Molecular markers
Allelic data	Simple matching	The mean and the range of diversity were reduced, so the measure could not discriminate the pair of accessions exhibiting maximum diversity.
	Euclidean	Identified the same pair of accessions exhibiting minimum and maximum diversity.
	Rogers’
	Chord 67	Assumption: the mutation rate is small and variation in selection pressure is rapid and haphazard, i.e. allele frequency changes have no constant direction. This assumption is not fulfilled in seed banks and plant breeding material, which have evolved under directed selection pressure rather than rapid and haphazard changes.
	Chord 69
Binary data	Dice	Identified the same pair of accessions having minimum and maximum diversity.
	Jaccard	Most appropriate when the purpose of the similarity/dissimilarity measure is to indicate how similar/different the objects are with respect to attributes present (coded as 1), ignoring attributes absent (coded as 0).
	Simple matching	Takes all shared band states into account (both presences and absences), irrespective of the reason why bands are absent.
Combination of traits
Morphological (nominal) + Agronomic traits (continuous)	Gower’s distance	Allows variables on different scales of measurement (nominal, continuous, binary) to be used simultaneously when estimating similarity/dissimilarity, so it accommodates mixed data types. Owing to its metric qualities and flexibility, it can be modified to include negative matches by adjusting the binary weighting system.
Morphological (nominal) + molecular data (binary)	Gower’s distance
Agronomic (continuous) + molecular (binary)	Gower’s distance
Morphological (nominal) + agronomic (continuous) + molecular data (binary)	Gower’s distance
Cluster analysis
Ward’s minimum variance method		Found more useful in chickpea as it grouped the genotypes into defined clusters.
UPGMA (Unweighted Pair Group Method using Arithmetic averages)		In chickpea, genotypes were not grouped into clusters.
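
The Gower’s distance rows in the table above can be illustrated with a short Python sketch. It assumes the common form in which the distance is one minus Gower’s similarity, each continuous trait is scaled by its range across the collection, and shared absences in binary marker data are ignored unless the weighting is changed. The trait values, the range of 50.0 and the parameter name `count_negative_matches` are hypothetical.

```python
def gower_distance(a, b, kinds, ranges, count_negative_matches=False):
    """One minus Gower's similarity for two accessions with mixed trait types.

    kinds  : "nominal", "continuous" or "binary" for each trait.
    ranges : range of each continuous trait across the collection (None otherwise).
    Assumes at least one trait receives a non-zero weight.
    """
    weighted_sum = total_weight = 0.0
    for x, y, kind, rng in zip(a, b, kinds, ranges):
        if kind == "continuous":
            score, weight = 1.0 - abs(x - y) / rng, 1.0
        elif kind == "nominal":
            score, weight = float(x == y), 1.0
        elif kind == "binary":
            if x == 0 and y == 0:
                # Shared absence: ignored by default, counted as a match on request.
                score = 1.0
                weight = 1.0 if count_negative_matches else 0.0
            else:
                score, weight = float(x == y), 1.0
        else:
            raise ValueError(f"unknown trait type: {kind}")
        weighted_sum += weight * score
        total_weight += weight
    return 1.0 - weighted_sum / total_weight

# Hypothetical traits: flower colour, plant height (cm), marker band 1, marker band 2.
kinds = ("nominal", "continuous", "binary", "binary")
ranges = (None, 50.0, None, None)   # assumed height range across the collection
acc_a = ("purple", 100.0, 1, 0)
acc_b = ("white", 120.0, 1, 0)

d_default = gower_distance(acc_a, acc_b, kinds, ranges)
d_with_negatives = gower_distance(acc_a, acc_b, kinds, ranges,
                                  count_negative_matches=True)
print(round(d_default, 4), round(d_with_negatives, 4))  # 0.4667 0.35
```

Counting the shared absence of band 2 as a match lowers the distance, which is the effect of the modified binary weighting mentioned in the table.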

References and further reading

Cavalli-Sforza LL, Edwards AWF. 1967. Phylogenetic analysis: Models and estimation procedures. American Journal of Human Genetics 19:233–257.

Dice LR. 1945. Measures of the amount of ecologic association between species. Ecology 26:297–302.

Gower JC. 1971. A general coefficient of similarity and some of its properties. Biometrics 27:857–874.

Hair JF Jr, Anderson RE, Tatham RL, Black WC. 1995. Multivariate data analysis with readings. 4th Edition, Prentice Hall, Englewood Cliffs, NJ.

Jaccard P. 1908. Nouvelles recherches sur la distribution florale. Bull. Soc. Vaud. Sci. Nat. 44:223–270.

Malécot G. 1948. Les Mathématiques de l'Hérédité. Masson et Cie, Paris.

Mohammadi SA, Prasanna BM. 2003. Analysis of genetic diversity in crop plants — Salient statistical tools and considerations. Crop Science 43:1235–1248.

Payne RW. 2009. The Guide to GenStat® Release 12, Part 2: Statistics. VSN International, 5 The Waterhouse, Waterhouse Street, Hemel Hempstead, Hertfordshire HP1 1ES, UK.

Reif JC, Melchinger AE, Frisch M. 2005. Genetical and mathematical properties of similarity and dissimilarity coefficients applied in plant breeding and seed bank management. Crop Science 45:1–7.

Rogers JS. 1972. Measures of genetic similarity and genetic distance. p.145–153. In Studies in genetics VII. Publ. 7213. Univ. of Texas, Austin, USA.

Sneath PHA, Sokal RR. 1973. Numerical taxonomy. Freeman, San Francisco, CA, USA.

Ward JH Jr. 1963. Hierarchical grouping to optimize an objective function. J. Am. Statist. Assoc. 58:236–244.

Upadhyaya HD, Sarma NDRK, Ravishankar CR, Albrecht T, Narasimhudu Y, Singh SK, Varshney SK, Reddy VG, Singh S, Dwivedi SL, Wanyera N, Oduori COA, Mgonja MA, Kisandu DB, Parzies HK, Gowda CLL. 2010. Developing mini core collection in finger millet using multilocation data. Crop Science (Accepted).