Forty-five natural populations of Drosophila ananassae, collected from all the geo-climatic regions of India, were analyzed to determine the distribution of genetic diversity relative to different eco-geographic factors. Quantitative data on the frequencies of three cosmopolitan inversions in the sampled populations were used to derive Nei’s gene diversity estimates. Populations were grouped according to time of collection (year and month), collection region (including coastal and mainland regions) and collection season. The data were further subjected to network analysis to detect community structure in the populations and to modularity analysis to quantify the strength of that community structure. Gene diversity statistics revealed significant variability in Indian natural populations of D. ananassae. Of all the parameters used to group the populations, geographical attributes appear to have the greatest influence, and time of collection and season the least, on genetic variability in Indian natural populations of D. ananassae. The results clearly link genetic variability with environmental heterogeneity, elucidating the role of environment-specific natural selection. The homogenizing effects could be due to genetic hitchhiking and canalization.
Chromosomal polymorphism due to inversions is one of the best-studied systems in Drosophila population genetics (1-3). In natural populations of Drosophila, it is common and adaptive, and is therefore subject to natural selection (4-6). Inversion polymorphism in D. ananassae shows genetic differentiation, which suggests that chromosomal polymorphism may be adaptively important in this widespread domestic species and that populations may undergo evolutionary divergence as a consequence of adaptation to varying environments (7-8). According to the Ludwig effect, the greater the environmental diversity a population faces, the more inversions it can maintain through diversifying selection.
D. ananassae is a cosmopolitan, human-commensal species distributed in tropical, subtropical and mildly temperate regions (Figure 1). Its natural populations harbor a large number of inversions (9). Of those reported from various parts of the world, three paracentric inversions, namely Alpha (AL) in 2L, Delta (DE) in 3L and Eta (ET) in 3R, show a worldwide distribution and were therefore named cosmopolitan inversions by Futch (1966) (10). The population genetics of chromosomal polymorphism in Indian natural populations of D. ananassae has been studied extensively (7-8, 11), and the results have clearly shown geographic differentiation of inversion polymorphism. D. ananassae populations therefore provide a unique opportunity to study how genetic variability is distributed and how it relates to various eco-geographical factors, a question that had not been examined before the present study. Previously, quantitative differences in the frequencies of chromosome arrangements from different eco-geographic regions were taken as evidence for geographic differentiation of inversion polymorphism (7-8). In another study, Nei’s (1973) genetic identity estimates (12) were used to reveal the distribution of genetic diversity by clustering populations according to state or province (13). In a study by Singh and Singh (14), pairwise FST estimates and gene flow were calculated to infer population substructure.
Drosophila ananassae, male and female.
To understand the community structure in natural populations of D. ananassae, network analysis was employed. Network analysis is a useful tool for understanding complex systems, in which the components are represented as nodes and their mutual interactions as edges. Complex networks can generally be partitioned into classes, each a set of nodes with a much higher density of edges within the class than to nodes outside it. These classes are known as modules or communities. Community detection is a fundamental problem in network analysis, as it provides a better understanding of the organizational structure of a network. Community detection algorithms have been used to identify structure in a wide variety of real-world systems, such as co-authorship groups, metabolic networks and web communities. In the present problem, locations are treated as nodes. An edge is placed between two nodes if the difference between their frequencies of a given inversion (e.g. alpha (AL)) is less than or equal to a predefined threshold. A smaller difference indicates greater similarity between the populations of the two locations, which is represented by an edge between the corresponding nodes. Separate networks were constructed for the alpha (AL), delta (DE) and eta (ET) frequencies, and communities were then identified in each. The communities aim to characterize groups of similar populations (15-16). Unlike conventional techniques such as principal component analysis or Bayesian approaches, network analysis does not involve dimension reduction. Thus, it serves as a robust tool for analyzing communities based on AL, DE and ET frequencies.
The present communication applies network analysis and community detection to the frequencies of three cosmopolitan inversions to determine i) the genetic diversity maintained in the species, ii) the pattern of distribution of genetic variation within and among populations, and iii) more specifically, the role of geographical region, time of collection and season in influencing genetic variation in populations of D. ananassae.
D. ananassae flies were collected from forty-five eco-geographically different localities of India, ranging from Jammu in the north to Kanniyakumari in the south and from Dwarka in the west to Deemapur in the east, thus covering the full geo-climatic heterogeneity of a diverse country like India (Figure 2 and Table 1). Details of the collections, with the geographical locations of the forty-five populations, are given in Singh and Singh (13). In each case, flies were collected from fruit and vegetable markets by the ‘net sweeping’ method.
Location-wise distribution of AL, DE and ET.
Name of the locality | State | Time of collection | Number of females analysed |
---|---|---|---|
Jammu (JU) | Jammu & Kashmir | October, 06 | 130 |
Dharamshala (DH) | Himachal Pradesh | October, 06 | 46 |
Kangra (KG) | Himachal Pradesh | October, 06 | 65 |
Dehradun (DN) | Uttaranchal | October, 05 | 54 |
Haridwar (HD) | Uttaranchal | October, 05 | 45 |
Mansa Devi (MD) | Uttaranchal | October, 05 | 30 |
Gangtok (GT) | Sikkim | June, 06 | 34 |
Lucknow (LK) | Uttar Pradesh | August, 05 | 48 |
Guwahati (GU) | Assam | June, 06 | 101 |
Raidopur (RP) | Uttar Pradesh | September, 05 | 25 |
Chowk (CW) | Uttar Pradesh | September, 05 | 71 |
Deemapur (DM) | Nagaland | September, 06 | 211 |
Shillong (SH) | Meghalaya | June, 06 | 47 |
Patna (PN) | Bihar | October, 06 | 211 |
Allahabad (AB) | Uttar Pradesh | September, 05 | 51 |
Imphal (IM) | Manipur | September, 06 | 119 |
Gaya (GY) | Bihar | October, 06 | 79 |
Ujjain (UJ) | Madhya Pradesh | November, 05 | 30 |
Bhopal (BP) | Madhya Pradesh | November, 05 | 58 |
Indore (IN) | Madhya Pradesh | November, 05 | 101 |
Jamnagar (JM) | Gujarat | December, 05 | 52 |
Howrah (HW) | West Bengal | June, 05 | 35 |
Sealdah (SD) | West Bengal | June, 05 | 11 |
Kolkata (KL) | West Bengal | June, 05 | 61 |
Rajkot (RJ) | Gujarat | December, 05 | 52 |
Dwarka (DW) | Gujarat | December, 05 | 90 |
Ahmedabad (AD) | Gujarat | December, 05 | 21 |
Paradeep (PA) | Orissa | May, 05 | 33 |
Bhubaneswar (BN) | Orissa | May, 05 | 09 |
Puri (PU) | Orissa | May, 05 | 16 |
Shirdi (SI) | Maharashtra | June, 06 | 103 |
Nashik (NA) | Maharashtra | June, 06 | 134 |
Mumbai (MU) | Maharashtra | January, 06 | 99 |
Visakhapatnam (VP) | Andhra Pradesh | June, 05 | 33 |
Vijayawada (VD) | Andhra Pradesh | June, 05 | 26 |
Panaji (PJ) | Goa | February, 06 | 33 |
Madgaon (MA) | Goa | February, 06 | 78 |
Gokarna (GK) | Karnataka | February, 06 | 80 |
Mangalore (ML) | Karnataka | February, 06 | 118 |
Bangalore (BL) | Karnataka | April, 05 | 36 |
Yesvantpur (YS) | Karnataka | April, 05 | 15 |
Pondicherry (PC) | Tamil Nadu | April, 05 | 21 |
Ernakulam (ER) | Kerala | April, 06 | 58 |
Thiruvananthapuram (TR) | Kerala | April, 06 | 54 |
Kanniyakumari (KR) | Tamil Nadu | April, 06 | 56 |
Ethics Statement: Drosophila ananassae is a domestic species that is commonly found on household produce such as fruits (banana, orange, lime) and vegetables (tomato), and in fruit and vegetable markets; it is not an endangered or protected species. Therefore, no permission was required to collect Drosophila flies from any location.
Lacto-aceto-orcein stain (2% w/v) is a saturated solution of 2 g of orcein (Loba Chemie) in a 1:1 mixture of 45% acetic acid (made from glacial acetic acid, Loba Chemie) and lactic acid. To prepare the stain, 2 g of natural orcein is dissolved in 100 ml of 45% acetic acid and heated (not boiled) under a reflux condenser until completely dissolved. The stain is cooled for 2 hours by placing the beaker in melting ice-cube water in the bottom half of a Petri dish. Undissolved material is decanted and the solution is filtered through Whatman filter paper. If the stain is too concentrated, as indicated by black-looking chromosomes that fail to spread properly, it may be diluted further with 45% acetic acid. The stock solution was made by mixing this solution (orcein in 45% acetic acid) with lactic acid in a 1:1 ratio.
To estimate inversion frequencies, wild females collected from natural populations were kept individually in fresh food vials, and their F1 larvae were squashed using the lacto-aceto-orcein method. For best results, flies were reared on yeasted medium at 18°C under uncrowded conditions to obtain healthy third instar larvae. Salivary glands, which contain the polytene chromosomes, were dissected from third instar larvae in Ringer’s saline. Fat bodies were removed, and the glands were treated with fixative (45% acetic acid) for about thirty seconds, followed by staining in 2% lacto-aceto-orcein for about 5 minutes. The glands were then washed in 45% acetic acid. A cover slip was placed over the glands and squashing was done with gentle thumb pressure. Slides were observed at 40X or 100X with a green filter on a stereozoom binocular microscope. Voltage was maintained at 180-200 V.
The quantitative data are based on the karyotype of only one F1 larva from each wild female. Breakpoints were determined by comparison with the standard map of the polytene chromosomes of D. ananassae constructed by Ray-Chaudhuri and Jha (17). The inversion frequencies in these populations, reported earlier (13), are given in Table S1 and shown as pie charts in Figure 3.
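The text does not state how arrangement frequencies were derived from the scored karyotypes. A common gene-counting approach is sketched below as an assumption: each scored F1 larva contributes two homologues, so an inverted arrangement is counted twice in an inverted homokaryotype (IN/IN) and once in a heterokaryotype (ST/IN). The function names and the counts in the example are hypothetical, not data from this study.

```python
def inverted_frequency(n_st_st: int, n_st_in: int, n_in_in: int) -> float:
    """Frequency of the inverted arrangement by gene counting.

    Hypothetical helper: assumes every scored larva contributes two
    homologues of the chromosome arm that carries the inversion.
    """
    n_larvae = n_st_st + n_st_in + n_in_in
    if n_larvae == 0:
        raise ValueError("no larvae scored")
    # IN/IN larvae carry two inverted homologues, ST/IN larvae carry one.
    return (n_st_in + 2 * n_in_in) / (2 * n_larvae)


def heterokaryotype_frequency(n_st_st: int, n_st_in: int, n_in_in: int) -> float:
    """Observed proportion of larvae heterozygous for the inversion."""
    return n_st_in / (n_st_st + n_st_in + n_in_in)


# Hypothetical counts: 10 ST/ST, 6 ST/IN and 4 IN/IN larvae from 20 wild females.
print(inverted_frequency(10, 6, 4))         # 0.35
print(heterokaryotype_frequency(10, 6, 4))  # 0.3
```

Multiplying by 100 gives the frequency as a percentage, the scale on which the network thresholds (5, 10, 15) appear to be expressed.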
Distribution of populations by year, region and season.
To study and quantify the effect of temporal, spatial and environmental factors (time of collection, geographical region and season) on the distribution of genetic diversity within and between groups of populations, the populations collected across India were grouped in five ways: i) populations collected in a particular year, ii) populations collected in a particular month, iii) populations from major geographic regions, iv) populations from coastal and mainland regions and v) populations collected in a particular season (Table 2 and Figure 2). The significance of the difference between two group means was tested using a t-test. Nei’s gene diversity statistics (HT, HS, GST) were applied to these groupings to obtain the distribution of diversity within and among populations (13).
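The text does not say which form of t-test was used. The sketch below assumes the classical two-sample Student's t-test with pooled variance, using only the Python standard library; it returns the t statistic and its degrees of freedom, from which a p-value can be read off a t table. The function name and the group values are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t_test(a, b):
    """Two-sample Student's t statistic with pooled variance.

    Returns (t, df). Assumes equal variances in the two groups; Welch's
    test would be the alternative if that assumption is doubtful.
    """
    na, nb = len(a), len(b)
    df = na + nb - 2
    # statistics.variance is the sample (n - 1) variance.
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    se = sqrt(pooled_var * (1 / na + 1 / nb))
    return (mean(a) - mean(b)) / se, df


# Hypothetical values for two groups of populations.
t, df = pooled_t_test([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
print(round(t, 3), df)  # -3.674 4
```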
Grouping | Number of populations | HT | HS | GST |
---|---|---|---|---|
I. Time of collection (years) | | | | |
April 2005 to December 2005 | 25 | 0.451 | 0.332 | 0.263 |
January 2006 to October 2006 | 20 | 0.459 | 0.278 | 0.394 |
Mean | | 0.455 | 0.305 | 0.329 |
II. Time of collection (months) | | | | |
April 2005 | 3 | 0.486 | 0.411 | 0.154 |
May 2005 | 3 | 0.481 | 0.334 | 0.305 |
June 2005 | 5 | 0.471 | 0.347 | 0.263 |
August 2005 | 1 | 0.415 | 0.275 | 0.337 |
September 2005 | 3 | 0.389 | 0.306 | 0.213 |
October 2005 | 3 | 0.438 | 0.360 | 0.178 |
November 2005 | 3 | 0.455 | 0.350 | 0.230 |
December 2005 | 4 | 0.437 | 0.255 | 0.416 |
January 2006 | 1 | 0.463 | 0.248 | 0.464 |
February 2006 | 4 | 0.475 | 0.327 | 0.311 |
April 2006 | 3 | 0.478 | 0.338 | 0.292 |
June 2006 | 5 | 0.472 | 0.237 | 0.497 |
September 2006 | 2 | 0.487 | 0.304 | 0.375 |
October 2006 | 5 | 0.409 | 0.288 | 0.295 |
Mean | | 0.456 | 0.321 | 0.296 |
III. Regions | | | | |
North | 13 | 0.428 | 0.333 | 0.221 |
South | 12 | 0.478 | 0.343 | 0.282 |
East | 8 | 0.450 | 0.294 | 0.346 |
North-East | 5 | 0.487 | 0.275 | 0.435 |
West | 7 | 0.444 | 0.241 | 0.457 |
Mean | | 0.457 | 0.297 | 0.350 |
IV. Coastal vs. mainland regions | | | | |
Coastal | 14 | 0.465 | 0.312 | 0.329 |
Mean | | 0.457 | 0.309 | 0.323 |
V. Seasons | | | | |
Summer | 9 | 0.482 | 0.361 | 0.251 |
Monsoon | 16 | 0.454 | 0.295 | 0.350 |
Winter | 20 | 0.442 | 0.295 | 0.332 |
Mean | | 0.459 | 0.317 | 0.309 |
Quantitative data on inversion frequencies in forty-five natural populations of Indian D. ananassae were used to obtain Nei’s gene diversity estimates (13). The partitioning of genetic diversity into its components (within populations, between populations, between populations of a group and among groups) was accomplished using Nei’s gene diversity statistics (HT, HS, GST) (13).
A network is constructed by treating locations as nodes and placing edges on the basis of the difference between their inversion frequencies, a smaller difference indicating greater similarity. After the networks are constructed, communities are identified. A wide range of community detection algorithms is available for this purpose (18-20), such as the hierarchical divisive algorithm, the fast greedy modularity optimization approach, the Markov clustering technique, CFinder, structural algorithms, Infomap and spectral algorithms. Comparative studies show that, of these methods, Infomap performs better than the other community detection algorithms (18-20).
Infomap was developed by Rosvall and Bergstrom (21) on the basis of information theory. In this approach, the community structure is described by a two-level code based on Huffman coding (22): the first level identifies the communities in the network, and the second distinguishes the nodes within a community. Finding the best community structure with Infomap can be viewed as optimally compressing the description of a random walk on the graph, so that the original structure of the graph can be recovered by decoding the compressed information. Optimal compression is achieved by minimizing the description length of the random walk, and this minimization can be carried out by a greedy search combined with simulated annealing.
To quantify the strength of the modular structure of a network, its modularity is determined. Mathematically, modularity is defined as:
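A minimal sketch of Nei's gene diversity partition is given below, assuming that each inversion is treated as a two-allele locus (inverted vs. standard arrangement) and that populations are weighted equally. HS is the mean within-population diversity, HT is the diversity computed from the pooled (mean) frequencies, and GST = (HT − HS)/HT; multi-locus values average HT and HS over loci before GST is formed. The function names and frequencies are hypothetical, and this is not presented as the authors' exact computation.

```python
from statistics import mean


def gene_diversity(freqs):
    """Nei's gene diversity h = 1 - sum(p_k ** 2) for one locus."""
    return 1.0 - sum(p * p for p in freqs)


def nei_statistics(loci):
    """Return (HT, HS, GST) averaged over loci.

    `loci` maps a locus name to a list of per-population frequency
    vectors, e.g. {"AL": [[0.2, 0.8], [0.6, 0.4]]}, where each vector
    holds the (inverted, standard) arrangement frequencies of one
    population. Populations are weighted equally (an assumption).
    """
    ht_values, hs_values = [], []
    for pops in loci.values():
        # HS: mean within-population diversity.
        hs_values.append(mean(gene_diversity(p) for p in pops))
        # HT: diversity of the pooled (mean) arrangement frequencies.
        pooled = [mean(col) for col in zip(*pops)]
        ht_values.append(gene_diversity(pooled))
    ht, hs = mean(ht_values), mean(hs_values)
    gst = (ht - hs) / ht if ht > 0 else 0.0
    return ht, hs, gst


# Hypothetical frequencies of one inversion in two populations.
ht, hs, gst = nei_statistics({"AL": [[0.2, 0.8], [0.6, 0.4]]})
print(round(ht, 3), round(hs, 3), round(gst, 3))  # 0.48 0.4 0.167
```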
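Infomap searches over partitions for the one that minimizes the map equation, the expected description length of the random walk. The sketch below only evaluates that quantity for a given two-level partition; it performs no search. It assumes an undirected, unweighted graph without teleportation, so a node's visit rate is its degree divided by 2m and a module's exit rate is its number of boundary edges divided by 2m; the expanded form follows Rosvall and Bergstrom (21). In practice the search is run with a library routine, for example cluster_infomap() in the R package igraph used by the authors. The function names and the toy graph are hypothetical.

```python
from collections import defaultdict
from math import log2


def plogp(x):
    """x * log2(x), with the convention 0 * log2(0) = 0."""
    return x * log2(x) if x > 0 else 0.0


def map_equation(edges, membership):
    """Two-level map-equation code length L(M) in bits per step.

    Sketch for an undirected, unweighted graph without teleportation:
    node visit rates are p_a = k_a / 2m and a module's exit rate q_i is
    the number of edges leaving it divided by 2m.
    """
    degree = defaultdict(int)
    exits = defaultdict(int)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
        if membership[u] != membership[v]:
            exits[membership[u]] += 1
            exits[membership[v]] += 1
    two_m = 2 * len(edges)
    p = {node: k / two_m for node, k in degree.items()}
    q = {mod: exits[mod] / two_m for mod in set(membership.values())}
    module_flow = defaultdict(float)
    for node, rate in p.items():
        module_flow[membership[node]] += rate
    # L = plogp(sum q) - 2 sum plogp(q_i) - sum plogp(p_a) + sum plogp(q_i + sum p_a)
    return (plogp(sum(q.values()))
            - 2 * sum(plogp(qi) for qi in q.values())
            - sum(plogp(pa) for pa in p.values())
            + sum(plogp(q[mod] + module_flow[mod]) for mod in q))


# Toy graph: two disconnected triangles.
triangles = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5)]
split = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
single = {n: "a" for n in range(6)}
print(round(map_equation(triangles, split), 3))   # 1.585
print(round(map_equation(triangles, single), 3))  # 2.585
```

The partition into the two triangles gives the shorter description (log2 3 vs. log2 6 bits), which is why Infomap would prefer it.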
$$Q = \frac{1}{2m}\sum_{i,j}\left[A_{ij} - \frac{k_i k_j}{2m}\right]\delta(c_i, c_j)$$

where $A_{ij} = 1$ if there is an edge from the $i$th node to the $j$th node (and 0 otherwise), $c_i$ is the community to which node $i$ is assigned, $\delta(c_i, c_j)$ is the Kronecker delta function, $k_i$ denotes the degree of the $i$th node and $m$ represents the total number of edges in the graph. In other words, modularity is the difference between the fraction of edges that fall inside communities and the expected fraction of such edges.

To understand the equation, we start with the summation. The summation is taken over all possible pairs $(i, j)$, i.e. we consider every element of $A$. The Kronecker delta function allows us to select only those pairs for which $c_i = c_j$. Thus, the summation runs only over pairs whose endpoints belong to the same community. The term $A_{ij}/2m$ represents the observed fraction of edges between $i$ and $j$. The degrees of vertices $i$ and $j$ are $k_i$ and $k_j$. If edges are distributed at random while respecting these degrees, the expected number of edges between $i$ and $j$ is $k_i k_j / 2m$. Thus, modularity can also be described as the sum, over pairs of nodes in the same community, of the differences between the actual and expected fractions of edges. The value of modularity ($Q$) lies between $-1/2$ and 1. High-modularity networks have dense links between nodes within communities, whereas links between nodes in different communities are sparse. We used R and the R package ‘igraph’ for the network and modularity analyses.
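The sketch below transcribes the modularity equation directly for an undirected, unweighted graph, including the i = j terms of the double sum. It is O(n²) in the number of nodes, which is unproblematic for 45 locations. The authors used R/igraph, whose modularity() function computes this quantity; this Python version, its function name and the toy graph are illustrative only.

```python
from collections import defaultdict


def modularity(edges, membership):
    """Newman's modularity Q for an undirected, unweighted graph.

    Direct transcription of
    Q = (1 / 2m) * sum_ij [A_ij - k_i k_j / 2m] * delta(c_i, c_j).
    """
    nodes = sorted(membership)
    adjacency = defaultdict(int)
    degree = defaultdict(int)
    for u, v in edges:
        # Undirected edge: A_uv and A_vu are both set.
        adjacency[(u, v)] += 1
        adjacency[(v, u)] += 1
        degree[u] += 1
        degree[v] += 1
    two_m = 2 * len(edges)
    total = 0.0
    for i in nodes:
        for j in nodes:
            if membership[i] == membership[j]:  # Kronecker delta
                total += adjacency[(i, j)] - degree[i] * degree[j] / two_m
    return total / two_m


# Toy graph: two disconnected triangles, each assigned its own community.
triangles = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5)]
split = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
print(round(modularity(triangles, split), 3))  # 0.5
```

Putting every node into a single community gives Q = 0, since the observed and expected fractions of within-community edges are then both 1.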
Our results show genetic differentiation in Indian natural populations of D. ananassae, the greater part of which is distributed among populations of different groups rather than within populations of the same group. Gene diversity analysis enabled us to investigate the pattern and magnitude of this differentiation.
When populations were grouped by time of collection (in years), diversity among populations within a particular collection year was HS = 0.305, while among-group diversity (between the two collection years) was GST = 0.329. When populations were grouped by time of collection (in months), diversity among populations within a particular collection month was HS = 0.321, while among-group diversity (among different collection months) was GST = 0.296.
When populations were grouped by region, diversity among populations within a particular region was HS = 0.297, while among-group diversity (between different regions) was GST = 0.350. When populations were grouped into mainland and coastal regions, within-group diversity was HS = 0.309 and among-group diversity was GST = 0.323, and when populations were grouped by season, diversity among populations within a particular season was HS = 0.317, while among-group diversity (between different seasons) was GST = 0.309. The mean overall diversity for each grouping (collection year, collection month, regions, coastal vs. mainland regions, and seasons) was HT = 0.455, 0.456, 0.457, 0.457 and 0.459, respectively (Table 2).
Network analysis was performed to understand the organizational structure of the networks formed by the natural populations of D. ananassae. Each network is constructed on the assumption that the populations of two locations are connected by an edge if the difference between their frequencies of a given inversion (AL, DE or ET) is less than or equal to a fixed threshold (d). Networks were constructed for various values of d, and the structure and number of communities were found not to be affected by minor changes in d. Thus, only the results corresponding to major changes in the structure and communities of the network are presented here (Figures 4-6).
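Nei's coefficient of gene differentiation is GST = (HT − HS)/HT. As a worked arithmetic check of the Table 2 entries for the two collection years (an illustration, not a new result), the reported values agree to within rounding:

$$
\begin{aligned}
G_{ST} &= \frac{H_T - H_S}{H_T} \\
\text{2005: } G_{ST} &= \frac{0.451 - 0.332}{0.451} = \frac{0.119}{0.451} \approx 0.264 \\
\text{2006: } G_{ST} &= \frac{0.459 - 0.278}{0.459} = \frac{0.181}{0.459} \approx 0.394
\end{aligned}
$$

For the year grouping, the Mean row matches the simple average of the two year rows, e.g. (0.332 + 0.278)/2 = 0.305 for HS.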
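The construction rule can be sketched as follows, using the "less than or equal to" criterion described for Figure 4A. Frequencies are assumed to be percentages of the inverted arrangement per location; the function name is hypothetical and the four-location example is illustrative, not data from Table S1.

```python
from itertools import combinations


def threshold_network(freqs, d):
    """Edge list of the similarity network for one inversion.

    `freqs` maps a location code to its frequency of one inversion
    (assumed to be in percent). Two locations are joined when their
    frequencies differ by at most `d`.
    """
    return [
        (a, b)
        for a, b in combinations(sorted(freqs), 2)
        if abs(freqs[a] - freqs[b]) <= d
    ]


# Hypothetical AL frequencies for four locations.
al = {"A": 10, "B": 14, "C": 30, "D": 33}
for d in (5, 20):
    print(d, len(threshold_network(al, d)))
# 5 2
# 20 5
```

Because raising d can only add edges, the edge count grows monotonically with d, consistent with the trend in Table 3; the resulting edge list would then be passed to a community detection routine.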
Figure 4A shows the network constructed on the assumption that two nodes are connected if the difference between their AL frequencies is less than or equal to five. Four communities were observed in this network. The first community (red) contains the Imphal, Jamnagar, Howrah, Sealdah, Kolkata, Rajkot, Paradeep, Bhubaneswar, Puri, Shirdi, Nashik, Mumbai, Madgaon, Mangalore, Ernakulam, Thiruvananthapuram and Kanniyakumari populations. The second community (green) contains the Jammu, Dharamshala, Kangra, Dehradun, Mansa Devi, Lucknow, Raidopur, Allahabad, Ujjain, Indore, Visakhapatnam, Vijayawada, Yesvantpur and Pondicherry populations. The third community (cyan) includes the Gangtok, Guwahati, Deemapur, Shillong, Patna, Gaya, Dwarka, Ahmedabad, Panaji and Gokarna populations; apart from Ahmedabad, Panaji and Gokarna, it mainly comprises locations from the north-eastern and eastern regions. Surprisingly, the Haridwar and Chowk populations are not connected to any node other than each other, forming the fourth community (purple); i.e., the difference in AL frequency between Haridwar and Chowk is less than or equal to five, but their differences from every other location exceed five. Of the four communities, two do not have any external link. Figure 4B shows a network constructed on the assumption that connected nodes differ by less than or equal to ten in their AL frequency. This network contains more edges than the previous one since, intuitively, a larger difference allows more locations to connect with each other. The third community of the previous network merges into the first community (red) in this network. The Haridwar and Chowk populations retain their separate community status, although they now have external links to a new location, Kangra. Thus, when the criterion for the difference in AL frequency is increased to 10, the number of communities decreases to three.
The first two communities of the previous networks merged into one community, as the differences between their locations remain less than or equal to ten. In Figure 4C, when a difference in AL frequency of less than or equal to 15 is used as the criterion for connecting locations, only two communities remain. The number of edges in this network increases to 507.
A. Communities with AL difference ≤ 5. B. Communities with AL difference ≤ 10. C. Communities with AL difference ≤ 15.
Networks were also constructed for the DE inversion using similar criteria. Figure 5A shows the network constructed on the basis that two locations are connected if the difference between their DE frequencies is less than or equal to five. In the community analysis, this network shows seven classes, i.e. it is partitioned into seven modules. The first community (red) consists of the Dharamshala, Imphal, Bhopal, Jamnagar, Howrah, Sealdah, Kolkata, Rajkot, Paradeep, Puri and Visakhapatnam populations. The Jammu, Gangtok, Deemapur, Shillong, Allahabad, Gaya, Dwarka, Ahmedabad, Nashik and Shirdi populations constitute the second community (yellow). The third community (green) contains the Kangra, Dehradun, Haridwar, Mansa Devi, Ujjain, Indore, Bhubaneswar and Madgaon populations. The fourth community is formed by the Lucknow, Guwahati, Raidopur, Chowk, Patna, Mumbai and Mangalore populations. The Vijayawada, Panaji, Bangalore, Yesvantpur and Pondicherry populations constitute the fifth community (blue). Gokarna, Ernakulam and Thiruvananthapuram form the sixth community (purple). The Kanniyakumari population alone forms the seventh community (pink). Figure 5B shows four partitions when a difference in DE frequency of less than or equal to ten is used for network construction. The first community (red) contains the largest number of locations, namely Jammu, Dharamshala, Gangtok, Lucknow, Guwahati, Raidopur, Chowk, Deemapur, Shillong, Patna, Allahabad, Imphal, Gaya, Bhopal, Jamnagar, Howrah, Sealdah, Rajkot, Dwarka, Ahmedabad, Paradeep, Puri, Shirdi, Nashik, Mumbai, Visakhapatnam and Mangalore. The second community (green) consists of the Kangra, Dehradun, Haridwar, Mansa Devi, Ujjain, Indore, Kolkata, Bhubaneswar, Vijayawada, Panaji, Madgaon, Bangalore and Yesvantpur populations. The Gokarna, Pondicherry, Ernakulam and Thiruvananthapuram populations constitute the third community (cyan). The Kanniyakumari population continues to form a singleton, the fourth community (purple).
Unlike for AL, a network could not be constructed for DE with a frequency-difference criterion of less than or equal to 15.
A. Communities with DE difference ≤ 5. B. Communities with DE difference ≤ 10.
For the eta (ET) inversion, which is the smallest and least frequent, only one criterion was used to construct the network, i.e. a difference in ET frequency of less than or equal to five. This gave four classes in the community analysis (Figure 6). The first class (red) contains the Jammu, Lucknow, Raidopur, Chowk, Patna, Allahabad, Jamnagar, Sealdah, Rajkot, Dwarka, Ahmedabad, Mumbai, Visakhapatnam, Mansa Devi, Ujjain, Indore, Kolkata, Bhubaneswar, Panaji, Madgaon, Yesvantpur, Gokarna, Ernakulam and Thiruvananthapuram populations. The second community includes the Dharamshala, Bhopal, Howrah, Shirdi, Nashik, Mangalore, Kangra, Dehradun and Haridwar populations. The third community (cyan) contains the Deemapur, Shillong, Gaya, Paradeep, Puri, Bangalore, Pondicherry and Kanniyakumari populations, while the fourth community consists of the Gangtok, Guwahati, Imphal and Vijayawada populations.
Communities with ET difference ≤ 5.
Long inversions have a higher probability of capturing favorable sets of alleles simply because they encompass more of the genome, but the loss of favorable combinations through double crossovers poses a limitation. Shorter inversions, on the other hand, are less likely to capture favorable combinations of alleles; however, once they have captured them, they retain them more efficiently than longer inversions. Thus, communities constructed from AL frequencies are more likely to represent population groups, whereas communities based on DE and ET reflect inversions that are efficient at retaining favorable alleles.
Modularity analysis was performed on the constructed networks (Figures 4 to 6 and Table 3). The AL and DE networks at a frequency difference of 5 show a more modular structure than the others. Higher modularity indicates denser links among nodes within communities relative to links between communities. The number of edges increases as the difference criterion for AL and DE frequencies is increased (Table 3), simply because a larger difference criterion allows more locations to connect with each other. This, in turn, reduces the number of communities in the network.
Inversion | Frequency difference (≤) | No. of edges | Modularity (Q) |
---|---|---|---|
AL | 5 | 221 | 0.57 |
AL | 10 | 387 | 0.40 |
AL | 15 | 507 | 0.34 |
DE | 5 | 170 | 0.66 |
DE | 10 | 337 | 0.28 |
ET | 5 | 319 | 0.34 |
However, for DE only two difference criteria (less than or equal to 5 and 10), and for ET only one (less than or equal to 5), could be used, as these inversions are relatively small and less abundant than AL. This is because, as the length of an inversion increases, the chance of capturing genes or gene blocks with favorable epistatic interactions increases proportionately, thus increasing the fitness of individuals carrying it; i.e., the selective advantage gained by an inversion increases with the recombination distance between the loci it captures (23-25).
In one of the largest spatio-temporal study done to date, D. ananassae flies were collected from different eco-geographical regions of India. Nei’s gene diversity statistics, network analysis and modularity were applied to the frequencies of chromosomal inversions to arrive at the estimates of genetic diversity; distribution of genetic variation, community structure analysis and strength of communities formed in the populations of D.ananassae. Analysis show spatial and temporal characteristics of inversion polymorphism in Indian natural populations of D.ananassae. Network analysis and modularity has allowed the quantification of degree of clustering of D.ananassae populations. Spatial association or clustering provides the measure of geographical ‘closeness’ or spatial proximity. Earlier to this study, D. ananassae populations have not been investigated anywhere on such an enormous scale both spatially and temporally with respect to chromosomal polymorphism. This is, despite the fact that these flies are domestic and cosmopolitan in distribution. The present study tries to fill the void, by doing the collections from different corners of the country thus including the whole range of geo-climatic heterogeneity. This study has adopted mathematical analysis like Community detection and modularity to corroborate the findings from genetical analysis (Nei’s gene diversity estimates).
When populations were grouped by geographic regions of India (East, West, North, North-east, South, coastal and mainland) the diversity (HS) among populations from a particular region comes lower than among-group diversity (GST) between populations from different regions as given in Table 2. This means that populations from northern region are genetically more identical as compared to populations from northeast, west or east. Similarly, populations from coastal region differ from the populations from mainland region suggesting the role of eco-geographical parameters on the patterns of inversion frequencies and distribution of genetic diversity. This supports the theory that populations from a particular region are adapted to the microhabitat/niche of that region and inversions being recombination-suppressors have evolved to safeguard the co-adapted gene complexes from undergoing recombination and disrupting the co-adapted gene complexes. In the earlier companion study, we have the similar finding where populations from the similar eco-geographic regions i.e. from the same state or province show more or less similar trend in inversion frequencies and the level of inversion heterozygosity (13). Nei’s genetic identity estimates also reveal that populations from similar eco-geographical regions are more identical compared to those belonging to different regions (13). Some populations irrespective of occupying similar regions show genetic dissimilarity from one another, which is contrary to the pattern under environment-specific selection. This might be due to the inter-habitat/niche differences or environmental heterogeneity to which populations are exposed since historical past. By and large, populations from the identical geographic regions and so grouped together show higher genetic similarity than populations of different groups. 
This strengthens the role of natural selection in geographical differentiation of inversion polymorphism as also shown by previous studies on Indian natural populations of D.melanogaster (26-27). This reaffirms the theory that Indian natural populations of D.ananassae show geographical differentiation with respect to the inversion polymorphism. Da Cunha and Dobzhansky (28) reported the similar phenomena in D. willistoni where levels of inversion polymorphism in populations are directly related to the diversity of the habitat occupied by the populations. In Drosophila, fitness related traits show geographical variations as an adaptive response of plasticity to the experienced environment (29-30). Many studies have empirically established the linkage or epistatic association of inversions with fitness related traits and hence impact of natural selection on genetic variation. Adaptive response to the ecological variations driven by ecological factors that vary to a great extent in the region leads to marked genetic diversity within the species (31). The association of genetic variation with environmental and geographical heterogeneity could be due to natural selection operating on chromosomal variability in D. ananassae. Natural selection may be involved in generating and maintaining the genetic differentiation among populations (32-34).
The development of polymorphism through natural selection is one of the way through which a population may improve its capacity to utilize the environment and survive through temporal changes of it. In natural populations of Drosophila, chromosomal polymorphism due to inversions is common and is an adaptive trait (4-6). A number of adaptive functions have been found to be associated with inversion polymorphism. Inversions have also been used to study geographical clines, temporal cycles, meiotic drive and natural selection (2-3). These are of interest because of their unique origin and also because of the fact that functional coadaptations are likely to occur within an inversion so that rearrangements might also be involved in an adaptive polymorphism (35). The inversion karyotypes may differ in certain components of fitness such as fecundity, viability, rate of development, fertility, hatchability and sexual activity (4).
Season wise (summer, monsoon and winter) grouping of populations led to almost equal mean values of HS and GST. However, individual results for monsoon season show higher GST compared to HS as given in Table 2. This indicates the overall effect of humidity and temperature on the genetic variability in populations through its effect on type of vegetation. On the other hand, D. ananassae populations undergo drastic reduction during seasonal extremes (summer and winter) (36). The first unambiguous indication that inversions were subject to strong selection came from studies of temporal shifts in the inversion frequencies. Dobzhansky and coworkers have also reported similar seasonal and altitude dependent fluctuations in the frequencies of various gene arrangements of D. pseudoobscura from a number of different localities (37-44), whereas no such changes were observed at other localities (45). Other species in which seasonal changes in inversion frequencies have been reported are D. melanogaster (46-48). Further, Indian localities are highly variable in latitude and altitude and there are significant seasonal variations moving from south to north (29-30).
When populations were grouped according to the time of collection (years and months), the total diversity (HT) was similar in both cases, which suggests that collection time has only a small effect on overall genetic variability. This could be due to geo-climatic homogeneity within a particular collection period, resulting in similar patterns of variability among populations from a particular region that were collected and sampled at the same time. Sampling could therefore affect the overall pattern of genetic diversity (49). An important comparison emerging from this study is that diversity among populations within a particular collection year is lower (HS = 0.305; total diversity HT = 0.455) than diversity among populations within a particular collection month (HS = 0.321; HT = 0.456). This is reasonable because seasonal fluctuations are more nearly constant over a whole year than within a single month, leading to broadly similar patterns of genetic variability across a given collection year. This is consistent with the ‘local adaptation’ theory, which invokes ‘local selection’. The overall gene diversity estimates for the two temporal parameters corroborate these findings. We obtained similar findings when populations were grouped according to regions (mainland and coastal regions): the diversity among sites within a particular region is lower (HS = 0.297; HT = 0.457) than when populations were grouped into mainland and coastal regions (HS = 0.309; HT = 0.457). Here too, geographical factors remain more or less similar over a broad region but may vary over smaller coastal or mainland regions, affecting the pattern of variability accordingly. Being cosmopolitan in distribution, D. ananassae has a broad geographic range, which itself attests to ‘local adaptation’ across heterogeneous environments.
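The grouping comparisons above (by collection year, collection month, or region) can be sketched by pooling populations into groups before applying Nei's decomposition. This is only one plausible reading of how grouped HS and HT might be obtained and should be treated as an assumption: each group's frequency is the unweighted mean of its member populations, and the groups are then treated as subpopulations. The helper names, labels and frequencies are hypothetical.

```python
from collections import defaultdict
from statistics import fmean


def biallelic_h(p):
    """Nei's gene diversity for a two-arrangement locus: 1 - p^2 - (1 - p)^2."""
    return 1.0 - p * p - (1.0 - p) * (1.0 - p)


def grouped_nei(pop_freqs, labels):
    """Return (HS, HT, GST) with each group treated as one subpopulation.

    pop_freqs[i][j]: frequency of inversion j in population i (assumed layout).
    labels[i]: hypothetical grouping key of population i, e.g. a year,
    a month, or "coastal"/"mainland".
    """
    groups = defaultdict(list)
    for freqs, label in zip(pop_freqs, labels):
        groups[label].append(freqs)
    n_loci = len(pop_freqs[0])
    # Unweighted mean frequency of each inversion within each group
    group_means = [[fmean(m[j] for m in members) for j in range(n_loci)]
                   for members in groups.values()]
    # HS: mean within-group diversity, averaged over loci
    hs = fmean(fmean(biallelic_h(g[j]) for g in group_means)
               for j in range(n_loci))
    # HT: diversity of the pooled frequency (mean of group means), over loci
    ht = fmean(biallelic_h(fmean(g[j] for g in group_means))
               for j in range(n_loci))
    gst = (ht - hs) / ht if ht > 0 else 0.0
    return hs, ht, gst


freqs = [[0.2], [0.2], [0.6], [0.6]]  # made-up frequencies, one inversion
by_year = grouped_nei(freqs, ["y1", "y1", "y2", "y2"])
by_month = grouped_nei(freqs, ["m1", "m2", "m1", "m2"])
```

With these made-up values, a grouping that separates dissimilar populations gives GST of about 0.167, whereas a grouping that mixes them gives GST = 0, which shows how the choice of grouping alone can shift the within-group and among-group components.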
Variation in the degree of inversion polymorphism can be accounted for by genetic and demographic factors (50-53).
The selection history hypothesis explains the increase in genetic variance in novel and stressful environments by the fact that such conditions were not encountered in the historical past (50-53). Stressful environments induce an increase in genetic and phenotypic variation in fitness-related quantitative characters (54-55). Geographical gradients are of special interest in studies of climatic adaptation because climate varies strongly with geographical variables (56-57). Several environmental factors may affect the physiology of individuals; temperature is thought to be one of the strongest and therefore of great selective importance (56-57). Thermal variation is linked with changes in relative humidity along altitudinal and latitudinal gradients. Higher-elevation localities along the south-north transect have lower relative humidity and ambient temperature than many other localities, whereas low-altitude localities in the Indian tropical peninsula are characterized by ambient temperatures of 25-30 °C and high relative humidity. Thus, Drosophila species and other ectothermic insects face locality-specific environmental stress (58-59). Further, natural populations display geographic sub-structure owing to differences in allele and genotype frequencies from one geographic region to another. In addition, natural habitats are typically patchy, with favorable areas intermixed with unfavorable ones. When a population is subdivided, there is almost inevitably some genetic differentiation among the subpopulations, that is, the subpopulations acquire allele frequencies that differ from one another (60). D. ananassae occurs in highly structured populations throughout its geographic range (14, 31, 61-62). Ecological and demographic factors may have significant consequences for the short- and long-term evolutionary dynamics of inversion polymorphism and for the manner in which inversions co-evolve with the rest of the genome (63).
Nei’s diversity index does not address the community structure of populations collected from various locations. Similarity based on differences in AL, DE and ET does not constitute high-dimensional data, so widely used methods such as principal component analysis and Bayesian techniques are not suitable for this problem. Network analysis, by contrast, is a useful tool for analyzing population structure and identifying groups of similar populations. The communities detected at various difference scales of AL, DE and ET tend to encompass the same populations. Further, with few exceptions, these communities consist largely of sites located near one another. These findings point to the impact of region-specific factors, such as climatic conditions, on Drosophila populations.
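A network of this kind can be sketched as follows. The construction is an assumption made for illustration, since the exact procedure is not restated here: populations are nodes, two populations are linked when their frequencies of one inversion (AL, DE or ET) differ by at most a threshold d, and connected components serve as a deliberately simple stand-in for the community-detection step. Population names and frequencies are hypothetical.

```python
from itertools import combinations


def threshold_network(freqs, d):
    """Link populations whose frequencies of one inversion differ by <= d.

    freqs: dict mapping a population name to its inversion frequency.
    Returns an adjacency dict: name -> set of neighbouring names.
    """
    adj = {name: set() for name in freqs}
    for a, b in combinations(freqs, 2):
        if abs(freqs[a] - freqs[b]) <= d:
            adj[a].add(b)
            adj[b].add(a)
    return adj


def connected_components(adj):
    """Group nodes into connected components (crude stand-in for communities)."""
    seen, components = set(), []
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        stack, comp = [start], set()
        while stack:  # iterative depth-first search
            node = stack.pop()
            comp.add(node)
            for nb in adj[node]:
                if nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        components.append(comp)
    return components


# Hypothetical frequencies of one inversion in five populations
al_freqs = {"P1": 0.10, "P2": 0.12, "P3": 0.40, "P4": 0.43, "P5": 0.80}
communities = connected_components(threshold_network(al_freqs, d=0.05))
print(sorted(sorted(c) for c in communities))
# [['P1', 'P2'], ['P3', 'P4'], ['P5']]
```

In a fuller analysis, a modularity-maximizing algorithm (for example, the Louvain method) would typically replace connected components, because a chain of individually close pairs can otherwise merge groups that are, overall, quite different.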
Larger values of d reflect weaker levels of similarity among the populations. Networks generated with larger values of d for AL, DE and ET contain more edges and yield a small number of communities with denser edges.
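The effect of d on edge count, number of communities and strength of community structure can be illustrated with Newman-Girvan modularity, Q = sum over communities c of [L_c/m - (D_c/(2m))^2], where m is the number of edges, L_c the number of edges inside community c and D_c the summed degree of its nodes. The sketch below assumes, as a simplification, that communities are the connected components of the threshold network; the names and frequencies are hypothetical and this is not the authors' pipeline.

```python
from itertools import combinations


def threshold_edges(freqs, d):
    """Edges between populations whose inversion frequencies differ by <= d."""
    return [(a, b) for a, b in combinations(sorted(freqs), 2)
            if abs(freqs[a] - freqs[b]) <= d]


def components(nodes, edges):
    """Connected components via union-find (simplified stand-in for communities)."""
    parent = {n: n for n in nodes}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in edges:
        parent[find(a)] = find(b)
    groups = {}
    for n in nodes:
        groups.setdefault(find(n), set()).add(n)
    return list(groups.values())


def modularity(edges, communities):
    """Newman-Girvan modularity Q = sum_c [L_c/m - (D_c/(2m))^2]."""
    m = len(edges)
    if m == 0:
        return 0.0
    degree = {}
    for a, b in edges:
        degree[a] = degree.get(a, 0) + 1
        degree[b] = degree.get(b, 0) + 1
    q = 0.0
    for comm in communities:
        l_c = sum(1 for a, b in edges if a in comm and b in comm)
        d_c = sum(degree.get(n, 0) for n in comm)
        q += l_c / m - (d_c / (2 * m)) ** 2
    return q


freqs = {"P1": 0.10, "P2": 0.12, "P3": 0.40, "P4": 0.43, "P5": 0.80}
for d in (0.05, 0.35):
    edges = threshold_edges(freqs, d)
    comms = components(freqs, edges)
    print(d, len(edges), len(comms), round(modularity(edges, comms), 3))
# 0.05 2 3 0.5
# 0.35 6 2 0.0
```

With these made-up values, raising d from 0.05 to 0.35 triples the number of edges, reduces three communities to two, and lowers Q from 0.5 to 0, mirroring the pattern described above: more edges, fewer and denser communities, and weaker community structure.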
In one of the largest spatio-temporal studies of its kind, gene-diversity statistics, community structure analysis and modularity analysis revealed significant variability in the Indian natural populations of D. ananassae. In most cases, the among-group diversity (i.e. the distribution of diversity among populations with respect to region, season and collection time) was greater than the within-group diversity. Of the parameters chosen to analyze the components of genetic diversity, geographical attributes appear to have the greatest influence on genetic variability in Indian natural populations of D. ananassae, while time of collection and season have the least. The association of genetic variability with environmental heterogeneity clearly reveals the role of environment-specific natural selection; however, the homogenizing effects observed could be due to genetic hitchhiking and canalization (i.e. genetic differences are canalized via a rigid polymorphic system that represses deviation from the phenotype that is optimal in the common selecting environment).
Pranveer Singh and Pankaj Narula contributed equally to this work. The authors thank Prof. B.N. Singh, Banaras Hindu University (BHU), India, for providing laboratory facilities and guidance, and Prof. B. Charlesworth, University of Edinburgh, UK, for suggesting Nei’s gene diversity statistics for the study.
AL
Alpha
Delta
Eta
Inbreeding coefficient due to population subdivision
Total diversity
Subpopulation diversity
Subpopulation diversity relative to total diversity