Can anybody explain grid computing to me and suggest a few websites on the subject?

Question:

vamsi krishna

2007-04-14 04:01:57 UTC

Can anybody explain grid computing to me and suggest a few websites on the subject?

Three answers:

Ravinder C

2007-04-14 04:19:14 UTC

http://www.gridcomputing.com/

Grid is a type of parallel and distributed system that enables the sharing, selection, and aggregation of geographically distributed "autonomous" resources dynamically at runtime depending on their availability, capability, performance, cost, and users' quality-of-service requirements.

The concept of sharing distributed resources is not new. In 1965, MIT's Fernando Corbató and the other designers of the Multics operating system envisioned a computer facility operating "like a power company or water company." [5] And in their 1968 article "The Computer as a Communications Device," J. C. R. Licklider and Robert W. Taylor anticipated Grid-like scenarios. [6] Since the late 1960s, much work has been devoted to developing distributed systems, but with mixed success.

Now, however, a combination of technology trends and research advances makes it feasible to realize the Grid vision--to put in place a new international scientific infrastructure with tools that, together, can meet the challenging demands of 21st-century science. Indeed, major science communities now accept that Grid technology is important for their future. Numerous government-funded R&D projects are variously developing core technologies, deploying production Grids, and applying Grid technologies to challenging applications. (For a list of major Grid projects, see http://www.mcs.anl.gov/~foster/grid-projects.)

Technology trends

A useful metric for the rate of technological change is the average period during which speed or capacity doubles or, more or less equivalently, halves in price. For storage, networks, and computing power, these periods are around 12, 9, and 18 months, respectively. The different time constants associated with these three exponentials have significant implications.

The annual doubling of data storage capacity, as measured in bits per unit area, has already reduced the cost of a terabyte (10^12 bytes) disk farm to less than $10 000. Anticipating that the trend will continue, the designers of major physics experiments are planning petabyte data archives. Scientists who create sequences of high-resolution simulations are also planning petabyte archives.

Such large data volumes demand more from our analysis capabilities. Dramatic improvements in microprocessor performance mean that the lowly desktop or laptop is now a powerful computational engine. Nevertheless, computer power is falling behind storage. By doubling "only" every 18 months or so, computer power takes five years to increase by a single order of magnitude. Assembling the computational resources needed for large-scale analysis at a single location is becoming infeasible.

The solution to these problems lies in dramatic changes taking place in networking. Spurred by such innovations as doping, which boosts the performance of optoelectronic devices, and by the demands of the Internet economy, [7] the performance of wide area networks doubles every nine months or so; every five years it increases by two orders of magnitude. The NSFnet network, which connects the National Science Foundation supercomputer centers in the US, exemplifies this trend. In 1985, NSFnet's backbone operated at a then-unprecedented 56 Kb/s. This year, the centers will be connected by the 40 Gb/s TeraGrid network (http://www.teragrid.org)--an improvement of six orders of magnitude in 17 years.

The doubling of network performance relative to computer speed every 18 months has already changed how we think about and undertake collaboration. If, as expected, networks outpace computers at this rate, communication becomes essentially free. To exploit this bandwidth bounty, we must imagine new ways of working that are communication intensive, such as pooling computational resources, streaming large amounts of data from databases or instruments to remote computers, linking sensors with each other and with computers and archives, and connecting people, computing, and storage in collaborative environments that avoid the need for costly travel. [8]
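The arithmetic behind these doubling periods can be checked with a short sketch. The doubling periods and the NSFnet figures are the ones quoted above; the function names are only illustrative.

```python
import math

# Doubling periods in months, as quoted in the text above.
DOUBLING_MONTHS = {"storage": 12, "network": 9, "compute": 18}


def growth_factor(doubling_months: float, elapsed_months: float) -> float:
    """Multiplicative growth after elapsed_months for a fixed doubling period."""
    return 2.0 ** (elapsed_months / doubling_months)


def relative_doubling_months(fast: float, slow: float) -> float:
    """Period over which the ratio fast/slow doubles.

    The ratio grows as 2**(t/fast - t/slow), so it doubles every
    1 / (1/fast - 1/slow) months.
    """
    return 1.0 / (1.0 / fast - 1.0 / slow)


FIVE_YEARS = 60  # months
for name, period in DOUBLING_MONTHS.items():
    factor = growth_factor(period, FIVE_YEARS)
    print(f"{name}: x{factor:.0f} in five years "
          f"(~{math.log10(factor):.1f} orders of magnitude)")

# Network performance relative to computer speed.
print(relative_doubling_months(DOUBLING_MONTHS["network"],
                               DOUBLING_MONTHS["compute"]))

# NSFnet: 56 Kb/s in 1985 to 40 Gb/s, expressed in orders of magnitude.
nsfnet_orders = math.log10(40e9 / 56e3)
print(f"NSFnet improvement: ~{nsfnet_orders:.1f} orders of magnitude")
```

Over five years this gives roughly a factor of 32 for storage, 100 for networks, and 10 for computing, and the network-to-compute ratio doubles about every 18 months, which matches the figures in the text.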

If communication is unlimited and free, then we are not restricted to using local resources to solve problems. When running a colleague's simulation code, I do not need to install the code locally. Instead, I can run it remotely on my colleague's computer. When applying the code to datasets maintained at other locations, I do not need to get copies of those datasets myself (not so long ago, I would have requested tapes). Instead, I can have the remote code access those datasets directly. If I wish to repeat the analysis many hundreds of times on different datasets, I can call on the collective computing power of my research collaboration or buy the power from a provider. And when I obtain interesting results, my geographically dispersed colleagues and I can look at and discuss large output datasets by using sophisticated collaboration and visualization tools.

Although these scenarios vary considerably in their complexity, they share a common thread. In each case, I use remote resources to do things that I cannot do easily at home. High-speed networks are often necessary for such remote resource use, but they are far from sufficient. Remote resources are typically owned by others, exist within different administrative domains, run different software, and are subject to different security and access control policies.

Actually using remote resources involves several steps. First, I must discover that they exist. Next, I must negotiate access to them (to be practical, this step cannot involve using the telephone!). Then, I have to configure my hardware and software to use the resources effectively. And I must do all these things without compromising my own security or the security of the remote resources that I make use of, some of which I may have to pay for.

Implementing these steps requires uniform mechanisms for such critical tasks as creating and managing services on remote computers, supporting single sign-on to distributed resources, transferring large datasets at high speeds, forming large distributed virtual communities, and maintaining information about the existence, state, and usage policies of community resources.

Today's Internet and Web technologies address basic communication requirements, but not the tasks just outlined. Providing the infrastructure and tools that make large-scale, secure resource sharing possible and straightforward is the Grid's raison d'être.

Infrastructure and tools

An infrastructure is a technology that we can take for granted when performing our activities. The road system enables us to travel by car; the international banking system allows us to transfer funds across borders; and the Internet allows us to communicate with virtually any electronic device.

To be useful, an infrastructure technology must be broadly deployed, which means, in turn, that it must be simple, extraordinarily valuable, or both. A good example is the set of protocols that must be implemented within a device to allow Internet access. The set is so small that people have constructed matchbox-sized Web servers. A Grid infrastructure needs to provide more functionality than the Internet on which it rests, but it must also remain simple. And of course, the need remains for supporting the resources that power the Grid, such as high-speed data movement, caching of large datasets, and on-demand access to computing.

Tools make use of infrastructure services. Internet and Web tools include browsers for accessing remote Web sites, e-mail programs for handling electronic messages, and search engines for locating Web pages. Grid tools are concerned with resource discovery, data management, scheduling of computation, security, and so forth.

But the Grid goes beyond sharing and distributing data and computing resources. For the scientist, the Grid offers new and more powerful ways of working, as the following examples illustrate:

Science portals. We are accustomed to climbing a steep learning curve when installing and using a new software package. Science portals make advanced problem-solving methods easier to use by invoking sophisticated packages remotely from Web browsers or other simple, easily downloaded "thin clients." The packages themselves can also run remotely on suitable computers within a Grid. Such portals are currently being developed in biology, fusion, computational chemistry, and other disciplines.

Distributed computing. High-speed workstations and networks can yoke together an organization's PCs to form a substantial computational resource. Entropia Inc's FightAIDSAtHome system harnesses more than 30 000 computers to analyze AIDS drug candidates. And in 2001, mathematicians across the US and Italy pooled their computational resources to solve a particular instance, dubbed "Nug30," of an optimization problem. For a week, the collaboration brought an average of 630--and a maximum of 1006--computers to bear on Nug30, delivering a total of 42 000 CPU-days. Future improvements in network performance and Grid technologies will increase the range of problems that aggregated computing resources can tackle.

Large-scale data analysis. Many interesting scientific problems require the analysis of large amounts of data. For such problems, harnessing distributed computing and storage resources is clearly of great value. Furthermore, the natural parallelism inherent in many data analysis procedures makes it feasible to use distributed resources efficiently. For example, the analysis of the many petabytes of data to be produced by the LHC and other future high-energy physics experiments will require the marshalling of tens of thousands of processors and hundreds of terabytes of disk space for holding intermediate results. For various technical and political reasons, assembling these resources at a single location appears impractical. Yet the collective institutional and national resources of the hundreds of institutions participating in those experiments can provide these resources. These communities can, furthermore, share more than just computers and storage. They can also share analysis procedures and computational results.
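The "natural parallelism" mentioned above can be illustrated with a minimal sketch: each chunk of events is analyzed independently and the partial results are merged afterwards. A local thread pool stands in for distributed Grid workers here, and the event values, bin width, and function names are all assumptions made for illustration.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def analyze_chunk(events):
    """Independent per-chunk analysis: histogram event values into 10-unit bins."""
    return Counter((e // 10) * 10 for e in events)


def analyze_dataset(chunks, max_workers=4):
    """Analyze every chunk concurrently and merge the partial histograms."""
    total = Counter()
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Chunks do not depend on each other, so they can run anywhere, in any order.
        for partial in pool.map(analyze_chunk, chunks):
            total.update(partial)
    return total


# Hypothetical data: three chunks that could live at three different sites.
chunks = [[3, 12, 18], [25, 27, 4], [11, 31]]
histogram = analyze_dataset(chunks)
print(sorted(histogram.items()))
```

Because only the small partial histograms have to travel back to the caller, the bulk data can stay where it is stored, which is the property that makes distributed resources efficient for this kind of analysis.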

Computer-in-the-loop instrumentation. Scientific instruments such as telescopes, synchrotrons, and electron microscopes generate raw data streams that are archived for subsequent batch processing. But quasi-real-time analysis can greatly enhance an instrument's capabilities. For example, consider an astronomer studying solar flares with a radio telescope array. The deconvolution and analysis algorithms used to process the data and detect flares are computationally demanding. Running the algorithms continuously would be inefficient for studying flares that are brief and sporadic. But if the astronomer could call on substantial computing resources (and sophisticated software) in an on-demand fashion, he or she could use automated detection techniques to zoom in on solar flares as they occurred.
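The on-demand idea in this scenario can be sketched as a cheap trigger that runs on every data window and calls the expensive analysis only when something interesting appears. The threshold test and the `analyze` callback are hypothetical stand-ins, not the astronomer's actual algorithms.

```python
def cheap_trigger(window, threshold):
    """Inexpensive check run on every window of instrument samples."""
    return max(window) > threshold


def monitor(stream, threshold, analyze):
    """Invoke the expensive analyze() only for windows that trip the trigger.

    Returns a list of (window_index, analysis_result) pairs.
    """
    results = []
    for index, window in enumerate(stream):
        if cheap_trigger(window, threshold):
            # In a Grid setting this call would request remote computing on demand.
            results.append((index, analyze(window)))
    return results


# Hypothetical sample windows; sum() stands in for a costly deconvolution step.
windows = [[1, 2, 1], [1, 9, 8], [2, 2, 2], [7, 1, 1]]
detections = monitor(windows, threshold=5, analyze=sum)
print(detections)
```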

Collaborative work. Researchers often want to aggregate not only data and computing power, but also human expertise. Collaborative problem formulation, data analysis, and the like are important Grid applications. For example, an astrophysicist who has performed a large, multiterabyte simulation might want colleagues around the world to visualize the results in the same way and at the same time so that the group can discuss the results in real time.

Real Grid applications will frequently contain aspects of several of these--and other--scenarios. For example, our radio astronomer might also want to look for similar events in an international archive, discuss results with colleagues during a run, and invoke distributed computing runs to evaluate alternative algorithms.

Grid architecture

Close to a decade of focused R&D and experimentation has produced considerable consensus on the requirements and architecture of Grid technology (see box 1 above for the early history of the Grid). Standard protocols, which define the content and sequence of message exchanges used to request remote operations, have emerged as an essential means of achieving the interoperability that Grid systems depend on. Also essential are standard application programming interfaces (APIs), which define standard interfaces to code libraries and facilitate the construction of Grid components by allowing code components to be reused.

Figure 2

As figure 2 shows schematically, protocols and APIs can be categorized according to the role they play in a Grid system. At the lowest level, the fabric, we have the physical devices or resources that Grid users want to share and access, including computers, storage systems, catalogs, networks, and various forms of sensors.

Above the fabric are the connectivity and resource layers. The protocols in these layers must be implemented everywhere and, therefore, must be relatively small in number. The connectivity layer contains the core communication and authentication protocols required for Grid-specific network transactions. Communication protocols enable the exchange of data between resources, whereas authentication protocols build on communication services to provide cryptographically secure mechanisms for verifying the identity of users and resources.

The resource layer contains protocols that exploit communication and authentication protocols to enable the secure initiation, monitoring, and control of resource-sharing operations. Running the same program on different computer systems depends on resource-layer protocols. The Globus Toolkit (which is described in box 2) is a commonly used source of connectivity and resource protocols and APIs.

The collective layer contains protocols, services, and APIs that implement interactions across collections of resources. Because they combine and exploit components from the relatively narrower resource and connectivity layers, the components of the collective layer can implement a wide variety of tasks without requiring new resource-layer components. Examples of collective services include directory and brokering services for resource discovery and allocation; monitoring and diagnostic services; data replication services; and membership and policy services for keeping track of who in a community is allowed to access resources.

At the top of any Grid system are the user applications, which are constructed in terms of, and call on, the components in any other layer. For example, a high-energy physics analysis application that needs to execute several thousands of independent tasks, each taking as input some set of files containing events, might proceed by

obtaining necessary authentication credentials (connectivity layer protocols)

querying an information system and replica catalog to determine availability of computers, storage systems, and networks, and the location of required input files (collective services)

submitting requests to appropriate computers, storage systems, and networks to initiate computations, move data, and so forth (resource protocols) and

monitoring the progress of the various computations and data transfers, notifying the user when all are completed, and detecting and responding to failure conditions (resource protocols).
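The four steps above can be sketched end to end with in-memory stand-ins. Every class and function name here is hypothetical; a real application would use Grid middleware rather than these toy objects, and a negative task value is used only to simulate a failure.

```python
from dataclasses import dataclass, field


@dataclass
class Credential:
    """Hypothetical stand-in for an authentication credential."""
    user: str


@dataclass
class Resource:
    """Hypothetical compute resource that records the jobs it has run."""
    name: str
    free_slots: int
    jobs: dict = field(default_factory=dict)

    def submit(self, cred, task):
        # Resource-layer step: start a task on behalf of an authenticated user.
        job_id = f"{self.name}-{len(self.jobs)}"
        self.jobs[job_id] = "done" if task >= 0 else "failed"
        return job_id

    def status(self, job_id):
        return self.jobs[job_id]


def authenticate(user):
    """Step 1 (connectivity layer): obtain credentials."""
    return Credential(user)


def discover(resources, needed):
    """Step 2 (collective service): find resources with enough capacity."""
    return [r for r in resources if r.free_slots >= needed]


def run_tasks(user, resources, tasks):
    cred = authenticate(user)
    candidates = discover(resources, needed=1)
    if not candidates:
        raise RuntimeError("no suitable resources found")
    # Step 3 (resource protocols): submit tasks, round-robin over candidates.
    submitted = []
    for i, task in enumerate(tasks):
        resource = candidates[i % len(candidates)]
        submitted.append((resource, resource.submit(cred, task)))
    # Step 4 (resource protocols): monitor progress and collect failures.
    failed = [job for resource, job in submitted if resource.status(job) != "done"]
    return len(submitted) - len(failed), failed


resources = [Resource("a", 2), Resource("b", 0), Resource("c", 1)]
completed, failed = run_tasks("alice", resources, tasks=[1, 2, -1, 4])
print(completed, failed)
```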

Many of these functions can be carried out by tools that automate the more complex tasks. The University of Wisconsin's Condor-G system (http://www.cs.wisc.edu/condor) is an example of a powerful, full-featured task broker.

Authentication, authorization, and policy

Authentication, authorization, and policy are among the most challenging issues in Grids. Traditional security technologies are concerned primarily with securing the interactions between clients and servers. In such interactions, a client (that is, a user) and a server need to mutually authenticate (that is, verify) each other's identity, while the server needs to determine whether to authorize requests issued by the client. Sophisticated technologies have been developed for performing these basic operations and for guarding against and detecting various forms of attack. We use the technologies whenever we visit e-commerce Web sites such as Amazon to buy products online.

Figure 3

In Grid environments, the situation is more complex. The distinction between client and server tends to disappear, because an individual resource can act as a server one moment (as it receives a request) and as a client at another (as it issues requests to other resources). For example, when I request that a simulation code be run on a colleague's computer, I am the client and the computer is a server. But a few moments later, that same code and computer act as a client, as they issue requests--on my behalf--to other computers to access input datasets and to run subsidiary computations. Managing that kind of transaction turns out to have a number of interesting requirements, such as

Single sign-on. A single computation may entail access to many resources, but requiring a user to reauthenticate on each occasion (by, for example, typing in a password) is impractical and generally unacceptable. Instead, a user should be able to authenticate once and then assign to the computation the right to operate on his or her behalf, typically for a specified period. This capability is achieved through the creation of a proxy credential. In figure 3, the program run by the user (the user proxy) uses a proxy credential to authenticate at two different sites. These services handle requests to create new processes.
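A minimal sketch of the single sign-on idea: the user authenticates once, receives a credential with a limited lifetime, and that credential is then checked at each site without another password prompt. This is a deliberate simplification. It assumes the sites share a verification key and uses an HMAC, whereas real Grid security infrastructures rely on public-key certificates. All names are hypothetical.

```python
import hashlib
import hmac
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class ProxyCredential:
    user: str
    expires_at: float  # seconds since the epoch
    signature: str


def _sign(key: bytes, user: str, expires_at: float) -> str:
    message = f"{user}|{expires_at}".encode()
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def sign_on(user, key, lifetime_s, now=None):
    """Authenticate once and obtain a short-lived proxy credential."""
    now = time.time() if now is None else now
    expires_at = now + lifetime_s
    return ProxyCredential(user, expires_at, _sign(key, user, expires_at))


def site_accepts(cred, key, now=None):
    """A site checks the signature and the lifetime; no password is involved."""
    now = time.time() if now is None else now
    expected = _sign(key, cred.user, cred.expires_at)
    return hmac.compare_digest(expected, cred.signature) and now < cred.expires_at


KEY = b"demo-key"  # assumption: verification key known to both sites
cred = sign_on("alice", KEY, lifetime_s=3600, now=1000.0)
print(site_accepts(cred, KEY, now=1500.0))  # site 1, within the lifetime
print(site_accepts(cred, KEY, now=2000.0))  # site 2, same credential
print(site_accepts(cred, KEY, now=5000.0))  # after the credential has expired
```

The limited lifetime is the point of the design: if the proxy credential leaks, it stops being useful once it expires.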

Mapping to local security mechanisms. Different sites may use different local security solutions, such as Kerberos and Unix as depicted in figure 3. A Grid security infrastructure needs to map to these local solutions at each site, so that local operations can proceed with appropriate privileges. In figure 3, processes execute under a local ID and, at site A, are assigned a Kerberos "ticket," a credential used by the Kerberos authentication system to keep track of requests.

Delegation. The creation of a proxy credential is a form of delegation, an operation of fundamental importance in Grid environments. [9] A computation that spans many resources creates subcomputations (subsidiary computations) that may themselves generate requests to other resources and services, perhaps creating additional subcomputations, and so on. In figure 3, the two subcomputations created at sites A and B both communicate with each other and access files at site C. Authentication operations--and hence further delegated credentials--are involved at each stage, as resources determine whether to grant requests and computations determine whether resources are trustworthy. The further these delegated credentials are disseminated, the greater the risk that they will be acquired and misused by an adversary. These delegation operations and the credentials that enable them must be carefully managed.
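One way to picture careful management of delegated credentials is a chain in which each step can only narrow what the previous one allowed. The sketch below is an illustration under that assumption: a delegated credential may carry fewer rights and a shorter lifetime than its parent, never more. The class and the right names are hypothetical and cryptography is omitted.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Delegated:
    holder: str
    rights: frozenset
    expires_at: float
    parent: Optional["Delegated"] = None

    def delegate(self, holder, rights, expires_at):
        """Create a child credential that is no more powerful than this one."""
        rights = frozenset(rights)
        if not rights <= self.rights:
            raise PermissionError("cannot delegate rights the parent lacks")
        # A child can never outlive its parent.
        return Delegated(holder, rights, min(expires_at, self.expires_at), self)

    def chain(self):
        """Holders from the original user down to this credential."""
        node, names = self, []
        while node is not None:
            names.append(node.holder)
            node = node.parent
        return list(reversed(names))

    def allows(self, right, now):
        return right in self.rights and now < self.expires_at


user = Delegated("user", frozenset({"run", "read", "write"}), expires_at=100.0)
site_a = user.delegate("site-A", {"run", "read"}, expires_at=500.0)
site_c = site_a.delegate("site-C", {"read"}, expires_at=50.0)
print(site_c.chain(), site_c.allows("read", now=10.0), site_c.allows("run", now=10.0))
```

Narrowing at each step limits the damage if a credential far down the chain is acquired by an adversary.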

Community authorization and policy. In a large community, the policies that govern who can use which resources for what purpose cannot be based directly on individual identity. It is infeasible for each resource to keep track of community membership and privileges. Instead, resources (and users) need to be able to express policies in terms of other criteria, such as group membership, which can be identified with a cryptographic credential issued by a trusted third party. In the scenario depicted in figure 3, the file server at site C must know explicitly whether the user is allowed to access a particular file. A community authorization system allows this policy decision to be delegated to a community representative.
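The difference between identity-based and community-based policy can be sketched as follows. The file server's policy names groups rather than individuals, and a community service (here just a dictionary) states which groups a user belongs to. In a real system that statement would be a cryptographic credential from a trusted third party. The group names, the path, and the functions are hypothetical.

```python
# Maintained by the community representative, not by each resource.
COMMUNITY_MEMBERS = {
    "alice": {"physics-analysis"},
    "bob": {"guests"},
}

# Maintained by the file server at site C: policy is expressed per group.
FILE_POLICY = {
    "/data/run42.events": {"physics-analysis"},
}


def issue_membership(user):
    """Community service: assert which groups a user belongs to."""
    return frozenset(COMMUNITY_MEMBERS.get(user, ()))


def file_server_allows(path, groups):
    """File server: grant access if any asserted group is permitted for the path."""
    return bool(FILE_POLICY.get(path, set()) & groups)


for name in ("alice", "bob", "carol"):
    print(name, file_server_allows("/data/run42.events", issue_membership(name)))
```

With this split, adding a new member changes only the community's records; the file server's policy stays untouched.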

Current status and future directions

As the Grid matures, standard technologies are emerging for basic Grid operations. In particular, the community-based, open-source Globus Toolkit (see box 2) is being applied by most major Grid projects. The business world has also begun to investigate Grid applications (see box 3 on page 46). By late 2001, 12 companies had announced support for the Globus Toolkit.

Figure 4

Progress has also been made on organizational fronts. With more than 1000 people on its mailing lists, the Global Grid Forum (http://www.gridforum.org) is a significant force for setting standards and community development. Its thrice-yearly meetings attract hundreds of attendees from some 200 organizations. The International Virtual Data Grid Laboratory is being established as an international Grid system (figure 4).

It is commonly observed that people overestimate the short-term impact of change but underestimate long-term effects. [10] It will surely take longer than some expect before Grid concepts and technologies transform the practice of science, engineering, and business, but the combination of exponential technology trends and R&D advances noted in this article is real and will ultimately have dramatic impacts.

In a future in which computing, storage, and software are no longer objects that we possess, but utilities to which we subscribe, the most successful scientific communities are likely to be those that succeed in assembling and making effective use of appropriate Grid infrastructures and thus accelerating the development and adoption of new problem-solving methods within their discipline.

2016-12-20 19:31:29 UTC

Limewire. I've been using it for a while and it's been very effective. If you want to put the songs you download from Limewire into iTunes or Windows Media Player, you just click and drag them from the library once you've finished downloading them. And it's also free! Hope that helps!

anonymous

2007-04-17 06:41:21 UTC

http://www.gridcomputing.com/


This content was originally posted on Y! Answers, a Q&A website that shut down in 2021.
