This article discusses key issues in the organization of cloud technologies. The main approaches to ensuring the fault tolerance of cloud systems are described, and the main types of clouds and the principles of their operation are examined in detail. Methods for restoring a system after failures based on saving process states are presented, and the main problems and advantages of cloud computing are identified.
Today cloud computing is becoming widespread. The main driver of this technology is the ability to save resources (material, physical, etc.). More and more companies use remote virtual desktops to organize the work of their employees. Cloud computing offers broad possibilities through storage services and application virtualization. At the same time, many questions remain about organizing uninterrupted operation and implementing fault tolerance in cloud computing, which determines the relevance of this article's topic.
Cloud computing is a technology that provides remote access to software and hardware resources over the Internet or a local area network. Most cloud infrastructures are deployed on data center servers using virtualization technology. This allows any user to work with a remote application without thinking about the underlying technology; from the user's point of view, the cloud appears as a single point of access to computing.
Working with cloud computing proceeds in several stages. First, servers are rented from one of the companies providing this service, for example IT-GRAD, 1cloud, 3data, Amazon, or Google. The user then manages the rented servers over the Internet, paying only for their actual use in processing and storing data. In addition, it is possible to change the capacity of remote workstations, connect additional security services, and increase the leased space.
The use of cloud computing can significantly reduce the capital costs of building data centers and purchasing server and network equipment, and it simplifies ensuring continuity and availability. End users no longer need to spend huge amounts on creating their own servers and data centers. Tasks can be automated by purchasing ready-made packages: SaaS (renting IT applications), PaaS (developing new solutions on cloud platforms), DaaS (renting a virtual workplace), IaaS (renting IT infrastructure).
The cloud infrastructure provides for self-management and delegation of authority to organize secure access to computing resources for all participants in virtual work:
- Cloud Broker (an intermediary that manages the use, performance, and delivery of cloud services and negotiates relationships between cloud providers and consumers);
- Cloud Consumer (service user);
- Cloud Provider (vendor of cloud services);
- Cloud Carrier (an intermediary between cloud providers and consumers, providing connectivity and transportation services);
- Cloud Auditor (a company or individual that independently evaluates cloud services).
The concept of cloud computing dates back to the 1960s. The evolution of information technology has since provided technical solutions suitable for its effective application. Today, cloud computing is viewed as one of the most promising strategic technologies, with most information technology predicted to move to the cloud. There are several types of cloud organization.
Cloud Computing types
A private cloud is implemented on resources available to a single company, with the goal of developing that company using fault-tolerance technology. It can be managed by internal specialists or by an external provider.
A community cloud assumes joint use of the cloud infrastructure by several organizations that share common principles such as mission, security requirements, and policy. It is managed by the organizations themselves, a third party, or an external provider.
A hybrid cloud combines two or more clouds (private, community, or public) that remain unique entities. The rules for combining them are standardized, and portability of data and applications between the clouds is ensured.
A public cloud is an infrastructure accessible to a large group of consumers with no shared interests. The infrastructure is owned by the organization that provides and sells the related cloud services.
Cloud computing technologies are beginning to conquer the worldwide market. Virtualization technologies are offered by providers such as IT-GRAD, but domestic entrepreneurs are in no hurry to move their business to cloud services for fear of losing confidential data. The technology is in greatest demand among IT companies and startups, since it removes the need to purchase expensive equipment and hire specialists to maintain it: all information and programs are stored in the cloud. In addition, it provides access to data anywhere there is an Internet connection.
To ensure comfortable work in the cloud, stored data is automatically distributed among several servers, which addresses the problem of data loss. Users receive round-the-clock data center support, so the risk of losing access to a remote workplace is minimized; even if hardware fails, the distributed capacity allows work to continue. This determines the balance and fault tolerance of the system.
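The idea of distributing data among several servers can be illustrated with a minimal sketch. The class below is purely illustrative (it is not any provider's API): every write goes to several independent "servers" (here plain dictionaries), so a read succeeds as long as at least one replica survives a failure.

```python
class ReplicatedStore:
    """Toy model of replicated storage: N independent replicas."""

    def __init__(self, n_replicas=3):
        self.replicas = [dict() for _ in range(n_replicas)]

    def put(self, key, value):
        # Write to every replica; a real system would confirm a quorum.
        for replica in self.replicas:
            replica[key] = value

    def get(self, key):
        # Read from the first replica that still holds the data.
        for replica in self.replicas:
            if key in replica:
                return replica[key]
        raise KeyError(key)

    def fail(self, index):
        # Simulate one server losing all of its data.
        self.replicas[index].clear()


store = ReplicatedStore()
store.put("report.docx", b"contents")
store.fail(0)                        # one server fails
print(store.get("report.docx"))     # data survives on the remaining replicas
```

A production system would additionally handle concurrent writes and quorum agreement, but the core fault-tolerance property, surviving the loss of a replica, is the same.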
In modern cloud computing, one way to achieve fault tolerance is to use checkpoint-based techniques for preserving process state. This approach makes it possible to restore the state of processes after a failure, while the processes exchange messages to monitor each other's states.
Uncoordinated checkpointing methods allow each process to create checkpoints on its own, from which a globally consistent state is then assembled. Fault tolerance in this model rests on the reliability of each computing node. The virtual machine executes the real-time application algorithm, after which a verification module responsible for the correct operation of the virtual machine is launched. The result-checking module passes the results of task execution to the time-checking module, which checks the execution time against the set limits. The reliability-assessment module, in turn, calculates and assigns a reliability value to each virtual machine. This information is passed to the decision-making engine, which selects the output of the node with the highest reliability, and a recovery checkpoint is created. System recovery involves rolling back the faulty process to a checkpoint. Messages sent between the checkpoint and the failure are recovered from a log stored in permanent memory and re-processed; in this way the restored process regenerates copies of the messages sent before the failure.
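The rollback-and-replay scheme described above can be sketched in a few lines. This is a hedged toy model, not the article's actual system: a process saves its state at a checkpoint, logs every message it receives afterwards, and on failure restores the checkpoint and replays the logged messages in order.

```python
import copy


class CheckpointedProcess:
    """Toy model of checkpoint-based rollback recovery with message replay."""

    def __init__(self):
        self.state = {"counter": 0}
        self.checkpoint = copy.deepcopy(self.state)
        self.message_log = []          # stands in for a log in permanent memory

    def handle(self, message):
        # Log first, then apply: a logged message can always be replayed.
        self.message_log.append(message)
        self.state["counter"] += message

    def take_checkpoint(self):
        self.checkpoint = copy.deepcopy(self.state)
        self.message_log.clear()       # earlier messages are covered by the checkpoint

    def crash_and_recover(self):
        # Roll back to the checkpoint, then replay logged messages in order.
        self.state = copy.deepcopy(self.checkpoint)
        for message in self.message_log:
            self.state["counter"] += message


p = CheckpointedProcess()
p.handle(5)
p.take_checkpoint()
p.handle(7)                  # received after the checkpoint, but logged
p.crash_and_recover()
print(p.state["counter"])    # 12: checkpointed 5 plus replayed 7
```

The key invariant is that the checkpoint plus the message log is always sufficient to reconstruct the pre-failure state, which is exactly what the rollback procedure in the text relies on.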
Another way to organize resilient cloud computing is message logging. The algorithm writes each message to volatile memory asynchronously, without stopping the computation, which makes it possible to recover the system after errors without redundant synchronous writes of messages to permanent memory.
After the failed process is restored, all logged messages are re-sent, possibly in a different order. Correct operation of the virtual machine is again checked by the verification module, and an arbiter containing three modules (timing, reliability assessment, and decision making) selects the result. Depending on the type of real-time application, the arbiter can be located on the cloud or user side, but as a rule it is placed near the sensors or control mechanisms.
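The asynchronous logging step can be sketched as follows. This is an illustrative model with assumed names (`pending`, `log_path`): the sender appends each message to an in-memory queue and continues immediately, while a background thread drains the queue to durable storage, so computation is never blocked on a synchronous write.

```python
import os
import queue
import tempfile
import threading

log_path = os.path.join(tempfile.gettempdir(), "message.log")
pending = queue.Queue()          # volatile buffer between compute and logger


def logger():
    # Background writer: flushes messages to permanent storage.
    with open(log_path, "w") as log:
        while True:
            msg = pending.get()
            if msg is None:      # shutdown sentinel
                break
            log.write(msg + "\n")
            log.flush()


writer = threading.Thread(target=logger)
writer.start()

for i in range(3):
    pending.put(f"msg-{i}")      # non-blocking: the computation is not stopped

pending.put(None)
writer.join()

with open(log_path) as log:
    print(log.read().splitlines())   # ['msg-0', 'msg-1', 'msg-2']
```

The trade-off mirrors the one in the text: messages still buffered in volatile memory at the moment of a crash are lost, but normal-case throughput is much higher than with synchronous writes to permanent memory.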
The fundamental technology underlying cloud infrastructure is virtualization. It is virtualization that abstracts the cloud from its hardware, allowing a running application to be moved from one server to another without stopping it. A cloud, as a rule, includes several nodes located in different data centers, including those of other providers: companies offering cloud services may not own data centers at all and instead rent servers or racks from several hosting providers. The components of the runtime environment can therefore differ.
To prevent the loss of important information, the following are often used:
- RAID arrays;
- backup systems;
- duplication of information.
It should be noted that these methods increase the risk of confidential information leakage because of the large number of copies.
Thus, this work has described the main aspects of organizing load balancing in cloud computing and the problems of implementing fault tolerance in data centers. The article analyzes the application of cloud computing in real business, making it useful both for startups and for investors.
The increased interest of academic and business circles in most countries in the use of cloud computing will contribute to the development of new trends in the IT industry in our country.
Cloud computing has the following main advantages and benefits:
- availability and resiliency;
- use of remote terminals (savings on the purchase of expensive equipment);
- quick access to documents (storing documents in the cloud allows users to have access to data anytime, anywhere);
- resistance to data loss or theft of equipment (copies of data are automatically distributed across several servers);
- system reliability (data centers are managed by professional specialists who provide round-the-clock support).
Based on this analysis, the main directions for the development of cloud technologies were predicted.