Build a resilient infrastructure
This guide explains how to improve the fault tolerance of your service at the physical level.
Fault tolerance is the ability of a service to stay operational and keep performing its business tasks even when individual components of the IT infrastructure fail.
The required degree of fault tolerance depends on the tasks the service performs. Improving fault tolerance makes sense when the damage from downtime exceeds the cost of ensuring uninterrupted operation. This is the case, for example, when the service provides continuous access to important information, or when the company itself depends directly on the uninterrupted operation of its IT infrastructure.
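The rule of thumb above (invest in redundancy when downtime damage exceeds its cost) can be written as a simple comparison. The sketch below is only an illustration: the function name and all figures are hypothetical and are not Selectel prices or statistics.

```python
def redundancy_pays_off(downtime_hours_per_year: float,
                        loss_per_hour: float,
                        redundancy_cost_per_year: float) -> bool:
    """Return True when the expected yearly downtime damage exceeds
    the yearly cost of the redundancy that would prevent it."""
    expected_damage = downtime_hours_per_year * loss_per_hour
    return expected_damage > redundancy_cost_per_year


# Hypothetical figures: 8 hours of downtime a year at 1,500 per hour,
# against 9,000 a year for a second server and redundant connections.
print(redundancy_pays_off(8, 1500, 9000))  # True: 12,000 > 9,000
```

In practice the downtime estimate is the hardest input to obtain, so it is worth running the comparison with both an optimistic and a pessimistic value.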
What Selectel provides
Selectel provides fault tolerance for your service at the server room level, where the racks with servers are located. To do this:
- The server room receives power feeds that are made redundant at higher levels (Selectel transformer substations, industrial UPS units with batteries, and emergency diesel generator sets), and two independent power inputs are then connected to each rack:
  - for servers with one power supply unit, the racks are fitted with ATS (automatic transfer switch) units fed by both independent power inputs; if one input is disconnected, power continues to flow through the other;
  - servers with two power supply units are connected to two independent socket blocks.
- LAN and Internet access is made redundant at the aggregation switch level and above. In addition:
  - for off-the-shelf configuration servers, LAN access switches and Internet access switches are installed in each rack;
  - some Chipcore Line servers have no LAN connection, so only Internet access switches are installed in each rack;
  - for servers of arbitrary configuration, you decide yourself how redundant the connection to the required network (local or Internet) should be.
- If technically possible, two or more servers ordered together are placed in different racks. You can view the current location of your servers and request that a server be moved to another rack in the control panel, under Servers and hardware → Servers, on the Server locations tab.
Increase fault tolerance
You can increase the fault tolerance of your service at the physical level by reducing the number of potential points of failure. In an IT infrastructure, the potential points of failure include:
- the server itself and its components (disks, power supplies, network interfaces, etc.);
- the ATS (automatic transfer switch);
- the Internet access switch and its cabling (copper and optical links, transceivers, patch cords, etc.);
- the LAN access switch and its cabling.
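To see why each item in this list matters, it helps to treat the components as a chain in which every link must work. Under the simplifying assumption that failures are independent, the availability of the chain is the product of the individual availabilities. The numbers below are hypothetical and are not measured values for any Selectel equipment.

```python
from math import prod


def series_availability(availabilities):
    """Availability of a chain in which every component must work
    (assumes independent failures)."""
    return prod(availabilities)


# Hypothetical availabilities for the points of failure listed above.
chain = {
    "server": 0.999,
    "ats": 0.9995,
    "internet_switch": 0.9995,
    "lan_switch": 0.9995,
}

total = series_availability(chain.values())
print(f"{total:.4f}")  # 0.9975
```

The chain is always less available than its weakest component, which is why removing or duplicating single points of failure has such a visible effect.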
Increase resiliency at the rack level
To reduce the number of likely points of failure at the rack level, you can:
- choose servers of arbitrary or off-the-shelf configuration with server-grade processors. All components in such servers are designed for high load and continuous operation and are less likely to fail;
- use a server configuration with two or more power supply units. This power redundancy scheme is more reliable because it has no single point of failure: the failure of one power supply unit or of an ATS does not shut the server down;
- for servers of arbitrary configuration, make the connection to the access switches of the required network (local or Internet) redundant with MC-LAG;
- duplicate an existing server. In this case, place the servers in different racks.
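The benefit of duplicating a component can be estimated with the standard formula for parallel redundancy: if one working copy is enough, the group fails only when all copies fail at once. The sketch below assumes identical components with independent failures, and the availability value is hypothetical.

```python
def parallel_availability(availability: float, copies: int) -> float:
    """Availability of `copies` identical, independent components
    when a single working copy is enough to keep the service up."""
    return 1 - (1 - availability) ** copies


single = 0.999  # hypothetical availability of one server
print(f"{parallel_availability(single, 1):.6f}")  # 0.999000
print(f"{parallel_availability(single, 2):.6f}")  # 0.999999
```

The independence assumption only holds when the copies do not share a rack, an ATS, or an access switch, which is why duplicated servers should be placed in different racks.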
Increase fault tolerance of a client service consisting of multiple servers
To reduce the number of likely points of failure for a service hosted on multiple servers, you can:
- distribute the load evenly across the servers (for this purpose, you can use a fault-tolerant load balancer);
- place the servers in different racks;
- host the servers in different pools and connect them with Selectel Global Router.
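One way to verify the placement recommendations above is to list the racks that hold more than one server of the same service. The sketch below assumes you have copied each server's pool and rack by hand from the control panel into a dictionary; the server, pool, and rack names are hypothetical, and no Selectel API is used.

```python
from collections import defaultdict


def shared_racks(placement):
    """Return {(pool, rack): [servers]} for every rack that holds
    more than one server of the service."""
    by_rack = defaultdict(list)
    for server, (pool, rack) in placement.items():
        by_rack[(pool, rack)].append(server)
    return {loc: names for loc, names in by_rack.items() if len(names) > 1}


# Hypothetical placement: two servers share a rack, one is in another pool.
placement = {
    "web-1": ("pool-a", "rack-12"),
    "web-2": ("pool-a", "rack-12"),
    "web-3": ("pool-b", "rack-03"),
}

print(shared_racks(placement))
# {('pool-a', 'rack-12'): ['web-1', 'web-2']}
```

An empty result means every server sits in its own rack. Any entry that is returned identifies servers that would be affected together by a failure at the rack level.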