
Build a fault-tolerant infrastructure


This guide describes how to increase the fault tolerance of your service at the physical level.

Fault tolerance is the ability of a service to remain operational and continue performing business tasks even if individual components of the IT infrastructure fail.

The required level of fault tolerance depends on the tasks the service performs. Increasing fault tolerance is worthwhile when the cost of downtime exceeds the cost of ensuring uninterrupted operation. This is the case, for example, when the service provides continuous access to important information, or when the company's own operations depend directly on the IT infrastructure running without interruption.
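The cost comparison above can be sketched as a back-of-the-envelope calculation. The snippet is a minimal illustration, not a Selectel tool: the function names and every figure in the example (cost of an hour of downtime, expected downtime hours per year, yearly cost of redundancy) are hypothetical inputs that you would replace with your own estimates.

```python
def annual_downtime_cost(cost_per_hour: float, downtime_hours_per_year: float) -> float:
    """Expected yearly loss caused by downtime."""
    return cost_per_hour * downtime_hours_per_year


def redundancy_pays_off(cost_per_hour: float,
                        downtime_hours_without: float,
                        downtime_hours_with: float,
                        redundancy_cost_per_year: float) -> bool:
    """Return True if the downtime cost avoided by redundancy exceeds its yearly cost."""
    avoided = (annual_downtime_cost(cost_per_hour, downtime_hours_without)
               - annual_downtime_cost(cost_per_hour, downtime_hours_with))
    return avoided > redundancy_cost_per_year


# Hypothetical figures: 5,000 per hour of downtime, 10 h/year of downtime without
# redundancy, 1 h/year with it, and 12,000 per year for the extra hardware.
if __name__ == "__main__":
    print(redundancy_pays_off(5000, 10, 1, 12000))  # True: 45,000 avoided > 12,000
```

In practice the downtime estimates are the hard part; the arithmetic only makes the trade-off explicit.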

What Selectel provides

Selectel ensures the fault tolerance of your service at the level of the server room where server racks are located.

Power supply

Power enters the server room through inputs that are redundant at the upstream levels (Selectel transformer substations; industrial UPS units, including batteries; emergency power sources such as diesel generator sets). Two independent power inputs are then supplied to each rack:

  • for servers with a single power supply, ATS (automatic transfer switch) devices connected to both independent power inputs are installed in the racks; if one input is disconnected, power continues to be supplied through the other;

  • servers with two power supplies are connected to two independent power strips.
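To see why two independent inputs help, the availability of a power path can be estimated with the standard textbook formulas: components in series (all must work) multiply their availabilities, while redundant components in parallel (any one is enough) fail only if all of them fail. The sketch below assumes failures are independent, which real power paths only approximate, and the availability figures are hypothetical.

```python
import math


def series_availability(*availabilities: float) -> float:
    """Every component in the chain must work: A = a1 * a2 * ... * an."""
    return math.prod(availabilities)


def parallel_availability(*availabilities: float) -> float:
    """Any one redundant component is enough: A = 1 - (1 - a1) * ... * (1 - an)."""
    return 1 - math.prod(1 - a for a in availabilities)


# Hypothetical per-component availabilities, for illustration only.
INPUT = 0.999   # one power input
ATS = 0.9999    # automatic transfer switch

two_inputs = parallel_availability(INPUT, INPUT)           # about 0.999999
single_psu_via_ats = series_availability(two_inputs, ATS)  # the ATS sits in series

if __name__ == "__main__":
    print(f"one input:        {INPUT:.6f}")
    print(f"two inputs:       {two_inputs:.6f}")
    print(f"two inputs + ATS: {single_psu_via_ats:.6f}")
```

With these numbers, duplicating the input takes the power path from about 0.999 to about 0.999999, but a single ATS in series pulls the result back below the ATS's own availability. The same reasoning explains why a server with two power supplies connected to two independent power strips avoids that extra single point of failure.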

Local and Internet network access

Access to the local network and the Internet is redundant at the aggregation switch level and above. At the rack level:

  • for prebuilt configuration servers, access switches for the local network and Internet switches are installed in each rack;

  • for some Chipcore Line servers, there is no local network connection, and only Internet switches are installed in each rack;

  • for custom configuration servers, you determine the redundancy of the connection to the required network (local or Internet) yourself.

Location in racks

When you order two or more servers, they are placed in different racks if this is technically feasible. You can view the current location of a server and send a request to move it to another rack in the Control panel: in the top menu, click Products and select Dedicated Servers → Servers → the Server location tab.
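If you keep an inventory of your servers, a quick check like the one below can flag servers that still share a rack, so you know which ones to request a move for. It is a minimal sketch: the server names and rack identifiers are hypothetical values that you would note down from the Server location tab yourself; the code does not call any Selectel API.

```python
from collections import defaultdict


def shared_racks(locations: dict[str, str]) -> dict[str, list[str]]:
    """Map each rack that hosts more than one of the given servers to those servers."""
    by_rack: dict[str, list[str]] = defaultdict(list)
    for server, rack in locations.items():
        by_rack[rack].append(server)
    return {rack: sorted(servers) for rack, servers in by_rack.items() if len(servers) > 1}


# Hypothetical inventory: server name -> rack, as noted from the Control panel.
locations = {"web-1": "rack-A", "web-2": "rack-A", "db-1": "rack-B"}

if __name__ == "__main__":
    print(shared_racks(locations))  # {'rack-A': ['web-1', 'web-2']}
```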

Increase fault tolerance

You can increase the fault tolerance of your service at the physical level by reducing the number of potential points of failure. In an IT infrastructure, potential points of failure include:

  • the server itself and its components (drives, power supplies, network interfaces, etc.);
  • ATS (automatic transfer switches);
  • Internet access switch and its cabling (copper and optical connections, transceivers, patch cords, etc.);
  • local network access switch and its cabling.
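One way to reason about this list is to model what the service needs (the server itself, power, network access) and, for each need, the alternative sets of components that can satisfy it. A component is then a single point of failure if it appears in every alternative of at least one need, because its failure alone leaves that need unmet. The sketch below applies this rule; the component names and both example configurations are hypothetical simplifications, and only one component failing at a time is considered.

```python
def single_points_of_failure(needs: dict[str, list[set[str]]]) -> set[str]:
    """Return components whose failure alone leaves some need with no working alternative.

    `needs` maps each requirement of the service to its alternatives; an
    alternative is the set of components that must all work to satisfy it.
    """
    spofs: set[str] = set()
    for alternatives in needs.values():
        if alternatives:
            # A component shared by every alternative takes the whole need down.
            spofs |= set.intersection(*alternatives)
    return spofs


# Hypothetical server with one power supply behind an ATS and one uplink.
SINGLE_PSU = {
    "host": [{"server"}],
    "power": [{"psu", "ats", "input-A"}, {"psu", "ats", "input-B"}],
    "internet": [{"internet-switch", "patch-cord-1"}],
}

# Hypothetical server with two power supplies and two uplinks to different switches.
DUAL_PSU_TWO_UPLINKS = {
    "host": [{"server"}],
    "power": [{"psu-1", "input-A"}, {"psu-2", "input-B"}],
    "internet": [{"switch-1", "patch-cord-1"}, {"switch-2", "patch-cord-2"}],
}

if __name__ == "__main__":
    print(sorted(single_points_of_failure(SINGLE_PSU)))
    print(sorted(single_points_of_failure(DUAL_PSU_TWO_UPLINKS)))  # ['server']
```

In the second configuration only the server itself remains a single point of failure, which is why the measures below cover both redundancy inside the rack and spreading the service across several servers.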

Increase fault tolerance at the rack level

To reduce the number of potential points of failure at the rack level, you can:

  • choose custom or prebuilt configuration servers with server-grade processors. All components of such servers are designed for high loads and continuous operation, which makes them less likely to fail;
  • use a server configuration with two or more power supplies. This method of power redundancy is more reliable than connecting through an ATS because it eliminates a single point of failure: the failure of one power supply or ATS does not shut the server down;
  • for custom configuration servers, make the connection to the access switches of the required network (local or Internet) redundant using MC-LAG.
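MC-LAG itself is configured on the switch side. On the server, such a connection is usually seen as a single link aggregation (LACP, IEEE 802.3ad) bond whose member interfaces go to different switches. Assuming the server runs Linux with the kernel bonding driver, the driver reports the bond state in /proc/net/bonding/ under the bond's name; the sketch below parses that report and checks that at least two member links are up. The sample text is abbreviated, the field layout can differ between kernel versions, and the names eno1, eno2, and bond0 are hypothetical.

```python
from typing import Optional

# Abbreviated, hypothetical example of /proc/net/bonding/bond0 output.
SAMPLE = """\
Ethernet Channel Bonding Driver: v5.15.0

Bonding Mode: IEEE 802.3ad Dynamic link aggregation
Transmit Hash Policy: layer3+4 (1)
MII Status: up

Slave Interface: eno1
MII Status: up
Speed: 10000 Mbps

Slave Interface: eno2
MII Status: down
Speed: Unknown
"""


def parse_bonding_status(text: str) -> tuple[Optional[str], dict[str, str]]:
    """Return the bonding mode and the MII status of each member interface."""
    mode: Optional[str] = None
    links: dict[str, str] = {}
    current: Optional[str] = None
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue
        key, value = key.strip(), value.strip()
        if key == "Bonding Mode":
            mode = value
        elif key == "Slave Interface":
            current = value
        elif key == "MII Status" and current is not None and current not in links:
            # The first MII Status after a member header belongs to that member;
            # the bond-level MII Status above all members is skipped.
            links[current] = value
    return mode, links


def is_redundant(mode: Optional[str], links: dict[str, str]) -> bool:
    """True if the bond uses 802.3ad and at least two member links are up."""
    return mode is not None and "802.3ad" in mode and sum(s == "up" for s in links.values()) >= 2


if __name__ == "__main__":
    mode, links = parse_bonding_status(SAMPLE)
    print(mode, links, is_redundant(mode, links))
```

On a real server you would pass the file contents instead, for example pathlib.Path("/proc/net/bonding/bond0").read_text(). With the sample above, eno2 is down, so the check reports that the connection is currently not redundant.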

Increase fault tolerance of a client service consisting of multiple servers

To reduce the number of potential points of failure for your service hosted on multiple servers, you can: