Build a resilient infrastructure

This tutorial describes how to improve the fault tolerance of your service at the physical layer.

Fault tolerance is the ability of a service to remain operational and keep performing its business tasks even when individual components of the IT infrastructure fail.

The required level of fault tolerance depends on the tasks the service performs. Improving fault tolerance makes sense when the damage from downtime exceeds the cost of ensuring business continuity, for example when the service provides continuous access to important information or when the company's own operations depend directly on the uninterrupted operation of the IT infrastructure.
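As a rough illustration of this trade-off, the sketch below compares the expected yearly loss from downtime with the yearly cost of redundancy. The function name, parameters, and all figures are hypothetical assumptions made for the example, not Selectel data.

```python
def redundancy_pays_off(downtime_hours_per_year: float,
                        loss_per_hour: float,
                        redundancy_cost_per_year: float) -> bool:
    """Return True when expected downtime losses exceed the cost of redundancy."""
    expected_loss = downtime_hours_per_year * loss_per_hour
    return expected_loss > redundancy_cost_per_year


# Hypothetical figures: 8 hours of downtime a year at 5,000 per hour,
# compared with 20,000 a year spent on redundant hardware.
print(redundancy_pays_off(8, 5_000, 20_000))  # True: 40,000 > 20,000
```

In practice both inputs are estimates, so a comparison like this only shows whether the two costs are of the same order of magnitude.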

What Selectel provides

Selectel provides fault tolerance for your service at the level of the server room, where the server racks are located. This is achieved as follows:

  • The server room is supplied with power inputs that are made redundant at higher levels (Selectel transformer substations, industrial UPS systems with batteries, and emergency power from diesel generator units), and two independent power inputs are supplied to each rack:

    • For servers with one power supply unit, automatic transfer switches (ATS) are installed in the racks and connected to two independent power inputs; if one input is disconnected, power continues to flow through the second one;

    • Servers with two power supplies are connected to two independent socket blocks;

  • Access to the LAN and the Internet is redundant at the aggregation switch level and above. In addition:

    • For prebuilt configuration servers, LAN access switches and Internet switches are installed in each rack;

    • For some Chipcore Line servers, there is no LAN connection, and only Internet switches are installed in each rack;

    • For servers of arbitrary configuration, you determine the redundancy of the connection to the required network (LAN or Internet) yourself;

  • If technically possible, two or more servers ordered together are placed in different racks. You can view the current server location and submit a request to move a server to another rack in the Control Panel: Servers and Hardware → Servers → Server Location tab.
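To check rack placement for a larger fleet, a minimal sketch such as the following can flag racks that hold more than one of your servers. It assumes you have copied the server-to-rack mapping by hand from the Server Location tab. The server names, rack labels, and the helper `racks_with_multiple_servers` are hypothetical, and no Selectel API is used.

```python
from collections import defaultdict


def racks_with_multiple_servers(server_racks: dict[str, str]) -> dict[str, list[str]]:
    """Group servers by rack and return only the racks that hold more than one."""
    by_rack = defaultdict(list)
    for server, rack in server_racks.items():
        by_rack[rack].append(server)
    return {rack: sorted(servers)
            for rack, servers in by_rack.items()
            if len(servers) > 1}


# Hypothetical placement, entered manually from the control panel.
placement = {"web-1": "rack-A", "web-2": "rack-B", "db-1": "rack-A"}
print(racks_with_multiple_servers(placement))  # {'rack-A': ['db-1', 'web-1']}
```

Any rack that appears in the result is a shared point of failure for the servers listed under it, so these servers are candidates for a move request.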

Increase fault tolerance

You can increase the fault tolerance of the client service at the physical layer by reducing the number of likely points of failure. Likely points of failure in an IT infrastructure include:

  • the server itself and its components (disks, power supplies, network interfaces, etc.);
  • automatic transfer switches (ATS);
  • the Internet access switch and its cabling (copper and optical connections, transceivers, patch cords, etc.);
  • the LAN access switch and its cabling.
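The effect of each additional point of failure can be estimated with a simple series model: if the service needs every component in the chain to work, and failures are assumed to be independent, the overall availability is the product of the individual availabilities. The component names and availability figures below are hypothetical assumptions, not measured values.

```python
import math


def series_availability(availabilities) -> float:
    """Availability of a chain in which every component must be working."""
    return math.prod(availabilities)


# Hypothetical availabilities of the non-redundant components in one chain.
chain = {
    "server": 0.999,
    "ats": 0.9995,
    "internet_access_switch": 0.9995,
    "lan_access_switch": 0.9995,
}

total = series_availability(chain.values())
print(f"{total:.4f}")  # roughly 0.9975, lower than any single component
```

Because the product can only decrease as components are added, removing a point of failure or making it redundant is what raises the overall figure.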

Increase resiliency at the rack level

To reduce the number of likely points of failure at the rack level, you can:

  • Select servers of arbitrary or prebuilt configuration with server-grade processors. In such servers, all components are optimized for high load and uninterrupted operation and are less likely to fail;
  • Use a server configuration with two or more power supplies. This method of power redundancy is more reliable because it has no single point of failure: the failure of one power supply or of an ATS does not shut down the server;
  • For servers of arbitrary configuration, make the connection to the access switches of the required network (LAN or Internet) redundant via MC-LAG.
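The benefit of duplicating a component, such as a power supply or an access switch link, can be sketched with a parallel model: a redundant group fails only when all of its members fail at once. The example assumes independent failures, and the availability figure is a hypothetical assumption.

```python
def redundant_availability(single: float, copies: int = 2) -> float:
    """Availability of a group that works while at least one copy works."""
    return 1 - (1 - single) ** copies


# Hypothetical availability of a single power supply.
psu = 0.999
print(f"{redundant_availability(psu):.6f}")  # about 0.999999 for two supplies
```

Real failures are not fully independent (for example, both power supplies share one chassis), so the result is an upper bound rather than a guarantee.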

Improve fault tolerance of a client service consisting of multiple servers

To reduce the number of likely points of failure for your service hosted on multiple servers, you can: