IT monitoring: towards 100% availability

PentaGuy
Blogger

CEOs, CIOs and CTOs must be obsessed with the availability of their online services: some because availability drives client satisfaction, others because it creates value. When services are unavailable, the effect is the exact opposite.

100% availability is not wishful thinking. It can be reached by taking action at various levels.

To ensure the continuity of their services, Pentalog helps its clients implement robust services, as well as flexible IT monitoring services provided by a shared team.

A suitable environment

A 100% available solution without monitoring is an illusion. The requirement is threefold:

  • Internet continuity: make sure connectivity through your telecommunication operators (BGP, multihoming) is resilient enough to survive a link failure
  • Hardware redundancy: make sure there are no SPOFs (single points of failure). Redundancy must be everywhere
  • Robust architecture: make sure the software components are integrated in a way that supports scaling up
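
The effect of removing single points of failure can be estimated with a little arithmetic. The sketch below is only an illustration: it assumes failures are independent, and the 99% figure per layer is a hypothetical value, not a measurement.

```python
from math import prod


def parallel_availability(a: float, n: int) -> float:
    """Availability of n redundant copies when any single copy is enough."""
    return 1.0 - (1.0 - a) ** n


def serial_availability(parts: list[float]) -> float:
    """Availability of a chain in which every part must be up."""
    return prod(parts)


# Hypothetical figures: link, hardware and software are each 99% available.
single = serial_availability([0.99, 0.99, 0.99])
redundant = serial_availability([parallel_availability(0.99, 2)] * 3)

print(f"no redundancy:      {single:.4%}")
print(f"two of every layer: {redundant:.4%}")
```

Under these assumptions, doubling every layer moves the chain much closer to the target, which is why redundancy has to be applied everywhere and not on one layer only.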

If these resources (Internet, hardware, software) are not stable by default, or cannot deliver very high availability, this is where any effort toward 100% availability has to start.
The first step toward very high availability is to monitor the applications and the infrastructure: measure the availability and load of these resources, and put the appropriate tools in place.
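
As a minimal sketch of what measuring availability and load can mean in practice, the example below computes both from a series of probe results. The `Sample` type, the function names and the thresholds are assumptions made for illustration; a real monitoring tool collects and stores these values itself.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    up: bool      # did the probe reach the service?
    load: float   # resource load at probe time, from 0.0 to 1.0


def availability(samples: list[Sample]) -> float:
    """Share of probes that found the service up."""
    if not samples:
        raise ValueError("at least one sample is required")
    return sum(s.up for s in samples) / len(samples)


def peak_load(samples: list[Sample]) -> float:
    """Highest load observed over the period."""
    return max(s.load for s in samples)


def needs_alert(samples: list[Sample],
                min_availability: float = 0.999,  # assumed threshold
                max_load: float = 0.9) -> bool:   # assumed threshold
    """True when availability is too low or load is too high."""
    return (availability(samples) < min_availability
            or peak_load(samples) > max_load)
```

For example, nine successful probes and one failed probe give an availability of 0.9, which is below the assumed threshold and would raise an alert.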

Efficient, suitable IT monitoring

Implementing an IT monitoring service cannot be limited to deploying tools. Redundancy at all levels counters equipment failure, but unpredictable cases still arise and must be resolved. The monitoring tool raises an alert so that an intervention can take place immediately. The following process can then be implemented:

  • Acknowledge the IT monitoring system’s alert (the siren is muted)
  • Dispatch the incident to the right team / resource
  • Apply the procedure prescribed for the identified problem
  • Have an expert intervene if the procedures do not cover the problem (and documentation, if necessary)
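
The four steps above can be sketched as a small routine. Everything named here (`Incident`, `TEAMS`, `PROCEDURES`, `handle`) is hypothetical and only illustrates the order of the steps; it is not the API of any particular monitoring product.

```python
from dataclasses import dataclass, field


@dataclass
class Incident:
    service: str
    problem: str
    log: list[str] = field(default_factory=list)


# Hypothetical routing table and procedure catalogue.
TEAMS = {"web": "web team", "database": "database team"}
PROCEDURES = {
    "disk full": "purge old logs",
    "process down": "restart the service",
}


def handle(incident: Incident) -> list[str]:
    """Run one incident through the four steps and return its log."""
    # 1. Acknowledge the alert.
    incident.log.append("acknowledged: siren muted")
    # 2. Dispatch to the right team, or to the operator by default.
    team = TEAMS.get(incident.service, "on-call operator")
    incident.log.append(f"dispatched to {team}")
    # 3. Apply the prescribed procedure when one exists.
    procedure = PROCEDURES.get(incident.problem)
    if procedure is not None:
        incident.log.append(f"procedure applied: {procedure}")
    else:
        # 4. No procedure covers the problem: bring in an expert.
        incident.log.append("no procedure found: escalated to expert")
    return incident.log
```

Keeping a log entry per step is a design choice of this sketch: it makes it easy to check afterwards that no level was skipped.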

This multi-layered approach ensures an efficient intervention. In smaller environments, all these levels can seem unjustified, but this approach, in line with ITIL best practices, limits errors.

To optimize IT monitoring, you must know how your users and clients actually work, so that availability is matched to the time ranges in which the service is really used.

Why ensure 24/7 IT monitoring when the service is only used during office hours? Round-the-clock coverage is not always necessary.
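
Matching monitoring to usage can be as simple as checking whether an alert falls inside the hours when the service is actually used. The sketch below assumes a Monday to Friday, 08:00 to 19:00 window; the window, the function names and the two policies are illustrative assumptions.

```python
from datetime import datetime, time

# Assumed usage window: Monday to Friday, 08:00 to 19:00.
USAGE_DAYS = range(0, 5)   # Monday is 0, Friday is 4
USAGE_START = time(8, 0)
USAGE_END = time(19, 0)


def in_usage_window(moment: datetime) -> bool:
    """True when the service is expected to be in use at this moment."""
    return (moment.weekday() in USAGE_DAYS
            and USAGE_START <= moment.time() < USAGE_END)


def alert_policy(moment: datetime) -> str:
    """Pick a response level depending on when the alert is raised."""
    if in_usage_window(moment):
        return "immediate intervention"
    return "handle at next opening"
```

An alert raised on a Saturday morning would then wait for the next opening, while the same alert on a Monday morning would trigger an immediate intervention.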

Insourcing or outsourcing?

A common way to organize IT monitoring is to guarantee interventions during working hours and to rely on best-effort on-call duty for the rest of the time (nights and weekends). But best effort, paid or not, has its limits. The choice between growing the team to cover non-working hours and outsourcing depends on several criteria. For 24/7 IT monitoring, outsourcing quickly becomes the inevitable choice:

  • IT monitoring is not the company’s core business.
  • The monitored activities are related to confidentiality.
  • Stable services allow sharing an IT monitoring budget with a partner organization.
  • The in-house IT monitoring effort must be transferred when the outsourcing takes place.

The flexible outsourcing model seems best suited to unpredictable situations and to sound risk management:

  • An outsourced team is familiar with the services and their level of criticality
  • Outsourcing may only concern the non-working hours
  • In case of unplanned absences or holidays, the external team can cover the working hours

Trusting a partner

Whatever your IT monitoring needs (public cloud, servers, applications, infrastructure or offices), outsourcing cannot work without trust. That trust should not be blind, however, and a few precautions are needed:

  • Specify the responsibility and process scope in a PQP (Project Quality Plan).
  • Pay special attention to the contract and the reversibility clause.
  • Ensure regular communication and steering to remain in touch.
  • Implement individual authentication on systems for operation tracking.

What company can say today that the unavailability of its information system or online services has no impact on its activity or brand image? 100% availability thus becomes an objective, and the last hundredths of a percent are the hardest to reach.
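
To see why the last hundredths matter, it helps to convert an availability target into the downtime it tolerates over a year. The calculation below is a simple sketch that assumes a 365-day year; the list of targets is only an example.

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours, leap years ignored


def downtime_minutes_per_year(availability_percent: float) -> float:
    """Downtime allowed per year for a given availability target."""
    unavailable_share = 1 - availability_percent / 100
    return unavailable_share * HOURS_PER_YEAR * 60


for target in (99.0, 99.9, 99.99, 99.999):
    minutes = downtime_minutes_per_year(target)
    print(f"{target}% allows {minutes:.1f} minutes of downtime per year")
```

Each extra nine divides the tolerated downtime by ten, from several days per year at 99% to a few minutes at 99.999%.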

Contact me and let’s discuss your problems. We will provide you with a solution tailored to your needs!
