The past months have been difficult and filled with many important lessons. The coronavirus pandemic has forced us, as a business, to make quick decisions during erratic times: cutting expenses, improving business flexibility, and enhancing teamwork.
CTO Challenges 2020: We pivoted while also keeping our employees and clients safe. Most of us faced an abrupt transition to remote work while ensuring business continuity, mitigating risk, and rapidly adapting internal processes and protocols for different scenarios.
Performing as a CTO in a changing world requires the capacity to take on multiple roles and to make decisions that are in the best interest of the business. In times of crisis, when management is under a lot of pressure, the CTO has the vital role of keeping the organization functioning and navigating unprecedented market conditions, with many restrictions and obstacles. The role’s importance will continue to grow as the pandemic unfolds and, more generally, every time there is a new outbreak of any kind.
Read below about the challenges I faced as a CTO and what our team learned during the COVID-19 crisis.
5 Pieces of CTO Advice to Overcome Unpredictable Times
1. Agile Security Strategy
Unavoidable disruptions such as switching to remote work during the COVID-19 pandemic have created new types of security challenges, such as employees using personal computers or the need to provide access to secure networks from home. At Pentalog, for example, we had to shift more than 1,100 employees to a remote environment in a matter of a few days, while maintaining team commitment and productivity for our clients.
Security strategy requires adaptability by nature, not only because new threats arise regularly, but also because technical and team environments continuously evolve. The number of threats has also increased sharply.
Several concepts help companies achieve better adaptation and agility.
Today, the most popular is DevSecOps, which aims to include security within agile DevOps organizations, in other words, as part of the development process. Security becomes a shared responsibility and no longer belongs to a specific siloed specialty.
In parallel, automation enforces the testing, validation, and recovery processes. DevSecOps requires an upfront investment, but it yields outstanding benefits in a platform’s ability to resist breaches.
Other approaches, such as Adaptive Security Architecture, should also be considered. It is less prevalent and only a few years old, yet Gartner cited it as a top technology trend for 2017. The core concept is that security breaches are unpredictable. Therefore, rather than relying on an “incident response” approach, it is safer to adopt a “continuous response” stance. The Adaptive Security Architecture relies on four pillars: Predict, Prevent, Detect, and Respond.
2. Ensure Visibility
In times of disruption and crisis, you can expect changes to many aspects of technical products. For example, an increase in traffic can necessitate an adjustment in the capacities of the infrastructure; teams may lose momentum; users might operate the application differently and expect new types of features.
To continuously anticipate and adapt, it is critical to understand a project’s underlying trends. Measurement and telemetry are the core practices to implement and constantly improve. They should cover as many fields as possible:
- Team performance (momentum, mood, obstacles)
- Product technical performance (resource consumption, response time, databases)
- Security (access logs, abnormal queries)
- Analytics (user profiles, user behavior)
Telemetry requires not only a specific set of tools and technologies, such as Centralized Log Management or Predictive Monitoring, but also a cultural shift. Every project team member should share the responsibility to suggest, implement, validate, and actively monitor project activities.
For example, every feature on a backlog should outline in its Definition of Done the criteria to measure its success, failure, or related risks. The process and organization to track these metrics should also be clear: who should review what, when, and how. Furthermore, governance activities should be built around metrics reviews and should regularly reassess the quality and coverage of those metrics.
Once again, telemetry is a continuous process. It is counterproductive to aim too high too fast. Define short- and long-term objectives, then clearly list on a roadmap the small steps needed to accomplish each of them. Regularly assess progress and refine goals based on lessons learned, new technologies, or new capacities. Artifacts such as Maturity Models can be an excellent basis for transversal visibility, providing awareness, motivation, and transparency to each person on the project.
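To make the idea of a measurable Definition of Done more concrete, here is a minimal Python sketch. It is only one possible way to record a criterion together with who reviews it and when; the class name `SuccessCriterion`, its fields, and the example metric are hypothetical and not taken from any specific tool.

```python
from dataclasses import dataclass


@dataclass
class SuccessCriterion:
    """One measurable criterion attached to a backlog item (hypothetical model)."""
    metric: str             # what is measured
    threshold: float        # value that separates success from failure
    higher_is_better: bool  # direction in which the metric improves
    reviewer: str           # who reviews the metric
    review_cadence: str     # when it is reviewed

    def is_met(self, observed: float) -> bool:
        """Return True when the observed value satisfies the criterion."""
        if self.higher_is_better:
            return observed >= self.threshold
        return observed <= self.threshold


# Example: a latency target reviewed weekly by the tech lead (illustrative values).
latency = SuccessCriterion(
    metric="checkout_p95_latency_ms",
    threshold=800.0,
    higher_is_better=False,
    reviewer="tech lead",
    review_cadence="weekly",
)
```

Calling `latency.is_met(650.0)` reports that the criterion is satisfied, while `latency.is_met(900.0)` reports that it is not. Keeping the reviewer and cadence next to the threshold makes the "who, when, and how" explicit for each feature.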
3. Automation to Mitigate System Failures
Just think about it: Netflix requires very few people behind the scenes to support its infrastructure in case of failure. Thanks to automated systems, it can react on its own and mitigate most incidents. Netflix is ready for failure, meaning it recognizes that it’s pointless to fight against it, embracing Werner Vogels’s famous quote: “Everything fails all the time.” Instead, you should direct your effort toward reacting to failure when it occurs.
For instance, when the authentication service goes down, Netflix has chosen to provide access to its platform for free rather than shutting down the whole platform. It even has a practice called “chaos engineering,” which helps test the stability and reliability of production applications by constantly causing breakdowns in the production environment.
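The fallback behavior described above can be sketched in a few lines of Python. This is only an illustration of the general "fail open" pattern, not Netflix's actual implementation; the names `AuthServiceDown`, `check_subscription`, and `can_stream`, as well as the `service_up` flag that simulates an outage, are hypothetical.

```python
class AuthServiceDown(Exception):
    """Raised when the (hypothetical) authentication service is unreachable."""


def check_subscription(user_id: str, service_up: bool) -> bool:
    """Stand-in for a remote call to an authentication service."""
    if not service_up:
        raise AuthServiceDown("authentication service unavailable")
    # Assumption for this sketch: only IDs starting with "paid-" are subscribers.
    return user_id.startswith("paid-")


def can_stream(user_id: str, service_up: bool = True) -> bool:
    """Decide whether to serve content, degrading gracefully on failure."""
    try:
        return check_subscription(user_id, service_up)
    except AuthServiceDown:
        # Fail open: temporarily let everyone in rather than
        # taking the whole platform down with the failed dependency.
        return True
```

With the service up, `can_stream("free-1")` is refused as usual; with `service_up=False`, the same call is allowed. The design choice is a business one: a short period of free access is judged cheaper than a full outage.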
We should all learn from this example and build automated systems to take over in times of crisis. It’s not only a matter of reliability. Other benefits of automation are cost reduction, productivity, availability, and performance.
In the case of a pandemic like the coronavirus outbreak, when people get sick and need to go to the hospital, automation shows its priceless value: systems can crash when there’s no one on-site to fix them.
At its core, a positive DevOps philosophy should promote frequent releases, high automation, and software reliability. Today, a wide range of tooling is widely accessible and easy to implement, including solutions that increase the automated resilience of technical products. Cloud platforms provide one-click self-healing options, and you can quickly set up software such as Chaos Monkey to assess a service’s response to failure.
Furthermore, it’s advisable to share a high-level understanding of the DevOps culture among your larger business team. This understanding will promote the importance of stability and upgradability of applications and help you to align your development and operations environments with the greater goals of your business as you strive for success in the online world.
4. Automation to Ensure Scalable Systems
The overconsumption of resources can also cause system disruption and failure of the kind described in the previous section. One example is the increased demand for digital communication and collaboration services around the globe during the coronavirus pandemic.
Since so many people started working from home at the same time, companies had to adopt enterprise mobility tools and services, and many internet-based solutions reached their limits during the outbreak. Microsoft Teams and Zoom platforms now have millions of daily active users. This growth illustrates how remote work is changing the way people use technology and how fast we adapt to a new lifestyle.
With traditional on-premise IT, adapting infrastructure to consumption needs takes days or weeks. We have seen projects where it took months to acquire a new machine, from commercial negotiations to the final server setup, which is obviously a huge risk. In most cases, if the infrastructure is under-scaled for peak times, the product might slow down or even break down. If it’s over-scaled the rest of the time, that’s a waste of money, which companies often cannot afford during a crisis.
Fortunately, an increasing number of Cloud and SaaS platforms offer services that accommodate real-time resource needs with a “pay as you go” model. During peak times, the infrastructure scales up to ensure a continuous flawless experience for the end-users, or it scales down to optimize costs.
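As a rough illustration of how such scaling decisions can work, the sketch below applies a simple proportional rule, similar in spirit to the target-tracking formulas used by common autoscalers. It is not any cloud provider's API; the function name `desired_instances`, its parameters, and the default limits are assumptions made for this example.

```python
import math


def desired_instances(current: int, load_per_instance: float,
                      target_load: float, minimum: int = 1,
                      maximum: int = 20) -> int:
    """Return how many instances are needed to bring load back to the target."""
    if target_load <= 0:
        raise ValueError("target_load must be positive")
    # Proportional rule: scale the fleet by the ratio of observed to target load.
    wanted = math.ceil(current * load_per_instance / target_load)
    # Clamp to limits so that a traffic spike cannot create unbounded cost
    # and a quiet period never scales the service down to zero.
    return max(minimum, min(maximum, wanted))
```

For example, with 4 instances running at 90% CPU against a 60% target, `desired_instances(4, 90, 60)` asks for 6 instances. When the same fleet idles at 15%, the rule scales down to 1. The `maximum` bound is what keeps costs under control during extreme peaks.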
As we speak, IT enterprises continue to rapidly adopt managed data center services to enhance security, avoid network downtime, and achieve operational efficiency. Therefore, it’s paramount now to move to a cloud platform so that you can almost instantly align your expenses and capacities with your needs and business realities, ensuring a scalable infrastructure.
There’s no better time to look closely at robotics and process automation to handle this unprecedented level of change.
5. Manage and Control Costs
In the previous section, you read how to optimize infrastructure costs with an adequate scaling strategy. In practice, it’s not as simple as enabling auto-scaling, which can also create waste if not set up properly.
Furthermore, the cloud platforms that support such technologies have grown in complexity and cost. To address these changes, specific practices and roles such as FinOps have emerged. FinOps treats cost optimization as a dedicated activity with a clear focus that shapes the project’s organization and governance.
Although FinOps focuses on cloud platform expenses, you can apply the same methodology and focus to other sources of expenses. Better knowledge and control of expenses, and the ability to adjust them, surely improve a company’s chances of adapting during a crisis.
Flexibility and resilience will always be key in times of disaster. Now more than ever, it’s critical to build these qualities into your technology cost management processes.
“Business as Usual” Redefined
I’m confident that we’re going to turn this crisis into opportunity.
As the coronavirus pandemic has dramatically impacted the IT world, we must learn from the experience to better prepare ourselves for the future. With its ups and downs, our role as CTOs is becoming, more than ever, a driver in moving the organization forward. We are the first ones to adapt and respond. The time to react is now!