CTO Insights - Issue #139
Exploring Scalability, Engineering Team Dynamics, and UK's Air Traffic Control Meltdown
Hi, this is Tosho Trajanov from Adeva with CTO Insights #139.
Just got back from Hamburg where I attended the code.talks conference. It was my first in a while and a refreshing change—lots of great insights and inspiration to bring back.
Your support for CTO Insights makes all the difference. It's what fuels this engine, allowing me to continue hand-picking the best in software engineering and technical leadership for you every week.
If you enjoy CTO Insights, invite your friends in tech to subscribe and read along with us. By referring friends, you'll unlock special CTO Insights benefits.
How to participate
1. Share CTO Insights. You'll get credit for new subscribers when you use the referral link below or the “Share” button on any post. Send the link in a text or email, or share it with friends on social media.
2. Earn benefits. You'll receive benefits when more friends use your referral link to subscribe.
5 Referrals: Enjoy a 30-minute Virtual Coffee Chat where we can discuss tech trends, careers, or any topic of your choice.
10 Referrals: Your name will be featured on the CTO Insights Wall of Fame, visible to our community.
50 Referrals: Co-author one edition of CTO Insights, sharing your unique perspective with our audience.
I’m just rolling out the referrals, so we all start from zero. To learn more, check out Substack’s FAQ.
Thank you for helping get the word out about CTO Insights!
Featured
Tools for Running Engineering Teams - An Ultimate List
Discover the ultimate tech stack for engineering leaders in our comprehensive guide. It's a roadmap to help you navigate essential tools for strategy, team and process management, and more.
Elevate your team's productivity and align with your company's goals. Ideal for CTO Insights readers seeking targeted tool recommendations.
Let me know if I’ve missed some of your favorite tools in the comments below.
Architecture
Solving Scalability and Performance Challenges to Support LinkedIn
This article provides an in-depth look into how LinkedIn optimized its Espresso database for better performance, scalability, and resiliency by migrating from HTTP/1.1 to HTTP/2. You'll learn about the challenges they faced, the specific technical solutions implemented—from Netty pipeline optimizations to SSL performance—and how these changes led to a 75% boost in system performance.
Scalability
An end-to-end AI system performance simulator
This article delves into how Meta leverages its end-to-end AI system performance simulator, Arcadia, to optimize AI clusters for various applications such as generative AI, computer vision, and natural language processing. You’ll learn about the challenges in optimizing AI clusters, how Arcadia simulates performance factors across compute, memory, and network, and its various use cases for large-scale, high-performance clusters.
Culture
On Sizing Your Engineering Organizations
This article delves into the complexities of scaling an engineering team and offers a framework for understanding organizational growth. You'll learn about the three key ideas to consider when planning team size: treating teams as units of concurrency, the necessity of upgrading organizational infrastructure as the team grows, and the impact of turnover and onboarding on the team's effectiveness.
How to land the manager-to-IC pivot
This article challenges the conventional "career ladder" metaphor and explores the growing trend of professionals pivoting from managerial roles back to individual contributor (IC) roles. You'll learn why some are making the switch for greater job satisfaction, the considerations involved in such a career move, and how open communication and company support can make this transition rewarding for both the individual and the organization.
Leadership
Performance & Compensation for Engineering Executives
This post covers the core challenges you’ll encounter when operating and evolving the performance and compensation processes for your Engineering organization. It prepares you to resolve the first batch of challenges you’re likely to face, while cautioning that these are extremely deep topics with much disagreement, and that many best practices of a decade ago are considered bad practice today.
Product
How To Set Your 2024 Product Strategy
The author, Brian de Haaff, discusses the challenges and best practices of setting an annual product and company strategy, specifically focusing on Aha!'s approach to roadmap planning for 2024.
You’ll learn about the importance of a "goal-first" mindset, the necessity of honoring both achievements and failures, and the practical steps to ensure a well-rounded and effective product strategy.
Other
Tracking the Laid-off Tech Engineers: Where Are They Now?
I wrote this article to demonstrate the seismic shifts in the tech industry's employment landscape, from rapid hiring during the pandemic to mass layoffs by 2023.
It investigates where laid-off engineers and tech professionals have redirected their careers (independent contracting, launching startups, or joining smaller firms) and offers insights on how enterprises and startups can make use of the abundance of available tech talent.
The Worst Programmer I Know
This article challenges conventional metrics for assessing developer productivity by sharing the story of Tim Mackinnon, a "zero-point" developer who significantly elevated his team's overall performance.
You'll learn why individual metrics can be misleading and how focusing on team-based outcomes can create a more effective and impactful development environment.
Threads: The inside story of Meta’s newest social app
This article provides an insider look into the development and rapid success of Threads, Meta's new social app that aims to revolutionize text-based conversations by making them decentralized and interoperable across platforms.
Read more about the engineering challenges the Threads team overcame, how they leveraged existing Instagram infrastructure, and the app's ambitious goals to integrate with the open, decentralized social networking protocol ActivityPub.
UK air traffic control meltdown
This article delves into the major technical incident that occurred at NATS, the UK's air traffic control operator, on August 28, 2023, which led to the cancellation of over 2000 flights and cost an estimated £100 million.
If you think your system is well tested, consider their statement:
"This was a one in 15 million chance. We've processed 15 million flight plans with this system up until this point and never seen this before."
London robotic surgeon celebrates its 10,000th procedure
An outlier in this issue, but worth mentioning: Da Vinci, the robotic surgeon at Guy's and St Thomas' Hospital in London, has completed its 10,000th procedure after nearly 20 years in operation.
Closing Notes
Thank you for reading through this week's newsletter — I hope you found the content insightful.
Let me know your thoughts in the comments section below.
That's all for this issue.
I wish you a productive and fulfilling week ahead.