Respecting What Came Before, Premature Optimization, the Pitfalls of Sensationalism and Technology…

My response to the reaction generated by the Prime Video article published last week

May 10, 2023

“Life is not always perfect. Like a road, it has many bends, ups and down, but that’s its beauty.” ― Amit Ray — Photo by Derek Thomson on Unsplash

Every now and then, the internet goes crazy over something someone said, did, or thought. Last week, it was an article from the Video Quality Analysis (VQA) team at Prime Video about the architecture evolution of their specialized live stream monitoring service, one of the hundreds of services that power Prime Video.

The article talks about the architectural evolution of a tool developed by VQA for audio/video quality inspection. That tool, never intended nor designed to run at high scale, was now required to meticulously track all the live streams going to customers and automatically detect and address perceptual quality issues such as block corruption or audio/video synchronization problems.

This important detail, easily missed in the article, indicates that the architectural evolution was triggered by a change of requirements. A tool never originally designed or built for live monitoring was now being required to monitor all the live streams going to customers. In other words, an architecture designed for offline analysis was now being asked to work in live mode.

That is a rather dramatic change of requirements.

The article also explains that the initial version of that tool was built using a Serverless-First approach, allowing the team to quickly explore and test their ideas.

A small but important detail: Prime Video Live serves millions of users for high-profile events such as The Grand Tour and Thursday Night Football, as well as exclusive streaming of the English Premier League.

With the added complexity of live events and popular streaming premieres, this monitoring tool had to be redesigned to support live streams. The team achieved this by consolidating their distributed serverless components (Lambdas orchestrated via Step Functions) into a single containerized process run in an ECS task. This change simplified the data transfer between components and kept it within the process memory, while also simplifying the orchestration logic.
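To make that difference concrete, here is a minimal sketch of the two styles. It is not the Prime Video code: the stage names, the toy "frames", and the `store` dictionary (a stand-in for whatever intermediate storage a distributed design might use between steps) are all assumptions made up for illustration.

```python
import json

# Hypothetical pipeline stages; names and logic are illustrative only.
def split_frames(stream: list[int]) -> list[int]:
    """Stand-in for media conversion: copy the frames through unchanged."""
    return list(stream)

def detect_defects(frames: list[int]) -> list[int]:
    """Stand-in for a defect detector: flag indices of negative 'frames'."""
    return [i for i, frame in enumerate(frames) if frame < 0]

def summarize(defects: list[int]) -> dict:
    """Stand-in for result aggregation."""
    return {"defect_count": len(defects), "defect_frames": defects}

STAGES = [split_frames, detect_defects, summarize]

def run_distributed(stream: list[int], store: dict) -> dict:
    """Each stage reads its input from a shared store and writes its output
    back, so every hop pays for serialization and a storage round trip."""
    key = "input"
    store[key] = json.dumps(stream)
    for stage in STAGES:
        data = json.loads(store[key])         # fetch the previous stage's output
        key = stage.__name__
        store[key] = json.dumps(stage(data))  # persist for the next stage
    return json.loads(store[key])

def run_in_process(stream: list[int]) -> dict:
    """All stages run in one process; data is handed over in memory."""
    data = stream
    for stage in STAGES:
        data = stage(data)
    return data
```

Both functions produce the same result for the same input. The in-process version simply has no intermediate storage to write, read, and pay for, and its orchestration is a plain loop.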

The author of the article (questionably) labeled this architecture evolution “From distributed microservices to a monolithic application”.

And the internet went crazy.

Many claimed that the initial service architecture was bad to start with and that the Prime Video team that designed that monitoring tool was clueless.

Many claimed that Prime Video, as a business, was finally switching to containers and abandoning serverless.

Many claimed that it was the return of the monolith and the death of microservices and serverless.

Some even claimed that Amazon was finally ditching microservices in favor of a monolith.

The truth is that most of the internet didn’t read any further than the title of the article or blindly swallowed the narrative of a few people with vested interests.

As a Principal Engineer at AWS who works with plenty of teams at Amazon, including Prime Video, I want to address this episode by discussing topics that I believe are very important and that have been bothering me a lot while watching it unfold.

In an ever-evolving world of technology and engineering, it is imperative to respect the contributions of our predecessors, avoid the pitfalls of premature optimization, and overcome the challenges posed by sensationalism and technology enthusiasts.

Respecting what came before

Working systems are a testament to the efforts and expertise of previous engineering teams. These systems have been built through countless hours of design, development, testing, and iteration. Rather than dismissing or undervaluing them, we should appreciate the functionality and reliability they provide. Working systems represent a foundation upon which new innovations can be built, and they often contain valuable insights that can inform our future endeavors. The lessons embedded within existing systems are invaluable as they encapsulate the experiences and knowledge gained through successes and failures. Understanding the historical context and studying the decisions made by past engineering teams allows us to learn from their triumphs and challenges, thereby avoiding repeating mistakes.

Teams or individuals make decisions based on the best information available to them at the time, considering the constraints they face. When the Prime Video team made their initial design for the monitoring tool, going serverless made sense to them. It provided a quick way to explore and test their ideas. The tool was not intended to serve millions of live streams, so there was no need to overthink it and build a purpose-built state machine or manage any infrastructure. Using Step Functions and Lambda made sense in that context. Was it the only solution? No, but we can agree that their choice got the job done. In fact, it performed well enough that the monitoring tool became more popular than expected.

It is essential to acknowledge that hindsight judgments often fail to consider the unknown factors and complexities that influenced decisions made by engineers. Software engineering is a dynamic field with ever-evolving requirements, limited resources, and shifting priorities. Recognizing the constraints faced by our predecessors helps us appreciate the challenges they encountered and fosters empathy in our own decision-making processes. It is far easier to critique decisions made in hindsight than to make the right choices in critical moments.

Haven’t you ever made decisions that, in hindsight, were questionable? I know I have, and I probably will continue to do so. At Amazon, we learn to be comfortable with the idea of making decisions with only 70% of the data. The remaining 30% is intuition. This approach helps us avoid analysis paralysis and emphasizes the value of rapid delivery to solicit early feedback and iterate. Keeping this in mind, the Prime Video team made the appropriate decision.

As engineers, it is imperative that we steer clear of unwarranted criticism or belittlement of others’ decisions. Instead, we should approach the work of other engineers with humility and acknowledge the challenges they faced during their decision-making process. Treating others the way we wish to be treated, with empathy and understanding, is fundamental in fostering a collaborative and respectful engineering community.

More on empathy for engineers here.

Premature optimization

Premature optimization is a common issue in software engineering where code or systems are optimized without a good understanding of critical bottlenecks and performance issues. It diverts valuable time and expensive resources away from more essential development tasks, leading to delays, decreased productivity, and lack of flexibility.

The main problem with premature optimization lies in the excessive fine-tuning of code or system components that often have little long-term impact on performance. Additionally, it leads to sub-optimal decision-making that sacrifices user experience and scalability for marginal long-term performance gains.

This diversion from critical development tasks delays project timelines, reduces productivity, and creates complex, tightly coupled codebases, since optimized code tends to rely on obscure language features that make it difficult to modify or adapt in the future.

To avoid these pitfalls, a more pragmatic and measured approach to performance optimization is crucial. One effective strategy is to launch the service or feature to gather feedback and understand the system’s behavior. This allows for the identification of critical performance bottlenecks before diving into optimization efforts.

Profiling tools, performance testing, and empirical data are valuable in pinpointing the areas that truly require optimization, ensuring that efforts are focused where they will have the most significant impact. Implementing iterative development and establishing continuous feedback loops help address performance concerns incrementally at appropriate stages, without prematurely diverting resources from other critical development tasks.
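As a small illustration of measuring before optimizing, here is a sketch that uses Python's built-in `cProfile` and `pstats` modules to find where a request actually spends its time. The three workload functions and the `hottest_function` helper are hypothetical names invented for this example, and the helper reads the `stats` mapping that `pstats.Stats` builds from the profiler's raw data.

```python
import cProfile
import pstats

# Hypothetical workload; the function names are illustrative only.
def parse_input(n: int) -> int:
    """Cheap step: trivial arithmetic."""
    return n + 1

def score_frames(n: int) -> int:
    """Expensive step: a tight loop that dominates the run time."""
    total = 0
    for i in range(n):
        total += i * i
    return total

def handle_request(n: int) -> int:
    return score_frames(parse_input(n))

def hottest_function(func, *args) -> str:
    """Profile func(*args) and return the name of the function with the
    largest own time (time spent in its body, excluding callees)."""
    profiler = cProfile.Profile()
    profiler.enable()
    func(*args)
    profiler.disable()
    stats = pstats.Stats(profiler)
    # stats.stats maps (filename, lineno, name) to
    # (primitive calls, total calls, own time, cumulative time, callers).
    (_, _, name), _ = max(stats.stats.items(), key=lambda item: item[1][2])
    return name
```

Calling `hottest_function(handle_request, 200_000)` points at `score_frames`, which is where any optimization effort in this toy example should go. Tuning `parse_input` first would be exactly the kind of premature optimization described above.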

Prime Video’s approach to launching their monitoring tool was a prime example (no pun intended) of avoiding the trap of premature optimization. They recognized the need to quickly explore and test their ideas, so they opted for a serverless architecture, which allowed them to launch their service rapidly without getting bogged down in extensive planning and optimization efforts upfront.

By taking this approach, they were able to gather valuable feedback and gain a deeper understanding of their use case and the challenges they faced. As the requirements dramatically changed to include monitoring all live streams viewed by customers, they realized that their initial serverless approach might not be the most efficient solution for the scale and complexity they were dealing with.

Instead of sticking to their initial architecture, they made the wise decision to refactor their system and consolidate the distributed serverless components into a monolithic application running in an ECS task. This change allowed them to simplify data transfer between components, streamline orchestration logic, and better handle the demands of monitoring millions of live streams.

The danger of surface-level conclusions

When we only skim the title without engaging with the entire article, we develop an incomplete and often inaccurate understanding of the subject matter, perpetuating surface-level understanding. Sadly, it often contributes to the spread of misinformation. Coupled with social media sensationalism, it is a recipe for disaster.

As I mentioned earlier, I do think the title and sub-titles of the Prime Video post are questionable. But I also understand the author’s choice. The analysis of the challenges encountered by the team can be approached from various angles, depending on our orientation or perspective as an engineer. Some opt for a macro-level approach, encompassing a broad, big-picture view of the operations, while others prefer a micro-level orientation, emphasizing attention to one service. Neither approach holds superiority over the other. Instead, understanding these two theoretical orientations provides a more comprehensive and nuanced understanding.

In the article, at the micro-level (the team level), the team did refactor their microservice architecture into a monolith. However, at a macro-level (the Prime Video business level), the monitoring service is part of a much larger distributed microservice architecture. It is all about perspectives. This perspective is what I think was missing or assumed in the article.

Engineering concepts are often complex and nuanced, requiring a deep dive into the content and paying attention to the details to gain a comprehensive understanding. Relying on titles and sub-titles alone leads to oversimplification, oversights, and misconceptions. It distorts engineering discussions by shaping perceptions without considering the full context. When we base our conclusions solely on attention-grabbing titles, it hampers meaningful dialogue and prevents the exchange of valuable insights. This can hinder the progress of engineering fields by perpetuating misconceptions and inhibiting the exploration of alternative viewpoints.

Technology enthusiasts

To serverless or container, that is not the question.

A peculiar problem exists with technology enthusiasts who are fans of technology not for its practical applications but for what it is in and of itself. While passion and enthusiasm for technology are generally commendable, this narrow perspective can give rise to several problems.

One key issue is the tendency to ignore the practical use cases of technology. When individuals solely focus on the technology itself, they may overlook or downplay its real-world applications. After all, the true value of technology lies in its ability to solve problems, enhance efficiency, and improve our lives. By disregarding its practical use cases, enthusiasts miss out on understanding its true potential and impact.

Another challenge stems from an overemphasis on features and specifications. Technology enthusiasts often become fixated on the intricate details, specifications, and technical aspects of a product or service. While these aspects are undoubtedly important, an excessive focus on them can lead to a skewed perception where superficial characteristics overshadow the broader understanding of how the technology can address real needs and provide tangible value for customers.

The consequences of this perspective extend to adoption and user experience. When enthusiasts prioritize the technology itself rather than its practicality, there is a risk of developing products and services that cater solely to a niche audience. This limited adoption results in a failure to meet the needs and expectations of a broader user base. To ensure widespread acceptance and impact, user experience and usability should be at the forefront of technological advancements.

Perhaps one of the most significant downsides of this fan culture is the missed opportunities for collaboration. By obsessing over the technology itself, enthusiasts may inadvertently isolate themselves from other domains and industries. And indeed, research shows that innovation arises from the convergence of different disciplines and perspectives. By broadening our outlook and considering the practical applications, we can unlock valuable insights and partnerships that have the potential to lead to groundbreaking advancements.

The irrelevance of the microservice architecture vs. monolith debate

Ever since I started my career, the debate between building a microservice architecture or a monolith has captivated the engineering community, often overshadowing the fundamental objective of solving problems.

Whether it is developing a complex system or designing an application, the focus should be on addressing the challenges at hand and solving a customer’s problem. In a rapidly evolving technological landscape, the ability to deliver functional, reliable, and scalable solutions outweighs the choice between microservices and monoliths. The emphasis should be on meeting user needs and achieving business goals rather than fixating on architectural debates.

Every engineering project is unique, and there is no one-size-fits-all solution. The suitability of a microservice architecture or a monolithic approach depends on various factors, including the project’s size, complexity, team expertise, and business requirements. Flexibility is key, as it allows us to adapt our approach to the specific needs of the project. A pragmatic mindset that encourages evaluating architectural choices based on their alignment with the problem at hand enables us to make informed decisions.

Architectural decisions involve trade-offs, and it is essential to assess the advantages and disadvantages of each approach. While microservice architectures offer scalability and modularity, they may introduce complexity and operational overhead. On the other hand, monolithic architectures simplify deployment and maintenance but may face challenges with collaboration and development velocity. Understanding the trade-offs allows us to make informed choices that align with the specific needs and constraints of the project.

Rather than prescribing a specific architectural paradigm, we, the engineering community, should embrace the diversity of approaches. Different projects may require different solutions, and exploring various architectural styles can lead to valuable insights and innovation.

By appreciating the multitude of approaches, we can draw upon a rich pool of ideas and experiences, fostering a culture of inclusion, collaboration, innovation, and continuous improvement.

That’s all, folks. Thank you for reading this far!

Adrian
