AWS Well-Architected Framework - points to be noted
- Aniket Patel
- AWS, Well-Architected, Security, Cost optimization
- August 22, 2023
Introduction
I’m passionate about learning frameworks that can be turned into checklists. These checklists can help you build cost-effective, resilient, and secure infrastructure. One framework that can help you build great SaaS software is the Twelve-Factor App. It emphasises automation, maximum portability, minimal divergence between development and production, and infrastructure and software that can scale without significant changes to tooling, architecture, or development practices. Frameworks like this have proven to be really helpful.
In India, there is a story called “Samundra-Manthan” (the churning of the ocean of milk), in which the churning produced fourteen different things. One of them was “Dhanvanthri” (the physician of the gods), who rose from the water holding the supreme treasure, “the Amrita”. The Amrita was the elixir of immortality that helped the devas (the good gods) regain their power. [Read more about the story]
In the same way, AWS and its solutions architects have years of experience architecting solutions across a wide variety of business verticals and use cases. Just like the churning of the ocean, AWS has churned out the AWS Well-Architected Framework. This framework outlines best practices and core strategies for designing systems on AWS. It provides a consistent set of practices for customers and partners to evaluate architectures, and a set of questions for evaluating how well an architecture aligns with AWS best practices.
The AWS Well-Architected Framework is based on six pillars:
Operational excellence is about supporting development, effectively running workloads, gaining insight into operations, and improving supporting processes and procedures to deliver business value.
Security describes how to use cloud technologies to protect data, systems, and assets to improve security posture.
Reliability is the ability of a workload to perform its intended function correctly and consistently, and to operate and test the workload throughout its lifecycle.
Performance efficiency is the ability to use computing resources efficiently to meet system requirements and maintain that efficiency as demand changes and technologies evolve.
Cost optimization is about running systems to deliver business value at the lowest price point.
Sustainability is about continually improving sustainability impacts by reducing energy consumption and increasing efficiency across all components of a workload, maximizing the benefits from the provisioned resources, and minimizing the total resources required.
General Design Principles
- Stop guessing your capacity needs: Cloud computing allows you to use as much or as little capacity as you need, without worrying about sitting on expensive idle resources or dealing with limited capacity.
- Test systems at production scale: In the cloud, you can create a complete testing environment on demand, simulate your live environment for a fraction of the cost, and decommission the resources when you’re finished.
- Automate with architectural experimentation in mind: Automation allows you to create and replicate your workloads at low cost, track changes, audit the impact, and revert to previous parameters when necessary.
- Consider evolutionary architectures: In the cloud, you can automate and test on demand, which lowers the risk of impact from design changes. This allows systems to evolve over time, so businesses can take advantage of innovations.
- Drive architectures using data: Collect data on how your architectural choices affect your workload’s behavior. This allows you to make fact-based decisions on how to improve your workload.
- Improve through game days: Regularly schedule game days to simulate events in production. This will help you understand where improvements can be made and develop organizational experience.
Six Pillars
Operational Excellence
The operational excellence pillar focuses on how to run and monitor systems effectively, and how to continuously improve processes and procedures. It covers topics such as automation, change management, incident response, and standards. Some of the best practices for achieving operational excellence are:
- Implementing a code deployment pipeline that automates testing, integration, delivery, and deployment of changes.
- Using configuration management tools to manage the state of the system and ensure consistency across environments.
- Monitoring key performance indicators (KPIs) and metrics to track the health and performance of the system and identify issues or anomalies.
- Implementing feedback loops and mechanisms to collect and analyze data from customers, users, and stakeholders.
- Applying the Plan-Do-Check-Act (PDCA) cycle to continuously review and improve the system based on the feedback and metrics.
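The KPI monitoring practice above can be sketched in a few lines. This is only an illustration, assuming latency samples are already available as a plain list; the function name `find_anomalies` and the two-standard-deviation threshold are my own choices, not something the framework prescribes. In practice a managed service such as Amazon CloudWatch would collect and alarm on these metrics.

```python
from statistics import mean, pstdev


def find_anomalies(samples, threshold=2.0):
    """Return the indexes of samples that lie more than `threshold`
    standard deviations away from the mean of all samples."""
    if len(samples) < 2:
        return []
    mu = mean(samples)
    sigma = pstdev(samples)
    if sigma == 0:
        # All samples are identical, so nothing stands out.
        return []
    return [i for i, value in enumerate(samples)
            if abs(value - mu) / sigma > threshold]


# Hypothetical request latencies in milliseconds, with one obvious spike.
latencies_ms = [120, 118, 125, 122, 119, 121, 480, 123]
print(find_anomalies(latencies_ms))
```

Feeding the flagged samples back into a review meeting is one simple way to close the "Check" and "Act" steps of the PDCA cycle.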
Security
The security pillar focuses on how to protect information and systems from unauthorized access, disclosure, modification, or destruction. It covers topics such as identity and access management, encryption, network security, logging and auditing, and incident response. Some of the best practices for achieving security are:
- Implementing the principle of least privilege, which grants only the minimum permissions required for each user or role to perform their tasks.
- Encrypting data at rest and in transit using strong encryption algorithms and keys.
- Using firewalls, security groups, network access control lists (NACLs), and other network security features to control the traffic flow between different components of the system.
- Enabling logging and auditing features to record and monitor the activities and events in the system.
- Implementing a security incident response plan that defines roles, responsibilities, procedures, and tools for handling security incidents.
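To make the least-privilege practice more concrete, here is a minimal sketch that builds an IAM policy document as a Python dictionary and checks it for wildcard actions. The bucket name `example-bucket`, the prefix, and both helper functions are hypothetical and exist only for illustration; the policy structure (`Version`, `Statement`, `Effect`, `Action`, `Resource`) follows the standard IAM JSON policy format.

```python
import json


def read_only_bucket_policy(bucket, prefix):
    """Build a policy that allows reading objects under one prefix only."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": [f"arn:aws:s3:::{bucket}/{prefix}*"],
            }
        ],
    }


def has_wildcard_action(policy):
    """Return True if any statement uses '*' or a 'service:*' action."""
    for statement in policy["Statement"]:
        actions = statement["Action"]
        if isinstance(actions, str):
            actions = [actions]
        if any(a == "*" or a.endswith(":*") for a in actions):
            return True
    return False


# Hypothetical bucket and prefix, used only for this example.
policy = read_only_bucket_policy("example-bucket", "reports/")
print(json.dumps(policy, indent=2))
print("uses wildcard actions:", has_wildcard_action(policy))
```

A check like `has_wildcard_action` could run in a deployment pipeline to catch overly broad permissions before they reach production, although it is far from a complete policy linter.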
Reliability
The reliability pillar focuses on how to ensure that the system performs its intended functions correctly and consistently under different conditions. It covers topics such as fault tolerance, scalability, load balancing, backup and restore, and disaster recovery. Some of the best practices for achieving reliability are:
- Designing the system with redundancy and replication to eliminate single points of failure and increase availability.
- Using auto-scaling features to adjust the capacity of the system based on the demand or load.
- Using load balancing features to distribute the traffic across multiple instances or endpoints of the system.
- Implementing backup and restore strategies to protect data from loss or corruption.
- Implementing disaster recovery strategies to recover from major failures or disasters that affect the entire system or region.
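The auto-scaling idea above can be illustrated with a simplified proportional rule: scale the instance count by the ratio of the observed metric to the target, then clamp the result between a minimum and maximum. This is a sketch under my own assumptions, not the exact algorithm Amazon EC2 Auto Scaling uses, and the function name and parameters are hypothetical.

```python
import math


def desired_capacity(current_instances, current_metric, target_metric,
                     min_size, max_size):
    """Return how many instances are needed to bring the average metric
    (for example CPU utilization in percent) back to the target value."""
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")
    raw = math.ceil(current_instances * current_metric / target_metric)
    # Never scale below the floor or above the ceiling of the group.
    return max(min_size, min(max_size, raw))


# 4 instances at 90% CPU with a 60% target: scale out.
print(desired_capacity(4, 90.0, 60.0, min_size=2, max_size=10))
# 4 instances at 15% CPU: scale in, but not below the minimum.
print(desired_capacity(4, 15.0, 60.0, min_size=2, max_size=10))
```

Keeping `min_size` at two or more, spread across Availability Zones, also supports the redundancy practice listed above.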
Performance Efficiency
The performance efficiency pillar focuses on how to optimize the use of computing resources to deliver the best performance for the system. It covers topics such as resource selection, scaling, caching, content delivery networks (CDNs), and performance testing. Some of the best practices for achieving performance efficiency are:
- Selecting the right type and size of resources (such as instances, storage, databases, etc.) that match the requirements of the system.
- Using horizontal scaling (adding more resources) or vertical scaling (upgrading resources) to increase or decrease the capacity of the system as needed.
- Using caching features to store frequently accessed data or content in memory or edge locations to reduce latency and improve responsiveness.
- Using CDNs to deliver static content or media files from locations closer to the end users to reduce network bandwidth and latency.
- Conducting performance testing to measure and benchmark the performance of the system under different scenarios and loads.
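As a small illustration of the caching practice, the sketch below uses Python's standard `functools.lru_cache` to keep recent results in memory. The function `load_product_name` and its simulated delay are hypothetical stand-ins for a slow database or network call; a production system would more likely use a shared cache such as Amazon ElastiCache.

```python
import time
from functools import lru_cache


@lru_cache(maxsize=128)
def load_product_name(product_id: int) -> str:
    """Simulate a slow backend lookup. Repeated calls with the same id
    are answered from the in-memory cache instead."""
    time.sleep(0.01)  # stand-in for a database or network round trip
    return f"product-{product_id}"


load_product_name(1)  # cache miss: runs the slow lookup
load_product_name(1)  # cache hit: returned from memory
load_product_name(2)  # cache miss for a new id
print(load_product_name.cache_info())
```

The `cache_info()` counters give a quick view of the hit rate, which is one of the numbers worth watching during performance testing.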
Cost Optimization
The cost optimization pillar focuses on minimizing or avoiding unnecessary costs while maintaining or improving the quality of the system. It covers topics such as resource utilization, pricing models, budgeting, cost monitoring, and cost-saving opportunities. Some of the best practices for achieving cost optimization are:
- Monitoring and analyzing the utilization of resources (such as CPU, memory, disk space, network bandwidth, etc.) to identify underutilized or overprovisioned resources that can be reduced or eliminated.
- Choosing the right pricing model (such as on-demand, reserved, spot, etc.) that suits the usage pattern and demand of the system.
- Setting up a budget and alert mechanism to track and control the spending on AWS services and resources.
- Taking advantage of cost-saving opportunities (such as discounts, credits, free tiers, etc.) that AWS offers for various services and resources.
- Reviewing and optimizing the architecture of the system regularly to identify potential areas for improvement or innovation.
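Choosing a pricing model comes down to simple arithmetic once you know your usage pattern. The sketch below compares paying per hour of use against a commitment billed for every hour of the month. The hourly rates are made-up numbers for illustration, not real AWS prices, and the function name is hypothetical.

```python
HOURS_PER_MONTH = 730  # common approximation: 8760 hours / 12 months


def cheaper_pricing_model(hours_used, on_demand_hourly, reserved_hourly):
    """Compare one month of on-demand usage with a reserved commitment
    that is billed for every hour of the month, used or not."""
    on_demand_cost = hours_used * on_demand_hourly
    reserved_cost = HOURS_PER_MONTH * reserved_hourly
    if reserved_cost < on_demand_cost:
        return "reserved", reserved_cost
    return "on-demand", on_demand_cost


# Hypothetical rates: 0.10 per hour on-demand, 0.06 per hour reserved.
for hours in (200, 700):
    model, cost = cheaper_pricing_model(hours, 0.10, 0.06)
    print(f"{hours} hours/month -> {model} ({cost:.2f})")
```

With these example rates the break-even point is 438 hours per month (730 × 0.06 / 0.10), so a workload that runs only during business hours stays on-demand while an always-on workload benefits from the commitment.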
Sustainability
The sustainability pillar focuses on how to design and operate sustainable workloads that reduce their environmental impact, optimize resource usage, and promote social sustainability. It provides design principles, best practices, operational guidelines, and improvement plans that help cloud architects to meet their sustainability targets for their AWS workloads. Some of the topics covered by the sustainability pillar are:
- Energy consumption and efficiency: How to measure and optimize the energy usage of the system and leverage AWS features and services that support renewable energy sources.
- Resource utilization and waste reduction: How to monitor and analyze the resource utilization of the system and identify opportunities to reduce or eliminate unnecessary or underutilized resources.
Some of the best practices to follow are:
- Set long-term sustainability goals for your workloads and model your return on investment (ROI). This will help you align your business objectives with your sustainability targets and measure your progress over time.
- Use AWS services and features that enable you to optimize your resource utilization and reduce waste. For example, you can use Amazon EC2 Auto Scaling, AWS Lambda, Amazon S3 Intelligent-Tiering, and AWS Cost Explorer to scale your resources according to demand, pay only for what you use, and optimize your storage costs.
- Monitor and measure your energy consumption and carbon footprint using AWS tools and third-party solutions. For example, you can use AWS Compute Optimizer, Amazon CloudWatch, and AWS Trusted Advisor to analyze your workload performance and identify opportunities for improvement. You can also use the AWS Customer Carbon Footprint Tool and AWS Partner Network (APN) solutions to estimate your carbon emissions and track your sustainability metrics.
- Implement green IT practices and policies that support your sustainability goals. For example, you can use renewable energy sources, recycle or donate your unused hardware, educate your employees and customers about sustainability, and participate in environmental initiatives and programs.
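The resource utilization and waste reduction topic can be sketched as a simple report that flags resources whose average CPU usage stays below a threshold. Everything here is an assumption for illustration: the resource names, the sample values, and the 10% threshold are hypothetical, and real data would come from a monitoring tool such as Amazon CloudWatch or AWS Compute Optimizer.

```python
from statistics import mean


def underutilized(cpu_samples_by_resource, threshold_percent=10.0):
    """Return the names of resources whose average CPU utilization is
    below the threshold, sorted alphabetically."""
    return sorted(
        name
        for name, samples in cpu_samples_by_resource.items()
        if samples and mean(samples) < threshold_percent
    )


# Hypothetical CPU utilization samples (percent) per resource.
samples = {
    "web-1": [55, 60, 48],
    "batch-7": [2, 3, 1],
    "db-2": [9, 12, 8],
}
print(underutilized(samples))
```

Resources that show up in this list are candidates for downsizing or shutting down, which reduces both cost and the energy the workload consumes.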