Behind AWS S3’s Massive Scale
This is a guest article by Stanislav Kozlovski, an Apache Kafka Committer. If you would like to connect with Stanislav, you can do so on Twitter and LinkedIn.

AWS S3 is a service every engineer is familiar with.

It’s the service that popularized the notion of cold-storage to the world of cloud. In essence - a scalable multi-tenant storage service which provides interfaces to store and retrieve objects with extremely high availability and durability at a relatively low cost. Customers share all of the underlying hardware.

It is one of the most popular services in AWS.

In the 18 years since its release in 2006, it has grown into a behemoth of an offering, spanning 31 regions and 99 availability zones, and boasting numbers that show ground-breaking scale:

  • 100 million requests a second
  • 400 terabits a second
  • 280 trillion objects stored

Here is a short timeline of the major features since its release:

  • 2006: S3 officially launched
  • 2012: Glacier - a low-cost service for data archiving and long-term backup - introduced (retrieval times take minutes to hours)
  • 2015: Standard-Infrequent Access - a new tier for data that’s less frequently accessed, but requires rapid access when needed
  • 2018: S3 Intelligent-Tiering - a feature that automatically moves your data to the most cost-effective access tier
  • 2018: Glacier Deep Archive - a cheaper Glacier with much slower retrieval times (12 hours)
  • 2020: Strong read-after-write consistency introduced
  • 2021: Object Lambda - allows customers to add their own code to S3 GET requests, modifying data as it’s returned to an application
  • 2023: S3 Express released, offering 10x better latencies (single-digit milliseconds) and 50% cheaper request costs

It started as a service optimized for backups, video and image storage for e-commerce websites - but eventually grew in scale and demand to support being the main storage system used for analytics and machine learning on massive data lakes. Looking at current trends, it seems to be becoming the staple storage backbone for any cloud-native data infrastructure.

Architecture

S3 is said to be composed of more than 300 microservices.

It tries to follow the core design principle of simplicity.

Its architecture can be divided into four high-level services:

  • a front-end fleet with a REST API
  • a namespace service
  • a storage fleet full of hard disks
  • a storage management fleet that does background operations, like replication and tiering.

Said simply - it’s a multi-tenant object storage service with an HTTP REST API.

It is said that AWS ships its organizational chart.

That is, the architecture strongly exhibits Conway’s Law: the theory that organizations will design systems that copy their communication structure.

Each of the four components is part of the S3 organization, with its own leaders and a number of teams working on it. It is said that this continues recursively - the boxes in the chart contain their own nested components with their own teams and fleets. In many ways, they all operate like independent businesses.

Teams interact with one another through literal API-level contracts, which engineers are on the hook to uphold.

Storage Fleet

S3 is just a really big system built out of A LOT (millions) of hard disks.

At the core of all of this are the storage node servers. They are simple key-value stores that persist object data to hard disks. The node servers hold only shards of the overall object data, with the control plane replicating those shards across many different nodes.

AWS has written a lot about their newly-written storage backend - ShardStore. It initially started with just 40k lines of Rust.

Under the hood, it’s a simple log-structured merge tree (LSM Tree) with shard data stored outside the tree to reduce write amplification.

It uses a soft-updates-based crash consistency protocol and is designed from the ground up to support extensive concurrency. It leverages several optimizations to support efficient HDD IO usage like scheduling of operations and coalescing.
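ShardStore’s source isn’t public, but the core idea - keeping bulk shard data outside the LSM tree so that tree maintenance never rewrites the data itself - can be sketched in a few lines (a toy illustration, not ShardStore’s actual design):

```python
# Toy sketch: keys live in a small index (standing in for the LSM tree), while
# bulk shard data is appended to a separate log "outside the tree".
class ShardStoreSketch:
    def __init__(self):
        self.log = bytearray()   # append-only shard data, kept outside the tree
        self.index = {}          # key -> (offset, length); stands in for the LSM tree

    def put(self, key: str, data: bytes) -> None:
        offset = len(self.log)
        self.log.extend(data)                   # bulk bytes are written exactly once
        self.index[key] = (offset, len(data))   # only a tiny pointer enters the index

    def get(self, key: str) -> bytes:
        offset, length = self.index[key]
        return bytes(self.log[offset:offset + length])

store = ShardStoreSketch()
store.put("shard-001", b"object bytes go here")
assert store.get("shard-001") == b"object bytes go here"
```

Because compacting the index only ever rewrites small (offset, length) pointers rather than the shard bytes themselves, the large writes happen once - which is where the write-amplification savings come from.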

This is perhaps the most key piece of code in S3 and as such, its development process is highly invested in. AWS has written at length about the lightweight formal verification methods they’ve integrated into their developers’ workflows.

Hard Drives

Hard drives are an old technology that’s not ideal for all use cases - they’re constrained for IOPS, they have high seek latency and are physically fragile.

It’s worth stepping back and checking their evolution:

  • 1956: a 3.75MB drive cost $9k
  • 2024: 26TB drives exist, where 1TB costs $15

In their existence, they've seen exponential improvement:

  • price: 6,000,000,000 times cheaper per byte (in inflation-adjusted $)
  • capacity increased 7,200,000 times
  • size decreased 5,000 times
  • weight decreased 1,235 times

But one problem persists - they’re constrained on IOPS. Drives have been stuck at around 120 IOPS for a long time, and latency has not improved at the same pace as the rest.

This means that per byte, HDDs are becoming slower.

In an increasingly latency-sensitive world, it’s hard for an HDD to keep up with the demanding requirements modern systems have.

Yet, S3 found a way to deliver tolerable latency while working around this - it heavily leverages parallel IO.

Replication

In storage systems, redundancy schemes are a common practice.

They are most often associated with extra durability - helping protect data against hardware failures. If one disk fails, a copy of the data remains present in another disk, hence it isn’t lost.

An under-appreciated aspect of these redundancy schemes is managing heat. Such schemes spread the load out and give your system the flexibility to direct read traffic in a balanced way.

S3 uses Erasure Coding (EC). It’s a data protection method which breaks data into K shards plus M redundant “parity” shards; the data can then be recovered from any K of the K+M total shards.

e.g. break data into 10 shards and add 6 parity shards - you can lose any 6 of the 16 and still recover the data.
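To make the mechanics concrete, here is the simplest possible erasure code - a single XOR parity shard (the M = 1 case; S3’s real codes use larger M, but the recovery principle is the same):

```python
def xor_shards(shards):
    """XOR equal-length shards together byte by byte."""
    out = bytearray(len(shards[0]))
    for shard in shards:
        for i, b in enumerate(shard):
            out[i] ^= b
    return bytes(out)

data_shards = [b"AAAA", b"BBBB", b"CCCC"]   # K = 3 data shards
parity = xor_shards(data_shards)            # M = 1 parity shard

# Simulate losing shard 1, then rebuild it from the survivors plus parity.
recovered = xor_shards([data_shards[0], data_shards[2], parity])
assert recovered == data_shards[1]
```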

Plain replication is expensive from a capacity perspective (a lot of extra storage is consumed), yet efficient from an I/O perspective. Erasure Coding helps S3 find a middle balance: it doesn’t take as much extra capacity for the durability needs (they aren’t replicating the data 3x) while still providing flexible I/O opportunities and surviving at least as many disk failures.
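The tradeoff is easy to quantify. Assuming a hypothetical 10+6 scheme (the exact K and M that S3 uses aren’t public), erasure coding stores 1.6 bytes per byte of data versus 3 bytes for 3x replication, while tolerating more simultaneous disk failures:

```python
def replication(copies):
    """Storage overhead and fault tolerance of plain N-way replication."""
    return {"overhead": copies, "failures_tolerated": copies - 1}

def erasure_coding(k, m):
    """Storage overhead and fault tolerance of a K+M erasure code."""
    return {"overhead": (k + m) / k, "failures_tolerated": m}

print(replication(3))         # 3x the storage, survives 2 disk losses
print(erasure_coding(10, 6))  # 1.6x the storage, survives 6 disk losses
```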

Sharding

S3 tries to spread these shards as broadly as it can. They’ve shared that they have tens of thousands of customers with data spread across over a million physical drives.

Spreading the data around gives them the following benefits:

1. Hot Spot Aversion
When the data is well spread out, no single customer can cause a hot spot that affects the system. Hot spots are particularly troublesome and worth avoiding, as they introduce cascading delays throughout the system and in the worst case can result in unpleasant domino effects.

2. Burst Demand
Similarly, the extra parallelism afforded by each drive the shards land on allows for greater burst IO than a naive solution (e.g. the data replicated on just 3 drives) could provide.

3. Greater Durability
The more the shards are spread, the greater the durability. You can sustain more individual machine failures.

4. No Read Amplification
Read amplification denotes how many disk seeks a single read causes. Assuming homogeneous hardware, a single read still causes roughly the same number of disk seeks - but since the data is spread around, those seeks will not land on the same drive.

The fewer disk seeks a single HDD needs to do, the better. This reduces the likelihood of high latency.

Heat Management at Scale

Given the scale, one of S3’s biggest technical problems is managing and balancing this I/O demand across the large set of hard drives.

The overall goal is to minimize the number of requests that hit the same disk at any point in time. Otherwise, as mentioned, you get hot spots and accumulate tail latency that eventually grows to impact the overall request latency.

In the worst case, a disproportionate number of requests on the same drive can cause stalling as the limited I/O the disk provides becomes exhausted. This results in overall poor performance on requests that depend on the drive(s). As you stall the requests, the delay gets amplified up through layers of the software stack - like Erasure Coding write requests for new data or metadata lookups.

The initial placement of data is key, yet also tricky to get right because S3 doesn’t know when or how the data is going to be accessed at the time of writing.

The S3 team shares that due to the sheer scale of the system and its multi-tenancy, this otherwise hard problem becomes fundamentally easier.

S3 experiences so-called workload decorrelation - the phenomenon of load smoothing out once it is aggregated at a large enough scale.

The team shares that most storage workloads remain completely idle for most of the time. They only experience a sudden load peak when the data is accessed, and that peak demand is much higher than the mean.

But as the system aggregates millions of workloads, the underlying traffic to the storage flattens out remarkably. The aggregate demand results in a smoothed-out, more predictable throughput.

When you aggregate on a large enough scale, a single workload cannot influence the aggregate peak.

The problem then becomes much easier to solve - you simply need to balance out a smooth demand rate across many disks.
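A toy simulation shows the effect. The numbers below are made up for illustration: each workload is idle ~99% of the time and bursts to 100 IO units otherwise, yet the aggregate of a couple thousand such workloads is dramatically smoother than any single one:

```python
# Toy simulation of workload decorrelation (illustrative numbers, not AWS data).
import random

random.seed(42)
HOURS, WORKLOADS = 1_000, 2_000

def bursty_workload():
    # idle ~99% of the time, with occasional bursts of 100 IO units
    return [100 if random.random() < 0.01 else 0 for _ in range(HOURS)]

workloads = [bursty_workload() for _ in range(WORKLOADS)]
aggregate = [sum(w[t] for w in workloads) for t in range(HOURS)]

def peak_to_mean(series):
    return max(series) / (sum(series) / len(series))

# pick any workload that actually bursts at least once
single = peak_to_mean(next(w for w in workloads if any(w)))
combined = peak_to_mean(aggregate)

print(f"single workload peak/mean: ~{single:.0f}x")   # peak dwarfs the mean
print(f"aggregate peak/mean:       ~{combined:.1f}x")  # close to 1: smoothed out
```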

Parallelism

Parallelism has aligned incentives for both AWS and its customers - it unlocks better performance for the customers and allows S3 to optimize for workload decorrelation.

Let us take some numbers as examples:

Two Opposite Examples

Imagine a 3.7 Petabyte S3 bucket. Say it takes in 2.3 million IOs a second.

With 26 TB disks, you’d only need 143 drives to support the 3.7PB capacity.
But with the low 120 IOPS per disk, you’d need 19,167 drives for the 2.3M IO demand! That’s roughly 13,300% more drives.

Now imagine the opposite - a 28PB S3 bucket with just 8,500 IOs a second.

With 120 IOPS per disk, you’d need 71 drives for the IOs.
With 26TB disks, you’d need 1,077 drives for the capacity. That’s roughly 1,400% more drives.

A 134x imbalance in one case, 15x in another.
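The arithmetic above can be reproduced directly (assuming 26 TB drives at 120 IOPS):

```python
import math

DISK_TB, DISK_IOPS = 26, 120

def drives_needed(capacity_pb, iops):
    """Drives required to satisfy capacity vs. drives required to satisfy IOPS."""
    for_capacity = math.ceil(capacity_pb * 1000 / DISK_TB)
    for_iops = math.ceil(iops / DISK_IOPS)
    return for_capacity, for_iops

print(drives_needed(3.7, 2_300_000))  # IOPS-bound bucket: ~134x more drives for IO
print(drives_needed(28, 8_500))       # capacity-bound bucket: ~15x more drives for space
```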

S3 likely has both types of workloads, but the IOPS-bound case proves the need for parallelism to achieve the necessary read throughput.

Parallelism in Practice

S3 leverages parallelism in two ways:

Across Servers

Parallelism begins at the clients!

Instead of requesting all the files through one client with one connection to one S3 endpoint, users are encouraged to spread requests across as many clients and parallel connections as possible. This utilizes many different endpoints of the distributed system, ensuring no single point in the infrastructure (e.g. caches) becomes too hot.

Intra-Operation

Within a single operation inside a connection, S3 also leverages parallelism.

  • PUT requests support multipart upload, which AWS recommends in order to maximize throughput by leveraging multiple threads.
  • GET requests similarly support an HTTP Range header that lets you read only a particular byte range of the object. AWS again recommends issuing ranged GETs in parallel to achieve higher aggregate throughput than a single whole-object read.
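As a sketch of what client-side range parallelism looks like, the helper below splits an object into part-sized byte ranges that could each be fetched concurrently with its own `Range` header (the 16 MB part size is an arbitrary choice here, and the actual HTTP fetch is left out):

```python
def byte_ranges(object_size, part_size):
    """Yield inclusive (start, end) byte ranges covering the whole object."""
    for start in range(0, object_size, part_size):
        end = min(start + part_size, object_size) - 1
        yield start, end

# A 100 MB object in 16 MB parts -> 7 ranged GETs that can run in parallel,
# each sent with a header like "Range: bytes=0-16777215".
ranges = list(byte_ranges(100 * 1024**2, 16 * 1024**2))
headers = [f"Range: bytes={start}-{end}" for start, end in ranges]
assert len(ranges) == 7
```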

Metadata and the Move to Strong Consistency

In 2020, S3 introduced strong read-after-write consistency, meaning that once you write a new object or overwrite an existing one, any subsequent read receives the latest version of the object.

This was a massive change given the scale S3 was operating on, especially considering it came with zero impact to performance, availability or cost.

S3 has a discrete subsystem for storing per-object metadata. Because the metadata write/read is part of the critical data path for most requests, the system’s persistence tier was designed to use a highly resilient caching technology ensuring that even if it was impaired, S3 requests would succeed.

This meant that writes might flow through one part of the cache infrastructure while reads might query another part, resulting in a stale read. It is said this was the main source of S3’s eventual consistency.

As part of other features, S3 introduced new replication logic into its persistence tier that allowed them to reason about the per-object order of operations. As a core piece of their cache coherency protocol, this allowed them to understand if a cache’s view of an object is stale.

A new component was introduced that would track this. Said component acts as a witness to S3 writes and a read barrier during S3 reads. When the component recognizes that a value may be stale, it is invalidated from the cache and read from the persistence layer.
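AWS hasn’t published the witness component’s implementation, but the idea can be illustrated with a toy version: record a per-key version at write time (the witness), compare the cache’s version against it at read time (the read barrier), and fall back to the persistence tier when the cache is stale:

```python
# Toy sketch of a witness/read-barrier cache (illustrative; not S3's protocol).
class WitnessCache:
    def __init__(self):
        self.store = {}     # persistence tier: key -> (version, value)
        self.cache = {}     # cache: key -> (version, value)
        self.witness = {}   # witness: key -> latest committed version

    def write(self, key, value):
        version = self.witness.get(key, 0) + 1
        self.store[key] = (version, value)   # durable write
        self.witness[key] = version          # witness observes the write

    def read(self, key):
        latest = self.witness.get(key, 0)    # read barrier
        cached = self.cache.get(key)
        if cached is None or cached[0] < latest:   # stale or missing entry
            self.cache[key] = self.store[key]      # re-read the persistence tier
        return self.cache[key][1]

kv = WitnessCache()
kv.write("k", "v1")
assert kv.read("k") == "v1"
kv.write("k", "v2")            # overwrite
assert kv.read("k") == "v2"    # read-after-write: never a stale "v1"
```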

Durability

S3 offers 11 nines of durability - 99.999999999%. The expected average annual loss is therefore 0.000000001% of objects stored.

Said simply, if you store 10,000 objects in S3, you can expect to lose a single object once every 10,000,000 years.
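The arithmetic behind that claim is straightforward:

```python
annual_loss_rate = 1e-11          # 11 nines: 0.000000001% as a fraction
objects_stored = 10_000

expected_losses_per_year = objects_stored * annual_loss_rate   # 1e-07 objects/year
years_per_lost_object = 1 / expected_losses_per_year
print(f"one object lost every {years_per_lost_object:,.0f} years on average")
```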

Hardware Failure

Achieving 11 nines of durability is a difficult feat. Hard drives frequently fail and at S3’s scale, it’s likely an hourly occurrence.

In theory - durability is simple. As hard drives fail, you repair them. As long as the repair rate matches the failure rate, your data is safe.

In practice - to ensure this process is reliable in the face of unpredictable failures, you need to build a system around it.

AWS shared a simple real-world example where failures accumulate - imagine the data center is in a hot climate and a power outage occurs. Its cooling system stops working. Disks begin to overheat and the failure rate goes up significantly.

AWS built a continuous self-healing system with detectors that track the failures and scale their repair fleet accordingly.

In the background, a durability model runs analysis to track whether the desired durability is actually met.

The lesson here is that durability cannot be a single snapshot of the system - it is a continuous evaluation.

Other Failures

Durability is a large umbrella. Your hardware may work fine, but data can still be lost through other means: deploying a bug that corrupts data, a human operator error deleting data, problematic deployments, or the network corrupting the bits uploaded by the customer before they reach your data center.

As one example, S3 has implemented what they call a durable chain of custody. To handle the edge case where data becomes corrupted before it reaches S3, AWS implemented a checksum in the SDK that’s added to the request as an HTTP trailer (avoiding the need to scan the data twice).
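The single-pass property is the point of using a trailer. Here is a sketch of the idea (the real SDKs support several checksum algorithms and base64-encode the trailer value; hex is shown for readability):

```python
import zlib

def stream_with_trailer(chunks):
    """Yield body chunks, then a trailer line carrying the running CRC32."""
    crc = 0
    for chunk in chunks:
        crc = zlib.crc32(chunk, crc)   # checksum updated as the bytes flow through
        yield chunk
    # After the last byte the checksum is already known - no second scan needed.
    yield f"x-amz-checksum-crc32: {crc:#010x}".encode()

body = [b"part-1|", b"part-2|", b"part-3"]
*sent, trailer = stream_with_trailer(body)
assert sent == body
print(trailer)
```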

The sheer size of S3 means that any theoretical failure that could be experienced has probably been seen, or will soon be.

Culture

Achieving a feat of engineering like S3 is as much about social organization as it is about technological innovation.

With new engineers joining and experienced ones leaving, maintaining the same velocity as scale and stakes grow can be difficult.

AWS aims to automate as much as possible. They have integrated extensive property-based tests that iterate through large sets of request patterns, as well as lightweight formal verification, into their CI/CD pipeline. Both give engineers automated confidence in the correctness of their changes.
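A stdlib-only sketch gives a flavor of what such a property-based test looks like (AWS’s actual harness, and libraries like Hypothesis, are far richer): generate random operation sequences and check the system under test against a trivially correct model:

```python
import random

def check_store_matches_model(store, rounds=1_000, seed=7):
    """Run a random sequence of puts/gets against `store` and a reference dict."""
    rng = random.Random(seed)
    model = {}                                # trivially correct reference model
    for _ in range(rounds):
        key = f"k{rng.randrange(20)}"
        if rng.random() < 0.5:
            value = rng.randrange(1_000_000)
            store[key] = value                # random put on both...
            model[key] = value
        elif key in model:                    # ...or a checked get
            assert store[key] == model[key], f"divergence on {key}"
    return True

# The system under test here is just another dict; in a real pipeline it would
# be the storage component exercised through the same interface.
assert check_store_matches_model({})
```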

AWS have shared that they internally follow a “Durability Threat Model” design approach, borrowed from the popular security threat model. When designing features, they take into account any potential threats to durability and ensure they think critically about mitigations for all of them.

They stress the need to ingrain durability into the organization’s culture and ensure continuous processes that maintain it.

Conclusion

S3 is essentially a massively multi-tenant storage service. It’s a gigantic distributed system consisting of many individually slow nodes that on aggregate allow you to get heaps of data fast.

Through economies of scale, its design makes certain previously cost-prohibitive use cases affordable. Bursty workloads that store a lot of data, stay idle for months and then read everything in a short burst - like genomics, machine learning and self-driving cars - can use S3 without paying the full price of all the drives they would otherwise need to deploy themselves. It’s an interesting case study of how scale unlocks cost-effectiveness and makes management easier.

Resources

This blog post was formed from a collection of public resources AWS has shared regarding S3.