Designing Data-Intensive Applications
Reliability
The system should keep working correctly, with the required functionality at the required performance, despite hurdles such as hardware faults, software faults, or human error.
fault vs failure
A fault is one component deviating from its spec. A failure is the system as a whole ceasing to provide the required service. It is very hard to eliminate every fault, but we should at least handle them so they do not turn into failures. The goal is to build a fault-tolerant system.
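To make the fault/failure distinction concrete, here is a minimal sketch, assuming a read that can be served by any of several replicas. The names `ComponentFault`, `read_with_failover`, `broken_replica`, and `healthy_replica` are hypothetical and exist only for illustration. A fault in one replica is absorbed; the caller only sees a failure if every replica faults.

```python
class ComponentFault(Exception):
    """Raised when a single component deviates from its expected behaviour."""


def read_with_failover(replicas, key):
    """Try each replica in turn, tolerating faults in individual replicas.

    The caller sees a failure (RuntimeError) only if every replica faults.
    """
    faults = []
    for replica in replicas:
        try:
            return replica(key)
        except ComponentFault as fault:
            faults.append(fault)  # record the fault and try the next replica
    raise RuntimeError(f"all {len(replicas)} replicas faulted: {faults}")


def broken_replica(key):
    # Hypothetical replica that always faults.
    raise ComponentFault("disk unreachable")


def healthy_replica(key):
    # Hypothetical replica that always answers.
    return f"value-for-{key}"


result = read_with_failover([broken_replica, healthy_replica], "user:1")
print(result)  # value-for-user:1
```

The first replica faults, but the system as a whole still answers, so the fault never becomes a failure.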
types of faults
- Hardware faults: hard disk crashes, faulty RAM, power outages, network outages, etc. are very common in data centers and at cloud service providers. Growing data volumes and cloud usage have shifted the industry toward software fault-tolerance techniques, meaning the system keeps running even when individual machines are down, for example rolling upgrades without downtime. State backups are still necessary.
- Software faults: bugs are harder to anticipate and can cause correlated failures that spread to other nodes as well.
- Human error: humans, like AI, are unpredictable; if there is a slight chance of an error, it will happen, so the system should handle it properly. Well-designed abstractions, APIs, and admin interfaces are good practice. Allow quick recovery from mishaps, e.g. fast, reliable rollbacks, and roll out changes/features gradually to contain costly mistakes. Set up clear monitoring, i.e. error rates and performance metrics, to get a clear picture with a history of every event; this is called telemetry.
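Gradual rollout and error-rate monitoring can be sketched together. This is an illustration under assumptions, not a production design: `in_rollout`, `should_roll_back`, the feature name, and the 5% threshold are all hypothetical. Users are hashed into 100 stable buckets so a feature can be enabled for a small percentage first, and a simple error-rate check signals when to roll back.

```python
import hashlib


def in_rollout(user_id: str, feature: str, percent: int) -> bool:
    """Deterministically place a user in one of 100 buckets for a feature.

    The same user always lands in the same bucket, so raising `percent`
    only ever adds users to the rollout.
    """
    digest = hashlib.sha256(f"{feature}:{user_id}".encode()).hexdigest()
    bucket = int(digest, 16) % 100
    return bucket < percent


def should_roll_back(errors: int, requests: int, threshold: float = 0.05) -> bool:
    """Signal a rollback when the observed error rate exceeds the threshold."""
    if requests == 0:
        return False  # no traffic yet, nothing to judge
    return errors / requests > threshold


users = [f"user-{i}" for i in range(1000)]
enabled = [u for u in users if in_rollout(u, "new-feed", 10)]
print(f"{len(enabled)} of {len(users)} users see the feature")
print(should_roll_back(errors=12, requests=100))  # True: 12% > 5%
```

Hashing on the feature name as well as the user ID means different features get different 10% groups, so the same users are not always the first to hit every change.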
Importance: unreliability costs productivity, trust, data, revenue, and reputation, and brings legal risk.
Scalability
A system's ability to cope with increased load. It is not a binary label; rather, it is a discussion of options for handling growth and for adding compute to carry the extra load reliably.
what is load?
To discuss growth, we first need load parameters to quantify load properly, for example:
- requests per second.
- read-to-write ratio.
- active users.
- cache hit rate.
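The four parameters in the list above can be computed from a request log. The sketch below assumes a hypothetical `Request` record and a made-up five-request sample; the field names and the choice to measure the cache hit rate over reads only are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class Request:
    timestamp: float  # seconds since the start of the window
    user: str
    kind: str         # "read" or "write"
    cache_hit: bool   # only meaningful for reads


def load_parameters(requests, window_seconds):
    """Summarise a window of requests into a few load parameters."""
    reads = [r for r in requests if r.kind == "read"]
    writes = [r for r in requests if r.kind == "write"]
    hits = [r for r in reads if r.cache_hit]
    return {
        "requests_per_second": len(requests) / window_seconds,
        "read_write_ratio": len(reads) / len(writes) if writes else float("inf"),
        "active_users": len({r.user for r in requests}),
        "cache_hit_rate": len(hits) / len(reads) if reads else 0.0,
    }


# Made-up sample: five requests observed in a two-second window.
sample = [
    Request(0.0, "a", "read", True),
    Request(0.5, "b", "read", False),
    Request(1.0, "a", "write", False),
    Request(1.5, "c", "read", True),
    Request(1.9, "b", "read", True),
]
params = load_parameters(sample, window_seconds=2)
print(params)
```

For this sample the result is 2.5 requests per second, a read-to-write ratio of 4.0, 3 active users, and a cache hit rate of 0.75.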
Twitter example: fan-out load.
When a user tweets, the tweet must be delivered to the home timelines of all of that user's followers within a set time (Twitter's target is 5 seconds!).
approach 1: when a user tweets, the tweet is inserted into a global collection of tweets. When a user requests their home timeline, a SQL query finds all the people they follow, fetches those people's tweets, and returns them sorted by time. This is inefficient: the query is expensive and slow, and it has to run for every user on every home timeline request.
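A minimal sketch of approach 1, using an in-memory SQLite database. The table layout (`users`, `follows`, `tweets`), the sample rows, and the function name `home_timeline` are assumptions for illustration, not Twitter's actual schema. The point is that every timeline read pays for the join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users   (id INTEGER PRIMARY KEY, screen_name TEXT);
    CREATE TABLE follows (follower_id INTEGER, followee_id INTEGER);
    CREATE TABLE tweets  (id INTEGER PRIMARY KEY, sender_id INTEGER,
                          text TEXT, timestamp INTEGER);
""")
conn.executemany("INSERT INTO users VALUES (?, ?)",
                 [(1, "alice"), (2, "bob"), (3, "carol")])
conn.execute("INSERT INTO follows VALUES (1, 2)")  # alice follows bob
conn.executemany(
    "INSERT INTO tweets (sender_id, text, timestamp) VALUES (?, ?, ?)",
    [(2, "hello from bob", 100), (3, "hello from carol", 101), (2, "bob again", 102)],
)


def home_timeline(conn, user_id, limit=50):
    """Build the timeline at read time by joining tweets with the follow graph."""
    rows = conn.execute("""
        SELECT users.screen_name, tweets.text, tweets.timestamp
        FROM tweets
        JOIN users   ON tweets.sender_id = users.id
        JOIN follows ON follows.followee_id = users.id
        WHERE follows.follower_id = ?
        ORDER BY tweets.timestamp DESC
        LIMIT ?
    """, (user_id, limit))
    return rows.fetchall()


print(home_timeline(conn, 1))
# [('bob', 'bob again', 102), ('bob', 'hello from bob', 100)]
```

Writing a tweet is a single insert, so writes are cheap; the cost is moved entirely to reads.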
approach 2: maintain a cache for each user's home timeline. When a followed user tweets, the tweet is added to each follower's cache, so reading the timeline is fast and cheap. This is the approach Twitter adopted. BUT the problem is that some users have millions of followers: when such a user tweets, the tweet has to be written to millions of timelines, which is expensive.
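A minimal in-memory sketch of approach 2 (fan-out on write). The names `follow`, `post_tweet`, `cached_home_timeline`, and the `TIMELINE_LENGTH` cap are hypothetical; a real system would keep these caches in a distributed store. `post_tweet` returns the number of timeline writes it performed, which makes the cost visible: one tweet costs one write per follower.

```python
from collections import defaultdict, deque

TIMELINE_LENGTH = 800  # hypothetical cap on cached tweets per user

followers = defaultdict(set)  # followee -> set of followers
timelines = defaultdict(lambda: deque(maxlen=TIMELINE_LENGTH))  # user -> cached timeline


def follow(follower, followee):
    followers[followee].add(follower)


def post_tweet(sender, text):
    """Fan out on write: push the tweet into every follower's cached timeline.

    Returns the number of timeline writes, which equals the follower count.
    """
    for follower in followers[sender]:
        timelines[follower].appendleft((sender, text))  # newest first
    return len(followers[sender])


def cached_home_timeline(user):
    """Reading is just returning the precomputed cache, no join needed."""
    return list(timelines[user])


follow("alice", "bob")
follow("carol", "bob")
writes = post_tweet("bob", "hello")
print(writes)                         # 2
print(cached_home_timeline("alice"))  # [('bob', 'hello')]
```

With two followers the tweet costs two writes; for an account with millions of followers the same loop performs millions of writes for a single tweet, which is the problem noted above.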