
System Design

2025-04-04

How would someone deal with non-atomicity in NoSQL databases?

If it makes sense for your data, you can denormalize it. For example, instead of referencing the user data in the orders document, you simply duplicate the relevant part of the user data in the orders document.

This creates data duplication, redundancy and, eventually, inconsistency: the user document may have changed at time Y while the order duplicated the user data at time X.

Is a NoSQL database running on a single node consistent?

It is not a common scenario, but if I am using a NoSQL database with a single node, then it is consistent all the time. We still have the other drawbacks of NoSQL databases, such as no enforcement of the ACID principles: some NoSQL databases do not offer transactions or rollbacks in case of failures.

2025-04-03

  • Horizontal and vertical partitioning of tables: when to use each?

Comparison between read-through and write-through cache strategies

Aspect         | Read-Through                            | Write-Through
Focus          | Read operations                         | Write operations
Data Retrieval | Cache fetches from data store on misses | N/A
Data Update    | N/A                                     | Cache writes to data store synchronously
Consistency    | May have stale data                     | Strong consistency
Performance    | Fast reads on hits, slower on misses    | Slower writes due to synchronous update
Use Case       | Read-heavy applications                 | Applications requiring data integrity

What are common strategies in a write-heavy system?

  • We can write the operations to a buffer log that is flushed at a constant rate. Besides reducing database hops, it also keeps a registry of the write operations, reducing data loss.
  • We can also use horizontal data partitioning. For example, partitioning user metadata by user id helps us distribute the load over the partitions.
  • We can also adopt a task-queue strategy in which write operations are sent to a queue and processed asynchronously.
  • We can also consider a quorum strategy if we are concerned about load distribution and consistency. We can establish that a write is considered successful only after the data has been written to 3 databases. This should be thought through carefully because it will likely increase latency, but if we have many replicas, the load might be distributed well enough to allow such a strategy.
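
The buffer-log idea can be sketched as a small batching layer. This is a minimal sketch: `flush_fn` is a hypothetical callback standing in for the real batched database write, and a production version would also flush on a timer so the buffer drains at a constant rate.

```python
import threading

class WriteBuffer:
    """Collects writes and flushes them in batches, reducing database hops."""

    def __init__(self, flush_fn, max_size=100):
        self._buf = []
        self._lock = threading.Lock()
        self._flush_fn = flush_fn
        self._max_size = max_size

    def write(self, record):
        with self._lock:
            self._buf.append(record)
            if len(self._buf) >= self._max_size:
                self._flush_locked()

    def flush(self):
        with self._lock:
            self._flush_locked()

    def _flush_locked(self):
        if self._buf:
            self._flush_fn(self._buf)   # one batched hop instead of many
            self._buf = []

batches = []                             # registry of flushed batches
buf = WriteBuffer(batches.append, max_size=3)
for i in range(7):
    buf.write(i)
buf.flush()                              # drain the remainder
# batches == [[0, 1, 2], [3, 4, 5], [6]]
```

Seven individual writes became three database hops; the registry of batches doubles as the write log mentioned above.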

How does a master/slave database strategy work in practice in a read-heavy system?

All writes are handled by the master database. A log with all the write operations is kept and processed from time to time. Each time it is processed, the data is replicated to the slaves (replicas). This architecture distributes the read load among several replicas, centralizes the writes and delivers eventual consistency.

How to deal with bursts in traffic, or how to implement a traffic limit?

One technique is the token bucket algorithm. In this technique, data is allowed to flow only if there is a token available in the bucket. Tokens are added to the bucket at a steady rate. Therefore, this technique allows an initial burst, consuming all available tokens, after which data flows at the same rate as tokens are delivered.

Use case example: video streaming service.
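
A minimal sketch of the token bucket; the capacity and refill rate are illustrative parameters, and `now` is injectable only to make the behaviour easy to observe.

```python
import time

class TokenBucket:
    """Allows a burst of up to `capacity` requests, then sustains `rate`/sec."""

    def __init__(self, capacity, rate):
        self.capacity = capacity
        self.rate = rate                 # tokens added per second
        self.tokens = capacity           # bucket starts full: burst allowed
        self.last = 0.0

    def allow(self, now=None):
        now = time.monotonic() if now is None else now
        # refill proportionally to elapsed time, never above capacity
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True                  # a token was available: let it flow
        return False                     # bucket empty: throttle

bucket = TokenBucket(capacity=3, rate=1.0)
burst = [bucket.allow(now=0.0) for _ in range(5)]   # 5 requests at t=0
# burst == [True, True, True, False, False]: burst of 3, then throttled
later = bucket.allow(now=2.0)            # 2 s later, two tokens refilled
# later == True
```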

2025-04-01

What are some common API protocols?

  • REST
  • SOAP
  • GraphQL
  • RPC: Remote Procedure Call.
    • Create a .proto file specifying the interface contract.
    • Calls are directly mapped to procedures on the server.
    • Data is returned serialized with protocol buffers, which are much smaller than textual data.
    • Automatic code generation.
    • Other optimizations that rely on the greater rigidity of the contract also apply.

2025-03-31

What are some common operations in MongoDB?

MongoDB Guides

Can I connect file streams to a CDN to reduce bandwidth requirements on my server?

Yes, it is possible. Most CDNs provide signed URLs, which users of your service can use to upload files to the CDN directly.

How are composite keys stored in B-Trees?

They are stored as tuples, for example (id1, id2). It is important that the tuple domain is totally ordered.
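
Python tuples compare lexicographically, which illustrates the total order a B-Tree needs over composite keys. The `(user_id, order_id)` key below is a made-up example.

```python
from bisect import bisect_left

# Sorted leaf entries of a hypothetical index on (user_id, order_id).
# Tuples compare element by element, so the domain is totally ordered.
keys = sorted([(2, 10), (1, 5), (2, 3), (1, 9), (3, 1)])
# keys == [(1, 5), (1, 9), (2, 3), (2, 10), (3, 1)]

# Prefix range scan for user_id == 2, as a B-Tree range search would do:
lo = bisect_left(keys, (2,))     # (2,) sorts before any (2, x)
hi = bisect_left(keys, (3,))
user2 = keys[lo:hi]
# user2 == [(2, 3), (2, 10)]
```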

2025-03-28

NoSQL and wide-column stores demystified

2025-03-24

System design steps

Analysis

Design independent data

  • Number of users
  • Number of requests
  • Average request size
  • Average store object size

Design dependent data

  • Which hashing algorithm to use to ensure low conflicting rate over a period of time?
  • What consequences the choice above imposes over storage or performance?

Metrics

  • Storage requirements over 1, 5 and 10 years
  • Bandwidth per day
    • Write x Read

What is an object store?

Youtube

2025-03-10

Learn more about zookeeper

How do search engines such as Elasticsearch implement an inverted index?

How to choose which column to index?

Indexing column A improves the performance of queries that filter on that column (a WHERE-like operation). Instead of scanning all the data, we scan the index, which is typically a smarter structure such as a B-Tree using the selected column as key.

What is the difference between Memcached and Redis?

  • Both are in-memory data stores.
  • Memcached is not persistent; Redis offers ways to achieve persistence.
  • Memcached is a simple key-value store and only accepts string or binary data. Redis accepts lists, hashes, bitmaps and other data types.

2025-03-09

System design master template

  • It was said that "Design YouTube" is not a good question for a senior engineer; it is better to be more specific. So what type of exercise should I expect?

  • You can implement some security measures in the load balancer, assuming it is the first point of entry in your system, for example preventing DDoS and similar attacks.

  • Practice with Excalidraw.

What is the difference between tracing and logging?

  • The goal of tracing is to have a holistic view of how the different components of the system communicate with each other. In distributed systems it is very common to use traces to track a request through all the intermediate services and detect eventual bottlenecks, as traces are commonly output hierarchically with a time-duration attribute.
  • Logging is defined in the scope of a single application or service and is mostly used for debugging purposes.

What is a coordination service?

Zookeeper is an example. Suppose the video processing service needs to call the notification service; that is, it needs to discover where the notification service endpoint is and make a request. Instead of hard-coding the endpoint in the video processing service, let Zookeeper (the coordination service) handle that.

In this sense, the coordination service is the service authority of the system. It is the place that has the most up-to-date endpoints of the other services.

What is Elasticsearch?

It is an open-source search and analytics engine.

How to design a notification system?

  • Monitoring
  • Distributed logging
  • Distributed tracing

Can you use Redis/Memcached to cache relational database data?

What is a shard manager?

It decides when it is time to rethink the sharding strategy: add more shards? Remove shards? Reallocate data due to excess load on one of the shards?

What are serverless functions?

What is a block server? How are YouTube videos stored?

2025-03-07

What are examples of distributed messaging queue systems?

Apache Kafka.

What is a distributed messaging queue?

In an event-driven architecture we have producers and consumers. Producers are the nodes that request processing and consumers are the ones that can process the requests.

We also have brokers, which are the nodes that mediate the communication between producers and consumers.

One feature of messaging queues is that message loss is rare. The broker will hold a message until a node that can process it becomes available. If a broker goes down, there are replication strategies that can be used to avoid data loss.

How is the DNS system structured?

The DNS hierarchy is composed of:

  • Authoritative name server: The server that actually contains the map between a domain and its IP address.
  • Root DNS server: They forward the DNS request to the proper TLD server. There are 13 root servers in the world.
  • Top Level Domain server: There is one for each top-level domain (.com, .net, ...)
  • Recursive server: Lies in between the Root server and the Authoritative name server.
  • Caching server.
  • Forward server: Mainly used in organizations to centralize DNS requests.

How can nodes check that data received from another node is not corrupted?

Use a checksum.
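
A short sketch with Python's standard hashlib. SHA-256 is one common choice; cheaper checksums (e.g. CRC32) suffice when only accidental corruption matters.

```python
import hashlib

def checksum(data: bytes) -> str:
    # the digest travels alongside the data; the receiver recomputes and compares
    return hashlib.sha256(data).hexdigest()

payload = b"replicated block of data"
sent_checksum = checksum(payload)

# Receiver side: any corruption in transit changes the digest.
assert checksum(payload) == sent_checksum              # data arrived intact
assert checksum(b"corrupted block!") != sent_checksum  # corruption detected
```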

What is a heartbeat message in system design?

It is a way for servers to broadcast that they are up and running. Servers periodically send this heartbeat message, which their peers use to know whether a server is available to receive requests. This reduces the number of requests without responses (which improves latency).

What is an alternative to quorums?

Leader-Follower strategy. In this design, a single node is responsible for managing the reads and writes. Requests arrive at the Leader, which triggers read/write operations on the Followers.

One advantage is that the leader can do a quick poll on the nodes to check which one has the most up-to-date data before returning, improving consistency. But it also has the disadvantage of becoming a bottleneck.

Where is a Bloom filter used?

In any scenario where data lookup is expensive. Before doing a lookup, we check whether the data might be present. If it is definitely not present, we do not need to do the lookup.

What is a Bloom filter?

It is a bitmap used to detect the presence of data. For each data, k hash functions are executed and the corresponding positions in the bitmap are set to 1. An entry is definitely not present if there is at least one corresponding position in the bitmap that is set to 0.
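
A minimal sketch, using salted slices of SHA-256 as the k hash functions; real implementations use faster non-cryptographic hashes and tune m and k to the expected number of entries.

```python
import hashlib

class BloomFilter:
    """m-bit bitmap with k hash functions; never gives false negatives."""

    def __init__(self, m=1024, k=3):
        self.m, self.k = m, k
        self.bits = 0                    # an int used as the bitmap

    def _positions(self, item):
        # derive k positions by salting the hash with the function index
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item):
        for p in self._positions(item):
            self.bits |= 1 << p          # set the k corresponding bits

    def might_contain(self, item):
        # any unset bit means the item is definitely absent
        return all(self.bits >> p & 1 for p in self._positions(item))

bf = BloomFilter()
bf.add("user:42")
assert bf.might_contain("user:42")   # added items are always reported present
# Unseen items are reported absent, except for a small false-positive chance.
```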

What are the differences between WebSockets and SSE?

  • WebSockets offer full-duplex communication between client and server, while in SSE only the server can keep sending data to the client.
  • WebSocket is a protocol in itself that needs to be implemented, whereas SSE is more of a technique that operates on top of HTTP.
  • WebSockets are more complex to implement because they might need extra configuration of proxies and firewalls. SSE is much simpler because it is a feature of HTTP.

What are the differences between Polling and Long Polling?

The first strategy needs regular requests from the client, with the majority of the responses being empty. The latter replies only when new data is available.

What are common techniques to enable client server communication?

  • Request/Response
  • Polling
  • Long Polling
  • WebSocket
  • Server-Sent-Events

2025-03-06

How is redundancy implemented in consistent hashing?

Redundancy is implemented by storing the entry in the node associated with its hash value and in the next R nodes in the ring (clockwise). The value R is the number of copies, and it is a parameter of the consistent hashing implementation.

What is consistent hashing?

Consistent hashing was developed to solve a common issue with data partition.

When partitioning data, we apply a hash function to a group of its columns (or id) and then compute the hash modulo N to decide in which partition to store the entry.

The problem with this approach is that as soon as we add a new node we need to reassign all the data because the hash is invalidated.

For example, suppose I had 5 nodes and was using modulo 5 to decide in which node to store an entry. With 6 nodes I need to compute modulo 6: the id 12 is no longer stored in node 2, but in node 0.

Consistent hashing assigns a range of the hash function's image to each node. This is done by assigning a hash value (token) to each node and arranging the nodes around a ring. The range for node N consists of all entries whose hash falls between the token of N and the token of the next node minus 1.

We can minimize the creation of bottlenecks (hotspots) by using virtual nodes. Virtual nodes are mapped to a real node, but they act as nodes in the sense that they are spread over the ring as well.
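
A sketch of the ring with virtual nodes; the node and key names are made up, and `vnodes` controls how evenly the load spreads.

```python
import hashlib
from bisect import bisect

def ring_hash(key: str) -> int:
    # stable 64-bit hash; Python's built-in hash() is salted per process
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")

class ConsistentHashRing:
    """Each real node owns `vnodes` tokens spread around the ring."""

    def __init__(self, nodes, vnodes=100):
        self.ring = sorted(
            (ring_hash(f"{node}#{i}"), node)
            for node in nodes
            for i in range(vnodes)
        )
        self.tokens = [token for token, _ in self.ring]

    def node_for(self, key: str) -> str:
        # first token clockwise from the key's hash, wrapping around the ring
        i = bisect(self.tokens, ring_hash(key)) % len(self.ring)
        return self.ring[i][1]

ring = ConsistentHashRing(["node-a", "node-b", "node-c"])
bigger = ConsistentHashRing(["node-a", "node-b", "node-c", "node-d"])
moved = sum(
    ring.node_for(f"key-{i}") != bigger.node_for(f"key-{i}")
    for i in range(1000)
)
# Only roughly 1/4 of the keys move to the new node; with hash(key) % N,
# almost all keys would have been reassigned.
```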

How can redundancy improve the reliability of a system?

First, it is always a good idea to have a copy of the data in case of data loss. This was already common knowledge before the birth of distributed systems.

Second, by keeping one or more copies of the data and distributing them over several servers, we improve the reliability of the system because we are improving its availability. If a server fails, we can fall back to another one.

Additionally, in read operations, we can better distribute the load over the databases, improving response time.

Redundancy comes with a cost: maintaining the copies. Synchronous copies introduce latency and asynchronous copies produce temporary inconsistency. A hybrid approach is usually adopted, and this is closely related to the concept of quorums.

How are indexes implemented in databases?

B+-Trees are commonly used. They are a variant of the B-Tree and inherit its advantages, such as:

  • Fewer disk accesses (each node is a block on disk).
  • It is a balanced structure, which guarantees predictable performance.

Additionally, the B+-Tree variant stores all data in the leaves, which themselves form a linked list that optimizes range searches, a very common operation in databases.

How can indexes improve performance in databases?

An index can speed up lookups in a table (reading) but it adds overhead on writing, since we need to keep the index updated after each insert, delete or update.

2025-03-05

What are some difficulties with database partitioning?

  • Queries that use joins do not perform well. That's because we might need to perform the join over data that is distributed over several nodes.
  • Referential integrity: how do you maintain the integrity of foreign keys? Imagine the scenario in which table A has a reference to table B. We add an entry to table A that uses an id from B. First, we would need to check all partitions of B to see whether the id is valid; but then we also need to lock all the partitions of B because a deletion might happen in the meantime. As you see, it becomes very complicated and a source of additional latency.

What are some techniques of data partitioning?

  • Horizontal partitioning: Split the rows of a table across several tables, which are then stored on different servers / databases. Ex: by geographic position; using a hash function on some of the columns.
  • Vertical partitioning: Create separate tables for one or more groups of columns.
  • Hybrid partitioning.

How might vertically partitioning a table help performance?

Vertically partitioning a table means creating separate tables for frequently accessed columns. Vertical partitioning minimizes IO/load, since each table holds less data to scan.

For example, imagine that we have a table with 20 columns and observe that two columns are frequently requested. We then create a vertical partitioning of the table: a table with 19 columns (18 + id) and another one with 3 columns (2 + id).

Considering the B+-Tree nodes, the data is stored in the leaf nodes. Since we have only three columns in one of the vertical partitions, we increase the amount of data that fits into a storage/memory block. Therefore, we need fewer IO accesses to render the result of a query, for example.

A plus is that we also improve the chances of a cache hit, since we can fit more data in the cache.

Think about a node in a B+-Tree

How does the map-reduce framework fit into the data partitioning concept?

Map-reduce is a framework to process large amounts of data in a distributed system.

The problem map-reduce solves is the following:

We want to execute a process P on data D, but D is extremely big.

  1. Split D into N partitions and execute a map task on each of them.
  2. Each map task produces key-value pairs. The key identifies all map results that should be processed together in the reduce phase.
  3. Then, send all the elements with the same key to the same reduce node.

Example: Compute how many times each word appear in a text.
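
The word-count example, sketched in plain Python; the shuffle step plays the role of the framework's grouping between the map and reduce phases.

```python
from collections import defaultdict
from itertools import chain

def map_task(chunk):
    # emit one (key, value) pair per word in this partition
    return [(word, 1) for word in chunk.split()]

def shuffle(pairs):
    # group all values by key, as the framework does between phases
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_task(key, values):
    return key, sum(values)

partitions = ["the quick fox", "the lazy dog", "the fox"]   # D split into N=3
mapped = chain.from_iterable(map_task(p) for p in partitions)
counts = dict(reduce_task(k, v) for k, v in shuffle(mapped).items())
# counts == {'the': 3, 'quick': 1, 'fox': 2, 'lazy': 1, 'dog': 1}
```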

What are the advantages and disadvantages of each write strategy?

  • Write-back cache: Low response time / eventual inconsistencies.
  • Write-through cache: Higher response time / Higher consistency.
  • Write-around cache: Better write response time than write-through / same consistency as write-through.

Which write strategies are available for cache?

  • Write-through cache: Write directly to the cache, and then the cache synchronously makes a request to write the data to the origin source.
  • Write-around cache: Write directly to the origin source. On the next cache miss, the cache will update itself.
  • Write-back cache: Like write-through cache, but data is written to the origin source asynchronously.

Which read strategies are available for cache?

  • Read-through cache: Always read from the cache; if the data is not available (cache miss), the cache fetches the data from the origin source, updates itself and then returns it.
  • Read-aside (cache-aside): In this case, the cache does not update itself. The application makes a request to the cache and, on a miss, makes a subsequent request to the origin source. Then, if the application wishes, it makes a request to update the cache.
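
The read-aside (cache-aside) flow can be sketched like this; the in-memory dicts are stand-ins for a real cache (Redis/Memcached) and the origin database.

```python
cache = {}                                  # stand-in for the cache layer
database = {"user:1": {"name": "Ana"}}      # stand-in for the origin source
hits = {"cache": 0, "db": 0}

def get(key):
    # cache-aside: the application, not the cache, talks to the origin
    if key in cache:
        hits["cache"] += 1
        return cache[key]
    hits["db"] += 1
    value = database.get(key)               # fallback to the origin source
    if value is not None:
        cache[key] = value                  # the application updates the cache
    return value

first = get("user:1")    # miss: hits the database and populates the cache
second = get("user:1")   # hit: served from the cache
# hits == {"cache": 1, "db": 1}
```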

What are some common cache invalidation techniques?

  • Purge: Removes an object from cache.
  • Refresh: Fetch the content from origin source and updates the cache content.
  • Ban: Forbids the cache from storing objects that follow a certain pattern.
  • TTL: Executes a Refresh operation as soon as the TTL is expired.
  • Stale-while-revalidate: Always serve the entry from the cache, but asynchronously fetch from the origin server to update it.

What should be considered when implementing a cache?

In distributed systems, we should consider how the caches will be kept up to date. One technique consists of invalidating an entry in the cache whenever the underlying data is updated.

How to reduce the likelihood of data loss during a client request?

A system is considered unreliable if it behaves differently from what the client expects. If a client makes a request that contains data, reply with an acknowledgement message only after the data has been processed. To reduce the likelihood of data loss further, consider the data received only after it has been processed/copied by at least K nodes in your system.

What are some common caching strategies?

  • In-memory caching: In distributed systems, it is common to use Memcached or Redis for that.
  • Client-side caching: Browser caching
  • Database caching.
  • CDN caching
  • DNS caching.

What is a CDN?

CDN stands for "Content Delivery Network". It is a technique to reduce latency and improve throughput on the content served by a system.

This is done by creating a network of Edge servers that are distributed globally. Users are connected to the closest Edge server. The edge server keeps a local cache and can often serve the requested data without going to the origin server.

Akamai is an example of static CDN.

2025-03-03

How proper quorum configuration can improve your system?

  • Fault tolerance: Considering an operation complete whenever a certain number of nodes have completed it enables the system to continue operating even if some nodes are unavailable or have failed to complete the operation.
  • The side effect of this is reduced consistency. If an operation is said to be complete before being executed on all nodes, that means we allow nodes to temporarily hold divergent data.

How can the quorum choice impact availability?

If we choose a high quorum for an operation and the system faces network instability, the operation might be blocked for a while.

What is a quorum in the context of distributed systems?

A quorum is the minimum number of nodes that should agree or participate in an operation for the operation to be considered a success or completed.

For example, consider read and write operations.

  • R:1/W:3: In this case, reads will always be available and as fast as they can be, since we just need to read data from one node. On the other hand, we need to write the data to 3 nodes before considering the task complete, which can slow write operations down but guarantees full replication.
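
The R/W trade-off can be checked with the usual quorum inequalities: R + W > N guarantees that a read overlaps the latest write, and 2W > N prevents two conflicting writes from both succeeding.

```python
def quorum_properties(n, r, w):
    """For n replicas with read quorum r and write quorum w."""
    return {
        # a read overlaps the latest write iff the quorums intersect
        "read_sees_latest_write": r + w > n,
        # two concurrent writes cannot both succeed iff write quorums intersect
        "no_conflicting_writes": 2 * w > n,
    }

# R:1/W:3 on 3 replicas: writes land everywhere, so a single read suffices.
full_replication = quorum_properties(n=3, r=1, w=3)
# {'read_sees_latest_write': True, 'no_conflicting_writes': True}

# R:1/W:1 on 3 replicas: fastest, but a read may miss the latest write.
fast_but_weak = quorum_properties(n=3, r=1, w=1)
# {'read_sees_latest_write': False, 'no_conflicting_writes': False}
```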

What is the difference between a Forward and Reverse Proxy?

Forward and Reverse Proxies

What is the PACELC theorem?

It is an extension of CAP: even in the absence of a network partition, the system must still trade off between latency and consistency.

What is the CAP theorem?

In the presence of a network partition, a distributed system must choose either Consistency or Availability. It is impossible to have both.

Consider a system that is always consistent. A network partition occurs and communication between nodes A and B fails. Assume the system is also always available. But then, if A cannot communicate with B, A and B do not have the same information, which means the system is inconsistent. This is not possible, because the system is always consistent. So, at some point we must stop serving requests and resume only when communication between A and B is re-established; but then the system is not always available.

A network partition is a failure in the network, for example, communication fails between nodes A and B.

How to deal with node failures?

Consider the case in which two nodes A and B of your system need to communicate with each other, and node B fails. How do you design the system so that A can respond to the request even with the failure of B?

  • We should have at least more than one instance of node B. Additionally, as soon as node B goes down, some process is triggered to revive it.

What if A and B must communicate via several stateful turns?

  • Both nodes should be able to have a copy of the state in order to reproduce a request in case of failure.

What does BASE stand for?

  • Basically Available, Soft state, Eventually consistent. NoSQL databases are usually BASE.

What does ACID compliance mean?

  • Atomicity, Consistency, Isolation, Durability. SQL databases are ACID.

What is sharding?

It is data partitioning. Sharding consists of splitting data across multiple nodes. This concept is often used in distributed databases.

What are some challenges of sharding?

What is the main reason of using NoSQL databases?

  • SQL databases are complicated to scale horizontally.
    • Keeping the ACID properties in a distributed environment is difficult and can increase latency.
    • How to shard the data? Moreover, how to execute SQL queries across several nodes?
    • There are some databases that offer the best of both worlds, the so-called NewSQL databases:
      • Google Spanner
      • CockroachDB
  • In a prototype system, where the requirements are not completely defined, choosing a SQL database slows you down because you need to change the schema every time you add or remove a column in a table.
  • NoSQL databases can also be more suitable for storing unstructured data.

What are the differences between Redis and MongoDB?

  • Both are NoSQL databases.
  • Redis is a key-value database.
    • Strings, Lists, Hashes and Bitmaps are some examples of its data types. All of them are addressed by a key, though.
  • Mongo is a document database (BSON).
  • Both attempt to keep part of the data in memory to speed up operations.

Manageability

How difficult is it to maintain and operate the system? A good design implements components that help the system manager quickly detect errors when they occur (monitoring, logging) and keeps maintenance downtime low (ideally no downtime).

Notice that manageability influences availability.

How to measure efficiency?

  • Response time (how much time it takes to process a request)
  • Throughput (how much data was delivered in a period of time)

What is the difference between Reliability and Availability?

A system with high reliability has high availability, but the opposite is not necessarily true.

Availability means being always available; in particular, keeping the throughput even during high loads.

What is the difference between reliability and fault tolerance?

Reliability is the robustness of each individual component. How often does a component fail? How often does a component go down? How does a component behave during high load?

Fault tolerance is how the system detects, isolates and recovers from component breakdowns. Can the system keep operating with a missing component (even if in a degraded way)? Can the system recover itself in case of a critical error? Can the system operate if a node goes down?

Reliability is a user-centric concept while fault tolerance is a system-centric concept.

What are the key measures of a system design?

  • Scalability
  • Reliability
  • Availability
  • Efficiency
  • Manageability

Design problem examples

  • We have design X for system A. What do you propose to reduce our storage costs while keeping our metrics at the same level or better?

What are some tips during the interview?

  • Consider and explain trade-offs.
  • Listen to the interviewer and their hints. Be prepared to adapt your design accordingly.
  • Ask clarifications. Question the interviewer. Consider the design process as a collaborative process, a dialog.

Back-of-Envelope Estimate

It is helpful to estimate the resources needed by a system. Some examples of metrics to estimate are:

  • Number of requests per second
  • Bandwidth (bytes per second)
  • Storage (GB, TB)
  • Latency
  • Processing capacity (number of cpus, servers, memory...)

Estimation Reference Table

Operation Name Time
L1 cache reference 0.5 ns
Branch mispredict 5 ns
L2 cache reference 7 ns
Mutex lock/unlock 100 ns
Main memory reference 100 ns
Compress 1K bytes with Zippy 10,000 ns = 10 μs
Send 2K bytes over 1 Gbps network 20,000 ns = 20 μs
Read 1 MB sequentially from memory 250,000 ns = 250 μs
Round trip within the same datacenter 500,000 ns = 500 μs
Disk seek 10,000,000 ns = 10 ms
Read 1 MB sequentially from network 10,000,000 ns = 10 ms
Read 1 MB sequentially from disk 30,000,000 ns = 30 ms
Send packet CA→Netherlands→CA 150,000,000 ns = 150 ms

Typical units:

  • Network transfer: Gbps
  • Storage: MB, GB

2025-03-02

System design sketch/diagram

First pass

  • Identify necessary components.
  • Make connections between them.

Second pass

  • For each component, answer the questions:
    1. How to scale?
    2. How to make it robust to failures?
    3. How to improve performance?
    4. Is it safe? How can it be exploited and how can that be prevented?
  • For the questions above, we may accept different answers depending on the level of the project.
  • If it is a prototype, we can relax some constraints and simplify infrastructure.
  • If it is a final product version, then deliver the most performant, robust, scalable and secure design.

Functional and Non-Functional Requirements

  • Functional: What a system should do.
  • Non-Functional: How the system performs a task.

  • Clarify requirements. Make sure to get all functional and non-functional requirements.

  • Prioritize. Some requirements might be more important than others.
  • Discuss trade-offs.

API Gateway x Load Balancer

A load balancer is more focused on distributing traffic to improve throughput and reduce response time. An API gateway is more about centralizing, managing and routing.

In general, an API gateway can hold many more responsibilities than a load balancer. An API gateway is always a piece of software, while load balancers can also be hardware devices.

API Gateway

  • It is usually a service that acts as the entry point of several other backend services.
  • It is commonly used in microservice architecture.
  • Can also execute load balancing tasks.

What are the uses of an API Gateway?

  • Monitoring / Logging
  • Data transformation.
  • Caching.
  • Load balancing.
  • Authentication.
  • API versioning.
  • SSL termination: Decrypts data at this point and possibly does not encrypt it again for the next hops (assuming you have other security measures in place that allow you to not use SSL, for example, you are in an internal network)
  • Policy enforcement
  • A/B and canary test.
  • Localization and Internationalization.

What precautions should be taken when implementing an API Gateway?

  • It must not become a single point of failure or a bottleneck. Proper configuration prevents that.

If I want to add an API Gateway to my services, how do I proceed?

There are some solutions available:

  • Amazon API Gateway
  • Kong (open source)
  • NGINX

Load balancer FAQ

How to build a server infrastructure that ensures low latency among the servers and optimizes throughput to users even during peak times?

One needs a load balancer. That could be, for example, a middleware service that every request to your system passes through. The main task of this service, which should be lightweight, is to assign each request to one of the servers in your farm.

Where to put a Load Balancer?

Anywhere between a request and a service. For example:

  • Between client and web server.
  • Between web server and backend/API application.
  • Between API and database.

What are the roles of load balancer?

  • Distribute requests between servers.
  • Execute periodic health checks.
  • Centralized SSL/TLS encryption and decryption.
  • Session persistence: the same client is assigned to the same backend server during the whole session.

What are the typical algorithms implemented by a load balancer?

  • Random.
  • Round Robin / Weighted Round Robin.
    • Better when servers are homogeneous and the application is stateless.
    • The weighted version can be used in non-homogeneous server farms.
    • It is predictable, which can be exploited by attackers.
    • Both versions do not take the real-time load of a server into account.
  • Least Connections / Weighted Least Connections
    • Dynamic load balancing.
    • Still not good for stateful applications.
    • It requires the load balancer to keep track of active connections, which might create overhead, mainly during peak times.
  • IP Hash
    • Good for stateful applications.
    • Simple to implement.
    • Does not offer dynamic load balancing.
  • Least response time.
    • Dynamic load balancing.
    • Requires system monitoring and statistics computation.
  • Least bandwidth
    • Suitable for video streaming.
  • Custom Load
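
Two of the algorithms above, sketched minimally; the server names are placeholders.

```python
from itertools import count

class RoundRobin:
    """Static: cycles through servers in order; predictable, load-agnostic."""

    def __init__(self, servers):
        self.servers = servers
        self._counter = count()

    def pick(self):
        return self.servers[next(self._counter) % len(self.servers)]

class LeastConnections:
    """Dynamic: picks the server with the fewest active connections,
    at the cost of bookkeeping every connection."""

    def __init__(self, servers):
        self.active = {server: 0 for server in servers}

    def pick(self):
        server = min(self.active, key=self.active.get)
        self.active[server] += 1
        return server

    def release(self, server):
        self.active[server] -= 1        # call when the connection closes

rr = RoundRobin(["s1", "s2", "s3"])
picks = [rr.pick() for _ in range(4)]
# picks == ["s1", "s2", "s3", "s1"]

lc = LeastConnections(["s1", "s2"])
lc.pick(); lc.pick()                    # one connection on each server
lc.release("s1")                        # s1 frees up...
# ...so the next pick goes to s1 again
```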

Which load balancing algorithms are more suitable for stateful applications?

  • IP Hash
  • URL Hash (in this case, the state is encoded in the URL)
  • Cookie-based Persistence

What are the use cases of a Load Balancer?

  • Improve service performance. That is, reduce latency and maximize throughput.
  • Improve resilience to failures. If a server goes down, direct requests to another one.
  • Scalability. During peak times, or increase in demand, simply add a new server to the pool.

How load balancers can be implemented?

  • In hardware. We can have specialized devices which have the distribution algorithm implemented in their circuits. It is a very optimized solution, but it might be difficult to maintain and scale.
  • In software. In this case, the load balancer is like any other service hosted on a server. It is more scalable but less performant than hardware load balancers.
  • Cloud-based solutions.
  • DNS load balancing.
  • Global server load balancing.
  • Hybrid load balancing
  • Layer 4 load balancing (transport layer: TCP)
  • Layer 7 load balancing (application layer: HTTP, e.g.)

How to distribute load on the load balancer?

The load balancer is a service like any other, and the same strategies apply to it. We can have several layers to distribute the load between load balancers, for example. But I believe that the higher a layer is in the hierarchy, the faster and more performant it should be; otherwise it becomes a bottleneck.

Additionally, one can implement load balancing at the DNS level by mapping multiple IPs to the same domain/address. For example, the service www.shopping.com is mapped to three different IPs: one for North America, one for Europe and one for Asia.

Why would someone implement a stateful application over stateless?

Real-time applications. Stateless applications need to do session management, which might involve making requests to other services (or writing data to the client's cookies). Some real-time applications may not tolerate this extra latency.

Stateless and Stateful applications

A stateless load balancer can make a decision based on the client IP address. Therefore, it is suitable for stateful applications, right?

This is true in some cases, but users on a mobile network might have their IPs frequently changed. It won't be a good experience to lose the session every time the IP changes.

On the other hand, stateful load balancers must hold information that represents the user's session.

  • Periodic health checks: Allows a load balancer to not redirect traffic to a server that is down.
  • Monitoring: Depending on the load balancer strategy, the load balancer needs to know statistics about the services it can reach.

How load balancers keep in sync with each other?

  • Using a centralized configuration store (e.g.: etcd, Consul, ZooKeeper) to make sure that all load balancers at the same level have consistent behaviour.
  • Using a distributed database (Redis, Memcached) to keep all load balancers in sync with respect to data that all of them need to know (for example, user sessions in case of stateful load balancers)

How to scale load balancers?

  • Horizontally: More load balancers instances.
  • Vertically: More resources allocated to a load balancer instance.

How to improve performance?

  • Limit number of connections per load balancer.
  • Define a request rate limit by user/ip.
  • Use of caching to serve some files such as images, scripts.
  • Minification (to reduce bandwidth)