Database Performance Tuning


Key considerations for database performance:


1. Indexing

  • Purpose: Speed up read queries by avoiding full table scans.

  • Types of indexes:

    • B-Tree Index → default, best for equality & range queries.
    • Hash Index → best for equality lookups (Postgres supports it).
    • Composite Index → for queries filtering on multiple columns.
  • Example:

    -- Single-column index
    CREATE INDEX idx_email ON users(email);

    -- Composite index for queries filtering on first_name AND last_name
    CREATE INDEX idx_name ON users(first_name, last_name);
    • Query using index:

    SELECT * FROM users WHERE email = 'test@example.com';
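
  • Hash index example (Postgres syntax; the index name is illustrative):

    -- Hash index: equality lookups only, no range scans
    CREATE INDEX idx_email_hash ON users USING HASH (email);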

👉 More indexes = faster reads but slower writes (INSERT/UPDATE/DELETE must also update indexes).


2. Query Optimization

  • Purpose: Reduce query cost and execution time.

  • Techniques:

    • Fetch only necessary columns (SELECT col1, col2 instead of SELECT *).
    • Use EXPLAIN to check execution plan.
    • Use JOINs instead of subqueries where possible.

    🔹 Examples


    1. Avoid SELECT * (fetch only required columns)

    👉 Bad (fetches unnecessary columns, so more I/O and no chance of an index-only scan):

    SELECT * FROM orders WHERE status = 'PENDING';


    👉 Better (narrower result set, index on status helps):

    SELECT order_id, customer_id FROM orders WHERE status = 'PENDING';


    2. Use EXPLAIN to analyze queries

    EXPLAIN shows whether a query uses an index scan or full table scan.


    Example 1: With index

    EXPLAIN SELECT order_id, customer_id FROM orders WHERE status = 'PENDING';


    Output (simplified):

    Index Scan using idx_orders_status on orders


    👉 Efficient, uses index.


    Example 2: Without index

    EXPLAIN SELECT order_id, customer_id FROM orders WHERE email = 'abc@test.com';


    Output (simplified):

    Seq Scan on orders


    👉 Full table scan → consider adding index on email.


    3. Replace subqueries with JOINs


    Subquery (often less efficient, particularly on older optimizers that don't rewrite IN into a join):


    SELECT order_id, total_amount
    FROM orders
    WHERE customer_id IN (
        SELECT customer_id FROM customers WHERE country = 'USA'
    );


    Better with JOIN:


    SELECT o.order_id, o.total_amount
    FROM orders o
    JOIN customers c ON o.customer_id = c.customer_id
    WHERE c.country = 'USA';


    👉 DB can optimize using indexes on orders.customer_id and customers.country.


3. Connection Pooling

  • Purpose: Reduce the cost of creating new database connections.

    🔹 Spring Boot (HikariCP – Default Connection Pool)

    Spring Boot automatically configures HikariCP as the default connection pool if it’s on the classpath (no extra setup needed). You can fine-tune it in application.properties or application.yml:

    spring.datasource.hikari.maximum-pool-size=30
    spring.datasource.hikari.minimum-idle=5
    # 10 minutes
    spring.datasource.hikari.idle-timeout=600000
    # 30 minutes
    spring.datasource.hikari.max-lifetime=1800000

    🔹 How It Helps

    • Without pooling → Each DB request opens a new connection (slow & resource heavy).
    • With pooling → Connections are created once, kept alive, and reused.

    👉 This drastically improves performance under load, since connections are expensive to create.

    Example:

    • 100 requests without pooling → 100 new connections (huge overhead).
    • 100 requests with pooling → Reuses 5–30 pre-opened connections (fast).

4. Caching

  • Purpose: Reduce repeated DB hits for frequently accessed data.
  • Example (Spring Boot with Redis):
    @Cacheable("user")
    public User getUserById(Long id) {
        return userRepository.findById(id)
                .orElseThrow(); // avoid bare Optional.get()
    }
  • First call → fetches from DB and caches.
  • Next calls → return from cache until eviction/expiry.
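
  • Minimal setup sketch, assuming spring-boot-starter-data-redis is on the classpath (property names are Spring Boot 3.x; Boot 2.x uses spring.redis.*):

    # application.properties
    spring.cache.type=redis
    spring.data.redis.host=localhost
    spring.data.redis.port=6379

  • Caching annotations only take effect once caching is enabled:

    @SpringBootApplication
    @EnableCaching  // enables processing of @Cacheable / @CacheEvict
    public class Application {
        public static void main(String[] args) {
            SpringApplication.run(Application.class, args);
        }
    }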

5. Data Model & Schema Design

Normalization: Avoid redundancy and maintain data integrity (3rd Normal Form). Ideal for transactional systems where inserts, updates, and deletes are frequent.

Denormalization: Sometimes duplicate or precompute data to optimize read performance. Useful for reporting, dashboards, or read-heavy APIs where query speed is critical.

Example:

Normalized (good for transactions):

orders(order_id, user_id, product_id, qty)
users(user_id, name, email)
  • Pros: minimal redundancy, consistent updates, easier to maintain integrity.

Denormalized (good for reporting):

orders_summary(user_id, name, email, total_orders, total_spent)
  • Pros: faster reads, aggregated data readily available.
  • Cons: requires extra effort to keep data in sync with transactional tables.

Tip:

  • Use normalized tables for transactional operations and denormalized tables for read-heavy reporting or dashboards — often a hybrid approach works best.


6. Partitioning

Purpose: Split a large table into smaller, manageable pieces to improve query performance and maintenance.

Types:

  • Range Partitioning → split by ranges (dates, IDs, etc.).
  • List Partitioning → split by specific values (e.g., region = 'US', 'EU').
  • Hash Partitioning → distribute evenly by hash of a column for load balancing.

Example (PostgreSQL Range Partition):

CREATE TABLE orders (
    id BIGINT,
    order_date DATE NOT NULL,
    amount DECIMAL
) PARTITION BY RANGE (order_date);

CREATE TABLE orders_2024 PARTITION OF orders
    FOR VALUES FROM ('2024-01-01') TO ('2025-01-01');

CREATE TABLE orders_2025 PARTITION OF orders
    FOR VALUES FROM ('2025-01-01') TO ('2026-01-01');

Query benefit:

SELECT * FROM orders WHERE order_date >= '2025-01-01';

→ DB scans only the orders_2025 partition, reducing the number of rows scanned and improving performance.

Additional notes:

  • Partitioning also makes archiving or purging old data easier (drop old partitions instead of deleting rows).
  • Indexes can be local to partitions or global, depending on your DB engine.
  • Best for very large tables where full table scans are costly.

7. Sharding

Purpose: Distribute data across multiple databases or servers when a single database cannot handle the load, improving scalability and performance.

Strategies:

  • Range-based → Customers A–M → DB1, N–Z → DB2.
  • Hash-based → Hash(user_id) % 4 → pick one of 4 DB shards.

Example:

Sharding by region:

  • Users in Asia → db_asia
  • Users in Europe → db_europe
  • Queries go only to the relevant database, reducing unnecessary load.

Java pseudo-code (shard routing layer):

int shardId = userId % 4;                             // pick DB shard
DataSource ds = shardManager.getDataSource(shardId);  // route to that shard
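
A fuller routing-layer sketch; ShardManager is illustrative, not a standard API, and assumes one pre-built DataSource per shard:

import java.util.List;
import javax.sql.DataSource;

// Illustrative shard router: maps a user ID to one of N pre-configured DataSources.
public class ShardManager {

    private final List<DataSource> shards; // one DataSource per shard, built at startup

    public ShardManager(List<DataSource> shards) {
        this.shards = shards;
    }

    public DataSource getDataSource(long userId) {
        // Math.floorMod avoids negative indexes if IDs can be negative
        int shardId = Math.floorMod(Long.hashCode(userId), shards.size());
        return shards.get(shardId);
    }
}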

Additional notes:

  • Sharding works best for horizontal scaling when table size or traffic exceeds the capacity of a single DB.
  • Choose a sharding key that distributes evenly (e.g., user_id) and use it consistently, so related data stays on one shard and no shard becomes a hotspot.
  • Challenges:
    • Cross-shard joins and aggregates are difficult → usually handled in the application or via federated queries.
    • Rebalancing shards can be complex if data grows unevenly.

8. Concurrency & Locking

Problem: Multiple transactions may try to read/write the same data simultaneously, which can lead to inconsistencies or conflicts.

🔹 Lock Types

1. Optimistic Locking

Idea: Assume conflicts are rare. Don’t lock rows up front — instead, check a version column before committing. If another transaction has updated the row in the meantime, your update fails, and you retry.


When to use:

  • Read-heavy workloads with low contention
  • Financial/accounting, analytics, or reporting where retries are acceptable


Example 1: SQL with Version Column


Table: accounts

account_id | balance | version
-----------|---------|--------
101        | 500     | 2


Two users try to withdraw money at the same time:


Transaction A (withdraw 100):


UPDATE accounts
SET balance = balance - 100,
    version = version + 1
WHERE account_id = 101 AND version = 2;


Transaction B (withdraw 50, also sees version = 2):


UPDATE accounts
SET balance = balance - 50,
    version = version + 1
WHERE account_id = 101 AND version = 2;


Outcome:

  • Transaction A succeeds → balance = 400, version = 3
  • Transaction B fails (no rows updated because version changed) → must retry


Example 2: JPA/Hibernate with @Version


@Entity
public class Account {

    @Id
    private Long accountId;

    private BigDecimal balance;

    @Version
    private int version;
}

  • Hibernate automatically checks the version field when updating.
  • If another transaction updated the row first, Hibernate throws OptimisticLockException.
  • Application can catch this and retry the withdrawal.
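
A minimal retry sketch; AccountService and its withdraw method are hypothetical, and Spring may surface the failure as ObjectOptimisticLockingFailureException instead:

import java.math.BigDecimal;
import jakarta.persistence.OptimisticLockException; // javax.persistence on older JPA versions

public class WithdrawalService {

    private final AccountService accountService; // hypothetical service running the @Transactional update

    public WithdrawalService(AccountService accountService) {
        this.accountService = accountService;
    }

    public void withdrawWithRetry(Long accountId, BigDecimal amount) {
        final int maxRetries = 3;
        for (int attempt = 1; attempt <= maxRetries; attempt++) {
            try {
                accountService.withdraw(accountId, amount); // re-reads the row, so it sees the fresh version
                return; // success
            } catch (OptimisticLockException e) {
                if (attempt == maxRetries) throw e; // give up after the last attempt
                // otherwise loop and retry against the new version
            }
        }
    }
}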


Pros

  • Minimal blocking → supports high concurrency
  • Works well in read-heavy systems with occasional writes

Cons

  • Requires retry logic in case of conflicts
  • Not suitable for heavily contended rows (e.g., a single "hot" account being updated frequently)

2. Pessimistic Locking

Idea: Lock the row when reading, preventing other transactions from modifying it until the current transaction completes.

When to use: High contention scenarios, critical updates (e.g., e-commerce checkout).

Example 1: SQL

SELECT * FROM products WHERE id = 100 FOR UPDATE;
  • Other transactions trying to update this row must wait until the lock is released.

Example 2: Java (JPA/Hibernate)

Product product = entityManager.find(Product.class, 100L, LockModeType.PESSIMISTIC_WRITE);
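
A pessimistic lock is held only while its transaction is open, so the lookup and update belong in one transactional method; a sketch (the service name and entity accessors are illustrative):

import jakarta.persistence.EntityManager;
import jakarta.persistence.LockModeType;
import jakarta.persistence.PersistenceContext;
import org.springframework.stereotype.Service;
import org.springframework.transaction.annotation.Transactional;

@Service
public class CheckoutService {

    @PersistenceContext
    private EntityManager entityManager;

    @Transactional // the row lock is held until this transaction commits or rolls back
    public void reserveStock(Long productId, int qty) {
        // Blocks concurrent writers (and SELECT ... FOR UPDATE readers) on this row.
        Product product = entityManager.find(Product.class, productId,
                LockModeType.PESSIMISTIC_WRITE);
        product.setStock(product.getStock() - qty); // safe: no other transaction can change the row now
    }
}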

Pros:

  • Prevents conflicts upfront → no retries needed.
  • Ensures strict correctness in critical operations.

Cons:

  • Reduces concurrency → other transactions may block.
  • Can cause deadlocks if not carefully managed.

🔹 Isolation Levels

Level           | Description
----------------|--------------------------------------------------------------------
READ COMMITTED  | Cannot read uncommitted changes from other transactions.
REPEATABLE READ | The same row read twice in one transaction returns the same value.
SERIALIZABLE    | Strictest; transactions behave as if executed sequentially.
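
In Spring, isolation can be set per transaction; a small sketch (the service and method are illustrative):

import java.math.BigDecimal;
import org.springframework.stereotype.Service;
import org.springframework.transaction.annotation.Isolation;
import org.springframework.transaction.annotation.Transactional;

@Service
public class TransferService {

    @Transactional(isolation = Isolation.SERIALIZABLE)
    public void transferFunds(Long fromId, Long toId, BigDecimal amount) {
        // debit one account, credit the other; under SERIALIZABLE the DB may
        // abort the transaction on a serialization conflict, so callers should retry
    }
}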



🔹 Example Use Cases

  • E-commerce checkout: Pessimistic lock on product row to prevent overselling.
  • Analytics query / reporting: Optimistic lock or no lock, since stale reads are acceptable.

🔹 Tips:

  • Use optimistic locking for high-read, low-write scenarios to maximize throughput.
  • Use pessimistic locking for high-risk write operations where conflicts are costly.

9. Monitoring & Metrics

Purpose: Ensure your database is performing optimally and detect issues early.

Key Metrics to Watch:

  • Query latency → slow queries indicate missing indexes or inefficient joins.
  • Connection pool usage → active, idle, and waiting connections (e.g., HikariCP metrics in Spring Boot).
  • Deadlocks → monitor conflicts that can block transactions.
  • Throughput & load → number of queries per second, CPU/memory usage.

Example:

  • Set up Prometheus to collect database metrics and Grafana to visualize them.

  • Track Postgres/MySQL metrics such as:

    • pg_stat_activity / information_schema.processlist
    • HikariCP metrics: hikaricp.connections.active, idle, pending
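
  • Minimal sketch, assuming spring-boot-starter-actuator and micrometer-registry-prometheus are on the classpath; Micrometer then exports the HikariCP gauges above at /actuator/prometheus:

    # application.properties — expose the Prometheus scrape endpoint
    management.endpoints.web.exposure.include=health,prometheus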

Tips:

  • Alerts on high query latency, maxed-out connection pools, or frequent deadlocks help prevent performance degradation.
  • Combine application-level metrics with DB metrics for full visibility.

10. Hardware & Configuration

Purpose: Ensure the database has sufficient resources and is tuned for optimal performance.

Hardware Considerations:

  • SSD vs HDD → SSDs provide drastically faster read/write speeds and lower latency compared to HDDs. Essential for high-performance transactional or analytical workloads.
  • Memory & CPU → More RAM allows larger buffer/cache usage; CPU cores affect query processing and parallel operations.

Database Configuration:

  • MySQL → innodb_buffer_pool_size should typically be set to ~70% of available RAM to cache data and indexes efficiently.

  • PostgreSQL → Key settings include:

    • shared_buffers → portion of memory used to cache data pages.
    • work_mem → memory allocated per query operation (sorting, joining).
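
  • A hedged starting point for a dedicated 16 GB Postgres server (values are illustrative; benchmark before committing):

    # postgresql.conf (illustrative values, dedicated 16 GB server)
    shared_buffers = 4GB   # ~25% of RAM is a common starting point
    work_mem = 64MB        # per sort/hash operation, per query node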

Tips:

  • Monitor resource utilization (CPU, RAM, I/O) regularly to identify bottlenecks.
  • Adjust DB configs gradually and benchmark changes; overly aggressive settings can cause instability.
  • SSDs + properly tuned memory parameters often provide the largest performance gains for large datasets.

11. Batching & Bulk Operations

Purpose: Reduce database overhead by avoiding too many individual queries, improving performance for inserts, updates, or deletes.

Why it helps:

  • Each DB query has network, parsing, and transaction overhead.
  • Sending operations in batches reduces round-trips and allows the DB to optimize execution internally.
  • Example (Java Batch Insert using JdbcTemplate):

jdbcTemplate.batchUpdate(
    "INSERT INTO logs(msg, created_at) VALUES (?, ?)",
    logs,
    1000,  // batch size
    (ps, log) -> {
        ps.setString(1, log.getMsg());
        ps.setTimestamp(2, Timestamp.valueOf(log.getCreatedAt()));
    });

  • Sends 1000 inserts per batch → significantly faster than individual inserts.

Tips & Considerations:

  • Choose batch size based on memory and DB capacity; too large can cause memory issues.
  • Works for updates/deletes as well (batchUpdate supports any DML).
  • Combine with transactions to ensure atomicity: commit per batch instead of per row.
  • For ORMs (Hibernate/JPA), use session.flush() and session.clear() periodically to avoid memory bloat during batch operations.
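
A minimal JPA version of that flush/clear pattern, using the EntityManager equivalents of session.flush()/session.clear(); LogEntry and the batch size are illustrative, and hibernate.jdbc.batch_size should be set to match:

@PersistenceContext
private EntityManager entityManager;

@Transactional
public void saveLogs(List<LogEntry> entries) {
    int batchSize = 1000; // keep aligned with hibernate.jdbc.batch_size
    for (int i = 0; i < entries.size(); i++) {
        entityManager.persist(entries.get(i));
        if ((i + 1) % batchSize == 0) {
            entityManager.flush();  // send the pending inserts as one JDBC batch
            entityManager.clear();  // detach entities so the persistence context stays small
        }
    }
    // any remainder is flushed automatically at commit
}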

Example Use Case:

  • Logging millions of events daily → batch inserts reduce DB stress and network overhead.
  • Bulk order updates in e-commerce systems → batch updates reduce lock contention.

12. Archiving & Purging Old Data

Purpose: Keep operational tables lean, improve query performance, and manage storage by moving or deleting old data.

Why it helps:

  • Large tables slow down queries, indexes, and backups.
  • Archiving preserves historical data without affecting daily operations.
  • Purging removes truly obsolete data to free up storage.

Example (Archiving Completed Orders Older Than 3 Years):

-- Move old data to archive table
INSERT INTO orders_archive
SELECT * FROM orders
WHERE order_date < '2022-01-01';

-- Delete old data from main table
DELETE FROM orders
WHERE order_date < '2022-01-01';

Best Practices:

  • Batch deletes to avoid locking the entire table (e.g., delete 10k rows at a time; see the sketch after this list).
  • Use transactional archiving or maintain consistency with foreign keys.
  • Consider partitioned tables for easier purging (drop old partitions instead of deleting rows).
  • Schedule archiving during low-traffic periods to minimize impact on live operations.
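
A batched-delete sketch in Postgres-flavoured SQL (batch size illustrative); run it repeatedly until it affects zero rows:

-- Delete at most 10k rows per statement to keep locks and WAL churn small
DELETE FROM orders
WHERE order_id IN (
    SELECT order_id FROM orders
    WHERE order_date < '2022-01-01'
    LIMIT 10000
);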

Example Use Case:

  • E-commerce systems → archive completed orders older than 3 years for compliance.
  • Logging systems → purge old logs beyond retention period to reduce storage costs.

13. Database-Side Features

Purpose: Leverage built-in database features to improve performance, reduce application load, and simplify complex operations.

🔹 Materialized Views

  • Definition: Store the results of a query physically, so repeated queries can fetch precomputed data instead of recomputing each time.

  • Use Case: Reporting dashboards, aggregated summaries, or slow queries that don’t need real-time accuracy.

  • Example (PostgreSQL):

        CREATE MATERIALIZED VIEW sales_summary AS
        SELECT customer_id, SUM(amount) AS total_spent
        FROM sales
        GROUP BY customer_id;
  • Benefits:

    • Faster query response for complex aggregations.
    • Can be refreshed periodically (REFRESH MATERIALIZED VIEW) to keep data reasonably up-to-date.
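
  • Refreshing is a single statement; adding CONCURRENTLY keeps the view readable during the refresh (it requires a unique index on the view):

        REFRESH MATERIALIZED VIEW sales_summary;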

🔹 Stored Procedures

  • Definition: Encapsulate SQL logic on the database side to reduce repeated code and minimize app ↔ DB round-trips.

  • Use Case: Complex business logic, batch inserts/updates, or multi-step transactions.

  • Example (simplified):

        CREATE PROCEDURE update_stock(IN product_id INT, IN qty INT)
        BEGIN
            UPDATE products
            SET stock = stock - qty
            WHERE id = product_id;
        END;
  • Benefits:

    • Reduces network overhead between application and database.
    • Ensures consistent execution of critical operations.
    • Can improve performance for batch operations.
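
    • Invoking the procedure defined above is then a single round-trip:

        CALL update_stock(100, 2);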

🔹 Tips:

  • Combine materialized views and stored procedures for read-heavy dashboards: materialized views precompute aggregates, stored procedures handle updates efficiently.
  • Remember to schedule refreshes for materialized views if data changes frequently.