Database Performance Tuning


Key considerations for database performance:


1. Indexing

  • Purpose: Speed up read queries by avoiding full table scans.

  • Types of indexes:

    • B-Tree Index → default, best for equality & range queries.
    • Hash Index → best for equality lookups (Postgres supports it).
    • Composite Index → for queries filtering on multiple columns.
  • Example:

    -- Single-column index
    CREATE INDEX idx_email ON users(email);

    -- Composite index for queries filtering on first_name AND last_name
    CREATE INDEX idx_name ON users(first_name, last_name);
    • Query using index:

    SELECT * FROM users WHERE email = 'test@example.com';
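
  • Hash index example (Postgres syntax; the index name is illustrative):

    -- Hash index: equality lookups only, no range scans
    CREATE INDEX idx_email_hash ON users USING HASH (email);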

👉 More indexes = faster reads but slower writes (INSERT/UPDATE/DELETE must also update indexes).


2. Query Optimization

  • Purpose: Reduce query cost and execution time.

  • Techniques:

    • Fetch only necessary columns (SELECT col1, col2 instead of SELECT *).
    • Use EXPLAIN to check execution plan.
    • Use JOINs instead of subqueries where possible.

    🔹 Examples


    1. Avoid SELECT * (fetch only required columns)

    👉 Bad (fetches unnecessary columns, so more I/O and no chance of an index-only scan):

    SELECT * FROM orders WHERE status = 'PENDING';


    👉 Better (narrower result set, index on status helps):

    SELECT order_id, customer_id FROM orders WHERE status = 'PENDING';


    2. Use EXPLAIN to analyze queries

    EXPLAIN shows whether a query uses an index scan or full table scan.


    Example 1: With index

    EXPLAIN SELECT order_id, customer_id FROM orders WHERE status = 'PENDING';


    Output (simplified):

    Index Scan using idx_orders_status on orders


    👉 Efficient, uses index.


    Example 2: Without index

    EXPLAIN SELECT order_id, customer_id FROM orders WHERE email = 'abc@test.com';


    Output (simplified):

    Seq Scan on orders


    👉 Full table scan → consider adding index on email.


    3. Replace subqueries with JOINs


    Subquery (often less efficient, particularly on older optimizers that don't rewrite IN into a join):


    SELECT order_id, total_amount
    FROM orders
    WHERE customer_id IN (
        SELECT customer_id FROM customers WHERE country = 'USA'
    );


    Better with JOIN:


    SELECT o.order_id, o.total_amount
    FROM orders o
    JOIN customers c ON o.customer_id = c.customer_id
    WHERE c.country = 'USA';


    👉 DB can optimize using indexes on orders.customer_id and customers.country.


3. Connection Pooling

  • Purpose: Reduce the cost of creating new database connections.

    🔹 Spring Boot (HikariCP – Default Connection Pool)

    Spring Boot automatically configures HikariCP as the default connection pool if it’s on the classpath (no extra setup needed). You can fine-tune it in application.properties or application.yml:

    spring.datasource.hikari.maximum-pool-size=30
    spring.datasource.hikari.minimum-idle=5
    # 10 minutes
    spring.datasource.hikari.idle-timeout=600000
    # 30 minutes
    spring.datasource.hikari.max-lifetime=1800000

    🔹 How It Helps

    • Without pooling → Each DB request opens a new connection (slow & resource heavy).
    • With pooling → Connections are created once, kept alive, and reused.

    👉 This drastically improves performance under load, since connections are expensive to create.

    Example:

    • 100 requests without pooling → 100 new connections (huge overhead).
    • 100 requests with pooling → Reuses 5–30 pre-opened connections (fast).

4. Caching

  • Purpose: Reduce repeated DB hits for frequently accessed data.
  • Example (Spring Boot with Redis):
    @Cacheable("user")
    public User getUserById(Long id) {
        return userRepository.findById(id)
                .orElseThrow(); // avoid bare Optional.get()
    }
  • First call → fetches from DB and caches.
  • Next calls → return from cache until eviction/expiry.
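
  • Minimal setup sketch, assuming spring-boot-starter-data-redis is on the classpath (property names are Spring Boot 3.x; Boot 2.x uses spring.redis.*):

    # application.properties
    spring.cache.type=redis
    spring.data.redis.host=localhost
    spring.data.redis.port=6379

  • Caching annotations only take effect once caching is enabled:

    @SpringBootApplication
    @EnableCaching  // enables processing of @Cacheable / @CacheEvict
    public class Application {
        public static void main(String[] args) {
            SpringApplication.run(Application.class, args);
        }
    }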

5. Data Model & Schema Design

Normalization: Avoid redundancy and maintain data integrity (3rd Normal Form). Ideal for transactional systems where inserts, updates, and deletes are frequent.

Denormalization: Sometimes duplicate or precompute data to optimize read performance. Useful for reporting, dashboards, or read-heavy APIs where query speed is critical.

Example:

Normalized (good for transactions):

orders(order_id, user_id, product_id, qty)
users(user_id, name, email)
  • Pros: minimal redundancy, consistent updates, easier to maintain integrity.

Denormalized (good for reporting):

orders_summary(user_id, name, email, total_orders, total_spent)
  • Pros: faster reads, aggregated data readily available.
  • Cons: requires extra effort to keep data in sync with transactional tables.

Tip:

  • Use normalized tables for transactional operations and denormalized tables for read-heavy reporting or dashboards — often a hybrid approach works best.


6. Partitioning

Purpose: Split a large table into smaller, manageable pieces to improve query performance and maintenance.

Types:

  • Range Partitioning → split by ranges (dates, IDs, etc.).
  • List Partitioning → split by specific values (e.g., region = 'US', 'EU').
  • Hash Partitioning → distribute evenly by hash of a column for load balancing.

Example (PostgreSQL Range Partition):

CREATE TABLE orders (
    id BIGINT,
    order_date DATE NOT NULL,
    amount DECIMAL
) PARTITION BY RANGE (order_date);

CREATE TABLE orders_2024 PARTITION OF orders
    FOR VALUES FROM ('2024-01-01') TO ('2025-01-01');

CREATE TABLE orders_2025 PARTITION OF orders
    FOR VALUES FROM ('2025-01-01') TO ('2026-01-01');

Query benefit:

SELECT * FROM orders WHERE order_date >= '2025-01-01';

→ DB scans only the orders_2025 partition, reducing the number of rows scanned and improving performance.

Additional notes:

  • Partitioning also makes archiving or purging old data easier (drop old partitions instead of deleting rows).
  • Indexes can be local to partitions or global, depending on your DB engine.
  • Best for very large tables where full table scans are costly.

7. Sharding

Purpose: Distribute data across multiple databases or servers when a single database cannot handle the load, improving scalability and performance.

Strategies:

  • Range-based → Customers A–M → DB1, N–Z → DB2.
  • Hash-based → Hash(user_id) % 4 → pick one of 4 DB shards.

Example:

Sharding by region:

  • Users in Asia → db_asia
  • Users in Europe → db_europe
  • Queries go only to the relevant database, reducing unnecessary load.

Java pseudo-code (shard routing layer):

int shardId = userId % 4;                             // pick DB shard
DataSource ds = shardManager.getDataSource(shardId);  // route to that shard
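
A fuller routing-layer sketch; ShardManager is illustrative, not a standard API, and assumes one pre-built DataSource per shard:

import java.util.List;
import javax.sql.DataSource;

// Illustrative shard router: maps a user ID to one of N pre-configured DataSources.
public class ShardManager {

    private final List<DataSource> shards; // one DataSource per shard, built at startup

    public ShardManager(List<DataSource> shards) {
        this.shards = shards;
    }

    public DataSource getDataSource(long userId) {
        // Math.floorMod avoids negative indexes if IDs can be negative
        int shardId = Math.floorMod(Long.hashCode(userId), shards.size());
        return shards.get(shardId);
    }
}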

Additional notes:

  • Sharding works best for horizontal scaling when table size or traffic exceeds the capacity of a single DB.
  • Choose a sharding key that distributes evenly (e.g., user_id) and use it consistently, so related data stays on one shard and no shard becomes a hotspot.
  • Challenges:
    • Cross-shard joins and aggregates are difficult → usually handled in the application or via federated queries.
    • Rebalancing shards can be complex if data grows unevenly.

8. Concurrency & Locking

Problem: Multiple transactions may try to read/write the same data simultaneously, which can lead to inconsistencies or conflicts.

🔹 Lock Types

1. Optimistic Locking

Idea: Assume conflicts are rare. Don’t lock rows up front — instead, check a version column before committing. If another transaction has updated the row in the meantime, your update fails, and you retry.


When to use:

  • Read-heavy workloads with low contention
  • Financial/accounting, analytics, or reporting where retries are acceptable


Example 1: SQL with Version Column


Table: accounts

account_id | balance | version
-----------|---------|--------
101        | 500     | 2


Two users try to withdraw money at the same time:


Transaction A (withdraw 100):


UPDATE accounts
SET balance = balance - 100,
    version = version + 1
WHERE account_id = 101 AND version = 2;


Transaction B (withdraw 50, also sees version = 2):


UPDATE accounts
SET balance = balance - 50,
    version = version + 1
WHERE account_id = 101 AND version = 2;


Outcome:

  • Transaction A succeeds → balance = 400, version = 3
  • Transaction B fails (no rows updated because version changed) → must retry


Example 2: JPA/Hibernate with @Version


@Entity
public class Account {

    @Id
    private Long accountId;

    private BigDecimal balance;

    @Version
    private int version;
}

  • Hibernate automatically checks the version field when updating.
  • If another transaction updated the row first, Hibernate throws OptimisticLockException.
  • Application can catch this and retry the withdrawal.
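
A minimal retry sketch; AccountService and its withdraw method are hypothetical, and Spring may surface the failure as ObjectOptimisticLockingFailureException instead:

import java.math.BigDecimal;
import jakarta.persistence.OptimisticLockException; // javax.persistence on older JPA versions

public class WithdrawalService {

    private final AccountService accountService; // hypothetical service running the @Transactional update

    public WithdrawalService(AccountService accountService) {
        this.accountService = accountService;
    }

    public void withdrawWithRetry(Long accountId, BigDecimal amount) {
        final int maxRetries = 3;
        for (int attempt = 1; attempt <= maxRetries; attempt++) {
            try {
                accountService.withdraw(accountId, amount); // re-reads the row, so it sees the fresh version
                return; // success
            } catch (OptimisticLockException e) {
                if (attempt == maxRetries) throw e; // give up after the last attempt
                // otherwise loop and retry against the new version
            }
        }
    }
}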


Pros

  • Minimal blocking → supports high concurrency
  • Works well in read-heavy systems with occasional writes

Cons

  • Requires retry logic in case of conflicts
  • Not suitable for heavily contended rows (e.g., a single "hot" account being updated frequently)

2. Pessimistic Locking

Idea: Lock the row when reading, preventing other transactions from modifying it until the current transaction completes.

When to use: High contention scenarios, critical updates (e.g., e-commerce checkout).

Example 1: SQL

SELECT * FROM products WHERE id = 100 FOR UPDATE;
  • Other transactions trying to update this row must wait until the lock is released.

Example 2: Java (JPA/Hibernate)

Product product = entityManager.find(Product.class, 100L, LockModeType.PESSIMISTIC_WRITE);
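
A pessimistic lock is held only while its transaction is open, so the lookup and update belong in one transactional method; a sketch (the service name and entity accessors are illustrative):

import jakarta.persistence.EntityManager;
import jakarta.persistence.LockModeType;
import jakarta.persistence.PersistenceContext;
import org.springframework.stereotype.Service;
import org.springframework.transaction.annotation.Transactional;

@Service
public class CheckoutService {

    @PersistenceContext
    private EntityManager entityManager;

    @Transactional // the row lock is held until this transaction commits or rolls back
    public void reserveStock(Long productId, int qty) {
        // Blocks concurrent writers (and SELECT ... FOR UPDATE readers) on this row.
        Product product = entityManager.find(Product.class, productId,
                LockModeType.PESSIMISTIC_WRITE);
        product.setStock(product.getStock() - qty); // safe: no other transaction can change the row now
    }
}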

Pros:

  • Prevents conflicts upfront → no retries needed.
  • Ensures strict correctness in critical operations.

Cons:

  • Reduces concurrency → other transactions may block.
  • Can cause deadlocks if not carefully managed.

🔹 Isolation Levels

Level           | Description
----------------|--------------------------------------------------------------------
READ COMMITTED  | Cannot read uncommitted changes from other transactions.
REPEATABLE READ | The same row read twice in one transaction returns the same value.
SERIALIZABLE    | Strictest; transactions behave as if executed sequentially.
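
In Spring, isolation can be set per transaction; a small sketch (the service and method are illustrative):

import java.math.BigDecimal;
import org.springframework.stereotype.Service;
import org.springframework.transaction.annotation.Isolation;
import org.springframework.transaction.annotation.Transactional;

@Service
public class TransferService {

    @Transactional(isolation = Isolation.SERIALIZABLE)
    public void transferFunds(Long fromId, Long toId, BigDecimal amount) {
        // debit one account, credit the other; under SERIALIZABLE the DB may
        // abort the transaction on a serialization conflict, so callers should retry
    }
}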



🔹 Example Use Cases

  • E-commerce checkout: Pessimistic lock on product row to prevent overselling.
  • Analytics query / reporting: Optimistic lock or no lock, since stale reads are acceptable.

🔹 Tips:

  • Use optimistic locking for high-read, low-write scenarios to maximize throughput.
  • Use pessimistic locking for high-risk write operations where conflicts are costly.

9. Monitoring & Metrics

Purpose: Ensure your database is performing optimally and detect issues early.

Key Metrics to Watch:

  • Query latency → slow queries indicate missing indexes or inefficient joins.
  • Connection pool usage → active, idle, and waiting connections (e.g., HikariCP metrics in Spring Boot).
  • Deadlocks → monitor conflicts that can block transactions.
  • Throughput & load → number of queries per second, CPU/memory usage.

Example:

  • Set up Prometheus to collect database metrics and Grafana to visualize them.

  • Track Postgres/MySQL metrics such as:

    • pg_stat_activity / information_schema.processlist
    • HikariCP metrics: hikaricp.connections.active, idle, pending
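
  • Minimal sketch, assuming spring-boot-starter-actuator and micrometer-registry-prometheus are on the classpath; Micrometer then exports the HikariCP gauges above at /actuator/prometheus:

    # application.properties — expose the Prometheus scrape endpoint
    management.endpoints.web.exposure.include=health,prometheus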

Tips:

  • Alerts on high query latency, maxed-out connection pools, or frequent deadlocks help prevent performance degradation.
  • Combine application-level metrics with DB metrics for full visibility.

10. Hardware & Configuration

Purpose: Ensure the database has sufficient resources and is tuned for optimal performance.

Hardware Considerations:

  • SSD vs HDD → SSDs provide drastically faster read/write speeds and lower latency compared to HDDs. Essential for high-performance transactional or analytical workloads.
  • Memory & CPU → More RAM allows larger buffer/cache usage; CPU cores affect query processing and parallel operations.

Database Configuration:

  • MySQL → innodb_buffer_pool_size should typically be set to ~70% of available RAM to cache data and indexes efficiently.

  • PostgreSQL → Key settings include:

    • shared_buffers → portion of memory used to cache data pages.
    • work_mem → memory allocated per query operation (sorting, joining).
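
  • A hedged starting point for a dedicated 16 GB Postgres server (values are illustrative; benchmark before committing):

    # postgresql.conf (illustrative values, dedicated 16 GB server)
    shared_buffers = 4GB   # ~25% of RAM is a common starting point
    work_mem = 64MB        # per sort/hash operation, per query node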

Tips:

  • Monitor resource utilization (CPU, RAM, I/O) regularly to identify bottlenecks.
  • Adjust DB configs gradually and benchmark changes; overly aggressive settings can cause instability.
  • SSDs + properly tuned memory parameters often provide the largest performance gains for large datasets.

11. Batching & Bulk Operations

Purpose: Reduce database overhead by avoiding too many individual queries, improving performance for inserts, updates, or deletes.

Why it helps:

  • Each DB query has network, parsing, and transaction overhead.
  • Sending operations in batches reduces round-trips and allows the DB to optimize execution internally.
  • Example (Java Batch Insert using JdbcTemplate):

jdbcTemplate.batchUpdate(
    "INSERT INTO logs(msg, created_at) VALUES (?, ?)",
    logs,
    1000,  // batch size
    (ps, log) -> {
        ps.setString(1, log.getMsg());
        ps.setTimestamp(2, Timestamp.valueOf(log.getCreatedAt()));
    });

  • Sends 1000 inserts per batch → significantly faster than individual inserts.

Tips & Considerations:

  • Choose batch size based on memory and DB capacity; too large can cause memory issues.
  • Works for updates/deletes as well (batchUpdate supports any DML).
  • Combine with transactions to ensure atomicity: commit per batch instead of per row.
  • For ORMs (Hibernate/JPA), use session.flush() and session.clear() periodically to avoid memory bloat during batch operations.
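
A minimal JPA version of that flush/clear pattern, using the EntityManager equivalents of session.flush()/session.clear(); LogEntry and the batch size are illustrative, and hibernate.jdbc.batch_size should be set to match:

@PersistenceContext
private EntityManager entityManager;

@Transactional
public void saveLogs(List<LogEntry> entries) {
    int batchSize = 1000; // keep aligned with hibernate.jdbc.batch_size
    for (int i = 0; i < entries.size(); i++) {
        entityManager.persist(entries.get(i));
        if ((i + 1) % batchSize == 0) {
            entityManager.flush();  // send the pending inserts as one JDBC batch
            entityManager.clear();  // detach entities so the persistence context stays small
        }
    }
    // any remainder is flushed automatically at commit
}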

Example Use Case:

  • Logging millions of events daily → batch inserts reduce DB stress and network overhead.
  • Bulk order updates in e-commerce systems → batch updates reduce lock contention.

12. Archiving & Purging Old Data

Purpose: Keep operational tables lean, improve query performance, and manage storage by moving or deleting old data.

Why it helps:

  • Large tables slow down queries, indexes, and backups.
  • Archiving preserves historical data without affecting daily operations.
  • Purging removes truly obsolete data to free up storage.

Example (Archiving Completed Orders Older Than 3 Years):

-- Move old data to archive table
INSERT INTO orders_archive
SELECT * FROM orders
WHERE order_date < '2022-01-01';

-- Delete old data from main table
DELETE FROM orders
WHERE order_date < '2022-01-01';

Best Practices:

  • Batch deletes to avoid locking the entire table (e.g., delete 10k rows at a time; see the sketch after this list).
  • Use transactional archiving or maintain consistency with foreign keys.
  • Consider partitioned tables for easier purging (drop old partitions instead of deleting rows).
  • Schedule archiving during low-traffic periods to minimize impact on live operations.
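
A batched-delete sketch in Postgres-flavoured SQL (batch size illustrative); run it repeatedly until it affects zero rows:

-- Delete at most 10k rows per statement to keep locks and WAL churn small
DELETE FROM orders
WHERE order_id IN (
    SELECT order_id FROM orders
    WHERE order_date < '2022-01-01'
    LIMIT 10000
);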

Example Use Case:

  • E-commerce systems → archive completed orders older than 3 years for compliance.
  • Logging systems → purge old logs beyond retention period to reduce storage costs.

13. Database-Side Features

Purpose: Leverage built-in database features to improve performance, reduce application load, and simplify complex operations.

🔹 Materialized Views

  • Definition: Store the results of a query physically, so repeated queries can fetch precomputed data instead of recomputing each time.

  • Use Case: Reporting dashboards, aggregated summaries, or slow queries that don’t need real-time accuracy.

  • Example (PostgreSQL):

        CREATE MATERIALIZED VIEW sales_summary AS
        SELECT customer_id, SUM(amount) AS total_spent
        FROM sales
        GROUP BY customer_id;
  • Benefits:

    • Faster query response for complex aggregations.
    • Can be refreshed periodically (REFRESH MATERIALIZED VIEW) to keep data reasonably up-to-date.
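
  • Refreshing is a single statement; adding CONCURRENTLY keeps the view readable during the refresh (it requires a unique index on the view):

        REFRESH MATERIALIZED VIEW sales_summary;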

🔹 Stored Procedures

  • Definition: Encapsulate SQL logic on the database side to reduce repeated code and minimize app ↔ DB round-trips.

  • Use Case: Complex business logic, batch inserts/updates, or multi-step transactions.

  • Example (simplified):

        CREATE PROCEDURE update_stock(IN product_id INT, IN qty INT)
        BEGIN
            UPDATE products
            SET stock = stock - qty
            WHERE id = product_id;
        END;
  • Benefits:

    • Reduces network overhead between application and database.
    • Ensures consistent execution of critical operations.
    • Can improve performance for batch operations.
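
    • Invoking the procedure defined above is then a single round-trip:

        CALL update_stock(100, 2);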

🔹 Tips:

  • Combine materialized views and stored procedures for read-heavy dashboards: materialized views precompute aggregates, stored procedures handle updates efficiently.
  • Remember to schedule refreshes for materialized views if data changes frequently.