Key Considerations for Database Performance
1. Indexing
- Purpose: Speed up read queries by avoiding full table scans.
- Types of indexes:
  - B-Tree Index → default, best for equality & range queries.
  - Hash Index → best for equality lookups (PostgreSQL supports it).
  - Composite Index → for queries filtering on multiple columns.
👉 More indexes = faster reads but slower writes (INSERT/UPDATE/DELETE must also update indexes).
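As a sketch, creating indexes and a query that can use one (the orders table and index names here are illustrative and match the examples used later):

```sql
-- B-Tree index (the default) on a single column
CREATE INDEX idx_orders_status ON orders (status);

-- Composite index for queries filtering on both columns
CREATE INDEX idx_orders_cust_status ON orders (customer_id, status);

-- This query can use idx_orders_status instead of scanning the whole table:
SELECT order_id, customer_id FROM orders WHERE status = 'PENDING';
```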
2. Query Optimization
- Purpose: Reduce query cost and execution time.
- Techniques:
  - Fetch only necessary columns (SELECT col1, col2 instead of SELECT *).
  - Use EXPLAIN to check the execution plan.
  - Use JOINs instead of subqueries where possible.
🔹 Examples
1. Avoid SELECT * (fetch only required columns)
👉 Bad (full scan + unnecessary columns):
```sql
SELECT * FROM orders WHERE status = 'PENDING';
```
👉 Better (narrower result set, an index on status helps):
```sql
SELECT order_id, customer_id FROM orders WHERE status = 'PENDING';
```
2. Use EXPLAIN to analyze queries
EXPLAIN shows whether a query uses an index scan or a full table scan.
Example 1: With index
```sql
EXPLAIN SELECT order_id, customer_id FROM orders WHERE status = 'PENDING';
```
Output (simplified):
Index Scan using idx_orders_status on orders
👉 Efficient, uses the index.
Example 2: Without index
```sql
EXPLAIN SELECT order_id, customer_id FROM orders WHERE email = 'abc@test.com';
```
Output (simplified):
Seq Scan on orders
👉 Full table scan → consider adding an index on email.
3. Replace subqueries with JOINs
Subquery (often less efficient):
```sql
SELECT order_id, total_amount
FROM orders
WHERE customer_id IN (
    SELECT customer_id FROM customers WHERE country = 'USA'
);
```
Better with JOIN:
```sql
SELECT o.order_id, o.total_amount
FROM orders o
JOIN customers c ON o.customer_id = c.customer_id
WHERE c.country = 'USA';
```
👉 The DB can optimize using indexes on orders.customer_id and customers.country.
3. Connection Pooling
- Purpose: Reduce the cost of creating new database connections.
🔹 Spring Boot (HikariCP – Default Connection Pool)
Spring Boot automatically configures HikariCP as the default connection pool if it’s on the classpath (no extra setup needed). You can fine-tune it in application.properties or application.yml.
🔹 How It Helps
- Without pooling → each DB request opens a new connection (slow & resource heavy).
- With pooling → connections are created once, kept alive, and reused.
👉 This drastically improves performance under load, since connections are expensive to create.
Example:
- 100 requests without pooling → 100 new connections (huge overhead).
- 100 requests with pooling → reuses 5–30 pre-opened connections (fast).
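A minimal sketch of HikariCP tuning in application.properties (the values below are illustrative starting points, not recommendations for every workload):

```properties
# Maximum connections the pool will open
spring.datasource.hikari.maximum-pool-size=20
# Idle connections to keep warm
spring.datasource.hikari.minimum-idle=5
# How long a request waits for a connection before failing (ms)
spring.datasource.hikari.connection-timeout=30000
# Retire idle connections after this long (ms)
spring.datasource.hikari.idle-timeout=600000
# Maximum lifetime of any connection (ms)
spring.datasource.hikari.max-lifetime=1800000
```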
4. Caching
- Purpose: Reduce repeated DB hits for frequently accessed data.
- Example (Spring Boot with Redis):
  - First call → fetches from DB and caches.
  - Next calls → return from cache until eviction/expiry.
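A sketch of the Spring Boot + Redis pattern described above, assuming spring-boot-starter-data-redis is on the classpath, @EnableCaching is set on a configuration class, and ProductService/ProductRepository are hypothetical application classes:

```java
// First call hits the DB and stores the result in the "products" cache;
// subsequent calls with the same id are served from Redis until eviction/expiry.
@Service
public class ProductService {

    private final ProductRepository repository;

    public ProductService(ProductRepository repository) {
        this.repository = repository;
    }

    @Cacheable(value = "products", key = "#id")
    public Product findById(Long id) {
        return repository.findById(id).orElseThrow();
    }
}
```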
5. Data Model & Schema Design
- Normalization: Avoid redundancy and maintain data integrity (e.g., 3rd Normal Form). Ideal for transactional systems where inserts, updates, and deletes are frequent.
- Denormalization: Sometimes duplicate or precompute data to optimize read performance. Useful for reporting, dashboards, or read-heavy APIs where query speed is critical.
Example:
Normalized (good for transactions):
- Pros: minimal redundancy, consistent updates, easier to maintain integrity.
Denormalized (good for reporting):
- Pros: faster reads, aggregated data readily available.
- Cons: requires extra effort to keep data in sync with transactional tables.
Tip:
- Use normalized tables for transactional operations and denormalized tables for read-heavy reporting or dashboards; often a hybrid approach works best.
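The normalized vs. denormalized trade-off above can be sketched with illustrative tables (all names are hypothetical):

```sql
-- Normalized: each fact stored once, linked by foreign keys
CREATE TABLE customers (
    customer_id BIGINT PRIMARY KEY,
    name        TEXT,
    country     TEXT
);

CREATE TABLE orders (
    order_id     BIGINT PRIMARY KEY,
    customer_id  BIGINT REFERENCES customers(customer_id),
    total_amount NUMERIC
);

-- Denormalized reporting table: customer data duplicated, totals precomputed,
-- so dashboards read one row instead of joining and aggregating
CREATE TABLE customer_order_summary (
    customer_id   BIGINT,
    customer_name TEXT,
    country       TEXT,
    order_count   INT,
    total_spent   NUMERIC
);
```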
6. Partitioning
Purpose: Split a large table into smaller, manageable pieces to improve query performance and maintenance.
Types:
- Range Partitioning → split by ranges (dates, IDs, etc.).
- List Partitioning → split by specific values (e.g., region = 'US', 'EU').
- Hash Partitioning → distribute rows evenly by a hash of a column for load balancing.
Example (PostgreSQL Range Partition):
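A sketch using PostgreSQL declarative partitioning (table and partition names are illustrative):

```sql
-- Parent table partitioned by order date
CREATE TABLE orders (
    order_id   BIGINT,
    order_date DATE NOT NULL,
    status     TEXT
) PARTITION BY RANGE (order_date);

-- One partition per year
CREATE TABLE orders_2024 PARTITION OF orders
    FOR VALUES FROM ('2024-01-01') TO ('2025-01-01');
CREATE TABLE orders_2025 PARTITION OF orders
    FOR VALUES FROM ('2025-01-01') TO ('2026-01-01');

-- Partition pruning: only orders_2025 is scanned for this query
SELECT * FROM orders
WHERE order_date >= '2025-03-01' AND order_date < '2025-04-01';
```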
Query benefit:
→ The DB scans only the orders_2025 partition, reducing the number of rows scanned and improving performance.
Additional notes:
- Partitioning also makes archiving or purging old data easier (drop old partitions instead of deleting rows).
- Indexes can be local to partitions or global, depending on your DB engine.
- Best for very large tables where full table scans are costly.
7. Sharding
Purpose: Distribute data across multiple databases or servers when a single database cannot handle the load, improving scalability and performance.
Strategies:
- Range-based → customers A–M → DB1, N–Z → DB2.
- Hash-based → hash(user_id) % 4 → pick one of 4 DB shards.
Example:
Sharding by region:
- Users in Asia → db_asia
- Users in Europe → db_europe
- Queries go only to the relevant database, reducing unnecessary load.
Java pseudo-code (shard routing layer):
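A minimal sketch of a hash-based routing layer (shard names and the ShardRouter class are illustrative; a real system would map the shard name to a DataSource):

```java
import java.util.List;

public class ShardRouter {

    private final List<String> shards;

    public ShardRouter(List<String> shards) {
        this.shards = shards;
    }

    // Consistent key → shard mapping: the same user always routes to the same shard.
    public String shardFor(long userId) {
        int index = (int) Long.remainderUnsigned(userId, shards.size());
        return shards.get(index);
    }

    public static void main(String[] args) {
        ShardRouter router = new ShardRouter(
                List.of("db_shard_0", "db_shard_1", "db_shard_2", "db_shard_3"));
        // 42 % 4 = 2 → db_shard_2
        System.out.println(router.shardFor(42L));
    }
}
```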
Additional notes:
- Sharding works best for horizontal scaling when table size or traffic exceeds the capacity of a single DB.
- Use a consistent sharding key (e.g., user_id) to avoid uneven distribution.
- Challenges:
  - Cross-shard joins and aggregates are difficult → usually handled in the application or via federated queries.
  - Rebalancing shards can be complex if data grows unevenly.
8. Concurrency & Locking
Problem: Multiple transactions may try to read/write the same data simultaneously, which can lead to inconsistencies or conflicts.
🔹 Lock Types
1. Optimistic Locking
Idea: Assume conflicts are rare. Don’t lock rows up front — instead, check a version column before committing. If another transaction has updated the row in the meantime, your update fails, and you retry.
When to use:
- Read-heavy workloads with low contention
- Financial/accounting, analytics, or reporting where retries are acceptable
Example 1: SQL with Version Column
Table: accounts
| account_id | balance | version |
| --- | --- | --- |
| 101 | 500 | 2 |
Two users try to withdraw money at the same time:
Transaction A (withdraw 100):
```sql
UPDATE accounts
SET balance = balance - 100,
    version = version + 1
WHERE account_id = 101 AND version = 2;
```
Transaction B (withdraw 50, also sees version = 2):
```sql
UPDATE accounts
SET balance = balance - 50,
    version = version + 1
WHERE account_id = 101 AND version = 2;
```
Outcome:
- Transaction A succeeds → balance = 400, version = 3
- Transaction B fails (no rows updated because the version changed) → must retry
Example 2: JPA/Hibernate with @Version
```java
@Entity
public class Account {

    @Id
    private Long accountId;

    private BigDecimal balance;

    @Version
    private int version;
}
```
- Hibernate automatically checks the version field when updating.
- If another transaction updated the row first, Hibernate throws OptimisticLockException.
- Application can catch this and retry the withdrawal.
Pros
- Minimal blocking → supports high concurrency
- Works well in read-heavy systems with occasional writes
Cons
- Requires retry logic in case of conflicts
- Not suitable for heavily contended rows (e.g., a single "hot" account being updated frequently)
2. Pessimistic Locking
Idea: Lock the row when reading, preventing other transactions from modifying it until the current transaction completes.
Example 1: SQL
- Other transactions trying to update this row must wait until the lock is released.
Example 2: Java (JPA/Hibernate)
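A sketch of the JPA equivalent, assuming an EntityManager and the Account entity from the optimistic-locking example (running inside a transaction):

```java
// PESSIMISTIC_WRITE acquires a database row lock (typically SELECT ... FOR UPDATE)
Account account = entityManager.find(
        Account.class, 101L, LockModeType.PESSIMISTIC_WRITE);

account.setBalance(account.getBalance().subtract(BigDecimal.valueOf(100)));
// The row lock is held until the surrounding transaction commits
```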
Pros:
- Prevents conflicts upfront → no retries needed.
- Ensures strict correctness in critical operations.
Cons:
- Reduces concurrency → other transactions may block.
- Can cause deadlocks if not carefully managed.
🔹 Isolation Levels
| Level | Description |
| --- | --- |
| READ UNCOMMITTED | May see uncommitted changes from other transactions (dirty reads); rarely used. |
| READ COMMITTED | Cannot read uncommitted changes. |
| REPEATABLE READ | The same row returns the same value throughout a transaction. |
| SERIALIZABLE | Strictest; transactions behave as if executed sequentially. |
🔹 Example Use Cases
- E-commerce checkout: pessimistic lock on the product row to prevent overselling.
- Analytics query / reporting: optimistic lock or no lock, since stale reads are acceptable.
🔹 Tips:
- Use optimistic locking for high-read, low-write scenarios to maximize throughput.
- Use pessimistic locking for high-risk write operations where conflicts are costly.
9. Monitoring & Metrics
Purpose: Ensure your database is performing optimally and detect issues early.
Key Metrics to Watch:
- Query latency → slow queries indicate missing indexes or inefficient joins.
- Connection pool usage → active, idle, and waiting connections (e.g., HikariCP metrics in Spring Boot).
- Deadlocks → monitor conflicts that can block transactions.
- Throughput & load → number of queries per second, CPU/memory usage.
Example:
- Set up Prometheus to collect database metrics and Grafana to visualize them.
- Track Postgres/MySQL metrics such as:
  - pg_stat_activity / information_schema.processlist
  - HikariCP metrics: hikaricp.connections.active, hikaricp.connections.idle, hikaricp.connections.pending
Tips:
- Alerts on high query latency, maxed-out connection pools, or frequent deadlocks help prevent performance degradation.
- Combine application-level metrics with DB metrics for full visibility.
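A sketch of the Spring Boot side of this setup, assuming spring-boot-starter-actuator and micrometer-registry-prometheus are on the classpath (HikariCP metrics are then exported automatically):

```properties
# Expose the Prometheus scrape endpoint at /actuator/prometheus
management.endpoints.web.exposure.include=health,prometheus
```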
10. Hardware & Configuration
Purpose: Ensure the database has sufficient resources and is tuned for optimal performance.
Hardware Considerations:
- SSD vs HDD → SSDs provide drastically faster read/write speeds and lower latency compared to HDDs. Essential for high-performance transactional or analytical workloads.
- Memory & CPU → more RAM allows larger buffer/cache usage; CPU cores affect query processing and parallel operations.
Database Configuration:
- MySQL → innodb_buffer_pool_size should typically be set to ~70% of available RAM on a dedicated DB server, to cache data and indexes efficiently.
- PostgreSQL → key settings include:
  - shared_buffers → portion of memory used to cache data pages.
  - work_mem → memory allocated per query operation (sorting, joining).
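A sketch of what this looks like in postgresql.conf (values are illustrative for a machine with 16 GB RAM, not universal recommendations):

```properties
shared_buffers = 4GB   # ~25% of RAM is a common starting point
work_mem = 64MB        # per sort/hash operation, per query node; multiply by concurrency
```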
Tips:
- Monitor resource utilization (CPU, RAM, I/O) regularly to identify bottlenecks.
- Adjust DB configs gradually and benchmark changes; overly aggressive settings can cause instability.
- SSDs + properly tuned memory parameters often provide the largest performance gains for large datasets.
11. Batching & Bulk Operations
Purpose: Reduce database overhead by avoiding too many individual queries, improving performance for inserts, updates, or deletes.
Why it helps:
- Each DB query has network, parsing, and transaction overhead.
- Sending operations in batches reduces round-trips and allows the DB to optimize execution internally.
Example (Java Batch Insert using JdbcTemplate):
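A sketch using Spring's JdbcTemplate.batchUpdate (the events table and Event class are illustrative; assumes a configured JdbcTemplate):

```java
// Accumulate parameter rows and flush every 1000 inserts in one round-trip
List<Object[]> batch = new ArrayList<>();
for (Event event : events) {
    batch.add(new Object[]{event.getId(), event.getPayload()});
    if (batch.size() == 1000) {
        jdbcTemplate.batchUpdate(
                "INSERT INTO events (id, payload) VALUES (?, ?)", batch);
        batch.clear();
    }
}
// Flush the final partial batch
if (!batch.isEmpty()) {
    jdbcTemplate.batchUpdate(
            "INSERT INTO events (id, payload) VALUES (?, ?)", batch);
}
```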
- Sends 1000 inserts per batch → significantly faster than individual inserts.
Tips & Considerations:
- Choose batch size based on memory and DB capacity; too large can cause memory issues.
- Works for updates/deletes as well (batchUpdate supports any DML).
- Combine with transactions to ensure atomicity: commit per batch instead of per row.
- For ORMs (Hibernate/JPA), call flush() and clear() on the session periodically to avoid memory bloat during batch operations.
Example Use Case:
- Logging millions of events daily → batch inserts reduce DB stress and network overhead.
- Bulk order updates in e-commerce systems → batch updates reduce lock contention.
12. Archiving & Purging Old Data
Purpose: Keep operational tables lean, improve query performance, and manage storage by moving or deleting old data.
Why it helps:
- Large tables slow down queries, indexes, and backups.
- Archiving preserves historical data without affecting daily operations.
- Purging removes truly obsolete data to free up storage.
Example (Archiving Completed Orders Older Than 3 Years):
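A sketch (PostgreSQL syntax; assumes an orders_archive table with the same structure as orders):

```sql
BEGIN;
-- Copy old completed orders to the archive table
INSERT INTO orders_archive
SELECT * FROM orders
WHERE status = 'COMPLETED'
  AND order_date < CURRENT_DATE - INTERVAL '3 years';

-- Then remove them from the live table
DELETE FROM orders
WHERE status = 'COMPLETED'
  AND order_date < CURRENT_DATE - INTERVAL '3 years';
COMMIT;
```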
Best Practices:
- Batch deletes to avoid locking the entire table (e.g., delete 10k rows at a time).
- Use transactional archiving or maintain consistency with foreign keys.
- Consider partitioned tables for easier purging (drop old partitions instead of deleting rows).
- Schedule archiving during low-traffic periods to minimize impact on live operations.
Example Use Case:
- E-commerce systems → archive completed orders older than 3 years for compliance.
- Logging systems → purge old logs beyond the retention period to reduce storage costs.
13. Database-Side Features
Purpose: Leverage built-in database features to improve performance, reduce application load, and simplify complex operations.
🔹 Materialized Views
- Definition: Store the results of a query physically, so repeated queries can fetch precomputed data instead of recomputing it each time.
- Use Case: Reporting dashboards, aggregated summaries, or slow queries that don’t need real-time accuracy.
Example (PostgreSQL):
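A sketch of a precomputed daily sales summary (the orders table and view name are illustrative):

```sql
-- Physically store the aggregation result
CREATE MATERIALIZED VIEW daily_sales AS
SELECT order_date,
       SUM(total_amount) AS total_sales,
       COUNT(*)          AS order_count
FROM orders
GROUP BY order_date;

-- Refresh periodically (e.g., from a scheduled job) to pick up new data
REFRESH MATERIALIZED VIEW daily_sales;
```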
Benefits:
- Faster query response for complex aggregations.
- Can be refreshed periodically (REFRESH MATERIALIZED VIEW) to keep data reasonably up to date.
🔹 Stored Procedures
- Definition: Encapsulate SQL logic on the database side to reduce repeated code and minimize app ↔ DB round-trips.
- Use Case: Complex business logic, batch inserts/updates, or multi-step transactions.
Example (simplified):
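A sketch in PostgreSQL (procedures require PG 11+; orders and orders_archive follow the archiving example above):

```sql
-- One DB-side call performs the multi-step archive, saving app ↔ DB round-trips
CREATE PROCEDURE archive_completed_orders(cutoff DATE)
LANGUAGE plpgsql
AS $$
BEGIN
    INSERT INTO orders_archive
    SELECT * FROM orders
    WHERE status = 'COMPLETED' AND order_date < cutoff;

    DELETE FROM orders
    WHERE status = 'COMPLETED' AND order_date < cutoff;
END;
$$;

CALL archive_completed_orders((CURRENT_DATE - INTERVAL '3 years')::date);
```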
Benefits:
- Reduces network overhead between the application and the database.
- Ensures consistent execution of critical operations.
- Can improve performance for batch operations.
🔹 Tips:
- Combine materialized views and stored procedures for read-heavy dashboards: materialized views precompute aggregates, stored procedures handle updates efficiently.
- Remember to schedule refreshes for materialized views if data changes frequently.