Have you been in a situation where your database becomes a bottleneck as data grows? Where API latency climbs because of that bottleneck, and you are close to hitting the limits of vertically scaling your database? I hope this guide will help you identify the bottlenecks and show what you can do to mitigate them.
Here are the steps you can follow:
| Activity | What you will identify | Remediation |
| --- | --- | --- |
| Understand the volume of your database | Tables and indexes that have grown big | Archiving, partitioning |
| Understand the bloat in your tables/indexes | Wasted space due to dead tuples, and which tables have high counts | Create a data archival strategy to purge, vacuum and re-index those tables |
| Understand each table's index usage | Tables that are not using indexes | Identify the queries running on those tables and add appropriate indexes |
| Understand the queries that are less performant | Queries doing sequential scans | Identify these queries and create the right indexes |
1. Understand the table and index size of the database
In Postgres, a table's footprint is made up of:
- Data size = size of the table's heap plus any associated TOAST tables
- Index size = combined size of all indexes on the table
- Total size = data size + index size
Use the query below to identify the tables that have grown the largest.
SELECT
    relname AS table_name,
    pg_size_pretty(pg_total_relation_size(relid)) AS total_size,
    pg_size_pretty(pg_indexes_size(relid)) AS index_size,
    pg_size_pretty(pg_relation_size(relid)) AS data_size
FROM pg_catalog.pg_statio_user_tables
ORDER BY pg_total_relation_size(relid) DESC;
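If you want to see how much of a specific table's footprint is TOAST data, here is a minimal sketch using the standard size functions (the table name "orders" is a placeholder):

-- Breakdown for a single table ('orders' is a hypothetical name)
SELECT
    pg_size_pretty(pg_relation_size('orders'))       AS heap_only,
    pg_size_pretty(pg_table_size('orders'))          AS heap_plus_toast,  -- includes TOAST, FSM and VM
    pg_size_pretty(pg_indexes_size('orders'))        AS indexes,
    pg_size_pretty(pg_total_relation_size('orders')) AS total;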
2. Identify the tables with high dead-tuple counts
Once you have identified the "big" tables, you need to know whether their size comes from growing data or from accumulating dead tuples.
Use the query below to identify the tables with a high number of dead tuples.
SELECT relname AS table_name,
       n_dead_tup AS dead_tuples,
       n_live_tup AS live_tuples,
       pg_size_pretty(pg_total_relation_size(relid)) AS total_size
FROM pg_stat_user_tables
WHERE n_dead_tup > 1000
ORDER BY n_dead_tup DESC;
This query retrieves statistics about dead tuples from the "pg_stat_user_tables" statistics view. It selects the table name, the number of dead and live tuples (n_dead_tup, n_live_tup), and the total disk space used by the table (pg_total_relation_size(relid)). The WHERE clause filters the results to tables with more than 1000 dead tuples.
Dead tuples are rows that have been deleted or updated but whose space has not yet been reclaimed by "VACUUM". Accumulated dead tuples cause table bloat and increased disk usage, as well as slower query performance due to index bloat and extra I/O. To understand more about them, please read my previous blog.
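If dead tuples keep piling up, it is worth checking when vacuum or autovacuum last ran on those tables; for example:

-- When was each table last vacuumed, manually or by autovacuum?
SELECT relname,
       last_vacuum,
       last_autovacuum,
       n_dead_tup
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC;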
By identifying tables with high numbers of dead tuples, you can prioritise “VACUUM” operations to free up disk space and optimise query performance.
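For example, a manual run on one of the bloated tables identified above (the table name is a placeholder):

-- Reclaim dead-tuple space and refresh planner statistics for one table.
VACUUM (VERBOSE, ANALYZE) orders;

-- If bloat is severe and you can afford an exclusive lock, VACUUM FULL
-- rewrites the table and returns the space to the operating system:
-- VACUUM FULL orders;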
Identifying wasted bytes
SELECT relname AS table_name,
       pg_size_pretty(pg_total_relation_size(relid)) AS total_size,
       pg_size_pretty(pg_indexes_size(relid)) AS index_size,
       pg_size_pretty(pg_relation_size(relid)) AS data_size,
       pg_size_pretty((n_dead_tup * (SELECT sum(attlen)
                                     FROM pg_attribute
                                     WHERE attrelid = relid
                                       AND attnum > 0
                                       AND attlen > 0))) AS wasted_size
FROM pg_stat_user_tables
WHERE n_dead_tup > 1000
ORDER BY n_dead_tup DESC;
This query adds a "wasted_size" column to estimate the disk space occupied by dead tuples. "pg_total_relation_size(relid)" returns the total disk space used by the table including indexes and other overhead, and the wasted size is estimated by multiplying the number of dead tuples (n_dead_tup) by the sum of the fixed-length attribute sizes in the table. Note that attlen is -1 for variable-length columns, so this is only a rough lower-bound estimate.
By including the wasted_size column, you can get a better idea of the disk space that can be reclaimed by running "VACUUM" on the table.
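For a more accurate picture of bloat, the pgstattuple contrib extension (if it is available on your instance) reports live and dead tuple sizes and free space directly; a minimal sketch with a placeholder table name:

-- Requires the contrib module; 'orders' is a hypothetical table name.
CREATE EXTENSION IF NOT EXISTS pgstattuple;
SELECT * FROM pgstattuple('orders');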
3. Identify the less performant tables
Tables with low index usage
SELECT relname AS table_name, seq_scan as full_table_scans, idx_scan as index_scans,
round(CAST(idx_scan as numeric) / CAST(seq_scan as numeric), 2) as index_to_seq_scan_ratio
FROM pg_stat_all_tables
WHERE schemaname = 'public' AND seq_scan > 0
ORDER BY index_to_seq_scan_ratio ASC;
This query retrieves statistics about table and index usage from the "pg_stat_all_tables" statistics view. It selects the table name, the number of full table scans (seq_scan), the number of index scans (idx_scan), and the ratio of index scans to full table scans.
The index_to_seq_scan_ratio column indicates how often an index is used relative to how often the table is scanned without using any index. A ratio closer to 0 means that the index is used less often and may not be very effective in optimizing queries. You can use this information to identify tables that may benefit from additional indexes or index tuning.
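The flip side is also worth checking: indexes that are never used still cost disk space and write overhead. A quick way to list them (the scan counters reset whenever statistics are reset):

-- Indexes with zero scans since the statistics were last reset;
-- review before dropping, since some may back constraints.
SELECT schemaname,
       relname AS table_name,
       indexrelname AS index_name,
       pg_size_pretty(pg_relation_size(indexrelid)) AS index_size,
       idx_scan
FROM pg_stat_user_indexes
WHERE idx_scan = 0
ORDER BY pg_relation_size(indexrelid) DESC;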
By identifying tables with a high share of sequential scans, you can optimise query performance by creating appropriate indexes and by revisiting your database schema and data access patterns.
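As an illustration, if a table with a low index_to_seq_scan_ratio is mostly filtered on one column, an index built without blocking writes could look like this (table and column names are hypothetical):

-- Hypothetical: orders.customer_id appears in most WHERE clauses,
-- yet the table shows a low index_to_seq_scan_ratio.
CREATE INDEX CONCURRENTLY idx_orders_customer_id ON orders (customer_id);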
Identify queries that are using sequential scans
You can use the PostgreSQL "pg_stat_statements" extension, which tracks statistics on the SQL statements executed by the server, to find the slowest queries and then check whether they rely on sequential scans (also known as table scans).
To enable "pg_stat_statements", you need to add the following line to your postgresql.conf file and restart the PostgreSQL server:
shared_preload_libraries = 'pg_stat_statements'
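After the restart, the extension also has to be created in the database you want to inspect:

CREATE EXTENSION IF NOT EXISTS pg_stat_statements;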
SELECT
    (total_time / 1000 / 60) AS total_minutes,   -- total_exec_time on PostgreSQL 13 and newer
    (total_time / calls) AS average_time,        -- average per call, in milliseconds
    query
FROM
    pg_stat_statements
WHERE
    (total_time / 1000 / 60) > 1
ORDER BY
    total_minutes DESC;
This query selects the total execution time (in minutes), the average execution time per call (in milliseconds), and the query text for all statements that have consumed more than one minute of execution time in total, ordered by total execution time in descending order. pg_stat_statements does not record whether a statement used a sequential scan, so the next step is to run "EXPLAIN" on the most expensive queries from this list.
By doing this, you can identify queries that are doing sequential scans and taking a long time to execute. From there, you can take steps to optimise them by adding appropriate indexes, rewriting the queries, or adjusting your schema or data access patterns.
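For instance, running the plan for one of the identified statements (the table and filter below are hypothetical) shows whether the planner chose a sequential scan:

-- Look for "Seq Scan" nodes in the output; the query is a placeholder.
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM orders WHERE customer_id = 42;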
Conclusion
Once you have identified the bottleneck, here are some options to consider:
- Archive data that is no longer required. Archival should include strategies to take a backup, purge, and vacuum to regain the wasted space.
- Create appropriate indexes: If the table is being scanned frequently on a certain column or set of columns, you may want to consider creating an index on those columns. This can greatly reduce the amount of data that needs to be scanned and can speed up queries.
- Revisit database schema and data access patterns: If you have tables that are frequently being scanned, it may indicate a problem with your database schema or data access patterns. You may need to revisit your schema and optimise it for the types of queries you are running, or adjust your data access patterns to reduce the amount of data being scanned.
- Monitor query performance: Once you have made changes to your schema, indexes, or queries, it's important to monitor query performance to ensure that your changes are having the desired effect. You can use tools like EXPLAIN and EXPLAIN ANALYZE to analyze query execution plans and identify areas for further optimization.
- Partitions: Use Postgres native partitioning to distribute a big table based on a partition key. This internally creates a separate file per partition, and you can "TRUNCATE" a partition without worrying about vacuum (see the sketch after this list).
- Vertical scale-up: As a last resort, you can increase the instance's RAM and CPU. This is only a short-term solution; beyond a point the cost of vertically scaling up grows exponentially, so you need to understand your data growth patterns and redesign the schema as the long-term fix.
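A minimal partitioning sketch (table and column names are hypothetical):

-- Range-partitioned table; each partition is stored as its own file.
CREATE TABLE events (
    id         bigint,
    created_at timestamptz NOT NULL,
    payload    jsonb
) PARTITION BY RANGE (created_at);

CREATE TABLE events_2024_01 PARTITION OF events
    FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');

-- Old data can be removed instantly, leaving no dead tuples behind:
TRUNCATE TABLE events_2024_01;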
These actions will help your database operate faster and ensure that your queries are running as efficiently as possible.

