June, 28 2016 - 5432... Meet us!
Authors
Ronan Dunklau
DBA @ Dalibo
Open-Source: Multicorn...
Some PostgreSQL contributions (IMPORT FOREIGN SCHEMA...)
Julien Rouhaud
DBA @ Dalibo
Open-Source: HypoPG, OPM...
Some PostgreSQL contributions
But also...
Marc Cousin
Thomas Reiss
PoWA ?
[t]
2cm
5cm
Workload analysis tool
Suggests opitmizations
Live!
Application stack
pg_stat_statements
Official PostgreSQL contrib
Normalized queries
Cumulative counters (buffers, execution time...), by
user
database
query
pg_stat_statements
Number of execution per normalized query
Average execution time
Temporary file creation
Blocks access from or outside PostgreSQL’s cache
pg_stat_statements
pg_stat_statements
pg_stat_kcache
Collects system metrics, by normalized queries
Physical disk access
CPU usage
pg_stat_kcache
“real” hit-ratio (PostgreSQL cache Vs system cache)
Identify CPU bound queries
pg_stat_kcache
pg_stat_kcache
pg_qualstats
Predicate analysis
WHERE clauses
JOIN clauses
Collects various metrics
Selectivity
Constants sampling (most executed, most filtering...)
Execution count
Evalutation type (Index clause or post-scan filtering)
pg_qualstats
pg_qualstats
pg_qualstats
pg_qualstats
powa-archivist
Archive those data sources
Configurable (retention, frequency...)
Extensible to other datasources
powa-archivist
Where / when are the bottlenecks
For what reason
How to fix
Live!
Compatibility
PostgreSQL 9.4 et later
PoWA 1 compatible with 9.3, but much more limited
powa-web
Web interface for PoWA
Manage one or more PoWA instance
Drill-down analysis
powa-web
problem: bad performance on parts of an application
Select an analysis period
Identify the database
powa-web
powa-web
powa-web
Problematic database has been identified...
let’s drill down to the query level!
powa-web
powa-web
powa-web
powa-web
2 problematic queries
Drill down on each of them
[fragile]
[mathescape, numbersep=5pt, gobble=2, frame=lines, framesep=2mm]sql SELECT com.id, sum(cl.pric) AS totalprice FROM command com JOIN commandline cl ON com.id = cl.idcommand JOIN client cli ON cli.id = com.idclient WHERE cli.id = ? GROUP BY com.id
powa-web
powa-web
powa-web
powa-web
[fragile]powa-web
[mathescape, numbersep=5pt, gobble=2, frame=lines, framesep=2mm]sql SELECT id, dt FROM command WHERE state = ?
powa-web
powa-web
powa-web
powa-web
github.com/dalibo/HypoPG extension support
Global index suggestion
HypoPG
Allow for hypothetical indexes creation
Instant creation, no impact on resources and no lock
Only used in EXPLAIN statements
[fragile]HypoPG
[mathescape, numbersep=5pt, gobble=2, frame=lines, framesep=2mm]sql rjuju=# EXPLAIN SELECT * FROM t1 WHERE id = 3 ; QUERY PLAN ————————————————— Seq Scan on t1 (cost=0.00..1693.00 rows=1 width=4) Filter: (id = 3) (2 rows)
[fragile]HypoPG
[mathescape, numbersep=5pt, gobble=2, frame=lines, framesep=2mm]sql # SELECT hypopgcreateindex(’CREATE INDEX ON t1(id)’) ; hypopgcreateindex ————————– (77523,<77523>btreet1id) (1 row)
rjuju=# EXPLAIN SELECT * FROM t1 WHERE id = 3 ; QUERY PLAN ————————————————————————– Index Only Scan using <77523>btreet1id on t1 (0.04..8.06 rows=1 width=4) Index Cond: (id = 3) (2 rows)
HypoPG
Will PostgreSQL use such an index
What size can I expect it to be
How useful can it be
HypoPG
Global optimization
Find the optimal set of index to add
Helping every queries
Minimum set of indexes
Privileging multi-column indexes
Global optimization
Fetch the predicates that need optimization (pg_qualstats)
Predicates filtering more than X lines out
Predicates filtering more than X% of lines out
Predicates used as part of a Seq Scan
Global optimization
Group predicates by supported access methods
Build a list of predicates “contained” by each predicates
WHERE id = ? AND label = ?
WHERE id = ?
WHERE label = ?
For each node, attribute a “score” to it (currently, number of columns)
Global optimization
For each node, compute a path containing all included node
Score it (sum of individual nodes scores)
Starting with the highest scoring path, generate the index definition for it
Delete any other path made obsolete by this one
Loop until no path is left unoptimized
Global optimization
Goal: estimate if the indexes will be used, and how much improvement they bring
If HypoPG is available on the target database:
Create hypothetical index for each suggestion
EXPLAIN every query with and without the hypothetical indexes
Based on the difference cost, estimate the gain
If HypoPG is available on the target database:
By query
Globally
Estimate the size of the indexes
Global optimization
Global optimization
Global optimization
Global optimization
Global optimization
Global optimization
What’s next
Find correlations, and suggest them once correlated statistics are available
WHERE cityname = ? AND zipcode = ? (10 rows avg)
WHERE cityname = ? (10 rows avg)
WHERE zipcode = ? (10 rows avg)
It means that cityname and zipcode are probably correlated
Collect statistics on table to take DML workload into account
Suggest partial indexes based on most-often used values
Useful links
powa-archivist
powa-web
pg_qualstats
pg_stat_kcache
HypoPG
Questions ?