A fast PostgreSQL log analyzer
Key FeaturesSQL queries related reports
- Overall statistics
- The most frequent waiting queries.
- Queries that waited the most.
- Queries generating the most temporary files.
- Queries generating the largest temporary files.
- The slowest queries.
- Queries that took up the most time.
- The most frequent queries.
- The most frequent errors.
- Locks statistics.
- ueries by type (select/insert/update/delete).
- Distribution of queries type per database/application
- Sessions per database/user/client.
- Connections per database/user/client.
- Autovacuum and autoanalyze per table.
- SQL queries statistics.
- Temporary file statistics.
- Checkpoints statistics.
- Autovacuum and autoanalyze statistics.
pgBadger is a PostgreSQL log analyzer build for speed with fully detailed reports from your PostgreSQL log file. It's a single and small Perl script that outperform any other PostgreSQL log analyzer.
pgBadger is able to autodetect your log file format (syslog, stderr or csvlog). It is designed to parse huge log files as well as gzip compressed file. See a complete list of features below.
All charts are zoomable and can be saved as PNG images.
You can also limit pgBadger to only report errors or remove any part of the report using command line options.
pgBadger supports any custom format set into log_line_prefix of your postgresql.conf file provide that you use the %t, %p and %l patterns.
pgBadger allow parallel processing on a single log file and multiple files through the use of the -j option and the number of CPUs as value.
If you want to save system performance you can also use log_duration instead of log_min_duration_statement to have reports on duration and number of queries only.
QuotesAndrew Dunstan was the first to talk about pgBadger after seeing the talk at PGCON 2012: "The biggest thing that piqued my interest yesterday at pgCon was the lightning talk on pgbadger. I'm certainly going to take a look at this as a substitute for pgfouine". See the whole blog post on Andrew's blog.Christophe Pettus, another PostgreSQL Experts consultant, says June, 30th, 2012: "It’s written in Perl, at least as fast as pgfouine, and can process log files that pgfouine can’t handle. It can read either CSV or standard log format, and can directly read *.gz files. It also produces a wider range of reports that pgfouine, including some very useful locking reports. I threw 25GB of logs with near 80 million lines at it without it complaining". Read it all on his blog post.Fidel Ramos' tweet says it all: "pgbadger, an alternative PostgreSQL log analyzer to pgfouine which promises to be both faster and more extensive".Vincent Picavet from Oslandia, says "When we setup a Geographical information system database with PostGIS, we install tools for monitoring its behaviour. This enable us to fine-tune the server and queries. We use PgBadger on various production servers, and it's a real quick and efficient solution to get useful reports. The really dynamic development of this solution has been a big plus in our choice."Michael Paquier from Postgres-XC project said in his article: "It is one of those utilities that you can use not only for production database systems, but for extra things like benchmark or performance analysis. Its installation is easy, will not heavy your system with packages you might not want, so go ahead and use it ... this utility is 100% compatible with Postgres-XC, the only thing you need to do is just to launch pgbadger for each node of your cluster."
- pgBadger is an original work from Gilles Darold.
- The pgBadger logo is an original creation of Damien Clochard.
- The pgBadger v4.x design comes from the "Art is code" company.
- This web site is a work of Gilles Darold.
- pgBadger is maintained by Gilles Darold and the good folks at Dalibo every one who wants to contribute.
- Many people have contributed to pgBadger, they are all quoted in the Changelog file.
- <pgbadger AT dalibo DOT com>
- <contact AT dalibo DOT com>
- <gilles AT darold DOT net>
- "Art is code"
Latest news about pgBadger
- 4.1 version
- 2013-11-08 Date
This release fixes two major bugs and some others minor issues. There's also a new command line option --exclude-appname that allow exclusion from the report of queries generated by a specific program, like pg_dump. Documentation have been updated with a new chapter about building incremental reports.
- Add log_autovacuum_min_duration into documentation in chapter about postgresql configuration directives. Thanks to Herve Werner for the report. - Add chapter about "Incremental reports" into documentation. - Fix reports with per minutes average where last time fraction was not reported. Thanks to Ludovic Levesque and Vincent Laborie for the report. - Fix unterminated comment in information popup. Thanks to Ronan Dunklau for the patch. - Add --exclude-appname command line option to eliminate unwanted traffic generated by a specific application. Thanks to Steve Crawford for the feature request. - Allow external links use into URL to go to a specific report. Thanks to Hubert depesz Lubaczewski for the feature request. - Fix empty reports when parsing compressed files with the -j option which is not allowed with compressed file. Thanks to Vincent Laborie for the report. - Prevent progress bar length to increase after 100% when real size is greater than estimated size (issue found with huge compressed file). - Correct some spelling and grammar in ChangeLog and pgbadger. Thanks to Thom Brown for the patch. - Fix major bug on SQL traffic reports with wrong min value and bad average value on select reports, add min/max for select queries. Thanks to Vincent Laborie for the report.
- 4.0 version
- 2013-10-31 Date
This major release is the "Say goodbye to the fouine" release. With a full rewrite of the reports design, pgBadger has now turn the HTML reports into a more intuitive user experience and professional look.
The report is now driven by a dynamic menu with the help of the embedded boostrap library. Every main menu correspond to hidden an slide that is brought to front when the menu or one of his submenus is activated. There's also the embedded font FontAwasome webfont to beautify the report.
Every statistic report now include a key value section that shows you immediately some of the relevant informations. Pie charts have also been separated from their data tables using two tabs, one for the chart and the other one for the data.
Tables reporting hourly statistic have been moved to a multiple tabs report following the data. This is used with General (queries, connections, sessions), Checkpoints (buffer, files, warnings), Temporary file and Vacuums activities.
There's some new useful informations shown in the key value sections. Peak information shows the number and datetime of the highest activity. Here is the list of those reports:
- - Queries peak
- - Read queries peak
- - Write queries peak
- - Connections peak
- - Checkpoints peak
- - Wal files usage Peak
- - Checkpoints warnings peak
- - Temporary file size peak
- - Temporary file number peak
Reports about Checkpoints and Restartpoints have been merge in a single one. This is the same, outside the fact that restartpoints are on a slave cluster, so there was no need to separate those informations.
Recent PostgreSQL versions add additional information about checkpoint, the number of synced files, the longest sync and the average of sync time per file. pgBadger collects and shows these informations in the Checkpoint Activity report.
There's also some new reports:
- - Prepared queries ratio (execute vs prepare)
- - Prepared over normal queries
- - Queries (select, insert, update, delete) per user/host/application
- - Pie charts for tables with the more tuples and pages removed during vacuum.
The vacuum report will now highlight the costly table during a vacuum or analyze of a database.
The errors are now highlighted by a different color following the level. A LOG level will be green, HINT will be yellow, WARNING orange, ERROR red and FATAL dark red.
Some changes in the binary format are not backward compatible and option --client have been remove as it was replaced by --dbclient for a long time now.
If you are running a pg_dump or some batch process with very slow queries your report analyze will be annoyed by those queries taking too much place in the report. Before that release it was a pain to exclude those queries from the report. Now you can use the --exclude-time command line option to exclude all traces matching the given time regexp from the report. For example, let's say you have a pg_dump at 13:00 each day during half an hour, you can use pgbadger as follow:
pgbadger --exclude-time "2013-09-.* 13:.*" postgresql.log
If your are also running a pg_dump at night, let's say 22:00, you can write it as follow:
pgbadger --exclude-time '2013-09-\d+ 13:[0-3]' --exclude-time '2013-09-\d+ 22:[0-3]' postgresql.log
or more shortly:
pgbadger --exclude-time '2013-09-\d+ (13|22):[0-3]' postgresql.log
Exclude time always require the iso notation yyyy-mm-dd hh:mm:ss, even if log format is syslog. This is the same for all time related options. Take care that this option has a high cost on the parser performances.
pgBadger may have a binary package corresponding to your distribution. For RPM packages, thanks to the great work of Devrim GÜNDÜZ, you can find the pgBadger package at the PostgreQSL yum repository
If you are the maintainer of other package distribution or you know how to download them, please let me know I will add a link here.
The latest development code can always be found into the pgBadger's GitHub repository
If you planned to parse PostgreSQL CSV log files you might need some Perl Modules:
Text::CSV_XS - to parse PostgreSQL CSV log files.
This module is optional, if you don't have PostgreSQL log in the CSV format you don't need to install it.
Compressed log file format is autodetected from the file exension. If pgBadger find a gz extension it will use the zcat utility, with a bz2 extension it will use bzcat and if the file extension is zip then the unzip utility will be used.
If those utilities are not found in the PATH environment variable then use the --zcat command line option to change this path. For example:
--zcat="/usr/local/bin/gunzip -c" or --zcat="/usr/local/bin/bzip2 -dc" --zcat="C:\tools\unzip -p"
By default pgBadger will use the zcat, bzcat and unzip utilities following the file extension. If you use the default autodetection compress format you can mixed gz, bz2 or zip files. Specifying a custom value to --zcat option will remove this feature of mixed compressed format.
Note that multiprocessing can not be used with compressed files or CSV files as well as under Windows platform.
You must enable and set some configuration directives in your postgresql.conf before starting.
You must first enable SQL query logging to have something to parse:
log_min_duration_statement = 0
Here every statement will be logged, on busy server you may want to increase this value to only log queries with a higher duration time. Note that if you have log_statement set to 'all' nothing will be logged through log_min_duration_statement. See next chapter for more information.
With 'stderr' log format, log_line_prefix must be at least:
log_line_prefix = '%t [%p]: [%l-1] '
Log line prefix could add user and database name as follows:
log_line_prefix = '%t [%p]: [%l-1] user=%u,db=%d '
or for syslog log file format:
log_line_prefix = 'user=%u,db=%d '
Log line prefix for stderr output could also be:
log_line_prefix = '%t [%p]: [%l-1] db=%d,user=%u '
or for syslog output:
log_line_prefix = 'db=%d,user=%u '
You need to enable other parameters in postgresql.conf to get more information from your log files:
log_checkpoints = on log_connections = on log_disconnections = on log_lock_waits = on log_temp_files = 0
Do not enable log_statement as their log format will not be parsed by pgBadger.
Of course your log messages should be in English without locale support:
but this is not only recommended by pgBadger.
log_min_duration_statement, log_duration and log_statement
If you want full statistics reports you must set log_min_duration_statement to 0 or more milliseconds.
If you just want to report duration and number of queries and don't want all details about queries, set log_min_duration_statement to -1 to disable it and enable log_duration in your postgresql.conf file. If you want to add the most common request report you can either choose to set log_min_duration_statement to a higher value or choose to enable log_statement.
Enabling log_min_duration_statement will add reports about slowest queries and queries that took up the most time. Take care that if you have log_statement set to 'all' nothing will be logged with log_line_prefix.
To enable parallel processing you just have to use the -j N option where N is the number of cores you want to use.
pgbadger will then proceed as follow:
for each log file chunk size = int(file size / N) look at start/end offsets of these chunks fork N processes and seek to the start offset of each chunk each process will terminate when the parser reach the end offset of its chunk each process write stats into a binary temporary file wait for all children has terminated All binary temporary files generated will then be read and loaded into memory to build the html output.
With that method, at start/end of chunks pgbadger may truncate or omit a maximum of N queries perl log file which is an insignificant gap if you have millions of queries in your log file. The chance that the query that you were looking for is loose is near 0, this is why I think this gap is livable. Most of the time the query is counted twice but truncated.
When you have lot of small log files and lot of CPUs it is speedier to dedicate one core to one log file at a time. To enable this behavior you have to use option -J N instead. With 200 log files of 10MB each the use of the -J option start being really interesting with 8 Cores. Using this method you will be sure to not loose any queries in the reports.
He are a benchmarck done on a server with 8 CPUs and a single file of 9.5GB.
Option | 1 CPU | 2 CPU | 4 CPU | 8 CPU --------+---------+-------+-------+------ -j | 1h41m18 | 50m25 | 25m39 | 15m58 -J | 1h41m18 | 54m28 | 41m16 | 34m45
With 200 log files of 10MB each and a total og 2GB the results are slightly different:
Option | 1 CPU | 2 CPU | 4 CPU | 8 CPU --------+-------+-------+-------+------ -j | 20m15 | 9m56 | 5m20 | 4m20 -J | 20m15 | 9m49 | 5m00 | 2m40
So it is recommanded to use -j unless you have hundred of small log file and can use at least 8 CPUs.
IMPORTANT: when you are using parallel parsing pgbadger will generate a lot of temporary files in the /tmp directory and will remove them at end, so do not remove those files unless pgbadger is not running. They are all named with the following template tmp_pgbadgerXXXX.bin so they can be easily identified.
Download the tarball from github and unpack the archive as follow:
tar xzf pgbadger-3.x.tar.gz cd pgbadger-3.x/ perl Makefile.PL make && sudo make install
This will copy the Perl script pgbadger to /usr/local/bin/pgbadger by default and the man page into /usr/local/share/man/man1/pgbadger.1. Those are the default installation directories for 'site' install.
If you want to install all under /usr/ location, use INSTALLDIRS='perl' as an argument of Makefile.PL. The script will be installed into /usr/bin/pgbadger and the manpage into /usr/share/man/man1/pgbadger.1.
For example, to install everything just like Debian does, proceed as follows:
perl Makefile.PL INSTALLDIRS=vendor
By default INSTALLDIRS is set to site.
pgbadger [options] logfile [...] PostgreSQL log analyzer with fully detailed reports and charts. Arguments: logfile can be a single log file, a list of files, or a shell command returning a list of files. If you want to pass log content from stdin use - as filename. Note that input from stdin will not work with csvlog. Options: -a | --average minutes : number of minutes to build the average graphs of queries and connections. -b | --begin datetime : start date/time for the data to be parsed in log. -c | --dbclient host : only report on entries for the given client host. -C | --nocomment : remove comments like /* ... */ from queries. -d | --dbname database : only report on entries for the given database. -e | --end datetime : end date/time for the data to be parsed in log. -f | --format logtype : possible values: syslog,stderr,csv. Default: stderr -G | --nograph : disable graphs on HTML output. Enable by default. -h | --help : show this message and exit. -i | --ident name : programname used as syslog ident. Default: postgres -j | --jobs number : number of jobs to run on parallel on each log file. Default is 1, run as single process. -J | --Jobs number : number of log file to parse in parallel. Default is 1, run as single process. -l | --last-parsed file: allow incremental log parsing by registering the last datetime and line parsed. Useful if you want to watch errors since last run or if you want one report per day with a log rotated each week. -m | --maxlength size : maximum length of a query, it will be restricted to the given size. Default: no truncate -n | --nohighlight : disable SQL code highlighting. -N | --appname name : only report on entries for given application name -o | --outfile filename: define the filename for output. Default depends on the output format: out.html, out.txt or out.tsung. To dump output to stdout use - as filename. -p | --prefix string : give here the value of your custom log_line_prefix defined in your postgresql.conf. Only use it if you aren't using one of the standard prefixes specified in the pgBadger documentation, such as if your prefix includes additional variables like client ip or application name. See examples below. -P | --no-prettify : disable SQL queries prettify formatter. -q | --quiet : don't print anything to stdout, even not a progress bar. -s | --sample number : number of query samples to store/display. Default: 3 -S | --select-only : use it if you want to report select queries only. -t | --top number : number of queries to store/display. Default: 20 -T | --title string : change title of the HTML page report. -u | --dbuser username : only report on entries for the given user. -U | --exclude-user username : exclude entries for the specified user from report. -v | --verbose : enable verbose or debug mode. Disabled by default. -V | --version : show pgBadger version and exit. -w | --watch-mode : only report errors just like logwatch could do. -x | --extension : output format. Values: text, html or tsung. Default: html -z | --zcat exec_path : set the full path to the zcat program. Use it if zcat or bzcat or unzip is not on your path. --pie-limit num : pie data lower than num% will show a sum instead. --exclude-query regex : any query matching the given regex will be excluded from the report. For example: "^(VACUUM|COMMIT)" You can use this option multiple times. --exclude-file filename: path of the file which contains all the regex to use to exclude queries from the report. One regex per line. --include-query regex : any query that does not match the given regex will be excluded from the report. For example: "(table_1|table_2)" You can use this option multiple times. --include-file filename: path of the file which contains all the regex of the queries to include from the report. One regex per line. --disable-error : do not generate error report. --disable-hourly : do not generate hourly report. --disable-type : do not generate query type report. --disable-query : do not generate query reports (slowest, most frequent, ...). --disable-session : do not generate session report. --disable-connection : do not generate connection report. --disable-lock : do not generate lock report. --disable-temporary : do not generate temporary report. --disable-checkpoint : do not generate checkpoint report. --disable-autovacuum : do not generate autovacuum report. --charset : used to set the HTML charset to be used. Default: utf-8. Examples: pgbadger /var/log/postgresql.log pgbadger /var/log/postgres.log.2.gz /var/log/postgres.log.1.gz /var/log/postgres.log pgbadger /var/log/postgresql/postgresql-2012-05-* pgbadger --exclude-query="^(COPY|COMMIT)" /var/log/postgresql.log pgbadger -b "2012-06-25 10:56:11" -e "2012-06-25 10:59:11" /var/log/postgresql.log cat /var/log/postgres.log | pgbadger - # log prefix with stderr log output perl pgbadger --prefix '%t [%p]: [%l-1] user=%u,db=%d,client=%h' \ /pglog/postgresql-2012-08-21* perl pgbadger --prefix '%m %u@%d %p %r %a : ' /pglog/postgresql.log # Log line prefix with syslog log output perl pgbadger --prefix 'user=%u,db=%d,client=%h,appname=%a' \ /pglog/postgresql-2012-08-21* Use my 8 CPUs to parse my 10GB file faster, really faster perl pgbadger -j 8 /pglog/postgresql-9.1-main.log Generate Tsung sessions XML file with select queries only: perl pgbadger -S -o sessions.tsung --prefix '%t [%p]: [%l-1] user=%u,db=%d ' /pglog/postgresql-9.1.log Reporting errors every week by cron job: 30 23 * * 1 /usr/bin/pgbadger -q -w /var/log/postgresql.log -o /var/reports/pg_errors.html Generate report every week using incremental behavior: 0 4 * * 1 /usr/bin/pgbadger -q `find /var/log/ -mtime -7 -name "postgresql.log*"` \ -o /var/reports/pg_errors-`date +%F`.html -l /var/reports/pgbadger_incremental_file.dat This supposes that your log file and HTML report are also rotated every week.
v4.x sample reports
The sample report generated by pbBadger v4.x can be found here.
Feature request & Bugs
Please report any bugs, patches, discussion, etc. using the GitHub tools at https://github.com/dalibo/pgbadger/.
If you need new features or you think something is missing, please send your comments and requests. This helps a lot to develop a better/useful tool.
You can also write to: <pgbadger |AT| dalibo |DOT| com> or subscribe to the pgBadger mailing list
How to contribute
Dalibo as the maintainer of the project can offer you a high level commercial support.
For a complete list of commercial support near of your place take a look at the PostgreSQL Professional Services page, they all do great job and most of them can help you.