A fast PostgreSQL log analyzer
Key FeaturesSQL queries related reports
- Overall statistics
- The most frequent waiting queries.
- Queries that waited the most.
- Queries generating the most temporary files.
- Queries generating the largest temporary files.
- The slowest queries.
- Queries that took up the most time.
- The most frequent queries.
- The most frequent errors.
- Users involved in top queries.
- Locks statistics.
- ueries by type (select/insert/update/delete).
- Distribution of queries type per database/application
- Sessions per database/user/client.
- Connections per database/user/client.
- Autovacuum and autoanalyze per table.
- Histogram of queries times.
- Histogram of sessions times.
- SQL queries statistics.
- Temporary file statistics.
- Checkpoints statistics.
- Autovacuum and autoanalyze statistics.
- Histogram of query count and duration per hours
- Histogram of error/event count per hours
pgBadger is a PostgreSQL log analyzer build for speed with fully detailed reports from your PostgreSQL log file. It's a single and small Perl script that outperform any other PostgreSQL log analyzer.
pgBadger is able to autodetect your log file format (syslog, stderr or csvlog). It is designed to parse huge log files as well as gzip compressed file. See a complete list of features below.
All charts are zoomable and can be saved as PNG images.
You can also limit pgBadger to only report errors or remove any part of the report using command line options.
pgBadger supports any custom format set into log_line_prefix of your postgresql.conf file provide that you use the %t, %p and %l patterns.
pgBadger allow parallel processing on a single log file and multiple files through the use of the -j option and the number of CPUs as value.
If you want to save system performance you can also use log_duration instead of log_min_duration_statement to have reports on duration and number of queries only.
pgBadger supports an incremental mode that allow to construct incremental reports after successives runs of pgBadger. It is possible to run pgbadger each days or even more, each hours, and have cumulatives reports per day and per week. A top index page will allow you to go directly to the weekly and daily reports.
QuotesAndrew Dunstan was the first to talk about pgBadger after seeing the talk at PGCON 2012: "The biggest thing that piqued my interest yesterday at pgCon was the lightning talk on pgbadger. I'm certainly going to take a look at this as a substitute for pgfouine". See the whole blog post on Andrew's blog.Christophe Pettus, another PostgreSQL Experts consultant, says June, 30th, 2012: "It’s written in Perl, at least as fast as pgfouine, and can process log files that pgfouine can’t handle. It can read either CSV or standard log format, and can directly read *.gz files. It also produces a wider range of reports that pgfouine, including some very useful locking reports. I threw 25GB of logs with near 80 million lines at it without it complaining". Read it all on his blog post.Fidel Ramos' tweet says it all: "pgbadger, an alternative PostgreSQL log analyzer to pgfouine which promises to be both faster and more extensive".Vincent Picavet from Oslandia, says "When we setup a Geographical information system database with PostGIS, we install tools for monitoring its behaviour. This enable us to fine-tune the server and queries. We use PgBadger on various production servers, and it's a real quick and efficient solution to get useful reports. The really dynamic development of this solution has been a big plus in our choice."Michael Paquier from Postgres-XC project said in his article: "It is one of those utilities that you can use not only for production database systems, but for extra things like benchmark or performance analysis. Its installation is easy, will not heavy your system with packages you might not want, so go ahead and use it ... this utility is 100% compatible with Postgres-XC, the only thing you need to do is just to launch pgbadger for each node of your cluster."
- pgBadger is an original work from Gilles Darold.
- The pgBadger logo is an original creation of Damien Cazeils.
- The pgBadger v4.x design comes from the "Art is code" company.
- This web site is a work of Gilles Darold.
- pgBadger is maintained by Gilles Darold and the good folks at Dalibo and e²very one who wants to contribute.
- Many people have contributed to pgBadger, they are all quoted in the Changelog file.
- <pgbadger AT dalibo DOT com>
- <contact AT dalibo DOT com>
- <gilles AT darold DOT net>
- "Art is code"
Latest news about pgBadger
- 6.3 version
- 2015-03-27 Date
This new release fixes some bugs and adds some new reports:
* A new per user details (total duration and times executed) for each query reported in Top Queries reports. The details are visible from a new button called "User(s) involved". * Add "Average queries per session" and "Average queries duration per session" in Sessions tab of the Global statistics. * Add connection time histogram. * Use bar graph for Histogram of query times and sessions times.
There's also some cool new features and options:
* Add -L | --logfile-list option to read a list of logfiles from an external file. * Add support to log_timezones with + and - signs for timestamp with milliseconds (%m). * Add --noreport option to instruct pgbadger to not build any HTML reports in incremental mode. pgBadger will only create binary files. * Add autodetection of client=%h or remote=%h from the log so that adding a prefix is not needed when it respect the default of pgbadger. * Redefine sessions duration histogram bound to be more accurate. * Add new option -M | --no-multiline to not collect multiline statement and avoid storing and reporting garbage when needed. * Add --log-duration option to force pgbadger to associate log entries generated by both log_duration=on and log_statement=all.
The pgbadger_tools script have also been improve with new features:
* Add a new tool to pgbadger_tool to output top queries in CSV format for follow-up analysis. * Add --explain-time-consuming and --explain-normalized options to generate explain statement about top time consuming and top normalized slowest queries.
Here is the full list of changes/fixes:
- Update flotr2.min.js to latest github code. - Add per user detail information (total duration and times executed) for each query reported in "Time consuming queries", "Most frequent queries" "and Normalized slowest queries". The details are visible from a new button called "User(s) involved" near the "Examples" button. Thanks to Guillaume Le Bihan for the patch and tsn77130 for the feature request. - pgbadger_tool: add tool to output top queries to CSV format, for follow-up analysis. Thanks to briklen for the patch. - Add geometric operators to SQL beautifier. Thanks to Rodolphe Quiedeville for the report. - Fix non closing session when a process crash with message: "terminating connection because of crash of another server process". Thanks to Mael Rimbault for the report. - Add -L|--logfile-list command line option to read a list of logfiles from a file. Thanks to Hubert depesz Lubaczewski for the feature request. - Automatically remove %q from prefix. Thanks to mbecroft for report. - Do not store DEALLOCATE log entries anymore. - Fix queries histogram where range was not appears in the right order. Thanks to Grzegorz Garlewicz for the report. - Fix min yaxis in histogram graph. Thanks to grzeg1 for the patch. - Add --log-duration command line option to force pgbadger to associate log entries generated by both log_duration = on and log_statement=all. Thanks to grzeg1 for the feature request. - Small typographical corrections. Thanks to Jefferson Queiroz Venerando and Bill Mitchell the patches. - Reformat usage output and add explanation of the --noreport command line option. - Fix documentation about minimal pattern in custom log format. Thanks to Julien Rouhaud for the suggestion. - Add support to log_timezones with + and - signs to timestamp with milliseconds (%m). Thanks to jacksonfoz for the patch. pgbadger was not recognize log files with timezones like 'GMT+3'. - Add --noreport command line option to instruct pgbadger to not build any reports in incremental mode. pgBadger will only create binary files. Thanks to hubert Depesz Lubaczewski for the feature request. - Add time consuming information in tables of Queries per database... Thanks to Thomas for the feature request. - Add more details about the CSV parser error. It now prints the line number and the last parameter that generate the failure. This should allow to see the malformed log entry. - Change substitution markup in apptempt to fix a new look-behind assertions error. Thanks to Paolo Cavallini for the report. - Use bar graph for Histogram of query times and sessions times. - Fix wrong count of min/max queries per second. Thanks to Guillaume Lelarge for the report. Add COPY statement to SELECT or INSERT statements statistics following the copy direction (stdin or stdout). - Fix Illegal division by zero at line 3832. Thanks to MarcoTrek for the report. - Add "Average queries per session" and "Average queries duration per session" in Sessions tab of the Global stat. Thanks to Guillaume Lelarge for the feature request. - Reformat numbers in pie graph tracker. Thanks to jirihlinka for the report. - pgbadger_tools: Add --explain-time-consuming and --explain-normalized to generate explain statement about top time consuming and top normalized slowest queries. Thanks to Josh Kupershmid fot the feature request. - Remove everything than error information from json output when -w | --watch-mode is enable. Thanks to jason. - Fix undefined subroutine encode_json when using -x json. Thanks to jason for the report. - Add autodetection of client=%h or remote=%h from the log so that adding a prefix is not needed when it respect the default of pgbadger. - Redefine sessions duration histogram bound to be more accurate. Thanks to Guillaume Lelarge for the report. - Add connection time histogram. Thanks to Guillaume Lelarge for the feature request. - Add new option -M | --no-multiline to not collect multiline statement to avoid garbage especially on errors that generate a huge report. - Do not return SUCCESS error code 0 when aborted or something fails. Thanks to Bruno Almeida for the patch.
- 6.2 version
- 2014-10-07 Date
This is a maintenance release to fix a regression in SQL traffic graphs and fix some other minor issues.
The release also add a new option -D or --dns-resolv to map client ip addresses to FQDN without having log_hostname enabled on the postgresql's configuration
- Do not display queries in Slowest individual, Time consuming and Normalized slowest queries reports when there is no duration in log file. Display NO DATASET instead. - Fix min/max queries in SQL traffic that was based on duration instead of query count. - Fix wrong unit to Synced files in Checkpoints files report. Thanks to Levente Birta for the report. - Enable allow_loose_quotes in Text::CSV_XS call to fix CSV parsing error when fields have quote inside an unquoted field. Thanks to Josh Berkus for the report. - Add -D | --dns-resolv command line option to replace ip addresses by their DNS name. Be warned that this can slow down pgBagder a lot. Thanks to Jiri Hlinka for the feature request.
- 6.1 version
- 2014-09-25 Date
This release fix some issues and adds some new features. It adds a new option -B or --bar-graph to use bar instead of line in graphs. It will also keep tick formatting when zooming.
The release also adds a new program: pgbadger_tools to demonstrate how to work with pgBadger binary files to build your own new feature. The first tools 'Explain slowest' allow printing of top slowest queries as EXPLAIN statements. There's also additionnal options to execute automatically the statements with EXPLAIN ANALYZE and get the execution plan. See help of the program for more information or the README file in the tools directory.
Some modifications with behavior changes:
- The -T | --title text value will now be displayed instead of the pgBadger label right after the logo. report. It was previously displayed on mouse over the pgBadger label.
- 6.0 version
- 2014-08-12 Date
This new major release adds some new features like automatic cleanup of binary files in incremental mode or maximum number of weeks for reports retention. It improve the incremental mode with allowing the use of multiprocessing with multiple log file.
It also adds report of query latency percentile on the general activity table (percentiles are 90, 95, 99).
There's also a new output format: JSON. This format is good for sharing data with other languages, which makes it easy to integrate pgBadger's result into other monitoring tools.
You may want to expose your reports but not the data, using the --anonymize option pgBadger will be able to anonymize all literal values in the queries.
Sometime select to copy a query from the report could be a pain. There's now a click-to-select button in front of each query that allow you to just use Ctrl+C to copy it on clipboard
Warning: the behavior of pgBadger in incremental mode has changed. It will now always cleanup the output directory of all the obsolete binary file. If you were using those files to build your own reports, you can prevent pgBadger to remove them by using the --noclean option. Note that if you use the retention feature, all those files in obsolete directories will be removed too.
See ChangeLog for a complete list of changes.
pgBadger may have a binary package corresponding to your distribution. For RPM packages, thanks to the great work of Devrim GÜNDÜZ, you can find the pgBadger package at the PostgreQSL yum repository
If you are the maintainer of other package distribution or you know how to download them, please let me know I will add a link here.
The latest development code can always be found into the pgBadger's GitHub repository
If you planned to parse PostgreSQL CSV log files you might need some Perl Modules:
Text::CSV_XS - to parse PostgreSQL CSV log files.
This module is optional, if you don't have PostgreSQL log in the CSV format you don't need to install it.
Compressed log file format is autodetected from the file exension. If pgBadger find a gz extension it will use the zcat utility, with a bz2 extension it will use bzcat and if the file extension is zip then the unzip utility will be used.
If those utilities are not found in the PATH environment variable then use the --zcat command line option to change this path. For example:
--zcat="/usr/local/bin/gunzip -c" or --zcat="/usr/local/bin/bzip2 -dc" --zcat="C:\tools\unzip -p"
By default pgBadger will use the zcat, bzcat and unzip utilities following the file extension. If you use the default autodetection compress format you can mixed gz, bz2 or zip files. Specifying a custom value to --zcat option will remove this feature of mixed compressed format.
Note that multiprocessing can not be used with compressed files or CSV files as well as under Windows platform.
You must enable and set some configuration directives in your postgresql.conf before starting.
You must first enable SQL query logging to have something to parse:
log_min_duration_statement = 0
Here every statement will be logged, on busy server you may want to increase this value to only log queries with a higher duration time. Note that if you have log_statement set to 'all' nothing will be logged through log_min_duration_statement. See next chapter for more information.
With 'stderr' log format, log_line_prefix must be at least:
log_line_prefix = '%t [%p]: [%l-1] '
Log line prefix could add user and database name as follows:
log_line_prefix = '%t [%p]: [%l-1] user=%u,db=%d '
or for syslog log file format:
log_line_prefix = 'user=%u,db=%d '
Log line prefix for stderr output could also be:
log_line_prefix = '%t [%p]: [%l-1] db=%d,user=%u '
or for syslog output:
log_line_prefix = 'db=%d,user=%u '
You need to enable other parameters in postgresql.conf to get more information from your log files:
log_checkpoints = on log_connections = on log_disconnections = on log_lock_waits = on log_temp_files = 0
Do not enable log_statement as their log format will not be parsed by pgBadger.
Of course your log messages should be in English without locale support:
but this is not only recommended by pgBadger.
log_min_duration_statement, log_duration and log_statement
If you want full statistics reports you must set log_min_duration_statement to 0 or more milliseconds.
If you just want to report duration and number of queries and don't want all details about queries, set log_min_duration_statement to -1 to disable it and enable log_duration in your postgresql.conf file. If you want to add the most common request report you can either choose to set log_min_duration_statement to a higher value or choose to enable log_statement.
Enabling log_min_duration_statement will add reports about slowest queries and queries that took up the most time. Take care that if you have log_statement set to 'all' nothing will be logged with log_line_prefix.
To enable parallel processing you just have to use the -j N option where N is the number of cores you want to use.
pgbadger will then proceed as follow:
for each log file chunk size = int(file size / N) look at start/end offsets of these chunks fork N processes and seek to the start offset of each chunk each process will terminate when the parser reach the end offset of its chunk each process write stats into a binary temporary file wait for all children has terminated All binary temporary files generated will then be read and loaded into memory to build the html output.
With that method, at start/end of chunks pgbadger may truncate or omit a maximum of N queries perl log file which is an insignificant gap if you have millions of queries in your log file. The chance that the query that you were looking for is loose is near 0, this is why I think this gap is livable. Most of the time the query is counted twice but truncated.
When you have lot of small log files and lot of CPUs it is speedier to dedicate one core to one log file at a time. To enable this behavior you have to use option -J N instead. With 200 log files of 10MB each the use of the -J option start being really interesting with 8 Cores. Using this method you will be sure to not loose any queries in the reports.
He are a benchmarck done on a server with 8 CPUs and a single file of 9.5GB.
Option | 1 CPU | 2 CPU | 4 CPU | 8 CPU --------+---------+-------+-------+------ -j | 1h41m18 | 50m25 | 25m39 | 15m58 -J | 1h41m18 | 54m28 | 41m16 | 34m45
With 200 log files of 10MB each and a total og 2GB the results are slightly different:
Option | 1 CPU | 2 CPU | 4 CPU | 8 CPU --------+-------+-------+-------+------ -j | 20m15 | 9m56 | 5m20 | 4m20 -J | 20m15 | 9m49 | 5m00 | 2m40
So it is recommanded to use -j unless you have hundred of small log file and can use at least 8 CPUs.
IMPORTANT: when you are using parallel parsing pgbadger will generate a lot of temporary files in the /tmp directory and will remove them at end, so do not remove those files unless pgbadger is not running. They are all named with the following template tmp_pgbadgerXXXX.bin so they can be easily identified.
Download the tarball from github and unpack the archive as follow:
tar xzf pgbadger-3.x.tar.gz cd pgbadger-3.x/ perl Makefile.PL make && sudo make install
This will copy the Perl script pgbadger to /usr/local/bin/pgbadger by default and the man page into /usr/local/share/man/man1/pgbadger.1. Those are the default installation directories for 'site' install.
If you want to install all under /usr/ location, use INSTALLDIRS='perl' as an argument of Makefile.PL. The script will be installed into /usr/bin/pgbadger and the manpage into /usr/share/man/man1/pgbadger.1.
For example, to install everything just like Debian does, proceed as follows:
perl Makefile.PL INSTALLDIRS=vendor
By default INSTALLDIRS is set to site.
pgbadger [options] logfile [...] PostgreSQL log analyzer with fully detailed reports and charts. Arguments: logfile can be a single log file, a list of files, or a shell command returning a list of files. If you want to pass log content from stdin use - as filename. Note that input from stdin will not work with csvlog. Options: -a | --average minutes : number of minutes to build the average graphs of queries and connections. -b | --begin datetime : start date/time for the data to be parsed in log. -c | --dbclient host : only report on entries for the given client host. -C | --nocomment : remove comments like /* ... */ from queries. -d | --dbname database : only report on entries for the given database. -e | --end datetime : end date/time for the data to be parsed in log. -f | --format logtype : possible values: syslog,stderr,csv. Default: stderr -G | --nograph : disable graphs on HTML output. Enable by default. -h | --help : show this message and exit. -i | --ident name : programname used as syslog ident. Default: postgres -j | --jobs number : number of jobs to run on parallel on each log file. Default is 1, run as single process. -J | --Jobs number : number of log file to parse in parallel. Default is 1, run as single process. -l | --last-parsed file: allow incremental log parsing by registering the last datetime and line parsed. Useful if you want to watch errors since last run or if you want one report per day with a log rotated each week. -m | --maxlength size : maximum length of a query, it will be restricted to the given size. Default: no truncate -n | --nohighlight : disable SQL code highlighting. -N | --appname name : only report on entries for given application name -o | --outfile filename: define the filename for output. Default depends on the output format: out.html, out.txt or out.tsung. To dump output to stdout use - as filename. -p | --prefix string : give here the value of your custom log_line_prefix defined in your postgresql.conf. Only use it if you aren't using one of the standard prefixes specified in the pgBadger documentation, such as if your prefix includes additional variables like client ip or application name. See examples below. -P | --no-prettify : disable SQL queries prettify formatter. -q | --quiet : don't print anything to stdout, even not a progress bar. -s | --sample number : number of query samples to store/display. Default: 3 -S | --select-only : use it if you want to report select queries only. -t | --top number : number of queries to store/display. Default: 20 -T | --title string : change title of the HTML page report. -u | --dbuser username : only report on entries for the given user. -U | --exclude-user username : exclude entries for the specified user from report. -v | --verbose : enable verbose or debug mode. Disabled by default. -V | --version : show pgBadger version and exit. -w | --watch-mode : only report errors just like logwatch could do. -x | --extension : output format. Values: text, html or tsung. Default: html -z | --zcat exec_path : set the full path to the zcat program. Use it if zcat or bzcat or unzip is not on your path. --pie-limit num : pie data lower than num% will show a sum instead. --exclude-query regex : any query matching the given regex will be excluded from the report. For example: "^(VACUUM|COMMIT)" You can use this option multiple times. --exclude-file filename: path of the file which contains all the regex to use to exclude queries from the report. One regex per line. --include-query regex : any query that does not match the given regex will be excluded from the report. For example: "(table_1|table_2)" You can use this option multiple times. --include-file filename: path of the file which contains all the regex of the queries to include from the report. One regex per line. --disable-error : do not generate error report. --disable-hourly : do not generate hourly report. --disable-type : do not generate query type report. --disable-query : do not generate query reports (slowest, most frequent, ...). --disable-session : do not generate session report. --disable-connection : do not generate connection report. --disable-lock : do not generate lock report. --disable-temporary : do not generate temporary report. --disable-checkpoint : do not generate checkpoint report. --disable-autovacuum : do not generate autovacuum report. --charset : used to set the HTML charset to be used. Default: utf-8. Examples: pgbadger /var/log/postgresql.log pgbadger /var/log/postgres.log.2.gz /var/log/postgres.log.1.gz /var/log/postgres.log pgbadger /var/log/postgresql/postgresql-2012-05-* pgbadger --exclude-query="^(COPY|COMMIT)" /var/log/postgresql.log pgbadger -b "2012-06-25 10:56:11" -e "2012-06-25 10:59:11" /var/log/postgresql.log cat /var/log/postgres.log | pgbadger - # log prefix with stderr log output perl pgbadger --prefix '%t [%p]: [%l-1] user=%u,db=%d,client=%h' \ /pglog/postgresql-2012-08-21* perl pgbadger --prefix '%m %u@%d %p %r %a : ' /pglog/postgresql.log # Log line prefix with syslog log output perl pgbadger --prefix 'user=%u,db=%d,client=%h,appname=%a' \ /pglog/postgresql-2012-08-21* Use my 8 CPUs to parse my 10GB file faster, really faster perl pgbadger -j 8 /pglog/postgresql-9.1-main.log Generate Tsung sessions XML file with select queries only: perl pgbadger -S -o sessions.tsung --prefix '%t [%p]: [%l-1] user=%u,db=%d ' /pglog/postgresql-9.1.log Reporting errors every week by cron job: 30 23 * * 1 /usr/bin/pgbadger -q -w /var/log/postgresql.log -o /var/reports/pg_errors.html Generate report every week using incremental behavior: 0 4 * * 1 /usr/bin/pgbadger -q `find /var/log/ -mtime -7 -name "postgresql.log*"` \ -o /var/reports/pg_errors-`date +%F`.html -l /var/reports/pgbadger_incremental_file.dat This supposes that your log file and HTML report are also rotated every week.
v6.x sample reports
Sample report generated by pbBadger v6.x with incremental mode can be found here.
v5.x sample reports
Sample report generated by pbBadger v5.x with incremental mode can be found here.
v4.x sample reports
The sample report generated by pbBadger v4.x can be found here.
Feature request & Bugs
Please report any bugs, patches, discussion, etc. using the GitHub tools at https://github.com/dalibo/pgbadger/.
If you need new features or you think something is missing, please send your comments and requests. This helps a lot to develop a better/useful tool.
You can also write to: <pgbadger |AT| dalibo |DOT| com> or subscribe to the pgBadger mailing list (email@example.com)
I want to make a donation
Some of you ask me a way to donate to support my all days efforts to build a better tool. You can use the link below and if you need an invoice, just drop me line I will send it back to you.
How to contribute
Dalibo as the maintainer of the project can offer you a high level commercial support.
For a complete list of commercial support near of your place take a look at the PostgreSQL Professional Services page, they all do great job and most of them can help you.