A fast PostgreSQL log analyzer
Key FeaturesSQL queries related reports
- Overall statistics
- The most frequent waiting queries.
- Queries that waited the most.
- Queries generating the most temporary files.
- Queries generating the largest temporary files.
- The slowest queries.
- Queries that took up the most time.
- The most frequent queries.
- The most frequent errors.
- Locks statistics.
- ueries by type (select/insert/update/delete).
- Distribution of queries type per database/application
- Sessions per database/user/client.
- Connections per database/user/client.
- Autovacuum and autoanalyze per table.
- Histogram of query times.
- SQL queries statistics.
- Temporary file statistics.
- Checkpoints statistics.
- Autovacuum and autoanalyze statistics.
- Histogram of query count and duration per hours
- Histogram of error/event count per hours
pgBadger is a PostgreSQL log analyzer build for speed with fully detailed reports from your PostgreSQL log file. It's a single and small Perl script that outperform any other PostgreSQL log analyzer.
pgBadger is able to autodetect your log file format (syslog, stderr or csvlog). It is designed to parse huge log files as well as gzip compressed file. See a complete list of features below.
All charts are zoomable and can be saved as PNG images.
You can also limit pgBadger to only report errors or remove any part of the report using command line options.
pgBadger supports any custom format set into log_line_prefix of your postgresql.conf file provide that you use the %t, %p and %l patterns.
pgBadger allow parallel processing on a single log file and multiple files through the use of the -j option and the number of CPUs as value.
If you want to save system performance you can also use log_duration instead of log_min_duration_statement to have reports on duration and number of queries only.
pgBadger supports an incremental mode that allow to construct incremental reports after successives runs of pgBadger. It is possible to run pgbadger each days or even more, each hours, and have cumulatives reports per day and per week. A top index page will allow you to go directly to the weekly and daily reports.
QuotesAndrew Dunstan was the first to talk about pgBadger after seeing the talk at PGCON 2012: "The biggest thing that piqued my interest yesterday at pgCon was the lightning talk on pgbadger. I'm certainly going to take a look at this as a substitute for pgfouine". See the whole blog post on Andrew's blog.Christophe Pettus, another PostgreSQL Experts consultant, says June, 30th, 2012: "It’s written in Perl, at least as fast as pgfouine, and can process log files that pgfouine can’t handle. It can read either CSV or standard log format, and can directly read *.gz files. It also produces a wider range of reports that pgfouine, including some very useful locking reports. I threw 25GB of logs with near 80 million lines at it without it complaining". Read it all on his blog post.Fidel Ramos' tweet says it all: "pgbadger, an alternative PostgreSQL log analyzer to pgfouine which promises to be both faster and more extensive".Vincent Picavet from Oslandia, says "When we setup a Geographical information system database with PostGIS, we install tools for monitoring its behaviour. This enable us to fine-tune the server and queries. We use PgBadger on various production servers, and it's a real quick and efficient solution to get useful reports. The really dynamic development of this solution has been a big plus in our choice."Michael Paquier from Postgres-XC project said in his article: "It is one of those utilities that you can use not only for production database systems, but for extra things like benchmark or performance analysis. Its installation is easy, will not heavy your system with packages you might not want, so go ahead and use it ... this utility is 100% compatible with Postgres-XC, the only thing you need to do is just to launch pgbadger for each node of your cluster."
- pgBadger is an original work from Gilles Darold.
- The pgBadger logo is an original creation of Damien Cazeils.
- The pgBadger v4.x design comes from the "Art is code" company.
- This web site is a work of Gilles Darold.
- pgBadger is maintained by Gilles Darold and the good folks at Dalibo and e²very one who wants to contribute.
- Many people have contributed to pgBadger, they are all quoted in the Changelog file.
- <pgbadger AT dalibo DOT com>
- <contact AT dalibo DOT com>
- <gilles AT darold DOT net>
- "Art is code"
Latest news about pgBadger
- 6.2 version
- 2014-10-07 Date
This is a maintenance release to fix a regression in SQL traffic graphs and fix some other minor issues.
The release also add a new option -D or --dns-resolv to map client ip addresses to FQDN without having log_hostname enabled on the postgresql's configuration
- Do not display queries in Slowest individual, Time consuming and Normalized slowest queries reports when there is no duration in log file. Display NO DATASET instead. - Fix min/max queries in SQL traffic that was based on duration instead of query count. - Fix wrong unit to Synced files in Checkpoints files report. Thanks to Levente Birta for the report. - Enable allow_loose_quotes in Text::CSV_XS call to fix CSV parsing error when fields have quote inside an unquoted field. Thanks to Josh Berkus for the report. - Add -D | --dns-resolv command line option to replace ip addresses by their DNS name. Be warned that this can slow down pgBagder a lot. Thanks to Jiri Hlinka for the feature request.
- 6.1 version
- 2014-09-25 Date
This release fix some issues and adds some new features. It adds a new option -B or --bar-graph to use bar instead of line in graphs. It will also keep tick formatting when zooming.
The release also addis a new program: pgbadger_tools to demonstrate how to work with pgBadger binary files to build your own new feature. The first tools 'Explain slowest' allow printing of top slowest queries as EXPLAIN statements. There's also additionnal options to execute automatically the statements with EXPLAIN ANALYZE and get the execution plan. See help of the program for more information or the README file in the tools directory.
Some modifications with behavior changes:
- The -T | --title text value will now be displayed instead of the pgBadger label right after the logo. report. It was previously displayed on mouse over the pgBadger label.
- 6.0 version
- 2014-08-12 Date
This new major release adds some new features like automatic cleanup of binary files in incremental mode or maximum number of weeks for reports retention. It improve the incremental mode with allowing the use of multiprocessing with multiple log file.
It also adds report of query latency percentile on the general activity table (percentiles are 90, 95, 99).
There's also a new output format: JSON. This format is good for sharing data with other languages, which makes it easy to integrate pgBadger's result into other monitoring tools.
You may want to expose your reports but not the data, using the --anonymize option pgBadger will be able to anonymize all literal values in the queries.
Sometime select to copy a query from the report could be a pain. There's now a click-to-select button in front of each query that allow you to just use Ctrl+C to copy it on clipboard
Warning: the behavior of pgBadger in incremental mode has changed. It will now always cleanup the output directory of all the obsolete binary file. If you were using those files to build your own reports, you can prevent pgBadger to remove them by using the --noclean option. Note that if you use the retention feature, all those files in obsolete directories will be removed too.
See ChangeLog for a complete list of changes.
- 5.1 version
- 2014-05-04 Date
This new release fixes several issues and adds several new features like:
- * Support to named PREPARE and EXECUTE queries. They are replaced by the real prepare statement and reported into top queries.
- * Add new --exclude-line command line option for excluding immediately log entries matching any regex.
- * Included remote and client information into the most frequent events.
- * pgBadger is now able to parse remote logfiles using a password less ssh connection and generate locally the reports.
- * Histogram granularity can be adjusted using the -A command line option.
- * Add new detail information on top queries to show when the query is a bind query.
- * Support to logfile compressed using the xz compression format.
- * Change week/day menu in incremental index, it is now represented as usual with a calendar view per month.
- * Fix various compatibility issue with Windows and Perl 5.8
See ChangeLog for a complete list of changes.
- 5.0 version
- 2014-02-05 Date
This new major release adds some new features like incremental mode and SQL queries times histogram. There is also a hourly graphic representation of the count and average duration of top normalized queries. Same for errors or events, you will be able to see graphically at which hours they are occurring the most often.
The incremental mode is an old request issued at PgCon Ottawa 2012 that concern the ability to construct incremental reports with successive runs of pgBadger. It is now possible to run pgbadger each days or even more, each hours, and have cumulative reports per day and per week. A top index page allow you to go directly to the weekly and daily reports.
This mode have been build with simplicity in mind so running pgbadger by cron as follow:
0 23 * * * pgbadger -q -I -O /var/www/pgbadger/ /var/log/postgresql.logis enough to have daily and weekly reports viewable using your browser.
You can take a look at a sample report at http://dalibo.github.io/pgbadger/demov5/index.html
There's also a useful improvement to allow pgBadger to seek directly to the last position in the same log file after a successive execution. This feature is only available using the incremental mode or the -l option and parsing a single log file. Let's say you have a weekly rotated log file and want to run pgBadger each days. With 2GB of log per day, pgbadger was spending 5 minutes per block of 2 GB to reach the last position in the log, so at the end of the week this feature will save you 35 minutes. Now pgBadger will start parsing new log entries immediately. This feature is compatible with the multiprocess mode using -j option (n processes for one log file).
Histogram of query times is a new report in top queries slide that shows the query times distribution during the analyzed period. For example:
Range Count Percentage -------------------------------------------- 0-1ms 10,367,313 53.52% 1-5ms 799,883 4.13% 5-10ms 451,646 2.33% 10-25ms 2,965,883 15.31% 25-50ms 4,510,258 23.28% 50-100ms 180,975 0.93% 100-500ms 87,613 0.45% 500-1000ms 5,856 0.03% 1000-10000ms 2,697 0.01% > 10000ms 74 0.00%
There is also some graphic and report improvements, like the mouse tracker formatting that have been reviewed. It now shows a vertical crosshair and all dataset values at a time when mouse pointer moves over series. Automatic queries formatting has also been changed, it is now done on double click event as simple click was painful when you want to copy some part of the queries.
The report "Simultaneous Connections" has been relabeled into "Established Connections", it is less confusing as many people think that this is the number of simultaneous sessions, which is not the case. It only count the number of connections established at same time.
Autovacuum reports now associate database name to the autovacuum and autoanalyze entries. Statistics now refer to "dbname.schema.table", previous versions was only showing the pair "schema.table".
This release also adds Session peak information and a report about Simultaneous sessions. Parameters log_connections and log_disconnections must be enabled in postgresql.conf.
See ChangeLog for a complete list of changes.
pgBadger may have a binary package corresponding to your distribution. For RPM packages, thanks to the great work of Devrim GÜNDÜZ, you can find the pgBadger package at the PostgreQSL yum repository
If you are the maintainer of other package distribution or you know how to download them, please let me know I will add a link here.
The latest development code can always be found into the pgBadger's GitHub repository
If you planned to parse PostgreSQL CSV log files you might need some Perl Modules:
Text::CSV_XS - to parse PostgreSQL CSV log files.
This module is optional, if you don't have PostgreSQL log in the CSV format you don't need to install it.
Compressed log file format is autodetected from the file exension. If pgBadger find a gz extension it will use the zcat utility, with a bz2 extension it will use bzcat and if the file extension is zip then the unzip utility will be used.
If those utilities are not found in the PATH environment variable then use the --zcat command line option to change this path. For example:
--zcat="/usr/local/bin/gunzip -c" or --zcat="/usr/local/bin/bzip2 -dc" --zcat="C:\tools\unzip -p"
By default pgBadger will use the zcat, bzcat and unzip utilities following the file extension. If you use the default autodetection compress format you can mixed gz, bz2 or zip files. Specifying a custom value to --zcat option will remove this feature of mixed compressed format.
Note that multiprocessing can not be used with compressed files or CSV files as well as under Windows platform.
You must enable and set some configuration directives in your postgresql.conf before starting.
You must first enable SQL query logging to have something to parse:
log_min_duration_statement = 0
Here every statement will be logged, on busy server you may want to increase this value to only log queries with a higher duration time. Note that if you have log_statement set to 'all' nothing will be logged through log_min_duration_statement. See next chapter for more information.
With 'stderr' log format, log_line_prefix must be at least:
log_line_prefix = '%t [%p]: [%l-1] '
Log line prefix could add user and database name as follows:
log_line_prefix = '%t [%p]: [%l-1] user=%u,db=%d '
or for syslog log file format:
log_line_prefix = 'user=%u,db=%d '
Log line prefix for stderr output could also be:
log_line_prefix = '%t [%p]: [%l-1] db=%d,user=%u '
or for syslog output:
log_line_prefix = 'db=%d,user=%u '
You need to enable other parameters in postgresql.conf to get more information from your log files:
log_checkpoints = on log_connections = on log_disconnections = on log_lock_waits = on log_temp_files = 0
Do not enable log_statement as their log format will not be parsed by pgBadger.
Of course your log messages should be in English without locale support:
but this is not only recommended by pgBadger.
log_min_duration_statement, log_duration and log_statement
If you want full statistics reports you must set log_min_duration_statement to 0 or more milliseconds.
If you just want to report duration and number of queries and don't want all details about queries, set log_min_duration_statement to -1 to disable it and enable log_duration in your postgresql.conf file. If you want to add the most common request report you can either choose to set log_min_duration_statement to a higher value or choose to enable log_statement.
Enabling log_min_duration_statement will add reports about slowest queries and queries that took up the most time. Take care that if you have log_statement set to 'all' nothing will be logged with log_line_prefix.
To enable parallel processing you just have to use the -j N option where N is the number of cores you want to use.
pgbadger will then proceed as follow:
for each log file chunk size = int(file size / N) look at start/end offsets of these chunks fork N processes and seek to the start offset of each chunk each process will terminate when the parser reach the end offset of its chunk each process write stats into a binary temporary file wait for all children has terminated All binary temporary files generated will then be read and loaded into memory to build the html output.
With that method, at start/end of chunks pgbadger may truncate or omit a maximum of N queries perl log file which is an insignificant gap if you have millions of queries in your log file. The chance that the query that you were looking for is loose is near 0, this is why I think this gap is livable. Most of the time the query is counted twice but truncated.
When you have lot of small log files and lot of CPUs it is speedier to dedicate one core to one log file at a time. To enable this behavior you have to use option -J N instead. With 200 log files of 10MB each the use of the -J option start being really interesting with 8 Cores. Using this method you will be sure to not loose any queries in the reports.
He are a benchmarck done on a server with 8 CPUs and a single file of 9.5GB.
Option | 1 CPU | 2 CPU | 4 CPU | 8 CPU --------+---------+-------+-------+------ -j | 1h41m18 | 50m25 | 25m39 | 15m58 -J | 1h41m18 | 54m28 | 41m16 | 34m45
With 200 log files of 10MB each and a total og 2GB the results are slightly different:
Option | 1 CPU | 2 CPU | 4 CPU | 8 CPU --------+-------+-------+-------+------ -j | 20m15 | 9m56 | 5m20 | 4m20 -J | 20m15 | 9m49 | 5m00 | 2m40
So it is recommanded to use -j unless you have hundred of small log file and can use at least 8 CPUs.
IMPORTANT: when you are using parallel parsing pgbadger will generate a lot of temporary files in the /tmp directory and will remove them at end, so do not remove those files unless pgbadger is not running. They are all named with the following template tmp_pgbadgerXXXX.bin so they can be easily identified.
Download the tarball from github and unpack the archive as follow:
tar xzf pgbadger-3.x.tar.gz cd pgbadger-3.x/ perl Makefile.PL make && sudo make install
This will copy the Perl script pgbadger to /usr/local/bin/pgbadger by default and the man page into /usr/local/share/man/man1/pgbadger.1. Those are the default installation directories for 'site' install.
If you want to install all under /usr/ location, use INSTALLDIRS='perl' as an argument of Makefile.PL. The script will be installed into /usr/bin/pgbadger and the manpage into /usr/share/man/man1/pgbadger.1.
For example, to install everything just like Debian does, proceed as follows:
perl Makefile.PL INSTALLDIRS=vendor
By default INSTALLDIRS is set to site.
pgbadger [options] logfile [...] PostgreSQL log analyzer with fully detailed reports and charts. Arguments: logfile can be a single log file, a list of files, or a shell command returning a list of files. If you want to pass log content from stdin use - as filename. Note that input from stdin will not work with csvlog. Options: -a | --average minutes : number of minutes to build the average graphs of queries and connections. -b | --begin datetime : start date/time for the data to be parsed in log. -c | --dbclient host : only report on entries for the given client host. -C | --nocomment : remove comments like /* ... */ from queries. -d | --dbname database : only report on entries for the given database. -e | --end datetime : end date/time for the data to be parsed in log. -f | --format logtype : possible values: syslog,stderr,csv. Default: stderr -G | --nograph : disable graphs on HTML output. Enable by default. -h | --help : show this message and exit. -i | --ident name : programname used as syslog ident. Default: postgres -j | --jobs number : number of jobs to run on parallel on each log file. Default is 1, run as single process. -J | --Jobs number : number of log file to parse in parallel. Default is 1, run as single process. -l | --last-parsed file: allow incremental log parsing by registering the last datetime and line parsed. Useful if you want to watch errors since last run or if you want one report per day with a log rotated each week. -m | --maxlength size : maximum length of a query, it will be restricted to the given size. Default: no truncate -n | --nohighlight : disable SQL code highlighting. -N | --appname name : only report on entries for given application name -o | --outfile filename: define the filename for output. Default depends on the output format: out.html, out.txt or out.tsung. To dump output to stdout use - as filename. -p | --prefix string : give here the value of your custom log_line_prefix defined in your postgresql.conf. Only use it if you aren't using one of the standard prefixes specified in the pgBadger documentation, such as if your prefix includes additional variables like client ip or application name. See examples below. -P | --no-prettify : disable SQL queries prettify formatter. -q | --quiet : don't print anything to stdout, even not a progress bar. -s | --sample number : number of query samples to store/display. Default: 3 -S | --select-only : use it if you want to report select queries only. -t | --top number : number of queries to store/display. Default: 20 -T | --title string : change title of the HTML page report. -u | --dbuser username : only report on entries for the given user. -U | --exclude-user username : exclude entries for the specified user from report. -v | --verbose : enable verbose or debug mode. Disabled by default. -V | --version : show pgBadger version and exit. -w | --watch-mode : only report errors just like logwatch could do. -x | --extension : output format. Values: text, html or tsung. Default: html -z | --zcat exec_path : set the full path to the zcat program. Use it if zcat or bzcat or unzip is not on your path. --pie-limit num : pie data lower than num% will show a sum instead. --exclude-query regex : any query matching the given regex will be excluded from the report. For example: "^(VACUUM|COMMIT)" You can use this option multiple times. --exclude-file filename: path of the file which contains all the regex to use to exclude queries from the report. One regex per line. --include-query regex : any query that does not match the given regex will be excluded from the report. For example: "(table_1|table_2)" You can use this option multiple times. --include-file filename: path of the file which contains all the regex of the queries to include from the report. One regex per line. --disable-error : do not generate error report. --disable-hourly : do not generate hourly report. --disable-type : do not generate query type report. --disable-query : do not generate query reports (slowest, most frequent, ...). --disable-session : do not generate session report. --disable-connection : do not generate connection report. --disable-lock : do not generate lock report. --disable-temporary : do not generate temporary report. --disable-checkpoint : do not generate checkpoint report. --disable-autovacuum : do not generate autovacuum report. --charset : used to set the HTML charset to be used. Default: utf-8. Examples: pgbadger /var/log/postgresql.log pgbadger /var/log/postgres.log.2.gz /var/log/postgres.log.1.gz /var/log/postgres.log pgbadger /var/log/postgresql/postgresql-2012-05-* pgbadger --exclude-query="^(COPY|COMMIT)" /var/log/postgresql.log pgbadger -b "2012-06-25 10:56:11" -e "2012-06-25 10:59:11" /var/log/postgresql.log cat /var/log/postgres.log | pgbadger - # log prefix with stderr log output perl pgbadger --prefix '%t [%p]: [%l-1] user=%u,db=%d,client=%h' \ /pglog/postgresql-2012-08-21* perl pgbadger --prefix '%m %u@%d %p %r %a : ' /pglog/postgresql.log # Log line prefix with syslog log output perl pgbadger --prefix 'user=%u,db=%d,client=%h,appname=%a' \ /pglog/postgresql-2012-08-21* Use my 8 CPUs to parse my 10GB file faster, really faster perl pgbadger -j 8 /pglog/postgresql-9.1-main.log Generate Tsung sessions XML file with select queries only: perl pgbadger -S -o sessions.tsung --prefix '%t [%p]: [%l-1] user=%u,db=%d ' /pglog/postgresql-9.1.log Reporting errors every week by cron job: 30 23 * * 1 /usr/bin/pgbadger -q -w /var/log/postgresql.log -o /var/reports/pg_errors.html Generate report every week using incremental behavior: 0 4 * * 1 /usr/bin/pgbadger -q `find /var/log/ -mtime -7 -name "postgresql.log*"` \ -o /var/reports/pg_errors-`date +%F`.html -l /var/reports/pgbadger_incremental_file.dat This supposes that your log file and HTML report are also rotated every week.
v6.x sample reports
Sample report generated by pbBadger v6.x with incremental mode can be found here.
v5.x sample reports
Sample report generated by pbBadger v5.x with incremental mode can be found here.
v4.x sample reports
The sample report generated by pbBadger v4.x can be found here.
Feature request & Bugs
Please report any bugs, patches, discussion, etc. using the GitHub tools at https://github.com/dalibo/pgbadger/.
If you need new features or you think something is missing, please send your comments and requests. This helps a lot to develop a better/useful tool.
You can also write to: <pgbadger |AT| dalibo |DOT| com> or subscribe to the pgBadger mailing list (firstname.lastname@example.org)
I want to make a donation
Some of you ask me a way to donate to support my all days efforts to build a better tool. You can use the link below and if you need an invoice, just drop me line I will send it back to you.
How to contribute
Dalibo as the maintainer of the project can offer you a high level commercial support.
For a complete list of commercial support near of your place take a look at the PostgreSQL Professional Services page, they all do great job and most of them can help you.