SYNOPSIS

goaccess [-f input-file][-c][-r][-d][-m][-q][-o][-h][...]

DESCRIPTION

goaccess is a free (GPL) real-time web log analyzer and interactive viewer that runs in a terminal on *nix systems. It provides fast and valuable HTTP statistics for system administrators who need a visual server report on the fly. GoAccess parses the specified web log file and outputs the data to the terminal. Features include:

General Statistics

Number of valid requests, number of invalid requests, time to analyze the data, unique visitors, unique requested files, unique static files (css, ico, jpg, js, swf, gif, png), unique HTTP referrers (URLs), unique 404s (not found), size of the parsed log file, and bandwidth consumption.

Unique Visitors

HTTP requests with the same IP, the same date and the same user agent are counted as a single unique visitor. This includes crawlers.
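The counting rule above can be illustrated with a small awk script. This is a hypothetical helper, not part of GoAccess and not how GoAccess implements the count; it assumes the Combined Log Format, with the date taken from the first bracketed timestamp and the user agent as the last double-quoted field.

```bash
# Hypothetical helper, not part of GoAccess: count unique visitors in a
# Combined Log Format file, where one visitor = same IP + same date + same agent.
count_unique_visitors() {
  awk -F'"' '
    {
      split($1, pre, " ")            # pre[1] = IP, pre[4] = "[05/Dec/2010:13:55:36"
      ip    = pre[1]
      day   = substr(pre[4], 2, 11)  # "05/Dec/2010"; the time of day is ignored
      agent = $6                     # 6th field when splitting on double quotes
      seen[ip SUBSEP day SUBSEP agent] = 1
    }
    END { n = 0; for (k in seen) n++; print n }
  ' "$@"
}
```

Usage: count_unique_visitors access.log. Agents containing escaped double quotes would shift the fields, so this is only a rough approximation.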

Requested Files

Hit totals are based on total requests. This module will display hits, percent, bandwidth, [time served], [protocol] and [method].

Requested Static Files

Hit totals are based on total requests. This module includes files such as jpg, css, swf, js, gif and png. It will display hits, percent, bandwidth, [time served], [protocol] and [method].

404 or Not Found

Hit totals are based on total requests. This module will display hits, percent, bandwidth, [time served], [protocol] and [method].

Hosts

Hit totals are based on total requests. This module will display hits, percent, [bandwidth, time served]. The expanded module can display extra information such as reverse DNS and country. If -a is enabled, a list of user agents will be displayed by selecting the IP and hitting the return key.

Operating Systems

Hit totals are based on unique visitors. This module will display hits and percent. The expanded module shows all available versions of the parent node.

Browsers

Hit totals are based on unique visitors. This module will display hits and percent. The expanded module shows all available versions of the parent node.

Referrer URLs

The URL where the request came from. Hit totals are based on total requests. This module will display hits and percent.

Referring Sites

This module displays only the host of the URL where the request came from, not the whole URL. Hit totals are based on total requests. This module will display hits and percent.
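To illustrate the difference from Referrer URLs, the hypothetical helper below (not part of GoAccess) reduces a referrer URL to its host with sed. It assumes a scheme://host/... URL and is only a rough sketch, not GoAccess's actual parser.

```bash
# Hypothetical helper, not part of GoAccess: reduce a referrer URL to its host.
# Assumes a "scheme://host[:port]/path" URL; anything else is printed unchanged.
referring_site() {
  printf '%s\n' "$1" | sed -E 's|^[A-Za-z][A-Za-z0-9+.-]*://([^/:?#]+).*$|\1|'
}
```

For example, http://www.example.com/start.html?x=1 becomes www.example.com, and an empty referrer logged as - is left as is.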

Keyphrases

This module will report keyphrases used on Google search, Google cache, and Google translate. Hit totals are based on total requests. This module will display hits and percent.

Geo Location

Determines where an IP address is geographically located. It outputs the continent and country. If it's unable to determine the country, location will be marked as unknown.

HTTP Status Codes

The numeric status codes the server returned for HTTP requests. Hit totals are based on total requests. This module will display hits and percent.
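Since hit totals here are based on total requests, each percentage is hits / total requests x 100. The awk sketch below is a hypothetical helper, not part of GoAccess; it assumes Common or Combined Log Format lines in which the status code is the ninth whitespace-separated field (which holds only when the user field has no spaces and the request line has exactly three tokens).

```bash
# Hypothetical helper, not part of GoAccess: hits and percent per status code.
# Assumes CLF/Combined lines where the status code is field 9.
status_code_report() {
  awk '
    { hits[$9]++; total++ }
    END {
      for (code in hits)
        printf "%s %d %.2f%%\n", code, hits[code], 100 * hits[code] / total
    }
  ' "$@" | sort
}
```

For a log with three 200 responses and one 404, this prints 200 3 75.00% and 404 1 25.00%.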

STORAGE

There are three storage options that can be used with GoAccess. Choosing one will depend on your environment and needs.

GLib Hash Tables

By default GoAccess uses GLib Hash Tables. If your dataset can fit in memory, this will perform fine: it has average memory usage and pretty good performance. For better performance at the expense of memory, see the Tokyo Cabinet On-Memory Hash Database.

Tokyo Cabinet On-Disk B+ Tree

Use this storage method for large datasets that cannot fit entirely in memory. The B+ tree database is slower than any of the hash databases, since it has to hit the disk; however, using an SSD greatly increases performance. You may also use this storage method if you need data persistence to quickly load statistics at a later date.

Tokyo Cabinet On-Memory Hash Database

Although this may vary across different systems, in general the on-memory hash database should perform slightly better than GLib Hash Tables.

CONFIGURATION

Multiple options can be used to configure GoAccess. For a complete, up-to-date list of configure options, run ./configure --help.

--enable-debug

Compile with debugging symbols and turn off compiler optimizations.

--enable-utf8

Compile with wide character support. Ncursesw is required.

--enable-geoip

Compile with GeoLocation support. MaxMind's GeoIP is required.

--enable-tcb=<memhash|btree>

Compile with Tokyo Cabinet storage support. memhash will utilize Tokyo Cabinet's on-memory hash database. btree will utilize Tokyo Cabinet's on-disk B+ Tree database.

--disable-zlib

Disable zlib compression on B+ Tree database.

--disable-bzip

Disable bzip2 compression on B+ Tree database.

OPTIONS

The following options can be supplied on the command line. The long options can also be set through the configuration file.

--date-format=<dateformat>

Specifies the log format date, containing any combination of regular characters and special format specifiers, all of which begin with a percent (%) sign. See `man strftime`. In the configuration file, the equivalent date_format variable is followed by a space and then the format.

Note that there is no need to use time specifiers, since they are not used by GoAccess. It's recommended to use only date specifiers, e.g., %Y-%m-%d.

--log-format=<logformat>

Specifies the log format string. In the configuration file, the equivalent log_format variable is followed by a space and then the format string; use \t for tab-delimited formats.

Note that if there are spaces within the format, the string needs to be enclosed in double quotes. Inner quotes need to be escaped.
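As a sketch of the quoting rule, the snippet below stores a format string for the Combined Log Format in a shell variable. The format string itself is an assumption (a commonly used GoAccess combined format); adjust it to your own logs.

```bash
# Assumed Combined Log Format string; the inner double quotes are escaped so
# that the shell passes them through literally.
log_fmt="%h %^[%d:%^] \"%r\" %s %b \"%R\" \"%u\""
printf '%s\n' "$log_fmt"   # prints: %h %^[%d:%^] "%r" %s %b "%R" "%u"
```

The variable can then be passed as, for example, # goaccess -f access.log --log-format="$log_fmt" --date-format=%d/%b/%Y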

-c --config-dialog

Prompt the log/date configuration window on program start.

--color-scheme=<1|2>

Choose among color schemes. 1 for the default grey scheme. 2 for the green scheme.

--no-color

Turn off colored output. This is the default output on terminals that do not support colors.

-f --log-file=<logfile>

Specify the path to the input log file. If set in the config file, it will take priority over -f from the command line.

--debug-file=<debugfile>

Send all debug messages to the specified file. Requires GoAccess to be configured with --enable-debug.

--config-file=<configfile>

Specify a custom configuration file to use. If set, it will take priority over the global configuration file (if any).

--no-global-config

Do not load the global configuration file. The global configuration file normally resides in /usr/local/etc, unless another directory was specified with --sysconfdir=/dir.

-e --exclude-ip=<IP|IP-range>

Exclude one or more IPv4/6 addresses, including IP ranges, e.g., 192.168.0.1-192.168.0.10.
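To make the range semantics concrete, here is a hypothetical IPv4-only helper (not part of GoAccess) that checks whether an address falls inside a START-END range by comparing the addresses as 32-bit integers. It does not handle IPv6 or octets written with leading zeros.

```bash
# Hypothetical helpers, not part of GoAccess (IPv4 only).
ip_to_int() {                  # "192.168.0.5" -> 3232235525
  local IFS=.
  set -- $1                    # split the dotted quad into $1..$4
  echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

ip_in_range() {                # usage: ip_in_range IP START-END
  local ip start end
  ip=$(ip_to_int "$1")
  start=$(ip_to_int "${2%-*}") # text before the dash
  end=$(ip_to_int "${2#*-}")   # text after the dash
  [ "$ip" -ge "$start" ] && [ "$ip" -le "$end" ]
}
```

With the range from the example, 192.168.0.5 is inside and 192.168.0.11 is not.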

-a --agent-list

Enable a list of user-agents by host. For faster parsing, do not enable this flag.

-M --http-method

Include HTTP request method if found. This will create a request key containing the request method + the actual request.

-H --http-protocol

Include HTTP request protocol if found. This will create a request key containing the request protocol + the actual request.

-q --no-query-string

Ignore the request's query string, e.g., www.google.com/page.htm?query => www.google.com/page.htm
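The transformation in the example amounts to dropping everything from the first ? onwards. A one-line sketch using shell parameter expansion (a hypothetical helper, not part of GoAccess):

```bash
# Hypothetical helper, not part of GoAccess: drop the query string from a request.
strip_query_string() {
  printf '%s\n' "${1%%\?*}"    # remove the first "?" and everything after it
}
```

strip_query_string 'www.google.com/page.htm?query' prints www.google.com/page.htm.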

-r --no-term-resolver

Disable IP resolver on terminal output.

-o --output-format=<json|csv>

Write output to stdout in one of the following formats:

  • csv : Comma-separated values (CSV)

  • json : JSON (JavaScript Object Notation)

--real-os

Display real OS names, e.g., Windows XP, Snow Leopard.

--static-file=<extension>

Add a static file extension, e.g., .mp3. Extensions are case sensitive.
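Case sensitivity means that .mp3 and .MP3 are treated as different extensions. The hypothetical helper below (not part of GoAccess) sketches a case-sensitive suffix check using the static extensions listed under General Statistics plus .mp3; GoAccess's own matching may differ in detail.

```bash
# Hypothetical helper, not part of GoAccess: case-sensitive static-file check.
is_static_file() {
  case "$1" in
    *.css|*.ico|*.jpg|*.js|*.swf|*.gif|*.png|*.mp3) return 0 ;;
    *) return 1 ;;
  esac
}
```

Here /song.mp3 counts as static, while /SONG.MP3 does not.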

--ignore-crawlers

Ignore crawlers.

--no-progress

Disable progress metrics [total requests/requests per second].

-m --with-mouse

Enable mouse support on main dashboard.

-d --with-output-resolver

Enable IP resolver on HTML|JSON output.

-g --std-geoip

Use the standard GeoIP database for lower memory usage.

--geoip-city-data=<geocityfile>

Specify the path to the GeoIP City database file, e.g., GeoLiteCity.dat. The file needs to be downloaded from maxmind.com.

--keep-db-files

Persist parsed data to disk. This should be set for the first dataset before using --load-from-disk. Setting it to false will delete all database files when the program exits.

Only if configured with --enable-tcb=btree

--load-from-disk

Load previously stored data from disk. The database files need to exist. See --keep-db-files.

Only if configured with --enable-tcb=btree

--db-path=<dir>

Path where the on-disk database files are stored. The default value is the /tmp directory.

Only if configured with --enable-tcb=btree

--xmmap=<num>

Set the size in bytes of the extra mapped memory. The default value is 0.

Only if configured with --enable-tcb=btree

--cache-lcnum=<num>

Specifies the maximum number of leaf nodes to be cached. If it is 0 or less, the default value is used. The default value is 1024. A larger value improves speed but increases memory consumption; a lower value decreases memory consumption.

Only if configured with --enable-tcb=btree

--cache-ncnum=<num>

Specifies the maximum number of non-leaf nodes to be cached. If it is 0 or less, the default value is used. The default value is 512.

Only if configured with --enable-tcb=btree

--tune-lmemb=<num>

Specifies the number of members in each leaf page. If it is 0 or less, the default value is used. The default value is 128.

Only if configured with --enable-tcb=btree

--tune-nmemb=<num>

Specifies the number of members in each non-leaf page. If it is 0 or less, the default value is used. The default value is 256.

Only if configured with --enable-tcb=btree

--tune-bnum=<num>

Specifies the number of elements of the bucket array. If it is 0 or less, the default value is used. The default value is 32749. The suggested size of the bucket array is about 1 to 4 times the total number of pages to be stored.
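As a worked example of this rule of thumb, the hypothetical helper below (not part of GoAccess) estimates the page count as the number of records divided by the leaf-page size (--tune-lmemb, default 128), rounded up, and prints the suggested 1x to 4x range. Counting only leaf pages is a simplifying assumption, so treat the result as a rough starting point.

```bash
# Hypothetical helper, not part of GoAccess: rough --tune-bnum range.
# Assumes pages ~= ceil(records / lmemb) and ignores non-leaf pages.
suggest_bnum_range() {         # usage: suggest_bnum_range RECORDS [LMEMB]
  local records=$1 lmemb=${2:-128}
  local pages=$(( (records + lmemb - 1) / lmemb ))
  echo "$pages $(( pages * 4 ))"
}
```

For 1,000,000 records with the default of 128 members per leaf page, this prints 7813 31252, i.e. a bucket array of roughly 7,800 to 31,000 elements.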

Only if configured with --enable-tcb=btree

--compression=<zlib|bz2>

Specifies that each page is compressed with ZLIB|BZ2 encoding.

Only if configured with --enable-tcb=btree

-h --help

Display the help.

-V --version

Display version information and exit.

-s --storage

Display the current storage method, e.g., B+ Tree, Hash.

CUSTOM LOG/DATE FORMAT

GoAccess can parse virtually any web log format.

Predefined options include Common Log Format (CLF), Combined Log Format (XLF/ELF) including virtual host, Amazon CloudFront (Download Distribution) and W3C format (IIS).

GoAccess allows any custom format string as well.

There are two ways to configure the log format. The easiest is to run GoAccess with -c to prompt a configuration window. Otherwise, it can be configured in ~/.goaccessrc.

date_format

The date_format variable, followed by a space, specifies the log format date, containing any combination of regular characters and special format specifiers. They all begin with a percent (%) sign. See http://linux.die.net/man/3/strftime

Note that there is no need to use time specifiers, since they are not used by GoAccess. It's recommended to use only date specifiers, e.g., %Y-%m-%d.

log_format

The log_format variable, followed by a space, specifies the log format string; use \t for tab-delimited formats.

%d

Date field matching the date_format variable.

%h

Host (the client IP address, either IPv4 or IPv6).

%r

The request line from the client. This requires specific delimiters around the request (such as single quotes, double quotes, or any other character) to be parsable. Otherwise, use a combination of the special format specifiers %m, %U and %H.

%m

The request method.

%U

The URL path requested (including any query string).

%H

The request protocol.

%s

The status code that the server sends back to the client.

%b

The size of the object returned to the client.

%R

The "Referer" HTTP request header.

%u

The user-agent HTTP request header.

%D

The time taken to serve the request, in microseconds.

%T

The time taken to serve the request, in seconds or milliseconds. Note: %D will take priority over %T if both are used.

%^

Ignore this field.

GoAccess requires the following fields:

  • %h a valid IPv4/6

  • %d a valid date

  • %s server status code

  • %r the request
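Putting the specifiers together, a ~/.goaccessrc for a Combined Log Format file could contain the two lines written by the sketch below. The format and date strings are assumptions for a typical combined log line such as 127.0.0.1 - - [05/Dec/2010:13:55:36 -0700] "GET / HTTP/1.1" 200 2326 "-" "Mozilla/5.0"; verify them against your own logs. The write_goaccessrc function is hypothetical.

```bash
# Hypothetical helper: write a GoAccess config for the Combined Log Format.
# The target defaults to ~/.goaccessrc; the format strings are assumptions.
write_goaccessrc() {
  cat > "${1:-$HOME/.goaccessrc}" <<'EOF'
date_format %d/%b/%Y
log_format %h %^[%d:%^] "%r" %s %b "%R" "%u"
EOF
}
```

Here %h, %d, %r and %s cover the required fields, and %^ skips the identity/user fields and the time zone.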

INTERACTIVE MENU

F1 or h

Main help.

F5

Redraw main window.

q

Quit the program, current window or collapse active module

o or ENTER

Expand selected module or open window

0-9 and Shift + 0

Set selected module to active

j

Scroll down within expanded module

k

Scroll up within expanded module

c

Set or change the color scheme.

TAB

Forward iteration of modules. Starts from current active module.

SHIFT + TAB

Backward iteration of modules. Starts from current active module.

^ f

Scroll forward one screen within an active module.

^ b

Scroll backward one screen within an active module.

s

Sort options for active module

/

Search across all modules (regex allowed)

n

Find the position of the next occurrence across all modules.

g

Move to the first item or top of screen.

G

Move to the last item or bottom of screen.

EXAMPLES

The simplest and fastest usage would be:

  • # goaccess -f access.log

That will generate an interactive text-only output.

To generate full statistics we can run GoAccess as:

  • # goaccess -f access.log -a

To generate an HTML report:

  • # goaccess -f access.log -a > report.html

To generate a JSON file:

  • # goaccess -f access.log -a -d -o json > report.json

To generate a CSV file:

  • # goaccess -f access.log -o csv > report.csv

The -a flag indicates that we want to process an agent-list for every host parsed.

The -d flag enables the IP resolver on the HTML | JSON output. (Output takes longer, since every IP address has to be resolved.)

The -c flag prompts the date and log format configuration window. This only works when curses is initialized.

Now, if we want to add more flexibility to GoAccess, we can use a series of pipes. For instance:

If we would like to process all access.log.*.gz files, we can do:

  • # zcat access.log.*.gz | goaccess

OR

  • # zcat -f access.log* | goaccess

Another useful pipe filters dates out of the web log.

The following will get all HTTP requests starting on 05/Dec/2010 until the end of the file.

  • # sed -n '/05\/Dec\/2010/,$ p' access.log | goaccess -a

If we want to parse only a certain time-frame from DATE a to DATE b, we can do:

  • # sed -n '/05\/Nov\/2010/,/05\/Dec\/2010/ p' access.log | goaccess -a

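Note that sed ends the range at the first line matching DATE b, so later requests from that date are not included. This could also take longer to parse, depending on the speed of sed.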

To exclude a list of virtual hosts you can do the following:

  • grep -v "`cat exclude_vhost_list_file`" vhost_access.log | goaccess

Also, if we want to run GoAccess at a lower priority, we can run it as:

  • # nice -n 19 goaccess -f access.log -a

And if you don't want to install it on your server, you can still run it from your local machine:

  • # ssh root@server 'cat /var/log/apache2/access.log' | goaccess -a

NOTES

For now, each active window has a total of 300 items. Eventually this will be customizable.

Piping a log to GoAccess will disable the real-time functionality. This is due to portability issues in determining the actual size of STDIN. However, a future release *might* include this feature.

BUGS

If you think you have found a bug, please email the author or use the issue tracker at https://github.com/allinurl/goaccess/issues

AUTHOR

Gerardo Orellana

For more details, or for new releases, please visit http://goaccess.prosoftcorp.com