In my last post (http://data.andyburgin.co.uk/post/65706647269/visualising-logstash-apache-data-in-gephi) I generated some pretty visualisations of Apache webserver logs by extracting data from my Logstash Elasticsearch server using a Python script.

You can now grab the script from https://github.com/andyburgin/es2gefx

The tool queries Elasticsearch and retrieves the entries matching specified criteria (as nodes). It then identifies edges between the event nodes using defined parameters, and finally uses the pygexf library to generate a GEXF file for importing into Gephi.
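For reference, a GEXF file is just XML listing a graph's nodes and edges. The sketch below writes a minimal one with Python's standard xml.etree.ElementTree instead of pygexf, purely to show the structure Gephi imports. It is not how es2gefx builds its output, and the function names and the (id, label) / (source, target) tuple inputs are hypothetical.

```python
import xml.etree.ElementTree as ET

# GEXF 1.2 namespace; registered as the default so the output has xmlns="..."
GEXF_NS = "http://www.gexf.net/1.2draft"
ET.register_namespace("", GEXF_NS)


def build_gexf(nodes, edges):
    """Build a minimal static, directed GEXF tree (hypothetical helper).

    nodes: list of (node_id, label) tuples
    edges: list of (source_id, target_id) tuples
    """
    root = ET.Element("{%s}gexf" % GEXF_NS, version="1.2")
    graph = ET.SubElement(root, "{%s}graph" % GEXF_NS,
                          mode="static", defaultedgetype="directed")
    nodes_el = ET.SubElement(graph, "{%s}nodes" % GEXF_NS)
    for node_id, label in nodes:
        ET.SubElement(nodes_el, "{%s}node" % GEXF_NS,
                      id=str(node_id), label=label)
    edges_el = ET.SubElement(graph, "{%s}edges" % GEXF_NS)
    # Each edge needs its own id; a running counter is enough here
    for i, (source, target) in enumerate(edges):
        ET.SubElement(edges_el, "{%s}edge" % GEXF_NS,
                      id=str(i), source=str(source), target=str(target))
    return root


def write_gexf(nodes, edges, path):
    """Write the tree to a .gexf file that Gephi can open."""
    tree = ET.ElementTree(build_gexf(nodes, edges))
    tree.write(path, encoding="utf-8", xml_declaration=True)
```

Calling write_gexf([("a", "GET /"), ("b", "GET /about")], [("a", "b")], "out.gexf") would produce a two-node, one-edge graph.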

I still need to add command line arguments, but the parameters in the code are:

  • host & port - the IP address and port number of the Elasticsearch server
  • starttime & endtime - timestamps in the format %Y%m%d%H%M%S; all Elasticsearch entries between these times are included
  • evttype - the value of the @type field identifying Elasticsearch entries generated by Logstash that match the %{COMBINEDAPACHELOG} grok pattern
  • relatedfield & relatetimeout - the field and time period used for edge detection
  • verbose - enables debug messages
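To make these parameters concrete, here is a hypothetical sketch of how starttime, endtime and evttype could become an Elasticsearch query body, and how relatedfield and relatetimeout could drive edge detection. The @type and @timestamp field names, the bool/term/range query shape and the edge rule (link each event to the previous event with the same relatedfield value if it falls within relatetimeout seconds) are all assumptions, not necessarily what es2gefx does.

```python
from datetime import datetime, timedelta

TIME_FORMAT = "%Y%m%d%H%M%S"  # format of starttime/endtime, as in the list above


def build_query(starttime, endtime, evttype):
    """Hypothetical query body: events of one @type inside a time window.

    Assumes old-style Logstash documents with @type and @timestamp fields.
    """
    start = datetime.strptime(starttime, TIME_FORMAT)
    end = datetime.strptime(endtime, TIME_FORMAT)
    return {
        "query": {
            "bool": {
                "must": [
                    {"term": {"@type": evttype}},
                    {"range": {"@timestamp": {"gte": start.isoformat(),
                                              "lte": end.isoformat()}}},
                ]
            }
        }
    }


def find_edges(events, relatedfield, relatetimeout):
    """Assumed edge rule: link each event to the most recent earlier event
    with the same relatedfield value, if the gap is <= relatetimeout seconds.

    events: list of dicts with "id", "timestamp" (a datetime) and relatedfield.
    Returns a list of (source_id, target_id) tuples.
    """
    window = timedelta(seconds=relatetimeout)
    last_seen = {}  # related value -> most recent event carrying that value
    edges = []
    for event in sorted(events, key=lambda e: e["timestamp"]):
        key = event.get(relatedfield)
        if key is None:
            continue  # events without the field cannot be related
        previous = last_seen.get(key)
        if previous is not None and event["timestamp"] - previous["timestamp"] <= window:
            edges.append((previous["id"], event["id"]))
        last_seen[key] = event
    return edges
```

For example, with relatedfield set to clientip (a field the COMBINEDAPACHELOG pattern extracts), successive requests from the same IP address within the timeout are chained into a path, while a request after a longer gap starts a new one.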

It was built using Python 2.7.3 on a Debian Wheezy VM. You will need to download the latest pygexf from https://github.com/paulgirard/pygexf; don't use the easy_install method, as this installs version 0.2.2, which is missing some of the newer features needed. You just need to put the files in a gexf folder next to the es2gefx.py file.

The tool has limitations: there is a limit to what the XML libraries underlying pygexf can handle. So far I've generated 25,000 nodes and 18,000 edges without problems.

BTW, it's my first "real" stab at Python development, so I encourage everyone to take a look and give feedback (nicely).