How to scale Piwik in Amazon AWS

Piwik scales just fine in Amazon AWS.
We just moved from a dedicated hosting environment to Amazon AWS (EC2 + RDS) and performance is good (~1M page views per day). We also upgraded from 1.0 to 1.7.1, and both versions scale. Here is our feedback.

There are different ways to install Piwik in the cloud:

  • TMDHosting or Arvixe: scales poorly, expensive, and gives little control.
  • BitNami: open source, but we wanted to switch from Apache to nginx.
  • Do it yourself: harder (you need to get into Amazon AWS), but it is cheaper and it scales just fine.

[ Database ]
We installed an RDS instance with the following configuration:

  • MySQL 5.5.12 : Latest version available on RDS at the time.
  • InnoDB : Easy to tune. No locks during backups (using mysqldump with --single-transaction --quick).
  • db.m1.large : It is more than we need.

RDS offers several options; we disabled all of them:

  • No Multi-AZ : We chose performance over availability and durability.
  • No update : Updates mean downtime. We will apply them ourselves.
  • No backup : Without Multi-AZ, automated backups briefly suspend I/O on the instance, so we back up the database ourselves.
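
Because we back up the database ourselves, a manual dump could look like the following minimal sketch. The host, user, database name, and backup directory are placeholders, not values from our setup, and the `RUN_BACKUP` guard is only an illustrative convention so the file can be sourced without side effects.

```shell
#!/bin/sh
# Manual backup sketch for an InnoDB database hosted on RDS.
# DB_HOST, DB_USER, DB_NAME and BACKUP_DIR are hypothetical placeholders.

# Dump one database without locking InnoDB tables.
# --single-transaction reads from a consistent snapshot;
# --quick streams rows instead of buffering whole tables in memory.
# -p makes mysqldump prompt for the password.
run_dump() {
    mysqldump --single-transaction --quick -h "$1" -u "$2" -p "$3"
}

# Build a dated file name, e.g. /var/backups/piwik-2012-03-01.sql.gz
backup_file() {
    echo "$1/$2-$(date +%Y-%m-%d).sql.gz"
}

# Only run the dump when explicitly requested (RUN_BACKUP=1).
if [ "${RUN_BACKUP:-0}" = "1" ]; then
    run_dump "$DB_HOST" "$DB_USER" "$DB_NAME" | gzip > "$(backup_file "$BACKUP_DIR" "$DB_NAME")"
fi
```

Running it from the EC2 instance in the same zone as the database keeps the transfer on Amazon's internal network.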

[ Web server ]
We installed an EC2 instance with the following configuration:

  • Ubuntu Server 11.10 64bit : Nothing fancy
  • m2.xlarge: It is more than we need.
  • Zone: same zone as the database

[ Installation ]
Installing nginx, php and mysql-client:


apt-get install nginx
apt-get install php5-fpm php-apc php5-mysql php5-gd php5-curl php5-cli php5-dbg
apt-get install mysql-client
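
To confirm the packages landed, a small check like the sketch below can list any command that is still missing. The binary names in the example (`nginx`, `php5-fpm`, `mysql`) are what we would expect these packages to provide; treat them as assumptions and adjust for your release.

```shell
#!/bin/sh
# Print every command from the argument list that is not found on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Example (assumed binary names, run as root so /usr/sbin is on PATH):
#   missing_tools nginx php5-fpm mysql
```

An empty output means everything was found.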

[ Nginx configuration ]

Creating the site config:


cd /etc/nginx/sites-available/
cp default my-default
cd /etc/nginx/sites-enabled/
rm default
ln -s ../sites-available/my-default

Open nginx config:


nano -w /etc/nginx/nginx.conf

Update the following in the http block:


fastcgi_read_timeout 14400; # Allow 4h to archive
fastcgi_buffers 256 4k;
fastcgi_buffer_size 32k;
keepalive_timeout 10 10;

Open the site conf:


nano -w /etc/nginx/sites-available/my-default

Update the following:


server {
        listen   80; ## listen for ipv4; this line is default and implied
        listen   [::]:80 default ipv6only=on; ## listen for ipv6

        root /var/www/piwik;
        index index.html index.htm index.php;

        server_name _; # Replace the '_' by your server http name.

        location / {
                # Serve the request as a static file if it exists,
                # otherwise fall back to index.php
                try_files $uri /index.php?$query_string;
        }

        # Pass the PHP scripts to the PHP-FPM server listening on the unix socket /tmp/php5-fpm.sock
        location = /index.php {
                fastcgi_pass unix:/tmp/php5-fpm.sock;
                fastcgi_index index.php;
                include fastcgi_params;
        }
        location = /piwik.php {
                fastcgi_pass unix:/tmp/php5-fpm.sock;
                fastcgi_index index.php;
                include fastcgi_params;
        }

        # Any other attempt to access PHP files redirects to the root.
        location ~* ^.+\.php$ {
           return 302 /;
        }

        # No crawling of this site for bots that obey robots.txt.
        location = /robots.txt {
           return 200 "User-agent: *\nDisallow: /\n";
        }
}
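
Once nginx is running with this server block, the routing rules can be spot-checked from the command line. The sketch below is one way to do it with curl; the host name in the example is hypothetical, and the expected codes follow from the `return 200` and `return 302` rules in the config above.

```shell
#!/bin/sh
# Print the HTTP status code returned for a URL, without following redirects.
http_status() {
    curl -s -o /dev/null -w '%{http_code}' "$1"
}

# Compare the status of BASE+PATH with the expected code and report the result.
check_route() {
    base=$1
    path=$2
    expected=$3
    actual=$(http_status "$base$path")
    if [ "$actual" = "$expected" ]; then
        echo "ok   $path -> $actual"
    else
        echo "FAIL $path -> $actual (expected $expected)"
        return 1
    fi
}

# Example, with a hypothetical host name:
#   check_route http://piwik.example.com /robots.txt 200
#   check_route http://piwik.example.com /other.php 302
```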

Another config that may better suit your needs:
https://github.com/perusio/piwik-nginx

[ PHP configuration ]

Open:


nano -w /etc/php5/fpm/php.ini

Update the following:


memory_limit       = 2048M
max_execution_time = 14400 ; Allow 4h to archive

Open:


nano -w /etc/php5/fpm/pool.d/www.conf

Update the following:


listen = /tmp/php5-fpm.sock
pm.max_children = 50
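
Note that `memory_limit` is a per-request ceiling while `pm.max_children` caps the number of concurrent workers, so their product is the theoretical worst case. A quick sketch of that arithmetic (the function name is ours, for illustration only):

```shell
#!/bin/sh
# Worst-case memory if every PHP-FPM worker reached memory_limit at once.
# Arguments: number of workers (pm.max_children), memory_limit in MB.
# Prints the result in whole GB.
worst_case_gb() {
    echo $(( $1 * $2 / 1024 ))
}

worst_case_gb 50 2048   # prints 100
```

That is well above the roughly 17 GB of an m2.xlarge, so these settings rely on most requests staying far below the limit.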

[ Piwik installation ]

Install Piwik in /var/www/piwik/.

[ Let’s start ]


/etc/init.d/php5-fpm restart
/etc/init.d/nginx restart
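
If you edit the nginx files again later, it can help to validate them before restarting, since a syntax error would otherwise take the site down. A minimal sketch using nginx's built-in `-t` check (the function name is ours, for illustration):

```shell
#!/bin/sh
# Restart nginx only if its configuration passes the built-in syntax check.
safe_restart_nginx() {
    if nginx -t; then
        /etc/init.d/nginx restart
    else
        echo "nginx configuration is invalid, not restarting" >&2
        return 1
    fi
}
```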

[ Troubleshooting ]

Feel free to ask any questions, I’ll update the post if needed.

Thank you for this great how-to. How many pages per day do you handle?

THANKS

I’d love to learn a bit more about this such as how much it costs!

I’d like to take my MySQL database and dump it to S3 to do separate analysis of its data periodically. Am I right in thinking that you kept the data in MySQL?

Hi

We have almost the same setup as yours, except that we run two EC2 machines behind an ELB, each with nginx, php-fpm, Piwik and Redis. We use Redis so that we do not write directly to the database, and we use RDS for the database. Our concern is data loss: if there is downtime due to a failed upgrade or something similar, how can we recover the lost data? I would appreciate your help if you know of a solution.
I would also like to know what availability such a setup provides, and how we can make it higher.

Thanks