Multiple MySQL instances

I plan to install Piwik on a large and special architecture, that is composed of several servers (having their own Apache and MySql database) for load-balancing and disponibility purposes.

But i’m not sure Piwik will allow us to collect request separately as we usualy do, then collect and gather datas regularly.

I’m seen this ticket (but closed) about implementing sharding in Piwik code
http://dev.piwik.org/trac/ticket/386[/url], and this development try: [url=http://dev.piwik.org/trac/ticket/471]http://dev.piwik.org/trac/ticket/471, but its doesn’t seem to be good solutions.

I’m looking at the bulk load feature currently in development: http://dev.piwik.org/trac/ticket/134

For the present, it’s maybe a good way to proceed: allways storing “piwik logs” from several servers, then processing all of them in one database.

Can you share me your opinion on all this. Maybe i’m missing some nice feature ? If no, do you think that the best solution is to wait for the previous development ?

Ticket #134 is what you need. We will log in files on each log server, then parse them and store counters in memory (memcache), then prepare files for mysql which we will load DATA INFILE

This will rock but I still have to implement it. If you can support it financially or with development and of course testing, this would be great

Hi matt

First, thank you for your answer and responsiveness.

We are eventualy in a position to provide some support, but we need precise informations on:
[ul]
[li] appropriateness of this development to our needs:
[/li] We are often processing more than 1 million UV each day. We need to ensure that the bulk load process will be able to gather visit logs collected from several different servers and store all of them in a single database:
[list]
[li] without data losing (are “all” common informations stored in logs ?) or conflict (can you tell me more about the “gathering” step ?)
[/li][li] in a reasonable time: I understand the necessity of LOAD DATA INFILE for inserts. Can you tell me more about the memcache usefulness in the process ?
[/li][/ul]
[li] development planning
[/li][li] kind of support needed: financial could be OK, but we need to know the amount and beneficiaries
[/li][/list]

1 million uniques every day is definitely significant, I don’t know of any user currently using at this scale.

Please see the description of 134 for more info

Memcache is used to maintain counters in memory, rather than going to the database all the time, we will go much less often while importing statistics. Currently a page view in Piwik is an insert (and one update ) or 2 or 3 inserts and 1 or 2 selects.

we will change this so that for each page view and visit, we directly insert the “final state” in the DB, using efficient mysql code.

I can’t say that it will work for 1M daily, in fact it definitely won’t work… unless you have a competent sysadmin able to handle a piwik server farm.

but other piwik users are trying to push the limits and if we work together it will work out.

Thanks for your reply.

We have competent sysadmin to our disposal that is able to handle such a “piwik server farm”.
We can provide financial and development support to this evolution, but we have one constraint: time.
Our customers quickly needs web analytics, but we can’t integrate Piwik in our architecture without this feature.

In fact, we will divert the feature from its original goal: for us, the problem in not a question of performance as the possibility of storing piwik visits from independant servers and gathering all of them subsequently (No problem for us to create scripts for gathering datas, and insert lines - using the feature - in central database).

That’s why we are first worried about the “data conflict” problem. Are you sure that piwik “access logs” can be collected from independant servers and be gathered without data consistency problems ?

If yes, then we are in position to give help to this development, but we need precise delays / planning informations: Can you tell me when do you think this feature can be ready, and / or what kind of help we can provide to encourage / accelerate the development ?

Rafinou,

what is going to be the traffic per day on your Piwik install?

Yes it is possible to gather and aggregate traffic from multiple “servers”… however, not from multiple “mysql” instances. each server will probably log the visits in a temporary log file (a bit like apache log file) and then push them to the master regularly. The master will be the server running mysql. Each server must be able to connect to this master, even though it won’t happen often because the list of websites / URLs / Goals etc. will be cached locally.

After gathering data from all the different servers, we will sort the logs by date and the run them through the code that will prepare the mysql bulk insert.

please contact me at matt@piwik.org for details…

Bonjour Matt,

J’ai cru comprendre que vous étiez français via votre Blog, ce qui est également notre cas :).

Nous vous avons envoyé un message par email à l’adresse que vous m’avez soumis (matt@piwik.org). Pouvez-vous nous répondre dès que cela vous est possible ? L’intégration de web analytics dans notre plateforme doit se faire dans les délais les plus courts. Piwik semble être une solution envisageable, mais des questions subsistent et nous devons nous assurer au plus vite que le produit puisse coller à nos besoins.

En vous remerciant par avance