High traffic Piwik servers - database usage?

Posted by greenone83 

March 16, 2011 10:17AM
This post collects the database sizes, visit counts, and numbers of websites tracked reported by real Piwik power users.

We would really like to learn how many Piwik power users there are! If you use Piwik to track more than 100k pages per day, please reply to this post with your DB size and traffic.

See also FAQs:
- How do I setup Piwik for a high traffic website?
- Does Piwik work in a load balanced environment?

Enjoy!!



Edited 7 time(s). Last edit at 12/01/2011 05:00AM by matt.
matt [ # ]
March 16, 2011 11:46PM
That's a good question and we'd like to show such numbers on the piwik website upfront.

On demo.piwik.org here are the numbers:
  • First, it is important to run the archive.sh script to ensure all reports are processed and the stats below are accurate
  • Total pages ~ 10 million (number easy to find via the "Database usage" page; look at the piwik_log_link_visit_action table "rows count")
  • Total visits ~ 3.2 million (piwik_log_visit "rows count" on the same page)
  • Websites tracked = 5
  • Total DB size (logs + pre-processed reports) after 4 years: 3.2 GB
    (number found at bottom of the DB Usage page)

Average bytes per page view: 320 bytes

So, for example, if you have tracked 10,000 pages per day on average for 2 years, you will have a DB size of approximately
10000 pages * 365 days * 2 years * 320 bytes / 1000000000 = 2.34 GB

If you track 1000 pages per day, after 2 years and keeping all data, the DB size will be approx 234 MB.
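
The estimate above can be sketched in a few lines of Python. This is only a back-of-the-envelope helper: the function names are made up for illustration, and the 320 bytes per page view is the demo.piwik.org average quoted above, so your own ratio may differ a lot depending on storage engine and log purging.

```python
# Rough DB size estimate based on the demo.piwik.org average quoted above.
# BYTES_PER_PAGE_VIEW is an observed average from one server, not a constant:
# measure your own ratio (total DB size / total actions) before relying on it.
BYTES_PER_PAGE_VIEW = 320


def estimate_db_size_bytes(pages_per_day, years, bytes_per_page=BYTES_PER_PAGE_VIEW):
    """Estimated database size in bytes, assuming all data is kept."""
    return pages_per_day * 365 * years * bytes_per_page


def to_gb(size_bytes):
    """Convert bytes to decimal gigabytes (1 GB = 10**9 bytes)."""
    return size_bytes / 1_000_000_000


if __name__ == "__main__":
    for pages in (1_000, 10_000, 100_000):
        size = estimate_db_size_bytes(pages, years=2)
        print(f"{pages:>7} pages/day for 2 years: ~{to_gb(size):.2f} GB")
```

Passing a different `bytes_per_page` lets you redo the estimate with the ratio measured on your own server.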



Edited 3 time(s). Last edit at 04/26/2011 09:00AM by matt.
March 17, 2011 10:26AM
Note: all numbers in my list are for InnoDB tables, which use more space than the regular MyISAM tables.

Data collection started at the end of 2009.

(Numbers are taken from the Piwik interface, as I prune the piwik_log_visit and piwik_log_link_visit_action tables after 30 days, so the DB size is mostly processed data.)
Visits: 6.362 million
Actions: 8.992 million

Websites tracked: 5950

Total DB Size: 7.8 GB

Edit - archiving times:
archive.sh is run once a day and takes 225 minutes
real	225m18.390s

Though I guess most of that time is spent because of the many different websites.

Average per action: 930 bytes.



Thomas Seifert
Mysnip Solutions
Managed PIWIK Hosting



Edited 2 time(s). Last edit at 03/18/2011 11:21AM by Thomas Seifert.
HellR [ # ]
March 17, 2011 10:42AM
Here are the numbers for my Piwik installation:
  • Total Visits: ~3 millions
  • Total Actions: ~9 millions
  • Piwik Sites: 58
  • Total DB Size: ~8 GB (InnoDB type, without pruning tables)



Edited 1 time(s). Last edit at 03/17/2011 10:43AM by HellR.
March 17, 2011 11:26PM
Total Visits: ~350.000
Total Actions: ~1 million
Piwik Sites: 5
Total DB Size: ~500MB

Though I have to say that's for the time 10:00-23:00 for today, as we've only just deployed Piwik on our sites this morning.
matt [ # ]
March 18, 2011 02:56AM
Markus, I'm not sure if Piwik will work fine for you; 1M actions in 12 hours is a lot and it might hit MySQL limits. Please let us know how it goes. In any case, the ticket [dev.piwik.org] will be very useful for your use case (hopefully we will be working on this in the next month)
SH28 [ # ]
March 18, 2011 10:09AM
Collecting since 05/2009

Total visits: 14.5 million
Total actions: 67.6 million
Websites tracked: 124
Total DB size: 16.4 GB
I run the archive.sh script every hour. It takes about 15 minutes.



Edited 2 time(s). Last edit at 04/20/2011 08:03AM by SH28.
matt [ # ]
March 18, 2011 10:57AM
Summary of the problems of Piwik at large scale

Piwik works well for up to 100k-500k daily pageviews. If your website is around these values, you might start experiencing some of the performance issues listed below.


EDIT Nov 2011: 1) and 2) below are now dealt with, as of Piwik 1.7, by using the new misc/cron/archive.php script instead of archive.sh

1) archive.sh memory usage hitting the PHP memory limit: it will probably first throw the infamous "Archiving memory exhausted" error. See discussion and possible solutions in [dev.piwik.org]

2) archive.sh execution time. It is possible that we hit MySQL limits (or some other limitations/bugs) in the system, which results in very long archive.sh execution times. For example, MySQL could behave badly when the log table becomes very large and the INDEXes bloated, and we are trying to do a GROUP BY on a 15M row set... well, it will take more than 1s :) See ticket

3) High server load when Piwik is tracking data. Apparently MySQL InnoDB (as you point out) seems relatively good about this, but you could hit some limit there since I've never seen Piwik with this much traffic.
A solution is already planned for the next few months: write requests to a queue, then bulk import them from the queue into MySQL

4) Reaching the MySQL performance threshold for large scale analytics
At some point, even if we improve all the rest, there will still be the "MySQL" factor that prevents good performance on the largest Piwik servers (millions of pages per day). We will investigate alternatives such as InfiniDB, MongoDB, HBase, etc.
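
To illustrate the queue idea from point 3, here is a minimal Python sketch. It is not how Piwik implements (or will implement) this: the `TrackingQueue` class, the `hits` table and its columns are all hypothetical, and an in-memory SQLite database stands in for MySQL so the example runs on its own. The point is only that requests are buffered first and then written in large batches, one transaction per batch, instead of one INSERT per page view.

```python
import sqlite3
from collections import deque


def make_db():
    """Hypothetical single 'hits' table standing in for the real log tables."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE hits (idsite INTEGER, url TEXT, ts INTEGER)")
    return conn


class TrackingQueue:
    """Buffers tracking requests in memory instead of writing each one to the DB."""

    def __init__(self):
        self._pending = deque()

    def push(self, idsite, url, ts):
        self._pending.append((idsite, url, ts))

    def drain(self, max_items):
        """Remove and return up to max_items queued requests, oldest first."""
        batch = []
        while self._pending and len(batch) < max_items:
            batch.append(self._pending.popleft())
        return batch


def flush(queue, conn, batch_size=1000):
    """Write queued requests in batches, one transaction per batch.

    Returns the number of rows written.
    """
    written = 0
    while True:
        batch = queue.drain(batch_size)
        if not batch:
            return written
        with conn:  # commits on success, rolls back if the batch fails
            conn.executemany("INSERT INTO hits VALUES (?, ?, ?)", batch)
        written += len(batch)


if __name__ == "__main__":
    q = TrackingQueue()
    for i in range(2500):
        q.push(1, f"/page/{i}", 1300000000 + i)
    db = make_db()
    print("rows written:", flush(q, db))
```

In a real setup the queue would have to survive a crash (a file, a message broker, or similar), and `flush` would run from a cron job or worker rather than inline.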



Edited 4 time(s). Last edit at 12/01/2011 05:26AM by matt.
zemunack [ # ]
March 23, 2011 02:22PM
Server1 (running from 08.2010)
Total Pages: 281.7 M
Total visits: 46.8 M
Websites tracked: 1
Total DB size: 42 GB

Server2 (running from 10.2010)
Total Pages: 431.4 M
Total visits: 37.8 M
Websites tracked: 5
Total DB size: 80.1 GB
pdfforge [ # ]
March 24, 2011 11:51AM
We started using Piwik in 2009-04 (with one site)

Total pages: 35.389.673
Total visits: 21.277.954
Database size: 17.4 GB

The database size includes all archived reports etc.

The archiving process takes a few seconds (mostly 2-5, sometimes up to 30)

kind regards,
Philip



http://www.pdfforge.org



Edited 1 time(s). Last edit at 03/24/2011 04:31PM by pdfforge.
bottion [ # ]
March 24, 2011 04:10PM
I started using Piwik in 2010-09

Total pages: 19.563.898
Total visits: 7.168.972
Database size: 10 GB

Archiving time: (how long does it take for you to run archive.sh script every hour): 1 minute or less

Number of websites tracked: 1 site, with 16 directories that are analyzed separately, so it can be considered 16 sites

Best regards.

Ivan Bottion
pdfforge [ # ]
March 28, 2011 11:17AM
@Gofer: depending on the number of visitors this can be a relatively easy task. Of course it is more work than just including the link from GA ;-)

We had some internal resistance because of missing features in the beginning as well, but that has been overcome as Piwik has caught up in many places and also has some other advantages. In particular, you should have a look at http://www.desktop-web-analytics.com/, as it works very well with Piwik.

kind regards,
Philip



http://www.pdfforge.org
Fil [ # ]
April 01, 2011 08:54AM
actions = 3552343
visits = 2219318
db size on disk: 1.2 GB
db backup size (gz compressed): 424 MB
matt [ # ]
April 05, 2011 10:26AM
To all high traffic Piwik users, please try the latest version

It contains at least 3 performance improvements:
* Faster Tracking (we fixed an issue where a query wasn't using an INDEX), so Tracking queries should now always respond within a few milliseconds
* Live! should also now work again for high traffic Piwik servers. It was broken in the 1.2 release, but it seems to be as fast as ever now!
* Faster Archiving: we now bulk INSERT data after archiving using LOAD DATA INFILE, which results in slightly faster archiving

Please try it and report if you experience better performance.



Edited 2 time(s). Last edit at 05/04/2011 10:38PM by matt.
Cyril [ # ]
April 11, 2011 05:10PM
Since January 2011:

Total Pages: 261 millions
Total visits: 19.3 millions
Websites tracked: 18,730 (many are empty)
Total DB size: 29 GB

Running Piwik 1.1, upgrading to 1.2.5 as soon as it's out.
April 13, 2011 09:05PM
Current Stats
version 1.0 with 1.2.1 in testing
Tracked websites = 79

Monthly piwik_archive_blob tables go back to 01/2009
piwik_log_visit table: 13.3 M rows, 3.8 GB data, 674 MB index, 4.5 GB total
Piwik tables total = 10.2 GB
xcaliburs [ # ]
April 20, 2011 06:03AM
I started using Piwik in January 2011 but only started tracking all sites in March.

Piwik version: 1.2
Websites tracked: 602
Total DB size: 30.4 GB



Edited 1 time(s). Last edit at 05/04/2011 10:37PM by matt.
May 23, 2011 01:39PM
Total Pages: 33,1 M
Total Visits: 5,5 M
DB-Size: 9,1 GB

The MySQL DB seems to be fine so far (checked with tuning-primer.sh):
You have 13 out of 461423965 that take longer than 10.000000 sec. to complete

Piwik Version: 1.4 (always up to date)
pkgman [ # ]
July 06, 2011 02:31PM
We want to use Piwik for a government agency, but reading this thread, it seems there are several problems with large-scale monitoring.

These are the current figures handled by the existing business solution:
Monitored websites: 100
Daily page views: 145.000
Daily visits: 22.000
Daily visitors: 11.000

They also read data from IIS log files.

We want to know whether we can replace the business solution with Piwik and, if so, which configuration we need (web servers, storage, etc.)

Thanks ;)



Edited 1 time(s). Last edit at 12/01/2011 04:36AM by matt.
matt [ # ]
July 07, 2011 12:29AM
150k page views should be manageable with a standard dedicated server. There are many Piwik users tracking around this without trouble, but you will definitely need a dedicated server to be safe.



Edited 1 time(s). Last edit at 07/15/2011 11:50PM by matt.
leoaxet [ # ]
July 15, 2011 12:32PM
Hi all,
I'm in the same situation as pkgman, but with more data to manage.

My details:
Monitored websites: +100
Daily page views: 490.388,89
Daily visits : 86.269,44
Daily visitors: 45.113,61

Which kind of configuration do you recommend?
Is a single dedicated server with a standard Piwik configuration sufficient to work without problems?

Our current server has the following specs:
- 1 quad-core processor
- 8 GB DDR3 RAM
- 15k rpm SAS HDD, RAID 5
Do you think we should replace this server with a more powerful one, or with a cluster solution?

Any suggestion is welcome.

Thanks for the help.



Edited 1 time(s). Last edit at 07/15/2011 12:34PM by leoaxet.
matt [ # ]
July 15, 2011 05:42PM
leoaxet, I think that Piwik should work on your server since it is already top end. Please make sure you follow recommendations in: [piwik.org]

If you have any problem, please email me at matt att piwik.org and I can assist :)
August 21, 2011 01:22AM
My installation:

Total Actions: 1 million
Total Visits: 370.000
Websites tracked: 760
Total DB Size: 500MB





Mark Alderson



Edited 1 time(s). Last edit at 08/21/2011 01:23AM by Mark Alderson.
JMETERX [ # ]
August 22, 2011 10:52PM
Currently,

Total visits: 10mil
Total Actions: 50mil
Websites tracked: 7
Total DB size: 8GB

Many of these websites started tracking in March 2011

We plan to add an 8th website, which will receive close to 70M hits/actions per month

Currently testing architectures, strengths, and weaknesses
August 28, 2011 03:33AM
Hi all.
We want to use Piwik for the management of a university with a large database and high load, but reading this thread, it seems there are several problems with large-scale monitoring.

These are the current figures handled by the existing business solution:
Monitored websites: 150
Daily page views: 200.000
Daily visits: 100.000
Daily visitors: 25.000

They also read data from IIS log files.

We want to know whether we can replace the business solution with Piwik and, if so, which configuration we need (web servers, storage, etc.)

Thanks
tompal [ # ]
November 27, 2011 09:00PM
First of all, for a high-traffic Piwik system (with enough RAM) it's absolutely necessary to use a PHP cache like eAccelerator or APC. Also use the cron job to clean your database; don't use the feature in the web frontend...

I've also had good experiences with InnoDB instead of MySQL's standard MyISAM table type, because with InnoDB you don't get the locking you have with MyISAM.
BUT: if your server runs on SSDs, it makes no difference which type you use. My server had 8 SSDs in RAID10 at the beginning, and the performance was perfect. That setup was faulty, so it was changed to SAS HDDs; after that the performance went down.


Now some words about my setup:

I can't say exactly how long Piwik has been running stably, because the server crashed many times, as mentioned above...
DB size is 10 GB. But this isn't a problem if the table cleanup cron is set up correctly.



Our complete server network, which only delivers web pages, generates 30 TB of traffic per month.

mysql> SELECT COUNT(*) FROM piwik_log_link_visit_action;
+-----------+
| COUNT(*)  |
+-----------+
| 215886125 |
+-----------+


mysql> SELECT COUNT(*) FROM piwik_log_visit;
+----------+
| COUNT(*) |
+----------+
| 26695846 |
+----------+
1 row in set (0.09 sec)

And this is our server (only Piwik and its DB run on it):

analytics:~ # free -m
             total       used       free     shared    buffers     cached
Mem:         64432      49550      14881          0        584      24893
-/+ buffers/cache:      24073      40359
Swap:        10244        270       9973


analytics:~ # cat /proc/cpuinfo
......
processor       : 7
vendor_id       : GenuineIntel
cpu family      : 6
model           : 23
model name      : Intel(R) Xeon(R) CPU           E5472  @ 3.00GHz
stepping        : 6
cpu MHz         : 2999.852
....


8 x SEAGATE ST973452SS (RAID10) @ LSI MegaSAS 9260

Traffic:
I can't say exactly, because the backup is also included in this stat:
          RX bytes:837060327167 (798282.9 Mb)  TX bytes:1686983246572 (1608832.5 Mb)
          8:56pm  up 95 days 11:29,  1 user,  load average: 26.21, 21.64, 32.00
Cool, at the moment my sites have many visits :)

If anyone wants to see the system load over one year, just ask...



@PR0FESSI0NAL:
If you want, we can get in touch via e-mail; as I said, I've got some experience with this kind of application hosting... I'm not associated with any hardware OEM or anything like that. I just run a fat Piwik site (in my opinion) and some other fat solutions.



Edited 4 time(s). Last edit at 11/27/2011 11:16PM by tompal.
matt [ # ]
November 28, 2011 02:07AM
Excellent to see you here tompal, will you break the 1 Billion pages tracked with Piwik??

If you can achieve such an amount of tracking with normal hard disks, normal MySQL and a Piwik without hacks, I'm impressed, and this is great news!

1) Could you post a screenshot of all table row counts/sizes? I would love to see the data distribution on such a high traffic setup. If this is private, you know my email.
2) What does this query return: mysql> SELECT COUNT(*) FROM piwik_log_action;
3) What is your exact server setup to produce the screenshot above? Not sure if "SAS-HDDs" are normal HDDs? I looked up the hard disk and it doesn't seem to be an SSD? [www.seagate.com]
4) How long does the archiving job take to run every day or every hour?
> If someone want`s to see the System-Load over one Year, just ask...
Over 1 month / 1 week would be great :)

Thanks!!
tompal [ # ]
November 28, 2011 12:49PM
Hey Matt,

Of course I will break it.

I've made no hacks to the Piwik source...

1)
mysql> SHOW TABLE STATUS FROM analytics;
+-------------------------------+--------+---------+-----------+----------------+-------------+-----------------+--------------+-----------+----------------+
| Name                          | Engine | Version |  Rows     | Avg_row_length | Data_length | Max_data_length | Index_length | Data_free | Auto_increment |
+-------------------------------+--------+---------+-----------+----------------+-------------+-----------------+--------------+-----------+----------------+
| piwik_access                  | MyISAM |      10 |        59 |             24 |        1424 | 281474976710655 |         3072 |         0 |           NULL |
| piwik_archive_blob_2010_01    | MyISAM |      10 |       150 |             72 |       10900 | 281474976710655 |         9216 |         0 |           NULL |
| piwik_archive_blob_2010_04    | MyISAM |      10 |         0 |              0 |           0 | 281474976710655 |         1024 |         0 |           NULL |
| piwik_archive_blob_2010_05    | MyISAM |      10 |         0 |              0 |           0 | 281474976710655 |         1024 |         0 |           NULL |
| piwik_archive_blob_2010_06    | MyISAM |      10 |         0 |              0 |           0 | 281474976710655 |         1024 |         0 |           NULL |
| piwik_archive_blob_2010_07    | MyISAM |      10 |         0 |              0 |           0 | 281474976710655 |         1024 |         0 |           NULL |
| piwik_archive_blob_2010_08    | MyISAM |      10 |         0 |              0 |           0 | 281474976710655 |         1024 |         0 |           NULL |
| piwik_archive_blob_2010_09    | MyISAM |      10 |         0 |              0 |           0 | 281474976710655 |         1024 |         0 |           NULL |
| piwik_archive_blob_2010_10    | MyISAM |      10 |         0 |              0 |           0 | 281474976710655 |         1024 |         0 |           NULL |
| piwik_archive_blob_2010_11    | MyISAM |      10 |         0 |              0 |           0 | 281474976710655 |         1024 |         0 |           NULL |
| piwik_archive_blob_2010_12    | MyISAM |      10 |         0 |              0 |           0 | 281474976710655 |         1024 |         0 |           NULL |
| piwik_archive_blob_2011_01    | MyISAM |      10 |     56759 |            323 |    18348716 | 281474976710655 |      1624064 |         0 |           NULL |
| piwik_archive_blob_2011_02    | MyISAM |      10 |         0 |              0 |           0 | 281474976710655 |         1024 |         0 |           NULL |
| piwik_archive_blob_2011_03    | MyISAM |      10 |     17136 |            512 |     8780052 | 281474976710655 |       543744 |         0 |           NULL |
| piwik_archive_blob_2011_04    | MyISAM |      10 |    115728 |            446 |    51645960 | 281474976710655 |      3517440 |         0 |           NULL |
| piwik_archive_blob_2011_05    | MyISAM |      10 |    257690 |            369 |    95334972 | 281474976710655 |      7501824 |         0 |           NULL |
| piwik_archive_blob_2011_06    | MyISAM |      10 |    399993 |            318 |   127356088 | 281474976710655 |     11375616 |         0 |           NULL |
| piwik_archive_blob_2011_07    | MyISAM |      10 |    424127 |            306 |   130037956 | 281474976710655 |     11993088 |         0 |           NULL |
| piwik_archive_blob_2011_08    | MyISAM |      10 |    462804 |            328 |   152242184 | 281474976710655 |     16097280 |         0 |           NULL |
| piwik_archive_blob_2011_09    | MyISAM |      10 |     35915 |            342 |    12313328 | 281474976710655 |      1330176 |         0 |           NULL |
| piwik_archive_blob_2011_10    | MyISAM |      10 |    211355 |            368 |    77829352 | 281474976710655 |      6288384 |         0 |           NULL |
| piwik_archive_blob_2011_11    | MyISAM |      10 |    228943 |            371 |    84967216 | 281474976710655 |      6769664 |         0 |           NULL |
| piwik_archive_numeric_2010_01 | MyISAM |      10 |       314 |             46 |       14624 | 281474976710655 |        43008 |         0 |           NULL |
| piwik_archive_numeric_2010_04 | MyISAM |      10 |       347 |             40 |       13880 | 281474976710655 |        32768 |         0 |           NULL |
| piwik_archive_numeric_2010_05 | MyISAM |      10 |       711 |             40 |       28440 | 281474976710655 |        53248 |         0 |           NULL |
| piwik_archive_numeric_2010_06 | MyISAM |      10 |       700 |             40 |       28000 | 281474976710655 |        53248 |         0 |           NULL |
| piwik_archive_numeric_2010_07 | MyISAM |      10 |       720 |             40 |       28800 | 281474976710655 |        54272 |         0 |           NULL |
| piwik_archive_numeric_2010_08 | MyISAM |      10 |       740 |             40 |       29600 | 281474976710655 |        55296 |         0 |           NULL |
| piwik_archive_numeric_2010_09 | MyISAM |      10 |       713 |             40 |       28520 | 281474976710655 |        53248 |         0 |           NULL |
| piwik_archive_numeric_2010_10 | MyISAM |      10 |       756 |             40 |       30240 | 281474976710655 |        56320 |         0 |           NULL |
| piwik_archive_numeric_2010_11 | MyISAM |      10 |       757 |             40 |       30280 | 281474976710655 |        56320 |         0 |           NULL |
| piwik_archive_numeric_2010_12 | MyISAM |      10 |       789 |             40 |       31560 | 281474976710655 |        58368 |         0 |           NULL |
| piwik_archive_numeric_2011_01 | MyISAM |      10 |      1265 |             44 |       56152 | 281474976710655 |        95232 |         0 |           NULL |
| piwik_archive_numeric_2011_02 | MyISAM |      10 |       817 |             40 |       32680 | 281474976710655 |        59392 |         0 |           NULL |
| piwik_archive_numeric_2011_03 | MyISAM |      10 |      3265 |             50 |      164688 | 281474976710655 |       246784 |         0 |           NULL |
| piwik_archive_numeric_2011_04 | MyISAM |      10 |     13378 |             53 |      717732 | 281474976710655 |      1025024 |         0 |           NULL |
| piwik_archive_numeric_2011_05 | MyISAM |      10 |     15305 |             53 |      824628 | 281474976710655 |      1172480 |         0 |           NULL |
| piwik_archive_numeric_2011_06 | MyISAM |      10 |     15150 |             53 |      816788 | 281474976710655 |      1163264 |         0 |           NULL |
| piwik_archive_numeric_2011_07 | MyISAM |      10 |     15143 |             53 |      814720 | 281474976710655 |      1160192 |         0 |           NULL |
| piwik_archive_numeric_2011_08 | MyISAM |      10 |     16115 |             53 |      865968 | 281474976710655 |      1579008 |         0 |           NULL |
| piwik_archive_numeric_2011_09 | MyISAM |      10 |      2601 |             49 |      129156 | 281474976710655 |       244736 |         0 |           NULL |
| piwik_archive_numeric_2011_10 | MyISAM |      10 |     10971 |             53 |      585224 | 281474976710655 |       873472 |         0 |           NULL |
| piwik_archive_numeric_2011_11 | MyISAM |      10 |     13733 |             53 |      739000 | 281474976710655 |      1053696 |         0 |           NULL |
| piwik_goal                    | MyISAM |      10 |         0 |              0 |           0 | 281474976710655 |         1024 |         0 |           NULL |
| piwik_log_action              | MyISAM |      10 |   5317721 |             99 |   529694916 | 281474976710655 |    147832832 |         0 |        5317722 |
| piwik_log_conversion          | MyISAM |      10 |         0 |              0 |           0 | 281474976710655 |         1024 |         0 |           NULL |
| piwik_log_conversion_item     | MyISAM |      10 |         0 |              0 |           0 | 281474976710655 |         1024 |         0 |           NULL |
| piwik_log_link_visit_action   | MyISAM |      10 | 219105115 |             66 | 14570043584 | 281474976710655 |   8588289024 |         0 |      219605116 |
| piwik_log_profiling           | MyISAM |      10 |         0 |              0 |           0 | 281474976710655 |         1024 |         0 |           NULL |
| piwik_log_visit               | MyISAM |      10 |  26654859 |            186 |  4980708308 | 281474976710655 |   1538563072 |         0 |       27154860 |
| piwik_logger_api_call         | MyISAM |      10 |         0 |              0 |           0 | 281474976710655 |         1024 |         0 |              1 |
| piwik_logger_error            | MyISAM |      10 |         0 |              0 |           0 | 281474976710655 |         1024 |         0 |              1 |
| piwik_logger_exception        | MyISAM |      10 |         0 |              0 |           0 | 281474976710655 |         1024 |         0 |              1 |
| piwik_logger_message          | MyISAM |      10 |         0 |              0 |           0 | 281474976710655 |         1024 |         0 |              1 |
| piwik_option                  | MyISAM |      10 |        67 |             46 |        3156 | 281474976710655 |         6144 |        44 |           NULL |
| piwik_pdf                     | MyISAM |      10 |         2 |            272 |         544 | 281474976710655 |         2048 |         0 |              3 |
| piwik_session                 | MyISAM |      10 |       121 |            291 |      381020 | 281474976710655 |        81920 |    345796 |           NULL |
| piwik_site                    | MyISAM |      10 |        26 |             68 |        1780 | 281474976710655 |         2048 |         0 |             28 |
| piwik_site_url                | MyISAM |      10 |        15 |             34 |         512 | 281474976710655 |         5120 |         0 |           NULL |
| piwik_user                    | MyISAM |      10 |         9 |            112 |        1012 | 281474976710655 |         4096 |         0 |           NULL |
| piwik_user_dashboard          | MyISAM |      10 |         5 |           2020 |       10104 | 281474976710655 |         3072 |         0 |           NULL |
| piwik_user_language           | MyISAM |      10 |         0 |              0 |           0 | 281474976710655 |         1024 |         0 |           NULL |
+-------------------------------+--------+---------+-----------+----------------+-------------+-----------------+--------------+-----------+----------------+
62 rows in set (0.00 sec)
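
For readers who want to turn the table status above into per-row storage figures, here is a small Python sketch. The row counts, Data_length and Index_length values are copied from the output above; the helper name is made up for illustration, and Rows in SHOW TABLE STATUS is exact only for MyISAM (it is an estimate for InnoDB).

```python
# (Rows, Data_length, Index_length) copied from the SHOW TABLE STATUS output above.
TABLES = {
    "piwik_log_link_visit_action": (219105115, 14570043584, 8588289024),
    "piwik_log_visit": (26654859, 4980708308, 1538563072),
    "piwik_log_action": (5317721, 529694916, 147832832),
}


def bytes_per_row(rows, data_length, index_length):
    """Average on-disk bytes per row, counting both data and indexes."""
    return (data_length + index_length) / rows


if __name__ == "__main__":
    for name, (rows, data_length, index_length) in TABLES.items():
        avg = bytes_per_row(rows, data_length, index_length)
        index_share = index_length / (data_length + index_length)
        print(f"{name}: {avg:.1f} bytes/row, indexes are {index_share:.0%} of the table")
```

Counting the indexes matters here: on the biggest log table they account for more than a third of the space.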

2)
mysql> SELECT COUNT(*) FROM piwik_log_action;
+----------+
| COUNT(*) |
+----------+
|  5318820 |
+----------+

3) The above posted hardware. ST973452SS are SAS-(SATA with Enterprise-features)-HDDs with 15k RPM. First the server was equipped with 8 SSDs at RAID10, but after this caused some problems the SSDs were replaced by the Seagates.

4) Phew, good question... I will have a look at this this week ;)
matt [ # ]
November 29, 2011 12:01AM
1) Impressive :-)

219105115 page views > Do you purge old logs? What is the oldest date in the piwik_log_link_visit_action table?

2) Size of piwik_log_action is huge as well... 5M records in the URL lookup table. We should definitely find a way to purge this one.

3) 15k RPM sounds good!

Do you use archive.sh or the new archive.php (in trunk or 1.7-b1)? What version of Piwik did you start using?
matt [ # ]
November 29, 2011 11:49PM
I created a new ticket for the request Delete old unused records from piwik_log_action

Also new FAQ: - How do I setup Piwik for a high traffic website?



Edited 2 time(s). Last edit at 12/01/2011 05:00AM by matt.
nikosch86 [ # ]
December 07, 2011 10:47PM
Tracking since May 2009
Tracking 1 site
Actions: 127M
Visits: 2.4M
DB Size: 13GB

archive.sh running hourly
January 01, 2012 01:20PM
Total Visits: ~350.000
Total Actions: ~>1 million
Piwik Sites: ~1
Total DB Size: ~500MB



David Daly




Edited 3 time(s). Last edit at 01/01/2012 01:24PM by david daly.
gigius [ # ]
February 16, 2012 02:39PM
Hi all,

I'm thinking about upgrading our AWStats installation to Piwik in the coming months. After reading this thread, I'm not sure whether I need to load balance the front-end Piwik tracking servers, or whether just one will be enough.

We currently have 3 main sites, each with 800K visits and 4.5 million pages served per month. A fourth one with more or less the same load will appear shortly.

Tracking servers will be virtualized RHEL machines (up to 4 vCPUs and 16 GB RAM each).
DB hosts will be the same VM class; we are wide open to using a physical platform if necessary.

At most 20 users currently consult the tracking reports.

Do you have some architectural suggestions to start with?

Thanks

Gigius
matt [ # ]
February 16, 2012 11:17PM
Gigius, 20M pages per month should be possible with Piwik. It is hard to say what server config to use, but check out this guide, which contains all the necessary tips: [piwik.org]



Cheers,
Matt
Piwik founder

Piwik FAQ - Piwik Help - before posting a new topic
Stay tuned on the Piwik Blog. You may follow me on twitter & on github

February 17, 2012 11:08AM
Daily pages: 1,1M
Database: 23GB
Website: 1
xaviclave [ # ]
March 16, 2012 02:45PM
Yesterday I discovered Piwik; today I am deploying it on a platform that hosts more than 20,000 websites with different users.
So far it works, and it apparently consumes very few resources: after about 6 hours I have 75.3 MB in the database.
I'll keep you posted.
March 24, 2012 04:02PM
My installation:

Total Actions: 0.8 million
Total Visits: 210.000
Total Visits including bots:450,000
Websites tracked: 2
Total DB Size: 2.6GB



http://xilanchem.com
April 27, 2012 07:17AM
Hello everyone, My installation:

Total Actions: 0.8 million
Total Visits: 510.000
Total Visits including bots:470,000
Websites tracked: 2
Total DB Size: 3.6GB

Thanks smiling smiley
alexmc [ # ]
April 27, 2012 12:35PM
I'm thinking of using Piwik for websites which may reach more than 5 million page requests a day. I am getting the impression from this thread that MySQL just can't cope with that amount of data.

Is that right? Anybody want to discuss it with me (possibly for future professional work - but not yet)
May 18, 2012 05:01AM
It looks like a nightmare.
We have 22 websites.
MySQL used 43 GB in just 7 days.

Archiving takes half an hour each day.
We tried to make the cron run more often, but it seems our database can't handle archiving every hour. I have some ideas about partitioning log_visit and log_link_visit_action and changing the code to use just one day of data to calculate the information we need, like pageviews, users, page titles...
Are there any suggestions? Thanks.



http://forum.ubuntu-vn.org - ubuntu linux for human beings
matt [ # ]
May 18, 2012 05:10AM
With so much data it's better to archive only once or twice a day, which will also decrease the DB size.

There is also a bug in Piwik that causes some temporary data not to be deleted correctly, so when we fix this soon you will notice DB size improvements: [dev.piwik.org]

In the next release there will also be a feature to delete old reports, so for example you could delete daily reports after a few weeks.

How many visits / pages per day do you track on these 22 websites?
Maybe try removing the biggest websites until the situation is stable and the bug is fixed?

Finally, in the next release, 1.7.2, there will be a much improved database usage screen that will tell us exactly which reports are using the most data, so we can help you debug further.
Hopefully Piwik will work for you eventually!




May 18, 2012 05:50AM
It's about 4.5M visits and more than 28M pageviews a day. Our goal is to double this number.
- We already have a simple Python script for bulk importing requests from scribe. The script is adapted from a pycurl example (http://pycurl.cvs.sourceforge.net/pycurl/pycurl/examples/retriever-multi.py?view=markup). Everything is fine until the archive process.
- It's not about the archive tables, it's about the raw log tables.
- The problem comes from the 200 million records in the log_link_visit_action table. Is it necessary to keep this table after the daily archive?
- Another question: can we delete log_visit weekly but still get correct unique visitors?

I think we should create a raw data buffer table for real-time reports, and recalculate it with the normal archive script once a day.

We hope to share our knowledge to make a better world with FLOSS.



http://forum.ubuntu-vn.org - ubuntu linux for human beings
matt [ # ]
May 21, 2012 09:03PM
afterlastangel,

> We already have a simple Python script that bulk-imports requests from Scribe.
If possible it would be good to see the script, maybe other users would find it useful :)

> Is it necessary to keep this table after the daily archive?
You can delete the data in the log_link_visit_action table after the daily archive.

> Can we delete log_visit weekly and still get correct unique visitors?
If you delete logs older than 7 days you will still get weekly unique visitors, but not monthly or yearly.
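
To illustrate the point about unique visitors, here is a minimal Python sketch. It is a simplified model, not Piwik's actual archiving code, and the sample rows and function names are made up for illustration. The idea is that unique visitors for a period are a count of distinct visitor IDs in the raw logs, so they cannot be rebuilt by adding up daily numbers once the raw rows are gone.

```python
from datetime import date, timedelta


def unique_visitors(log_rows, start, end):
    """Count distinct visitor IDs with a visit date in [start, end]."""
    return len({vid for day, vid in log_rows if start <= day <= end})


def prune(log_rows, today, keep_days):
    """Keep only the rows from the last keep_days days (today included)."""
    cutoff = today - timedelta(days=keep_days)
    return [(day, vid) for day, vid in log_rows if day > cutoff]


today = date(2012, 5, 21)

# Hypothetical raw log: visitor "a" comes every day for 30 days,
# visitor "b" came once 20 days ago, visitor "c" came once yesterday.
log_rows = [(today - timedelta(days=n), "a") for n in range(30)]
log_rows.append((today - timedelta(days=20), "b"))
log_rows.append((today - timedelta(days=1), "c"))

week_start = today - timedelta(days=6)
month_start = today - timedelta(days=29)

# With the full raw log, both periods can be computed.
full_week = unique_visitors(log_rows, week_start, today)
full_month = unique_visitors(log_rows, month_start, today)

# After pruning to 7 days, the week is still right but the month is not.
pruned = prune(log_rows, today, keep_days=7)
pruned_week = unique_visitors(pruned, week_start, today)
pruned_month = unique_visitors(pruned, month_start, today)

# Adding up daily uniques overcounts returning visitors.
week_days = [week_start + timedelta(days=n) for n in range(7)]
sum_of_daily = sum(unique_visitors(log_rows, d, d) for d in week_days)

print(full_week, full_month, pruned_week, pruned_month, sum_of_daily)
```

In this toy data the weekly count survives the 7-day purge, while the monthly count silently loses visitor "b".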



Cheers,
Matt
Piwik founder


matt [ # ]
June 03, 2012 05:23AM
The new Piwik release includes a new tool that is very useful for high-traffic Piwik installations.

Enable DBStats plugin, click on "Database usage" to access the full database usage report!



Cheers,
Matt
Piwik founder


useless [ # ]
June 06, 2012 10:27AM
Quote
afterlastangel
It's about 4.5M visits and more than 28M pageviews a day. Our goal is to double this number.

If piwik can handle up to 500k pageviews per day, how did you manage to handle 28M per day? I have to manage about 20M pageviews per day and for each one of them I need to track 5 custom variables. As far as I understood from this forum, this is quite difficult.



Edited 1 time(s). Last edit at 06/06/2012 11:12AM by useless.
baysao [ # ]
June 12, 2012 06:15PM
Thanks Matt for the great improvements in the new version 1.8.2. We are now tracking 39M pageviews a day across 22 websites with 4 servers (1 gateway, 2 importers/reporters, 1 SQL DB).
@useless: Let me share some ideas with you; you may find them helpful.
- On the web gateway, we replaced the web server with our own built-in web service plus scribed to receive the logs.
- On the importers/reporters, we run a job queue that incrementally imports the logs into Piwik.
- We run the archive and rotate the table piwik_log_link_visit every hour to minimize the row count.
That's it.
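
The post does not say how the rotation is done. One common way to rotate a MySQL table is to create an empty copy and swap the names in a single RENAME TABLE statement. The Python sketch below only builds the SQL strings; the `_new` and timestamp suffixes and the helper name are assumptions for illustration, and the example uses the piwik_log_link_visit_action table named elsewhere in this thread.

```python
from datetime import datetime


def rotation_statements(table, now):
    """Return MySQL statements that swap `table` for an empty copy.

    The old rows stay available under a timestamped name until dropped.
    """
    suffix = now.strftime("%Y%m%d%H")
    fresh = f"{table}_new"           # hypothetical naming scheme
    archived = f"{table}_{suffix}"   # hypothetical naming scheme
    return [
        # Empty table with the same columns and indexes.
        f"CREATE TABLE `{fresh}` LIKE `{table}`",
        # Both renames happen in one statement.
        f"RENAME TABLE `{table}` TO `{archived}`, `{fresh}` TO `{table}`",
    ]


stmts = rotation_statements(
    "piwik_log_link_visit_action", datetime(2012, 6, 12, 18)
)
for stmt in stmts:
    print(stmt + ";")
```

Rows moved out this way are no longer visible to later archiving runs, so the rotation interval has to match the reports you still need.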
matt [ # ]
June 17, 2012 09:00PM
@baysao fantastic to hear.
- "scribed to received logs": do you import the web server logs, or do you import a log of the requests to piwik.php?
- If you rotate the table piwik_log_link_visit then you will lose data. Are you sure it's OK to rotate it?

Do you run the full archiving on the 39M pages per day?



Cheers,
Matt
Piwik founder


baysao [ # ]
June 18, 2012 07:10AM
Hi Matt,
- "scribed to received logs": I mean we write all the requests sent by the JS tracking script to disk through https://github.com/facebook/scribe, and a job queue imports the logged data from disk into piwik.php.
- I had a problem when rotating the table piwik_log_link_visit hourly: the daily visit numbers in the reports were not correct, so I tried rotating the table daily instead. Now the daily report is OK. The weekly and monthly visit reports may not be correct if I keep only 1 day of logs. The system now seems stable with this strategy: archive piwik_log_link_visit hourly, then archive it daily and rotate the table.
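
A rough Python sketch of the "log to disk, then replay to piwik.php" idea follows. It is not baysao's script. The tab-separated line format, the example host and the helper names are assumptions. The `rec`, `cdt` (request datetime), `cip` (visitor IP) and `token_auth` parameters come from the Piwik tracking API, but whether they behave the same in this Piwik version is also an assumption. The sketch only builds the replay URLs in batches and sends nothing.

```python
from urllib.parse import parse_qsl, urlencode

PIWIK_URL = "http://piwik.example.org/piwik.php"  # hypothetical host
TOKEN_AUTH = "REPLACE_ME"                         # placeholder token


def replay_url(line):
    """Turn one logged line into a piwik.php URL.

    Assumed line format: "<datetime>\t<client ip>\t<original query string>".
    """
    ts, ip, query = line.rstrip("\n").split("\t", 2)
    params = dict(parse_qsl(query, keep_blank_values=True))
    # Keep the original time and IP instead of the importer's own.
    params.update({"rec": "1", "cdt": ts, "cip": ip, "token_auth": TOKEN_AUTH})
    return PIWIK_URL + "?" + urlencode(params)


def batches(items, size):
    """Yield consecutive slices of at most `size` items for a job queue."""
    for i in range(0, len(items), size):
        yield items[i:i + size]


example_line = (
    "2012-06-18 07:10:00\t203.0.113.7\t"
    "idsite=3&url=http%3A%2F%2Fexample.org%2Fa"
)
example_url = replay_url(example_line)
example_batches = list(batches(["a", "b", "c"], 2))
print(example_url)
```

Each batch could then be handed to a worker that performs the HTTP requests, for example with pycurl as mentioned earlier in the thread.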



Edited 2 time(s). Last edit at 06/18/2012 07:13AM by baysao.
matt [ # ]
June 19, 2012 03:42PM
> "scribed to received logs " i mean write all requests sent by js script to disk through [github.com] and have job queue to import log data on disk to piwik.php


This sounds VERY interesting. Would you please consider sharing a how-to, either by email to us or publicly on the Piwik forums? I'm sure many users would love to do the same!



Cheers,
Matt
Piwik founder


baysao [ # ]
June 20, 2012 04:05PM
Hi Matt,
Very pleased to share it.
Attachments:
piwik_archicture.pdf (28.8 KB)
matt [ # ]
June 21, 2012 11:55AM
Thanks for the diagram!
Would you be interested in or willing to share the scripts used for Scribed/inotifywait and the Python importer?

Ideally we would really like to put such scripts in the official Piwik package. Please contact me at matt@piwik.org if you're keen to work together :)



Cheers,
Matt
Piwik founder


July 09, 2012 01:25PM
My installation:

Total Actions: 0.9 million
Total Visits: 217,000
Total Visits including bots: 480,000
Websites tracked: 2
Total DB Size: 2.9 GB



Newbie Piwiker :) http://www.tektasim.com.tr



Edited 2 time(s). Last edit at 07/09/2012 01:31PM by Pırlanta.
karmakrif [ # ]
July 11, 2012 01:51PM
Thank you,
It's really informative.
mg_kh [ # ]
July 24, 2012 10:05PM
We have been running Piwik 1.6 for 6 months now. THANK YOU to the dev team for the excellent job!

We run the web server on CentOS 6.2 using nginx and php-cgi 5.3.3 (10 child processes) on a VM configured with 8 GB RAM and 4 cores. The database is on a Windows 2008 server: MySQL 5.5 64-bit, 16 GB RAM, 2 CPUs x 4 cores at 3 GHz.

We track around 5 million visits per month, an average of 25 million page views, over 5 main sites, and our database size is currently 29 GB.

Database CPU usage is not noticeably high. The mysqld process shows around 2.7 to 3 GB RAM usage, the VM load is around 2, and CPU peaks at 60%.

The main problem we are facing is that archive.sh crashes regularly, at least every second time it runs, but only from cron; running it on the command line has always worked so far :) We have set the PHP memory limit to 1 GB and we use APC, but still the archiver ... :(

Our second worry is the database backup: mysqldump takes 30 minutes and makes the server very slow and sluggish. Does anyone have a good idea for a better backup solution than mysqldump?

Regards Mike
matt [ # ]
July 25, 2012 12:22AM
Mike, thanks for your comment here - much appreciated!

My first recommendation would be to update to 1.8.2 and use the new optimized archive.php script (the .sh still works but is slower, since archive.php only runs the required processes and does not call the API unnecessarily). Could you try the update and then run archive.php?

Also, if you have 8 GB of RAM, you can safely increase the PHP memory limit to 2 GB or 3 GB (even though we all agree this is ridiculous - we have some plans to make it better).



Cheers,
Matt
Piwik founder


donny [ # ]
August 30, 2012 11:32AM
Thank you,
It's really informative.



Edited 2 time(s). Last edit at 08/30/2012 11:38AM by donny.
Courtney [ # ]
September 02, 2012 11:12AM
Just wanted to stop by and thank everyone contributing to this thread. Found everything I needed to wade through my load balancing issues. You guys rock!



Edited 1 time(s). Last edit at 09/02/2012 11:14AM by Courtney.
stblink [ # ]
September 10, 2012 10:37PM
Hi!

I'm here to ask a question about the traffic you all have versus simply using Google Analytics.
You are free to delete my post as long as someone answers this question.

1 - Why use server memory and hard disks (in VERY LARGE amounts, from what I have seen) when Google Analytics is free to use and costs you no disk space or memory?

I started using Piwik today because, from what I have seen, the Piwik interface is far more user-friendly than Google Analytics. But I keep them both because Google keyword tracking is not always reliable in Piwik (Google's fault, I know).

So is there any other advantage, given that you may need a good server to run Piwik, other than having complete access to the database?

Regards,
Peter
matt [ # ]
September 11, 2012 09:02AM
The advantages of Piwik are, among others: real-time reports, visitor-level details, access to the full raw data, keeping control of your data so it stays confidential, and more: [piwik.org]

Also, because Piwik is open source/Free software that you host yourself, you can understand how it works, and it will keep working as long as you keep the server running and have backups of the MySQL database configured.

Enjoy ;-)
mg_kh [ # ]
September 17, 2012 12:11PM
Hello Matt,

I just got around to some tests of the upgrade to Piwik 1.8.4. The new setup runs CentOS 6.3, PHP 5.3 with php-fpm, and MySQL 5.5 (InnoDB); the system has a Core i7-2600, 16 GB RAM and a 256 GB SSD.

piwik_log_link_visit_action = 180 million records
piwik_log_visit = 30 million records
InnoDB size on disk ~ 35 GB

"Regularly delete old visitor logs from the database" is enabled and set to 180 days.
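
As a rough cross-check against the 320 bytes per page view quoted earlier in this thread, the figures above can be turned into an average size per tracked action. This is a back-of-the-envelope Python sketch with made-up function names. It treats the whole ~35 GB as belonging to the action rows, which overstates the per-row cost because the figure also includes piwik_log_visit, the pre-processed reports and InnoDB overhead.

```python
def bytes_per_action(db_size_gb, action_rows):
    """Average on-disk bytes per tracked action (1 GB taken as 1e9 bytes)."""
    return db_size_gb * 1e9 / action_rows


def projected_size_gb(pages_per_day, days, bytes_per_page):
    """Same formula as the estimate earlier in the thread."""
    return pages_per_day * days * bytes_per_page / 1e9


# Figures from this post: ~35 GB on disk, 180 million action rows.
observed = bytes_per_action(35, 180_000_000)

# Sanity check: 1 million pages/day kept for 180 days at that rate
# gives back the same ~35 GB.
projected = projected_size_gb(1_000_000, 180, observed)

print(f"{observed:.0f} bytes per action, {projected:.1f} GB projected")
```

That works out to a bit under 200 bytes per action for this installation, compared with the 320 bytes per page view measured on demo.piwik.org.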

Here is my feedback. So far I have only tested the DB reload and update, not live production. The biggest issue before upgrading from the stock CentOS MySQL 5.1 package was recovery time... The piwik_log_link_visit_action table took 3 hours to restore and piwik_log_visit was estimated at 10 hours!! After the upgrade to 5.5, piwik_log_visit restored in 2.5 hours and piwik_log_link_visit_action in 2 hours.

The Piwik DB update script ran for 1.5 hours to update the database. The archive script completes in under 5 minutes, but it showed an error saying the Piwik API was not available during "Starting Scheduled tasks" ... the error went by too fast to catch ;( I will try it again this evening.

I hope to have it in production this week, then I can give you some feedback on performance. But I strongly advise everyone to upgrade MySQL to 5.5 :) It makes a huge difference in performance as far as restores go.

Regards Mike