Report archives are huge since update to 2.10

Hi,

Thanks for creating the issue, skyhawk669

Matt, I’ll be able to upgrade a clone of our Piwik installation to 2.11.0 beta7 (that’s the latest I saw in the http://builds.piwik.org/ list) on Monday, and will let you know what happens. Yes, the email reports have been generated and sent as scheduled.

Thanks

I’ve upgraded a clone of our Piwik server to 2.11.0 Beta 7 and tried running core:archive and core:run-scheduled-tasks from the command line, with no effect on our database size.

I want to check the expected behavior of core:run-scheduled-tasks --force. As you can see in the screenshot in my earlier post, when I run it I get 116 lines of "INFO CoreAdminHome[2015-02-17 23:15:15] Purging temporary archives: skipped." This is the same number as the total number of piwik_archive_numeric_year_month.ibd and piwik_archive_blob_year_month.ibd files put together, so I assume the task is skipping the purge for every one of those archives. This seems relevant because what I need it to do is purge the duplicates from the archives for January 2015 and February 2015.

skyhawk669, when you ran core:run-scheduled-tasks --force did you also see it skip the purging of your archives?

What conditions could cause core:run-scheduled-tasks --force to skip purging those archives? How can I force it to purge the duplicates from them?
Also, I’m wondering whether the upgrade itself is breaking how Piwik decides whether to skip purging duplicates from archives. I’ve been following the manual upgrade instructions (“replace the Piwik files with the latest version”).

Thanks

Pats,

I also see the skipped lines when running core:run-scheduled-tasks --force. I assume these are normal, since all the archives for the previous months are already processed (they were done before the upgrade, so they don’t have duplicates). The only archives that would be modified at this point are January and February 2015 (from what I understand, the January archive includes some stats for the whole year, so it would keep getting modified for the rest of 2015).

I haven’t tried 2.11 yet to see if it fixes the problem.

Yes, it seems that my Piwik is skipping the purging of duplicates from all archives (which is expected for the inactive archives, 2014_12 and earlier), but I don’t understand why it is skipping the latest four archives (blob_2015_01, numeric_2015_01, blob_2015_02, numeric_2015_02). It is these four most recent archives that I want to trigger a successful purge on, to see if that clears out all the duplicates.

I’m experiencing the same problem: piwik_archive_blob_2015_01 is 15GB.

UPDATE: Running console core:run-scheduled-tasks --force --verbose fixed my problem. I was not patient enough to wait for the process to complete; the cleanup takes a while…

Could anyone who is still experiencing the problem do the following:

  1. Replace core/ArchiveProcessor/Rules.php with https://raw.githubusercontent.com/piwik/piwik/master/core/ArchiveProcessor/Rules.php and run the command? This should provide a slightly more detailed message if temporary archive purging is skipped, which may help us track down the cause of the problem.

  2. Run the query SELECT idarchive, name, value, period, date1, date2, ts_archived FROM piwik_archive_numeric_2015_01 WHERE name LIKE 'done%' AND value <> 1; on a bloated numeric archive table and post the results?

The archive purging should be happening during scheduled task execution, so please check your cron archive output for the new message, too.
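For anyone who prefers to run the query from step 2 from the shell, here is a minimal sketch. It assumes the default piwik_ table prefix; the function name build_done_query and the MySQL user and database names in the usage comment are placeholders, not anything Piwik provides.

```shell
# Print the diagnostic query for one numeric archive table.
# $1 is the table suffix in year_month form, e.g. 2015_01.
# Assumption: tables use the default "piwik_" prefix.
build_done_query() {
  printf "SELECT idarchive, name, value, period, date1, date2, ts_archived FROM piwik_archive_numeric_%s WHERE name LIKE 'done%%' AND value <> 1;\n" "$1"
}

# Usage (piwik_user and piwik_db are hypothetical; substitute your own):
#   build_done_query 2015_01 | mysql -u piwik_user -p piwik_db
#   build_done_query 2015_02 | mysql -u piwik_user -p piwik_db
```

Piping the query into the mysql client keeps the table name in one place, so the same check can be repeated for each bloated month.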

@capedfuzz: Here are the results of the query (I’m currently running the scheduled tasks with the modified file and will post ASAP).

For 2015-01:

For 2015-02:

@capedfuzz, I’ve run the job with the modified Rules.php file. For each “skipped” line I get the following extra info:

"Purging temporary archives: skipped (no authorization)"

Until 2.12 comes out, I’ve created a makeshift solution for users experiencing this issue. I’ve created a new console command here: https://gist.githubusercontent.com/diosmosis/a6b9cc61ff08bbe9bab5/raw/e408cf7c6c495ea4da4b8a43b17681614264f450/PurgeOldArchiveData.php . The command requires using 2.11.2.

Download and copy the file to plugins/CoreAdminHome/Commands and run it like “./console core:purge-old-archive-data 2015-03-01 2015-02-01 2015-01-01” to purge all old + invalidated data from your archive tables. Note: if you don’t supply a date, it will purge from the most recent archive table, and if you pass all instead of a date, it will purge from all existing tables. Also, you may have to clear any PHP caches you use before the command becomes available.

If you plan on using it on a production instance with old tables, it would be a good idea to make sure your data is backed up, just in case something else goes wrong.

This command will only purge the tables once, which means that until 2.12 comes out the tables may still grow. In the meantime, you could set up a cron job to run /path/to/console core:purge-old-archive-data daily or weekly to keep your most recent archive tables small.
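The download and run steps above could be scripted roughly as follows. This is only a sketch: PIWIK_DIR and the two function names are placeholders for illustration, and it assumes curl is installed and that the script runs as a user allowed to write to the Piwik directory.

```shell
# Assumption: PIWIK_DIR is the root of your Piwik install (placeholder default).
PIWIK_DIR="${PIWIK_DIR:-/path/to/piwik}"
GIST_URL="https://gist.githubusercontent.com/diosmosis/a6b9cc61ff08bbe9bab5/raw/e408cf7c6c495ea4da4b8a43b17681614264f450/PurgeOldArchiveData.php"

# Download the command file into the CoreAdminHome plugin's Commands folder.
install_purge_command() {
  curl -fsSL -o "$PIWIK_DIR/plugins/CoreAdminHome/Commands/PurgeOldArchiveData.php" "$GIST_URL"
}

# Run the purge, passing through whatever dates (or "all") were given.
purge_archives() {
  "$PIWIK_DIR/console" core:purge-old-archive-data "$@"
}

# Usage:
#   install_purge_command && purge_archives 2015-03-01 2015-02-01 2015-01-01
```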
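As a rough sketch of the cron setup, the helper below prints a weekly crontab line and can append it to the current user's crontab. The schedule (Sundays at 03:00), the log path, and the function names are all assumptions for illustration; adjust them to your server.

```shell
# Assumptions: both paths are placeholders, not Piwik defaults.
PIWIK_CONSOLE="${PIWIK_CONSOLE:-/path/to/console}"
PURGE_LOG="${PURGE_LOG:-/var/log/piwik-purge.log}"

# Print a crontab line that runs the purge every Sunday at 03:00.
# With no date argument the command purges the most recent archive table.
purge_cron_line() {
  printf '0 3 * * 0 %s core:purge-old-archive-data >> %s 2>&1\n' "$PIWIK_CONSOLE" "$PURGE_LOG"
}

# Replace any existing purge entry in the current user's crontab with the new line.
install_purge_cron() {
  { crontab -l 2>/dev/null | grep -vF 'core:purge-old-archive-data'; purge_cron_line; } | crontab -
}

# Usage:
#   purge_cron_line       # inspect the line first
#   install_purge_cron    # then install it
```

Keeping the output in a log file makes it easier to confirm later that the purge actually ran.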

If you do notice your tables continuing to grow, please check your core:archive cron job output for scheduled task logs, as there may be multiple bugs causing this problem. And if anyone is able to give us access to an instance experiencing the issue, that would also be very helpful.

Hello there!
Is this bug solved as of the 2.12.0 release? The related issue #7181 is still open.