What are those rare cases of 0 visits?

Hi,

I have the problem of zero visits all the time, it is mentioned here:
http://piwik.org/faq/troubleshooting/#faq_141

I applied the patch mentioned here: http://dev.piwik.org/trac/changeset/5566
but I still have the same problem.

It is not that simple to fix it by forcing archiving to run again because:

  • too many websites to check
  • too many data to check, I can’t go to hundred sites and check which metric is Zeroed every day, every week, etc…
  • I can’t force the visits for all periods because I delete old data very frequently (7 days)

so, it would be great to know what exactly is those rare cases so I can try to avoid or check if they occurred then I can immediately try again.

btw, I still use the archive.sh not the archive.php because I have more control using the .sh. The .php has too much intelligence on it that could compromise more controlled environments.

Thanks,
-lorieri

Definitely an annoying bug that we want to fix asap but it is still unknown cause and hard to replicate.

Is it the same as here: archive.php does not archive weeks/month/year in some cases on low traffic websites · Issue #2984 · matomo-org/matomo · GitHub

are the websites low traffic or even large traffic?

looks like some kind of race condition or when some date is set more than once. I will investigate it more :slight_smile:

thanks

Hi Matt,

both small and huge traffic websites are affected randomly

I think I found something, maybe piwik is changing the idsite during the archive.

for each day we have 14 values, one of those is the “done”.

for all problematic sites with no daily values, the next or previous id were counted twice.
but both counts for the next site has same values, it did not put the previous id counts in the next one.

id 62 with problem and 61 counted twice

mysql> select * from piwik_archive_numeric_2012_05 where idsite=62 and date1=“2012-05-20” ;
±----------±-----±-------±-----------±-----------±-------±--------------------±------+
| idarchive | name | idsite | date1 | date2 | period | ts_archived | value |
±----------±-----±-------±-----------±-----------±-------±--------------------±------+
| 8422 | done | 62 | 2012-05-20 | 2012-05-20 | 1 | 2012-05-21 04:37:17 | 1 |
±----------±-----±-------±-----------±-----------±-------±--------------------±------+
1 row in set (0.00 sec)

mysql> select count() from piwik_archive_numeric_2012_05 where idsite=61 and date1=“2012-05-20” group by idsite;
±---------+
| count(
) |
±---------+
| 27 |
±---------+
1 row in set (0.00 sec)

id 15 problematic and 16 counted twice

mysql> select * from piwik_archive_numeric_2012_05 where idsite=15 and date1=“2012-05-20” ;
±----------±-----±-------±-----------±-----------±-------±--------------------±------+
| idarchive | name | idsite | date1 | date2 | period | ts_archived | value |
±----------±-----±-------±-----------±-----------±-------±--------------------±------+
| 8334 | done | 15 | 2012-05-20 | 2012-05-20 | 1 | 2012-05-21 04:31:08 | 1 |
±----------±-----±-------±-----------±-----------±-------±--------------------±------+
1 row in set (0.00 sec)

mysql> select count() from piwik_archive_numeric_2012_05 where idsite=16 and date1=“2012-05-20” ;
±---------+
| count(
) |
±---------+
| 27 |
±---------+
1 row in set (0.00 sec)

id 5 problematic, id 4 counted twice

mysql> select * from piwik_archive_numeric_2012_05 where idsite=5 and date1=“2012-05-19” ;
±----------±-----±-------±-----------±-----------±-------±--------------------±------+
| idarchive | name | idsite | date1 | date2 | period | ts_archived | value |
±----------±-----±-------±-----------±-----------±-------±--------------------±------+
| 1439 | done | 5 | 2012-05-19 | 2012-05-19 | 1 | 2012-05-20 04:11:37 | 1 |
±----------±-----±-------±-----------±-----------±-------±--------------------±------+
1 row in set (0.00 sec)

mysql> select count() from piwik_archive_numeric_2012_05 where idsite=6 and date1=“2012-05-19” ;
±---------+
| count(
) |
±---------+
| 27 |
±---------+
1 row in set (0.00 sec)

now take a look in the ts_archive dates from one of the double counted ids:

mysql> select name,ts_archived from piwik_archive_numeric_2012_05 where idsite=61 and date1=“2012-05-20” ;
±------------------------------±--------------------+
| name | ts_archived |
±------------------------------±--------------------+
| bounce_count | 2012-05-21 04:35:42 |
| max_actions | 2012-05-21 04:35:42 |
| nb_actions | 2012-05-21 04:35:42 |
| nb_uniq_visitors | 2012-05-21 04:35:42 |
| nb_visits | 2012-05-21 04:35:42 |
| sum_visit_length | 2012-05-21 04:35:42 |
| UserCountry_distinctCountries | 2012-05-21 04:36:26 |
| bounce_count_returning | 2012-05-21 04:36:27 |
| max_actions_returning | 2012-05-21 04:36:27 |
| nb_actions_returning | 2012-05-21 04:36:27 |
| nb_uniq_visitors_returning | 2012-05-21 04:36:27 |
| nb_visits_returning | 2012-05-21 04:36:27 |
| sum_visit_length_returning | 2012-05-21 04:36:27 |
| bounce_count | 2012-05-21 05:00:42 |
| max_actions | 2012-05-21 05:00:42 |
| nb_actions | 2012-05-21 05:00:42 |
| nb_uniq_visitors | 2012-05-21 05:00:42 |
| nb_visits | 2012-05-21 05:00:42 |
| sum_visit_length | 2012-05-21 05:00:42 |
| bounce_count_returning | 2012-05-21 05:01:09 |
| done | 2012-05-21 05:01:09 |
| max_actions_returning | 2012-05-21 05:01:09 |
| nb_actions_returning | 2012-05-21 05:01:09 |
| nb_uniq_visitors_returning | 2012-05-21 05:01:09 |
| nb_visits_returning | 2012-05-21 05:01:09 |
| sum_visit_length_returning | 2012-05-21 05:01:09 |
| UserCountry_distinctCountries | 2012-05-21 05:01:09 |
±------------------------------±--------------------+
27 rows in set (0.00 sec)

the first block of values is close to the problematic id

mysql> select name,ts_archived from piwik_archive_numeric_2012_05 where idsite=62 and date1=“2012-05-20” ;
±-----±--------------------+
| name | ts_archived |
±-----±--------------------+
| done | 2012-05-21 04:37:17 |
±-----±--------------------+
1 row in set (0.00 sec)

take a look it ordered by ts_archived for all websites:

mysql> select idsite,date1,date2,name,ts_archived from piwik_archive_numeric_2012_05 where ts_archived > “2012-05-21 04:36:00” and ts_archived < “2012-05-21 04:37:28” ;
±-------±-----------±-----------±------------------------------±--------------------+
| idsite | date1 | date2 | name | ts_archived |
±-------±-----------±-----------±------------------------------±--------------------+
| 61 | 2012-05-20 | 2012-05-20 | UserCountry_distinctCountries | 2012-05-21 04:36:26 |
| 61 | 2012-05-20 | 2012-05-20 | bounce_count_returning | 2012-05-21 04:36:27 |
| 61 | 2012-05-20 | 2012-05-20 | max_actions_returning | 2012-05-21 04:36:27 |
| 61 | 2012-05-20 | 2012-05-20 | nb_actions_returning | 2012-05-21 04:36:27 |
| 61 | 2012-05-20 | 2012-05-20 | nb_uniq_visitors_returning | 2012-05-21 04:36:27 |
| 61 | 2012-05-20 | 2012-05-20 | nb_visits_returning | 2012-05-21 04:36:27 |
| 61 | 2012-05-20 | 2012-05-20 | sum_visit_length_returning | 2012-05-21 04:36:27 |
| 62 | 2012-05-20 | 2012-05-20 | done | 2012-05-21 04:37:17 |
| 65 | 2012-05-20 | 2012-05-20 | bounce_count | 2012-05-21 04:37:26 |
| 65 | 2012-05-20 | 2012-05-20 | max_actions | 2012-05-21 04:37:26 |
| 65 | 2012-05-20 | 2012-05-20 | nb_actions | 2012-05-21 04:37:26 |
| 65 | 2012-05-20 | 2012-05-20 | nb_uniq_visitors | 2012-05-21 04:37:26 |
| 65 | 2012-05-20 | 2012-05-20 | nb_visits | 2012-05-21 04:37:26 |
| 65 | 2012-05-20 | 2012-05-20 | sum_visit_length | 2012-05-21 04:37:26 |
| 67 | 2012-05-20 | 2012-05-20 | done | 2012-05-21 04:37:26 |
| 69 | 2012-05-20 | 2012-05-20 | bounce_count | 2012-05-21 04:37:27 |
| 69 | 2012-05-20 | 2012-05-20 | max_actions | 2012-05-21 04:37:27 |
| 69 | 2012-05-20 | 2012-05-20 | nb_actions | 2012-05-21 04:37:27 |
| 69 | 2012-05-20 | 2012-05-20 | nb_uniq_visitors | 2012-05-21 04:37:27 |
| 69 | 2012-05-20 | 2012-05-20 | nb_visits | 2012-05-21 04:37:27 |
| 69 | 2012-05-20 | 2012-05-20 | sum_visit_length | 2012-05-21 04:37:27 |
| 71 | 2012-05-20 | 2012-05-20 | bounce_count | 2012-05-21 04:37:27 |
| 71 | 2012-05-20 | 2012-05-20 | max_actions | 2012-05-21 04:37:27 |
| 71 | 2012-05-20 | 2012-05-20 | nb_actions | 2012-05-21 04:37:27 |
| 71 | 2012-05-20 | 2012-05-20 | nb_uniq_visitors | 2012-05-21 04:37:27 |
| 71 | 2012-05-20 | 2012-05-20 | nb_visits | 2012-05-21 04:37:27 |
| 71 | 2012-05-20 | 2012-05-20 | sum_visit_length | 2012-05-21 04:37:27 |
±-------±-----------±-----------±------------------------------±--------------------+
27 rows in set (0.01 sec)

I wonder if it would be counted twice for the week, month and year for the doubled Id.

I hope it helps to find the cause and fix it :slight_smile:

Thanks for your research! I have linked it from: archive.php does not archive weeks/month/year in some cases on low traffic websites · Issue #2984 · matomo-org/matomo · GitHub
it will definitely be useful. I’ll let you know here if I can find the fix

do you think in those cases the values would be count twice for week, month and year ?

thanks !!!

I forgot to mention I run the daily archives in parallel, because we have too many sites and takes few hours to run if we do it one by one.

cheers

since it not run in parallel anymore it is taking 12 hours to run all the archive for 2 days, 2 weeks and 2 years for all websites :frowning:

Why is it not running in parallel anymore?

please let me know how we can help you so piwik works better in your case :slight_smile:

most important is: will it be count wrongly ?

since it gets confused by idsites I thought running archives not in parallel would be a good ideia, and since that I got no “0 visits” anymore.

what is missing for me is fix this problem, everything else is going good so far :slight_smile:

best guess so far

            561393 Execute  SELECT max(idarchive) 
                                                    FROM piwik_archive_numeric_2012_05

Hi,

I believe I fixed the bug and now I’m able to run archives in parallel.

I’ve changed ArchiveProcessing.php from this:


        protected function loadNextIdarchive()
        {
                $db = Zend_Registry::get('db');
                $id = $db->fetchOne("SELECT max(idarchive) 
                                                        FROM ".$this->tableArchiveNumeric->getTableName());
                if(empty($id))
                {
                        $id = 0;
                }
                $this->idArchive = $id + 1;

        }

to this:


        protected function loadNextIdarchive()
        {
                $db = Zend_Registry::get('db');
                $myuniqid=uniqid("lock-it-");
                $mytable=$this->tableArchiveNumeric->getTableName();
                $mylock = $db->exec("LOCK TABLES $mytable write, $mytable as tb1 read");
                $mylock = $db->exec("INSERT into $mytable select ifnull(max(idarchive),0)+1, '$myuniqid','','','','','','' from $mytable as tb1");
                $mylock = $db->exec("UNLOCK TABLES");
                $id = $db->fetchOne("SELECT idarchive FROM $mytable where name='$myuniqid' limit 1");
        #       if(empty($id))
        #       {
        #               $id = 0;
        #       }
                if(!$id) return false;
                else $this->idArchive = $id;

        }

and I found 2 potential problematic scripts:

plugins/Goals/API.php: $idGoal = $db->fetchOne("SELECT max(idgoal) + 1
plugins/PDFReports/API.php: $idReport = $db->fetchOne("SELECT max(idreport) + 1

cheers,
-lorieri

Oh very cool thank you so much for investigating! I would never have found that, or it would have been extremely difficult !

The other 2 examples are not problems as they don’t get executed automatically during archiving (but only when 1) creating a new goal 2) creating a new email report

cool :slight_smile:

even though they are not so serious, I think it would be good to protect them too to avoid unexpected problems :slight_smile:

cheers,
-lorieri

Thank you very much for your detective work and the patch for this issue!
released in 1.8 RC5 301 Moved Permanently

Cool !!!