nb_uniq_visitors on high traffic

Hi,

The query that counts nb_uniq_visitors for a week has been running for several days on my server, and I was wondering why the value is not simply summed from the daily tables.
Then I found the code below:

piwik/core/ArchiveProcessing/Period.php


...
                                if($name == 'nb_uniq_visitors') continue;
...

                if(!Piwik::isUniqueVisitorsEnabled($this->period->getLabel()))
                {
                        unset($results['nb_uniq_visitors']);
                }

                foreach($results as $name => $value)
                {
                        if($name == 'nb_uniq_visitors')
                        {
                            $value = (float) $this->computeNbUniqVisitors();
                        }
                        $this->insertRecord($name, $value);
                }

and further down in the same file:


        /**
         * Processes number of unique visitors for the given period
         * 
         * This is the only metric we process from the logs directly, 
         * since unique visitors cannot be summed like other metrics.
         * 
         * @return int
         */
        protected function computeNbUniqVisitors()
        {
                $select = "count(distinct log_visit.idvisitor) as nb_uniq_visitors";
                $from = "log_visit";
                $where = "log_visit.visit_last_action_time >= ?
                        AND log_visit.visit_last_action_time <= ? 
                        AND log_visit.idsite = ?";

                $bind = array($this->getStartDatetimeUTC(), $this->getEndDatetimeUTC(), $this->idsite);

                $query = $this->getSegment()->getSelectQuery($select, $from, $where, $bind);

                return Zend_Registry::get('db')->fetchOne($query['sql'], $query['bind']);
        }

My problem is that my server just can’t process that much data and return it to PHP. So, what happens if I comment out that code and let nb_uniq_visitors be summed like all the other metrics?
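For what it is worth, here is a small sketch of why the doc comment says unique visitors "cannot be summed like other metrics". It is plain Python with made-up visitor IDs, not Piwik code: the function names and the sample data are assumptions for illustration only. A visitor who returns on several days is counted once per day in the daily values, but only once by `count(distinct log_visit.idvisitor)` over the whole period.

```python
# Hypothetical visitor IDs seen on each day (made-up data, not from Piwik).
daily_visitors = {
    "mon": {"a", "b", "c"},
    "tue": {"a", "b"},
    "wed": {"a", "d"},
}

def sum_of_daily_uniques(days):
    """What summing the daily nb_uniq_visitors values would give."""
    return sum(len(ids) for ids in days.values())

def true_period_uniques(days):
    """Equivalent of COUNT(DISTINCT idvisitor) over the whole period."""
    return len(set().union(*days.values()))

print(sum_of_daily_uniques(daily_visitors))  # 7: visitor "a" is counted three times
print(true_period_uniques(daily_visitors))   # 4: a, b, c, d
```

So summing the daily values would still produce a number, but it would overstate the weekly figure by however often visitors come back within the week.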

Hi,

It looks like it counts the unique users, right? I mean this statement: count(distinct log_visit.idvisitor)

So if I’d prefer to ignore unique visitors and use only ‘visits’ as a metric, can I comment out the code?

It is me again :slight_smile:

If I’m right in the 2 posts above, I think it would be better, for big websites, to show only ‘new visitors’.

At tracking time, Piwik already knows whether a visitor is new or not, so it could count new visitors as they are tracked, and at the end of each period you could show how many new visitors you had for that period. It is not the same metric, but it is easier to compute.
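To make the idea concrete, here is a rough Python sketch with made-up data (the function name and the `known_before` parameter are hypothetical, not anything in Piwik). Each visitor is "new" on exactly one day, so the per-day counts can be added up across a week or month without touching the logs again. As said above, the result is not the same as unique visitors: anyone already known before the period is left out.

```python
def new_visitors_per_day(days_in_order, known_before=frozenset()):
    """Count, for each day, the visitors never seen before that day."""
    seen = set(known_before)
    counts = []
    for ids in days_in_order:
        fresh = ids - seen      # visitors appearing for the first time
        counts.append(len(fresh))
        seen |= fresh
    return counts

# Hypothetical data: visitor "a" was already known before the period starts.
week = [{"a", "b", "c"}, {"a", "b"}, {"a", "d"}]
per_day = new_visitors_per_day(week, known_before={"a"})

print(per_day)       # [2, 0, 1]
print(sum(per_day))  # 3 new visitors in the period, while 4 distinct visitors were seen
```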

Another possible improvement:
In the numeric archive tables we have this field:
value | float | YES | | NULL

It could be unsigned, and maybe double instead of float. Some of my metrics are larger than a float can store exactly, so the numbers are being rounded and shown in scientific notation. I think it is time to convert the column to double. Take a look:

create table leo (id float, id2 double);
insert into leo values ('99999999999999','99999999999999');
select * from leo where id=id2;

Empty set (0.48 sec)

insert into leo values ('9999','9999');
select * from leo where id=id2;

1 row in set (0.11 sec)

As the MySQL manual says:
http://dev.mysql.com/doc/refman/5.5/en/numeric-type-overview.html
Float: "A single-precision floating-point number is accurate to approximately 7 decimal places."
Double: "A double-precision floating-point number is accurate to approximately 15 decimal places."

More details here:
http://dev.mysql.com/doc/refman/5.5/en/problems-with-float.html
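The same effect can be reproduced outside MySQL. The sketch below uses Python's standard `struct` module to squeeze a value through a 4-byte single-precision float and back; the helper name `as_float32` is my own. Python's own floats are double precision, so they play the role of the `id2` column here.

```python
import struct

def as_float32(x):
    """Round-trip a double through 4-byte single precision, like a FLOAT column."""
    return struct.unpack("f", struct.pack("f", x))[0]

big = 99999999999999.0   # 14 significant digits, too many for single precision
small = 9999.0           # fits exactly in single precision

print(as_float32(big) == big)      # False: the value was rounded on the way in
print(as_float32(small) == small)  # True
```

This matches the two selects above: the large value no longer equals its double counterpart once stored as float, while the small one does.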

cheers,
-lorieri

Thanks for the suggestions. I created the tickets:

Performance: Faster algorithm to count unique visitors · Issue #3120 · matomo-org/matomo · GitHub

Very large traffic: change archive numeric from float to double · Issue #3121 · matomo-org/matomo · GitHub

Allow to disable unique visitor count query for days weeks and months · Issue #3122 · matomo-org/matomo · GitHub

Thanks :slight_smile: