Download files

Dear All,

I am using the following regular expression to import log files:

python /var/www/piwik/misc/log-analytics/import_logs.py --url=https://XYZ /media/ezproxy/ezp20150729.log --idsite=35 --dry-run --log-format-regex='(?P.)\s-\s[a-zA-Z0-9-].[(?P.?) (?P.?)] "(?P.?)"\s(?P\S+) (?P\S+)\s"(?P<user_agent>.?)"\s"(?P.*?)"' --recorders=4 --enable-http-errors --enable-http-redirects --download-extensions=csd,ccs,dmg,enf,ens,enz,7z,aac,arc,arj,asf,asx,avi,bin,csv,deb,dmg,doc,docx,exe,gzip,hqx,jar,mpg,mp2,mp3,mp4,mpeg,mov,movie,msi,msp,odb,odf,odg,odp,ibooks,jar,mpg,mp2,mp3,mp4,mpeg,mov,movie,msi,msp,odb,odf,odg,odp,ods,odt,ogg,ogv,pdf,phps,ppt,pptx,qt,qtm,ra,ram,rar,rpm,sea,sit,tar,tbz,bz2,tgz,torrent,txt,wav,wma,wmv,wpd,xls,xlsx,xml,xsd,z,zip,azw3,epub,mobi,apk,flv,gz

This works fine, except that downloads are not counted in the Piwik backend (the download report is empty). In contrast, downloads are recognized in --dry-run mode.

When I use --log-format-name=common instead of the regular expression, about one third of the lines are reported as unknown, but downloads are counted. Furthermore, with --log-format-name=common the browser types are not analyzed, whereas they are with the regular expression.
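To check which fields a custom regular expression actually captures, it can help to run it against a single log line outside the importer. The following is only a minimal sketch: the pattern, the group names (ip, userid, date, timezone, method, path, status, length, user_agent) and the sample line are assumptions for illustration, not the regular expression from the command above and not necessarily the names import_logs.py expects.

```python
import re

# Hypothetical pattern for a combined-style access log line.
# Group names are illustrative assumptions, not taken from the original post.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+)\s-\s(?P<userid>\S+)\s'
    r'\[(?P<date>[^\]\s]+)\s(?P<timezone>[^\]]+)\]\s'
    r'"(?P<method>\S+)\s(?P<path>\S+)\s\S+"\s'
    r'(?P<status>\d{3})\s(?P<length>\S+)'
    r'(?:\s"(?P<user_agent>[^"]*)")?'  # user agent is optional
)


def parse_line(line):
    """Return a dict of captured fields, or None if the line does not match."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None


# Made-up sample line (documentation IP range, invented user and file).
SAMPLE = (
    '192.0.2.1 - jdoe [29/Jul/2015:10:15:32 +0200] '
    '"GET /docs/report.pdf HTTP/1.1" 200 51234 "Mozilla/5.0"'
)

if __name__ == "__main__":
    print(parse_line(SAMPLE))
```

A line that yields None here would probably also be counted as unknown or invalid by an importer using the same pattern, and a group that comes back as None (for example user_agent) would explain a missing browser analysis.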

Has anyone an idea how to solve these problems?

Thank you!

Best
mucctecc

This is the log import summary from last night. 24132 downloads were recognized, but I cannot see any download statistics in Piwik, i.e. there are zero downloads for yesterday:


880754 requests imported successfully
24132 requests were downloads
25079 requests ignored:
    0 HTTP errors
    0 HTTP redirects
    0 invalid log lines
    0 requests did not match any known site
    0 requests did not match any --hostname
    316 requests done by bots, search engines...
    24763 requests to static resources (css, js, images, ico, ttf...)
    0 requests to file downloads did not match any --download-extensions

Website import summary

880754 requests imported to 1 sites
    1 sites already existed
    0 sites were created:

0 distinct hostnames did not match any existing site:

Performance summary

Total time: 2563 seconds
Requests imported per second: 343.61 requests per second

Hi there,

Can you try without the --download-extensions parameter? Maybe this will work better.

If you think this is a bug in Log Analytics, please create a bug report in the matomo-org/piwik-log-analytics issue tracker on GitHub, with a small log file of a few lines that can be used to reproduce the issue, the commands used, etc.
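One possible way to build such a small log file is to pick the first few lines whose request path ends in a download extension. This is just a sketch under assumptions: the request is assumed to appear in quotes as "METHOD path PROTOCOL", and the function name and sample lines are made up for illustration.

```python
import re
from itertools import islice

# Assumes the request appears as "METHOD /path PROTOCOL" inside double quotes.
REQUEST_PATH = re.compile(r'"[A-Z]+\s(?P<path>\S+)\s[^"]*"')


def sample_download_lines(lines, extensions, limit=5):
    """Return up to `limit` lines whose request path ends in one of `extensions`."""

    def is_download_line(line):
        match = REQUEST_PATH.search(line)
        if not match:
            return False
        # Drop any query string before comparing the extension.
        path = match.group("path").split("?", 1)[0].lower()
        return any(path.endswith("." + ext) for ext in extensions)

    return list(islice((line for line in lines if is_download_line(line)), limit))


if __name__ == "__main__":
    # Invented example lines; in practice these would be read from the log file.
    demo = [
        '192.0.2.1 - - [29/Jul/2015:10:00:00 +0200] "GET /index.html HTTP/1.1" 200 100',
        '192.0.2.2 - - [29/Jul/2015:10:00:01 +0200] "GET /lib/styles.enz HTTP/1.1" 200 900',
    ]
    for line in sample_download_lines(demo, {"enz", "pdf"}):
        print(line)
```

Before attaching the result to a public bug report, the sample should be checked for IP addresses, user names, or other personal data.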

I definitely need the additional file types csd, ccs, dmg, enf, ens and enz. I think I will add these file types directly in import_logs.py …
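For reference, the kind of check involved in deciding whether a request counts as a download can be sketched as follows. This is not the code from import_logs.py; the function name and the short base list are assumptions for illustration, and only the six extra extensions come from this thread.

```python
import posixpath
from urllib.parse import urlsplit

# The extra file types mentioned in this thread.
EXTRA_EXTENSIONS = {"csd", "ccs", "dmg", "enf", "ens", "enz"}

# Small illustrative subset, NOT the importer's full default list.
BASE_EXTENSIONS = {"pdf", "zip", "doc", "docx"}


def is_download(path, extensions):
    """Return True if the URL path ends in one of the given extensions."""
    url_path = urlsplit(path).path  # strips query string and fragment
    ext = posixpath.splitext(url_path)[1].lstrip(".").lower()
    return ext in extensions


if __name__ == "__main__":
    all_extensions = BASE_EXTENSIONS | EXTRA_EXTENSIONS
    for candidate in ("/files/library.enz", "/index.html"):
        print(candidate, is_download(candidate, all_extensions))
```

Extending the set, either on the command line or in the script, should only change which requests are classified as downloads during import; it does not by itself explain why recognized downloads are missing from the report afterwards.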

If this does not work, I will create a bug report.

Thank you for your help.

Okay, I reproduced the problem with some sample data. I will create a bug report.

It makes no difference whether I use the --download-extensions parameter or not (I extended the download list in import_logs.py).