Tracking direct downloads with Piwik (script inside)

Hi all,

Preamble
The need to track direct downloads has come up for many of us in our short lives as web analytics freaks. The forum has many threads discussing that need.

This thread (the one you are currently reading) aims to meet that need.

What is a direct download?
What we call here a direct download is a link that does not appear in a web page as a hyperlink. Put differently, the link below is a direct download IF it appears in an e-mail message (for instance) or if it’s entered by a user in their browser’s URL field or downloaded via a utility (such as wget) or somehow used outside a Web browser.


http://some.domain.com/downloads/dummy.pdf

It is NOT a direct download if it’s part of a normal hyperlink construct embedded in a web page and accessed using a Web browser.

Down to business
A script to track direct downloads is available in the Download section further down this forum post. It’s based on code from
[ul]
[li]PHP Download Script with Resume option - Media Division
[/li][li]Tracking API - Analytics Platform - Matomo
[/li][/ul]

We added:
[ul]
[li] the MIME type for PDFs,
[/li][li] Piwik-specific code
[/li][li] several features that we needed
[/li][/ul]

Instructions
For instructions, read the two sections Installation and Usage on the website.

Support
Click the Support button on the download page specified below. Note that the issue tracker requires you to create a [free] account before you can post a ticket. To create an account, use the Registration link on the right-hand side of the header.

Caveats
A visit tracked with this script won’t be factored in by Piwik when computing a “returning visitor” status. This probably has to do with the fact that the Piwik.js script isn’t running client-side. Hence, the server doesn’t receive some information (e.g. the screen resolution) that is normally provided by the Piwik client script (and that, I suspect, is taken into account in computing the visitor ID). Therefore, the visitor IDs won’t match although it’s actually the same human visitor. It may even be the same browser, OS, IP, etc.
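To make the caveat concrete, here is a minimal sketch of what a server-side tracking request for a download might look like. It is written in Python purely for illustration (the script itself is PHP) and is not taken from the script. The parameters `idsite`, `rec`, `url`, `download` and `_id` are the ones documented for the HTTP Tracking API; the function name, the tracker URL and the site ID are assumptions. Whether supplying `_id` is enough to restore the “returning visitor” status is not something this post confirms.

```python
from typing import Optional
from urllib.parse import urlencode


def build_download_tracking_url(tracker_url: str, site_id: int, file_url: str,
                                visitor_id: Optional[str] = None) -> str:
    """Build a Tracking API request URL that records a download (hypothetical helper)."""
    params = {
        "idsite": site_id,     # ID of the tracked website
        "rec": 1,              # required for the request to be recorded
        "url": file_url,       # URL of the current action
        "download": file_url,  # marks the action as a download
    }
    if visitor_id is not None:
        # The Tracking API expects a 16-character hexadecimal visitor ID.
        is_hex = all(c in "0123456789abcdefABCDEF" for c in visitor_id)
        if len(visitor_id) != 16 or not is_hex:
            raise ValueError("visitor_id must be 16 hexadecimal characters")
        params["_id"] = visitor_id
    return tracker_url + "?" + urlencode(params)


if __name__ == "__main__":
    # example.com and site ID 1 are placeholders.
    print(build_download_tracking_url(
        "http://example.com/piwik/piwik.php", 1,
        "http://some.domain.com/downloads/dummy.pdf"))
```

Nothing in such a request comes from the browser-side script, which is consistent with the explanation above: details like the screen resolution are simply absent unless the server adds them itself.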

Download
The script can be downloaded using the URL below. An account on the website is not needed. Put differently, just open the page and click Download.

http://www.khalemy.com/software/ddt-for-piwik/

How do we thank you?
We’ve got social network accounts, so follow us! :slight_smile: Fear no flooding: what we post there is mainly update news.

Cheers.

Last edited: 2015-05-11. Rewrote this OP to reflect the current status of this piece of software, specifically to integrate links to a dedicated web page which allows users of the script to get free support.

Thanks for this!

@bigworm: thank you for thanking me!

[EDIT of the OP]
I don’t know why, but the forum software won’t let me edit my own opening post. I can’t even refer people to this newer post!

I have changed the download.php script that I initially offered. Although the changes aren’t that substantial, they are important in that the script is now split into two files (both files are zipped in track_direct_downloads_script_piwik.zip):
[ul]
[li] the download.php file per se
[/li][li] an include file that is meant to store the specific configuration needed for a specific website
[/li][/ul]

The purpose of the splitting was to separate what might change from one website to the other (i.e. the include file) from what is rather stable (i.e. the download script).

Reminder: both files are supposed to be in the same folder of your web server’s filesystem.

Works great Amenel. Thank you very much for sharing

I am attempting to integrate your download tracking into pdf.js. Any pointers?

I was hoping something as easy as viewer.html?file=download.php?file=myfile.pdf was going to work, but it doesn’t.

Hi Bigworm,

I guess pdf.js is a JavaScript library?

To sum things up, the script consists of two files to be moved to your server’s filesystem. On my different shared hosting spaces, I have a “www” folder that represents the root of my website. For me, and this is the first step, both files go in that folder. Because of that, the script is at http://.com/download.php

Second step: I have a folder named “mydownloads” at the same level as both files. The server space path is then ~/www/mydownloads and all my to-be-downloaded files go there.

Third step: YOU must choose a name for the folder where all your downloads must go.

Then, wherever you would have had a link like http://.com/mydownloads/anyfile.anyextension , you put instead something like http://.com/download.php?file=anyfile.anyextension
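The replacement described above can be sketched as a small helper. This is only an illustration in Python, not part of the script: the function name is hypothetical, and the folder name “mydownloads” and the script name “download.php” are the ones used in this post and would need to match your own setup.

```python
from urllib.parse import unquote, urlencode, urlsplit, urlunsplit


def to_tracked_url(direct_url: str, download_folder: str = "mydownloads",
                   script: str = "download.php") -> str:
    """Rewrite a link into the download folder as a link through the download script."""
    parts = urlsplit(direct_url)
    prefix = "/" + download_folder + "/"
    if not parts.path.startswith(prefix):
        # Not a link into the download folder: leave it untouched.
        return direct_url
    # Decode the file name first so that urlencode does not encode it twice.
    filename = unquote(parts.path[len(prefix):])
    query = urlencode({"file": filename})
    return urlunsplit((parts.scheme, parts.netloc, "/" + script, query, ""))


if __name__ == "__main__":
    # example.com is a placeholder domain.
    print(to_tracked_url("http://example.com/mydownloads/anyfile.anyextension"))
```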

If you don’t understand what I wrote, you probably need to find someone tech-savvy to install and configure the script.

I don’t know what the “viewer.html” is supposed to do but this script that I’m offering doesn’t support having several “file” parameters. As said in the opening post that I can no longer edit, I can’t offer support for various environments and tools. Moreover, the whole thing seems simple enough (and easier than the link you gave) to me: it’s only about replacing a URL with another.

Amenel, I have setup your script and have it working correctly.

pdf.js is an open source project from Mozilla that takes a pdf and renders it to html5 inside your browser instead of a pdf.

I am trying to merge your download tracking technique with it.

Bigworm, then the merging boils down to making sure pdf.js gets what it needs while you also get what you want, that is the tracking of pdf files that are rendered to HTML5.

What does pdf.js need to render a file, for instance one that is called “instructions.pdf”? Does it need a URL or does it need a filesystem path?

If the viewer.html URL you gave previously is any indication of what it needs, I suspect the template URL for the rendering is “viewer.html?file=instructions.pdf”. Then the “file” parameter is likely, just as in my script, a filesystem path. In that case, I’m afraid you can’t have both.

( Btw, if “file” is a valid parameter to pdf.js, it is coincidentally the same spelling as my “file” but both are different. )

However, if viewer.html supports URLs to PDF files to be rendered then instead of “viewer.html?file=instructions.pdf” you’ll have “viewer.html?u=http://<yourdomain.com>/download.php?file=instructions.pdf”

Does pdf.js support URLs to files? If the answer is yes, you can have both; all it’ll take is changing to a different parameter: “u” in lieu of “file”. If it doesn’t, you’ll have to fork pdf.js and add support for URLs (which might very well not be a piece of cake). And all will be well :slight_smile:
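One detail worth illustrating: when a URL that has its own query string is passed as the value of another URL’s parameter, the inner URL has to be percent-encoded, otherwise its “?” and “=” get mixed up with the outer ones. A minimal sketch in Python follows; the helper name and example.com are assumptions, and the parameter name (“u” here, as in the example above) is whatever the viewer actually expects.

```python
from urllib.parse import urlencode


def viewer_link(viewer_url: str, viewer_param: str,
                download_script_url: str, filename: str) -> str:
    """Build a viewer URL whose parameter carries an encoded download-script URL."""
    # Inner URL: the tracking script with its own "file" parameter.
    inner = download_script_url + "?" + urlencode({"file": filename})
    # Outer URL: urlencode percent-encodes ":", "/", "?" and "=" in the inner URL.
    return viewer_url + "?" + urlencode({viewer_param: inner})


if __name__ == "__main__":
    print(viewer_link("viewer.html", "u",
                      "http://example.com/download.php", "instructions.pdf"))
    # prints viewer.html?u=http%3A%2F%2Fexample.com%2Fdownload.php%3Ffile%3Dinstructions.pdf
```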

Cheers.

It does support being passed a url! I am looking at the code to see about changing the parameter. Thanks for the pointer!

[quote=Amenel]However, if viewer.html supports URLs to PDF files to be rendered then instead of “viewer.html?file=instructions.pdf” you’ll have “viewer.html?u=http://<yourdomain.com>/download.php?file=instructions.pdf”

Does pdf.js support URLs to files? If the answer is yes, you can have both; all it’ll take is changing to a different parameter: “u” in lieu of “file”. If it doesn’t, you’ll have to fork pdf.js and add support for URLs (which might very well not be a piece of cake). And all will be well :slight_smile: [/quote]

On another note, would it be possible to block the user from using this to download your download.php file and the piwik_direct_downloads_config.inc file?

I want to use it to cloak the location of the files offered for download by putting them in a folder named something like 934gyo9iuhdfklasdfn and making that the root download folder, but a savvy user could manipulate the system to download the script files and then find out where the files are actually stored.

Found a possible answer while I was writing this post: I used “PHP Encode | Obfuscate your PHP scripts easily” to obfuscate the contents of the download.php file.

Piwik is an analytics tool used for analyzing the users who access various websites.

@bigworm: normally, the PHP file can only be read on the server side. So the savvy user you’re concerned with can only be a system user or admin user. That would normally be only you or the administrator of your hosting space or webserver.

To block specifically the .inc file, I believe you may use the .htaccess file or its equivalent if you’re not using an Apache-compatible server.
You can also block your download folder (or entire folder structure) from web access. I believe it’s not even necessary to deny folder listing in the parent of your download folder.

When I referred to a savvy user, I meant someone that would change the url to http://waavsolutions.com/download.php?file=download.php and read the file and then do http://waavsolutions.com/download.php?file=piwik_direct_downloads_config.inc and thus be able to craft their own link directly to the files bypassing the download script.

OK :slight_smile:
However, a good practice would be what you did, i.e. change the download folder’s name from “download” to something less obvious. I hope everyone who downloads this script will do that.

As to download.php and piwik_direct_downloads_config.inc, these two files won’t be reachable via http://.com/download.php?file= because they are not meant to be in the download folder… They are in my case because this script is meant to be downloaded but even then, as one can see, the .inc file is redacted.

You can escape out of the download folder simply by crafting the url to be http://.com/download.php?file=…/download.php

If so, a check is missing in the script: verifying that the absolute path to the requested file is prefixed by the path to the download folder.
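A minimal sketch of such a check, written in Python for illustration only (the actual script is PHP and this is not its code). The function name is hypothetical. The idea is to resolve both paths to absolute form, with symbolic links followed, before comparing them, so that path tricks in the “file” parameter are normalised away first.

```python
import os
from typing import Optional


def resolve_download(download_root: str, requested: str) -> Optional[str]:
    """Return the absolute path of the requested file, or None if it escapes the root."""
    root = os.path.realpath(download_root)
    # realpath collapses ".." segments and follows symlinks before the comparison.
    candidate = os.path.realpath(os.path.join(root, requested))
    try:
        inside = os.path.commonpath([root, candidate]) == root
    except ValueError:
        # Raised, for example, when the paths are on different drives on Windows.
        inside = False
    if not inside or candidate == root:
        return None
    return candidate


if __name__ == "__main__":
    # The folder path below is a placeholder.
    print(resolve_download("/srv/www/mydownloads", "dummy.pdf"))
    print(resolve_download("/srv/www/mydownloads", "../download.php"))  # None
```

Comparing with `os.path.commonpath` rather than a plain string prefix avoids accepting a sibling folder whose name merely starts with the download folder’s name.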

[quote=Amenel]I don’t know why, but the forum software won’t let me edit my own opening post. I can’t even refer people to this newer post![/quote]
Sorry about that, now there should be an “edit” button in the forum posts so you can edit your own posts.

I have edited the opening post. There now is a dedicated web page that offers support to whoever uses this script. Free, just like before.

Version 1.7 has been released this morning. Two new features:
[ul]
[li] the configuration is now totally externalized in the .inc file, which brings total portability to the main script
[/li][li] the number of times a link has been “triggered” can now be tracked in a database
[/li][/ul]
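For readers wondering what counting “triggers” in a database can look like, here is a minimal sketch. It is not the script’s code: it is written in Python with an in-memory SQLite database only to stay self-contained, and the table name, column names and function name are assumptions.

```python
import sqlite3


def record_trigger(conn: sqlite3.Connection, filename: str) -> int:
    """Increment the counter for a file and return its new value."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS download_counts ("
        "filename TEXT PRIMARY KEY, hits INTEGER NOT NULL)"
    )
    # Create the row on first use, then increment it.
    conn.execute(
        "INSERT OR IGNORE INTO download_counts (filename, hits) VALUES (?, 0)",
        (filename,),
    )
    conn.execute(
        "UPDATE download_counts SET hits = hits + 1 WHERE filename = ?",
        (filename,),
    )
    conn.commit()
    row = conn.execute(
        "SELECT hits FROM download_counts WHERE filename = ?", (filename,)
    ).fetchone()
    return row[0]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    record_trigger(conn, "dummy.pdf")
    print(record_trigger(conn, "dummy.pdf"))  # second trigger for the same file
```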

If you are upgrading, I recommend that you back up your current installation (both the script and the .inc file) before updating, at least so that you can copy the include path (which has been moved to the configuration file).
I’ve tried to make the PHP code as portable as possible, by avoiding functions (such as get_result) with requirements or restrictions. But obviously, I can’t reckon with all possible configurations. Therefore, should you hit a snag, please use the Support button on the web page.