I'm investigating Piwik to get away from Google Analytics (and Analytics 360). I need to be able to...
- handle hundreds of millions of events per day; easily breaking 10k QPS (queries per second/events per second)
- have solid US responsiveness/coverage (e.g. distributing endpoints West, Central, East - no concern for International right now), meaning no crazy HTTP RTT delays or anything like that
- analyze user information with <5 min latency (basically I want something that competes with GA's "Real-Time" info)
- retain years of data; I'm fine with significantly slower reporting performance for data older than 13 months, meaning automatic data warehousing and things like Glacier are fine with me
I understand this might have "significant cost" - that's fine, I just want it clearly quantified (I can't be the first guy asking about this, can I?). I don't need to manage complex transactions. I don't need to do any real customization of the Piwik system. I'm hoping there's some sort of KB article about "if you choose X, Y, or Z AWS systems, you'll get A, B, or C level of performance, as defined by QPS or whatever, and you'll fill up your disks at D, E, or F rate."
I'm happy to use any AWS products that make sense, but I do not want to have to code custom stuff (meaning I'm interested in "how it's supposed to work out of the box"). I'm also happy to use other systems (e.g. Rackspace) and I have the ability to trivially CDN anything that needs wide distribution for HTTP reasons.
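To make the kind of quantification I'm after concrete, here is a rough back-of-envelope sketch of the QPS and disk-fill numbers. The inputs are placeholder assumptions of mine (300 million events per day as one reading of "hundreds of millions", and a guessed 1,000 bytes stored per event); they are not Piwik measurements, and the function names are just for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def average_events_per_second(events_per_day: int) -> float:
    """Average ingest rate if events were spread evenly over the day."""
    return events_per_day / SECONDS_PER_DAY


def daily_storage_gb(events_per_day: int, bytes_per_event: int) -> float:
    """Raw storage consumed per day, in decimal gigabytes (1 GB = 1e9 bytes)."""
    return events_per_day * bytes_per_event / 1e9


if __name__ == "__main__":
    # ASSUMPTION: 300 million events/day; substitute real traffic figures.
    events_per_day = 300_000_000
    # ASSUMPTION: 1,000 bytes stored per event; a guess, not a measured Piwik value.
    bytes_per_event = 1_000

    print(f"average rate: {average_events_per_second(events_per_day):.0f} events/s")
    print(f"disk fill:    {daily_storage_gb(events_per_day, bytes_per_event):.0f} GB/day")
```

With those placeholder inputs the average works out to roughly 3,500 events per second and about 300 GB of raw storage per day, so bursts to 10k QPS would be around three times the average. What I'm hoping for is the same kind of table with real per-event storage and per-instance throughput numbers filled in.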