More problems with server failing to process

How high is your fcgi max requests?

FcgidMaxRequestsPerProcess

Maybe 10000 or higher will help?

Unless I’m looking in the wrong spot, I don’t have that setting.

fcgid.conf is where I set FcgidBusyTimeout to 1800.

httpd.conf has FcgidMaxRequestLen 1073741824, but neither conf file has an entry for MaxRequestsPerProcess

Should I add something like “FcgidMaxRequestsPerProcess 5000” to fcgid.conf?

http://httpd.apache.org/mod_fcgid/mod/mod_fcgid.html#fcgidmaxrequestsperprocess

http://www.dev-smart.com/archives/54

The 3rd seems to have a nice example.
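If it helps, a minimal sketch of what adding that directive to fcgid.conf might look like, alongside the existing timeout. The 5000 is just the number floated above, not a tested recommendation:

```apache
# fcgid.conf -- existing setting plus the proposed addition
FcgidBusyTimeout 1800

# Recycle each FastCGI process after this many requests
# (hypothetical starting value; tune up or down as needed)
FcgidMaxRequestsPerProcess 5000
```

Apache needs a graceful restart for the change to take effect.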

I’m wondering if adjusting the fcgid timeout may have done the trick. I haven’t tried tweaking the MaxProcesses options yet, because it’s running better.

There’s only been one error in the cron archive run since midnight last night (almost 24hrs, running every 4hrs), and the latest cron run took a total of 8 minutes to run, with that year processing for Site 1 taking only 255s instead of giving up somewhere between 1200-1600s.

If the next few runs go smoothly, I’ll drop the archiving down to 2hrs and see if it stays in the normal operations range

Nope, the server head-faked me :)

These were the errors that showed up in the main error_log overnight:


[Fri Dec 14 05:01:02 2012] [error] mod_fcgid: process /fcgi-bin/php5.fcgi(2455) exit(communication error), get unexpected signal 11
[Fri Dec 14 08:59:39 2012] [error] mod_fcgid: process /fcgi-bin/php5.fcgi(29881) exit(communication error), get unexpected signal 11

and from the virtual server error_log:


[Fri Dec 14 05:00:58 2012] [warn] (104)Connection reset by peer: mod_fcgid: error reading data from FastCGI server
[Fri Dec 14 05:00:58 2012] [error] Premature end of script headers: index.php
[Fri Dec 14 08:59:35 2012] [warn] (104)Connection reset by peer: mod_fcgid: error reading data from FastCGI server
[Fri Dec 14 08:59:35 2012] [error] Premature end of script headers: index.php

Time to add some of those other fcgid options…

I am just wondering; here is some stuff I found relating to permission issues of others using FastCGI.

There is no problem with your apc.ini file - I have the exact same setup (CentOS/PHP 5.3.6/FastCGI/APC 3.1.6) and your apc.ini file worked without issue. I might guess that you have a permissions problem on your /tmp directory - try setting it to 0777 to diagnose; if that fixes it, either change the mmap path or narrow down the permissions to something acceptable. – cyberx86 Jul 7 '11 at 6:40

@cyberx86 Thanks a lot buddy. I ended up solving the problem using your advice. The problem was on multiple fronts. The apc.ini file was supplemented by w3 total cache’s plugin in its ini folder. It was owned by my ftp user (it was cp-ed directly from there to /etc/php.d/), so i just changed it to root/root, and the /tmp folder had restrictive permissions, so i loosened it up a little bit. Answer the question, so i can give you credit. Thanks again. – VicePrez Jul 7 '11 at 21:20

Give it to @Aleksey Korzun - he has essentially the same thing, just approached differently, and posted it before I did (which is why I didn’t post mine as an answer - it didn’t seem right). Glad it worked out - thanks for the thought. – cyberx86 Jul 8 '11 at 1:02

For sure, @cyberx86. By the way, what’s considered a secure set of permissions for the /tmp folder, and what would be an alternative to it? – VicePrez Jul 8 '11 at 3:07

Ideally, avoiding execute permissions is good. Usually /tmp is owned by root, and you will have other ‘users’ writing there, which means at least 0666 is common. Most Linux distros use 1777 (drwxrwxrwt; the sticky bit ‘t’ means only a file’s owner can delete it). Make it as tight as possible: if you don’t have any users writing to /tmp, try 0664 or even 0644, but I don’t think most things will work with those.
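To see what those modes mean in practice, here’s a small shell sketch using a scratch directory as a stand-in for /tmp, so nothing on the real system is touched:

```shell
# Create a scratch dir and give it the usual /tmp mode: 1777
# (world-writable, with the sticky bit so only a file's owner can delete it)
d=$(mktemp -d)
chmod 1777 "$d"
stat -c '%a' "$d"   # prints: 1777
ls -ld "$d"         # mode column shows: drwxrwxrwt
rmdir "$d"
```

The same `stat -c '%a' /tmp` check on the real directory will show whether it has drifted from 1777.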

I have no idea where you came up with those, but no, not even remotely related to file or directory permissions :)

The past 4 archive runs have had the same error, generic “we have no idea what the problem is, but increase your memory in php.ini”, which is ridiculous because it’s already set to 5Gb. For crying out loud, how much memory does this thing need to process 4 million visits for a year?

The database itself is only 1.1Gb now, and that one site contains data back to Jan 1 2012, information from the 25Gb of logs that were imported between July and October.

This is highly frustrating, and having it work that one time the other day and not any other time in the past month is not helping my mood nor my troubleshooting!

I am looking and browsing, and reference to reference leads me to some obscure sites… sorry, as I know some will be irrelevant, but I’m hoping I can trigger an aha moment…

I have taken a step back and realized I never asked you what version of MySQL you run. Would it happen to be mysqli?

http://www.docunext.com/wiki/FastCGI seemed interesting.

linux - A single php-fastcgi process blocks all other PHP requests - Server Fault (this one has some promise, I think, as it tries to handle memory issues in scripts)

mod_fcgid: read data from fastcgi server error | Virtualmin was interesting regarding some script handling.

If it’s using mysqli, it’s not on purpose.

phpinfo says mysqli is configured, but how can I tell if Piwik is using it?

PHP script run time is unlimited
PHP Max Memory is 5Gb
PHP Input Parsing time is 1500 seconds
PHP Max Execution time is unlimited
MySQL connection timeout is 1200 seconds
MySQL persistent connections & total connections are unlimited
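For reference, those limits roughly correspond to ini entries like the following. The directive names are the standard php.ini/my.cnf ones; which MySQL timeout variable is actually set here is an assumption:

```ini
; php.ini
max_execution_time = 0      ; unlimited
memory_limit = 5120M        ; 5 GB
max_input_time = 1500

; my.cnf, [mysqld] section
wait_timeout = 1200
```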

I had thought maybe that Piwik isn’t releasing the memory it’s allowed to use once it finishes archiving with an error, but I’m not so sure anymore.

I rebooted the server this morning, and its real memory usage was under 1Gb. But once the archive cron ran, and every time since then that the archive script has run, real memory usage shot up to 8.5Gb and stays there; I’m wondering if it doesn’t have enough memory to grab (another 5Gb) to run again. And there’s literally nothing else on that server.

The error in the most recent cron run is this one, with complete output before and after for context:


Starting Piwik reports archiving...
Archived website id = 1, period = day, Time elapsed: 1.188s
Archived website id = 1, period = week, 27218 visits, Time elapsed: 5.929s
Archived website id = 1, period = month, 457092 visits, Time elapsed: 21.688s
ERROR: Got invalid response from API request: /index.php?module=API&method=VisitsSummary.getVisits&idSite=1&period=year&date=last3&format=php&trigger=archivephp. The response was empty. This usually means a server error. This solution to this error is generally to increase the value of 'memory_limit' in your php.ini file. Please check your Web server Error Log file for more details.
Archived website id = 1, period = year, 0 visits, Time elapsed: 1560.482s
Archived website id = 1, today = 714 visits, 4 API requests, Time elapsed: 1589.288s [1/7 done]

http://piwik.org/docs/optimize/

In particular:

In your config/config.ini.php, check that under [database] you find adapter=PDO_MYSQL rather than adapter=MYSQLI, which can be slower. If the line adapter= is not found, you are successfully using the default PDO_MYSQL.
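So the check boils down to whether config/config.ini.php contains something along these lines:

```ini
[database]
; default adapter; if this line is absent, PDO_MYSQL is used
adapter = PDO_MYSQL
```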

Also i saw someone with cron archive issues turn off geoip plugin and have it work… be curious if that helped…

No adapter= directive, and I never had the geoip plugin in use. I did just activate the DBStats plugin, just to see what Piwik thinks it’s using and to make it easier to find that info for the tables.

I keep hoping for an easy fix to this, something painfully obvious that we just missed from the start :)

Will give the tuning primer script a try in the morning.

The past 4 archive runs have had the same error, generic “we have no idea what the problem is, but increase your memory in php.ini”, which is ridiculous because it’s already set to 5Gb. For crying out loud, how much memory does this thing need to process 4 million visits for a year?

Check the “Web server error logs” to find out the error message ?

You missed my earlier posts, Matt… there are no entries in the error logs other than the ones I’ve posted here.

Sometimes, even when there’s an error in the archive output (the generic 500 server error), there are no entries in the error logs at all, nothing in the error log for the virtual server nor for the main server.

That’s what makes this so frustrating.

Sorry, I don’t know then; this is one of the first times I’ve heard of no error being logged… Maybe ask your webhost or sysadmin?

That would be me, technically. This is the setup that Arvixe rejected because of database size, so I had to find another option.

The only errors that appear in the logs, 98% of the time, are the ones listed here and in http://forum.piwik.org/read.php?2,95599,page=1#msg-98315. No other errors appear in the logs that lead us to possible solutions.

Switching from fcgi to either cgi or mod_php leads to the archive failing faster, no other differences, and with either the same or no errors in the logs.

I shouldn’t need more than 5Gb of RAM to process the year stats from the one website, according to what I had read here, correct? Does this mean when we hit January that this problem will go away? What else could be going on?

Hi, I realize this may already be in place, so sorry if it sounds beginnerish… (new word!). I am wondering if maybe there is some way to create a new error log directory to help us glean more info about the source of the problem…

I dunno… when I upgraded to PHP 5.4, it turned on some sort of extra logging by default. I saw it the first few runs, but afterwards, nothing; well, nothing as detailed as the PHP errors that were logged when I’d upgraded Piwik to 1.9.3-b7…

And to make matters more confusing, every so often the archiving will just plain WORK :)


[2012-12-18 18:30:03] Starting Piwik reports archiving...
[2012-12-18 18:30:04] Archived website id = 1, period = day, Time elapsed: 1.565s
[2012-12-18 18:30:08] Archived website id = 1, period = week, 50228 visits, Time elapsed: 3.921s
[2012-12-18 18:30:35] Archived website id = 1, period = month, 1566994 visits, Time elapsed: 26.902s
[2012-12-18 18:35:28] Archived website id = 1, period = year, 4467852 visits, Time elapsed: 292.526s
[2012-12-18 18:35:28] Archived website id = 1, today = 553 visits, 4 API requests, Time elapsed: 324.952s [1/8 done]

Why it would process the year in 300 seconds one run, and not be able to complete it in 1800 seconds the next 10 runs is boggling.

cron is still running every 4hrs, so I’ll check the next run and see if that one completes.
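For reference, the every-4-hours schedule is just a crontab entry along these lines. The PHP binary, Piwik path, URL, and log path are all placeholders, not the actual setup:

```
# m h dom mon dow  -- run the Piwik archiver every 4 hours
0 */4 * * * /usr/bin/php /path/to/piwik/misc/cron/archive.php --url=http://your.piwik.host/ > /var/log/piwik-archive.log 2>&1
```

Dropping to every 2 hours would just mean changing `*/4` to `*/2`.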

In all my searches I saw references to archiving runs where fcgid could be tripped up by bots hitting the site while the archive was running on a given server. Could that be the cause here: sometimes no bot trips things up during an archive run and it’s OK, but on other runs one does?

Another idea: if you, like me, get a lot of these w00tw00t server scans, perhaps those requests are playing havoc with the archive run. They are not huge requests, but maybe enough to throw off the job.

http://pierre.linux.edu/2010/06/using-iptables-to-reject-w00tw00t-at-isc-sans-dfind-scanners/

Do you mean bots hitting the Piwik server, or bots hitting the site while its data is being archived?

I’ve never seen any bots calling themselves w00tw00t, so this is new to me :)

Either, actually. But I would guess a bot hitting the Piwik server would be a more severe resource drain.

The access log file will often show the w00tw00t as well; what I’ve read, and experienced on my own server, is a lot of HTTP/1.1 GET attempts occurring during the scan process.
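A quick way to check for those probes is to grep the access log. Here a sample log line stands in for the real file; the path and the exact log format are assumptions, so point it at your actual access_log:

```shell
# A typical w00tw00t scanner probe as it appears in an Apache access log
printf 'GET /w00tw00t.at.ISC.SANS.DFind:) HTTP/1.1\n' > /tmp/sample_access.log

# Count probe attempts (run against your real access_log path instead)
grep -c 'w00tw00t' /tmp/sample_access.log   # prints: 1
rm /tmp/sample_access.log
```

Correlating the timestamps of those hits with the failed archive runs would show whether the scans line up with the failures.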