After I got that encoder working on Friday, the rest of the weekend has been a bit of a let-down. I attempted to start implementing the queue manager, which per my previously discussed design must be able to accept and acknowledge new jobs synchronously, and then asynchronously monitor each job as it gets processed on a recode server. In other words, I need to be able to create a new process that I can communicate with bi-directionally to specify job parameters*, and that can then switch to an asynchronous mode and stay active beyond the lifetime of the parent php script. I had a few strategies in mind to do this purely in php: an apparently incorrect memory that there is a way to end the http request before the script terminates; starting a new php script via proc_open and closing STDIN/OUT/ERR back to the parent once the asynchronous behavior was desired; and starting a new php script via an http request and having the client script abort or close the connection when the synchronous communication was complete (the new script of course using ignore_user_abort()).
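The proc_open variant can be sketched roughly like this — a minimal sketch, where 'cat' stands in for the real recode-manager script so the synchronous round trip is visible, and the parameter format is invented for illustration:

```php
<?php
// Rough sketch of the proc_open strategy. 'cat' stands in for the
// real recode-manager script; the parameter line is made up.
$spec = [0 => ['pipe', 'r'], 1 => ['pipe', 'w'], 2 => ['pipe', 'w']];
$proc = proc_open('cat', $spec, $pipes);

fwrite($pipes[0], "width=320 height=240\n"); // specify job parameters
fclose($pipes[0]);                           // done talking to the child
$ack = fgets($pipes[1]);                     // synchronous acknowledgement

fclose($pipes[1]);
fclose($pipes[2]);
// Both processes now run concurrently with no interaction — but the
// parent's request still cannot finish before the child exits.
proc_close($proc);
echo $ack;
```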
Unfortunately, none of these strategies works. While it is possible to close the pipes created with proc_open and have the two processes run concurrently with no interaction, the parent process still will not terminate until the child has. So, while it would be possible to output all the HTML associated with an upload confirmation immediately, the connection wouldn’t close until it timed out. (Using a combination of Connection: close and Content-Length headers theoretically compels the browser to close the connection at the appropriate time in a scenario like this, but there’s no guarantee… plus generating a content length really requires everything to be output-buffered :( ) The other method, starting a new php script via an http request, would probably work on some configurations, but falls apart when any web server modules buffer script output, i.e. output compression mods. Even when the client indicates the response may not be compressed, something retains the initial bytes of output until the entire script completes. flush() doesn’t work, nor did various other ideas I had, like sending an EOF character down the line. I tried padding the output up to 8k and got the same result, and decided a solution that required more padding than that would just suck.
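For reference, the Connection: close / Content-Length combination mentioned above looks something like this — a sketch only, since as noted nothing guarantees the browser actually disconnects, and the page content here is invented:

```php
<?php
// Sketch of the header trick: buffer the whole confirmation page so a
// Content-Length can be computed, push it out, and keep running.
ignore_user_abort(true);       // survive the client going away

ob_start();
echo "<html><body>Upload received; recode queued.</body></html>";
$body = ob_get_clean();

header('Connection: close');
header('Content-Length: ' . strlen($body));
echo $body;
flush();                       // hand everything to the web server now

// The browser *should* close the connection here, but there is no
// guarantee, and server-side output buffering can defeat the trick.
// ...continue overseeing the recode job...
```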
So, this leaves me with few options. Because there seems to be simply no way to proc_open a new process that stays alive after the parent has terminated, I am left with starting the new process by making a web request. I am now seriously considering implementing the queue manager in Java, as an extension to one of the many lightweight Java http servers out there. That way, I would have full control over when responses are pushed out to the client, and could finish a request and then continue doing some processing. The big downside, besides being a bit more difficult to create, is that it would require MediaWiki installations that want to asynchronously recode a/v contributions to have Java and run this special http server for the purpose.
*I want this system to support a means for some job parameters, such as frame size, to be plugged in by the user, just as images can currently be sized according to where they will be used. This capability would probably mostly be used when a video clip appears in multiple articles/pages. Because of the more expensive processing and storage requirements associated with video, however, I don’t want to accept jobs that are very close to an existing version of a file. As a result, I am reluctant to simply run a command through the shell, because of the difficulty of communicating with those processes. Another possibility is to do all the parameter and job error checking in the main script, add the job to the queue, and then launch a process through the shell that seeks out a recode server for the job and oversees it. I will talk with my mentor about this option versus a non-php solution.
July 17, 2007 at 6:33 pm
I am confused, why does the parent script need to die? .. why not just run it as a daemon that manages transcodes per “re-encodes” box? It can run all the time, pull tasks from the mysql table as it completes them, and report the child process exit signal/return value from the proc_open encode operation.
It can do bi-directional communication with the open proc and update the mysql table reporting its percentage done progress.
I am probably missing some details… but HTTP pushing of tasks does not seem ideal… how does the pusher know which encoding node to target… and a bunch of other problems come to mind…
We do something similar for our backend for metavid encoding management (although it’s only on a single box and only dealing with mpeg2 video, so a much simpler problem)… but some code may be applicable. I am on IRC as biggmammoth…
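The per-box daemon loop this comment describes could be sketched as follows — an in-memory array stands in for the mysql job table, and the `true`/`false` shell commands stand in for actual encode runs:

```php
<?php
// Sketch of the daemon loop: pull tasks, run each encode via
// proc_open, and report the child's exit code back. All job data
// here is invented for illustration.
$queue = [
    ['id' => 1, 'cmd' => 'true'],   // exits 0 — a successful "encode"
    ['id' => 2, 'cmd' => 'false'],  // exits 1 — a failed "encode"
];
$results = [];

while ($job = array_shift($queue)) {
    $proc = proc_open($job['cmd'], [], $pipes);
    // proc_close waits for the child and returns its exit code,
    // which the real daemon would write back to the mysql table.
    $results[$job['id']] = proc_close($proc);
}
print_r($results);
```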
July 18, 2007 at 12:14 am
A first thing to consider is that the problem of “which encoding node to target” does not vanish by running daemons on all recoding nodes, unless you have them all periodically polling the database for new jobs when the queue becomes empty. I’m trying to be a bit slicker than that, so that if there is a free recode node, it immediately starts processing a new job by being notified by the script that handles the upload from the user. This is the “parent” script that I need to have die (thus ending the user’s http request) while a forked process hangs around to monitor the recode node’s status for reliability purposes.
July 18, 2007 at 9:36 am
…to start up a php script in the background that does various tasks while not holding up the web request, I use:
function simple_run_background($command) {
    $PID = shell_exec("nohup $command > /dev/null 2>&1 & echo $!");
    return $PID;
}
where $command is /usr/bin/php somescript.php
so you have a forked process on the upload-script computer sending monitor requests to the recode node over http? what if the client switches machines between requests due to the load balancer? having proc monitoring polling on a given box might be harder to hunt down… Seems to me status info should be stored in the mysql tables.. I would think that process queues are generally implemented so that nodes doing task assignment don’t need to know about task processing, and task processing doesn’t have to deal with incoming requests… but I guess either way works..
is mplayer over http limited to realtime encoding, or does the piped output make mplayer just spit out audio/video data as fast as possible? I guess it’s going to be mostly encoder bound, but have you done comparisons of normal network file system access vs over http? It seems the verboseness of http ranged requests could potentially bog things down a bit, or maybe it’s not a factor…
It’s an interesting approach though… could be useful for real-time archival of mplayer-playable streams…
July 18, 2007 at 4:35 pm
What is the real gain of telling the conversion script to go instead of waiting on cron? You could keep the queue, job states, and active nodes in the DB, then ping/wakeup an inactive node to grab the next job instead of waiting on cron.
However, in this model you complicate your conversion daemons by making them listen on a port instead of just being a simple infinite loop.
I have another environment where we have a loop that backgrounds processes up to a certain concurrency and keeps looping until it has a free slot. It goes to sleep for a few minutes when it hits max concurrency or has nothing to do, to save resources.
I’d love to see what you’re working on. One of my pet projects is a media conversion and management system for Drupal.
July 18, 2007 at 6:26 pm
I discussed why I wanted to avoid using shell_exec, although this may no longer apply because it looks like I’ll be using a few preset bitrates/sizes rather than allowing users to create custom sizes to fit particular articles/pages. This reduces the amount of stuff that must be communicated to the background process as well as the number of ways it might (silently) fail. I still sort of want to use proc_open and Unix process control techniques though, just to learn how to do it :)
My plan has been for the forked process on the upload-script computer to just monitor one persistent http connection, which the recode node would send a few bits on periodically, the point being that you can assume the recode process died unexpectedly if the connection times out or closes. Your comments have caused me to re-evaluate this, though, and it actually seems pretty bogus, because there will also be a script on the recode node itself overseeing these processes, and it is in a much better position to detect and respond to such crashes. That being said, whether a background process needs to be started at all is debatable; although if a recode node went down/was unreachable and showed as idle in the database, video upload confirmation response time could be delayed by an attempt to notify the downed server.
MPlayer does not have any built-in speed limit arising from the input file happening to be via http, and I seem to have formulated the magic combo of MPlayer options to have it decompress at maximum possible speed. As far as network overhead of http ranged requests vs network file systems, I don’t care :) (processing should be mostly linear anyway). I think http will be what gets used because it’s my understanding they’re phasing out transfer of media on their private network via nfs, and if not, it’s just a matter of specifying a different file for MPlayer to open.
July 18, 2007 at 6:31 pm
Darrel: I’d be interested in an expansion of your first paragraph, which I’m having a hard time following. How do I ping/wakeup an inactive node *without* having it listening to a port?
July 19, 2007 at 10:35 pm
sounds good :) especially if they are phasing out nfs… I can see why push works well. thanks for the explanation :)
But yeah, it seems machines should just report their own stats to the db rather than being polled by the node running the upload script. If one misses a tick, i.e. did not update in > $UpdateTime, then an arbitrary node can ping it to see if it’s crashed, send the task to *one* other recode node, and mark that recode machine’s state as unresponsive.
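A minimal sketch of that missed-tick check, with made-up node names and an in-memory array standing in for the status table:

```php
<?php
// Sketch of the heartbeat check: a node whose last update is older
// than $updateTime is flagged unresponsive. All values are invented.
function findStaleNodes(array $lastTick, int $now, int $updateTime): array
{
    $stale = [];
    foreach ($lastTick as $node => $tick) {
        if ($now - $tick > $updateTime) {
            $stale[] = $node;  // candidate to ping / reassign its task
        }
    }
    return $stale;
}

$now = 1000;                                         // seconds
$lastTick = ['recode-a' => 980, 'recode-b' => 880];  // last heartbeat
print_r(findStaleNodes($lastTick, $now, 60));        // recode-b is stale
```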
July 24, 2007 at 2:07 pm
I wasn’t suggesting to have a ping/wakeup without listening to a port in the first paragraph. I was suggesting that you use the db as your queue and job state storage. Then ping your conversion server to let it know it has a job waiting and should get to work. This would definitely require listening to the ping.
I just prefer to have a dumb loop somewhere checking for new jobs. I’d rather conserve those CPU ticks for encoding than spend them listening on a port. As far as assigning jobs to members, I tend to look at the total number of bytes an individual server is scheduled to process, and assign new jobs to the server with the least byte load. It’s not perfect, but works for me.
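That least-byte-load assignment might look like the following, with server names and byte counts invented for illustration:

```php
<?php
// Sketch of "least byte load" job assignment: give the new job to the
// server with the fewest bytes already scheduled. Values are made up.
function pickServer(array $scheduledBytes): string
{
    asort($scheduledBytes);                  // ascending by byte load
    return array_key_first($scheduledBytes); // lightest-loaded server
}

$scheduledBytes = [
    'node1' => 700000000,
    'node2' => 150000000,
    'node3' => 420000000,
];
echo pickServer($scheduledBytes);   // node2
```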