
[nycphp-talk] How would you do this?

Jad madi syntux at gmail.com
Mon Sep 25 09:23:35 EDT 2006


I'm building an RSS aggregator, and I'm trying to find the best way to
parse users' feeds fairly. Let's say we have 20,000 users with an
average of 10 feeds per account, so we have about 200,000 feeds.

How would you schedule the parsing process to keep all accounts up to
date without killing the server? Note that some of the 200,000 feeds
may be shared by more than one user.

Now, what I was thinking of is to split users into:
1-) Idle users (check their accounts once a week; no traffic on their
RSS feeds)
2-) Idle++ users (check their accounts once a week, but there is
traffic on their RSS feeds)
3-) Active users (check their accounts regularly; there is traffic on
their RSS feeds)

NOTE: The week is just an example; in the end it's going to be a
dynamic ratio.

So with this classification I can split the parsing power and time:
1-) 10% idle users
2-) 20% idle++ users
3-) 70% active users.
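
To make that split concrete, each run of the parser could take a quota
of feeds from each class in proportion to those percentages. A quick
sketch (the weights are just the example numbers above):

<?php
// Sketch: decide how many feeds of each class to parse in one run,
// in proportion to the 10/20/70 split above. Numbers are examples.
function feedsPerRun($batchSize) {
    $weights = array(
        'idle'      => 0.10,
        'idle_plus' => 0.20,
        'active'    => 0.70,
    );
    $quota = array();
    foreach ($weights as $class => $w) {
        $quota[$class] = (int) round($batchSize * $w);
    }
    return $quota;
}

// e.g. a run that parses 1000 feeds yields:
// array('idle' => 100, 'idle_plus' => 200, 'active' => 700)
print_r(feedsPerRun(1000));
?>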

NOTE: There are other factors that should be included, but I don't
want to muddy the idea now: CPU usage, memory usage, connectivity
issues (if a feed's site is down). In general, the MAX execution time
for the continuous parsing loop shouldn't be more than 30 to 60
minutes. Actually, I'm thinking of writing a daemon to do it: just
keep checking CPU/memory and execute whenever a reasonable amount of
resources is available, without killing the server.
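
Roughly, the daemon loop I have in mind would look like this
(sys_getloadavg() is a real PHP call, but the load threshold, sleep
times, and parseNextBatch() are placeholders, not a real
implementation):

<?php
// Rough daemon sketch -- only parse when the server has headroom.
set_time_limit(0);

$maxLoad  = 2.0;       // back off above this 1-minute load average
$runLimit = 30 * 60;   // cap one continuous parsing pass at 30 minutes
$started  = time();

while (true) {
    $load = sys_getloadavg();     // array(1min, 5min, 15min)
    if ($load[0] > $maxLoad) {
        sleep(60);                // server busy: wait and re-check
        continue;
    }
    parseNextBatch();             // placeholder: parse a quota of feeds

    if (time() - $started > $runLimit) {
        sleep(10);                // brief pause between passes
        $started = time();        // start a fresh pass
    }
}

function parseNextBatch() {
    // fetch + parse the next slice of feeds according to the
    // idle / idle++ / active quotas -- omitted here.
}
?>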


Please elaborate.



