[nycphp-talk] Regex for P Elements

John Campbell jcampbell1 at
Thu Jan 13 02:09:56 EST 2011

On Wed, Jan 12, 2011 at 10:55 PM, Rob Marscher
<rmarscher at> wrote:
>> On Wed, Jan 12, 2011 at 9:30 AM, Jim Yi <jim at> wrote:
>>> This problem is much better suited for an XML parser
> On Jan 12, 2011, at 9:39 AM, Randal Rust wrote:
>> I will have to try this out, because I am not sure that the approach I
>> was taking will work.
> I seem to remember having problems when I was using DomDocument on RSS feeds that were submitted by users and not under my control. Many of them were not well-formed, and that caused DomDocument to fail. Getting the creators of the feeds to fix them wasn't an option, but I did have some luck enabling libxml's "recover" mode, which is available through the DomDocument->recover property.
> Also, if the content is not encoded in UTF-8, you might need to run utf8_decode on the strings you get back to recover the original ISO-8859-1 data, because I believe libxml uses UTF-8 internally.
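As a rough illustration of why a tolerant parser copes better than a regex here, the following is a minimal sketch in Python rather than PHP. It assumes the standard-library `html.parser` module as a stand-in for DomDocument with `recover` enabled; `ParagraphCollector`, `paragraphs` and `strict_parse_ok` are hypothetical names. It collects the text of each `<p>` element from markup that a strict XML parser (`xml.etree.ElementTree`) rejects.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET


class ParagraphCollector(HTMLParser):
    """Collect the text content of every <p> element, tolerating bad markup."""

    def __init__(self):
        super().__init__(convert_charrefs=True)  # decode &amp; etc. in text
        self.paragraphs = []  # finished paragraph texts
        self._buf = None      # text of the <p> currently open, or None

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            # As in HTML, an unclosed <p> is implicitly ended by the next <p>.
            self._flush()
            self._buf = []

    def handle_endtag(self, tag):
        if tag == "p":
            self._flush()

    def handle_data(self, data):
        if self._buf is not None:  # ignore text outside any <p>
            self._buf.append(data)

    def _flush(self):
        if self._buf is not None:
            self.paragraphs.append("".join(self._buf).strip())
            self._buf = None

    def close(self):
        super().close()  # process any buffered trailing text
        self._flush()    # end a <p> still open at end of input


def paragraphs(markup):
    collector = ParagraphCollector()
    collector.feed(markup)
    collector.close()
    return collector.paragraphs


def strict_parse_ok(markup):
    """Return True if a strict XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False
```

For input such as `<div><p>First &amp; foremost<p>Second <b>bold</b></p></div>`, `strict_parse_ok` returns False (the first `<p>` is never closed), while `paragraphs` still yields `["First & foremost", "Second bold"]`. A lazy regex like `<p>(.*?)</p>` would instead return a single match spanning both paragraphs, with the entity left undecoded.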

If you are using DomDocument, passing everything through Tidy first is
a good idea, so that the markup is well-formed before it is parsed.
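The "Tidy first" idea is to repair the markup before handing it to a strict parser. What follows is a minimal Python sketch of that pipeline. The hypothetical `Balancer` class is only a crude stand-in for Tidy, not a port of it: it closes an open `<p>` when a new one starts, drops unmatched end tags, self-closes a few void elements and closes whatever is still open at the end. The repaired string is then parsed strictly with `xml.etree.ElementTree`. `tidy_markup` and `paragraph_texts` are also illustrative names.

```python
from html import escape
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# Elements that never have content; emitted in self-closing XML form.
VOID_TAGS = {"br", "hr", "img", "input", "meta", "link"}


class Balancer(HTMLParser):
    """Crude Tidy stand-in: re-emit HTML as balanced, escaped XML markup."""

    def __init__(self):
        super().__init__(convert_charrefs=True)  # &nbsp; etc. become characters
        self.out = []    # pieces of the repaired markup
        self.stack = []  # currently open (non-void) elements

    def handle_starttag(self, tag, attrs):
        if tag == "p" and "p" in self.stack:
            self._close_until("p")  # as in HTML, a new <p> ends the open one
        attr_text = "".join(
            f' {name}="{escape(value or "", quote=True)}"' for name, value in attrs
        )
        if tag in VOID_TAGS:
            self.out.append(f"<{tag}{attr_text}/>")
        else:
            self.out.append(f"<{tag}{attr_text}>")
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag in self.stack:
            self._close_until(tag)  # also closes anything left open inside it
        # End tags with no matching open element are simply dropped.

    def handle_data(self, data):
        self.out.append(escape(data, quote=False))  # re-escape &, <, >

    def _close_until(self, tag):
        while self.stack:
            top = self.stack.pop()
            self.out.append(f"</{top}>")
            if top == tag:
                break

    def result(self):
        self.close()  # flush any buffered trailing text
        while self.stack:
            self.out.append(f"</{self.stack.pop()}>")
        return "".join(self.out)


def tidy_markup(markup):
    balancer = Balancer()
    balancer.feed(markup)
    return balancer.result()


def paragraph_texts(markup):
    # Wrap in a single root so fragments with several top-level elements parse.
    root = ET.fromstring(f"<root>{tidy_markup(markup)}</root>")
    return ["".join(p.itertext()).strip() for p in root.iter("p")]
```

Because `convert_charrefs=True` turns HTML entities such as `&nbsp;` into characters before the text is re-escaped, the repaired output contains no entities that XML leaves undefined, which is one of the common reasons strict parsers reject HTML.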

John Campbell
