breath-hyenas
User #317809   218 posts
Forum Regular

Having a few problems trying to figure out what to do.

For example if I have a .html file with the following code:

<table width="200" border="1">
<tr>
<td>Subject</td>
<td>Name </td>
<td>Time</td>
</tr>
<tr>
<td>Subject 1</td>
<td>Name 1</td>
<td>Time 1</td>
</tr>
<tr>
<td>Subject 2</td>
<td>Name 2</td>
<td>Time 3</td>
</tr>
</table>

In a PHP file, I would like to grab this table and put it into a handy SQL database. How would you go about this?

Please note the reasoning behind this: there is a form that submits to my uni course timetable search and returns a table of all the available class times. If I can get this into an SQL database, it would be awesome.

reference: whrl.pl/RccDYK
posted 2010-Mar-14, 4pm AEST
User #293650   177 posts
Forum Regular

Here is a great tutorial that I've used to do basically the same thing you want to do.

http://www.developertutorials.com/tutorials/php/scraping-links-with-php-8-01-05/page1.html

reference: whrl.pl/RccD1L
posted 2010-Mar-14, 4pm AEST
User #166340   934 posts
Whirlpool Enthusiast

Regular expressions, easy as pie. You don't need to futz around with some kind of plugin or crap like that.

It takes 10 minutes, about as long as it takes just to set up some DOM parser thing.


$str = '<table width="200" border="1">
<tr>
<td>Subject</td>
<td>Name </td>
<td>Time</td>
</tr>
<tr>
<td>Subject 1</td>
<td>Name 1</td>
<td>Time 1</td>
</tr>
<tr>
<td>Subject 2</td>
<td>Name 2</td>
<td>Time 3</td>
</tr>
</table>';
$pattern = "/<td>([^<]*)<\/td>\s<td>([^<]*)<\/td>\s<td>([^<]*)<\/td>\s/m";
$matches = array();
preg_match_all($pattern, $str, $matches);
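echo '<pre>';
// $matches[0] holds the full matches; $matches[1] to $matches[3] hold the three captured columns.
for ($i=1; $i<count($matches); $i++) {
    print_r($matches[$i]);
}
echo '</pre>';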
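
For comparison, here is a rough sketch of the DOM parser route mentioned above, using PHP's bundled DOMDocument class on the same sample table. It is only one way to do it, not what anyone in this thread actually ran, and the variable names are made up for illustration. It assumes the dom extension is enabled and that the first row is a header to be skipped.

```php
<?php
// Same sample table as in the post above.
$html = '<table width="200" border="1">
<tr>
<td>Subject</td>
<td>Name </td>
<td>Time</td>
</tr>
<tr>
<td>Subject 1</td>
<td>Name 1</td>
<td>Time 1</td>
</tr>
<tr>
<td>Subject 2</td>
<td>Name 2</td>
<td>Time 3</td>
</tr>
</table>';

// Keep libxml quiet if real-world markup turns out to be sloppy.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);

// Collect the text of every <td>, one array per <tr>.
$rows = array();
foreach ($doc->getElementsByTagName('tr') as $tr) {
    $cells = array();
    foreach ($tr->getElementsByTagName('td') as $td) {
        $cells[] = trim($td->textContent);
    }
    $rows[] = $cells;
}

// Assumption: the first row is the header (Subject / Name / Time).
array_shift($rows);

print_r($rows);
```

Unlike the regex, this does not care how the whitespace between the cells is laid out, which may matter if the real timetable page is formatted differently from the sample.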
reference: whrl.pl/RccEfG
posted 2010-Mar-14, 5pm AEST
edited 2010-Mar-14, 5pm AEST
User #44690   20646 posts
Whirlpool Forums Addict

ironheart writes...

$pattern = "/<td>([^<]*)<\/td>\s<td>([^<]*)<\/td>\s<td>([^<]*)<\/td>\s/m";

For those following along at home, the /m flag isn't necessary here.

/m only affects the behaviour of ^ and $ anchors. These are not present in this regular expression.

It seems to be a common misconception that /m is absolutely necessary for regular expressions to work over multiple lines. This isn't true. Regular expressions work just fine on multiple-line input. The /s and /m flags simply tune their behaviour on such input, modifying the . and ^/$ metacharacters respectively.
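
A small sketch to illustrate the point about the flags. The sample string and variable names are made up for the demonstration.

```php
<?php
$text = "first line\nsecond line";

// No flags at all: the pattern still matches across the line break,
// because \s happily matches "\n".
$acrossLines = preg_match('/line\ssecond/', $text);

// /m changes ^ (and $): they may match at every line start/end,
// not only at the start/end of the whole string.
$caretPlain = preg_match_all('/^\w+/', $text, $a);
$caretMulti = preg_match_all('/^\w+/m', $text, $b);

// /s changes . so that it also matches a newline.
$dotPlain = preg_match('/first.*second/', $text);
$dotAll   = preg_match('/first.*second/s', $text);

var_dump($acrossLines, $caretPlain, $caretMulti, $dotPlain, $dotAll);
// int(1) int(1) int(2) int(0) int(1)
```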

As an aside, I'd use different delimiters anyway, so that some of the backslashes weren't necessary in the middle of the regex. Also I'd use single quotes — it's only coincidence that PHP doesn't have its own \s double-quoted-string escape.

$pattern = '#<td>([^<]*)</td>\s<td>([^<]*)</td>\s<td>([^<]*)</td>\s#m';
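
As a quick check, the sketch below runs the original slash-delimited pattern, the hash-delimited one, and a hash-delimited one with the /m flag dropped against a shortened sample row. The sample string and variable names are only for illustration.

```php
<?php
// One data row from the sample table, with plain "\n" line endings.
$str = "<tr>\n<td>Subject 1</td>\n<td>Name 1</td>\n<td>Time 1</td>\n</tr>\n";

// Original: slash delimiters, double quotes, escaped slashes.
$slash = "/<td>([^<]*)<\/td>\s<td>([^<]*)<\/td>\s<td>([^<]*)<\/td>\s/m";
// Hash delimiters, single quotes, no escaping needed for the slashes.
$hash = '#<td>([^<]*)</td>\s<td>([^<]*)</td>\s<td>([^<]*)</td>\s#m';
// Same again without the /m flag.
$noFlag = '#<td>([^<]*)</td>\s<td>([^<]*)</td>\s<td>([^<]*)</td>\s#';

preg_match_all($slash, $str, $m1);
preg_match_all($hash, $str, $m2);
preg_match_all($noFlag, $str, $m3);

// All three patterns capture exactly the same thing.
var_dump($m1 === $m2, $m2 === $m3);
print_r(array($m1[1][0], $m1[2][0], $m1[3][0]));
```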
reference: whrl.pl/RccEqw
posted 2010-Mar-14, 6pm AEST
edited 2010-Mar-14, 6pm AEST
User #166340   934 posts
Whirlpool Enthusiast

Foonly writes...

/m only affects the behaviour of the ^ and $ anchors. These are not present in this regular expression.

I did it out of habit, but you're right, they're not needed. Are there notable inefficiencies in using the /m flag when it's not needed?

As an aside, I'd use different delimiters

So THAT's what the hashes mean. Research time!

reference: whrl.pl/RccEtc
posted 2010-Mar-14, 6pm AEST
User #44690   20646 posts
Whirlpool Forums Addict

ironheart writes...

I did it out of habit, but you're right, they're not needed. Are there notable inefficiencies in using the /m flag when it's not needed?

No, there shouldn't be. The flag needn't even be considered unless ^ or $ is present.

So THAT's what the hashes mean. Research time!

I could've used just about anything:

/.../
#...#
~...~
(...)
{...}
<...>

are all valid delimiter pairs. Note that in those final three cases the delimiters are balanced... not only do they point in opposite directions, but you must also balance them inside the regex:

preg_match('< foo> >', $subject);   # error
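
A short sketch of how the delimiter rules play out in practice. The subject strings are invented, and the @ operator is only there to keep the warnings out of the output. Note that PHP, unlike Perl, rejects letters and digits as delimiters.

```php
<?php
// Bracket-style delimiters nest: the inner {2,3} is balanced,
// so the final } is the one that ends the pattern.
preg_match('{\d{2,3}}', 'room 1234', $m);

// Unbalanced: the first > closes the pattern, and whatever follows is
// read as modifiers. PHP warns and preg_match() returns false.
$unbalanced = @preg_match('< foo> >', 'x foo> y');

// Alphanumeric delimiters are rejected outright in PHP.
$alpha = @preg_match('xfoox', 'foo');

var_dump($m[0], $unbalanced, $alpha);
// string(3) "123"  bool(false)  bool(false)
```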

This actually made sense in Perl, since regexes there are part of the syntax, so the parser needs to work out where the end of the regex is. It makes far less sense in PHP where regexes are just strings.

I can't for the life of me fathom why the PHP developers required delimiters at all... couldn't they have wrapped the PCRE library in some kind of nicer interface?! No... that would have been both smart and nice to coders, two things the PHP developers don't seem very inclined to be.

reference: whrl.pl/RccEtR
posted 2010-Mar-14, 6pm AEST
edited 2010-Mar-14, 7pm AEST
User #166340   934 posts
Whirlpool Enthusiast

Foonly writes...

I could've used just about anything:

Interesting. I'd only ever seen regular expressions using the forward slash, like how I've done it. It's only recently that I've seen one with hashes, and I didn't understand why they were used, nor did it cross my mind what they were even named.

reference: whrl.pl/RccEvD
posted 2010-Mar-14, 6pm AEST
User #317809   218 posts
Forum Regular

ironheart writes...

for ($i=1; $i<count($matches); $i++) {
print_r($matches[$i]);
}

Exactly what I needed. Cheers.
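
The thread never gets to the "put it in a SQL db" half of the original question, so here is a minimal sketch of that step. Everything database-related is an assumption: it uses PDO with an in-memory SQLite database (needs the pdo_sqlite extension), and the table name timetable and its column names are hypothetical. For MySQL or anything else, the DSN and credentials would change but the prepared-statement pattern should stay the same.

```php
<?php
// Rows as they might come out of the scraping step, header row already dropped.
$rows = array(
    array('Subject 1', 'Name 1', 'Time 1'),
    array('Subject 2', 'Name 2', 'Time 3'),
);

// Assumption: in-memory SQLite, purely for demonstration.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

// Hypothetical table layout matching the three scraped columns.
$db->exec('CREATE TABLE timetable (subject TEXT, name TEXT, class_time TEXT)');

// A prepared statement keeps scraped text from being interpreted as SQL.
$insert = $db->prepare(
    'INSERT INTO timetable (subject, name, class_time) VALUES (?, ?, ?)'
);
foreach ($rows as $row) {
    $insert->execute($row);
}

$count = (int) $db->query('SELECT COUNT(*) FROM timetable')->fetchColumn();
echo $count . " rows stored\n";
```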

reference: whrl.pl/RccF7X
posted 2010-Mar-15, 8am AEST