The first step of a long march: start by scraping the problem list. The code is really ugly, but stay calm, stay calm.
```php
<?php
header("Content-type: text/html; charset=utf-8");

// Fetch one page of the NYOJ problem list and extract pid, difficulty and title
function getProblem($page)
{
    // Initialize a cURL handle
    $curl = curl_init();
    // The URL to scrape
    curl_setopt($curl, CURLOPT_URL, "http://acm.nyist.net/JudgeOnline/problemset.php?page=$page");
    // Do not include the response headers in the returned body
    curl_setopt($curl, CURLOPT_HEADER, false);
    // Return the response as a string instead of printing it
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
    // Request headers belong in CURLOPT_HTTPHEADER (CURLOPT_HEADER only takes a boolean)
    curl_setopt($curl, CURLOPT_HTTPHEADER, array(
        "Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
        "Accept-Language: zh-CN,zh;q=0.8,en-US;q=0.5,en;q=0.3",
    ));
    curl_setopt($curl, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:45.0) Gecko/20100101 Firefox/45.0');
    // Run the request
    $data = curl_exec($curl);
    // Close the handle
    curl_close($curl);

    $res = array();
    if ($data === false) {
        return $res; // request failed
    }
    $tbodyStart = stripos($data, "<TBODY>");
    if ($tbodyStart === false) {
        return $res; // no problem table on this page
    }
    $tbodyEnd = stripos($data, "</TBODY>", $tbodyStart);
    $start = $tbodyStart;
    for ($i = 0; $i < 100; $i++) {
        // Skip the first cell of the row; the second cell holds the pid
        $tdStart1 = stripos($data, "<TD>", $start);
        if ($tdStart1 === false) {
            break;
        }
        $tdStart = stripos($data, "<TD>", $tdStart1 + 4);
        $tdEnd = stripos($data, "</TD>", $tdStart);
        $pid = substr($data, $tdStart + 4, $tdEnd - $tdStart - 4);
        // The third cell holds the difficulty
        $tdStart2 = stripos($data, "<TD>", $tdEnd + 4);
        $tdEnd2 = stripos($data, "</TD>", $tdStart2);
        $difficult = substr($data, $tdStart2 + 4, $tdEnd2 - $tdStart2 - 4);
        // The title is the text of the next link: <a ...">title</a>
        $aStart = stripos($data, "<a", $tdEnd2);
        $titleStart = stripos($data, "\">", $aStart);
        $titleEnd = stripos($data, "</a>", $titleStart);
        $title = substr($data, $titleStart + 2, $titleEnd - $titleStart - 2);
        $trEnd = stripos($data, "</TR>", $titleEnd);
        // Stop once we have walked past the end of the table body
        if ($trEnd === false || $trEnd > $tbodyEnd) {
            break;
        }
        $res[$pid] = array('pid' => $pid, 'difficult' => $difficult, 'title' => $title);
        $start = $trEnd + 5; // strlen("</TR>")
    }
    return $res;
}

// Insert one problem into the nyojproblem table
function saveProblem($pid, $difficult, $title)
{
    // Reuse one connection across calls instead of reconnecting for every row
    static $dbh = null;
    $dbms = 'mysql';
    $host = 'localhost'; // database host
    $dbname = 'test';    // database to use
    $user = 'jtahstu';   // database user
    $pass = 'jtahstu';   // password for that user
    $dsn = "$dbms:host=$host;dbname=$dbname";
    try {
        if ($dbh === null) {
            // Not a persistent connection by default; for one, pass a fourth argument:
            // $dbh = new PDO($dsn, $user, $pass, array(PDO::ATTR_PERSISTENT => true));
            $dbh = new PDO($dsn, $user, $pass);
            $dbh->exec("set names utf8");
        }
        $stmt = $dbh->prepare("insert into nyojproblem values(?,?,?)");
        $stmt->bindParam(1, $pid);
        $stmt->bindParam(2, $difficult);
        $stmt->bindParam(3, $title);
        return $stmt->execute();
    } catch (PDOException $e) {
        die("Error!: " . $e->getMessage() . "<br/>");
    }
}

$page = 12; // number of list pages to scrape
for ($i = 1; $i <= $page; $i++) {
    $res = getProblem($i);
    foreach ($res as $pid => $value) {
        // Only report rows that were actually inserted
        if (saveProblem($value['pid'], $value['difficult'], $value['title'])) {
            echo $pid . " ok<br>";
        }
    }
}
// var_dump(getProblem(5));
?>
```
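The offset hunting in getProblem() only matches cells written exactly as `<TD>`, so it breaks as soon as a cell gains an attribute (for example `<TD align="center">`). A minimal sketch of the same extraction with PHP's DOM extension follows. It assumes the row layout implied by the offsets in getProblem() (first cell skipped, pid in the second cell, difficulty in the third, title as the link text in the fourth cell) and that the page is UTF-8; the function name parseProblemRows is hypothetical.

```php
<?php
// Hypothetical DOM-based replacement for the stripos() parsing in getProblem().
// ASSUMED row layout (inferred from the original offsets, not verified):
// <tr><td>?</td><td>pid</td><td>difficulty</td><td><a ...>title</a></td>...</tr>
function parseProblemRows(string $html): array
{
    $doc = new DOMDocument();
    // Real-world pages are rarely valid HTML; collect libxml warnings silently.
    $previous = libxml_use_internal_errors(true);
    // The meta hint asks libxml to decode the input as UTF-8 (assumed page encoding).
    $doc->loadHTML('<meta http-equiv="Content-Type" content="text/html; charset=utf-8">' . $html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $res = array();
    // The HTML parser lowercases tag names, so this also matches <TBODY><TR>.
    foreach ($xpath->query('//tbody/tr') as $tr) {
        $cells = $xpath->query('td', $tr);
        if ($cells->length < 4) {
            continue; // not a problem row
        }
        $pid = trim($cells->item(1)->textContent);
        $difficult = trim($cells->item(2)->textContent);
        // Title = text of the first link in the fourth cell (assumption).
        $link = $xpath->query('.//a', $cells->item(3))->item(0);
        $title = $link !== null ? trim($link->textContent) : '';
        $res[$pid] = array('pid' => $pid, 'difficult' => $difficult, 'title' => $title);
    }
    return $res;
}
```

If the assumed layout holds, everything in getProblem() after the curl_exec() failure check could become `return parseProblemRows($data);`, and attributes, whitespace or quoting style inside the cells no longer matter.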
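Running it stores the results in a MySQL database. Well, I have since dropped the table, so its exact schema is gone; it had just three fields: pid, difficult and title. The code is rather ugly. Written on March 31.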
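Since the table definition is lost, here is a guess at a schema that fits the positional `insert into nyojproblem values(?,?,?)` in saveProblem(): the column order pid, difficult, title comes from the post, while the column types (INT, INT, VARCHAR(255)) and the primary key on pid are assumptions. The hypothetical saveProblems() inserts a whole page from getProblem() in one transaction with a single prepared statement; it expects a PDO handle in exception error mode.

```php
<?php
// ASSUMED schema: only the three column names and their order come from the post.
// Plain SQL that MySQL and SQLite both accept.
function createProblemTable(PDO $dbh): void
{
    $dbh->exec('CREATE TABLE IF NOT EXISTS nyojproblem (
        pid INT NOT NULL PRIMARY KEY,
        difficult INT NOT NULL,
        title VARCHAR(255) NOT NULL
    )');
}

// Hypothetical batch insert: one prepared statement, one transaction per page.
// $rows has the shape returned by getProblem(): pid => [pid, difficult, title].
// Requires PDO::ERRMODE_EXCEPTION so a failed insert reaches the catch block.
function saveProblems(PDO $dbh, array $rows): int
{
    $stmt = $dbh->prepare('INSERT INTO nyojproblem (pid, difficult, title) VALUES (?, ?, ?)');
    $dbh->beginTransaction();
    try {
        foreach ($rows as $row) {
            $stmt->execute(array($row['pid'], $row['difficult'], $row['title']));
        }
        $dbh->commit();
    } catch (PDOException $e) {
        // Undo the partial page, then let the caller decide what to do.
        $dbh->rollBack();
        throw $e;
    }
    return count($rows);
}
```

With pid as the primary key, a second run against the same table fails on the first duplicate pid, and the transaction rolls back that whole page rather than leaving it half inserted.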
---
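This article is licensed under the Creative Commons Attribution 2.5 China Mainland License. Reposts must credit the author and link to this article.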