jtahstu's Blog


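AC Automaton, Part 1: Scraping the Nanyang OJ Problem List (PHP Version)

The first step of a long march: start by scraping the problem list. The code is honestly ugly, so bear with it.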

<?php
header("Content-type: text/html; charset=utf-8");
function getProblem($page) {
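 // Initialize a cURL handle
 $curl = curl_init();
 // Set the URL to fetch
 curl_setopt($curl, CURLOPT_URL, "http://acm.nyist.net/JudgeOnline/problemset.php?page=$page");
 // Leave the response headers out of the returned string; only the body is parsed
 curl_setopt($curl, CURLOPT_HEADER, 0);
 // Return the response as a string instead of printing it
 curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
 // Set the request headers (this takes CURLOPT_HTTPHEADER, not CURLOPT_HEADER)
 curl_setopt($curl, CURLOPT_HTTPHEADER, array("Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8", "Accept-Language: zh-CN,zh;q=0.8,en-US;q=0.5,en;q=0.3"));
 curl_setopt($curl, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:45.0) Gecko/20100101 Firefox/45.0');
 // Run cURL and request the page
 $data = curl_exec($curl);
 // Close the cURL handle
 curl_close($curl);
 // curl_exec() returns false on failure; return no rows in that case
 if ($data === false) {
  return array();
 }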
 $res = array();
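 // Locate the table body that holds the problem rows
 $tbodyStart = stripos($data, "<TBODY>");
 if ($tbodyStart === false) {
  return $res;
 }
 $tbodyEnd = stripos($data, "</TBODY>", $tbodyStart);
 if ($tbodyEnd === false) {
  $tbodyEnd = strlen($data);
 }
 $start = $tbodyStart;
 for ($i = 0; $i < 100; $i++) {
  // Skip the first <TD> of the row; the second one holds the problem ID
  $tdStart1 = stripos($data, "<TD>", $start);
  if ($tdStart1 === false || $tdStart1 > $tbodyEnd) {
   break;
  }
  $tdStart = stripos($data, "<TD>", $tdStart1 + 4);
  $tdEnd = stripos($data, "</TD>", $tdStart);
  $pid = substr($data, $tdStart + 4, $tdEnd - $tdStart - 4);
  // The next <TD> holds the difficulty
  $tdStart2 = stripos($data, "<TD>", $tdEnd + 4);
  $tdEnd2 = stripos($data, "</TD>", $tdStart2);
  $difficult = substr($data, $tdStart2 + 4, $tdEnd2 - $tdStart2 - 4);
  // The title is the text of the next <a> element
  $aStart = stripos($data, "<a", $tdEnd2);
  $titleStart = stripos($data, "\">", $aStart);
  $titleEnd = stripos($data, "</a>", $titleStart);
  $title = substr($data, $titleStart + 2, $titleEnd - $titleStart - 2);
  $trEnd = stripos($data, "</TR>", $titleEnd);
  // Stop when no closing </TR> is found or the row lies outside the table body
  if ($trEnd === false || $trEnd > $tbodyEnd) {
   break;
  }
  $res[$pid] = array('pid' => $pid, 'difficult' => $difficult, 'title' => $title);
  // Continue after the closing </TR> (5 characters)
  $start = $trEnd + 5;
 }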
 return $res;
}
function saveProblem($pid, $difficult, $title) {
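 $dbms = 'mysql';
 // Database host name
 $host = 'localhost';
 // Database to use
 $dbname = 'test';
 // User name for the database connection
 $user = 'jtahstu';
 // Password for that user
 $pass = 'jtahstu';
 $dsn = "$dbms:host=$host;dbname=$dbname";
 try {
  $dbh = new PDO($dsn, $user, $pass);
  // The connection is not persistent by default. For a persistent connection, pass array(PDO::ATTR_PERSISTENT => true) as a fourth argument:
  // $dbh = new PDO($dsn, $user, $pass, array(PDO::ATTR_PERSISTENT => true));
  // Use UTF-8 on the connection so that Chinese titles are stored correctly
  $dbh -> exec("set names utf8");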
  $stmt = $dbh -> prepare("insert into nyojproblem values(?,?,?);");
  $stmt -> bindParam(1, $pid);
  $stmt -> bindParam(2, $difficult);
  $stmt -> bindParam(3, $title);
  $res = $stmt -> execute();
  return $res;
 } catch (PDOException $e) {
  die("Error!: " . $e -> getMessage() . "<br/>");
 }
}
// Number of problem-list pages to scrape
$page = 12;
for ($i = 1; $i <= $page; $i++) {
 $res = getProblem($i);
 foreach ($res as $pid => $value) {
  // Print "ok" only when the row was actually saved
  if (saveProblem($value['pid'], $value['difficult'], $value['title'])) {
   echo $pid . "ok<br>";
  }
 }
}
// var_dump(getProblem(5));
?>
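
The offset arithmetic with stripos() in getProblem is easy to break when the markup shifts. As a rough sketch of one alternative, the same three fields could be read with PHP's DOM extension. The function name parseProblemRows and the sample markup are hypothetical; the row layout (an ignored first cell, then the ID, then the difficulty, then a link with the title) is only inferred from the order in which the code above reads the cells, not taken from the real page.

```php
<?php
// Sketch: extract (pid, difficult, title) rows with DOMDocument instead of
// manual stripos() offsets. Requires the dom extension.

function parseProblemRows(string $html): array
{
    $doc = new DOMDocument();
    // Collect parser warnings internally instead of printing them.
    libxml_use_internal_errors(true);
    // The prefix hints that the input is UTF-8, so non-ASCII titles decode correctly.
    $doc->loadHTML('<?xml encoding="UTF-8">' . $html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $rows = array();
    // The HTML parser lowercases tag names, so <TBODY>/<TR>/<TD> match here.
    foreach ($xpath->query('//tbody/tr') as $tr) {
        $cells = $xpath->query('./td', $tr);
        $link = $xpath->query('.//a', $tr)->item(0);
        // Need at least three cells (ignored, pid, difficulty) and a title link.
        if ($cells->length < 3 || $link === null) {
            continue;
        }
        $pid = trim($cells->item(1)->textContent);
        $rows[$pid] = array(
            'pid' => $pid,
            'difficult' => trim($cells->item(2)->textContent),
            'title' => trim($link->textContent),
        );
    }
    return $rows;
}

// Hypothetical sample input, not copied from the real site.
$sample = '<table><TBODY>'
    . '<TR><TD></TD><TD>1</TD><TD>1</TD><TD><a href="problem.php?pid=1">Sample Problem A</a></TD></TR>'
    . '<TR><TD></TD><TD>2</TD><TD>3</TD><TD><a href="problem.php?pid=2">示例题目 B</a></TD></TR>'
    . '</TBODY></table>';
$parsed = parseProblemRows($sample);
```

The result has the same shape as the array returned by getProblem (keyed by pid), so in principle it could be passed to the same saving loop. Whether the XPath expressions match the live page would need to be checked against its actual HTML.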

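After running, the results are stored in a MySQL database. Er, I have already dropped the table, so the schema is gone; it had just three columns: pid, difficult, and title. The code is rather ugly. Written on March 31.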
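
Since the original schema is lost, here is one guess at how a three-column table could be recreated and filled over a single connection. The column types (integers for pid and difficult, a 255-character string for title) and the helper names createProblemTable and saveProblems are assumptions, not the original design. Unlike saveProblem, which opens a new connection for every row, this sketch prepares the statement once and reuses it.

```php
<?php
// Sketch: recreate a three-column table and save many rows over one
// connection. Column types are assumptions; the original schema is lost.

function createProblemTable(PDO $dbh): void
{
    $dbh->exec(
        "CREATE TABLE IF NOT EXISTS nyojproblem (
            pid INT NOT NULL PRIMARY KEY,
            difficult INT NOT NULL,
            title VARCHAR(255) NOT NULL
        )"
    );
}

function saveProblems(PDO $dbh, array $rows): int
{
    // Prepare once, then execute once per row.
    $stmt = $dbh->prepare(
        "INSERT INTO nyojproblem (pid, difficult, title) VALUES (?, ?, ?)"
    );
    $saved = 0;
    foreach ($rows as $row) {
        // execute() returns true on success; with PDO::ERRMODE_EXCEPTION
        // a failure throws a PDOException instead of returning false.
        if ($stmt->execute(array($row['pid'], $row['difficult'], $row['title']))) {
            $saved++;
        }
    }
    return $saved;
}
```

With a $dbh opened as in saveProblem, this might be used as createProblemTable($dbh) followed by saveProblems($dbh, getProblem(1)). Because pid is declared as a primary key here, running the scraper a second time would fail on duplicate IDs unless the table is emptied first.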

---

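This article is licensed under the Creative Commons Attribution 2.5 China Mainland license. You are welcome to repost it, adapt it, or use it commercially.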
