Scrapy's basic architecture:

## Did Scrapy “steal” X from Django?

Probably, but we don’t like that word. We think Django is a great open source project and an example to follow, so we’ve used it as an inspiration for Scrapy.

We believe that, if something is already done well, there’s no need to reinvent it. This concept, besides being one of the foundations for open source and free software, not only applies to software but also to documentation, procedures, policies, etc. So, instead of going through each problem ourselves, we choose to copy ideas from those projects that have already solved them properly, and focus on the real problems we need to solve.

We’d be proud if Scrapy serves as an inspiration for other projects. Feel free to steal from us!

## 3. Eager to Try

### Getting the settings configuration

os.environ is the system's environment-variable dictionary:
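For example, Scrapy locates the project's settings module by reading the SCRAPY_SETTINGS_MODULE environment variable out of this very dictionary (the module path below is a placeholder project name):

```python
import os

# Scrapy reads the dotted path of the project's settings module
# from this environment variable (see scrapy.utils.project).
# "myproject.settings" is just a placeholder here.
os.environ["SCRAPY_SETTINGS_MODULE"] = "myproject.settings"

# os.environ behaves like an ordinary dict of strings:
settings_module = os.environ.get("SCRAPY_SETTINGS_MODULE")
print(settings_module)  # myproject.settings
```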

## 4. Things Become Clear

### Initializing the Scraper

What is a Scraper? It doesn't seem to have come up before. With that question in mind, let's first look at the Scraper's initialization:

### Initializing the scheduler

~~What looks odd here is that both the disk queue and the memory queue are LIFO. That is stack behavior, yet they are still called queues.~~ In fact, these defaults can be changed to FIFO queues. The scrapy.squeues module (scrapy/squeues.py) defines a number of queue classes, and you can pick FIFO ones for the scheduler:
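For example, pointing the scheduler at the FIFO variants in your project's settings.py turns the crawl from depth-first into breadth-first order (these are the documented Scrapy settings; check your version's scrapy/squeues.py for the exact class names):

```python
# settings.py: replace the default LIFO queues with FIFO ones,
# which makes Scrapy process requests in breadth-first order.
DEPTH_PRIORITY = 1
SCHEDULER_DISK_QUEUE = "scrapy.squeues.PickleFifoDiskQueue"
SCHEDULER_MEMORY_QUEUE = "scrapy.squeues.FifoMemoryQueue"
```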

## 5. Handling It with Ease

### Initializing the Slot

The schedule method does not execute anything immediately; it merely registers the call via the self.heartbeat attribute.
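This "register now, run on the next loop turn" pattern can be sketched as follows. The CallLaterOnce class mirrors the shape of scrapy.utils.reactor.CallLaterOnce; FakeReactor is a stand-in for Twisted's reactor so the sketch runs on its own:

```python
class FakeReactor:
    """Stand-in for Twisted's reactor so this sketch is self-contained."""

    def __init__(self):
        self.pending = []

    def callLater(self, _delay, func):
        # Queue the call; a real reactor would run it on the next loop turn.
        self.pending.append(func)

    def run_once(self):
        pending, self.pending = self.pending, []
        for func in pending:
            func()


class CallLaterOnce:
    """Coalesces repeated schedule() calls into a single execution per
    loop turn, the same shape as scrapy.utils.reactor.CallLaterOnce."""

    def __init__(self, reactor, func):
        self.reactor = reactor
        self.func = func
        self.scheduled = False

    def schedule(self):
        # Nothing runs here: the call is only registered.
        if not self.scheduled:
            self.scheduled = True
            self.reactor.callLater(0, self)

    def __call__(self):
        self.scheduled = False
        self.func()


ticks = []
reactor = FakeReactor()
nextcall = CallLaterOnce(reactor, lambda: ticks.append("tick"))
nextcall.schedule()
nextcall.schedule()  # coalesced: still only one pending call
reactor.run_once()
print(ticks)  # ['tick']
```

However many times schedule() is called before the loop turns, the function runs only once; that is exactly why the engine can call schedule liberally without flooding the event loop.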

### Enqueueing requests

- In the first case, a Request is returned, and crawl is called again; this was covered earlier when we discussed fetching the next request from the scheduler.
- In the second case, a Response or a Failure is returned, and scraper.enqueue_scrape is called.

In the Scraper class's _scrape method, after _scrape2 is called, two callbacks are attached: handle_spider_error and handle_spider_output. The first handles errors, and the second handles the spider's output.