Sunday, 26 April 2020

How to install Scrapy on Ubuntu

$ which python3
/usr/bin/python3

$ python3 --version
Python 3.6.9

$ virtualenv --python=python3.6 venv
Running virtualenv with interpreter /usr/bin/python3.6
Already using interpreter /usr/bin/python3.6
Using base prefix '/usr'
New python executable in /home/bojan/dev/github/scrapy-demo/venv/bin/python3.6
Also creating executable in /home/bojan/dev/github/scrapy-demo/venv/bin/python
Installing setuptools, pip, wheel...
done.

$ source venv/bin/activate
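
The activation step can also be checked from Python itself. Below is a minimal sketch, not part of the original session: the helper name in_virtualenv is hypothetical, and it assumes that legacy virtualenv releases set sys.real_prefix, while the standard venv module and newer virtualenv releases make sys.prefix differ from sys.base_prefix.

```python
import sys


def in_virtualenv() -> bool:
    """Return True if the running interpreter appears to live in a virtual environment."""
    # Assumption: legacy virtualenv releases expose sys.real_prefix.
    if hasattr(sys, "real_prefix"):
        return True
    # venv and newer virtualenv releases point sys.prefix at the environment
    # and keep the system location in sys.base_prefix.
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


if __name__ == "__main__":
    print("virtual environment active:", in_virtualenv())
    print("interpreter:", sys.executable)
```

Run with the venv active, the interpreter path should point into venv/bin rather than /usr/bin.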

(venv) $ pip install scrapy
Collecting scrapy
  Downloading Scrapy-2.1.0-py2.py3-none-any.whl (239 kB)
Collecting service-identity>=16.0.0
  Downloading service_identity-18.1.0-py2.py3-none-any.whl (11 kB)
Collecting cryptography>=2.0
  Downloading cryptography-2.9.2-cp35-abi3-manylinux2010_x86_64.whl (2.7 MB)
Collecting cssselect>=0.9.1
  Downloading cssselect-1.1.0-py2.py3-none-any.whl (16 kB)
Collecting w3lib>=1.17.0
  Downloading w3lib-1.21.0-py2.py3-none-any.whl (20 kB)
Collecting pyOpenSSL>=16.2.0
  Downloading pyOpenSSL-19.1.0-py2.py3-none-any.whl (53 kB)
Collecting zope.interface>=4.1.3
  Downloading zope.interface-5.1.0-cp36-cp36m-manylinux2010_x86_64.whl (234 kB)
Collecting protego>=0.1.15
  Downloading Protego-0.1.16.tar.gz (3.2 MB)
Collecting parsel>=1.5.0
  Downloading parsel-1.5.2-py2.py3-none-any.whl (12 kB)
Collecting queuelib>=1.4.2
  Downloading queuelib-1.5.0-py2.py3-none-any.whl (13 kB)
Collecting lxml>=3.5.0
  Downloading lxml-4.5.0-cp36-cp36m-manylinux1_x86_64.whl (5.8 MB)
Collecting PyDispatcher>=2.0.5
  Downloading PyDispatcher-2.0.5.tar.gz (34 kB)
Collecting Twisted>=17.9.0
  Downloading Twisted-20.3.0-cp36-cp36m-manylinux1_x86_64.whl (3.1 MB)
Collecting pyasn1-modules
  Using cached pyasn1_modules-0.2.8-py2.py3-none-any.whl (155 kB)
Collecting pyasn1
  Using cached pyasn1-0.4.8-py2.py3-none-any.whl (77 kB)
Collecting attrs>=16.0.0
  Using cached attrs-19.3.0-py2.py3-none-any.whl (39 kB)
Collecting six>=1.4.1
  Using cached six-1.14.0-py2.py3-none-any.whl (10 kB)
Collecting cffi!=1.11.3,>=1.8
  Downloading cffi-1.14.0-cp36-cp36m-manylinux1_x86_64.whl (399 kB)
Requirement already satisfied: setuptools in ./venv/lib/python3.6/site-packages (from zope.interface>=4.1.3->scrapy) (46.1.3)
Collecting incremental>=16.10.1
  Downloading incremental-17.5.0-py2.py3-none-any.whl (16 kB)
Collecting Automat>=0.3.0
  Downloading Automat-20.2.0-py2.py3-none-any.whl (31 kB)
Collecting hyperlink>=17.1.1
  Downloading hyperlink-19.0.0-py2.py3-none-any.whl (38 kB)
Collecting PyHamcrest!=1.10.0,>=1.9.0
  Downloading PyHamcrest-2.0.2-py3-none-any.whl (52 kB)
Collecting constantly>=15.1
  Downloading constantly-15.1.0-py2.py3-none-any.whl (7.9 kB)
Collecting pycparser
  Downloading pycparser-2.20-py2.py3-none-any.whl (112 kB)
Collecting idna>=2.5
  Using cached idna-2.9-py2.py3-none-any.whl (58 kB)
Building wheels for collected packages: protego, PyDispatcher
  Building wheel for protego (setup.py) ... done
  Created wheel for protego: filename=Protego-0.1.16-py3-none-any.whl size=7765 sha256=7696d4fa63732f3509349e932b23800b7dacc6034fc150eaa3f4448fa401c6aa
  Stored in directory: /home/bojan/.cache/pip/wheels/b2/74/25/517a0ec6186297704db56664268e72686f5cfa8ab398582f33
  Building wheel for PyDispatcher (setup.py) ... done
  Created wheel for PyDispatcher: filename=PyDispatcher-2.0.5-py3-none-any.whl size=11515 sha256=17b011eb905d7eccda1eaafb51833fbe7a1b8406fb4a7764e7216ef19fafd698
  Stored in directory: /home/bojan/.cache/pip/wheels/28/db/61/691c759da06ba9b86da079bdd17cb3e01828d49d5c152cb3af
Successfully built protego PyDispatcher
Installing collected packages: six, pycparser, cffi, cryptography, pyasn1, pyasn1-modules, attrs, service-identity, cssselect, w3lib, pyOpenSSL, zope.interface, protego, lxml, parsel, queuelib, PyDispatcher, incremental, Automat, idna, hyperlink, PyHamcrest, constantly, Twisted, scrapy
Successfully installed Automat-20.2.0 PyDispatcher-2.0.5 PyHamcrest-2.0.2 Twisted-20.3.0 attrs-19.3.0 cffi-1.14.0 constantly-15.1.0 cryptography-2.9.2 cssselect-1.1.0 hyperlink-19.0.0 idna-2.9 incremental-17.5.0 lxml-4.5.0 parsel-1.5.2 protego-0.1.16 pyOpenSSL-19.1.0 pyasn1-0.4.8 pyasn1-modules-0.2.8 pycparser-2.20 queuelib-1.5.0 scrapy-2.1.0 service-identity-18.1.0 six-1.14.0 w3lib-1.21.0 zope.interface-5.1.0

(venv) $ pip list --local
Package          Version
---------------- -------
attrs            19.3.0 
Automat          20.2.0 
cffi             1.14.0 
constantly       15.1.0 
cryptography     2.9.2  
cssselect        1.1.0  
hyperlink        19.0.0 
idna             2.9    
incremental      17.5.0 
lxml             4.5.0  
parsel           1.5.2  
pip              20.0.2 
Protego          0.1.16 
pyasn1           0.4.8  
pyasn1-modules   0.2.8  
pycparser        2.20   
PyDispatcher     2.0.5  
PyHamcrest       2.0.2  
pyOpenSSL        19.1.0 
queuelib         1.5.0  
Scrapy           2.1.0  
service-identity 18.1.0 
setuptools       46.1.3 
six              1.14.0 
Twisted          20.3.0 
w3lib            1.21.0 
wheel            0.34.2 
zope.interface   5.1.0  

(venv) $ pip freeze
attrs==19.3.0
Automat==20.2.0
cffi==1.14.0
constantly==15.1.0
cryptography==2.9.2
cssselect==1.1.0
hyperlink==19.0.0
idna==2.9
incremental==17.5.0
lxml==4.5.0
parsel==1.5.2
Protego==0.1.16
pyasn1==0.4.8
pyasn1-modules==0.2.8
pycparser==2.20
PyDispatcher==2.0.5
PyHamcrest==2.0.2
pyOpenSSL==19.1.0
queuelib==1.5.0
Scrapy==2.1.0
service-identity==18.1.0
six==1.14.0
Twisted==20.3.0
w3lib==1.21.0
zope.interface==5.1.0
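
If the pip freeze output is saved (for example to a requirements file), the pins can be compared later to see what changed after an upgrade. The following is only a sketch under stated assumptions: parse_freeze and changed_pins are hypothetical helper names, and the parser handles plain name==version lines only, skipping everything else.

```python
from typing import Dict, Optional, Tuple


def parse_freeze(text: str) -> Dict[str, str]:
    """Map lower-cased package names to pinned versions from `pip freeze` output."""
    pins = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blanks, comments, and anything that is not a plain name==version pin.
        if not line or line.startswith("#") or "==" not in line:
            continue
        name, version = line.split("==", 1)
        pins[name.strip().lower()] = version.strip()
    return pins


def changed_pins(
    old: Dict[str, str], new: Dict[str, str]
) -> Dict[str, Tuple[Optional[str], Optional[str]]]:
    """Return {name: (old_version, new_version)} for pins that differ; None marks absence."""
    names = sorted(set(old) | set(new))
    return {n: (old.get(n), new.get(n)) for n in names if old.get(n) != new.get(n)}


# Three lines copied from the freeze output above.
sample = """\
Scrapy==2.1.0
Twisted==20.3.0
lxml==4.5.0
"""
pins = parse_freeze(sample)
print(pins["scrapy"])  # 2.1.0
```

Names are lower-cased because pip treats package names case-insensitively, so Scrapy and scrapy end up under the same key.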


Verification:

(venv) $ scrapy
Scrapy 2.1.0 - no active project

Usage:
  scrapy <command> [options] [args]

Available commands:
  bench         Run quick benchmark test
  fetch         Fetch a URL using the Scrapy downloader
  genspider     Generate new spider using pre-defined templates
  runspider     Run a self-contained spider (without creating a project)
  settings      Get settings values
  shell         Interactive scraping console
  startproject  Create new project
  version       Print Scrapy version
  view          Open URL in browser, as seen by Scrapy

  [ more ]      More commands available when run from project directory

Use "scrapy <command> -h" to see more info about a command
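
The same check can be scripted, for instance in a setup script that should fail early when Scrapy is missing. This is a rough sketch rather than an official recipe: the function name scrapy_version is hypothetical, and it relies only on the version command listed above. It avoids newer subprocess keyword arguments so that it also runs on Python 3.6.

```python
import shutil
import subprocess
from typing import Optional


def scrapy_version(executable: str = "scrapy") -> Optional[str]:
    """Return the output of `<executable> version`, or None if it is not on PATH."""
    path = shutil.which(executable)
    if path is None:
        return None
    result = subprocess.run(
        [path, "version"],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,  # decode output as text (works on Python 3.6)
        check=False,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    version = scrapy_version()
    print(version if version is not None else "scrapy not found on PATH")
```

With the venv activated, shutil.which should resolve to venv/bin/scrapy; outside it, the function returns None unless Scrapy is installed system-wide.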

(venv) $ scrapy bench
2020-04-25 23:57:01 [scrapy.utils.log] INFO: Scrapy 2.1.0 started (bot: scrapybot)
2020-04-25 23:57:01 [scrapy.utils.log] INFO: Versions: lxml 4.5.0.0, libxml2 2.9.10, cssselect 1.1.0, parsel 1.5.2, w3lib 1.21.0, Twisted 20.3.0, Python 3.6.9 (default, Apr 18 2020, 01:56:04) - [GCC 8.4.0], pyOpenSSL 19.1.0 (OpenSSL 1.1.1g  21 Apr 2020), cryptography 2.9.2, Platform Linux-4.15.0-96-generic-x86_64-with-Ubuntu-18.04-bionic
2020-04-25 23:57:01 [scrapy.crawler] INFO: Overridden settings:
{'CLOSESPIDER_TIMEOUT': 10, 'LOGSTATS_INTERVAL': 1, 'LOG_LEVEL': 'INFO'}
2020-04-25 23:57:01 [scrapy.extensions.telnet] INFO: Telnet Password: 6642b95b9fd73c04
2020-04-25 23:57:01 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.corestats.CoreStats',
 'scrapy.extensions.telnet.TelnetConsole',
 'scrapy.extensions.memusage.MemoryUsage',
 'scrapy.extensions.closespider.CloseSpider',
 'scrapy.extensions.logstats.LogStats']
2020-04-25 23:57:01 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
 'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
 'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
 'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
 'scrapy.downloadermiddlewares.retry.RetryMiddleware',
 'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
 'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
 'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
 'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
 'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
 'scrapy.downloadermiddlewares.stats.DownloaderStats']
2020-04-25 23:57:01 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
 'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
 'scrapy.spidermiddlewares.referer.RefererMiddleware',
 'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
 'scrapy.spidermiddlewares.depth.DepthMiddleware']
2020-04-25 23:57:01 [scrapy.middleware] INFO: Enabled item pipelines:
[]
2020-04-25 23:57:01 [scrapy.core.engine] INFO: Spider opened
2020-04-25 23:57:01 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2020-04-25 23:57:01 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023
2020-04-25 23:57:03 [scrapy.extensions.logstats] INFO: Crawled 93 pages (at 5580 pages/min), scraped 0 items (at 0 items/min)
2020-04-25 23:57:03 [scrapy.extensions.logstats] INFO: Crawled 157 pages (at 3840 pages/min), scraped 0 items (at 0 items/min)
2020-04-25 23:57:04 [scrapy.extensions.logstats] INFO: Crawled 230 pages (at 4380 pages/min), scraped 0 items (at 0 items/min)
2020-04-25 23:57:06 [scrapy.extensions.logstats] INFO: Crawled 301 pages (at 4260 pages/min), scraped 0 items (at 0 items/min)
2020-04-25 23:57:07 [scrapy.extensions.logstats] INFO: Crawled 365 pages (at 3840 pages/min), scraped 0 items (at 0 items/min)
2020-04-25 23:57:08 [scrapy.extensions.logstats] INFO: Crawled 422 pages (at 3420 pages/min), scraped 0 items (at 0 items/min)
2020-04-25 23:57:09 [scrapy.extensions.logstats] INFO: Crawled 485 pages (at 3780 pages/min), scraped 0 items (at 0 items/min)
2020-04-25 23:57:10 [scrapy.extensions.logstats] INFO: Crawled 541 pages (at 3360 pages/min), scraped 0 items (at 0 items/min)
2020-04-25 23:57:11 [scrapy.extensions.logstats] INFO: Crawled 597 pages (at 3360 pages/min), scraped 0 items (at 0 items/min)
2020-04-25 23:57:12 [scrapy.core.engine] INFO: Closing spider (closespider_timeout)
2020-04-25 23:57:12 [scrapy.extensions.logstats] INFO: Crawled 646 pages (at 2940 pages/min), scraped 0 items (at 0 items/min)
2020-04-25 23:57:12 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 301439,
 'downloader/request_count': 661,
 'downloader/request_method_count/GET': 661,
 'downloader/response_bytes': 2092067,
 'downloader/response_count': 661,
 'downloader/response_status_count/200': 661,
 'elapsed_time_seconds': 10.596581,
 'finish_reason': 'closespider_timeout',
 'finish_time': datetime.datetime(2020, 4, 25, 22, 57, 12, 551514),
 'log_count/INFO': 20,
 'memusage/max': 55128064,
 'memusage/startup': 55128064,
 'request_depth_max': 22,
 'response_received_count': 661,
 'scheduler/dequeued': 661,
 'scheduler/dequeued/memory': 661,
 'scheduler/enqueued': 13220,
 'scheduler/enqueued/memory': 13220,
 'start_time': datetime.datetime(2020, 4, 25, 22, 57, 1, 954933)}

2020-04-25 23:57:12 [scrapy.core.engine] INFO: Spider closed (closespider_timeout)
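
The pages/min figures in the log can be reproduced from the cumulative page counts. The sketch below assumes that each log line reports the number of pages crawled since the previous line, scaled to one minute using LOGSTATS_INTERVAL (1 second here); that assumption matches every line above. The function name logstats_rate is hypothetical.

```python
def logstats_rate(pages_now: int, pages_before: int, interval_seconds: float = 1.0) -> float:
    """Pages per minute over one logging interval."""
    return (pages_now - pages_before) * 60.0 / interval_seconds


# Cumulative "Crawled N pages" counts copied from the log above.
crawled = [0, 93, 157, 230, 301, 365, 422, 485, 541, 597, 646]
rates = [logstats_rate(now, before) for before, now in zip(crawled, crawled[1:])]
print(rates[:3])  # [5580.0, 3840.0, 4380.0]

# Overall throughput from the final stats dump.
responses = 661        # 'response_received_count'
elapsed = 10.596581    # 'elapsed_time_seconds'
overall_per_second = responses / elapsed
print("pages per second:", round(overall_per_second, 1))
```

The overall figure comes out at roughly 62 pages per second, which sits inside the range of the per-interval rates. As the bench command is described as a quick benchmark test, the number is best read as a rough local baseline rather than as real-world crawl speed.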
