Custom BaseSpider Scrapy

I want to put some generic functionality for my spiders into a custom base spider class.

Usually Scrapy spiders inherit from the scrapy.Spider class.

I tried creating a BaseSpider class in the spiders folder of my Scrapy project, but it didn't work:

import scrapy


class BaseSpider(scrapy.Spider):
    def __init__(self):
        super(scrapy.Spider).__init__()

    def parse(self, response):
        pass

And here is my actual spider:

import scrapy
import BaseSpider


class EbaySpider(BaseSpider):
    name = "ebay"
    allowed_domains = ["ebay.com"]

    def __init__(self):
        self.redis = Redis(host='redis', port=6379)
    # rest of the spider code

This gives the following error:

TypeError: Error when calling the metaclass bases
    module.__init__() takes at most 2 arguments (3 given)
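For reference, this error can be reproduced with plain Python whenever a module object (rather than a class) ends up as a base class — which is what happens with `import BaseSpider`, as the answer below explains. A minimal sketch, with `types.ModuleType` standing in for the imported module (the exact message differs between Python 2 and 3, but both raise TypeError):

```python
import types

# A stand-in for what `import BaseSpider` actually binds: a module object,
# not the class defined inside it.
fake_module = types.ModuleType("BaseSpider")

error = None
try:
    # Same as `class EbaySpider(BaseSpider):` when BaseSpider is a module:
    # Python uses type(fake_module) — the `module` type — as the metaclass,
    # and its constructor does not accept the (name, bases, namespace) triple.
    class EbaySpider(fake_module):
        pass
except TypeError as exc:
    error = exc

print(type(error).__name__)    # TypeError
```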

Then I tried multiple inheritance and made my EbaySpider look like this:

class EbaySpider(scrapy.Spider, BaseSpider):

    name = "ebay"
    allowed_domains = ["ebay.com"]

    def __init__(self):
        self.redis = Redis(host='redis', port=6379)
    # rest of the spider code 

which gives

TypeError: Error when calling the metaclass bases

metaclass conflict: the metaclass of a derived class must be a (non-strict) subclass of the metaclasses of all its bases
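This second error has the same root cause: in `class EbaySpider(scrapy.Spider, BaseSpider)`, one base is a proper class (metaclass `type`) while the other is a module object (metaclass `module`), and Python cannot pick a common metaclass. A stand-alone reproduction, with a plain class standing in for scrapy.Spider:

```python
import types

class Spider:                                  # stand-in for scrapy.Spider
    pass

fake_module = types.ModuleType("BaseSpider")   # what `import BaseSpider` binds

error = None
try:
    # One base's metaclass is `type`, the other's is `module`; neither is a
    # subclass of the other, hence the metaclass conflict.
    class EbaySpider(Spider, fake_module):
        pass
except TypeError as exc:
    error = exc

print(type(error).__name__)    # TypeError
```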

I am new to Python as well as Scrapy, and I guess I am trying to apply my PHP style of coding, which isn't working.

I am looking for a proper approach.

Thanks

Updated

Changed the __init__ signature to match scrapy.Spider.

BaseSpider

def __init__(self, *args, **kwargs):
        super(scrapy.Spider, self).__init__(*args, **kwargs)

EbaySpider

class EbaySpider(BaseSpider):
    def __init__(self, *args, **kwargs):
        super(BaseSpider,self).__init__(*args, **kwargs)
        self.redis = Redis(host='redis', port=6379)

Still getting

File "/scrapper/scrapper/spiders/ebay.py", line 11, in <module>
    class EbaySpider(BaseSpider):
TypeError: Error when calling the metaclass bases
    module.__init__() takes at most 2 arguments (3 given)

1 answer

  • answered 2017-06-17 18:11 Mikhail Korobov

    Take a look at scrapy.Spider.__init__ signature:

    def __init__(self, name=None, **kwargs):
        # ...
    

    Subclasses should define __init__ methods with the same signature. If you don't care about name and kwargs, just pass them to the base class:

    class BaseSpider(scrapy.Spider):
        def __init__(self, *args, **kwargs):
            super().__init__(*args, **kwargs)
    
        def parse(self, response):
            pass
    

    EbaySpider doesn't have to inherit from scrapy.Spider if it already inherits from BaseSpider. It should also have the same __init__ signature, and it also needs to call super():

    class EbaySpider(BaseSpider):
    
        name = "ebay"
        allowed_domains = ["ebay.com"]
    
        def __init__(self, *args, **kwargs):
            super().__init__(*args, **kwargs)
            self.redis = Redis(host='redis', port=6379)
    

    (I'm using Python 3 syntax for super())
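To see the cooperative `__init__` chain work end to end without Scrapy installed, here is a minimal sketch; the `Spider` class is a simplified stand-in for scrapy.Spider (which also accepts `name=None, **kwargs`), and the Redis connection is replaced with a placeholder:

```python
class Spider:                                  # stand-in for scrapy.Spider
    def __init__(self, name=None, **kwargs):
        # Fall back to the class-level `name` attribute, like Scrapy does.
        self.name = name or getattr(self, "name", None)

class BaseSpider(Spider):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)      # pass name/kwargs up the chain

    def parse(self, response):
        pass

class EbaySpider(BaseSpider):
    name = "ebay"

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)      # runs BaseSpider, then Spider
        self.redis = None                      # e.g. Redis(host='redis', port=6379)

spider = EbaySpider()
print(spider.name)                             # ebay
```

Because every `__init__` forwards `*args, **kwargs` and calls `super()`, the whole chain Spider → BaseSpider → EbaySpider runs, and the class-level `name` is picked up as expected.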

    EDIT

    There is one additional issue: you're importing BaseSpider like this:

    import BaseSpider
    

    Most likely you have a module named BaseSpider (a BaseSpider.py file) and a class named BaseSpider inside that module. import BaseSpider gives you the module object, not the spider class. Try from BaseSpider import BaseSpider instead, and better yet, rename the module to avoid the confusion and to follow PEP 8.
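    The difference between the two import styles can be checked without Scrapy. In this sketch, `types.ModuleType` plus `exec` simulates a BaseSpider.py file on disk:

    ```python
    import types

    # Simulate a file BaseSpider.py that contains `class BaseSpider: ...`
    mod = types.ModuleType("BaseSpider")
    exec("class BaseSpider:\n    pass", mod.__dict__)

    # `import BaseSpider` binds the module object — not usable as a base class:
    print(type(mod).__name__)              # module

    # `from BaseSpider import BaseSpider` binds the class — this is subclassable:
    BaseSpider = mod.BaseSpider
    print(isinstance(BaseSpider, type))    # True
    ```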