"'NoneType' object has no attribute" error in Python web crawler
I am new to Python, so any help would be appreciated. I have a web crawler that uses BeautifulSoup. It works, but for the snippet below it returns the error "'NoneType' object has no attribute". I know this means it has come across a page where there is no entry. How do I stop this error and make it return results from all the other pages that do have entries? Some of the pages the crawler visits have the entry and some are blank.
bbb = re.compile('First listed')
next_s = soup.find(text=bbb).parent.parent.get_text(strip=True)
bbb = re.compile('First listed')
next_s = soup.find(text=bbb)
if next_s is not None:
    pass  # node exists
else:
    pass  # node does not exist
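A minimal, self-contained sketch of that guard, using a stand-in page (the real crawler pages are not shown in the question):

```python
import re
from bs4 import BeautifulSoup

# Stand-in markup: one page with the entry, one blank page.
html_with_entry = "<div><span>First listed</span><p>2018-01-01</p></div>"
html_blank = "<div><p>nothing here</p></div>"

bbb = re.compile('First listed')

def first_listed(page):
    soup = BeautifulSoup(page, "html.parser")
    node = soup.find(text=bbb)
    # find() returns None on blank pages; guard before chaining .parent
    if node is None:
        return None
    return node.parent.parent.get_text(strip=True)

print(first_listed(html_with_entry))
print(first_listed(html_blank))  # None, but no AttributeError
```

The crawler can then skip the blank pages (or record an empty value) and keep going instead of crashing.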
See also questions close to this topic
Atom not recognizing webdriver
So I am trying to run a script that I made in PyCharm in Atom. In the project folder, the script exists along with a webdriver.
When writing this
driver = webdriver.Chrome()
I get this
selenium.common.exceptions.WebDriverException: Message: 'chromedriver' executable needs to be in PATH. Please see https://sites.google.com/a/chromium.org/chromedriver/home
I only get this on Atom. It works perfectly fine in Pycharm and Sublime Text 3.
Is there a setting in Atom I need to enable or something?
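One likely cause (an assumption, since the question does not show how Atom runs the script) is that Atom launches it with a different working directory or PATH than PyCharm does, so the bundled chromedriver is not found. A minimal sketch that prepends the script's own folder to PATH before creating the driver:

```python
import os
import sys

# Assumption: chromedriver sits next to the script, as described above.
driver_dir = os.path.dirname(os.path.abspath(sys.argv[0]))
os.environ["PATH"] = driver_dir + os.pathsep + os.environ.get("PATH", "")

# from selenium import webdriver
# driver = webdriver.Chrome()  # now resolves the neighbouring chromedriver
```

Alternatively, selenium's `webdriver.Chrome()` accepts an explicit path to the executable, which sidesteps PATH differences between editors entirely.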
Invalid format string in Python
I get the "Invalid format string" error when I try to run the code below (the error points at the last line); I am not sure where I am missing the point:
import datetime

DAYS = 2
SINCE = datetime.datetime.now() - datetime.timedelta(days=DAYS)
params = "?fields=feed.since(" + SINCE.strftime("%s") + ").limit(1),name,updated_time&"
Any suggestions would be much appreciated!
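The culprit is `strftime("%s")`: `%s` (Unix epoch seconds) is a platform-specific extension of the C library, not a standard directive, and on platforms that lack it (notably Windows) Python raises "Invalid format string". A portable sketch using `timestamp()` instead:

```python
import datetime

DAYS = 2
since = datetime.datetime.now() - datetime.timedelta(days=DAYS)
# timestamp() is portable, unlike the platform-specific "%s" directive
epoch = int(since.timestamp())
params = "?fields=feed.since(" + str(epoch) + ").limit(1),name,updated_time&"
print(params)
```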
dlib facial recognition opencv resize python
The chin is a bit off in this photo.
Not this one.
image = cv2.resize(image,(2170, 2894), interpolation = cv2.INTER_AREA)
The second one does not have this line.
Here is the complete source.
import cv2
import sys
import dlib
import numpy as np
from PIL import Image
import rawpy

# Get user supplied values
imagePath = sys.argv
cascPath = "HS.xml"
pointOfInterestX = 200

detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor("okgood.dat")
raw = rawpy.imread(imagePath)
rgb = raw.postprocess()
image = Image.fromarray(rgb)
#image.save("WOO.jpg")
open_cv_image = np.array(image)
open_cv_image = open_cv_image[:, :, ::-1].copy()
image = open_cv_image
image = cv2.resize(image, (2170, 2894), interpolation=cv2.INTER_AREA)
widthO, heightO = image.shape[:2]

faceCascade = cv2.CascadeClassifier(cascPath)
# Read the image
#image = cv2.imread(imagePath)
gray = cv2.cvtColor((image), cv2.COLOR_RGB2BGR)
#height, width = image.shape[:2]

# Detect faces in the image
faces = faceCascade.detectMultiScale(
    gray,
    scaleFactor=1.1,
    minNeighbors=4,
    minSize=(500, 500)
    #flags = cv2.CV_HAAR_SCALE_IMAGE
)

newdigit = 0

def test():
    for l in range(y, y+h):
        for d in range(x, x+w):
            # print(image[l,d])
            font = cv2.FONT_HERSHEY_SIMPLEX
            if all(item < 150 for item in image[l, d]):
                cv2.putText(image, "here", (d, l), font, .2, (255, 255, 255), 1, cv2.LINE_AA)
                return l
            image[l, d] = [0, 0, 0]

###
### put hairline 121 pixels from the top.
###

def shape_to_np(shape, dtype="int"):
    # initialize the list of (x, y)-coordinates
    coords = np.zeros((68, 2), dtype=dtype)
    # loop over the 68 facial landmarks and convert them
    # to a 2-tuple of (x, y)-coordinates
    for i in range(0, 68):
        coords[i] = (shape.part(i).x, shape.part(i).y)
    # return the list of (x, y)-coordinates
    return coords

two = 1
# Draw a rectangle around the faces
for (x, y, w, h) in faces:
    print(str(len(faces)))
    cv2.rectangle(image, (x, y), (x+w, y+h), (0, 255, 0), 2)
    pointOfInterestX = test()
    break

dets = detector(image, 1)
one = 0
pointOfEight = 0
for k, d in enumerate(dets):
    shape = predictor(image, d)
    shape = shape_to_np(shape)
    # loop over the (x, y)-coordinates for the facial landmarks
    # and draw them on the image
    for (x, y) in shape:
        if one == 8:
            pointOfEight = y
        font = cv2.FONT_HERSHEY_SIMPLEX
        cv2.putText(image, str(one), (x, y), font, .2, (255, 255, 255), 1, cv2.LINE_AA)
        one = one + 1
        cv2.circle(image, (x, y), 1, (0, 0, 255), -1)

new_dimensionX = heightO * 631 / (pointOfEight - pointOfInterestX)
new_dimensionY = widthO * 631 / (pointOfEight - pointOfInterestX)
print(str(new_dimensionY))
image = cv2.resize(image, (int(new_dimensionX), int(new_dimensionY)))
Rx = new_dimensionX / heightO
Ry = new_dimensionY / widthO
crop_img = image[int((pointOfInterestX * Rx)-121):int(new_dimensionY),
                 0:int(new_dimensionX-((Rx * pointOfInterestX)+121))]
font = cv2.FONT_HERSHEY_SIMPLEX
cv2.putText(image, "xxxx", (100, pointOfInterestX), font, 4, (255, 255, 255), 1, cv2.LINE_AA)
cv2.imshow("Faces found", crop_img)
cv2.imwrite("cropped.jpg", crop_img)
cv2.waitKey(0)
Towards the top you will see the line where I resize the image to 2170,2894. Like I said, with this line absent, the chin detection is accurate. With it, it is not. I need the chin detection accurate at this resolution. Help?
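One thing worth checking (an assumption, since the exact failure mode is not shown): landmarks detected on the resized 2170x2894 working image must be mapped with the same scale factors before they are compared to, or drawn on, coordinates from any other resolution. Note also that OpenCV's `image.shape[:2]` is `(height, width)`, so the line `widthO, heightO = image.shape[:2]` in the script swaps the two, which skews every ratio computed from them. A hypothetical helper (not from the script above) for the coordinate mapping:

```python
# Hypothetical helper: map a point detected at one resolution into
# another resolution's coordinate system.
def scale_point(x, y, src_size, dst_size):
    src_w, src_h = src_size
    dst_w, dst_h = dst_size
    return (x * dst_w / src_w, y * dst_h / src_h)

# A chin landmark found at (1085, 2600) in the 2170x2894 working image
# corresponds to this point in a hypothetical 4340x5788 original:
print(scale_point(1085, 2600, (2170, 2894), (4340, 5788)))  # (2170.0, 5200.0)
```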
Run Scrapy through Django View
So, I am working on the following project:
I am using Django to develop a website that will work as a remote manager for a web crawler. To be more specific, I've created a spider with Scrapy that downloads some PDF files from another website.

My goal is to find a way to call the spider via a POST request (I guess) and have the crawler run through my Django view. The downloaded files will be stored on the server that hosts the website, not on the personal computer of whoever runs the spider.

So when I log into my website and press the Crawl button, the new files get downloaded into the server's file library.

I am fairly new to Django and Scrapy, so I have no clue how to make them work together to achieve what I am looking for. Can someone point me in a direction?

I've seen some questions about running Scrapy scripts from other Python scripts, but I do not understand how to connect them, where to put the Scrapy project files, etc.

Thank you for your time, I hope I didn't confuse you!
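One common pattern (a sketch, not the only approach) is to have the Django view launch the spider as a separate process, since Scrapy's Twisted reactor does not sit well inside a web request. The spider name and paths below are placeholders, not taken from the question:

```python
import subprocess

# Placeholder names: "pdf_spider" and the paths are assumptions.
def build_crawl_command(spider_name, files_store):
    # FILES_STORE is Scrapy's standard setting for where FilesPipeline
    # saves downloads; -s overrides it for this run only.
    return ["scrapy", "crawl", spider_name, "-s", "FILES_STORE=" + files_store]

# Inside the Django view handling the Crawl button's POST:
# if request.method == "POST":
#     subprocess.Popen(build_crawl_command("pdf_spider", "/srv/site/files"),
#                      cwd="/srv/site/scrapy_project")  # Scrapy project root
print(build_crawl_command("pdf_spider", "/srv/site/files"))
```

For a more robust setup, scrapyd (Scrapy's deployment service) exposes an HTTP API for scheduling spiders, which a Django view can call instead of shelling out.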
JS generated page doesn't fully render into html when scraping
I can't seem to get my python web scraper to work with JS rendered websites that make calls to a server to fill the webpage. Take this website (https://playon.co/#/en/games-lobby) for example, if I use this script:
import sys
from PyQt4.QtGui import *
from PyQt4.QtCore import *
from PyQt4.QtWebKit import *
from lxml import html

class Render(QWebPage):
    def __init__(self, url):
        self.app = QApplication(sys.argv)
        QWebPage.__init__(self)
        self.loadFinished.connect(self._loadFinished)
        self.mainFrame().load(QUrl(url))
        self.app.exec_()

    def _loadFinished(self, result):
        self.frame = self.mainFrame()
        self.app.quit()

url = 'https://playon.co/#/en/games-lobby'
r = Render(url)
result = r.frame.toHtml()
print(result)
It works fine for most JS-rendered websites, just like other approaches such as Selenium, BeautifulSoup, etc., but all of them fail to properly render the HTML when a website makes calls to a server to populate the content of the page.
I have found one similar question on stackoverflow that seemed to tackle the same problem, but as hard as I tried I just couldn't understand the solution and adopt it in my code. It seems like a tailored solution to that specific question and I can't figure out how exactly it applies to my problem even though it seems similar.
Any help would be appreciated, thanks!
vba referencing html elements by xpath
I am a beginner to web scraping with excel vba and need some help.
I am trying to reference an element. If there was an id then I could use getElementByID but sometimes there is no id. I could use getElementByClassName but sometimes there are too many elements of the same class.
Is there some way to refer to an element by xpath?
(I can't post the actual website since there is personal info so let us say this is the html)
<!DOCTYPE html>
<html>
<body>
<a href="https://google.com">Link</a>
</body>
</html>
Is there something like ie.document.getElementByXPath("/html/body/a").Click? I've searched all over the web and can't seem to find anything on the topic.
Not able to fetch the required page content in Selenium
The problem is that I have this URL, http://10.11.0.248:9090/, and on using it I get auto-redirected to http://10.11.0.248:9090/login?from=%2F. Once I have logged in, I get back http://10.11.0.248:9090/ as before (so Selenium's get function returns the login page), and in my code I am not able to get the HTML content of the logged-in page. On that page I want to click on a button. Is there any way I can get through this?
If I use http://10.11.0.248:9090/view/All/ (the URL after the button is pressed), even then I get the login page content.
Scraping tabular data with Python-BeautifulSoup
Can't figure out how to scrape the first table data instead of both.
<tr>
<td>WheelDust </td>
<td>A large puff of barely visible brown dust </td>
</tr>
I only want "WheelDust", but instead I get both "WheelDust" and "A large puff of barely visible brown dust".
import requests
from bs4 import BeautifulSoup

r = requests.get("https://wiki.garrysmod.com/page/Effects")
soup = BeautifulSoup(r.content, "html.parser")
for td in soup.findAll("table"):
    #--print(td)
    for a in td.findAll("tr"):
        print(a.text)
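Since each row's first `<td>` holds the name, asking each `<tr>` for its first cell (rather than the row's full `.text`) isolates it. A small self-contained sketch using markup like the snippet above:

```python
from bs4 import BeautifulSoup

html = """<table>
<tr><td>WheelDust </td><td>A large puff of barely visible brown dust </td></tr>
</table>"""

soup = BeautifulSoup(html, "html.parser")
# find("td") returns only the row's first cell; get_text(strip=True)
# drops the trailing whitespace present in the wiki markup.
names = [tr.find("td").get_text(strip=True) for tr in soup.find_all("tr")]
print(names)  # ['WheelDust']
```

In the original loop, the same change is `print(a.find("td").get_text(strip=True))` instead of `print(a.text)` (with a None-guard for rows that contain only `<th>` header cells).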
I want to download videos in bulk from Tutsplus.com using Python. I have a legit subscription.
Below is the section for downloading videos from a given course. The information is printed perfectly on the terminal, but the videos do not get downloaded. Please help!
import urllib.request

def download_binary(_file_name_, _url_):
    print("Downloading %s" % _file_name_)
    response = urllib.request.urlopen(_url_)
    # The original first did output.write(response.read()), which drained
    # the stream (and used a misspelled _filename_), so the chunked loop
    # below then wrote an empty file. Download in chunks only:
    CHUNK = 16 * 1024
    with open(_file_name_, 'wb') as fp:
        while True:
            chunk = response.read(CHUNK)
            if not chunk:
                break
            fp.write(chunk)