How to pull JSON data from multiple HTTP requests
I am trying to write code to pull JSON data for multiple stores. The URLs are identical except for the final digit (the storeId), which changes per store. For example:
https://search.mobile.walmart.com/v1/products-by-code/UPC/037000754213?storeId=1
https://search.mobile.walmart.com/v1/products-by-code/UPC/037000754213?storeId=2
https://search.mobile.walmart.com/v1/products-by-code/UPC/037000754213?storeId=3
I want to pull the JSON data for each store and write it to a file. What is the best way to go about this?
Here is what I have so far:
import requests
import json
from collections import defaultdict
from tqdm import tqdm

results = defaultdict(list)
url = "https://search.mobile.walmart.com/v1/products-by-code/UPC/037000754213?storeId={}"

for store in tqdm(range(1, 5)):
    # Substitute the store number into the URL and fetch it
    r = requests.get(url.format(store))
    r.raise_for_status()
    try:
        data = json.loads(r.json()['searchResults'])['results']['online']
        results[store].append((data['price'], data['inventory']))
    except (IndexError, KeyError):
        # Skip stores whose response lacks the expected fields
        continue

for store, data in results.items():
    print('Store: {}'.format(store))
    if data:
        # Unpack in the same order the tuples were stored: (price, inventory)
        for price, inventory in data:
            print('\t{} - {}'.format(price, inventory))
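One way to structure the loop described in the question, as a minimal sketch only: format the store number into the URL, issue a GET for each store, keep whatever JSON comes back, and dump everything to one file. The helper names `fetch_json`, `collect_stores` and `save_results` are hypothetical, the standard-library `urllib.request` is used here in place of `requests`, and nothing is assumed about the fields in the response.

```python
import json
import urllib.request

BASE_URL = ("https://search.mobile.walmart.com/v1/products-by-code/"
            "UPC/037000754213?storeId={}")


def fetch_json(url, timeout=10):
    """Fetch a URL with a GET request and decode the body as JSON."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def collect_stores(store_ids, fetch=fetch_json):
    """Return {store_id: parsed JSON, or None if the request failed}."""
    results = {}
    for store_id in store_ids:
        try:
            results[store_id] = fetch(BASE_URL.format(store_id))
        except (OSError, ValueError) as exc:
            # urllib errors derive from OSError; invalid JSON raises ValueError
            print("store {} failed: {}".format(store_id, exc))
            results[store_id] = None
    return results


def save_results(results, path):
    """Write all collected responses to a single JSON file."""
    with open(path, "w", encoding="utf-8") as f:
        # json.dump turns the integer store IDs into string keys
        json.dump(results, f, indent=2)


# Example usage (performs real network requests):
# save_results(collect_stores(range(1, 5)), "stores.json")
```

Passing the fetch function as a parameter keeps the loop easy to try out without touching the network; a failed store is recorded as None rather than stopping the whole run.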
See also questions close to this topic
-
Pretty Print a JSON string in Rust
I have a String
let str = String::from_utf8(data.to_vec()).unwrap();
How do I pretty-print it with newlines and tabs as JSON?
Essentially I want to do the Rust equivalent of the JavaScript
JSON.stringify(JSON.parse(my_string), null, 4);
This is different than the existing question because I want to parse and then pretty-ify an existing String of JSON, rather than an existing struct.
-
Android - cz.msebera.android.httpclient.entity.ByteArrayEntity required: org.apache.http.HttpEntity
I am using loopj AsyncHttpClient to call web services. I am trying to register a user, so I need to send JSON data to the web service.
ByteArrayEntity entity = new ByteArrayEntity(json.toString().getBytes("UTF-8"));
entity.setContentEncoding(new BasicHeader(HTTP.CONTENT_TYPE, "application/json"));
client.post(getApplicationContext(), "http://10.0.3.2:8080/WebService/rest/user/insert", entity, new JsonHttpResponseHandler(){
When I put the cursor on entity in the client.post line, it gives this error:
cz.msebera.android.httpclient.entity.ByteArrayEntity required: org.apache.http.HttpEntity
The example I am trying is also from Stack Overflow: Send JSON as a POST request to server by AsyncHttpClient
Libraries that I am using:
compile files('libs/android-async-http-1.4.4.jar')
compile 'cz.msebera.android:httpclient:4.3.6'
Can anybody help me? Thanks in advance.
-
Couchbase Lite 2 + JsonConvert
The following code sample writes a simple object to a couchbase lite (version 2) database and reads all objects afterwards. This is what you can find in the official documentation here
This is quite a lot of manual typing since every property of every object must be transferred to the MutableObject.
class Program
{
    static void Main(string[] args)
    {
        Couchbase.Lite.Support.NetDesktop.Activate();
        const string DbName = "MyDb";
        var db = new Database(DbName);
        var item = new Item { Name = "test", Value = 5 };

        // Serialization HERE
        var doc = new MutableDocument();
        doc.SetString("Name", item.Name);
        doc.SetInt("Value", item.Value);
        db.Save(doc);

        using (var qry = QueryBuilder.Select(SelectResult.All())
                                     .From(DataSource.Database(db)))
        {
            foreach (var result in qry.Execute())
            {
                var resultItem = new Item
                {
                    // Deserialization HERE
                    Name = result[DbName].Dictionary.GetString("Name"),
                    Value = result[DbName].Dictionary.GetInt("Value")
                };
                Console.WriteLine(resultItem.Name);
            }
        }
        Console.ReadKey();
    }

    class Item
    {
        public string Name { get; set; }
        public int Value { get; set; }
    }
}
From my research Couchbase lite uses JsonConvert internally, so there might be a way to simplify all that with the help of JsonConvert.
Anything like:
var json = JsonConvert.SerializeObject(item);
var doc = new MutableDocument(json); // No overload to provide raw JSON
or maybe
var data = JsonConvert.SerializeToDict(item); // JsonConvert does not provide this
var doc = new MutableDocument(data);
Is there a way to do this, or is the manual mapping some kind of optimization and the preferred approach by design?
-
Calculating entropy in ID3 log2(0) in formula
import numpy as np

udacity_set = np.array(
    [[1,1,1,0],
     [1,0,1,0],
     [0,1,0,1],
     [1,0,0,1]])
label = udacity_set[:,udacity_set.shape[1]-1]
fx = label.size
positive = label[label == 1].shape[0]
positive_probability = positive/fx
negative = label[label == 0].shape[0]
negative_probability = negative/fx
entropy = -negative_probability*np.log2(negative_probability) - positive_probability*np.log2(positive_probability)

atribute = 0
V = 1
attribute_set = udacity_set[np.where(udacity_set[:,atribute] == 1)]  # selecting positive instance of occurrence in attribute 14
instances = attribute_set.shape[0]
negative_labels = attribute_set[np.where(attribute_set[:,attribute_set.shape[1]-1]== 0)].shape[0]
positive_labels = attribute_set[np.where(attribute_set[:,attribute_set.shape[1]-1]== 1)].shape[0]
p0 = negative_labels/instances
p1 = positive_labels/instances
entropy2 = - p0*np.log2(p0) - p1*np.log2(p1)

attribute_set2 = udacity_set[np.where(udacity_set[:,atribute] == 0)]  # selecting positive instance of occurrence in attribute 14
instances2 = attribute_set2.shape[0]
negative_labels2 = attribute_set2[np.where(attribute_set2[:,attribute_set2.shape[1]-1]== 0)].shape[0]
positive_labels2 = attribute_set2[np.where(attribute_set2[:,attribute_set2.shape[1]-1]== 1)].shape[0]
p02 = negative_labels2/instances2
p12 = positive_labels2/instances2
entropy22 = - p02*np.log2(p02) - p12*np.log2(p12)
The problem occurs when an attribute is pure and the entropy is meant to be 0. When I put those values into the formula, I get NaN. I know how to code a workaround, but why does the formula behave this way?
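The NaN comes from floating-point arithmetic rather than from the entropy formula itself: log2(0) evaluates to negative infinity, and 0 multiplied by negative infinity is NaN. The usual convention for entropy is to treat the term 0 * log2(0) as 0, since p * log2(p) tends to 0 as p tends to 0. A minimal sketch of that convention, using the standard-library `math` module instead of NumPy and a hypothetical helper name `entropy_bits`:

```python
import math


def entropy_bits(probabilities):
    """Shannon entropy in bits, skipping zero-probability terms.

    Skipping p == 0 applies the convention 0 * log2(0) = 0.
    """
    return sum(-p * math.log2(p) for p in probabilities if p > 0)


mixed = entropy_bits([0.5, 0.5])   # evenly split labels
pure = entropy_bits([1.0, 0.0])    # pure attribute, no NaN
```

With NumPy arrays the same idea can be expressed by masking out the zero entries before taking the logarithm.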
-
python list to dataframe with column headers and removing data types
Hoping someone can help me out here to write efficient code. I am fairly new to Python and not the most efficient.
I have an ODBC connection to a SQL database and here is the code for that:
import pyodbc
import pandas as pd
import csv

cnxn = pyodbc.connect("DSN=acc_DB")
cursor = cnxn.cursor()
cursor.execute("select top 10 * from Table_XX")
rows = cursor.fetchall()
What I get back from this is a list of rows.
Then I put that in a DataFrame and write it out to a CSV using this code:
DF = pd.DataFrame(rows)
DF.to_csv('out.csv',sep=',')
The problem is:
DF is not recognizing the column names; there is just a value of 0 for the columns.
How do I put this in the DataFrame in a table format with the column headers and no column types, like I would get from a SQL query executed in MS SQL Management Studio?
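A DB-API cursor exposes the column names of the last query through `cursor.description`, where the first item of each entry is the name. The sketch below illustrates the idea with the standard-library `sqlite3` module and an in-memory table standing in for the ODBC source; the helper names `fetch_with_headers` and `to_csv_text` and the sample columns are assumptions made for the example.

```python
import csv
import io
import sqlite3


def fetch_with_headers(cursor):
    """Return (column names, rows as plain tuples) for the last query."""
    columns = [col[0] for col in cursor.description]
    rows = [tuple(row) for row in cursor.fetchall()]
    return columns, rows


def to_csv_text(columns, rows):
    """Render a header line followed by the data rows as CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(columns)
    writer.writerows(rows)
    return buffer.getvalue()


# Stand-in for the real database: a small in-memory table
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE Table_XX (id INTEGER, name TEXT)")
cur.executemany("INSERT INTO Table_XX VALUES (?, ?)", [(1, "a"), (2, "b")])
cur.execute("SELECT * FROM Table_XX")

columns, rows = fetch_with_headers(cur)
csv_text = to_csv_text(columns, rows)
```

With pandas, the same `columns` and `rows` can be passed to `pd.DataFrame.from_records(rows, columns=columns)`, and `to_csv('out.csv', index=False)` then writes the headers without the extra index column. Converting each row to a plain tuple first avoids pandas treating a driver-specific row object as a single value.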
-
Python 3 if elif else statement isn't working
After running the code if I type anything other than john or johnny, it still prints out JOHNNY!!! Why is this?
user_name = input('What is your name?')
if user_name.lower() == 'john' or 'johnny':
    print ('JOHNNY!!!')
elif user_name.lower() == 'bill' or 'billy':
    print ('BILLY!!!')
else:
    print ('Hello {0}'.format(user_name))
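The condition `user_name.lower() == 'john' or 'johnny'` is parsed as `(user_name.lower() == 'john') or ('johnny')`. A non-empty string such as 'johnny' is always truthy, so the first branch is taken for any input. A small sketch of one common fix, a membership test against a tuple of accepted names (the function name `greet` is hypothetical, and it returns the text instead of printing so it is easy to check):

```python
def greet(user_name):
    """Return the greeting for a name, comparing case-insensitively."""
    name = user_name.lower()
    if name in ('john', 'johnny'):
        return 'JOHNNY!!!'
    elif name in ('bill', 'billy'):
        return 'BILLY!!!'
    else:
        return 'Hello {0}'.format(user_name)


# The original condition is truthy even for an unrelated name:
always_truthy = bool('alice' == 'john' or 'johnny')
```

Writing out both comparisons in full, `name == 'john' or name == 'johnny'`, works equally well.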
-
How can I switch an existing Azure web-role from http over to https
I have a working Azure web role which I've been using over an http endpoint. I'm now trying to switch it over to https but struggling mightily with what I thought would be a simple operation. (I'll include a few tips here for future readers to address issues I've already come across).
I have created (for now) a self-signed certificate using the powershell commands documented by Microsoft here and uploaded it to the azure portal. I'm aware that 3rd parties won't be able to consume the API while it has a self-signed certificate but my plan is to use the following for local client testing before purchasing a 'proper' certificate.
ServicePointManager.ServerCertificateValidationCallback += (o, c, ch, er) => true;
Tip: you need to upload the .pfx file and then supply the password you used in the powershell script. Don't be confused by the suggestion to create a .cer file, which is for completely different purposes.
I then followed the flow documented for configuring azure cloud services here although many of these operations are now done directly through visual studio rather than by hand-editing files.
In the main 'cloud service' project under the role I wanted to modify:
- I imported the newly created certificate. Tip: the design of the dialog used to add the thumbprint makes it very easy to incorrectly select the developer certificate that is already installed on your machine (by visual studio?). Click 'more options' to get to _your_ certificate and then check the displayed thumbprint matches that shown in the Azure portal in the certificates section.
- Under 'endpoints' I added a new https endpoint. Tip: use the standard https port 443, NOT the 'default' port of 8080 otherwise you will get no response from your service at all
- In the web.config of the service itself, I changed the endpoint binding for the service so that the name element matched the new endpoint.
- I then published the cloud project to Azure (using Visual Studio).
At this point, I'm not seeing the results I expected. The service is still available on http but is not available on https. When I try to browse for it on https (includeExceptionDetailInFaults is set to true) I get:
HTTP error 404 "The resource you are looking for (or one of its dependencies) could have been removed, had its name changed, or is temporarily unavailable"
I interpret this as meaning that the https endpoint is available but the service itself is bound to http rather than https despite my changes to web.config.
I have verified that the publish step really is uploading the new configuration by modifying some of the returned content. (Remember this is still available on http.)
I have tried removing the 'obsolete' http endpoint but this just results in a different error:
"Could not find a base address that matches scheme http for the endpoint with binding WebHttpBinding. Registered base address schemes are [https]"
I'm sure I must be missing something simple here. Can anyone suggest what it is or tips for further trouble-shooting? There are a number of stack-overflow answers that relate to websites and suggest that IIS settings need to be tweaked but I don't see how this applies to a web-role where I don't have direct control of the server.
Edit Following Gaurav's suggestion I repeated the process using a (self-signed) certificate for our own domain rather than cloudapp.net then tried to access the service via this domain. I still see the same results; i.e. the service is available via http but not https.
Edit2 Information from csdef file... is the double reference to "Endpoint1" suspicious?
<Sites>
  <Site name="Web">
    <Bindings>
      <Binding name="Endpoint1" endpointName="HttpsEndpoint" />
      <Binding name="Endpoint1" endpointName="HttpEndpoint" />
    </Bindings>
  </Site>
</Sites>
<Endpoints>
  <InputEndpoint name="HttpsEndpoint" protocol="https" port="443" certificate="backend" />
  <InputEndpoint name="HttpEndpoint" protocol="http" port="80" />
</Endpoints>
<Certificates>
  <Certificate name="backend" storeLocation="LocalMachine" storeName="My" />
</Certificates>
-
Node Restify request ESOCKETTIMEDOUT before default 2 minutes has been reached
I have a node restify server and I am mainly using it for verifying image urls are valid. I get a lot of ESOCKETTIMEDOUT errors for valid image urls. I was using the request module, which has a default of 5000ms for requests, which is just fine for me. But other issues led me to switch to the node https module. From what I can see in my research, I can modify the restify default of 2 minutes by doing something like server.server.setTimeout(300000).
My questions are simply:
Would this even resolve my issue? I think not as the default of 2 minutes is more than enough for what I'm doing.
Is this even the correct setting that I am looking at? Is there something else to set the request timeout for node https requests within restify?
Here is my basic server:
const server = restify.createServer({
  formatters: {
    'text/html': function htmlFormatter(req, res, body) {
      let data = '';
      if (body) {
        data = body.toString();
      }
      res.setHeader('Content-Length', Buffer.byteLength(data));
      return data;
    }
  }
});
-
Python HTTPS Post: can not find existing url
I am currently working on a project where I have to send a JSON object to an embedded PC, which implements JSON-RPC. The device is in my LAN, has an IP address and is reachable.
About two weeks ago my code worked, but when I sat down today to continue, it didn't work anymore.
I am working with Visual Studio 2017. So far I have tried Python 3.4 (64 bit) and Python 3.6 (64 and 32 bit).
Here is what I tried:
import json
import urllib3
import sys
import base64
import os
import io
import hashlib
from base64 import b64encode

role = "admin"
passwd = "*******"
service_url = "https://192.168.0.65/base_service/"

http = urllib3.PoolManager(
    assert_hostname=False,
    ca_certs="cert/myCertificate.crt"
)
passwd_b64 = str( b64encode( bytes( passwd, "utf8" ) ), "utf8" )
rpc_obj = {"jsonrpc": "2.0","id":1,"method":"get_auth_token","params":{"user": "admin", "pwd": passwd_b64}}
at = http.urlopen( "POST", service_url, body=json.dumps( rpc_obj ) )
This should generate a token used for other functions, but at the last line I get this error:
urllib3.exceptions.MaxRetryError
Message = HTTPSConnectionPool(host='192.168.0.65', port=443): Max retries exceeded with url: /base_service/ (Caused by SSLError(FileNotFoundError(2, 'No such file or directory'),))
Stack trace:
C:\Program Files (x86)\Entwicklung\Python\lib\site-packages\urllib3\util\retry.py:388 in "Retry.increment"
C:\Program Files (x86)\Entwicklung\Python\lib\site-packages\urllib3\connectionpool.py:639 in "HTTPConnectionPool.urlopen"
C:\Program Files (x86)\Entwicklung\Python\lib\site-packages\urllib3\connectionpool.py:668 in "HTTPConnectionPool.urlopen"
C:\Program Files (x86)\Entwicklung\Python\lib\site-packages\urllib3\connectionpool.py:668 in "HTTPConnectionPool.urlopen"
C:\Program Files (x86)\Entwicklung\Python\lib\site-packages\urllib3\connectionpool.py:668 in "HTTPConnectionPool.urlopen"
C:\Program Files (x86)\Entwicklung\Python\lib\site-packages\urllib3\poolmanager.py:321 in "PoolManager.urlopen"
C:\Users\UserXY\source\repos\containerController\containerController.py:20 in "<module>"
I have no idea why I get this error. I also have a Java program which does the same thing and works perfectly. Sadly, the documentation of the device only has some Python examples and I am not able to implement all functions in Java.
Does anyone have an idea or hint?
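The FileNotFoundError wrapped inside the SSLError suggests that a file the SSL layer tried to open is missing. One candidate, offered as an assumption and not a confirmed diagnosis, is the relative `ca_certs` path: "cert/myCertificate.crt" is resolved against the current working directory, which can differ between runs or IDE configurations. A minimal sketch of resolving the path against a known directory and failing early with a clear message (the helper name `resolve_cert` is hypothetical):

```python
from pathlib import Path


def resolve_cert(relative_path, base_dir):
    """Return the absolute certificate path, or raise if it is missing."""
    cert_path = (Path(base_dir) / relative_path).resolve()
    if not cert_path.is_file():
        raise FileNotFoundError(
            "CA bundle not found: {}".format(cert_path))
    return str(cert_path)


# Example usage inside the script, anchoring the path to the script's folder:
# ca_file = resolve_cert("cert/myCertificate.crt", Path(__file__).parent)
# http = urllib3.PoolManager(assert_hostname=False, ca_certs=ca_file)
```

Printing `os.getcwd()` just before the request is a quick way to see which directory the relative path is being resolved against.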
-
How to suppress http.client exceptions logging during requests.get() request with Python 3
I am using Python3 and making requests with the following code:
try:
    resp = self.s.get(url, proxies=self.proxies)
    return resp.text
except ConnectTimeout:
    self.logger.exception('{}: connection timeout!'.format(self.name))
except ConnectionError:
    self.logger.exception('{}: ConnectionError!'.format(self.name))
except RemoteDisconnected:
    self.logger.exception('{}: RemoteDisconnected'.format(self.name))
except MaxRetryError:
    self.logger.exception('{}: MaxRetryError'.format(self.name))
except Exception as e:
    self.logger.exception(e)
If an exception is raised during session().get(url, proxies=proxies), I get the following in the log file:
2018-04-20 17:14:27 | RF5MR: ConnectionError! Traceback (most recent call last): File "/opt/scripts/this-is-my-app/.venv/lib/python3.6/site-packages/urllib3/connectionpool.py", line 601, in urlopen chunked=chunked) File "/opt/scripts/this-is-my-app/.venv/lib/python3.6/site-packages/urllib3/connectionpool.py", line 387, in _make_request six.raise_from(e, None) File "<string>", line 2, in raise_from File "/opt/scripts/this-is-my-app/.venv/lib/python3.6/site-packages/urllib3/connectionpool.py", line 383, in _make_request httplib_response = conn.getresponse() File "/usr/lib/python3.6/http/client.py", line 1331, in getresponse response.begin() File "/usr/lib/python3.6/http/client.py", line 297, in begin version, status, reason = self._read_status() File "/usr/lib/python3.6/http/client.py", line 266, in _read_status raise RemoteDisconnected("Remote end closed connection without" http.client.RemoteDisconnected: Remote end closed connection without response During handling of the above exception, another exception occurred: Traceback (most recent call last): File "/opt/scripts/this-is-my-app/.venv/lib/python3.6/site-packages/requests/adapters.py", line 440, in send timeout=timeout File "/opt/scripts/this-is-my-app/.venv/lib/python3.6/site-packages/urllib3/connectionpool.py", line 639, in urlopen _stacktrace=sys.exc_info()[2]) File "/opt/scripts/this-is-my-app/.venv/lib/python3.6/site-packages/urllib3/util/retry.py", line 388, in increment raise MaxRetryError(_pool, url, error or ResponseError(cause)) urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='www.thesite.com', port=443): Max retries exceeded with url: /asdzxcqwe/2058706 (Caused by ProxyError('Cannot connect to proxy.', RemoteDisconnected('Remote end closed connection without response',))) During handling of the above exception, another exception occurred: Traceback (most recent call last): File "/opt/scripts/this-is-my-app/fetcher.py", line 417, in make_request resp = self.s.get(url, proxies=self.proxies) 
File "/opt/scripts/this-is-my-app/.venv/lib/python3.6/site-packages/requests/sessions.py", line 521, in get return self.request('GET', url, **kwargs) File "/opt/scripts/this-is-my-app/.venv/lib/python3.6/site-packages/requests/sessions.py", line 508, in request resp = self.send(prep, **send_kwargs) File "/opt/scripts/this-is-my-app/.venv/lib/python3.6/site-packages/requests/sessions.py", line 640, in send history = [resp for resp in gen] if allow_redirects else [] File "/opt/scripts/this-is-my-app/.venv/lib/python3.6/site-packages/requests/sessions.py", line 640, in <listcomp> history = [resp for resp in gen] if allow_redirects else [] File "/opt/scripts/this-is-my-app/.venv/lib/python3.6/site-packages/requests/sessions.py", line 218, in resolve_redirects **adapter_kwargs File "/opt/scripts/this-is-my-app/.venv/lib/python3.6/site-packages/requests/sessions.py", line 618, in send r = adapter.send(request, **kwargs) File "/opt/scripts/this-is-my-app/.venv/lib/python3.6/site-packages/requests/adapters.py", line 502, in send raise ProxyError(e, request=request) requests.exceptions.ProxyError: HTTPSConnectionPool(host='www.thesite.com', port=443): Max retries exceeded with url: /asdzxcqwe/2058706 (Caused by ProxyError('Cannot connect to proxy.', RemoteDisconnected('Remote end closed connection without response',)))
But what I want logged is only this line:
2018-04-20 17:14:27 | RF5MR: ConnectionError!
Could you please point out where my mistake is, and how I can suppress the exception text and log only the line I want?
Many thanks!
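`Logger.exception()` logs at ERROR level and always attaches the current traceback, which is where the long chained output comes from. Calling `Logger.error()` inside the `except` block logs only the message. The sketch below demonstrates the difference with the standard `logging` module; the logger name, the in-memory stream, and the `log_failure` helper are assumptions made for the demonstration, and the built-in `ConnectionError` stands in for the one from `requests`.

```python
import io
import logging

# Log into an in-memory stream so the output can be inspected
stream = io.StringIO()
logger = logging.getLogger("fetcher_sketch")
logger.setLevel(logging.INFO)
logger.propagate = False
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("%(message)s"))
logger.addHandler(handler)


def log_failure(name, with_traceback):
    """Simulate a failed request and log it with or without a traceback."""
    try:
        raise ConnectionError("Remote end closed connection without response")
    except ConnectionError:
        if with_traceback:
            # exception() appends the full traceback to the message
            logger.exception('{}: ConnectionError!'.format(name))
        else:
            # error() logs the message only
            logger.error('{}: ConnectionError!'.format(name))


log_failure("RF5MR", with_traceback=False)
message_only = stream.getvalue()
```

Applied to the code in the question, this would mean replacing each `self.logger.exception(...)` call with `self.logger.error(...)`.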
-
Python memory leak (requests and threading)
My program is exhibiting memory leaks. I would like to find out where:
def function():
    proxies = {
        'https': proxy
    }
    session = requests.Session()
    session.headers.update({'User-Agent': 'user - agent'})
    try:
        login = session.get(url, proxies=proxies)  # HERE IS WHERE MEMORY LEAKS
    except:
        return -1
    return 0
I am unable to find the solution because I'm new to programming. But I think the problem is caused by an exception.
Most of the time the exception will happen (because I use free proxies). I think the solution will be "cleaning" the exceptions, or cleaning the "login" response, but I don't know how to proceed.
-
How to crawl pictures from pixiv?
I've collected lots of pixiv picture URLs using a Python crawler.
I use this code to download them:
picture_name = 0
for t_pic in t_pics:
    t_pic_conn = session.get(t_pic, cookies=cookies)
    picture_name += 1
    if t_pic_conn.status_code != 200:
        print(picture_name, t_pic_conn.status_code, t_pic_conn.reason)
        continue
    with open('./Downloads/pic-{}.jpg'.format(picture_name), 'wb') as f:
        f.write(t_pic_conn.content)
    t_pic_conn.close()
t_pics is a list of URLs like https://i.pximg.net/img-original/img/2018/04/20/00/00/02/68308525_p0.jpg
And the result is 403 Forbidden.
So how can I download those pictures?