BeautifulSoup table data extraction

The problem is that the data I need does not show up when I run my Python code. It is visible in Chrome when I use "Inspect Element", but not in "View Source".

My code:

import bs4 as bs
import urllib.request

url = 'https://ethplorer.io/address/0x8b353021189375591723e7384262f45709a3c3dc'
page = urllib.request.urlopen(url)
soup = bs.BeautifulSoup(page, 'html.parser')

# Print every <td class="list-field"> cell and count them.
cat = 0
for category in soup.find_all('td', {'class': 'list-field'}):
    print(category)
    cat += 1

It pulls out the line I need:

<td class="list-field" id="address-token-holdersCount"></td>

However, the cell comes back empty, even though it should contain a value (2345, as shown below).

When I check the page using "Inspect Element", the relevant part looks like this:

<table class="table">
    <tbody>
        <tr class="even last">
            <td>Holders</td>
            <td id="address-token-holdersCount" 
               class="list-field">"2345"</td>
        </tr>
    </tbody>
</table>

What do you recommend to fix this issue?

1 answer

  • answered 2018-03-22 10:33 Keyur Potdar

    As you found out yourself, the value is not present in the page source; it is loaded dynamically through an AJAX request. The urllib module (like requests) only returns the page source and does not run the page's JavaScript, which is why you can't get that value directly.
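
    A minimal sketch of how to confirm this offline, without any network call. The td line is the one from the question's output; the surrounding table markup and the names static_html and is_empty are assumptions made for illustration:

```python
from bs4 import BeautifulSoup

# Static HTML roughly as the server sends it. Only the empty <td> is taken
# from the question; the wrapping table markup is assumed for illustration.
static_html = (
    '<table class="table"><tbody><tr class="even last">'
    '<td>Holders</td>'
    '<td class="list-field" id="address-token-holdersCount"></td>'
    '</tr></tbody></table>'
)

soup = BeautifulSoup(static_html, "html.parser")
cell = soup.find("td", id="address-token-holdersCount")

# The element exists, but it has no text: the value is filled in later by
# JavaScript, which neither urllib nor BeautifulSoup executes.
is_empty = cell is not None and cell.get_text(strip=True) == ""
print(is_empty)
```

    If the cell exists but is empty like this, the value is most likely coming from a separate request, which is what the next step looks for.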

    Go to Developer Tools > Network > XHR and refresh the page. You'll see an AJAX request made to this URL:

    https://ethplorer.io/service/service.php?data=0x8b353021189375591723e7384262f45709a3c3dc
    

    This URL returns the data as JSON. If you look at the response, you'll find the Holders number in it, and you can read it using the requests module and its built-in .json() method:

    import requests
    
    r = requests.get('https://ethplorer.io/service/service.php?data=0x8b353021189375591723e7384262f45709a3c3dc')
    data = r.json()
    
    holders = data['pager']['holders']['total']
    print(holders)
    # 2346
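
    If you'd rather stay with urllib, as in the question, the same lookup can be sketched with the standard library only. The helper names extract_holders and fetch_holders are hypothetical, and the 'pager' > 'holders' > 'total' path is simply the one used above; the endpoint is not a documented API, so its response shape may change, and it may reject requests that lack browser-like headers:

```python
import json
import urllib.request


def extract_holders(payload):
    """Return payload['pager']['holders']['total'], or None if a key is missing."""
    node = payload
    for key in ("pager", "holders", "total"):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


def fetch_holders(address, timeout=10):
    """Fetch the JSON for one address and return the holders count (or None).

    Hypothetical helper: assumes the endpoint still answers at this URL
    with the same JSON layout as in the answer above.
    """
    url = "https://ethplorer.io/service/service.php?data=" + address
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return extract_holders(json.load(resp))
```

    Calling fetch_holders('0x8b353021189375591723e7384262f45709a3c3dc') should give the same number as the requests version, and it returns None instead of raising a KeyError if the response layout changes.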