Python BeautifulSoup4 parsing XML and searching for attribute string value – Emalis : QA Automation and Web Development

This example was not on the BeautifulSoup4 documentation page, so I’ll put it here:

I had an XML document with a series of tags and attributes like these:

<token name="Street1" value="aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeee1"/>
<token name="Street2" value="aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeee2"/>
<token name="Street3" value="aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeee3"/>

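None of the examples in the documentation worked for me when I needed a regex to search for a string inside an attribute. This looked like the recommended method, but it did not work, because the name argument of find_all() filters on the tag name (token here), not on an attribute that happens to be called name: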

souppage.find_all(name=re.compile("Street"))
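Here is a minimal sketch of why that call comes back empty. The sample string, the tokens root element and the variable names are made up for illustration, and html.parser is used only so the snippet runs without lxml installed:

```python
import re
from bs4 import BeautifulSoup

# Hypothetical sample data modeled on the tokens above,
# wrapped in a made-up <tokens> root element.
xml = """<tokens>
<token name="Street1" value="aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeee1"/>
<token name="Street2" value="aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeee2"/>
</tokens>"""

# Assumption: the built-in parser is enough for this small sample.
soup = BeautifulSoup(xml, "html.parser")

# name= is compared against TAG names ("tokens", "token"), so "Street" never matches.
by_attr_guess = soup.find_all(name=re.compile("Street"))

# The same argument does match when the regex describes a tag name.
by_tag_name = soup.find_all(name=re.compile("^token$"))

print(len(by_attr_guess), len(by_tag_name))
```

The first search finds nothing because no tag is called Street; the second finds both token tags.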

The documentation also has a note about searching for an attribute with a specific value. Something like this:

souppage.find_all(attrs={"name": "Street"})
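A plain string in attrs is compared against the whole attribute value, so it only helps when the full name is known. A small sketch, again with made-up sample data and html.parser as a stand-in parser:

```python
from bs4 import BeautifulSoup

# Hypothetical sample data modeled on the tokens above.
xml = """<tokens>
<token name="Street1" value="aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeee1"/>
<token name="Street2" value="aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeee2"/>
<token name="Street3" value="aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeee3"/>
</tokens>"""

soup = BeautifulSoup(xml, "html.parser")

# No token is named exactly "Street", so this list is empty.
exact_prefix = soup.find_all(attrs={"name": "Street"})

# The full attribute value matches exactly one token.
exact_full = soup.find_all(attrs={"name": "Street1"})

print(len(exact_prefix), len(exact_full))
```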

Slightly modified to use a regex, it becomes:

souppage.find_all(attrs={"name": re.compile("^Street")})
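The ^ anchor matters here, because Beautiful Soup applies a compiled pattern with its search() method, which matches anywhere in the attribute value. A short sketch of the difference, where the OldStreet9 token is a hypothetical addition to the sample and html.parser is used only to avoid the lxml dependency:

```python
import re
from bs4 import BeautifulSoup

# Hypothetical sample data; "OldStreet9" is invented to show the effect of the anchor.
xml = """<tokens>
<token name="Street1" value="aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeee1"/>
<token name="Street2" value="aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeee2"/>
<token name="OldStreet9" value="aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeee9"/>
</tokens>"""

soup = BeautifulSoup(xml, "html.parser")

# Anchored: only names that START with "Street".
anchored = soup.find_all(attrs={"name": re.compile("^Street")})

# Unanchored: any name that CONTAINS "Street".
unanchored = soup.find_all(attrs={"name": re.compile("Street")})

print([t["name"] for t in anchored])
print([t["name"] for t in unanchored])
```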

As a quick summary, the code below should return a list of the tokens whose name attribute starts with Street:


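import re
from bs4 import BeautifulSoup

# Read the whole XML file; the with block closes it afterwards
with open("SomeImportantXML.xml", "r") as xmlfile:
    xmlfile_contents = xmlfile.read()

# "xml" selects lxml's XML parser ("lxml" on its own is the HTML parser)
souppage = BeautifulSoup(xmlfile_contents, "xml")

# All tags whose name attribute starts with "Street"
souppage.find_all(attrs={"name": re.compile("^Street")})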
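Once the matching tokens are found, their attributes can be read like dictionary keys. A possible follow-up, sketched with an inline string instead of the file so it runs on its own; the City1 token, the streets variable and the choice of html.parser are assumptions for illustration:

```python
import re
from bs4 import BeautifulSoup

# Hypothetical sample data; "City1" is invented to show that non-matching tokens are skipped.
xml = """<tokens>
<token name="Street1" value="aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeee1"/>
<token name="City1" value="aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeee0"/>
<token name="Street2" value="aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeee2"/>
</tokens>"""

soup = BeautifulSoup(xml, "html.parser")

# Restrict the search to <token> tags, then filter on the name attribute.
matches = soup.find_all("token", attrs={"name": re.compile("^Street")})

# Map each matching name to its value attribute.
streets = {tag["name"]: tag["value"] for tag in matches}

for name, value in streets.items():
    print(name, value)
```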