Apply regular expressions to ElementTree items

  elementtree, parsing, perl, python, wikipedia

I am reading the content of XML files that contain comparable Wikipedia articles using

import xml.etree.ElementTree as ET

context = ET.iterparse(file_name, events=("start", "end"))
context = iter(context)

# get the root element
ev, root = next(context)

for event, elem in context:
    # children are only guaranteed to be complete on the element's "end" event
    if event == "end" and elem.tag == 'articlePair':
        children = list(elem)  # Element.getchildren() was removed in Python 3.9
        article_lang_1 = ET.tostring(children[0], encoding='utf-8', method='html').decode('utf-8', errors='ignore')
        article_lang_2 = ET.tostring(children[1], encoding='utf-8', method='html').decode('utf-8', errors='ignore')

        

The variables article_lang_1 and article_lang_2 each contain an article in a specific language, along with links, math, tables, etc. There is a Perl script, xml2txt.pl, that removes the information related to links, math, tables, etc. using regular expressions. Could you please suggest a way to apply these regular expressions to the articles held in those variables?
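One possible approach is to port each Perl substitution to Python's re module and run the rules in order over the serialized article string. The sketch below is only an illustration: the contents of xml2txt.pl are not shown here, so the tag names (math, table, link), the CLEANUP_RULES list, and the clean_article function are all hypothetical stand-ins for whatever the Perl script really does. As a general porting rule, a Perl s/pattern/replacement/gs becomes re.sub(pattern, replacement, text, flags=re.DOTALL), and $1 in the replacement becomes \1.

```python
import re

# Hypothetical cleanup rules; the real patterns would have to be ported
# from xml2txt.pl. Each entry is (compiled pattern, replacement string).
CLEANUP_RULES = [
    # drop math blocks entirely
    (re.compile(r"<math\b[^>]*>.*?</math>", re.DOTALL | re.IGNORECASE), " "),
    # drop tables entirely
    (re.compile(r"<table\b[^>]*>.*?</table>", re.DOTALL | re.IGNORECASE), " "),
    # keep only the visible text of a link
    (re.compile(r"<link\b[^>]*>(.*?)</link>", re.DOTALL | re.IGNORECASE), r"\1"),
    # strip any remaining tags
    (re.compile(r"<[^>]+>"), " "),
    # collapse runs of whitespace
    (re.compile(r"\s+"), " "),
]


def clean_article(markup: str) -> str:
    """Apply every cleanup rule, in order, to one serialized article."""
    text = markup
    for pattern, replacement in CLEANUP_RULES:
        text = pattern.sub(replacement, text)
    return text.strip()


sample = (
    '<article lang="en"><p>See <link target="Pi">pi</link> and '
    "<math>x^2</math>.</p><table><row>1</row></table></article>"
)
cleaned = clean_article(sample)
```

Under these assumptions, the strings produced by ET.tostring(...).decode(...) in the loop could be passed straight through, e.g. article_lang_1 = clean_article(article_lang_1). Compiling the patterns once, outside the loop, avoids recompiling them for every article pair.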

Source: Python Questions
