Why is the or operator "|" not working in this Python regex expression?

  nltk, nsregularexpression, python, re

I have been working on this problem for a couple of weeks.

My regex works for either the "NN" (noun) or the "JJ" (adjective) tag in the example below, but I want to capture both nouns and adjectives in one expression using the "|" operator, and that does not work. I can run the expressions separately and merge the lists, but it's not an efficient approach. To be more specific, this lookahead captures the word ‘custard’, tagged as a noun, in the chunked data below.

Regex: (\w+(?=/NN.?))

Chunk: (S (NP custard/NN) (NP puerto/JJ rican/JJ style/NN))
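For reference, here is a minimal sketch of the single-tag case in Python, assuming the pattern is meant to begin with `\w+` and is run with `re.findall` on the chunk as a plain string. The variable names are only for illustration.

```python
import re

# The chunk from the example, treated as a plain string (an assumption).
chunk = "(S (NP custard/NN) (NP puerto/JJ rican/JJ style/NN))"

# A word followed by "/NN" plus at most one more character.
# (?=...) is a lookahead, so the tag itself is not part of the match.
noun_pattern = re.compile(r"(\w+(?=/NN.?))")
nouns = noun_pattern.findall(chunk)

# Same idea with the adjective tag swapped in.
adj_pattern = re.compile(r"(\w+(?=/JJ))")
adjectives = adj_pattern.findall(chunk)

print(nouns)       # ['custard', 'style']
print(adjectives)  # ['puerto', 'rican']
```

Run separately like this, each pattern finds only its own tag, which is the two-pass approach I would like to avoid.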

If I add the adjective JJ tag using the or "|" operator, however, it still captures the noun "custard" but not the adjective "Puerto Rican." If I replace NN with JJ, it captures the adjectives marked JJ but not the nouns.

This happens no matter which chunk I use. The same is true for the word "danish", which, like Puerto Rican, is tagged as both an adjective and a proper noun. It is also true for other adjectives that are not also nouns.

Regex: (\w+(?=/NN.?|JJ)) does not work. I’ve tried variations, but no luck.
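A small sketch that reproduces the behavior, again assuming the pattern starts with `\w+`. One likely explanation is that "|" has the lowest precedence in a regex, so inside the lookahead the alternatives are `/NN.?` and `JJ`, and the second one has no leading slash. The grouped variant below is a possible fix, not a confirmed one; it wraps only the tag names in a non-capturing group so the slash applies to both.

```python
import re

chunk = "(S (NP custard/NN) (NP puerto/JJ rican/JJ style/NN))"

# Original attempt: the lookahead reads as "/NN.?" OR "JJ" (no slash before JJ),
# so a word followed by "/JJ" never satisfies either alternative.
broken = re.findall(r"(\w+(?=/NN.?|JJ))", chunk)

# Possible fix: keep the slash outside and group only the tag names.
fixed = re.findall(r"(\w+(?=/(?:NN.?|JJ)))", chunk)

print(broken)  # ['custard', 'style']
print(fixed)   # ['custard', 'puerto', 'rican', 'style']
```

With the group in place, a single pass returns the nouns and adjectives in document order, so no list merging is needed.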

Here is the problem as shown in the regex tester with more data:

https://regex101.com/r/lGb8IT/1

Thank you in advance for any solutions or pointers.

Source: Python Questions
