How to count duplicates based on a common keyword in Python

  duplicates, indexing, python, word

I’ve been struggling with this problem for several days. I may have found a solution in Python,
but it is a bit hard to accomplish.

#!/usr/bin/env python3.7

from collections import Counter


list1 = [
    "*** hello",           # <----- duplicate
    "*** bye",             # <----- duplicate
    "*** just me",         # <----- DUPLICATE(1)
    "*** good morning",    # <----- duplicate

    "just me",             # <----- not duplicate
    "never story",         # <----- not duplicate
]

The substring *** is a common word, so I don’t want to count the plain ‘just me’ and ‘never story’.
(1) There is also a ‘just me’ next to ***; that one is another duplicate because of the ***.
So the substring *** tells us what to count and what not to count (it acts as a marker).

Unfortunately I don’t know what the substring is. It is the unknown factor and it could be anything. I thought: maybe we can recognise it by searching for common words?

common = [word for word, word_count in Counter(list1).most_common(4)]
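Note that Counter(list1) counts whole strings, and every string in list1 occurs exactly once, so most_common() cannot surface the marker on its own. A minimal sketch of one alternative, assuming the marker is always the first whitespace-separated token of an entry and is the most frequent such token (both are assumptions, not something the data guarantees):

```python
from collections import Counter

list1 = [
    "*** hello",
    "*** bye",
    "*** just me",
    "*** good morning",
    "just me",
    "never story",
]

# Count only the leading token of each non-empty entry, not the whole string.
first_tokens = Counter(entry.split()[0] for entry in list1 if entry.split())

# Assumption: the marker is the most frequent leading token.
marker, occurrences = first_tokens.most_common(1)[0]

print(marker)       # ***
print(occurrences)  # 4
```

With this approach the ‘4’ no longer has to be known in advance: it falls out of the count for the leading token.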

The ‘4’ is another unknown factor: to get it I have to count the occurrences.
So I have to search for the common word based on its occurrences and grab every word next to it:

*** hello
*** bye
*** just me
*** good morning

while respecting the "positional order" and discarding the rest.
If an entry contains only the marker (***), it must return only the marker. Example:

list1 = [
    "*** hello",           # <----- duplicate
    "***",                 # <----- returns '***'

    "just me",             # <----- not duplicate
    "never story",         # <----- not duplicate
]

I also need access to the marker, that is, a variable that holds the marker for later processing.
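The requirements so far might be sketched roughly as below. This is only one possible reading: split_by_marker is a hypothetical helper name, the marker is assumed to be the most frequent leading token and to occur at least twice, and "grabbing the words next to the marker" is interpreted as keeping the text that follows it.

```python
from collections import Counter


def split_by_marker(entries):
    """Return (marker, marked, discarded) for a list of strings.

    Assumption: the marker is the most frequent leading token
    and appears at the start of at least two entries.
    """
    tokens = [entry.split(maxsplit=1) for entry in entries]
    counts = Counter(parts[0] for parts in tokens if parts)
    if not counts:
        return None, [], list(entries)
    marker, count = counts.most_common(1)[0]
    if count < 2:
        # No leading token repeats, so no marker can be inferred.
        return None, [], list(entries)

    marked, discarded = [], []
    for entry, parts in zip(entries, tokens):
        if parts and parts[0] == marker:
            # Keep the text after the marker, or the marker itself if it stands alone.
            marked.append(parts[1] if len(parts) > 1 else marker)
        else:
            discarded.append(entry)
    return marker, marked, discarded


marker, marked, discarded = split_by_marker(
    ["*** hello", "***", "just me", "never story"]
)
print(marker)       # ***
print(marked)       # ['hello', '***']
print(discarded)    # ['just me', 'never story']
print(len(marked))  # 2
```

Both result lists are built in a single pass over the input, so the positional order is preserved, and marker stays available as a plain variable for later processing.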

How can I count duplicates based on a common word, and access both the common word string and the discarded
words?
It sounds a bit challenging. Thanks.

Source: Python Questions
