I've been struggling with this problem for several days. I may have found an approach in Python,
but it is a bit hard to implement.
#!/usr/bin/env python3.7
import collections
from collections import Counter

list1 = [
    "*** hello",         #<----- duplicate
    "*** bye",           #<----- duplicate
    "*** just me",       #<----- DUPLICATE(1)
    "*** good morning",  #<----- duplicate
    "just me",           #<----- not duplicate
    "never story"        #<----- not duplicate
]
*** is the common word, so I don't want to count the plain 'just me' and 'never story'.
(1) There is also a 'just me' next to ***. That one counts as a duplicate because of the ***,
so the substring *** tells us what to count and what not to count (it is a marker).
Unfortunately, I don't know what the substring is. It is the unknown factor and it could be anything. I thought: could we recognise it by searching for common words?
common = [word for word, word_count in Counter(list1).most_common(4)]
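One thing worth noting about the line above: Counter(list1) counts whole strings, and every string in list1 is different, so nothing stands out as common. A minimal sketch of a possible alternative is to count only the first word of each entry. It assumes the marker is always the first whitespace-separated token of an entry, which the post does not guarantee; the names first_tokens, token_counts and occurrences are made up for illustration.

```python
from collections import Counter

list1 = [
    "*** hello",
    "*** bye",
    "*** just me",
    "*** good morning",
    "just me",
    "never story",
]

# Counting whole strings: every entry is unique, so each count is 1.
whole_counts = Counter(list1)

# Assumption: the marker is the first whitespace-separated token of an entry.
first_tokens = [line.split()[0] for line in list1 if line.strip()]
token_counts = Counter(first_tokens)

# The most frequent first token, and how many entries start with it.
marker, occurrences = token_counts.most_common(1)[0]

print(max(whole_counts.values()))  # 1
print(marker, occurrences)         # *** 4
```

With this, the 4 no longer has to be known in advance: it comes out as occurrences.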
The '4' is another unknown factor, and to get it I have to count the occurrences.
So I have to find the common word based on its occurrences and grab everything next to it:
*** hello
*** bye
*** just me
*** good morning
respecting the "positional order" and discarding the rest.
If an entry contains only the marker (***), it must return only the marker. Example:
list1 = [
    "*** hello",    #<----- duplicate
    "***",          #<----- returns '***'
    "just me",      #<----- not duplicate
    "never story"   #<----- not duplicate
]
I also need access to the marker, i.e. a variable that holds the marker for later processing.
How can I count the duplicates based on the common word, and access both the common word string and the discarded entries?
It sounds a bit challenging. Thanks.
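A rough sketch of how the three requirements (count, marker, discarded entries) might fit together, under assumptions that are not stated in the post: the marker is the first whitespace-separated token of an entry, and it must appear in at least two entries to count as a marker. The function name split_by_marker and its return shape are hypothetical.

```python
from collections import Counter

def split_by_marker(lines):
    """Hypothetical helper: return (marker, kept, discarded)."""
    # Assumption: the marker is the first whitespace-separated token.
    firsts = [line.split()[0] if line.split() else "" for line in lines]
    counts = Counter(tok for tok in firsts if tok)
    if not counts:
        return None, [], list(lines)

    marker, count = counts.most_common(1)[0]
    # Assumption: a token seen only once is not treated as a marker.
    if count < 2:
        return None, [], list(lines)

    # Both lists keep the original positional order of the input.
    kept = [line for line, tok in zip(lines, firsts) if tok == marker]
    discarded = [line for line, tok in zip(lines, firsts) if tok != marker]
    return marker, kept, discarded

list1 = ["*** hello", "***", "just me", "never story"]
marker, kept, discarded = split_by_marker(list1)

print(marker)      # ***
print(len(kept))   # 2
print(kept)        # ['*** hello', '***']
print(discarded)   # ['just me', 'never story']
```

An entry that consists of the marker alone is kept as just the marker, and marker stays available as a variable for later processing. If the marker could also appear in the middle or at the end of an entry, the first-token assumption would need to be replaced.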
Source: Python Questions