Given a string of text, how can I find duplicate words separated by a comma?

  • Last Reviewed: 2020-02-25

I really, really have a very, very bad tendency to over, over emphasize words by repeating them, separated by a comma.

How can I find all these?

Given a body of text like this:

I like this very, very much. But that other thing is super, super bad.

I would like to extract the following text snippets.

very, very
super, super

That is, any time a word appears twice in a row, separated by a comma.


Answer

Use a regex that captures a word ahead of a comma, then use a backreference to that capture group to check whether the same word is repeated after the comma:

(.+)\W+(\1)

https://regex101.com/r/FuuYtP/1

Comments

  • This worked: (.+),\s+(\1). Your version picked up any two identical characters on either side of a non-word character, so if I wrote "but that" it picked up t t. I need to test some more edge cases.
  • Here's the final: \W(.+),\s+(\1)\W. This catches the space on the other side of each of the words.
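
For anyone who wants to try this from a script, here is a minimal sketch in Python. It is a variant, not the exact pattern from the comments above: it assumes a "word" means a run of \w characters and uses \b word boundaries instead of the surrounding \W, so a repeat at the very start or end of the text can still match. The helper name find_comma_repeats is hypothetical.

```python
import re

# Hypothetical helper, not from the original thread.
# \b(\w+) captures one whole word, ",\s+" requires a comma plus whitespace,
# and \1\b requires the same word again, ending at a word boundary.
COMMA_REPEAT = re.compile(r"\b(\w+),\s+\1\b")

def find_comma_repeats(text):
    """Return each 'word, word' snippet found in text."""
    return [m.group(0) for m in COMMA_REPEAT.finditer(text)]

sample = "I like this very, very much. But that other thing is super, super bad."
print(find_comma_repeats(sample))  # ['very, very', 'super, super']
```

The match is case-sensitive as written, so "Very, very" would not be reported; passing re.IGNORECASE to re.compile is one way to relax that.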

Additional Posts

You would have to test this more, but it solves the test case at least.

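// LINQPad script: Dump() is a LINQPad extension method.
void Main()
{
	var foo = "I like this very, very much. But that other thing is super, super bad.";
	
	// Split on commas, then compare the last word before each comma
	// with the first word after it.
	var bar = foo.Split(',');
	var list = new List<string>();
	
	var last = "";
	foreach(var thing in bar)
	{
		var foobar = thing.Split(' ');
		
		// Drop the empty entries produced by leading or doubled spaces.
		var veryFoobar = foobar.Where(x => !string.IsNullOrEmpty(x)).ToList();
		
		// A segment with no words (e.g. between ",,") has nothing to compare.
		if(veryFoobar.Count == 0)
		{
			last = "";
			continue;
		}
		
		if(veryFoobar.First() == last)
		{
			list.Add(string.Format("{0}, {1}", last, veryFoobar.First()));
		}
		// Use the last non-empty word, so a space before the comma is not treated as a word.
		last = veryFoobar.Last();
	}
	
	list.Dump();
}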

Comments

  • A more robust test string:
  • Also worked. Across 30,000 words of text, it found the same single instance that @JohnPavek's solution did.
  • @DeaneBarker Please upvote my answer so I can get more karma.
  • I have deep concerns about the usage of foobar and veryFoobar as variable names.
  • @DeaneBarker I'll use this next time: http://sph.mn/dynamic/svn