Irregular expressions

I've dictionary (word's name : word's points).

dict = {"fe": 1, "je": 2, "jee": 3, "ain": 3, "dai": 5, "ne": 2, "ai": 2}

Any character that does not create a word in the dictionary is minus 1 point

Example: fejeain - 1+2+3 = 6 feje**b**ain - 1+2-**1**+3 = 5

Here is my string

string = "feaineai"

I would like to award points for individual parts of a word. Their sum will be the result.

But there are several ways to calculate the result:

fe-ai-ne-ai = 1+2+2+2=7
fe-ain-e-ai = 1+3-1+2=5

Is there any algorithm that will help me? The algorithm should find the highest possible result.

3 answers

  • answered 2017-06-17 19:07 Damian Lattenero

    first of all, don't call your variable dict. This is how I'll do it:

    Edit

    Finally reading the answer of user @Dan D., wich is correct, I had nothing to add, just a subprocess calling the max function over all the results.

    a_dict = {"fe": 1, "je": 2, "jee": 3, "ain": 3, "dai": 5, "ne": 2, "ai": 2}
    
    
    def words_points(w):
      return max(sub_points(w))
    
    def sub_points(w):
        if not w:
            yield 0
            return
        for word in a_dict:
            if w.startswith(word):
                for v in sub_points(w[len(word):]):
                    yield a_dict[word] + v
        for v in sub_points(w[1:]):
            yield -1 + v
    
    
    
    for s in ("feaineai", "fejebain", "fejeain", "aiaiai", "aiaiaije"):
      print(words_points(s))
    

  • answered 2017-06-17 19:07 Dan D.

    Consider the following as the direct recursive solution, a tabled solution would use less time:

    a_dict = {"fe": 1, "je": 2, "jee": 3, "ain": 3, "dai": 5, "ne": 2, "ai": 2}
    
    string = "feaineai"
    
    def p(w):
        if not w:
            yield 0
            return
        for word in a_dict:
            if w.startswith(word):
                for v in p(w[len(word):]):
                    yield a_dict[word] + v
        for v in p(w[1:]):
            yield -1 + v
    
    print max(p(string))
    

    Prints 7 for feaineai and 5 for fejebain and 12 for aidaidai.

  • answered 2017-06-17 19:07 Danstahr

    This is a great example of a problem where you want to use dynamic programming. There's a lot of ways to implement the algorithm, let me sketch one below (not an optimal one).

    For each prefix length (0 to N), you can remember the best score you can achieve for that prefix. Obviously the best score for a prefix of length 0 is 0. You can initialize the other prefixes with minus their length (-1 point for each character).

    Now, you iterate over the positions and you try to match all dictionary words at that position (let's call it K). If there's a match, you update the best score at position (K + length of the matched word, let's call it L) to max(scores[L], scores[k] + word_score). The end result will be at position scores[len(string)].

    def get_score(dct, to_score):
        dp = [-i for i in xrange(len(to_score) + 2)]
        for i in xrange(len(to_score)):
            for word in dct:
                lw = len(word)
                if lw <= len(to_score) - i:
                    if word == to_score[i:i + lw]:
                        dp[i + lw] = max(dp[i+lw], dp[i] + dct[word])
            dp[i + 1] = max(dp[i + 1], dp[i] - 1)
        return dp[len(to_score)]
    
    dct = {"fe": 1, "je": 2, "jee": 3, "ain": 3, "dai": 5, "ne": 2, "ai": 2}
    
    for s in ("feaineai", "fejebain", "aidaidai"):
        print "%s -> %d" % (s, get_score(dct, s))