Detecting Hebrew Characters in PHP Strings

In PHP, is there a known safe/reliable way to

  1. Detect, generically, a Hebrew character in a string of otherwise plain English characters.
  2. Replace that character with something else.

I know I could use mb_ereg_replace to replace a fixed set of specific characters. However, I'm interested in scanning a string that might contain any Hebrew character and replacing each one with something else.

That is, I might have two strings like this:

<?php
    $string1 = "Look at this hebrew character: חַ. Isn't it great?";
    $string2 = "Look at this other hebrew character: יַָ. It is also great?";

I want a single function that would give me the following strings:

Look at this hebrew character: \texthebrew{ח}. Isn't it great?
Look at this other hebrew character: \texthebrew{י}. It is also great?

In theory I know I could scan the string for characters in the Hebrew Unicode range and detect those, but how character encoding works for strings in PHP has always been a little hazy for me, and I'd rather use a proven/known solution if such a thing exists.

1 answer

  • answered 2017-06-17 18:54 hakre

    The mb_ereg_replace_callback function is useful in your case. Its regular expression dialect supports named Unicode properties, the Hebrew property (\p{Hebrew}) specifically. That property matches Hebrew characters and corresponds closely to the Hebrew Unicode block (IntlChar::BLOCK_CODE_HEBREW).

    All you need to do is to mask the Hebrew segments:

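    // mb_regex_encoding() is the current name; the mbregex_encoding() alias was removed in PHP 8.0.
    mb_regex_encoding('UTF-8');
    $subject = "Look at this hebrew character: חַ. Isn't it great?";
    // Wrap each run of Hebrew characters in \texthebrew{...}.
    var_dump(mb_ereg_replace_callback('\p{Hebrew}+', function ($matches) {
        return vsprintf('\texthebrew{%s}', $matches);
    }, $subject));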
    

    Output:

    string(65) "Look at this hebrew character: \texthebrew{חַ}. Isn't it great?"
    

    As the output shows, the four bytes that make up the two code points are wrapped together in one segment.
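
    To make the byte versus code point distinction concrete, here is a minimal sketch. It assumes the sample segment is the letter het (U+05D7) followed by the vowel point patah (U+05B7), which is what the four-byte, two-code-point count suggests; the variable names are only illustrative.

    ```php
    <?php
    // Assumption: the sample is het (U+05D7) plus the combining vowel point
    // patah (U+05B7). Escapes keep the bytes unambiguous (requires PHP 7+).
    $segment = "\u{05D7}\u{05B7}";

    // strlen() counts bytes; each of these code points takes two bytes in UTF-8.
    $byteCount = strlen($segment);

    // mb_strlen() counts code points when told the encoding.
    $codePointCount = mb_strlen($segment, 'UTF-8');

    echo $byteCount, " bytes, ", $codePointCount, " code points\n";
    ```

    This is why a byte-oriented function would see four units here, while an encoding-aware regular expression treats the letter and its point as one Hebrew run.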

    I don't know of any other way to do that in PHP with that little code.
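
    Since the question asks for a single function, one possible way to package the same idea is sketched below. It swaps in preg_replace_callback with the /u modifier instead of the mbstring function shown earlier, relying on PCRE's \p{Hebrew} script property. The function name wrapHebrew is hypothetical, and falling back to the unchanged input on failure is an assumption about the desired behaviour, not something stated in the question.

    ```php
    <?php
    /**
     * Wrap every run of Hebrew characters in \texthebrew{...}.
     * Hypothetical helper; the name and the fallback are illustrative choices.
     */
    function wrapHebrew(string $text): string
    {
        $result = preg_replace_callback(
            '/\p{Hebrew}+/u',              // /u makes PCRE treat the subject as UTF-8
            function (array $matches): string {
                // Single quotes keep the backslash literal.
                return '\texthebrew{' . $matches[0] . '}';
            },
            $text
        );

        // preg_replace_callback() returns null on error (for example, invalid
        // UTF-8 input); in that case this sketch returns the input unchanged.
        return $result ?? $text;
    }

    echo wrapHebrew("Look at this hebrew character: \u{05D7}\u{05B7}. Isn't it great?"), "\n";
    ```

    Like the version above, this keeps the letter and its vowel point together in one wrapped segment. Stripping the points, as the expected output in the question seems to show, would need an extra step.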