Character ° encoding and display in a txt file

I have a field in a table that contains the string "Address Pippo p.2 °". My program reads this value and writes it to a txt file, but the output is:

"Address Pippo p.2 Â°" (the Â is unwanted)

This is a problem because the txt file is a positional (fixed-width) file, so the extra character shifts every field after it.

I open the file with these Java instructions: FileWriter fw = new FileWriter(file, true); pw = new PrintWriter(fw);

I want to write the string without these strange characters.

Any help?

Thanks in advance

2 answers

  • answered 2017-10-11 10:04 Tridev Chaudhary

    Try encoding the string as UTF-8 explicitly when writing it, like this:

        import java.io.*;
        import java.nio.charset.StandardCharsets;

        File file = new File("D:/test.txt");
        String test = "Address Pippo p.2 °";
        // Pass the charset to the writer so the bytes in the file are UTF-8
        // regardless of the platform default encoding.
        try (PrintWriter pw = new PrintWriter(new OutputStreamWriter(
                new FileOutputStream(file, true), StandardCharsets.UTF_8))) {
            pw.write(test);
        }
    

  • answered 2017-10-11 10:04 DodgyCodeException

    Java uses Unicode internally. When you write text to a file, it gets encoded using a particular character encoding. If you don't specify one explicitly, it will use a "system default encoding", which is whatever is configured as the default for your particular JVM instance. You need to know which encoding was used to write the file, and then use the same encoding to read and display its content. The stray characters you are seeing are probably due to writing the file in UTF-8 and then reading and displaying it in, e.g., Notepad using Windows-1252 ("ANSI") encoding: ° (U+00B0) is encoded in UTF-8 as the two bytes C2 B0, and Windows-1252 decodes C2 as Â and B0 as °, so you see "Â°".
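    The effect can be reproduced in memory without touching a file. This minimal sketch encodes the string as UTF-8 and then decodes the resulting bytes as Windows-1252, which approximates what Notepad does when it opens a UTF-8 file as "ANSI". The class and method names (MojibakeDemo, misread) are illustrative only.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class MojibakeDemo {
    // Windows-1252 is what Notepad calls "ANSI" on Western-European systems.
    static final Charset WINDOWS_1252 = Charset.forName("windows-1252");

    // Encode text as UTF-8, then decode the same bytes as Windows-1252:
    // this simulates opening a UTF-8 file with the wrong encoding.
    static String misread(String text) {
        byte[] utf8Bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8Bytes, WINDOWS_1252);
    }

    public static void main(String[] args) {
        // U+00B0 is the degree sign; UTF-8 encodes it as the bytes C2 B0.
        String original = "Address Pippo p.2 \u00B0";
        String shown = misread(original);
        // C2 is decoded as U+00C2 (A with circumflex) and B0 as U+00B0 (degree sign).
        System.out.println(shown.equals("Address Pippo p.2 \u00C2\u00B0")); // true
    }
}
```

    Pure ASCII text comes through unchanged, which is why only the ° looks wrong.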

    Decide what encoding you want and stick to it for both reading and writing. To write using Windows-1252, use:

    Writer w = new OutputStreamWriter(new FileOutputStream(file, true), "windows-1252");
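    For a positional file, a complete sketch along those lines might look like the following. The 30-character field width and the helper names field and appendRecord are hypothetical, chosen only for illustration; the point is that with Windows-1252 every character, including °, becomes exactly one byte, so a field of N characters is also N bytes.

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;

class Windows1252Writer {
    static final Charset WINDOWS_1252 = Charset.forName("windows-1252");

    // Hypothetical helper: left-justify `value` in a field of `width` characters,
    // truncating if it is too long. In Windows-1252 one character is one byte,
    // so the field is also exactly `width` bytes.
    static String field(String value, int width) {
        StringBuilder sb = new StringBuilder(
                value.length() > width ? value.substring(0, width) : value);
        while (sb.length() < width) {
            sb.append(' ');
        }
        return sb.toString();
    }

    // Hypothetical helper: append one record plus CRLF using Windows-1252.
    // `true` opens the stream in append mode, like FileWriter(file, true).
    static void appendRecord(Path file, String record) throws IOException {
        try (Writer w = new OutputStreamWriter(
                new FileOutputStream(file.toFile(), true), WINDOWS_1252)) {
            w.write(record);
            w.write("\r\n");
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("positional", ".txt");
        // U+00B0 is the degree sign; it becomes the single byte B0.
        appendRecord(file, field("Address Pippo p.2 \u00B0", 30));
        System.out.println(Files.size(file)); // 32: 30-byte field + CRLF
        Files.deleteIfExists(file);
    }
}
```

    The file then has to be read back as Windows-1252 too (for example with an InputStreamReader given the same charset), otherwise the same kind of mismatch appears in the other direction.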
    

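    And if you write in UTF-8, tell Notepad that you want it to read the file in UTF-8. One way to do that is to write the character '\uFEFF' (the Byte Order Mark) once at the beginning of the file. In UTF-8 it takes three bytes (EF BB BF), so take it into account if the file is read by byte position.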
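    A minimal sketch of the BOM approach, assuming records are appended to the same file as in the question: the hypothetical append helper writes U+FEFF only while the file is still empty, so later appends don't put a second BOM in the middle of the file.

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class Utf8BomWriter {
    // Hypothetical helper: append `text` as UTF-8. If the file is empty or
    // missing, the Byte Order Mark (U+FEFF) is written first so that Notepad
    // recognises the file as UTF-8.
    static void append(Path file, String text) throws IOException {
        boolean empty = !Files.exists(file) || Files.size(file) == 0;
        try (Writer w = new OutputStreamWriter(
                new FileOutputStream(file.toFile(), true), StandardCharsets.UTF_8)) {
            if (empty) {
                w.write(0xFEFF); // encoded as the three bytes EF BB BF
            }
            w.write(text);
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("utf8-bom", ".txt");
        append(file, "Address Pippo p.2 \u00B0");
        append(file, "\r\n");
        byte[] bytes = Files.readAllBytes(file);
        // 3 BOM bytes + 20 bytes of text (the degree sign takes 2) + 2 bytes CRLF.
        System.out.println(bytes.length); // 25
        Files.deleteIfExists(file);
    }
}
```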

    If you use UTF-8, be aware that non-ASCII characters take more than one byte and will push the subsequent bytes out of position. So if, for example, a telephone field must always start at byte position 200, a non-ASCII character in an address field before it will make the telephone field start at byte position 201 or later (° takes two bytes in UTF-8, € takes three). Windows-1252 doesn't have this issue because it uses exactly one byte per character, but it can't encode all Unicode characters.
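    Both points can be checked before writing a record: compare how many bytes a value occupies in each charset, and ask a CharsetEncoder whether Windows-1252 can represent it at all. The class and helper names below are hypothetical; String.getBytes(Charset) and CharsetEncoder.canEncode are standard JDK calls.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class FieldWidthCheck {
    static final Charset WINDOWS_1252 = Charset.forName("windows-1252");

    // Bytes that `text` occupies on disk in the given charset.
    static int byteLength(String text, Charset charset) {
        return text.getBytes(charset).length;
    }

    // True if every character of `text` exists in Windows-1252. Characters that
    // don't would silently become '?' when written with getBytes or a Writer.
    static boolean fitsWindows1252(String text) {
        return WINDOWS_1252.newEncoder().canEncode(text);
    }

    public static void main(String[] args) {
        String address = "Address Pippo p.2 \u00B0"; // 19 characters
        System.out.println(byteLength(address, StandardCharsets.UTF_8)); // 20
        System.out.println(byteLength(address, WINDOWS_1252));           // 19
        System.out.println(fitsWindows1252(address));                    // true
        // A CJK character has no Windows-1252 byte.
        System.out.println(fitsWindows1252("\u4E2D"));                   // false
    }
}
```

    Rejecting or reporting values that fail fitsWindows1252 is usually safer than letting them be written as '?' without notice.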