Use writeUTF and readUTF for HTTP requests in Java

This is a Java method that tries to crawl a designated web page. I am using writeUTF and readUTF for socket communication with a server.

static void get_html(String host, String page, int port) throws IOException {
        Socket sock = new Socket(host, port);
        String msg = MessageFormat.format("GET {0} HTTP/1.1\r\nHost: {1}\r\n\r\n", page, host);

        DataOutputStream outToServer = new DataOutputStream(sock.getOutputStream());
        DataInputStream inFromServer = new DataInputStream(sock.getInputStream());

        InputStream stream = new ByteArrayInputStream(msg.getBytes(StandardCharsets.UTF_8));
        BufferedReader buf = new BufferedReader(new InputStreamReader(stream));
        String outMsg;

        while ((outMsg = buf.readLine()) != null) {
            System.out.println("Sending message: " + outMsg);

            String inMsg;
            try {
                inMsg = inFromServer.readUTF();
            } catch (EOFException eof) {

I wrote it this way to mimic the C code, where one while loop calls send() until everything in a buffer has been delivered, and another while loop calls recv() into a buffer until it hits 'null'. When I execute my code, it just hangs. I suspect that is because readUTF is called before I have finished sending all my messages. If this is the case, is there any way to fix it?

1 answer

  • answered 2018-04-17 05:07 EJP

    You can't do this. HTTP is defined as text lines. writeUTF() does not write text; it writes a special format starting with a 16-bit binary length word. Similarly, the HTTP server won't reply in that format to your readUTF() call. See the Javadoc.
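
To make the difference concrete, here is a small sketch that writes the same request line once with writeUTF() and once as plain bytes, entirely in memory. The class and method names (WriteUtfDemo, viaWriteUTF, viaWrite) are made up for this illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

public class WriteUtfDemo {
    // Bytes produced by DataOutputStream.writeUTF(): a 2-byte length prefix
    // followed by the string in modified UTF-8.
    static byte[] viaWriteUTF(String s) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeUTF(s);
        }
        return bytes.toByteArray();
    }

    // Bytes an HTTP server expects for a request line: only the characters.
    static byte[] viaWrite(String s) {
        return s.getBytes(StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) throws IOException {
        String line = "GET / HTTP/1.1\r\n";
        byte[] framed = viaWriteUTF(line);
        byte[] plain = viaWrite(line);
        System.out.println("writeUTF: " + framed.length + " bytes, prefix = "
                + framed[0] + ", " + framed[1]);
        System.out.println("write:    " + plain.length + " bytes");
    }
}
```

The two extra leading bytes are what the server sees before "GET", so it cannot parse the request line, and readUTF() on the reply would try to interpret the first two bytes of "HTTP/1.1 ..." as a length.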

    You have to use binary streams and the write() method, with \r\n as the line terminator. Depending on the output format you may or may not be able to use readLine(). Best not to: use binary streams again for reading, and then you don't have to write two pieces of code.
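
A minimal sketch of that approach follows, assuming a plain-HTTP server and a response small enough to hold in memory. The names RawHttpGet, buildRequest and getHtml are hypothetical, and the "Connection: close" header is an addition not present in the question: it asks the server to close the socket after the response so that reading until end of stream terminates.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

public class RawHttpGet {
    // Build the whole request as plain bytes with \r\n line terminators.
    static byte[] buildRequest(String host, String page) {
        String request = "GET " + page + " HTTP/1.1\r\n"
                + "Host: " + host + "\r\n"
                + "Connection: close\r\n"   // assumption: lets us read until EOF
                + "\r\n";
        return request.getBytes(StandardCharsets.US_ASCII);
    }

    // Send the complete request first, then read the raw response bytes.
    static byte[] getHtml(String host, String page, int port) throws IOException {
        try (Socket sock = new Socket(host, port)) {
            OutputStream out = sock.getOutputStream();
            out.write(buildRequest(host, page));
            out.flush();

            InputStream in = sock.getInputStream();
            ByteArrayOutputStream response = new ByteArrayOutputStream();
            byte[] chunk = new byte[8192];
            int n;
            while ((n = in.read(chunk)) != -1) {   // -1 signals end of stream
                response.write(chunk, 0, n);
            }
            return response.toByteArray();
        }
    }

    public static void main(String[] args) throws IOException {
        // example.com is only a placeholder host.
        byte[] raw = getHtml("example.com", "/", 80);
        System.out.println(new String(raw, StandardCharsets.ISO_8859_1));
    }
}
```

The returned bytes still contain the status line and headers, and the body may be chunked or compressed. Handling those cases is the part that makes a hand-written HTTP client harder than it looks.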

    In fact you should throw it all away and use HttpURLConnection. Implementing HTTP is not as simple as may hastily be supposed.
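
For comparison, a rough sketch of the HttpURLConnection route, assuming Java 9 or later (for InputStream.readAllBytes()) and a response body encoded as UTF-8. The class name UrlConnectionGet, the method fetch and the 5-second timeouts are choices made for this illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;

public class UrlConnectionGet {
    // Fetch the body of the given URL; the library handles request framing,
    // headers, chunked transfer encoding and redirects within one protocol.
    static String fetch(String address) throws IOException {
        HttpURLConnection conn =
                (HttpURLConnection) URI.create(address).toURL().openConnection();
        conn.setRequestMethod("GET");
        conn.setConnectTimeout(5000);   // illustrative timeout values
        conn.setReadTimeout(5000);
        try (InputStream in = conn.getInputStream()) {
            // Assumption: the body is UTF-8 text.
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        // example.com is only a placeholder URL.
        System.out.println(fetch("http://example.com/"));
    }
}
```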