Exercise: Input/Output

Last modified by superadmin on 2018-01-12 20:28

Goal: Read lines from one file and write them to another, changing the file's encoding and replacing certain strings and patterns along the way.

  1. Create a new project and add a single class, FileEncoding, to it:
package project5;

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;

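/**
 * Converts a text file from Cp1257 to UTF-8.
 * Because the class lives in package project5, the command-line call
 * (from the directory that contains the project5 folder) looks like this:
 * java project5.FileEncoding infile.txt outfile.txt
 */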
public class FileEncoding {
    public static void main(String[] args) throws Exception {
        FileInputStream fis = new FileInputStream(args[0]);
        InputStreamReader isr = new InputStreamReader(fis, "Cp1257");
        BufferedReader br = new BufferedReader(isr);

        FileOutputStream fos = new FileOutputStream(args[1]);
        OutputStreamWriter osw = new OutputStreamWriter(fos, "UTF-8");
        BufferedWriter bw = new BufferedWriter(osw);

        String line;
        while ((line = br.readLine()) != null) {
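            // Strings are immutable, so keep the value returned by replaceAll.
            // The pattern matches words starting with e, l or p; by default
            // \w covers only ASCII word characters.
            line = line.replaceAll("\\b(e\\w+|l\\w+|p\\w+)", "dziivnieki");
            bw.write(line);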
            bw.newLine();
        }

        bw.close();
        br.close();
    }
}
  2. Create an input file VisiEziMiega.txt in EditPlus or a similar text editor: paste some text in Latvian and save it as ANSI (see attached file).
Visi eži miegā, Visi lāči sniegā,
Visas peles alās, Klusums malu malās.
  3. Save the file: in EditPlus choose File -> Save As..., enter the file name "VisiEziMiega", keep the default extension (txt), and pick the encoding "Baltic (Windows) 1257". If this encoding is not in the list, press the [...] button and add it (see picture).

  4. Copy the file VisiEziMiega.txt to the JDeveloper workspace directory, e.g. d:/JDeveloper/mywork.
  5. Run the program from the command line, supplying two arguments: the input file name and the output file name.
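
The conversion and replacement steps above can also be sketched in a more compact form. This is only an illustration under stated assumptions, not the exercise's reference solution: the class name TranscodeSketch and the helpers replaceWords and convert are hypothetical, the pattern is read as "words starting with e, l or p", and the (?U) flag is an addition so that \w and \b also cover Latvian letters such as ž, ā and č. It uses try-with-resources so both files are closed even if an exception occurs.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.regex.Pattern;

/** Hypothetical variant of FileEncoding; all names here are assumptions. */
final class TranscodeSketch {
    // (?U) enables UNICODE_CHARACTER_CLASS, so \w and \b treat letters
    // with diacritics as word characters. Without it \w is ASCII-only.
    private static final Pattern WORDS =
            Pattern.compile("(?U)\\b(e\\w+|l\\w+|p\\w+)");

    /** Returns a new string; the argument itself is never modified. */
    static String replaceWords(String line) {
        return WORDS.matcher(line).replaceAll("dziivnieki");
    }

    /** Copies a text file line by line, decoding with 'from' and encoding with 'to'. */
    static void convert(Path in, Charset from, Path out, Charset to) throws IOException {
        // Unlike InputStreamReader, this reader throws an IOException on
        // malformed or unmappable input instead of substituting characters.
        try (BufferedReader br = Files.newBufferedReader(in, from);
             BufferedWriter bw = Files.newBufferedWriter(out, to)) {
            String line;
            while ((line = br.readLine()) != null) {
                bw.write(replaceWords(line));
                bw.newLine();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("Usage: java TranscodeSketch infile.txt outfile.txt");
            System.exit(1);
        }
        // "windows-1257" is the canonical name of the charset also known as Cp1257.
        convert(Paths.get(args[0]), Charset.forName("windows-1257"),
                Paths.get(args[1]), StandardCharsets.UTF_8);
    }
}
```

Passing the charsets as parameters keeps the copy loop independent of any particular pair of encodings.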

Exercise

  1. Modify the code so that all letters in the output file appear in lower case.
  2. Modify the program so that it converts Windows Cyrillic (Cp1251) to UTF-8. Try converting some Russian text with that program.
  3. View both the input and the output text files with the "Lister" application: select either file in Total Commander, press F3, and choose Options -> Hex from the menu. This way you can see the individual bytes of these text files in their different encodings (see picture).
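
The byte-level difference that the hex view reveals can also be printed from Java. The following is a minimal sketch, assuming a hypothetical class HexViewSketch with a helper hex that is not part of the exercise code; it encodes the same word in both charsets and lists the resulting bytes. In UTF-8 the letter ž occupies two bytes (C5 BE), whereas a single-byte code page such as Cp1257 stores it in one.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper for inspecting encoded bytes; not part of the exercise code. */
final class HexViewSketch {
    /** Encodes the text with the given charset and returns its bytes as hex pairs. */
    static String hex(String text, Charset charset) {
        StringBuilder sb = new StringBuilder();
        for (byte b : text.getBytes(charset)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            // Mask to 0..255 so that bytes above 0x7F are not sign-extended.
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // A word from the sample text; \u017E is z with caron, written as an
        // escape so the source file itself stays pure ASCII.
        String word = "e\u017Ei";
        System.out.println("Cp1257: " + hex(word, Charset.forName("windows-1257")));
        System.out.println("UTF-8:  " + hex(word, StandardCharsets.UTF_8));
    }
}
```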
Optional: This exercise can be taken further by rewriting URL addresses (see Exercise: Proxy Servlet).
Created by Kalvis Apsītis on 2008-02-28 14:03
    
This wiki is licensed under a Creative Commons 2.0 license