













Material Type: Assignment; Class: LAB: Data Structures; Subject: Computer Science; University: Wellesley College; Term: Fall 2002;
CS230 Data Structures, Handout #5
Prof. Lyn Turbak, Wellesley College
Sunday, September 15, 2002
Reading:
Overview: In this problem set, you will implement methods that determine various statistics for text files. Along the way, you will get experience with Java characters, strings, string buffers, arrays, and enumerations, as well as with writing, testing, and debugging classes written from scratch. Each problem has a required part and an optional part that you may complete for extra credit points.
Download: You should download a copy of the directory ~cs230/download/TextStats to begin this assignment. This directory contains implementations of the classes whose contracts are described in the appendix. In your local copy of this directory, you will create a class TextStats that contains your code for Problems 1, 2, and 3. For Problem 4, you will create either a class MyStringWords or a class MyFileWords.
Submission:
Remember to include a signed cover sheet (found at the end of this problem set description) at the beginning of your hardcopy submission. Your softcopy submission should be your entire TextStats directory.
Problem 1 [25]: Word Count
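Background: Linux has a “word count” command, wc, that reports the number of lines, words, and characters in a file. For example, suppose that tricky.txt and initial.txt have the contents shown in Figs. 1 and 2, and that green.txt is a third, longer file: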
"He said ’Hello?’, but I said ’Goodbye!’", she said.///
Figure 1: The contents of tricky.txt.
I am Sam
I am Sam
Sam I am

That Sam-I-am!
That Sam-I-am!
I do not like
that Sam-I-am!

Do you like
green eggs and ham?

I do not like them,
Sam-I-am.
I do not like
green eggs and ham.
Figure 2: The contents of initial.txt.
Here is the result of invoking the wc command on these three files:^1
$ wc tricky.txt
3 9 59 tricky.txt
$ wc initial.txt
16 40 185 initial.txt
$ wc green.txt
193 786 3465 green.txt

(^1) Assume that $ is the Linux prompt, user input is in regular teletype font, and system output is in slanted font.
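The three numbers that wc reports can be computed in a single pass over the text. The following is only a minimal sketch: the class name WcSketch and the method count are made up for illustration, the method works on a string rather than a file, and it follows wc's own rule that a word is a maximal run of non-whitespace characters, which is not necessarily the definition of "word" used elsewhere in this problem set.

```java
public class WcSketch {
    /** Returns {lines, words, chars} for text, using wc's whitespace-based rules. */
    public static int[] count(String text) {
        int lines = 0;
        int words = 0;
        boolean inWord = false;                 // are we currently inside a word?
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '\n') {
                lines++;                        // wc counts newline characters
            }
            if (Character.isWhitespace(c)) {
                inWord = false;
            } else if (!inWord) {
                inWord = true;                  // first character of a new word
                words++;
            }
        }
        return new int[] { lines, words, text.length() };
    }

    public static void main(String[] args) {
        int[] r = count("I am Sam\nI am Sam\nSam I am\n");
        System.out.println(r[0] + " " + r[1] + " " + r[2]);   // prints "3 9 27"
    }
}
```

Applied to the full contents of initial.txt as reconstructed from Fig. 2, the same method gives 16 lines, 40 words, and 185 characters, which agrees with the wc transcript above.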
TextStats.java:6: cannot resolve symbol
symbol  : class Enumeration
location: class TextStats
Enumeration lines = new FileLines(filename);
^
String s = (String) (enum.nextElement());
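A "cannot resolve symbol ... class Enumeration" error like the one above typically means that java.util.Enumeration was never imported. The sketch below shows the import together with the usual hasMoreElements/nextElement loop. The class name EnumerationLoop and the method join are hypothetical, and the standard StringTokenizer class stands in for FileLines so that the example runs on its own. Note also that enum became a reserved word in Java 5, so on a current compiler the variable needs a different name, such as e.

```java
import java.util.Enumeration;      // without this import, javac cannot resolve Enumeration
import java.util.StringTokenizer;

public class EnumerationLoop {
    /** Joins every element of an enumeration of strings, separated by '|'. */
    public static String join(Enumeration<?> e) {
        StringBuilder sb = new StringBuilder();
        while (e.hasMoreElements()) {
            String s = (String) e.nextElement();   // elements come back as Object
            if (sb.length() > 0) {
                sb.append('|');
            }
            sb.append(s);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // StringTokenizer is a standard class that implements Enumeration<Object>.
        System.out.println(join(new StringTokenizer("I am Sam")));   // prints "I|am|Sam"
    }
}
```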
Extra Credit [10] For extra credit, modify my_wc so that it prints its results using the same format as the Linux wc command (in which the three numbers are right justified in columns).
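One way to right justify numbers in columns is String.format with a width in the format specifier. This is a sketch only: the class name WcFormat is made up, and the column width of 7 is an assumption, since the handout does not state the width that wc uses.

```java
public class WcFormat {
    /** Formats three counts right justified in columns of width 7 (an assumed width). */
    public static String format(int lines, int words, int chars, String filename) {
        // %7d pads each number with spaces on the left to a total width of 7.
        return String.format("%7d %7d %7d %s", lines, words, chars, filename);
    }

    public static void main(String[] args) {
        System.out.println(format(3, 9, 59, "tricky.txt"));
        System.out.println(format(193, 786, 3465, "green.txt"));
    }
}
```

Because every number occupies the same width, the counts for different files line up in columns when the lines are printed one after another.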
Problem 2 [25]: Character Frequency In this problem, your task is to write a TextStats class method named charFreq that takes a filename as its single argument and displays the frequency (number of occurrences) for each character appearing at least once in the file. The characters and their frequencies should be displayed in ASCII order, one per line, in the format
’char’:freq
where char is a character representation and freq is the number of occurrences of the character in the file. The following char representations should be used for the special characters they denote: \t, \n, \r, \', \", and \\. You should arrange that the main method of TextStats, when invoked with the two arguments charFreq and filename, calls your charFreq method on filename. For example, the result of java TextStats charFreq tricky.txt is shown in Fig. 3 and the result of java TextStats charFreq initial.txt is shown in Fig. 4.
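A common way to count characters is an array indexed by character code. The sketch below is one possible approach, not the required solution: the names CharFreqSketch and show are hypothetical, the input is a string rather than a file, the text is assumed to be 7-bit ASCII, and the escaped forms for the quote and backslash characters are assumed to follow Java's own escape syntax.

```java
import java.util.ArrayList;
import java.util.List;

public class CharFreqSketch {
    /** Printable form of a character, escaping the special cases. */
    static String show(char c) {
        switch (c) {
            case '\t': return "\\t";
            case '\n': return "\\n";
            case '\r': return "\\r";
            case '\'': return "\\'";
            case '"':  return "\\\"";
            case '\\': return "\\\\";
            default:   return String.valueOf(c);
        }
    }

    /** One "'char':freq" line per character that occurs in text, in ASCII order. */
    public static List<String> charFreq(String text) {
        int[] counts = new int[128];            // assumption: input is 7-bit ASCII
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c < 128) {
                counts[c]++;                    // non-ASCII characters are ignored here
            }
        }
        List<String> lines = new ArrayList<>();
        for (char c = 0; c < 128; c++) {        // walking the array gives ASCII order
            if (counts[c] > 0) {
                lines.add("'" + show(c) + "':" + counts[c]);
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        for (String line : charFreq("Sam I am\n")) {
            System.out.println(line);
        }
    }
}
```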
Notes:
Extra Credit [20] For extra credit, modify charFreq so that it prints character/frequency lines sorted by frequency (from highest to lowest) rather than by ASCII value of the character. Characters with the same frequency should still be sorted by ASCII value of the character.
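The two-level ordering described here (frequency first, ASCII value as the tie-breaker) can be expressed as a single comparator. This is a sketch under the same assumptions as before: ASCII-only input given as a string, and hypothetical names FreqOrderSketch and byFrequency. It uses a library sort, which may or may not be what the assignment intends you to use.

```java
import java.util.ArrayList;
import java.util.List;

public class FreqOrderSketch {
    /** Characters occurring in text, most frequent first; ties in ASCII order. */
    public static List<Character> byFrequency(String text) {
        final int[] counts = new int[128];      // assumption: input is 7-bit ASCII
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c < 128) {
                counts[c]++;
            }
        }
        List<Character> chars = new ArrayList<>();
        for (char c = 0; c < 128; c++) {
            if (counts[c] > 0) {
                chars.add(c);
            }
        }
        chars.sort((a, b) -> {
            int byCount = Integer.compare(counts[b], counts[a]);       // higher count first
            return byCount != 0 ? byCount : Character.compare(a, b);   // then ASCII order
        });
        return chars;
    }

    public static void main(String[] args) {
        System.out.println(byFrequency("Sam I am\n"));
    }
}
```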
Problem 3 [25]: Word Frequency In this problem, your task is to write a TextStats class method named wordFreq that takes a filename as its single argument and displays the frequency (number of occurrences) for the lowercase version of each word appearing in a file. The words and their frequencies should be displayed in alphabetical order, one per line, in the format
word:freq
where word is the word (consisting only of lowercase letters, digits, and special characters) and freq is the number of occurrences of the word in the file. You should arrange that the main method of TextStats, when invoked with the two arguments wordFreq and filename, calls your wordFreq method on filename. For example, the results of java TextStats wordFreq for tricky.txt, initial.txt, and green.txt are shown in Figs. 6–7.
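For comparison, here is a sketch of word counting with a sorted map from the standard library. It is not necessarily the strategy the notes for this problem have in mind. The names WordFreqSketch and wordFreq are illustrative, the input is a string rather than a file, and a word is assumed to be a maximal run of letters and digits, which may differ from the definition of "word" that the FileWords class uses.

```java
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

public class WordFreqSketch {
    /** Maps each lowercase word in text to its number of occurrences, in sorted order. */
    public static Map<String, Integer> wordFreq(String text) {
        Map<String, Integer> freq = new TreeMap<>();    // TreeMap keeps its keys sorted
        // Assumption: a word is a maximal run of letters and digits.
        for (String w : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (w.isEmpty()) {
                continue;                               // split can yield a leading ""
            }
            Integer old = freq.get(w);
            freq.put(w, old == null ? 1 : old + 1);
        }
        return freq;
    }

    public static void main(String[] args) {
        for (Map.Entry<String, Integer> e : wordFreq("To be, or not to be").entrySet()) {
            System.out.println(e.getKey() + ":" + e.getValue());
        }
    }
}
```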
Notes:
to be to be or to be not or to be not or to to be be not or to to
(^2) As we shall see later, this strategy is rather inefficient; there are much more efficient ways to solve the problem.
Problem 4 [25]: Enumerations In the previous problems, you have “worn the user hat” when using implementations of the Enumeration interface to process strings and files. In this problem, you will have a chance to “wear the implementer’s hat” by writing, from scratch, a class that implements the Enumeration interface. The TextStats directory you downloaded contains the compiled Java files StringWords.class and FileWords.class, but it does not contain the source (.java) files for the StringWords or FileWords classes. Your task in this problem is to implement one of these classes; you get to choose which one. To avoid confusion, you should name your file MyStringWords.java or MyFileWords.java and name your class MyStringWords or MyFileWords. This way, when you compile your file, you will not overwrite the existing files StringWords.class and FileWords.class.
Notes applicable to both MyStringWords and MyFileWords:
((i < a.length) && (x > a[i])),
in which the a[i] in the second expression (if it were evaluated) would signal an array-out-of-bounds exception in the case where (i < a.length) is false.
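Here is a small sketch of a loop guarded by that condition. The names ShortCircuitSketch and firstAtLeast are invented for the example. Because && evaluates its right operand only when the left one is true, the loop stops cleanly when i reaches a.length instead of throwing an exception.

```java
public class ShortCircuitSketch {
    /** Index of the first element of a that is not smaller than x, or a.length if none. */
    public static int firstAtLeast(int[] a, int x) {
        int i = 0;
        // The bounds test comes first: when i == a.length, && never evaluates a[i].
        while ((i < a.length) && (x > a[i])) {
            i++;
        }
        return i;
    }

    public static void main(String[] args) {
        int[] a = { 1, 3, 5 };
        System.out.println(firstAtLeast(a, 4));   // prints 2
        System.out.println(firstAtLeast(a, 9));   // prints 3, with no exception
    }
}
```

Swapping the two operands would make the second call fail with an ArrayIndexOutOfBoundsException, since a[3] would then be evaluated before the bounds test.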
Notes on MyStringWords:
Let s be the string being processed and i be the current index. Then at the exit of the constructor method and at the entry and exit of every call to hasMoreElements() and nextElement(), one of the following two conditions should be true:
$ java MyFileWords tricky.txt
He
said
Hello
but
I
said
Goodbye
she
said
Total number of elements: 9
Figure 9: An example invocation of the main method of MyFileWords
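To illustrate the general shape of an Enumeration implementation that maintains an index invariant, here is a minimal sketch over a string. It is not the StringWords class itself. The class name WordEnumSketch is hypothetical, the invariant shown in the comment is one possible choice, and a word is assumed to be a maximal run of letters and digits, which may not match the definition used in this problem set.

```java
import java.util.Enumeration;
import java.util.NoSuchElementException;

public class WordEnumSketch implements Enumeration<String> {
    private final String s;
    private int i;   // invariant: i == s.length(), or s.charAt(i) begins the next word

    public WordEnumSketch(String s) {
        this.s = s;
        this.i = 0;
        skipNonWord();                       // establish the invariant
    }

    // Assumption: word characters are letters and digits only.
    private static boolean isWordChar(char c) {
        return Character.isLetterOrDigit(c);
    }

    private void skipNonWord() {
        while ((i < s.length()) && !isWordChar(s.charAt(i))) {
            i++;
        }
    }

    public boolean hasMoreElements() {
        return i < s.length();               // by the invariant, a word starts at i
    }

    public String nextElement() {
        if (!hasMoreElements()) {
            throw new NoSuchElementException();
        }
        int start = i;
        while ((i < s.length()) && isWordChar(s.charAt(i))) {
            i++;
        }
        String word = s.substring(start, i);
        skipNonWord();                       // re-establish the invariant before returning
        return word;
    }
}
```

Keeping the invariant true at the end of the constructor and of nextElement() is what lets hasMoreElements() be a single comparison with no side effects.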
In CS230, we will be using several “home-brew” enumeration classes to manipulate files and strings. These classes are documented below.
The EnumTest class provides a single class method:
public static void test (Enumeration e);
Enumerates all the elements from e and displays the string representation of each element, one element per line. For a finite enumeration, after all elements have been enumerated, displays the number of elements that were enumerated.
For example, suppose that StringChars is a class enumerating all the characters from a string (see Sec. A.5). Then executing the statement
EnumTest.test(new StringChars("Hi, how are you?"));
yields the following output:
H
i
,
 
h
o
w
 
a
r
e
 
y
o
u
?
Total number of elements: 16
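The contract of EnumTest.test can be met with a short loop. The sketch below is a guess at the behavior and not the actual source of EnumTest. The class name EnumTestSketch and the helper render are made up, and StringTokenizer (a standard class implementing Enumeration) stands in for StringChars so that the example is self-contained.

```java
import java.util.Enumeration;
import java.util.StringTokenizer;

public class EnumTestSketch {
    /** Builds the text that test displays: one element per line, then the total. */
    public static String render(Enumeration<?> e) {
        StringBuilder out = new StringBuilder();
        int n = 0;
        while (e.hasMoreElements()) {
            out.append(e.nextElement()).append('\n');   // uses the element's string form
            n++;
        }
        out.append("Total number of elements: ").append(n).append('\n');
        return out.toString();
    }

    public static void test(Enumeration<?> e) {
        System.out.print(render(e));
    }

    public static void main(String[] args) {
        test(new StringTokenizer("Hi, how are you?"));  // four whitespace-separated tokens
    }
}
```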
The FileLines class is an implementation of the Enumeration interface that yields the lines of a given file one by one. Each line includes the terminating newline character, if present. In addition to the hasMoreElements and nextElement instance methods required by implementations of Enumeration, FileLines supports the following constructor method and main testing method:
public FileLines (String filename);
Create an enumeration that enumerates the lines of the file named by filename.
public static void main (String [] args);
Invoke EnumTest.test on new FileLines(args[0]).
For example, suppose tricky.txt is the file described in Problem 1. Then here is an invocation of the main method of FileLines on tricky.txt:
$ java FileLines tricky.txt
"He said ’Hello?’,

but I said ’Goodbye!’",

she said.

Total number of elements: 3
In the above example, each pair of lines is separated by a blank line because each line includes a terminating newline in addition to the newline introduced for each line by EnumTest.test.
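An enumeration that keeps the terminating newline can be written by reading one character at a time. This is a sketch only, not the source of FileLines: the class name LineEnumSketch is hypothetical, and it takes a Reader rather than a filename so that it can be tried on a string. For a real file you could pass new BufferedReader(new FileReader(filename)).

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Enumeration;
import java.util.NoSuchElementException;

public class LineEnumSketch implements Enumeration<String> {
    private final Reader in;
    private String next;   // the line to hand out next, or null at end of input

    public LineEnumSketch(Reader in) {
        this.in = in;
        this.next = readLine();
    }

    /** Reads up to and including the next '\n'; returns null if no characters remain. */
    private String readLine() {
        StringBuilder sb = new StringBuilder();
        try {
            int c;
            while ((c = in.read()) != -1) {
                sb.append((char) c);
                if (c == '\n') {
                    break;                    // the newline stays part of the line
                }
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return sb.length() == 0 ? null : sb.toString();
    }

    public boolean hasMoreElements() {
        return next != null;
    }

    public String nextElement() {
        if (next == null) {
            throw new NoSuchElementException();
        }
        String line = next;
        next = readLine();                    // look ahead so hasMoreElements stays cheap
        return line;
    }

    public static void main(String[] args) {
        Enumeration<String> lines = new LineEnumSketch(new StringReader("she said.\n"));
        while (lines.hasMoreElements()) {
            System.out.print(lines.nextElement());
        }
    }
}
```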
The FileWords class is an implementation of the Enumeration interface that yields the words of a given file one by one. See Problem 1 for a definition of “word”. In addition to the hasMoreElements and nextElement instance methods required by implementations of Enumeration, FileWords supports the following constructor method and main testing method:
public FileWords (String filename);
Create an enumeration that enumerates the words of the file named by filename.
public static void main (String [] args);
Invoke EnumTest.test on new FileWords(args[0]).
For example, suppose tricky.txt is the file described in Problem 1. Then here is an invocation of the main method of FileWords on tricky.txt:
$ java FileWords tricky.txt
He
said
Hello
but
I
said
Goodbye
she
said
Total number of elements: 9
The StringWords class is an implementation of the Enumeration interface that yields the words of a given string one by one. See Problem 1 for a definition of “word”. In addition to the hasMoreElements and nextElement instance methods required by implementations of Enumeration, StringWords supports the following constructor method and main testing method:
public StringWords (String s);
Create an enumeration that enumerates the words of s.
public static void main (String [] args);
Invoke EnumTest.test on new StringWords(args[0]).
For example:
$ java StringWords "She asked, "Did you pay $19.99 for ’Pocohantas’?""
She
asked
Did
you
pay
for
Pocohantas
Total number of elements: 8