Purpose
Develop a solution to a problem using C++ objects and Standard Template Library containers.
Overview
Create a program to generate the index for a book.
Chapters, Paragraphs, Lines, and Words
A book is a text input file made up of chapters, paragraphs, lines, and words. A line consists of words separated by whitespace. A word is a sequence of letters and symbols that contains no whitespace. For the purposes of indexing, case does not matter. Punctuation marks at the beginning or end of a word must be removed. The punctuation marks to be removed are shown below.
. ? ! , ; : - " ' _ ( )
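One possible way to apply these rules is sketched below. The function name normalizeWord is hypothetical, and the sketch assumes that punctuation inside a word (for example the apostrophe in "don't") is kept, and that a token made up entirely of punctuation yields no word at all.

```cpp
#include <cctype>
#include <string>

// Hypothetical helper: strips the listed punctuation marks from both ends of
// a token and lowercases what remains. Returns an empty string when the token
// contains nothing but punctuation.
std::string normalizeWord(const std::string& token) {
    static const std::string kPunct = ".?!,;:-\"'_()";
    const std::string::size_type first = token.find_first_not_of(kPunct);
    if (first == std::string::npos) {
        return "";
    }
    const std::string::size_type last = token.find_last_not_of(kPunct);
    std::string word = token.substr(first, last - first + 1);
    for (char& c : word) {
        // Cast through unsigned char so std::tolower is safe for any byte.
        c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
    }
    return word;
}
```

With this sketch, the token "(well-known)" becomes well-known, because only leading and trailing marks are removed.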
A paragraph is a series of lines with no intervening blank lines. A blank line is one without a word on it.
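The paragraph rule can be tracked with a small amount of state, as in the sketch below. The names isBlankLine and PositionTracker are hypothetical. Two assumptions are made: a line holding only whitespace counts as blank, and line numbers restart at 1 in each paragraph (the sample output 8:P11L9 suggests this, since line 9 could not fall in paragraph 11 if lines were counted per chapter).

```cpp
#include <cctype>
#include <string>

// True when the line holds no word, i.e. it is empty or only whitespace.
bool isBlankLine(const std::string& line) {
    for (unsigned char c : line) {
        if (!std::isspace(c)) {
            return false;
        }
    }
    return true;
}

// Hypothetical tracker for the current position within one chapter.
struct PositionTracker {
    int paragraph = 0;         // 1-based once the first paragraph starts
    int line = 0;              // 1-based within the current paragraph (assumption)
    bool inParagraph = false;

    // Feed one input line. Returns true if the line belongs to a paragraph.
    bool feed(const std::string& text) {
        if (isBlankLine(text)) {
            inParagraph = false;   // a blank line ends the current paragraph
            return false;
        }
        if (!inParagraph) {        // first non-blank line after a gap
            ++paragraph;
            line = 0;
            inParagraph = true;
        }
        ++line;
        return true;
    }
};
```

A new tracker would be created at each chapter heading so that paragraph numbers restart with the chapter.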
A chapter consists of a series of paragraphs prefixed by a paragraph containing one line which contains exactly two words. The first word will be "chapter" and the second will be the number of the chapter, which could be in either arabic or roman numerals, as shown below. This line is not part of the chapter and should not be indexed.
CHAPTER 3
Chapter IV
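A heading line could be recognized roughly as follows. The function name parseChapterHeading is hypothetical. The sketch only checks the line itself, so the caller is assumed to confirm that the line forms a one-line paragraph, and it keeps the chapter number as written rather than validating or converting the numeral.

```cpp
#include <cctype>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: returns true when `line` is exactly two words and the
// first is "chapter" in any letter case. On success the second word is stored
// in `label` unchanged, e.g. "3" or "IV".
bool parseChapterHeading(const std::string& line, std::string& label) {
    std::istringstream in(line);
    std::vector<std::string> words;
    std::string w;
    while (in >> w) {
        words.push_back(w);
    }
    if (words.size() != 2) {
        return false;
    }
    std::string first = words[0];
    for (char& c : first) {
        c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
    }
    if (first != "chapter") {
        return false;
    }
    label = words[1];
    return true;
}
```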
Output
Your program should produce an alphabetical listing of words and occurrences of each within the book in the format illustrated below.
apple 3:P4L3, 3:P5L7, 2:P1L12, 9:P5L5
bakery 4:P5L13
manitoba 8:P11L9, IV:P4L1
Each output line consists of the word followed by a comma-separated list of occurrences. Each occurrence is formatted as chapter-number, colon, P, paragraph-number, L, and line-number.
Sorting
Words should be ordered alphabetically. Occurrences should be ordered first by chapter appearance order, then by paragraph, and finally by line number. However, the publishers believe that the chapter(s) in which a word appears the most times contain the most important occurrences of the word, and therefore occurrences in those chapters must appear first in the list. Also, only the first occurrence of a word within the same paragraph should be listed.
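The ordering rules for one word's occurrences could be implemented along the lines below. This is only one reading of the rules, and the names Occurrence and orderOccurrences are hypothetical. It assumes that appearances are counted per chapter before repeated paragraphs are collapsed, that only the chapter(s) tied for the highest count move to the front, and that all remaining chapters stay in appearance order.

```cpp
#include <algorithm>
#include <map>
#include <string>
#include <tuple>
#include <vector>

// Hypothetical record of one place where a word appears.
struct Occurrence {
    int chapterOrder;      // 0-based position of the chapter in the book
    std::string chapter;   // chapter number as written, e.g. "3" or "IV"
    int paragraph;
    int line;
};

// Returns the occurrences of a single word in listing order.
std::vector<Occurrence> orderOccurrences(std::vector<Occurrence> occs) {
    // Count every appearance per chapter (assumption: counted before
    // repeated paragraphs are dropped) and remember the highest count.
    std::map<int, int> perChapter;
    int best = 0;
    for (const Occurrence& o : occs) {
        best = std::max(best, ++perChapter[o.chapterOrder]);
    }

    // Sort key: most-frequent chapter(s) first, then chapter appearance
    // order, then paragraph, then line.
    auto key = [&](const Occurrence& o) -> std::tuple<int, int, int, int> {
        const int rank = (perChapter[o.chapterOrder] == best) ? 0 : 1;
        return std::make_tuple(rank, o.chapterOrder, o.paragraph, o.line);
    };
    std::sort(occs.begin(), occs.end(),
              [&](const Occurrence& a, const Occurrence& b) {
                  return key(a) < key(b);
              });

    // After sorting, entries from the same paragraph are adjacent with the
    // lowest line first, so std::unique keeps the first occurrence of each.
    auto sameParagraph = [](const Occurrence& a, const Occurrence& b) {
        return a.chapterOrder == b.chapterOrder && a.paragraph == b.paragraph;
    };
    occs.erase(std::unique(occs.begin(), occs.end(), sameParagraph),
               occs.end());
    return occs;
}
```

If the publishers instead intend every chapter to be ranked by count, the rank in the sort key would be replaced by the negated per-chapter count.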
Special Cases
A second input file will contain a list of words, one per line, which should be completely ignored. For example, the following might be used.
is
the
of
and
A third input file will contain words which should be considered synonyms. Each synonym group will appear together on one line of the file. Occurrences of any synonym should be considered an occurrence of the first word in its synonym group. For example, the following might be given, so wherever light occurs it should be treated as bright.
bright light sunny
main central
begin start first
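One way to apply synonym groups is to map every later word on a line to the first word on that line. The names loadSynonyms and canonicalWord below are hypothetical, and the sketch assumes the file is already lowercase and that no word appears in more than one group.

```cpp
#include <istream>
#include <map>
#include <sstream>
#include <string>

// Hypothetical loader: for each line, maps every word after the first to the
// first word of that line. Blank lines are skipped.
std::map<std::string, std::string> loadSynonyms(std::istream& in) {
    std::map<std::string, std::string> canonical;
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream words(line);
        std::string head;
        if (!(words >> head)) {
            continue;  // no word on this line
        }
        std::string w;
        while (words >> w) {
            canonical[w] = head;
        }
    }
    return canonical;
}

// Returns the word under which `word` should be indexed: the head of its
// synonym group if it has one, otherwise the word itself.
std::string canonicalWord(const std::map<std::string, std::string>& canonical,
                          const std::string& word) {
    const auto it = canonical.find(word);
    return it == canonical.end() ? word : it->second;
}
```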
Design submission
The first deliverable for this project is a document describing your proposed solution. Your design document should describe all of the following:
-classes you will create, their non-trivial public methods, and their member variables
-STL containers you will use, the type(s) they will be instantiated on, their purpose, and why each was chosen
-global functions, their signatures, and purpose
-file/module organization, i.e., what parts will be in what files, including header files
-testing strategy, simple "edge" cases, and expected results
Your submission should be neatly formatted, written in grammatically correct English, and generally pleasant to read.
Execution
Once your program is built you should be able to run it using the example command shown below.
index book.txt ignore.txt synonyms.txt > result.txt
Final Submission
Your final submission should consist of a .zip file meeting the following criteria:
-all required source code
-small test files (<10K bytes)
-build.txt, which will be renamed to build.cmd and run to build your program
-tests.txt, which will be renamed to tests.cmd and run to test your program
-results from running tests.cmd should be in files named resultsN.txt, where N=1,2,3,...
Test Case Naming
usr00a_name_book.txt
usr00a_name_ignore.txt
usr00a_name_synonyms.txt
usr00a_name_result.txt
Grading
15% completeness and quality of your design
10% how closely you followed your design
5% readability of your code
5% correctness of your code, as tested by your peer's test suite
5% errors detected in your peer's programs using your test suite
60% correctness of your code, as tested by my test suite