Write a program that counts the words in a text entered from the console. The program must output the total number of words, the number of words written in uppercase and the number of words written in lowercase. If a word appears more than once in the text, each repetition counts as a new occurrence. Every character that is not a letter counts as a word separator.
Sample input:
```
Welcome to your first programming exam!
Can you think of a solution to this problem and write it down?
GOOD LUCK!
```
Sample output:
```
Word count: 21
Upper case words: 2
Lower case words: 17
```
Coming Up with an Appropriate Idea for a Solution
The first idea that comes to mind is to split the text into separate words and count those that meet the specified conditions.
This approach is correct, but it is far too general and does not suggest a particular method for solving the problem. Let's try to be more specific and see whether doing so leads us to an algorithm we can implement. It might turn out that the implementation is difficult, or that the solution is so slow that the program cannot finish in reasonable time even on today's powerful computers. In that case, we would have to find another solution to the problem.
Breaking Down the Problem into Subproblems
A useful approach for solving algorithmic problems is to try breaking them down into smaller problems that are easier and quicker to solve. Let’s try defining the necessary steps for solving this problem.
First of all, we have to split the text up into separate words. This, in and of itself, is not a simple task, but it is the first step towards breaking down the problem into smaller, although still complicated, subproblems.
Then we need to count the words that concern us. This is the second major problem we have to solve. Let’s take a look at both problems separately and try breaking them down even further.
How Do We Split the Text Up into Separate Words?
In order to split the text into separate words, we first need a way to identify them. According to the problem statement, every non-letter character acts as a word separator. Therefore, we must first identify these separators and then use them to split the text into tokens.
So far, we have formulated two subproblems – finding the separators and partitioning (splitting) the text in accordance with the characters found. We can implement their solutions right away. This was in fact our goal from the start – breaking down complicated problems into smaller and easier subproblems.
In order to find the separators, all we need to do is iterate through all characters and extract those that aren’t letters.
Once we have identified the separators, we can implement the text partitioning by invoking the Split(…) method of the String class.
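One detail of Split(…) matters here: two adjacent separators (such as "!" followed by a space) produce an empty string between them. The sketch below illustrates this on the sample sentence. For brevity it hard-codes the separators that appear in that sentence (space, ! and ?) instead of extracting them, and it is written with C# 9 top-level statements so it compiles on its own; both are assumptions of this sketch, not part of the chapter's program.

```csharp
using System;

// Separators occurring in the sample sentence, hard-coded for this sketch only.
char[] separators = { ' ', '!', '?' };
string text = "Welcome to your first programming exam! " +
    "Can you think of a solution to this problem and write it down? GOOD LUCK!";

// Plain Split keeps an empty string between adjacent separators ("exam! Can")
// and another one after the trailing '!'.
string[] rawParts = text.Split(separators);

// StringSplitOptions.RemoveEmptyEntries drops those empty strings,
// leaving only the words.
string[] words = text.Split(separators, StringSplitOptions.RemoveEmptyEntries);

Console.WriteLine("Parts with plain Split: " + rawParts.Length);
Console.WriteLine("Words with RemoveEmptyEntries: " + words.Length);
```

Here the plain split yields 24 parts (the 21 words plus 3 empty strings), while RemoveEmptyEntries yields exactly the 21 words, so the option is worth passing whenever consecutive separators can occur.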
How Do We Count the Words?
Let’s assume we already have a list of all words from the text. We want to find the total word count, the number of words in uppercase and the number of words in lowercase.
To do this, we can go through each and every word from the list and check if it meets either of the necessary conditions. At each step we increment the total word count. We check if the current word is in uppercase and, if so, we increment the number of words in uppercase. Likewise, we check if the word consists only of lowercase letters and increment the lowercase word counter.
Thus, we have defined two more subproblems: recognizing uppercase words and recognizing lowercase words. These appear to be very easy, and the string class might even provide such a check. It turns out that it does not, but it does provide methods that convert a string to uppercase or to lowercase. These might be of use.
To check whether a word consists only of uppercase letters, we compare it to the string obtained by converting the word to uppercase. If the two are equal, the word contains no lowercase letters, and since a word consists of letters only, it is written entirely in uppercase. The check for lowercase words works the same way.
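The counting loop and the two comparisons can be sketched as follows. This is a minimal sketch, assuming the words have already been extracted (here they are the 21 words of the sample input, hard-coded); the helper names IsUpperCaseWord and IsLowerCaseWord are hypothetical. It uses ToUpperInvariant()/ToLowerInvariant() so the result does not depend on the machine's culture (for example, the Turkish culture maps "i" to "İ"); plain ToUpper()/ToLower() follow the same idea.

```csharp
using System;

// Hypothetical helpers: a word is upper case if upper-casing it changes nothing,
// and lower case if lower-casing it changes nothing.
static bool IsUpperCaseWord(string word) => word == word.ToUpperInvariant();
static bool IsLowerCaseWord(string word) => word == word.ToLowerInvariant();

// The words of the sample input, assumed to be already split.
string[] words =
{
    "Welcome", "to", "your", "first", "programming", "exam", "Can", "you",
    "think", "of", "a", "solution", "to", "this", "problem", "and", "write",
    "it", "down", "GOOD", "LUCK"
};

int allWords = 0;
int upperCaseWords = 0;
int lowerCaseWords = 0;
foreach (string word in words)
{
    allWords++;
    // Mixed-case words such as "Welcome" pass neither check.
    if (IsUpperCaseWord(word))
    {
        upperCaseWords++;
    }
    if (IsLowerCaseWord(word))
    {
        lowerCaseWords++;
    }
}

Console.WriteLine("Word count: " + allWords);
Console.WriteLine("Upper case words: " + upperCaseWords);
Console.WriteLine("Lower case words: " + lowerCaseWords);
```

Run on the sample words, it prints the counts from the sample output: 21, 2 and 17.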
Verifying the Idea
It seems our idea is a good one. We’ve broken down the problem into subproblems and we know how to solve each of them. Should we continue towards the implementation? Haven’t we overlooked something?
Shouldn’t we have verified the idea by writing down a few examples on paper? Perhaps we would come across something we have missed. We could start with the example given in the problem statement:
```
Welcome to your first programming exam!
Can you think of a solution to this problem and write it down?
GOOD LUCK!
```
The separators would be spaces, ? and ! (as well as the line breaks, if the text is entered on several lines). The resulting words are: Welcome, to, your, first, programming, exam, Can, you, think, of, a, solution, to, this, problem, and, write, it, down, GOOD, LUCK.
Counting them gives the expected result: 21 words in total, 2 in uppercase and 17 in lowercase ("Welcome" and "Can" are neither). The idea seems to work, so we can proceed to implement it. We will do this step by step, implementing one subproblem at each step.
Let’s Consider the Data Structures
The problem is simple and doesn’t need complex data structures.
We can use the char data type for storing each separator. During the process of finding the separator characters we add each of them to a list. We can use either char[] or List<char>. In this case, we will choose the latter.
As for the words in the text, we can use either an array of strings (string[]) or a List<string>.
Let’s Consider the Efficiency
Are there any performance requirements? How long can the text be?
Since the text will be entered from the console, it’s unlikely to be very long. No one is going to type 1MB of text into the console. We can assume that the solution’s performance is not critical.
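Because the input is small, the whole text can simply be read into one string. A minimal sketch, assuming the user signals the end of input with end-of-file (Ctrl+Z then Enter on Windows, Ctrl+D on Linux/macOS): in the real program the reader would be Console.In, while here a StringReader stands in for the three typed lines so the sketch runs unattended. It uses C# 9 top-level statements.

```csharp
using System;
using System.IO;

// Stand-in for Console.In: three lines, as in the sample input.
TextReader input = new StringReader(
    "Welcome to your first programming exam!\n" +
    "Can you think of a solution to this problem and write it down?\n" +
    "GOOD LUCK!");

// ReadToEnd returns everything up to the end of input as a single string,
// line breaks included. For console-sized text this costs next to nothing.
string text = input.ReadToEnd();

// Line breaks are not letters, so they will act as separators like any other.
int lineBreaks = 0;
foreach (char character in text)
{
    if (character == '\n')
    {
        lineBreaks++;
    }
}
Console.WriteLine("Characters read: " + text.Length + ", line breaks: " + lineBreaks);
```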
Let’s Write Down the Solution
It's very good practice to write the solution down on paper before typing it on the computer. This helps uncover flaws in our idea or implementation early. In addition, typing in the solution will then be considerably quicker, because we will already have an outline to follow and a better grasp of both the problem and the solution.
Step 1: Finding the Separators in the Text
We will define a method that extracts all non-letter characters from the text and returns them as an array of characters. Then we will use that array to split the text into separate words. We will use a List<char> to collect the separators we find while passing through the text:
```csharp
using System.Collections.Generic;

private static char[] ExtractSeparators(string text)
{
    List<char> separators = new List<char>();
    foreach (char character in text)
    {
        // If the character is not a letter,
        // then by definition it is a separator
        if (!char.IsLetter(character))
        {
            separators.Add(character);
        }
    }
    return separators.ToArray();
}
```
We use a loop to iterate through all of the characters in the text. For each character, we check whether it is a letter by calling the static method char.IsLetter(). If it is not, we add the character to the separators. Finally, the method returns an array containing the separators.
Testing the ExtractSeparators(…) Method
Before we go any further, it's advisable to test whether extracting the separators works correctly. For this purpose, we will write two additional methods: TestExtractSeparators(), which runs ExtractSeparators(…) on a number of test texts and prints the results, and GetTestData(), which returns those test texts:
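```csharp
using System;
using System.Collections.Generic;

private static List<string> GetTestData()
{
    List<string> testData = new List<string>();
    testData.Add("This is wonderful!!! All separators like " +
        "these ,.(? and these /* are recognized. It works.");
    testData.Add("SingleWord");
    testData.Add(string.Empty);
    testData.Add(">?!>?#@?");
    return testData;
}

// Prints each test text followed by the separators extracted from it
private static void TestExtractSeparators()
{
    foreach (string testCase in GetTestData())
    {
        Console.WriteLine("Test Case: " + testCase);
        Console.Write("Result: ");
        foreach (char separator in ExtractSeparators(testCase))
        {
            Console.Write(separator + " ");
        }
        Console.WriteLine();
        Console.WriteLine();
    }
}

static void Main()
{
    TestExtractSeparators();
}
```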
We start the program and check whether the separators have been identified correctly. The output is as follows:
```
Test Case: This is wonderful!!! All separators like these ,.(? and these /* are recognized. It works.
Result: ! ! ! , . ( ? / * . .

Test Case: SingleWord
Result:

Test Case:
Result:

Test Case: >?!>?#@?
Result: > ? ! > ? # @ ?
```
We might consider the above output partially correct. The method does extract the separators between the words, but several of them (such as !, . and the space, which is printed too but is not visible) appear more than once. We need each separator only once, right?
Correcting the ExtractSeparators(…) Method
To correct the method for extracting the separators between the words in the text, we can use a different data structure to keep them. We know that sets keep elements without duplications. So we could use HashSet<char> instead of List<char> to hold the separator characters we find in the text:
```csharp
using System.Collections.Generic;
using System.Linq;

private static char[] ExtractSeparators(string text)
{
    HashSet<char> separators = new HashSet<char>();
    foreach (char character in text)
    {
        // If the character is not a letter,
        // then by definition it is a separator
        if (!char.IsLetter(character))
        {
            separators.Add(character);
        }
    }
    return separators.ToArray();
}
```
The code is almost the same, but we use a set instead of a list to avoid duplicate separators. Since HashSet<T> has no ToArray() method of its own, we need to import the System.Linq namespace at the start of the program to use the ToArray() extension method for converting the set to an array.
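For comparison, the same result can be obtained with a LINQ query: filter out the letters, then drop repeats with Distinct(). This is an alternative technique, not the chapter's method; the name ExtractSeparatorsLinq is hypothetical, and the sketch uses C# 9 top-level statements. Note that the documentation does not promise any particular order for Distinct() (nor for enumerating a HashSet<T>), so code that relies on the separators' order should not assume one.

```csharp
using System;
using System.Linq;

// Hypothetical LINQ variant: keep the non-letter characters, then remove duplicates.
static char[] ExtractSeparatorsLinq(string text) =>
    text.Where(character => !char.IsLetter(character)).Distinct().ToArray();

char[] separators = ExtractSeparatorsLinq(">?!>?#@?");
Console.WriteLine("Distinct separators: " + new string(separators));
```

The query is shorter, while the explicit loop with a HashSet<char> shows each step, which suits a step-by-step solution.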
Testing Again after the Fix
We run the same testing code against the corrected method and find that it now works: the separators are extracted with no duplicates:
```
Test Case: This is wonderful!!! All separators like these ,.(? and these /* are recognized. It works.
Result: ! , . ( ? / *

Test Case: SingleWord
Result:

Test Case:
Result:

Test Case: >?!>?#@?
Result: > ? ! # @
```
We also test some borderline cases: a text consisting of a single word without separators, a text consisting of separators only, and an empty string. Such tests are already included in our GetTestData() method.
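Checking the printed output by eye works for a few cases, but the same test data can also be checked automatically. The sketch below is one way to do it, assuming C# 9 top-level statements; the helper HasSeparators and the expected strings are written for this illustration. It compares separators as sets (after sorting), so the check does not depend on the order in which a HashSet<char> happens to enumerate its elements.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// The corrected method from above.
static char[] ExtractSeparators(string text)
{
    HashSet<char> separators = new HashSet<char>();
    foreach (char character in text)
    {
        if (!char.IsLetter(character))
        {
            separators.Add(character);
        }
    }
    return separators.ToArray();
}

// Hypothetical helper: true if the extracted separators are exactly the
// characters of 'expected', ignoring order.
static bool HasSeparators(string text, string expected) =>
    ExtractSeparators(text).OrderBy(c => c).SequenceEqual(expected.OrderBy(c => c));

// Each test text paired with the distinct separators it should yield.
var cases = new (string Text, string Expected)[]
{
    ("This is wonderful!!! All separators like these ,.(? and these /* are recognized. It works.",
        " !,.(?/*"),
    ("SingleWord", ""),
    ("", ""),
    (">?!>?#@?", ">?!#@"),
};

foreach (var (text, expected) in cases)
{
    string verdict = HasSeparators(text, expected) ? "PASS" : "FAIL";
    Console.WriteLine(verdict + ": \"" + text + "\"");
}
```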