MIT OpenCourseWarehttp://ocw.mit.edu 6.005 Elements of Software Construction Fall 2008 For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms. 6.005elements ofsoftwareconstruction designing stream processors Daniel Jackson This lecture is the last in the series on state machines. It takes a dierent view --how to implement a state machine that processes a stream of events, using the control structure of code. In contrast to the machine implementations based on the state and transition structure, this approach uses the implicit state in the program counter, defined by the nested structure of loops, conditions, etc. The two important lessons in the lecture are (a) the idea of grammars and regular expressions and how to use them to model trace sets, and (b) the JSP method for synthesizing code from grammars. designing a quizzer designing a quizzer want a console program that reads a file of questions displays them one by one accepts user responses prompts again if out of bounds reports a total and message © Daniel Jackson 2008 What is the capital of France? (a) Paris (b) London (c) Brussels e Must give answer between a and c. Try again. b Which of these bodies of water is the largest? (a) Pacific Ocean (b) North Sea (c) Walden Pond a Which of these is not a continent? (a) Asia (b) Africa (c) North America (d) Finland d Which city is at approximately the same latitude as Boston? (a) Rio de Janeiro (b) Rome (c) Amsterdam c What is a geocache? (a) A form of payment that geologists use (b) A place that things are hidden in a treasure-hunting game (c) A memory used to accelerate the performance of GPS devices b You scored 3 points. You're obviously a real globe trotter. 3 A simple program but not a trivial one --will illustrate many of the issues involved in reading streams. There will be two streams to consider: the stream of inputs from the user when actually running the quiz, and the stream of records read from the file template for the quiz. The first one is simpler, so we start with that. a state machine design what’s right with this? it captures the correct traces what’s wrong with this? it doesn’t show any structure displayQuestion DISP ERRWAIT goodAnswer hard to see that you can loop on a question badAnswer displayErrMsg 4© Daniel Jackson 2008 How do we model the behavior of the quizzer with a state machine? We can do it, but it fails to show the structure inherent in the problem. There’s no indication here that the repeated handling of bad answers is done once per question --that the bad answer handling is nested inside the question handling. another attemptusing Statecharts notation can show repeat attempts inside the state corresponding to one question but this can get clumsy DISP WAIT displayQuestion goodAnswer badAnswer displayErrMsg ERR 5© Daniel Jackson 2008 We can use the syntactic extensions of Statecharts to introduce some structure, and now we can see that the bad answer handling is a sort of subroutine. But this is not a good approach in general. First, there isn’t a convention for how to do this systematically --to prevent arbitrary exits out of the ‘subroutine’ for example --and it can easily become a mess. Second, it doesn’t scale very well. Third, the structure is still somewhat obscured. We need to observer that there’s a cyclic path to see that there’s an iteration going on here. what’s going on? consider a trace such as < displayQuestion, badAnswer, displayErrMsg, badAnswer, displayQuestion DISP ERRWAIT goodAnswerdisplayErrMsg, goodAnswer, displayQuestion, goodAnswer, badAnswer displayErrMsgdisplayQuestion, badAnswer, displayErrMsg, goodAnswer > 6© Daniel Jackson 2008 Take a step back and look at just the trace; this is where we find the structure. The states are just an artifact. hierarchical structure we can structure it like this: 7© Daniel Jackson 2008 We can group the trace into subtraces, hierarchically. So we have a subtrace for each question, and that’s divided into a subtrace for displaying the question, one for handling bad answers, and one for accepting the good answer. as a grammargrammar defines set of traces Session ::= Question* Question ::= displayQuestion BadAnswers goodAnswer BadAnswers ::= (badAnswer displayErrMsg)* easily converted to code very roughly: void Session() {while (...) Question();} void Question() {displayQuestion(); BadAnswers(); goodAnswer} void BadAnswers() {while (...) {badAnswer; displayErrMsg} 8© Daniel Jackson 2008 This hierarchical structure can be expressed directly as a grammar. This is not only a more direct and clear representation of the structure than the state machine diagram, but it can be transcribed straightforwardly into structured code. We’ll see how to do this in today’s lecture. A state machine diagram can be transcribed easily into code too, but in general it will be code with arbitrary gotos lacking structure, so that approach is generally only useful for autogenerated code, when the developer never has to look at the code but can do all development work at the diagram level. regular grammars This section of the lecture is a brief introduction to the idea of regular grammars and regular expressions, from a software engineering perspective. Some comments at the end of the lecture about the theoretical connection between grammars and state machines. grammars sentences a grammar defines a set of sentences a sentence is a sequence of symbols or terminals productions, terminals and non-terminals a grammar is a set of productions each production defines a non-terminal a non-terminal is a variable that stands for a set of sentences 10© Daniel Jackson 2008 Grammars can be used for more than describing sequences of events, so we talk in general about sentences rather than traces. examplegrammar URL ::= Protocol ://Address Address ::= Domain . TLD Protocol ::= http | ftp Domain ::= mit | apple | pbs TLD ::= com | edu | org terminals are ://, ., http, ftp, mit, apple, pbs, com, edu, org non-terminals and their meanings TLD = { com, edu, org } Domain = { mit, apple, pbs } Protocol = { http, ftp } Address = { mit.com, mit.edu, mit.org, apple.com, apple.edu, apple.org, pbs.com, pbs.edu, pbs.org} URL = { operatorsproduction has form non-terminal ::= expression in terminals and non-terminals expression operators sequence: an A is a B followed by a C A ::= B C iteration: an A is zero or more B’s A ::= B* choice: an A is a B or a C A ::= B | C option: an A is a B or is empty A ::= B ? 12© Daniel Jackson 2008 These are the regular expression operators. The symbol + is often used for an iteration of one or more; B+ is short for BB*. examplesa two-way switch SWITCH ::= (up down)* a Java identifier Identifier ::= Letter (Letter | Digit)*Letter ::= a | b | ... | ZDigit ::= 0 | 1 | ... | 9file handling protocol FILE ::= open (read | write)* close? trailing whitespace TRAIL ::= (space | tab)* newline 13© Daniel Jackson 2008 SWITCH is an example of a grammar describing a trace set. IDENTIFIER and TRAIL illustrate the use of grammars for things that aren’t about events. The FILE example illustrates a typical feature of APIs: that the methods must be invoked in some order --in this case, that you can’t read or write until the file has been opened. Closing the file is optional (for correctness, but if you open a lot of files and don’t close them, you’ll waste resources and damage performance). JSP form“JSP form” no ‘mixed productions’ each is sequence, iteration, choice or option has nice diagrammatic form good basis for code synthesis example SWITCH ::= TOGGLE* //SWITCH is an iterationTOGGLE ::= up down //TOGGLE is a sequenceSWITCH TOGGLE* up down 14© Daniel Jackson 2008 The regular expression operators are technically all you need to describe a regular language: you just write a single expression for the language. That is, one production is enough. So why would you introduce more non-terminals and productions? We had an interesting discussion about this in class. The main reason is the same reason we use procedures in code: if the single expression would contain multiple occurrences of the same subexpression, we can introduce a non-terminal to factor that out, and thereby make it possible to change that subexpression in just one place. Someone else pointed out that sometimes a non-terminal represents a choice amongst a very large set of terminals, and in that case, it’s easier just to leave the production for that non-terminal out and define it informally. In programming language grammars, for example, you’ll find non-terminals for alphanumeric characters, and it’s easier to define what that means in English than to actually list them all: ALPHANUMERIC ::= a | b | ... So, in general, when using grammars for modeling, you have to make good judgments about where to introduce non-terminals in just the same way that you decide when to introduce procedures in a program. But when grammars are being used in a more constructive way to generate code directly, it’s appropriate to use a more prescriptive rule. The rule that JSP, the method I’m explaining today, uses, is that every production must be either a sequence, an iteration, a choice or an option. This leads to an easily understandable diagrammatic form in which each box in the diagram has a single type --it’s either one of these non-terminal types, or a terminal. Tools that generate parsers from grammars also impose rules about the exact forms that productions must take. diagram syntax for grammarsnotation also called“structure diagram” “entity life history” ABCA B* AB0C0A B0 A ::= B C A ::= B* A ::= B | C A ::= B ? 15© Daniel Jackson 2008 Here’s the repertoire of operators shown in diagrammatic JSP notation. Note that the same superscript 0 symbol is used for choice and option. In the choice, choosing the empty sequence is not one of the possibilities though. A common variant of iteration uses + in place of * to mean “one or more” rather than “zero or more”. how to write code to read a stream Now we’ll see how to synthesize code directly from the grammar, first on the simpler example, with a bit of hand-waving and using pseudocode statements, then more carefully for the second example. basic recipebasic idea structure code around grammar of input events steps draw input grammar in JSP form construct list of statements for reading inputs, updating state, generating outputs assign operations to grammar components write conditions read code off grammar sequence becomes statements in sequenceiteration becomes while loopchoice becomes if-then-else17© Daniel Jackson 2008 This is the basic JSP recipe. The full method handles a much more general case, in which there are multiple streams being read and written, and you need to reconcile their structures. In many cases though --like this one --the output stream has a very simple relationship to the input stream and writes can just be embedded in the structure of the input stream. example: quizzerSESSION QUESTION* BADANSWERS BADANSWER* diagrammatic grammar display good Question Answer textual grammarSession ::= Question*Question ::= displayQuestion BadAnswers goodAnswerBadAnswers ::= BadAnswer* BadAnswer ::= badAnswer displayErrMsg bad display Answer ErrMsg © Daniel Jackson 2008 18 Here’s the grammar for the input stream of the quizzer, shown in JSP diagram form, and as a textual grammar (for comparison --you don’t need that for the JSP method). assigning operationsSESSION QUESTION* BADANSWERS good Answer display Question BADANSWER* 35 62 1 read line 2 initialize score 3 incr score 4 display error msg 5 display question 6 display score bad display Answer ErrMsg 4 19© Daniel Jackson 2008 Here’s a list of basic operations: reading a line of input, initializing and incrementing the score, and the various display statements. To assign them to the diagram, you just ask the question “once per what?”. For example, incrementing the score is done once per good answer, so we assign that operation, (3), to the goodAnswer component. assigning reads only tricky part is where to put the reads often need lookahead to evaluate conditions usually one lookahead is enough in this case read after display read again after bad answer 1 read line 2 initialize score 3 incr score 6 display score 4 display error msg 5 display question SESSION QUESTION* BADANSWERS good Answer display Question BADANSWER* bad Answer display ErrMsg 3 4 5 6 1 1 2 20© Daniel Jackson 2008 Reads are tricky, because you can’t just assign them logically. You need to figure out how you’re going to be able to evaluate the conditions (on loops and choices). In this case, for example, you need to determine whether there’s another bad answer -and obviously you can’t determine that until you’ve already read the answer to see if it’s bad. Also you have to be careful when there are outputs also to order the inputs appropriately with respect to the outputs --in this case, we have to do the reading for new input (1) after the displaying of an error message (4). SESSION QUESTION* BADANSWERS good Answer display Question BADANSWER* bad display 24 5 1 conditions two conditions needed for termination of QUESTION* while more questions in quiz for termination of BADANSWER* now just transcribe into code follow the structure: while (more questions) { //displayQuestion while (answer is bad) //badAnswer //displayErrMsg Answer ErrMsg } //goodAnswer } 31 21© Daniel Jackson 2008 Having assigned the operations, you write down the conditions for loops and choices --in this case, just loops. Now we can just transcribe directly to code, with the structure following the structure of the diagram. Check that you understand how the code on the left is related to the diagram on the right, and see the next slide for the code fleshed out in full. � } � } codeprivate static void runQuiz(Quiz quiz) throws IOException { � � for (Question question: quiz.getQuestions()) { � � � System.out.println (question);� � � � //display question � � � readLine();�� � � � � � � � //read line � � � int maxIndex = 'a' + question.getNumOptions() -1; � � � while (!isEOF() && badResponse(maxIndex)) { � � � � System.out.println� � � � � � //display error message � � � � ("Answer must be between a and " + (char)maxIndex); � � � � readLine();�� � � � � � � //read line ���}�� � � � � char choice = response.charAt(0);�� � � //increment � � � score += question.getScore(choice);� � � //... score } System.out.println (quiz.interpretScore(score));� //display score � private static boolean badResponse (int maxIndex) { � � return response.length() != 1 � � � � || response.charAt(0) < 'a' � � � � || response.charAt(0) > maxIndex; 22© Daniel Jackson 2008 designing a file format:a more detailed exampleNow we consider the other stream processing problem: reading in the quiz template. This is more complicated in terms of the structure (but simpler in one respect --there’s are no output events to consider, so we don’t need to worry about reading too early). designing operationswhere do the operations come from? from datatypes we invent more on this later in the course considerations what objects do we need to construct? Quiz: the quiz object derived from the fileQuestion: one of these for each questionOption: one for each option of each questionScorer: for interpreting the total scorewhat observations do we need to make about them? 24© Daniel Jackson 2008 The operation list that I showed earlier came out of thin air a bit. Here I’ll explain more prescriptively how you construct it. A good place to start is to think about what abstract datatypes you need, and what their operations should. We’ll be covering this in detail in the second third of the course. The way to start thinking about the datatypes is to ask first what kind of objects we need to construct (questions, options, etc), and then what kinds of observations we’ll need to make of them. origins of operationsfor construction of objects Quiz from Questions --> Quiz.new Question from text and Options --> Question.new Option from text and value --> Option.new Scorer from range and message --> Scorer.new, Scorer.fromRange for observation of objects from Quiz, get Questions --> Quiz.getQuestions from Question, get text, num options --> Question.toString, getNumOptions from Question, get score for choice --> Question.getScore(index) from Quiz, get interpretation of score --> Quiz.interpretScore for internal observations from Option, get text and value --> Option.getValue, toString from Scorer, get interpretation of score --> Scorer.interpretScore 25© Daniel Jackson 2008 Here’s how I thought about what methods I’d need: first the constructors, then the observers. The observers for Option and Scorer are internal in the sense that they’re observations made with a Quiz or Question. For example, the program will call Quiz.interpretScore to interpret a score, and that will then call the method of the same name in Scorer. This design idiom is called “Facade” and limits dependences, by having access to types like Option and Scorer go through Quiz and Question. Take a look at slide 22 and you’ll see that the code that executes the quiz only sees these two types, and is therefore decoupled from the other types. datatypes designedpublic class Quiz implements Iterable {� � public Quiz (Scorer scorer, List questions) � public String interpretScore (int score) � public List questions() }public class Question {� public Question (String text, List