itec420 lecture scratch-pad file. ==== 2021-aug-24 (week1a) course-story look at regexp b(ba*b)* draw FSM informally start FSM formal def'n (incl. Java) READING: Chpt.03 ==== 2021-aug-26 (week1b) finish FSM formal def'n (incl. Java) defn of FSM "accept" - def'n: A *Language* is: a (possibly infinite) set of strings. - (hand-wave): For a FSM M, we say "L(M)" is the language it *accepts*; M "recognizes" L. E.g. We just gave a FSM which recognized b(ba*b)*. - set review: union, intersection, set-difference, cross-product; concatenation, Kleene Star ==== 2021-aug-31 (w2a) - common sets: ℕ, ℤ, ℚ, ℝ, ℂ; Σ* - set review: union, intersection, set-difference, cross-product; concatenation (powers), Kleene Star L₀ = {} L₁ = {17} L₂ = {vw, saab, bmw} L₃ = {2,3,5,7} P = prime numbers L₄ = state-abbreviations L₅ = strings over {a,b}* where every 'a' is followed by a 'b' L₆ = b*aa* L₇ = b*a*b* L₈ = b(ba*b)* - define lexicographic order. [For homeworks: "list first 5 strings in lang, lexicographically" - "set A closed under f": - ints closed under squaring - ints closed under * - ints *not* closed under square-rooting, nor division - general def'n, in ENGLISH (logic as exer. below) (for a unary f, and separately for a binary f) Questions: #elements? contains ""? L Exercise (challenge): Find two sets-of-strings A,B such that |AB| < |A| * |B| Answer: {ε,a}{ε,a} = {ε,a,aa} {ε,a}{a,aa} {a,b,ab}{b,bb} = {ab,abb,bb,bbb,abbb} ==== Sep.02 (Thu, week2b) - Showing sets equal: compare: L₃ = strings over {a,b}* where every 'a' is followed by a 'b' vs L₃' = strings over {a,b}* which don't contain "aa" Show/argue : - L₃ ⊆ L3': quick proof-by-contra - try other dir - [postpone until used] review proof-by-induction: n(n+1)(2n+1)/6 sum-of-squares formula - logic notation: translate engl. <-> logic: - ∃m∈ℕ.n+m=0 "n has an additive-inverse in ℕ" [note: n is a free var.] true for n=0, but false for n=17 - ∀n∈ℕ.∃m∈ℕ.n+m=0 "every number in ℕ has an additive inverse in ℕ" [note: NO free var] false. 
- if we replace the previous with "m∈ℤ", it becomes true.
- "n/2 is an integer" (a.k.a. "n is even")
      n/2 ∈ ℤ
  now write it w/o referring to division (but mult. okay)
  {{{ thinking: n = 2*(some integer). NOW we can logic-ize it!: }}}
      ∃k∈ℤ. n=2*k
  Thus we could define the set of even numbers as
      2ℤ = { n | ∃k∈ℤ. n=2*k }
- ∃k∈ℕ. n=5k   in English is: "n is divisible by 5"
- "The set ℤ is closed under squaring"
- relation R is a function
- a natnum n is composite ("non-prime", nearly):
      (n∈ℕ) ∧ (∃k∈ℕ. ∃m∈ℕ. n=m*k ∧ k>1 ∧ m>1)
      (n∈ℕ) ∧ (∃k,m∈ℕ. n=m*k, k,m>1)
- "every prime# is odd"; use an `->` to capture
      forall n, prime(n) -> n is odd
- if a mult. of 5, then ends in '0' or '5' (use function 'lastDigit')
- every 'a' is followed by a 'b'; use "charAt"
- string w has string p as a prefix

==== Sep.07 (week3a)
- finish logic notation: translate engl. <-> logic:
  review:
  - ∃k∈ℕ. n=5k   in English is: "n is divisible by 5"
  - a natnum n is composite ("non-prime", nearly):
        (n∈ℕ) ∧ (∃k∈ℕ. ∃m∈ℕ. n=m*k ∧ k>1 ∧ m>1)
        (n∈ℕ) ∧ (∃k,m∈ℕ. n=m*k, k,m>1)
    [both factors must exceed 1; otherwise n = n*1 would make every n>1 "composite"]
  new:
  - "every prime# is odd" (which, btw, is false);
    use an `->` to capture "for all primes" rather than "for all integers":
        forall n, prime(n) -> n is odd
        ∀n. prime(n) → ∃k.n=2k+1
        ∀n. prime(n) → ∀k.n≠2k
    where prime(n) is shorthand for:
        ((n∈ℕ) ∧ ¬(∃k∈ℕ. ∃m∈ℕ. n=m*k ∧ k>1 ∧ m>1) ∧ n>1)
  - "The set ℤ is closed under squaring" (which is true, btw)
        ∀z∈ℤ. (z²∈ℤ)
  - relation R ⊆ A×B is a function
    recall: a relation is just a set of pairs: e.g. { (16,4), (9, 3), (256,16), (16,-4), … }
    a relation is a *function* iff: ("there is exactly one output for a given input")
        ∀a∈A. (∀b,c. (a,b)∈R ∧ (a,c)∈R → b=c) ∧ (∃z. (a,z)∈R)
- book 4.3 (p.35+15=50): for a language L over an alphabet Σ
      chop(L)={w: ∃x∈L (x=x₁cx₂ ∧ x1 ... 
} (see book) firstchars(L) = (see book) == Sep.09 (week3b) - Writing FSMs (Chpt.05) - Example 5.2 - Example 5.3, every "a" region in w is even-length: - Task: odd-parity (soln: Example 5.4) Now back to formalizing: Def'n: FSA (as tuple), p.42+15 - define configuration - define acceptance == Sep.14 (week4a) - review hw01 (took ~1hr, esp. on "closure") - build DFA for "ends in 'baa'", using https://automatonsimulator.com/ (illustrate: add-state; single-step [and when finished]; accept/reject unit-tests; saving as decent URL) - review configuration, computation, acceptance == Sep.16 (week4b) Reading: Sect. 5.4 (NFAs) - terminology note: book common-usage ---- ------- FSM FA DFSM DFA NDFSM NFA - NEW: NFA, as diagram. - build NFA for "ends in 'baa'"; Try accepting: 'babbaa'; 'babbab' - espsilon transitions: an NFA for: every 'a' followed by exactly 1 b, or 3 b's: three strings in the lang: three strings not in the lang: Now let's make a NFA: Def'n of 'accept' (@page#-rich[48]) Notes (@page#-rich[49]): Let's update our definition of DFA to allow NFA; we need to account for: - NFA might have epsilon transitions (so we need to update our transition-function to account for that) - NFA might have zero transitions -- we don't need to add a 'dead' state (our transition-function won't be a total-function) - NFA might have *two* transitions (together meaning: transition *relation* instead of function) Def'n of DFA (@page#-rich[48]) == week05a Sep-21 Work through example of translation: Slides 1-17 skim 20-44 (these examples are also in book: Sections 5.4.1, 5.4.2) Slides 48-75 motivate Slides 75-80 Work through example: - p.56 (Example 5.20); this is in ppts slides OR - give a DFA for: L1= (ab)* [DFA as good as NFA] - give a DFA for: L2= (aba)+ [DFA as good as NFA] - give a NDFSM for: L1 ∪ L2 - convert this to a DFSM, using subset-construction: [be sure to view table this in a mono-width font!] 
megastate Q | eps(Q)  | eps(δ(Q,a)) | eps(δ(Q,b)) |
------------+---------+-------------+-------------+
{0}         | {0,1,3} | {2,4}       | φ           |

==== 2021-Sep-23 (Thu) week5b
- work through NFA->DFA subset construction example
- p.56 (Example 5.20); this is in ppts slides

N.B. the book uses eps(δ(Q,a)), but we could also use δ(eps(Q),a) -- that's also fine,
as long as you remember to "alternate" δ() and eps() each step.
(The only difference is when we figure the new machine's accept-states: we need to look for
those Q where *eps(Q)* contains one of M's accept-states.
…Really, it's just a matter of whether we name/label our states using column 1 below, or column 2 below.)

megastate Q | eps(Q)  | δ(eps(Q),a) | δ(eps(Q),b) |
------------+---------+-------------+-------------+
{0}         | {0,1,3} | {2,4}       | φ           |
{2,4}       | {2,4}   | φ           | {1,5}       |
{1,5}       | {1,5}   | {2,6}       | φ           |
{2,6}       | {2,6,3} | {4}         | {1}         |
{4}         | {4}     | φ           | {5}         |
{1}         | {1}     | {2}         | φ           |
{5}         | {5}     | {6}         | φ           |
{2}         | {2}     | φ           | {1}         |
{6}         | {6,3}   | {4}         | φ           |
φ           | φ       | φ           | φ           |

Ten states total, in our (new) determ. FSM.
What is the initial state? {0}
What is/are the final states (if any)?
  Any megastate which contains 1 or 6.
  well, in our case: Any megastate Q such that eps(Q) contains 1 or 6.
  So: { {0}, {1,5}, {2,6}, {1}, {6} } are our _5_ accept-states in the determ. FSM

Augment:
In order to prove that DFSMs and NDFSMs accept the same class of languages,
we gave an algorithm that took *any* NDFSM M, and constructed a DFSM M' from it.
We just waved our hands and said "clearly, L(M) = L(M')", and that's reasonable
(for math graduates and for CS graduates with some theory background)
but (as students) let's consider more closely what's involved:

N.B. Book's Appendix B also goes through this, in a slightly different order:
(The book also points out that we need to verify that our constructed machine M' really is a DFSM --
that δ′ really is a function K′×Σ → K′ .)
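Aside (sketch, not from the book): the table above can be reproduced mechanically. The Python below is a minimal sketch of the δ(eps(Q),a) version of subset construction. The NFA's transitions are an assumption: they are inferred from the table rows (0 -ε-> 1 and 3; 1 -a-> 2 -b-> 1 for (ab)*; 3 -a-> 4 -b-> 5 -a-> 6 -ε-> 3 for (aba)+; accept-states 1 and 6). The names `NFA`, `eps`, `step`, `subset_construction` and `dfa_accepts` are made up for this illustration.

```python
from collections import deque

EPS = ""  # label used here for epsilon-transitions (an arbitrary choice)

# NFA for (ab)* ∪ (aba)+ ; transitions inferred from the table (assumption).
NFA = {
    (0, EPS): {1, 3},
    (1, "a"): {2}, (2, "b"): {1},                  # the (ab)* half
    (3, "a"): {4}, (4, "b"): {5}, (5, "a"): {6},   # the (aba)+ half
    (6, EPS): {3},
}
NFA_ACCEPT = {1, 6}
ALPHABET = "ab"
START = frozenset({0})


def eps(states):
    """Epsilon-closure: all states reachable from `states` by epsilon-moves only."""
    closure, stack = set(states), list(states)
    while stack:
        q = stack.pop()
        for r in NFA.get((q, EPS), ()):
            if r not in closure:
                closure.add(r)
                stack.append(r)
    return frozenset(closure)


def step(states, ch):
    """δ(states, ch): one non-epsilon move on `ch` from every state in `states`."""
    out = set()
    for q in states:
        out |= NFA.get((q, ch), set())
    return frozenset(out)


def subset_construction():
    """Return (table, accept): table[Q][ch] = δ(eps(Q), ch), one row per megastate Q."""
    table, todo = {}, deque([START])
    while todo:
        Q = todo.popleft()
        if Q in table:
            continue
        table[Q] = {ch: step(eps(Q), ch) for ch in ALPHABET}
        todo.extend(table[Q].values())
    # Megastates are named by column 1, so acceptance looks at eps(Q).
    accept = {Q for Q in table if eps(Q) & NFA_ACCEPT}
    return table, accept


def dfa_accepts(table, accept, w):
    """Run the constructed DFSM on w."""
    Q = START
    for ch in w:
        Q = table[Q][ch]
    return Q in accept


table, accept = subset_construction()
print(len(table), len(accept))   # 10 megastates, 5 of them accepting
```

Trying `dfa_accepts(table, accept, "abab")` and `dfa_accepts(table, accept, "abaab")` is a quick sanity check: the first string is in (ab)*, the second is in neither language.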
=== To show L(M) = L(M′), we need to show - L(M) ⊆ L(M′) - L(M) ⊇ L(M′) == Show: L(M) ⊆ L(M′) That is, show that for any strings w1,w2 if 〈s,w1〉 ⊢* 〈q,w2〉 in M (for some state q), then 〈s′,w1〉 ⊢* 〈Q,w2〉 in M′ (for some state Q such that q∈Q) This part is a bit sticky: we have to go backwards from an accept state, and find ways that could reach it. (Or if you go forward, your statement is "for each *element* of the mega-state, there is a computation that reaches it") Both directions can be shown by induction on the length of the computation(*), in addition to verifying the acceptance-conditions. (*) The length of the computation in the 'if' statement. The two computations aren't necessarily the same length, due to epsilon-transitions getting folded into individual steps of the FSM. ... ==== week09a exam01 ==== week09b review exam01 briefly; CFGs: Ch11 slides ==== week10a review exam01 finish Ch11 slides: in particular, ambiguity and then discussion driven by "is G ambig." is undecidable, and approaches like "try all strings" -- along w/ ("given a string w, does it have two derivations" *is* decidable) ==== week10b PDAs -- Ch12 slides through 40 (example PDAs for: a^nb^n ; wcw^r ; Bal ; formal-defns ) - A language L is *regular* iff: - a dfa M s.t. L(M)=L - an nfa M s.t. L(M)=L - a regexp r s.t. L(r) = L - a reg-grammar G s.t. L(g) = L (that is, these 4 are equivalent def'ns) - Mention in passing sections 5.5-5.12: - NFAs can be simulated: (a) build DFA, or (b) code finger-method, or (c) DFS/BFS search (for DFS: do all epsilons at each step, to avoid loops) - FAs can be minimized - can have a canonical form - transducers (circuits): e.g. parity, p.70 (sect5.9 ); TCP: p.701 ^ \----- a FSM that *also* emits outputs, as well as consuming inputs on transitions. === CFGs S → N V . rule 1 S → N V N . 
rule 2
N → cat
N → ufo
N → apple
N → cat | ufo | apple | fox | dog | hotdog | android | itec      rules 3a-h
V → bite | run | jump | hit | love | spin | hear                 rules 4a-g

Using this grammar, we can derive "hotdog hit android ." from S (the start-symbol)
  S ⇒ N V N .               (by rule 2)
    ⇒ N hit N .             (by rule 4d)
    ⇒ N hit android .       (by rule 3g)
    ⇒ hotdog hit android .  (by rule 3f)
this is a "derivation"

Grammar G2:
S → NP CV NP .           rule 1
S → NP CV .              rule 2
NP → ART N               rule 3
ART → a | an | the       rules 4a-c
CV → V s | V ed          rules 5a-b
N → cat | ufo | apple | fox | dog | hotdog | android | mongo | itec   rules 6a-i
V → bite | run | jump | hit | love | spin | hear                      rules 7a-g

If we want to make articles optional, there are several ways:
change rule 4 to be:
  ART → a | an | the | ε   rules 4a-d
OR, alternatively we could say:
  NP → ART N | N           rule 3
OR, alternatively we could say:
  NP → ART_OPT N           rule 3
  ART_OPT → ART | ε
  ART → a | an | the

- the set of Reg.languages is closed under:
  - complement,
  - union,
  - intersection
    [how?
     - make a DFA whose states are pairs-of-states of the DFAs for L1,L2?
       Just like subset-construction, but accept only if *both* DFAs accept.
     - Easier: A∩B = ¬(¬A∪¬B).
    ]
  - concatenation

==== Nov.09
Finish: a^nb^nc^n not context-free, by pumping lemma (sketch)
Start: Ch17a-Turing-Machines.key slides [1,22).
  example; formal def'n (transition, computation)
  Show example on turingmachine.io gist: 31acac183b8881a7e59a8298b0f31bd1

==== Nov.11
Continue: Ch17a-Turing-Machines.key slides [35,end). [Skip [22,35) ]
Transducers: just have a halt-state
Deciders: two halt-states, "y" and "n"
Note: no transitions out of the special-states h,y,n.
This will let us *compose* machines: M1 -> M2
  identify M1's "h" state (or, if a decider, its "y" state) with M2's start.
  (of course, M2 might not be starting at left-most of input, in this case;
  but if you really want to build a library you'd probably have each machine *finish* in this standard way.)
Examples (deciding): a^n b^n c^n
                     wcw
Examples (computing): w -> w_w
                      s_t -> st

== 2021-nov-30 (week14a)
Ch19 -- Halting not Decidable
Ch18 -- Church-Turing thesis (only the first couple slides)
Def'n's of P, NP, NP(alternate)

========= 2020fall

==== 2020-aug-19.txt
Outline:
- recall: equal cardinalities N, Z, Σ* (and, def'n of equal-cardinalities)
    enumerate Z: 0, 1, -1, 2, -2, 3, -3, ...
    enumerate Σ*: ε, a, b, ..., z, aa, ab, ..., az, ba, ..., zz, aaa, aab, ..., zzz, aaaa, ...
    This is a "lexicographic ordering" of strings.
- "countably infinite" means "has the same cardinality as ℕ"
- aside: aleph-null
- def'n: A *Language* is: a (possibly infinite) set of strings.
- ℕ vs (ℕ x ℕ)
    Is there some way to enumerate the elements of NxN ? (see picture)
    ⟨0,0⟩ ⟨1,0⟩ ⟨0,1⟩ ⟨2,0⟩ ⟨1,1⟩ ⟨0,2⟩ ⟨3,0⟩ ⟨2,1⟩ ⟨1,2⟩ ⟨0,3⟩ ...
    Thus: (ℕ x ℕ) is countable!
- ℕ vs ℚ (the rational numbers)
    3/4  7/32  0/17  6/8
    Surprisingly, we can still enumerate ℚ !
    Just look at NxN, but remove duplicates, and go through by diagonals again.
- ℕ vs ℝ
- ℕ vs [0,1]
    There is NO such enumeration of [0,1]!!!!
    Proof: proof by contradiction:
    Suppose there is some f: ℕ → [0,1] which is 1:1, onto.
      f(0)  0.141592653589793238462643383279...
      f(1)  0.707.......
      f(2)  0.50000000000...
      f(3)  0.16666666666...
    We show a contradiction to f existing:
    Look at the following number:
      x = 0.1006...
    where the i'th digit of my number x is the i'th decimal place of f(i) (counting places from 0).
    Now make the number y, where each digit of y is ONE MORE THAN the corresponding digit of x (wrapping around)
      y = 0.2117....
    Claim: y is not in our enumeration.
    Suppose it were -- that is, suppose there were some k in N such that y=f(k).
    Then the k'th digit of y is one more than the k'th digit of f(k)=y. Contradiction!
    Therefore f isn't *onto* [0,1], contradicting our supposition.
    Conclusion: the real numbers are NOT countable!
    (This proof technique is called "diagonalization".)
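Aside (sketch): both "diagonal" ideas from this entry fit in a few lines of Python. `pairs()` walks ℕ x ℕ by anti-diagonals in the same order as the picture; `diagonal_escape()` builds x and y from a finite list of digit-strings. The names are hypothetical, and the digits of f(1) past "707" are padded with zeros here purely as an assumption (the notes elide them); only f(1)'s place 1 matters for the construction.

```python
from itertools import count, islice


def pairs():
    """Enumerate ℕ x ℕ by anti-diagonals: (0,0) (1,0) (0,1) (2,0) (1,1) (0,2) ..."""
    for d in count():              # d = i + j, the index of the diagonal
        for j in range(d + 1):
            yield (d - j, j)


def diagonal_escape(rows):
    """rows[i] holds the decimal places of f(i), as a string of digits.

    Returns (x, y): x takes place i of f(i); y adds one to each digit of x,
    wrapping 9 around to 0, so y differs from every f(i) at place i.
    """
    x = "".join(rows[i][i] for i in range(len(rows)))
    y = "".join(str((int(d) + 1) % 10) for d in x)
    return x, y


# The four sample values from the notes (places after "0.").
rows = [
    "141592653589",   # f(0)
    "707000000000",   # f(1): digits after 707 are an assumed padding
    "500000000000",   # f(2)
    "166666666666",   # f(3)
]

print(list(islice(pairs(), 10)))
x, y = diagonal_escape(rows)
print("x = 0." + x, " y = 0." + y)   # x = 0.1006  y = 0.2117
```

Of course a program can only handle finitely many rows; the proof's point is that the same recipe defeats *any* claimed enumeration f.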
------------ 2020-aug-21 (Fri)
hw01a review: (see D2L » Content for these notes)
f: A → B
Facts:
- if f is 1:1 (and total), it means |A| ≤ |B|
- if f is onto, it means |A| ≥ |B|

General warning: in English, BEWARE THE "THE" -- tossing that word in implies
both existence and uniqueness, w/o people always realizing it!

- example of proof by induction:
  - Prove: ∀n∈ℕ, ∑ᵢ₌₀ⁿ(i) = n(n+1)/2
    Proof by induction: Let P(n) be the statement "∑ᵢ₌₀ⁿ i = n(n+1)/2".
    [Note how P takes a natural-number, and gives a t/f proposition.
     That is, P is an infinite family of propositions.
     "induction" is the tool that lets us prove infinitely many propositions.
     Logicians would say "Induction is one of our *rules of inference*".]
    - we show: P(0) holds.
        ∑ᵢ₌₀⁰ i = 0 = 0(0+1)/2, done.
    - we show: forall k in ℕ, P(k) => P(k+1)
      By the inductive hypothesis, ∑ᵢ₌₀ᵏ i = k(k+1)/2. Then:
        0+1+2+...+k       = k(k+1)/2              (expand summation-notation)
        0+1+2+...+k+(k+1) = k(k+1)/2 + (k+1)      (add k+1 to both sides)
        0+1+2+...+k+(k+1) = k(k+1)/2 + 2(k+1)/2   (multiply & divide the r.h.s. (k+1) by 2)
        0+1+2+...+k+(k+1) = (k+1)(k+2)/2          (collect the (k+1)/2's on the r.h.s)
        so ∑ᵢ₌₀ᵏ⁺¹ i      = (k+1)(k+2)/2          (re-write l.h.s. using summation-notation)
      which is exactly P(k+1).
      [So this bullet showed that: assuming P(k) is true, then P(k+1) is true --
       that's our general task, for the inductive step.]
    Therefore by the principle of induction, the preceding two bullets let us conclude:
    ∀n∈ℕ, P(n) holds. (yay!)
  - Alternate example (ask during office hours perhaps):
    For any finite set S, |P(S)| = 2^|S|

==== Aug.24 (Mon)
- review hw01b (see D2L for those notes)

==== 2020-aug-26 (Wed)
- Alphabets, Strings and Languages:
  - some examples of each:
    alphabet: {'a'..'z','A'..'Z'} ∪ punctuation
    strings:
    - "You taught me language, and my profit on't / Is, I know how to curse" [act 1, sc. 2, l.363]
    - "What's past is prologue." [act 2, sc. 
1]
    languages:
    - the lines of The Tempest by Shakespeare (1611) -- finite
    - legal Java programs -- infinite
    alphabet: {0,1}
    strings:
    - 01101111
    - 0
    languages:
    - all bit-strings such that no two "0"s occur in a row
    - all bit-strings with even parity
  - strings of note: ε
    Any one string is *finite* (just like in Java; use `Stream` for potentially-infinitely-long sequences.)
  - string operations (where u,v are variables):
      u,v concat: uv
      u reversed: uᴿ     "abc"   "abc"ᴿ = "cba"
  - languages of note: empty-lang, φ
    langs can be finite, or infinite
    Σ* -- all strings over Σ -- ("0 or more occurrences from Σ" - finite) "Kleene Star"
    Def'n of lang: "a set of strings"
    or for this class, to remind ourselves, "a (possibly infinite) set of (finite) strings."
  language operations: ∪, ∩, x
    L, M concat: LM -- write def'n using set-builder
      if L = {a,aa,abba,cab}, M = {cc,a}
      then LM = { acc, aa, aacc, aaa, abbacc, abbaa, cabcc, caba }
      if L = {a,aa}, M = {cc,a,aa}
      then LM = { acc, aa, aacc, aaa, aaaa } -- size 5, not 6=2x3
      Note how "aaa" ∈ LM for "two different reasons",
      but sets either have an item or they don't, not "have it twice"
      LM ≡ { uv | u∈L, v∈M }
      (We are "overloading" concat to apply to 2langs, not just 2strings)
      question: what can we say about |LM|?
      Answer: |LM| ≤ |L| * |M|
    L reversed: Lᴿ -- write def'n using set-builder
      Lᴿ ≡ { wᴿ | w∈L } (or equivalently: Lᴿ = { w | wᴿ∈L } )
      example: if L = {abc, ba, ccc}, then Lᴿ = { cba, ab, ccc }
      question: |Lᴿ| =? ≤? ??
    misc: concat string-w-lang: abcL, or L1 (reminiscent of "3·ℕ" = {3n | n∈ℕ} )
      e.g. L = {00, 0110, ε}, then L1 = {001, 01101, 1}
      Define: If u is a string and L is a language, then:
      - uL is shorthand for { uv | v∈L }
        Same as: Let U = {u}, then uL = UL.
      - and similarly, Lu is { vu | v∈L }
      question: |uL| = ? (for a string u and a language L).

=== Aug.28 (Fri)
- FSM:
  - examples Book, p.42 = 58-16 (pdf)
  - designing a FSM: practice
  - formalism ...
  - a computation of M is a *sequence* of states of M.
Note that the notion of "dead state" ("fail state") is NOT part of a machine's spec.
Can you define it more formally?

=== Aug.31 (Mon)

=== Sep.02 (Wed)
L1={a, ab}
L2={bc, c}
|L1·L2| < |L1| * |L2|

L1={ a }    L1* = { ε, a, aa, aaa, ... }
L2={ b }    L2* = { ε, b, bb, bbb, ... }
L1·L2 = { ab }
(L1·L2)* = { ε, ab, abab, ababab, ...}
L1*·L2* = { ε, b, bb, bbb, ...,
            a, ab, abb, abbb, ...
            aa, aab, aabb, ... }

==== Sep.04 (Fri)
Notes from after-class
Prove: for a language L, show (Lᴿ)ᴿ = L.
side note: to show two sets are equal A = B, it means:
- A subseteq B -- that is, show if a∈A, then a∈B
- B subseteq A -- that is, show if a∈B, then a∈A

What I'd be entirely happy with -- *assuming* that for any individual string w, wᴿᴿ=w.
Recall that (by def'n of Lᴿ), Lᴿ = { wᴿ | w∈L }
and so (Lᴿ)ᴿ = { kᴿ | k∈Lᴿ }
             = { (wᴿ)ᴿ | since k∈Lᴿ k=wᴿ for some w in L}
             = { w | some w in L}
             = L

Q: But if you wanted to first show that for individual strings,
then we'll need a firmer notion of what a string is, and what reverse is:
Data def'n: a string is:
- ε
- aw, where a∈Σ, and w is a string.
Examples of the data: ε   aε = a   ba   aba   ...
def'n: the reverse of a string u is:
- ε, if u=ε
- wᴿa, if u=aw.

Prove: that (wᴿ)ᴿ = w.
Let P(n) = "for any string w of length n, wᴿᴿ = w"
We'll use induction to show that for all n, P(n) is true:
- base case: P(0) is true ...
- inductive step: show that P(k) ⇒ P(k+1)
  let u be a string of length k+1. Then u=cw.
  then w is a string of length k.
  so by ind.hyp, wᴿᴿ = w
  uᴿᴿ = ((cw)ᴿ)ᴿ = (wᴿc)ᴿ by def'n of (cw)ᴿ
  [okay, I have to define what right-concat is, to get this to go through... will leave it for now.]

==== 2020-Sep-11 (Fri)
augment:
In order to prove that DFSMs and NDFSMs accept the same class of languages,
we gave an algorithm that took *any* NDFSM M, and constructed a DFSM M' from it.
We just waved our hands and said "clearly, L(M) = L(M')", and that's reasonable
(for math graduates and for CS graduates with some theory background)
but (as students) let's consider more closely what's involved:

N.B. Book's Appendix B also goes through this, in a slightly different order:
(The book also points out that we need to verify that our constructed machine M' really is a DFSM --
that δ′ really is a function K′×Σ → K′ .)

=== To show L(M) = L(M′), we need to show
- L(M) ⊆ L(M′)
- L(M) ⊇ L(M′)

== Show: L(M) ⊆ L(M′)
That is, show that for any strings w1,w2
  if 〈s,w1〉 ⊢* 〈q,w2〉 in M (for some state q),
  then 〈s′,w1〉 ⊢* 〈Q,w2〉 in M′ (for some state Q such that q∈Q)

Both directions can be shown by induction on the length of the computation(*),
in addition to verifying the acceptance-conditions.
(*) The length of the computation in the 'if' statement.
    The two computations aren't necessarily the same length,
    due to epsilon-transitions getting folded into individual steps of the FSM.

==== 2020-Sep-14 (Mon)
Work through example:
- give a NDFSM for: L1= (ab)*
- give a NDFSM for: L2= (aba)+
- give a NDFSM for: L1 ∪ L2
- convert this to a DFSM, using subset-construction:
  [be sure to view this table in a mono-width font!]

megastate Q | eps(Q)  | δ(eps(Q),a) | δ(eps(Q),b) |
------------+---------+-------------+-------------+
{0}         | {0,1,3} | {2,4}       | φ           |
{2,4}       | {2,4}   | φ           | {1,5}       |
{1,5}       | {1,5}   | {2,6}       | φ           |
{2,6}       | {2,6,3} | {4}         | {1}         |
{4}         | {4}     | φ           | {5}         |
{1}         | {1}     | {2}         | φ           |
{5}         | {5}     | {6}         | φ           |
{2}         | {2}     | φ           | {1}         |
{6}         | {6,3}   | {4}         | φ           |
φ           | φ       | φ           | φ           |

N.B. the book uses eps(δ(Q,a)) rather than δ(eps(Q),a) -- that's also fine,
as long as you remember to "alternate" δ() and eps() each step.
(The only difference is when we figure the new machine's accept-states: we need to look for
those Q where *eps(Q)* contains one of M's accept-states.
…Really, it's just a matter of whether we name/label our states using column 1 above, or column 2 above.)
Ten states total, in our (new) determ. FSM.
What is the initial state? {0}
What is/are the final states (if any)?
  Any megastate which contains 1 or 6.
  well, in our case: Any megastate Q such that eps(Q) contains 1 or 6.
  So: { {0}, {1,5}, {2,6}, {1}, {6} } are our _5_ accept-states in the determ. FSM

==== Sep.18 (Fri)
start slides on regexps
Everybody thinks about false-negatives, but people sometimes forget false-positives.

==== Sep.21 (Mon)
What's a regex for…: an email addr?
For a regex for any letter, you can write, instead of
    (a ∪ b ∪ c ∪ ... ∪ z)
    (a|b|c|..|z) or [abcd...z] or [a-z]
    or [A-Za-z] (better)
related:
    [A-Za-z0-9] or \w (worse? It's really [A-Za-z0-9_] )
but this excludes: ñ é
try:
    \p{L} -- any unicode character with the Property of "Letter"
    \p{Z} -- any unicode "separator" character: space, nbsp, thinsp, ...
             (not tab, though: that's a control character)

NOW: What's a regex for…: an email addr?
    \w+@(\w+\.)+(com|edu|org|…)

What's wrong with this regex for a URL-start?:
    https*://(\w+\.)+(com|edu|org|…)
s* is too much -- false positives (it also matches "httpssss://...")
instead try:
    http(s|ε)://(\w+\.)+(com|edu|org|…)   [our book's syntax]
or  http(s|)://(\w+\.)+(com|edu|org|…)    [a real re lib's syntax]
or  https?://(\w+\.)+(com|edu|org|…)

Example of re's I sometimes write myself:
Here's an expression to look in a student's homework, to see if they have a definition
to define the function asked for on the homework:
accept "function pluralize( string s, int n ) : string "
or "function pluralise( s, n )" ...
    r'function plurali[zs]e\(\s*(\w+\s+)?\$\w+,\s?\s*(\w+\s+)?\$\w+\s*\)'
(N.B. the `r'...'` is a Python "raw string".
The above in Java would be:
    "function plurali[zs]e\\(\\s*(\\w+\\s+)?\\$\\w+,\\s?\\s*(\\w+\\s+)?\\$\\w+\\s*\\)"
and if I wanted a regexp to match strings with a literal backslash, like "H:\ibarland",
the regexp would be:
the Java for that regexp would be:
)

==== 2020-sep-23 (Wed)
L1 = {w ∈ {a, b}* : every 'a' is immediately followed by a 'b' }
Recall from discrete math / logic:
    ¬(∀x.P(x)) is equivalent to (∃x.¬P(x))
b ∈ L1 ? yes.
    bbbbabbbabb
    bbbb
    ε
[FSM sketch: start-state S1 (accepting) loops on b; a takes S1 -> S2; b takes S2 back to S1]
    ((a ∪ ε)bb*)*
    a?
    ((a ∪ ε)b)*

Prove: a language is accepted by a NFA iff there is a regexp defining the language.
Rephrase: For a language L:
    There exists an NFA M ... iff There exists ...
Rephrase: Regexps and NFAs both correspond exactly to the Regular Languages.

Suppose α = β ∪ γ
Write a machine M3 to accept L(β ∪ γ):
inductive hyp: L(M1) = L(β), and L(M2) = L(γ)
We construct M3 by:
          ε/-> M1
    ->S₀
          ε\-> M2
M1 = (K₁, Σ, Δ₁, s₁, A₁)
M2 = (K₂, Σ, Δ₂, s₂, A₂)
build:
M3 = (K₃, Σ, Δ₃, s₃, A₃)
    K₃ = K₁ ∪ K₂ ∪ {S₀}
    A₃ = A₁ ∪ A₂
    s₃ = S₀
    Δ₃ = Δ₁ ∪ Δ₂ ∪ { (S₀,ε,s₁), (S₀,ε,s₂) }

==== Sep.28 (Mon)
We talked about regexps-in-real-life [not on exam/hw];
then Ch08: lead-up to pumping lemma (incl. review pigeon-hole principle)

== for fun: a regexp to capture a string of "1"s which is non-prime in length:
    1111   111111   but not 11111
Here's a regexp:
    (11+)\1+
(using Perl's extension -- backreferences -- so not a "real" theory-regexp)

Last(ish) topic for Regular Languages:
Show that there are non-regular languages!
That is, PROVE that some language is NOT regular.
We'll do this with "the pumping lemma" --
we'll note that any *long* string accepted by a FSM has certain regularities in it
('cause the FSM can't be too complicated)
We can exploit the regularity to show that other strings must be accepted by the same FSM
Therefore: if those other (even-longer) strings aren't in our language, then we can't have a FSM for it.
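Aside (sketch): the "regularity" in a long accepted string is just a repeated state (pigeon-hole). The Python below is a minimal illustration, assuming a hypothetical 3-state DFA for the language "every 'a' is immediately followed by a 'b'" (states `S1`, `S2`, and an explicit `dead` state; all names are made up here). `pump_split` finds the first repeated state in the run and splits w = xyz around it; pumping y then keeps the machine's verdict unchanged.

```python
# Hypothetical DFA for: every 'a' is immediately followed by a 'b'.
DELTA = {
    ("S1", "a"): "S2",   ("S1", "b"): "S1",
    ("S2", "a"): "dead", ("S2", "b"): "S1",
    ("dead", "a"): "dead", ("dead", "b"): "dead",
}
START, ACCEPT = "S1", {"S1"}


def run(w):
    """List of states visited while reading w (has len(w)+1 entries)."""
    trace = [START]
    for ch in w:
        trace.append(DELTA[(trace[-1], ch)])
    return trace


def accepts(w):
    return run(w)[-1] in ACCEPT


def pump_split(w):
    """Split w = x y z, y non-empty, with the DFA in the same state before and after y.

    By pigeon-hole a repeat must show up once len(w) >= number of states;
    returns None if w is too short to contain one.
    """
    first_seen = {}
    for i, q in enumerate(run(w)):
        if q in first_seen:
            j = first_seen[q]
            return w[:j], w[j:i], w[i:]
        first_seen[q] = i
    return None


x, y, z = pump_split("abab")
print(repr(x), repr(y), repr(z))                    # '' 'ab' 'ab'
print([accepts(x + y * i + z) for i in range(5)])   # every pumped string is accepted too
```

To show a language is NOT regular we run this reasoning backwards: if some pumped string falls outside the language, no such FSM can exist.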
==== Oct.07 (Wed) Grammars, cont.: idiom: if I want (say) *optional* article, I can use the rules: … ART-OPT → ε ART-OPT → ART ART → a | an | the … Let's make a grammar for a java-field-declaration: Examples: Strings we want to be able to generate: String str; private int test; final public int num_students; String str = "howdy"; public double x = Math.sqrt(12+4); Field-Decl → Final-Opt Access-Modifier-Opt Type Id Assign-Opt ; Access-Modifier-Opt → ε | Access-Modifier Access-Modifier → public | private | protected Final-Opt → ε | final Assign-Opt → ε | Assign Assign → = Expr Type → char | byte | short | int | long | float | double | boolean | Id Expr → ... Id → ... === generating zero-or-more, with a CFG: AP → N rule 5 AP → ADJ AP Note that we can in general have an idiom for "0 or more ADJ": (version 1:) ADJS → ADJ ADJS → ADJ ADJS NP -> ADJS N oops, that non-terminal "ADJS" can only generate ...*one*-or-more. Let's fix that: (version 2:) ADJS → ADJ | ε ADJS → ADJ ADJS NP -> ADJS N This works, BUT can be trimmed down sleeker/better: (final version:) ADJS → ε ADJS → ADJ ADJS NP -> ADJS N THIS now is an idiom for "0-or-more of ...": If you want 0-or-more of 'Blah' (where 'Blah' is some existing non-terminal), make a new non-terminal named "Blahs" with the rules: Blahs -> ε | Blah Blahs ==== oct.23 If a grammar can generate a long-enough string w, there is some way to divide w = uvxyz, such that the same grammar can also generate uvvxyyz and uvvvxyyyz and uxz and all strings { uvⁱxyⁱz | i ∈ ℕ } Now we show that L = { aⁱbⁱcⁱ | i ∈ ℕ } is NOT context-free: Suppose it were. Then there is a CFG G accepting L. Then by pumping-lemma-for-CFGs, choose a k so that w = aᵏbᵏcᵏ is "long enough" so that w = uvxyz Consider v: either it is all the same character, or crosses a boundary between as, bs, cs. Likewise for y. Case 1: either v or y contains two letters. If so, then uv²xy²z contains too many letter-change-boundaries, and generates a string NOT in L. 
Contradiction.
Case 2: v and y each contain only one kind of letter.
  But then, uv²xy²z has extra copies of up to TWO of the letters, BUT NOT ALL 3.
  Contradiction.
Thus there can't be a CFG generating L, and thus L is not Context-Free.

For slide#13 in notes (Chptr.17), the TM to pad with b's to equalize a's,b's:
State 2: [sitting on first unprocessed 'a'] mark 'a' with '$'
State 3: move R to the first 'b' (or a blank); mark it (if found)
State 4: move L back to the last 'a' (or a blank); mark it and back to 3
See: turingmachine.io import gist: 31acac183b8881a7e59a8298b0f31bd1

=== 2020-oct-26
Tasks: Write a TM to:
- verify a palindrome over {a,b}*
- duplicate a word
Several good sites letting you execute (& visualize) TMs:
- https://turingmachine.io/ * we'll use this one
  - shows machine as diagram!
  - allows aliases?
  - instruction-syntax not like book, kinda scrambles the tuple elements.
  - instructions allow some shortcuts ... which can also hinder reading/consistency at first
  - have to spell out "write:"
  - pre-made machines have good state-names, and good tasks
  - '␣' instead of book's '❏'
- https://turingmachinesimulator.com/
  - good syntax (once I realize each rule is on 2 lines)
  - supports multi-tape machines
- http://morphett.info/turing/turing.html
  - good syntax; allows '*' wildcard
  - sub-par visualization

==== 2020-oct-26
2i=3j+1
 i   j
________
 0   --
 1   --
 2   1    aab
 3   --
 4   --
 5   3    aaaaabbb
 6   --
 7   --
 8   5
S → aaaSbb | aab
----equiv to:---
S → aT, T → aaaTbb | ab

S → A | B
A → aAc | bA | ε
B → bBc | aB | ε

==== 2020-oct-30
initial:
  ----aaba-----
  -------------
inbetween:
  ----AabaA----
  ----AAbaAA---
  ----AABAAABA-
final:
  ----aabaaaba-

==== 2020-nov-09
____xx111-xx111111111111=11____
____xx111-111111111111xx=11____
____xxxx1-1x=xxx__
_____111-=111___

==== 2020-nov-13
- Notes:
  - office hours through next Fri.
  - exam02: similar to exam01 format (24-48hr window cover; 2-3hrs to complete, open web)
    May include an untimed set of "short answer" q's.
- exam01: You can re-do problems you missed points on; dropbox/submission-details by tomorrow; due Mon. ==== Excel, TMs, and halting: https://xkcd.com/2453/