Lingyuan's blog

For many high school students, summer is a great opportunity to relax, catch up on work, or study things they would not usually learn at school. For me, it is the third option. I have just completed my first week at Stony Brook’s summer camp, and it was a lot of fun! While my experience is not over yet, I felt compelled to share what has happened so far.

SYCCL is a two-week boarding pre-college summer program designed to introduce students with varying degrees of experience to the rather niche field of computational linguistics. There are numerous summer camps that study either Linguistics or Computer Science in depth, but to my knowledge very few other programs offer courses in this specific field. That scarcity was rather unfortunate for me, as both fields are on my radar as potential areas of study.

Living Arrangements

Firstly, since this is a boarding program, the living arrangements are a big factor in my experience, and they did not disappoint. We were placed in a college dormitory made up of suites, each further divided into rooms shared by two people. I quite like this arrangement compared to the usual dorm room structure, since the suite offers extra community space and naturally forms a group of people that you can get to know quite easily. There are also many RAs living in the dorm with us, all of whom are very nice and amiable. If there is one downside, it is the dorm placement. Unfortunately, many of the facilities on campus are situated pretty far from my dorm, which is very inconvenient.

Linguistics

Next, the classes themselves: I found them fascinating. Since most of my linguistics knowledge was self-taught, this was a great opportunity to fill in the gaps. In addition, many of the topics covered were a lot more advanced than what I had seen before. For example, when learning about Phonetics, instead of simply learning the IPA, we also learned how the pronunciation of some morphemes changes based on their context or surrounding phonemes. These variant forms of a morpheme are called allomorphs (the analogous variants of a single phoneme are called allophones). Besides just being interesting, the fact that the same morpheme can surface in different forms points to an underlying representation of morphemes that is separate from how they are pronounced.

We were also able to learn from established professionals in the field, such as Professor Larson, who talked in depth about Semantics and his experience with the Warlpiri tribe in Australia. One fascinating example he showed was how the Warlpiri view certain concepts, such as kinship, quite differently from what we are used to. To summarize, in their language the Warlpiri only treat the parent of the same sex as a true parent, meaning a male Warlpiri would view his mother not as ‘mother’ in the sense we know but as his father’s spouse. This was an awesome example of the subtle differences that can exist between languages.

Linguistic Models

Besides traditional Linguistics, we also learned to view Linguistics in a more formal or mathematical sense using Linguistic Models: ways of representing or mirroring language. I say mirroring because, as I understand it, while these models are not the same as how humans process language, they do offer insight into the structure of language itself. The two examples we were shown were Finite State Automata (FSA) and Context-Free Grammars (CFG).

Finite State Automata

FSAs are made up of nodes and edges: the nodes represent states, and the edges represent the operations applied to those states. In this case, the operation is appending a morpheme. In a sense, this works like the Computer Science technique of storing a dictionary in a tree (a trie). Below is an example FSA for the word deindustrialization. By following the graph, you should be able to create nearly all variations of the word based on its morphemes:
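The same idea can be sketched in a few lines of Python. This is only a minimal illustration: the state names, the transitions, and the simplified morpheme spellings are assumptions made for this example, not the exact graph shown in class.

```python
# Hypothetical FSA: each state maps to a list of (morpheme, next_state) edges.
# Spellings are simplified so that plain concatenation produces the word.
TRANSITIONS = {
    "start":     [("de", "de_"), ("industri", "stem")],
    "de_":       [("industri", "de_stem")],
    "stem":      [("al", "adj")],
    "de_stem":   [("al", "de_adj")],
    "adj":       [("ize", "verb"), ("iz", "verb_stem")],
    "de_adj":    [("ize", "verb"), ("iz", "verb_stem")],
    "verb_stem": [("ation", "noun")],
}
# States where we are allowed to stop and call the result a word.
ACCEPTING = {"adj", "verb", "noun"}


def generate(state="start", prefix=""):
    """Yield every word spelled by a path from `state` to an accepting state."""
    if state in ACCEPTING:
        yield prefix
    for morpheme, next_state in TRANSITIONS.get(state, []):
        yield from generate(next_state, prefix + morpheme)


words = sorted(generate())
for word in words:
    print(word)
```

Because "de_adj" is not an accepting state, this sketch generates deindustrialize and deindustrialization but not the non-word "deindustrial".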

FSAs can also use loops to handle simple reduplication, where a morpheme repeats. Here is an example using negation:
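As a rough sketch of what a loop looks like in code, the hypothetical FSA below lets the prefix "anti" repeat any number of times before a stem. The morphemes and state names are assumptions chosen for illustration, not the example from class.

```python
# Hypothetical FSA with a self-loop: (state, morpheme) -> next state.
LOOP_TRANSITIONS = {
    ("start", "anti"): "start",    # self-loop: the prefix may repeat
    ("start", "missile"): "done",  # the stem ends the word
}
LOOP_ACCEPTING = {"done"}


def accepts(morphemes):
    """Return True if the morpheme sequence follows a path to an accepting state."""
    state = "start"
    for morpheme in morphemes:
        key = (state, morpheme)
        if key not in LOOP_TRANSITIONS:
            return False  # no edge for this morpheme from the current state
        state = LOOP_TRANSITIONS[key]
    return state in LOOP_ACCEPTING


print(accepts(["anti", "anti", "missile"]))
print(accepts(["missile", "anti"]))
```

A single looping edge stands in for infinitely many words, which is why the graph stays small.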

The FSA itself is a rather weak Linguistic Model since it has many limitations, such as only forming words in a linear, left-to-right manner, but it was a great introduction to approaching Linguistics mathematically.

(Note: One limitation is that FSAs cannot handle circumfixes. This will be left as an exercise for the reader 😊)

Context Free Grammar

In contrast, CFGs are much more powerful. Here is the process a CFG goes through:

  1. Begin with a string containing only the start symbol ‘S’
  2. Pick a symbol to replace
  3. Replace according to rules
  4. Repeat steps 2 and 3 until no more updates can be made
  5. End
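
The steps above can be sketched in Python as follows. The tiny grammar and the choice to always replace the leftmost symbol are assumptions made for illustration; any replaceable symbol could be picked in step 2.

```python
import random

# Tiny hypothetical grammar: keys are replaceable symbols,
# values are the possible replacements for each one.
RULES = {
    "S":  [["NP", "VP"]],
    "NP": [["D", "N"]],
    "VP": [["V", "NP"]],
    "D":  [["the"]],
    "N":  [["king"], ["tea"]],
    "V":  [["likes"]],
}


def derive(rng):
    """Rewrite the start symbol until only words remain; return words and steps."""
    symbols = ["S"]                      # step 1: begin with the start symbol
    steps = [list(symbols)]
    while True:
        # step 2: pick a symbol to replace (here, the leftmost replaceable one)
        idx = next((i for i, s in enumerate(symbols) if s in RULES), None)
        if idx is None:                  # steps 4 and 5: nothing left to replace
            return symbols, steps
        # step 3: replace it according to one of its rules
        symbols[idx:idx + 1] = rng.choice(RULES[symbols[idx]])
        steps.append(list(symbols))


sentence, steps = derive(random.Random(0))
for step in steps:
    print(" ".join(step))
```

Each printed line is one intermediate string, so the output traces a derivation from S down to a full sentence.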

CFGs take advantage of the phrasal nature of languages, which allows you to replace words with other words of the same category. An example you may be familiar with is Mad Libs. However, CFGs go a step further by applying this idea to whole phrases. Then, by applying these rules recursively, you can generate full sentences.

Here is a set of simple CFG rules:

  S -> NP VP
  NP -> (D) N (PP)
  NP -> Pron
  VP -> V (NP) (PP)
  VP -> V CP
  PP -> P NP
  CP -> C S
  X -> X Conj X   (for X in {NP, VP, PP, CP, S})

  D -> Determiner
  N -> Noun
  Pron -> Pronoun
  V -> Verb
  P -> Prepositions
  C -> Subordinating Conjunction
  Conj -> Coordinating Conjunction

Example CFG process for the sentence: “The King of Britain likes tea”

CFGs can also expose alternative interpretations of a sentence. For example, here are two ways the CFG can generate the sentence “We caught fireflies in our pajamas”:
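The ambiguity can also be checked mechanically. The sketch below counts how many distinct parse trees a subset of the rules gives for the sentence. It assumes the optional items in parentheses are expanded into separate rules, and that "our" is tagged as a determiner; both are choices made for this illustration.

```python
from functools import lru_cache

# Subset of the rules, with optional items expanded into explicit alternatives.
GRAMMAR = {
    "S":  [("NP", "VP")],
    "NP": [("N",), ("D", "N"), ("N", "PP"), ("D", "N", "PP"), ("Pron",)],
    "VP": [("V",), ("V", "NP"), ("V", "PP"), ("V", "NP", "PP")],
    "PP": [("P", "NP")],
}
# Assumed part-of-speech tags for the words in the example sentence.
LEXICON = {
    "we": "Pron", "caught": "V", "fireflies": "N",
    "in": "P", "our": "D", "pajamas": "N",
}


def count_parses(sentence):
    """Count the distinct parse trees that derive `sentence` from S."""
    words = tuple(sentence.lower().split())

    @lru_cache(maxsize=None)
    def count(symbol, i, j):
        # Number of ways `symbol` can derive words[i:j].
        if symbol not in GRAMMAR:  # a part-of-speech tag must match one word
            return int(j - i == 1 and LEXICON.get(words[i]) == symbol)
        return sum(count_seq(rhs, i, j) for rhs in GRAMMAR[symbol])

    @lru_cache(maxsize=None)
    def count_seq(rhs, i, j):
        # Number of ways the symbol sequence `rhs` can derive words[i:j].
        if len(rhs) == 1:
            return count(rhs[0], i, j)
        head, rest = rhs[0], rhs[1:]
        # Try every split point; each symbol must cover at least one word.
        return sum(count(head, i, k) * count_seq(rest, k, j)
                   for k in range(i + 1, j - len(rest) + 1))

    return count("S", 0, len(words))


print(count_parses("We caught fireflies in our pajamas"))
```

The two parses correspond to attaching "in our pajamas" to the verb phrase (we were in our pajamas) or to the noun phrase (the fireflies were in our pajamas).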

All in all, CFGs are very fascinating. However, they also have limitations. While CFGs can likely model most languages using this format, the limitation is complexity. The set of CFG rules above covers only the most basic grammatical rules in English. If you were to add other grammatical rules, like subject-verb agreement, you would need to encode more information in the symbols themselves, which shows up as an exponentially growing number of symbols and rules. This makes CFGs very impractical for modeling complex systems.
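
To get a feel for this growth, here is a deliberately crude Python sketch. It assumes that every symbol in a rule must carry the same feature values (as the subject and verb do in S -> NP VP), and the feature names are hypothetical; real agreement is more selective than this.

```python
from itertools import product


def split_by_features(rules, features):
    """Copy each rule once per combination of feature values."""
    combos = list(product(*features.values()))
    expanded = []
    for lhs, rhs in rules:
        for combo in combos:
            tag = "_".join(combo)
            # Tag every symbol so that only matching variants can combine.
            expanded.append((f"{lhs}_{tag}", tuple(f"{sym}_{tag}" for sym in rhs)))
    return expanded


# Three base rules in which the subject and the verb must agree.
BASE = [("S", ("NP", "VP")), ("NP", ("Pron",)), ("VP", ("V",))]

number_only = split_by_features(BASE, {"number": ("sg", "pl")})
number_person = split_by_features(
    BASE, {"number": ("sg", "pl"), "person": ("1", "2", "3")}
)
print(len(BASE), len(number_only), len(number_person))
```

Every added feature multiplies the rule count by its number of values, so the grammar grows exponentially in the number of features it has to track.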

Conclusion

All in all, this summer program has been incredibly interesting, and I have learned far more than I could cover in this short blog post. It has been a great help for my personal development, and it has only been the first week! I can’t wait to see what we do next week as we begin working on personal projects and learning even more!