3   General parsing principles

The annotation scheme represents syntactic structure with labelled parentheses. All open parentheses have an associated label, representing nodes in a tree. Labels are either word level part-of-speech tags (N, P, ADV, etc.) or phrase level categories with minimally a basic label to indicate the form of the constituent (NP, PP, ADVP, etc.). Frequently, label extensions (separated by a hyphen) are added to labels to indicate function (NP-SBJ=subject, IP-REL=relative clause, IP-SMC=small clause, etc.). In most cases a label carries a single extension, but more than one is possible. There is no specified VP level, so clause structure is generally flat with multiply branching nodes, with IP layers immediately dominating all clause level constituents. Furthermore, verbs (VB, VB0, VB2), verbal auxiliaries and copulas (AX), modal elements (MD), particles (P) of various types, etc., are separately labelled and dominated by the IP layer. Strings of one or more particles (P) can attach to noun phrases or clauses to create particle phrases (PP). The PP label itself is never extended with function marking. However, the immediately following sibling of a PP may provide disambiguation information for the PP, in which case the terminal of the disambiguation information starts and ends with ‘*’. Null elements of various types often appear under labels appropriate to the grammatical function, in which case the terminal starts and ends with ‘*’.
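The labelling conventions above can be illustrated with a minimal Python sketch. The function names `split_label` and `is_starred_terminal` are hypothetical helpers introduced here for illustration and are not part of the annotation scheme; the sketch also assumes that every hyphen in a label separates an extension.

```python
def split_label(label):
    """Split a node label into its basic label and its function extensions.

    Assumes every hyphen in a label separates an extension.
    """
    base, *extensions = label.split("-")
    return base, extensions


def is_starred_terminal(terminal):
    """True if a terminal starts and ends with '*' (e.g. '*', '*pro*', '*を*').

    Such terminals mark either a null element or PP disambiguation
    information; this check alone does not tell the two apart.
    """
    return terminal.startswith("*") and terminal.endswith("*")


print(split_label("NP-SBJ"))       # ('NP', ['SBJ'])
print(split_label("PP"))           # ('PP', [])
print(is_starred_terminal("*"))    # True
print(is_starred_terminal("汽車"))  # False
```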

     As an example, consider:

  貴社の記者は汽車で帰社した

The parse in tree form looks like:

[Tree diagram: IP-MAT immediately dominates PP (貴社 の 記者 は), NP-SBJ (*), PP (汽車 で), VB (帰社), VB0 (し), and AXD (た).]

In bracketed notation this is:

( (IP-MAT (PP (NP (PP (NP (N 貴社))
                      (P の))
                  (N 記者))
              (P は))
          (NP-SBJ *)
          (PP (NP (N 汽車))
              (P で))
          (VB 帰社)
          (VB0 し)
          (AXD た))
  (ID 34_misc_BUFFALO))

The presence of (NP-SBJ *) indicates that the immediately preceding PP is headed by a toritate particle (in this instance, は) and also has as its complement an NP with the core grammatical role of SBJ. By contrast, the nominal modifier PP with の and the adjunct PP with で are not further specified for function, as particle の at the NP level is sufficient to specify the function of a PP with regard to a head N, and particle で at the IP level is sufficient to specify the function of a PP with regard to a head predicate.
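As a rough illustration, the bracketed example can be read into a nested structure and searched for PPs whose immediately following sibling carries a starred terminal. This is a minimal sketch, not a reference implementation: `parse_brackets`, `terminals`, and `starred_siblings_of_pp` are hypothetical names, and the search is a heuristic that assumes any single starred terminal directly after a PP is disambiguation information (it would also pick up a genuine null element such as `*pro*` that happened to follow a PP).

```python
import re

EXAMPLE = """
( (IP-MAT (PP (NP (PP (NP (N 貴社))
                      (P の))
                  (N 記者))
              (P は))
          (NP-SBJ *)
          (PP (NP (N 汽車))
              (P で))
          (VB 帰社)
          (VB0 し)
          (AXD た))
  (ID 34_misc_BUFFALO))
"""


def parse_brackets(text):
    """Parse one bracketed tree into nested (label, children) tuples."""
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    pos = 0

    def parse_node():
        nonlocal pos
        pos += 1                      # consume "("
        label = ""                    # the outermost wrapper has no label
        if tokens[pos] not in ("(", ")"):
            label = tokens[pos]
            pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(parse_node())
            else:
                children.append(tokens[pos])  # a terminal string
                pos += 1
        pos += 1                      # consume ")"
        return (label, children)

    return parse_node()


def terminals(node):
    """Return the terminals under a node, left to right."""
    out = []
    for child in node[1]:
        out.extend([child] if isinstance(child, str) else terminals(child))
    return out


def starred_siblings_of_pp(node):
    """Yield (final terminal of PP, label of sibling) for each PP that is
    immediately followed by a sibling whose only child is a starred terminal."""
    subtrees = [c for c in node[1] if not isinstance(c, str)]
    for left, right in zip(subtrees, subtrees[1:]):
        kids = right[1]
        if (left[0] == "PP" and len(kids) == 1 and isinstance(kids[0], str)
                and kids[0].startswith("*") and kids[0].endswith("*")):
            yield terminals(left)[-1], right[0]
    for sub in subtrees:
        yield from starred_siblings_of_pp(sub)


tree = parse_brackets(EXAMPLE)
ip_mat = tree[1][0]
print(" ".join(terminals(ip_mat)))        # 貴社 の 記者 は * 汽車 で 帰社 し た
print(list(starred_siblings_of_pp(tree)))  # [('は', 'NP-SBJ')]
```

Only the PP ending in は is reported; the PPs with の and で have no starred sibling, in line with the explanation above.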

     Every word has a word level part-of-speech label. Words that project phrases are phrase heads (e.g., N, P, ADV, etc.). Phrase heads are immediately dominated by phrase nodes (e.g., NP, PP, ADVP, etc.) so that any modifiers or complements of the phrase head appear as sisters of the head. Intermediate levels of structure in the sense of X' theory (N', ADV', etc.) are never represented explicitly. To complement this flat phrase structure, label extensions marking function allow modifiers and complements to be distinguished.

     In practice, there are several scenarios where a single-word modifier does not project a phrase, but appears directly under a phrasal node headed by some other word. For example, ADV occasionally appears directly under NP, either as a modifier of NUMCLP (e.g., わずか35 分) or as a modifier to a word denoting a position or value in scale, degree, or quantity (e.g., ちょっと前に; ずーっと昔から; 最も内陸側に; 凄くたくさんの; もう少し長く; etc.). There are also the pre-nominal modifiers D, WD, and PNL, which take no complements and in principle are never modified. These are directly dominated by the NP projected by the head N that they modify (e.g., 例の人は; どの市町村でも; ただの流行として; etc.).

     Aside from modifiers, an interjection (INTJ) can also appear directly dominated by an IP without projecting its own INTJP. Furthermore, foreign words (FW) and punctuation (PU) never project their own phrasal nodes.

     Finally, while inflecting elements (predicates) are regularly heads projecting IPs of various types, there are a variety of elements that appear together with predicates as part of the extended predicative syntagm. For example, in main clauses a verb stem may be extended by voice morphology like (PASS られ) or derivational elements like (AX やすい), by particle て followed by aspectual and deictic auxiliary verbs like (VB2 いる) and (VB2 くる), by negative auxiliaries like (NEG ない), by past tense auxiliaries like (AXD た), by post-adnominal evidential elements like the sequences (P の) (AX だ) and (MD そう) (AX だ), by modal elements like (MD だろう) and sequences like (MD よう) (AX だ), and by extensions like (MD らしい) and (MD べし). Distributed among these, various particles can appear such as は, も, こそ, ばかり, のみ, さえ, etc. These can all be considered as parts of the extended verbal syntagm that heads an IP.

     Taking these details into account, phrase structure is roughly describable as below:

[Tree diagram: XP immediately dominates Y (single-word-modifier), YP (multi-word-modifier), ZP (complement), and X (head).]

In bracketed notation this is:

  (XP (Y single-word-modifier)
      (YP multi-word-modifier)
      (ZP complement)
      (X head))
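The same schema can also be held as a nested data structure and written back out in bracketed form. The sketch below assumes a hypothetical (label, children) tuple representation and a hypothetical helper `to_brackets`; it prints the tree on a single line rather than reproducing the indented layout used in this document.

```python
def to_brackets(node):
    """Serialise a (label, children) tuple as a one-line bracketed string."""
    label, children = node
    parts = [c if isinstance(c, str) else to_brackets(c) for c in children]
    return "(" + " ".join([label] + parts) + ")"


# The flat phrase schema: modifiers and complement are sisters of the head.
schema = ("XP", [
    ("Y", ["single-word-modifier"]),
    ("YP", ["multi-word-modifier"]),
    ("ZP", ["complement"]),
    ("X", ["head"]),
])

print(to_brackets(schema))
# (XP (Y single-word-modifier) (YP multi-word-modifier) (ZP complement) (X head))
```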

     In general the head (N, P, ADV, etc.) is overt and matches the category of the phrase level (NP, PP, ADVP, etc.).

  (PP (NP (N 街))
      (P で))

  (ADVP (ADV とても))

     The lack of a word level constituent to match the phrase in category sometimes indicates either (i) that the head has been elided (as can be seen in the lack of a predicate in the IP-ADV below, a right-node raising construction); or (ii) that the head has a more specific label than its general category label (as can be seen in the way that the pronoun (PRO), as a subclass of nouns (N), heads an NP below).

  ( (IP-MAT (NP-SBJ *pro*)
            (IP-ADV (PP (NP (N 指揮))
                        (P に))
                    (NP-OB1 (NPR ヘンリク・シェーファー)))  ← elided head
            (CONJ *)
            (PU 、)
            (PP (NP (N ピアノ))
                (P に))
            (PP (NP (NPR 萩原麻未))
                (P を))
            (NP-OB1 *を*)
            (VB 迎え)
            (AX ます)
            (PU 。))
    (ID 99_news_KAHOKU_40))

  (NP (PRO 彼))   ← more specific label for head

More specific heads of NP include proper noun (NPR), quantifier (Q), noun with quantifier (QN), pronoun (PRO), indeterminate pronoun (WPRO), and intermediate nominal (NML). As an exceptional case, NUMCLP (itself a phrasal category) always projects an NP.
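A simple check for an overt head of the expected category could be sketched as follows. The table `HEAD_LABELS` and the function `has_overt_head` are hypothetical and cover only the NP, PP, and ADVP cases mentioned so far; other categories (IP, NUMCLP, CONJP, etc.) are deliberately left undecided, and a `False` result only flags a phrase for inspection rather than proving that its head is elided.

```python
# Labels accepted as the head of each phrase category. The NP set follows
# the list of more specific heads given above, plus the exceptional NUMCLP.
HEAD_LABELS = {
    "NP": {"N", "NPR", "Q", "QN", "PRO", "WPRO", "NML", "NUMCLP"},
    "PP": {"P"},
    "ADVP": {"ADV"},
}


def has_overt_head(phrase_label, child_labels):
    """Return True/False for NP, PP and ADVP; None for any other category."""
    base = phrase_label.split("-")[0]   # drop function extensions
    if base not in HEAD_LABELS:
        return None
    return any(child in HEAD_LABELS[base] for child in child_labels)


print(has_overt_head("PP", ["NP", "P"]))          # True
print(has_overt_head("NP", ["PRO"]))              # True
print(has_overt_head("NP-OB1", ["NPR"]))          # True
print(has_overt_head("NP", ["PP"]))               # False
print(has_overt_head("IP-ADV", ["PP", "NP-OB1"]))  # None
```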

     However, there are other scenarios where a phrasal category occurs without a matching head. As inflecting elements, predicates regularly head IPs, but predicates can have core elements of various types: VB, ADJI, ADJN + AX, and NP-PRD + AX all project a clause (IP*). In numeral-classifier phrases (NUMCLP) the numeral (NUM) and the classifier (CL) are mutually dependent, and while the rightmost element (normally a CL) is the head, both words are directly dominated by a syncretic category NUMCLP. ADVPs are frequently headed by ADJN + AX in the infinitive inflection, and sometimes by ADJI in the infinitive inflection. (Adverbs derived from single verbs in the て-form, like 決して, 辛うじて, 初めて, 挙って, 予て, etc., are assigned the tag ADV.) NUMCLPs regularly project an NP. A prenominal phrase (PNLP) is allowed to dominate any variety of category, in any arity. Finally, the category CONJP can be headed by either a P, a CONJ, or a “bare” phrasal category.