Time
1200 to ??? Weds 14th.
Topics
Interface between the standoff XML format (SMAF) and the deep parser (a deep grammar running on LKB/PET).
Sketch:
* Discuss the standard for the process (x) below:
PREPROCESSING --.--> SMAF XML --(x)--> deep parser
* Current SMAF looks something like (see also SmafExample):
<?xml version='1.0' encoding='UTF-8'?>
<!DOCTYPE smaf SYSTEM 'smaf.dtd'>
<smaf addressing='char'>
<olac:olac xmlns:olac='http://www.language-archives.org/OLAC/1.0/' xmlns='http://purl.org/dc/elements/1.1/' xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance' xsi:schemaLocation='http://www.language-archives.org/OLAC/1.0/ http://www.language-archives.org/OLAC/1.0/olac.xsd'>
<identifier>s1</identifier>
<creator>sciborg 1.00</creator>
<created>16:18:08 6/07/2006 (UTC)</created>
</olac:olac>
<fsm init='v0' final='v3'>
<edge type='token' id='t1' from='0' to='3' value='some' source='v0' target='v1'/>
<edge type='oscar' id='o4' from='4' to='13' source='v1' target='v2'>
<fs>
<f name='type'>compound</f>
<f name='surface'>A1S2D34F5</f>
</fs>
</edge>
<edge type='token' id='t3' from='14' to='19' value='melts' source='v2' target='v3'/>
</fsm>
</smaf>
* SMAF edges may take a variety of types, e.g.:
token
morph
named-entity
pos
oscar
* Each SMAF edge has a number of properties: an id, standoff from/to positions, lattice source/target nodes, and a content blob
* A content blob consists of
either TEXT
or one or more of the following:
RMRS
slots
(typed feature structure)
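A minimal Python sketch of reading these edge properties out of a SMAF document, for discussion only. It uses a cut-down copy of the sample above (DOCTYPE and OLAC header omitted). The names `Edge` and `read_edges` are hypothetical. Treating the `value` attribute as the TEXT content and `<fs>/<f>` as flat name/value pairs is an assumption; RMRS content is not handled.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Cut-down version of the sample SMAF (assumption: header is not needed here).
SMAF = """<smaf addressing='char'>
  <fsm init='v0' final='v3'>
    <edge type='token' id='t1' from='0' to='3' value='some' source='v0' target='v1'/>
    <edge type='oscar' id='o4' from='4' to='13' source='v1' target='v2'>
      <fs>
        <f name='type'>compound</f>
        <f name='surface'>A1S2D34F5</f>
      </fs>
    </edge>
    <edge type='token' id='t3' from='14' to='19' value='melts' source='v2' target='v3'/>
  </fsm>
</smaf>"""


@dataclass
class Edge:
    type: str                   # SMAF edge type: token, morph, pos, oscar, ...
    id: str
    start: int                  # standoff 'from'
    end: int                    # standoff 'to'
    source: str                 # lattice source node
    target: str                 # lattice target node
    text: Optional[str] = None  # TEXT content (assumed to live in 'value')
    fields: Dict[str, str] = field(default_factory=dict)  # <fs>/<f> pairs


def read_edges(xml_text: str) -> List[Edge]:
    """Collect every <edge> element in document order."""
    root = ET.fromstring(xml_text)
    edges = []
    for e in root.iter("edge"):
        pairs = {f.get("name"): (f.text or "") for f in e.findall("fs/f")}
        edges.append(Edge(
            type=e.get("type"), id=e.get("id"),
            start=int(e.get("from")), end=int(e.get("to")),
            source=e.get("source"), target=e.get("target"),
            text=e.get("value"), fields=pairs))
    return edges


edges = read_edges(SMAF)
for edge in edges:
    print(edge)
```

Token edges come back with `text` set and empty `fields`; the oscar edge has no `text` but carries `type` and `surface` in `fields`.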
* [SciBorg] We have been playing with the use of a config file to define the mapping (x). This looks something like:
;;;; PROCESSOR settings (LKB) (share with PET?)
;; "instantiate chart" with edges of following form:
;; token edges (edgeType='tok'), possess: a string (tokenStr)
;; non-token edges (edgeType='morph'), may possess: stem + partialTree
;; non-token edges which also give rise to token edge in chart (edgeType='tok+morph'):
;; union of 'tok' and 'morph' above
token.[] -> edgeType='tok' tokenStr=content
morph.[] -> edgeType='morph' stem=content.stem partialTree=content.partial-tree
pos.[] -> edgeType='morph'
oscar.[] -> edgeType='tok+morph' tokenStr=content.surface
;;;; GRAMMAR specific settings (ERG)
;; map SMAF type into type in grammar's type hierarchy
;; map SMAF RMRS content into lexical entry
;; "slot" definitions
define gMap.type ()
define gMap.pred (synsem lkeys keyrel pred)
define gMap.carg (synsem lkeys keyrel carg) STRING
define gMap.rel (synsem lkeys keyrel)
;; syn(sem) type
oscar.[type='compound'] -> gMap.type='n_proper_nale'
oscar.[type='substance'] -> gMap.type='n_proper_nale'
oscar.[type='element'] -> gMap.type='n_proper_nale'
oscar.[type='namender'] -> gMap.type='n_proper_nale'
oscar.[type='adjective'] -> gMap.type='adj_intrans_nale'
;; semantics
;; either pure native RMRS
oscar.[] -> gRmrs=content.rmrs
;; or slots (REL + CARG)
oscar.[type='compound'] -> gMap.pred='chem_compound_rel'
oscar.[type='substance'] -> gMap.pred='chem_substance_rel'
oscar.[type='element'] -> gMap.pred='chem_element_rel'
oscar.[type='namender'] -> gMap.pred='named_rel'
oscar.[type='compound'] -> gMap.carg=content.surface
oscar.[type='substance'] -> gMap.carg=content.surface
oscar.[type='element'] -> gMap.carg=content.surface
oscar.[type='namender'] -> gMap.carg=content.surface
;; or feature structures as in MAF??? no
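To make the intended reading of the mapping rules concrete, here is a rough Python sketch of one possible interpretation. It is not the LKB/PET implementation. The semantics are assumptions: every rule whose SMAF type and `[...]` conditions match the edge fires, in file order, and later assignments overwrite earlier ones; `define` lines and `;;` comments are skipped. The names `parse_rules`, `resolve` and `apply_rules` are hypothetical, and only a subset of the rules is shown.

```python
import re

# A subset of the rules from the config above.
RULES = """
;; processor settings
token.[] -> edgeType='tok' tokenStr=content
oscar.[] -> edgeType='tok+morph' tokenStr=content.surface
;; grammar settings
define gMap.pred (synsem lkeys keyrel pred)
oscar.[type='compound'] -> gMap.type='n_proper_nale'
oscar.[type='compound'] -> gMap.pred='chem_compound_rel'
oscar.[type='compound'] -> gMap.carg=content.surface
"""

# <smafType>.[<conditions>] -> <assignments>
RULE_RE = re.compile(r"^(\w[\w-]*)\.\[(.*?)\]\s*->\s*(.+)$")
# key='literal'  or  key=content.path
PAIR_RE = re.compile(r"([\w.]+)=('[^']*'|[^\s']+)")


def parse_rules(text):
    """Return a list of (smaf_type, conditions, assignments) tuples."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(";") or line.startswith("define"):
            continue  # blank line, comment, or slot definition
        m = RULE_RE.match(line)
        if m is None:
            raise ValueError(f"cannot parse rule: {line!r}")
        smaf_type, cond, rhs = m.groups()
        conditions = {k: v.strip("'") for k, v in PAIR_RE.findall(cond)}
        rules.append((smaf_type, conditions, PAIR_RE.findall(rhs)))
    return rules


def resolve(value, content):
    """Turn a right-hand-side value into a concrete value for this edge."""
    if value.startswith("'"):
        return value[1:-1]                      # quoted literal
    if value == "content":
        return content                          # the whole content blob
    if value.startswith("content.") and isinstance(content, dict):
        return content.get(value[len("content."):])
    return None                                 # path not available


def apply_rules(rules, smaf_type, content):
    """Apply all matching rules to one edge (assumed semantics, see above)."""
    out = {}
    fields = content if isinstance(content, dict) else {}
    for rule_type, conditions, assignments in rules:
        if rule_type != smaf_type:
            continue
        if any(fields.get(k) != v for k, v in conditions.items()):
            continue
        for key, value in assignments:
            out[key] = resolve(value, content)
    return out


rules = parse_rules(RULES)
print(apply_rules(rules, "token", "some"))
print(apply_rules(rules, "oscar", {"type": "compound", "surface": "A1S2D34F5"}))
```

Under these assumptions the token edge maps to `edgeType='tok'` with `tokenStr='some'`, while the oscar edge from the sample picks up `edgeType='tok+morph'`, `tokenStr` and `gMap.carg` both set to `A1S2D34F5`, plus the `gMap.type` and `gMap.pred` values for compounds. Whether rules should accumulate like this, or whether only the first match should fire, is one of the points still to be settled.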
* Collect and examine some examples...