FeforStandoffAnnotationInterface

Time

1200 to ??? Weds 14th.

Topics

Interface between the standoff XML format (SMAF) and the deep parser (a deep grammar running on LKB/PET).

Sketch:

* Discuss the standard for the mapping step (x) below:

PREPROCESSING --.--> SMAF XML --(x)--> deep parser

* Current SMAF looks something like (see also SmafExample):

<?xml version='1.0' encoding='UTF-8'?> 
 <!DOCTYPE smaf SYSTEM 'smaf.dtd'>
 <smaf addressing='char'>
  <olac:olac xmlns:olac='http://www.language-archives.org/OLAC/1.0/' xmlns='http://purl.org/dc/elements/1.1/' xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance' xsi:schemaLocation='http://www.language-archives.org/OLAC/1.0/ http://www.language-archives.org/OLAC/1.0/olac.xsd'>
   <identifier>s1</identifier>
   <creator>sciborg 1.00</creator>
   <created>16:18:08 6/07/2006 (UTC)</created>
  </olac:olac>
  <fsm init='v0' final='v3'>
   <edge type='token' id='t1' from='0' to='3' value='some' source='v0' target='v1'/>
   <edge type='oscar' id='o4' from='4' to='13' source='v1' target='v2'>
    <fs>
     <f name='type'>compound</f>
     <f name='surface'>A1S2D34F5</f>
    </fs>
   </edge>
   <edge type='token' id='t3' from='14' to='19' value='melts' source='v2' target='v3'/>
  </fsm>
 </smaf>
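
A minimal sketch of how a consumer might read the edges of such a document, using Python's standard xml.etree.ElementTree module. The Edge class and the read_edges function are hypothetical names chosen for illustration and are not part of SMAF, the LKB or PET. The DOCTYPE and OLAC header are left out of the sample for brevity, and the content of an edge is assumed to be either its value attribute or the features of its fs element.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

# Trimmed copy of the example above (no DOCTYPE, no OLAC header).
SAMPLE = """<smaf addressing='char'>
 <fsm init='v0' final='v3'>
  <edge type='token' id='t1' from='0' to='3' value='some' source='v0' target='v1'/>
  <edge type='oscar' id='o4' from='4' to='13' source='v1' target='v2'>
   <fs>
    <f name='type'>compound</f>
    <f name='surface'>A1S2D34F5</f>
   </fs>
  </edge>
  <edge type='token' id='t3' from='14' to='19' value='melts' source='v2' target='v3'/>
 </fsm>
</smaf>"""


@dataclass
class Edge:
    """Hypothetical in-memory form of one SMAF edge."""
    type: str
    id: str
    start: int              # standoff 'from'
    end: int                # standoff 'to'
    source: str             # lattice source node
    target: str             # lattice target node
    content: object = None  # str (value attribute) or dict (features of <fs>)


def read_edges(xml_text):
    """Return every <edge> in the document as an Edge, in document order."""
    root = ET.fromstring(xml_text)
    edges = []
    for e in root.iter("edge"):
        fs = e.find("fs")
        if fs is not None:
            # Assumption: a flat feature structure of name/text pairs.
            content = {f.get("name"): (f.text or "") for f in fs.findall("f")}
        else:
            content = e.get("value")
        edges.append(Edge(e.get("type"), e.get("id"),
                          int(e.get("from")), int(e.get("to")),
                          e.get("source"), e.get("target"), content))
    return edges


edges = read_edges(SAMPLE)
for edge in edges:
    print(edge.id, edge.type, edge.start, edge.end, edge.content)
```

Keeping the standoff span (from/to) and the lattice position (source/target) as separate fields mirrors the edge attributes in the example; whether a real consumer needs both is a design choice left open here.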

* SMAF edges may take a variety of types, e.g. token, morph, pos, oscar.

* SMAF edges possess a number of properties: id, standoff from/to, lattice source/target + a content blob

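* A content blob consists of either a simple string (the value attribute of a token edge) or a feature structure (an fs element containing named f features), as in the example above.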

* [SciBorg] We have been experimenting with a config file to define the mapping (x). It looks something like:

;;;; PROCESSOR settings (LKB) (share with PET?)

;; "instantiate chart" with edges of following form:
;; token edges (edgeType='tok'), possess: a string (tokenStr)
;; non-token edges (edgeType='morph'), may possess: stem + partialTree
;; non-token edges which also give rise to token edge in chart (edgeType='tok+morph'):
;;    union of 'tok' and 'morph' above

token.[] -> edgeType='tok' tokenStr=content
morph.[] -> edgeType='morph' stem=content.stem partialTree=content.partial-tree
pos.[] -> edgeType='morph'
oscar.[] -> edgeType='tok+morph' tokenStr=content.surface

;;;; GRAMMAR specific settings (ERG)

;; map SMAF type into type in grammar's type hierarchy
;; map SMAF RMRS content into lexical entry

;; "slot" definitions

define gMap.type ()
define gMap.pred (synsem lkeys keyrel pred)
define gMap.carg (synsem lkeys keyrel carg) STRING
define gMap.rel (synsem lkeys keyrel)

;; syn(sem) type

oscar.[type='compound'] -> gMap.type='n_proper_nale'
oscar.[type='substance'] -> gMap.type='n_proper_nale'
oscar.[type='element'] -> gMap.type='n_proper_nale'
oscar.[type='namender'] -> gMap.type='n_proper_nale'
oscar.[type='adjective'] -> gMap.type='adj_intrans_nale'

;; semantics 

;; either pure native RMRS

oscar.[] -> gRmrs=content.rmrs

;; or slots (REL + CARG)

oscar.[type='compound'] -> gMap.pred='chem_compound_rel'
oscar.[type='substance'] -> gMap.pred='chem_substance_rel'
oscar.[type='element'] -> gMap.pred='chem_element_rel'
oscar.[type='namender'] -> gMap.pred='named_rel'

oscar.[type='compound'] -> gMap.carg=content.surface
oscar.[type='substance'] -> gMap.carg=content.surface
oscar.[type='element'] -> gMap.carg=content.surface
oscar.[type='namender'] -> gMap.carg=content.surface

;; or feature structures as in MAF??? no
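
One possible reading of the rules above can be sketched in Python. Each rule is taken to consist of an edge type, an optional constraint on the content features, and a list of assignments whose right-hand side is either a quoted literal or a reference into the edge content (content or content.NAME). The regular expressions, the function names, the representation of edges as plain dicts and the "later rule overwrites earlier rule" behaviour are all assumptions made for illustration; this is not the LKB implementation, and the "define" slot lines are not handled.

```python
import re

# type.[constraints] -> assignments
RULE_RE = re.compile(r"^(\w+)\.\[(.*?)\]\s*->\s*(.*)$")
# key='literal'  or  key=content.path
PAIR_RE = re.compile(r"([\w.+-]+)=('[^']*'|[\w.-]+)")


def parse_rule(line):
    """Split one mapping rule into (edge type, constraints, assignments)."""
    m = RULE_RE.match(line.strip())
    if m is None:
        raise ValueError("not a mapping rule: %r" % line)
    edge_type, constraints, actions = m.groups()
    cond = {k: v.strip("'") for k, v in PAIR_RE.findall(constraints)}
    return edge_type, cond, PAIR_RE.findall(actions)


def resolve(value, content):
    """Quoted values are literals; 'content' / 'content.x' read the edge content."""
    if value.startswith("'"):
        return value.strip("'")
    if value == "content":
        return content
    if value.startswith("content.") and isinstance(content, dict):
        return content.get(value[len("content."):])
    return None


def apply_rules(rules, edge):
    """Collect the assignments of every rule whose type and constraints match."""
    content = edge.get("content")
    out = {}
    for edge_type, cond, acts in rules:
        if edge_type != edge["type"]:
            continue
        if cond and not (isinstance(content, dict)
                         and all(content.get(k) == v for k, v in cond.items())):
            continue
        for key, value in acts:
            out[key] = resolve(value, content)  # later rules overwrite earlier ones
    return out


RULES = [parse_rule(r) for r in [
    "token.[] -> edgeType='tok' tokenStr=content",
    "oscar.[] -> edgeType='tok+morph' tokenStr=content.surface",
    "oscar.[type='compound'] -> gMap.type='n_proper_nale'",
    "oscar.[type='compound'] -> gMap.pred='chem_compound_rel'",
    "oscar.[type='compound'] -> gMap.carg=content.surface",
]]

oscar_edge = {"type": "oscar",
              "content": {"type": "compound", "surface": "A1S2D34F5"}}
token_edge = {"type": "token", "content": "some"}

oscar_result = apply_rules(RULES, oscar_edge)
token_result = apply_rules(RULES, token_edge)
print(oscar_result)
print(token_result)
```

Under this reading the oscar edge from the SMAF example would enter the chart as a 'tok+morph' edge and pick up the grammar type, predicate and CARG given by the slot rules, while a plain token edge only receives its token string.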

* Collect and examine some examples...

last edited 2006-06-14 07:51:43 by BenjaminWaldron
