Sunday, June 20, 2010

multi-directional lexica

I'm essentially thinking out loud (or online) in this post.

What I would really like is a program that works as a multi-lingual lexicon, built from multiple sets of data.

Set A:
single English entries, each matched to a series of glosses in Latin (for example), i.e. an English-Latin lexicon.
Set B:
single Latin entries, each matched to a series of English glosses, i.e. a Latin-English lexicon.
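
To make that concrete, each set could be nothing more than a mapping from a headword to its list of glosses. This is only a sketch of one possible shape: the names `english_latin`, `latin_english`, `lexica` and `lookup` are my own inventions, and the entries are toy examples rather than extracts from a real lexicon.

```python
# Hypothetical toy data: each lexicon maps one headword to a list of glosses.
english_latin = {
    "love": ["amor", "caritas"],
    "king": ["rex"],
}
latin_english = {
    "amor": ["love", "affection"],
    "rex": ["king"],
}

# Register each set under a (source, target) pair so the program can switch direction.
lexica = {
    ("English", "Latin"): english_latin,
    ("Latin", "English"): latin_english,
}


def lookup(source, target, headword):
    """Return the glosses for headword in the chosen direction, or [] if absent."""
    return lexica.get((source, target), {}).get(headword, [])


print(lookup("English", "Latin", "love"))
print(lookup("Latin", "English", "rex"))
```

Adding a further language would then just mean registering another pair of mappings under new keys.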

For my main purposes, I only really need English-X and X-English, but there's no reason other lexica couldn't be incorporated. The program simply needs to be able to sort the data sets appropriately and switch between them. It also needs a search function that accepts Unicode input. The ability to ignore accents in Greek would probably be helpful too.
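
As a rough idea of how accent-insensitive searching might work, one common approach is to decompose the text with Unicode normalisation, drop the combining marks (accents and breathings), and compare what is left. The sketch below uses Python's standard `unicodedata` module; the function names `strip_accents` and `search` and the tiny `greek_english` sample are assumptions of mine, purely for illustration.

```python
import unicodedata


def strip_accents(text):
    """Decompose to NFD, drop combining marks, then case-fold for comparison."""
    decomposed = unicodedata.normalize("NFD", text)
    bare = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return bare.casefold()


def search(lexicon, query):
    """Return the entries whose headword matches the query, ignoring accents and case."""
    key = strip_accents(query)
    return {
        head: glosses
        for head, glosses in lexicon.items()
        if strip_accents(head) == key
    }


# Hypothetical two-entry sample lexicon.
greek_english = {
    "λόγος": ["word", "reason"],
    "ἀγάπη": ["love"],
}

print(search(greek_english, "λογος"))
print(search(greek_english, "αγαπη"))
```

This compares every headword on each search, which is fine for a sketch; a larger lexicon would probably want the stripped forms computed once and stored as an index.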

I imagine (in my programming naivety) that structurally this would not be difficult to implement. Ideally the user could add their own datasets. There are a number of older lexica in the public domain, so it would be a matter of digitising them appropriately: Woodhouse for English->Greek and Smith for English->Latin, for instance.
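
For user-supplied datasets, the simplest thing I can picture is a plain text file with one entry per line. The format here is entirely my assumption: a headword, a tab, then glosses separated by semicolons. The function name `load_lexicon` is likewise made up, and the sample is read from an in-memory string rather than a real file.

```python
import csv
import io


def load_lexicon(lines):
    """Read 'headword<TAB>gloss; gloss' rows into a dict of headword -> glosses."""
    lexicon = {}
    for row in csv.reader(lines, delimiter="\t"):
        if len(row) < 2:
            continue  # skip blank or malformed rows
        headword = row[0].strip()
        glosses = [g.strip() for g in row[1].split(";") if g.strip()]
        # Merge repeated headwords instead of overwriting earlier rows.
        lexicon.setdefault(headword, []).extend(glosses)
    return lexicon


# Assumed sample data standing in for a user's file.
sample = io.StringIO("love\tamor; caritas\nking\trex\n")
user_lexicon = load_lexicon(sample)
print(user_lexicon)
```

With a real file, the same function could be handed the result of `open(path, encoding="utf-8", newline="")` in place of the `StringIO` object.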

1 comment:

Anonymous said...

Yeah, this wouldn't be very hard to program, once the lists were in place. Searching is the slightly trickier part since you usually want to make it somewhat fuzzy, but even that wouldn't be a show stopper. It would just be a matter of getting the data sets in a proper format. Something like this would definitely be helpful for compositions and the like.