(→Similar implementations) |
|||
| (13 intermediate revisions not shown) | |||
| Line 10: | Line 10: | ||
The back end can take a set of Markov chains based on a given language and output the most likely result. [http://www.chiark.greenend.org.uk/ucgi/~tthurman/git/predictive.git/ Here's a sketch of a program to produce such a database.] | The back end can take a set of Markov chains based on a given language and output the most likely result. [http://www.chiark.greenend.org.uk/ucgi/~tthurman/git/predictive.git/ Here's a sketch of a program to produce such a database.] | ||
| + | |||
| + | ''Possibly'' we could also reuse [http://meego.gitorious.org/meegotouch/meegotouch-inputmethodengine/blobs/master/words/mimenginewordsinterface.h IM predictive text], but there are statistical reasons why this isn't a perfect solution. | ||
| + | |||
| + | Ideally we should make this use [http://www.inference.phy.cam.ac.uk/dasher/ Dasher]. | ||
(What about the front end?) | (What about the front end?) | ||
| + | |||
| + | == Prototype1 == | ||
| + | |||
| + | [http://people.collabora.co.uk/~tthurman/predictive/ There is a JavaScript prototype which you may play with.] | ||
| + | |||
| + | == Prototype2 Video == | ||
| + | |||
| + | http://www.youtube.com/watch?v=8gBtVYMq_ts | ||
| + | |||
| + | == Prototype3 == | ||
| + | |||
| + | http://marnanel.org/DasherKeyboard/ | ||
== Training texts == | == Training texts == | ||
What would be appropriate training texts for each language? Public domain would be helpful, so they will be quite old, but if they're too old (e.g. Chaucer in the original) they'll be less useful for producing the data. | What would be appropriate training texts for each language? Public domain would be helpful, so they will be quite old, but if they're too old (e.g. Chaucer in the original) they'll be less useful for producing the data. | ||
| + | |||
| + | lcuk: we have extensive conversation logs from the IRC channels around which might be useful? | ||
| + | |||
| + | timeless: to some extent you're going to want to filter against a spelling dictionary to avoid typos. Some additional magic should be applied to learn proper nouns. This is all doable, I even have scripts or beginnings of scripts for some of it. | ||
| + | |||
| + | lcuk: Clearly, for generic default training documents, text and books from the most general field possible would be useful, perhaps a subset of Wikipedia. | ||
| + | However, for a truly personal language balance it would be folly to ignore my own past words, and I have an extensive IRC history ready for my own use. With statistical filtering as already described, it would offer completion of the words I use most :) | ||
| + | Other data sources may also be available? | ||
| + | |||
| + | == Similar implementations == | ||
| + | |||
| + | Turns out there's [http://thickbuttons.com something similar for Android already]. | ||
| + | <br> | ||
| + | [http://bu4.taipudex.com/pinyin.htm Taipudex] provides dictionary-based predictive autocompletion and presents alternative keys in a sorted table. | ||
== IRC logs == | == IRC logs == | ||
| + | |||
| + | === 2010-11-22 === | ||
| + | |||
| + | 22:04 < sivang> lcuk: interesting idea, why not just use a dictionary and speed search through it while typing, only instead of showing remaining possible words, dim the letters that no longer take part | ||
| + | |||
| + | IMHO this would already be a good start, instead of trying to do Markov chains and more sophisticated stuff up front. Also, Maemo already has support for this, but it completes the words, which is annoying. Applying it to highlighting the keyboard keys instead could perhaps be better. | ||
| + | |||
| + | ----- | ||
| + | |||
| + | 18:00 | ||
| + | * <timeless_mbp> marnanel: can you teach it to guess the first letter of subsequent words? | ||
| + | * <marnanel> timeless_mbp: certainly; I didn't put that in because I thought it would get annoying, but it was a specific exclusion, so easy to take out again | ||
| + | * <timeless_mbp> marnanel: it's better to have everything in and decide to turn things off | ||
| + | * <timeless_mbp> THIS IS A TEST OF SOME PREDICTION ALGORITHM | ||
| + | * <timeless_mbp> was what i typed | ||
| + | * <timeless_mbp> n.b. not really as prefs, more you try something, see if things work, and decide not to ship certain bits if they're too awkward | ||
| + | * <timeless_mbp> marnanel: oh, and um... you need to offer punctuation | ||
| + | * <timeless_mbp> marnanel: for now, please add: | ||
| + | * <marnanel> timeless_mbp: it's not supposed to be an entire working solution yet! | ||
| + | * <timeless_mbp> [tab] [caps] [shift] <- left side | ||
| + | * <timeless_mbp> ["["] ["]"] <- right side top row | ||
| + | * <timeless_mbp> [;] ['] <- right side middle row | ||
| + | * <timeless_mbp> [,] [.] < right side third row | ||
| + | * <timeless_mbp> if you drag/swipe left/right over the central area, it starts progressing in the appropriate direction | ||
=== 2010-11-20 === | === 2010-11-20 === | ||
| Line 61: | Line 115: | ||
* <marnanel> lcuk: the meego wiki? sure | * <marnanel> lcuk: the meego wiki? sure | ||
* <lcuk> and then we can flesh it out and do some bits with it | * <lcuk> and then we can flesh it out and do some bits with it | ||
| + | [[Category:MeeGo Input Methods]] | ||
The idea is to have the virtual keyboard know what letters are more likely, and:
The back end can take a set of Markov chains based on a given language and output the most likely result. Here's a sketch of a program to produce such a database.
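A minimal sketch of what such a back end might look like, assuming a character-level Markov chain of order 2 (the order, the function names and the toy corpus are all illustrative assumptions, not taken from the linked program). It counts which character follows each two-character context and turns those counts into next-letter probabilities that a keyboard could use to rank its keys.

```python
from collections import Counter, defaultdict


def train_markov(text, order=2):
    """Count which character follows each `order`-character context."""
    counts = defaultdict(Counter)
    # Pad with spaces so the first letters of the text also get a context.
    padded = " " * order + text.lower()
    for i in range(order, len(padded)):
        context = padded[i - order:i]
        counts[context][padded[i]] += 1
    return counts


def next_letter_probabilities(counts, typed, order=2):
    """Return {letter: probability} for the character following `typed`."""
    context = (" " * order + typed.lower())[-order:]
    followers = counts.get(context)
    if not followers:
        return {}  # unseen context: the model has no opinion
    total = sum(followers.values())
    return {ch: n / total for ch, n in followers.items()}


# Toy corpus for illustration only; a real database would need far more text.
corpus = "the theme of the thesis is that they thought the thing through"
model = train_markov(corpus, order=2)
probs = next_letter_probabilities(model, "th", order=2)
ranked = sorted(probs, key=probs.get, reverse=True)  # most likely letter first
```

An unseen context returns an empty dictionary here; a real back end would probably fall back to a shorter context or to plain letter frequencies instead.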
Possibly we could also reuse IM predictive text, but there are statistical reasons why this isn't a perfect solution.
Ideally we should make this use Dasher.
(What about the front end?)
There is a JavaScript prototype which you may play with.
http://www.youtube.com/watch?v=8gBtVYMq_ts
http://marnanel.org/DasherKeyboard/
What would be appropriate training texts for each language? Public domain would be helpful, so they will be quite old, but if they're too old (e.g. Chaucer in the original) they'll be less useful for producing the data.
lcuk: we have extensive conversation logs from the IRC channels around which might be useful?
timeless: to some extent you're going to want to filter against a spelling dictionary to avoid typos. Some additional magic should be applied to learn proper nouns. This is all doable, I even have scripts or beginnings of scripts for some of it.
lcuk: Clearly, for generic default training documents, text and books from the most general field possible would be useful, perhaps a subset of Wikipedia. However, for a truly personal language balance it would be folly to ignore my own past words, and I have an extensive IRC history ready for my own use. With statistical filtering as already described, it would offer completion of the words I use most :) Other data sources may also be available?
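The filtering idea discussed above could be sketched roughly as follows. This is not timeless's script; the function name, the threshold and the proper-noun heuristic (a capitalised, non-dictionary word seen at least twice away from the start of a sentence) are assumptions made purely for illustration.

```python
import re
from collections import Counter


def filter_tokens(text, dictionary, min_proper_count=2):
    """Keep dictionary words, plus capitalised words seen often enough mid-sentence."""
    sentences = re.split(r"[.!?]+\s*", text)
    proper_candidates = Counter()
    tokens = []
    for sentence in sentences:
        words = re.findall(r"[A-Za-z']+", sentence)
        tokens.extend(words)
        # Skip the first word: a sentence-initial capital says nothing about proper nouns.
        for word in words[1:]:
            if word[0].isupper() and word.lower() not in dictionary:
                proper_candidates[word] += 1
    # Assumed heuristic: repeated capitalised unknowns are proper nouns, not typos.
    proper_nouns = {w for w, n in proper_candidates.items() if n >= min_proper_count}
    kept = [t for t in tokens if t.lower() in dictionary or t in proper_nouns]
    return kept, proper_nouns


# Hypothetical dictionary and log text, for illustration only.
dictionary = {"i", "use", "on", "my", "phone", "is", "great", "the", "keyboard"}
log = "I use Maemo on my phone. The Maemo keyboard is graet. I use teh keyboard."
kept, proper_nouns = filter_tokens(log, dictionary)
```

In this toy log the typos "graet" and "teh" are dropped, while "Maemo" survives because it appears capitalised twice in mid-sentence. Real chat logs would need a less naive sentence splitter and a proper word list.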
Turns out there's something similar for Android already.
Taipudex provides dictionary-based predictive autocompletion and presents alternative keys in a sorted table.
22:04 < sivang> lcuk: interesting idea, why not just use a dictionary and speed search through it while typing, only instead of showing remaining possible words, dim the letters that no longer take part
IMHO this would already be a good start, instead of trying to do Markov chains and more sophisticated stuff up front. Also, Maemo already has support for this, but it completes the words, which is annoying. Applying it to highlighting the keyboard keys instead could perhaps be better.
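The dictionary approach suggested here can be sketched in a few lines, assuming a plain in-memory word list (the function names and the sample words are hypothetical). Given what has been typed so far, it works out which letters still continue at least one dictionary word; every other key is a candidate for dimming.

```python
import string


def live_letters(words, prefix):
    """Letters that can follow `prefix` in at least one dictionary word."""
    prefix = prefix.lower()
    return {
        word[len(prefix)]
        for word in words
        if word.startswith(prefix) and len(word) > len(prefix)
    }


def dimmed_letters(words, prefix, alphabet=string.ascii_lowercase):
    """Letters the keyboard could dim: no dictionary word continues with them."""
    return set(alphabet) - live_letters(words, prefix)


# Hypothetical word list, for illustration only.
words = ["test", "text", "tea", "team", "ten", "prediction"]
live = live_letters(words, "te")       # keys to keep bright
dimmed = dimmed_letters(words, "te")   # keys to dim
```

This scans the whole list on every keystroke, which is fine for a sketch; a sorted list with binary search, or a trie, would be the usual choice for the "speed search" over a full dictionary.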
18:00