The idea
The idea is to have the virtual keyboard know which letters are most likely to come next, and:
- make them easier to hit, and
- (possibly) highlight them visually, or at least dim all the other ones.
Implementation
The back end can take a set of Markov chains built from text in a given language and, given the letters typed so far, output the most likely next letters. Here's a sketch of a program to produce such a database.
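Separately from the sketch mentioned above, here is a minimal Python illustration of how such a back end might work, under a few assumptions that are not part of the original proposal: the names `build_model` and `likely_next` are hypothetical, the model is a plain in-memory table of letter counts rather than a real database, and it falls back to shorter contexts when a longer one has never been seen. It follows the interface discussed in the IRC log below: pass in the text up to the cursor, get back up to three likely letters.

```python
from collections import Counter, defaultdict


def build_model(text, order=3):
    """Count which letter follows each context of 0..order letters.

    `order` is the n-1 of the n-gram discussion: the number of
    preceding letters taken into account.
    """
    model = defaultdict(Counter)
    for raw in text.lower().split():
        word = "".join(ch for ch in raw if ch.isalpha())
        for i, ch in enumerate(word):
            # Store the letter under every context length so that
            # shorter contexts are available as a fallback.
            for n in range(min(i, order) + 1):
                model[word[i - n:i]][ch] += 1
    return model


def likely_next(model, typed, count=3, order=3):
    """Return up to `count` letters likely to follow the text `typed`."""
    # Only the trailing run of letters (the word in progress) matters.
    tail = []
    for ch in reversed(typed.lower()):
        if not ch.isalpha():
            break
        tail.append(ch)
    context = "".join(reversed(tail))
    context = context[max(0, len(context) - order):]
    while True:
        followers = model.get(context)
        if followers:
            return "".join(ch for ch, _ in followers.most_common(count))
        if not context:
            return ""  # empty model: nothing to suggest
        context = context[1:]  # back off to a shorter context


model = build_model("the then them there this that")
print(likely_next(model, "hello th"))
```

One simplification to be aware of: this sketch does not distinguish the start of a word from its middle, so an empty context just returns the most common letters overall.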
Possibly we could also reuse IM predictive text, but there are statistical reasons why this isn't a perfect solution.
(What about the front end?)
Prototype
There is a JavaScript prototype which you may play with.
Training texts
What would be appropriate training texts for each language? Public-domain texts would be helpful, which means they will be quite old; but if they're too old (e.g. Chaucer in the original) they'll be less useful for producing the data.
lcuk: we have extensive conversation logs from the IRC channels around which might be useful?
timeless: to some extent you're going to want to filter against a spelling dictionary to avoid typos. Some additional magic should be applied to learn proper nouns. This is all doable, I even have scripts or beginnings of scripts for some of it.
lcuk: clearly, for generic default training documents, text and books from the most general field possible would be useful. perhaps a subset of Wikipedia, for instance?
however, for truly personal language balance it would be folly to ignore my own past words, and I have an extensive IRC history ready for my own use. with statistical filtering as already described, it would offer completion of the words I use most :)
other datasources may also be available?
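As a rough illustration of the filtering timeless describes, the sketch below keeps words found in a spelling dictionary and also keeps capitalised words that recur often enough to look like proper nouns. The function name `filter_words`, the threshold `min_proper_count`, and the capitalisation heuristic itself are assumptions made for this example; they are not taken from the scripts mentioned above.

```python
import re
from collections import Counter


def filter_words(text, dictionary, min_proper_count=3):
    """Drop likely typos from training text.

    `dictionary` is a set of lower-case known words. A word outside
    the dictionary is kept only if it is capitalised and appears at
    least `min_proper_count` times (a crude proper-noun guess).
    """
    words = re.findall(r"[A-Za-z']+", text)
    counts = Counter(words)
    kept = []
    for word in words:
        if word.lower() in dictionary:
            kept.append(word)
        elif word[0].isupper() and counts[word] >= min_proper_count:
            kept.append(word)
    return kept


known = {"is", "the", "best", "rocks", "i", "like"}
sample = "Maemo is teh best. Maemo rocks. I like Maemo"
print(filter_words(sample, known))
```

The output of `filter_words` could then be joined back into text and used as training input for the letter model.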
IRC logs
2010-11-22
22:04 < sivang> lcuk: interesting idea, why not just use a dictionary and speed search through it while typing, only instead of showing remaining possible words, dim the letters that no longer take part
IMHO this would be already a good start instead of trying to do markov chains and more sophisticated stuff upfront. Also, in Maemo there is already support for that but that completes the words which is annoying. Applying this to the keyboard keys highlighted could perhaps be better.
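sivang's simpler alternative can be sketched as follows: keep a sorted word list, find the words that start with what has been typed, and collect the letters that can still follow. Any key not in that set could be dimmed. The function name `live_letters` and the use of a sorted list with binary search are assumptions for this illustration; a trie would work equally well.

```python
import bisect


def live_letters(sorted_words, prefix):
    """Return the letters that can follow `prefix` in some dictionary word.

    `sorted_words` must be sorted, so that all words sharing a prefix
    are adjacent and can be located with a binary search.
    """
    letters = set()
    start = bisect.bisect_left(sorted_words, prefix)
    for i in range(start, len(sorted_words)):
        word = sorted_words[i]
        if not word.startswith(prefix):
            break  # past the block of words sharing the prefix
        if len(word) > len(prefix):
            letters.add(word[len(prefix)])
    return letters


words = sorted(["cat", "car", "card", "cart", "do", "dog"])
print(sorted(live_letters(words, "ca")))
```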
2010-11-20
- <marnanel> lcuk: so someone at the conference said (not in these terms) that what we need is to take n-grams of English text and produce Markov chains such that after n-1 letters entered on the osk, the three or so most likely following letters became *slightly* larger
- <lcuk> yes marnanel
- <marnanel> lcuk: so I wondered whether you know whether anyone else was working on it
- <marnanel> lcuk: otherwise I might.
- <thiago_home> marnanel: they don't have to be larger. Just their hit areas.
- <marnanel> thiago_home: true
- <lcuk> marnanel, I remarked that to slightly dim the other non useful characters would be a good visual indicator
- <marnanel> lcuk: oh yeah, I remember you saying that
- <marnanel> lcuk: I could probably fix up UI stuff to do that. but atm I am thinking about the back-end implementation
- <marnanel> I think this could be a thing of great niftiness
- <lcuk> i heard the keymats for the vkb are a big ass svg file
- ...
- <lcuk> anyway, not sure if those keymats can have brightness modified on the fly or if the hitzones can be affected
- <lcuk> and I am not sure how I would proceed using qml
- <lcuk> whether you can have an element but have its hit area larger without interfering
- <marnanel> I don't know either.
- <lcuk> making a new transparent layer on top with bigger hitzones would suffice
- marnanel nods
- lcuk hmms
- <lcuk> i am wondering how I would implement it by thinking about the liqbase keyboard
- <lcuk> marnanel, how many potential keys would we need to show larger
- <marnanel> lcuk: I am thinking three-ish. more than that would defeat the point
- <lcuk> marnanel, reasonable
- <lcuk> so holding a group of transparent redefinable (and relocatable) widgets sitting on top of the keyboard which would offer larger hitzone for the press would work?
- <marnanel> I believe so, yes
- <marnanel> lcuk: so if I wrote a library where you gave it zero to three letters and it returned a string of three characters which were likely to follow, that would be a start
- <marnanel> plus the database
- <lcuk> marnanel, one thing about keyboards - using circular distance algorithms from the centre of each letter
- <lcuk> rather than a rectangle hitzone
- <lcuk> would be nicer potentially
- <lcuk> marnanel, so the library would accept a string
- <lcuk> which is the text leading up to the cursor's current position
- <marnanel> lcuk: circular> yes, we just use pythagoras with respect to each letter and find the shortest maybe, with weighting for the more likely letters
- <marnanel> lcuk: yes
- <lcuk> marnanel, lets talk again on monday after getting some thoughts sunk in
- <marnanel> lcuk: yeah, good plan
- <marnanel> lcuk: I'll hack around with ngrams a bit maybe
- <lcuk> do you want to perhaps copy paste this convo into the wiki
- <marnanel> lcuk: the meego wiki? sure
- <lcuk> and then we can flesh it out and do some bits with it
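The circular hit-testing idea from the conversation above (Pythagorean distance to each key centre, weighted towards the more likely letters) could look roughly like the sketch below. The function name `pick_key`, the `boost` factor of 0.8, and the key coordinates are all assumptions for illustration; the log does not say how strong the weighting should be.

```python
import math


def pick_key(touch, key_centres, likely="", boost=0.8):
    """Pick the key whose centre is nearest to the touch point.

    Distances to keys listed in `likely` are multiplied by `boost`
    (less than 1), which enlarges their effective hit area without
    changing how the keys are drawn.
    """
    favoured = set(likely)
    best_key, best_score = None, math.inf
    for key, (kx, ky) in key_centres.items():
        score = math.hypot(touch[0] - kx, touch[1] - ky)
        if key in favoured:
            score *= boost
        if score < best_score:
            best_key, best_score = key, score
    return best_key


centres = {"q": (0, 0), "w": (10, 0)}
print(pick_key((5.5, 0), centres))              # nearer to "w"
print(pick_key((5.5, 0), centres, likely="q"))  # "q" wins once favoured
```

Because only the scoring changes, this fits thiago_home's point that the keys do not have to look larger, only their hit areas.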