This topic provides an outline of requirements for a head unit. The requirements are derived from a real-life development project for an in-vehicle infotainment platform. ''This is still work in progress; content will be added gradually.''
# After power-on, the head unit accepts the first user input (e.g. HMI menu, screen, volume control) within max. 2 s.
# After power-on, the head unit displays the Navigation map and starts route guidance, resuming the previous route, within max. 5 s.
# The head unit reacts to user input via the HMI (touch screen events, button or switch presses) within max. 100 ms. For all requests that cannot be completed within this time, the head unit displays an interim response (e.g. an hourglass).
# The head unit performs voice recognition of a single word against a G2P vocabulary within max. 200 ms, measured from the end of the utterance (silence detection) until the recognition result is available (e.g. HMI display update or resulting voice prompt output).
# The head unit performs handwriting recognition against the Chinese standard 2000 character set within max. 200 ms, measured from the end of entry until the recognition result is available (e.g. HMI display update or resulting voice prompt output).
# The head unit refreshes the navigation map at a rate of min. 15 fps at any scale.
# The head unit renders graphics (HMI animations and navigation maps) at a frame rate of min. 15 fps.
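
These limits lend themselves to automated checks. The following minimal Python sketch is not part of the original requirements: the metric names are hypothetical, and it assumes that "min. 15 fps" is judged on the average frame time over a measurement window.

```python
# Hypothetical test-harness sketch: compare measured timings with the
# budgets from the performance requirements. Metric names are illustrative.

LATENCY_BUDGETS_MS = {
    "first_user_input_after_power_on": 2000.0,
    "nav_guidance_after_power_on": 5000.0,
    "hmi_input_reaction": 100.0,
    "single_word_voice_recognition": 200.0,
    "handwriting_recognition": 200.0,
}

MIN_FRAME_RATE_FPS = 15.0


def frame_time_budget_ms(min_fps: float = MIN_FRAME_RATE_FPS) -> float:
    """Longest average time per frame that still meets the minimum frame rate."""
    return 1000.0 / min_fps


def check_latencies(measured_ms: dict) -> list:
    """Return the names of metrics whose measured value exceeds its budget."""
    return [name for name, value in measured_ms.items()
            if value > LATENCY_BUDGETS_MS[name]]


def meets_frame_rate(frame_times_ms: list) -> bool:
    """True if the average frame time over the window meets the minimum fps."""
    if not frame_times_ms:
        return False
    average_ms = sum(frame_times_ms) / len(frame_times_ms)
    return average_ms <= frame_time_budget_ms()
```

For example, `check_latencies({"hmi_input_reaction": 120.0, "handwriting_recognition": 150.0})` returns `['hmi_input_reaction']`, and the frame-time budget at 15 fps is roughly 66.7 ms. A stricter harness might check every individual frame instead of the average.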
== Audio ==
=== Audio_Codecs_Playback ===
# The head unit supports the following codecs:
## AAC LC
## MP3
## WMA
## WAV
## Ogg Vorbis
## FLAC
=== Audio_Processing_And_Management ===
# The head unit supports up to 4 independent audio zones (sinks). One zone is 5.1-channel (e.g. the cabin speakers) and the others are stereo. (Each zone might require its own independent audio processing and audio codec instances.)
# The head unit allows the user to adjust at least the treble and bass levels.
# The head unit allows the user to adjust the balance and fade levels.
# The head unit can play back any of its sources to any of the 4 audio zones, including simultaneous playback to up to 4 audio zones.
# The head unit uses pre-defined fader ramps when switching the audio source of an audio zone.
# For each audio zone, the head unit keeps track of the most recently played audio source and restores it upon startup. (For example, if the FM tuner was playing to the cabin speakers when the head unit was switched off, the head unit attempts to tune to the same FM station and route it to the cabin speakers on the next startup.)
# For each audio zone, every available audio source is assigned a unique priority. When an audio source is being played back and a higher-priority source becomes available (e.g. the phone application attempts to play a ring tone), the latter overrides the former.
## As a configuration option, overriding is done either by muting the first source and playing back the second, or by lowering the volume of the first source and mixing it with the second.
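
The per-zone priority rule and the two override options can be modeled as follows. This is a minimal Python sketch under stated assumptions, not the project's audio manager: source names, the numeric priorities (higher number wins) and the ducking gain of 0.3 are hypothetical, and fader ramps are not modeled.

```python
from dataclasses import dataclass, field
from enum import Enum


class OverrideMode(Enum):
    MUTE = "mute"  # mute the lower-priority source, play only the winner
    DUCK = "duck"  # lower the lower-priority source and mix it with the winner


@dataclass
class AudioZone:
    name: str
    priorities: dict  # source name -> unique priority; higher number wins
    override_mode: OverrideMode = OverrideMode.MUTE
    duck_gain: float = 0.3  # hypothetical gain applied to a ducked source
    active: list = field(default_factory=list)

    def __post_init__(self) -> None:
        if len(set(self.priorities.values())) != len(self.priorities):
            raise ValueError("each source needs a unique priority in a zone")

    def request(self, source: str) -> None:
        """A source becomes available (e.g. a ring tone) and wants this zone."""
        if source not in self.active:
            self.active.append(source)

    def release(self, source: str) -> None:
        """A source stops playing; the highest remaining source takes over."""
        if source in self.active:
            self.active.remove(source)

    def gains(self) -> dict:
        """Per-source output gain (0.0 = muted, 1.0 = full) for this zone."""
        if not self.active:
            return {}
        winner = max(self.active, key=lambda s: self.priorities[s])
        result = {}
        for source in self.active:
            if source == winner:
                result[source] = 1.0
            elif self.override_mode is OverrideMode.MUTE:
                result[source] = 0.0
            else:
                result[source] = self.duck_gain
        return result
```

For example, with `AudioZone("cabin", {"fm": 1, "media": 2, "phone": 10})`, requesting `"fm"` and then `"phone"` gives `{'fm': 0.0, 'phone': 1.0}`; with `OverrideMode.DUCK` the FM source would stay audible at the reduced gain. Keeping one `AudioZone` object per sink mirrors the "for each audio zone" wording above.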
=== Audio_Generation ===
=== Audio_Volume_Control ===
# The volume level is adjustable by the user.
## User adjustments are only accepted during actual audio playback. (In other words, volume changes made e.g. during the start-up phase, before any audio output can be heard, are ignored.)
## The user's volume selection is maintained individually for each audio source/sink pair.
# The head unit adjusts the volume level according to the current vehicle speed.
# The volume level set by the user is preserved across a shutdown cycle.
## Upon start-up, the previous volume level is only restored up to a defined 'start-up maximum' (as opposed to the 'absolute maximum' that can be set once the head unit is up and running in its normal mode).
# When the volume is restored after a power cycle, a lower limit is applied. This limit is a low level, NOT the minimum, so that the user can still hear audio even if the volume was set to minimum (effectively "MUTE") before the power cycle.
# When the volume is restored after a power cycle, an upper limit is applied. This limit is a high level, NOT the maximum, so that the user is not startled by a previous maximum volume setting.
# In an emergency case, the volume cannot be adjusted by the user; it is set to a very loud level so that no sound is masked by ambient background noise.
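
The volume rules above combine into a small amount of state. The Python sketch below is illustrative only: the 0..40 scale, every limit value, and the linear "+1 step per 30 km/h" speed compensation are hypothetical (the requirements do not define the speed curve), and the `levels` dict stands in for whatever persistent storage keeps the volume across a shutdown cycle.

```python
from dataclasses import dataclass, field


@dataclass
class VolumeManager:
    # All levels use a hypothetical 0..40 volume scale.
    startup_min: int = 5            # restored volume is raised to at least this
    startup_max: int = 25           # restored volume is capped at the 'start-up maximum'
    absolute_max: int = 40          # highest level in normal operation
    emergency_level: int = 38       # fixed loud level used in an emergency case
    speed_step_kmh: float = 30.0    # hypothetical: +1 volume step per 30 km/h
    levels: dict = field(default_factory=dict)  # (source, sink) -> level; assumed persisted
    playing: bool = False           # True once audio output can actually be heard
    emergency: bool = False

    def user_adjust(self, source: str, sink: str, level: int) -> bool:
        """Store a user change; ignored before audible playback or in an emergency."""
        if not self.playing or self.emergency:
            return False
        self.levels[(source, sink)] = max(0, min(level, self.absolute_max))
        return True

    def restore_after_startup(self, source: str, sink: str) -> int:
        """Clamp the level saved before shutdown into [startup_min, startup_max]."""
        saved = self.levels.get((source, sink), self.startup_min)
        restored = max(self.startup_min, min(saved, self.startup_max))
        self.levels[(source, sink)] = restored
        return restored

    def output_level(self, source: str, sink: str, speed_kmh: float) -> int:
        """Level actually applied, including speed compensation."""
        if self.emergency:
            return self.emergency_level
        base = self.levels.get((source, sink), self.startup_min)
        boost = int(speed_kmh // self.speed_step_kmh)
        return min(base + boost, self.absolute_max)
```

With these example values, a user level of 35 on the FM/cabin pair is restored as 25 after a power cycle, a saved level of 0 is restored as 5, and at 95 km/h the applied level becomes 28.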

=== Hands-free Functionality ===
# Noise reduction
# Echo cancellation
# Residual echo cancellation
# Automatic gain control
# Programmable equalization
# Performance requirements
## Objective testing: VDA 1.6
## Subjective testing: in-vehicle live testing
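
Echo cancellation is commonly implemented with an adaptive filter that learns the loudspeaker-to-microphone echo path from the far-end (received) signal and subtracts the estimated echo from the microphone signal. The sketch below uses the normalized LMS (NLMS) algorithm, one common choice; the requirements do not say which algorithm the project used, and the tap count and step size are illustrative. Residual echo suppression, noise reduction and AGC would run as separate stages after it.

```python
def nlms_echo_cancel(far_end, mic, taps=8, mu=0.5, eps=1e-6):
    """Cancel the echo of `far_end` from `mic` with a normalized LMS filter.

    Returns (residual, weights): residual[n] = mic[n] - estimated echo[n],
    and the final estimate of the echo path impulse response.
    mu must lie in (0, 2) for the adaptation to be stable.
    """
    weights = [0.0] * taps   # adaptive estimate of the echo path
    history = [0.0] * taps   # most recent far-end samples, newest first
    residual = []
    for x, d in zip(far_end, mic):
        history = [x] + history[:-1]
        echo_estimate = sum(w * h for w, h in zip(weights, history))
        error = d - echo_estimate          # what remains after cancellation
        power = eps + sum(h * h for h in history)
        step = mu * error / power          # normalizing by input power makes mu scale-free
        weights = [w + step * h for w, h in zip(weights, history)]
        residual.append(error)
    return residual, weights
```

In a noise-free simulation where the microphone picks up only the far-end signal filtered by a short echo path, the residual decays toward zero and `weights` converges to that path. Real cabins add near-end speech and noise, which is why double-talk detection and the residual echo stage are needed in practice.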
== Connectivity ==
==== UI_Touch_Screen_Hand_Writing_Detection ====
==== UI_Speech_Recognition_Function ====
1. Recognition Category
# Discrete command: a single command or single digit (e.g. "dial", "1", "2", etc.)
# Continuous command: connected digits (e.g. "dial 555-1212", "555-1212")
# Natural language understanding: flexible recognition (e.g. "I want to make a call to John", "please route to the San Francisco Regency Hyatt hotel", etc.)
# Dynamic vocabulary (e.g. phonebook, MP3 titles)
## The number of phonebook entries and music titles is not fixed; users can freely add and remove entries. The list is loaded dynamically for recognition, and a G2P (grapheme-to-phoneme) technique is required to generate phonetic transcriptions for new words.
# VDE (Voice Destination Entry)
## Multi-step (e.g. "MI", then "Troy", then "1307")
## One-shot (e.g. "1307 Troy, MI")

2. Recognition Response Time
# Discrete command: 300 ms
# Continuous command: 1200 ms
# Phonebook and music titles: 1200 ms
# Natural language understanding, VDE: 1500 ms

3. Recognition Performance Measurement
# Overall accuracy measurement
## Average Sentence Accuracy = (Total number of correct sentences) / (Total number of sentences attempted)
# Individual accuracy measurement
## Average Word Accuracy = ((Total number of words attempted) - (insertions) - (deletions) - (substitutions)) / (Total number of words attempted)
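
The two formulas can be computed from pairs of reference transcripts and recognizer outputs. The requirements do not say how insertions, deletions and substitutions are counted; this Python sketch assumes the usual minimum edit-distance (Levenshtein) alignment over words, and takes the number of words attempted to be the number of reference words.

```python
def error_counts(ref, hyp):
    """Count (substitutions, deletions, insertions) turning `ref` into `hyp`.

    Uses a minimum edit-distance alignment over word lists.
    Each table cell holds (total_errors, subs, dels, ins).
    """
    rows, cols = len(ref) + 1, len(hyp) + 1
    table = [[(0, 0, 0, 0)] * cols for _ in range(rows)]
    for i in range(1, rows):
        table[i][0] = (i, 0, i, 0)  # delete all reference words so far
    for j in range(1, cols):
        table[0][j] = (j, 0, 0, j)  # insert all hypothesis words so far
    for i in range(1, rows):
        for j in range(1, cols):
            c, s, d, n = table[i - 1][j - 1]
            if ref[i - 1] == hyp[j - 1]:
                diagonal = (c, s, d, n)            # match, no error
            else:
                diagonal = (c + 1, s + 1, d, n)    # substitution
            c, s, d, n = table[i - 1][j]
            deletion = (c + 1, s, d + 1, n)
            c, s, d, n = table[i][j - 1]
            insertion = (c + 1, s, d, n + 1)
            table[i][j] = min(diagonal, deletion, insertion)
    _, subs, dels, ins = table[-1][-1]
    return subs, dels, ins


def word_accuracy(pairs):
    """(N - I - D - S) / N over all (reference, hypothesis) sentence pairs."""
    total_words = 0
    total_errors = 0
    for reference, hypothesis in pairs:
        ref_words, hyp_words = reference.split(), hypothesis.split()
        subs, dels, ins = error_counts(ref_words, hyp_words)
        total_words += len(ref_words)
        total_errors += subs + dels + ins
    return (total_words - total_errors) / total_words


def sentence_accuracy(pairs):
    """Correct sentences / sentences attempted (exact word-sequence match)."""
    correct = sum(1 for ref, hyp in pairs if ref.split() == hyp.split())
    return correct / len(pairs)
```

For the three pairs ("dial five five five" recognized correctly, "call john smith" recognized as "call john", "route home" recognized as "route to home"), sentence accuracy is 1/3 and word accuracy is (9 - 2) / 9 = 7/9: one deletion plus one insertion over nine reference words.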

4. Recognition Performance Requirement
# Discrete command
## Idle (SNR > 20 dB): accuracy > 98%
## Moderately noisy (SNR > 10 dB): accuracy > 95%
## Very noisy (SNR > 6 dB): accuracy > 92%
## SNR < 6 dB: result rejected
# Continuous command
## Idle (SNR > 20 dB): accuracy > 98%
## Moderately noisy (SNR > 10 dB): accuracy > 95%
## Very noisy (SNR > 6 dB): accuracy > 92%
## SNR < 6 dB: result rejected

5. Speech Recognition Functionality
# Confidence score
## A recognized result can be accepted or rejected according to its confidence score, which expresses, in terms of log likelihood, how confidently the result can be accepted. For example, assuming a confidence score range from 0 (low confidence) to 100 (high confidence), if the developer sets the confidence threshold to 40, a result is accepted only if its confidence score is greater than 40.
# Grammar weight
## If a command has poor accuracy compared to the other candidates in the grammar, its weight can be adjusted to balance the results, for example ( 1.1 dial | 0.9 store | 1.0 one | 1.0 two | 1.0 three | 0.9 four | .... | 1.1 oh ). A weight of 1.1 means more weight, 1.0 is unity gain, and 0.9 means less weight compared to unity gain.
# SNR rejection
## If the ambient noise is too high, which can be measured by the SNR (signal-to-noise ratio), it is better to reject the recognized result when the SNR is below a specific threshold, because such a result is likely to be corrupted by noise.
## Even when the SNR is below the level at which correct results are reliable, a result with an extremely high confidence score may be accepted, depending on the OEM specification.
# Talk too soon
## If the user starts speaking before the start beep has been played back, the beginning of the utterance tends to be chopped off, so the result is likely to be misrecognized. However, such a result can still be accepted if its confidence score is high enough.
# AGC (automatic gain control)
## Speaking styles differ greatly between users: some users speak softly and smoothly, others strongly and loudly, and even the same user may sound different from time to time. AGC is therefore needed to raise the level of soft voices and lower the level of loud voices. Whether this functionality is used can be decided according to the specific requirements. AGC usually has no impact on accuracy, but it is recommended for better usability.
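
The accept/reject rules above can be combined into a single decision step. This Python sketch rests on stated assumptions: the 0..100 confidence scale and the threshold of 40 are the example from the text, the 6 dB SNR threshold comes from the performance requirement, the high-confidence override level of 90 is hypothetical (OEM-specific), and grammar weights are modeled as a simple rescoring of an N-best list with non-negative scores, whereas a real engine typically applies them during decoding.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    text: str
    score: float       # recognizer match score used for ranking; assumed non-negative
    confidence: float  # confidence score on an assumed 0 (low) .. 100 (high) scale


CONFIDENCE_THRESHOLD = 40.0      # example threshold from the text above
SNR_REJECT_DB = 6.0              # below this SNR, results are normally rejected
HIGH_CONFIDENCE_OVERRIDE = 90.0  # hypothetical OEM-specific override level


def best_candidate(candidates, grammar_weights):
    """Pick the top candidate after scaling each score by its grammar weight."""
    return max(candidates,
               key=lambda c: c.score * grammar_weights.get(c.text, 1.0))


def decide(candidates, grammar_weights, snr_db, talked_too_soon=False):
    """Return the accepted command text, or None if the result is rejected."""
    if not candidates:
        return None
    best = best_candidate(candidates, grammar_weights)
    if snr_db < SNR_REJECT_DB or talked_too_soon:
        # Unreliable input: accept only an extremely confident result.
        return best.text if best.confidence >= HIGH_CONFIDENCE_OVERRIDE else None
    return best.text if best.confidence > CONFIDENCE_THRESHOLD else None
```

With the weights `{"dial": 1.1, "store": 0.9}`, a "dial" candidate scoring 50 outranks a "store" candidate scoring 52 (55 vs. 46.8), while without weights "store" would win. Separating the ranking score from the confidence score keeps the weight from inflating the value that the threshold is compared against.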

=== Multiple_HMI_Languages ===
=== Software_Support ===
== Navigation ==
=== Navigation_Engine ===
# map source
# routing
# searching (local / web service)
# geocoding / reverse geocoding
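
The requirements do not specify a routing algorithm. As a baseline illustration only, the sketch below runs Dijkstra's shortest-path algorithm over a hypothetical road graph whose edge costs could represent distance or travel time; production navigation engines typically use faster variants (e.g. A* or precomputed hierarchies) over real map data.

```python
import heapq


def shortest_route(graph, start, goal):
    """Dijkstra's algorithm over a road graph.

    graph maps a node to a list of (neighbor, cost) pairs with cost >= 0
    (e.g. travel time). Returns (total_cost, path) or None if unreachable.
    """
    queue = [(0.0, start, [start])]
    best_cost = {start: 0.0}
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == goal:
            return cost, path
        if cost > best_cost.get(node, float("inf")):
            continue  # stale queue entry; a cheaper path was already found
        for neighbor, edge_cost in graph.get(node, []):
            new_cost = cost + edge_cost
            if new_cost < best_cost.get(neighbor, float("inf")):
                best_cost[neighbor] = new_cost
                heapq.heappush(queue, (new_cost, neighbor, path + [neighbor]))
    return None
```

For the graph `{"A": [("B", 4), ("C", 1)], "C": [("B", 2), ("D", 5)], "B": [("D", 1)]}`, the route from A to D is A, C, B, D with cost 4, cheaper than the direct A, B, D (cost 5).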

=== Map_Function ===
# database
# 2D/3D representation (pitch)
# zooming (IN / OUT / auto-zoom)
# panning (NORTH / SOUTH / EAST / WEST and combinations of these four, e.g. NORTH-WEST)
# orientation / northing (ON: map oriented north / OFF: map oriented in the direction of travel)
# follow GPS signal (ON: map cursor follows the GPS signal / OFF: map cursor does not follow the GPS signal / timeout: number of updates to wait before the cursor follows the GPS signal on the map; useful for panning)
# set / clear destination
# center the map
# set/change map layout (day / night / detailed with POIs / plain)
# bookmarks (see also Destination_Import)
# OSD (on-screen display) information
# set units: metric / imperial / ...
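
The "follow GPS signal" option with a timeout is essentially a small state machine. The Python sketch below is a hypothetical model of it: the class name, the mode strings and the default of 5 updates are assumptions, and positions are plain (latitude, longitude) tuples.

```python
class MapCursor:
    """Hypothetical model of the 'follow GPS signal' map option.

    mode is "on", "off" or "timeout". In "timeout" mode, after the user pans,
    the map keeps the panned view and follows the GPS fix again on the
    `timeout_updates`-th GPS update after the pan.
    """

    def __init__(self, mode="on", timeout_updates=5):
        if mode not in ("on", "off", "timeout"):
            raise ValueError("mode must be 'on', 'off' or 'timeout'")
        self.mode = mode
        self.timeout_updates = timeout_updates
        self.center = None              # current map center, e.g. (lat, lon)
        self._updates_since_pan = None  # None means the user has not panned

    def pan(self, new_center):
        """User pans the map; restart the timeout counter."""
        self.center = new_center
        self._updates_since_pan = 0

    def on_gps_update(self, position):
        """Handle a new GPS fix according to the follow mode."""
        if self.mode == "off":
            return
        if self.mode == "timeout" and self._updates_since_pan is not None:
            self._updates_since_pan += 1
            if self._updates_since_pan < self.timeout_updates:
                return  # keep showing the panned area for now
        self._updates_since_pan = None
        self.center = position
```

Counting GPS updates rather than wall-clock time follows the wording of the requirement ("number of updates to wait"); a time-based timeout would be a straightforward variation.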
=== Real_Time_Traffic_Information ===
=== Points_of_Interest_POIs ===
=== Destination_Import ===
# from file (e.g. USB)
# from web server / web service
# free input
# coordinate units conversion
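
A typical coordinate units conversion is between degrees/minutes/seconds (DMS) with a hemisphere letter and signed decimal degrees. The following Python sketch shows one way to do it; the function names are hypothetical, and seconds are rounded to two decimals with a carry into minutes and degrees when rounding reaches 60.

```python
def dms_to_decimal(degrees, minutes, seconds, hemisphere):
    """Convert degrees/minutes/seconds plus N/S/E/W to signed decimal degrees."""
    if not (0 <= minutes < 60 and 0 <= seconds < 60):
        raise ValueError("minutes and seconds must be in the range [0, 60)")
    value = abs(degrees) + minutes / 60.0 + seconds / 3600.0
    return -value if hemisphere.upper() in ("S", "W") else value


def decimal_to_dms(value, is_latitude):
    """Convert signed decimal degrees to (degrees, minutes, seconds, hemisphere)."""
    if is_latitude:
        hemisphere = "N" if value >= 0 else "S"
    else:
        hemisphere = "E" if value >= 0 else "W"
    value = abs(value)
    degrees = int(value)
    total_minutes = (value - degrees) * 60.0
    minutes = int(total_minutes)
    seconds = round((total_minutes - minutes) * 60.0, 2)
    if seconds >= 60.0:  # rounding can carry into the next minute
        seconds -= 60.0
        minutes += 1
    if minutes >= 60:    # and from there into the next degree
        minutes -= 60
        degrees += 1
    return degrees, minutes, seconds, hemisphere
```

For example, 42° 30' 0" N converts to 42.5, and -83.15 as a longitude converts back to 83° 9' 0.0" W.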
== Network ==
# pre-installed
# nomadic device

[[Category:IVI]]
Some of the requirements in this topic will be fulfilled outside of the MeeGo IVI-based software. For example, the implementation of the CAN network interface and early audio functions most probably falls into this category. Decisions about implementing specific requirements in the MeeGo IVI software will be made assuming a specific system architecture.