Unique_nickname
Why is using VBR MP3 in AVIs a bad idea? We discussed this issue at length on IRC, and Cyrius (Suiryc) gave a pretty good explanation:

[21:33] <Belgabor|Home> cyrius, what did your experiments tell?
[21:33] <Suiryc> Belgabor|Home : I think I know know why VBR is not good, and also why Nando's hack works (somehow)
[21:33] <Suiryc> s/know/now
[21:33] <Belgabor|Home> ok, tell me
[21:34] <Suiryc> first of all there are 2 'headers' in the AVI (audio) stream
[21:34] <Belgabor|Home> I have the feeling i need to hammer that down some throat soon
[21:34] <ChristianHJW> lol
[21:34] <Suiryc> first one is a general one (the same structure is used for each track)
[21:35] <Suiryc> AVISTREAMINFO
[21:35] <Suiryc> (IIRC ... they should use shorter names ...)
[21:35] <Belgabor|Home> lol
[21:37] <spyder482> ChristianHJW: I won't be moving for a few months still though
[21:37] <Suiryc> this one tell how many frames there are in the stream
[21:37] <Suiryc> and what is the rate of the frames
[21:37] <Suiryc> thanks to dwRate & dwScale fields
[21:37] <Belgabor|Home> got that
[21:38] <Suiryc> it also contains a field saying the size of 1 frame
[21:38] <Suiryc> if VBR, then it is set to 0, otherwise it is set to the correct value
[21:38] <Belgabor|Home> dwSampleSize
[21:39] <Suiryc> yep
[21:39] <Suiryc> then there is a header specific to the audio stream (based on WAVEFORMATEX)
[21:39] <Suiryc> this one tell the samplerate (44100, 48000, ...)
[21:39] <Suiryc> the byterate
[21:39] <Suiryc> the format (wFormatTag)
[21:40] <Suiryc> and especially contains a field named nBlockAlign
[21:40] <Suiryc> nBlockAlign tell how many bytes an audio frame contains
[21:40] <Suiryc> _BUT_
[21:40] <Belgabor|Home> And that mustn't be 0
[21:40] <Suiryc> cannot be set to 0
[21:40] <spyder482> so much work for AVI...
[21:40] <Belgabor|Home> ok, i think i get the picture
[21:40] <Suiryc> ok so let's continue
[21:41] <Belgabor|Home> ok
[21:41] <ChristianHJW> all with you guys ...
[21:41] <Suiryc> in Nandub here is what happens with an MP3 stream (VBR one)
[21:42] <Suiryc> Nando set dwRate to the samplerate (44100, 48000, ...)
[21:42] <spyder482> don't you two have a channel for this?
[21:42] <Suiryc> spyder482 : shut up
[21:42] <Suiryc> and set dwScale to 1152
[21:42] <spyder482> lol
[21:42] <Belgabor|Home> no, the other one is just for lurking
[21:42] <Suiryc> :]
[21:42] <spyder482> hehe
[21:43] <Suiryc> and set nBlockAlign to 1152 too
[21:43] <Suiryc> then, when muxing it only treat whole MP3 frames
[21:43] <Suiryc> (i.e. each MP3 frame is in its own Chunk)
[21:44] <Suiryc> you still follow ?
[21:44] <md`> who has done the mpeg2 import part of vdmod?
[21:44] <Belgabor|Home> ok, one mp3 frame is what?
[21:44] <Belgabor|Home> pulco-citron
[21:44] <md`> hmpf
[21:44] <spyder482> pulco-citron
[21:44] <spyder482> oh
[21:45] <md`> why does he generate d2v and dont let the user decide to pick one...
[21:45] <Belgabor|Home> dunno
[21:45] <md`> if there is one already
[21:45] <md`> hmmm
[21:45] <Suiryc> Belgabor|Home : an Mpeg1-Layer3 frame is the shortest block of data you can use
[21:45] <ChristianHJW> let Suiryc finish guys .. please
[21:45] <md`> yes ok
[21:45] <Belgabor|Home> ok
[21:45] <spyder482> ChristianHJW: check #virtualdub
[21:45] <Suiryc> it contains an header saying what is in the frame, and then the data (audio)
[21:46] <ChristianHJW> we have to know whats wrong in AVI to be able to advertise matroska
[21:46] <Belgabor|Home> this is how much data?
[21:46] <Suiryc> somehow 1 MP3 frame ~ 1 video frame
[21:46] <Belgabor|Home> ChristianHJW: lol
[21:46] <Suiryc> the size of a frame depends on the MP3 settings
[21:46] <Suiryc> (i.e. bitrate, ...)
[21:46] <Belgabor|Home> ok
[21:47] <Belgabor|Home> is it fixed for a file or variable in vbr?
[21:47] <Suiryc> however a Mpeg1-layer3 frame contains 1152 samples
[21:47] <Suiryc> the size of a frame is variable
[21:47] <Suiryc> even in CBR
[21:48] <Suiryc> (e.g. frames will be of 417 or 418 bytes)
[21:48] <Belgabor|Home> ok, but 1152 is the upper limit?
[21:48] <Suiryc> because a fixed bitrate must be achieved
[21:48] <Suiryc> 1152 is the number of samples a frame contains
[21:48] <Suiryc> each frame (whatever its size may be) contains 1152 samples
[21:49] <Belgabor|Home> oic
[21:49] <Suiryc> so let's continue
[21:49] <Suiryc> each frame contains 1152 samples
[21:49] <Belgabor|Home> ok
[21:49] <Suiryc> and the rate of the stream (in AVISTREAMINFO) has been set to :
[21:49] <Suiryc> dwRate / dwScale = SampleRate/1152
[21:50] <Suiryc> since each Frame contains 1152 it is equal to the 'framerate'
[21:50] <Suiryc> (as for video)
[21:50] <Belgabor|Home> ok, i think i got that
[21:50] <Suiryc> now you must recall that each frame is in its own AVI chunk
[21:50] <Belgabor|Home> ok
[21:50] <Suiryc> so it is also the 'chunkrate'
[21:51] <Suiryc> so here is now what happens (it is most likely what happens) when playing the file in Windows Media Player
[21:51] <Belgabor|Home> ic
[21:51] <Suiryc> WMP will get both headers
[21:52] <Suiryc> which will say to it that the rate of the stream is SampleRate/1152
[21:52] <Belgabor|Home> gimme a sec, brb
[21:52] <Suiryc> and that each audio frame is 1152 bytes long (nBlockAlign)
[21:52] <Suiryc> k
[21:53] <Belgabor|Home> back
[21:54] <Suiryc> ok so WMP believe each frame is 1152 bytes long
[21:54] <Belgabor|Home> yeah
[21:54] <Suiryc> which is not the case (generally frames are around 400 bytes long with 128kbps stream)
[21:55] <Suiryc> but
[21:55] <Belgabor|Home> yeah, got that much
[21:55] <Suiryc> now you are reading data in the file
[21:55] <Suiryc> and WMP needs to know when to read the audio
[21:55] <Suiryc> (i.e. to which time correspond an audio frame)
[21:56] <Suiryc> to do so it will look at all the previous audio chunks in the file
[21:56] <Suiryc> for each shunk it divide the size (in bytes) of the chunk by nBlockAlign to know how many frames there were in the chunk
[21:56] <Belgabor|Home> ok
[21:56] <Suiryc> s/shunk/chunk
[21:57] <Belgabor|Home> ok
[21:57] <Suiryc> (since every tools dealing with the stream must cut on nBlockAlign boundaries)
[21:57] <Suiryc> since each chunk is shorter than 1152 bytes (nBlockAlign) it should get 0
[21:57] <Suiryc> but this is not possible
[21:58] <Suiryc> since tools work on blocks of nBlockAlign bytes, it must assume that there is at least 1 frame in the chunk
[21:58] <Suiryc> (even if the chunk is shorter)
[21:59] <Suiryc> so for each chunk it find there is 1 frame in it
[21:59] <Suiryc> which is really the case (each mp3 frame is in its own chunk)
[21:59] <Suiryc> so WMP got the correct number of mp3 frames played so far
[22:00] <Suiryc> and since it has the correct rate (each frame contains 1152 samples, and the rate of the stream is SampleRate/1152)
[22:00] <Suiryc> it also got the correct timecode for the frame
[22:00] <Belgabor|Home> ok
[22:00] <Suiryc> resulting in a perfectly synched MP3 stream
[22:01] <Suiryc> I was led to this conclusion without debugging WMP while playing but with some tests I made :
[22:02] <Suiryc> I changed the dwScale value (with or without the nBlockAlign value)
[22:02] <Suiryc> but this resulted in otu of synch issues (audio playing too fast/slow)
[22:02] <Suiryc> out*
[22:02] <Suiryc> I changed the nBlockAlign value :
[22:03] <Suiryc> setting it to 1 and then I have out of synch issues too
[22:03] <Suiryc> but setting it 2304 and I still have a perfectly synched stream
[22:03] <Belgabor|Home> ok
[22:04] <Suiryc> so in fact the 1152 value in nBlockAlign could be anything else
[22:04] <Suiryc> _but_
[22:04] <Suiryc> must be higher than the size of an mp3 frame
[22:04] <Belgabor|Home> ok, what happens if you set it to 0?
[22:04] <Suiryc> lol
[22:05] <Suiryc> if you set it to 0 then WMP won't play the stream (the icon for audio is disabled like if there is no audio in the file)
[22:05] <Suiryc> so no VBR
[22:05] <Belgabor|Home> ok
[22:06] <Belgabor|Home> so the failure is in principle not in avi, but in the WAVEFORMATEX header
[22:06] <Suiryc> yep
[22:06] <Suiryc> but since the AVI will use WAVEFORMATEX for audio headers, it is still a failure in AVI specs
[22:07] <Belgabor|Home> do you have the resemblance of an idea why vbr mp3 fails?
[22:07] <Belgabor|Home> yep
[22:07] <Suiryc> <Belgabor|Home> do you have the resemblance of an idea why vbr mp3 fails? <-- you mean why it is not good ?
[22:08] <Belgabor|Home> yep, why it fails sometimes
[22:08] <ChristianHJW> thats what i am interested in also
[22:08] <Suiryc> well in the case of WMP, it will divide the chunk size by nBlockAlign
[22:08] <Suiryc> (that's what I think, since the synch is good)
[22:08] <Suiryc> and will set it to 1 if the chunk size is too small
[22:09] <Suiryc> but there is another way to compute timecode
[22:09] <Suiryc> (assuming that you have CBR of course)
[22:09] <Suiryc> you take the total bytes in previous chunks
[22:09] <Suiryc> and divide it by nBlockAlign
[22:10] <Belgabor|Home> which fails miserably for the vbr hack
[22:10] <Suiryc> of course in this case you get a completely wrong value since mp3 frames are not 1152 bytes long
[22:10] <Suiryc> yep
[22:10] <Suiryc> other tools may also assume that the chunk is not valid (corrupted) since its size is shorter than nBlockAlign
[22:11] <Belgabor|Home> ok, thats the failure in principle, but why are some files broken?
[22:12] <Suiryc> what files ?
[22:12] <Suiryc> broken ? what do you mean by broken ?
[22:13] <Belgabor|Home> i had some vbr mp3 avis which seemed like having divx3 freeze frames but were ok when demuxed
[22:13] <Suiryc> dunno
[22:13] <Suiryc> maybe a problem with the decoder
[22:14] <Belgabor|Home> ok, well that cleared things up a bit
[22:14] <Belgabor|Home> thx
[22:14] <Suiryc> btw there may be problems with Nandub code
[22:14] <Suiryc> because :
[22:14] <Suiryc> 1. layer1 streams only have 384 samples per frame
[22:15] <Suiryc> 2. IIRC with very high bitrates an mp3 frame can be higher than 1152 bytes
[22:15] <Suiryc> s/higher/bigger
[22:16] <Suiryc> (the max size is near 2000 bytes IIRC)
[22:16] <Belgabor|Home> ok, so nBlockAlign should be >2000
[22:17] <Suiryc> so depending on the way dividing is used (rounding to floor or ceil or nearest value)
[22:17] <Suiryc> and the max size of a frame, it may find there are 2 frames in a chunk where there is only 1 frame
[22:17] <Belgabor|Home> ok, i got that
[22:18] <Suiryc> but this is for really high bitrates ...
[22:18] <Suiryc> lemme check ...
[22:19] <Belgabor|Home> what would happen if we put two frames in one chunk? aka set dwRate = 2* sample rate and so on?
[22:20] <Belgabor|Home> no, not two, just double the values?
[22:21] <Suiryc> if you double the value the rate of the audio will be changed accordingly
[22:21] <Suiryc> so to keep it correct you would have to put 2 mp3 frames in each chunk
[22:22] <Suiryc> but then you would most likely go beyond the 1152 bytes per chunk
[22:22] <Suiryc> and increase the chances to generate out of synch problems
[22:23] <Belgabor|Home> let me rethink
[22:24] <Suiryc> changing dwRate and dwScale only affects the rate of the stream
[22:25] <Suiryc> multiplying dwRate by 2 => audio play 2 times faster
[22:25] <Suiryc> multiplying dwScale by 2 => audio play 2 times slower
[22:25] <Belgabor|Home> if we double dwRate, dwScale, nBlockAlign and dwSampleSize?
[22:25] <Suiryc> multiplying both => no change
[22:25] <Suiryc> dwSampleSize is set to 0
[22:26] <Belgabor|Home> ah ok, so skip that
[22:26] <Suiryc> (dwRate, dwScale) and nBlockAlign are not linked
[22:26] <Suiryc> you can use a higher value in nBlockAlign
[22:27] <Suiryc> (like the 2304 I tested)
[22:27] <Belgabor|Home> nevertheless, if we double all three, shouldnt it be safe for larger mp3 frames?
[22:27] <Suiryc> this won't change anything in the case of WMP because something lower than 1152 divided by 1152 or 2304 will still be rounded to 0
[22:27] <Suiryc> Belgabor|Home : this would be safer
[22:28] <Suiryc> but would cause even more troubles in apps that don't work the same way than Nandub & WMP
[22:28] <Suiryc> I think some apps sometimes check a value of 1152 to know it was made by Nandub
[22:28] <Belgabor|Home> ok, i see the point
[22:29] <Belgabor|Home> faulty concept stays faulty
[22:33] <Suiryc> k I checked
[22:33] <Suiryc> keeping 1152 shouldn't cause too much problems
[22:33] <Suiryc> for Mpeg1-Layer2/3 the max is near 1750 bytes long
[22:34] <Suiryc> there could be problems with Mpeg2/2.5-layer2/3
[22:34] <Suiryc> where a 160kbps stream of 8kHz have frames of 2881 bytes long at most
[22:35] <Suiryc> anyway I don't think people use this kind of stream
[22:36] <ChristianHJW> highly unlikely ..
[22:51] <Suiryc> nite
[22:52] * Suiryc has left #matroska
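
For anyone who wants to play with the numbers, here is a rough Python sketch of the two timecode strategies described in the log. It is only a model of what Suiryc guesses is going on, not actual WMP or Nandub code. All function names are made up for illustration, and the frame-size formula (144 * bitrate / sample rate, plus an optional padding byte) is the usual MPEG-1 Layer III one, which the log does not spell out.

```python
SAMPLES_PER_FRAME = 1152  # MPEG-1 Layer III samples per frame (from the log)


def mp3_frame_size(bitrate_bps, sample_rate, padding=0):
    """Frame length in bytes (standard MPEG-1 Layer III formula, assumed here)."""
    return 144 * bitrate_bps // sample_rate + padding


def frames_by_chunk(chunk_sizes, block_align):
    """Per-chunk strategy Suiryc suspects WMP uses (hypothetical model):
    chunk size // nBlockAlign, but never fewer than one block per chunk."""
    return sum(max(1, size // block_align) for size in chunk_sizes)


def frames_by_total_bytes(chunk_sizes, block_align):
    """CBR-style strategy: all bytes seen so far divided by nBlockAlign."""
    return sum(chunk_sizes) // block_align


def seconds(frames, dw_rate, dw_scale):
    """AVI stream time: one 'frame' lasts dwScale / dwRate seconds."""
    return frames * dw_scale / dw_rate


# A tiny VBR stream at 44.1 kHz: four MP3 frames, one per AVI chunk.
sample_rate = 44100
chunks = [mp3_frame_size(b, sample_rate) for b in (128000, 64000, 192000, 128000)]

# Nandub-style header values from the log: dwRate = sample rate, dwScale = 1152.
for align in (1152, 2304, 1):
    n = frames_by_chunk(chunks, align)
    print("per-chunk, nBlockAlign =", align, "->", n, "frames,",
          seconds(n, sample_rate, 1152), "s")

n = frames_by_total_bytes(chunks, 1152)
print("total-bytes, nBlockAlign = 1152 ->", n, "frames,",
      seconds(n, sample_rate, 1152), "s")
```

With nBlockAlign at 1152 or 2304 the per-chunk count comes out as 4 frames, which is the real number, while nBlockAlign = 1 counts every byte as a frame and the total-bytes method finds only 1 frame. That is consistent with the tests Suiryc describes, though it remains a guess at what the player really does.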