IETF
avtcore@jabber.ietf.org
Thursday, March 11, 2021
Jonathan Lennox has set the subject to: IETF AVTCore Working Group

GMT+0
[11:50:03] Alessandro Amirante_web_274 joins the room
[11:50:03] Youenn Fablet_web_251 joins the room
[11:50:03] Bernard Aboba_web_790 joins the room
[11:50:21] alex-meetecho joins the room
[11:51:47] Harald Alvestrand_web_787 joins the room
[11:51:49] Jonathan Lennox_web_627 joins the room
[11:51:57] hta joins the room
[11:53:10] Sergio Garcia Murillo_web_207 joins the room
[11:53:29] Takio Yamaoka_web_336 joins the room
[11:53:33] Takio Yamaoka_web_336 leaves the room
[11:53:38] Takio Yamaoka_web_865 joins the room
[11:54:04] Alex Gouaillard_web_235 joins the room
[11:54:20] <Alex Gouaillard_web_235> ready to take notes
[11:55:38] Cullen Jennings_web_360 joins the room
[11:55:53] Cullen Jennings_web_360 leaves the room
[11:57:07] Alan Ford_web_466 joins the room
[11:57:16] Bernard Aboba_web_790 leaves the room
[11:57:27] Youngkwon Lim_web_478 joins the room
[11:57:34] <Jonathan Lennox_web_627> Thank you!
[11:57:37] Renan Krishna_web_577 joins the room
[11:57:48] <Alex Gouaillard_web_235> ;-)
[11:59:07] James Gruessing_web_349 joins the room
[11:59:16] Barry Leiba_web_709 joins the room
[11:59:34] Youngkwon Lim_web_478 leaves the room
[11:59:42] Stephan Wenger_web_825 joins the room
[11:59:49] Youngkwon Lim_web_317 joins the room
[11:59:57] Roni Even_web_396 joins the room
[12:00:01] James Gruessing joins the room
[12:00:43] Yago Sanchez_web_634 joins the room
[12:00:46] Tim Bruylants_web_142 joins the room
[12:00:47] Bernard Aboba_web_801 joins the room
[12:01:05] <Alex Gouaillard_web_235> good evening
[12:01:08] Roland Jesske_web_624 joins the room
[12:01:16] Shuai Zhao_web_531 joins the room
[12:01:18] Shuai Zhao_web_531 leaves the room
[12:01:21] Shuai Zhao_web_533 joins the room
[12:01:28] Florent Castelli_web_159 joins the room
[12:01:43] Bernard Aboba_web_801 leaves the room
[12:01:47] Roni Even_web_396 leaves the room
[12:01:47] Shuai Zhao_web_533 leaves the room
[12:01:50] Shuai Zhao_web_950 joins the room
[12:01:56] <Alex Gouaillard_web_235> I'm not presenting
[12:02:03] Shuai Zhao_web_950 leaves the room
[12:02:04] Roni Even_web_267 joins the room
[12:02:04] Murray Kucherawy_web_690 joins the room
[12:02:07] Shuai Zhao_web_256 joins the room
[12:02:29] Joerg Ott_web_604 joins the room
[12:02:53] Cullen Jennings_web_381 joins the room
[12:03:18] Joerg Ott_web_604 leaves the room
[12:03:31] Timothy Panton_web_481 joins the room
[12:03:37] Mo Zanaty_web_434 joins the room
[12:04:23] Justin Uberti_web_404 joins the room
[12:04:32] Chris Wendt_web_646 joins the room
[12:05:26] Cullen Jennings_web_381 leaves the room
[12:05:29] Cullen Jennings_web_631 joins the room
[12:06:04] Colin Perkins_web_946 joins the room
[12:09:08] <Barry Leiba_web_709> Usual protocol: If you want me to speak on your behalf, please prefix comment with "mic:"
[12:12:37] james welch_web_454 joins the room
[12:14:20] Bernard Aboba_web_731 joins the room
[12:14:59] Murray Kucherawy_web_690 leaves the room
[12:15:05] Murray Kucherawy_web_938 joins the room
[12:20:46] <Justin Uberti_web_404> libwebrtc doesn't fill in max-fs or max-fr at this point in time.
[12:21:04] <Jonathan Lennox_web_627> Does it consume them?
[12:21:25] Joerg Ott_web_332 joins the room
[12:21:25] Joerg Ott_web_332 leaves the room
[12:21:38] Joerg Ott_web_813 joins the room
[12:25:42] <hta> I think it still ignores them. Been on my TODO list for a few years.
[12:28:24] Roman Shpount_web_221 joins the room
[12:30:26] Mike English_web_630 joins the room
[12:32:15] <Mo Zanaty_web_434> For VP9 max-fs/max-fr, hardware decoders often have hard limits unlike software. Hardware decoders have become more widely deployed than before, so making decisions based primarily on the libwebrtc VP9 software implementation may not be the best choice now.
[12:32:58] <hta> I have had it on my list since we agreed to add them for VP8; the same support code is needed for doing proper H.264 profile support.
[12:33:31] <hta> just hasn't made it to the top of anyone's list yet.
[12:34:46] Joerg Ott_web_813 leaves the room
[12:35:25] <Jonathan Lennox_web_627> Sorry, my slides tab seems to be very sensitive.
[12:35:27] <Mo Zanaty_web_434> I'm less concerned about the Chrome implementation, more about a standard which says all limits are soft, because it mostly considers a single software implementation not the hardware decoders starting to get wider deployment.
[12:37:14] <hta> I tend to agree. Overrunning a buffer is not a nice thing to do.
[12:37:47] Mike English_web_630 leaves the room
[12:38:22] <Jonathan Lennox_web_627> OTOH current implementations as mentioned aren't respecting it
[12:40:12] <hta> I think we should declare that as a bug. If existing implementations advertise max-fs and max-fr that are realistic for current usage, there should be no interoperability issue caused by that - and when people start sending frames beyond current decoders' capabilities, crashes *will* ensue. (8K@120 anyone?)
[12:41:52] <Justin Uberti_web_404> That makes sense. But will 5K @ 1fps cause the same problems?
[12:42:26] <hta> seems like it's time to write a test :-)
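The test hta suggests could start from a check like the following. This is a minimal sketch, assuming max-fs is counted in 16x16 macroblocks and max-fr in frames per second, as in the VP8 and VP9 RTP payload formats. The function names, the advertised limits (roughly 1080p30) and the pixel dimensions chosen for "5K" and "8K" are illustrative assumptions, not taken from libwebrtc or any draft.

```python
import math

MACROBLOCK = 16  # assumption: max-fs is expressed in 16x16 macroblocks


def frame_size_mbs(width: int, height: int) -> int:
    """Frame size in macroblocks, rounding partial macroblocks up."""
    return math.ceil(width / MACROBLOCK) * math.ceil(height / MACROBLOCK)


def exceeds_limits(width, height, fps, max_fs, max_fr):
    """Return the names of the advertised limits that a stream exceeds."""
    exceeded = []
    if frame_size_mbs(width, height) > max_fs:
        exceeded.append("max-fs")
    if fps > max_fr:
        exceeded.append("max-fr")
    return exceeded


# Hypothetical receiver advertising roughly 1080p30.
MAX_FS, MAX_FR = 8160, 30

print(exceeds_limits(1920, 1080, 30, MAX_FS, MAX_FR))   # within both limits
print(exceeds_limits(5120, 2880, 1, MAX_FS, MAX_FR))    # 5K at 1 fps
print(exceeds_limits(7680, 4320, 120, MAX_FS, MAX_FR))  # 8K at 120 fps
```

Under these assumptions, 5K at 1 fps still exceeds max-fs even though it is far below max-fr, which is the case Justin asks about: the two limits are checked independently, so a low frame rate does not compensate for an oversized frame.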
[12:44:58] james welch_web_454 leaves the room
[12:46:22] Joerg Ott_web_484 joins the room
[12:54:59] <Alex Gouaillard_web_235> NALU/OBU is an encoder packetization, let's be explicit and call the 'packetizer' box in the diagram an 'RTP packetizer'
[12:56:12] <hta> we have 2 level chunking today in most codecs. The first level is called "frame" or "nalu".
[12:56:21] <Alex Gouaillard_web_235> no I agree, but ....
[12:56:37] <Alex Gouaillard_web_235> playing on the fact that youenn wrote packetizer instead of explicitly RTP packetization .......
[12:57:51] <Mo Zanaty_web_434> 3 layers:
1. NALU/OBU
2. layer-frame / sub-picture (if spatial scaling)
3. RTP packets
[12:58:20] Magnus Westerlund_web_426 joins the room
[12:58:53] <Stephan Wenger_web_825> Mo, are these meant to be in order?  Shouldn't the order be 1 3 2?
[13:00:09] <Mo Zanaty_web_434> Did Taxonomy ever standardize a name for "layer frame / sub-frame / sub-picture"? It would be good to agree on that now for clarity of discussion.
[13:00:38] <Mo Zanaty_web_434> Stephan, it is 1 2 3 in this presentation.
[13:01:49] <Mo Zanaty_web_434> The subframe extractor must look at encoder output and produce separate subframes (along with metadata), prior to the transform and packetizer.
[13:02:02] <Shuai Zhao_web_256> what does "transform" really mean here? an algorithm that extracts common metadata from the codec?
[13:02:16] <Sergio Garcia Murillo_web_207> e2e encryption
[13:02:21] <Mo Zanaty_web_434> transform = E2E encrypt
[13:02:41] <Stephan Wenger_web_825> Taxonomy: I don't recall for sure, but don't think so.  Also, these things change significantly from codec to codec generation, and I don't think the IETF should attempt to map above terms of the codec designs to pseudo-generic terms with the same name in the IETF's realm.
[13:02:58] <Shuai Zhao_web_256> what does this have to do with codec-agnostic? trying to learn...
[13:03:57] <Mo Zanaty_web_434> Just for our discussion, not written in any draft, can we agree on something now to keep things clear?
[13:05:03] <Justin Uberti_web_404> we had used the term 'IDU', independent decodable unit, in the previous meeting.
[13:06:13] <Mo Zanaty_web_434> Frame marking attempted to be codec-agnostic metadata. A few implementers say it failed.
[13:07:27] <Mo Zanaty_web_434> IDD is not what you guys think.
[13:11:15] <Jonathan Lennox_web_627> I wouldn't have a problem with scoping this down from "fully generic codec agnostic" to "agnostic for interactive a/v conferences" if that's the target use case.
[13:12:19] <Mo Zanaty_web_434> What about agnostic for all currently deployed codecs and scalability structures? Or do we want futures too?
[13:12:38] Timothy Panton_web_481 leaves the room
[13:12:52] <Joerg Ott_web_484> +1 on Colin’s comment on codec-agnostic
[13:12:58] <Jonathan Lennox_web_627> Well, beyond that, saying that MPEG2TS streaming is out of scope would be fine.
[13:13:02] <Justin Uberti_web_404> do we have an exemplar of where it doesn't work?
[13:13:27] <Jonathan Lennox_web_627> (I haven't analyzed whether it'd work, but I don't particularly care.)
[13:14:11] Timothy Panton_web_904 joins the room
[13:14:35] <Justin Uberti_web_404> ISTM that the packetizer can be reasonably codec-agnostic, but the metadata stuff is where it gets hard.
[13:15:02] <Alex Gouaillard_web_235> the answer to the underlying codec negotiation is in the draft.
[13:15:14] <Alex Gouaillard_web_235> and was in the two presentations done at SFrame
[13:15:19] <Alex Gouaillard_web_235> and is in the RTP payload format.
[13:15:33] Timothy Panton_web_904 leaves the room
[13:15:40] Juliana Guerra_web_704 joins the room
[13:16:01] Timothy Panton_web_459 joins the room
[13:16:04] Timothy Panton_web_459 leaves the room
[13:16:07] Timothy Panton_web_758 joins the room
[13:16:14] <Jonathan Lennox_web_627> The packetizer can be codec-agnostic only because the codec APIs we have today tend to be frame-based.  If codec APIs were producing an unmarked stream of bits the packetizer would need to be a lot smarter.
[13:16:30] Timothy Panton_web_758 leaves the room
[13:16:33] Timothy Panton_web_765 joins the room
[13:16:57] <Jonathan Lennox_web_627> I.e. the codec API is itself acting as a packetizer.
[13:17:38] <Mo Zanaty_web_434> What if we want to go over QUIC streams (not datagrams) and the packetizer has to hunt for start codes?
[13:19:30] <Justin Uberti_web_404> aren't we talking specifically about an RTP packetizer here?
[13:19:53] <Justin Uberti_web_404> agree that the transform needs to have some sort of 'frame' concept though to do its work
[13:20:08] Timothy Panton_web_765 leaves the room
[13:20:57] <hta> if we go over QUIC stream or webtransport, we'll have lost the RTP architecture and all its functions, so using the RTP architecture isn't an option, I think.
[13:21:12] <Justin Uberti_web_404> right
[13:21:17] Timothy Panton_web_149 joins the room
[13:22:08] <James Gruessing> If we use QUIC datagrams (or QRT) then we keep the RTP architecture?
[13:22:16] Asad Saeed_web_762 joins the room
[13:22:29] <Magnus Westerlund_web_426> Harald, I don't think you want to throw out all of RTP's functionality. There is a subset of the functionality that you need to handle multiparty in a good way. That said, this is likely a new protocol and most definitely not RTP v2.
[13:25:18] <hta> Magnus, if we throw out SSRCs as a stream identification mechanism and RTCP as a feedback mechanism...?
[13:25:54] Juliana Guerra_web_704 leaves the room
[13:26:01] Juliana Guerra_web_428 joins the room
[13:27:06] <Justin Uberti_web_404> citation needed
[13:27:10] <hta> so what is the vast majority using?
[13:27:34] <hta> zoom uses websockets in their web instance, I think.
[13:28:14] Asad Saeed_web_762 leaves the room
[13:28:29] Tim Bruylants_web_142 leaves the room
[13:28:30] Takio Yamaoka_web_865 leaves the room
[13:28:34] <Jonathan Lennox_web_627> MPEG2TS streaming, I think?  For cable TV-type services.
[13:28:35] Tim Bruylants_web_697 joins the room
[13:28:36] Takio Yamaoka_web_244 joins the room
[13:28:42] <Magnus Westerlund_web_426> A framework supporting multiparty-capable centralized conferencing does need, even over a single leg, media source identification; it needs some timestamp related to sampling or frames. It needs a control level that can refer to frames/timestamps. So in a WebRTC case you could leave all this to the JS application; however, if you are going to federate, that layer needs to be exposed so that two implementations can work with each other.
[13:33:19] <Magnus Westerlund_web_426> And if your goal is to enable service interoperability with SFRAME, then one needs specifications for the format one puts inside the SFRAME, i.e. the equivalent to the RTP payload format but for SFRAME.
[13:35:00] <Magnus Westerlund_web_426> I understand that the main focus for now is to have an SFRAME format that is a standard, and the metadata that allows the endpoint to express things necessary for subset selection and local repair in a sane way by the SFUs. However, this is creating truly non-interoperable islands above SFRAME.
[13:35:52] <Jonathan Lennox_web_627> I think the disconnect here is that the specification for what goes inside the SFRAME/IDU doesn't have to be *complicated* or *controversial* - for most of the codecs we have it'd probably be one sentence, e.g. "the IDU is the VP8 frame".
[13:36:30] Renan Krishna_web_577 leaves the room
[13:36:36] Renan Krishna_web_134 joins the room
[13:37:12] <Mo Zanaty_web_434> Chairs, is the q cut? If so, can we finish the preso and reopen the q again for discussion?
[13:38:16] <Timothy Panton_web_149> Strikes me there are 2 transforms, one (encryption) for the body the other for the metadata.
[13:38:50] <hta> I'm not Cullen's "most people".
[13:40:28] <Mo Zanaty_web_434> The codec-specific metadata is the first few (single digit) bytes of the payload. Just put those first few bytes in the clear after checking they don't leak anything significant. If you want to give up on codec-agnostic metadata.
[13:40:43] <Jonathan Lennox_web_627> To enhance Cullen's example, audio levels probably wouldn't work for MIDI, and that's okay
[13:41:04] <Magnus Westerlund_web_426> Jonathan, I agree on some level. However, I think part of it is how the metadata and the mapping into a sequence of SFRAMEs relate, and being able, when something has been lost, to figure out where it actually belongs.
[13:41:17] <Magnus Westerlund_web_426> And if one actually needs that particular frame or not.
[13:41:58] <Timothy Panton_web_149> The SFU doesn’t really care about the codec, it just wants to know what it can drop to meet the recipient's bandwidth constraint in a way that is best for the user.
[13:42:17] <Jonathan Lennox_web_627> Yes, agreed.  Now, something like AV1 DD can manage that, as well as giving the information the SFU needs, so it can be combined.  But it's serving two conceptually rather different purposes.
[13:42:19] <James Gruessing> @Mo Zanaty_web_434 Whilst the first few bytes of metadata probably won’t expose anything meaningful about the contents, anything not encrypted is helpful in identifying the purpose of the payload and/or fingerprinting. Depending on the threat model, this is a problem, and in part why QUIC just encodes all the things.
[13:43:06] Joerg Ott_web_484 leaves the room
[13:43:11] <Timothy Panton_web_149> So a potential header extension would be ‘drop this for < 10 Mbit/s’
[13:43:20] <Magnus Westerlund_web_426> James, that is why you have two levels of encryption. One end-to-end and one hop-by-hop.
[13:45:09] <Mo Zanaty_web_434> Chairs, do we q or wait for end of preso?
[13:45:31] <Magnus Westerlund_web_426> On the current debate, I am convinced that one SFRAME per independently decodable unit is the right level. The fragmentation to IP packet MTU should happen after, and the repair can happen hop-by-hop on that level. There is no real need to have the partial IDU be decryptable by a receiver.
[13:46:15] <hta> @magnus this makes sense to me.
[13:46:39] <hta> calling the IP-packet-breakup "fragmentation" rather than "packetization" might be nice for the vocabulary.
[13:46:52] <hta> of course H.264 packetization mode 0 is the ugly duckling in this area.
[13:47:15] <Mo Zanaty_web_434> fragmentation and aggregation, not just frag.
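The ordering Magnus describes (protect the whole IDU once, then fragment to packet size) can be sketched as below. This is only an illustration under stated assumptions: `toy_transform` is a hypothetical XOR placeholder standing in for the end-to-end transform, it is not SFrame and not secure, and the 1200-byte payload size and the helper names are invented for the example. Aggregation of small units, which Mo mentions, is not shown.

```python
from itertools import cycle


def toy_transform(data: bytes, key: bytes) -> bytes:
    """Placeholder for the E2E transform: XOR with a repeating key.

    NOT SFrame and NOT secure; it only marks where encryption would happen.
    Applying it twice with the same key restores the input.
    """
    return bytes(b ^ k for b, k in zip(data, cycle(key)))


def fragment(blob: bytes, max_payload: int) -> list:
    """Split an already-protected IDU into packet-sized fragments."""
    return [blob[i:i + max_payload] for i in range(0, len(blob), max_payload)]


def reassemble(fragments: list) -> bytes:
    """Rebuild the protected IDU once all fragments are present."""
    return b"".join(fragments)


idu = bytes(range(256)) * 12          # stand-in for one encoded frame (3072 bytes)
key = b"hypothetical-key"

protected = toy_transform(idu, key)   # step 1: one transform per IDU
fragments = fragment(protected, 1200) # step 2: fragment after the transform

# A middlebox can forward or repair individual fragments without the key;
# only the receiver reassembles the whole IDU and reverses the transform.
recovered = toy_transform(reassemble(fragments), key)
print(len(fragments), recovered == idu)
```

In this arrangement no single fragment is independently decryptable, which matches the point that a receiver has no real need to decrypt a partial IDU.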
[13:47:15] Roni Even_web_267 leaves the room
[13:47:21] Roni Even_web_756 joins the room
[13:47:30] Timothy Panton_web_149 leaves the room
[13:47:43] Timothy Panton_web_526 joins the room
[13:47:46] <Cullen Jennings_web_631> so the first bullet on this is sort of insane. real apps pretty much need to start sending before the network MTU is known
[13:47:52] <Stephan Wenger_web_825> I don't get it.  How do you do hop-by-hop repair of UDP streams "below" RTP?  Are we talking non-UDP transport here?
[13:47:52] <Magnus Westerlund_web_426> So, in H.264 and H.265 an IDU can consist of multiple NALUs.
[13:47:53] Roni Even_web_756 leaves the room
[13:47:58] Roni Even_web_197 joins the room
[13:48:09] <Justin Uberti_web_404> real apps make assumptions about MTU
[13:48:18] <Mo Zanaty_web_434> What is IDU then? Layer frame (taxonomy!!)
[13:48:46] <Justin Uberti_web_404> magnus, you are referring to SPS/PPS NALU, I assume
[13:49:06] <Cullen Jennings_web_631> So this slide is the opposite of what the answer to Colin was earlier.
[13:49:10] <Magnus Westerlund_web_426> Stephan, so you can use FEC, RTX on RTP level to do repair between a sender and a SFU or a receiver.
[13:50:13] Roni Even_web_197 leaves the room
[13:50:58] <Magnus Westerlund_web_426> Justin, yes, it is the main example. In many uses, the combination of SPS and PPS and the video coding NALUs is needed to be able to decode a single frame. Sure, this is not true in all usages, but in some.
[13:51:33] <Justin Uberti_web_404> yeah, agreed.
[13:52:05] <Stephan Wenger_web_825> ah.  Yes.
[13:52:42] Justin Uberti_web_404 leaves the room
[13:53:17] <hta> @cullen the MTU is 1195.
[13:53:54] <Cullen Jennings_web_631> @hta - that sounds reasonable but was not what slide said
[13:53:59] Paolo Saviano_web_656 joins the room
[13:54:43] <Mo Zanaty_web_434> This is similar to Flex FEC, which is also a transform format that binds to other payload types within its own payload header. So I do see precedents for this approach.
[13:54:58] Paolo Saviano_web_656 leaves the room
[13:55:01] Tobia Castaldi_web_385 joins the room
[13:56:15] Justin Uberti_web_561 joins the room
[13:57:28] <Jonathan Lennox_web_627> I guess the difference is that it makes sense to negotiate Flex FEC independently of what video codecs you're using.
[13:57:59] <Magnus Westerlund_web_426> We should declare RED as historic
[13:58:07] <Colin Perkins_web_946> yup
[13:58:12] <Justin Uberti_web_561> well, to some extent the same could be said for SFRAME - it's just another on-wire transform
[13:58:25] <Justin Uberti_web_561> (responding to Jonathan)
[13:59:32] Alan Ford_web_466 leaves the room
[14:00:01] Roland Jesske_web_624 leaves the room
[14:00:40] <Cullen Jennings_web_631> I would like to see a bit of an example of how many PTs get used and if we run out. Can you please send that to the list?
[14:00:48] <Stephan Wenger_web_825> @Mo.  That's an oversimplification; certainly for SHVC
[14:01:21] Barry Leiba_web_709 leaves the room
[14:01:23] <Cullen Jennings_web_631> If that is an issue, I would like us to consider just an extended PT space extension to RTP
[14:01:24] <Tim Bruylants_web_697> thank you
[14:01:27] Timothy Panton_web_526 leaves the room
[14:01:28] Renan Krishna_web_134 leaves the room
[14:01:29] Juliana Guerra_web_428 leaves the room
[14:01:31] Colin Perkins_web_946 leaves the room
[14:01:31] Stephan Wenger_web_825 leaves the room
[14:01:32] Florent Castelli_web_159 leaves the room
[14:01:33] Youngkwon Lim_web_317 leaves the room
[14:01:36] Bernard Aboba_web_731 leaves the room
[14:01:41] Takio Yamaoka_web_244 leaves the room
[14:01:41] James Gruessing_web_349 leaves the room
[14:01:42] Magnus Westerlund_web_426 leaves the room
[14:01:42] <Cullen Jennings_web_631> Some way to negotiate use of 16 bit or varint PTs
[14:01:45] Roman Shpount_web_221 leaves the room
[14:01:46] <hta> @cullen chrome's default offer is now putting AV1 in the extended range, because 96-127 is full.
[14:01:47] Shuai Zhao_web_256 leaves the room
[14:01:52] Chris Wendt_web_646 leaves the room
[14:01:54] Youenn Fablet_web_251 leaves the room
[14:01:55] Tim Bruylants_web_697 leaves the room
[14:01:57] Harald Alvestrand_web_787 leaves the room
[14:01:58] <Justin Uberti_web_561> yep. we're already at 32+ in webrtc
[14:02:05] Justin Uberti_web_561 leaves the room
[14:02:16] Murray Kucherawy_web_938 leaves the room
[14:02:22] Yago Sanchez_web_634 leaves the room
[14:02:46] <Cullen Jennings_web_631> well, the range 96-127 is not a limit for any real usage, agree with that
[14:03:14] Alex Gouaillard_web_235 leaves the room
[14:03:19] Alex Gouaillard_web_439 joins the room
[14:03:31] Justin Uberti_web_614 joins the room
[14:03:33] <Cullen Jennings_web_631> But my point remains, if that is a problem, let's fix it
[14:03:49] Mo Zanaty_web_434 leaves the room
[14:03:58] <Cullen Jennings_web_631> That said, I think we should be able to indicate use of SFRAME without using *any* additional PT
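The payload type pressure discussed above can be illustrated with a small allocator. This is a rough sketch, not how any browser assigns payload types: the dynamic range 96-127 holds 32 values, and the fallback range 35-63 is an assumption chosen here because those values are below 64 and not statically assigned. The format names and the pairing of each format with an RTX entry are hypothetical.

```python
PRIMARY = range(96, 128)   # dynamic payload type range: 32 values
FALLBACK = range(35, 64)   # assumed overflow range (below 64, unassigned)


def allocate_payload_types(formats):
    """Assign payload types in order, spilling into FALLBACK when PRIMARY is full."""
    pool = list(PRIMARY) + list(FALLBACK)
    if len(formats) > len(pool):
        raise ValueError("not enough payload types for this offer")
    return dict(zip(formats, pool))


# Hypothetical offer: 20 media formats, each paired with an RTX format.
formats = [f"fmt{i}" for i in range(20)]
formats += [f"{name}/rtx" for name in formats[:20]]

mapping = allocate_payload_types(formats)
overflow = [name for name, pt in mapping.items() if pt in FALLBACK]
print(len(mapping), len(overflow))
```

With 40 entries, 8 of them land outside 96-127, which is the kind of per-offer count Cullen asks to see on the list. Adding one more payload type per format to signal a transform would push further into the fallback range, which is one reason to prefer indicating SFRAME without any additional PT.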
[14:04:25] Jonathan Lennox_web_627 leaves the room
[14:04:26] Sergio Garcia Murillo_web_207 leaves the room
[14:04:26] Tobia Castaldi_web_385 leaves the room
[14:04:26] Alex Gouaillard_web_439 leaves the room
[14:04:26] Cullen Jennings_web_631 leaves the room
[14:04:26] Justin Uberti_web_614 leaves the room
[14:04:26] Alessandro Amirante_web_274 leaves the room
[14:05:37] Jonathan Lennox joins the room
[14:14:19] alex-meetecho leaves the room
[14:40:50] hta is now known as iabopen
[14:41:11] iabopen leaves the room
[16:00:00] James Gruessing leaves the room
[17:34:51] Jonathan Lennox leaves the room
[17:35:55] Jonathan Lennox joins the room
[18:19:38] Jonathan Lennox leaves the room
[20:18:08] James Gruessing joins the room
[21:01:21] James Gruessing leaves the room