IETF
pearg
pearg@jabber.ietf.org
Monday, July 25, 2022
Meetecho has set the subject to: PEARG IETF 113

GMT+0
[05:33:01] Matthew joins the room
[05:33:01] sftcd joins the room
[05:33:02] npd joins the room
[05:33:03] cjsu joins the room
[05:33:59] zulipbot joins the room
[09:37:08] FXTIA joins the room
[10:17:44] FXTIA leaves the room
[10:18:01] FXTIA joins the room
[10:40:07] FXTIA leaves the room
[10:40:24] FXTIA joins the room
[17:11:36] FXTIA leaves the room
[17:11:54] FXTIA joins the room
[17:23:51] <zulipbot> (Christian Huitema) Good afternoon everyone!
[17:27:55] <zulipbot> (Christopher Patton) Hi!
[17:29:18] <zulipbot> (Randy Bush) any chance some can do a sound check?
[17:29:32] <zulipbot> (Sara Dickinson) Hi All - I believe we need a note taker, if anyone is willing to volunteer?
[17:29:45] <zulipbot> (Randy Bush) well, thank you
[17:29:59] <zulipbot> (Christian Huitema) yes, we hear you. At least some of us do
[17:30:12] <zulipbot> (Sofia Celi) hearing here as well ;)
[17:30:39] <zulipbot> (Randy Bush) chris, check room mic please
[17:31:33] <zulipbot> (Mike Rosulek) as a remote speaker, will I screen-share my own slides or will they be operated by someone in the room?
[17:32:01] <zulipbot> (Christopher Patton) I can take notes.
[17:32:01] <zulipbot> (Sara Dickinson) We can share your slides so you can control them via Meetecho
[17:32:40] <zulipbot> (Mike Rosulek) ok great, as long as I don't have to command a person to advance the slide a gazillion times ;) -- I used a lot of overlays
[17:33:37] <zulipbot> (Rob Austein) I just lost audio.  Just me?
[17:33:51] <zulipbot> (Randy Bush) /me haz audio
[17:34:04] <zulipbot> (Mike Rosulek) audio is still working for me
[17:34:05] <zulipbot> (Rob Austein) Ack, tnx
[17:36:06] <zulipbot> (Martin Thomson) @**Sara Dickinson** I think that you can hand control of the slides over to Sofía
[17:37:07] <zulipbot> (Sara Dickinson) OK - I think sofia should have control now
[17:37:21] <zulipbot> (Christopher Wood) Yes, it looks that way.
[17:38:42] <zulipbot> (Sara Dickinson) @Mike - yes we have the power to hand you control of your slides (Thanks @Martin for the prompt!)
[17:40:02] <zulipbot> (Rob Austein) Since Meetecho gets blamed for everything even when it's not their fault: it was not their fault, totally local issue (bluetooth transceiver glitched)
[17:42:45] <npd> what do we mean by "costly" in describing RAPPOR? just that it's less efficient per report, because there is client-side randomization/noise?
[17:48:19] <zulipbot> (Mike Rosulek) @Sofía: can you list some functions that would be useful to compute using PPM techniques, but for which we currently lack efficient techniques? (asking from a research perspective)
[17:49:35] <zulipbot> (Christopher Wood) Given the time remaining, we may have to take Q&A offline to the chat.
[17:50:04] <zulipbot> (Martin Thomson) I don't think that this characterization is correct for STAR
[17:50:46] <zulipbot> (Shivan Sahib) Definitely taking all Qs to chat :)
[17:51:35] <zulipbot> (Martin Thomson) Prio is probably on the cheap side, though it might depend on the circuit you are executing.  A simple sum over a small range of input values is pretty cheap.
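For readers wondering why a simple Prio-style sum is cheap: its core is additive secret sharing, so each aggregator only adds numbers. The sketch below assumes two non-colluding aggregators and a toy prime modulus (`MODULUS`, `share`, and `aggregate` are hypothetical names). It omits the validity proof that real Prio uses to check that each input lies in the allowed range, so it is an illustration, not the Prio or VDAF specification.

```python
import secrets

# Toy prime modulus (assumption); real deployments use specific finite fields.
MODULUS = 2**61 - 1


def share(value: int, num_aggregators: int = 2) -> list[int]:
    """Split `value` into additive shares that sum to it modulo MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(num_aggregators - 1)]
    shares.append((value - sum(shares)) % MODULUS)
    return shares


def aggregate(client_shares: list[list[int]]) -> int:
    """Each aggregator sums its own column of shares; only totals are combined."""
    num_aggregators = len(client_shares[0])
    partial_sums = [
        sum(shares[i] for shares in client_shares) % MODULUS
        for i in range(num_aggregators)
    ]
    return sum(partial_sums) % MODULUS


if __name__ == "__main__":
    reports = [share(v) for v in [3, 0, 7, 1]]  # hypothetical client inputs
    print(aggregate(reports))
```

Any single share is uniformly random, so one aggregator alone learns nothing about an individual input; combining the partial sums yields 11 for the inputs above.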
[17:52:30] <zulipbot> (Martin Thomson) @_**Mike Rosulek|681** [said](https://zulip.ietf.org/#narrow/stream/289-pearg/topic/jabber/near/20721):
```quote
@Sofía: can you list some functions that would be useful to compute using PPM techniques, but for which we currently lack efficient techniques? (asking from a research perspective)
```
You should look at the attribution problem.  Private set intersection or similar protocols are currently quite difficult.
[17:53:11] <zulipbot> (Deirdre Connolly) 👏
[17:53:20] <npd> thank you for the presentation!
[17:53:20] <npd> there are lots of options to summarize and it's good to start thinking about comparison. and I really appreciate explicitly considering those questions of whether users understand or how they would choose whether to participate
[17:53:24] <zulipbot> (Christian Huitema) When it comes to telemetry, there are two usages. Statistics is one. But debugging requires looking at complete sessions. Are there techniques for that too?
[17:54:36] <zulipbot> (Sofia Celi) @_**Jabber|59** [said](https://zulip.ietf.org/#narrow/stream/289-pearg/topic/jabber/near/20692):
```quote
npd: what do we mean by "costly" in describing RAPPOR? just that it's less efficient per report, because there is client-side randomization/noise?
```
yes. The PROCHLO paper goes into more details
[17:54:49] <zulipbot> (Martin Thomson) @**Christian Huitema** the problem with debugging is that you often don't know a priori what you will need to know, which tends to lead to systems that gather everything
[17:54:49] <zulipbot> (Mike Rosulek) Thanks @Martin, if you have a good reference for this attribution problem, I'd love to see it. I like PSI-related problems ;)
[17:55:17] <zulipbot> (Martin Thomson) @**Mike Rosulek** something I've been working on is ad attribution - you want to count correlated events in contexts where you don't want to reveal the correlation
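One textbook way to count correlated events without revealing which ones match is a Diffie-Hellman-style PSI-cardinality protocol. The sketch below is an illustrative assumption, not the design in the linked issue: it uses a toy group (integers modulo the Mersenne prime 2^127 - 1) that is not secure, hashes with SHA-256, and runs both parties inside one function. `psi_cardinality` and the other names are hypothetical.

```python
import hashlib
import secrets

# Toy group parameters (insecure, illustration only): integers modulo a Mersenne prime.
P = 2**127 - 1


def hash_to_group(item: str) -> int:
    """Map an item to a nonzero element modulo P."""
    digest = int.from_bytes(hashlib.sha256(item.encode()).digest(), "big")
    return digest % (P - 1) + 1


def blind(items, key: int) -> list[int]:
    return [pow(hash_to_group(x), key, P) for x in items]


def psi_cardinality(set_a: set[str], set_b: set[str]) -> int:
    """Count |A ∩ B| using commutative blinding: (H(x)^a)^b == (H(x)^b)^a mod P."""
    key_a = secrets.randbelow(P - 2) + 1  # party A's secret exponent
    key_b = secrets.randbelow(P - 2) + 1  # party B's secret exponent

    # A -> B: A's items blinded with key_a.
    from_a = blind(set_a, key_a)
    # B re-blinds A's values with key_b and shuffles them, so A learns
    # how many of its items matched, but not which ones.
    double_a = [pow(v, key_b, P) for v in from_a]
    secrets.SystemRandom().shuffle(double_a)
    # B -> A: B's items blinded with key_b; A re-blinds them with key_a.
    double_b = {pow(v, key_a, P) for v in blind(set_b, key_b)}

    return sum(1 for v in double_a if v in double_b)
```

In a real deployment each party would run its half separately over a proper prime-order group (for example an elliptic curve), and the output itself would typically be protected further, since even an exact count can leak information.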
[17:55:54] <zulipbot> (Rob Austein) One could envision two levels of debug, one still sort of privacy preserving, the other just show me everything.  The challenge would be making the first useful enough that anybody would bother to use it.
[17:56:07] <zulipbot> (Martin Thomson) This is old, but there's a sketch of a solution here: https://github.com/patcg/private-measurement/issues/9
[17:56:20] <zulipbot> (Sofia Celi) @_**Mike Rosulek|681** [said](https://zulip.ietf.org/#narrow/stream/289-pearg/topic/jabber/near/20721):
```quote
@Sofía: can you list some functions that would be useful to compute using PPM techniques, but for which we currently lack efficient techniques? (asking from a research perspective)
```
Depends on the scheme used. For Prio-based schemes, there is a more efficient system called 'Prio+' that uses Boolean circuits. Unfortunately, not all aggregate functions work in the Boolean setting, so one translates from Boolean to arithmetic. It would be nice to extend all functions to the Boolean setting.
[17:56:49] <zulipbot> (Sofia Celi) and, of course, Prio-based schemes (except for POPLAR) are all limited to numeric data types
[17:56:55] <npd> @sofia "costly" in terms of efficiency of every report is pretty different from costly-for-the-client or costly-and-infeasible-to-adopt. some measurements will take more reports to gather with RAPPOR style systems. is that completely unacceptable?
[17:57:31] <zulipbot> (Sofia Celi) @_**Martin Thomson|26** [said](https://zulip.ietf.org/#narrow/stream/289-pearg/topic/jabber/near/20750):
```quote
Prio is probably on the cheap side, though it might depend on the circuit you are executing.  A simple sum over a small range of input values is pretty cheap.
```
Yes, but only over numeric values.
[17:57:44] <zulipbot> (Nils Wisiol) Sofia, why is the randomness server needed in STAR? What does it do that the client cannot do by itself?
[17:57:58] <zulipbot> (Martin Thomson) @**Sofia Celi** one of the things you didn't emphasize enough in comparing poplar and star is that star can do arbitrary strings, whereas the strings in poplar are finite length, which can be difficult for some applications
[17:58:24] <zulipbot> (Sofia Celi) @_**Martin Thomson|26** [said](https://zulip.ietf.org/#narrow/stream/289-pearg/topic/jabber/near/20820):
```quote
@**Sofia Celi** one of the things you didn't emphasize enough in comparing poplar and star is that star can do arbitrary strings, whereas the strings in poplar are finite length, which can be difficult for some applications
```
oh yes! I should add that to the note. Thank you!
[17:59:03] <zulipbot> (Shivan Sahib) @Nils Wisiol come to the PPM session on Thursday where we'll be discussing STAR :) but in a nutshell, some input data doesn't have sufficient entropy to ensure it's not easy for the Aggregation Server to break
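To illustrate Shivan's point about low-entropy inputs, here is a much simplified STAR-like sketch: each client derives Shamir polynomial coefficients from its measurement, so reports of the same value lie on the same polynomial and become recoverable once `THRESHOLD` of them arrive. The randomness server is modelled as a keyed HMAC (`randomness_server_eval`, hypothetical); in STAR it is an oblivious PRF so that it never sees the measurement. Without that server-held key, an aggregation server could recompute the seed for every guessable measurement and skip the threshold entirely. Encryption of the measurement itself is omitted, and all names and parameters are assumptions.

```python
import hashlib
import hmac
import secrets
from collections import defaultdict

PRIME = 2**127 - 1  # field for Shamir sharing (toy choice)
THRESHOLD = 3       # reports needed before a value can be opened (hypothetical)

# Hypothetical stand-in for STAR's randomness server (an OPRF in STAR).
_SERVER_KEY = secrets.token_bytes(32)


def randomness_server_eval(measurement: str) -> bytes:
    return hmac.new(_SERVER_KEY, measurement.encode(), hashlib.sha256).digest()


def derive_coeffs(seed: bytes) -> list[int]:
    """Deterministic polynomial coefficients; coeffs[0] is the shared secret."""
    return [
        int.from_bytes(hashlib.sha256(seed + bytes([i])).digest(), "big") % PRIME
        for i in range(THRESHOLD)
    ]


def client_report(measurement: str) -> tuple[str, tuple[int, int]]:
    seed = randomness_server_eval(measurement)
    coeffs = derive_coeffs(seed)
    x = secrets.randbelow(PRIME - 1) + 1  # fresh random evaluation point
    y = sum(c * pow(x, i, PRIME) for i, c in enumerate(coeffs)) % PRIME
    tag = hashlib.sha256(b"tag" + seed).hexdigest()  # groups identical measurements
    return tag, (x, y)


def recover_secret(points: list[tuple[int, int]]) -> int:
    """Lagrange interpolation at x = 0 over GF(PRIME)."""
    secret = 0
    for j, (xj, yj) in enumerate(points):
        num, den = 1, 1
        for m, (xm, _) in enumerate(points):
            if m != j:
                num = num * (-xm) % PRIME
                den = den * (xj - xm) % PRIME
        secret = (secret + yj * num * pow(den, -1, PRIME)) % PRIME
    return secret


def aggregate_reports(reports) -> dict[str, int]:
    groups = defaultdict(list)
    for tag, point in reports:
        groups[tag].append(point)
    # Only groups that reach the threshold can be opened.
    return {
        tag: recover_secret(points[:THRESHOLD])
        for tag, points in groups.items()
        if len(points) >= THRESHOLD
    }
```

The recovered value is the polynomial's constant term; in STAR, secret material of this kind is used to derive the key that decrypts the measurement, a step left out here.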
[17:59:16] <zulipbot> (Sofia Celi) @_**Jabber|59** [said](https://zulip.ietf.org/#narrow/stream/289-pearg/topic/jabber/near/20812):
```quote
npd: @sofia "costly" in terms of efficiency of every report is pretty different from costly-for-the-client or costly-and-infeasible-to-adopt. some measurements will take more reports to gather with RAPPOR style systems. is that completely unacceptable?
```
It is not infeasible to adopt. It was used some time ago, but I think it was superseded by PROCHLO. Depends on your needs.
[17:59:57] <zulipbot> (Martin Thomson) Talking to the folks at Google who deployed RAPPOR, I think that they might disagree about it being feasible.  The quality of the data it produced was pretty poor.
[18:00:26] <zulipbot> (Shivan Sahib) @Sofia Celi, maybe "efficiency" is more accurate than "costly" for RAPPOR?
[18:00:27] <zulipbot> (Shivan Sahib) @Martin, that's also what I've heard
[18:01:13] <zulipbot> (Martin Thomson) The problem, as I understand it, was that the layers of randomness that RAPPOR added were all per-record noise (local DP), which loses the advantages of central DP and just makes the output close to pure noise
[18:01:26] <zulipbot> (Sofia Celi) @_**Shivan Sahib|765** [said](https://zulip.ietf.org/#narrow/stream/289-pearg/topic/jabber/near/20853):
```quote
@Sofia Celi, maybe "efficiency" is more accurate than "costly" for RAPPOR?
```
yes! I don't remember what I put in the slides. If I only said it aloud, it's probably because I got nervous and forgot the word haha ;)
[18:04:11] <zulipbot> (Chris Box) Barath's presentation is really clear. Useful to have this terminology.
[18:05:08] <zulipbot> (Nick Doty) I had thought the premise of RAPPOR was that there was per-record noise but that they showed that an average could still give increasingly useful data as the population size increased. but if they reported later that actually it didn't work out too well, then I need to read more of the latest writing
[18:06:34] <zulipbot> (Nick Doty) is it that users don't care whether they reveal they use a service? or just that we haven't provided it, and they've had the risk as mostly inevitable?
[18:06:47] <zulipbot> (Matthew Finkel) @_**Martin Thomson|26** [said](https://zulip.ietf.org/#narrow/stream/289-pearg/topic/jabber/near/20864):
```quote
The problem, as I understand it, was that the layers of randomness that RAPPOR added were all per-record noise (local DP), which loses the advantages of central DP and just makes the output close to pure noise
```
But that is the ideal; we just don't know how to get data with utility at the end. Local DP should be a goal in some systems.
[18:07:07] <zulipbot> (Sofia Celi) The overall magnitude of this added random noise (which is Gaussian noise) can be very large in RAPPOR: even in the theoretical best case, the standard deviation grows in proportion to the square root of the survey count (√n, where n is the number of participants), and in practice the randomness is higher by an order of magnitude. Thus, if a billion users' data are analyzed, a common signal from even up to a million reports may be missed.
[18:07:33] <zulipbot> (Antoine Fressancourt) I am puzzled by the fact that sensitivity is considered in absolute terms, while I think it depends on the observer's interests and capabilities (imho)
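The √n growth can be checked with a simplified model. The sketch below assumes plain single-bit ε-local-DP randomized response (each client reports its true bit with probability e^ε/(1+e^ε)), not RAPPOR's Bloom-filter encoding, and compares it with a central model in which a trusted curator adds Laplace noise of scale 1/ε once to the total count. The function names are hypothetical.

```python
import math


def local_rr_std(n: int, epsilon: float) -> float:
    """Std. dev. of the debiased count under binary randomized response (local DP)."""
    p = math.exp(epsilon) / (1 + math.exp(epsilon))  # prob. of reporting the true bit
    # Each reported bit has variance p * (1 - p) regardless of the true bit;
    # debiasing divides the noisy count by (2p - 1).
    return math.sqrt(n * p * (1 - p)) / (2 * p - 1)


def central_laplace_std(epsilon: float) -> float:
    """Std. dev. of Laplace noise (sensitivity 1, scale 1/epsilon) added once centrally."""
    return math.sqrt(2) / epsilon


if __name__ == "__main__":
    for n in (10**6, 10**9):
        print(f"n={n:>13,}  local={local_rr_std(n, 1.0):>10.0f}  "
              f"central={central_laplace_std(1.0):.2f}")
```

At ε = 1, the local estimate's standard deviation is in the tens of thousands for a billion clients, while the central noise stays near 1.4 regardless of n, which is the per-record noise concern raised in this thread.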
[18:07:59] <zulipbot> (Andrew Campling) With Oblivious DNS you have to trust that the decoupling is applied as it is not forced by the protocol.  You can have colluding proxies.
[18:08:25] <zulipbot> (Shivan Sahib) Non-collusion is an assumption for all of these systems
[18:08:37] dkg joins the room
[18:08:40] <zulipbot> (Martin Thomson) @**Matthew Finkel**  Yeah, I agree that local DP can be useful, but the trade-off is often very poor.  There is a continuum between "no noise" and "no utility", which is connected by a line where you have both "too much noise" and "too little utility".  I generally don't think that local DP is useful unless you have an absurdly large set of inputs.
[18:08:54] <zulipbot> (Andrew Campling) Discovery and selection of proxies by the user / client software could help avoid collusion
[18:09:50] <zulipbot> (Sofia Celi) @_**Nick Doty|550** [said](https://zulip.ietf.org/#narrow/stream/289-pearg/topic/jabber/near/20884):
```quote
is it that users don't care whether they reveal they use a service? or just that we haven't provided it, and they've had the risk as mostly inevitable?
```
Not so sure I follow. But, from what I understand (sorry if I misunderstand): privacy is a main concern of these schemes, but user consent should be one too, meaning that some users, even if they know it is private, might not want to participate in survey x.
[18:10:22] <zulipbot> (Nick Doty) we shouldn't assume that our failure to provide privacy is evidence that users don't care about the privacy of that information
[18:11:01] <zulipbot> (Martin Thomson) this SSH design sounds awful
[18:11:14] <zulipbot> (Sofia Celi) @_**Nick Doty|550** [said](https://zulip.ietf.org/#narrow/stream/289-pearg/topic/jabber/near/20915):
```quote
we shouldn't assume that our failure to provide privacy is evidence that users don't care about the privacy of that information
```
yes! one thing also that some papers have highlighted is that good explanations of privacy matter to users. It is not enough to say 'it is private'
[18:11:27] <zulipbot> (Andrew Campling) There is research that shows US consumers would prefer to preserve privacy but are not provided options to do so
[18:11:40] <zulipbot> (Sofia Celi) one of those studies: https://arxiv.org/abs/2110.06452
[18:12:21] <zulipbot> (Alissa Cooper) @_**Nick Doty|550** [said](https://zulip.ietf.org/#narrow/stream/289-pearg/topic/jabber/near/20884):
```quote
is it that users don't care whether they reveal they use a service? or just that we haven't provided it, and they've had the risk as mostly inevitable?
```
I also found this a little overbroad -- it seems that the motivation for quite a number of uses of mixnets, Tor, etc. is that for those use cases/users they don't want to reveal that they are using a service. It might not be true for the broad population, but for vulnerable populations it is.
[18:13:16] <zulipbot> (Matthew Finkel) @**Martin Thomson** Yeah, that's unfortunate. Users are more likely to believe the privacy benefits of DP if the protections are achieved before the data leaves their device. That's not to say people understand DP, but if we are going to say people should enable telemetry because we use DP, then central DP is a bit sketchy.
[18:13:47] <zulipbot> (Erik Nygren) @mt: I still have scars from trying to work through the SSH Protocol 1 to Protocol 2 migration long ago.
[18:14:26] <zulipbot> (Alissa Cooper) More broadly regarding the decoupling principle, how can it be used in an actionable way in protocol design? It kind of seems like the caveats swallow the value (e.g., if you declare a protocol "conforms" to the decoupling principle when in reality in deployment there are side channels that allow user re-identification).
[18:15:07] <zulipbot> (Matthew Finkel) Hrm. Deniable SSH auth?
[18:15:49] <zulipbot> (Christopher Wood) @Matthew Relevant: https://privacytools.seas.harvard.edu/publications/algorithmic-institutional-logics-politics-differential-privacy
[18:17:16] <zulipbot> (Barath Raghavan) I think of decoupling as a design test, but not the end-all. And side channels tend to be discovered over time, so at the time of design I think we want to plug as many side channels as possible with the expectation that new ones will be found (and then mitigated). However within the context of that mitigation, I think we'd like the architecture itself to have privacy baked in, and that's where decoupling may be beneficial.
[18:18:37] <zulipbot> (Shivan Sahib) I guess it's also interesting for the trust model when the decoupling proxy is used for some other function in the protocol. For example, in STAR you could potentially host the Randomness Server on the OHAI proxy. But this means that if the entity controlling the server colludes with the randomness server, it also automatically colludes with the decoupling proxy
[18:22:19] <zulipbot> (Christopher Patton) @mike: concrete performance is client or server side?
[18:23:11] <zulipbot> (Christopher Patton) Thanks!
[18:23:11] <zulipbot> (Erik Nygren) How does this work when different client keys have different configuration and constraints server-side?
[18:25:45] <zulipbot> (Martin Thomson) @**Mike Rosulek** why would you reveal the repository name?  some of those are private/secret
[18:27:35] <zulipbot> (Mike Rosulek) @Martin the SSH username (repository name in my example) is sent after the client authenticates the identity of the server. I think there is no way to avoid having github learn what repository you are accessing, but the repository name is not sent in the clear for eavesdroppers to see.
[18:28:03] <zulipbot> (Nick Doty) to follow our decoupling model, doesn't the server always learn the sensitive data in the contents of the communication, like which data you are accessing that is served by that server?
[18:29:33] <zulipbot> (Barath Raghavan) @Nick yes in most cases the server/recipient does learn sensitive information because you want something from them. So, the notion here is that at that moment you may not need to reveal your identity (unless your identity is bound to the request itself, and even then decoupling is sometimes possible)
[18:29:49] <zulipbot> (Mike Rosulek) @Erik: I think I now understand your question: our protocol only makes sense if the server can specify a set of keys, all of which are authorized for some requested action. If some keys are differently authorized then it wouldn't make sense to hide the identity of the key
[18:31:14] <zulipbot> (Mike Rosulek) @Nick: we don't have a great killer application for deniability about which key was used -- you could use github to set up an anonymous bulletin board where no one knows which of the authorized users has made any given commit
[18:31:14] <zulipbot> (Sofia Celi) @**Mallory Knodel** user consent prob in PPM schemes
[18:31:27] <zulipbot> (Christopher Patton) +1 Sofia
[18:31:53] <zulipbot> (Nick Doty) /me waves
[18:31:54] <zulipbot> (Sofia Celi) maybe a 'user considerations' draft for PPM
[18:32:08] <zulipbot> (Martin Thomson) @_**Mike Rosulek|681** [said](https://zulip.ietf.org/#narrow/stream/289-pearg/topic/jabber/near/21031):
```quote
@Martin the SSH username (repository name in my example) is sent after the client authenticates the identity of the server. I think there is no way to avoid having github learn what repository you are accessing, but the repository name is not sent in the clear for eavesdroppers to see.
```
In the case that a client probes a repo that it maybe doesn't know exists, doesn't the client learn the number of keys that the server accepts?  Which might reveal if the repo exists...
[18:32:08] <zulipbot> (Barath Raghavan) @Nick ah, good question -- I haven't thought through that
[18:32:28] <zulipbot> (Mike Rosulek) @Martin: yes that is an excellent point
[18:33:39] <zulipbot> (Jonathan Hoyland) @Meetecho mics still hot.
[18:34:20] <zulipbot> (Lorenzo Miniero) @**Jonathan Hoyland** what does that mean? My Italianglish parser can't compute :)
[18:34:21] <zulipbot> (Mike Rosulek) @Martin: to be more precise, the number of keys is not even leaked if all the keys are ECDSA/EdDSA, so in that case the server can easily give a dummy response even for a non-existent repository; then the PSI will fail
[18:34:48] <zulipbot> (Jonathan Hoyland) The microphone is still live, even though the session has ended, and people would not expect to be caught by it.
[18:35:03] <zulipbot> (Jonathan Hoyland) s/caught/picked up/
[18:35:22] <zulipbot> (Lorenzo Miniero) Ah ok, we probably didn't notice it ended: next time please write a note on the chat so that we're aware (watching 8 videos at the same time can cause us to miss things)
[18:56:47] dkg leaves the room