• Dropped messages in for-await

    Swift concurrency has a feature called for-await that can iterate over an AsyncSequence. Combine has a .values property that can turn a Publisher into an AsyncSequence. This feels like a perfect match! But the combination is surprisingly subtle, and it's very easy to drop messages if you're not careful.

    Consider the following example (full code is at the end).

            // A NotificationCenter publisher that emits Int as the object.
            // (Yes, this is an abuse of `object`. Hush. I'm making the example simpler.)
    
            let values = nc.publisher(for: name)
                .compactMap { $0.object as? Int }
                .values
    
            // Loop over the notifications... right?
            for await value in values {
            // At this point, nothing is "subscribed" to values, so messages will be dropped until the next iteration begins.
    
                // ... Process notification ...
            }
    

    This feels right, but it’s subtly broken and will drop notifications. AsyncPublisher provides no buffer. If nothing is subscribed, then items will be dropped. This makes sense. Imagine if .values did store all of the elements published until they were consumed. Then if I failed to actually consume it, it would leak memory. (We can argue about the precise meaning of “leak” here, but still, it would grow memory without bound.) Just creating an AsyncPublisher shouldn’t do that. Nothing else works like that in Combine or in Swift Concurrency. An AsyncStream is a value. You should be able to stick it in a variable to use later without leaking memory. (Ma’moun has made me rethink this. It’s true that this is how it works, but I’m now torn a bit more on whether it should.)

    Similarly, the fact that for-await doesn’t create a subscription makes sense. In what way would it do that? That’s not how AsyncSequence works. Its job is to call makeAsyncIterator() and then repeatedly call next(). It doesn’t know about buffering or Subscriber or any of that. And say makeAsyncIterator() could take buffering parameters. Where would they go in the for-await syntax?
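    Roughly, for-await desugars to nothing more than this, which shows why there’s nowhere for buffering configuration to go:

            var iterator = values.makeAsyncIterator()
            while let value = await iterator.next() {
                // ... process value ...
            }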

    The answer to all of this is that you need a buffer, and it’s your job to configure it. If you want an “infinite” buffer (which is what people usually think they want), then it looks like this:

             let values = nc.publisher(for: name)
                .compactMap { $0.object as? Int }
                .buffer(size: .max, prefetch: .byRequest, whenFull: .dropOldest) // <----
                .values
    

    And IMO this probably is the most sensible way to solve this, even if the syntax is a bit verbose. Obviously we could add a bufferedValues(...) extension to make it a little prettier….
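    For example, here’s a minimal sketch of what that could look like (bufferedValues is my invention here, not an existing Combine API):

            import Combine

            extension Publisher where Failure == Never {
                /// An AsyncSequence over this publisher's values, buffered so that
                /// elements arriving while the loop body runs aren't dropped.
                func bufferedValues(
                    size: Int = .max,
                    prefetch: Publishers.PrefetchStrategy = .byRequest,
                    whenFull: Publishers.BufferingStrategy<Failure> = .dropOldest
                ) -> AsyncPublisher<Publishers.Buffer<Self>> {
                    buffer(size: size, prefetch: prefetch, whenFull: whenFull).values
                }
            }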

    BUT….

    Yeah, nobody remembers this, even if they’ve heard about it before. .values is just so easy to reach for. And the bug is a subtle race condition that drops messages. And you can’t easily unit test for it. And the compiler probably can’t warn you about it. And this problem exists in any situation where an AsyncSequence “pushes” values, which is basically every observation pattern, even without Combine.

    And so I struggle with whether to encourage for-await. Every time you see it, you need to think pretty hard about what’s going on in this specific case. And unfortunately, that’s kind of true of AsyncSequence generally. I’m not sure what to think about this yet. Most of my bigger projects use Combine for these kinds of things currently, and it “just works,” including unsubscribing automatically when the AnyCancellable is deinited (another thing that’s easy to mess up with for-await). I just don’t know yet.

    ADDENDUM

    I strangely forgot to also write about NotificationCenter.notifications(named:), which goes directly from NotificationCenter to AsyncSequence. It’s a good example of the subtlety. It has the same dropped-messages issue:

           // Also drops messages if they come too quickly, but not as many as an unbuffered `.values`.
           let values = nc.notifications(named: name)
                .compactMap { $0.object as? Int }
    
            for await value in values { ... }
    

    Unlike the Combine version, I don’t know how to fix this one. (Maybe this should be considered a Foundation bug? But maybe it’s just “as designed?”) After experimenting a bit, I believe the buffering policy is .bufferingNewest(8). If more than 8 notifications come in during your processing loop, you’ll miss some. Should you send notifications that fast? Maybe not? I don’t know. But the bugs are definitely subtle if you do.

    Here’s the full code to play with:

    @MainActor
    struct ContentView: View {
        let model = Model()
    
        var body: some View {
            Button("Send") { model.sendMessage() }
                .onAppear { model.startListening() }
        }
    }
    
    @MainActor
    class Model {
        var lastSent = 0
        var lastReceived = 0
    
        let nc = NotificationCenter.default
        let name = Notification.Name("MESSAGE")
    
        var listenTask: Task<Void, Error>?
    
        func sendMessage() {
            lastSent += 1
            nc.post(name: name, object: lastSent)
        }
    
        // Set up an infinite for-await loop, listening to notifications until canceled.
        func startListening() {
            listenTask?.cancel()
            listenTask = Task {
                var lastReceived = 0
    
                let values = nc.publisher(for: name).values
                    .compactMap { $0.object as? Int }
    
                for await value in values {
                    // At this point, nothing is "subscribed" to values, so messages will be dropped.
                    let miss = value == lastReceived + 1 ? "" : " (MISS)"
                    print("Received: \(value)\(miss)")
                    lastReceived = value
                    // Sleep to make it easier to see dropped messages.
                    try await Task.sleep(for: .milliseconds(500))
                }
            }
        }
    
        deinit {
            listenTask?.cancel()
        }
    }
    
  • Externalizing properties to manage retains

    I have a very simple type in my system. It’s just an immutable struct:

    public struct AssetClass: Sendable, Equatable, Hashable, Comparable {
        public let name: String
        public let commodities: [String]
        public let returns: Rates
    
        public static func < (lhs: AssetClass, rhs: AssetClass) -> Bool {
            lhs.name < rhs.name
        }
        
        public static func == (lhs: AssetClass, rhs: AssetClass) -> Bool {
            lhs.name == rhs.name
        }
    
        public func hash(into hasher: inout Hasher) {
            hasher.combine(name)
        }
    }
    

    It has a unique name, a short array of symbols that are part of this asset class, and a couple of doubles that define the expected returns. My system uses this type a lot. In particular, it’s used in the rebalancing stage of Monte Carlo testing portfolios. There are only 5 values of this type in the system, but they are used tens of millions of times across 8 threads.

    They’re value types. They’re immutable value types. So no problem. Except…

    [Instruments screenshot: retain and release account for over 50% of all time]

    ???

    Yes, across all threads, swift_retain and swift_release in this one function represent over half of my run time (wall clock, that’s about 12 seconds; the times here count all threads).

    But… structs. It’s all structs. Why? What?

    Most Swift Collections (Array, String, Dictionary, etc.) in most cases store their contents on the heap in a reference-counted class. This is a key part of copy-on-write optimizations, and it’s usually a very good thing. But it means that copying the value, which happens many times when it’s passed to a function or captured into a closure, requires a retain and release. And those have to be atomic, which means there is a lock. And since this value is read on 8 threads all doing basically the same thing, there is a lot of contention on that lock, and the threads spend a lot of time waiting on each other just for reference counting.
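    Conceptually, the mechanics look like this (names are illustrative, not my real types):

    struct AssetLike {
        let name: String       // short strings store inline; no heap
        let values: [Double]   // Array storage lives on the heap, reference-counted
    }

    let original = AssetLike(name: "stocks", values: [0.07, 0.12])

    // Copying the struct copies the Array's reference to its shared storage:
    // an atomic retain now, an atomic release when the copy goes away. Eight
    // threads doing this constantly contend on the same reference counts.
    let copy = original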

    In this case, none of the String values are a problem. Both the name and all the commodity symbols are short enough to fit in a SmallString (also lovingly called “smol” in the code). Those are inlined and don’t require heap storage.

    But that commodities Array. It’s small. The largest has 5 elements, and the strings are all 5 bytes or less. But it’s on the heap, and that means reference counting. And it isn’t actually used anywhere in the rebalancing code. It’s only used when loading data. So what to do about it?

    How about moving the data out of the struct and into a global map?

    import os // for OSAllocatedUnfairLock

    public struct AssetClass: Sendable, Equatable, Hashable, Comparable {
    
        // Global atomic map holding commodity values, keyed by asset class name.
        // All access to `commodities` now goes through .withLock.
        private static let commodityMap = OSAllocatedUnfairLock<[String: [String]]>(initialState: [:])
    
        public init(name: String, commodities: [String], returns: Rates) {
            precondition(Self.commodityMap.withLock { $0[name] == nil }, 
                         "AssetClass names must be unique: \(name)")
    
            self.name = name
            Self.commodityMap.withLock { $0[name] = commodities }
            self.returns = returns
        }
    
        public let name: String
        public var commodities: [String] { 
            Self.commodityMap.withLock { $0[name, default: []] }
        }
        public let returns: Rates
        // ...
    

    And just like that, 30% performance improvement. Went from 12s to 8s.

    This is a sometimes food. Do not go turning your structs inside out because Rob said it was faster. Measure, measure, measure. Fix your algorithms first. This slows down access to commodities, so if that were used a lot, this would be worse. And it relies on AssetClass being immutable. If there are setters, then you can get shared mutable state when you don’t expect it.

    But…for this problem. Wow.

    There’s more to do. Even after making this change, retain and release in this one function are still over 50% of all my runtime (it’s just a smaller runtime). But at least I have another tool for attacking it.

    Oh, and how did I know what kinds of objects were being retained? Try this as a starting point:

    Xcode breakpoint: set the symbol to swift_bridgeObjectRetain, set the action to “Debugger Command” with the command e -- (id)$arg1 (print the first argument, the object being retained, as an ObjC id), and check “Automatically continue.”

    Be warned: this produces a lot of output, it’s very slow, and it will tell you about low-level types that may not have any obvious correlation to your types. But it’s a starting point.

    OK, back to Instruments. I should definitely be able to get this under 5 seconds.

  • A short anecdote on getting answers on the modern internet

    This post is way off topic, and a bit in the weeds, and maybe even a bit silly, which is why I’m throwing it here rather than getting back to making my proper blog work. I really will do that, but… ok, not today.

    I’ve been trying to learn some abstract algebra. I never studied it in school, and sometimes I bump into people describing operations in “group-speak” (throwing around words like “abelian” and “symmetric group”), and I’d like to try to keep up with what they’re saying even if I probably will never do an actual proof in anger.

    Anyway, I got myself confused about the meaning of “order” when talking about groups vs elements. It’s the kind of dumb “how could you not know this at this point in the course?” question that you’re embarrassed to ask (and so I blog about it because I have an under-developed sense of shame). So… let’s ask some LLMs! And also search engines. And how would I figure this out?

    Good news: it took longer to write this up than to fix my misunderstanding. OK news: nothing lied to me (well, much). Bad news: well, we’ll get to that as it comes up.

    Here’s the exact text I stuck into a bunch of things:

    Is the order of a group the lcm of the order of its elements?

    To be clear to anyone who knows abstract algebra: I was confused and thought this might be the definition of order. I was misunderstanding the point of Lagrange’s theorem. To anyone who doesn’t know abstract algebra, the correct answer is just “the number of elements in the group.” It really is that simple, and I was making it really over-complicated.

    Kagi

    Kagi has become my go-to search engine. I like the model of “I pay you, and you provide a service, that’s it, that’s the whole business model.” It gave me several links that weren’t particularly useful (basically ones that prove Lagrange’s theorem, which assume you already know the answer to my question).

    But the third link was to a Saylor Academy textbook, which answered my question:

    In group theory, a branch of mathematics, the term order is used in two closely-related senses:

    The order of a group is its cardinality, i.e., the number of its elements.

    • …

    OK, that’s cleared up.

    Am I done? Of course not! I had to dig through three or four links, most of which confused me more, before I finally found the link to an open-source textbook, where I could go double-check the definition. The internet should be better than that!

    Other search engines

    To quickly summarize other search engines (because you’re all familiar with those, and of course the point is to get to LLMs):

    • Duck Duck Go – (OK) Second link was Wikipedia, which gives the answer in the first sentence. For completeness, searching for the same question on Wikipedia directly also gets you right to the page.

    • Google – (Bad) The Q&A section gives several links to different questions on Stack Exchange, Quora, and Studysmarter.us (?). Importantly, the second answer Google highlights is excerpted as: “Yes, in fact, the order of an element in a finite group…” (The “…” is in the Google excerpt.) You have to carefully notice that this is the answer to a different question. If you read quickly, you could easily walk away thinking “yes” is the answer. This is a big problem with search engine excerpting IMO. If you scroll down past the Q&A section, the first (unrelated) math.stackexchange answer, the “People also ask” section, and then two more links, you will finally come to the Wikipedia article that does answer the question and is where I should have started anyway.

    • Bing – (Good!) The highlighted answer links to Math.SE, and is extremely useful. It provides this directly on the search page:

    No. For instance if G = Z2 ⊕ Z2, then | G | = 4, but the least common multiple of the order of it’s group elements is 2.

    Bing added the bold on “No.” This is the only search that directly answered my question, with citation, in one step. The Math.SE link is a bit funny, though. Five years ago, someone had almost exactly the same question as I did. It was answered in a couple of useful ways, had an accepted answer, and then was closed as off-topic. (!?!?!) This question does not show up in most web searches (other than Bing) and does not show up (at least not prominently) when I put my search directly into math.se.

    Bing definitely won this one. Also, of course, just going to Wikipedia.

    The LLMs - Bing

    OK, but we all know what this was really about. How do the LLMs handle this kind of question? Well, let’s start with Bing, since it answered the question so well through search.

    Me: is the order of a group the lcm of the order of its elements?

    Bing: No, the order of a group is not necessarily the least common multiple of the order of its elements. For instance, if G = Z2 ⊕ Z2, then |G| = 4, but the least common multiple of the order of its group elements is 2.

    Precisely the same answer as in the “quick answer” from search, except this time it’s plagiarism. It is word-for-word the answer from Aweygan on math.se, but gives no link, citation, or even hint that this isn’t original content.

    Maybe I could get the citation by asking?

    Me: What’s the source of your answer?

    Bing: My answer is based on my internal knowledge and information. I am programmed to provide accurate and reliable information to the best of my ability.

    O_o

    Yeah.

    ChatGPT

    And of course I must check ChatGPT:

    No, the order of a group is not necessarily the least common multiple (lcm) of the orders of its elements.

    The order of a group is the number of elements in the group. On the other hand, the order of an element in a group is the smallest positive integer n such that the element raised to the nth power is the identity element of the group.

    Hey, that’s actually really good. Just the right amount of extra information. I’d like citations, of course, but just being right about basic facts without plagiarism is pretty good (after all, the math.SE answer doesn’t provide citations, either). I like this. This is good.

    And it goes on:

    It is true that the order of an element divides the order of the group, by Lagrange’s theorem. However, the lcm of the orders of the elements may be larger than the order of the group. For example, consider the cyclic group of order 6 generated by a. The group contains elements of orders 1, 2, 3, and 6. The lcm of these orders is 6, but the order of the group itself is only 6.

    Therefore, the order of a group cannot be expressed in terms of the lcm of the orders of its elements, in general.

    Ah, a counter-example where the lcm is larger than the order. Very helpful. In the case where the order is six and the lcm is…six?

    Oh ChatGPT. If only you’d quit while you were ahead.

  • A Reminder About NSRange and String

    The NSRange for a full string is NSRange(location: 0, length: (text as NSString).length). You cannot use text.count for this. NSRanges on strings are based on UTF-16 code points, not Swift Characters. A character like 👩‍👩‍👧‍👧 has a .count of 1, but a .length of 11. ObjC-based NSRange APIs (even ones ported to Swift) need the latter. Swift interfaces (which use Range rather than NSRange) need the former.

    Whenever possible, avoid NSRange. But when it’s needed, make sure to use it correctly. It’s very easy to make mistakes that break on non-ASCII, and especially on emoji.
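    A quick sketch of the difference (the counts in the comments assume this exact string):

    import Foundation

    let text = "family: 👩‍👩‍👧‍👧"

    print(text.count)        // 9: eight ASCII Characters plus one emoji Character
    print(text.utf16.count)  // 19: eight ASCII code units plus eleven for the emoji

    // The full range for an NSRange-based API:
    let fullRange = NSRange(location: 0, length: (text as NSString).length)

    // Or let Foundation do the UTF-16 conversion from a Swift range.
    // This produces the same range as above:
    let sameRange = NSRange(text.startIndex..., in: text)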

    (Side note: (text as NSString).length is the same as text.utf16.count. I don’t have a strong opinion on which to use. In both cases, future developers may get confused and try to “fix” the code back to text.count. (text as NSString) is nice because it ties it back to this being an NSString API and is easier IMO to see that something special is happening here, but it also looks like an auto-conversion barnacle. .utf16.count is shorter and calls out what is really happening, but I almost always type .utf8.count out of muscle-memory and it’s hard to see the mistake in code-review. So do what you want. A comment is probably warranted in either case. I hate NSRange in Swift…)

  • Big-O matters, but it's often memory that's killing your performance.

    We spend so much time drilling algorithmic complexity. Big-O and all that. But performance is so often about contention and memory, especially when working in parallel.

    I was just working on a program that does Monte Carlo simulation. That means running the same algorithm over the same data thousands of times, with some amount of injected randomness. My single-threaded approach was taking 40 seconds, and I wanted to make it faster. Make it parallel!

    I tried all kinds of scaling factors, and unsurprisingly the best was 8-way on a 10-core system. It got me down to…50 seconds?!?!? Yes, slower. Slower? Yes. Time to pull out Instruments.

    My first mistake was trying to make it parallel before I pulled out Instruments. Always start by profiling. Do not make systems parallel before you’ve optimized them serially. Sure enough, the biggest bottleneck was random number generation. I’d already switched from the very slow default PRNG to the faster GKLinearCongruentialRandomSource. The default is wisely secure, but slow. The GameKit PRNGs are much faster, but more predictable. For Monte Carlo simulation, security is not a concern, so a faster PRNG is preferable. But it was still too slow.

    Why? Locking. GKLinearCongruentialRandomSource has internal mutable state, and is also thread-safe. That combination means locks. And locks take time, especially in my system that generates tens of millions of random values, so there is a lot of contention.

    Solution: make the PRNG a parameter and pass it in. That way each parallel task gets its own PRNG and there’s no contention. At the same time, I switched to a hand-written version of xoshiro256+, which is specifically designed for generating random floating-point numbers. Hand-writing my own means I know what it does and can manage locking myself. (I actually used a struct that’s passed inout rather than locking. I may test out a class + OSAllocatedUnfairLock to see which is faster.)
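    For reference, here’s a minimal sketch of the core of xoshiro256+ as a struct passed inout (my real version differs, and seeding is omitted here even though it matters in practice):

    struct Xoshiro256Plus: RandomNumberGenerator {
        var state: (UInt64, UInt64, UInt64, UInt64)  // must not be all zeros

        mutating func next() -> UInt64 {
            let result = state.0 &+ state.3
            let t = state.1 << 17
            state.2 ^= state.0
            state.3 ^= state.1
            state.1 ^= state.2
            state.0 ^= state.3
            state.2 ^= t
            state.3 = (state.3 << 45) | (state.3 >> 19)  // rotate left by 45
            return result
        }

        /// Uniform in [0, 1). xoshiro256+'s low bits are weakest, so use the top 53.
        mutating func nextDouble() -> Double {
            Double(next() >> 11) * 0x1.0p-53
        }
    }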

    Anyway, that got it down to 30s (with 8-way parallelism), but still far too slow. Using 8 cores to save 25% is not much of a win. More Instruments. Huge amounts of time were spent in retain/release. Since there are no classes in this program, that might surprise you, but copy-on-write is implemented with internal classes, and that means ARC, and ARC means locks, and highly contended locks are the enemy of parallelism.

    It took a while to track down, but the problem was roughly this:

    portfolio.update(using: bigObjectThatIncludesPortfolio)
    

    bigObject includes some arrays (thus COW and retain/release) and includes the object that is being updated. Everything is a struct, so there’s definitely going to be a copy here as well. I rewrote update and all the other methods to take two integer parameters rather than one object parameter and cut my time down to 9 seconds.
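    The shape of that change, with hypothetical names (the real methods differ):

    struct Portfolio {
        var values: [Double]

        // Before (sketch): a big struct parameter is copied on every call,
        // retaining and releasing every COW buffer it carries.
        //   mutating func update(using context: SimulationContext)

        // After: Ints are trivially copyable, so there's no ARC traffic
        // at the call site.
        mutating func update(year: Int, asset: Int) {
            values[asset] *= 1.05  // placeholder for the real math
        }
    }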

    Total so far from cleaning up memory and locks: >75% improvement.

    Heaviest remaining stack trace that I’m digging into now: swift_allocObject. It’s always memory…

  • Pull Requests are a story

    I’ve been thinking a lot about how to make PRs work better for teams, and a lot of my thinking has gone into how to show more compassion for the reviewer. Tell them your story. What was the problem? What did you do to improve it? How do you know it is a working solution (what testing did you do)? Why do you believe it is the right solution? Why this way rather than all the other ways it might be solved?

    Code does not speak for itself. Even the most clear and readable code does not explain why it is necessary, why you are writing it now. Code does not explain why it is sufficient. The problem it solves lives outside the program. The constraints that shape a program can only be inferred. They are not in the code itself. When we want others to review our coding choices, we have to explain with words. We have to tell our reviewers a story.

    And that brings me to the most important writing advice I’ve ever been taught. If you want to write well, you must read what you wrote. There’s an old saying that writing is rewriting, but hidden in that adage is that rewriting is first re-reading.

    The same is true of PRs and code review. Before you ask another person to review your code, review it yourself. See it on the screen the same way they will. Notice that commented-out block and the accidental whitespace change. Is refactoring obscuring logic changes? If you were the reviewer, what kinds of testing (manual or automated; this isn’t a post about unit testing) would make you comfortable with this change?

    Maybe you need to do the hard work of reorganizing your commits (and checking that your new code is precisely the same as your old code!). But maybe you just need to explain things a bit more in the PR description. Maybe a code-walkthrough is needed. Or maybe it really is an obvious change, and your reviewer will understand at once. There’s no need to overdo it. Let compassion and empathy lead you, not dogmatic rules.

    And remember that compassion and empathy, that feeling of being in another person’s place, when it’s time for you to be the reviewer.

  • Solving "Required kernel recording resources are in use by another document" in Instruments

    So you have a Swift Package Manager project, without an xcodeproj, and you launch Instruments, and try to profile something (maybe Allocations), and you receive the message “Required kernel recording resources are in use by another document.” But of course you don’t have any other documents open in Instruments and you’re at a loss, so you’ve come here. Welcome.

    (Everything here is from my own exploration and research over a few hours. It’s possible there are errors in my understanding of what’s going on, or there’s a better solution, in which case I’d love to hear from you so I can improve this post.)

    First, this error message has nothing to do with the actual error. The real error is that your binary doesn’t have the get-task-allow entitlement. I believe this is because it’s a release build, and SPM doesn’t distinguish between “release” and “profiling.” So you need to re-sign the binary.

    Edit your scheme (Cmd-Shift-,) and open the Profile>Pre-actions section. Add the following to re-sign prior to launching Instruments. Set your shell to /bin/zsh (this won’t work with bash).

    # For Instruments, re-sign binary with get-task-allow entitlement
    codesign -s - -v -f --entitlements =(echo -n '<?xml version="1.0" encoding="UTF-8"?>
    <!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "https://www.apple.com/DTDs/PropertyList-1.0.dtd">
    <plist version="1.0">
        <dict>
            <key>com.apple.security.get-task-allow</key>
            <true/>
        </dict>
    </plist>') ${TARGET_BUILD_DIR}/${PRODUCT_NAME}
    

    The funny =(...) syntax is a special zsh process substitution that creates a temporary file containing the output of the command, and then uses that temporary file as the parameter. Note the -n on the echo. It’s required that there be no trailing newline here.

    This script will be stored in .swiftpm/xcode/xcshareddata/xcschemes/<schemename>.xcscheme.

    You might think you could just have a plist in your source directory and refer to it in this script, but pre-action scripts don’t know where the source code is. They don’t get SRCROOT.

    Also beware that if there’s a problem in your pre-action script, you’ll get no information about it, and it won’t stop the build even if it fails. The output will be in Console.app, but other than that, it’s very silent.

    So this is a mess of a solution, but I expect it to be pretty robust. It only applies to the Profile action, so it shouldn’t cause any problems with your production builds.

    You can also switch over to using an xcodeproj, but… seriously? Who would do that?

  • Time for something new

    So, all my tweets about C++ and Rust bring me to a minor announcement. After nearly 6 years with Jaybird, learning so much about Bluetooth on iPhones and custom firmware and audio, it’s time for something new. Luckily, Logitech is large enough that I can change jobs without changing my health insurance.

    In October I’m moving to the Logitech Software Engineering team. My exact projects are still up in the air, but I’ll likely work mostly on desktop products and interfacing with peripherals, particularly for gaming. Lots of C++ and thinking about Rust. Some Objective-C. Some Electron. Maybe a little Swift, but probably not right away.

    I love Swift, but it’s become almost everything I do for the last few years, and it’s gotten me into a silo where it’s all I really know well. I’m really looking forward to branching out again.

  • OK, I just fell in 😍 with stdlib’s Result(catching:).

    extension Request where Response: Decodable {
        func handle(response: Result<Data, Error>,
                    completion: (Result<Response, Error>) -> Void) {
            completion(Result {
                try JSONDecoder().decode(Response.self, from: response.get())
            })
        }
    }
    
  • Reminder: If you’re running into limitations with default parameters (for example, the value can’t be computed, they can’t specialize a generic, defaults don’t exist in literals), you can always replace a default with an explicit overload.

    func f(x: Int = 0) {}
    

    is the same as

    func f(x: Int) {}
    func f() { f(x: 0) }
    
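    The same trick covers the generic case that defaults can’t express, for example giving a generic parameter a “default” concrete type (names here are just for illustration):

    import Foundation

    // Generic parameters can't have defaults...
    func decode<T: Decodable>(_ type: T.Type, from data: Data) throws -> T {
        try JSONDecoder().decode(T.self, from: data)
    }

    // ...but an overload can fix T to a common concrete type.
    func decode(from data: Data) throws -> [String: Int] {
        try decode([String: Int].self, from: data)
    }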

    stackoverflow.com/questions…
