Rixstep
 About | ACP | Buy | Industry Watch | Learning Curve | News | Products | Search | Substack

_CFStringAppendFormatAndArgumentsAux

Perhaps Cupertino should place a call to Princeton.


NSString is a real trouper of a class. It's immutable. You can't modify things in it. 'NSString *' is a pointer to an instance of the class. You never access instances directly - only through pointers.

You can allocate and initialise an NSString yourself if you want. In that case you have to release it when you're finished with it. NSString pointers on the stack don't need this - they only point to things. Of course if they point to things you've initialised yourself...

NSString is an abstract class. You never see NSString instances. You see their representations all over the place. They work with derivatives of NSText which is a base class used all over the place. The derivatives are mostly visible. They're seen in text fields (NSTextField) and in text views (NSTextView) and so forth.

These are incredibly complex classes and NSText is pervasive in the system. Almost everything seen in text on screen is through NSText. Theoretically everything should be. From Safari's web views to Mail's text views. It would be ridiculous to rebuild all this. Which of course is the reason Carbon editors have been so ridiculous for so many years: spittle in the face of the technologies made available.

Manipulating text is very similar. All basic classes are found first in an immutable model. This makes for very sound engineering: implement the basic functionality first, then add the fancy stuff afterwards. The mutable class for NSString is called, rather expectedly, NSMutableString. Given any string there's any number of things you can do with it.

  • Modify it by replacing occurrences of one string with another.
  • Extract substrings from or to a specific index.
  • Get the character at a given index.
  • Append new strings to the existing one.

And so forth. Strings can get very complex. Study if you will the dumps from Xframe - either in the dump window or the log. This is one long string for each row. Everything is put in there - space padding, etc. Or in Lightman. Each section of an export is a single string. Or in TMI where the same applies: each tab is a single string doctored for output in a text view (NSTextView).

It's very powerful code.

One of the most powerful (and ubiquitous) methods for NSMutableString is appendFormat:.

appendFormat:

Adds a constructed string to the receiver.

-(void)appendFormat:(NSString *)format ...

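Note the ellipsis at the end. This implies C argument passing - in other words: there can be as many function arguments as you need. In the C convention the caller cleans up the stack, so the called code need not know in advance how many arguments follow the first. This goes counter to the Pascal calls (long favoured by MSFT) where arguments are pushed on the stack in the back-arsewards order and the called code cleans up the stack. There's an advantage for MSFT here as Intel have a very fast RET instruction which works for this type of stack protocol but, as it only works when the argument count is fixed, otherwise the whole thing's worthless.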

What the definition is saying is the following:

  • You call your instance of NSMutableString with appendFormat:.
  • Your first (and possibly your only) argument is a (Unicode) string.
  • After that it's up to the string itself.
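The same shape can be sketched in plain C. This is only an illustration under stated assumptions: MutableBuffer and append_format are made-up names standing in for NSMutableString and appendFormat:, and the fixed 256-byte buffer is a simplification, not how Apple's code works.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical fixed-size stand-in for a mutable string. */
typedef struct {
    char   text[256];
    size_t length;
} MutableBuffer;

/* Append formatted text. The format string alone decides how many of
   the trailing arguments (if any) are ever looked at. */
void append_format(MutableBuffer *buffer, const char *format, ...)
{
    assert(buffer != NULL && format != NULL);
    assert(buffer->length < sizeof buffer->text);

    size_t room = sizeof buffer->text - buffer->length;   /* includes NUL */

    va_list arguments;
    va_start(arguments, format);
    int written = vsnprintf(buffer->text + buffer->length, room,
                            format, arguments);
    va_end(arguments);

    if (written < 0) {
        buffer->text[buffer->length] = '\0';   /* failed: append nothing */
        return;
    }

    /* vsnprintf reports the full length even when it had to truncate. */
    size_t stored = (size_t)written < room - 1 ? (size_t)written : room - 1;
    buffer->length += stored;
}
```

Starting from an empty buffer (MutableBuffer row = { "", 0 };), a call such as append_format(&row, "hello, world\n") passes the format and nothing else, while append_format(&row, "%s=%u", "pid", 721u) passes two further arguments because the format asks for two.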

appendFormat: works much the same as the printf family of C runtime functions [sprintf, fprintf - printf itself being equivalent to fprintf(stdout, ...)]: the called code inspects the first argument (the string) for occurrences of the character '%'. This is the 'escape' character for the function. [Double '%'s mean the second '%' is taken literally much as the backslash is used in similar situations.]

For each instance of a singleton '%' the called code pops another argument off the stack. The '%' is always followed by a 'type denotation' - such as 's' for string, 'u' for unsigned, and so forth. These denotations can of course be more complex. Such as '%.2X' for a hexadecimal value of at least two digits. And so forth.

In this regard appendFormat: and the printf functions work the same. And it's highly likely, as they use the same syntax, that the code for appendFormat: is based on printf code.
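A minimal sketch of that scanning step follows. It assumes a simplified grammar in which every '%' other than '%%' fetches exactly one argument; real printf code also has to deal with '*' widths and the like. The name count_format_arguments is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Count the arguments a printf-style format would fetch.
   Simplification (an assumption of this sketch): every '%' that is not
   part of "%%" fetches exactly one argument; '*' widths are ignored. */
size_t count_format_arguments(const char *format)
{
    assert(format != NULL);

    size_t needed = 0;
    for (const char *p = format; *p != '\0'; ++p) {
        if (*p != '%')
            continue;            /* ordinary character: nothing to fetch */
        if (p[1] == '%') {
            ++p;                 /* "%%" is a literal percent sign */
            continue;
        }
        if (p[1] == '\0')
            break;               /* lone trailing '%': malformed, stop */
        ++needed;                /* a conversion: one argument fetched */
    }
    return needed;
}
```

A format with no singleton '%' in it yields zero, which is the whole point: nothing further is fetched unless the format string asks for it.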

The official documentation goes on to reveal:

Discussion
The appended string is formed using NSString's stringWithFormat: method with the arguments listed.

It also makes it clear this is a very old method actually dating back to NeXTSTEP (where all Cocoa technologies come from).

Availability
Available in Mac OS X v10.0 and later.

As 10.0 is the first version one may therefore conclude it's always been around.

An interesting aside to this discussion - beyond understanding how the function code has always been and should always be implemented - is the mention of the first public demonstration of C code ever, in the introductory chapter to K&R, written by Brian Kernighan.

Today everybody's doing it but it was Brian Kernighan who started it off.

While small test programs existed since the development of programmable computers, the tradition of using the phrase 'hello world' as a test message was influenced by an example program in the seminal book 'The C Programming Language'.

main()
{
    printf("hello, world\n");
}

BWK is using a formatting function that contains no escape characters, no formatting to do. It's also obvious the call has a single argument - the character string 'hello, world\n'.

But the called function printf doesn't look for further arguments on the stack because the formatting string doesn't indicate any further arguments are needed (or therefore exist).

This is the same thing as the following in Cocoa.

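#import <Foundation/Foundation.h>

int main(void)
{
    NSAutoreleasePool *pool = [[NSAutoreleasePool alloc] init];

    // The pointer is on the stack; the object it points to is autoreleased
    NSMutableString *mutableString = [NSMutableString string];

    [mutableString appendFormat:@"hello, world\n"];

    // do something with this string - show it somewhere

    [pool drain];
    return 0;
}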

There are other NSMutableString methods available just as there are other C runtime functions available. NSMutableString also has appendString: which is the functional equivalent of the C runtime strcat(). Both append one string to another. But the following would be considered really wacky.

#include <stdio.h>
#include <string.h>

int main(void)
{
    char cp[100] = "";

    printf("%s", strcat(cp, "hello, world\n"));
    return 0;
}

It's possible to alternate calls to appendFormat: with calls to appendString: but it's not necessary and often stupid and self-defeating. Method addresses are cached by the Objective-C runtime and any unnecessary external reference just adds bulk to a binary. The documentation for appendString: doesn't say how the method works but one may assume a similar level of code being used.

Rixstep have always been against code waste since Day One and have learned more about this when entering the world of event driven programming where shared ('dynamic') libraries hook up to client programs at runtime. You never reinvent the wheel. On Windows you resolve everything down to SendMessage as often as you can as most of the other cruft is just quilt patchwork. Likewise you don't call appendString: if your app is otherwise stuffed to the gills with calls to appendFormat:.

And since 10.0 as the documentation intimates this has been standard operating procedure.

Suddenly in 10.6 with the new ADC the above Cocoa code will provoke a compiler diagnostic: it'll indicate it was expecting arguments as appendFormat: seems to imply escape characters in the format string. The point however is they don't need to be there any more than Brian Kernighan needed them to be there in his 'hello world' tutorial. The called code itself uses the C stack protocol to get what it needs off the stack if and only if there's any need to get it.

It's true that some developers (such as the current maintainer of Vienna RSS) trip up on NSLog but that's another matter completely - that's a question of formatting strings that can be injected into an unwitting application; this is not.
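For contrast, the NSLog-style problem looks roughly like this in C terms. This is a sketch only, with log_untrusted as a hypothetical helper: text that arrives from outside is handed over as data for '%s' and never as the format string itself.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Safe: untrusted text is the argument for "%s", never the format.
   The unsafe variant would be snprintf(out, size, untrusted), where any
   '%' inside the text makes the formatter fetch arguments that were
   never passed. */
int log_untrusted(char *out, size_t size, const char *untrusted)
{
    assert(out != NULL && size > 0 && untrusted != NULL);
    return snprintf(out, size, "%s", untrusted);
}
```

A fixed literal with no '%' in it, as in the 'hello, world' case, carries no such risk because nobody outside the program gets to choose what the format says.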

Code written up to and including 10.5 with the equivalent of the above Cocoa snippet compiles and builds cleanly. And it should. Code written up to and including 10.6.1 Snow Leopard does the same. But starting with this latest glorious 10.6.2 update there's a serious issue.

Date/Time:       2009-11-10 11:55:09.684 +0100
OS Version:      Mac OS X 10.6.2 (10C540)

Thread 0 Crashed:  Dispatch queue: com.apple.main-thread
0   libSystem.B.dylib                0x9006ba90 strlen + 16
1   com.apple.CoreFoundation         0x90f81ec6 _CFStringAppendFormatAndArgumentsAux + 3894
2   com.apple.Foundation             0x9842c977 -[NSCFString appendFormat:] + 91
3   com.rixstep.Tracker              0x00005df5 0x1000 + 19957
4   com.apple.AppKit                 0x91f279f6 -[NSSavePanel _didEndSheet:returnCode:contextInfo:] + 295

Stack dumps are of course read from the bottom up if one wants to see what's happening chronologically. A save panel was closed. This was because the application (Tracker) was going to export its listing. Tracker knew at this point it had to assemble its output as a string (using NSMutableString) and get it ready for file storage.

But -[NSCFString appendFormat:] (which didn't need to exist before the trek from Redwood City to Cupertino) now calls _CFStringAppendFormatAndArgumentsAux() which is an internal call. The application code doesn't call this directly - it's Apple code calling Apple code (and ultimately screwing up).

The cause of the crash is given as follows.

Exception Type:  EXC_BAD_ACCESS (SIGSEGV)
Exception Codes: KERN_INVALID_ADDRESS at 0x00000000fffffff0

And that certainly is a strange address - it's a runaway pointer.

Note the address is given as 64-bit. That's fine. Note further that the high 32 bits are zeroed out. This is because the app in question is 32-bit. [More on this below.] The final address is 16 bytes shy of the addressing ceiling. It's way up there.
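The arithmetic is easy to check. The sketch below uses two hypothetical helpers: one measures the distance from a 32-bit address to the 4 GB ceiling, the other reads the same bit pattern as a signed 32-bit quantity. The second is only an observation about the number (0xfffffff0 is -16 when read that way), not a diagnosis of what Apple's code did.

```c
#include <stdint.h>

/* Bytes between a 32-bit address and the 4 GB ceiling (0x100000000). */
uint64_t bytes_below_ceiling(uint32_t address)
{
    return (UINT64_C(1) << 32) - address;
}

/* The same 32 bits read as a signed two's complement value. */
int64_t as_signed_offset(uint32_t address)
{
    const int64_t ceiling = INT64_C(1) << 32;
    return address < UINT32_C(0x80000000) ? (int64_t)address
                                          : (int64_t)address - ceiling;
}
```

Feeding in 0xfffffff0 gives 16 from the first function and -16 from the second.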

Cutler implemented 64 KB 'pits' in low and high user memory in his 'NT' to catch wild pointers. This isn't 64 KB but 16 bytes. But whatever - it's a wild pointer.

How do you get a wild pointer? The code is looking for something that doesn't exist and just keeps on going. How can this happen? It might happen if the called code incorrectly assumes there's something on the stack when there's not. It pops it and starts reading it and keeps going to find the end of it (a string, whatever) except it can't find it. Alternatively it's looking for the beginning of something else and never finds the sentinel value it's looking for. So it just keeps on going.
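The 'never finds the sentinel' case can be shown safely by putting a fence around the search. A minimal sketch, with bounded_length as a hypothetical helper: it looks for the terminating NUL within a known number of bytes and reports failure instead of running on. strlen has no such fence, so it keeps reading until it either happens upon a zero byte or touches memory that isn't mapped.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Look for the terminating NUL within a known number of bytes.
   Returns the string length, or -1 if no terminator is in range. */
long bounded_length(const char *bytes, size_t limit)
{
    assert(bytes != NULL);

    const char *end = memchr(bytes, '\0', limit);
    return end != NULL ? (long)(end - bytes) : -1L;
}
```

A properly terminated array reports its length; an array filled to the brim with no NUL reports -1, which is the situation where an unbounded scan would wander off.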

Anyway. Wild pointer. Now the punchline.

Process:         Tracker [721]
Path:            /.../Tracker.app/Contents/MacOS/Tracker
Identifier:      com.rixstep.Tracker
Version:         2.0 (2.0.1)
Code Type:       X86 (Native)
Parent Process:  launchd [96]

This was a Leopard application. Compiled and built cleanly on 10.5 Leopard with no semblance of error or warning whatsoever. Given Apple's improvements in 10.6.2, it crashes. But not always - only 'sometimes'. Meaning the code is not only wrong - it's wacky. For code that consistently does the wrong thing is one thing but code that can't even do that is far far worse.

The use of appendFormat: that Apple maintainers suddenly don't love anymore does not provoke a compiler error - there's nothing really wrong with the code. There's no error. There's only a warning. So the code should work. And if the code builds perfectly on 10.5 and even works on 10.6.1 then one should be able to reasonably assume it should always work. After all, the method's been available since 10.0 and perhaps as far back as 1987 where its behaviour has presumably been consistent all these years at NeXT and afterwards.

But it's now one sees the real reason for the diagnostic that suddenly rears its head on 10.6: Apple have been reworking the underlying code which shouldn't be 'underlying' in the first place - ad hoc 'Apple' code added underneath the Cocoa classes to support Carbon which doesn't belong there. Or who knows what else. And it's constructed in a really wrong way.

The following stack unwind (found here) shows why they still haven't put the code back in NSMutableString where it's belonged since 1997.

0   com.apple.CoreFoundation          0x930c0e3e __CFStrConvertBytesToUnicode + 62
1   com.apple.CoreFoundation          0x930a700c copyBlocks + 156
2   com.apple.CoreFoundation          0x930ac091 __CFStringChangeSizeMultiple + 1617
3   com.apple.CoreFoundation          0x930b5027 CFStringAppend + 231
4   com.apple.CoreFoundation          0x930b5d63 _CFStringAppendFormatAndArgumentsAux + 2899
5   com.apple.CoreFoundation          0x930b707e CFStringAppendFormatAndArguments + 46
6   com.apple.CoreFoundation          0x930b70a9 CFStringAppendFormat + 41

The dependencies start with CFStringAppendFormat() and follow to CFStringAppendFormatAndArguments() and then to the culprit. CFStringAppendFormat is a procedural Core Foundation function from the layer Carbon relies on - it's not needed (or wanted) in object oriented NeXT/Cocoa code.

The function _CFStringAppendFormatAndArgumentsAux() is found three times in the Core Foundation binary, one for each of the three architectures 64-bit Intel, 32-bit Intel, and 32-bit PPC.

00000000002664cc __CFStringAppendFormatAndArgumentsAux
00000000004d6abc __CFStringAppendFormatAndArgumentsAux
0000000000750574 __CFStringAppendFormatAndArgumentsAux

Now here's the final punch line. Apple's skeleton code for a 'command line tool' is nothing other than their version of Brian Kernighan's 'hello world' program.

Yet no compiler warning or error is issued. The code builds cleanly.

Perhaps it's time Cupertino placed a call to Princeton.

See Also
Coldspots: The 10.6.2 Update
Coldspots: Apple Mail's Amnesia
Coldspots: 10.6.2: Still Can't Get It Right
Coldspots: Snow Leopard's Windows Executables

Copyright © Rixstep. All rights reserved.