
jeanlain

macrumors 68020
Original poster
Mar 14, 2009
2,440
936
I'm writing a Mac app as a hobby, and I'm trying to understand a few basic things about pointers (I use Objective-C). I couldn't find answers on the web.

I need to parse binary files that are structured like this:

  • bytes 0 to 3: some chars
  • bytes 4 and 5: a 16-bit short
  • ...
  • bytes 206 to 233: a 28-byte structure
  • bytes 234 to 261: another 28-byte structure
  • ...

I have defined the 28-byte structure in question like so:
Code:
typedef struct DirEntry {
  ...    // some shorts and ints
} DirEntry;

I ingest a file into some NSData object named "fileData", then I do:
Code:
const char *fileBytes = fileData.bytes;
DirEntry someEntry = *(const DirEntry *)(fileBytes + 206);

This works, but what guarantees that the second statement will never cause a segmentation fault? I mean, the content of fileData is not guaranteed to constitute a contiguous block of memory, right? So what if bytes 206 to 233 are not in the same memory block?

Is this a valid concern? Am I doing things that may cause crashes under certain conditions?

Note: I know that there are alternatives (for instance, using NSData's getBytes).

Thanks.
 

lloyddean

macrumors 65816
May 10, 2009
1,047
19
Des Moines, WA
Let us see a runnable version of your problematic structure layout and I/O code.

Currently I'm thinking your struct likely has a layout/alignment issue: its in-memory layout is assumed to match what was written to, perhaps, a file that you're reading into the structure?
 

jeanlain

macrumors 68020
Original poster
Mar 14, 2009
2,440
936
The structure is
Code:
typedef struct __attribute__((__packed__)) DirEntry {      
    char itemName[4];              
    int itemNumber;                  
    short elementType;              
    short elementSize;              
    int numElements;              
    int dataSize;                  
    int dataOffset;                  
    int dataHandle;                  
} DirEntry;

As you can see it is a packed structure. I checked that its size is indeed 28 bytes, as specified in the file.

The I/O code is just:
Code:
NSData *fileData = [NSData dataWithContentsOfFile:path options:NSDataReadingMapped error:nil];
const char *fileBytes = fileData.bytes;
DirEntry someEntry = *(const DirEntry *)(fileBytes + 206);

The code works fine, but I'm wondering if I'm just lucky that fileData has its bytes in a single memory block. Is there a risk of crashing the app if it doesn't?
 

chown33

Moderator
Staff member
Aug 9, 2009
10,766
8,466
A sea of green
You should check fileData.length to make sure it's big enough. That's a length question, though, not one of contiguity.

You may or may not need to deal with byte-ordering (endianness) of the data. That's also a separate question.

What leads you to think that fileData.bytes won't return a pointer to contiguous bytes? I've seen nothing in the docs that suggests this. Can you point to something that does?

If it's really something that concerns you, you could do file I/O with stdio funcs, such as fopen(), fseek(), fread(), etc. Obviously, you'd have to check returned values for sufficient size, errors, EOF, etc.
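
As a rough illustration of the stdio route, here is a minimal sketch that seeks to an offset and reads a fixed number of bytes, treating a failed seek or a short read as an error. The helper name read_at and its return convention are assumptions made for this example, not part of any library.

```c
#include <stdio.h>

/* Hypothetical helper: reads `count` bytes starting at byte `offset` of the
   file at `path` into `out`. Returns 0 on success, or -1 if the file cannot
   be opened, the seek fails, or fewer than `count` bytes are available. */
static int read_at(const char *path, long offset, void *out, size_t count)
{
    FILE *fp = fopen(path, "rb");
    if (fp == NULL)
        return -1;

    int status = -1;
    /* fread returns the number of items read; anything below `count`
       means EOF or an error was hit before the range was complete. */
    if (fseek(fp, offset, SEEK_SET) == 0 &&
        fread(out, 1, count, fp) == count)
        status = 0;

    fclose(fp);
    return status;
}
```

For the file described in this thread, a call such as read_at(path, 206, &someEntry, sizeof someEntry) would fill one 28-byte entry, provided the struct layout matches the file.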
 

Senor Cuete

macrumors 6502
Nov 9, 2011
424
30
In C there's no guarantee that a short, an int or a float, etc. will be some number of bytes, only that short <= int <= long in size and that float <= double <= long double in range and precision, etc. So this code could fail on some other machine or OS. To be safe you should declare the members of the struct as intN_t or uintN_t (from <stdint.h>), where N is the number of bits, or some other type that specifies its exact size.
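
A minimal sketch of that suggestion, applied to the struct posted earlier in the thread. The name DirEntryFixed is made up for this example, and the choice of signed types simply mirrors the original short and int members; the packed attribute is a GCC/Clang extension.

```c
#include <stdint.h>

/* Hypothetical fixed-width variant of the 28-byte directory entry.
   Field names follow the struct posted in the thread. */
typedef struct __attribute__((__packed__)) DirEntryFixed {
    char    itemName[4];
    int32_t itemNumber;
    int16_t elementType;
    int16_t elementSize;
    int32_t numElements;
    int32_t dataSize;
    int32_t dataOffset;
    int32_t dataHandle;
} DirEntryFixed;

/* Compilation fails if the layout stops matching the on-disk size. */
_Static_assert(sizeof(DirEntryFixed) == 28, "DirEntryFixed must be 28 bytes");
```

With the size check done at compile time, a platform where the layout differs is caught by the build rather than by a corrupted read.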
 

lloyddean

macrumors 65816
May 10, 2009
1,047
19
Des Moines, WA
A bit more information concerning processor and OS may be of importance.

On modern hardware, isn't an int 8 bytes?

Code:
typedef struct __attribute__((__packed__)) DirEntry {     
    char itemName[4];   //  0  4
    int itemNumber;     //  4  8
    short elementType;  // 12  2
    short elementSize;  // 14  2
    int numElements;    // 16  8
    int dataSize;       // 24  8   
    int dataOffset;     // 32  8
    int dataHandle;     // 40  8        
} DirEntry;

And then there are possible alignment padding issues.
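
To make the padding point concrete, here is a small sketch with two made-up structs (Padded and Packed are illustrative names, not from the file format discussed here). Without the packed attribute, a compiler will typically insert padding so that the 32-bit member sits on its natural boundary; with it (a GCC/Clang extension), the members are laid out back to back.

```c
#include <stddef.h>
#include <stdint.h>

/* Default layout: the compiler may pad after `tag` so that `value`
   is aligned. On common ABIs this struct is 8 bytes, not 5. */
struct Padded {
    uint8_t  tag;
    uint32_t value;
};

/* Packed layout: no padding, so `value` starts at offset 1
   and the whole struct is exactly 5 bytes. */
struct __attribute__((__packed__)) Packed {
    uint8_t  tag;
    uint32_t value;
};
```

Checking sizeof and offsetof for each member against the file specification is a cheap way to confirm that a struct really mirrors the on-disk layout.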
 

NoBoMac

Moderator
Staff member
Jul 1, 2014
5,818
4,427
lloyddean said: "On modern hardware isn't an Int 8 bytes"

Generally, maybe. What @Senor Cuete said is true re not guaranteed number of bytes for basic types. For example in Swift, an int is 32 or 64 bits long, depending on processor.


Wikipedia has a nice summary.


If you want to get into the weeds, the standard changes are here and they are calling out an int to be at minimum 16 bits (implementation dependent: see the Apple Swift example):

 

jeanlain

macrumors 68020
Original poster
Mar 14, 2009
2,440
936
Thanks for the replies. :)
Yes, I check that the NSData object is long enough (I haven't posted all the code here, for simplicity's sake) and I know that the data is big-endian. I convert numbers appropriately.
Thanks for the suggestion of making sure the structure and its members are of the right size. I will define it better.
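
For reference, one common way to do the big-endian conversion mentioned above is to assemble each value from individual bytes, which works regardless of the host's byte order. This is only a sketch; the helper names read_be16 and read_be32 are made up for the example and are not what the original code necessarily uses.

```c
#include <stdint.h>

/* Reads a 16-bit big-endian value: most significant byte first. */
static uint16_t read_be16(const unsigned char *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

/* Reads a 32-bit big-endian value. Each byte is widened to uint32_t
   before shifting so that the shift never overflows a signed int. */
static uint32_t read_be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}
```

Because these helpers read byte by byte, they also avoid any dependence on the alignment of the source pointer.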

But the question was not about padding or the size of the structure, it was about contiguity of the bytes in memory.
chown33 said: "What leads you to think that fileData.bytes won't return a pointer to contiguous bytes? I've seen nothing in the docs that suggests this. Can you point to something that does?"
Well no. I was just wondering.
An NSData object can contain gigabytes right? Can it ensure that all bytes are contiguous?
 

chown33

Moderator
Staff member
Aug 9, 2009
10,766
8,466
A sea of green
jeanlain said: "An NSData object can contain gigabytes right? Can it ensure that all bytes are contiguous?"
First, in Apple's ref doc for NSData it says:
"The size of the data is subject to a theoretical limit of about 8 exabytes."
To me, this suggests that very large sizes are within the class's capabilities.

Second, there's only the one property that returns a pointer: .bytes . I see nothing about discontiguity, only about when the returned pointer can become invalid.

There is a method that enumerates ranges with a block, and contiguity is mentioned, so that might be worth exploring. For example, you might be able to write a block that tests for contiguity, i.e. it enumerates with exactly one execution of the block, and receives the expected range.

If you really think this is something to be concerned about, then you should write your code defensively. That might be using getBytes, or it might be using stdio, or something else. If it's not important to the near-term success of the code, then maybe do something about it later, or plan to give it a gigantic-file test case. I often make a DO LATER list, along with a DO A TEST and a TO DO list.
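
As a sketch of what the defensive version might look like at the C level, the helper below copies a byte range out of a buffer only after checking that the range fits, which is roughly the combination of a length check and a getBytes-style copy. The name copy_range and its signature are assumptions for this example.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies `count` bytes starting at `offset` out of a
   buffer holding `length` bytes. Returns false, and copies nothing, if the
   requested range does not lie entirely inside the buffer. */
static bool copy_range(const void *bytes, size_t length,
                       size_t offset, void *out, size_t count)
{
    /* Written this way so that offset + count cannot overflow. */
    if (offset > length || count > length - offset)
        return false;

    memcpy(out, (const unsigned char *)bytes + offset, count);
    return true;
}
```

Copying into a local struct with memcpy, rather than dereferencing a cast pointer, also sidesteps any question about the alignment of the source address.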

The degree to which one codes defensively usually depends on what the consequences of failures are. If it's for an embedded system on top of a tall tower or at a remote location, then dealing with any failures is quite different than a program that seg-faults on a single local Mac with a human in front of it.
 

Senor Cuete

macrumors 6502
Nov 9, 2011
424
30
It could be useful to define your struct like this:
Code:
typedef struct DirEntry {
  ...    // some shorts and ints
} DirEntry, *DirEntryPtr;
The DirEntryPtr can make it easier to access the struct. You could even declare a **DirEntryHandle;
 

f54da

macrumors 6502
Dec 22, 2021
355
131

The thing to keep in mind here is that this behavior was changed in iOS7: now NSData/NSMutableData are not guaranteed to keep contents as one contiguous array. It could be stored as multiple chunks.

So when you call bytes/mutableBytes, they will copy and flatten contents into one contiguous array of bytes, if needed, and then return a pointer to this contiguous chunk.

Depending on what you're trying to do, it may cause an unexpected performance penalty or excessive memory consumption for large buffers.
 

Senor Cuete

macrumors 6502
Nov 9, 2011
424
30
Pointer arithmetic is fine; programmers use it all the time. But if you want to be sure that your code will work on different machines, you can access the members of the struct like this:
Code:
typedef struct barf {
    int foo;
    int bar;
} barf, *barfPtr;

barf myBarf;
myBarf.foo = 1;
myBarf.bar = 2;
int myInt = myBarf.bar;

// or, through a pointer (it must point at a valid barf first):
barfPtr myBarfPtr = &myBarf;
myBarfPtr->foo = 1;
myBarfPtr->bar = 2;
myInt = myBarfPtr->bar;

The compiler would know the size of the members of the struct.

Or, if you are using objects, use the getter and setter methods.

Also, I failed to mention that before you use the barfPtr you have to make it point to myBarf with the assignment:
Code:
barfPtr myBarfPtr = &myBarf;
 