More Awesome Than You!

The Bowels of Trogdor => The Small Intestines of Trogdor => Topic started by: benrg on 2007 May 03, 21:11:52



Title: New DBPF library
Post by: benrg on 2007 May 03, 21:11:52
I think this belongs in the Bowels of Trogdor, but I don't seem to have posting permission there.

This is a DBPF (*.package) library written in C++ with a C interface, GPL licensed. It has full write support with incremental and in-place updates and compression. The compression code is based on zlib's deflate code with lazy matching, and almost always compresses better than SimPE or the game itself (sometimes a lot better). I also included a sample app / useful utility called dbpf-recompress which uses the library to recompress package files. It shaves about 100 megabytes off the size of a base Sims 2 installation, and everything seems to still work.

This is a very preliminary release and some things are completely untested. Please test dbpf-recompress on every package you can lay your hands on (after backing them up!). And please use the library to build useful tools; otherwise it will all have been in vain.

Edit: New version 20070530. See the changelog in the zip archive.

Edit: New version 20070601.


Title: Re: New DBPF library
Post by: Theo on 2007 May 03, 22:57:23
I'm getting the output: "*** verify failed: reopen of new file failed: bad DBPF file (duplicate entry in compressed directory)" when I run the program with this package: http://theos.chewbakkas.net/maty/Jack-Sparrow-Hair-all-colors.7z

None of the resources in that package were previously compressed.

Quote from: benrg
I think this belongs in the Bowels of Trogdor, but I don't seem to have posting permission there.

It seems you now do, welcome aboard :)



Title: Re: New DBPF library
Post by: J. M. Pescado on 2007 May 03, 23:02:45
Cool. This is definitely a shiny thing. And welcome to the ranks of Peasants At Least As Tall As The Beefy Arm, whoever you are.

Quote from: Theo
I'm getting the output: "*** verify failed: reopen of new file failed: bad DBPF file (duplicate entry in compressed directory)" when I run the program with this package: http://theos.chewbakkas.net/maty/Jack-Sparrow-Hair-all-colors.7z

SimPE does not like seeing files where something with the exact same instance and group values is duplicated, and then both are compressed. It makes SimPE sad.


Title: Re: New DBPF library
Post by: twojeffs on 2007 May 03, 23:55:32
Heh. It makes SimPE barf up huge scary hairballs is what it does. This has been needed for a very long time. I'll have to give it a run.


Title: Re: New DBPF library
Post by: benrg on 2007 May 04, 02:25:18
Quote from: J. M. Pescado
And welcome to the ranks of Peasants At Least As Tall As The Beefy Arm, whoever you are.

Um, I guess I should have introduced myself. Hi, I'm Ben. I've never posted on a Sims forum before. I'm a PhD student in the theory of programming languages. One of my hobbies is writing decompilers for obscure virtual machines (see http://www.darkweb.com/~benrg/if-decompilers/). I thought that a SimAntics decompiler and recompiler might be an interesting project, and as a first step I needed a DBPF read/write library. I looked around for one, discovered that it was a most-wanted item, and wrote it. Of course it took four times as long as I expected (Hofstadter's law), and the decompiler will probably never see the light of day, but in any case the library is finished (modulo bugs), so I'm happy.

As everyone else has figured out already, Jack Sparrow's hair has index entries sharing the same type/group/instance ID, and because compression status is indexed in the file by type/group/instance ID, it's impossible to have duplicate entries unless none of them are compressed. How should the library handle this? It could:

  • Refuse to even open files with duplicate index entries, because they're broken. Probably a bad idea, since at least one exists.
  • Open them but refuse to write them. Clients like dbpf-recompress would have to check for duplicates and... do what? Discard all but one? Compare them to make sure they're equal?
  • Allow opening and writing, always writing the duplicate entries uncompressed, unless the write_disposition makes that impossible, in which case fail. This is maximally flexible but seems pointlessly complicated for a rare and useless case, and makes pointlessly larger files. It wouldn't be too hard to implement, though.
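The "compare them to make sure they're equal" check in the second option can be sketched roughly as follows. This is only an illustration: `Resource`, `DupStatus`, and `classify_duplicates` are hypothetical names, not part of the library's interface, and the sketch assumes each resource's bytes are already decompressed in memory.

```cpp
#include <cstdint>
#include <map>
#include <tuple>
#include <vector>

// Hypothetical in-memory resource; the field names are illustrative only.
struct Resource {
    std::uint32_t type, group, instance;
    std::vector<std::uint8_t> data;  // decompressed payload
};

using Tgi = std::tuple<std::uint32_t, std::uint32_t, std::uint32_t>;

enum class DupStatus { None, IdenticalCopies, ConflictingCopies };

// Reports whether any type/group/instance key occurs more than once and,
// if so, whether every repeated copy carries the same bytes as the first.
DupStatus classify_duplicates(const std::vector<Resource>& resources) {
    std::map<Tgi, const Resource*> first_seen;
    DupStatus status = DupStatus::None;
    for (const Resource& r : resources) {
        auto res = first_seen.emplace(Tgi{r.type, r.group, r.instance}, &r);
        if (res.second) continue;  // first time this key appears
        if (res.first->second->data != r.data)
            return DupStatus::ConflictingCopies;
        status = DupStatus::IdenticalCopies;
    }
    return status;
}
```

A client could drop the extra copies when the result is `IdenticalCopies` and refuse to write (or fall back to the third option) when it is `ConflictingCopies`.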

-- Ben


Title: Re: New DBPF library
Post by: J. M. Pescado on 2007 May 04, 02:39:07
I vote for D: Garbage In, Garbage Out.

And out of curiosity, do you actually play the game? How did you find us here?


Title: Re: New DBPF library
Post by: dizzy on 2007 May 04, 07:00:32
Hi Ben. Welcome to Hell. Feel free to reuse any of my code as you wish (and I'm sure Shy won't mind if you use his). ;)

I vote for auto-removal of dupes. That seems the best policy.

I don't think Z code is obscure. I've done a bit of Inform programming, actually. Nothing public, though.


Title: Re: New DBPF library
Post by: Delphy on 2007 May 09, 11:04:50
Actually, if the program is done right, the index entries won't matter - what counts is the offset to the file, not the TGI information. A resource can have exactly the same Group and Instance but be a discrete chunk within the .package file and work perfectly fine. Another point to be aware of is not to rely on the DIR index alone for the compression flag - I use a combination of this and checking the first four bytes of the chunk to make sure it's compressed.

The DBPFClass.dll I made with my Download Organiser (in C#) handles that file with no errors whatsoever, and even detects the type correctly.

What I would do is write either only the first or the last file with the same TGI (although don't forget to check high-instances too) just so you actually have part of the data.
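A minimal sketch of the "write only the last one" policy, with the high instance included in the key. The `Entry` layout and the function names are assumptions made for illustration; they are not taken from DBPFClass.dll or from the library.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <tuple>
#include <vector>

// Hypothetical index entry; instance_high is the extra "high instance" word.
struct Entry {
    std::uint32_t type, group, instance, instance_high;
    std::uint32_t offset, size;
};

using Key = std::tuple<std::uint32_t, std::uint32_t, std::uint32_t, std::uint32_t>;

static Key key_of(const Entry& e) {
    return Key{e.type, e.group, e.instance, e.instance_high};
}

// Keeps only the last entry for each key, preserving the original order
// of the entries that survive.
std::vector<Entry> keep_last_per_key(const std::vector<Entry>& index) {
    std::map<Key, std::size_t> last;  // key -> position of its last occurrence
    for (std::size_t i = 0; i < index.size(); ++i)
        last[key_of(index[i])] = i;

    std::vector<Entry> out;
    for (std::size_t i = 0; i < index.size(); ++i)
        if (last[key_of(index[i])] == i)
            out.push_back(index[i]);
    return out;
}
```

Keeping the first copy instead only requires recording the first position rather than the last; which of the two the game actually uses is not established in this thread.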


Title: Re: New DBPF library
Post by: J. M. Pescado on 2007 May 09, 13:12:35
So it's just a SimPE problem, then. In that case, the current behavior is correct, and SimPE Must Be Destroyed. Option D sounds better and better.


Title: Re: New DBPF library
Post by: dizzy on 2007 May 09, 20:52:35
My thinking on the matter is if the data is identical, then it's simply redundant and a waste of space. Ergo, in a program that reduces wasted space, it should remove that resource.


Title: Re: New DBPF library
Post by: J. M. Pescado on 2007 May 09, 23:18:02
Yes, but there might be a perfectly good reason why someone might save two copies of a resource: Perhaps one is a backup (since I think the second one is used as it would load last).


Title: Re: New DBPF library
Post by: dizzy on 2007 May 15, 01:39:19
I changed a couple things to get this to compile with g++. Is there an easier way to deal with these gotos across uninitialized variables?
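The library code in question is not shown in the thread, so the following is only a generic illustration of the usual workarounds when g++ rejects a goto that jumps over an initialized variable: declare the variable before the first goto, or wrap the later declaration in its own braces so its scope ends before the label. The function `parse` is hypothetical.

```cpp
// Generic illustration; not code from the library.
int parse(bool fail_early) {
    int result = -1;          // declared before any goto, so no jump crosses it
    if (fail_early) goto cleanup;
    {
        int scratch = 42;     // the braces end scratch's scope before the label
        result = scratch;
    }
cleanup:
    return result;
}
```

Without the inner braces, `scratch` would still be in scope at `cleanup:` and the jump would bypass its initialization, which is what g++ refuses to compile.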


Title: Re: New DBPF library
Post by: Delphy on 2007 May 29, 09:58:41
I posted this over in the compressoriser thread but figured I'd duplicate it here.

Essentially, my DBPF code for the DDO has issues with SHPE refs in files that are re-compressed using this program.  It reads normal SHPE refs and other compressed ones fine, that I can see.

On *normal* Maxis/SimPE compressed SHPE chunks, the DDO works fine.  It is only on the SHPE chunks that are run through the Compressoriser that it borks.  The reason is that there is an extra byte inserted after the length of the sgResource string but before the actual string.

For example:

Normal compressed SHPE ref, in Hex view in SimPE: 66 42 72 61 6e 64 69... etc.  This is 1 byte for the string length (0x66 - 102 bytes) and then immediately followed by the actual string (Brandi...).
For a compressorised SHPE ref, in Hex view in SimPE: 95 01 62 72 61 6e 64 69... etc.  This is 1 byte for the string length (0x95 - 149 bytes) and then a 01 and then followed by the actual string (brandi...)

A string in the SHPE ref chunk *should* be encoded with 1 byte for the length and then the actual string.  I'm guessing SimPE either ignores this extra byte, or does something else with it - I can't find anything in the SimPE code specifically mentioning it.

I can put a catch in the DBPFClass code to check if the first byte of compressed string is 01 and ignore it, but I'd like to know why it's there in the first place.  Perhaps my understanding of the actual string storage is not complete, but the code for the DDO works with basically every other file you can throw at it *except* the compressorised ones. :)

Can you shed any light as to why this extra 0x01 value is getting in there?

Regards
Delphy


Title: Re: New DBPF library
Post by: J. M. Pescado on 2007 May 29, 10:19:55
I do not see any extra "1" byte in my SHPEs that I compressed, but I'm using the original CLI version at the very top of this thread. Is this purely an artifact of the Jfade edition, or does it exist in the original and/or the Dizzyian edition? Or am I looking in the wrong place? Does this "DDO" (What is it?) run into the same problem reading, say, Bathroom Uses You?


Title: Re: New DBPF library
Post by: jfade on 2007 May 29, 13:16:09
Quote from: J. M. Pescado
I do not see any extra "1" byte in my SHPEs that I compressed, but I'm using the original CLI version at the very top of this thread. Is this purely an artifact of the Jfade edition, or does it exist in the original and/or the Dizzyian edition? Or am I looking in the wrong place? Does this "DDO" (What is it?) run into the same problem reading, say, Bathroom Uses You?

I don't think my program has anything to do with it, as I shell out to the exact same command line version of the EXE attached at the top of this thread. I didn't recompile the code or anything, just shelled out to the same EXE, so it should be exactly the same, unless he updated it since I first downloaded the library. Maybe this one only gets added in certain circumstances?


Title: Re: New DBPF library
Post by: J. M. Pescado on 2007 May 29, 13:23:05
Not sure what circumstances those are, then. I wasn't even aware the repacker did any changes, I thought it was entirely datatype-agnostic, and performed the double-check to make sure the data uncompresses to be the same. And, like I said, I don't see it. And what's a DDO?


Title: Re: New DBPF library
Post by: Venusy on 2007 May 29, 13:48:28
Quote from: J. M. Pescado
And what's a DDO?

Delphy's Download Organizer, I think there was a thread about it in the Podium a while back.

EDIT: Well, it was mentioned in a thread (http://www.moreawesomethanyou.com/smf/index.php/topic,8048.msg222547.html#msg222547) in the Podium a while back at least. Direct MTS2 link (http://www.modthesims2.com/showthread.php?t=227925).


Title: Re: New DBPF library
Post by: jfade on 2007 May 29, 13:59:54
Quote from: J. M. Pescado
Not sure what circumstances those are, then. I wasn't even aware the repacker did any changes, I thought it was entirely datatype-agnostic, and performed the double-check to make sure the data uncompresses to be the same. And, like I said, I don't see it. And what's a DDO?

Neither was I. I'd be interested in seeing the original and compressed versions of these files to see what's up. I wonder if it's more of a problem with how the decompression itself is performed.


Title: Re: New DBPF library
Post by: dizzy on 2007 May 29, 23:55:57
Quote from: Delphy
Normal compressed SHPE ref, in Hex view in SimPE: 66 42 72 61 6e 64 69... etc.  This is 1 byte for the string length (0x66 - 102 bytes) and then immediately followed by the actual string (Brandi...).
For a compressorised SHPE ref, in Hex view in SimPE: 95 01 62 72 61 6e 64 69... etc.  This is 1 byte for the string length (0x95 - 149 bytes) and then a 01 and then followed by the actual string (brandi...)

A string in the SHPE ref chunk *should* be encoded with 1 byte for the length and then the actual string.  I'm guessing SimPE either ignores this extra byte, or does something else with it - I can't find anything in the SimPE code specifically mentioning it.

...

Can you shed any light as to why this extra 0x01 value is getting in there?

I assume you're referring to the offset+9 data stream? If so, it seems to me that expecting specific control bits for particular resources at specific offsets is a critical bug in DDO. Wouldn't a more generic unpacker routine make more sense?

Could you please provide a full example of a package that exhibits this issue?


Title: Re: New DBPF library
Post by: J. M. Pescado on 2007 May 30, 04:19:09
Quote from: dizzy
I assume you're referring to the offset+9 data stream? If so, it seems to me that expecting specific control bits for particular resources at specific offsets is a critical bug in DDO. Wouldn't a more generic unpacker routine make more sense?

Would seem more like the data is not compliant with the file format, but why this would happen is unknown to me: I don't see anything like this happening in files I've packed, and Jfade says his is just a front-end that outsources to the original command line tool.

Quote from: dizzy
Could you please provide a full example of a package that exhibits this issue?

And perhaps a screenshot of where you see the error?


Title: Re: New DBPF library
Post by: benrg on 2007 May 30, 16:41:58
Hi, following up on a bunch of stuff:

The problem with compressed duplicate entries is that there are two different entry lists in the file. One lists the type, group, instance, file offset, and (compressed) size for all entries, and the other lists the type, group, instance, and uncompressed size for all compressed entries. In order to figure out what's compressed you have to merge the two lists by type/group/instance, and there's no way to do that when there are duplicate entries. It's a stupid design, but it's not SimPE's fault.
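The merge described above can be sketched as below, to show where duplicates make it ambiguous. The struct layouts and the `merge_lists` function are assumptions for illustration and do not reproduce the library's actual code or its on-disk record formats.

```cpp
#include <cstdint>
#include <map>
#include <tuple>
#include <vector>

using Tgi = std::tuple<std::uint32_t, std::uint32_t, std::uint32_t>;

// Hypothetical records for the two lists: the main index and the
// directory of compressed entries.
struct IndexEntry { std::uint32_t type, group, instance, offset, stored_size; };
struct DirEntry   { std::uint32_t type, group, instance, uncompressed_size; };

struct Merged {
    IndexEntry entry;
    bool compressed;
    std::uint32_t uncompressed_size;
};

// Joins the two lists on type/group/instance. Returns false when the join
// is ambiguous: a key listed twice in the compressed directory, or a
// compressed key that matches more than one index entry.
bool merge_lists(const std::vector<IndexEntry>& index,
                 const std::vector<DirEntry>& dir,
                 std::vector<Merged>& out) {
    std::map<Tgi, std::uint32_t> sizes;
    for (const DirEntry& d : dir) {
        bool inserted =
            sizes.emplace(Tgi{d.type, d.group, d.instance}, d.uncompressed_size).second;
        if (!inserted) return false;  // duplicate entry in compressed directory
    }

    std::map<Tgi, int> matches;
    out.clear();
    for (const IndexEntry& e : index) {
        Tgi key{e.type, e.group, e.instance};
        auto it = sizes.find(key);
        bool compressed = (it != sizes.end());
        if (compressed && ++matches[key] > 1) return false;  // which copy is it?
        Merged m{e, compressed, compressed ? it->second : e.stored_size};
        out.push_back(m);
    }
    return true;
}
```

Duplicate index entries pass through this sketch untouched as long as none of them appears in the compressed directory, which matches the observation that duplicates are only representable when all copies are uncompressed.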

Quote from: Delphy
Normal compressed SHPE ref, in Hex view in SimPE: 66 42 72 61 6e 64 69... etc.  This is 1 byte for the string length (0x66 - 102 bytes) and then immediately followed by the actual string (Brandi...).
For a compressorised SHPE ref, in Hex view in SimPE: 95 01 62 72 61 6e 64 69... etc.  This is 1 byte for the string length (0x95 - 149 bytes) and then a 01 and then followed by the actual string (brandi...)

SimPE uses System.IO.BinaryReader.ReadString() and its BinaryWriter counterpart for most string serialization. These do not use a single byte for the length -- they use a weird base-128 encoding (documented at http://msdn2.microsoft.com/en-us/library/system.io.binarywriter.write7bitencodedint.aspx). In this encoding a length of 0x95 is serialized as 0x95 0x01. I'm almost certain that this is your problem, not anything to do with compression.
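For readers not using .NET, here is a small C++ sketch of that base-128 length encoding: seven payload bits per byte, least significant group first, with the top bit set on every byte except the last. The function names are made up for this example.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Encodes a length as 7-bit groups, low group first; the high bit of each
// byte signals that another byte follows.
std::vector<std::uint8_t> encode_7bit(std::uint32_t value) {
    std::vector<std::uint8_t> out;
    while (value >= 0x80) {
        out.push_back(static_cast<std::uint8_t>(value | 0x80));
        value >>= 7;
    }
    out.push_back(static_cast<std::uint8_t>(value));
    return out;
}

// Decodes a length starting at bytes[pos] and advances pos past it.
// Returns false on truncated input or a prefix longer than five bytes.
bool decode_7bit(const std::vector<std::uint8_t>& bytes,
                 std::size_t& pos, std::uint32_t& value) {
    std::uint32_t result = 0;
    for (int shift = 0; shift < 35; shift += 7) {
        if (pos >= bytes.size()) return false;
        std::uint8_t b = bytes[pos++];
        result |= static_cast<std::uint32_t>(b & 0x7F) << shift;
        if ((b & 0x80) == 0) {
            value = result;
            return true;
        }
    }
    return false;
}
```

Lengths below 128 encode to a single byte, so a reader that assumes a one-byte length works until it meets a string of 128 bytes or more; 149 (0x95) becomes the two bytes 0x95 0x01.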

I don't know whether this is a bug in SimPE or whether it actually is the encoding used by The Sims. If the former, you may have to use an ugly workaround anyway.

Quote from: J. M. Pescado
And out of curiosity, do you actually play the game? How did you find us here?

I'm not really a gamer, I'm just interested in game design. When I do play games it's usually with as many cheats and walkthroughs as I can get my hands on, which for some reason strikes many people as immoral.

I think I found you through a thread in ModTheSims2. How are people supposed to find you? Or aren't they? It isn't exactly obvious how to get here from the top page, unless paying the $40 actually works.


Title: Re: New DBPF library
Post by: wes_h on 2007 May 30, 19:15:47
Quote from: benrg
I think I found you through a thread in ModTheSims2. How are people supposed to find you? Or aren't they? It isn't exactly obvious how to get here from the top page, unless paying the $40 actually works.

They better cut you some slack here. Mere mortals, as opposed to talented programmers such as yourself, would get laughed at for even considering paying the $40. Mr. Pescado is of a definite antipaysite and antiestablishment mindset. Once you know this, you can then appreciate the not-so-subtle lampooning of various sacred cows that goes on here.

As for the game string length, I never noticed any longer than 127 bytes, and all my code that deals with them would suffer similar problems. Is this the way they look before compression (i.e. 0x95 0x01)?



Title: Re: New DBPF library
Post by: dizzy on 2007 May 30, 23:10:01
I wouldn't be surprised. Even in Sims 1, the game engine used pstrings, cstrings, and ordinary non-terminated strings. Even worse, there was never any way you could reliably determine which format you should expect. Even resource type is NOT reliable for that determination.


Title: Re: New DBPF library
Post by: wes_h on 2007 May 30, 23:26:52
I never modded Sims 1. But I just thought that all the strings that were pstrings were <128 bytes.
I have been looking at the ANIM files, and all the bone names in the file are cstrings, although you have a count already on hand of how many to look for.
I will pay more attention to this. Everything I have made so far, though, parses just specific RCOL files, not the packages.
<* Wes *>


Title: Re: New DBPF library
Post by: benrg on 2007 May 30, 23:59:45
I just posted a new version of the library. The changes aren't very interesting. If you're a programmer, see the changelog in the zip archive for a list. If you're not a programmer, you can switch to the new version of dbpf-recompress.exe, but you probably won't notice any difference.

Quote from: wes_h
Mere mortals, as opposed to talented programmers such as yourself, would get laughed at for even considering paying the $40.

Oh, I know it's a spoof. But what it looks like is a domain with nothing on it but the spoof pages. You'd never guess that there's a real forum with real Sims 2 content on the same site.

Quote
Is this the way they look before compression (i.e. 0x95 0x01)?

Yes, that's before compression (or after decompression).


Title: Re: New DBPF library
Post by: dizzy on 2007 May 31, 02:58:01
Is there a way to fix this for GCC? I may be using Linux, but I'm compiling this in Bloodshed (in Wine).

Code:
#if defined(unix) && unix
#include <unistd.h>  // for unlink (no such header on Windows)
#endif
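One way to sidestep the platform check entirely, if the header is only needed for unlink, is the standard `std::remove` from `<cstdio>`, which is available on both Windows and Unix toolchains. This is a suggestion only; the thread does not say how the later release actually resolved it, and `delete_file` is a hypothetical helper name.

```cpp
#include <cstdio>
#include <string>

// Deletes the file at path using only the C++ standard library.
// Returns true if the file was removed.
bool delete_file(const std::string& path) {
    return std::remove(path.c_str()) == 0;
}
```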


Title: Re: New DBPF library
Post by: Delphy on 2007 June 01, 07:07:27
Quote from: benrg
SimPE uses System.IO.BinaryReader.ReadString() and its BinaryWriter counterpart for most string serialization

That's the weird thing - I *also* use ReadString to read those strings, and it's worked in absolutely every case I've thrown at it, until now.  I think it might have something to do with the wrapper I'm using though, so I'll have to investigate further.

Thanks for the feedback. :)


Title: Re: New DBPF library
Post by: benrg on 2007 June 01, 15:36:10
New release that should fix the unistd.h problem.


Title: Re: New DBPF library
Post by: dizzy on 2007 June 02, 11:28:57
Cool. That works.  :)