micro optimization #6

Merged
takahiro0327 merged 2 commits from small-optimization into master 2024-08-26 14:52:32 +00:00
takahiro0327 commented 2024-08-26 09:14:11 +00:00 (Migrated from github.com)

The following two points have been changed.
Please review and merge.

  • Add code to call PostBuild.bat in the post-build script

  • Reduce GCAlloc
    The profiler showed it was a problem, so I did some micro-optimization of the code.
    At first I thought it was the O(N^2) code using RemoveAt, but it seemed to be caused by GCAlloc due to the reference to .name.

Changes in loading time for huge scenes:
Before, average: 36.84

Scene loading completed: 36.8[s]
Scene loading completed: 37.8[s]
Scene loading completed: 36.5[s]
Scene loading completed: 36.6[s]
Scene loading completed: 36.5[s]

After, average: 34.76 -2.08

Scene loading completed: 35.2[s]
Scene loading completed: 34.8[s]
Scene loading completed: 34.6[s]
Scene loading completed: 35.0[s]
Scene loading completed: 34.2[s]

Before profile:
image
MonoProfilerOutput_2024-08-25_13-23-16.csv

After profile:
image
MonoProfilerOutput_2024-08-26_17-31-48.csv

Memo:
There are 20 characters in this scene. The 1120 calls to RemoveCollidersFromCoordinate(ChaControl) seem excessive.

The following two points have been changed. Please review and merge. - Add code to call PostBuild.bat in the post-build script - Reduce GCAlloc The profiler showed it was a problem, so I did some micro-optimization of the code. At first I thought it was the O(N^2) code using RemoveAt, but it seemed to be caused by GCAlloc due to the reference to .name. Changes in loading time for huge scenes: Before, average: 36.84 ``` Scene loading completed: 36.8[s] Scene loading completed: 37.8[s] Scene loading completed: 36.5[s] Scene loading completed: 36.6[s] Scene loading completed: 36.5[s] ``` After, average: 34.76 -2.08 ``` Scene loading completed: 35.2[s] Scene loading completed: 34.8[s] Scene loading completed: 34.6[s] Scene loading completed: 35.0[s] Scene loading completed: 34.2[s] ``` Before profile: ![image](https://github.com/user-attachments/assets/dbd788ba-21e6-4020-833f-172751dc2236) [MonoProfilerOutput_2024-08-25_13-23-16.csv](https://github.com/user-attachments/files/16746554/MonoProfilerOutput_2024-08-25_13-23-16.csv) After profile: ![image](https://github.com/user-attachments/assets/840549be-6638-4f0c-9f56-2bff0541a300) [MonoProfilerOutput_2024-08-26_17-31-48.csv](https://github.com/user-attachments/files/16746557/MonoProfilerOutput_2024-08-26_17-31-48.csv) Memo: There are 20 characters in this scene. The 1120 calls to RemoveCollidersFromCoordinate(ChaControl) seem excessive.
takahiro0327 commented 2024-08-26 11:03:51 +00:00 (Migrated from github.com)

Memo2:
ExtensibleSaveFormat.ExtendedSave/Hooks: I tried to reduce the GCAlloc of ChaFileLoadFileHook because it was large, but it actually got slower.

I suppressed the generation of huge byte arrays using ReadBytes(), and
changed MessagePackSerializer.Deserialize@byte[] to MessagePackSerializer.Deserialize@Stream.
The execution time of ChaFileLoadFileHook alone was reduced from 500ms to 1.5ms. This was great, but the total execution time, including children, increased from 3s to 5s, and GCAlloc increased from 300MB to 3.6GB.
It seems that deserializing from a stream is quite heavy.

Before source code:

            private static void ChaFileLoadFileHook(ChaFile file, BlockHeader header, BinaryReader reader)
            {
                var info = header.SearchInfo(Marker);

                if (LoadEventsEnabled && info != null && info.version == DataVersion.ToString())
                {
                    long originalPosition = reader.BaseStream.Position;
                    long basePosition = originalPosition - header.lstInfo.Sum(x => x.size);

                    reader.BaseStream.Position = basePosition + info.pos;

                    byte[] data = reader.ReadBytes((int)info.size);

                    reader.BaseStream.Position = originalPosition;

                    cardReadEventCalled = true;

                    try
                    {
                        var dictionary = MessagePackSerializer.Deserialize<Dictionary<string, PluginData>>(data);
                        internalCharaDictionary.Set(file, dictionary);
                    }
                    catch (Exception e)
                    {
                        internalCharaDictionary.Set(file, new Dictionary<string, PluginData>());
                        Logger.LogWarning($"Invalid or corrupted extended data in card \"{file.charaFileName}\" - {e.Message}");
                    }

                    CardReadEvent(file);
                }
                else
                {
                    internalCharaDictionary.Set(file, new Dictionary<string, PluginData>());
                }
            }

After source code:

            private static void ChaFileLoadFileHook(ChaFile file, BlockHeader header, BinaryReader reader)
            {
                var info = header.SearchInfo(Marker);

                if (LoadEventsEnabled && info != null && info.version == DataVersion.ToString())
                {
                    long originalPosition = reader.BaseStream.Position;
                    long basePosition = originalPosition - header.lstInfo.Sum(x => x.size);

                    cardReadEventCalled = true;

                    try
                    {
                        reader.BaseStream.Position = basePosition + info.pos;
                        var dictionary = MessagePackSerializer.Deserialize<Dictionary<string, PluginData>>(reader.BaseStream);
                        internalCharaDictionary.Set(file, dictionary);
                    }
                    catch (Exception e)
                    {
                        internalCharaDictionary.Set(file, new Dictionary<string, PluginData>());
                        Logger.LogWarning($"Invalid or corrupted extended data in card \"{file.charaFileName}\" - {e.Message}");
                    }

                    reader.BaseStream.Position = originalPosition;

                    CardReadEvent(file);
                }
                else
                {
                    internalCharaDictionary.Set(file, new Dictionary<string, PluginData>());
                }
            }

before:
image

after:
image

Memo2: ExtensibleSaveFormat.ExtendedSave/Hooks: I tried to reduce the GCAlloc of ChaFileLoadFileHook because it was large, but it actually got slower. I suppressed the generation of huge byte arrays using ReadBytes(), and changed MessagePackSerializer.Deserialize@byte[] to MessagePackSerializer.Deserialize@Stream. The execution time of ChaFileLoadFileHook alone was reduced from 500ms to 1.5ms. This was great, but the total execution time, including children, increased from 3s to 5s, and GCAlloc increased from 300MB to 3.6GB. It seems that deserializing from a stream is quite heavy. Before source code: ``` private static void ChaFileLoadFileHook(ChaFile file, BlockHeader header, BinaryReader reader) { var info = header.SearchInfo(Marker); if (LoadEventsEnabled && info != null && info.version == DataVersion.ToString()) { long originalPosition = reader.BaseStream.Position; long basePosition = originalPosition - header.lstInfo.Sum(x => x.size); reader.BaseStream.Position = basePosition + info.pos; byte[] data = reader.ReadBytes((int)info.size); reader.BaseStream.Position = originalPosition; cardReadEventCalled = true; try { var dictionary = MessagePackSerializer.Deserialize<Dictionary<string, PluginData>>(data); internalCharaDictionary.Set(file, dictionary); } catch (Exception e) { internalCharaDictionary.Set(file, new Dictionary<string, PluginData>()); Logger.LogWarning($"Invalid or corrupted extended data in card \"{file.charaFileName}\" - {e.Message}"); } CardReadEvent(file); } else { internalCharaDictionary.Set(file, new Dictionary<string, PluginData>()); } } ``` After source code: ``` private static void ChaFileLoadFileHook(ChaFile file, BlockHeader header, BinaryReader reader) { var info = header.SearchInfo(Marker); if (LoadEventsEnabled && info != null && info.version == DataVersion.ToString()) { long originalPosition = reader.BaseStream.Position; long basePosition = originalPosition - header.lstInfo.Sum(x => x.size); cardReadEventCalled = true; try { reader.BaseStream.Position = basePosition + info.pos; var dictionary = MessagePackSerializer.Deserialize<Dictionary<string, PluginData>>(reader.BaseStream); internalCharaDictionary.Set(file, dictionary); } catch (Exception e) { internalCharaDictionary.Set(file, new Dictionary<string, PluginData>()); Logger.LogWarning($"Invalid or corrupted extended data in card \"{file.charaFileName}\" - {e.Message}"); } reader.BaseStream.Position = originalPosition; CardReadEvent(file); } else { internalCharaDictionary.Set(file, new Dictionary<string, PluginData>()); } } ``` before: ![image](https://github.com/user-attachments/assets/d1ef8ba2-fcf4-4272-a047-8c87bdb3aa96) after: ![image](https://github.com/user-attachments/assets/e782fe16-ff7c-47fc-bb3c-7316a723d284)
takahiro0327 commented 2024-08-26 12:29:36 +00:00 (Migrated from github.com)

I'll try to see if I can group the calls to the same character in the same frame.
It probably won't be as sensitive as the coordination.

I don't know the extent of the impact, so I'll stop for now.

~~I'll try to see if I can group the calls to the same character in the same frame. It probably won't be as sensitive as the coordination.~~ I don't know the extent of the impact, so I'll stop for now.
ManlyMarco (Migrated from github.com) approved these changes 2024-08-26 14:51:56 +00:00
ManlyMarco (Migrated from github.com) left a comment

That RemoveRange trick is nice, I'll be using that myself.

That RemoveRange trick is nice, I'll be using that myself.
ManlyMarco commented 2024-08-26 14:54:39 +00:00 (Migrated from github.com)

The Deserialize@Stream change is surprising, I would have guessed it would be faster. There's probably some sort of side effect of reading that stream that causes issues down the line with how the stream is cached.

This might work differently between games, especially KK and KKS, because of differences in mono and framework versions.

The Deserialize@Stream change is surprising, I would have guessed it would be faster. There's probably some sort of side effect of reading that stream that causes issues down the line with how the stream is cached. This might work differently between games, especially KK and KKS, because of differences in mono and framework versions.
Sign in to join this conversation.
No description provided.