The Essence of load and initialize

2020-05-07

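Notes from MJ's video course, summarizing the key points: the essence of load and initialize

This post introduces +load and +initialize through Apple's official documentation and a demo, and then uses the runtime source code to explain the logic underneath both methods.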


Introducing +load

As we know, the runtime calls +load when it loads a class or a category. Apple's documentation describes load as follows:

Summary

Invoked whenever a class or category is added to the Objective-C runtime; implement this method to perform class-specific behavior upon loading.

Declaration

+ (void)load;

Discussion

The load message is sent to classes and categories that are both dynamically loaded and statically linked, but only if the newly loaded class or category implements a method that can respond.
The order of initialization is as follows:
All initializers in any framework you link to.
All +load methods in your image.
All C++ static initializers and C/C++ __attribute__(constructor) functions in your image.
All initializers in frameworks that link to you.

In addition:
A class’s +load method is called after all of its superclasses’ +load methods.
A category +load method is called after the class’s own +load method.
In a custom implementation of load you can therefore safely message other unrelated classes from the same image, but any load methods implemented by those classes may not have run yet.

Important
Custom implementations of the load method for Swift classes bridged to Objective-C are not called automatically.

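From the documentation we can conclude:

+load is called when the runtime loads a class or a category
Each class's and each category's +load is called only once while the program runs
A class's +load is called before the +load of its categories
A superclass's +load is called before the +load of its subclasses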

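How +load is called

We know that +load is called when a class or category is added to the runtime, so we can start from the runtime source code to find and analyze how +load gets called.

From the previous post we know that the runtime's initialization function is _objc_init, and that inside it there is the call _dyld_objc_notify_register(&map_images, load_images, unmap_image);. The load_images callback is our entry point for dissecting the +load mechanism. Here is the source of load_images: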


/***********************************************************************
* load_images
* Process +load in the given images which are being mapped in by dyld.
*
* Locking: write-locks runtimeLock and loadMethodLock
**********************************************************************/
void
load_images(const char *path __unused, const struct mach_header *mh)
{
    // Return without taking locks if there are no +load methods here.
    if (!hasLoadMethods((const headerType *)mh)) return;

    recursive_mutex_locker_t lock(loadMethodLock);

    // Discover load methods
    {
        rwlock_writer_t lock2(runtimeLock);
        prepare_load_methods((const headerType *)mh);
    }

    // Call +load methods (without runtimeLock - re-entrant)
    call_load_methods();
}

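The function works in three steps (check, collect, call). In plain terms:

Check whether the image has any +load methods; if not, return immediately;
If it does, collect them and store them in an array;
Walk the array and call the +load method of each element.

Each step is described below.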

1. hasLoadMethods

// Quick scan for +load methods that doesn't take a lock.
bool hasLoadMethods(const headerType *mhdr)
{
    size_t count;
    if (_getObjc2NonlazyClassList(mhdr, &count)  &&  count > 0) return true;
    if (_getObjc2NonlazyCategoryList(mhdr, &count)  &&  count > 0) return true;
    return false;
}

In short, this function quickly checks whether any class or category in the image implements +load.
(Note: only a class that implements +load counts as a non-lazy class.)

2. prepare_load_methods

After the check comes the search for all +load methods. The main code of prepare_load_methods is:

void prepare_load_methods(const headerType *mhdr)
{
    size_t count, i;

    runtimeLock.assertWriting();
    
    // Walk the classes that implement +load and add them to 'loadable_classes'
    // (a superclass is added first, recursively); each class is marked 'RW_LOADED'
    classref_t *classlist = 
        _getObjc2NonlazyClassList(mhdr, &count);
    for (i = 0; i < count; i++) {
        schedule_class_load(remapClass(classlist[i]));
    }

    // Walk the categories that implement +load, run a few checks, then add them to 'loadable_categories'
    category_t **categorylist = _getObjc2NonlazyCategoryList(mhdr, &count);
    for (i = 0; i < count; i++) {
        category_t *cat = categorylist[i];
        Class cls = remapClass(cat->cls);
        if (!cls) continue;  // category for ignored weak-linked class
        realizeClass(cls);
        assert(cls->ISA()->isRealized());
        add_category_to_loadable_list(cat);
    }
}

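This step adds the classes that implement +load to the loadable_classes array, and the categories that implement +load to the loadable_categories array.

Inside prepare_load_methods there is a function, schedule_class_load, that deserves a closer look. Its implementation is: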

/***********************************************************************
* prepare_load_methods
* Schedule +load for classes in this image, any un-+load-ed 
* superclasses in other images, and any categories in this image.
**********************************************************************/
// Recursively schedule +load for cls and any un-+load-ed superclasses.
// cls must already be connected.
static void schedule_class_load(Class cls)
{
    if (!cls) return;
    assert(cls->isRealized());  // _read_images should realize
    // Skip the class if its flags say it has already been scheduled for +load
    if (cls->data()->flags & RW_LOADED) return;

    // Ensure superclass-first ordering by recursing into the superclass first
    schedule_class_load(cls->superclass);
    // Store the class in the loadable list
    add_class_to_loadable_list(cls);
    // Mark cls as loaded
    cls->setInfo(RW_LOADED);
}

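The function calls itself recursively, so a class's superclass is added to the array before the class itself. This explains why a superclass's +load runs before the +load of its subclasses.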
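
The superclass-first recursion can be modeled in a few lines of plain C. This is only a sketch, not runtime code: LoadNode, loadable, and schedule_toy_class_load are hypothetical names, and the scheduled flag merely plays the role of RW_LOADED.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_LOADABLE 16

/* Hypothetical stand-in for a class: a name, a superclass link,
 * and a flag playing the role of RW_LOADED. */
typedef struct LoadNode {
    const char *name;
    struct LoadNode *superclass;   /* NULL for a root class */
    bool scheduled;
} LoadNode;

/* Plays the role of the loadable_classes array. */
LoadNode *loadable[MAX_LOADABLE];
size_t loadable_used = 0;

/* Queue cls for +load: superclasses first, each class at most once. */
void schedule_toy_class_load(LoadNode *cls)
{
    if (cls == NULL) return;       /* walked past the root class */
    if (cls->scheduled) return;    /* already queued earlier */

    /* The superclass goes into the array before cls does. */
    schedule_toy_class_load(cls->superclass);

    assert(loadable_used < MAX_LOADABLE);
    loadable[loadable_used++] = cls;
    cls->scheduled = true;
}
```

With a chain Root <- Person <- Student, scheduling Student and then Person leaves the array as Root, Person, Student: the second call finds Person already flagged and adds nothing.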

3. call_load_methods

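Once the entries are in the loadable_classes and loadable_categories arrays, the runtime walks the arrays and calls each element's +load method. The main body of the function is: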

/***********************************************************************
* call_load_methods
* Call all pending class and category +load methods.
* Class +load methods are called superclass-first. 
* Category +load methods are not called until after the parent class's +load.
**********************************************************************/
void call_load_methods(void)
{
    static bool loading = NO;
    bool more_categories;

    loadMethodLock.assertLocked();

    // Re-entrant calls do nothing; the outermost call will finish the job.
    if (loading) return;
    loading = YES;

    void *pool = objc_autoreleasePoolPush();

    do {
        // 1. Repeatedly call class +loads until there aren't any more
        while (loadable_classes_used > 0) {
            call_class_loads();
        }

        // 2. Call category +loads ONCE
        more_categories = call_category_loads();

        // 3. Run more +loads if there are classes OR more untried categories
    } while (loadable_classes_used > 0  ||  more_categories);

    objc_autoreleasePoolPop(pool);

    loading = NO;
}

From the implementation of call_load_methods we can see that the +load methods of classes are called first, and only then the +load methods of categories.
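
A small plain-C model of that ordering is sketched here. It is deliberately simplified: the tags and queue names are hypothetical, and the outer do/while loop that picks up work added while +load methods are running is left out.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_PENDING 8

/* Tags identifying which toy +load ran. All names are hypothetical. */
enum { PERSON_CLASS = 1, STUDENT_CLASS = 2, PERSON_CATEGORY = 3 };

/* Pending work, standing in for loadable_classes and loadable_categories. */
int pending_classes[MAX_PENDING];
size_t pending_classes_used = 0;
int pending_categories[MAX_PENDING];
size_t pending_categories_used = 0;

/* The order in which the toy +load calls were made. */
int call_order[2 * MAX_PENDING];
size_t call_order_used = 0;

void add_pending_class(int tag)
{
    assert(pending_classes_used < MAX_PENDING);
    pending_classes[pending_classes_used++] = tag;
}

void add_pending_category(int tag)
{
    assert(pending_categories_used < MAX_PENDING);
    pending_categories[pending_categories_used++] = tag;
}

/* Stands in for actually invoking a +load implementation. */
void record_call(int tag)
{
    assert(call_order_used < 2 * MAX_PENDING);
    call_order[call_order_used++] = tag;
}

/* Drain every class entry first, then every category entry. */
void call_toy_load_methods(void)
{
    size_t i;
    for (i = 0; i < pending_classes_used; i++) record_call(pending_classes[i]);
    pending_classes_used = 0;
    for (i = 0; i < pending_categories_used; i++) record_call(pending_categories[i]);
    pending_categories_used = 0;
}
```

Even if the category is registered before the two classes, the recorded order is PERSON_CLASS, STUDENT_CLASS, PERSON_CATEGORY, because the class queue is always drained first.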

Next, how does the runtime call +load automatically? The answer is inside call_class_loads, whose source is outlined below:

static void call_class_loads(void)
{
    int i;
    
    // Detach current loadable list.
    struct loadable_class *classes = loadable_classes;
    int used = loadable_classes_used;
    
    for (i = 0; i < used; i++) {
        // Take the class
        Class cls = classes[i].cls;
        // Take the saved '+load' implementation
        load_method_t load_method = (load_method_t)classes[i].method;
        if (!cls) continue;

        // Call '+load' directly through its function pointer
        (*load_method)(cls, SEL_load);
    }
    
    // Destroy the detached list.
    if (classes) free(classes);
}

So by default +load is called directly through its function address, not through Objective-C's usual runtime message-sending mechanism.
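
The practical difference between the two call styles can be sketched in plain C. Everything here is a hypothetical model: ToyMethod, lookup_by_name, and call_saved_loads are illustrative names, and the method list layout (category entry ahead of the class's own entry) is an assumption of the model.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef int (*toy_imp)(void);

typedef struct {
    const char *selector;   /* method name */
    toy_imp imp;            /* function address */
} ToyMethod;

/* Two implementations with the same name, one from the class itself and
 * one from a category. Each returns a tag so callers can tell them apart. */
int class_load(void)    { return 1; }
int category_load(void) { return 2; }

/* Assumed layout: the category's method sits in front of the class's own
 * method, so a search by name reaches the category's version first. */
ToyMethod method_list[] = {
    { "load", category_load },
    { "load", class_load },
};
const size_t method_count = sizeof(method_list) / sizeof(method_list[0]);

/* Message-style dispatch: search by name, first match wins. */
toy_imp lookup_by_name(const char *selector)
{
    size_t i;
    for (i = 0; i < method_count; i++) {
        if (strcmp(method_list[i].selector, selector) == 0) {
            return method_list[i].imp;
        }
    }
    return NULL;
}

/* +load-style dispatch: the addresses were saved ahead of time,
 * so every saved implementation runs. */
toy_imp saved_loads[] = { class_load, category_load };

/* Calls each saved implementation and returns the sum of their tags. */
int call_saved_loads(void)
{
    int sum = 0;
    size_t i;
    for (i = 0; i < sizeof(saved_loads) / sizeof(saved_loads[0]); i++) {
        assert(saved_loads[i] != NULL);
        sum += saved_loads[i]();
    }
    return sum;
}
```

In this model a lookup by name only ever reaches category_load, while the saved-address path runs both implementations. That mirrors why a category's +load does not replace the class's own +load.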

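+load summary

+load is called when the runtime loads a class or a category
Each class's and each category's +load is called only once while the program runs
Call order
- The +load methods of classes are called first
  - In compilation order (compiled first, called first)
  - A superclass's +load is called before the +load of its subclasses
- The +load methods of categories are called afterwards
  - In compilation order (compiled first, called first)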

Introducing +initialize

As we know, +initialize is called the first time a class receives a message. Apple's documentation describes initialize as follows:

Summary

Initializes the class before it receives its first message.

Declaration

+ (void)initialize;

Discussion

The runtime sends initialize to each class in a program just before the class, or any class that inherits from it, is sent its first message from within the program. Superclasses receive this message before their subclasses.
The runtime sends the initialize message to classes in a thread-safe manner. That is, initialize is run by the first thread to send a message to a class, and any other thread that tries to send a message to that class will block until initialize completes.
The superclass implementation may be called multiple times if subclasses do not implement initialize—the runtime will call the inherited implementation—or if subclasses explicitly call [super initialize]. If you want to protect yourself from being run multiple times, you can structure your implementation along these lines:
+ (void)initialize {
  if (self == [ClassName self]) {
    // ... do the initialization ...
  }
}
Because initialize is called in a blocking manner, it’s important to limit method implementations to the minimum amount of work necessary possible. Specifically, any code that takes locks that might be required by other classes in their initialize methods is liable to lead to deadlocks. Therefore, you should not rely on initialize for complex initialization, and should instead limit it to straightforward, class local initialization.

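From the documentation and what we already know:

+initialize is called the first time a class receives a message
The superclass's +initialize is called before the subclass's +initialize
(the superclass is initialized first, then the subclass, and each class is initialized only once)
+initialize is sent in a thread-safe manner; take care to avoid deadlocks in +initialize, and do not do anything complicated inside it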

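How +initialize works

Now that we know how +initialize behaves, let's use the runtime source to analyze how it works. We already know that +initialize is called when a class receives its first message, so that is where we start.

Receiving the first message means the first objc_msgSend(cls, @selector(xxx)). How does the runtime know it is the first one? A reasonable guess is that, before the message is dispatched, the runtime checks for an "initialized" flag; if the flag is not set, it calls +initialize first and then continues with the original flow.

Another way into the source is the flag that is set after +initialize has been called. We already know that a class is initialized only once, just as +load is called only once. We also saw in the +load flow that loading ends with a marking step, cls->setInfo(RW_LOADED);. So we guess that +initialize has a similar flag, and by analogy it should be named RW_INITIALIZED. Searching the runtime source does indeed turn up RW_INITIALIZED, used in the following function: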

bool isInitialized() {
    return getMeta()->data()->flags & RW_INITIALIZED;
}

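Our goal, though, is to find out when +initialize is called. Now that we know the check that precedes initialization, isInitialized(), we can follow the trail by searching the objc source for calls to isInitialized(). Since the check has to happen before the message is actually dispatched, we can narrow the search further.

Going through the search results, we can be almost certain that the only relevant one is the lookUpImpOrForward function, outlined below: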

/***********************************************************************
* lookUpImpOrForward.
* The standard IMP lookup. 
* initialize==NO tries to avoid +initialize (but sometimes fails)
**********************************************************************/
IMP lookUpImpOrForward(Class cls, SEL sel, id inst, 
                       bool initialize, bool cache, bool resolver)
{
    if (initialize  &&  !cls->isInitialized()) {
        runtimeLock.unlockRead();
        _class_initialize (_class_getNonMetaClass(cls, inst));
        runtimeLock.read();
        // If sel == initialize, _class_initialize will send +initialize and 
        // then the messenger will send +initialize again after this 
        // procedure finishes. Of course, if this is not being called 
        // from the messenger then it won't happen. 2778172
    }
    // ... the rest of the lookup is omitted from this excerpt ...
}

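Inside this function there is a check: if the class has not been initialized, _class_initialize is called to initialize it. That gives us a lead out of the dense source code, and we will follow it to analyze how +initialize works underneath.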
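
The check-then-initialize idea can be modeled in plain C as follows. This is a sketch under simplifying assumptions: LazyClass, toy_initialize, and toy_send_message are hypothetical names, and locking, caching, and the real method lookup are all left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a class. */
typedef struct {
    bool initialized;       /* plays the role of RW_INITIALIZED */
    int initialize_calls;   /* how many times the toy +initialize ran */
    int messages_handled;   /* how many messages were delivered */
} LazyClass;

/* Stands in for _class_initialize: run +initialize, then set the flag. */
void toy_initialize(LazyClass *cls)
{
    cls->initialize_calls++;
    cls->initialized = true;
}

/* Every message goes through the flag check before it is handled. */
void toy_send_message(LazyClass *cls)
{
    assert(cls != NULL);
    if (!cls->initialized) {
        toy_initialize(cls);    /* only the first message gets here */
    }
    cls->messages_handled++;
}
```

Sending three messages to a fresh LazyClass runs toy_initialize once and handles all three messages; nothing is initialized before the first message arrives.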

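Searching for callers of lookUpImpOrForward shows that class_getClassMethod reaches it (through class_getInstanceMethod and lookUpImpOrNil). Reasoning forward, this also makes sense: before a message can be delivered to a class, the runtime has to find out whether a matching method exists. One caveat: in the objc4 version these excerpts appear to come from, class_getInstanceMethod passes initialize=NO, so this chain shows how a lookup reaches lookUpImpOrForward but does not itself trigger +initialize. objc_msgSend is written in assembly; on a cache miss it enters the same function through _class_lookupMethodAndLoadCache3, which passes initialize=YES, and that is the path on which the check above fires.

Next we analyze the implementation of _class_initialize to see what the initialization process does.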

/***********************************************************************
* class_initialize.  Send the '+initialize' message on demand to any
* uninitialized class. Force initialization of superclasses first.
**********************************************************************/
void _class_initialize(Class cls)
{
    assert(!cls->isMetaClass());

    Class supercls;
    bool reallyInitialize = NO;

    // Make sure super is done initializing BEFORE beginning to initialize cls.
    // See note about deadlock above.
    // Check whether supercls is initialized; if not, initialize it first by recursion
    supercls = cls->superclass;
    if (supercls  &&  !supercls->isInitialized()) {
        _class_initialize(supercls);
    }
    
    // Try to atomically set CLS_INITIALIZING.
    {
        monitor_locker_t lock(classInitLock);
        if (!cls->isInitialized() && !cls->isInitializing()) {
            cls->setInitializing();
            reallyInitialize = YES;
        }
    }
    
    if (reallyInitialize) {
        // We successfully set the CLS_INITIALIZING bit. Initialize the class.
        
        // Record that we're initializing this class so we can message it.
        _setThisThreadIsInitializingClass(cls);

        if (MultithreadedForkChild) {
            // LOL JK we don't really call +initialize methods after fork().
            performForkChildInitialize(cls, supercls);
            return;
        }
        
        // Send the +initialize message.
        // Note that +initialize is sent to the superclass (again) if 
        // this class doesn't implement +initialize. 2157218
        
        // Exceptions: A +initialize call that throws an exception 
        // is deemed to be a complete and successful +initialize.
        //
        // Only __OBJC2__ adds these handlers. !__OBJC2__ has a
        // bootstrapping problem of this versus CF's call to
        // objc_exception_set_functions().
#if __OBJC2__
        @try
#endif
        {
            // This is where +initialize is actually sent
            callInitialize(cls);

            if (PrintInitializing) {
                _objc_inform("INITIALIZE: thread %p: finished +[%s initialize]",
                             pthread_self(), cls->nameForLogging());
            }
        }
#if __OBJC2__
        @catch (...) {
            if (PrintInitializing) {
                _objc_inform("INITIALIZE: thread %p: +[%s initialize] "
                             "threw an exception",
                             pthread_self(), cls->nameForLogging());
            }
            @throw;
        }
        @finally
#endif
        {
            // Done initializing.
            lockAndFinishInitializing(cls, supercls);
        }
        return;
    }
}

From the source we can see that the superclass's +initialize is called first, and then the subclass's +initialize.
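
A minimal plain-C model of that superclass-first recursion is sketched here. InitClass, init_order, and toy_class_initialize are hypothetical names; the lock, the "initializing" state, the fork handling, and the exception handling in the real function are all omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_ORDER 16

/* Hypothetical stand-in for a class. */
typedef struct InitClass {
    const char *name;
    struct InitClass *superclass;   /* NULL for a root class */
    bool initialized;               /* plays the role of RW_INITIALIZED */
} InitClass;

/* The order in which classes were initialized. */
InitClass *init_order[MAX_ORDER];
size_t init_order_used = 0;

void toy_class_initialize(InitClass *cls)
{
    assert(cls != NULL);

    /* Finish the superclass before starting on cls. */
    if (cls->superclass != NULL && !cls->superclass->initialized) {
        toy_class_initialize(cls->superclass);
    }

    if (cls->initialized) return;   /* each class is initialized once */

    assert(init_order_used < MAX_ORDER);
    init_order[init_order_used++] = cls;   /* stands in for callInitialize */
    cls->initialized = true;
}
```

For a chain Root <- Person <- Student, initializing Student records Root, Person, Student in that order, and a later request to initialize Person changes nothing.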

Now let's look at callInitialize, the code that actually sends +initialize:

void callInitialize(Class cls)
{
    ((void(*)(Class, SEL))objc_msgSend)(cls, SEL_initialize);
    asm("");
}

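callInitialize is very simple. The source shows that +initialize is called through the runtime's message-sending mechanism (whereas +load is called directly through its function address at load time).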

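+initialize summary

With the source analysis in hand, we can see how Apple makes +initialize run the first time a class receives a message (Initializes the class before it receives its first message.)

objc_msgSend is written in assembly, so the method lookup is easier to follow starting from class_getClassMethod
class_getClassMethod calls class_getInstanceMethod on the metaclass
class_getInstanceMethod calls lookUpImpOrNil to check whether an implementation exists
lookUpImpOrNil calls lookUpImpOrForward, returning nil where lookUpImpOrForward would return the forwarding implementation
lookUpImpOrForward checks whether the class has been initialized (when its initialize argument is YES, as it is on the objc_msgSend cache-miss path) and, if not, calls _class_initialize
_class_initialize initializes the superclass first and then calls callInitialize
callInitialize is simply ((void(*)(Class, SEL))objc_msgSend)(cls, SEL_initialize);, which sends initialize to the class.

That completes the +initialize flow. To summarize +initialize once more: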

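+initialize is called the first time a class receives a message
Call order
- The superclass's +initialize is called before the subclass's +initialize
- (the superclass is initialized first, then the subclass, and each class is initialized only once)
Although each class is initialized only once, the superclass's +initialize implementation can be called several times when subclasses do not implement +initialize. The superclass itself is still initialized only once; the extra calls are a consequence of inheritance (a subclass without its own implementation falls back to the superclass's implementation when the message is sent).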
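
The repeated calls into the superclass's implementation, and the guard that Apple's documentation suggests, can be modeled in plain C. This is a sketch with hypothetical names (MsgClass, find_initialize, parent_class, child_a, child_b); the comparison self == &parent_class plays the role of self == [ClassName self].

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct MsgClass;
typedef void (*initialize_imp)(struct MsgClass *self);

/* Hypothetical stand-in for a class. */
typedef struct MsgClass {
    struct MsgClass *superclass;   /* NULL for a root class */
    initialize_imp initialize;     /* NULL if the class does not implement it */
    bool initialized;
} MsgClass;

int parent_imp_calls = 0;      /* every entry into the parent's implementation */
int parent_guarded_work = 0;   /* work done behind the guard */

extern MsgClass parent_class;

void parent_initialize(MsgClass *self)
{
    parent_imp_calls++;
    if (self == &parent_class) {   /* same idea as self == [ClassName self] */
        parent_guarded_work++;
    }
}

MsgClass parent_class = { NULL, parent_initialize, false };
MsgClass child_a = { &parent_class, NULL, false };
MsgClass child_b = { &parent_class, NULL, false };

/* Message-style lookup: walk up until some class implements initialize. */
initialize_imp find_initialize(const MsgClass *cls)
{
    for (; cls != NULL; cls = cls->superclass) {
        if (cls->initialize != NULL) return cls->initialize;
    }
    return NULL;
}

void toy_class_initialize(MsgClass *cls)
{
    initialize_imp imp;
    assert(cls != NULL);
    if (cls->superclass != NULL && !cls->superclass->initialized) {
        toy_class_initialize(cls->superclass);
    }
    if (cls->initialized) return;
    imp = find_initialize(cls);
    if (imp != NULL) imp(cls);   /* self is cls, even for an inherited imp */
    cls->initialized = true;
}
```

Initializing child_a and child_b enters parent_initialize three times in total (once for the parent and once for each child), yet each class is flagged as initialized only once and the guarded work runs a single time.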

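Summary

Pulling the whole post together, here is a summary of +load and +initialize.

What are the differences between load and initialize?

How they are called
1> load is called directly through its function address
2> initialize is called through objc_msgSend
When they are called
1> load is called when the runtime loads a class or category (only once)
2> initialize is called the first time a class receives a message; each class is initialized only once (the superclass's initialize method may be called several times)

In what order are load and initialize called?
1.load
1> The load methods of classes are called first
a) Classes compiled earlier have their load called earlier
b) A superclass's load is called before the load of its subclasses

2> The load methods of categories are called afterwards
a) Categories compiled earlier have their load called earlier

2.initialize
1> The superclass is initialized first
2> The subclass is initialized afterwards (the method that ends up being called may be the superclass's initialize)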